首页 硬件教程 硬件新闻 OpenAI o1 and o1-mini arrive as AIs that handle STEM questions better than prior models

OpenAI o1 and o1-mini arrive as AIs that handle STEM questions better than prior models

Sep 19, 2024 am 03:22 AM
openai laptop test Notebook review reviews tests reports netbook STEM o1 o1-mini

OpenAI o1 and o1-mini arrive as AIs that handle STEM questions better than prior models

OpenAI o1 and o1-mini have arrived. These AI LLMs perform much better on coding, math, and science problems and tasks than prior models such as GPT-4o by taking more time to think.

Complex problems in STEM tend to require more than a quick online search for correct answers. By giving the o1 AI more time to think, the AI can reason more carefully and accurately. The o1-mini model has been specifically tuned to answer STEM questions with faster speed and lower demand on computer resources, and it is notably better at coding than the o1 model.

Across a range of standardized AP exams and STEM tests for LLMs, the o1 models perform with high accuracy. Specifically, on the AP Calculus, AP Chemistry, AP Physics 2, LSAT, and SAT evidence-based reading & writing tests, the o1 models perform at or above the B-grade level (~80% or higher). The models answer accurately at the A-grade level on PhD-level physics questions, at the B-grade level on tough 2024 American Invitational Mathematics Examination math questions, and at the high B-grade level on Codeforces coding problems. Because o1 has been tuned for answering STEM questions, its performance on AP English Language and AP English Literature is at or below the C-grade level.

Interestingly, while GPT-4o is dumbfounded by the cryptographic challenge of decoding “oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz” when given the hint “oyfjdnisdr rtqwainr acxz mynzbhhx” means “Think step by step”, o1 had no issues thinking through the problem to come up with the correct answer “There are three r’s in strawberry”. This new power will delight hobby cryptographers at home as well as the NSA.

Closet evil-doers will want to know that while the uncensored o1 models are apt to give troubling replies, OpenAI has neutered these models for release. The o1 models have been tested to resist answering questions about making bioweapons, producing naughty images, jailbreaking itself, and harassing and threatening. Unfortunately, the OpenAI o1 models remain gender and race biased when tested, despite tuning efforts.

ChatGPT Plus and Team users along with API usage tier 5 developers have access to o1 models immediately, and ChatGPT Edu and Enterprise users will gain access on the week of September 16. ChatGPT Free users will gain access to o1-mini in the near future. The o1 models cannot browse the web or accept uploaded files and images to answer questions, so OpenAI recommends users continue using their GPT-4o models for general questions.

Users who want to ask AI questions now have a wide-range of capable LLM models to interact with besides those from OpenAI, including Anthropic Claude, Microsoft CoPilot, Google Gemini, and X Grok. Every AI has specific advantages, so it is worth testing several AI models to find one that best suits individual needs. Some of these AI are built into smart glasses (like these on Amazon) and voice recorders (like this one on Amazon), and some upcoming autonomous humanoid robots use proprietary AI to cook and clean.

OpenAI o1 and o1-mini arrive as AIs that handle STEM questions better than prior models

OpenAI o1 and o1-mini arrive as AIs that handle STEM questions better than prior models

OpenAI o1 and o1-mini arrive as AIs that handle STEM questions better than prior models

OpenAI o1 and o1-mini arrive as AIs that handle STEM questions better than prior models

OpenAI o1 and o1-mini arrive as AIs that handle STEM questions better than prior models

OpenAI o1 and o1-mini arrive as AIs that handle STEM questions better than prior models

OpenAI o1 and o1-mini arrive as AIs that handle STEM questions better than prior models

以上是OpenAI o1 and o1-mini arrive as AIs that handle STEM questions better than prior models的详细内容。更多信息请关注PHP中文网其他相关文章!

本站声明
本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系admin@php.cn

热AI工具

Undresser.AI Undress

Undresser.AI Undress

人工智能驱动的应用程序,用于创建逼真的裸体照片

AI Clothes Remover

AI Clothes Remover

用于从照片中去除衣服的在线人工智能工具。

Undress AI Tool

Undress AI Tool

免费脱衣服图片

Clothoff.io

Clothoff.io

AI脱衣机

AI Hentai Generator

AI Hentai Generator

免费生成ai无尽的。

热门文章

R.E.P.O.能量晶体解释及其做什么(黄色晶体)
4 周前 By 尊渡假赌尊渡假赌尊渡假赌
R.E.P.O.最佳图形设置
4 周前 By 尊渡假赌尊渡假赌尊渡假赌
R.E.P.O.如果您听不到任何人,如何修复音频
4 周前 By 尊渡假赌尊渡假赌尊渡假赌
WWE 2K25:如何解锁Myrise中的所有内容
1 个月前 By 尊渡假赌尊渡假赌尊渡假赌

热工具

记事本++7.3.1

记事本++7.3.1

好用且免费的代码编辑器

SublimeText3汉化版

SublimeText3汉化版

中文版,非常好用

禅工作室 13.0.1

禅工作室 13.0.1

功能强大的PHP集成开发环境

Dreamweaver CS6

Dreamweaver CS6

视觉化网页开发工具

SublimeText3 Mac版

SublimeText3 Mac版

神级代码编辑软件(SublimeText3)

华为Watch GT 5智能手表获得新功能更新 华为Watch GT 5智能手表获得新功能更新 Oct 03, 2024 am 06:25 AM

华为正在全球推出Watch GT 5和Watch GT 5 Pro智能手表的软件版本5.0.0.100(C00M01)。这两款智能手表最近在欧洲推出,标准型号是该公司最便宜的型号。这和谐

铁拳桑德斯上校的梦想被肯德基炸了 铁拳桑德斯上校的梦想被肯德基炸了 Oct 02, 2024 am 06:07 AM

《铁拳》系列总监原田胜宏曾认真尝试过将桑​​德斯上校带入这款标志性格斗游戏中。在接受 TheGamer 采访时,原田透露,他向日本肯德基提出了这个想法,希望将这位快餐传奇人物纳入其中。

第一眼:即将推出的 Anker Zolo 4 端口 140W 带显示屏壁式充电器的拆箱视频泄露 第一眼:即将推出的 Anker Zolo 4 端口 140W 带显示屏壁式充电器的拆箱视频泄露 Oct 01, 2024 am 06:32 AM

2024 年 9 月早些时候,Anker 的 Zolo 140W 充电器被泄露,这是该公司首款带显示屏的壁式充电器,这引起了轰动。现在,小李TV在YouTube上发布的新开箱视频让我们亲眼目睹了这款hi

Garmin 通过新的更新发布了针对多款智能手表的冒险赛车活动改进 Garmin 通过新的更新发布了针对多款智能手表的冒险赛车活动改进 Oct 01, 2024 am 06:40 AM

Garmin 将于本月底为其最新的高端智能手表提供一组新的稳定更新。回顾一下,该公司发布了系统软件 11.64,以解决 Enduro 3、Fenix E 和 Fenix 8 的高电池消耗问题(亚马逊售价 1,099.99 美元)。

搭载 HyperOS 的新款小米米家石墨烯油汀到货 搭载 HyperOS 的新款小米米家石墨烯油汀到货 Oct 02, 2024 pm 09:02 PM

小米即将在中国推出米家石墨烯油汀取暖器。该公司最近在其优品平台上成功举办了一次智能家居产品众筹活动。根据页面显示,该设备已经开始发货至

Cyber​​truck FSD 评论称赞快速车道切换和全屏可视化 Cyber​​truck FSD 评论称赞快速车道切换和全屏可视化 Oct 01, 2024 am 06:16 AM

特斯拉正在推出最新的全自动驾驶(监督)版本 12.5.5,并最终带来了承诺的 Cyber​​truck FSD 选项,距离皮卡上市十个月后,该功能包含在基础系列的装饰价格中。 F

三星 Galaxy Z Fold 特别版透露将于 10 月底登陆,但名称出现冲突 三星 Galaxy Z Fold 特别版透露将于 10 月底登陆,但名称出现冲突 Oct 01, 2024 am 06:21 AM

三星期待已久的“特别版”可折叠手机的推出又迎来了另一个转折。最近几周,有关所谓 Galaxy Z Fold 特别版的传言相当安静。相反,焦点已转移到 Galaxy S25 系列,包括

OpenAI'最大最贵”大模型GPT-4.5,价格是DeepSeek的300倍 OpenAI'最大最贵”大模型GPT-4.5,价格是DeepSeek的300倍 Mar 12, 2025 pm 02:21 PM

OpenAI发布GPT-4.5研究预览版,号称“情商最高”的大语言模型,但高昂价格引发争议。GPT-4.5每百万tokens的API调用价格高达75美元,是GPT-4的30倍,远超DeepSeek-chat的0.5美元(峰值)和0.25美元(低峰)。虽然OpenAI强调GPT-4.5在自然交互、理解意图和减少幻觉方面有所提升,并在写作、设计领域表现出色,但一些关键基准测试结果却显示其性能提升并未达到行业领先水平,尤其在编程能力方面与其他模型相比并无优势。目前,GPT-4.5已向ChatGPTPr

See all articles