


OpenAI o1 and o1-mini arrive as AIs that handle STEM questions better than prior models
OpenAI o1 and o1-mini have arrived. These AI LLMs perform much better on coding, math, and science problems and tasks than prior models such as GPT-4o by taking more time to think.
Complex problems in STEM tend to require more than a quick online search for correct answers. By giving the o1 AI more time to think, the AI can reason more carefully and accurately. The o1-mini model has been specifically tuned to answer STEM questions with faster speed and lower demand on computer resources, and it is notably better at coding than the o1 model.
Across a range of standardized AP exams and STEM tests for LLMs, the o1 models perform with high accuracy. Specifically, on the AP Calculus, AP Chemistry, AP Physics 2, LSAT, and SAT evidence-based reading & writing tests, the o1 models perform at or above the B-grade level (~80% or higher). The models answer accurately at the A-grade level on PhD-level physics questions, at the B-grade level on tough 2024 American Invitational Mathematics Examination math questions, and at the high B-grade level on Codeforces coding problems. Because o1 has been tuned for answering STEM questions, its performance on AP English Language and AP English Literature is at or below the C-grade level.
Interestingly, while GPT-4o is dumbfounded by the cryptographic challenge of decoding “oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz” when given the hint “oyfjdnisdr rtqwainr acxz mynzbhhx” means “Think step by step”, o1 had no issues thinking through the problem to come up with the correct answer “There are three r’s in strawberry”. This new power will delight hobby cryptographers at home as well as the NSA.
Closet evil-doers will want to know that while the uncensored o1 models are apt to give troubling replies, OpenAI has neutered these models for release. The o1 models have been tested to resist answering questions about making bioweapons, producing naughty images, jailbreaking itself, and harassing and threatening. Unfortunately, the OpenAI o1 models remain gender and race biased when tested, despite tuning efforts.
ChatGPT Plus and Team users along with API usage tier 5 developers have access to o1 models immediately, and ChatGPT Edu and Enterprise users will gain access on the week of September 16. ChatGPT Free users will gain access to o1-mini in the near future. The o1 models cannot browse the web or accept uploaded files and images to answer questions, so OpenAI recommends users continue using their GPT-4o models for general questions.
Users who want to ask AI questions now have a wide-range of capable LLM models to interact with besides those from OpenAI, including Anthropic Claude, Microsoft CoPilot, Google Gemini, and X Grok. Every AI has specific advantages, so it is worth testing several AI models to find one that best suits individual needs. Some of these AI are built into smart glasses (like these on Amazon) and voice recorders (like this one on Amazon), and some upcoming autonomous humanoid robots use proprietary AI to cook and clean.
以上是OpenAI o1 and o1-mini arrive as AIs that handle STEM questions better than prior models的详细内容。更多信息请关注PHP中文网其他相关文章!

热AI工具

Undresser.AI Undress
人工智能驱动的应用程序,用于创建逼真的裸体照片

AI Clothes Remover
用于从照片中去除衣服的在线人工智能工具。

Undress AI Tool
免费脱衣服图片

Clothoff.io
AI脱衣机

AI Hentai Generator
免费生成ai无尽的。

热门文章

热工具

记事本++7.3.1
好用且免费的代码编辑器

SublimeText3汉化版
中文版,非常好用

禅工作室 13.0.1
功能强大的PHP集成开发环境

Dreamweaver CS6
视觉化网页开发工具

SublimeText3 Mac版
神级代码编辑软件(SublimeText3)

热门话题

华为正在全球推出Watch GT 5和Watch GT 5 Pro智能手表的软件版本5.0.0.100(C00M01)。这两款智能手表最近在欧洲推出,标准型号是该公司最便宜的型号。这和谐

《铁拳》系列总监原田胜宏曾认真尝试过将桑德斯上校带入这款标志性格斗游戏中。在接受 TheGamer 采访时,原田透露,他向日本肯德基提出了这个想法,希望将这位快餐传奇人物纳入其中。

2024 年 9 月早些时候,Anker 的 Zolo 140W 充电器被泄露,这是该公司首款带显示屏的壁式充电器,这引起了轰动。现在,小李TV在YouTube上发布的新开箱视频让我们亲眼目睹了这款hi

Garmin 将于本月底为其最新的高端智能手表提供一组新的稳定更新。回顾一下,该公司发布了系统软件 11.64,以解决 Enduro 3、Fenix E 和 Fenix 8 的高电池消耗问题(亚马逊售价 1,099.99 美元)。

小米即将在中国推出米家石墨烯油汀取暖器。该公司最近在其优品平台上成功举办了一次智能家居产品众筹活动。根据页面显示,该设备已经开始发货至

特斯拉正在推出最新的全自动驾驶(监督)版本 12.5.5,并最终带来了承诺的 Cybertruck FSD 选项,距离皮卡上市十个月后,该功能包含在基础系列的装饰价格中。 F

三星期待已久的“特别版”可折叠手机的推出又迎来了另一个转折。最近几周,有关所谓 Galaxy Z Fold 特别版的传言相当安静。相反,焦点已转移到 Galaxy S25 系列,包括

OpenAI发布GPT-4.5研究预览版,号称“情商最高”的大语言模型,但高昂价格引发争议。GPT-4.5每百万tokens的API调用价格高达75美元,是GPT-4的30倍,远超DeepSeek-chat的0.5美元(峰值)和0.25美元(低峰)。虽然OpenAI强调GPT-4.5在自然交互、理解意图和减少幻觉方面有所提升,并在写作、设计领域表现出色,但一些关键基准测试结果却显示其性能提升并未达到行业领先水平,尤其在编程能力方面与其他模型相比并无优势。目前,GPT-4.5已向ChatGPTPr
