The account that has been leaking news about OpenAI's "Strawberry" is actually an AI agent? Stanford startup "hypes" Agent Q

Aug 14, 2024 pm 05:09 PM

Once hype has generated "tremendous traffic", no one cares whether the product is any good.

Recently, OpenAI's secret project "Q*" has attracted widespread attention from insiders. Last month, a project based on it, codenamed "Strawberry", came to light. The project is reportedly capable of advanced reasoning.

In recent days, several more waves of rumors about the project have swept the Internet, each ending in yet another no-show. The "Strawberry Brother" account in particular promotes it non-stop, raising expectations only to disappoint them.

Unexpectedly, this "marketing account", which posts wherever Sam Altman appears, turns out to be an AI agent under the hood?

Today, the founder of the AI agent startup MultiOn came forward to claim it: although OpenAI has not yet released "Q*", we have released Agent Q, a new agent that controls the "Strawberry Brother" account, so come and play with it online! MultiOn's co-founder and CEO is Div Garg, who is on leave from a PhD in computer science at Stanford.

It seems this marketing stunt, riding on OpenAI's hype, has left everyone confused. After all, many people have been staying up all night recently waiting for OpenAI's "big news". This goes back to an exchange between Sam Altman and "Strawberry Brother": under a photo of strawberries that Altman posted, he replied to "Strawberry Brother" that the surprise would come soon.

However, Div Garg, the founder of MultiOn, has since quietly deleted the post claiming that Agent Q is "Strawberry Brother".

This time, MultiOn announced that the Agent Q it released is a breakthrough AI agent. Its training method combines Monte Carlo Tree Search (MCTS) with self-critique, and it learns from human feedback through an algorithm called Direct Preference Optimization (DPO).

As a next-generation AI agent with planning and self-healing capabilities, Agent Q delivers a 3.4x (340%) improvement over the zero-shot performance of the LLaMa 3 baseline. In an evaluation on real-world tasks, its success rate reached 95.4%.

What can Agent Q do? Let’s take a look at the official demo first.
It can reserve a table for you at a given restaurant at a given time.

It then carries out the web operations for you, such as checking availability, and finally completes the booking.

In addition, it can book flights (for example, a one-way economy-class window seat from New York to San Francisco this Saturday).

However, netizens don't seem to be buying it. What people care about more is whether MultiOn really used the "Strawberry Brother" account to promote its product. Some even call them shameless liars.

Overview of key components and methods

The Agent Q paper has now been released, co-authored by researchers from MultiOn and Stanford University. The results of this research will be made available to developers and general MultiOn users later this year.

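  • Paper: https://multion-research.s3.us-east-2.amazonaws.com/AgentQ.pdf

Agent Q can plan and correct its own mistakes on web pages, learning from both successful and failed experiences to improve its performance on complex tasks. Ultimately, the agent can better plan how to navigate the web and adapt to the complexity of the real world.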

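In terms of technical details, the main components of Agent Q are as follows:

Guided search with MCTS (Monte Carlo Tree Search): this technique autonomously generates data by exploring different actions and web pages, balancing exploration and exploitation. MCTS uses a high sampling temperature and diverse prompts to expand the action space, ensuring a diverse and optimal set of trajectories.

AI self-critique: at every step, AI-based self-critique provides valuable feedback that refines the agent's decision-making. This step-level feedback is crucial for long-horizon tasks, where sparse signals often make learning difficult.

Direct Preference Optimization (DPO): this algorithm fine-tunes the model by building preference pairs from the data generated by MCTS. This off-policy training method lets the model learn effectively from aggregated datasets, including suboptimal branches explored during search, which improves the success rate in complex environments.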
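As a rough illustration of the DPO step, the sketch below computes the standard DPO loss for a single preference pair. The function name, the `beta` default, and the idea of passing in per-action log-probabilities directly are assumptions made for this example, not details taken from the Agent Q paper.

```python
import math

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) preference pair.

    Inputs are log-probabilities of each action under the policy being
    trained and under a frozen reference model. beta is a hypothetical default.
    """
    chosen_margin = policy_logp_chosen - ref_logp_chosen
    rejected_margin = policy_logp_rejected - ref_logp_rejected
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), written in a numerically stable form
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))
```

The loss equals log 2 when the policy and the reference agree, and it shrinks as the policy raises the preferred action's probability relative to the rejected one.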

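The following focuses on the MCTS algorithm on the web-page side. The researchers explore how MCTS can give the agent additional search capability.

In prior work, the MCTS algorithm typically consists of four phases: selection, expansion, simulation, and backpropagation. Each phase plays a key role in balancing exploration and exploitation and in iteratively refining the policy.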
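The selection phase is often implemented with the UCB1 rule, which trades off a child's average value against how rarely it has been visited. The sketch below assumes that rule; the article does not state which selection rule Agent Q uses, and the exploration constant and dictionary layout are illustrative.

```python
import math

def ucb1_select(children, c_explore=1.4):
    """Return the index of the child that maximizes the UCB1 score.

    children: list of dicts with 'value' (mean reward) and 'visits'.
    Unvisited children are tried first.
    """
    total_visits = sum(ch["visits"] for ch in children)
    best_index, best_score = None, float("-inf")
    for i, ch in enumerate(children):
        if ch["visits"] == 0:
            return i  # explore anything that has never been tried
        bonus = c_explore * math.sqrt(math.log(total_visits) / ch["visits"])
        score = ch["value"] + bonus
        if score > best_score:
            best_index, best_score = i, score
    return best_index
```

A rarely visited child gets a large bonus, so it can be selected even when its current mean value is lower.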

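The researchers formulate web agent execution as a tree search over web pages, where the state consists of the agent's history and the DOM tree of the current page. Unlike board games such as chess or Go, the action space of a complex web agent is open-format and variable.

The researchers use the base model as the action-proposal distribution and sample a fixed number of possible actions at each node (web page). Once an action is selected and executed in the browser, the next web page is reached, and that page together with the updated history becomes the new node.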
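A minimal sketch of this node structure: each node stores the agent history and the current page's DOM, and expansion samples a fixed number of candidate actions from a proposal function. `propose_action` stands in for the base model and `num_samples` for the fixed sample count; both are hypothetical placeholders, as are the class and field names.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class WebNode:
    history: List[str]  # actions taken so far
    dom: str            # serialized DOM of the current page
    candidate_actions: List[str] = field(default_factory=list)

def expand(node: WebNode,
           propose_action: Callable[[List[str], str], str],
           num_samples: int = 3) -> List[str]:
    """Sample a fixed number of candidate actions for this node."""
    node.candidate_actions = [
        propose_action(node.history, node.dom) for _ in range(num_samples)
    ]
    return node.candidate_actions

def step(node: WebNode, action: str, next_dom: str) -> WebNode:
    """Build the child node: updated history plus the page the action led to."""
    return WebNode(history=node.history + [action], dom=next_dom)
```

In a real agent, `next_dom` would come from executing the action in a browser; here it is simply passed in.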

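The researchers query the feedback model over multiple iterations, each time removing from the list the best action chosen in the previous iteration, until all actions are fully ranked. Figure 4 below shows the complete AI feedback process.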
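The ranking loop described above can be sketched as follows. `pick_best` is a hypothetical stand-in for one query to the feedback model; a toy rule replaces it here so that the example runs.

```python
from typing import Callable, List

def rank_actions(actions: List[str],
                 pick_best: Callable[[List[str]], str]) -> List[str]:
    """Fully rank actions by repeatedly asking for the best remaining one."""
    remaining = list(actions)  # copy so the caller's list is untouched
    ranking = []
    while remaining:
        best = pick_best(remaining)
        ranking.append(best)
        remaining.remove(best)  # drop the chosen action before the next query
    return ranking

# Toy feedback model (assumption): prefers shorter action strings.
def toy_pick_best(options: List[str]) -> str:
    return min(options, key=len)

print(rank_actions(["click date picker", "scroll", "type name"], toy_pick_best))
# prints ['scroll', 'type name', 'click date picker']
```

Ranking n actions this way takes n queries, one per position in the final order.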
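Expansion and backtracking. The researchers select and execute an action in the browser environment to reach a new node (page). Starting from the selected state node, they roll out the trajectory with the current policy π_θ until a terminal state is reached. The environment returns a reward R at the end of the trajectory, where R = 1 if the agent succeeds and R = 0 otherwise. This reward is then backpropagated by updating the value of each node bottom-up, from the leaf node to the root.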
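A common way to implement such a bottom-up update is a running mean of rewards per node, incremented along the path from leaf to root. The sketch below assumes that form; the exact update used in the paper may differ, and the `Node` fields are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    parent: Optional["Node"] = None
    visits: int = 0
    value: float = 0.0  # running mean of rewards seen through this node

def backpropagate(leaf: Node, reward: float) -> None:
    """Walk from the leaf to the root, updating each node's mean value."""
    node = leaf
    while node is not None:
        node.value = (node.value * node.visits + reward) / (node.visits + 1)
        node.visits += 1
        node = node.parent
```

With binary rewards, each node's value is then the fraction of rollouts through it that ended in success.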
Figure 3 below shows all results and baselines. When the agent is allowed to search at test time, that is, when MCTS is applied to the base xLAM-v0.1-r model, the success rate rises from 28.6% to 48.4%, close to the average human performance of 50.0% and significantly above the zero-shot DPO model trained with outcome supervision only.
The researchers further fine-tuned the base model according to the algorithm outlined in the figure below, which improved on the base DPO model by 0.9%. Applying MCTS on top of the carefully trained Agent Q model raised the agent's performance to 50.5%, slightly above average human performance.
They argue that even after an agent has undergone extensive reinforcement learning training, having search capability at test time is still an important paradigm shift. Compared with an untrained zero-shot agent, this is a significant improvement.

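In addition, although dense-level supervision improves on purely outcome-based supervision, the gain from this training method is modest in the WebShop environment. This is because in this environment the agent only needs to follow short decision paths and can learn credit assignment from outcomes.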

Evaluation results

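To test how the Agent Q framework performs in the real world, the researchers chose the task of booking a restaurant on the official OpenTable website. To complete the booking, the agent must find the restaurant's page on OpenTable, choose a specific date and time, pick seating that matches the user's preferences, and finally submit the user's contact details.

Initially, they experimented with the xLAM-v0.1-r model, but it performed poorly, with an initial success rate of just 0.0%. They therefore switched to the LLaMa 70B Instruct model, which achieved some initial success.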

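However, because OpenTable is a live environment, it is difficult to measure and evaluate programmatically or automatically. The researchers therefore use GPT-4-V to collect a reward for each trajectory based on the following criteria: (1) the date and time are set correctly, (2) the party size is set correctly, (3) the user information is entered correctly, and (4) the booking is completed with a click. The agent is considered to have completed the task only if all of these conditions are met. The outcome supervision setup is shown in Figure 5 below.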
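The success rule above is a conjunction of four checks. A minimal sketch, with hypothetical field names for the four judgments (in the study these come from GPT-4-V, which is not called here):

```python
from dataclasses import dataclass

@dataclass
class BookingJudgment:
    date_time_correct: bool
    party_size_correct: bool
    user_info_correct: bool
    clicked_complete: bool

def outcome_reward(j: BookingJudgment) -> int:
    """Return 1 only if all four criteria hold, otherwise 0."""
    return int(all([j.date_time_correct, j.party_size_correct,
                    j.user_info_correct, j.clicked_complete]))
```

A trajectory that sets everything correctly but never clicks to complete the booking therefore still receives a reward of 0.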
Agent Q raised the LLaMa-3 model's zero-shot success rate from 18.6% to 81.7%, a result achieved after just a single day of autonomous data collection and equivalent to a 340% jump in success rate. With online search added, the success rate climbed further to 95.4%.
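The 340% figure is a relative increase over the baseline, not a ratio of the two rates. A quick check using the two success rates quoted above:

```python
def relative_increase(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0

baseline, fine_tuned = 18.6, 81.7
print(round(relative_increase(baseline, fine_tuned)))  # 339, reported as roughly 340%
print(round(fine_tuned / baseline, 1))                  # 4.4, the ratio of the two rates
```

In other words, the fine-tuned success rate is about 4.4 times the baseline, which corresponds to a gain of about 3.4 times the baseline.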
For more technical details and evaluation results, please refer to the original paper.

Reference link: https://www.multion.ai/blog/introducing-agent-q-research-breakthrough-for-the-next-generation-of-ai-agents-with-planning-and-self-healing-capabilities
