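In March this year, NVIDIA CEO Jensen Huang hosted a very special event. He invited the authors of the seminal paper "Attention Is All You Need" to gather at GTC and discuss where generative AI is headed.
"Everything our field enjoys today can be traced back to that moment... You changed the world..." Huang said on stage.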
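For a research-driven industry like AI, the next chance to change the world may well be hidden in some paper.
As a result, we have seen an unusual phenomenon in this circle: some CEOs without technical backgrounds have started staying up late to read papers, hoping to lower the trial-and-error cost of their decisions.
If even CEOs are doing this, it applies all the more to other practitioners in the field. In recent months, OpenAI, Google, and Meta have made one big release after another, and some startups keep launching new models and methods, so many practitioners likely feel there are more papers than they can keep up with.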
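Handing a paper to an AI for a summary is now a common way to read, but many AI summaries lack a clear hierarchy and a detailed account of the paper's innovations and limitations, so it takes several rounds of follow-up questions to form a complete picture. On top of that, you still have to dig key model architecture diagrams and result charts out of the paper yourself, so the time actually saved is very limited.
In the latest update to Tencent's "Yuanbao," we saw solutions to these problems. Its newly launched "Deep Reading Mode" supports intensive reading of long documents and outputs modular analyses that combine text and figures, which makes it well suited to reading papers.
To verify how well this new feature works, we ran a hands-on test.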
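Intensive paper reading: what makes it intensive?
What is it like to read papers with AI? Often it goes like this: you give it a PDF and it returns a summary paragraph plus several overview points (sometimes as many as 10). This information does help, but it can be hard to tell which points are the highlights, what the paper solved and did not solve, and which core questions deserve a closer look.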
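Our testing found that Yuanbao addresses these problems by presenting the paper as a series of modular, structured pieces of information.
Take a SIGGRAPH paper we tested as an example. If you simply drop the paper in, the summary it returns is not much different from what other AIs produce. But scroll down patiently and you will see a "Deep read this document" button, which is the one-click switch to intensive reading.
Unlike the earlier summary interface, the intensive reading page breaks the paper down into clear layers: research background, research methods, experimental design, result analysis, and overall conclusions are each organized into a module, much like the layout we usually use when covering papers. All of these can be reached quickly through the outline on the left.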
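Each module is short on words, but those words are dense with information. In the research background module, for example, the "Research difficulties" paragraph describes four difficulties in just three short sentences, and "Related work" is a highly condensed version of Section 2, "related work," introducing the field's main technical approaches in a single paragraph. After reading this module, we have a basic grasp of what problem the paper studies and what the current state of research looks like.
Beyond this conventional structured information, Yuanbao's intensive reading has one eye-catching design: it lists the paper's strengths and weaknesses, so researchers can quickly see what they can learn from the paper and which questions remain worth pursuing.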
Why does this feature matter so much? Peng Minghui, a professor at National Tsing Hua University in Taiwan, once wrote in an article on paper reading that papers differ from textbooks: textbooks offer systematic knowledge that others have already compiled and organized, whereas papers require readers to retrieve, filter, and organize unorganized knowledge themselves. In that process, the ability to analyze the strengths and weaknesses of existing research is especially important. It is a key part of critical thinking and an important way to grow as an academic researcher. By quickly analyzing and summarizing a paper's strengths and weaknesses, Yuanbao can save researchers a lot of time in screening and first-pass understanding, letting them focus sooner on the papers directly relevant to their work.
If the preceding modules feel like too much, you can also jump straight to the final "Key Questions and Answers" module. It lists a handful of the most critical questions to help you quickly gauge the paper's value and decide whether the original text is worth your time. Many earlier AI assistants also suggest a few key questions at the end of an answer, which you can then ask with one click. But for a beginner or a reader from another discipline, it may not be easy to judge which of those questions matter most. Yuanbao's direct presentation of questions together with answers feels more intuitive.
Original figures included: who says AI paper reading can't come with pictures?
Many people have the habit of reading a paper's text while looking at its figures, because it is faster and easier to understand that way. However, most AI applications on the market today return text only. If you want to see the figures, you have to find them in the original paper.
In our test, we found that Yuanbao is one of the few AIs that can crop figures directly out of the paper and place them at the corresponding positions in its write-up. For example, when a module discusses the architecture, it inserts the corresponding architecture diagram.
When a module discusses experimental results, it inserts the corresponding chart.
As is well known, the hallucination problem of large models has not been fully solved. Presenting the paper's original figures is therefore a more reliable form of output: it lets readers verify the model's answers at any time and makes the result safer to use as a reference.
We also found that if you are writing a blog post or other material for an outside audience, Yuanbao can draw charts for you without being told where to find the data. It locates the relevant tables in the paper on its own, extracts the data, and plots it. This function is available through the "Ask a Question" button on the right side of the intensive reading page.
Read anytime, anywhere: who says paper reading is full of obstacles?
Beyond structured information and output that combines text and figures, our test also turned up several very practical small features that make reading papers more convenient.
The first is select-to-translate and select-to-search, two handy features of the "Original text" reading interface. Select-to-translate helps readers who are less comfortable with English clear language barriers at any point, and select-to-search goes a step further: Yuanbao's search is built in as a plug-in, so you can look up related information whenever you need it. The explanations it returns are not just brief summaries but modular expansions, keeping the output structured and informative down to the details.
The second is "Offline Reading." It lets you review the intensive reading content and the original text in airplane mode, so no scrap of spare time goes to waste. That may even win airlines back some ground in their competition with high-speed rail; perhaps a researcher's next inspiration will come from reviewing an intensive reading on a plane.
The last small feature is the "Calculator." Some time ago, AI drew a lot of discussion because it could not tell which was bigger, 9.9 or 9.11. Yuanbao integrates a calculator so that answers are generated from accurate calculations, which is very useful when reading experimental data.
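The 9.9 versus 9.11 confusion is easy to reproduce by hand: the same two strings rank differently depending on whether they are read as decimal numbers or as version-style pairs of integers. The following is only a minimal Python sketch of that ambiguity, not a description of how Yuanbao's calculator works; both function names are hypothetical.

```python
from decimal import Decimal


def larger_as_decimal(a: str, b: str) -> str:
    """Return whichever numeric string is larger when both are read as exact decimals."""
    return a if Decimal(a) > Decimal(b) else b


def larger_as_version(a: str, b: str) -> str:
    """Return whichever string is larger when each dot-separated part is compared as an integer."""
    def key(s: str) -> tuple:
        return tuple(int(part) for part in s.split("."))
    return a if key(a) > key(b) else b


if __name__ == "__main__":
    print(larger_as_decimal("9.9", "9.11"))   # numeric reading: 9.9 is larger
    print(larger_as_version("9.9", "9.11"))   # version-style reading: 9.11 is larger
```

Delegating the comparison to exact arithmetic, as the decimal reading does here, removes the ambiguity, which is the general idea behind pairing a language model with a calculator.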
Behind long-document intensive reading: expert guidance, it turns out
According to official information, this Tencent Yuanbao upgrade focuses on "intensive reading of long documents" and natively supports input of up to nearly 500,000 words. The papers we used in the test come nowhere near that length, and neither do most of the papers we encounter day to day. When using Yuanbao for intensive paper reading, then, the context window is sufficient in most cases. Its modular output, its combination of text and figures, and small features such as select-to-search and select-to-translate make paper reading convenient and efficient, bringing it another step closer to being practically useful.
This evolution is inseparable from the upgrade of the model behind it, Tencent's Hunyuan large model. Reportedly, to improve the model's professionalism and practicality in specialized fields, the Tencent Hunyuan team invited domain experts to summarize the core skills of each field and to formulate answer standards for professional questions, so that the model can serve users the way a real domain expert would. That is why, after using it, we feel that Yuanbao knows what information readers need and how that information should be presented.
Beyond papers, the new feature can also be used to read financial reports, research reports, and other long texts intensively. In these scenarios, it can organize information along multiple dimensions and generate professional charts, such as DuPont analysis charts, from the report content, so that people unfamiliar with such documents can still understand a company's financial position and other information.
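For readers unfamiliar with it, DuPont analysis breaks return on equity (ROE) into three ratios: net profit margin, asset turnover, and the equity multiplier. The Python sketch below shows only that standard identity. The function name and all input figures are hypothetical placeholders, not data from any report and not Yuanbao's output.

```python
def dupont_breakdown(net_income: float, revenue: float,
                     total_assets: float, equity: float) -> dict:
    """Decompose ROE with the standard three-factor DuPont identity."""
    net_margin = net_income / revenue          # profitability
    asset_turnover = revenue / total_assets    # efficiency
    equity_multiplier = total_assets / equity  # leverage
    return {
        "net_margin": net_margin,
        "asset_turnover": asset_turnover,
        "equity_multiplier": equity_multiplier,
        # The product of the three factors equals net_income / equity.
        "roe": net_margin * asset_turnover * equity_multiplier,
    }


if __name__ == "__main__":
    # Made-up figures for illustration only.
    result = dupont_breakdown(net_income=120.0, revenue=1000.0,
                              total_assets=2000.0, equity=800.0)
    for name, value in result.items():
        print(f"{name}: {value:.3f}")
```

Separating the three factors shows whether a given ROE comes from margins, asset efficiency, or leverage, which is what a DuPont chart visualizes.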
That said, for paper reading Yuanbao still has room to improve: the original-text reading interface lacks a full side-by-side view of the original and its translation, and formula recognition is sometimes not accurate enough. We hope Yuanbao addresses these issues in future updates.
Still, for an app that launched only a little over two months ago, Tencent Yuanbao has already exceeded expectations. Its trajectory shows how large models are becoming a new form of productivity step by step. We look forward to more surprises from this app.