
Salesforce collaborates with MIT researchers to open source GPT-4 revision tutorials to deliver more information with fewer words

Sep 19, 2023, 08:33 PM

Automatic summarization has made significant progress in recent years, driven largely by a paradigm shift: where the field once relied on supervised fine-tuning over annotated datasets, it now uses zero-shot prompting of large language models (LLMs) such as GPT-4. Careful prompt design enables fine-grained control over summary length, topic, style, and other attributes without any additional training.

But one aspect is often overlooked: the information density of the summary. In theory, a summary, being a compression of another text, should be denser, i.e., contain more information per word, than the source document. Given the high latency of LLM decoding, covering more information with fewer words matters, especially for real-time applications.

However, how dense a summary should be is an open question: if it contains too few details, it is uninformative; if it packs in too much without growing longer, it becomes hard to understand. Conveying more information within a fixed token budget requires combining abstraction, compression, and fusion.

In recent research, researchers from Salesforce, MIT, and elsewhere attempted to determine the limits of densification by soliciting human preferences over a set of summaries generated by GPT-4. The method offers considerable insight into improving the "expressive ability" of large language models such as GPT-4.


Paper link: https://arxiv.org/pdf/2309.04269.pdf

Dataset address: https://huggingface.co/datasets/griffin/chain_of_density

Specifically, the researchers used the average number of entities per token as a proxy for density, first generating an initial, entity-sparse summary. They then iteratively (five times in total) identified and fused 1-3 salient entities missing from the previous summary, without increasing the overall length. Each successive summary thus has a higher entity-to-token ratio than the last. Based on human preference data, the authors found that people prefer summaries that are nearly as dense as human-written summaries, and denser than those generated by a vanilla GPT-4 prompt. The study's contributions can be summarized as follows:

  • A prompt-based iterative method, Chain of Density (CoD), for increasing the entity density of summaries;
  • Manual and automatic evaluation of summary density on CNN/Daily Mail articles, to better understand the trade-off between informativeness (favoring more entities) and clarity (favoring fewer entities);
  • Open-sourced GPT-4 summaries, annotations, and a set of 5,000 unannotated CoD summaries for evaluation or fine-tuning.
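As a rough illustration of the core statistic (this is not the authors' code; the whitespace tokenizer and the caller-supplied entity list are simplifying assumptions, whereas the paper uses a proper tokenizer and an NER model), entity density is simply entities per token:

```python
def entity_density(summary: str, entities: list[str]) -> float:
    """Entities per token: the density statistic CoD seeks to raise."""
    tokens = summary.split()  # crude stand-in for a real tokenizer
    if not tokens:
        return 0.0
    return len(entities) / len(tokens)

# Hypothetical example: 3 named entities across 10 tokens.
summary = "Liverpool beat Barcelona 4-0 at Anfield to reach the final."
entities = ["Liverpool", "Barcelona", "Anfield"]
print(round(entity_density(summary, entities), 3))  # → 0.3
```

On this scale, the paper's reported densities (0.122 for vanilla GPT-4, 0.151 for human summaries) mean roughly one entity every 7-8 tokens.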


What is CoD?

The authors formulated a single Chain of Density (CoD) prompt that generates an initial summary and then makes it increasingly entity-dense. Specifically, over a fixed number of iterations, a unique set of salient entities from the source text is identified and fused into the previous summary without increasing its length.

An example prompt and output are shown in Figure 2. The authors do not explicitly specify entity types, but define a missing entity as:

  • Relevant: related to the main story;
  • Specific: descriptive yet concise (five words or fewer);
  • Novel: not mentioned in the previous summary;
  • Faithful: present in the article;
  • Anywhere: located anywhere in the article.
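A minimal sketch of the densification loop follows. Note the hedges: the paper uses a single prompt whose one generation produces all five summaries, whereas this sketch unrolls it as an explicit loop for clarity; the prompt wording is paraphrased, not copied verbatim; and `fake_llm` is a deterministic stand-in for a real GPT-4 call.

```python
COD_PROMPT = """Article: {article}

Previous summary: {previous}

Step 1: Identify 1-3 informative entities from the article that are
missing from the previous summary.
Step 2: Write a new, denser summary of identical length that covers
every entity from the previous summary plus the missing entities."""

def chain_of_density(article: str, llm, steps: int = 5) -> list[str]:
    """Run the densification loop, returning one summary per step."""
    summaries = []
    # Step 0: an initial, intentionally entity-sparse summary.
    summary = llm("Write an initial, entity-sparse summary.", article, None)
    summaries.append(summary)
    for _ in range(steps - 1):
        prompt = COD_PROMPT.format(article=article, previous=summary)
        summary = llm(prompt, article, summary)
        summaries.append(summary)
    return summaries

# Deterministic stub in place of a real model, for demonstration only:
def fake_llm(prompt, article, prev):
    return (prev or "base") + "+"

print(chain_of_density("some article", fake_llm))
# → ['base+', 'base++', 'base+++', 'base++++', 'base+++++']
```

Swapping `fake_llm` for an actual chat-completion call (with the article in context) would reproduce the overall shape of the procedure, one summary per densification step.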

The authors randomly selected 100 articles from the CNN/DailyMail summarization test set and generated CoD summaries for them. For reference, they compared CoD summary statistics with human-written bullet-point reference summaries and with summaries generated by GPT-4 under a vanilla prompt: "Write a very short summary of the article. No more than 70 words."

Statistics

The authors report two kinds of statistics: direct and indirect. Direct statistics (tokens, entities, entity density) are directly controlled by CoD, while indirect statistics are expected by-products of densification.

Direct statistics. As shown in Table 1, the second step reduces length by an average of 5 tokens (from 72 to 67), as unnecessary words are removed from the initially verbose summary. Entity density starts at 0.089, below both human summaries and vanilla GPT-4 (0.151 and 0.122), and rises to 0.167 after five densification steps.

Indirect statistics. Abstractiveness should increase with each CoD step, since the summary is repeatedly rewritten to make room for each additional entity. The authors measure abstractiveness via extractive density: the average squared length of extracted fragments (Grusky et al., 2018). Likewise, fusion should increase monotonically as entities are added to a fixed-length summary; the authors quantify fusion as the average number of source sentences aligned to each summary sentence. For alignment, they use the relative ROUGE gain method (Zhou et al., 2018), which aligns source sentences to a target sentence until the relative ROUGE gain of adding another sentence is no longer positive. They also expected shifts in content distribution, i.e., the positions in the article from which summary content is drawn.
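To make the extractive density metric concrete, here is an illustrative approximation (not the authors' released code) that greedily extracts the longest fragments a summary copies verbatim from the article, then averages their squared lengths, in the spirit of Grusky et al. (2018):

```python
def extractive_fragments(article_toks: list[str],
                         summary_toks: list[str]) -> list[list[str]]:
    """Greedily match each summary position to its longest verbatim
    fragment in the article (illustrative approximation)."""
    frags, i = [], 0
    while i < len(summary_toks):
        best = 0
        for j in range(len(article_toks)):
            k = 0
            while (i + k < len(summary_toks) and j + k < len(article_toks)
                   and summary_toks[i + k] == article_toks[j + k]):
                k += 1
            best = max(best, k)
        if best > 0:
            frags.append(summary_toks[i:i + best])
            i += best
        else:
            i += 1  # token not found in article; skip it
    return frags

def extractive_density(article: str, summary: str) -> float:
    """Average squared fragment length, normalized by summary length.
    Lower values indicate a more abstractive summary."""
    a, s = article.split(), summary.split()
    frags = extractive_fragments(a, s)
    return sum(len(f) ** 2 for f in frags) / len(s)

print(extractive_density("the quick brown fox jumps over the lazy dog",
                         "the quick fox jumps"))  # → 2.0
```

A fully copied sentence yields a large squared term, while a paraphrased one contributes many short fragments, which is why density falls as CoD rewrites become more abstractive.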

Specifically, the authors expected that CoD summaries would initially exhibit a strong "lead bias" but would gradually begin to introduce entities from the middle and end of the article. To measure this, they reused the sentence alignments computed for fusion and recorded the average rank (position in the article) of all aligned source sentences.

Figure 3 confirms these hypotheses: as the number of rewriting steps increases, abstractiveness rises (lower extractive density, left panel), the fusion rate rises (middle panel), and summaries begin to incorporate content from the middle and end of the article (right panel). Interestingly, all CoD summaries are more abstractive than both the human-written and baseline summaries.


To better understand the trade-offs in CoD summaries, the authors conducted a preference-based human study and a rating-based evaluation using GPT-4.

Human preferences. For the same 100 articles (5 steps × 100 articles = 500 summaries in total), the randomly shuffled CoD summaries, along with the source articles, were shown to the paper's first four authors. Each annotator indicated their favorite summary according to Stiennon et al.'s (2020) definition of a "good summary." Table 2 reports each annotator's first-place votes by CoD step, as well as the aggregate. Overall, 61% of first-place votes (23.0 + 22.5 + 15.5) went to summaries produced by three or more densification steps. The median preferred CoD step is the middle one (step 3), and the expected preferred step is 3.06.


Based on the average density at step 3, the preferred entity density across all CoD candidates is approximately 0.15. As Table 1 shows, this matches human-written summaries (0.151) and is significantly higher than summaries produced by the vanilla GPT-4 prompt (0.122).

Automatic metrics. To supplement the human evaluation, the authors used GPT-4 to score CoD summaries (1-5) along five dimensions: informativeness, quality, coherence, attribution, and overall preference. As shown in Table 3, density correlates with informativeness, but only up to a point: the score peaks at step 4 (4.74).
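Such LLM-as-judge scoring can be sketched as follows. The dimension names come from the paper, but the prompt wording and the `judge` callable are illustrative stand-ins, not the authors' evaluation code:

```python
DIMENSIONS = ["informative", "quality", "coherence",
              "attributable", "overall"]

RATING_PROMPT = ("Article: {article}\n\nSummary: {summary}\n\n"
                 "Rate the summary from 1 (worst) to 5 (best) on this "
                 "dimension: {dimension}. Reply with a single integer.")

def score_summary(article: str, summary: str, judge) -> dict[str, int]:
    """Ask an LLM judge for a 1-5 rating on each dimension."""
    scores = {}
    for dim in DIMENSIONS:
        reply = judge(RATING_PROMPT.format(
            article=article, summary=summary, dimension=dim))
        scores[dim] = int(reply.strip())
    return scores

# Deterministic stub in place of a real GPT-4 call:
print(score_summary("article text", "summary text", lambda prompt: "4"))
```

In practice, `judge` would wrap a chat-completion call; averaging the returned scores per CoD step would reproduce the kind of comparison reported in Table 3.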


Averaging scores across dimensions, the first and last CoD steps score lowest, while the middle three steps score similarly (4.78, 4.77, and 4.76).

Qualitative analysis. There is a clear trade-off between summary coherence/readability and informativeness. Figure 4 shows two CoD steps: one in which added detail improves the summary, and another in which it harms it. On average, intermediate CoD summaries strike the best balance, though precisely defining and quantifying this trade-off is left to future work.

For more details, please refer to the original paper.

