


In this year's College Entrance Examination English test, CMU's reStructured Pre-training model scored 134, significantly surpassing GPT3
The way we store data is changing, from biological neural networks to artificial neural networks. The most familiar storage medium is the brain itself. As the amount of available data keeps growing, people have turned to external devices such as hard drives and cloud storage. With the rise of deep learning, another promising storage technology has emerged: using artificial neural networks to store the information contained in data.
The researchers argue that the ultimate goal of data storage is to better serve human life, so how data is accessed matters as much as how it is stored. Storage and access methods differ, however, and people have long struggled to bridge this gap in order to make better use of the information that exists in the world. As shown in Figure 3:
- For biological neural networks (such as the human brain), humans receive curriculum-based (i.e., knowledge) education from a very young age, so that they can later recall specific data to deal with a complex and changing life.
- For external device storage, people usually structure the data according to a certain schema (such as a table) and then use a specialized language (such as SQL) to efficiently retrieve the required information from the database.
- For artificial neural network-based storage, researchers use self-supervised learning to store data from large corpora (i.e., pre-training) and then use the network for a variety of downstream tasks (e.g., sentiment classification).
Researchers from CMU have proposed a new way to access data, which contains many types of information that can serve as pre-training signals to guide the optimization of model parameters. The study structures data in units of signals. This resembles storing data in a database: the data is first structured into tables or JSON so that exactly the required information can be retrieved with a specialized language such as SQL.
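The database analogy can be made concrete with a minimal sketch. The table name, columns, and rows below are hypothetical and chosen only for illustration; they do not come from the paper. The point is simply that once data is structured, a query can retrieve exactly the field that is needed.

```python
import sqlite3

# Hypothetical toy table; the schema and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, birthplace TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Mozart", "Salzburg"), ("Beethoven", "Bonn")],
)

# Once the data is structured, a query retrieves exactly the required field.
row = conn.execute(
    "SELECT birthplace FROM people WHERE name = ?", ("Mozart",)
).fetchone()
birthplace = row[0]
print(birthplace)
conn.close()
```

In RST the same role is played by prompts: the pre-trained model is the store, and a prompt is the query.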
In addition, the study holds that valuable signals exist abundantly in all kinds of data in the world, not only in manually curated supervised datasets. What researchers need to do is (a) identify the data, (b) reorganize it in a unified language, and (c) integrate and store it in a pre-trained language model. The study calls this learning paradigm reStructured Pre-training (RST). The researchers liken the process to a "treasure hunt in a mine." Different data sources such as Wikipedia are like mines rich in gems: they contain rich information, such as named entities from hyperlinks, that can provide signals for model pre-training. A good pre-trained language model (PLM) should clearly understand the composition of the various signals in the data so that it can provide accurate information according to the different needs of downstream tasks.
Paper address: https://arxiv.org/pdf/2206.11147.pdf
Pre-trained language model treasure hunt
This research proposes a new learning paradigm for natural language processing tasks, RST, which re-emphasizes the role of data and regards model pre-training and downstream fine-tuning as data storage and data access processes, respectively. On this basis, the research follows a simple principle: a good storage mechanism should not only be able to cache large amounts of data but also make access convenient.
After overcoming some engineering challenges, the research achieved this by pre-training on restructured data (composed of various valuable signals rather than the raw data). Experiments show that the RST model not only significantly outperforms the best existing systems (e.g., T0) on 52 of 55 popular datasets spanning various NLP tasks (e.g., classification, information extraction, fact retrieval, and text generation), but also does not require fine-tuning on downstream tasks. It has also achieved excellent results on the English test of the College Entrance Examination (Gaokao), China's most authoritative exam, which millions of students take every year.
Specifically, the Gaokao AI proposed in the paper, Qin, scores 40 points higher than the average student and 15 points higher than GPT3 while using 1/16 of its parameters. In particular, Qin scored 138.5 (out of 150) on the 2018 English test.
In addition, the study released the Gaokao Benchmark online submission platform, which so far contains 10 annotated English test papers from 2018-2021 (and will be expanded every year), allowing more AI models to take the college entrance examination. The research also establishes a relatively level test bed for comparing humans and AI, helping us better understand where we stand. Finally, on the 2022 College Entrance Examination English test held a few days ago (2022.06.08), the AI system scored 134 points, while GPT3 scored only 108.
The main contributions of this study include:
(1) An evolution hypothesis for NLP methods. This research attempts to establish an "NLP technology evolution hypothesis" from a global perspective by exploring the intrinsic connections in the development of modern NLP technology. In short, the core idea of the hypothesis is that technology always iterates in a direction where developers need to do less in order to design better and more versatile systems.
So far, NLP technology has gone through multiple iterations, as shown in Figure 2: Feature Engineering → Architecture Engineering → Objective Engineering → Prompt Engineering, and it is now moving towards more practical and effective data-centric engineering. The researchers hope that more researchers will be inspired to think critically about this issue, grasp the core driving force of technological progress, find a "gradient ascent" path for academic development, and do more scientifically meaningful work.
(2) A new paradigm based on the evolution hypothesis: reStructured Pre-training. This paradigm treats model pre-training and fine-tuning as a data storage and access process and claims that a good storage mechanism should make the intended data easily accessible. With this paradigm, the research was able to unify 26 different types of signals (e.g., the entities in a sentence) from 10 data sources (e.g., Wikipedia). The general model trained on this basis achieved strong generalization across various tasks, covering 55 NLP datasets.
(3) An AI for the college entrance examination. Based on the above paradigm, the study developed an AI system, Qin, built specifically for the College Entrance Examination English test. According to the study, this is the world's first deep-learning-based AI system for the College Entrance Examination English test. Qin has achieved outstanding results on exam papers from multiple years: 40 points higher than the average student, and 15 points higher than GPT-3 using only 1/16 of GPT-3's parameters. On the 2018 English paper in particular, Qin scored 138.5 points (out of 150), with perfect scores in both listening and reading comprehension.
(4) Rich resources. (a) To track how far existing AI technology has progressed towards human intelligence, the study released a new benchmark, Gaokao Benchmark. It not only provides a comprehensive assessment across a variety of practical tasks and domains in real-world scenarios, but also provides human performance scores so that AI systems can be compared directly with humans. (b) The study uses ExplainaBoard (Liu et al., 2021b) to set up an interactive leaderboard for Gaokao Benchmark so that more AI systems can easily participate and obtain scores automatically. (c) All resources can be found on GitHub.
In addition, the success of AI in the college entrance examination English test task has provided researchers with many new ideas: AI technology can empower education and help solve a series of problems in education and teaching.
For example, it can (a) help teachers automate grading, (b) help students answer questions about assignments and explain them in detail, and (c) more importantly, promote educational equity by making educational services of the same quality accessible to most families.
This work integrates 26 different signals from around the world in a unified way for the first time. Rather than trying to distinguish between supervised and unsupervised data, it is concerned with how much of the information that nature gives us can be used, and how. The strong performance on more than 50 datasets from various NLP tasks shows the value of data-centric pre-training and should inspire further exploration.
reStructured Pre-training
The paradigms for solving NLP tasks are changing rapidly, and the change is still under way. The following table lists the five paradigms in NLP:
Unlike the existing model-centric design paradigm, this research approaches the problem from the perspective of data in order to make the most of existing data. Specifically, the study adopts a data storage and access view: the pre-training stage is regarded as the data storage process, while downstream tasks (e.g., sentiment classification) built on the pre-trained model are regarded as the process of accessing data from that model. It claims that a good data storage mechanism should make the stored data more accessible.
To achieve this goal, the study treats data as an object composed of different signals and holds that a good pre-trained model should (1) cover as many signal types as possible and (2) provide precise access to these signals when downstream tasks require them. Overall, the new paradigm consists of three steps: restructuring, pre-training, and fine-tuning.
The restructure, pre-train, fine-tune paradigm highlights the importance of data, and researchers need to invest more engineering effort in data processing.
Restructuring Engineering
Signal Definition
Signals are useful information present in the data that can provide supervision for machine learning models, and they are expressed as n-tuples. For example, in the sentence "Mozart was born in Salzburg", "Mozart" and "Salzburg" can be regarded as signals for named entity recognition. Signals can typically be clustered from different perspectives, as shown in Figure 6 below.
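As a minimal sketch of what "signals expressed as n-tuples" could look like in code, the snippet below stores the Mozart example as 3-tuples. The `Signal` class and its field names are hypothetical and are not the paper's actual schema.

```python
from typing import NamedTuple


class Signal(NamedTuple):
    """One supervision signal stored as an n-tuple (field names are hypothetical)."""
    signal_type: str  # e.g. "named_entity"
    text: str         # the sentence the signal was found in
    value: str        # the piece of supervision itself


sentence = "Mozart was born in Salzburg"
signals = [
    Signal("named_entity", sentence, "Mozart"),
    Signal("named_entity", sentence, "Salzburg"),
]

for s in signals:
    # Every signal value must actually occur in its source text.
    assert s.value in s.text

print([s.value for s in signals])
```

Keeping the signal type as an explicit field is what later allows signals from different sources to be grouped and processed uniformly.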
Data Mining
Real-world data contains many different types of signals, and reStructured Pre-training allows these signals to be fully exploited. The study organizes the collected signals (n-tuples) in a tree diagram, as shown in Figure 10 below.
Signal Extraction
The next step is signal extraction and processing, which involves obtaining raw data from the different data mines, data cleaning, and data normalization. Existing methods fall roughly into two types: (1) rule-based and (2) machine learning-based. This work focuses mainly on rule-based signal extraction strategies and leaves higher-coverage methods for future work.
Signal Reconstruction
After extracting different signals from the various data mines, the next important step is to unify them into a fixed form so that all information can be stored consistently in the model during pre-training. The prompt method (Brown et al., 2020; Liu et al., 2021d) can achieve this goal: in principle, with appropriate prompt design, it can unify almost all types of signals into a language-model style.
The study divided signals into two broad categories: general signals and task-related signals. The former contains basic language knowledge and can benefit all downstream tasks to a certain extent, while the latter can benefit some specific downstream tasks.
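The prompt-based unification described above can be sketched as follows. This is an illustrative assumption, not the paper's implementation: the template wording, the `TEMPLATES` table, and the `restructure` helper are all hypothetical. Each signal tuple is rendered into a (source, target) text pair that a text-to-text model could be pre-trained on.

```python
from typing import Tuple

# Hypothetical templates; the wording is an illustrative assumption,
# not the prompts used in the paper.
TEMPLATES = {
    "named_entity": ("{text} What entities are mentioned?", "{value}"),
    "sentiment": ("{text} Is this review positive or negative?", "{value}"),
}


def restructure(signal_type: str, text: str, value: str) -> Tuple[str, str]:
    """Turn one signal tuple into a (source, target) text pair."""
    source_tpl, target_tpl = TEMPLATES[signal_type]
    return source_tpl.format(text=text), target_tpl.format(value=value)


pairs = [
    restructure("named_entity", "Mozart was born in Salzburg.", "Mozart, Salzburg"),
    restructure("sentiment", "A wonderful film.", "positive"),
]
for source, target in pairs:
    print(repr(source), "->", repr(target))
```

Because every signal type ends up in the same text-to-text form, signals of very different origin can be mixed in a single pre-training corpus.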
Experiments on 55 commonly used NLP datasets
The study evaluates on 55 datasets and compares against GPT3 and T0pp. The results of the comparison with GPT3 are shown in the figure: on the four datasets other than cb, both RST-All and RST-Task achieve better zero-shot performance than GPT3 does with few-shot learning. The cb dataset is the smallest of these datasets, with only 56 samples in the validation set, so the performance of different prompts on it fluctuates greatly.
The results of the comparison with T0pp are shown in Tables 4-6. In terms of average performance, RST-All beats T0pp on 49 of the 55 datasets, and in terms of maximum performance it wins on 47/55. Likewise, RST-Task outperforms T0pp on 52 datasets in terms of average performance and on 50/55 in terms of maximum performance. This illustrates the superiority of restructured learning.
What tasks is the best-performing model, RST-Task, good at? To answer this question, the study compares the performance of RST-Task in the zero-shot setting with current SOTA models; the results are shown in Figure 13. RST-Task is good at topic classification, sentiment classification, and natural language inference tasks, but performs poorly on information extraction tasks.
College Entrance Examination Experiment: Towards Human-Level AI
The study collected 10 College Entrance Examination English test papers: the 2018 National Paper I/III, the 2019 National Paper I/II/III, the 2020 National Paper I/II/III, and the 2021 National Paper A/B. These papers share the same question types, which the study divides into the following seven subcategories, as shown in Table 7:
The full score of each College Entrance Examination English paper is 150 points. Listening, cloze, reading, and writing account for 30, 45, 40, and 35 points respectively. The writing section is subjective and requires human assessment, while the other sections are objective and can be scored automatically. As shown in Table 8:
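As a quick arithmetic check of the section weights stated above (this is not a reproduction of Table 8), the following sketch sums the four sections and separates out the part that can be scored automatically. The dictionary and variable names are hypothetical.

```python
# Section weights as stated in the text (points per section).
SECTION_POINTS = {"listening": 30, "cloze": 45, "reading": 40, "writing": 35}
SUBJECTIVE = {"writing"}  # requires human assessment

total = sum(SECTION_POINTS.values())
auto_scored = sum(
    points for name, points in SECTION_POINTS.items() if name not in SUBJECTIVE
)
print(total, auto_scored)  # 150 115
```

So 115 of the 150 points come from objective sections that can be graded without a human rater.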
The study uses the restructuring engineering cycle shown in Table 1 to build the College Entrance Examination English AI system, Qin. The whole process is shown in Figure 14:
This study uses the following prompt to convert the original signal tuple into a prompt sample, as shown in Table 9:
The experimental results are shown in Tables 10-11. We can draw the following conclusions:
- On every English test paper, RST achieves the highest total score under both listening settings (gold transcripts and speech-to-text transcripts), with an average score of 130.6 points.
- Compared with T0pp, RST performs far better at the same model size. Across all settings, RST's total score is on average 54.5 points higher than T0pp's, with a maximum gap of 69 points (46% of the full score).
- Compared with GPT3, RST achieves significantly better results with a model 16 times smaller. Across all settings considered, RST's total score is on average 14.0 points higher than GPT3's, with a maximum gap of 26 points (17% of the full score).
- For T0pp, the listening scores obtained with gold transcripts and with speech-to-text transcripts differ considerably, by 4.2 points on average. For GPT3 and RST the differences are 0.6 and 0.45 points respectively, indicating that T0pp's performance is sensitive to text quality.
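The percentages quoted for the largest score gaps can be checked against the 150-point full score with a few lines of arithmetic. The helper name below is hypothetical, and rounding to the nearest whole percent is an assumption about how the figures were reported.

```python
FULL_SCORE = 150


def share_of_full_score(points: float) -> int:
    """Express a point gap as a whole-number percentage of the full score."""
    return round(100 * points / FULL_SCORE)


gap_69 = share_of_full_score(69)  # largest gap in the T0pp comparison
gap_26 = share_of_full_score(26)  # largest gap in the GPT3 comparison
print(gap_69, gap_26)  # 46 17
```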
The study also conducted a fine-grained analysis to understand how the different models perform on the different question subcategories. Figure 15-(a) shows that RST and GPT3 outperform T0pp on every subcategory.
Figure 15-(b) shows the models' performance alongside the average student performance on national test papers in recent years. T0pp's overall score is below the student average on 9 of the 10 papers, while RST and GPT3 exceed the student average. In particular, RST's overall score is above 130 (often considered the target score for students) on five of the ten papers.
The 2022 College Entrance Examination English test (2022.06.08) has just ended, so the models could also be evaluated on the most recent year's paper. The study ran experiments with GPT3 and RST. The results show that RST's total score reaches 134, much higher than the 108 achieved by GPT3.