Review of NeurIPS 2023: Tsinghua ToT pushes large models into focus

Jan 26, 2024, 4:27 PM

Recently, Latent Space, one of the top ten technology blogs in the United States, published a curated review of the recently concluded NeurIPS 2023 conference.

NeurIPS 2023 accepted 3,586 papers in total, six of which won awards. While the award winners draw most of the attention, many other papers are of equally outstanding quality and potential. In fact, some of them may herald the next big breakthrough in AI.

Let's take a look!

Paper title: QLoRA: Efficient Finetuning of Quantized LLMs

Paper address: https://openreview.net/pdf?id=OUIFPHEgJU

This paper proposes QLoRA, a memory-efficient (but slower) variant of LoRA that uses several optimization tricks to save memory.

Overall, QLoRA makes it possible to fine-tune large language models with far less GPU memory.

Using QLoRA, the researchers fine-tuned a new model named Guanaco for just 24 hours on a single GPU, and it outperformed previous models on the Vicuna benchmark.

Researchers have also developed other methods with similar effects, such as 4-bit LoRA quantization.
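
As a rough illustration, here is a minimal sketch of QLoRA-style fine-tuning using the Hugging Face transformers, peft, and bitsandbytes libraries. The base model name and hyperparameters are illustrative placeholders, not the paper's exact configuration:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the frozen base model with 4-bit NF4 quantization, as QLoRA proposes.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # 4-bit NormalFloat data type
    bnb_4bit_use_double_quant=True,      # double quantization for extra savings
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b",               # illustrative base model
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Attach small trainable LoRA adapters; only these weights are updated.
lora_config = LoraConfig(
    r=64, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()       # a tiny fraction of total weights
```

The memory savings come from keeping the 4-bit base weights frozen while only the small LoRA adapters and their optimizer state live in higher precision.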

Paper title: DataComp: In search of the next generation of multimodal datasets

Paper address: https://openreview.net/pdf?id=dVaWCDMBof

Multimodal datasets have played a key role in recent breakthroughs such as CLIP, Stable Diffusion, and GPT-4, yet their design has not received the same research attention as model architectures or training algorithms.

To address this shortfall in the machine learning ecosystem, the researchers introduce DataComp, a testbed for dataset experiments built around a new candidate pool of 12.8 billion image-text pairs drawn from Common Crawl.

Participants design new filtering techniques or curate new data sources, then evaluate their dataset by running standardized CLIP training code and testing the resulting model on 38 downstream test sets.

The results show that the best baseline, DataComp-1B, enables training a CLIP ViT-L/14 model from scratch to 79.2% zero-shot accuracy on ImageNet, 3.7 percentage points higher than OpenAI's CLIP ViT-L/14, demonstrating that the DataComp workflow can produce better training sets.
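
For a sense of the kind of experiment DataComp enables, here is a minimal sketch of CLIP-score filtering, one common curation strategy: keep only the image-text pairs whose image and caption embeddings are sufficiently similar. It uses the open_clip package; the model choice and the 0.28 threshold are illustrative assumptions, not settings prescribed by the benchmark:

```python
import torch
import open_clip
from PIL import Image

# Small CLIP model used purely as a filtering scorer.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="openai"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")

def clip_score(image_path: str, caption: str) -> float:
    """Cosine similarity between CLIP image and text embeddings."""
    image = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    text = tokenizer([caption])
    with torch.no_grad():
        img_emb = model.encode_image(image)
        txt_emb = model.encode_text(text)
        img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
        txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    return (img_emb @ txt_emb.T).item()

def filter_pairs(pairs, threshold=0.28):
    """Keep pairs whose CLIP score clears the (assumed) threshold."""
    return [(p, c) for p, c in pairs if clip_score(p, c) > threshold]
```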

Paper title: Visual Instruction Tuning

Paper address: https://www.php.cn/link/c0db7643410e1a667d5e01868827a9af

In this paper, the researchers present the first attempt to use the language-only GPT-4 to generate multimodal language-image instruction-following data.

By instruction-tuning on this generated data, they introduce LLaVA (Large Language and Vision Assistant), an end-to-end trained large multimodal model that connects a visual encoder to an LLM for general vision and language understanding.

Early experiments show that LLaVA has impressive multimodal chat abilities, sometimes exhibiting behavior similar to multimodal GPT-4 on unseen images and instructions, and it achieves a relative score of 85.1% compared with GPT-4 on a synthetic multimodal instruction-following dataset.

When fine-tuned on science question answering, the synergy of LLaVA and GPT-4 achieves a new state-of-the-art accuracy of 92.53%.
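
The architectural idea can be sketched in a few lines of PyTorch: a frozen vision encoder produces patch features, and a small trainable connector projects them into the LLM's embedding space so they can be consumed alongside text tokens. The sketch below assumes CLIP-ViT-like feature dimensions and a single linear projection; the exact sizes are illustrative:

```python
import torch
import torch.nn as nn

class VisionLanguageConnector(nn.Module):
    """Projects frozen vision-encoder features into the LLM embedding space."""

    def __init__(self, vision_dim=1024, llm_dim=4096):
        super().__init__()
        self.proj = nn.Linear(vision_dim, llm_dim)  # the only new trainable part

    def forward(self, image_features):
        # image_features: (batch, num_patches, vision_dim), e.g. from a CLIP ViT
        return self.proj(image_features)            # (batch, num_patches, llm_dim)

# The projected visual tokens are concatenated with the text embeddings and the
# combined sequence is fed to the LLM for ordinary autoregressive training.
connector = VisionLanguageConnector()
visual_tokens = connector(torch.randn(1, 256, 1024))
print(visual_tokens.shape)  # torch.Size([1, 256, 4096])
```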

Paper title: Tree of Thoughts: Deliberate Problem Solving with Large Language Models

Paper address: https://arxiv.org/pdf/2305.10601.pdf

Language models are increasingly used for general problem solving across a wide range of tasks, but during inference they are still confined to a token-level, left-to-right decision process. This means they can perform poorly on tasks that require exploration, strategic lookahead, or where early decisions play a pivotal role.

To overcome these challenges, the researchers introduce Tree of Thoughts (ToT), a new inference framework that generalizes the popular Chain of Thought approach to prompting language models and enables exploration over coherent units of text ("thoughts") that serve as intermediate steps in problem solving.

ToT lets language models make deliberate decisions by considering multiple reasoning paths, self-evaluating choices to decide the next step, and looking ahead or backtracking when necessary to make globally better choices.

Experiments show that ToT significantly improves language models' problem-solving ability on three new tasks that require non-trivial planning or search: Game of 24, creative writing, and mini crosswords. For example, in Game of 24, GPT-4 with Chain of Thought prompting solved only 4% of the tasks, while ToT achieved a 74% success rate.
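
To make the search procedure concrete, here is a minimal sketch of a breadth-first ToT loop. The generate and evaluate callables stand in for real LLM calls (proposing candidate thoughts and self-evaluating them); the depth and beam width are illustrative defaults, not the paper's task-specific settings:

```python
from typing import Callable

def tree_of_thoughts(
    problem: str,
    generate: Callable[[str], list[str]],   # LLM proposes k next thoughts
    evaluate: Callable[[str], float],       # LLM self-evaluation score
    depth: int = 3,
    beam_width: int = 5,
) -> str:
    frontier = [problem]
    for _ in range(depth):
        candidates = []
        for state in frontier:
            for thought in generate(state):
                # Extend each partial solution with a candidate thought.
                candidates.append(state + "\n" + thought)
        # Keep only the beam_width most promising partial solutions.
        candidates.sort(key=evaluate, reverse=True)
        frontier = candidates[:beam_width]
    return frontier[0]  # best line of reasoning found
```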

Paper title: Toolformer: Language Models Can Teach Themselves to Use Tools

Paper address: https://arxiv.org/pdf/2302.04761.pdf

Language models have demonstrated a remarkable ability to solve novel tasks from just a few examples or textual instructions, especially at scale. Paradoxically, however, they struggle with basic functions such as arithmetic or factual lookup, where far simpler and smaller specialized models excel.

In this paper, the researchers show that language models can teach themselves to use external tools via simple APIs, achieving the best of both worlds.

They introduce Toolformer, a model trained to decide which APIs to call, when to call them, what arguments to pass, and how best to incorporate the results into future token prediction.

This is done in a self-supervised manner, requiring only a handful of demonstrations per API. They integrate a variety of tools, including a calculator, a question answering system, a search engine, a translation system, and a calendar.

Toolformer achieves substantially improved zero-shot performance on a variety of downstream tasks, often competitive with much larger models, without sacrificing its core language modeling abilities.
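
As a rough sketch of what this looks like at inference time, the snippet below detects embedded API-call markers in generated text, executes the tool, and splices the result back in so decoding can continue conditioned on it. The [Tool(args)] syntax echoes the paper's notation, but the tool registry and parsing here are simplified, illustrative assumptions:

```python
import re

# Toy tool registry; real systems would call actual services.
TOOLS = {
    "Calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
    "Calendar": lambda _: "Friday, June 14, 2024",
}

CALL_PATTERN = re.compile(r"\[(\w+)\(([^)]*)\)\]")

def execute_tool_calls(text: str) -> str:
    """Replace every [Tool(args)] marker with [Tool(args) -> result]."""
    def run(match: re.Match) -> str:
        tool, args = match.group(1), match.group(2)
        result = TOOLS[tool](args)
        return f"[{tool}({args}) -> {result}]"
    return CALL_PATTERN.sub(run, text)

print(execute_tool_calls("The answer is [Calculator(4 * 7)]."))
# The answer is [Calculator(4 * 7) -> 28].
```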

Paper title: Voyager: An Open-Ended Embodied Agent with Large Language Models

Paper address: https://arxiv.org/pdf/2305.16291.pdf

This paper introduces Voyager, the first lifelong learning agent in Minecraft driven by a large language model (LLM), which continuously explores the world, acquires diverse skills, and makes discoveries on its own.

Voyager consists of three key components:

An automatic curriculum designed to maximize exploration;

an ever-growing skill library of executable code for storing and retrieving complex behaviors;

and a new iterative prompting mechanism that incorporates environment feedback, execution errors, and self-verification to improve programs.

Voyager interacts with GPT-4 through black-box queries, avoiding the need to fine-tune model parameters.
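
The third component can be written as a simple loop. In the sketch below, llm_write_code, env.execute, and verify are hypothetical stand-ins for the GPT-4 code-writing call, the Minecraft environment, and the self-verification step:

```python
def iterative_prompting(task, env, llm_write_code, verify, max_rounds=4):
    """Refine a skill program from environment feedback and self-verification."""
    feedback = ""
    for _ in range(max_rounds):
        program = llm_write_code(task=task, feedback=feedback)
        try:
            observation = env.execute(program)   # run the skill in Minecraft
        except Exception as err:                 # execution error -> feedback
            feedback = f"Execution failed: {err}"
            continue
        success, critique = verify(task, observation)  # self-verification
        if success:
            return program    # a working skill, ready for the skill library
        feedback = critique   # environment feedback for the next round
    return None
```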

Empirically, Voyager demonstrates strong in-context lifelong learning capability and superior proficiency at playing Minecraft.

It obtains 3.3 times more unique items, travels 2.3 times longer distances, and unlocks key tech-tree milestones up to 15.3 times faster than prior state-of-the-art methods.

Moreover, Voyager can use its learned skill library to solve novel tasks from scratch in a new Minecraft world, while other techniques struggle to generalize.

Paper title: Evaluating Cognitive Maps and Planning in Large Language Models with CogEval

Paper address: https://openreview.net/pdf?id=VtkGvGcGe3

This paper first proposes CogEval, a protocol inspired by cognitive science for systematically evaluating the cognitive capabilities of large language models.

The paper then uses CogEval to evaluate the cognitive map and planning abilities of eight LLMs (OpenAI GPT-4, GPT-3.5-turbo-175B, davinci-003-175B, Google Bard, Cohere-xlarge-52.4B, Anthropic Claude-1-52B, LLaMA-13B, and Alpaca-7B). The task prompts are based on human experiments and do not appear in any LLM training set.

The study finds that although LLMs show apparent competence on a few planning tasks with simpler structure, once the tasks become complex they exhibit systematic failure modes, including hallucinating invalid trajectories and getting stuck in loops.

These findings do not support the idea that LLMs possess out-of-the-box planning ability. The likely explanation is that LLMs do not understand the underlying relational structure of a planning problem, i.e., the cognitive map, and therefore struggle to unroll goal-directed trajectories over that structure.
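
In the spirit of the protocol, a cognitive-map task can be represented as a small graph, and a model's proposed route can be checked against the ground-truth structure, which is exactly how invalid trajectories and hallucinated edges are caught. The graph and the example answer below are illustrative, not the paper's actual stimuli:

```python
import networkx as nx

# A small "building" whose rooms the model must navigate.
graph = nx.Graph([("A", "B"), ("B", "C"), ("C", "D"), ("B", "E")])

def is_valid_path(path: list[str], start: str, goal: str) -> bool:
    """A path passes only if every hop is a real edge (no hallucinated ones)."""
    if not path or path[0] != start or path[-1] != goal:
        return False
    return all(graph.has_edge(a, b) for a, b in zip(path, path[1:]))

# Score a model's answer, e.g. parsed from its text output.
model_answer = ["A", "B", "E", "D"]           # hallucinated edge E-D
print(is_valid_path(model_answer, "A", "D"))  # False: invalid trajectory
```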

Paper title: Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Paper address: https://openreview.net/pdf?id=AL1fq05o7H

The authors point out that many subquadratic-time architectures, such as linear attention, gated convolutions, recurrent models, and structured state space models (SSMs), aim to address Transformers' computational inefficiency on long sequences, but these models do not perform as well as attention on important domains such as language. The authors argue that a key weakness of such models is their inability to perform content-based reasoning, and they make several improvements.

First, simply letting the SSM parameters be functions of the input addresses these models' weakness with discrete modalities, allowing the model to selectively propagate or forget information along the sequence-length dimension depending on the current token.

Second, although this change prevents the use of efficient convolutions, the authors design a hardware-aware parallel algorithm that operates in recurrent mode. They integrate these selective SSMs into a simplified end-to-end neural network architecture, Mamba, which requires no attention mechanism and not even MLP blocks.
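
To make the selection mechanism concrete, here is a slow reference sketch of an input-dependent (selective) SSM recurrence in PyTorch. It follows the general shape of the discretized recurrence, with the step size, B, and C computed from the current token, but it is an illustration, not the paper's hardware-aware implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelectiveSSM(nn.Module):
    def __init__(self, d_model=64, d_state=16):
        super().__init__()
        # Negative A gives a stable, decaying state (a common SSM choice).
        self.A = nn.Parameter(-torch.rand(d_model, d_state))
        # Input-dependent parameters: this is the "selection" mechanism.
        self.to_delta = nn.Linear(d_model, d_model)
        self.to_B = nn.Linear(d_model, d_state)
        self.to_C = nn.Linear(d_model, d_state)

    def forward(self, x):
        # x: (batch, length, d_model); hidden state h: (batch, d_model, d_state)
        batch, length, _ = x.shape
        h = x.new_zeros(batch, self.A.shape[0], self.A.shape[1])
        outputs = []
        for t in range(length):
            xt = x[:, t]                                  # (batch, d_model)
            delta = F.softplus(self.to_delta(xt)).unsqueeze(-1)
            B = self.to_B(xt).unsqueeze(1)                # (batch, 1, d_state)
            C = self.to_C(xt).unsqueeze(1)                # (batch, 1, d_state)
            # Discretize per token: decay the old state, write the new input.
            h = torch.exp(delta * self.A) * h + delta * B * xt.unsqueeze(-1)
            outputs.append((h * C).sum(-1))               # read out the state
        return torch.stack(outputs, dim=1)                # (batch, length, d_model)
```

Because every token computes its own step size, B, and C, the model can choose per token what to keep and what to forget; the paper's contribution is making this recurrence fast on GPUs via a parallel scan.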

Mamba enjoys fast inference (5x higher throughput than Transformers) and scales linearly with sequence length, and its performance improves on real data up to million-length sequences.

As a general sequence-model backbone, Mamba achieves state-of-the-art performance across several modalities, including language, audio, and genomics. On language modeling, the Mamba-1.4B model outperforms Transformers of the same size in both pretraining and downstream evaluation, and rivals Transformers twice its size.

Although these papers did not win awards at NeurIPS 2023, some of them, such as Mamba, have the potential to revolutionize language model architecture, and it is still too early to judge their full impact.

How will NeurIPS go next year, and where will artificial intelligence and neural information processing systems head in 2024? Opinions differ, and nobody can say for certain. Let's wait and see.
