


Stanford's 2-billion-parameter on-device multi-modal AI Agent model gets a major upgrade and can run on mobile phones, cars and robots
Octopus V3, the world's first ultra-small multi-modal AI Agent model, comes from the NEXA AI team at Stanford University. It makes agents smarter and faster while reducing energy consumption and cost.
In early April this year, NEXA AI launched the much-anticipated Octopus V2, which surpassed GPT-4 in function-calling performance and reduced the amount of text required for inference by 95%, opening up new possibilities for on-device AI applications. Its patented core technology, the "functional token", significantly shortens the text required for inference through an innovative function-calling method.
This approach makes it possible to efficiently train a model with only 2 billion parameters that surpasses GPT-4 in accuracy and latency and meets the deployment needs of a variety of end devices.
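To make the functional-token idea more concrete, here is a minimal Python sketch. It is not NEXA AI's implementation. It assumes that each device function is bound to one dedicated token, so the model only has to emit that token plus its arguments rather than read and reproduce a long function description. The token names (`<fn_0>`, `<fn_end>`), the output format, and the two example functions are hypothetical and chosen purely for illustration.

```python
import ast
import re

# Hypothetical device functions; the names are illustrative only.
def search_product(query):
    return f"searching for {query}"

def send_email(recipient, body):
    return f"emailing {recipient}: {body}"

# Assumption: one dedicated token per callable function.
FUNCTION_TOKENS = {
    "<fn_0>": search_product,
    "<fn_1>": send_email,
}

# Assumed output format: <fn_N>(arg1, arg2, ...)<fn_end>
CALL_PATTERN = re.compile(r"^(<fn_\d+>)\((.*)\)<fn_end>$", re.DOTALL)

def dispatch(model_output):
    """Map a short model output onto a real function call."""
    match = CALL_PATTERN.match(model_output.strip())
    if match is None:
        raise ValueError("output is not a function call")
    token, raw_args = match.groups()
    func = FUNCTION_TOKENS[token]  # raises KeyError for an unknown token
    # Parse the arguments as Python literals only; nothing is executed.
    args = ast.literal_eval(f"({raw_args},)") if raw_args.strip() else ()
    return func(*args)

print(dispatch("<fn_0>('pineapple')<fn_end>"))
```

In this sketch the model's output stays a few tokens long no matter how many functions are registered, which is the intuition behind the shorter inference text described above.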
Since its release to the LLM community, Octopus V2 has received widespread attention and drawn praise from many experts and researchers in the field of artificial intelligence, including Julien Chaumond, CTO of Hugging Face; Rowan Cheung, founder of a well-known AI newsletter; Brett Adcock, founder of Figure AI; and Manoj Kumar, head of OPPO's edge artificial intelligence team. The model has been hailed as "creating a new era of device-side AI technology."
On the well-known open source AI platform Hugging Face, Octopus V2 has been downloaded more than 12,000 times.
Less than a month later, the NEXA AI team released the next-generation multi-modal AI Agent model, Octopus V3, which marks a further breakthrough: its image processing and multi-language text processing capabilities pave the way for end devices such as smartphones to truly enter the AI era.
The first multi-modal AI Agent model with less than 1 billion parameters
Octopus V3 not only has multi-modal capabilities; its function-calling performance also far exceeds that of similar models and is comparable to GPT-4V and GPT-4. At the same time, the model has fewer than 1 billion parameters and supports multiple languages.
In other words, compared with traditional large language models, it is smaller and consumes less energy. It can run more easily on a variety of small end devices, such as a Raspberry Pi, and make fast, accurate function calls.
This means that in the future, AI Agents can be widely deployed on smartphones, AR/VR devices, robots, smart cars and other end devices, giving users a smoother and smarter interactive experience.
In addition, because V3 has multi-modal processing capabilities, it can handle text and image input at the same time. Coupled with its multi-language capabilities, this makes for a richer user experience.
For example, in the Instacart shopping application, users can let the AI Agent automatically search for products for them through a picture of a pineapple and simple conversation instructions, improving efficiency and user experience.
As another example, in scenarios such as sending emails, Octopus V3 can automatically extract information from an image containing text and fill in the email content, providing users with a more intelligent and convenient service.
From software interaction to smart cars, device-side AI has huge potential
Given these characteristics, Octopus V2 and V3 have rich, diverse application scenarios and broad application prospects.
In addition to the mobile phone scenarios mentioned above, Octopus V2 can also bring new interactive experiences when applied to smart cars. Current voice assistants often struggle to help drivers complete more complex tasks, such as changing the destination or adding extra stops while driving. With Octopus V3, the AI assistant can quickly and accurately complete such tasks from relatively vague and simple instructions.
By combining the capabilities of V2 and V3, from information retrieval to completing a design based on instructions, users can enjoy a smooth AI experience in virtual scenes. In a VR demo made by a community user, after the user gives simple voice commands, the AI Agent helps them quickly design a living room, replace sofas, change the color of the lights, and so on. After the user gives a travel instruction, they are quickly taken to Japan, where the AI Agent can also search for attractions and provide rich information through simple conversation.
Data shows that the global large language model market is growing rapidly. Grand View Research reports that the global large language model market size is estimated at US$4.35 billion and is expected to grow at a compound annual growth rate of 35.9% from 2024 to 2030. The edge artificial intelligence market is booming as well: the global edge artificial intelligence market is expected to grow at a compound annual growth rate of 21.0% from 2023 to 2030 and to reach US$66.478 billion by 2030.
The NEXA AI team was founded by outstanding researchers at Stanford University.
Founder and Chief Scientist Alex Chen (Chen Wei) is studying for a PhD at Stanford University. He has extensive experience in artificial intelligence research and has served as chairman of the Stanford Chinese Entrepreneurs Organization.
Co-founder and Chief Technology Officer Zack Li (Li Zhiyuan) is also a Stanford University graduate. He has four years of front-line research and development experience in on-device AI at Google and Amazon Lab126, and he also served as chairman of the Stanford Chinese Entrepreneurship Association.
Charles (Chuck) Eesley, Associate Professor at Stanford University and Deputy Director of the Stanford Technology Entrepreneurship Program, serves as an advisor, providing guidance and support to the team.
△Left: Li Zhiyuan; Right: Chen Wei
NEXA AI has applied for patent protection for its original technology.
The founding team of NEXA AI stated that they will continue to be committed to promoting the development of end-side AI technology, increasing the influence of its innovative technologies through open source models, and creating a smarter and more efficient future life for users.
Paper address: https://arxiv.org/abs/2404.11459