
Revealed: Step Star unveils a trillion-parameter MoE and multi-modal large model matrix

Jul 12, 2024, 05:52 AM
Tags: industry, Step-2, Step Star

At the 2024 World Artificial Intelligence Conference (WAIC), many people lined up in front of a booth just to let a large AI model "arrange" a heavenly errand for them.


Process:
  1. Provide a personal photo
  2. Generate a celestial-style portrait (in the style of "Havoc in Heaven")
  3. Interactive plot selection and conversation
  4. Evaluate the user's MBTI personality type based on their choices and answers
  5. "Arrange" a heavenly errand according to that personality type

Experience method:

  • Queue on site
  • Online experience (scan the QR code)


Large model startup Step Star announces a major move

The "AI + Havoc in Heaven" interactive experience, created in cooperation with Shanghai Film Studio, is just an appetizer for Step Star's demonstration of what large models can do. During WAIC, the company formally launched the following:

  1. A trillion-parameter MoE large model: the official version of Step-2
  2. A multi-modal large model with hundreds of billions of parameters: Step-1.5V
  3. An image generation large model: Step-1X

Step-2 trillion parameter large model

After debuting alongside Step Star in March, Step-2 has evolved to approach GPT-4 level overall, with excellent performance in mathematical logic, programming, Chinese knowledge, English knowledge, and instruction following.

Step-1.5V multi-modal large model

Based on the Step-2 model, Step Star developed the multi-modal large model Step-1.5V, which not only has powerful perception and video understanding capabilities, but can also perform advanced reasoning over image content (such as solving math problems, writing code, and composing poetry).

Step-1X large image generation model

The image generation in "AI + Havoc in Heaven" is handled by the Step-1X model, which is deeply optimized for Chinese elements and has excellent semantic alignment and instruction-following ability.

Step Star has established a complete large model matrix covering trillion-parameter MoE models and multi-modal models, placing it in the first echelon of large model startups. This stems from its persistence with the Scaling Law and the technology and resources to match.

Step-2 trillion parameter large model

Training a trillion-parameter model from scratch significantly improves its reasoning capabilities in fields such as mathematics and programming. Step-2 can solve more complex mathematical logic and programming problems than 100-billion-parameter models, which has been quantitatively confirmed by benchmark evaluations.

In addition, its Chinese and English capabilities and instruction-following ability have also improved significantly.

Step-2 performs so well partly because of its huge parameter count, and partly because of how it was trained.
There are two main ways to train an MoE model. One is upcycling: reusing an already trained model (or intermediate checkpoints from its training) to assemble the MoE more efficiently and economically. This approach needs relatively little compute and trains quickly, but the resulting model tends to have a lower ceiling. For example, if the experts of an MoE model are obtained by copying and fine-tuning the same base model, those experts can end up highly similar, and this homogeneity limits how much the MoE model can improve.
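As a purely illustrative aside (not Step Star's actual procedure), the PyTorch sketch below shows upcycling in miniature, with made-up sizes: every expert begins as a verbatim copy of the base model's dense FFN, so at initialization the experts are identical, which is exactly the homogeneity problem described above.

```python
import copy
import torch.nn as nn

# Hypothetical base-model FFN with illustrative sizes.
dense_ffn = nn.Sequential(
    nn.Linear(512, 2048),
    nn.GELU(),
    nn.Linear(2048, 512),
)

# "Upcycle" it into 8 experts: each expert starts as an exact copy of the
# dense FFN, so the experts only diverge as far as later fine-tuning
# pushes them apart.
experts = nn.ModuleList([copy.deepcopy(dense_ffn) for _ in range(8)])
```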
Considering these limitations, Step Star chose the other approach: fully independent development and training from scratch. Training this way is harder and consumes far more compute, but it gives the model a higher ceiling.
Specifically, they first made some innovations in the MoE architecture design, including parameter sharing among some experts and heterogeneous expert design (see the sketch below). The former lets multiple experts share certain common capabilities while each expert still retains its own specialization; the latter improves diversity and overall performance by designing experts of different types, so that each expert has unique strengths on specific tasks.
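The toy PyTorch layer below illustrates these two ideas in general terms; it is a hedged sketch of the technique with invented sizes, not Step-2's real architecture. A shared expert processes every token, while the routed experts have different hidden sizes (a simple form of heterogeneous experts).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def ffn(dim: int, hidden: int) -> nn.Sequential:
    """A plain feed-forward expert."""
    return nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))

class ToySharedHeteroMoE(nn.Module):
    """Toy MoE layer: one always-on shared expert plus routed experts
    whose hidden sizes differ (heterogeneous experts)."""

    def __init__(self, dim: int = 512, expert_hiddens=(1024, 2048, 2048, 4096), top_k: int = 2):
        super().__init__()
        self.shared = ffn(dim, 2 * dim)                          # shared by every token
        self.experts = nn.ModuleList([ffn(dim, h) for h in expert_hiddens])
        self.router = nn.Linear(dim, len(expert_hiddens))
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim)
        weights = F.softmax(self.router(x), dim=-1)              # routing scores
        top_w, top_i = weights.topk(self.top_k, dim=-1)
        routed = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            # Per-token weight for expert e; zero if e is not in that token's top-k.
            # (For clarity every expert runs on all tokens here; real MoE layers
            # dispatch only the selected tokens to each expert.)
            w = (top_w * (top_i == e)).sum(dim=-1, keepdim=True)
            routed = routed + w * expert(x)
        return self.shared(x) + routed

layer = ToySharedHeteroMoE()
tokens = torch.randn(8, 512)
print(layer(tokens).shape)  # torch.Size([8, 512])
```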
Building on these innovations, Step-2 not only reaches a trillion total parameters, but also activates more parameters per training or inference step than most dense models on the market.
In addition, training such a trillion-parameter model from scratch was a major test for the systems team. Fortunately, Step Star's systems team has deep hands-on experience in building and operating training infrastructure, which allowed it to break through key technologies such as 6D parallelism, extreme GPU memory management, and fully automated operations and maintenance during training, and to complete Step-2's training successfully.

Step-1.5V: the multi-modal large model standing on the shoulders of Step-2
Three months ago, Step Star released the Step-1V multi-modal large model. Recently, with the release of the official version of Step-2, this large multi-modal model has also been upgraded to version 1.5.
Step-1.5V mainly focuses on multi-modal understanding capabilities. Compared with previous versions, its perceptual capabilities have been greatly improved. It can understand complex charts and flowcharts, accurately perceive complex geometric positions in physical space, and can also process high-resolution and extreme aspect ratio images.


In addition, it can also understand videos, including the objects, characters, and environments in them, as well as the overall atmosphere and the characters' emotions.

As mentioned earlier, Step-2 played an indispensable role in the birth of Step-1.5V: during Step-1.5V's RLHF (reinforcement learning from human feedback) training, Step-2 served as the supervising model, which is equivalent to giving Step-1.5V a trillion-parameter teacher. Under this teacher's guidance, Step-1.5V's reasoning ability improved greatly, and it can now perform various advanced reasoning tasks based on image content, such as solving math problems, writing code, and composing poetry. This is also one of the capabilities OpenAI recently demonstrated with GPT-4o, and it has filled the outside world with expectations about its application prospects.
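The article does not describe the exact training recipe, but a common way to use a larger model as a "teacher" in such a pipeline is to have it score or rank the smaller model's candidate answers and feed those judgments into preference-based training. The sketch below is a simplified, hypothetical stand-in (the function names and placeholder models are invented for illustration), not Step Star's actual RLHF setup.

```python
import random

# Hypothetical stand-ins: in practice `policy_generate` would be the multi-modal
# model being trained, and `teacher_score` a far larger model (e.g. a
# trillion-parameter LLM) judging each answer.
def policy_generate(prompt: str, n_candidates: int = 4) -> list[str]:
    return [f"candidate answer {i} for: {prompt}" for i in range(n_candidates)]

def teacher_score(prompt: str, answer: str) -> float:
    return random.random()  # placeholder quality score in [0, 1)

def collect_preference_pairs(prompts: list[str]) -> list[dict]:
    """Keep the teacher's best- and worst-rated candidate for each prompt.
    Such (chosen, rejected) pairs can train a reward model or feed a
    DPO/RLHF-style preference objective."""
    pairs = []
    for prompt in prompts:
        candidates = policy_generate(prompt)
        ranked = sorted(candidates, key=lambda a: teacher_score(prompt, a))
        pairs.append({"prompt": prompt, "rejected": ranked[0], "chosen": ranked[-1]})
    return pairs

print(collect_preference_pairs(["Solve the equation in this image", "Explain this flowchart"]))
```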

Multi-modal generation capability is mainly reflected in the new model Step-1X. Compared with similar models, it has better semantic alignment and instruction-following capabilities. It has also been deeply optimized for Chinese elements and is better suited to Chinese aesthetic preferences.

The "AI + Havoc in Heaven" interactive experience built on this model integrates image understanding, style transfer, image generation, plot creation, and other capabilities, richly demonstrating Step Star's industry-leading multi-modal level. For example, when generating the initial character, the system first determines whether the photo uploaded by the user meets the requirements for character creation ("face pinching"), and then gives feedback in a very "Havoc in Heaven" style of language, reflecting both the model's image understanding and its language ability. With the support of large model technology, the game gives players an interactive experience completely different from traditional online H5 games: because all interactive questions, user images, and analysis results are generated in real time as the model learns the user's features, it truly achieves a personalized experience for every player and an unlimited range of plots.

These performances are inseparable from the DiT model architecture that Step Star developed fully in-house (OpenAI's Sora also uses a DiT architecture). To let more people use this model, Step Star offers Step-1X in three parameter sizes, 600M, 2B, and 8B, to suit different compute scenarios.
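For readers unfamiliar with DiT (Diffusion Transformer), the toy PyTorch block below shows the core idea: image latents are processed as a sequence of patch tokens, and a conditioning vector (e.g. diffusion timestep plus prompt embedding) modulates each layer through learned scale, shift, and gate parameters (adaLN-style). This is a generic sketch of the architecture family with made-up sizes, not the internals of Step-1X.

```python
import torch
import torch.nn as nn

class ToyDiTBlock(nn.Module):
    """One DiT-style transformer block with adaLN modulation."""

    def __init__(self, dim: int, num_heads: int, mlp_ratio: float = 4.0):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim, elementwise_affine=False)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim, elementwise_affine=False)
        hidden = int(dim * mlp_ratio)
        self.mlp = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
        # Conditioning vector -> shift/scale/gate for the attention and MLP branches.
        self.modulation = nn.Sequential(nn.SiLU(), nn.Linear(dim, 6 * dim))

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_patch_tokens, dim), cond: (batch, dim)
        s1, b1, g1, s2, b2, g2 = self.modulation(cond).chunk(6, dim=-1)
        h = self.norm1(x) * (1 + s1.unsqueeze(1)) + b1.unsqueeze(1)
        x = x + g1.unsqueeze(1) * self.attn(h, h, h, need_weights=False)[0]
        h = self.norm2(x) * (1 + s2.unsqueeze(1)) + b2.unsqueeze(1)
        return x + g2.unsqueeze(1) * self.mlp(h)

# The different Step-1X sizes (600M / 2B / 8B) would mainly vary width and depth;
# the numbers below are purely illustrative.
block = ToyDiTBlock(dim=1024, num_heads=16)
patch_tokens = torch.randn(2, 256, 1024)   # latent image patches
cond = torch.randn(2, 1024)                # pooled timestep + prompt embedding
print(block(patch_tokens, cond).shape)     # torch.Size([2, 256, 1024])
```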

At the debut event in March, Jiang Daxin, the founder of Step Star, clearly stated that he believed that the evolution of large models will go through three stages:

  1. In the first stage, each modality (language, vision, sound, etc.) develops independently, and each modality's model focuses on learning and representing the characteristics of that specific modality.
  2. In the second stage, different modalities begin to merge. However, the integration is incomplete, and understanding and generation remain separated, so a model may understand well but generate poorly, or vice versa.
  3. In the third stage, generation and understanding are unified within one model, which is then fully integrated with robots to form embodied intelligence. Embodied intelligence then actively explores the physical world and gradually evolves into a world model, thereby realizing AGI.

This is also the route that Jiang Daxin and his team have adhered to since founding the company. On this road, "trillion-scale parameters" and "multi-modal fusion" are both indispensable, and Step-2, Step-1.5V, and Step-1X are the nodes they have reached along it.

Moreover, these nodes are linked. Take OpenAI as an example: the video generation model Sora released at the beginning of the year was annotated using an internal OpenAI tool (most likely GPT-4V), and GPT-4V was in turn trained on technologies related to GPT-4. From today's vantage point, the power of single-modal models lays the foundation for multi-modality, and multi-modal understanding lays the foundation for generation. Relying on such a model matrix, OpenAI effectively lifts itself up step by step, and Step Star is validating the same route in China.

We look forward to this company bringing more surprises to the domestic large model field.
