Table of Contents
1. Historical development of multi-modal large models
1. What can multi-modal large models do?
2. Multi-modal large model
3. Modal alignment architecture
2. Jiuzhang Yunji DataCanvas’ multi-modal large model platform
1. AI Foundation Software (AIFS)
2. Model tool LMOPS
4. LMB –Large Model Builder
5. LMS – Large Model Serving
6. Prompt Manager
3. Practice of Jiuzhang Yunji DataCanvas multi-modal large model
1. Multi-modal large model - with memory
2. Unstructured data ETL Pipeline
3. Multi-modal large model construction method
4. Case-Knowledge Base Construction
Home Technology peripherals AI Practice and reflections on Jiuzhang Yunji DataCanvas multi-modal large model platform

Practice and reflections on Jiuzhang Yunji DataCanvas multi-modal large model platform

Oct 20, 2023 am 08:45 AM
algorithm Multimodal large model

Practice and reflections on Jiuzhang Yunji DataCanvas multi-modal large model platform

1. Historical development of multi-modal large models

Practice and reflections on Jiuzhang Yunji DataCanvas multi-modal large model platform

The picture above The photo is of the first Artificial Intelligence Workshop held at Dartmouth College in the United States in 1956. This conference is also considered to have kicked off the field of artificial intelligence. The attendees were mainly pioneers in the field of symbolic logic (except for those in the middle of the front row). neurobiologist Peter Milner).

However, this symbolic logic theory could not be realized for a long time, and even the first AI winter period came in the 1980s and 1990s. It was not until the recent implementation of large language models that we discovered that neural networks really carry this logical thinking. The work of neurobiologist Peter Milner inspired the later development of artificial neural networks, and it was for this reason that he was invited to participate in this academic seminar. meeting.

Practice and reflections on Jiuzhang Yunji DataCanvas multi-modal large model platform

In 2012, Tesla’s self-driving director Andrew posted the picture above on his blog, showing then-U.S. President Obama joking with his subordinates. For artificial intelligence to understand this picture, it is not only a visual perception task, because in addition to identifying objects, it also needs to understand the relationship between them; only by knowing the physical principles of the scale can we know the story described in the picture: Obama steps on The man on the scale gained weight, causing him to make this strange expression while others laughed. Such logical thinking has obviously gone beyond the scope of pure visual perception. Therefore, visual cognition and logical thinking must be combined to get rid of the embarrassment of "artificial mental retardation". The importance and difficulty of multi-modal large models also reflect it's here.

Practice and reflections on Jiuzhang Yunji DataCanvas multi-modal large model platform

The above picture is an anatomical structure diagram of the human brain. The language logical area in the picture corresponds to the large language model, while other areas are respectively Corresponding to different senses, including vision, hearing, touch, movement, memory, etc. Although the artificial neural network is not a brain neural network in the true sense, we can still get some inspiration from it, that is, when constructing a large model, different functions can be combined together. This is also the basic idea of ​​multi-modal model construction.

1. What can multi-modal large models do?

Practice and reflections on Jiuzhang Yunji DataCanvas multi-modal large model platform

Multi-modal large models can do a lot of things for us, such as video understanding. Large models can help us summarize the summary and key information of the video, This saves us time watching videos; large models can also help us perform post-analysis of videos, such as program classification, program ratings statistics, etc.; in addition, Vincentian graphs are also an important application field of multi-modal large models.

If the large model is combined with the movement of people or robots, an embodied intelligence will be generated, just like a person, planning the best path based on past experience. methods and apply them to new scenarios to solve some problems that have not been encountered before while avoiding risks; you can even modify the original plan during the execution process until you finally achieve success. This is also an application scenario with broad prospects.

2. Multi-modal large model

Practice and reflections on Jiuzhang Yunji DataCanvas multi-modal large model platform

The above picture is some important nodes in the development process of multi-modal large model :

  • The 2020 ViT model (Vision Transformer) is the beginning of a large model. For the first time, the Transformer architecture is used for other types of data (visual data) in addition to language and logical processing, and it is displayed It has good generalization ability;
  • Then through the OpenAI open source CLIP model, it was once again proved that through the use of ViT and large language model, visual tasks have been achieved a lot. Strong long-tail generalization ability, that is, inferring previously unseen categories through common sense
  • By 2023, a variety of multi-modal large models It gradually emerged, from PaLM-E (robot), to whisper (speech recognition), to ImageBind (image alignment), to Sam (semantic segmentation), and finally to geographical images; it also includes Microsoft's unified multi-modal architecture Kosmos2 , multimodal large models are developing rapidly.
  • # Tesla also proposed the vision of a universal world model at CVPR in June.

As can be seen from the above figure, in just half a year, the large model has undergone many changes, and its iteration speed is very fast.

3. Modal alignment architecture

Practice and reflections on Jiuzhang Yunji DataCanvas multi-modal large model platform

The above picture is multi-modal The general architecture diagram of a large state model includes a language model and a visual model. The alignment model is learned through a fixed language model and a fixed visual model; and alignment is to combine the vector space of the visual model and the vector space of the language model, and then in The understanding of the internal logical relationship between the two is completed in a unified vector space.

Both the Flamingo model and the BLIP2 model shown in the figure adopt similar structures (the Flamingo model uses the Perceiver architecture, while the BLIP2 model uses an improved version of the Transformer architecture); and then through various comparisons The learning method carries out pre-training, conducts a large amount of learning on a large number of tokens, and obtains better alignment effects; finally, the model is fine-tuned according to specific tasks.

2. Jiuzhang Yunji DataCanvas’ multi-modal large model platform

1. AI Foundation Software (AIFS)

Jiuzhang Yunji DataCanvas is an artificial intelligence basic software provider. It also provides computing resources (including GPU clusters), performs high-performance storage and network optimization, and provides large model training on this basis. Tools, including data annotation modeling experiment sandbox, etc. Jiuzhang Yunji DataCanvas not only supports common open source large models on the market, but also independently develops Yuanshi multi-modal large models. At the application layer, tools are provided to manage prompt words, fine-tune the model, and provide a model operation and maintenance mechanism. At the same time, a multi-modal vector database was also open sourced to enrich the basic software architecture.

Practice and reflections on Jiuzhang Yunji DataCanvas multi-modal large model platform

2. Model tool LMOPS

Practice and reflections on Jiuzhang Yunji DataCanvas multi-modal large model platform

九章云Ji DataCanvas focuses on the optimization of the entire life cycle of development, including data preparation (data annotation supports manual annotation and intelligent annotation), model development, model evaluation (including horizontal and vertical evaluation), model reasoning (supports model quantification, knowledge distillation, etc. Accelerated inference mechanism), model application, etc.

3. LMB – Large Model Builder

Practice and reflections on Jiuzhang Yunji DataCanvas multi-modal large model platform

When building the model, a lot of distributed Efficient optimization work, including data parallelism, Tensor parallelism, pipeline parallelism, etc. These distributed optimization tasks are completed with one click and support visual control, which can greatly reduce labor costs and improve development efficiency.

4. LMB –Large Model Builder

Practice and reflections on Jiuzhang Yunji DataCanvas multi-modal large model platform

Tuning of large models has also been optimized, including common continue training, supervisor tuning, and human feedback in reinforcement learning. In addition, many optimizations have been made for Chinese, such as the automatic expansion of Chinese vocabulary. Because many Chinese words are not included in large open source models, these words may be split into multiple tokens; automatically expanding these words can allow the model to better use these words.

5. LMS – Large Model Serving

Practice and reflections on Jiuzhang Yunji DataCanvas multi-modal large model platform

Serving of large models is also a very important component In part, the platform has also made a lot of optimizations in aspects such as model quantification and knowledge distillation, which greatly reduces the computing cost. It also accelerates the transformer and reduces its calculation amount through layer-by-layer knowledge distillation. At the same time, a lot of pruning work has been done (including structured pruning, sparse pruning, etc.), which has greatly improved the inference speed of large models.

In addition, the interactive dialogue process has also been optimized. For example, in a multi-turn dialogue Transformer, the key and value of each tensor can be remembered without repeated calculations. Therefore, it can be stored in Vector DB to realize the conversation history memory function and improve the user experience during the interaction process.

6. Prompt Manager

Prompt Manager, a large model prompt word design and construction tool, helps users design better prompt words and guide Large models generate more accurate, reliable, and expected output. This tool can not only provide development toolkit development mode for technical personnel, but also provide human-computer interaction operation mode for non-technical personnel, meeting the needs of different groups of people for using large models.

Its main functions include: AI model management, scene management, prompt word template management, prompt word development and prompt word application, etc.

Practice and reflections on Jiuzhang Yunji DataCanvas multi-modal large model platform

The platform provides commonly used prompt word management tools to achieve version control, and provides commonly used templates to speed up the implementation of prompt words.

3. Practice of Jiuzhang Yunji DataCanvas multi-modal large model

1. Multi-modal large model - with memory

After introducing the platform functions, I will share the multi-modal large model development practice.

Practice and reflections on Jiuzhang Yunji DataCanvas multi-modal large model platform


The picture above is the basic framework of the Jiuzhang Yunji DataCanvas multi-modal model, which is similar to many other models. What is different about the modal large model is that it contains memory, which can improve the reasoning capabilities of the open source large model.

Generally, the number of parameters of large open source models is relatively low. If a part of the parameters are used for memory, its reasoning ability will be significantly reduced. If memory is added to a large open source model, reasoning and memory capabilities will be improved at the same time.

In addition, similar to most models, multi-modal large models will also fix the large language model and fixed data encoding, and conduct separate modular training for the alignment function; therefore, all different The data modes will be aligned to the logical parts of the text; in the reasoning process, the language is first translated, then fused, and finally the reasoning work is performed.

2. Unstructured data ETL Pipeline

Practice and reflections on Jiuzhang Yunji DataCanvas multi-modal large model platform

Due to the combination of our DingoDB multi-modal vector database It has multi-modal and ETL functions, so it can provide good unstructured data management capabilities. The platform provides pipeline ETL functions and has made many optimizations, including operator compilation, parallel processing, and cache optimization.

In addition, the platform provides a Hub that can reuse pipelines to achieve the most efficient development experience. At the same time, it supports many encoders on Huggingface, which can achieve optimal encoding of different modal data.

3. Multi-modal large model construction method

Jiuzhang Yunji DataCanvas uses the Yuanshi multi-modal large model as a base to support Users can choose other open source large models and also support users to use their own modal data for training.

The construction of a large multi-modal model is roughly divided into three stages:

  • The first stage: fixed large language model and modal coding Machine training alignment and query;
  • Second stage (optional, supports multi-modal search): fixed large language model, modal encoder, alignment and Query module, training retrieval module;
  • The third stage (optional, for specific tasks): Instructions to fine-tune the large language model.

4. Case-Knowledge Base Construction

Practice and reflections on Jiuzhang Yunji DataCanvas multi-modal large model platform


The memory architecture in large models can help us realize the construction of multi-modal knowledge base, which is actually the application of the model. Zhihu is a typical multi-modal knowledge base application module, and its professional knowledge can be traced.

In order to ensure the certainty and security of knowledge, it is often necessary to trace the source of professional knowledge. The knowledge base can help us realize this function, and it will also be more convenient to add new knowledge. , there is no need to modify the model parameters, just add the knowledge directly to the database.

Specifically, professional knowledge is used through the encoder to make different encoding choices, and at the same time, unified evaluation is performed based on different evaluation methods, and the selection of the encoder is realized through one-click evaluation. Finally, the encoder vectorization is applied and stored in the DingoDB multi-modal vector database, and then relevant information is extracted through the multi-modal module of the large model, and reasoning is performed through the language model.

The last part of the model often requires fine-tuning of instructions. Since the needs of different users are different, the entire multi-modal large model needs to be fine-tuned. Due to the special advantages of multimodal knowledge bases in organizing information, the model has the ability to learn and retrieve. This is also an innovation we made in the process of paragraphing text.

The general knowledge base is to divide the document into paragraphs, and then unlock each paragraph independently. This method is easily interfered by noise, and for many large documents, it is difficult to determine the standard for paragraph division.

In our model, the retrieval module performs learning, and the model automatically finds suitable structured information organization. For a specific product, start from the product manual, first locate the large catalog paragraph, and then locate the specific paragraph. At the same time, due to multi-modal information integration, in addition to text, it often also contains images, tables, etc., which can also be vectorized and combined with Meta information to achieve joint retrieval, thus improving retrieval efficiency.

It is worth mentioning that the retrieval module uses a memory attention mechanism, which can increase the recall rate by 10% compared to similar algorithms; at the same time, the memory attention mechanism can be used for multi-modal Document processing is also a very advantageous aspect.

Practice and reflections on Jiuzhang Yunji DataCanvas multi-modal large model platform

Practice and reflections on Jiuzhang Yunji DataCanvas multi-modal large model platform

##4. Thoughts and prospects for the future

1. Enterprise data management - Knowledge base

Practice and reflections on Jiuzhang Yunji DataCanvas multi-modal large model platform

85% of data in an enterprise is unstructured data, and only 15% is structured data . In the past 20 years, artificial intelligence has mainly revolved around structured data. Unstructured data is very difficult to utilize and requires a lot of energy and cost to convert it into structured data. With the help of multi-modal large models and multi-modal knowledge bases, and through the new paradigm of artificial intelligence, the utilization of unstructured data in internal management of enterprises can be greatly improved, which may bring about a 10-fold increase in value in the future.

2. Knowledge base --> Agent

Practice and reflections on Jiuzhang Yunji DataCanvas multi-modal large model platform

Multimodal knowledge base As the basis of the intelligent agent, the above functions such as R&D agent, customer service agent, sales agent, legal agent, human resources agent, and enterprise operation and maintenance agent can all be operated through the knowledge base.

Taking the sales agent as an example, a common architecture includes two agents existing at the same time, one of which is responsible for decision-making and the other is responsible for the analysis of the sales stage. Both modules can search for relevant information through multi-modal knowledge bases, including product information, historical sales statistics, customer portraits, past sales experience, etc. This information is integrated to help these two agents do the best and most correct work These decisions, in turn, help users obtain the best sales information, which is then recorded into a multi-modal database. This cycle continues to improve sales performance.

We believe that the most valuable companies in the future will be those that put intelligence into practice. I hope Jiuzhang Yunji DataCanvas can accompany you all the way and help each other.

The above is the detailed content of Practice and reflections on Jiuzhang Yunji DataCanvas multi-modal large model platform. For more information, please follow other related articles on the PHP Chinese website!

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

AI Hentai Generator

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)
2 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
Hello Kitty Island Adventure: How To Get Giant Seeds
1 months ago By 尊渡假赌尊渡假赌尊渡假赌
Two Point Museum: All Exhibits And Where To Find Them
1 months ago By 尊渡假赌尊渡假赌尊渡假赌

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

CLIP-BEVFormer: Explicitly supervise the BEVFormer structure to improve long-tail detection performance CLIP-BEVFormer: Explicitly supervise the BEVFormer structure to improve long-tail detection performance Mar 26, 2024 pm 12:41 PM

Written above & the author’s personal understanding: At present, in the entire autonomous driving system, the perception module plays a vital role. The autonomous vehicle driving on the road can only obtain accurate perception results through the perception module. The downstream regulation and control module in the autonomous driving system makes timely and correct judgments and behavioral decisions. Currently, cars with autonomous driving functions are usually equipped with a variety of data information sensors including surround-view camera sensors, lidar sensors, and millimeter-wave radar sensors to collect information in different modalities to achieve accurate perception tasks. The BEV perception algorithm based on pure vision is favored by the industry because of its low hardware cost and easy deployment, and its output results can be easily applied to various downstream tasks.

Implementing Machine Learning Algorithms in C++: Common Challenges and Solutions Implementing Machine Learning Algorithms in C++: Common Challenges and Solutions Jun 03, 2024 pm 01:25 PM

Common challenges faced by machine learning algorithms in C++ include memory management, multi-threading, performance optimization, and maintainability. Solutions include using smart pointers, modern threading libraries, SIMD instructions and third-party libraries, as well as following coding style guidelines and using automation tools. Practical cases show how to use the Eigen library to implement linear regression algorithms, effectively manage memory and use high-performance matrix operations.

Explore the underlying principles and algorithm selection of the C++sort function Explore the underlying principles and algorithm selection of the C++sort function Apr 02, 2024 pm 05:36 PM

The bottom layer of the C++sort function uses merge sort, its complexity is O(nlogn), and provides different sorting algorithm choices, including quick sort, heap sort and stable sort.

Can artificial intelligence predict crime? Explore CrimeGPT's capabilities Can artificial intelligence predict crime? Explore CrimeGPT's capabilities Mar 22, 2024 pm 10:10 PM

The convergence of artificial intelligence (AI) and law enforcement opens up new possibilities for crime prevention and detection. The predictive capabilities of artificial intelligence are widely used in systems such as CrimeGPT (Crime Prediction Technology) to predict criminal activities. This article explores the potential of artificial intelligence in crime prediction, its current applications, the challenges it faces, and the possible ethical implications of the technology. Artificial Intelligence and Crime Prediction: The Basics CrimeGPT uses machine learning algorithms to analyze large data sets, identifying patterns that can predict where and when crimes are likely to occur. These data sets include historical crime statistics, demographic information, economic indicators, weather patterns, and more. By identifying trends that human analysts might miss, artificial intelligence can empower law enforcement agencies

Improved detection algorithm: for target detection in high-resolution optical remote sensing images Improved detection algorithm: for target detection in high-resolution optical remote sensing images Jun 06, 2024 pm 12:33 PM

01 Outlook Summary Currently, it is difficult to achieve an appropriate balance between detection efficiency and detection results. We have developed an enhanced YOLOv5 algorithm for target detection in high-resolution optical remote sensing images, using multi-layer feature pyramids, multi-detection head strategies and hybrid attention modules to improve the effect of the target detection network in optical remote sensing images. According to the SIMD data set, the mAP of the new algorithm is 2.2% better than YOLOv5 and 8.48% better than YOLOX, achieving a better balance between detection results and speed. 02 Background & Motivation With the rapid development of remote sensing technology, high-resolution optical remote sensing images have been used to describe many objects on the earth’s surface, including aircraft, cars, buildings, etc. Object detection in the interpretation of remote sensing images

Application of algorithms in the construction of 58 portrait platform Application of algorithms in the construction of 58 portrait platform May 09, 2024 am 09:01 AM

1. Background of the Construction of 58 Portraits Platform First of all, I would like to share with you the background of the construction of the 58 Portrait Platform. 1. The traditional thinking of the traditional profiling platform is no longer enough. Building a user profiling platform relies on data warehouse modeling capabilities to integrate data from multiple business lines to build accurate user portraits; it also requires data mining to understand user behavior, interests and needs, and provide algorithms. side capabilities; finally, it also needs to have data platform capabilities to efficiently store, query and share user profile data and provide profile services. The main difference between a self-built business profiling platform and a middle-office profiling platform is that the self-built profiling platform serves a single business line and can be customized on demand; the mid-office platform serves multiple business lines, has complex modeling, and provides more general capabilities. 2.58 User portraits of the background of Zhongtai portrait construction

A must-read for AI product managers! A beginner's guide to getting started with machine learning algorithms A must-read for AI product managers! A beginner's guide to getting started with machine learning algorithms Nov 28, 2023 pm 05:25 PM

Interesting explanations of machine learning algorithms are the subject of the next article. This article is shared for students who are AI product managers and is highly recommended to students who have just entered this field! We have talked before about the artificial intelligence industry, the second curve of product managers, and the differences between the two positions. This time we will delve deeper into the topic of interesting machine learning algorithms. Machine learning algorithms may sound a bit unfathomable. I understand that many people, including me, will feel a headache at the beginning. I try not to use formulas and only present them in the form of cases. We will gradually deepen from the whole to the part. 1. Overview of Machine Learning Algorithms First, let’s understand the basic concepts of machine learning algorithms. Machine learning is a method for computers to learn and improve from data, and machine learning algorithms are

Add SOTA in real time and skyrocket! FastOcc: Faster inference and deployment-friendly Occ algorithm is here! Add SOTA in real time and skyrocket! FastOcc: Faster inference and deployment-friendly Occ algorithm is here! Mar 14, 2024 pm 11:50 PM

Written above & The author’s personal understanding is that in the autonomous driving system, the perception task is a crucial component of the entire autonomous driving system. The main goal of the perception task is to enable autonomous vehicles to understand and perceive surrounding environmental elements, such as vehicles driving on the road, pedestrians on the roadside, obstacles encountered during driving, traffic signs on the road, etc., thereby helping downstream modules Make correct and reasonable decisions and actions. A vehicle with self-driving capabilities is usually equipped with different types of information collection sensors, such as surround-view camera sensors, lidar sensors, millimeter-wave radar sensors, etc., to ensure that the self-driving vehicle can accurately perceive and understand surrounding environment elements. , enabling autonomous vehicles to make correct decisions during autonomous driving. Head

See all articles