On July 5, at the "Trusted Large Models Help Industrial Innovation and Development" forum of the 2024 World Artificial Intelligence Conference, Ant Group announced the latest progress of its self-developed Bailing large model: Bailing now has native multi-modal capabilities to "see", "hear", "speak" and "draw", meaning it can directly understand and be trained on multi-modal data such as audio, video, images and text. Native multimodality is widely regarded as a necessary path toward AGI, and only a few large-model vendors in China have achieved it. Demonstrations at the conference showed that multi-modal technology lets large models perceive and interact more like humans, supporting upgraded intelligent-agent experiences. Bailing's multi-modal capabilities have already been applied in the "Alipay Intelligent Assistant" and will support further agent upgrades on Alipay in the future.
(Xu Peng, Vice President of Ant Group, introduces the native multi-modal capabilities of the Bailing large model)
At the launch event, Xu Peng, Vice President of Ant Group, demonstrated more application scenarios enabled by the newly upgraded multi-modal technology.
Building on the Bailing model's multi-modal capabilities, Ant Group has been exploring large-scale industry applications.
The "Alipay Multi-modal Medical Model" simultaneously released on the forum is the practice of this exploration. It is understood that Alipay’s multi-modal medical model has added tens of billions of Chinese and English graphics and texts including reports, images, medicines and other multi-modal information, hundreds of billions of medical text corpus, and tens of millions of high-quality medical knowledge maps. , has professional medical knowledge, and ranked first on the A list and second on the B list on promptCBLUE, the Chinese medical LLM evaluation list.
Also building on the Bailing model's multi-modal capabilities, an open-source plan for SkySense, a remote-sensing model jointly developed by Ant Group and Wuhan University, was announced at the forum. SkySense is currently the multi-modal remote-sensing foundation model with the largest parameter scale, the most comprehensive task coverage, and the highest recognition accuracy.
"From single text semantic understanding to multi-modal capabilities, it is a key iteration of artificial intelligence technology, and the application scenarios of 'watching, listening, writing, and drawing' spawned by multi-modal technology will make AI performance more realistic, To be closer to humans, Ant will continue to invest in the research and development of native multi-modality technology,” Xu Peng said.