On July 5, at the "Trusted Large Models Help Industrial Innovation and Development" forum of the 2024 World Artificial Intelligence Conference, Ant Group announced the latest progress of its self-developed Bailing large model: Bailing now has native multi-modal capabilities to "see", "hear", "speak" and "draw", meaning it can directly understand and be trained on multi-modal data such as audio, video, images and text. Native multimodality is widely regarded as a necessary path toward AGI, and only a few large-model vendors in China have achieved it. Demonstrations at the conference showed that multi-modal technology lets large models perceive and interact more like humans, supporting upgraded intelligent-agent experiences. Bailing's multi-modal capabilities have already been applied in the "Alipay Intelligent Assistant" and will support further agent upgrades on Alipay in the future.
(Xu Peng, Vice President of Ant Group, introduces the native multi-modal capabilities of the Bailing large model)
At the launch event, Xu Peng, Vice President of Ant Group, demonstrated more application scenarios enabled by the newly upgraded multi-modal technology.
Building on the Bailing model's multi-modal capabilities, Ant Group has been exploring large-scale industry applications.
The "Alipay Multi-modal Medical Model" simultaneously released on the forum is the practice of this exploration. It is understood that Alipay’s multi-modal medical model has added tens of billions of Chinese and English graphics and texts including reports, images, medicines and other multi-modal information, hundreds of billions of medical text corpus, and tens of millions of high-quality medical knowledge maps. , has professional medical knowledge, and ranked first on the A list and second on the B list on promptCBLUE, the Chinese medical LLM evaluation list.
Also building on the Bailing model's multi-modal capabilities, an open-source plan for SkySense, a remote-sensing model jointly developed by Ant Group and Wuhan University, was announced at the forum. SkySense is currently the multi-modal remote-sensing foundation model with the largest parameter scale, the most comprehensive task coverage, and the highest recognition accuracy.
"From single text semantic understanding to multi-modal capabilities, it is a key iteration of artificial intelligence technology, and the application scenarios of 'watching, listening, writing, and drawing' spawned by multi-modal technology will make AI performance more realistic, To be closer to humans, Ant will continue to invest in the research and development of native multi-modality technology,” Xu Peng said.