Bianews reported on October 12 that Zhipu AI and Tsinghua KEG recently released and open-sourced the multimodal large model CogVLM-17B on the ModelScope (Moda) community. CogVLM is described as a powerful open-source visual language model that uses a visual expert module to deeply fuse language encoding with visual encoding, and it has achieved SOTA performance on 14 authoritative cross-modal benchmarks.
CogVLM-17B currently ranks first in overall performance on authoritative multimodal academic leaderboards, placing first or second on 14 datasets. CogVLM's effectiveness rests on the idea of "visual priority", that is, giving visual understanding a higher priority within the multimodal model. It uses a 5B-parameter visual encoder and a 6B-parameter visual expert module, for a total of 11B parameters devoted to modeling image features, which is even more than the 7B parameters used for text.
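To illustrate how such an open-source checkpoint is typically queried, here is a minimal sketch following the general Hugging Face transformers pattern used by CogVLM-style chat models. The checkpoint name THUDM/cogvlm-chat-hf, the Vicuna tokenizer, and the build_conversation_input_ids helper are assumptions drawn from publicly documented usage of the model, not details given in this article, and the exact API may differ.

```python
# Sketch only: queries a CogVLM-style checkpoint with one image and one prompt.
# Assumed (not stated in the article): checkpoint id 'THUDM/cogvlm-chat-hf',
# the Vicuna tokenizer, and the build_conversation_input_ids helper that the
# model's custom (trust_remote_code) implementation is expected to expose.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("lmsys/vicuna-7b-v1.5")
model = AutoModelForCausalLM.from_pretrained(
    "THUDM/cogvlm-chat-hf",      # assumed checkpoint id, for illustration
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,      # the visual expert lives in the repo's custom code
).to("cuda").eval()

image = Image.open("example.jpg").convert("RGB")
query = "Describe this image."

# The custom code packs text tokens and image features into a single input dict.
inputs = model.build_conversation_input_ids(
    tokenizer, query=query, history=[], images=[image]
)
inputs = {
    "input_ids": inputs["input_ids"].unsqueeze(0).to("cuda"),
    "token_type_ids": inputs["token_type_ids"].unsqueeze(0).to("cuda"),
    "attention_mask": inputs["attention_mask"].unsqueeze(0).to("cuda"),
    "images": [[inputs["images"][0].to("cuda").to(torch.bfloat16)]],
}

with torch.no_grad():
    outputs = model.generate(**inputs, max_length=2048, do_sample=False)
    outputs = outputs[:, inputs["input_ids"].shape[1]:]  # keep only generated tokens
    print(tokenizer.decode(outputs[0]))
```

In this pattern the image is encoded by the visual encoder and routed through the visual expert module inside the model, so the caller only supplies a PIL image and a text prompt.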