India's AI landscape is evolving rapidly. Krutrim AI Labs, an Ola Group company, is a key player in this growth and recently unveiled Chitrarth-1, a Vision Language Model (VLM) designed for India's diverse linguistic and cultural context. Chitrarth-1 supports ten major Indian languages plus English, addressing a critical need for multilingual AI solutions. This article examines Chitrarth-1 and its implications for India's expanding AI capabilities.
What is Chitrarth-1?
Chitrarth-1 (from "Chitra", image, and "Artha", meaning) is a 7.5-billion-parameter VLM that integrates advanced language and vision processing. Built to serve India's diverse linguistic needs, it supports Hindi, Bengali, Telugu, Tamil, Marathi, Gujarati, Kannada, Malayalam, Odia, Assamese, and English. The model reflects Krutrim's stated commitment to developing AI "for our country, of our country, and for our citizens." It is trained on a rich, multilingual dataset intended to reduce bias and deliver robust performance across Indic languages and English, promoting more equitable access to AI. Research related to Chitrarth-1 has been published at leading venues, including NeurIPS and the Ninth Conference on Machine Translation.
Chitrarth-1 Architecture and Specifications
Chitrarth-1 uses the Krutrim-7B LLM as its foundation, paired with a vision encoder based on the SigLIP model (siglip-so400m-patch14-384). Key architectural components include:
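The encoder name itself carries some arithmetic worth making explicit: "patch14-384" denotes 384×384-pixel input images split into 14×14-pixel patches. The sketch below counts the resulting patch tokens per image. It assumes a standard non-overlapping ViT patch embedding and, as in LLaVA-style designs, that every patch embedding is passed on to the language model; the article does not state how many visual tokens Chitrarth-1 actually feeds to Krutrim-7B. The function names are illustrative only.

```python
# Worked arithmetic for the encoder named in the text: "siglip-so400m-patch14-384"
# means 384x384 input images cut into 14x14-pixel patches.
# ASSUMPTION: non-overlapping patch embedding (kernel = stride = patch size, no padding),
# and every patch embedding reaches the LLM, as in LLaVA-style VLMs.


def patch_grid(image_size: int, patch_size: int) -> int:
    """Patches per side for a non-overlapping patch embedding."""
    # A convolution with kernel = stride = patch_size and no padding yields
    # floor((image_size - patch_size) / patch_size) + 1 positions per side.
    return (image_size - patch_size) // patch_size + 1


def num_visual_tokens(image_size: int, patch_size: int) -> int:
    """Total patch tokens produced for one square image."""
    side = patch_grid(image_size, patch_size)
    return side * side


if __name__ == "__main__":
    side = patch_grid(384, 14)
    print(f"{side} x {side} grid -> {num_visual_tokens(384, 14)} patch tokens per image")
    # The grid covers side * 14 pixels per side; the remaining border pixels are not patched.
    print(f"covered: {side * 14} of 384 pixels per side")
```

Under these assumptions each image contributes a 27 × 27 grid, or 729 tokens, to the LLM's context, which is why image inputs are far from free in a 7B-class model's token budget.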
Training Data and Methodology
Chitrarth-1's training involved two phases using a vast, multilingual dataset:
Phase 1: Adapter Pre-training
Phase 2: Instruction Tuning
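The article names the two phases but not which parts of the model are updated in each. The sketch below encodes one common LLaVA-style recipe, which is an assumption here rather than Chitrarth-1's published setup: in adapter pre-training only the projection adapter between the vision encoder and the LLM is trained, and in instruction tuning the adapter and the LLM are both updated while the vision encoder stays frozen. The module names (`vision_encoder`, `adapter`, `llm`) are hypothetical labels.

```python
from dataclasses import dataclass

# Hypothetical module names for a LLaVA-style VLM (vision encoder -> adapter -> LLM).
# ASSUMPTION: the freezing schedule below follows a common two-stage recipe;
# the article names the phases but does not publish Chitrarth-1's actual settings.
MODULES = ("vision_encoder", "adapter", "llm")


@dataclass(frozen=True)
class Phase:
    name: str
    trainable: frozenset[str]


PHASES = (
    # Phase 1: only the adapter is trained, learning to map image features
    # into the language model's embedding space.
    Phase("adapter_pretraining", frozenset({"adapter"})),
    # Phase 2: the adapter and the LLM are updated on instruction-following data;
    # the vision encoder stays frozen.
    Phase("instruction_tuning", frozenset({"adapter", "llm"})),
)


def requires_grad_map(phase: Phase) -> dict[str, bool]:
    """For each module, whether it would receive gradient updates in this phase."""
    return {module: module in phase.trainable for module in MODULES}


def frozen_modules(phase: Phase) -> list[str]:
    """Modules whose weights stay fixed during this phase."""
    return [module for module in MODULES if module not in phase.trainable]


if __name__ == "__main__":
    for phase in PHASES:
        print(f"{phase.name}: trainable={requires_grad_map(phase)}, frozen={frozen_modules(phase)}")
```

In a PyTorch implementation, each flag would typically be applied by calling `requires_grad_(flag)` on the corresponding `torch.nn.Module`. Other recipes differ, for example by also unfreezing the vision encoder in the second phase.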
Performance and Benchmarks
Krutrim reports that Chitrarth-1 outperforms leading VLMs such as IDEFICS 2 (7B) and PALO 7B on a range of benchmarks while remaining competitive on tasks such as TextVQA and VizWiz, and that it also surpasses Llama 3.2 11B Vision Instruct on several key metrics. Krutrim also introduced BharatBench, a new evaluation suite covering ten under-resourced Indic languages across three tasks, which establishes a baseline for future research and highlights Chitrarth-1's ability to handle these languages. Sample BharatBench results for Chitrarth-1 are shown below:
| Language | POPE | LLaVA-Bench | MMVet |
|---|---|---|---|
| Telugu | 79.9 | 54.8 | 43.76 |
| Hindi | 78.68 | 51.5 | 38.85 |
| Bengali | 83.24 | 53.7 | 33.24 |
| Malayalam | 85.29 | 55.5 | 25.36 |
| Kannada | 85.52 | 58.1 | 46.19 |
| English | 87.63 | 67.9 | 30.49 |
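A quick way to read the sample table is to compare English against the Indic languages on each benchmark. The sketch below copies the six rows shown and reports, per benchmark, the top-scoring language, the mean over the five Indic languages, and English's margin over that mean. It covers only these sample rows, not the full ten-language BharatBench suite, so it is not an official aggregate; the function and key names are illustrative.

```python
# Sample BharatBench scores, copied from the table above (six rows only,
# not the full ten-language suite).
SCORES = {
    "Telugu": {"POPE": 79.9, "LLaVA-Bench": 54.8, "MMVet": 43.76},
    "Hindi": {"POPE": 78.68, "LLaVA-Bench": 51.5, "MMVet": 38.85},
    "Bengali": {"POPE": 83.24, "LLaVA-Bench": 53.7, "MMVet": 33.24},
    "Malayalam": {"POPE": 85.29, "LLaVA-Bench": 55.5, "MMVet": 25.36},
    "Kannada": {"POPE": 85.52, "LLaVA-Bench": 58.1, "MMVet": 46.19},
    "English": {"POPE": 87.63, "LLaVA-Bench": 67.9, "MMVet": 30.49},
}
BENCHMARKS = ("POPE", "LLaVA-Bench", "MMVet")


def summarize(scores: dict[str, dict[str, float]], reference: str = "English") -> dict[str, dict]:
    """Per benchmark: top-scoring language, mean over the non-reference languages,
    and the reference language's margin over that mean (negative = reference trails)."""
    others = [lang for lang in scores if lang != reference]
    summary = {}
    for bench in BENCHMARKS:
        best = max(scores, key=lambda lang: scores[lang][bench])
        others_mean = sum(scores[lang][bench] for lang in others) / len(others)
        summary[bench] = {
            "best": best,
            "indic_mean": others_mean,
            "reference_margin": scores[reference][bench] - others_mean,
        }
    return summary


if __name__ == "__main__":
    for bench, row in summarize(SCORES).items():
        print(f"{bench:12} best={row['best']:9} indic_mean={row['indic_mean']:.2f} "
              f"English margin={row['reference_margin']:+.2f}")
```

On these six rows, English leads on POPE and LLaVA-Bench, while on MMVet it trails the five-language Indic mean (30.49 versus 37.48) and Kannada scores highest.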
Accessing Chitrarth-1
Chitrarth-1 is accessible through:
Chitrarth-1 in Action
Examples of Chitrarth-1's capabilities include image analysis, image caption generation, and UI/UX screen analysis (images provided in the original article).
Conclusion
Krutrim AI Labs, part of the Ola Group, aims to build the future of AI computing. With Chitrarth-1 and other offerings such as GPU as a Service and AI Studio, it seeks to set a new standard for inclusive, culturally sensitive AI and to foster a more equitable technological landscape.