Chitrarth-1: A Multilingual VLM by Krutrim AI Labs

Joseph Gordon-Levitt
Release: 2025-03-03 18:22:13

India's AI landscape is evolving rapidly. Krutrim AI Labs, an Ola Group company, recently unveiled Chitrarth-1, a Vision Language Model (VLM) designed for India's diverse linguistic and cultural context. It supports ten major Indian languages plus English, addressing the need for multilingual AI solutions. This article looks at Chitrarth-1 and its implications for India's expanding AI capabilities.

Table of Contents

  • What is Chitrarth-1?
  • Chitrarth-1 Architecture and Specifications
  • Training Data and Methodology
    • Phase 1: Adapter Pre-training
    • Phase 2: Instruction Tuning
  • Performance and Benchmarks
  • Accessing Chitrarth-1
  • Chitrarth-1 in Action
  • Conclusion

What is Chitrarth-1?

Chitrarth-1 (combining "Chitra" – image and "Artha" – meaning) is a 7.5-billion-parameter VLM that integrates language and vision processing. Built to serve India's diverse linguistic needs, it supports Hindi, Bengali, Telugu, Tamil, Marathi, Gujarati, Kannada, Malayalam, Odia, Assamese, and English. The model reflects Krutrim's commitment to developing AI "for our country, of our country, and for our citizens." Training on a rich, multilingual dataset is intended to reduce bias and deliver robust performance across Indic languages and English, promoting equitable AI access. Research on Chitrarth-1 has been presented at academic venues, including NeurIPS and the Ninth Conference on Machine Translation.

Chitrarth-1 Architecture and Specifications

Chitrarth-1 uses the Krutrim-7B LLM as its foundation, paired with a vision encoder based on the SigLIP (siglip-so400m-patch14-384) model. Key architectural components include:

  • A pre-trained SigLIP vision encoder for image feature extraction.
  • A trainable linear mapping layer to project image features into the LLM's token space.
  • Fine-tuning with instruction-following image-text datasets for improved multimodal performance.
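
The flow described by these components can be illustrated with a small sketch. It is not Krutrim's implementation: the patch count is derived only from the encoder name (384-pixel input, 14-pixel patches, assuming non-overlapping patches with the remainder dropped), while the feature widths, the random values, and the choice to place image tokens before text tokens are assumptions made for illustration.

```python
import random

# Read off the encoder name siglip-so400m-patch14-384:
# 384-pixel input, 14-pixel patches (assumed non-overlapping).
IMAGE_SIZE = 384
PATCH_SIZE = 14
NUM_PATCHES = (IMAGE_SIZE // PATCH_SIZE) ** 2  # 27 * 27 patches

# Toy widths, chosen only to keep the example fast (not the real sizes).
VISION_DIM = 8
LLM_DIM = 16


def make_linear(in_dim, out_dim, rng):
    """Return (weight, bias) for a linear layer; weight is out_dim x in_dim."""
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def project(features, weight, bias):
    """Apply y = W x + b to every patch feature vector."""
    return [
        [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weight, bias)]
        for vec in features
    ]


rng = random.Random(0)
# Stand-in for the vision encoder's output: one feature vector per patch.
patch_features = [[rng.gauss(0.0, 1.0) for _ in range(VISION_DIM)] for _ in range(NUM_PATCHES)]
# Stand-in for the embeddings of a 5-token text prompt.
text_embeddings = [[rng.gauss(0.0, 1.0) for _ in range(LLM_DIM)] for _ in range(5)]

weight, bias = make_linear(VISION_DIM, LLM_DIM, rng)
image_tokens = project(patch_features, weight, bias)

# Assumed ordering: projected image tokens first, then the text tokens.
llm_input = image_tokens + text_embeddings
print(len(image_tokens), len(image_tokens[0]), len(llm_input))  # 729 16 734
```

The point of the linear layer is that every image patch ends up as a vector of the same width as a text token embedding, so the LLM can consume both in a single sequence.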

Training Data and Methodology

Chitrarth-1's training involved two phases using a vast, multilingual dataset:

Phase 1: Adapter Pre-training

  • Pre-trained on a diverse dataset translated into multiple Indic languages using an open-source model.
  • Maintained a balanced representation of English and Indic languages to ensure equitable performance.
  • Designed to avoid bias towards any single language, optimizing for efficiency and robustness.
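
One simple way to keep languages balanced during training is round-robin sampling, sketched below. The article does not say how Krutrim balanced its data, so this is only an illustration under stated assumptions: the balanced_batches helper, the pool sizes, and the caption strings are all hypothetical, and every pool is assumed to be non-empty.

```python
import itertools
from collections import Counter

# The eleven languages listed in the article.
LANGUAGES = ["Hindi", "Bengali", "Telugu", "Tamil", "Marathi", "Gujarati",
             "Kannada", "Malayalam", "Odia", "Assamese", "English"]


def balanced_batches(pools, batch_size):
    """Yield batches by visiting languages in round-robin order.

    pools maps a language name to a non-empty list of examples. Each pool
    is cycled, so a small pool repeats its examples instead of being
    under-represented in the batches.
    """
    cyclers = {lang: itertools.cycle(items) for lang, items in pools.items()}
    order = itertools.cycle(sorted(pools))
    while True:
        yield [(lang, next(cyclers[lang])) for lang in itertools.islice(order, batch_size)]


# Hypothetical pools: English has far more examples than any Indic language.
pools = {
    lang: [f"{lang}-caption-{i}" for i in range(50 if lang == "English" else 3)]
    for lang in LANGUAGES
}
first = next(balanced_batches(pools, batch_size=22))
counts = Counter(lang for lang, _ in first)
print(counts["English"], counts["Odia"])  # 2 2
```

With 11 languages and a batch of 22, each language contributes exactly two examples regardless of how large its pool is. The trade-off is that examples from small pools are seen more often than examples from large ones.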

Phase 2: Instruction Tuning

  • Fine-tuned on a complex instruction dataset to enhance multimodal reasoning capabilities.
  • Utilized an English-based instruction-tuning dataset and its multilingual translations.
  • Included a vision-language dataset featuring diverse Indian imagery (personalities, monuments, artwork, cuisine).
  • Incorporated high-quality proprietary English text data for balanced domain representation.

Performance and Benchmarks

Chitrarth-1 has been evaluated against leading VLMs such as IDEFICS 2 (7B) and PALO 7B. It outperforms them on various benchmarks while remaining competitive on tasks such as TextVQA and VizWiz, and it surpasses LLaMA 3.2 11B Vision Instruct on key metrics. Krutrim also introduced BharatBench, a new evaluation suite covering ten under-resourced Indic languages across three tasks, which establishes a baseline for future research and highlights Chitrarth-1's ability to handle these languages. Sample BharatBench results are shown below:

Language     POPE     LLaVA-Bench   MMVet
Telugu       79.9     54.8          43.76
Hindi        78.68    51.5          38.85
Bengali      83.24    53.7          33.24
Malayalam    85.29    55.5          25.36
Kannada      85.52    58.1          46.19
English      87.63    67.9          30.49
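
To compare the Indic-language scores with English at a glance, the sample rows above can be averaged per task. The snippet below only does arithmetic on the six rows shown in this article; it is not part of BharatBench, and the indic_means helper is a name introduced here for illustration.

```python
from statistics import mean

# Scores copied from the sample BharatBench rows in this article.
RESULTS = {
    "Telugu":    {"POPE": 79.9,  "LLaVA-Bench": 54.8, "MMVet": 43.76},
    "Hindi":     {"POPE": 78.68, "LLaVA-Bench": 51.5, "MMVet": 38.85},
    "Bengali":   {"POPE": 83.24, "LLaVA-Bench": 53.7, "MMVet": 33.24},
    "Malayalam": {"POPE": 85.29, "LLaVA-Bench": 55.5, "MMVet": 25.36},
    "Kannada":   {"POPE": 85.52, "LLaVA-Bench": 58.1, "MMVet": 46.19},
    "English":   {"POPE": 87.63, "LLaVA-Bench": 67.9, "MMVet": 30.49},
}
TASKS = ["POPE", "LLaVA-Bench", "MMVet"]


def indic_means(results):
    """Average each task over every listed language except English."""
    indic_rows = [row for lang, row in results.items() if lang != "English"]
    return {task: mean(row[task] for row in indic_rows) for task in TASKS}


means = indic_means(RESULTS)
for task in TASKS:
    gap = RESULTS["English"][task] - means[task]
    print(f"{task}: Indic mean {means[task]:.2f}, English minus Indic mean {gap:+.2f}")
```

A positive gap means English scores higher than the average of the five Indic languages shown; a negative gap means the reverse. These figures describe the sample rows only, not the full ten-language benchmark.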

Accessing Chitrarth-1

Chitrarth-1 is accessible through:

  • Hugging Face: for direct use or fine-tuning.
  • GitHub: code is provided in the original article.
  • Krutrim Cloud.

Chitrarth-1 in Action

Examples of Chitrarth-1's capabilities include image analysis, image caption generation, and UI/UX screen analysis (images provided in the original article).

Conclusion

Krutrim AI Labs, a division of the Ola Group, is committed to building the future of AI computing. With Chitrarth-1 and other offerings such as GPU as a Service and AI Studio, it aims to set a new standard for inclusive, culturally sensitive AI and to foster a more equitable technological landscape.
