Comparison of Gemini Embedding with Multilingual-e5-large & Jina AI

Word embeddings are crucial for natural language processing (NLP) tasks in Hindi, such as machine translation, question answering, and information retrieval. These embeddings capture the semantic properties of words, enabling more accurate and context-aware NLP applications. Given the large number of Hindi speakers and the growing volume of Hindi-language content, high-quality embeddings are critical to improving NLP performance in the language. Customized embeddings can address the unique linguistic characteristics and resource limitations of the Indic language family. The newly released Gemini Embedding model represents a significant advancement in multilingual text embedding, building on Google's Gemini models to achieve state-of-the-art performance in over 100 languages.

The Gemini Embedding model performs well on tasks such as classification, retrieval, and semantic search. By supporting longer inputs and higher-dimensional outputs, it provides richer text representations that suit a wide range of applications.

Learning Objectives

Learn about Gemini Embedding and how it builds on the Gemini LLM.
A hands-on tutorial for retrieving Hindi documents using Gemini Embedding.
A comparative analysis with Jina AI embeddings and Multilingual-e5-large.
Insights into multilingual text retrieval capabilities and applications.

This article was published as a part of the Data Science Blogathon.

What is Gemini Embedding?
Key Features of Gemini Embedding
Gemini Embedding model architecture
Comparison with other multilingual embedding models
Retrieval using Gemini Embedding, compared with Jina AI Embedding and Multilingual-e5-large
- Step 1. Install the necessary libraries
- Step 2. Load the data
- Step 3. Chunk the data
- Step 4. Store the data in the vector database
- Step 5. Query the database
- Step 6. Compare with Jina AI Embedding
Comparison of embedding retrieval output
- Explanation
Conclusion
Frequently Asked Questions

What is Gemini embedding?

In March 2025, Google released a new experimental Gemini Embedding text model (gemini-embedding-exp-03-07) that can be used in the Gemini API.

This embedding model is derived from the Gemini model and is said to inherit Gemini's understanding of linguistic nuance and context, which makes it applicable across a wide range of tasks. It ranks first on the MTEB Multilingual leaderboard.

Gemini Embedding represents text as dense vectors, mapping semantically similar inputs to vectors that lie close to each other in vector space. It currently supports over 100 languages, and its embeddings can be used for a variety of tasks such as retrieval and classification.
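To make "close to each other in vector space" concrete, here is a minimal sketch that measures closeness with cosine similarity. The three-dimensional vectors are invented purely for illustration (real Gemini embeddings have up to 3072 dimensions), and cosine similarity is one common choice of measure rather than something the source prescribes.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional embeddings, made up for illustration only
pregnancy_test = [0.9, 0.1, 0.2]
pregnancy_symptoms = [0.8, 0.3, 0.1]
cricket_score = [0.1, 0.9, 0.7]

sim_related = cosine_similarity(pregnancy_test, pregnancy_symptoms)
sim_unrelated = cosine_similarity(pregnancy_test, cricket_score)

# Related texts should score higher than unrelated ones
print(sim_related > sim_unrelated)
```

A retrieval system built on embeddings ranks stored passages by a score of this kind against the query vector.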

Key Features of Gemini Embedding

Strong multilingual capabilities: The model demonstrates strong performance in over 100 languages, not only in high-resource languages such as English but also in low-resource languages such as Assamese and Macedonian.
Processing up to 8,000 input tokens: This capacity lets the model handle lengthy documents or complex queries without truncation, preserving context and meaning beyond what many existing embedding models allow.
Output of up to 3K dimensions: The model generates embeddings of up to 3072 dimensions and supports smaller sizes such as 768 and 1536 for task-specific optimization.
Impressive performance: Gemini Embedding ranked first on the Massive Text Embedding Benchmark (MTEB), with a mean task score of 68.32, significantly ahead of its closest competitor.
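The smaller output sizes mentioned above can be pictured as keeping only the leading components of the full vector. The sketch below assumes that truncation is followed by re-normalization to unit length, which is a common practice with truncatable embeddings. The source does not say how the API implements this, and `truncate_embedding` is a hypothetical helper, not part of any SDK.

```python
import math

def truncate_embedding(vector, dim):
    """Keep the first `dim` components and rescale to unit length."""
    if dim > len(vector):
        raise ValueError("dim exceeds the embedding size")
    head = vector[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head

# Stand-in for a full-size embedding (8 values instead of 3072)
full = [0.5, -0.5, 0.5, -0.5, 0.1, 0.2, -0.1, 0.3]
small = truncate_embedding(full, 4)

print(len(small))
```

Smaller vectors reduce storage and speed up similarity search, usually at some cost in retrieval quality.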

Gemini Embedding model architecture

The core of Gemini Embedding is based on the Transformer architecture and initialized from Gemini LLM. This basis provides a deep understanding of language structure and semantics for the model. The model uses a bidirectional attention mechanism to process input sequences so that it can take into account the full context of a word or phrase when generating an embedding.

The input sequence T is processed by M (a Transformer with bidirectional attention, initialized from Gemini), producing a sequence of token embeddings.
To produce a single embedding representing all the information in the input, a pooling function is applied.
Finally, a linear projection scales the embedding to the target dimension, yielding the final output embedding.
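The pooling and projection steps can be sketched in a few lines. This is an illustration only: it assumes mean pooling (the text above says only "a pooling function"), and the token embeddings and projection weights are made-up toy values standing in for the output of M and a learned matrix.

```python
def mean_pool(token_embeddings):
    """Average a list of token vectors into one vector."""
    n = len(token_embeddings)
    width = len(token_embeddings[0])
    return [sum(tok[j] for tok in token_embeddings) / n for j in range(width)]

def linear_projection(vector, weights):
    """Multiply a vector by a (target_dim x width) weight matrix."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]

# Hypothetical output of M for a 3-token input, with hidden width 4
token_embeddings = [
    [1.0, 0.0, 2.0, 0.0],
    [3.0, 2.0, 0.0, 0.0],
    [2.0, 4.0, 1.0, 3.0],
]
# Hypothetical projection weights mapping width 4 to target dimension 2
weights = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.5],
]

pooled = mean_pool(token_embeddings)            # one vector for the whole input
embedding = linear_projection(pooled, weights)  # scaled to the target dimension
print(pooled, embedding)
```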

Loss function: The Gemini Embedding model is trained with a noise-contrastive estimation (NCE) loss with in-batch negatives. The exact loss varies slightly depending on the training stage. In general, a training example includes a query, a positive target, and (optionally) a hard negative target.
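As a rough illustration of in-batch negatives, the sketch below computes a generic softmax contrastive loss in which each query's own target is the positive and the other targets in the batch serve as negatives. It is not the exact Gemini Embedding loss: the dot-product similarity, the `temperature` parameter, and the omission of hard negatives are all simplifying assumptions.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def in_batch_contrastive_loss(queries, targets, temperature=1.0):
    """Mean softmax cross-entropy where targets[i] is the positive for
    queries[i] and every other target in the batch acts as a negative."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, t) / temperature for t in targets]
        m = max(logits)  # subtract the max for numerical stability
        log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_norm - logits[i]
    return total / len(queries)

queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]   # each query matches its own target
swapped = [[0.0, 1.0], [1.0, 0.0]]   # each query matches the other target

loss_aligned = in_batch_contrastive_loss(queries, aligned)
loss_swapped = in_batch_contrastive_loss(queries, swapped)
print(loss_aligned < loss_swapped)
```

The loss is lower when each query is most similar to its own target, which is the behavior training pushes the model toward.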

Training strategies

Pre-fine-tuning: In this stage, the model is trained on a large, diverse dataset of query-target pairs. This exposure adapts the parameters of the large language model to encoding tasks, laying the foundation for its adaptability.
Fine-tuning: In the second stage, the model is trained on task-specific datasets containing (query, positive, hard negative) triples. This stage uses smaller batch sizes and carefully curated datasets to improve performance on the target tasks.

Comparison with other multilingual embedding models

We compare retrieval of Hindi documents using the newly released Gemini Embedding model against Jina AI embeddings and Multilingual-e5-large embeddings. As the following table shows, Gemini Embedding and Jina AI embeddings both support a high maximum number of tokens, which lets them handle long documents or complex queries. Gemini Embedding also has a higher embedding dimension, which can capture more detailed semantic relationships between words and represent subtle differences in complex language patterns and meanings.

Model	Number of parameters	Embedding dimensions	Maximum tokens	Number of languages	Matryoshka embedding
gemini-embedding-exp-03-07	Unknown	3072	8192	100	Supports truncating embeddings to smaller sizes, such as 2048, 1024, 512, 256, and 128 dimensions
jinaai/jina-embeddings-v3	572 million	1024	8194	100	Supports flexible embedding sizes (32, 64, 128, 256, 512, 768, 1024), so embeddings can be truncated to fit the application
multilingual-e5-large-instruct	560 million	1024	514	94	NA

Retrieval using Gemini Embedding, compared with Jina AI Embedding and Multilingual-e5-large

In the following hands-on tutorial, we retrieve passages from a Hindi document using Gemini Embedding and then repeat the retrieval with Jina AI embeddings and Multilingual-e5-large embeddings.

Step 1. Install the necessary libraries

 <code>!pip install langchain-community
!pip install chromadb</code>

Step 2. Load the data

We use a Hindi-language web page, a hospital blog post on early pregnancy symptoms, to evaluate the performance of Gemini Embedding in Hindi retrieval.

 <code>from langchain_community.document_loaders import WebBaseLoader

# Load the Hindi blog post as a list of LangChain documents
loader = WebBaseLoader("https://ckbirlahospitals.com/rbh/blog/pregnancy-early-symptoms-in-hindi")
data = loader.load()</code>

Step 3. Chunk the data

The following code uses RecursiveCharacterTextSplitter to split the document into chunks of up to 500 characters with no overlap. It applies the split to the data variable and stores the result in all_splits. Because of the rate limits of the Gemini Embedding API, we use only the first 10 chunks.

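 <code>from langchain_text_splitters import RecursiveCharacterTextSplitter

# Chunks of up to 500 characters, with no overlap between neighbours
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(data)

# Keep only the first 10 chunks to stay within the API rate limits
all_splits = all_splits[:10]</code>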
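To show what chunk_size and chunk_overlap control, here is a simplified stand-in written in plain Python. It cuts the text at fixed character offsets, whereas RecursiveCharacterTextSplitter prefers to break at separators such as paragraph breaks, newlines, and spaces, so its real chunks are usually shorter than the limit. `split_fixed` is a hypothetical helper for illustration, not a LangChain function.

```python
def split_fixed(text, chunk_size=500, chunk_overlap=0):
    """Cut text into windows of at most chunk_size characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

sample = "क" * 1200  # stand-in for the page text (1200 characters)
chunks = split_fixed(sample)

print([len(c) for c in chunks])
```

With no overlap, joining the chunks reproduces the original text. A non-zero overlap repeats the tail of each chunk at the start of the next, which can help when an answer straddles a chunk boundary.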

Step 4. Store the data in the vector database

We first create a class called "GeminiEmbeddingFunction", which calls the Gemini Embedding API and returns the embeddings for the input texts. We then create a function called "create_chroma_db" that creates a ChromaDB collection to store the documents along with their embeddings.

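 <code>import chromadb
from chromadb import Documents, EmbeddingFunction, Embeddings
from google import genai

# Assumption: the original omits client setup; this uses the google-genai SDK
# (pip install google-genai) with a placeholder API key
client = genai.Client(api_key="YOUR_API_KEY")

class GeminiEmbeddingFunction(EmbeddingFunction):
    def __call__(self, input: Documents) -> Embeddings:
        response = client.models.embed_content(
            model="gemini-embedding-exp-03-07", contents=input)
        # Return one vector per input text, not only the first one
        return [e.values for e in response.embeddings]

def create_chroma_db(documents, name):
    chroma_client = chromadb.Client()
    db = chroma_client.create_collection(
        name=name, embedding_function=GeminiEmbeddingFunction())
    # Add each chunk with its position as the id
    for i, d in enumerate(documents):
        db.add(documents=[d.page_content], ids=[str(i)])
    return db

db = create_chroma_db(all_splits, "datab")</code>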

Step 5. Query the database

 <code>def get_relevant_passage(query, db):
    # Return the text of the single closest chunk for the query
    passage = db.query(query_texts=[query], n_results=1)['documents'][0][0]
    return passage

passage = get_relevant_passage("आपको प्रेगनेंसी टेस्ट कब करवाना चाहिए?", db)
print(passage)</code>
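Conceptually, a query like the one above embeds the question and returns the stored chunks whose vectors are nearest to it. The sketch below shows that idea with a brute-force search in plain Python. The passages, the 2-dimensional vectors, and the use of squared Euclidean distance are assumptions for illustration; the metric Chroma actually applies depends on how the collection is configured, and a real vector database uses an approximate index rather than a full scan.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def query(query_vec, stored, n_results=1):
    """Return the texts of the n_results stored items nearest to query_vec."""
    ranked = sorted(stored, key=lambda item: squared_l2(query_vec, item[1]))
    return [text for text, _ in ranked[:n_results]]

# Hypothetical passages with made-up 2-dimensional embeddings
stored = [
    ("passage about test timing", [0.9, 0.1]),
    ("passage about symptoms", [0.2, 0.8]),
]

best = query([1.0, 0.0], stored)[0]
print(best)
```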

Step 6. Compare with Jina AI Embedding

The following code uses a Hugging Face transformers model to define a custom embedding function and the pooling step that turns its outputs into embeddings.

AutoTokenizer and AutoModel from transformers load the pretrained model (jinaai/jina-embeddings-v3), and EmbeddingFunction is imported from chromadb to create a custom embedding function.
average_pool function: Aggregates the model's hidden states by averaging over the sequence length while applying the attention mask, so that padding tokens are ignored.
CustomHuggingFace class: Tokenizes the text, feeds it into the model, and computes the embeddings with the average_pool function. The result is returned as a list of embeddings.

 <code>import torch
from transformers import AutoTokenizer, AutoModel
from chromadb import EmbeddingFunction

# jina-embeddings-v3 ships custom model code, so trust_remote_code=True is needed
tokenizer = AutoTokenizer.from_pretrained('jinaai/jina-embeddings-v3')
model = AutoModel.from_pretrained('jinaai/jina-embeddings-v3', trust_remote_code=True)

# The model returns one hidden state per token, so we aggregate them per document
def average_pool(last_hidden_states, attention_mask):
    # Zero out padding positions, then divide by the number of real tokens
    last_hidden = last_hidden_states.masked_fill(~attention_mask[..., None].bool(), 0.0)
    return last_hidden.sum(dim=1) / attention_mask.sum(dim=1)[..., None]

class CustomHuggingFace(EmbeddingFunction):
    def __call__(self, input):
        # Texts are tokenized as-is, without a task prefix
        batch_dict = tokenizer(input, max_length=512, padding=True,
                               truncation=True, return_tensors='pt')
        with torch.no_grad():  # inference only, no gradients needed
            outputs = model(**batch_dict)
        embeddings = average_pool(outputs.last_hidden_state, batch_dict['attention_mask'])
        return embeddings.tolist()</code>

Query

 <code># Assumes db has been rebuilt as a Chroma collection that uses
# CustomHuggingFace() as its embedding function
def get_relevant_passage(query, db):
    passage = db.query(query_texts=[query], n_results=1)['documents'][0][0]
    return passage

passage = get_relevant_passage("आपको प्रेगनेंसी टेस्ट कब करवाना चाहिए?", db)
print(passage)</code>

To use Multilingual-e5-large embeddings instead, we simply replace the tokenizer and model with "intfloat/multilingual-e5-large-instruct".

Comparison of embedding retrieval output

Question number	Query	Gemini Embedding	jinaai/jina-embeddings-v3	intfloat/multilingual-e5-large-instruct
1	आपको प्रेगनेंसी टेस्ट कब करवाना चाहिए?	If you want to learn more about the early symptoms of pregnancy, this blog post is perfect for you. When should you have a pregnancy test? (Incorrect)	If you want to learn more about the early symptoms of pregnancy, this blog post is perfect for you. When should you have a pregnancy test? (Incorrect)	If you want to learn more about the early symptoms of pregnancy, this blog post is perfect for you. When should you have a pregnancy test? (Incorrect)
2	Pregnancy के kuch symbols क्या होते हैं?	What are the early symptoms of pregnancy? During pregnancy, many hormonal changes occur in women. Early symptoms of pregnancy include nausea, vomiting, frequent urination and fatigue, which we will discuss in this blog post. (Correct)	Signs of pregnancy: Complete information on early symptoms! Home Quick Consultation Patient Login Contact Us: 08062136530 Emergency Phone: 07340054470 Open the main menu to serve patients and visitors International Patients About Us Make an appointment to call back WhatsApp to learn about the early symptoms of pregnancy. Obstetrics and Gynecology | Author: Dr. CP Dadhich | Release Date: February 6, 2025 Contents When should you have a pregnancy test? What are the early symptoms of pregnancy? Early symptoms of pregnancy Pregnancy (Incorrect)	What are the early symptoms of pregnancy? During pregnancy, many hormonal changes occur in women. Early symptoms of pregnancy include nausea, vomiting, frequent urination and fatigue, which we will discuss in this blog post. (Correct)
3	गर्भावस्था के दौरान एंटीबायोटिक दवा लेने से कब बचा हिए?	During the first few days of pregnancy, eggs and sperm are fertilized, causing symptoms such as bleeding and abdominal pain. During this period, for a healthy pregnancy, women are advised to avoid taking antibiotics, as this can be dangerous to mothers and babies. Early symptoms of pregnancy are not always delayed menstruation or vomiting. In addition, other symptoms may occur and require special attention, such as (Correct)	During the first few days of pregnancy, eggs and sperm are fertilized, causing symptoms such as bleeding and abdominal pain. During this period, for a healthy pregnancy, women are advised to avoid taking antibiotics, as this can be dangerous to mothers and babies. Early symptoms of pregnancy are not always delayed menstruation or vomiting. In addition, other symptoms may occur and require special attention, such as (Correct)	What every woman should know. For any pregnancy-related questions, we recommend that you contact our gynecologist to eliminate all complications. (Incorrect)
4	कब गर्भावस्था में एंटीबायोटिक दवा लेने से बचाया जाए?	During the first few days of pregnancy, eggs and sperm are fertilized, causing symptoms such as bleeding and abdominal pain. During this period, for a healthy pregnancy, women are advised to avoid taking antibiotics, as this can be dangerous to mothers and babies. Early symptoms of pregnancy are not always delayed menstruation or vomiting. In addition, other symptoms may occur and require special attention, such as (Correct)	During the first few days of pregnancy, eggs and sperm are fertilized, causing symptoms such as bleeding and abdominal pain. During this period, for a healthy pregnancy, women are advised to avoid taking antibiotics, as this can be dangerous to mothers and babies. Early symptoms of pregnancy are not always delayed menstruation or vomiting. In addition, other symptoms may occur and require special attention, such as (Correct)	What every woman should know. For any pregnancy-related questions, we recommend that you contact our gynecologist to eliminate all complications. (Incorrect)
5	गर्भधारण का सबसे पहला सामान्य लक्षण क्या है?	Delayed menstruation: This is the earliest and most common symptom of pregnancy. Confirmation of pregnancy based solely on this symptom is not entirely correct. However, if menstruation is delayed for one week or more, pregnancy tests are recommended. Breast changes: During pregnancy, the breasts will swell, become tender or change in color. It mainly changes in the size and color of the nipple (areola). (Correct)	With this in mind, how to confirm pregnancy? How to take care of the first month of pregnancy? How to do pregnancy checkups? How should I sit during pregnancy? Should sex occur during pregnancy? What fruits should you eat during pregnancy? How much water should you drink during pregnancy? The joy of becoming a mother is the greatest happiness in the world. During pregnancy, there are many changes in women's physical and psychological changes. You call these changes early symptoms of pregnancy (Incorrect)	What are the early symptoms of pregnancy? During pregnancy, many hormonal changes occur in women. Early symptoms of pregnancy include nausea, vomiting, frequent urination and fatigue, which we will discuss in this blog post. (Correct)
6	गर्भधारण के पहले संकेत क्या होते हैं?	Signs of pregnancy: Complete information on early symptoms! Home Quick Consultation Patient Login Contact Us: 08062136530 Emergency Phone: 07340054470 Open the main menu to serve patients and visitors International Patients About Us Make an appointment to call back WhatsApp to learn about the early symptoms of pregnancy. Obstetrics and Gynecology | Author: Dr. CP Dadhich | Release Date: February 6, 2025 Contents When should you have a pregnancy test? What are the early symptoms of pregnancy? Early symptoms of pregnancy Pregnancy (Incorrect)	With this in mind, how to confirm pregnancy? How to take care of the first month of pregnancy? How to do pregnancy checkups? How should I sit during pregnancy? Should sex occur during pregnancy? What fruits should you eat during pregnancy? How much water should you drink during pregnancy? The joy of becoming a mother is the greatest happiness in the world. During pregnancy, there are many changes in women's physical and psychological changes. You call these changes early symptoms of pregnancy (Incorrect)	What are the early symptoms of pregnancy? During pregnancy, many hormonal changes occur in women. Early symptoms of pregnancy include nausea, vomiting, frequent urination and fatigue, which we will discuss in this blog post. (Correct)
7	गर्भावस्था की पुष्टि के लिए कौन से हार्मोन का पता लगाना होता है?	The best time to have a pregnancy test is after menstruation is delayed by at least 7 days. You can use the home pregnancy testing tool to detect hCG levels at home. During pregnancy, the levels of this hormone will increase significantly. One thing you need to note is that premature testing can also lead to wrong results, so if your period is delayed and the test is negative, it is recommended that you wait at least 3 more days before you test again. (Correct)	There is also a correct way to do this, which you can also see on the test tool manual. To get accurate results, you should use the first urine in the morning, as the correct level of hCG hormone can be measured. Also, if you experience early symptoms of pregnancy and the test results are negative, see your doctor for a blood test immediately. In any case, you must consult a doctor if you have any questions. (Correct)	What are the early symptoms of pregnancy? During pregnancy, many hormonal changes occur in women. Early symptoms of pregnancy include nausea, vomiting, frequent urination and fatigue, which we will discuss in this blog post. (Incorrect)

Explanation

As the Hindi outputs above show, Gemini Embedding returns the correct passage for 5 of the 7 queries, while Jina AI embeddings and Multilingual-e5-large each return only 3 correct passages.

In this small test, consistent with the MTEB benchmark results, Gemini Embedding handles Hindi retrieval better than the other two embedding models.
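The counts can be reproduced by tallying the correct/incorrect labels from the comparison table. The snippet below transcribes those labels by hand for queries 1 to 7. The `summarize` helper is only an illustration, and with seven queries over ten chunks the resulting percentages are indicative rather than statistically robust.

```python
# Labels transcribed from the comparison table (True = correct), queries 1 to 7
results = {
    "Gemini Embedding": [False, True, True, True, True, False, True],
    "jinaai/jina-embeddings-v3": [False, False, True, True, False, False, True],
    "intfloat/multilingual-e5-large-instruct": [False, True, False, False, True, True, False],
}

def summarize(labels):
    """Return (number correct, fraction correct)."""
    correct = sum(labels)
    return correct, correct / len(labels)

scores = {name: summarize(labels) for name, labels in results.items()}
for name, (correct, accuracy) in scores.items():
    print(f"{name}: {correct}/{len(results[name])} correct ({accuracy:.0%})")
```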

Conclusion

In short, Gemini Embedding represents a significant advancement in multilingual NLP, especially for Indic languages such as Hindi. With strong multilingual capabilities, support for long inputs, and top results on benchmarks such as MTEB, Gemini Embedding performs well on retrieval, classification, and semantic search. In our practical comparison it outperformed the other models, retrieving the correct passage more often, which makes it a valuable tool for multilingual NLP.

Key Takeaways

Importance of Hindi word embeddings: High-quality embeddings improve NLP tasks such as translation, question answering, and retrieval, addressing language-specific challenges and resource gaps.
Gemini Embedding model: Google's Gemini Embedding builds on the Gemini models for multilingual text processing, covering more than 100 languages, including low-resource languages.
Key features: Supports 8,000 input tokens and 3072-dimensional embeddings, enabling it to process long documents and complex queries.
Impressive performance: Ranked No. 1 on the MTEB Multilingual leaderboard with a mean task score of 68.32.

The media shown in this article are not owned by Analytics Vidhya and can be used at the discretion of the author.

Frequently Asked Questions

Q1. What is the Gemini Embedding model? A: The Gemini Embedding model is built on Google's Gemini models and provides multilingual text embeddings for more than 100 languages, including Hindi.

Q2. What is unique about Gemini Embedding compared to other models? A: Gemini Embedding offers broad multilingual support, accepts up to 8,000 input tokens, and outputs embeddings of up to 3072 dimensions, which makes it effective for classification, retrieval, and semantic search.

Q3. How does Gemini Embedding perform in multilingual tasks? A: Gemini Embedding performs well in high-resource languages such as English and in low-resource languages such as Assamese and Macedonian. It ranks first on the MTEB Multilingual leaderboard.

Q4. What is the architecture of the Gemini Embedding model? A: The model is initialized from the Gemini LLM and uses a Transformer architecture with bidirectional attention to generate text embeddings that capture context and meaning.

Q5. How is the Gemini Embedding model trained? A: Gemini Embedding is trained with a noise-contrastive estimation (NCE) loss with in-batch negatives. Training has two stages: pre-fine-tuning on a large, diverse dataset, followed by fine-tuning on task-specific datasets.
