How can I achieve efficient fuzzy matching for email addresses and phone numbers within Elasticsearch?

Susan Sarandon
Release: 2024-10-31 09:19:01

Elasticsearch Fuzzy Email or Telephone Matching

Question:

How can fuzzy matching be implemented for email addresses or telephone numbers using Elasticsearch? Specifically, how can one match all emails ending with "@gmail.com" or all telephone numbers starting with "136"?

Answer:

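Custom analyzers at index time and search time make this kind of partial matching possible. The n-grams of each value are indexed, so a prefix or fragment entered at search time can be matched as a single whole token.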

Email Fuzzy Matching:

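Configure a pair of analyzers for the email field:

  • Index analyzer: index_email_analyzer

    • Standard tokenizer (it splits the address at "@", so the "@" itself is not indexed)
    • Lowercase filter, then the name_ngram_filter n-gram filter, then a trim filter
    • Min gram: 1
    • Max gram: 20
  • Search analyzer: search_email_analyzer

    • Standard tokenizer
    • Lowercase and trim filters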
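
To make the effect of these two analyzers concrete, here is a minimal pure-Python sketch of the tokens they would produce. It is an approximation, not Elasticsearch's implementation: `standard_tokens` is a rough regular-expression stand-in for the standard tokenizer, and the function names and the sample address are hypothetical.

```python
import re

def standard_tokens(text):
    """Rough stand-in for the standard tokenizer: runs of letters, digits and
    underscores, optionally joined by single dots. '@' and '-' split tokens."""
    return re.findall(r"[A-Za-z0-9_]+(?:\.[A-Za-z0-9_]+)*", text)

def ngrams(token, min_gram=1, max_gram=20):
    """Every substring of token whose length lies in [min_gram, max_gram]."""
    return {
        token[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(token) - n + 1)
    }

def index_email_tokens(email):
    """Approximates index_email_analyzer: tokenize, lowercase, n-gram."""
    tokens = set()
    for tok in standard_tokens(email):
        tokens |= ngrams(tok.lower().strip())
    return tokens

def search_email_tokens(text):
    """Approximates search_email_analyzer: tokenize and lowercase only."""
    return [tok.lower().strip() for tok in standard_tokens(text)]

indexed = index_email_tokens("John.Doe@Gmail.com")
print(search_email_tokens("@gmail.com"))  # ['gmail.com']
print("gmail.com" in indexed)             # True
print("@gmail.com" in indexed)            # False
```

The last line is the important one: because the "@" never reaches the index, a query only matches if it is analyzed the same way before being compared with the indexed n-grams.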

Telephone Number Fuzzy Matching:

Configure a pair of analyzers for the phone field:

  • Index analyzer: index_phone_analyzer

    • digit_only character filter (strips every non-digit character)
    • Edge n-gram tokenizer restricted to digits
    • Min gram: 1
    • Max gram: 15
  • Search analyzer: search_phone_analyzer

    • digit_only character filter
    • Keyword tokenizer
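
The phone analyzers can be sketched the same way. The snippet below is a pure-Python approximation under the settings listed above (digits only, edge n-grams of length 1 to 15); the function names and the sample number are hypothetical.

```python
import re

def digit_only(text):
    """Mimics the digit_only character filter: drop every non-digit."""
    return re.sub(r"\D+", "", text)

def index_phone_tokens(phone, min_gram=1, max_gram=15):
    """Approximates index_phone_analyzer: prefixes of the digit string."""
    digits = digit_only(phone)
    longest = min(max_gram, len(digits))
    return [digits[:n] for n in range(min_gram, longest + 1)]

def search_phone_token(text):
    """Approximates search_phone_analyzer: the digits as one single token."""
    return digit_only(text).strip()

tokens = index_phone_tokens("136-1234-5678")
print(tokens[:4])                          # ['1', '13', '136', '1361']
print(search_phone_token("136") in tokens) # True
print(search_phone_token("234") in tokens) # False
```

Only prefixes are indexed, which is why "136" matches while "234", which occurs in the middle of the number, does not.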

Index Example:

The request below uses Elasticsearch 7.x or later syntax: typeless mappings, the text field type, the edge_ngram tokenizer name, and index.max_ngram_diff raised so that the n-gram filter may span 1 to 20 characters. The regular expression is written as "\\D+" because a backslash must be escaped inside a JSON string.

PUT myindex
{
  "settings": {
    "index": {
      "max_ngram_diff": 19
    },
    "analysis": {
      "analyzer": {
        "email_url_analyzer": {
          "type": "custom",
          "tokenizer": "uax_url_email",
          "filter": [ "trim" ]
        },
        "index_phone_analyzer": {
          "type": "custom",
          "char_filter": [ "digit_only" ],
          "tokenizer": "digit_edge_ngram_tokenizer",
          "filter": [ "trim" ]
        },
        "search_phone_analyzer": {
          "type": "custom",
          "char_filter": [ "digit_only" ],
          "tokenizer": "keyword",
          "filter": [ "trim" ]
        },
        "index_email_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "lowercase", "name_ngram_filter", "trim" ]
        },
        "search_email_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "lowercase", "trim" ]
        }
      },
      "char_filter": {
        "digit_only": {
          "type": "pattern_replace",
          "pattern": "\\D+",
          "replacement": ""
        }
      },
      "tokenizer": {
        "digit_edge_ngram_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 15,
          "token_chars": [ "digit" ]
        }
      },
      "filter": {
        "name_ngram_filter": {
          "type": "ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "email": {
        "type": "text",
        "analyzer": "index_email_analyzer",
        "search_analyzer": "search_email_analyzer"
      },
      "phone": {
        "type": "text",
        "analyzer": "index_phone_analyzer",
        "search_analyzer": "search_phone_analyzer"
      }
    }
  }
}
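
One detail in the digit_only character filter is easy to get wrong: inside a JSON string the regular expression \D+ has to be written with a doubled backslash, because a lone backslash followed by D is not a valid JSON escape. The small check below uses Python's json module purely as a convenient strict JSON parser; Elasticsearch's own parser may word the error differently, and `is_valid_json` is a hypothetical helper.

```python
import json

def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

# Raw strings keep the backslashes exactly as they would appear in the request body.
escaped = r'{"pattern": "\\D+"}'
unescaped = r'{"pattern": "\D+"}'

print(is_valid_json(escaped))          # True
print(is_valid_json(unescaped))        # False
print(json.loads(escaped)["pattern"])  # \D+
```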

Search Queries:

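  • Match all emails ending with "@gmail.com". A match query is used so that search_email_analyzer is applied to the input; a term query would look for the literal token "@gmail.com", which is never indexed because the standard tokenizer drops the "@". The query therefore finds every email with a token containing "gmail.com":
POST myindex/_search
{
    "query": {
        "match":
            { "email": "@gmail.com" }
    }
}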
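  • Match all telephone numbers starting with "136". The match query runs the input through search_phone_analyzer, so "136" and "1-36" both become the single token "136", which is compared with the indexed prefixes:
POST myindex/_search
{
    "query": {
        "match":
            { "phone": "136" }
    }
}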
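
As a rough illustration of what the email query returns, the sketch below simulates matching in pure Python over three made-up addresses. It assumes the tokenization described earlier (a regular-expression stand-in for the standard tokenizer plus 1-20 character n-grams) and is not Elasticsearch's implementation; all names are hypothetical.

```python
import re

def standard_tokens(text):
    """Rough stand-in for the standard tokenizer; '@' and '-' split tokens."""
    return re.findall(r"[A-Za-z0-9_]+(?:\.[A-Za-z0-9_]+)*", text)

def index_email_tokens(email, min_gram=1, max_gram=20):
    """Lowercased n-grams of every token, as the index analyzer would store."""
    grams = set()
    for tok in standard_tokens(email.lower()):
        for n in range(min_gram, max_gram + 1):
            for i in range(len(tok) - n + 1):
                grams.add(tok[i:i + n])
    return grams

def match_email(query, emails):
    """Return the emails sharing at least one token with the analyzed query."""
    query_tokens = standard_tokens(query.lower())
    return [
        e for e in emails
        if any(t in index_email_tokens(e) for t in query_tokens)
    ]

emails = ["alice@gmail.com", "bob@notgmail.com", "carol@example.org"]
hits = match_email("@gmail.com", emails)
print(hits)    # ['alice@gmail.com', 'bob@notgmail.com']

# Optional client-side post-filter when a strict suffix match is required.
strict = [e for e in hits if e.lower().endswith("@gmail.com")]
print(strict)  # ['alice@gmail.com']
```

The n-gram approach matches any token that contains the fragment, so a domain such as notgmail.com is also returned. If exact suffix semantics matter, the results need an extra check like the post-filter shown here.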

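With these analyzers the n-gram work is done once at index time, so prefix and fragment lookups on email addresses and telephone numbers run as ordinary token matches. The trade-off is a larger index, since every value is stored as many n-grams.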
