How Can I Achieve Accent-Insensitive Searches in PostgreSQL?

Linda Hamilton
Release: 2025-01-20 12:21:17

PostgreSQL's Approach to Accent-Insensitive Searching

Unlike some databases (like Microsoft SQL Server), PostgreSQL did not offer accent-insensitive collations before version 12. PostgreSQL 12 introduced non-deterministic ICU collations, which can ignore case, accents, or both. They come with trade-offs: comparisons are slower than with deterministic collations, and some operations, notably pattern matching with LIKE, were not supported on them when the feature was introduced.
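
As a rough sketch of the ICU route, the statements below define a collation that ignores accents while keeping case significant, then apply it to a single comparison. The collation name ignore_accents is only illustrative, and the example assumes PostgreSQL 12 or later built with ICU support and a UTF8 database.

```sql
-- Hypothetical collation name; requires PostgreSQL 12+ with ICU support.
-- ks-level1 compares base letters only; kc-true keeps case significant.
CREATE COLLATION ignore_accents (
  provider = icu,
  locale = 'und-u-ks-level1-kc-true',
  deterministic = false
);

-- Apply the collation to one equality comparison.
SELECT 'João' = 'Joao' COLLATE ignore_accents AS same_name;
```

Using the locale 'und-u-ks-level1' instead should make the comparison ignore case as well as accents.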

Strategies for Accent-Insensitive Queries in PostgreSQL

Several methods exist to achieve accent-insensitive searching in PostgreSQL:

1. The unaccent Module:

This module provides the unaccent() function, removing accents from strings. This allows queries like:

SELECT * FROM users WHERE unaccent(name) = unaccent('João');

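However, unaccent() is declared STABLE rather than IMMUTABLE, because its result depends on the current search_path and on the dictionary it resolves to. It therefore cannot be used directly in an expression index. Note also that releases before PostgreSQL 9.6 did not expand ligatures (e.g., 'Œ') with the default rules.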
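
A minimal sketch of getting started with the module, assuming you have the privileges to install extensions in the current database. The second query reads the volatility label from the system catalog, which is one way to confirm why the function is rejected in index expressions.

```sql
-- Install the contrib module once per database.
CREATE EXTENSION IF NOT EXISTS unaccent;

-- Strip diacritics from a literal; the module's documentation shows
-- this call returning 'Hotel'.
SELECT unaccent('Hôtel');

-- provolatile is 'i' (immutable), 's' (stable) or 'v' (volatile).
SELECT proname, pg_get_function_arguments(oid) AS args, provolatile
FROM pg_proc
WHERE proname = 'unaccent';
```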

2. Optimized C Function Wrapper:

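To work around this limitation, a common pattern declares the module's C function a second time under a new name marked IMMUTABLE, then wraps it in a SQL function that hard-codes the schema-qualified dictionary. The wrapper below assumes that public.immutable_unaccent(regdictionary, text) has already been created that way and that the unaccent extension is installed in schema public. The RETURN form of the function body requires PostgreSQL 14 or later:

CREATE OR REPLACE FUNCTION public.f_unaccent(text)
  RETURNS text
  LANGUAGE sql IMMUTABLE PARALLEL SAFE STRICT
RETURN public.immutable_unaccent(regdictionary 'public.unaccent', $1);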
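
The wrapper above calls public.immutable_unaccent, which the snippet itself does not define. One way to define it, sketched here under the assumption that the unaccent extension is already installed and that you are connected as a superuser (C-language functions need that), is to bind the module's unaccent_dict C function under the new name:

```sql
-- Run this before creating f_unaccent.
-- Binds the unaccent module's C function unaccent_dict under a new name
-- and labels it IMMUTABLE so it can be used in index expressions.
CREATE OR REPLACE FUNCTION public.immutable_unaccent(regdictionary, text)
  RETURNS text
  LANGUAGE c IMMUTABLE PARALLEL SAFE STRICT AS
'$libdir/unaccent', 'unaccent_dict';
```

The IMMUTABLE label is a promise made to the planner, not something PostgreSQL verifies. If the dictionary rules are ever changed, indexes built on this function should be rebuilt with REINDEX.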

This allows for the creation of expression indexes:

CREATE INDEX users_unaccent_name_idx ON users(public.f_unaccent(name));

Queries then use the wrapped function:

SELECT * FROM users WHERE f_unaccent(name) = f_unaccent('João');
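
To check whether the planner actually picks the expression index for this query, the plan can be inspected with EXPLAIN. This is only a sketch against the users table from the example; on a very small table the planner may reasonably prefer a sequential scan.

```sql
-- Refresh planner statistics so the expression index is costed properly.
ANALYZE users;

-- Show the chosen plan without running the query.
EXPLAIN
SELECT * FROM users
WHERE public.f_unaccent(name) = public.f_unaccent('João');
```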

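3. Leveraging pg_trgm for Pattern Matching:

A B-tree expression index serves equality and, at best, left-anchored patterns. For patterns with arbitrary wildcards, the pg_trgm module offers trigram indexes. Ligature expansion, where needed, still comes from the unaccent rules, not from pg_trgm. A trigram GIN index on the unaccented expression supports LIKE, ILIKE, and regular-expression matches as well as similarity searches: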

CREATE INDEX users_unaccent_name_trgm_idx ON users
USING gin (f_unaccent(name) gin_trgm_ops);

SELECT * FROM users WHERE f_unaccent(name) LIKE ('%' || f_unaccent('João') || '%');

Note that trigram indexes are typically larger and more expensive to maintain on writes than standard B-tree indexes.
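
The similarity side of pg_trgm can be sketched as follows, assuming the f_unaccent wrapper and the trigram index from the example above exist. The % operator matches rows whose similarity exceeds the configured threshold, and similarity() returns a score between 0 and 1.

```sql
-- Install the contrib module once per database.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Rank names that are close to the search term, ignoring accents.
SELECT name,
       similarity(public.f_unaccent(name), public.f_unaccent('João')) AS score
FROM users
WHERE public.f_unaccent(name) % public.f_unaccent('João')
ORDER BY score DESC
LIMIT 10;
```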

The right approach depends on the queries your application runs. For equality lookups, a B-tree expression index on f_unaccent(name) is usually enough. For substring or fuzzy matching, a trigram index is the better fit, at the price of higher index size and maintenance cost.
