MySQL Full-Text Search Notes
bitsCN.com
1. MySQL 4.x and later provide built-in full-text search, but in the versions these notes cover the table's storage engine must be MyISAM (InnoDB gained FULLTEXT support only in MySQL 5.6). The CREATE TABLE statement below sets the storage engine explicitly:

CREATE TABLE articles (
    id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
    title VARCHAR(200),
    body TEXT,
    FULLTEXT (title,body)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

FULLTEXT (title,body) builds one full-text index over the title and body columns. Later searches must name both columns together in MATCH().

2. Insert test data

INSERT INTO articles (title,body) VALUES
    ('MySQL Tutorial','DBMS stands for DataBase ...'),
    ('How To Use MySQL Well','After you went through a ...'),
    ('Optimizing MySQL','In this tutorial we will show ...'),
    ('1001 MySQL Tricks','1. Never run mysqld as root. 2. ...'),
    ('MySQL vs. YourSQL','In the following database comparison ...'),
    ('MySQL Security','When configured properly, MySQL ...');

3. Full-text search test
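Before running a search, it can help to confirm that the FULLTEXT index exists and which columns it covers, since MATCH() has to name the same column list as the index. The following is a minimal sketch against the articles table; the index name ft_title is a hypothetical name chosen for illustration, and the exact error text may vary between server versions.

```sql
-- List the indexes on the table. The full-text index appears with
-- Index_type = FULLTEXT, one row per indexed column (title, body).
SHOW INDEX FROM articles;

-- Naming only one of the two indexed columns does not match the index.
-- In natural-language mode this is expected to fail with
-- ERROR 1191: Can't find FULLTEXT index matching the column list
SELECT * FROM articles WHERE MATCH (title) AGAINST ('database');

-- If single-column searches are needed, a separate index can be added.
-- ft_title is a hypothetical index name used only in this sketch.
ALTER TABLE articles ADD FULLTEXT INDEX ft_title (title);
```

With the extra index in place, MATCH (title) and MATCH (title,body) can each use their own index.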
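SELECT * FROM articles WHERE MATCH (title,body) AGAINST ('database');

The search returns:

5  MySQL vs. YourSQL  In the following database comparison ...
1  MySQL Tutorial     DBMS stands for DataBase ...

Row 1 contains "DataBase" and still matches, which shows that full-text matching ignores case (with the default case-insensitive collation).

4. Possible pitfalls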
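So far everything has gone smoothly. But what happens if the search is changed to the following?

SELECT * FROM articles WHERE MATCH (title,body) AGAINST ('well');

The result is a surprise: no rows come back, even though the title of row 2 is "How To Use MySQL Well". This puzzled me for a long time, until I looked it up online.

MySQL enforces a minimum word length for full-text indexing. The default is 4: words shorter than 4 characters are not indexed, so only search words of 4 or more characters can match. You can check the current value with SHOW VARIABLES LIKE 'ft_min_word_len', and you can change it in the MySQL configuration file my.ini by adding a line such as ft_min_word_len = 2 in the [mysqld] section and then restarting MySQL. Existing FULLTEXT indexes must be rebuilt afterwards before the new value affects them.

Note that "well" has exactly 4 characters, so it already satisfies the default minimum length. The query returns nothing for a different reason: "well" is in MyISAM's built-in stopword list, and stopwords are never indexed.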
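The minimum word length setting described above can be inspected from a SQL session. The following is a sketch for MyISAM tables such as articles; it also looks at the stopword list (words that are never indexed), which is a second reason a short, common word may not match. The option-file lines appear only as comments because they are not SQL.

```sql
-- Current minimum indexed word length for MyISAM full-text indexes.
SHOW VARIABLES LIKE 'ft_min_word_len';

-- Source of the stopword list. The value (built-in) means the
-- compiled-in default list is in use.
SHOW VARIABLES LIKE 'ft_stopword_file';

-- ft_min_word_len cannot be changed at runtime. It is set in the
-- option file, in the [mysqld] section, for example:
--   [mysqld]
--   ft_min_word_len = 2
-- followed by a server restart.

-- After the restart, existing FULLTEXT indexes still reflect the old
-- setting and have to be rebuilt. For a MyISAM table:
REPAIR TABLE articles QUICK;
```

If SHOW VARIABLES still reports the old value after a restart, one thing worth checking is whether the line sits under [mysqld] in the option file the server actually reads.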
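However, when I changed the configuration file as described and restarted the MySQL server, the SHOW command still reported the old value. A likely cause is that the line was not placed in the [mysqld] section of the option file the server actually reads.

In addition, MySQL computes a weight for each word, and that weight decides whether the word contributes to the result set. For every eligible word in the collection and in the query, MySQL first calculates a weight. A word that appears in many documents gets a lower weight (possibly even zero), because it carries little semantic value in that particular collection. A rare word, by contrast, gets a higher weight. In natural-language mode on MyISAM the threshold is 50%: a word that occurs in half or more of the rows is treated as a stopword and matches nothing. In the sample table, "MySQL" appears in all six rows (100%), so searching for it returns an empty set. Only words that occur in fewer than 50% of the rows can produce results.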
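The weighting can be observed directly by selecting the value of MATCH() ... AGAINST() as a column. This is a minimal sketch against the sample articles table, assuming MyISAM and the default natural-language mode; the alias score is just a name chosen for illustration.

```sql
-- Show the relevance score assigned to each row for 'database'.
-- Rows that do not match get a score of 0.
SELECT id, title,
       MATCH (title,body) AGAINST ('database') AS score
FROM articles
ORDER BY score DESC;

-- 'MySQL' occurs in all six sample rows, so in natural-language mode
-- on MyISAM it falls under the 50% rule and this query is expected
-- to return an empty set.
SELECT id, title
FROM articles
WHERE MATCH (title,body) AGAINST ('MySQL');
```

The first query makes it easy to see which rows contribute and how strongly. The second shows how a word that is too common in a small test table can silently match nothing.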
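But what if the weight threshold should not apply? MySQL provides boolean full-text search (BOOLEAN FULLTEXT SEARCH), which does not use the 50% threshold. Suppose a word appears in every record, as "MySQL" does in the sample table. The following search then returns all records:

SELECT * FROM articles WHERE MATCH (title,body) AGAINST ('MySQL' IN BOOLEAN MODE);

The minimum word length and the stopword list still apply in boolean mode, so the same search for 'well' keeps returning nothing unless the stopword list is changed as well.

5. Boolean full-text search syntax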
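The IN BOOLEAN MODE modifier shown above switches the search to boolean full-text mode. MySQL also supports operators similar to the ones we use in everyday search engines: logical AND, logical OR, logical NOT, and so on. A few SQL examples illustrate them.

SELECT * FROM articles WHERE MATCH (title,body) AGAINST ('+apple -banana' IN BOOLEAN MODE);

+ means AND: the word must be present. - means NOT: the word must not be present.

SELECT * FROM articles WHERE MATCH (title,body) AGAINST ('apple banana' IN BOOLEAN MODE);

apple and banana are separated by a space. A space means OR: a row matches if it contains at least one of apple and banana.

SELECT * FROM articles WHERE MATCH (title,body) AGAINST ('+apple banana' IN BOOLEAN MODE);

The row must contain apple, and it gets a higher weight if it also contains banana.

SELECT * FROM articles WHERE MATCH (title,body) AGAINST ('+apple ~banana' IN BOOLEAN MODE);

~ here is a negation operator, not the XOR or bitwise-NOT operator familiar from programming languages. A returned row must contain apple, but its weight is lowered if it also contains banana. This is less strict than +apple -banana, because with the latter a row that contains banana is not returned at all.

SELECT * FROM articles WHERE MATCH (title,body) AGAINST ('+apple +(>banana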
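The +, -, space and ~ operators can be tried on a small table built for the purpose. The sketch below uses a hypothetical table fruit_notes with made-up rows; the expected row sets in the comments follow from the operator rules described in this section, and the relative scores are what those rules suggest rather than measured values.

```sql
-- Hypothetical demo table; the name and rows are made up for illustration.
CREATE TABLE fruit_notes (
    id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
    note VARCHAR(200),
    FULLTEXT (note)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

INSERT INTO fruit_notes (note) VALUES
    ('apple pie recipe'),        -- id 1: apple only
    ('banana bread recipe'),     -- id 2: banana only
    ('apple banana smoothie'),   -- id 3: both words
    ('cherry tart recipe');      -- id 4: neither word

-- Must contain apple, must not contain banana: expected id 1.
SELECT id, note FROM fruit_notes
WHERE MATCH (note) AGAINST ('+apple -banana' IN BOOLEAN MODE);

-- Either word is enough: expected ids 1, 2 and 3.
SELECT id, note FROM fruit_notes
WHERE MATCH (note) AGAINST ('apple banana' IN BOOLEAN MODE);

-- Must contain apple, banana is optional but raises the score:
-- expected ids 1 and 3, with id 3 scoring higher.
SELECT id, note,
       MATCH (note) AGAINST ('+apple banana' IN BOOLEAN MODE) AS score
FROM fruit_notes
WHERE MATCH (note) AGAINST ('+apple banana' IN BOOLEAN MODE)
ORDER BY score DESC;

-- Must contain apple, banana lowers the score:
-- expected ids 1 and 3, with id 3 scoring lower.
SELECT id, note,
       MATCH (note) AGAINST ('+apple ~banana' IN BOOLEAN MODE) AS score
FROM fruit_notes
WHERE MATCH (note) AGAINST ('+apple ~banana' IN BOOLEAN MODE)
ORDER BY score DESC;
```

Boolean-mode results are not sorted by relevance automatically, which is why the last two queries select the score and order by it explicitly.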
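Following references found online, the approach for Chinese text is to segment the Chinese content into words first, then convert each Chinese character to its four-digit GB2312 row-cell code (区位码) and store the result in an index table. At search time, any query term that contains Chinese is segmented in the same way, converted to row-cell codes, and then used for a full-text search against the index table.

7. Checklist

A. Built-in full-text search is available only for tables whose storage engine is MyISAM, on MySQL 4.x or later (InnoDB tables are also supported from MySQL 5.6 onward).
B. MySQL full-text search does not support Chinese by default, and it ignores case when searching English text.
C. The default minimum word length for MySQL full-text search is 4, so a keyword must have at least 4 characters to be indexed and matched.
D. All columns in a FULLTEXT index must use the same character set.
E. MySQL full-text search also takes word weights into account when returning results.
F. MySQL full-text search also supports a flexible boolean search mode.
G. For more details, see the official MySQL 5 manual.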
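The character-to-code step of the Chinese workaround described above can be sketched in SQL. This only covers converting a single character to its four-digit row-cell code (区位码); word segmentation is outside its scope. It assumes the connection character set can carry Chinese text (for example utf8) and that the server includes the gb2312 character set. The variable names and the alias quwei_code are hypothetical.

```sql
-- Assumption: the connection character set can represent Chinese
-- (e.g. utf8) and the server was built with gb2312 support.
SET @ch  = '中';
SET @hex = HEX(CONVERT(@ch USING gb2312));   -- GB2312 bytes as hex: D6D0

-- Each GB2312 byte minus 0xA0 (160) gives the row and the cell number.
-- The two numbers, zero-padded to two digits each, form the code.
SELECT CONCAT(
         LPAD(CAST(CONV(SUBSTRING(@hex, 1, 2), 16, 10) AS UNSIGNED) - 160, 2, '0'),
         LPAD(CAST(CONV(SUBSTRING(@hex, 3, 2), 16, 10) AS UNSIGNED) - 160, 2, '0')
       ) AS quwei_code;   -- expected 5448 for '中' (0xD6 - 0xA0 = 54, 0xD0 - 0xA0 = 48)
```

Because every code has four digits, each converted character meets the default minimum word length of 4 and can be indexed like an ordinary word.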
Author: feichexia
