PHP+MySQL implements full-text search and full-text search tools

巴扎黑

May 26, 2018 pm 04:47 PM

How can full-text search be implemented in PHP?
Several solutions come to mind right away, such as scanning files or using SQL LIKE statements, but these methods are quite inefficient.
A more efficient approach is to use MySQL's FULLTEXT index. However, MySQL's FULLTEXT index does not handle Chinese well, so this article also shows how to implement Chinese full-text search with PHP+MySQL.
First, you need a PHP Chinese word segmentation extension module, SCWS. For installation and usage of this module, see www.ftphp.com/scws.
Next, some background on MySQL's FULLTEXT index:
MySQL has supported full-text indexing and search since version 3.23.23. A full-text index in MySQL is an index of type FULLTEXT.
FULLTEXT indexes are used on MyISAM tables (InnoDB also supports them as of MySQL 5.6) and can be created on CHAR, VARCHAR, or TEXT columns, either in CREATE TABLE or afterwards with ALTER TABLE or CREATE INDEX. For large data sets, it is much faster to load the data into a table that has no FULLTEXT index and then create the index with ALTER TABLE (or CREATE INDEX) than to load the data into a table that already has one.

MySQL full-text search is performed with the MATCH() ... AGAINST() syntax.
The following is a simple example:
1. Create a new table:

CREATE TABLE fulltext_sample (copy TEXT, FULLTEXT (copy)) ENGINE=MyISAM;

Here, copy is the column that carries the FULLTEXT index. If the full-text index was not defined when the table was created, it can be added afterwards with ALTER TABLE, for example:

ALTER TABLE fulltext_sample ADD FULLTEXT (copy);

2. Insert data:

INSERT INTO fulltext_sample VALUES
('It appears good from here'),
('The here and the past'),
('Why are we hear'),
('An all-out alert'),
('All you need is love'),
('A good alert');

3. Data retrieval:

SELECT * FROM fulltext_sample WHERE MATCH(copy) AGAINST('love');

That covers MySQL's basic full-text search. Note: by default, searches on a full-text index are not case-sensitive.
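From PHP, the same query can be run through a prepared statement so that the keyword is never concatenated into the SQL. The following is a minimal sketch, assuming a PDO connection to the database that holds fulltext_sample. The helper names fulltext_sql() and search_copy(), as well as the connection details in the usage comment, are illustrative and not part of the original article.

```php
<?php
// Hypothetical helper: builds the MATCH ... AGAINST statement for one table/column.
// Identifiers cannot be bound as parameters, so they are validated instead.
function fulltext_sql(string $table, string $column): string
{
    foreach ([$table, $column] as $name) {
        if (!preg_match('/^[A-Za-z_][A-Za-z0-9_]*$/', $name)) {
            throw new InvalidArgumentException("Invalid identifier: $name");
        }
    }
    return "SELECT * FROM `$table` WHERE MATCH(`$column`) AGAINST(:keyword)";
}

// Hypothetical helper: runs the search and returns all matching rows.
function search_copy(PDO $pdo, string $keyword): array
{
    $stmt = $pdo->prepare(fulltext_sql('fulltext_sample', 'copy'));
    $stmt->execute([':keyword' => $keyword]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

// Usage (DSN and credentials are placeholders):
// $pdo  = new PDO('mysql:host=localhost;dbname=test', 'user', 'secret');
// $rows = search_copy($pdo, 'love');
```

Keeping the SQL builder separate from the execution step makes the statement easy to inspect or log without a live database connection.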

Now let's look at how to implement Chinese full-text search.
A FULLTEXT index is word-based, and words must be separated by spaces. Chinese sentences, however, do not separate words with spaces, so the text has to be segmented into words first. This is why the Chinese word segmentation extension mentioned above is needed.
Even after segmentation, though, MySQL still cannot perform full-text search on Chinese through MATCH. The segmented words have to be converted first. A relatively simple and practical method (there are certainly better ones) is the following function, which urlencodes the Chinese words and strips the % signs.

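function q_encode($str)
{
    // Split on spaces and drop empty items
    $data = array_filter(explode(" ", $str));
    // Remove duplicate words
    $data = array_flip(array_flip($data));
    $data_code = "";
    foreach ($data as $ss) {
        // Skip single-byte tokens
        if (strlen($ss) > 1) {
            // urlencode the word and strip the % signs
            $data_code .= str_replace("%", "", urlencode($ss)) . " ";
        }
    }
    return trim($data_code);
}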

Save the converted content to the pre-defined fulltext field. Similarly, when querying, the query keywords need to be converted in the same way.
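To make the conversion concrete, here is a small self-contained sketch of the same idea. The name encode_tokens() is hypothetical. The sketch assumes the input has already been segmented into space-separated words and that both the source file and the strings are UTF-8 encoded.

```php
<?php
// Hypothetical helper mirroring the approach above: de-duplicate the tokens,
// urlencode each one, and strip the "%" signs so that every token becomes a
// plain alphanumeric "word" that a FULLTEXT index can handle.
function encode_tokens(string $segmented): string
{
    $tokens = array_unique(array_filter(explode(' ', $segmented)));
    $encoded = [];
    foreach ($tokens as $token) {
        if (strlen($token) > 1) {   // skip single-byte tokens
            $encoded[] = str_replace('%', '', urlencode($token));
        }
    }
    return implode(' ', $encoded);
}

echo encode_tokens('中文 分词 中文'), "\n";   // E4B8ADE69687 E58886E8AF8D
echo encode_tokens('full text'), "\n";        // full text
```

Each UTF-8 Chinese character becomes six hexadecimal characters, so even a one-character word should clear MyISAM's default minimum indexed word length (ft_min_word_len, 4 by default). Plain ASCII words pass through unchanged.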

How to implement UTF-8 full-text search with PHP+MySQL

This section explains how to run fast full-text searches over large amounts of data. MySQL provides a full-text index feature: define a FULLTEXT index on a column, then search it with the MATCH ... AGAINST clause of a SELECT statement.

TouchUs - The Global Yellow Pages & Business Directory (www.touchus.org), a purely English site we developed, uses this MySQL feature and achieves an average full-text search time of under 0.5 seconds over more than 100,000 records. However, when developing the Chinese site of TouchUs, City Yellow Pages (www.city39.cn), we ran into a new problem. In English text, words are separated by spaces, which FULLTEXT fully supports, but Chinese and other East Asian scripts are not so simple: because there is no explicit separator between words, MySQL cannot support full-text search over Chinese text out of the box.

How can MySQL be made to support Chinese full-text search as well? An idea came to us by chance: after segmenting the Chinese text into words, encode each Chinese word into ASCII characters, creating a fixed mapping between the Chinese words and the encoded strings, and then run the full-text search on the encoded text. Wouldn't that give us a full-text index over Chinese? After testing, the answer is yes. Here is the process as implemented on the City Yellow Pages site:

1. Create a separate index table. For example, for the members table we create a members_index table:

members (user information)    members_index (full-text index)
user_id                       user_id
user_name                     index_intro
user_introduction

Then add a FULLTEXT index on the index_intro column of the members_index table.
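As a rough sketch, the index table could be created from PHP like this. Only the table and column names come from the text above. The column types, the primary key, the MyISAM engine choice, and the helper name members_index_ddl() are assumptions made for illustration.

```php
<?php
// Hypothetical DDL for the index table; only the names come from the article.
function members_index_ddl(): string
{
    return <<<SQL
CREATE TABLE members_index (
    user_id     INT UNSIGNED NOT NULL PRIMARY KEY,  -- same id as members.user_id
    index_intro TEXT NOT NULL,                      -- encoded, space-separated words
    FULLTEXT (index_intro)
) ENGINE=MyISAM
SQL;
}

// Usage, assuming an open PDO connection in $pdo:
// $pdo->exec(members_index_ddl());
```

Keeping the encoded text in its own table leaves the members table untouched and readable, at the cost of keeping the two tables in sync.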

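2. Perform Chinese word segmentation on the contents of the user_introduction column of the user information table (members).

For the segmentation itself, see the Simple Chinese Word Segmentation system (SCWS) at http://www.ftphp.com/scws/. On the City Yellow Pages site we use the SCWS PHP extension module for segmentation. Installing the extension is straightforward: it only needs to be compiled and configured. In our PHP code, the following function segments the text and joins the resulting words with spaces.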

// Chinese word segmentation function
function str_fc($str) {
    $so = scws_new();
    $so->set_charset('utf8');
    // set_dict and set_rule are not called here; the system automatically tries
    // the dictionary and rule files at the paths specified in the ini settings
    $so->send_text($str);
    $mystr = '';
    while ($tmp = $so->get_result()) {
        foreach ($tmp as $ss) {
            $s = trim($ss['word']);
            if ($s !== '') {
                $mystr .= $s . ' ';
            }
        }
    }
    return $mystr;
}

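The function returns the segmentation result joined by spaces.

3. Encode the segmentation result. Many encodings are possible, such as base64, urlencode, or converting Chinese characters to pinyin; for GB2312 text you could even use the location (qu-wei) code. Considering storage space and convenience, we chose PHP's urlencode. Note that before encoding we can remove duplicate words to save storage space, and after encoding we must strip the % signs from the result: urlencode follows RFC 1738, which produces many % characters, and % is a wildcard character in MySQL. The PHP code used for the encoding step is shown below: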

$data = str_fc($data);                        // Chinese word segmentation
$data = array_filter(explode(" ", $data));    // remove empty array items
$data = array_flip(array_flip($data));        // remove duplicates
// urlencode each word and strip the % signs
$data_code = '';
foreach ($data as $ss) {
    if (strlen($ss) > 1) {
        $data_code .= str_replace("%", "", urlencode($ss)) . " ";
    }
}

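Here $data_code is the encoded result. Store it in the user information full-text index table (members_index), keyed by user_id.

4. When handling a search, first apply the same segmentation and encoding to the keywords entered by the user, then run a fast full-text search with MySQL's SELECT ... MATCH AGAINST statement. The user_id values in the search results can be used to fetch the original rows from the user information table (members) for display, so there is no need to decode and reassemble the indexed text.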
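Steps 3 and 4 could be wired together roughly as follows. This is a sketch under stated assumptions, not the site's actual code: it assumes a PDO connection, that user_id is a primary or unique key in members_index (so REPLACE INTO overwrites the old row), and that the text passed in has already been segmented and encoded. The names MEMBERS_SEARCH_SQL, save_member_index(), and search_members() are hypothetical.

```php
<?php
// Hypothetical SQL: match on the encoded index, return the original member rows.
const MEMBERS_SEARCH_SQL =
    'SELECT m.user_id, m.user_name, m.user_introduction
       FROM members_index AS i
       JOIN members AS m ON m.user_id = i.user_id
      WHERE MATCH(i.index_intro) AGAINST(:q)';

// Store (or refresh) the encoded words for one user.
// Assumes user_id is a primary or unique key in members_index.
function save_member_index(PDO $pdo, int $userId, string $encoded): void
{
    $stmt = $pdo->prepare(
        'REPLACE INTO members_index (user_id, index_intro) VALUES (:id, :intro)'
    );
    $stmt->execute([':id' => $userId, ':intro' => $encoded]);
}

// Search with keywords that were segmented and encoded the same way.
function search_members(PDO $pdo, string $encodedKeywords): array
{
    $stmt = $pdo->prepare(MEMBERS_SEARCH_SQL);
    $stmt->execute([':q' => $encodedKeywords]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}
```

Because the join returns rows from members directly, the encoded text in index_intro never has to be decoded for display.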

That is the method for UTF-8 Chinese full-text search in MySQL.
