分享mysql中文全文搜索:中文分词简单函数
分享mysql中文全文搜索:中文分词简单函数
原文地址:http://www.jb100.net/html/content-22-400-1.html
前段时间研究中文全文搜索,结果发现mysql不支持中文的全文搜索。但是有一些解决办法,就是手动把中文单词用空格分开,然后搜索的时候加 上 in boolean mode。 但是这就带来一个问题,就是中文分词。这个是个很大的难题,貌似中科院有个小组就是专门做中文分词技术的。我们用 php来分词的话,要实现真正语义上的分词是非常困难的,就算实现了效率也不高。一般情况下,我们采用的是如下方法分词:
比如我们有一句话:你好我是刘春龙
那么我们可以这样来分词: 你好 好我 我是 是刘 刘春 春龙
这样虽然看起来有点傻,但是实际应用起来确实可行,因为我们搜索时候输入的关键词也是按照这个方法分词。
下面有个我自己写的函数,可以实现这种分词。传入三个参数,分别是:
1.需要分词的字符串,必须,英文,标点,数字,汉字,日语等都可以。编码为UTF-8
2.是否返回字符串,可选,默认是。如果传入false,那么将返回一个数组。
3.是否base64_encode中文,可选,默认是。Mysql的全文搜索有个配置是 ft_min_word_len 这个值一般是4,而 我们分成的中文词语是两个字,就不会被mysql认为是一个词。而base64_encode过后,词语的长度为8,就不存在最小长度问题 了。 base64_encode过后数据量会增大 50%。
注意,这里输入和输出的字符串编码都是UTF-8 function string2words($s,$return_string = true,$encode64 = true) <br>
{ <br>
$re = ''; <br>
//匹配汉字 <br>
if (preg_match_all("/([x{4e00}-x{9fff}]{2,})/u",$s,$ms)) <br>
{ <br>
foreach($ms[0] as $w) <br>
{ <br>
//关键部分:分词 <br>
$l = strlen($w)/3; <br>
for($i=0;$i
{ <br>
$wi = substr($w,$i*3,6); <br>
if (strlen($wi) > 3) <br>
{ <br>
$re .= ($encode64)?' '.str_replace(',','@',base64_encode($wi)):' '.$wi; <br>
} <br>
} <br>
} <br>
} <br>
//匹配数字 <br>
if (preg_match_all("/(d+[.]?d+)/",$s,$ms)) <br>
{ <br>
foreach($ms[0] as $wi) <br>
{ <br>
if(strlen($wi) >= 2) <br>
{ <br>
$re .= ($encode64)?' '.str_replace(',','@',base64_encode($wi)):' '.$wi; <br>
} <br>
} <br>
$s = preg_replace("/(d+[.]?d+)/",' ',$s); <br>
} <br>
//去掉所有双字节字符 <br>
$s = preg_replace("/([^x{00}-x{ff}]+)/u",' ',$s); <br>
$re = $s.' '.$re; <br>
if (!$return_string) <br>
{ <br>
$re = preg_replace("/([^d])([,.-?n])([^d])/",'$1 $3',$re); <br>
$re = trim(preg_replace("/[s]{2,}/",' ',$re)); <br>
$arr = explode(' ',$re); <br>
$re = array(); <br>
foreach($arr as $a) <br>
{ <br>
if (strlen($a) >= 2) $re[] = $a; <br>
} <br>
return $re; <br>
} <br>
else <br>
{ <br>
$re = trim(preg_replace("/[s,.]{2,}/",' ',$re)); <br>
return $re; <br>
} <br>
}
原文地址:http://www.jb100.net/html/content-22-400-1.html
AD:真正免费,域名+虚机+企业邮箱=0元

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics



Learn about Python programming with introductory code examples Python is an easy-to-learn, yet powerful programming language. For beginners, it is very important to understand the introductory code examples of Python programming. This article will provide you with some concrete code examples to help you get started quickly. Print HelloWorldprint("HelloWorld") This is the simplest code example in Python. The print() function is used to output the specified content

PHP variables store values during program runtime and are crucial for building dynamic and interactive WEB applications. This article takes an in-depth look at PHP variables and shows them in action with 10 real-life examples. 1. Store user input $username=$_POST["username"];$passWord=$_POST["password"]; This example extracts the username and password from the form submission and stores them in variables for further processing. 2. Set the configuration value $database_host="localhost";$database_username="username";$database_pa

"Go Language Programming Examples: Code Examples in Web Development" With the rapid development of the Internet, Web development has become an indispensable part of various industries. As a programming language with powerful functions and superior performance, Go language is increasingly favored by developers in web development. This article will introduce how to use Go language for Web development through specific code examples, so that readers can better understand and use Go language to build their own Web applications. 1. Simple HTTP Server First, let’s start with a

The simplest code example of Java bubble sort Bubble sort is a common sorting algorithm. Its basic idea is to gradually adjust the sequence to be sorted into an ordered sequence through the comparison and exchange of adjacent elements. Here is a simple Java code example that demonstrates how to implement bubble sort: publicclassBubbleSort{publicstaticvoidbubbleSort(int[]arr){int

Title: From Beginner to Mastery: Code Implementation of Commonly Used Data Structures in Go Language Data structures play a vital role in programming and are the basis of programming. In the Go language, there are many commonly used data structures, and mastering the implementation of these data structures is crucial to becoming a good programmer. This article will introduce the commonly used data structures in the Go language and give corresponding code examples to help readers from getting started to becoming proficient in these data structures. 1. Array Array is a basic data structure, a group of the same type

Huawei Cloud Edge Computing Interconnection Guide: Java Code Samples to Quickly Implement Interfaces With the rapid development of IoT technology and the rise of edge computing, more and more enterprises are beginning to pay attention to the application of edge computing. Huawei Cloud provides edge computing services, providing enterprises with highly reliable computing resources and a convenient development environment, making edge computing applications easier to implement. This article will introduce how to quickly implement the Huawei Cloud edge computing interface through Java code. First, we need to prepare the development environment. Make sure you have the Java Development Kit installed (

How to use PHP to write the inventory management function code in the inventory management system. Inventory management is an indispensable part of many enterprises. For companies with multiple warehouses, the inventory management function is particularly important. By properly managing and tracking inventory, companies can allocate inventory between different warehouses, optimize operating costs, and improve collaboration efficiency. This article will introduce how to use PHP to write code for inventory warehouse management functions, and provide you with relevant code examples. 1. Establish the database before starting to write the code for the inventory warehouse management function.

Java Selection Sorting Method Code Writing Guide and Examples Selection sorting is a simple and intuitive sorting algorithm. The idea is to select the smallest (or largest) element from the unsorted elements each time and exchange it until all elements are sorted. This article will provide a code writing guide for selection sorting, and attach specific Java sample code. Algorithm Principle The basic principle of selection sort is to divide the array to be sorted into two parts, sorted and unsorted. Each time, the smallest (or largest) element is selected from the unsorted part and placed at the end of the sorted part. Repeat the above
