Sphinx is a full-text search engine developed by Andrew Aksyonoff of Russia. It is designed to give other applications fast, space-efficient, highly relevant full-text search. Sphinx integrates easily with SQL databases and scripting languages. It currently has built-in support for MySQL and PostgreSQL data sources, and it can also read XML data in a specific format from standard input. New data sources (for example, native support for other types of DBMS) can be added by modifying the source code.
1. Sphinx Chinese word segmentation
Chinese full-text retrieval depends on word segmentation, because Chinese text has no spaces between words. Most databases, MySQL among them, do not currently support Chinese full-text retrieval. Sphinx also needs a supplementary package, such as coreseek or sfc, to perform full-text search over Chinese.
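The difference between indexing Chinese character by character and indexing segmented words can be sketched in Python. This is only an illustration under stated assumptions: the dictionary TOY_DICTIONARY and both function names are hypothetical, and greedy forward maximum matching is a generic textbook technique chosen here for brevity, not the algorithm that coreseek or sfc actually uses.

```python
def unigram_tokens(text):
    """Split text into single characters, skipping whitespace."""
    return [ch for ch in text if not ch.isspace()]


def max_match_tokens(text, dictionary, max_len=4):
    """Greedy forward maximum matching against a word dictionary."""
    tokens = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy dictionary, for illustration only.
TOY_DICTIONARY = {"全文", "检索", "全文检索", "搜索", "引擎"}

if __name__ == "__main__":
    sample = "全文检索引擎"
    print(unigram_tokens(sample))                    # one token per character
    print(max_match_tokens(sample, TOY_DICTIONARY))  # dictionary words
```

With per-character tokens, a query matches any document that contains the same characters, which tends to return more loosely related results than matching on whole words.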
2. Installation
There are two ways to use Sphinx with MySQL:
(1) Through API calls, for example from the API functions or methods available for PHP, Java, and other languages. The advantages are that MySQL does not need to be recompiled, the search server is loosely coupled to the database, and programs can call it flexibly and conveniently. The disadvantage is that an existing search program has to be partly modified. Recommended for programmers.
(2) Through the plug-in method (SphinxSE): Sphinx is compiled as a MySQL plug-in and queried with specific SQL statements. It combines easily on the SQL side and can return data directly to the client without a second query. Only the corresponding SQL in the program needs to change, but this is very inconvenient for programs built on a framework, for example one that uses an ORM. In addition, MySQL has to be recompiled, and MySQL 5.1 or later is needed for pluggable storage engine support. This method suits system administrators.
Use the first method to install:
Software environment:
Installation:
After the installation is complete, check that /usr/local/sphinx contains the three directories bin, etc, and var. If it does, the installation succeeded.
sfc installation
coreseek installation
3. Configuration
A concrete example configuration file:
##### Index source #####
source article_src
{
    type = mysql                    # Data source type
    sql_host = 192.168.1.10         # MySQL host
    sql_user = root                 # MySQL user name
    sql_pass = pwd                  # MySQL password
    sql_db = test                   # MySQL database name
    sql_port = 3306                 # MySQL port
    # Connection encoding. Pay special attention here: many people cannot search
    # in Chinese because the database encoding is GBK or another non-UTF-8 encoding.
    sql_query_pre = SET NAMES UTF8
    # SQL statement that fetches the data to index
    sql_query = SELECT id,title,cat_id,member_id,content,created FROM sphinx_article
    ##### Attributes used for filtering or conditional queries #####
    sql_attr_uint = cat_id          # Unsigned integer attribute
    sql_attr_uint = member_id
    sql_attr_timestamp = created    # UNIX timestamp attribute
    # Used when testing from the command-line interface (CLI)
    sql_query_info = select * from sphinx_article where id=$id
}
##### Index #####
index article
{
    source = article_src            # Index source declared above
    path = /usr/local/sphinx/var/data/article    # Index file storage path and index file name
    docinfo = extern                # Document information storage method
    mlock = 0                       # Memory locking of cached index data (0 = off)
    morphology = none               # Morphology processing (has no effect on Chinese)
    min_word_len = 1                # Minimum length of indexed words
    charset_type = utf-8            # Data encoding
    # Character table. Note: with this approach Sphinx splits Chinese text into single
    # characters, that is, it indexes character by character. For real Chinese word
    # segmentation you must use a segmentation plug-in such as coreseek or sfc.
    charset_table = U+FF10..U+FF19->0..9, 0..9, U+FF41..U+FF5A->a..z, U+FF21..U+FF3A->a..z
}
##### Indexer configuration #####
indexer
{
    mem_limit = 256M                # Memory limit
}
##### Sphinx service process (searchd) #####
searchd
{
    # Listening port. As of this version, 9312 is the port officially assigned by IANA;
    # earlier versions defaulted to 3312.
    # listen = 9312
    # Service process log. When Sphinx misbehaves, useful information can usually be
    # found here, and rotation problems can generally be diagnosed here as well.
    log = /usr/local/sphinx/var/log/searchd.log
    # Client query log. Author's note: to gather statistics on search keywords,
    # analyze this log file.
    query_log = /usr/local/sphinx/var/log/query.log
    read_timeout = 5                # Request timeout
    max_children = 30               # Maximum number of searchd processes running concurrently
    pid_file = /usr/local/sphinx/var/log/searchd.pid    # Process ID file
    max_matches = 1000              # Maximum number of query results returned
    seamless_rotate = 1             # Seamless index rotation, usually needed for incremental indexing
}
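To make the effect of the charset_table line more concrete, here is a minimal Python sketch that parses the same table, written in Sphinx's U+XXXX code point notation, and applies it to a string. It is a simplified stand-in, not Sphinx's own parser: the function names are hypothetical, and only single characters, ranges, and "->" mappings are handled. Characters missing from the table are turned into spaces here to mimic how Sphinx treats them as separators.

```python
def _codepoint(token):
    """Convert 'U+FF10' or a single literal character to a code point."""
    token = token.strip()
    if token.upper().startswith("U+"):
        return int(token[2:], 16)
    if len(token) != 1:
        raise ValueError(f"unsupported charset_table token: {token!r}")
    return ord(token)


def _range(spec):
    """Parse 'a..z' or a single character into (start, end) code points."""
    if ".." in spec:
        start, end = spec.split("..", 1)
        return _codepoint(start), _codepoint(end)
    point = _codepoint(spec)
    return point, point


def parse_charset_table(table):
    """Build a {source code point: folded code point} mapping."""
    mapping = {}
    for entry in table.split(","):
        entry = entry.strip()
        if not entry:
            continue
        source, _, target = entry.partition("->")
        src_start, src_end = _range(source)
        # Without '->' the characters map to themselves.
        dst_start, dst_end = _range(target) if target else (src_start, src_end)
        if src_end - src_start != dst_end - dst_start:
            raise ValueError(f"range sizes differ in {entry!r}")
        for offset in range(src_end - src_start + 1):
            mapping[src_start + offset] = dst_start + offset
    return mapping


def fold(text, mapping):
    """Fold mapped characters; anything outside the table becomes a space."""
    return "".join(
        chr(mapping[ord(ch)]) if ord(ch) in mapping else " " for ch in text
    )


TABLE = "U+FF10..U+FF19->0..9, 0..9, U+FF41..U+FF5A->a..z, U+FF21..U+FF3A->a..z"

if __name__ == "__main__":
    table_mapping = parse_charset_table(TABLE)
    # Full-width letters and digits are folded to their ASCII forms.
    print(fold("ＡＢＣ１２３", table_mapping))
```

The table as given folds full-width digits and letters to ASCII and keeps ASCII digits, so a query typed with full-width characters matches text stored with half-width ones.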
4. Create index
[root@localhost sphinx]# bin/indexer -c etc/sphinx.conf article    # Command that builds the index files
Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff
using config file 'etc/sphinx.conf'...
indexing index 'article'...
collected 1000 docs, 0.2 MB
sorted 0.4 Mhits, 99.6% done
total 1000 docs, 210559 bytes
total 3.585 sec, 58723 bytes/sec, 278.89 docs/sec
total 2 reads, 0.031 sec, 1428.8 kb/call avg, 15.6 msec/call avg
total 11 writes, 0.032 sec, 671.6 kb/call avg, 2.9 msec/call avg
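The summary lines printed by the indexer can be cross-checked with a few lines of Python. The sketch below parses the two "total" lines copied from the output above and recomputes the throughput. The function name and the regular expressions are assumptions made for this example, based only on the output shown here. The recomputed rates can differ slightly from the printed ones because the elapsed time is displayed rounded.

```python
import re

# The two summary lines copied from the indexer output above.
SUMMARY = """\
total 1000 docs, 210559 bytes
total 3.585 sec, 58723 bytes/sec, 278.89 docs/sec
"""


def parse_indexer_summary(text):
    """Extract document count, byte count, elapsed time, and reported rates."""
    docs = re.search(r"total (\d+) docs, (\d+) bytes", text)
    rate = re.search(
        r"total ([\d.]+) sec, ([\d.]+) bytes/sec, ([\d.]+) docs/sec", text
    )
    if not docs or not rate:
        raise ValueError("indexer summary lines not found")
    return {
        "docs": int(docs.group(1)),
        "bytes": int(docs.group(2)),
        "seconds": float(rate.group(1)),
        "bytes_per_sec": float(rate.group(2)),
        "docs_per_sec": float(rate.group(3)),
    }


if __name__ == "__main__":
    stats = parse_indexer_summary(SUMMARY)
    # Throughput is simply the total divided by the elapsed time.
    print("derived docs/sec: ", stats["docs"] / stats["seconds"])
    print("derived bytes/sec:", stats["bytes"] / stats["seconds"])
```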
5. Application
The previous step created the index; now we test it. There are two ways to test: from the CLI and through API calls.
(1) On the CLI, use the search command that ships with Sphinx:
[root@localhost sphinx]# bin/search -c etc/sphinx.conf Liu Li
Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff
using config file 'etc/sphinx.conf'...
index 'mdmLoginLog': query 'Liu Li ': returned 6 matches of 6 total in 0.000 sec
displaying matches:
1. document=2, weight=2
2. document=3, weight=2
3. document=4, weight=2
4. document=5, weight=2
5. document=7, weight=2
6. document=8, weight=2
words:
1. 'Liu': 6 documents, 6 hits
2. '利': 6 documents, 6 hits
(2) Test through the PHP API. Before testing, start the Sphinx service process (searchd) and open port 9312 in the CentOS firewall:
[root@localhost sphinx]# bin/searchd -c etc/sphinx.conf &    # Run Sphinx in the background
[1] 5759
[root@localhost sphinx]# Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff
using config file 'etc/sphinx.conf'...
listening on all interfaces, port=9312
[1] Done bin/searchd -c etc/sphinx.conf
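Before calling the API from a client, it can help to confirm that the searchd port is actually reachable through the firewall. The following is a minimal Python sketch using only the standard library. The host 127.0.0.1 is an assumption and should be replaced with the address of the Sphinx server; the function name is hypothetical. A successful connection shows only that something is listening on the port, not that it is searchd.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 9312 is the searchd port from the output above; the host is assumed.
    print(is_port_open("127.0.0.1", 9312))
```

If this prints False when run from the client machine but True on the server itself, the firewall rule for port 9312 is the most likely cause.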
Reference: http://www.sphinxsearch.org/sphinx-tutorial