Sphinx installation and configuration application, sphinx installation and configuration_PHP tutorial

WBOY
Release: 2016-07-12 08:51:45
Original
1024 people have browsed it

Sphinx installation and configuration application, sphinx installation and configuration

Sphinx is a full-text search engine developed by Russian Andrew Aksyonoff. It is intended to provide high-speed, space-saving, and high-result-relevant full-text search functions for other applications. Sphinx can be easily integrated with SQL databases and scripting languages. The current system's built-in support for MysqL and PostgreSQL database data sources also supports reading xml data in specific formats from standard input. By modifying the source code, you can add new data sources (for example: native support for other types of DBMS)

1. Sphinx Chinese word segmentation

Chinese full-text retrieval is based on semantic segmentation. Currently, most databases do not support Chinese full-text retrieval, such as Mysql. If Sphinx needs to perform full-text search for Chinese, it will also need some plug-ins to supplement it, such as coreseek and sfc.

  • Coreseek is the most commonly used sphinx Chinese full-text search. It provides the Chinese word segmentation package LibMMSeg designed for Sphinx. It also provides binary distribution versions for multiple systems, including rpm, deb and binary packages under windows.
  • sfc (sphinx-for-chinese) is another Chinese word segmentation plug-in provided by the netizen happy brother. The Chinese dictionary uses xdict. According to him, after testing, the current version can basically reach half of the indexing speed of UTF-8 English in terms of indexing speed (Linux test platform), which is half of the officially claimed speed. (Time is mainly consumed in word segmentation).

2. Installation

There are two ways to apply Sphinx on mysql:
(1) Using API calls, such as using API functions or methods of PHP, java, etc. to query. The advantage is that there is no need to recompile MySQL, the server process is "lowly coupled", and the program can be called flexibly and conveniently; the disadvantage is that if there is already a search program, some programs need to be modified. Recommended for programmers.
(2) Use the plug-in method (sphinxSE) to compile sphinx into a mysql plug-in and use specific sql statements to retrieve. Its characteristics are that it is easy to combine on the SQL side, and can directly return data to the client without requiring a second query (note). In the program, only the corresponding SQL needs to be modified, but this is very inconvenient for programs developed using the framework. For example, using ORM. In addition, mysql needs to be recompiled, and mysql-5.1 or above needs to support plug-in storage. System administrators can use this method.

Use the first method to install:

Software environment:

  • Operating system: Centos-5.2
  • Database: mysql-5.0.77-3.el5 mysql-devel (if you want to use sphinxSE plug-in storage, please use mysql-5.1 or above)
  • Compilation software: gcc gcc-c autoconf automake
  • Sphinx: Sphinx-0.9.9 (latest stable version)

Installation:

  • [root@localhost ~]# yum install -y mysql mysql-devel
  • [root@localhost ~]# yum install -y automake autoconf
  • [root@localhost ~]# cd /usr/local/src/
  • [root@localhost src]# wget http://www.sphinxsearch.com/downloads/sphinx-0.9.9.tar.gz
  • [root@localhost src]# tar zxvf sphinx-0.9.9.tar.gz
  • [root@localhost local]# cd sphinx-0.9.9
  • [root@localhost sphinx-0.9.9]# ./configure –prefix=/usr/local/sphinx #Note: sphinx here already supports mysql by default
  • [root@localhost sphinx-0.9.9]# make && make install # The "warning" can be ignored

After the installation is complete, check whether there are three directories bin etc var under /usr/local/sphinx. If so, the installation is correct!

sfc installation

coreseek installation

3. Configuration

  • [root@localhost ~]#cd /usr/local/sphinx/etc #Enter the sphinx configuration file directory
  • [root@localhost etc]# cp sphinx.conf.dist sphinx.conf #New Sphinx configuration file
  • [root@localhost etc]# vim sphinx.conf #Edit sphinx.conf

Specific instance configuration file:

##### Index source ###########
source article_src
{
type = mysql #####Data source type
sql_host = 192.168.1.10 ######mysqlhost
sql_user = root ########mysql username
sql_pass = pwd############mysql password
sql_db = test #########mysql database name
sql_port= 3306 ###########mysql port
sql_query_pre = SET NAMES UTF8 ###mysql search encoding, pay special attention to this point, many people cannot search in Chinese because the encoding of the database is GBK or other non-UTF8
sql_query = SELECT id,title,cat_id,member_id,content,created FROM sphinx_article ####### Get data sql

#####The following are the attributes used for filtering or conditional query############

sql_attr_uint = cat_id ######## Unsigned integer attribute
sql_attr_uint = member_id
sql_attr_timestamp = created ############ UNIX timestamp attribute

sql_query_info = select * from sphinx_article where id=$id ######### Test for command interface (CLI) calls

}

### Index ###

index article
{
source = article_src ####Declare index source
path = /usr/local/sphinx/var/data/article #######Index file storage path and index file name
docinfo = extern ##### Document information storage method
mlock = 0 ###Cache data memory lock
morphology = none #### Morphology (not valid for Chinese)
min_word_len = 1 #### Minimum length of indexed words
charset_type = utf-8 #####Data encoding

##### character table, note: if you use this method, sphinx will segment Chinese characters,
##### That is to perform word indexing. If you want to use Chinese word segmentation, you must use other word segmentation plug-ins such as coreseek, sfc

charset_table = U FF10..U FF19->0..9, 0..9, U FF41..U FF5A->a..z, U FF21..U FF3A->a.. z

}

######### Indexer configuration #####
indexer
{
mem_limit = 256M ####### Memory limit
}

############ sphinx service process ########
searchd
{
#listen = 9312 ### Listening port. Starting from this version, the official 9312 port has been officially authorized by IANA. The default in previous versions was 3312

log = /usr/local/sphinx/var/log/searchd.log #### Service process log. Once an exception occurs in sphinx, you can basically query effective information from here. Problems with rotation can generally be solved Find the answer here
query_log = /usr/local/sphinx/var/log/query.log ### Client query log, author's note: If you want to count some keywords, you can analyze this log file
read_timeout = 5 ## Request timeout
max_children = 30 ### The maximum number of searchd processes that can be executed simultaneously
pid_file = /usr/local/sphinx/var/log/searchd.pid #######Process ID file
max_matches = 1000 ### The maximum number of query results returned
seamless_rotate = 1 ### Whether to support seamless switching, usually required when doing incremental indexing
}

4. Create index

[root@localhost sphinx]# bin/indexer -c etc/sphinx.conf article ### Command to create index file
Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff

using config file ‘etc/sphinx.conf’…
indexing index ‘article’…
collected 1000 docs, 0.2 MB
sorted 0.4 Mhits, 99.6% done
total 1000 docs, 210559 bytes
total 3.585 sec, 58723 bytes/sec, 278.89 docs/sec
total 2 reads, 0.031 sec, 1428.8 kb/call avg, 15.6 msec/call avg
total 11 writes, 0.032 sec, 671.6 kb/call avg, 2.9 msec/call avg

5. Application

In the previous step, we created the index, now we test the newly created index. There are two ways to test: CLI and API calls

(1) The command test on the CLI side is to use the search command that comes with sphinx: search

[root@localhost sphinx]# bin/search -c etc/sphinx.conf Liu Li
Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff

using config file ‘etc/sphinx.conf’…

index 'mdmLoginLog': query 'Liu Li ': returned 6 matches of 6 total in 0.000 sec

displaying matches:
1. document=2, weight=2
2. document=3, weight=2
3. document=4, weight=2
4. document=5 , weight=2
5. document=7, weight=2
6. document=8, weight=2

words:
1. 'Liu': 6 documents, 6 hits
2. '利': 6 documents, 6 hits

(2) Use PHP’s api to test. Before testing, start the sphinx service process and open port 9312 on the centos firewall

[root@localhost sphinx]# bin/searchd -c etc/sphinx.conf & ### Make sphinx run in the background
[1] 5759
[root@localhost sphinx]# Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff

using config file ‘etc/sphinx.conf’…
listening on all interfaces, port=9312

[1] Done bin/searchd -c etc/sphinx.conf

Reference http://www.sphinxsearch.org/sphinx-tutorial

www.bkjia.comtruehttp: //www.bkjia.com/PHPjc/1128121.htmlTechArticleSphinx installation and configuration application, sphinx installation and configuration Sphinx is a full-text search engine developed by Russian Andrew Aksyonoff. Intended to provide high speed, space occupancy, and high performance for other applications...
Related labels:
source:php.cn
Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
Popular Tutorials
More>
Latest Downloads
More>
Web Effects
Website Source Code
Website Materials
Front End Template