I run my own note-taking blog, where I often write up analyses of technical articles. In the past, the only way I could search those articles was MySQL's LIKE fuzzy matching against the content. Once there are many articles, that approach is simply not efficient. So I turned to coreseek, a Chinese full-text search engine, and successfully used it in my project.
I hope this write-up helps interested readers avoid some detours.
Sphinx is an open source full-text search engine that works well for English. English words are naturally separated by spaces, however, while Chinese text needs real word segmentation. Coreseek is a Chinese full-text search engine built on Sphinx and usable in enterprise settings. In other words, the core of Coreseek is still Sphinx; the biggest difference is that coreseek ships with mmseg, a Chinese word segmentation tool.
System: Ubuntu
HTTP server: Apache/2.2.22
MySQL: Ver 14.14 Distrib 5.5.41
PHP: PHP 5.3.10
Installation steps
Download coreseek-3.2.14.tar.gz and place it in /usr/local/src
First, to avoid missing dependencies during installation, install the required packages up front:
apt-get install make gcc g++ automake libtool mysql-client libmysqlclient15-dev libxml2-dev libexpat1-dev
Be sure to run the command above; otherwise all sorts of strange problems can occur because packages are missing or too old. In my case it pulled in 159M of packages. (I only went back and did this after running into various pitfalls.)
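Before starting the build, it can help to see which of the tools used in this article are still missing. The following is only a minimal sketch, assuming a POSIX shell; missing_tools is a helper name I made up for illustration, not something that ships with coreseek.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every command in the
# argument list that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Example: the build commands that appear in this article.
# missing_tools make gcc g++ aclocal automake autoconf autoheader libtoolize
```

If the function prints nothing, every listed command was found. It checks commands only, so development packages such as libexpat1-dev still have to be checked separately.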
1. Install the mmseg word segmentation module
cd /usr/local/src
tar zxvf coreseek-3.2.14.tar.gz    #extract
cd coreseek-3.2.14
cd mmseg-3.2.14
./bootstrap    #warnings in the output can be ignored; errors must be resolved
./configure --prefix=/usr/local/mmseg3    #configure
make    #compile
make install    #install
1.1) Possible problems and solutions:
Running ./bootstrap fails with the error ./bootstrap: 27: ./bootstrap: autoconf: not found.
Reason: the autotools (autoconf, automake) are not installed (Ubuntu 10.04). Install them with the following command.
sudo apt-get install autoconf automake libtool
1.2) Possible problem: when installing the mmseg word segmentation module, the error "cannot find input file: src/Makefile.in" appears near the end of the compile and install.
I looked it up and found the following solution:
aclocal    #a Perl script; its description is: "aclocal - create aclocal.m4 by scanning configure.ac"
libtoolize --force    #prints an error when run; ignore it
automake --add-missing
autoconf
autoheader
make clean
Then recompile
./configure --prefix=/usr/local/mmseg3
make && make install
Compilation and installation successful
Summary: In fact, I didn’t find out the reason for this error. Anyway, I succeeded according to the solution. If anyone knows, please leave a message, thank you.
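Since the same regenerate-and-rebuild sequence comes up more than once, it can be wrapped in a small function. This is a sketch under my own assumptions: run and regen_and_build are hypothetical helper names, and DRY_RUN is a variable I introduce so the commands can be previewed without executing anything.

```shell
#!/bin/sh
# Hypothetical helper: run a command, or only print it when DRY_RUN is non-empty.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: regenerate the autotools files in a source directory,
# then configure, build and install. The body runs in a subshell, so the
# caller's working directory is left unchanged.
regen_and_build() (
    src_dir=$1
    prefix=${2:-/usr/local/mmseg3}   # default prefix is the one used in this article
    cd "$src_dir" || exit 1
    run aclocal || exit 1
    run libtoolize --force           # may print an error that can be ignored (see above)
    run automake --add-missing || exit 1
    run autoconf || exit 1
    run autoheader || exit 1
    run make clean                   # allowed to fail if nothing was built yet
    run ./configure "--prefix=$prefix" || exit 1
    run make || exit 1
    run make install
)

# Example: preview the commands for the mmseg source tree.
# DRY_RUN=1; regen_and_build /usr/local/src/coreseek-3.2.14/mmseg-3.2.14
```

Previewing with DRY_RUN first is a cheap way to confirm the prefix before anything is installed.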
2. Install CoreSeek
cd /usr/local/src
cd coreseek-3.2.14
cd csft-3.2.14
sh buildconf.sh    #warnings in the output can be ignored; errors must be resolved
./configure --prefix=/usr/local/coreseek --without-unixodbc --with-mmseg --with-mmseg-includes=/usr/local/mmseg3/include/mmseg/ --with-mmseg-libs=/usr/local/mmseg3/lib/ --with-mysql    #configure
make    #compile
make install    #install
3. Test mmseg word segmentation, coreseek search, and the MySQL data source
cd /usr/local/src
cd coreseek-3.2.14
cd testpack
cat /usr/local/src/coreseek-3.2.14/testpack/var/test/test.xml #Chinese should be displayed correctly at this time, as shown in the figure below
/usr/local/mmseg3/bin/mmseg -d /usr/local/mmseg3/etc /usr/local/src/coreseek-3.2.14/testpack/var/test/test.xml
/usr/local/coreseek/bin/indexer -c /usr/local/src/coreseek-3.2.14/testpack/etc/csft.conf --all
/usr/local/coreseek/bin/search -c /usr/local/src/coreseek-3.2.14/testpack/etc/csft.conf 网络搜索 #search for the Chinese phrase 网络搜索 ("network search")
/usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/sphinx-min.conf.dist
/usr/local/coreseek/bin/indexer -c /usr/local/src/coreseek-3.2.14/testpack/etc/csft.conf --all --rotate #Start the service and update the index
If the output looks like the picture and no error is reported, your coreseek installation is running normally.
3.1) Possible problems and solutions:
Running /usr/local/coreseek/bin/indexer -c etc/csft.conf --all reports an error that begins: "xmlpipe2 support NOT compiled in. To use xmlpipe2, install missing"
Reason:
xmlpipe2 support depends on the expat XML library, which is missing. Solution:
apt-get install expat-*
Then recompile coreseek; remember to run make clean first.
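Before recompiling, it is worth confirming that the expat header is actually present, since configure can only enable xmlpipe2 when it finds it. A minimal sketch: has_header is a hypothetical helper of mine, and the default include directories are an assumption, because distributions may place headers elsewhere.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named header exists in any of the
# given include directories. With no directories given, fall back to two
# common locations (an assumption, not a guarantee).
has_header() {
    header=$1
    shift
    [ "$#" -gt 0 ] || set -- /usr/include /usr/local/include
    for dir in "$@"; do
        [ -f "$dir/$header" ] && return 0
    done
    return 1
}

# Example:
# has_header expat.h || echo "expat.h not found; install the expat development package"
```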
4. coreseek configuration and usage
cp /usr/local/src/coreseek-3.2.14/testpack/etc/csft_mysql.conf /usr/local/coreseek/etc/csft_mysql.conf    #copy the MySQL data source config file
ln -s /usr/local/coreseek/etc/csft_mysql.conf /etc/csft_mysql.conf    #add a symlink
vim /etc/csft_mysql.conf    #edit
Taking my own configuration file as an example:
/usr/local/coreseek/etc/csft_mysql.conf
#source definition
source mysql
{
    type = mysql
    sql_host = localhost
    sql_user = xxxx
    sql_pass = xxxx
    sql_db = xxxx
    sql_port = 3306
    sql_query_pre = SET NAMES utf8
    sql_query = SELECT id,id,uid,title,data FROM notebook_notepad
    #the first column (id) of sql_query must be an integer
    #title and data are string/text fields and are full-text indexed
    sql_attr_uint = id    #the value read from SQL must be an integer
    #sql_attr_timestamp = time    #the value read from SQL must be an integer; used as a time attribute
    sql_attr_uint = uid
    sql_query_info_pre = SET NAMES utf8    #set the correct character set for command-line queries
    sql_query_info = SELECT * FROM notebook_notepad WHERE id=$id    #for command-line queries, read the original row from the database
}

#index definition
index mysql
{
    source = mysql    #name of the corresponding source
    path = /usr/local/coreseek/var/data/mysql    #change to the absolute path you actually use, e.g. /usr/local/coreseek/var/...
    docinfo = extern
    mlock = 0
    morphology = none
    min_word_len = 1
    html_strip = 0
    #Chinese word segmentation settings; for details see: http://www.coreseek.cn/products-install/coreseek_mmseg/
    charset_dictpath = /usr/local/mmseg3/etc/    #for BSD and Linux; must end with /
    #charset_dictpath = etc/    #for Windows; must end with /, preferably an absolute path, e.g. C:/usr/local/coreseek/etc/...
    charset_type = zh_cn.utf-8
}

#global indexer definition
indexer
{
    mem_limit = 128M
}

#searchd service definition
searchd
{
    listen = 9312
    read_timeout = 5
    max_children = 30
    max_matches = 1000
    seamless_rotate = 0
    preopen_indexes = 0
    unlink_old = 1
    pid_file = /usr/local/coreseek/var/log/searchd_mysql.pid    #change to the absolute path you actually use, e.g. /usr/local/coreseek/var/...
    log = /usr/local/coreseek/var/log/searchd_mysql.log    #change to the absolute path you actually use, e.g. /usr/local/coreseek/var/...
    query_log = /usr/local/coreseek/var/log/query_mysql.log    #change to the absolute path you actually use, e.g. /usr/local/coreseek/var/...
}
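The configuration above asks several times for absolute paths that really exist. One way to double-check them is to pull the path-like settings out of the file and test their parent directories. This is a sketch only: conf_paths and check_conf_dirs are hypothetical helper names, and the parsing assumes the simple one-setting-per-line layout used in this article.

```shell
#!/bin/sh
# Hypothetical helper: print "key value" for each path-like setting in a
# Sphinx/coreseek config file. Commented-out lines are skipped and a
# trailing "# ..." comment on the line is dropped.
conf_paths() {
    awk -F= '
        /^[ \t]*#/ { next }
        {
            key = $1
            gsub(/[ \t]/, "", key)
            if (key == "path" || key == "pid_file" || key == "log" || key == "query_log") {
                val = substr($0, index($0, "=") + 1)
                sub(/#.*/, "", val)
                gsub(/^[ \t]+|[ \t]+$/, "", val)
                print key, val
            }
        }
    ' "$1"
}

# Hypothetical helper: report every setting whose parent directory is missing.
check_conf_dirs() {
    conf_paths "$1" | while read -r key val; do
        dir=$(dirname "$val")
        [ -d "$dir" ] || printf '%s: missing directory %s\n' "$key" "$dir"
    done
}

# Example:
# check_conf_dirs /usr/local/coreseek/etc/csft_mysql.conf
```

No output from check_conf_dirs means every parent directory was found; it does not check write permissions.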
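With this setup, searches run against an index built from the id, uid, title, and data columns (title and data are full-text indexed; id and uid are stored as attributes).
OK. Once the configuration is done, restart the Coreseek service and you can build the search index you want. From then on you are free of the limits of MySQL: Chinese or English both work, and word segmentation is built in. How about that, doesn't it open the door to a new world?
Below are some places where rebuilding the index can go wrong, and how to fix them. Read on if you are interested; otherwise skip to the next section: Testing Coreseek with PHP.
Rebuilding the index reports: WARNING: failed to open pid_file '/usr/local/coreseek/var/log/searchd_mysql.pid'.
Solution:
Try stopping the coreseek service:
/usr/local/coreseek/bin/searchd -c /usr/local/coreseek/etc/csft_mysql.conf --stop    #stop the service
Then start it again:
/usr/local/coreseek/bin/searchd -c /usr/local/coreseek/etc/csft_mysql.conf    #start the service
Build the index again:
/usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf --all    #build the index
If you see: FATAL: failed to lock /usr/local/coreseek/var/data/xxxx.spl: Resource temporarily unavailable, will not index. Try --rotate option.
then rebuild the index with --rotate:
/usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf --all --rotate    #rebuild the index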
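Both errors above come down to whether searchd is running when indexer starts: --rotate is the option to use while the service holds the index files. The choice can be made automatically from the pid file. A rough sketch, with searchd_running and indexer_opts as hypothetical helper names of my own; note that kill -0 only tests whether a signal could be sent, so it can report "not running" for a process owned by another user.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the pid file exists and names a live process.
searchd_running() {
    pid_file=$1
    [ -r "$pid_file" ] || return 1
    pid=$(cat "$pid_file")
    case $pid in
        ''|*[!0-9]*) return 1 ;;   # empty or not a number
    esac
    kill -0 "$pid" 2>/dev/null
}

# Hypothetical helper: print the indexer options to use, adding --rotate
# only while searchd appears to be running.
indexer_opts() {
    if searchd_running "$1"; then
        echo "--all --rotate"
    else
        echo "--all"
    fi
}

# Example (paths are the ones used in this article):
# /usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf \
#     $(indexer_opts /usr/local/coreseek/var/log/searchd_mysql.pid)
```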
Testing Coreseek with PHP
1. Put sphinxapi.php in the test directory
cp /usr/local/src/coreseek-3.2.14/testpack/api/sphinxapi.php ./
vim test.php
<?php
header("Content-type: text/html; charset=utf-8");
// Without the Sphinx PHP extension, the sphinxapi.php copied above has to be loaded here.
//require("./ ");
$s = new SphinxClient;
$s->setServer("127.0.0.1", 9312);
// SPH_MATCH_ALL: match all query words (default mode); SPH_MATCH_ANY: match any one of the query words; SPH_MATCH_EXTENDED2: supports queries with special operators
$s->setMatchMode(SPH_MATCH_ALL);
$s->setMaxQueryTime(30); // maximum search time, in milliseconds
$s->SetArrayResult(false); // false: matches stay keyed by document ID
$s->SetSelect("*"); // what to return, like the column list of an SQL SELECT
$s->SetRankingMode(SPH_RANK_BM25);
$s->SetLimits(0, 30, 1000, 0); // result set offset and limits
$res = $s->query('coreseek', 'mysql', '--single-0-query--'); // 'coreseek' is the keyword; 'mysql' is the index, named after its source
$err = $s->GetLastError();
echo '<pre>';
var_dump($res);
var_dump($res['matches']);
var_export($err);
echo '</pre>';
php5 test.php
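Result: matches is the set of matching results.
1. Installing the Sphinx extension
The official Coreseek tutorial suggests that PHP simply include a PHP file to talk to the service. In fact, PHP has a standalone sphinx module that can operate coreseek directly (coreseek is Sphinx!). It has made it into PHP's official function library and is more efficient. The PHP module does depend on the libsphinxclient package, though. I installed the Sphinx extension by following the steps in the article below.
Thanks to: http://blog.csdn.net/e421083458/article/details/21529969
[Step 1] Install the dependency libsphinxclient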
# cd /var/install/coreseek-4.1-beta/csft-4.1/api/libsphinxclient/
# ./configure --prefix=/usr/local/sphinxclient
configure: creating ./config.status
config.status: creating Makefile
config.status: error: cannot find input file: Makefile.in    #error: configure failed

//Handling the configure error
The build reported the error "config.status: error: cannot find input file: src/Makefile.in". Running the following commands and then building again got past it:
# aclocal
# libtoolize --force
# automake --add-missing
# autoconf
# autoheader
# make clean

//Run configure and build again (keep the same prefix as above, since step 2 points --with-sphinx at it)
# ./configure --prefix=/usr/local/sphinxclient
# make && make install
[Step 2] Install the sphinx PHP extension
http://pecl.php.net/package/sphinx
# wget http://pecl.php.net/get/sphinx-1.3.0.tgz
# tar zxvf sphinx-1.3.0.tgz
# cd sphinx-1.3.0
# phpize
# ./configure --with-php-config=/usr/bin/php-config --with-sphinx=/usr/local/sphinxclient
# make && make install
# cd /etc/php.d/
# cp gd.ini sphinx.ini
# vi sphinx.ini
extension=sphinx.so
# service php-fpm restart
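After restarting, a quick way to see whether the extension is loaded is to look for it in the module list printed by php -m. A small sketch, assuming the php command-line binary is installed; module_listed is a hypothetical helper name. Keep in mind that the command-line PHP and php-fpm may read different ini files, so this only confirms the CLI side.

```shell
#!/bin/sh
# Hypothetical helper: read module names on stdin (one per line, as printed
# by "php -m") and succeed if the given name is among them, ignoring case.
module_listed() {
    grep -qixF -- "$1"
}

# Example:
# php -m | module_listed sphinx && echo "sphinx extension loaded"
```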
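Once the Sphinx extension for PHP is installed, you can use $coreseek = new SphinxClient(); directly, without including the source file.
Briefly, here is my approach to querying coreseek from TP (ThinkPHP) and highlighting the keywords:
1. Use sphinx to get the set of matching id and uid values
2. Then run $sql = "select * from post where id in($ids)";$res = mysql_query($sql); to fetch the actual rows from the database
3. Use BuildExcerpts to highlight the keywords in title and data, then display the results page by page
Key code: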
$where = array();
$where['uid'] = $uid;
if (!empty($search)) {
    // There is something to search for, so look up the matching ids in coreseek
    $coreseek = new \SphinxClient();
    $coreseek->setServer("127.0.0.1", 9312);
    // SPH_MATCH_ALL: match all query words (default mode); SPH_MATCH_ANY: match any one of the query words; SPH_MATCH_EXTENDED2: supports queries with special operators
    $coreseek->setMatchMode(SPH_MATCH_ALL);
    $coreseek->setMaxQueryTime(30); // maximum search time, in milliseconds
    $coreseek->SetArrayResult(false); // false: matches stay keyed by document ID
    $coreseek->SetSelect("*"); // what to return, like the column list of an SQL SELECT
    $coreseek->SetLimits(0, 30, 1000, 0); // result set offset and limits
    $res = $coreseek->query($search, 'mysql', '--single-0-query--');
    // query() returns false on failure and may return no matches, so guard before reading the keys
    $matches = isset($res['matches']) ? $res['matches'] : array();
    $key = array_keys($matches);
    $where['id'] = array('in', $key);
    $coreseek->close();
} else {
    // No search term: filter by uid only
}
// Get the total number of rows
$total = $mod->where($where)->count();
Key code for highlighting:
if (!empty($search)) {
    $page->parameter['search'] = $search;
    // Keyword highlighting
    $opt = array("before_match" => "<font style='font-weight:bold;color:#f00'>", "after_match" => "</font>");
    $coreseek1 = new \SphinxClient();
    $coreseek1->setServer("127.0.0.1", 9312);
    $coreseek1->SetMatchMode(SPH_MATCH_ALL);
    $i = 0;
    $tags_title = array();
    foreach ($info as $key => $row) {
        $tags_title[] = $row['title'];
    }
    $replace_title = $coreseek1->BuildExcerpts($tags_title, 'mysql', $search, $opt);
    // This assumes $info has sequential keys starting at 0, matching the order of $tags_title
    foreach ($info as $key => &$row) {
        $info[$key]['title'] = $replace_title[$key];
    }
    unset($row); // drop the reference left behind by the foreach above
    $coreseek1->close();
}
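OK, at this point coreseek runs perfectly in TP, and this article can come to a close. These are the details from my own step-by-step installation, written down so that I don't forget them later, and I hope they help interested readers too. The article carries a lot of information; if I have missed anything, please point it out!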