一个文档包含了安装、增量备份、扩展、api调用示例,省去了查找大量文章的时间。 搭建coreseek(sphinx+mmseg3)安装 [第一步] 先安装mmseg3 cd /var/ install wget http: // www.coreseek.cn/uploads/csft/4.0/coreseek-4.1-beta.tar.gz tar zxvf coreseek- 4
一个文档包含了安装、增量备份、扩展、api调用示例,省去了查找大量文章的时间。
cd /var/<span>install</span> <span>wget</span> http:<span>//</span><span>www.coreseek.cn/uploads/csft/4.0/coreseek-4.1-beta.tar.gz</span> <span>tar</span> zxvf coreseek-<span>4.1</span>-beta.<span>tar</span><span>.gz cd coreseek</span>-<span>4.1</span>-<span>beta cd mmseg</span>-<span>3.2</span>.<span>14</span><span> .</span>/<span>bootstrap .</span>/configure --prefix=/usr/local/<span>mmseg3 </span><span>make</span> && <span>make</span> <span>install</span><span> 遇到的问题: error: cannot </span><span>find</span> input <span>file</span>: src/Makefile.<span>in</span><span> 或者遇到其他类似error错误时... 解决方案: 依次执行下面的命令,我运行</span><span>'</span><span>aclocal</span><span>'</span><span>时又出现了错误,解决方案请看下文描述 </span><span>yum</span> -y <span>install</span><span> libtool aclocal libtoolize </span>--<span>force automake </span>--add-<span>missing autoconf autoheader </span><span>make</span> clean
安装好'libtool'继续从'aclocal'开始执行上面提到的一串命令,执行完后再运行最开始的安装流程即可。
<span>##安装coreseek $ cd csft</span>-<span>3.2</span>.<span>14</span> 或者 cd csft-<span>4.0</span>.<span>1</span> 或者 cd csft-<span>4.1</span><span> $ </span><span>sh</span> buildconf.<span>sh</span><span> #输出的warning信息可以忽略,如果出现error则需要解决 $ .</span>/configure --prefix=/usr/local/coreseek --without-unixodbc --with-mmseg --with-mmseg-includes=/usr/local/mmseg3/include/mmseg/ --with-mmseg-libs=/usr/local/mmseg3/lib/ --with-<span>mysql ##如果提示mysql问题,可以查看MySQL数据源安装说明 http:</span><span>//</span><span>www.coreseek.cn/product_install/install_on_bsd_linux/#mysql</span> $ <span>make</span> && <span>make</span> <span>install</span><span> $ cd .. ##命令行测试mmseg分词,coreseek搜索(需要预先设置好字符集为zh_CN.UTF</span>-<span>8</span><span>,确保正确显示中文) $ cd testpack $ </span><span>cat</span> var/test/<span>test.xml #此时应该正确显示中文 $ </span>/usr/local/mmseg3/bin/mmseg -d /usr/local/mmseg3/etc var/test/<span>test.xml $ </span>/usr/local/coreseek/bin/indexer -c etc/csft.conf --<span>all $ </span>/usr/local/coreseek/bin/search -c etc/csft.conf 网络搜索
出现这个 xmlpipe2 support NOT compiled in. To use xmlpipe2, install missing XML libra 错误
执行以下命令:
<span>yum</span> -y <span>install</span> expat expat-devel
依次安装后,从新编译coreseek,然后再生成索引,就可以通过了。
结果如下:
Coreseek Fulltext <span>4.1</span> [ Sphinx <span>2.0</span>.<span>2</span>-<span>dev (r2922)] Copyright (c) </span><span>2007</span>-<span>2011</span><span>, Beijing Choice Software Technologies Inc (http:</span><span>//</span><span>www.coreseek.com) </span> <span> using config </span><span>file</span> <span>'</span><span>etc/csft.conf</span><span>'</span><span>... index </span><span>'</span><span>xml</span><span>'</span>: query <span>'</span><span>网络搜索 </span><span>'</span>: returned <span>1</span> matches of <span>1</span> total <span>in</span> <span>0.000</span><span> sec displaying matches: </span><span>1</span>. document=<span>1</span>, weight=<span>1590</span>, published=Thu Apr <span>1</span> <span>07</span>:<span>20</span>:<span>07</span> <span>2010</span>, author_id=<span>1</span><span> words: </span><span>1</span>. <span>'</span><span>网络</span><span>'</span>: <span>1</span> documents, <span>1</span><span> hits </span><span>2</span>. <span>'</span><span>搜索</span><span>'</span>: <span>2</span> documents, <span>5</span> hits
创建sphinx统计表,在coreseek_test库中执行。
<span>CREATE TABLE sph_counter ( counter_id INTEGER PRIMARY KEY NOT NULL, max_doc_id INTEGER NOT NULL );</span>
创建配置sphinx与mysql的配置文件
# <span>vi</span> /usr/local/coreseek/etc/csft_mysql.conf
#MySQL数据源配置,详情请查看:http:<span>//</span><span>www.coreseek.cn/docs/coreseek_4.1-sphinx_2.0.1-beta.html#conf-reference</span> <span> #源定义 source main { type </span>=<span> mysql sql_host </span>=<span> localhost sql_user </span>=<span> root sql_pass </span>= <span>123456</span><span> sql_db </span>=<span> coreseek_test sql_port </span>= <span>3306</span><span> sql_query_pre </span>=<span> SET NAMES utf8 sql_query_pre </span>= REPLACE INTO sph_counter SELECT <span>1</span>,MAX(<span>id</span><span>) FROM hr_spider_company; sql_query </span>= SELECT * FROM hr_spider_company WHERE <span>id</span>1<span> ) #sql_query第一列id需为整数 #title、content作为字符串</span>/<span>文本字段,被全文索引 sql_attr_uint </span>= <span>id</span><span> #从SQL读取到的值必须为整数 sql_attr_uint </span>=<span> from_id #从SQL读取到的值必须为整数,不支持全文检索 sql_attr_uint </span>=<span> link_id #从SQL读取到的值必须为整数,不支持全文检索 sql_attr_uint </span>=<span> add_time #从SQL读取到的值必须为整数,不支持全文检索 sql_field_string </span>=<span> link_url #字符串字段(可全文搜索,可返回原始文本信息) sql_field_string </span>=<span> company_name #字符串字段(可全文搜索,可返回原始文本信息) sql_field_string </span>=<span> type_name #字符串字段(可全文搜索,可返回原始文本信息) sql_field_string </span>=<span> trade_name #字符串字段(可全文搜索,可返回原始文本信息) sql_field_string </span>=<span> email #字符串字段(可全文搜索,可返回原始文本信息) sql_field_string </span>=<span> description #字符串字段(可全文搜索,可返回原始文本信息) sql_query_info_pre </span>=<span> SET NAMES utf8 #命令行查询时,设置正确的字符集 sql_query_info </span>= SELECT <span>id</span>,from_id,link_id,company_name,type_name,trade_name,address,description, FROM_UNIXTIME(add_time) AS add_time FROM hr_spider_company WHERE <span>id</span>=$<span>id</span><span> #命令行查询时,从数据库读取原始数据信息 } source delta : main { sql_query_pre </span>=<span> SET NAMES utf8 sql_query </span>= SELECT * FROM hr_spider_company WHERE <span>id</span>>( SELECT max_doc_id FROM sph_counter WHERE counter_id=<span>1</span><span> ) sql_query_post_index </span>= REPLACE INTO sph_counter SELECT <span>1</span>,MAX(<span>id</span><span>) FROM hr_spider_company } #index定义 index main { source </span>=<span> main #对应的source名称 path </span>= /usr/local/coreseek/var/data/mysql #请修改为实际使用的绝对路径,例如:/usr/local/coreseek/var/<span>... docinfo </span>=<span> extern mlock </span>= <span>0</span><span> morphology </span>=<span> none min_word_len </span>= <span>1</span><span> html_strip </span>= <span>0</span><span> #中文分词配置,详情请查看:http:</span><span>//</span><span>www.coreseek.cn/products-install/coreseek_mmseg/</span> charset_dictpath = /usr/local/mmseg3/etc/ #BSD、Linux环境下设置,/<span>符号结尾 charset_type </span>= zh_cn.utf-<span>8</span><span> } index delta : main { source </span>=<span> delta path </span>= /usr/local/coreseek/var/data/<span>delta } #全局index定义 indexer { mem_limit </span>=<span> 128M } #searchd服务定义 searchd { listen </span>= <span>9312</span><span> read_timeout </span>= <span>5</span><span> max_children </span>= <span>30</span><span> max_matches </span>= <span>1000</span><span> seamless_rotate </span>= <span>0</span><span> preopen_indexes </span>= <span>0</span><span> unlink_old </span>= <span>1</span><span> pid_file </span>= /usr/local/coreseek/var/log/searchd_mysql.pid #请修改为实际使用的绝对路径,例如:/usr/local/coreseek/var/<span>... log </span>= /usr/local/coreseek/var/log/searchd_mysql.log #请修改为实际使用的绝对路径,例如:/usr/local/coreseek/var/<span>... query_log </span>= /usr/local/coreseek/var/log/query_mysql.log #请修改为实际使用的绝对路径,例如:/usr/local/coreseek/var/<span>... binlog_path </span>=<span> #关闭binlog日志 }</span>
我的测试表名为hr_spider_company,你只需要根据实际需求更改为自己的表名即可。
调用命令列表:
启动后台服务(必须开启)
# /usr/local/coreseek/bin/searchd -c /usr/local/coreseek/etc/csft_mysql.conf
执行索引(查询、测试前必须执行一次)
/usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf --all --rotate
执行增量索引
/usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf delta --rotate
合并索引
/usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf --merge main delta --rotate --merge-dst-range deleted <span>0</span> <span>0</span>
(为了防止多个关键字指向同一个文档加上--merge-dst-range deleted 0 0)
后台服务测试
# /usr/local/coreseek/bin/search -c /usr/local/coreseek/etc/csft_mysql.conf aaa
关闭后台服务
# /usr/local/coreseek/bin/searchd -c /usr/local/coreseek/etc/csft_mysql.conf --stop
自动化命令:
crontab -e
*/1 * * * * /bin/sh /usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf delta --rotate */5 * * * * /bin/sh /usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf --merge main delta --rotate --merge-dst-range deleted <span>0</span> <span>0</span> 30 1 * * * /bin/sh /usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf --all --rotate
以下任务计划的意思是:每隔一分钟执行一遍增量索引,每五分钟执行一遍合并索引,每天1:30执行整体索引。
Coreseek官方教程中建议php使用直接include一个php文件进行操作,事实上php有独立的sphinx模块可以直接操作
coreseek(coreseek就是sphinx!)已经进入了php的官方函数库,而且效率的提升不是一点点!但php模块依赖于
libsphinxclient包。
# cd /var/<span>install</span>/coreseek-<span>4.1</span>-beta/csft-<span>4.1</span>/api/libsphinxclient/<span> # .</span>/configure --prefix=/usr/local/<span>sphinxclient configure: creating .</span>/<span>config.status config.status: creating Makefile config.status: error: cannot </span><span>find</span> input <span>file</span>: Makefile.<span>in</span><span> #报错configure失败 </span><span>//</span><span>处理configure报错</span> 编译过程中报了一个config.status: error: cannot <span>find</span> input <span>file</span>: src/<span>Makefile.in这个的错误,然后运行下列指令再次编译就能通过了: # aclocal # libtoolize </span>--<span>force # automake </span>--add-<span>missing # autoconf # autoheader # </span><span>make</span><span> clean </span><span>//</span><span>从新configure编译</span> # ./<span>configure # </span><span>make</span> && <span>make</span> <span>install</span>
http:<span>//</span><span>pecl.php.net/package/sphinx</span> # <span>wget</span> http:<span>//</span><span>pecl.php.net/get/sphinx-1.3.0.tgz</span> # <span>tar</span> zxvf sphinx-<span>1.3</span>.<span>0</span><span>.tgz # cd sphinx</span>-<span>1.3</span>.<span>0</span><span> # phpize # .</span>/configure --with-php-config=/usr/bin/php-config --with-sphinx=/usr/local/<span>sphinxclient # </span><span>make</span> && <span>make</span> <span>install</span><span> # cd </span>/etc/php.d/<span> # </span><span>cp</span><span> gd.ini sphinx.ini # </span><span>vi</span><span> sphinx.ini extension</span>=<span>sphinx.so # service php</span>-fpm restart
打开phpinfo看一下是否已经支持了sphinx模块。
<span>php </span><span>$s</span> = <span>new</span><span> SphinxClient; </span><span>$s</span>->setServer("127.0.0.1", 9312<span>); </span><span>$s</span>-><span>setMatchMode(SPH_MATCH_PHRASE); </span><span>$s</span>->setMaxQueryTime(30<span>); </span><span>$res</span> = <span>$s</span>->query("宝马",'main'); <span>#</span><span>[宝马]关键字,[main]数据源source</span> <span>$err</span> = <span>$s</span>-><span>GetLastError(); </span><span>var_dump</span>(<span>array_keys</span>(<span>$res</span>['matches'<span>])); </span><span>echo</span> "<br>"."通过获取的ID来读取数据库中的值即可。"."<br>"<span>; </span><span>echo</span> '<pre class="brush:php;toolbar:false">'<span>; </span><span>var_dump</span>(<span>$res</span><span>); </span><span>var_dump</span>(<span>$err</span><span>); </span><span>echo</span> '
调用示例二:支持分页
<span>php </span><span>header</span>("Content-type: text/html; charset=utf-8"<span>); </span><span>require</span>("./sphinxapi.php"<span>); </span><span>$s</span> = <span>new</span><span> SphinxClient; </span><span>$s</span>->setServer("192.168.252.132", 9312<span>); </span><span>//</span><span>SPH_MATCH_ALL, 匹配所有查询词(默认模式); SPH_MATCH_ANY, 匹配查询词中的任意一个; SPH_MATCH_EXTENDED2, 支持特殊运算符查询</span> <span>$s</span>-><span>setMatchMode(SPH_MATCH_ALL); </span><span>$s</span>->setMaxQueryTime(30); <span>//</span><span>设置最大搜索时间</span> <span>$s</span>->SetArrayResult(<span>false</span>); <span>//</span><span>是否将Matches的key用ID代替</span> <span>$s</span>->SetSelect ( "*" ); <span>//</span><span>设置返回信息的内容,等同于SQL</span> <span>$s</span>->SetRankingMode(SPH_RANK_BM25); <span>//</span><span>设置评分模式,SPH_RANK_BM25可能使包含多个词的查询的结果质量下降。 //$s->SetSortMode(SPH_SORT_EXTENDED); //发现增加此参数会使结果不准确 //$s->SetSortMode(SPH_SORT_EXTENDED,"from_id asc,id desc"); //设置匹配项的排序模式, SPH_SORT_EXTENDED按一种类似SQL的方式将列组合起来,升序或降序排列。</span> <span>$weights</span> = <span>array</span> ('company_name' => 20<span>); </span><span>$s</span>->SetFieldWeights(<span>$weights</span>); <span>//</span><span>设置字段权重</span> <span>$s</span>->SetLimits ( 0, 30, 1000, 0 ); <span>//</span><span>设置结果集偏移量 SetLimits (便宜量,匹配项数目,查询的结果集数默认1000,阀值达到后停止) //$s->SetFilter ( $attribute, $values, $exclude=false ); //设置属性过滤 //$s->SetGroupBy ( $attribute, $func, $groupsort="@group desc" ); //设置分组的属性</span> <span>$res</span> = <span>$s</span>->query('@* "汽车"','main','--single-0-query--'); <span>#</span><span>[宝马]关键字,[news]数据源source //代码高亮</span> <span>$tags</span> = <span>array</span><span>(); </span><span>$tags_name</span> = <span>array</span><span>(); </span><span>foreach</span>(<span>$res</span>['matches'] <span>as</span> <span>$key</span>=><span>$value</span><span>){ </span><span>$tags</span>[] = <span>$value</span>['attrs'<span>]; </span><span>$company_name</span>[] = <span>$value</span>['attrs']['company_name'<span>]; </span><span>$description</span>[] = <span>$value</span>['attrs']['description'<span>]; } </span><span>$company_name</span> = <span>$s</span>->BuildExcerpts (<span>$company_name</span>, 'main', '汽车', <span>$opts</span>=<span>array</span>() ); <span>//</span><span>执行高亮,这里索引名字千万不能用*</span> <span>$description</span> = <span>$s</span>->BuildExcerpts (<span>$description</span>, 'main', '汽车', <span>$opts</span>=<span>array</span>() ); <span>//</span><span>执行高亮,这里索引名字千万不能用*</span> <span>foreach</span>(<span>$tags</span> <span>as</span> <span>$k</span>=><span>$v</span><span>) { </span><span>$tags</span>[<span>$k</span>]['company_name'] = <span>$company_name</span>[<span>$k</span>]; <span>//</span><span>高亮后覆盖</span> <span>$tags</span>[<span>$k</span>]['description'] = <span>$description</span>[<span>$k</span>]; <span>//</span><span>高亮后覆盖</span> <span> } </span><span>//</span><span> 高亮后覆盖</span> <span>$i</span> = 0<span>; </span><span>foreach</span>(<span>$res</span>['matches'] <span>as</span> <span>$key</span>=><span>$value</span><span>){ </span><span>$res</span>['matches'][<span>$key</span>] = <span>$tags</span>[<span>$i</span><span>]; </span><span>$i</span>++<span>; } </span><span>$err</span> = <span>$s</span>-><span>GetLastError(); </span><span>echo</span> '<pre class="brush:php;toolbar:false">'<span>; </span><span>var_export</span>(<span>$res</span><span>); </span><span>var_export</span>(<span>$err</span><span>); </span><span>echo</span> '
还有很对地方需要参考:http://www.coreseek.cn/docs/coreseek_4.1-sphinx_2.0.1-beta.html#api-reference