
Enabling Sphinx full-text search, with a working example

WBOY
Release: 2016-05-24 13:25:36

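When I tried to compile Sphinx from source, the build printed a lot of garbled Chinese text and finally stopped with an error. Instead I went to the official site and downloaded an RPM package, which installed without any trouble. I didn't bother investigating the build error, since I was busy with development.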

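Two packages are needed. The first is mmseg, the program that builds the Chinese dictionary. The second is csft (Coreseek), a Sphinx distribution with Chinese support.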

Installing them with rpm -ivh went smoothly and took less than half a minute.

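For the Chinese dictionary, I simply downloaded the ready-made one from the csft (Coreseek) official site, which is nicely thought out. It consists of two files: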

unigram.txt  uni.lib 

unigram.txt is the dictionary source text, and you can add your own keywords to it.

Then run mmseg -u unigram.txt to build the dictionary file unigram.txt.uni, and rename it to uni.lib. That is the dictionary file Sphinx recognizes.
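Adding a keyword such as 名卡 to unigram.txt matters because a dictionary-based segmenter can only emit words it knows. A keyword missing from the dictionary tends to be broken into smaller pieces. The sketch below illustrates the effect with plain forward maximum matching. That is a simpler technique than the MMSEG algorithm mmseg actually implements. The function fmm_segment and the sample dictionaries are hypothetical.

```php
<?php
// Simplified forward maximum matching: at each position take the longest
// dictionary word (up to $maxLen characters), otherwise a single character.
// Illustration only; MMSEG itself applies extra disambiguation rules.
function fmm_segment(string $text, array $dict, int $maxLen = 4): array
{
    // Split the UTF-8 string into characters (PCRE /u mode, no mbstring needed).
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    $lookup = array_flip($dict);   // word => position, for fast membership tests
    $tokens = [];
    $n = count($chars);
    $i = 0;
    while ($i < $n) {
        $step = 1;                 // fallback: emit one character
        for ($len = min($maxLen, $n - $i); $len > 1; $len--) {
            $candidate = implode('', array_slice($chars, $i, $len));
            if (isset($lookup[$candidate])) {
                $step = $len;      // longest dictionary match wins
                break;
            }
        }
        $tokens[] = implode('', array_slice($chars, $i, $step));
        $i += $step;
    }
    return $tokens;
}

// Without 名卡 in the dictionary the keyword falls apart into single characters;
// with it, the keyword survives as one token.
print_r(fmm_segment('查询名卡', ['查询']));
print_r(fmm_segment('查询名卡', ['查询', '名卡']));
```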

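Where does it go? Into the dictionary directory configured in sphinx.conf (charset_dictpath, covered below). With that, the setup is basically done. Next, here are the useful Sphinx programs: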

[root@beihai365 /]# csft-

csft-indexer  csft-search   csft-searchd

csft-indexer builds the full-text search index.

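csft-search is for testing whether search works, and it is very handy. For example, before I had written any client script, I could already use it to check that full-text search was working.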

csft-searchd is the Sphinx search daemon. Once it is running, scripts in PHP, Python and so on can send queries to it.
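Before pointing a script at the daemon, it can help to confirm that something is listening. This is a minimal sketch. It assumes searchd runs on localhost port 3312, the port the client script later in this post connects to. searchd_reachable is a hypothetical helper built on PHP's fsockopen(). It only shows that a TCP port accepts connections, not that the listener is actually searchd.

```php
<?php
// Hypothetical helper: does anything accept TCP connections on host:port?
function searchd_reachable(string $host, int $port, float $timeout = 1.0): bool
{
    // @ suppresses the warning fsockopen() emits when the connection is refused.
    $conn = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($conn === false) {
        return false;
    }
    fclose($conn);
    return true;
}

if (!searchd_reachable('localhost', 3312)) {
    echo "Nothing is listening on localhost:3312; start csft-searchd first\n";
}
```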

That's all there is to it. Now let's look at the two key pieces.

The sphinx.conf configuration file:

source tmsgs
{
    type                                    = mysql
    sql_host                                = localhost
    sql_user                                = root
    sql_pass                                = 1
    sql_db                                  = phpwind75sp3
    sql_port                                = 3306  # optional, default is 3306
    #sql_sock                                = /tmp/mysql3307.sock
    sql_query_pre                           = SET NAMES gbk
    # the first column (id) is the document ID; name and type become full-text fields
    sql_query                               = SELECT id,name,type,stock FROM pw_tools
    #sql_attr_uint                          = id
    # stock is stored as an unsigned integer attribute (returned under 'attrs')
    sql_attr_uint                           = stock
}
index tmsgsindex
{
    source                                  = tmsgs
    # path and file-name prefix of the index files csft-indexer writes
    path                                    = /var/mmseg/searchdata/beihai365
    docinfo                                 = extern
    # GBK, matching the SET NAMES gbk used when fetching the data
    charset_type                            = zh_cn.gbk
    #min_prefix_len  = 0
    #min_infix_len  = 2
    #ngram_len = 2
    # directory that holds the mmseg dictionary uni.lib
    charset_dictpath                        = /var/mmseg/data
    #min_prefix_len                          = 0
    #min_infix_len                           = 0
    #min_word_len                            = 2
}
indexer
{
    mem_limit                               = 128M
}
searchd
{
    #listen                                = 3312
    log                                 = /var/log/searchd.log
    query_log                           = /var/log/query.log
    read_timeout                        = 5
    max_children                        = 30
    pid_file                            = /var/log/searchd.pid
    max_matches                         = 1000
    #seamless_rotate                     = 1
    #preopen_indexes                     = 0
    #unlink_old                          = 1
}
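The column layout of sql_query determines what ends up in the index. In Sphinx the first column is the document ID. Columns declared with sql_attr_* (here stock) are stored as attributes, and the remaining columns (name, type) are indexed as full-text fields. The helper classify_columns below is hypothetical and simply restates that rule. It is not part of Sphinx.

```php
<?php
// Hypothetical helper restating how Sphinx treats sql_query columns:
// first column = document ID, sql_attr_* columns = attributes, rest = full-text fields.
function classify_columns(array $columns, array $attrColumns): array
{
    $docId = array_shift($columns);   // first column: unique positive integer document ID
    $fields = [];
    $attrs = [];
    foreach ($columns as $col) {
        if (in_array($col, $attrColumns, true)) {
            $attrs[] = $col;          // stored as an attribute
        } else {
            $fields[] = $col;         // tokenized and indexed as full text
        }
    }
    return ['id' => $docId, 'fields' => $fields, 'attrs' => $attrs];
}

// The sql_query / sql_attr_uint pair from the sphinx.conf above.
$layout = classify_columns(['id', 'name', 'type', 'stock'], ['stock']);
print_r($layout);
```

The result lines up with the fields and attrs entries in the query result shown further down.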

Next, the test client script:

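<?php
header("Content-type:text/html;charset=utf-8");
// sphinxapi.php is the PHP client API shipped with Sphinx/Coreseek
include 'sphinxapi.php';

$cl = new SphinxClient();
$cl->SetServer('localhost', 3312);      // host and port where csft-searchd listens
$cl->SetMatchMode(SPH_MATCH_ALL);       // a document must contain all query words
$cl->SetArrayResult(true);              // return 'matches' as a list, not an id-keyed map
$res = $cl->Query("名卡", "*");          // "*" searches all indexes
if ($res === false) {
    echo 'Query failed: ' . $cl->GetLastError();
    exit(1);
}
print_r($res);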


I added the keyword "名卡" to the dictionary by hand to check whether it can really be found. The output is:

Array
(
    [error] =>
    [warning] =>
    [status] => 0
    [fields] => Array
        (
            [0] => name
            [1] => type
        )
    [attrs] => Array
        (
            [stock] => 1
        )
    [matches] => Array
        (
            [0] => Array
                (
                    [id] => 8
                    [weight] => 1
                    [attrs] => Array
                        (
                            [stock] => 100
                        )
                )
        )
    [total] => 1
    [total_found] => 1
    [time] => 0.018
    [words] => Array
        (
            [名卡] => Array
                (
                    [docs] => 1
                    [hits] => 1
                )
        )
)
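print_r() is fine for a first test, but a real page usually needs the matching document IDs so it can load the full rows from MySQL. This is a minimal sketch. It assumes the array layout produced by SetArrayResult(true), as in the output above. matched_ids is a hypothetical helper name.

```php
<?php
// Hypothetical helper: pull the matching document IDs out of a Query() result.
// With SetArrayResult(true), 'matches' is a list of ['id' => ..., 'weight' => ..., 'attrs' => [...]].
function matched_ids(array $res): array
{
    // Guard against a missing or empty 'matches' key when nothing was found.
    if (empty($res['matches'])) {
        return [];
    }
    // intval() makes the IDs safe to splice into SQL below.
    return array_map('intval', array_column($res['matches'], 'id'));
}

// The result shown above, reduced to the keys used here.
$res = [
    'matches' => [
        ['id' => 8, 'weight' => 1, 'attrs' => ['stock' => 100]],
    ],
    'total_found' => 1,
];

$ids = matched_ids($res);
if ($ids) {
    // Load full rows from pw_tools, the table indexed in sphinx.conf.
    echo 'SELECT id,name,type,stock FROM pw_tools WHERE id IN (' . implode(',', $ids) . ")\n";
}
```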


It works: the keyword was found. A few key commands:

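[root@beihai365 /]# csft-searchd --stop     (stops the search daemon)
[root@beihai365 /]# csft-indexer --all      (builds every index defined in the config; to build a single index, name it instead, e.g. csft-indexer xx)
[root@beihai365 /]# csft-search App         (searches for the keyword App; the output below shows it matched no documents and produced no hits)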
Coreseek Full Text Server 3.1
Copyright (c) 2006-2008 coreseek.com
using config file './csft.conf'...
1,
pt:1, 1;        index 'tmsgsindex': query 'App ': returned 0 matches of 0 total in 0.017 sec
words:
1. 'app': 0 documents, 0 hits
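The words section is the quickest way to tell why a query came back empty. A keyword with 0 documents is not in the index at all, and the output shows that App was lowercased to app. A script can read the same statistics from the words key of a Query() result. The sketch below formats them the way csft-search prints them. word_stats is a hypothetical helper, and the sample $res is constructed to mirror the App query above.

```php
<?php
// Hypothetical helper: format the per-keyword statistics from a Query() result
// in the same style as csft-search's "words:" section.
function word_stats(array $res): array
{
    $lines = [];
    $n = 1;
    foreach ($res['words'] ?? [] as $word => $info) {
        $lines[] = sprintf("%d. '%s': %d documents, %d hits", $n++, $word, $info['docs'], $info['hits']);
    }
    return $lines;
}

// Mirrors the App query above: a keyword with 0 documents is not in the index.
$res = ['words' => ['app' => ['docs' => 0, 'hits' => 0]]];
echo implode("\n", word_stats($res)), "\n";
```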

 

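When running these commands you will notice that each one needs the configuration file path passed with --config sphinx.conf, which is inconvenient. So I simply created a symlink to it with ln -s in the current directory (./). The tools then find it on their own, as the "using config file './csft.conf'" line in the output above shows, so I no longer have to type --config every time.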


source:php.cn