【原创】用coreseek快速搭建sphinx中文分词搜索引擎
以下内容基于linux 系统。 yum -y install glibc-common libtool autoconf automake mysql-devel expat-devel#如果不安装这个 可能下面 sh buildconf.sh会报错!!!cd /data/srctar -xjf ../software/autoconf-2.64.tar.bz2cd autoconf-2.64/./configuremak
以下内容基于linux 系统。
yum -y install glibc-common libtool autoconf automake mysql-devel expat-devel #如果不安装这个 可能下面 sh buildconf.sh会报错!!! cd /data/src tar -xjf ../software/autoconf-2.64.tar.bz2 cd autoconf-2.64/ ./configure make && make install cd ../ cd /data/software wget http://www.coreseek.cn/uploads/csft/4.0/coreseek-4.1-beta.tar.gz cd /data/src tar zxf ../software/coreseek-4.1-beta.tar.gz cd coreseek-4.1-beta/mmseg-3.2.14 ./bootstrap ./configure --prefix=/usr/local/mmseg3 make && make install cd ../ cd /data/src/coreseek-4.1-beta/csft-4.1/ sh buildconf.sh ./configure --prefix=/usr/local/coreseek --without-unixodbc --with-mmseg --with-mmseg-includes=/usr/local/mmseg3/include/mmseg/ --with-mmseg-libs=/usr/local/mmseg3/lib/ --without-mysql make && make install cd ../ ##测试mmseg分词,coreseek搜索(需要预先设置好字符集为zh_CN.UTF-8,确保正确显示中文) cd testpack cat var/test/test.xml #此时应该正确显示中文 /usr/local/mmseg3/bin/mmseg -d /usr/local/mmseg3/etc var/test/test.xml /usr/local/coreseek/bin/indexer -c etc/csft.conf --all /usr/local/coreseek/bin/search -c etc/csft.conf 网络搜索 #创建sphinx创建索引的脚本: mkdir -p /data/sh/other
vi /data/sh/other/sphinx_update_index.sh
#!/bin/bash CONFFILE=/usr/local/coreseek/etc/sphinx_index.conf /bin/sed s#var\/data\/#var\/data2\/#g ${CONFFILE} > ${CONFFILE}.2 mkdir -p /usr/local/coreseek/var/data2 #/usr/local/coreseek/bin/indexer --config ${CONFFILE}.2 --all --rotate /usr/local/coreseek/bin/indexer --config ${CONFFILE}.2 --all pkill -9 searchd sleep 4 /bin/rm -rf /usr/local/coreseek/var/data/ /bin/mv /usr/local/coreseek/var/data2/ /usr/local/coreseek/var/data/ sleep 2 /usr/local/coreseek/bin/searchd --config ${CONFFILE}
chmod 755 /data/sh/other/sphinx_update_index.sh
#配置sphinx索引参数配置
vi /usr/local/coreseek/etc/sphinx_index.conf
################################### PHPCMS ############################################ source cc_phpcms { type = mysql sql_host = 172.26.11.75 #此处请改成您的真实配置 sql_user = phpcms #此处请改成您的真实配置 sql_pass = 123456 #此处请改成您的真实配置 sql_db = phpcms #此处请改成您的真实配置 sql_port= 3306 #此处请改成您的真实配置 sql_query_pre = SET SESSION query_cache_type=OFF sql_query_pre = SET character_set_client = 'gbk' sql_query_pre = SET character_set_connection ='gbk' sql_query_pre = SET character_set_results ='utf8' sql_query = SELECT `id`,`catid`,`typeid`,`title`,`status`,`updatetime` from `i_news` #此处请改成您的真实配置 sql_range_step = 1000 sql_attr_timestamp = updatetime sql_attr_uint = catid sql_attr_uint = typeid sql_attr_uint = status sql_query_post = sql_ranged_throttle= 0 } index cc_phpcms { source = cc_phpcms path = /dev/shm/cc_phpcms #放这里比较好,因为这里是linux的内存区! docinfo = extern mlock = 0 enable_star = 1 morphology = none stopwords = min_word_len = 1 charset_dictpath = /usr/local/mmseg3/etc/ #注意此处 charset_type = zh_cn.utf-8 #注意此处 html_strip = 1 html_remove_elements = style, script html_index_attrs = img=alt,title; a=title; } #################################### SETTING ############################################ indexer { mem_limit = 300M } searchd { # address = 0.0.0.0 #listen = 3312 #listen = 9312 #listen = 9306:mysql41 port = 3312 log = /usr/local/coreseek/var/log/searchd.log query_log = /usr/local/coreseek/var/log/query.log read_timeout = 5 max_children = 30 pid_file = /usr/local/coreseek/var/log/searchd.pid max_matches = 1000 seamless_rotate = 1 }
#接下来实现数据源支持:让sphinx支持MySQL数据源
yum -y install mysql-devel libxml2-devel expat-devel cd /data/src/coreseek-4.1-beta/csft-4.1/ make clean sh buildconf.sh ./configure --prefix=/usr/local/coreseek --without-unixodbc --with-mmseg --with-mmseg-includes=/usr/local/mmseg3/include/mmseg/ --with-mmseg-libs=/usr/local/mmseg3/lib/ --with-mysql make && make install cd ../
##如果出现错误提示:“ERROR: cannot find MySQL include files…….To disable MySQL support, use –without-mysql option.“,可按照如下方法处理:
##请找到头文件mysql.h所在的目录,一般是/usr/local/mysql/include,请替换为实际的
##请找到库文件libmysqlclient.a所在的目录,一般是/usr/local/mysql/lib,请替换为实际的
##configure参数加上:–with-mysql-includes=/usr/local/mysql/include –with-mysql-libs=/usr/local/mysql/lib,执行后,重新编译安装
#跑sphinx服务脚本
/data/sh/other/sphinx_update_index.sh
好了,如果一切正常,将会顺利看到创建索引的信息如下:
下面写一段php代码进行测试(基于sphinx php 的api方式):
$page = (int)$_GET['page']; $page = ($page==0)?1:$page; $perpage = 200; $start = ($page -1) * $perpage; $keyword = urldecode($_GET['key']); require_once (S_ROOT . './api/sphinxapi.php');//请改成您的真实路径 $groupby = ""; $groupsort = "@group desc"; $filter = "fieldid"; $filtervals = array (); $distinct = ""; $sortby = ""; $cl = new SphinxClient(); $cl->SetServer("localhost", 3312); $cl->SetWeights(array ( 100, 1 )); $cl->SetMatchMode(SPH_MATCH_ANY); if (count($filtervals)) { $cl->SetFilter($filter, $filtervals); } if ($groupby) { $cl->SetGroupBy($groupby, SPH_GROUPBY_ATTR, $groupsort); } $order = 1; if ($order == 0) { //按时间倒序 $cl->SetSortMode(SPH_SORT_ATTR_DESC, "inputtime"); } elseif ($order == 1) { //按相关度排序 $cl->SetSortMode(SPH_SORT_RELEVANCE); } if ($distinct) { $cl->SetGroupDistinct($distinct); } $cl->SetLimits($start, $perpage, ($limit > 1000) ? $limit : 1000); $cl->SetRankingMode(SPH_RANK_PROXIMITY_BM25); $cl->SetArrayResult(true); $res = $cl->Query($keyword, 'cc_phpcms'); print_r($res);die;
上面的php代码没有做输入的字符过滤,这个请按自己的需要加上。
另外,
/data/sh/other/sphinx_update_index.sh 跑了一次后,
请
vi /data/sh/other/sphinx_update_index.sh
将
#/usr/local/coreseek/bin/indexer --config ${CONFFILE}.2 --all --rotate /usr/local/coreseek/bin/indexer --config ${CONFFILE}.2 --all
变成
/usr/local/coreseek/bin/indexer --config ${CONFFILE}.2 --all --rotate #/usr/local/coreseek/bin/indexer --config ${CONFFILE}.2 --all
也就是将注释调换,这样以后就可以设定个定时计划跑/data/sh/other/sphinx_update_index.sh 脚本了,
跑了/sphinx_update_index.sh 脚本后,自动会用–rotate的方式重建索引,也就是说新增加的内容也将会被索引到了。
当然,最好的方法还是做个实时索引的配置,下一篇将会重点介绍sphinx的实时索引功能!
原文地址:【原创】用coreseek快速搭建sphinx中文分词搜索引擎, 感谢原作者分享。

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics



Call of Duty Warzone is a newly launched mobile game. Many players are very curious about how to set the language of this game to Chinese. In fact, it is very simple. Players only need to download the Chinese language pack, and then You can modify it after using it. The detailed content can be learned in this Chinese setting method introduction. Let us take a look together. How to set the Chinese language for the mobile game Call of Duty: Warzone 1. First enter the game and click the settings icon in the upper right corner of the interface. 2. In the menu bar that appears, find the [Download] option and click it. 3. Select [SIMPLIFIEDCHINESE] (Simplified Chinese) on this page to download the Simplified Chinese installation package. 4. Return to the settings

Excel spreadsheet is one of the office software that many people are using now. Some users, because their computer is Win11 system, so the English interface is displayed. They want to switch to the Chinese interface, but they don’t know how to operate it. To solve this problem, this issue The editor is here to answer the questions for all users. Let’s take a look at the content shared in today’s software tutorial. Tutorial for switching Excel to Chinese: 1. Enter the software and click the "File" option on the left side of the toolbar at the top of the page. 2. Select "options" from the options given below. 3. After entering the new interface, click the “language” option on the left

How to display Chinese characters correctly in PHPDompdf When using PHPDompdf to generate PDF files, it is a common challenge to encounter the problem of garbled Chinese characters. This is because the font library used by Dompdf by default does not contain Chinese character sets. In order to display Chinese characters correctly, we need to manually set the font of Dompdf and make sure to select a font that supports Chinese characters. Here are some specific steps and code examples to solve this problem: Step 1: Download the Chinese font file First, we need

VSCode Setup in Chinese: A Complete Guide In software development, Visual Studio Code (VSCode for short) is a commonly used integrated development environment. For developers who use Chinese, setting VSCode to the Chinese interface can improve work efficiency. This article will provide you with a complete guide, detailing how to set VSCode to a Chinese interface and providing specific code examples. Step 1: Download and install the language pack. After opening VSCode, click on the left

Title: An effective way to repair Chinese garbled characters in PHPDompdf. When using PHPDompdf to generate PDF documents, garbled Chinese characters are a common problem. This problem usually stems from the fact that Dompdf does not support Chinese character sets by default, resulting in Chinese content not being displayed correctly. In order to solve this problem, we need to take some effective ways to fix the Chinese garbled problem of PHPDompdf. 1. Use custom font files. An effective way to solve the problem of Chinese garbled characters in Dompdf is to use

"WWE2K24" is a racing sports game created by Visual Concepts and was officially released on March 9, 2024. This game has been highly praised, and many players are eagerly interested in whether it will have a Chinese version. Unfortunately, so far, "WWE2K24" has not yet launched a Chinese language version. Will wwe2k24 be in Chinese? Answer: Chinese is not currently supported. The standard version of WWE2K24 in the Steam Chinese region is priced at 199 yuan, the deluxe version is 329 yuan, and the commemorative edition is 395 yuan. The game has relatively high configuration requirements, and there are certain standards in terms of processor, graphics card, or running memory. Official recommended configuration and minimum configuration introduction:

Tips for solving Chinese garbled characters written by PHP into txt files. With the rapid development of the Internet, PHP, as a widely used programming language, is used by more and more developers. In PHP development, it is often necessary to read and write text files, including txt files that write Chinese content. However, due to encoding format problems, sometimes the written Chinese will appear garbled. This article will introduce some techniques to solve the problem of Chinese garbled characters written into txt files by PHP, and provide specific code examples. Problem analysis in PHP, text

How to change Chinese to English in Google Chrome? Some friends want to set Google Chrome to English so that they can continuously improve their English during use. So how to set it to English? Google Chrome is Chinese by default. Below, I will show you how to set the language of Google Chrome to English. Let’s take a look. Setting steps: 1. Open [Google Chrome], as shown in the figure below. 2. Click the [three dots] menu in the upper right corner of the Google Chrome interface, as shown in the figure below. 3. After entering the menu page, find [Settings], as shown in the figure below. 4. After entering the settings page, click the [Language] option, as shown in the figure below. 5. Select [Add Language] in the language interface, as shown in the figure below.
