施用PHP+Sphinx建立高效的站内搜索引擎
使用PHP+Sphinx建立高效的站内搜索引擎
1.为什么要使用Sphinx
假设你现在运营着一个论坛,论坛数据已经超过100W,很多用户都反映论坛搜索的速度非常慢,那么这时你就可以考虑使用Sphinx了(当然其他的全文检索程序或方法也行)。
2.Sphinx是什么
Sphinx由俄罗斯人Andrew Aksyonoff 开发的高性能全文搜索软件包,在GPL与商业协议双许可协议下发行。
全文检索是指以文档的全部文本信息作为检索对象的一种信息检索技术。检索的对象有可能是文章的标题,也有可能是文章的作者,也有可能是文章摘要或内容。
3.Sphinx的特性
?高速索引 (在新款CPU上,近10 MB/秒);
?高速搜索 (2-4G的文本量中平均查询速度不到0.1秒);
?高可用性 (单CPU上最大可支持100 GB的文本,100M文档);
?提供良好的相关性排名
?支持分布式搜索;
?提供文档摘要生成;
?提供从MySQL内部的插件式存储引擎上搜索
?支持布尔,短语, 和近义词查询;
?支持每个文档多个全文检索域(默认最大32个);
?支持每个文档多属性;
?支持断词;
?支持单字节编码与UTF-8编码;
4.下载并安装Sphinx
打开网址http://www.coreseek.cn/news/7/52/ 找到适合自己的操作系统的版本,比如我是Windows那么我就可以下载Coreseek Win32通用版本,Linux下可以下载源码包,自己编译安装。这里解释下为什么我们下载的程序叫Coreseek,Coreseek是基于Sphinx开发的一款软件,对Sphinx做了一些改动,在中文方面支持得比Sphinx好,所以我们使用之。
下载完成后,将程序解压到你想解压的地方,比如我就想解压到E盘根目录,之后修改目录名为Coreseek,大功告成Coreseek安装完成了,安装的目录是在E:\coreseek\。
5.使用Sphinx
我要使用Sphinx需要做以下几件事
1)首先得有数据
2)建立Sphinx配置文件
3)生成索引
4)启动Sphinx
5)使用之(调用api或search.exe程序进行查询)
第1件:(导入数据)
我们建立测试所需要用到得数据库、表以及数据,篇幅有限,这些在附件中都有,下载后导入MySQL即可。
第2件:(建立配置文件)
接下来我们需要建立一个Sphinx的配置文件 E:\coreseek\etc\mysql.conf,将其内容改为下面这些:
source mysql
{
type = mysql
sql_host = localhost
sql_user = root
sql_pass =
sql_db = test
sql_port = 3306
sql_query_pre = SET NAMES utf8
sql_query = SELECT id,addtime,title,content FROM post
sql_attr_timestamp = addtime
}
index mysql
{
source = mysql
path = E:/coreseek/var/data/mysql
charset_dictpath = E:/coreseek/etc/
charset_type = zh_cn.utf-8
}
searchd
{
listen = 9312
max_matches = 1000
pid_file = E:/coreseek/var/log/searchd_mysql.pid
log = E:/coreseek/var/log/searchd_mysql.log
query_log = E:/coreseek/var/log/query_mysql.log
}
先讲下这个配置文件中每项的含义。
source mysql{} 定义源名称为mysql,也可以叫其他的,比如:source xxx{}
type 数据源类型
sql_* 数据相关的配置,比如sql_host,sql_pass什么的,这些不解释鸟
sql_query 建立索引时的查询命令,在这里尽可能不使用where或group by,将where与groupby的内容交给sphinx,由sphinx进行条件过滤与groupby效率会更高,注意:select 的字段必须包括一个唯一主键以及要全文检索的字段,where中要用到的字段也要select出来
sql_query_pre 在执行sql_query前执行的sql命令, 可以有多条
sql_attr 以这个开头的配置项,表示属性字段,在where,orderby,groupby中出现的字段要分别定义一个属性,定义不同类型的字段要用不同的属性名,比如上面的sql_attr_timestamp就是时间戳类型。
index mysql{} 定义索引名称为mysql,也可以叫其他的,比如:index xxx{}
source 关联源,就是source xxx定义的。
path 索引文件存放路径,比如:E:/coreseek/var/data/mysql 实际存放在E:/coreseek/var/data/目录,然后创建多个名称为mysql后缀却不同的索引文件
charset_dictpath 指明分词法读取词典文件的位置,当启用分词法时,为必填项。在使用LibMMSeg作为分词 库时,需要确保词典文件uni.lib在指定的目录下
charset_type 字符集,比如charset_type = zh_cn.gbk
searchd{} sphinx守护进程配置
listen 监听端口
max_matches最大匹配数,也就是查找的数据再多也只返回这里设置的1000条
pid_file pid文件路径
log全文检索日志
query_log查询日志
好了,配置文件就这样,配置的参数还有很多,大家可以自己查文档。
第3件:(生成索引)
开始 -> 运行 -> 输入cmd回车,打开命令行工具
e:\coreseek\bin\indexer --config e:\coreseek\etc\mysql.conf --all
这一串东西其实就是调用indexer程序来生成所有索引
如果只想对某个数据源进行索引,则可以这样:e:\coreseek\bin\indexer --config e:\coreseek\etc\mysql.conf 索引名称(索引名称指配置文件中所定义的)
--config,--all这些都是indexer程序的参数,想了解更多参数的朋友可以查看文档
运行命令后如果你没看到FATAL,ERROR这些东西,那么索引文件就算生成成功了,比如我看到得就是
………省略………
using config file 'e:\coreseek\etc\mysql.conf'...
indexing index 'mysql'...
collected 4 docs, 0.0 MB
………省略………
第4件:(启动Sphinx)
同样命令行下
e:\coreseek\bin\searchd --config e:\coreseek\etc\mysql.conf
运行后提示了一大堆东西
using config file 'e:\coreseek\etc\mysql.conf'...
listening on all interfaces, port=9312
accepting connections
不用管这些鸟文是啥意思,反正Sphinx是启动好了。
现在有一串鸟文的这个命令行是不能关的,因为关了Sphinx也就关了,如果觉得这样不爽,可以将Sphinx安装成系统服务,在后台运行。
安装系统服务只需在命令行中输入以下命令
e:\coreseek\bin\searchd --config e:\coreseek\etc\mysql.conf --install
安装之后记得启动这个服务,不会启动那我没法,自己google。

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics



In MySQL database, the relationship between the user and the database is defined by permissions and tables. The user has a username and password to access the database. Permissions are granted through the GRANT command, while the table is created by the CREATE TABLE command. To establish a relationship between a user and a database, you need to create a database, create a user, and then grant permissions.

Data Integration Simplification: AmazonRDSMySQL and Redshift's zero ETL integration Efficient data integration is at the heart of a data-driven organization. Traditional ETL (extract, convert, load) processes are complex and time-consuming, especially when integrating databases (such as AmazonRDSMySQL) with data warehouses (such as Redshift). However, AWS provides zero ETL integration solutions that have completely changed this situation, providing a simplified, near-real-time solution for data migration from RDSMySQL to Redshift. This article will dive into RDSMySQL zero ETL integration with Redshift, explaining how it works and the advantages it brings to data engineers and developers.

MySQL is suitable for beginners because it is simple to install, powerful and easy to manage data. 1. Simple installation and configuration, suitable for a variety of operating systems. 2. Support basic operations such as creating databases and tables, inserting, querying, updating and deleting data. 3. Provide advanced functions such as JOIN operations and subqueries. 4. Performance can be improved through indexing, query optimization and table partitioning. 5. Support backup, recovery and security measures to ensure data security and consistency.

To fill in the MySQL username and password: 1. Determine the username and password; 2. Connect to the database; 3. Use the username and password to execute queries and commands.

1. Use the correct index to speed up data retrieval by reducing the amount of data scanned select*frommployeeswherelast_name='smith'; if you look up a column of a table multiple times, create an index for that column. If you or your app needs data from multiple columns according to the criteria, create a composite index 2. Avoid select * only those required columns, if you select all unwanted columns, this will only consume more server memory and cause the server to slow down at high load or frequency times For example, your table contains columns such as created_at and updated_at and timestamps, and then avoid selecting * because they do not require inefficient query se

Navicat itself does not store the database password, and can only retrieve the encrypted password. Solution: 1. Check the password manager; 2. Check Navicat's "Remember Password" function; 3. Reset the database password; 4. Contact the database administrator.

Detailed explanation of database ACID attributes ACID attributes are a set of rules to ensure the reliability and consistency of database transactions. They define how database systems handle transactions, and ensure data integrity and accuracy even in case of system crashes, power interruptions, or multiple users concurrent access. ACID Attribute Overview Atomicity: A transaction is regarded as an indivisible unit. Any part fails, the entire transaction is rolled back, and the database does not retain any changes. For example, if a bank transfer is deducted from one account but not increased to another, the entire operation is revoked. begintransaction; updateaccountssetbalance=balance-100wh

View the MySQL database with the following command: Connect to the server: mysql -u Username -p Password Run SHOW DATABASES; Command to get all existing databases Select database: USE database name; View table: SHOW TABLES; View table structure: DESCRIBE table name; View data: SELECT * FROM table name;
