Why do you need a search engine? Is a database alone not enough? If you are just creating a small website, a database will suffice. But when you're creating neutral or large-scale applications, search engines are the smarter choice. Of course, small websites can also use Solr to obtain highly relevant search results.
Imagine you are writing a search query program for an e-commerce website. The most straightforward idea is the following database query statement:
SELECT * FROM PRODUCTS WHERE LOWER(title) like LOWER('%$phrase%') OR LOWER(description) like LOWER('%$phrase%');
It works fine when querying phrases in titles or descriptions. But the reality is very complicated, for example, Apple iPhone 4G black 16GB (Apple 4G network iPhone black 16GB). When searching for "iPhone 16G", there are no results. You can replace spaces with % to handle this situation.
$phrase = str_replace(' ', '%', $phrase);
What about when querying "iPhone 16GB 4G"? Apparently the word order changed and it doesn't work properly. I'm guessing you'd add another field to hold the word order. So what should I do if I write the wrong word? What about synonyms? Coming up with a good solution for such a search system is challenging.
Designing an exquisite algorithm is not the key to solving this kind of problem. Text search consumes resources. Putting too much pressure on the database side It is never a good idea. The reason is that the database cannot be easily expanded. You cannot simply add an instance like a web server or Memcached. Expanding the database requires some preparation, code modification, configuration, downtime and maintenance time. In short, the cost is Very expensive. The good news is that Solr is designed to solve this type of problem.
Solr is an enterprise-level search platform based on Apache Lucene. It is fast, stable, has good documentation, and of course is easy to expand. Since Solr is a powerful solution, all its features are not listed one by one in this article. This guy is also quite easy to install.
First download the latest version from the official website version. Solr is an application written in the Java language, and you need the Java Runtime environment to run it.
$ cd solr-4.1.0/example/ $ java -jar start.jar
After a few seconds you will see the following information:
2013-03-09 18:47:41.177:INFO:oejs.AbstractConnector:Started SocketConnector@0.0.0.0:8983
Solr has a web interface that works under port 8983. Open the browser to access http: //localhost:8983/solr/.
In the navigation area on the left hand side you will find "collection1". In Solr, Collections are similar to database tables, and you can query data. Click on a collection and select its submenu "query".
The first option is called "Request-Handler (qt)" and has the default value "/select". Request handlers are a set of predefined queries. If you look at the Solr config file, you'll see something like this:
$ vim solr-4.1.0/example/solr/collection1/conf/solrconfig.xml
<requestHandler name="/select" class="solr.SearchHandler"> <lst name="defaults"> <str name="echoParams">explicit</str> <int name="rows">10</int> <str name="df">text</str> </lst></requestHandler>
The second parameter is the one we're most interested in. The default value "*:*" means query anything. If you click "execute query", you can get something like the following:
<?xml version="1.0" encoding="UTF-8"?><response> <lst name="responseHeader"> <int name="status">0</int> <int name="QTime">1</int> <lst name="params"> <str name="indent">true</str> <str name="q">*:*</str> <str name="wt">xml</str> </lst> </lst> <result name="response" numFound="0" start="0" /></response>
The index result is empty, but this is not a problem, You need to insert some sample data.
$ cd solr-4.1.0/example/exampledocs/ $ java -jar post.jar monitor.xml SimplePostTool version 1.5 Posting files to base url http://localhost:8983/solr/update using content-type application/xml.. POSTing file monitor.xml1 files indexed. COMMITting Solr index changes to http://localhost:8983/solr/update..
Now you can return to the query interface and a document will be returned this time.
The data structure of Collection is defined in the schema file.
$ vim solr-4.1.0/example/solr/collection1/conf/schema.xml
This file has a lot of comments, you can easily tell what they do. If you want to modify the scheme file, please do not delete the field named "text" (if there is no good reason), it is associated with other fields and query statements (including select, look, etc.).
$ grep text solr-4.1.0/example/solr/collection1/conf/schema.xml | grep copy <copyField source="cat" dest="text"/> <copyField source="name" dest="text"/> <copyField source="manu" dest="text"/> <copyField source="features" dest="text"/> <copyField source="includes" dest="text"/> <copyField source="title" dest="text"/> <copyField source="author" dest="text"/> <copyField source="description" dest="text"/> <copyField source="keywords" dest="text"/> <copyField source="content" dest="text"/> <copyField source="content_type" dest="text"/> <copyField source="resourcename" dest="text"/> <copyField source="url" dest="text"/>
If you are using a relational database, you don’t want to have duplicate data. Solr is not a database, and most fields are also processed as text fields. This is the default request handler.
Accessing Solr from PHP requires a client. I recommend downloading one from PECL. It's fast, the API is clear, and the documentation is excellent. However, please note that this extension is now version 1.0.2 and does not support Solr 4.x. Solr 3.x and 4.x have slightly different protocols. But don't worry, I have made the modifications and you can download a compatible version from https://github.com/lukaszkujawa/php-pecl-solr. I've been using it for a while and it's reliable. It is slightly different from the official one. There is an additional Solr version parameter in the SolrClient constructor. This patch will be released in the official version, so you don't have to worry about future compatibility.
$ git clone https://github.com/lukaszkujawa/php-pecl-solr.git$ cd php-pecl-solr/ $ phpize $ whereis php-config php-config: /usr/bin/php-config /usr/bin/X11/php-config $ ./configure --with-php-config=/usr/bin/php-config $ make $ make install
Add to your php.ini
extension=solr.so
Restart the web server
$ /etc/init.d/apache2 restart
Now you can write php to add content to the index.
<?php $options = array ( 'hostname' => '127.0.0.1', ); $client = new SolrClient($options, "4.0"); // 参数4.0针对Solr4.x,其他版本时忽略 $doc = new SolrInputDocument(); $doc->addField('id', 100); $doc->addField('title', 'Hello Wolrd'); $doc->addField('description', 'Example Document'); $doc->addField('cat', 'Foo'); $doc->addField('cat', 'Bar'); $response = $client->addDocument($doc); $client->commit(); /* ------------------------------- */ $query = new SolrQuery(); $query->setQuery('hello'); $query->addField('id') ->addField('title') ->addField('description') ->addField('cat'); $queryResponse = $client->query($query); $response = $queryResponse->getResponse(); print_r( $response->response->docs );
If you add more than one document, she can handle it well without frequent commits.
Knowing how Solr works is valuable and you can use it in many projects. It has a great feature that allows you to pull all the data you need in one request. Of course, it takes some time to master her, but it's worth the effort. Solr has an active community and complete documentation resources. If you are still worried about using it in your project, please read Solr 3 enterprise search server. It not only allows you to quickly set up search services, but is also the basis for your data mining.
Related articles: