Core points
Apache SOLR is an enterprise-grade search platform built on Apache Lucene. It provides powerful full-text search along with advanced features such as faceted search, result highlighting and geospatial search. It is also highly scalable and fault tolerant.
It is reported that well-known websites such as Digg, Netflix, Instagram and Whitehouse.gov use SOLR to support their search capabilities (source).
Although SOLR is written in Java, it is accessible over HTTP and can therefore be integrated with any programming language you like. If you are using PHP, the Solarium project makes integration easier: it provides a layer of abstraction over the underlying requests, so you can use SOLR as though it were a native implementation running inside your application.
In this series of articles, I will introduce SOLR and Solarium side by side. We will first install and configure SOLR and create a search index. We will then look at how to index documents. Next, we will implement a basic search and then extend it with some more advanced features such as faceted search, result highlighting, and suggestions.
Along the way, we will build a simple application for searching a collection of movies. You can get the source code here or view the online demo here.
Basic concepts and operations
Before we dig into implementation details, it is worth understanding some basic concepts and the overall process.
SOLR is a Java application that runs as a web service, usually in a Servlet container such as Tomcat, GlassFish, or JBoss. You can manipulate and query it over HTTP using XML, JSON, CSV, or binary formats, so you can use any programming language for application development. However, the Solarium library provides a layer of abstraction that allows you to call methods as if SOLR were a native implementation. In this tutorial, we will run SOLR on the same machine as our application, but in real-world applications it can be on a separate server.
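To make the "query it over HTTP" point concrete, here is a minimal sketch of how a client in any language might build a search request URL. The base URL matches the local install used in this tutorial; the core name `collection1`, the helper name `build_select_url`, and the sample query are assumptions made for illustration. `q`, `wt` and `rows` are standard SOLR query parameters.

```python
from urllib.parse import urlencode


def build_select_url(query, base="http://localhost:8983/solr",
                     core="collection1", rows=10):
    """Build the URL for a basic SOLR search that returns JSON.

    `core` is an assumption: "collection1" is the core name used by the
    example configuration; replace it with your own core.
    """
    params = {
        "q": query,    # the search query itself
        "wt": "json",  # response writer: ask for JSON instead of XML
        "rows": rows,  # maximum number of documents to return
    }
    return f"{base}/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical query against a hypothetical "title" field.
    print(build_select_url("title:alien"))
```

Any HTTP client can then fetch that URL and parse the response. Solarium does the equivalent work for you in PHP, which is why it feels like a native API.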
SOLR builds a search index out of documents. A document often corresponds to what we would call a document in real life: an article, a blog post, or even an entire book. However, a document can also represent any object relevant to your application, such as a product, a place, an event or, in our sample application, a movie.
In the most basic case, SOLR allows you to perform full-text searches across your documents. Think of search engines; you usually search for keywords, phrases, or full titles. You can only go so far with SQL's LIKE clause; this is where full-text search comes in.
You can also attach additional information to indexed documents that is not necessarily covered by text-based searches; for example, you can include the price of a product, the number of rooms in a property, or the date the item was added to the database.
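As a rough illustration of a document that carries extra, non-text information, the sketch below builds one movie document and serializes it as a JSON array, which is a shape SOLR's JSON update handler accepts. Every field name and value here is hypothetical; the real fields are whatever your schema defines.

```python
import json


def to_update_payload(docs):
    """Serialize a list of documents as a JSON array for indexing."""
    return json.dumps(docs)


# Hypothetical movie document: field names and values are illustrative only.
movie = {
    "id": "movie-1",                  # unique key for the document
    "title": "An Example Movie",      # full-text searchable field
    "year": 2013,                     # numeric metadata
    "rating": 7.5,                    # numeric metadata
    "added": "2013-10-01T00:00:00Z",  # SOLR dates use ISO 8601 in UTC
}

payload = to_update_payload([movie])

if __name__ == "__main__":
    print(payload)
```

Fields such as `year` or `added` are not something a user types into a search box, but they become useful later for sorting, filtering and faceting.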
Faceting is one of the most useful features of SOLR. If you have ever shopped online, you have probably come across faceted search; facets allow you to "refine" search results by applying "filters". For example, after searching an online bookstore, you can use filters to limit the results to books by a specific author, in a specific genre, or in a specific format.
A SOLR instance runs with one or more cores. A core is a collection of configuration and indexes, and each core has its own schema. Typically, a single instance is dedicated to a single application. Since different types of content can have very different structures and information (consider the differences between products, articles, and users), applications often have multiple cores in one SOLR instance.
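The idea behind facets can be sketched in a few lines of plain Python. This is only a conceptual model working on an in-memory list, not how SOLR implements faceting; the book data and the helper names `facet_counts` and `apply_filter` are made up for the example.

```python
from collections import Counter

# Hypothetical search results from an online bookstore.
BOOKS = [
    {"title": "Book A", "author": "Smith", "format": "paperback"},
    {"title": "Book B", "author": "Smith", "format": "ebook"},
    {"title": "Book C", "author": "Jones", "format": "paperback"},
]


def facet_counts(docs, field):
    """Count how many documents carry each distinct value of `field`."""
    return Counter(doc[field] for doc in docs if field in doc)


def apply_filter(docs, field, value):
    """Refine the results to documents whose `field` equals `value`."""
    return [doc for doc in docs if doc.get(field) == value]


if __name__ == "__main__":
    # The counts are what a shop shows next to each filter link.
    print(facet_counts(BOOKS, "author"))
    # Clicking "paperback" narrows the result set.
    print([b["title"] for b in apply_filter(BOOKS, "format", "paperback")])
```

The counts tell the user how many results each filter would leave, and applying a filter narrows the current result set rather than starting a new search.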
Installing SOLR
I will provide instructions on how to set up SOLR on a Mac; for other operating systems, please refer to the documentation, or you can download Blaze, an appliance with SOLR pre-installed.
The easiest way to install SOLR on your Mac is to use Homebrew:
brew update
brew install solr
This installs the software into a directory like /usr/local/Cellar/solr/4.5.0, depending on the version of the software you are using.
To start the server, use the provided Java Archive (JAR):
cd /usr/local/Cellar/solr/4.5.0/libexec/example
java -jar start.jar
To verify that the installation is successful, try accessing the management interface in your web browser:
http://localhost:8983/solr/
If you see an admin dashboard with the Apache SOLR logo in the upper left corner, the server is up and running.
Tip: To stop SOLR, which you will need to do whenever you change the configuration (as we will do soon), just press CTRL-C.
(Linux instructions: https://www.php.cn/link/02013105f0430de65b8b1408d52c84be)
Setting up the schema
The easiest way to get started with SOLR is probably to copy the default directory and customize it.
Copy the solr directory inside libexec/example; here we are creating a new SOLR core called "movies":
cd /usr/local/Cellar/solr/4.5.0/libexec/example
cp -R solr movies
We will look at the configuration files, movies/solr.xml and movies/collection1/conf/solrconfig.xml, later. At the moment, what we are really interested in is the schema, which defines the fields of the documents we are indexing and how those fields are handled.
The file that defines this is movies/collection1/conf/schema.xml.
If you open the file you just copied, you will find that it contains not only some useful default values, but also a lot of comments to help you understand how to customize it.
The schema file is responsible for two main things: fields and types. Types are just data types; under the hood, they map type names (such as integers, dates, and strings) to the underlying Java classes used in the implementation, for example solr.TrieIntField, solr.TrieDateField and solr.TextField. The type configuration also defines the behavior of tokenizers, analyzers, and filters.
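The following are some examples of basic types, similar to those defined in the default schema.xml:
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
<fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>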
String types are worth a closer look, because there is a trap here. When you define a field as a string, any data is stored exactly as you entered it, and a query must be exactly the same in order to match it. For example, suppose you define an article title as a string and insert a document titled "An Introduction to SOLR". In any decent search implementation, you would expect to find the article using a query like "SOLR introduction", not to mention "an introduction to Solr". With a string field, neither query would match. Exact matching is actually useful in some cases, such as faceted search, but where you do not want it, you can use a combination of tokenizers and filters instead.
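The difference between the two behaviors can be sketched in plain Python. This is a deliberately simplified model, not SOLR's actual analysis chain: the function names are invented for the example, the tokenizer just lowercases and splits on non-alphanumeric characters, and requiring every query term to be present is an assumption made to keep the sketch short.

```python
import re


def matches_as_string(stored, query):
    """String-type behavior: the query must equal the stored value exactly."""
    return stored == query


def tokenize(text):
    """Simplified analysis: lowercase, then split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches_as_text(stored, query):
    """Tokenized behavior: every query token must appear in the stored text."""
    stored_tokens = set(tokenize(stored))
    query_tokens = tokenize(query)
    return bool(query_tokens) and all(t in stored_tokens for t in query_tokens)


if __name__ == "__main__":
    title = "An Introduction to SOLR"
    for q in ("SOLR introduction", "an introduction to Solr", title):
        print(q, matches_as_string(title, q), matches_as_text(title, q))
```

With the title from the example above, only the exact title matches under string semantics, while all three queries match once the text is tokenized and lowercased.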