Bleve: How to build a rocket-fast search engine?

Jan 03, 2025, 04:23 AM

Go (Golang) is one of my favorite languages; I love its minimalism and how clean it is. The syntax is very compact, and the language tries very hard to keep things simple (I am a big fan of the KISS principle).

One of the major challenges I faced in recent times is building a fast search engine. Sure, there are options such as Solr and Elasticsearch; both work really well and are highly scalable. However, I needed to simplify search: make it faster and easier to deploy, with little to no dependencies.

I needed to optimize enough to return results quickly so that they could be re-ranked. While C or Rust might be a good fit for this, I value development speed and productivity; Go is the best of both worlds, I guess.

In this article, I will go through a simple example of how you can build your own search engine using Go, you will be surprised: it's not as complicated as you may think.

Golang: Python on steroids

I don't know why, but Golang feels like Python in a way. The syntax is very easy to grasp; maybe it's the lack of semicolons and brackets everywhere, or the lack of ugly try-catch statements. Maybe it's the awesome Go formatter. I don't know.

Anyway, since Golang generates a single self-contained binary, it's super easy to deploy to any production server. You simply "go build" and swap out the executable.

Which is exactly what I needed.

Do you Bleve?

No, that's not a typo. Bleve is a powerful, easy-to-use, and very flexible search library for Golang.

As a Go developer, you generally avoid third-party packages like the plague; still, sometimes a dependency earns its place. Bleve is fast, well designed, and provides enough value to justify using it.

In addition, here is why I "Bleve":

  • Self-contained: one of the big advantages of Go is the single binary, so I wanted to keep that feel and not need an external DB or service to store and query documents. Bleve runs in memory and writes to disk, similar to SQLite.

  • Easy to extend. Since it's just Go code, I can easily tweak the library or extend it in my codebase as needed.

  • Fast: search results across 10 million documents take just 50-100ms, including filtering.

  • Faceting: you cannot build a modern search engine without some level of faceting support. Bleve has full support for common facet types, such as ranges and simple category counts.

  • Decent indexing speed: Bleve is somewhat slower than Solr here. Solr can index 10 million documents in about 30 minutes, while Bleve takes over an hour; still, an hour or so is fast enough for my needs.

  • Good quality results: Bleve does well with keyword results, and some semantic-type searches work really well too.

  • Fast startup: if you need to restart or deploy an update, it takes mere milliseconds to restart Bleve. Reads are not blocked while the index is loaded back into memory, so searching the index is possible without hiccups just milliseconds after a restart.

Setting up an index

In Bleve, an "index" can be thought of as a database table or a collection (NoSQL). Unlike a regular SQL table, you do not need to specify every single column; you can basically get away with the default schema for most use cases.

To initialize a Bleve index, you can do the following:

mappings := bleve.NewIndexMapping()
index, err := bleve.NewUsing("/some/path/index.bleve", mappings, "scorch", "scorch", nil)
if err != nil {
    log.Fatal(err)
}

Bleve supports a few different index types, but after much fiddling I found that the "scorch" index type gives the best performance. If you use the simpler "bleve.New(path, mappings)" constructor instead, Bleve falls back to its default BoltDB-backed store.

Adding documents

Adding documents to Bleve is a breeze. You basically can store any type of struct in the index:

type Book struct {
    ID    int    `json:"id"`
    Name  string `json:"name"`
    Genre string `json:"genre"`
}

b := Book{
    ID:    1234,
    Name:  "Some creative title",
    Genre: "Young Adult",
}
idStr := fmt.Sprintf("%d", b.ID)
// Index(id string, data interface{})
err := index.Index(idStr, b)
if err != nil {
    log.Fatal(err)
}

If you are indexing a large number of documents, it's better to use batching:

batch := index.NewBatch()
for _, b := range books {
    idStr := fmt.Sprintf("%d", b.ID)
    batch.Index(idStr, b)
    // Flush the batch once it reaches the size limit.
    if batch.Size() >= 1000 {
        if err := index.Batch(batch); err != nil {
            // failed, try again or log etc...
        }
        batch = index.NewBatch()
    }
}
// Flush any remaining documents.
if batch.Size() > 0 {
    if err := index.Batch(batch); err != nil {
        // failed, try again or log etc...
    }
}

As you will notice, a complex task like batching records and writing them to the index is simplified by "index.NewBatch", which creates a temporary container for documents to be indexed.

Thereafter, you just check the size as you loop along and flush the batch once you reach the batch size limit.

Searching the index

Bleve exposes multiple query types that you can choose from depending on your search needs. To keep this article short and sweet, I am just going to use the standard query string query.

searchParser := bleve.NewQueryStringQuery("chicken recipe books")
maxPerPage := 50
offset := 0
searchRequest := bleve.NewSearchRequestOptions(searchParser, maxPerPage, offset, false)
// By default Bleve returns just the ID; here we specify
// the other fields we would like returned.
searchRequest.Fields = []string{"id", "name", "genre"}
searchResults, err := index.Search(searchRequest)

With just these few lines, you now have a powerful search engine that delivers good results with a low memory and resource footprint.

Here is a JSON representation of the search results, "hits" will contain the matching documents:

{
    "status": {
        "total": 5,
        "failed": 0,
        "successful": 5
    },
    "request": {},
    "hits": [],
    "total_hits": 19749,
    "max_score": 2.221337297308545,
    "took": 99039137,
    "facets": null
}

Faceting

As mentioned earlier, Bleve provides full faceting support out of the box, without having to set facets up in your schema. To facet on the book "Genre", for example, you can do the following:

//... build searchRequest -- see previous section.
// Add facets
genreFacet := bleve.NewFacetRequest("genre", 50)
searchRequest.AddFacet("genre", genreFacet)
searchResults, err := index.Search(searchRequest)

We extend our searchRequest from earlier with just 2 lines of code. "NewFacetRequest" takes 2 arguments:

  • Field: the field in our index to facet on (string).

  • Size: the number of facet values to return (integer). In our example, only the top 50 genres will be counted.

The above will now populate "facets" in our search results.

Next, we simply add our facet to the search request via "AddFacet", which takes a facet name and the actual facet request. The facet name is the key under which you will find this result set in the search results.

Advanced queries and filtering

While the "QueryStringQuery" parser can get you quite a bit of mileage, sometimes you need more complex queries, such as a "one must match" query where you match a search term against several fields and return results as long as at least one field matches.

You can use the "Disjunction" and "Conjunction" query types to accomplish this.

  • Conjunction Query: allows you to chain multiple queries together into one larger query. A document must match all child queries to be returned.

  • Disjunction Query: allows you to perform the "one must match" query mentioned above. You can pass in any number of child queries and set the minimum number of them that a document must match.

Disjunction Query example:

nameQuery := bleve.NewMatchQuery("chicken recipe books")
nameQuery.SetField("name")
genreQuery := bleve.NewMatchQuery("chicken recipe books")
genreQuery.SetField("genre")
// Match documents where at least one of the child queries matches.
disjunctionQuery := bleve.NewDisjunctionQuery(nameQuery, genreQuery)
disjunctionQuery.SetMin(1)
searchRequest := bleve.NewSearchRequestOptions(disjunctionQuery, 50, 0, false)

Similar to how we used "searchParser" earlier, we can now pass the "Disjunction Query" into the constructor for our "searchRequest".

While not exactly the same, this resembles the following SQL:

SELECT * FROM books
WHERE name LIKE '%chicken recipe books%'
   OR genre LIKE '%chicken recipe books%';

You can also adjust how fuzzy you want the search to be by setting "query.Fuzziness" to 0, 1, or 2 on the match queries.

Conjunction Query Example:

nameQuery := bleve.NewMatchQuery("chicken recipe books")
nameQuery.SetField("name")
genreQuery := bleve.NewMatchQuery("cooking")
genreQuery.SetField("genre")
// A document must match ALL child queries to be returned.
conjunctionQuery := bleve.NewConjunctionQuery(nameQuery, genreQuery)
searchRequest := bleve.NewSearchRequestOptions(conjunctionQuery, 50, 0, false)

You will notice the syntax is very similar; you can basically use the "Conjunction" and "Disjunction" queries interchangeably.

This will look similar to the following in SQL:

SELECT * FROM books
WHERE name LIKE '%chicken recipe books%'
  AND genre LIKE '%cooking%';

In summary: use the "Conjunction Query" when you want every child query to match a document, and the "Disjunction Query" when you want at least one (but not necessarily all) child queries to match.

Sharding

If you run into speed issues, Bleve also makes it possible to distribute your data across multiple index shards and then query those shards in one request, for example:

// An index alias lets you query multiple shards as if they were one index.
alias := bleve.NewIndexAlias(shard1, shard2, shard3)
searchResults, err := alias.Search(searchRequest)

Sharding can become quite complex, but as you see above, Bleve takes away a lot of the pain: it automatically merges the results from all the shards and returns them in one result set, just as if you had searched a single index.

I have been using sharding to search across 100 shards. The whole search process completes in a mere 100-200 milliseconds on average.

You can create shards as follows:

shards := make([]bleve.Index, 0, 100)
for i := 0; i < 100; i++ {
    mappings := bleve.NewIndexMapping()
    path := fmt.Sprintf("/some/path/shard-%d.bleve", i)
    shard, err := bleve.NewUsing(path, mappings, "scorch", "scorch", nil)
    if err != nil {
        log.Fatal(err)
    }
    shards = append(shards, shard)
}

Just be sure to create unique IDs for each document or have some sort of predictable way of adding and updating documents without messing up the index.

An easy way to do this is to store a prefix containing the shard name in your source DB, or wherever you get the documents from, so that every time you insert or update, you look up the prefix, which tells you which shard to call ".Index" on.

Speaking of updating: simply calling "index.Index(idStr, b)" with an existing ID will update (re-index) that document.

Conclusion

Using just the basic search techniques above and putting them behind Gin or the standard Go HTTP server, you can build quite a powerful search API and serve millions of requests without needing to roll out complex infrastructure.

One caveat though: Bleve does not cater for replication. However, since you can wrap it in an API, you can simply have a cron job that reads from your source and "blasts" out updates to all your Bleve servers using goroutines.

Alternatively, you could lock writes to disk for a few seconds and "rsync" the data across to replica indexes, although I don't advise this, since you would probably also need to restart the Go binary each time.
