Indexing is usually thought of as a database feature, but Gaussian Redis can implement secondary indexes too. Secondary indexes in Gaussian Redis are generally built on zset. Gaussian Redis offers better stability and lower cost than open source Redis, so using Gaussian Redis zset to implement business secondary indexes can be a win for both performance and cost.
The essence of indexing is to use an ordered structure to speed up queries. Both numeric and character indexes can therefore be implemented easily with the zset structure of Gaussian Redis.
• Numeric index (zset is sorted by score)
• Character index (when scores are equal, zset is sorted lexicographically)
Let's look at two classic business scenarios and see how to use Gaussian Redis to build a stable and reliable secondary indexing system.
When you type a query in the browser, the browser usually recommends searches with the same prefix, ranked by likelihood. This scenario can be implemented with the secondary index capability of Gaussian Redis.
The simplest method is to add each user query to the index. To offer completion suggestions, run a range query with ZRANGEBYLEX. To reduce the number of results, Gaussian Redis also supports the LIMIT option.
• Add user search banana to the index:
ZADD myindex 0 banana:1
• Suppose the user types "bit" in the search form, and we want to offer search keywords that start with "bit":
ZRANGEBYLEX myindex "[bit" "[bit\xff"
That is, we run a ZRANGEBYLEX range query whose lower bound is the string the user has typed so far and whose upper bound is the same string followed by a trailing byte of 255 (\xff). This returns every string that has the user's input as a prefix.
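To see why the \xff upper bound captures every prefix match, here is a minimal sketch that imitates the range query on a plain Ruby array instead of a real server. The helper name prefix_range and the sample terms are made up for illustration; the sketch only assumes that members with equal scores are compared byte by byte.

```ruby
# In-memory stand-in for a zset whose members all share score 0,
# so ordering is purely lexicographic by bytes.
# prefix_range is a hypothetical helper, not a Redis API.
def prefix_range(members, prefix, limit = nil)
  low  = prefix.b                # inclusive lower bound, like "[bit"
  high = prefix.b + "\xff".b     # inclusive upper bound, like "[bit\xff"
  hits = members.map(&:b).sort.select { |m| m >= low && m <= high }
  hits = hits.first(limit) if limit   # same idea as LIMIT 0 <limit>
  hits.map { |m| m.force_encoding("UTF-8") }
end

index = ["banana", "bit", "bitcoin", "bite", "bitter", "blue"]
puts prefix_range(index, "bit").inspect
puts prefix_range(index, "bit", 2).inspect
```

Every member that starts with "bit" sorts between "bit" and "bit\xff", while "blue" sorts after the upper bound and is excluded.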
In practice, we usually want completion entries to be ordered by how often they occur, entries that are no longer popular to be dropped, and the index to adapt automatically to future input. We can still use the zset structure of Gaussian Redis to achieve this, but the index must store not only the search terms but also the frequency associated with each one.
• Add user search banana to the index
• Determine whether banana exists
ZRANGEBYLEX myindex "[banana:" + LIMIT 0 1
• If banana does not exist, add banana:1, where 1 is the frequency
ZADD myindex 0 banana:1
• If banana already exists, increment its frequency
If ZRANGEBYLEX myindex "[banana:" + LIMIT 0 1 returns banana:1, the current frequency is 1.
1) Delete the old entry:
ZREM myindex banana:1
2) Re-add the entry with the frequency incremented by one:
ZADD myindex 0 banana:2
Note that because updates may be concurrent, the three commands above should be sent in a Lua script, so that reading the old count and re-adding the entry with the incremented count happen atomically.
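The update logic can be sketched in plain Ruby against an in-memory array, purely to make the control flow concrete. The function name record_search and the array-based index are hypothetical. On a real server the same read, delete, and re-add sequence would still need to run atomically, as noted above.

```ruby
# Hypothetical in-memory model of the "term:count" index.
# index is an Array of strings such as "banana:1".
def record_search(index, term)
  prefix = "#{term}:"
  # Mirrors ZRANGEBYLEX myindex "[term:" + LIMIT 0 1, plus a prefix check,
  # because the first entry at or after "term:" may belong to another term.
  old = index.sort.find { |m| m.start_with?(prefix) }
  if old.nil?
    index << "#{term}:1"                       # like ZADD myindex 0 term:1
  else
    count = old[prefix.length..-1].to_i
    index.delete(old)                          # like ZREM myindex term:count
    index << "#{term}:#{count + 1}"            # like ZADD myindex 0 term:(count+1)
  end
  index
end

index = []
record_search(index, "banana")
record_search(index, "banana")
record_search(index, "bit")
puts index.sort.inspect
```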
If the user types "banana" in the search form, we want to offer related search keywords. Fetch the candidates with ZRANGEBYLEX, then sort them by frequency.
ZRANGEBYLEX myindex "[banana:" + LIMIT 0 10
1) "banana:123"
2) "banaooo:1"
3) "banned user:49"
4) "banning:89"
• Use a streaming algorithm to purge rarely used inputs. Randomly select one of the returned entries, subtract one from its frequency, and add it back with the updated frequency. If the new frequency is 0, remove the entry from the index instead.
• If the frequency of the randomly selected entry is 1, such as banaooo:1
ZREM myindex banaooo:1
• If the frequency of the randomly selected entry is greater than 1, such as banana:123
ZREM myindex banana:123
ZADD myindex 0 banana:122
Over the long term, the index will include popular searches and automatically adapt if popular searches change over time.
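The decay step can be sketched the same way on an in-memory array. This is an illustration only: decay_entry is a made-up name, and picking the entry with Array#sample stands in for whatever random selection the application uses.

```ruby
# Hypothetical decay step on an in-memory "term:count" index.
# By default one entry is picked at random; tests can pass one explicitly.
def decay_entry(index, entry = index.sample)
  return index if entry.nil?                   # nothing to decay
  term, _, count = entry.rpartition(":")
  index.delete(entry)                          # like ZREM myindex term:count
  new_count = count.to_i - 1
  # Re-add only while the frequency is still positive.
  index << "#{term}:#{new_count}" if new_count > 0
  index
end

index = ["banana:123", "banaooo:1"]
decay_entry(index, "banaooo:1")    # frequency drops to 0, entry removed
decay_entry(index, "banana:123")   # entry becomes banana:122
puts index.inspect
```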
Gaussian Redis supports not only single-dimension queries but also retrieval over multidimensional data. For example, search for people aged between 50 and 55 whose salary is between 70,000 and 85,000. An important way to implement a multi-dimensional secondary index is to encode the two-dimensional data as one-dimensional data and store it in a Gaussian Redis zset.
To picture a two-dimensional index, imagine a space containing sample points with coordinates (x, y), where both x and y have a maximum value of 400. The blue box in the image represents our query: we want all points with x between 50 and 100 and y between 100 and 300.
Suppose the inserted data point is x = 75 and y = 200:
1) Pad with zeros (the maximum value is 400, so pad to 3 digits)
x = 075
y = 200
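2) Interleave the digits: take the leftmost digit of x, then the leftmost digit of y, and so on, to create a single code
027050
If we replace the last two digits with 00 and 99, giving 027000 to 027099, and map back to x and y, we get:
x = 70-79
y = 200-209
So a two-dimensional query for x = 70-79 and y = 200-209 can be mapped through the encoding to a one-dimensional query from 027000 to 027099, which is easy to run against the zset structure of Gaussian Redis.
In the same way, we can replace the last four, six, etc. digits to obtain larger ranges.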
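The decimal encoding above can be checked with a short sketch. The helper names interleave_digits, digit_range, and deinterleave_digits are chosen here for illustration and do not come from the article.

```ruby
# Interleave the decimal digits of x and y after zero-padding both to width.
def interleave_digits(x, y, width = 3)
  xs = x.to_s.rjust(width, "0").chars
  ys = y.to_s.rjust(width, "0").chars
  xs.zip(ys).flatten.join
end

# Replace the last n digits of a code with 0s and with 9s to get a 1-D range.
def digit_range(code, n)
  head = code[0...(code.length - n)]
  [head + "0" * n, head + "9" * n]
end

# Split an interleaved code back into its x and y values.
def deinterleave_digits(code)
  pairs = code.chars.each_slice(2).to_a
  [pairs.map(&:first).join.to_i, pairs.map(&:last).join.to_i]
end

code = interleave_digits(75, 200)
low, high = digit_range(code, 2)
puts code                                   # 027050
puts "#{low} to #{high}"                    # 027000 to 027099
puts deinterleave_digits(low).inspect       # [70, 200]
puts deinterleave_digits(high).inspect      # [79, 209]
```

Passing 4 or 6 as the second argument of digit_range gives the wider ranges mentioned above.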
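3) Use binary
If the data is represented in binary, we get finer granularity: each substitution step only doubles the side of the search box. If we use binary notation and each variable needs at most 9 bits (enough for values up to 400), we get:
x = 75 -> 001001011
y = 200 -> 011001000
After interleaving: 000111000011001010
Replacing the last 2, 4, 6, 8, ... bits of the interleaved representation with 0s and then with 1s gives progressively larger ranges around the point.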
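The ranges can be worked out directly. In the minimal sketch below, interleave_bits and bit_box are illustrative names, not part of any Redis API. Replacing n interleaved bits affects n/2 low bits of each coordinate, so the sketch masks those bits on x and y.

```ruby
# Interleave the binary digits of x and y after zero-padding both to width.
def interleave_bits(x, y, width = 9)
  xb = x.to_s(2).rjust(width, "0").chars
  yb = y.to_s(2).rjust(width, "0").chars
  xb.zip(yb).flatten.join
end

# Box covered when the last n interleaved bits are set to all 0s / all 1s.
def bit_box(x, y, n)
  mask = (1 << (n / 2)) - 1     # n/2 low bits per coordinate
  { x: [x & ~mask, x | mask], y: [y & ~mask, y | mask] }
end

puts interleave_bits(75, 200)   # 000111000011001010
[2, 4, 6, 8].each do |n|
  box = bit_box(75, 200, n)
  puts "#{n} bits: x #{box[:x][0]}..#{box[:x][1]}, y #{box[:y][0]}..#{box[:y][1]}"
end
# 2 bits: x 74..75, y 200..201
# 4 bits: x 72..75, y 200..203
# 6 bits: x 72..79, y 200..207
# 8 bits: x 64..79, y 192..207
```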
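Suppose the inserted data point is x = 75 and y = 200.
After binary interleaving, x = 75 and y = 200 are encoded as 000111000011001010:
ZADD myindex 0 000111000011001010
Query: all points with x between 50 and 100 and y between 100 and 300.
Replacing N bits of the index gives us a search box with side 2^(N/2). So we take the smaller side of the search box, find the power of 2 closest to it, keep splitting the remaining space into boxes of that size, and then search each box with ZRANGEBYLEX.
Here is sample code: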
def spacequery(x0,y0,x1,y1,exp)
    bits=exp*2
    x_start = x0/(2**exp)
    x_end = x1/(2**exp)
    y_start = y0/(2**exp)
    y_end = y1/(2**exp)
    (x_start..x_end).each{|x|
        (y_start..y_end).each{|y|
            x_range_start = x*(2**exp)
            x_range_end = x_range_start | ((2**exp)-1)
            y_range_start = y*(2**exp)
            y_range_end = y_range_start | ((2**exp)-1)
            puts "#{x},#{y} x from #{x_range_start} to #{x_range_end}, y from #{y_range_start} to #{y_range_end}"

            # Turn it into interleaved form for ZRANGEBYLEX query.
            # We assume we need 9 bits for each integer, so the final
            # interleaved representation will be 18 bits.
            xbin = x_range_start.to_s(2).rjust(9,'0')
            ybin = y_range_start.to_s(2).rjust(9,'0')
            s = xbin.split("").zip(ybin.split("")).flatten.compact.join("")
            # Now that we have the start of the range, calculate the end
            # by replacing the specified number of bits from 0 to 1.
            e = s[0..-(bits+1)]+("1"*bits)
            puts "ZRANGEBYLEX myindex [#{s} [#{e}"
        }
    }
end

spacequery(50,100,100,300,6)