Unique counting is a very common feature in website systems. For example, a website needs to count the number of unique visitors (that is, UV) that visits every day. Counting problems are very common, but they can be very complicated to solve: first, the amount that needs to be counted may be very large, for example, a large site is visited by millions of people every day, and the amount of data is quite large; second, it is usually desirable to expand the dimension of counting. For example, in addition to daily UV, you also want to know weekly or monthly UV, which makes the calculation very complicated.
In a relational database storage system, the method to achieve unique counting is select count(distinct
Redis is easy to solve this kind of counting problem. It is faster and consumes less resources than relational databases. It even provides 3 different methods.
1. Set based on set
Redis is used to save a unique data collection. Through it, you can quickly determine whether an element exists in the collection, and you can also quickly calculate the elements of a collection. number, additionally and can be merged into a new set. The commands involved are as follows:
SISMEMBER key member # 判断 member 是否存在 SADD key member # 往集合中加入 member SCARD key # 获取集合元素个数
The set-based method is simple and effective, has accurate counting, wide application, and is easy to understand. Its disadvantage is that it consumes a lot of resources (of course it is much less than a relational database). If The number of elements is very large (for example, hundreds of millions of counts), and the memory consumption is terrible.
2. Bit based on bit
Redis can be used to implement counting that is more highly compressed than set memory. It uses a bit 1 or 0 to store information about whether an element exists. For example, for the count of unique visitors to a website, user_id can be used as the offset of the bit. Set to 1 to indicate access. Using 1 MB of space can store the one-day access count of more than 8 million users. The commands involved are as follows:
SETBIT key offset value # 设置位信息 GETBIT key offset # 获取位信息 BITCOUNT key [start end] # 计数 BITOP operation destkey key [key ...] # 位图合并
The bit-based method consumes much less space than the set method, but it requires that the elements can be simply mapped to bit offsets, and the applicable scope is much narrower. In addition, it consumes a lot of space. Depends on the maximum offset, regardless of the count value. If the maximum offset is large, the memory consumption is also considerable.
3. Based on HyperLogLog
It is difficult to achieve accurate unique counting of extremely large amounts of data, but if it is only approximate, there are many efficient algorithms in computing science, among which HyperLogLog Counting is one of them A very famous algorithm that can achieve hundreds of millions of unique counts using only about 12 K of memory, and the error is controlled at about one percent. The commands involved are as follows:
PFADD key element [element ...] # 加入元素 PFCOUNT key [key ...] # 计数
This counting method is really magical, and I haven’t fully understood it yet. If you are interested, you can study related articles in depth.
The three unique counting methods provided by redis each have their own advantages and disadvantages, and can fully meet the counting requirements in different situations.
For more 3 ways to implement unique counting in Redis, please pay attention to the PHP Chinese website to share related articles!