How to use Redis's HyperLogLog algorithm
You are happily slacking off when the product manager emails you a requirements document. The company needs long-term statistics on the website's daily visitor IPs, and the statistics may have to be kept for months or even years.
After reading the requirements, you think this is easy. You can implement it with Redis's set type: create one set key per day, use SADD to store each visitor IP, and use SCARD to get the number of distinct visitor IPs for that day.
You quickly write the code, it passes testing, and the feature goes live. After it has been running for a while, the server hosting Redis starts raising alarms because some keys use too much memory. You take a look and find that these are all the set keys that store visitor IPs. Only then do you slap your forehead and realize you have dug a big hole for yourself.
Assume that storing an IPv4 address takes up to 15 bytes and that the website has up to 1 million visitors per day. These set keys will then use about 0.45 GB of memory per month and 5.4 GB per year. This estimate covers only IPv4; IPv6 addresses would take more memory. Although SADD and SCARD both run in O(1) time, their memory consumption is intolerable.
You browse the official Redis website and find that Redis also provides a HyperLogLog data type, which meets the product's needs while using far less memory.
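The estimate above can be checked with a few lines of arithmetic. This is only a rough sketch: it counts the raw bytes of the IP strings, assumes 30-day months, and ignores Redis's per-element and per-key overhead, so real usage would be higher. The variable names are illustrative.

```python
# Back-of-the-envelope memory estimate for storing raw IPs in Redis sets.
# Counts only the IP strings themselves; Redis overhead is ignored.
BYTES_PER_IPV4 = 15            # "255.255.255.255" is 15 characters
VISITORS_PER_DAY = 1_000_000

bytes_per_day = BYTES_PER_IPV4 * VISITORS_PER_DAY
gb_per_month = bytes_per_day * 30 / 10**9   # assumes a 30-day month
gb_per_year = gb_per_month * 12

print(f"per day:   {bytes_per_day / 10**6:.0f} MB")
print(f"per month: {gb_per_month:.2f} GB")
print(f"per year:  {gb_per_year:.1f} GB")
```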
HyperLogLog Algorithm
HyperLogLog is a probabilistic algorithm designed specifically to estimate the cardinality of a set, that is, the number of distinct elements it contains.
The approximate cardinality is not the actual cardinality of the set. It may be slightly smaller or larger, but the error stays within a reasonable range. HyperLogLog is therefore a good fit for statistics that do not need to be exact.
The advantage of HyperLogLog is that the memory needed to compute the approximate cardinality does not grow with the size of the set. No matter how many elements the set contains, the memory HyperLogLog needs is fixed and very small.
Each Redis HyperLogLog needs only 12 KB of memory to count close to 2^64 elements, and the standard error of the algorithm is only 0.81%.
If you implement the feature above with HyperLogLog, using one key per day, then even with 1 million visitors per day it occupies only 360 KB of memory per month.
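To make the fixed-memory idea concrete, here is a toy HyperLogLog in pure Python. It is a simplified sketch for illustration, not Redis's implementation: the class name, the use of truncated SHA-1 as the hash, and the choice of 2^14 registers are all assumptions made for this example. Whatever the number of elements added, the state is just a fixed array of small registers.

```python
import hashlib
import math


class ToyHyperLogLog:
    """Illustrative HyperLogLog sketch; not Redis's actual implementation."""

    def __init__(self, p=14):
        self.p = p                          # number of hash bits used as register index
        self.m = 1 << p                     # number of registers, fixed up front
        self.registers = bytearray(self.m)  # one small counter per register

    def add(self, item):
        # 64-bit hash of the item (truncated SHA-1 is an arbitrary choice here).
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                 # first p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)    # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)         # bias constant, valid for m >= 128
        estimate = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * m and zeros:        # small-range correction
            estimate = m * math.log(m / zeros)   # fall back to linear counting
        return round(estimate)

    def merge(self, other):
        # Union of two sketches with the same p: register-wise maximum.
        self.registers = bytearray(
            max(a, b) for a, b in zip(self.registers, other.registers)
        )


hll = ToyHyperLogLog()
for i in range(1000):
    ip = f"192.168.{i // 256}.{i % 256}"
    hll.add(ip)
    hll.add(ip)          # adding a duplicate leaves the registers unchanged
print(hll.count())       # an estimate near 1000, not necessarily exact
```

Duplicates never change the registers, which is why the structure can deduplicate without remembering the elements themselves.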
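The figures quoted above can be reproduced with a short calculation. The register count (16384) and register width (6 bits) are not stated in this article; they are taken from the Redis documentation for the dense encoding, and the 1.04/sqrt(m) formula is the standard error given for the HyperLogLog algorithm. Treat this as a sanity check rather than a description of Redis internals.

```python
import math

REGISTERS = 2 ** 14          # 16384 registers (Redis dense encoding, per Redis docs)
BITS_PER_REGISTER = 6        # each register is 6 bits wide

sketch_bytes = REGISTERS * BITS_PER_REGISTER // 8    # size of one HyperLogLog
standard_error = 1.04 / math.sqrt(REGISTERS)         # relative standard error
month_bytes = 30 * sketch_bytes                      # one key per day, 30 days

print(f"one HyperLogLog: {sketch_bytes / 1024:.0f} KB")
print(f"standard error:  {standard_error:.2%}")
print(f"30 daily keys:   {month_bytes / 1024:.0f} KB")
```

Note that the monthly total does not depend on the number of visitors per day, only on the number of keys.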
PFADD
The PFADD command adds one or more elements to a HyperLogLog so that they are counted.
PFADD key element [element...]
Depending on whether the given element has been counted, the PFADD command may return 0 or 1:
If all the given elements have been counted, the PFADD command will return 0, indicating that the approximate cardinality calculated by HyperLogLog has not changed.
If at least one of the given elements has not been counted before, so that the approximate cardinality calculated by HyperLogLog changes, the PFADD command will return 1.
For example, the first call below adds new elements and returns 1; the second call adds an element that has already been counted and returns 0:
redis> PFADD letters a b c
(integer) 1
redis> PFADD letters a
(integer) 0
You can also call this command with only a key and no elements. If the key already exists, no operation is performed. If it does not exist, an empty HyperLogLog is created and 1 is returned.
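Applied to the visitor-IP requirement, PFADD could be wrapped as in the sketch below. It assumes a redis-py style client whose `pfadd(name, *values)` method maps to the PFADD command; the key naming scheme `visitors:YYYY-MM-DD` and both function names are hypothetical choices for this example.

```python
from datetime import date


def daily_key(day):
    # Hypothetical naming scheme: one HyperLogLog key per calendar day.
    return f"visitors:{day.isoformat()}"


def record_visit(client, day, ip):
    """Count one visitor IP for the given day.

    `client` is assumed to be a redis-py client (for example redis.Redis()).
    Returns True if PFADD reported that the estimate changed.
    """
    return client.pfadd(daily_key(day), ip) == 1


# Usage, assuming a reachable Redis server and the redis package installed:
#   import redis
#   client = redis.Redis()
#   record_visit(client, date.today(), "203.0.113.7")
```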
PFCOUNT
Use the PFCOUNT command to obtain the approximate cardinality of the set as estimated by HyperLogLog. If the given key does not exist, 0 is returned.
PFCOUNT key [key...]
For example:
redis> PFCOUNT letters
(integer) 3
When multiple HyperLogLogs are passed to PFCOUNT, the command first computes the union of all the given HyperLogLogs and then returns the approximate cardinality of that union.
redis> PFADD letters1 a b c
(integer) 1
redis> PFADD letters2 c d e
(integer) 1
redis> PFCOUNT letters1 letters2
(integer) 5
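The multi-key form of PFCOUNT is what makes "unique visitors over several days" possible without double counting an IP that appears on more than one day. A minimal sketch, assuming a redis-py style client with a `pfcount(*sources)` method and the same hypothetical `visitors:YYYY-MM-DD` key scheme:

```python
from datetime import date, timedelta


def keys_for_range(start, days):
    # Hypothetical key scheme: one HyperLogLog per day, named by ISO date.
    return [
        f"visitors:{(start + timedelta(days=i)).isoformat()}"
        for i in range(days)
    ]


def unique_visitors(client, start, days):
    """Approximate number of distinct IPs seen across `days` days from `start`.

    `client` is assumed to be a redis-py client; passing several keys makes
    Redis count the union of the daily HyperLogLogs.
    """
    return client.pfcount(*keys_for_range(start, days))
```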
PFMERGE
The PFMERGE command can perform a union calculation on multiple HyperLogLogs, and then save the calculated union HyperLogLog to the specified key.
PFMERGE destKey sourceKey [sourceKey...]
If the destination key already exists, PFMERGE treats it as one of the source HyperLogLogs, so its existing contents are included in the computed union.
redis> PFADD letters1 a b c
(integer) 1
redis> PFADD letters2 c d e
(integer) 1
redis> PFMERGE res letters1 letters2
OK
redis> PFCOUNT res
(integer) 5
You can see that the PFMERGE and PFCOUNT commands are very similar. In fact, the PFCOUNT command performs the following operations when calculating the approximate cardinality of multiple HyperLogLogs:
Internally call the PFMERGE command to calculate the union of all given HyperLogLogs and store the union in a temporary HyperLogLog.
Execute the PFCOUNT command on the temporary HyperLogLog to get its approximate cardinality.
Delete the temporary HyperLogLog.
Return the resulting approximate cardinality.
When a program needs to call PFCOUNT on the same group of HyperLogLogs repeatedly, consider replacing that call with a PFMERGE call: by storing the union in a specified HyperLogLog once instead of recalculating it every time, the program avoids unnecessary union calculations.
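The advice above could look like the following sketch, which merges a group of daily keys into one stored key and then reads the count from that single key. It assumes a redis-py style client with `pfmerge(dest, *sources)` and `pfcount(*sources)` methods; the function name and the key names in the usage comment are hypothetical.

```python
def cache_union(client, dest_key, source_keys):
    """Store the union of several HyperLogLogs under dest_key and return its count.

    `client` is assumed to be a redis-py client. After this call, later reads
    only need PFCOUNT on dest_key instead of recomputing the union each time.
    """
    client.pfmerge(dest_key, *source_keys)
    return client.pfcount(dest_key)


# Usage, assuming daily keys already exist on a reachable Redis server:
#   daily = [f"visitors:2024-01-{d:02d}" for d in range(1, 32)]
#   cache_union(client, "visitors:2024-01", daily)
#   client.pfcount("visitors:2024-01")   # cheap repeated reads
```

The stored union goes stale if new elements are later added to the source keys, so the merge would need to be re-run after such updates.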
Business Scenario
HyperLogLog is well suited to scenarios such as counting (monthly or annual statistics) and deduplication (for example, spam SMS detection).