How to implement bloom filter using php+redis
First define a hash function collection class. These hash functions are not necessarily used. In fact, 3 32-bit hash values are enough. The specific number can be based on the total amount of your bit sequence and your need to store it. Determined by the amount, the optimal value has been given above.
class BloomFilterHash { /** * 由Justin Sobel编写的按位散列函数 */ public function JSHash($string, $len = null) { $hash = 1315423911; $len || $len = strlen($string); for ($i=0; $i<$len; $i++) { $hash ^= (($hash << 5) + ord($string[$i]) + ($hash >> 2)); } return ($hash % 0xFFFFFFFF) & 0xFFFFFFFF; } /** * 该哈希算法基于AT&T贝尔实验室的Peter J. Weinberger的工作。 * Aho Sethi和Ulman编写的“编译器(原理,技术和工具)”一书建议使用采用此特定算法中的散列方法的散列函数。 */ public function PJWHash($string, $len = null) { $bitsInUnsignedInt = 4 * 8; //(unsigned int)(sizeof(unsigned int)* 8); $threeQuarters = ($bitsInUnsignedInt * 3) / 4; $oneEighth = $bitsInUnsignedInt / 8; $highBits = 0xFFFFFFFF << (int) ($bitsInUnsignedInt - $oneEighth); $hash = 0; $test = 0; $len || $len = strlen($string); for($i=0; $i<$len; $i++) { $hash = ($hash << (int) ($oneEighth)) + ord($string[$i]); } $test = $hash & $highBits; if ($test != 0) { $hash = (($hash ^ ($test >> (int)($threeQuarters))) & (~$highBits)); } return ($hash % 0xFFFFFFFF) & 0xFFFFFFFF; } /** * 类似于PJW Hash功能,但针对32位处理器进行了调整。它是基于UNIX的系统上的widley使用哈希函数。 */ public function ELFHash($string, $len = null) { $hash = 0; $len || $len = strlen($string); for ($i=0; $i<$len; $i++) { $hash = ($hash << 4) + ord($string[$i]); $x = $hash & 0xF0000000; if ($x != 0) { $hash ^= ($x >> 24); } $hash &= ~$x; } return ($hash % 0xFFFFFFFF) & 0xFFFFFFFF; } /** * 这个哈希函数来自Brian Kernighan和Dennis Ritchie的书“The C Programming Language”。 * 它是一个简单的哈希函数,使用一组奇怪的可能种子,它们都构成了31 .... 31 ... 31等模式,它似乎与DJB哈希函数非常相似。 */ public function BKDRHash($string, $len = null) { $seed = 131; # 31 131 1313 13131 131313 etc.. $hash = 0; $len || $len = strlen($string); for ($i=0; $i<$len; $i++) { $hash = (int) (($hash * $seed) + ord($string[$i])); } return ($hash % 0xFFFFFFFF) & 0xFFFFFFFF; } /** * 这是在开源SDBM项目中使用的首选算法。 * 哈希函数似乎对许多不同的数据集具有良好的总体分布。它似乎适用于数据集中元素的MSB存在高差异的情况。 */ public function SDBMHash($string, $len = null) { $hash = 0; $len || $len = strlen($string); for ($i=0; $i<$len; $i++) { $hash = (int) (ord($string[$i]) + ($hash << 6) + ($hash << 16) - $hash); } return ($hash % 0xFFFFFFFF) & 0xFFFFFFFF; } /** * 由Daniel J. Bernstein教授制作的算法,首先在usenet新闻组comp.lang.c上向世界展示。 * 它是有史以来发布的最有效的哈希函数之一。 */ public function DJBHash($string, $len = null) { $hash = 5381; $len || $len = strlen($string); for ($i=0; $i<$len; $i++) { $hash = (int) (($hash << 5) + $hash) + ord($string[$i]); } return ($hash % 0xFFFFFFFF) & 0xFFFFFFFF; } /** * Donald E. Knuth在“计算机编程艺术第3卷”中提出的算法,主题是排序和搜索第6.4章。 */ public function DEKHash($string, $len = null) { $len || $len = strlen($string); $hash = $len; for ($i=0; $i<$len; $i++) { $hash = (($hash << 5) ^ ($hash >> 27)) ^ ord($string[$i]); } return ($hash % 0xFFFFFFFF) & 0xFFFFFFFF; } /** * 参考 http://www.isthe.com/chongo/tech/comp/fnv/ */ public function FNVHash($string, $len = null) { $prime = 16777619; //32位的prime 2^24 + 2^8 + 0x93 = 16777619 $hash = 2166136261; //32位的offset $len || $len = strlen($string); for ($i=0; $i<$len; $i++) { $hash = (int) ($hash * $prime) % 0xFFFFFFFF; $hash ^= ord($string[$i]); } return ($hash % 0xFFFFFFFF) & 0xFFFFFFFF; } }
The next step is to connect redis to perform operations
/** * 使用redis实现的布隆过滤器 */ abstract class BloomFilterRedis { /** * 需要使用一个方法来定义bucket的名字 */ protected $bucket; protected $hashFunction; public function __construct($config, $id) { if (!$this->bucket || !$this->hashFunction) { throw new Exception("需要定义bucket和hashFunction", 1); } $this->Hash = new BloomFilterHash; $this->Redis = new YourRedis; //假设这里你已经连接好了 } /** * 添加到集合中 */ public function add($string) { $pipe = $this->Redis->multi(); foreach ($this->hashFunction as $function) { $hash = $this->Hash->$function($string); $pipe->setBit($this->bucket, $hash, 1); } return $pipe->exec(); } /** * 查询是否存在, 如果曾经写入过,必定回true,如果没写入过,有一定几率会误判为存在 */ public function exists($string) { $pipe = $this->Redis->multi(); $len = strlen($string); foreach ($this->hashFunction as $function) { $hash = $this->Hash->$function($string, $len); $pipe = $pipe->getBit($this->bucket, $hash); } $res = $pipe->exec(); foreach ($res as $bit) { if ($bit == 0) { return false; } } return true; } }
The above definition is an abstract class. If you want to use it, you can use it according to the specific business. For example, below is a filter that filters out duplicate content.
/** * 重复内容过滤器 * 该布隆过滤器总位数为2^32位, 判断条数为2^30条. hash函数最优为3个.(能够容忍最多的hash函数个数) * 使用的三个hash函数为 * BKDR, SDBM, JSHash * * 注意, 在存储的数据量到2^30条时候, 误判率会急剧增加, 因此需要定时判断过滤器中的位为1的的数量是否超过50%, 超过则需要清空. */ class FilteRepeatedComments extends BloomFilterRedis { /** * 表示判断重复内容的过滤器 * @var string */ protected $bucket = 'rptc'; protected $hashFunction = array('BKDRHash', 'SDBMHash', 'JSHash'); }
The above is the detailed content of How to implement bloom filter using php+redis. For more information, please follow other related articles on the PHP Chinese website!

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics



Redis cluster mode deploys Redis instances to multiple servers through sharding, improving scalability and availability. The construction steps are as follows: Create odd Redis instances with different ports; Create 3 sentinel instances, monitor Redis instances and failover; configure sentinel configuration files, add monitoring Redis instance information and failover settings; configure Redis instance configuration files, enable cluster mode and specify the cluster information file path; create nodes.conf file, containing information of each Redis instance; start the cluster, execute the create command to create a cluster and specify the number of replicas; log in to the cluster to execute the CLUSTER INFO command to verify the cluster status; make

The future of PHP will be achieved by adapting to new technology trends and introducing innovative features: 1) Adapting to cloud computing, containerization and microservice architectures, supporting Docker and Kubernetes; 2) introducing JIT compilers and enumeration types to improve performance and data processing efficiency; 3) Continuously optimize performance and promote best practices.

PHP and Python each have their own advantages, and the choice should be based on project requirements. 1.PHP is suitable for web development, with simple syntax and high execution efficiency. 2. Python is suitable for data science and machine learning, with concise syntax and rich libraries.

PHP and Python each have their own advantages, and choose according to project requirements. 1.PHP is suitable for web development, especially for rapid development and maintenance of websites. 2. Python is suitable for data science, machine learning and artificial intelligence, with concise syntax and suitable for beginners.

How to clear Redis data: Use the FLUSHALL command to clear all key values. Use the FLUSHDB command to clear the key value of the currently selected database. Use SELECT to switch databases, and then use FLUSHDB to clear multiple databases. Use the DEL command to delete a specific key. Use the redis-cli tool to clear the data.

PHP remains important in modern web development, especially in content management and e-commerce platforms. 1) PHP has a rich ecosystem and strong framework support, such as Laravel and Symfony. 2) Performance optimization can be achieved through OPcache and Nginx. 3) PHP8.0 introduces JIT compiler to improve performance. 4) Cloud-native applications are deployed through Docker and Kubernetes to improve flexibility and scalability.

To read a queue from Redis, you need to get the queue name, read the elements using the LPOP command, and process the empty queue. The specific steps are as follows: Get the queue name: name it with the prefix of "queue:" such as "queue:my-queue". Use the LPOP command: Eject the element from the head of the queue and return its value, such as LPOP queue:my-queue. Processing empty queues: If the queue is empty, LPOP returns nil, and you can check whether the queue exists before reading the element.

The reasons why PHP is the preferred technology stack for many websites include its ease of use, strong community support, and widespread use. 1) Easy to learn and use, suitable for beginners. 2) Have a huge developer community and rich resources. 3) Widely used in WordPress, Drupal and other platforms. 4) Integrate tightly with web servers to simplify development deployment.
