How to implement task scheduling based on Redis distributed lock

WBOY

May 28, 2023 pm 01:37 PM


In large-scale distributed data collection, managing information sources is particularly important. To ensure that a given task is processed by only one collector at a time, task scheduling must guarantee uniqueness. A distributed collection system therefore usually has a scheduling module whose main responsibility is to distribute collection tasks and ensure that each task is handed out only once.

Because the system is distributed, it involves multiple servers, each server runs multiple collectors (processes), and each collector may use multiple threads. The lock mechanism in the task scheduling module is therefore particularly important. Depending on the application's architecture, locks are usually implemented in one of the following ways:

If the handler is a single process with multiple threads, in Python you can use the Lock object of the threading module to synchronize access to shared variables and achieve thread safety.

On a single machine with multiple processes, in Python you can use the Lock object of the multiprocessing module.

With a multi-machine, multi-process deployment, you have to rely on a third-party component that stores the lock object to implement a distributed synchronization lock.

Since the scheduling module runs on multiple machines with multiple processes and multiple threads, it falls under the third case.
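As a small illustration of the first case, the sketch below lets several threads claim tasks from one shared list under a `threading.Lock`. The function name `run_collectors` and the integer task ids are hypothetical and exist only for this example. The second case would look much the same with `multiprocessing.Lock` and process-safe shared state.

```python
import threading


def run_collectors(task_ids, n_threads=4):
    """Let several threads claim tasks from one shared list.

    Returns a list of (thread_name, task_id) pairs, one per claimed task.
    """
    pending = list(task_ids)   # shared state: tasks not yet claimed
    claims = []                # shared state: who claimed what
    lock = threading.Lock()

    def collector(name):
        while True:
            # Only one thread at a time may inspect and modify the shared lists.
            with lock:
                if not pending:
                    return
                task = pending.pop()
                claims.append((name, task))

    threads = [
        threading.Thread(target=collector, args=(f"collector-{i}",))
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return claims


if __name__ == "__main__":
    result = run_collectors(range(10))
    print(len(result), "tasks claimed, each exactly once")
```

Because the check for remaining tasks and the removal of a task happen inside the same locked region, no task can be claimed twice.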

Distributed lock implementation methods

The current mainstream distributed lock implementation methods are as follows:

Based on a database, such as MySQL

Based on a cache, such as Redis

Based on ZooKeeper

Each approach has its own merits. All things considered, Redis is the most suitable choice here, mainly because:

Redis operates in memory, so access is faster than with a database; under high concurrency, locking does not degrade performance much

Redis can set a time to live (TTL) on keys

Redis is simple to use, and the overall implementation overhead is low

However, a distributed lock implemented with Redis also needs to meet the following conditions:

Only one thread can hold the lock at a time; other threads must wait until the lock is released

The lock operation must be atomic

No deadlock may occur, for example when the thread that has acquired the lock exits abnormally before releasing it, leaving other threads waiting in a loop for the lock to be released

The lock must be acquired and released by the same thread

We use Redis to implement a distributed synchronization lock that ensures data consistency. It needs to have the following characteristics:

Mutual exclusion: only one thread can acquire the lock at a time

Use the Redis TTL to ensure that no deadlock occurs. However, if the lock expires while its holder is still working, multiple threads can end up holding the lock at the same time, so the expiration time must be set sensibly to avoid this

Give each lock a unique identity so that it is not accidentally deleted by a thread that does not hold it
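These characteristics can be sketched in a few lines of Python. To keep the example runnable without a server, it uses a hypothetical in-memory class, `InMemoryStore`, that mimics only two behaviours: setting a key if it is absent, with a TTL, and deleting a key only if its value matches. The names `acquire`, `release`, `set_nx_px` and `delete_if_equals` are likewise assumptions made for this illustration. With a real client such as redis-py, the acquire step would correspond to `set(key, token, nx=True, px=ttl_ms)`, and the release step would need a server-side script so that the comparison and the deletion happen atomically.

```python
import time
import uuid


class InMemoryStore:
    """Hypothetical stand-in for Redis; not a real client."""

    def __init__(self, clock=time.monotonic):
        self._data = {}        # key -> (value, expires_at)
        self._clock = clock    # injectable so expiry can be demonstrated

    def _purge(self, key):
        item = self._data.get(key)
        if item is not None and item[1] <= self._clock():
            del self._data[key]   # the TTL has elapsed, drop the key

    def set_nx_px(self, key, value, ttl_ms):
        """Mimics SET key value NX PX ttl_ms: succeeds only if key is absent."""
        self._purge(key)
        if key in self._data:
            return False
        self._data[key] = (value, self._clock() + ttl_ms / 1000.0)
        return True

    def delete_if_equals(self, key, value):
        """Mimics an atomic compare-and-delete."""
        self._purge(key)
        item = self._data.get(key)
        if item is not None and item[0] == value:
            del self._data[key]
            return True
        return False


def acquire(store, key, ttl_ms):
    """Try to take the lock; return a unique owner token, or None if it is held."""
    token = uuid.uuid4().hex
    return token if store.set_nx_px(key, token, ttl_ms) else None


def release(store, key, token):
    """Release the lock only if this token still owns it."""
    return store.delete_if_equals(key, token)
```

The TTL means a crashed holder cannot block others forever, and the token check means a holder whose lock has already expired cannot delete a lock that now belongs to someone else.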

In practice, I separated the scheduling module from the rest of the collection system and implemented it as an independent Spring Boot service built on the Java client JRedis (a high-performance Java client for connecting to the Redis distributed key-value database, with both synchronous and asynchronous modes). Other collectors request the collection tasks to be processed from this service over HTTP. The process is roughly as follows:

The collector sends a task request to the dispatching center over HTTP;
The dispatching center checks whether the lock exists; if it does, an empty set is returned immediately;
If the lock does not exist, the dispatching center acquires the lock for the request and then fetches the corresponding collection tasks according to the source rules;
The fetched tasks are returned (or an empty result if there are no pending tasks), and the lock is then deleted.
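The flow above can be sketched in Python as follows. This is not the author's implementation: the `Dispatcher` class, the per-source lock name `lock:<source>`, and the `batch_size` parameter are assumptions made for illustration, and a plain dictionary stands in for Redis (so there is no TTL here). The check for an existing lock and the locking itself are folded into one step, in line with the atomicity requirement stated earlier.

```python
import uuid


class Dispatcher:
    """Hypothetical dispatching center; a dict stands in for Redis lock keys."""

    def __init__(self, pending_by_source):
        self._locks = {}                      # lock name -> owner token
        self._pending = pending_by_source     # source -> list of pending tasks

    def _try_lock(self, name):
        # setdefault checks and sets in one step, standing in for SET ... NX.
        token = uuid.uuid4().hex
        if self._locks.setdefault(name, token) == token:
            return token
        return None

    def _unlock(self, name, token):
        # Only the owner of the lock may delete it.
        if self._locks.get(name) == token:
            del self._locks[name]

    def handle_request(self, source, batch_size):
        """Handle one collector request; return the tasks handed out."""
        lock_name = "lock:" + source
        token = self._try_lock(lock_name)
        if token is None:
            return []                         # lock exists: return an empty set
        try:
            queue = self._pending.get(source, [])
            batch = queue[:batch_size]
            self._pending[source] = queue[batch_size:]
            return batch                      # may be empty if nothing is pending
        finally:
            self._unlock(lock_name, token)    # always delete the lock afterwards
```

For example, `Dispatcher({"news": [1, 2, 3]}).handle_request("news", 2)` hands out tasks `1` and `2` and leaves `3` pending for the next request.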

The code implementation of the scheduling module is roughly as follows: