As a business grows, its production system inevitably sees traffic and data volume rise from small to large, so scalability is an important consideration when designing a database system for high availability. When a single table or database becomes too large and the update rate keeps climbing, the MySQL DBA will usually propose a sharding solution for the business system. Once sharding is on the table, the question of the sharding key follows naturally. In some business systems the sharding key must be globally unique, for example in a database that stores products. So how do we generate a globally unique ID? The following introduces several common solutions from a DBA's perspective.
What is the CAS protocol
Memcached added the CAS (Check and Set) protocol in version 1.2.4. It is similar to the CAS (Compare and Swap) atomic operation in Java's concurrency package, and it handles the concurrency issue of the same item being modified by multiple threads.
The basic principle is very simple and boils down to a version number: every stored data object carries a version number.
The following example makes this clear:
If CAS is not used, a typical sequence looks like this:
1. Client A reads item X (value = 100).
2. Client B reads item X (value = 100).
3. Client A updates X and writes back 101.
4. Client B also updates X and writes back 101, silently overwriting A's change.
Conclusion: a write conflict occurs at the fourth step, and A's update is lost.
With the CAS protocol, the sequence becomes:
1. Client A gets item X together with its version number (value = 100, version = v1).
2. Client B gets item X together with its version number (value = 100, version = v1).
3. Client A writes back 101 with version v1; the versions match, the write succeeds, and the version becomes v2.
4. Client B writes back 101 with version v1; the stored version is now v2, so the write is rejected and B must re-read X and retry.
In this way, the CAS protocol uses version numbers to resolve write conflicts (this is the optimistic-locking concept).
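To make the version-number idea concrete, here is a minimal sketch (not from the original article) of an optimistic update against a relational table; the items table, its columns, and the PDO connection are assumptions used only for illustration:

// Minimal sketch of optimistic locking with a version column.
// Assumptions: a table items(id, value, version) and a PDO connection $pdo.
function update_with_version(PDO $pdo, $id, $new_value) {
    // 1. Read the current value together with its version number.
    $stmt = $pdo->prepare("SELECT value, version FROM items WHERE id = ?");
    $stmt->execute(array($id));
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    if ($row === false) {
        return false; // record not found
    }

    // 2. Write back only if the version has not changed in the meantime.
    $stmt = $pdo->prepare(
        "UPDATE items SET value = ?, version = version + 1
         WHERE id = ? AND version = ?"
    );
    $stmt->execute(array($new_value, $id, $row['version']));

    // 0 affected rows means another writer got there first (the conflict at the fourth step);
    // the caller should re-read and retry.
    return $stmt->rowCount() > 0;
}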
The scheme below is not strictly CAS, but it borrows the idea of an atomic compare-and-swap operation.
The approach is as follows: every time a global ID is needed, first read the current maximum global ID from the sequence table, add 1 to it, and write the incremented value back to the database. Suppose the incremented value is 203 and the logical table name is users; the sequence table is defined as follows:
CREATE TABLE `SEQUENCE` (
  `name` varchar(30) NOT NULL COMMENT 'name of the sharded logical table',
  `gid` bigint(20) NOT NULL COMMENT 'current maximum global id',
  PRIMARY KEY (`name`)
) ENGINE=innodb
The SQL statement is:
update sequence set gid = 203 where name = 'users' and gid < 203;
The condition gid < 203 in the SQL statement ensures that gid only ever increases, even under concurrent access.
If the UPDATE affects 0 rows, it means another process has already generated 203 and written it to the database; the steps above must be repeated to generate a new value.
The code is implemented as follows:
// $name is the logical table name
function next_id_db($name){
    // get the DAO that wraps the global SEQUENCE table
    $seq_dao = Wk_Sequence_Dao_Sequence::getInstance();
    $threshold = 100; // maximum number of attempts
    for($i = 0; $i < $threshold; $i++){
        $last_id = $seq_dao->get_seq_id($name); // read the current global id from the database
        $id = $last_id + 1;
        $ret = $seq_dao->set_seq_id($name, $id); // conditional update: succeeds only if gid < $id
        if($ret){
            return $id;
        }
    }
    return false;
}
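The Wk_Sequence_Dao_Sequence class is not shown in the original article. A hypothetical PDO-based sketch of what its get_seq_id / set_seq_id methods could look like against the SEQUENCE table above (the class name and connection handling here are assumptions):

// Hypothetical DAO sketch for the SEQUENCE table; not part of the original article.
class Sequence_Dao {
    private $pdo;

    public function __construct(PDO $pdo) {
        $this->pdo = $pdo;
    }

    // Read the current maximum global id for a logical table name.
    public function get_seq_id($name) {
        $stmt = $this->pdo->prepare("SELECT gid FROM SEQUENCE WHERE name = ?");
        $stmt->execute(array($name));
        $gid = $stmt->fetchColumn();
        return $gid === false ? false : (int)$gid;
    }

    // Write the new global id, but only if it is larger than the stored one.
    // Returns true only when this process "won" the update.
    public function set_seq_id($name, $id) {
        $stmt = $this->pdo->prepare(
            "UPDATE SEQUENCE SET gid = ? WHERE name = ? AND gid < ?"
        );
        $stmt->execute(array($id, $name, $id));
        return $stmt->rowCount() > 0;
    }
}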
In concurrent programming, a lock is the usual way to serialize access, and generating a global ID is essentially the same kind of concurrency problem, so it can also be solved with a distributed lock.
The generation idea is as follows:
Redis's setnx and memcached's add both return false if the specified key already exists; this property can be used to implement a global lock.
Before generating a global ID, first check whether the lock key exists. If it does not, take the lock and add 1 with Redis's incr (or memcached's increment); both methods return the value after the increment. If the key does exist, the program loops and waits, repeatedly checking whether the key is still there; as soon as it disappears, the steps above are carried out.
The code is as follows:
// Redis-based implementation
// $name is the logical table name
function next_id_redis($name){
    $redis = Wk_Redis_Util::getRedis(); // get the Redis connection
    $seq_dao = Wk_Sequence_Dao_Sequence::getInstance(); // DAO for the sequence table that persists the global id
    if(!is_object($redis)){
        throw new Exception("fail to create redis object");
    }
    $max_times = 10; // maximum number of attempts, to avoid an endless loop if Redis is unavailable
    $i = 0;
    while(1){
        $i++;
        // try to create the lock key; success means we hold the lock
        $ret = $redis->setnx("sequence_{$name}_flag", time());
        if($ret){
            break;
        }
        if($i > $max_times){
            break;
        }
        $time = $redis->get("sequence_{$name}_flag");
        if(is_numeric($time) && time() - $time > 1){ // if the lock is older than 1 second, stop waiting
            break;
        }
    }
    $id = $redis->incr("sequence_{$name}");
    // if the counter key is missing or the operation failed, reload the global id from the sequence table into Redis
    if (intval($id) === 1 or $id === false) {
        $last_id = $seq_dao->get_seq_id($name); // read the global id from the database
        if(!is_numeric($last_id)){
            throw new Exception("fail to get id from db");
        }
        $ret = $redis->set("sequence_{$name}", $last_id);
        if($ret == false){
            throw new Exception("fail to set redis key [ sequence_{$name} ]");
        }
        $id = $redis->incr("sequence_{$name}");
        if(!is_numeric($id)){
            throw new Exception("fail to incr redis key [ sequence_{$name} ]");
        }
    }
    $seq_dao->set_seq_id($name, $id); // persist the generated global id back to the sequence table
    $redis->delete("sequence_{$name}_flag"); // delete the lock key, i.e. release the lock
    return $id;
}
Operating directly on memory with Redis will generally perform better, but what happens if Redis goes down? Combining the two schemes above gives better availability.
The code is as follows:
function next_id($name){
    try{
        return $this->next_id_redis($name);
    } catch(Exception $e){
        return $this->next_id_db($name);
    }
}
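Assuming the three functions above live in one sequence-generator class (the original does not show the surrounding class, so the name Wk_Sequence used here is hypothetical), the business code would use it roughly like this:

// Hypothetical usage; Wk_Sequence is an assumed wrapper class around the methods above.
$seq = new Wk_Sequence();
$gid = $seq->next_id('users'); // tries Redis first, falls back to the database on failure
if ($gid === false) {
    throw new Exception("failed to generate a global id for table users");
}
// use $gid as the globally unique id / sharding key for the new row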
Because MySQL itself supports auto_increment, it is natural to think of using this feature. Flickr does exactly that in its global ID generation scheme (auto_increment + REPLACE INTO + MyISAM). A concrete scheme for generating 64-bit IDs works as follows:
First create a separate database (e.g. ticket), and in it create a table:
CREATE TABLE Tickets64 (
  id bigint(20) unsigned NOT NULL auto_increment,
  stub char(1) NOT NULL default '',
  PRIMARY KEY (id),
  UNIQUE KEY stub (stub)
) ENGINE=MyISAM
After inserting a record and executing SELECT * FROM Tickets64, the result looks like this:
+-------------------+------+
| id                | stub |
+-------------------+------+
| 72157623227190423 | a    |
+-------------------+------+
On the application side, the following two statements are executed together in one transaction:
REPLACE INTO Tickets64 (stub) VALUES ('a');
SELECT LAST_INSERT_ID();
In this way we obtain increasing, non-repeating IDs.
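As an illustration, fetching a ticket from the application side could look like the following PDO sketch (connection handling and error mode are assumptions; the two statements are exactly the ones shown above):

// Minimal sketch of a Flickr-style ticket fetch; assumes $pdo is a PDO connection
// to the ticket database with PDO::ERRMODE_EXCEPTION enabled.
function get_ticket(PDO $pdo) {
    // REPLACE INTO removes the existing row with stub 'a' (unique key) and inserts a new one,
    // so auto_increment hands out a fresh id while the table never grows beyond one row.
    $pdo->exec("REPLACE INTO Tickets64 (stub) VALUES ('a')");
    // LAST_INSERT_ID() is scoped to this connection, so concurrent clients do not interfere.
    return (int)$pdo->query("SELECT LAST_INSERT_ID()")->fetchColumn();
}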
So far we have only generated IDs on a single database, which from a high-availability perspective leaves a single point of failure. Flickr solves this by running two database servers and splitting odd and even IDs between them through different auto_increment starting values and step sizes:
TicketServer1:
  auto-increment-increment = 2
  auto-increment-offset = 1

TicketServer2:
  auto-increment-increment = 2
  auto-increment-offset = 2
Finally, the client simply obtains IDs from the two servers in round-robin fashion.
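Client-side round-robin over the two servers could be sketched like this, reusing the hypothetical get_ticket() helper from above ($servers holds one PDO connection per ticket server):

// Hypothetical round-robin over the ticket servers; PDO::ERRMODE_EXCEPTION is assumed
// so that an unreachable server surfaces as an exception.
function next_ticket(array $servers) {
    static $next = 0;
    $n = count($servers);
    for ($attempt = 0; $attempt < $n; $attempt++) {
        $pdo = $servers[$next];
        $next = ($next + 1) % $n; // advance the round-robin pointer
        try {
            return get_ticket($pdo); // helper from the previous sketch
        } catch (Exception $e) {
            // this server is unavailable; try the other one
        }
    }
    return false; // all ticket servers failed
}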
References:
http://code.flickr.net/2010/02/08/ticket-servers-distributed-unique-primary-keys-on-the-cheap/
http://segmentfault.com/a/1190000004090537