As a business grows, its production system inevitably sees traffic and data volume rise from small to large, so scalability is an important consideration when designing a database system for high availability. When a single table or database becomes too large and the update rate keeps climbing, the MySQL DBA will usually propose a sharding solution for the business system. Once sharding is on the table, the question of the sharding key follows naturally. In some business systems the sharding key must be globally unique, for example in a database that stores products. So how do we generate a globally unique ID? The following introduces several common solutions from a DBA's perspective.
What is the CAS protocol
Memcached added the CAS (Check and Set) protocol in version 1.2.4. It is similar to the CAS (Compare and Swap) atomic operation in Java's concurrency package, and it handles the concurrency issue of the same item being modified by multiple threads.
The basic principle is very simple and boils down to a version number: every stored data object carries a version number.
The following example makes this clear:
If CAS is not used, a typical sequence looks like this:
1. Client A reads item X (value = 100).
2. Client B reads item X (value = 100).
3. Client A updates X and writes back 101.
4. Client B also updates X and writes back 101, silently overwriting A's change.
Conclusion: a write conflict occurs at the fourth step, and A's update is lost.
With the CAS protocol, the sequence becomes:
1. Client A gets item X together with its version number (value = 100, version = v1).
2. Client B gets item X together with its version number (value = 100, version = v1).
3. Client A writes back 101 with version v1; the versions match, the write succeeds, and the version becomes v2.
4. Client B writes back 101 with version v1; the stored version is now v2, so the write is rejected and B must re-read X and retry.
In this way, the CAS protocol uses version numbers to resolve write conflicts (this is the optimistic-locking concept).
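To make the version-number idea concrete, here is a minimal sketch (not from the original article) of an optimistic update against a relational table; the items table, its columns, and the PDO connection are assumptions used only for illustration:

// Minimal sketch of optimistic locking with a version column.
// Assumptions: a table items(id, value, version) and a PDO connection $pdo.
function update_with_version(PDO $pdo, $id, $new_value) {
    // 1. Read the current value together with its version number.
    $stmt = $pdo->prepare("SELECT value, version FROM items WHERE id = ?");
    $stmt->execute(array($id));
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    if ($row === false) {
        return false; // record not found
    }

    // 2. Write back only if the version has not changed in the meantime.
    $stmt = $pdo->prepare(
        "UPDATE items SET value = ?, version = version + 1
         WHERE id = ? AND version = ?"
    );
    $stmt->execute(array($new_value, $id, $row['version']));

    // 0 affected rows means another writer got there first (the conflict at the fourth step);
    // the caller should re-read and retry.
    return $stmt->rowCount() > 0;
}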
The scheme below is not strictly CAS, but it borrows the idea of an atomic compare-and-swap operation.
The approach is as follows: every time a global ID is needed, first read the current maximum global ID from the sequence table, add 1 to it, and write the incremented value back to the database. Suppose the incremented value is 203 and the logical table name is users; the sequence table is defined as follows:
CREATE TABLE `SEQUENCE` (
  `name` varchar(30) NOT NULL COMMENT 'name of the sharded logical table',
  `gid` bigint(20) NOT NULL COMMENT 'current maximum global id',
  PRIMARY KEY (`name`)
) ENGINE=innodb
The SQL statement is:
update sequence set gid = 203 where name = 'users' and gid < 203;
The condition gid < 203 in the SQL statement ensures that gid only ever increases, even under concurrent access.
If the UPDATE affects 0 rows, it means another process has already generated 203 and written it to the database; the steps above must be repeated to generate a new value.
The code is implemented as follows:
// $name is the logical table name
function next_id_db($name){
    // get the DAO that wraps the global SEQUENCE table
    $seq_dao = Wk_Sequence_Dao_Sequence::getInstance();
    $threshold = 100; // maximum number of attempts
    for($i = 0; $i < $threshold; $i++){
        $last_id = $seq_dao->get_seq_id($name); // read the current global id from the database
        $id = $last_id + 1;
        $ret = $seq_dao->set_seq_id($name, $id); // conditional update: succeeds only if gid < $id
        if($ret){
            return $id;
        }
    }
    return false;
}
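The Wk_Sequence_Dao_Sequence class is not shown in the original article. A hypothetical PDO-based sketch of what its get_seq_id / set_seq_id methods could look like against the SEQUENCE table above (the class name and connection handling here are assumptions):

// Hypothetical DAO sketch for the SEQUENCE table; not part of the original article.
class Sequence_Dao {
    private $pdo;

    public function __construct(PDO $pdo) {
        $this->pdo = $pdo;
    }

    // Read the current maximum global id for a logical table name.
    public function get_seq_id($name) {
        $stmt = $this->pdo->prepare("SELECT gid FROM SEQUENCE WHERE name = ?");
        $stmt->execute(array($name));
        $gid = $stmt->fetchColumn();
        return $gid === false ? false : (int)$gid;
    }

    // Write the new global id, but only if it is larger than the stored one.
    // Returns true only when this process "won" the update.
    public function set_seq_id($name, $id) {
        $stmt = $this->pdo->prepare(
            "UPDATE SEQUENCE SET gid = ? WHERE name = ? AND gid < ?"
        );
        $stmt->execute(array($id, $name, $id));
        return $stmt->rowCount() > 0;
    }
}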
In concurrent programming, a lock is the usual way to serialize access, and generating a global ID is essentially the same kind of concurrency problem, so it can also be solved with a distributed lock.
The generation idea is as follows:
Redis's setnx and memcached's add both return false if the specified key already exists; this property can be used to implement a global lock.
Before generating a global ID, first check whether the lock key exists. If it does not, take the lock and add 1 with Redis's incr (or memcached's increment); both methods return the value after the increment. If the key does exist, the program loops and waits, repeatedly checking whether the key is still there; as soon as it disappears, the steps above are carried out.
The code is as follows:
// Redis-based implementation
// $name is the logical table name
function next_id_redis($name){
    $redis = Wk_Redis_Util::getRedis(); // get the Redis connection
    $seq_dao = Wk_Sequence_Dao_Sequence::getInstance(); // DAO for the sequence table that persists the global id
    if(!is_object($redis)){
        throw new Exception("fail to create redis object");
    }
    $max_times = 10; // maximum number of attempts, to avoid an endless loop if Redis is unavailable
    $i = 0;
    while(1){
        $i++;
        // try to create the lock key; success means we hold the lock
        $ret = $redis->setnx("sequence_{$name}_flag", time());
        if($ret){
            break;
        }
        if($i > $max_times){
            break;
        }
        $time = $redis->get("sequence_{$name}_flag");
        if(is_numeric($time) && time() - $time > 1){ // if the lock is older than 1 second, stop waiting
            break;
        }
    }
    $id = $redis->incr("sequence_{$name}");
    // if the counter key is missing or the operation failed, reload the global id from the sequence table into Redis
    if (intval($id) === 1 or $id === false) {
        $last_id = $seq_dao->get_seq_id($name); // read the global id from the database
        if(!is_numeric($last_id)){
            throw new Exception("fail to get id from db");
        }
        $ret = $redis->set("sequence_{$name}", $last_id);
        if($ret == false){
            throw new Exception("fail to set redis key [ sequence_{$name} ]");
        }
        $id = $redis->incr("sequence_{$name}");
        if(!is_numeric($id)){
            throw new Exception("fail to incr redis key [ sequence_{$name} ]");
        }
    }
    $seq_dao->set_seq_id($name, $id); // persist the generated global id back to the sequence table
    $redis->delete("sequence_{$name}_flag"); // delete the lock key, i.e. release the lock
    return $id;
}
Operating directly on memory with Redis will generally perform better, but what happens if Redis goes down? Combining the two schemes above gives better availability.
The code is as follows:
function next_id($name){
    try{
        return $this->next_id_redis($name);
    } catch(Exception $e){
        return $this->next_id_db($name);
    }
}
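Assuming the three functions above live in one sequence-generator class (the original does not show the surrounding class, so the name Wk_Sequence used here is hypothetical), the business code would use it roughly like this:

// Hypothetical usage; Wk_Sequence is an assumed wrapper class around the methods above.
$seq = new Wk_Sequence();
$gid = $seq->next_id('users'); // tries Redis first, falls back to the database on failure
if ($gid === false) {
    throw new Exception("failed to generate a global id for table users");
}
// use $gid as the globally unique id / sharding key for the new row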
Because MySQL itself supports auto_increment, it is natural to think of using this feature. Flickr does exactly that in its global ID generation scheme (auto_increment + REPLACE INTO + MyISAM). A concrete scheme for generating 64-bit IDs works as follows:
First create a separate database (e.g. ticket), and in it create a table:
CREATE TABLE Tickets64 (
  id bigint(20) unsigned NOT NULL auto_increment,
  stub char(1) NOT NULL default '',
  PRIMARY KEY (id),
  UNIQUE KEY stub (stub)
) ENGINE=MyISAM
After inserting a record and executing SELECT * FROM Tickets64, the result looks like this:
+-------------------+------+
| id                | stub |
+-------------------+------+
| 72157623227190423 | a    |
+-------------------+------+
On the application side, the following two statements are executed together in one transaction:
REPLACE INTO Tickets64 (stub) VALUES ('a');
SELECT LAST_INSERT_ID();
In this way we obtain increasing, non-repeating IDs.
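As an illustration, fetching a ticket from the application side could look like the following PDO sketch (connection handling and error mode are assumptions; the two statements are exactly the ones shown above):

// Minimal sketch of a Flickr-style ticket fetch; assumes $pdo is a PDO connection
// to the ticket database with PDO::ERRMODE_EXCEPTION enabled.
function get_ticket(PDO $pdo) {
    // REPLACE INTO removes the existing row with stub 'a' (unique key) and inserts a new one,
    // so auto_increment hands out a fresh id while the table never grows beyond one row.
    $pdo->exec("REPLACE INTO Tickets64 (stub) VALUES ('a')");
    // LAST_INSERT_ID() is scoped to this connection, so concurrent clients do not interfere.
    return (int)$pdo->query("SELECT LAST_INSERT_ID()")->fetchColumn();
}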
So far we have only generated IDs on a single database, which from a high-availability perspective leaves a single point of failure. Flickr solves this by running two database servers and splitting odd and even IDs between them through different auto_increment starting values and step sizes:
TicketServer1:
  auto-increment-increment = 2
  auto-increment-offset = 1

TicketServer2:
  auto-increment-increment = 2
  auto-increment-offset = 2
Finally, the client simply obtains IDs from the two servers in round-robin fashion.
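Client-side round-robin over the two servers could be sketched like this, reusing the hypothetical get_ticket() helper from above ($servers holds one PDO connection per ticket server):

// Hypothetical round-robin over the ticket servers; PDO::ERRMODE_EXCEPTION is assumed
// so that an unreachable server surfaces as an exception.
function next_ticket(array $servers) {
    static $next = 0;
    $n = count($servers);
    for ($attempt = 0; $attempt < $n; $attempt++) {
        $pdo = $servers[$next];
        $next = ($next + 1) % $n; // advance the round-robin pointer
        try {
            return get_ticket($pdo); // helper from the previous sketch
        } catch (Exception $e) {
            // this server is unavailable; try the other one
        }
    }
    return false; // all ticket servers failed
}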
References:
http://code.flickr.net/2010/02/08/ticket-servers-distributed-unique-primary-keys-on-the-cheap/
http://segmentfault.com/a/1190000004090537