Introduction: "Large Capacity Redis Storage - Everything About Pika" introduced the birth of pika, its characteristics, its core, and its usage. This article analyzes in detail an important file in pika's synchronization logic, "write2file": its data storage format and implementation principle.
pika is a large-capacity, Redis-like storage system developed by the DBA team and the infrastructure team of the 360 Web Platform Department. pika is not meant to replace Redis but to complement it. While remaining fully compatible with the Redis protocol and inheriting Redis's convenient operation and maintenance design, pika uses persistent storage to address the problems Redis has in large-capacity scenarios, such as slow recovery, costly master-slave synchronization, a relatively fragile single thread, limited data capacity, and high memory cost.
pika master-slave replication principle: binlog
Binlog-related files come in two kinds: manifest and write2file. manifest records the log meta information, including the current log file number and the current log file offset. Files named write2file followed by a number (for example, write2file0) record all redis write commands and their parameters received by pika.
File format
Manifest file format:
Log offset (8 bytes)|con_offset (8 bytes, unused)|Number of elements (4 bytes, unused)|Log file number (4 bytes).
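As a rough illustration of this layout, the following C++ sketch decodes a 24-byte manifest buffer into its four fields. The names ManifestInfo, DecodeManifest, ReadLE and item_num are hypothetical, and little-endian byte order is an assumption, since the layout above does not state one.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical in-memory view of the 24-byte manifest described above.
struct ManifestInfo {
  uint64_t pro_offset;  // log offset (8 bytes)
  uint64_t con_offset;  // unused (8 bytes)
  uint32_t item_num;    // number of elements, unused (4 bytes)
  uint32_t pro_num;     // log file number (4 bytes)
};

constexpr std::size_t kManifestSize = 8 + 8 + 4 + 4;

// Reads n little-endian bytes starting at p (byte order is an assumption).
inline uint64_t ReadLE(const unsigned char* p, std::size_t n) {
  assert(n <= 8);
  uint64_t v = 0;
  for (std::size_t i = 0; i < n; ++i) {
    v |= static_cast<uint64_t>(p[i]) << (8 * i);
  }
  return v;
}

// Splits the buffer into fields in the order given by the manifest layout.
inline ManifestInfo DecodeManifest(
    const std::array<unsigned char, kManifestSize>& buf) {
  ManifestInfo m;
  m.pro_offset = ReadLE(buf.data(), 8);
  m.con_offset = ReadLE(buf.data() + 8, 8);
  m.item_num = static_cast<uint32_t>(ReadLE(buf.data() + 16, 4));
  m.pro_num = static_cast<uint32_t>(ReadLE(buf.data() + 20, 4));
  return m;
}
```

Only pro_offset and pro_num carry information here, matching the two fields the manifest is said to record.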
Binlog file format:
By default a Binlog file is 100MB. Each Binlog file consists of multiple Blocks, and each Block has a fixed size of 64KB. Each redis write command is called a Record. A Record may span multiple Blocks but never spans Binlog files, so a Binlog file may grow beyond 100MB.
Record format: Header|Cmd
Header: Record Length (3 bytes) | Timestamp (4 bytes) | Record type (1 byte).
Cmd: part or all of the redis command, depending on whether the remaining space of the current Block can store the Record.
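To make the Header layout concrete, here is a minimal C++ sketch that packs and unpacks the 8-byte header (3-byte length, 4-byte timestamp, 1-byte type). RecordHeader, EncodeHeader and DecodeHeader are hypothetical names, and the little-endian byte order within each field is an assumption rather than something the format description states.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

constexpr std::size_t kHeaderSize = 3 + 4 + 1;  // per the Header layout above

// Hypothetical decoded form of a Record header.
struct RecordHeader {
  uint32_t length;     // payload length, must fit in 3 bytes
  uint32_t timestamp;  // 4-byte timestamp
  uint8_t type;        // 1-byte record type
};

// Packs the header into 8 bytes (little-endian fields, an assumption).
std::string EncodeHeader(const RecordHeader& h) {
  assert(h.length <= 0xFFFFFF);
  std::string out(kHeaderSize, '\0');
  for (int i = 0; i < 3; ++i) {
    out[i] = static_cast<char>((h.length >> (8 * i)) & 0xFF);
  }
  for (int i = 0; i < 4; ++i) {
    out[3 + i] = static_cast<char>((h.timestamp >> (8 * i)) & 0xFF);
  }
  out[7] = static_cast<char>(h.type);
  return out;
}

// Unpacks the first 8 bytes of buf into a RecordHeader.
RecordHeader DecodeHeader(const std::string& buf) {
  assert(buf.size() >= kHeaderSize);
  const unsigned char* p = reinterpret_cast<const unsigned char*>(buf.data());
  RecordHeader h;
  h.length = 0;
  for (int i = 0; i < 3; ++i) {
    h.length |= static_cast<uint32_t>(p[i]) << (8 * i);
  }
  h.timestamp = 0;
  for (int i = 0; i < 4; ++i) {
    h.timestamp |= static_cast<uint32_t>(p[3 + i]) << (8 * i);
  }
  h.type = p[7];
  return h;
}
```

Because the length field is only 3 bytes, a single Record fragment can describe at most 16MB of payload, which is far more than one 64KB Block can hold.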
Implementation classes
Basic classes
Version: Meta information class, mapped to the manifest file through mmap.
Binlog: Log class, mapped to the write2file files through mmap.
PikaBinlogSenderThread: Log consumption class, sequentially reads log file contents and consumes logs.
Basic operations
Constructing Binlog
//file_size can be specified in the configuration file, the default is 100MB
Binlog::Binlog(const std::string& binlog_path, const int file_size)
1.1 Create the binlog file directory.
1.2 Check whether the manifest file in the log directory exists. If it does not exist, create a new one.
1.3 Initialize the Version class according to the manifest file.
1.4 Find the corresponding log file according to filenum in the manifest, locate the append position in the file according to pro_offset, and initialize the log pointer, the recorded log content length, and the number of Blocks.
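Step 1.4 comes down to simple arithmetic if Blocks are laid out back to back from the start of the file, which is an assumption of this sketch. AppendPosition and LocateAppendPosition are hypothetical names used only for illustration.

```cpp
#include <cstdint>

constexpr uint64_t kBlockSize = 64 * 1024;  // fixed Block size from the format above

// Hypothetical result of locating the append position inside a log file.
struct AppendPosition {
  uint64_t full_blocks;   // Blocks that are already completely written
  uint64_t block_offset;  // bytes already used in the current Block
};

// Derives the Block count and in-Block offset from pro_offset.
inline AppendPosition LocateAppendPosition(uint64_t pro_offset) {
  AppendPosition pos;
  pos.full_blocks = pro_offset / kBlockSize;
  pos.block_offset = pro_offset % kBlockSize;
  return pos;
}
```

For example, a pro_offset of two full Blocks plus 100 bytes yields full_blocks = 2 and block_offset = 100, so the next Record would be appended 100 bytes into the third Block.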
Update current log production status
//pro_num: Log file number
//pro_offset: log file offset
//Used to update the binlog information corresponding to the slave instance when full synchronization is required
Status Binlog::SetProducerStatus(uint32_t pro_num, uint64_t pro_offset)
2.1 Delete write2file0.
2.2 Delete write2file pro_num.
2.3 Create a new write2file pro_num file and fill it with pro_offset space characters. Set version->pro_num to pro_num and version->pro_offset to pro_offset, then flush them to the manifest file.
2.4 Initialize the current filesize and block_offset.
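The file handling in steps 2.1 to 2.3 can be sketched roughly as follows. ResetLogFile is a hypothetical helper, the file name is assumed to be "write2file" directly followed by the number, and the manifest update and in-memory state from steps 2.3 and 2.4 are deliberately left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical helper: recreates write2file<pro_num> in dir and fills it
// with pro_offset space characters. Returns false if the file cannot be written.
bool ResetLogFile(const std::string& dir, uint32_t pro_num, uint64_t pro_offset) {
  // 2.1 and 2.2: remove write2file0 and any existing write2file<pro_num>.
  std::remove((dir + "/write2file0").c_str());
  const std::string path = dir + "/write2file" + std::to_string(pro_num);
  std::remove(path.c_str());

  // 2.3: create the new file and pad it with spaces up to pro_offset.
  std::ofstream out(path, std::ios::binary | std::ios::trunc);
  if (!out) {
    return false;
  }
  const std::string chunk(4096, ' ');
  uint64_t left = pro_offset;
  while (left > 0) {
    const std::size_t n =
        static_cast<std::size_t>(std::min<uint64_t>(left, chunk.size()));
    out.write(chunk.data(), static_cast<std::streamsize>(n));
    left -= n;
  }
  out.flush();
  return out.good();
}
```

Padding the file up to pro_offset means the next append lands at the same offset the master reported, which is what lets a slave resume from that position after a full synchronization.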
Get current log production status
//filenum: current log number
//pro_offset: current log offset
Status Binlog::GetProducerStatus(uint32_t* filenum, uint64_t* pro_offset)
3.1 Read pro_num and pro_offset in version and return.
Production log
//Put->Produce->EmitPhysicalRecord
Status Binlog::Put(const std::string &item)
4.1 Check whether the current log file meets the cutting (rollover) condition, and if so, cut over to a new file.
4.1.1 Increment pro_num by 1 and initialize the new log file: version->pro_num = pro_num, version->pro_offset = 0, binlog->filesize = 0, binlog->block_offset = 0.
4.1.2 If the remaining size of the current Block is not enough to hold a Record header (kHeaderSize), pad the rest of the Block and continue writing at the start of the next Block.
4.1.3 Produce is a loop. It ensures that when the item size exceeds kBlockSize, EmitPhysicalRecord is called multiple times until all data of the item has been written to the binlog file. The loop exits normally when left==0.
4.1.3.1 If left <= avail, the current Block can hold the whole item: Type=kFullType, and EmitPhysicalRecord is called once.
4.1.3.2 If left > avail, multiple Blocks are needed to store the item: the first fragment uses Type=kFirstType, and EmitPhysicalRecord is called multiple times.
4.1.3.3 If left > avail and this is not the first call to EmitPhysicalRecord, then Type=kMiddleType, and EmitPhysicalRecord is called again for the remaining data.
4.1.4 EmitPhysicalRecord.
4.1.4.1 Assemble the RecordHeader (3-byte length, 4-byte timestamp, 1-byte Type), write the data, and update block_offset and pro_offset.
Consumption log
//scratch: the consumption result, a complete redis cmd
//Consume->ReadPhysicalRecord: ReadPhysicalRecord reads one complete Record each time, and one or more Records make up a complete redis cmd
Status PikaBinlogSenderThread::Consume(std::string &scratch)
5.1 Consume is a loop that may call ReadPhysicalRecord multiple times. The loop exits when the Record just read has record_type==kFullType or record_type==kLastType.
5.1.1 If kBlockSize-last_record_offset_ <= kHeaderSize, the end of the Block has been reached and the remainder is padding, so skip it.
5.1.2 Read the data and update last_record_offset_ and con_offset.
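The production and consumption steps above can be sketched as a round trip over an in-memory buffer. This is not pika's code: MemLog is a hypothetical stand-in for one write2file, the numeric values of the record types and the little-endian header fields are assumptions, and the writer pads when the space left in a Block is <= kHeaderSize so that it mirrors the reader's check in step 5.1.1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

constexpr std::size_t kBlockSize = 64 * 1024;
constexpr std::size_t kHeaderSize = 8;  // 3-byte length, 4-byte time, 1-byte type
// Numeric values are assumptions; only the names appear in the text above.
enum RecordType : uint8_t { kFullType = 1, kFirstType = 2, kMiddleType = 3, kLastType = 4 };

// Hypothetical in-memory stand-in for a single write2file.
struct MemLog {
  std::string data;
  std::size_t block_offset = 0;
};

// Writes one header plus n payload bytes and advances block_offset.
void EmitPhysicalRecord(MemLog& log, RecordType type, const char* ptr,
                        std::size_t n, uint32_t now) {
  assert(n <= 0xFFFFFF);
  char h[kHeaderSize];
  for (int i = 0; i < 3; ++i) h[i] = static_cast<char>((n >> (8 * i)) & 0xFF);
  for (int i = 0; i < 4; ++i) h[3 + i] = static_cast<char>((now >> (8 * i)) & 0xFF);
  h[7] = static_cast<char>(type);
  log.data.append(h, kHeaderSize);
  log.data.append(ptr, n);
  log.block_offset += kHeaderSize + n;
}

// Splits item into Full, or First/Middle/Last, fragments across Blocks.
void Produce(MemLog& log, const std::string& item, uint32_t now) {
  const char* ptr = item.data();
  std::size_t left = item.size();
  bool begin = true;
  do {
    const std::size_t leftover = kBlockSize - log.block_offset;
    if (leftover <= kHeaderSize) {  // no room for header plus data: pad the Block
      log.data.append(leftover, '\0');
      log.block_offset = 0;
    }
    const std::size_t avail = kBlockSize - log.block_offset - kHeaderSize;
    const std::size_t frag = left < avail ? left : avail;
    const bool end = (left == frag);
    const RecordType type = begin ? (end ? kFullType : kFirstType)
                                  : (end ? kLastType : kMiddleType);
    EmitPhysicalRecord(log, type, ptr, frag, now);
    ptr += frag;
    left -= frag;
    begin = false;
  } while (left > 0);
}

// Reassembles one complete command starting at pos; returns false at end of data.
bool Consume(const std::string& data, std::size_t& pos, std::string& scratch) {
  scratch.clear();
  while (true) {
    const std::size_t remaining = kBlockSize - pos % kBlockSize;
    if (remaining <= kHeaderSize) pos += remaining;  // skip Block padding
    if (pos + kHeaderSize > data.size()) return false;
    const unsigned char* h = reinterpret_cast<const unsigned char*>(data.data() + pos);
    const std::size_t len = static_cast<std::size_t>(h[0]) |
                            (static_cast<std::size_t>(h[1]) << 8) |
                            (static_cast<std::size_t>(h[2]) << 16);
    const uint8_t type = h[7];
    if (pos + kHeaderSize + len > data.size()) return false;
    scratch.append(data, pos + kHeaderSize, len);
    pos += kHeaderSize + len;
    if (type == kFullType || type == kLastType) return true;
  }
}
```

Feeding a small command, a 150000-byte command, and another small command through Produce and then calling Consume three times returns the same three strings in order; the large one is stored as one First, one Middle, and one Last fragment. The last fragment of a multi-Block item is tagged kLastType, which is the type the consumer's exit condition in step 5.1 relies on.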