The key points are as follows:
Currently there are 23 pc servers
The number of pv per day is about 2k million. The number of registered users is 3 million.
Most of the data in the table has tens of millions of rows.
5-person algorithm team. In addition, there are a total of 11 developers, including full-time and part-time (in the past, there were only 10 people when I saw the technology shared by Baixing.com)
In 2006, there were about 1.2 million dynamic requests every day. At this time, the main bottleneck is disk i/0. Get venture capital and have money to buy hardware equipment. Purchase two iu servers (dual core, 4g memory)
one as an application server and one as a database server, migrate to a dual-line IP computer room, and use dns to resolve IP addresses of different network segments (find out which network segments belong to China Telecom) Which network segments are Netcom, and then analyze them yourself). After reading the computer room adjustment mentioned at the end of the speech, I felt that this was actually a detour. You can choose a good computer room to solve the DNS analysis aspect (later I concluded that relying on IP segments to distribute data is unreliable)
Specific How to do it is to put it in a computer room that supports multi-line (education network Tietong, etc.). The Alibaba Cloud used by our company now is multi-line)
In this way, there is no need to allocate multiple IP segments by yourself (that is, to determine the access user Is it Telecom or Netcom, etc.).
Two principles for using memory cache (Douban uses memcached):
1. For data that needs to consume resources
2. Data that needs to be reused. If it only needs to be used once, then even if it consumes resources, there is not much point in throwing it into the cache
Understand: Memory cache also requires memory, there is no need to waste it. If it does not need to be reused, it is wasteful to throw it into the memory (after all, memory is not cheap and takes up server resources)
Douban’s memached hit rate is quite high. This also relieves a lot of stress.
InnoDB has good concurrent access support because it supports row-level storage. Whether to use myisam or innodb, their business characteristics are: use myisam to read more and write less, use innodb to write more and read less Function module division table. A table related to a functional module is placed in a library), as mentioned, the classic mysql master-slave architecture is used. So each library is actually duplicated in three copies (the primary and secondary libraries he mentioned). It should be three mysql slave servers
that operate multiple libraries and use cursors to obtain specific libraries and specific tables. Pass in the parameters (I don’t understand the details)
The problem of database master-slave replication delay has always been a common problem.
Buying a hard drive is a lesson: in the beginning, you would rather invest more money to buy a better disk, because it is impossible to upgrade the disk. By then the website can no longer hold it. Still have to change. Well, at the beginning, I would rather spend more money and buy high-speed disks, because if the business develops quickly, it will have to be replaced. Even though it is more expensive, the disk is still not wasted.
When there were 2 million dynamic requests per day, Douban mentioned that static small file services (user avatars, cover pictures) made disk i/0 a bottleneck. In the past, I was stupid enough to send pictures Put them all under one directory. There are hundreds of thousands of small files under this directory (which directly results in the inability to use the ls command, and the server will die as soon as it is used). At this time, the files are divided into directories. Divide each directory into 10,000 files.
Have a dedicated data mining team. The algorithm team performs matrix calculations and puts the results into mysql for front-end query and display.
Douban’s fs is specially developed for image storage. In fact, the mechanism is based on Amazon's, and three copies of data are written when writing.
Disk random seek is more important than throughput. The performance bottleneck at that time was the disk seek speed (this is similar to the disk disk caused by a large number of image accesses when looking at Taobao's image file system analysis. The delay caused by frequent head positioning is similar)
Later, all myisam tables were changed to innodb tables.
Innodb’s cache: is self-managed in the process (that is, in memory), while myisam’s cache is based on files (controlled by the operating system). In the past, both myisam tables and innodb tables were used, which caused the two types of tables to compete with each other for memory, which was not efficient. All indexes were replaced with the innodb storage engine (I don’t understand this very well, I only understand that the consideration is to better utilize the memory)
Application server failure: nginx comes with its own functions.
The traffic of pictures has become a big cost: it is cheaper to move to the Tianjin computer room. The cabinet is relatively cheap, and all data mining data and image data can be moved there.
Two computer rooms in Beijing and Tianjin. Each of them builds the master-slave structure of mysql.
Search: I have been using the full-text index of mysql before. Later, it was migrated to use sphinx (this was used in combination with mysql as a storage engine of mysql), and later it became xapain
Why not use sphinx? No detailed explanation
Use MogileFS to store pictures, and later developed doubanfs storage. Reason for migration: Mogilefs has a performance bottleneck. Since mogilefs stores metadata (namespace, and file location) in mysql, as the number of database rows increases, it will become slower and slower. A large number of small files need to be read from the database, which also affects the speed. The number of rows at that time was growing very fast, and the bottleneck at that time was the mysql database.
Large fields affect the performance of the database. In fact, the number of rows in the data table is not many. It’s the impact of large fields. The large text fields are removed and stored in doubanDB developed by myself (it is a key-value database, which is simplified with reference to Amazon's dymamo). The underlying storage is based on tokyocabinet. Later, doubanfs was rewritten and implemented based on doubandb to store pictures in it.
Use a dual master solution. This solves the replication delay problem, because both writing and reading are to the same master, and the data read is the latest. In the past: writing from the master and then reading from the slave, there was a data delay
deploying lvs.
Previously used spread as the message queue, and later used rabbitMQ instead
======================== =================================
Summary: It is not feasible to copy its architecture and technical solutions. Only by learning from his mistakes and the design ideas behind it can we learn the essence (mainly understand why it is done that way and what considerations it is based on).
Lessons: disk selection and computer room selection. Choose a disk with a fast RPM, and it’s worth the higher initial cost.
Sub-library, first divide the area from a functional perspective. There is no need to do horizontal partitioning yet. It is a necessary stage to put the function-related tables in a library or on a separate server.
It is worthwhile to spend money on memory. A machine can never have too much memory. Databases consume more memory. General memory often becomes a bottleneck (a large number of connections and calculation data can cause memory problems). not enough). Memcached is not cheap (network i/0, consumes cpu). Be careful what you put into memcached.
Avoid database join operations (this is similar to the point of view shared by Shi Zhan before, reduce join operations and prefer to split the data into multiple times to obtain data. Facebook's architecture also mentions not to do JOIN operations)
The overall feeling is that the database experience I learned from Douban is in terms of sub-databases. Their visit level does not need to be divided horizontally, it can be divided into databases and partitioned according to business functions. Tables related to a business function module are split into the same library. Then perform master-slave synchronization on the database server to maintain hot backup of data.
Sharding’s application scenarios in the industry are basically situations where read applications are relatively heavy, and transaction security requirements are not high. Such scenarios will be very suitable.
Sata*3 checked and found that 450G costs more than 1,000 yuan each.
The SATA hard drive has a relatively high failure rate, so I replaced it with a SCSI hard drive.
For image storage or small file storage, due to the large volume (traffic cost, storage cost), we developed our own file system
If image storage relies on a database for storage, the amount of data will be large After that, it will indeed become a bottleneck (no wonder Taobao’s image file system hides part of the metadata in the saved file name of the image)
Question: Beijing and Tianjin cross computer rooms, and mysql on both sides is running between them. Synchronize data, or the data mining program in Tianjin writes data to Beijing. What is the speed?
I checked the information and found that it generally requires the use of a dedicated optical fiber network channel.