


Summary of the Douban Architecture Evolution Talk
The key points are as follows:
Douban currently runs on 23 PC servers.
Page views are about 20 million per day, and there are 3 million registered users.
Most tables hold tens of millions of rows.
There is a 5-person algorithm team, plus 11 developers in total, counting both full-time and part-time staff. (For comparison, when I watched the technical talk given by Baixing.com, they had only 10 people.)
In 2006 there were about 1.2 million dynamic requests per day, and the main bottleneck was disk I/O. After getting venture capital there was money for hardware, so they bought two 1U servers (dual core, 4 GB of memory),
one as the application server and one as the database server. They migrated to a dual-line data center and used DNS to return different IP addresses for different network segments (working out by themselves which segments belonged to China Telecom and which to Netcom). Judging from the data center changes mentioned at the end of the talk, this was actually a detour: choosing a good data center would have removed the need for the DNS work. (The later conclusion was that routing users by IP segment is unreliable.)
Concretely, the approach is to host in a data center that supports multiple lines (education network, Tietong, and so on). The Alibaba Cloud service our company uses now is multi-line.
That way there is no need to maintain the IP segment mapping yourself (that is, to determine whether a visiting user is on Telecom, Netcom, and so on).
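To make the abandoned approach concrete, here is a minimal sketch of picking a server address by the carrier that owns the client's IP segment. The segment table and server addresses are hypothetical (they use documentation-only ranges), and the talk notes do not say how Douban actually implemented its DNS split.

```python
import ipaddress

# Hypothetical carrier segments (documentation-only ranges, not real allocations).
CARRIER_SEGMENTS = {
    "telecom": [ipaddress.ip_network("192.0.2.0/24")],
    "netcom": [ipaddress.ip_network("198.51.100.0/24")],
}

# Hypothetical server address reachable on each carrier's line.
SERVER_BY_CARRIER = {
    "telecom": "203.0.113.10",
    "netcom": "203.0.113.20",
}
DEFAULT_SERVER = "203.0.113.10"


def carrier_of(client_ip):
    """Return the carrier whose segment contains client_ip, or None if unknown."""
    addr = ipaddress.ip_address(client_ip)
    for carrier, networks in CARRIER_SEGMENTS.items():
        if any(addr in net for net in networks):
            return carrier
    return None


def resolve(client_ip):
    """Pick the server address to hand out for this client."""
    return SERVER_BY_CARRIER.get(carrier_of(client_ip), DEFAULT_SERVER)
```

The weak point is the segment table itself: it has to be collected and kept current by hand, and any client outside the known segments falls through to the default, which fits the note that this approach proved unreliable.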
Two principles for using a memory cache (Douban uses memcached):
1. Cache data that is expensive to produce.
2. Cache data that will be reused. If it is only used once, then even if it is expensive to produce there is not much point in putting it into the cache.
My understanding: the cache itself consumes memory, so there is no need to waste it. Caching data that will not be reused is wasteful (memory is not cheap, and it takes up server resources).
Douban's memcached hit rate is quite high, which relieves a lot of load.
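The two principles can be sketched as a cache-aside helper that only touches the cache when a value is both expensive and reused. `DictCache` is an in-process stand-in for a memcached client, and the `expensive` and `reused` flags are hypothetical; this illustrates the rule and is not Douban's code.

```python
class DictCache:
    """In-process stand-in for a memcached client (an assumption for this sketch)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def fetch(cache, key, compute, expensive, reused):
    """Return compute()'s value, caching it only if it is both expensive and reused."""
    if not (expensive and reused):
        # Cheap or one-off data: skip the cache and save the memory.
        return compute()
    value = cache.get(key)
    if value is None:
        value = compute()
        cache.set(key, value)
    return value
```

Calling `fetch` twice with the same key and both flags set runs `compute` only once; with either flag unset, the cache is never written.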
InnoDB handles concurrent access well because it supports row-level locking. On choosing between MyISAM and InnoDB, the rule they followed by business characteristics was: MyISAM for tables that are read much more than written, InnoDB for tables that are written heavily. Databases are divided by functional module (the tables related to one functional module are placed in one database). As mentioned, the classic MySQL master-slave architecture is used, so each database actually exists in three copies (he mentioned primary and secondary databases; I take this to mean three MySQL slave servers).
The application works with multiple databases and obtains a cursor for the specific database and table by passing in parameters (I did not understand the details).
Master-slave replication delay is a long-standing, common problem.
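Since the details of the cursor lookup were not clear from the talk, the following is only my guess at the general shape: a router that maps each table to the functional database that owns it and hands back a cursor on that database. The table and database names are hypothetical, and in-memory SQLite databases stand in for the MySQL servers.

```python
import sqlite3

# Hypothetical mapping from table name to the functional database that owns it.
TABLE_TO_DB = {
    "user": "account_db",
    "user_profile": "account_db",
    "book": "catalog_db",
    "review": "review_db",
}


class Router:
    """Hands out cursors on the database that owns a given table."""

    def __init__(self):
        # One in-memory SQLite database stands in for each MySQL database.
        self._conns = {
            name: sqlite3.connect(":memory:")
            for name in set(TABLE_TO_DB.values())
        }

    def db_for(self, table):
        return TABLE_TO_DB[table]

    def cursor(self, table):
        return self._conns[self.db_for(table)].cursor()
```

Application code asks for `router.cursor("book")` and never names a server, so moving a functional database to another machine only changes the router.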
Buying hard drives was a lesson: it is better to spend more at the start on better disks, because disks are hard to upgrade later. Once the site can no longer cope, they have to be replaced anyway. If the business grows quickly the slower disks will be replaced regardless, so the more expensive high-speed disks are not wasted money.
At 2 million dynamic requests per day, Douban said that serving small static files (user avatars, cover images) made disk I/O the bottleneck. Early on, all images had naively been put in a single directory, which ended up holding hundreds of thousands of small files (so the ls command became unusable; running it would hang the server). The fix was to split the files across directories, with 10,000 files per directory.
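One simple way to get 10,000 files per directory is to derive the directory from a numeric file ID. This is a minimal sketch under that assumption; the root path, the zero-padded bucket name, and the `image_path` helper are all hypothetical, since the notes do not describe Douban's actual layout.

```python
import posixpath

FILES_PER_DIR = 10_000


def image_path(root, image_id, ext="jpg"):
    """Map a numeric image ID to a path whose directory holds at most 10,000 files."""
    bucket = image_id // FILES_PER_DIR
    return posixpath.join(root, f"{bucket:06d}", f"{image_id}.{ext}")
```

IDs 0 to 9999 land in the first directory, 10000 to 19999 in the second, and so on, so no directory grows past the limit as long as IDs are unique.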
There is a dedicated data mining team. The algorithm team performs matrix calculations and writes the results into MySQL for the front end to query and display.
Douban's file system (DoubanFS) was developed specifically for image storage. Its mechanism is modeled on Amazon's, and every write stores three copies of the data.
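The notes only say that the mechanism follows Amazon's and that each write produces three copies. As an illustration, here is a minimal sketch that assumes a Dynamo-style hash ring for choosing the three nodes; the ring, the node names, and the `write` helper are my assumptions, not a description of how DoubanFS places data.

```python
import bisect
import hashlib

REPLICAS = 3  # copies written per object, as stated in the talk


def _hash(value):
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Places each key on the next `count` distinct nodes clockwise on a hash ring."""

    def __init__(self, nodes):
        self._points = sorted((_hash(n), n) for n in nodes)
        self._keys = [h for h, _ in self._points]

    def nodes_for(self, key, count=REPLICAS):
        start = bisect.bisect(self._keys, _hash(key))
        chosen = []
        for i in range(len(self._points)):
            node = self._points[(start + i) % len(self._points)][1]
            if node not in chosen:
                chosen.append(node)
            if len(chosen) == count:
                break
        return chosen


def write(stores, ring, key, data):
    """Store data on every replica node for key and return the nodes used."""
    targets = ring.nodes_for(key)
    for node in targets:
        stores[node][key] = data
    return targets
```

With five nodes, any key ends up on exactly three of them, and the same key always maps to the same three, so a reader can find a copy without a central metadata lookup.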
Random seek performance matters more than throughput. The bottleneck at that time was disk seek speed (similar to what the analysis of Taobao's image file system describes: a large number of image accesses causes latency from frequent disk head repositioning).
Later, all MyISAM tables were converted to InnoDB.
InnoDB manages its own cache inside the process (that is, in memory), while MyISAM relies on the file cache, which is controlled by the operating system. Using MyISAM and InnoDB tables together made the two compete for memory, which was inefficient, so everything was moved to the InnoDB storage engine. (I do not understand this part very well; I only understand that the goal is to make better use of memory.)
Application server failover: nginx's built-in functionality handles this.
Image traffic became a major cost. The Tianjin data center is cheaper, cabinets included, so all data mining data and image data could be moved there.
There are two data centers, in Beijing and Tianjin, each with its own MySQL master-slave setup.
Search: at first MySQL's full-text index was used. Later it was migrated to Sphinx (used together with MySQL, as a MySQL storage engine), and after that to Xapian.
Why Sphinx was dropped was not explained in detail.
MogileFS was used to store images, and later the self-developed DoubanFS replaced it. Reason for the migration: MogileFS had a performance bottleneck. MogileFS keeps its metadata (namespace and file locations) in MySQL, so it gets slower as the number of rows grows. Reading a large number of small files also means many lookups in the database, which hurts speed. The row count was growing very fast at the time, and the MySQL database was the bottleneck.
Large fields hurt database performance. The tables did not actually have that many rows; the large fields were the problem. The large text fields were moved out into the self-developed DoubanDB (a key-value database, a simplified design modeled on Amazon's Dynamo), whose underlying storage is Tokyo Cabinet. Later DoubanFS was rewritten on top of DoubanDB, so images are stored there as well.
A dual-master setup is used. This solves the replication delay problem, because writes and reads go to the same master and the data read is always the latest. Previously, writes went to the master and reads to a slave, so reads could return stale data.
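A toy model shows why reading from the node you just wrote to avoids the stale read. `Master` and `Slave` here are plain in-memory dictionaries with an explicit replication step; they are hypothetical stand-ins and model only the delay, not MySQL replication or the second master.

```python
class Master:
    """Accepts writes and keeps a log of changes not yet replicated."""

    def __init__(self):
        self.data = {}
        self.log = []

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Slave:
    """Only sees a change after apply() has replayed the master's log."""

    def __init__(self):
        self.data = {}

    def apply(self, master):
        for key, value in master.log:
            self.data[key] = value
        master.log.clear()
```

Between `master.write(...)` and `slave.apply(master)` the slave still returns the old value, while a read on the master already sees the new one. That window is the replication delay the dual-master setup sidesteps.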
LVS was deployed.
Spread was used as the message queue at first and was later replaced by RabbitMQ.
==========================================================
Summary: copying its architecture and technical solutions wholesale is not feasible. The real value comes from learning from the mistakes and from the design thinking behind the choices (mainly, understanding why something was done that way and what considerations it was based on).
Lessons: disk selection and data center selection. Choose disks with a high RPM; the higher initial cost is worth it.
Splitting databases: first split by function. Horizontal partitioning is not needed yet. Putting function-related tables in their own database, or on a separate server, is a necessary stage.
Money spent on memory is well spent. A machine can never have too much memory. Databases are especially memory-hungry, and memory often becomes the bottleneck (a large number of connections and in-memory computation can exhaust it). Memcached is not free either (it costs network I/O and CPU), so be careful about what you put into it.
Avoid database JOINs (this matches the view Shi Zhan shared earlier: reduce joins and prefer fetching the data in several separate queries. Facebook's architecture notes also advise against JOINs).
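A minimal sketch of the "several separate queries" idea: fetch the rows from one table, collect the foreign keys, fetch the related rows in a second query, and merge in application code. The `reviews` and `users` tables are hypothetical, and an in-memory SQLite database stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE reviews (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT);
    INSERT INTO users VALUES (1, 'alice');
    INSERT INTO users VALUES (2, 'bob');
    INSERT INTO reviews VALUES (10, 1, 'great');
    INSERT INTO reviews VALUES (11, 2, 'fine');
    INSERT INTO reviews VALUES (12, 1, 'meh');
""")


def reviews_with_authors(conn):
    """Return (review_id, author_name, body) rows without a SQL JOIN."""
    # Query 1: the reviews themselves.
    reviews = conn.execute(
        "SELECT id, user_id, body FROM reviews ORDER BY id"
    ).fetchall()
    user_ids = sorted({user_id for _, user_id, _ in reviews})
    if not user_ids:
        return []
    # Query 2: only the users those reviews refer to.
    placeholders = ",".join("?" * len(user_ids))
    users = dict(conn.execute(
        f"SELECT id, name FROM users WHERE id IN ({placeholders})", user_ids
    ))
    # Merge in application code.
    return [(rid, users.get(uid), body) for rid, uid, body in reviews]
```

Because the two queries are independent, the tables can live in different functional databases, and each result can be cached separately, which a JOIN would not allow.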
My overall impression is that the main database lesson from Douban is about splitting databases. At their traffic level horizontal sharding is not needed; splitting by business function is enough. Tables belonging to one business function module go into the same database, and master-slave replication on the database servers keeps a hot backup of the data.
In the industry, sharding is mostly applied where reads dominate and transactional guarantees are not critical. It suits such scenarios very well.
SATA*3: I looked up prices and found that a 450 GB drive costs more than 1,000 yuan each.
SATA drives had a relatively high failure rate, so they were replaced with SCSI drives.
For image and small-file storage, because of the volume (traffic cost, storage cost), they developed their own file system.
If image storage relies on a database, the database does become a bottleneck once the data volume is large (no wonder Taobao's image file system embeds part of the metadata in the stored file name of the image).
Question: with MySQL running in both the Beijing and Tianjin data centers, how fast is it to synchronize data between them, or for the data mining programs in Tianjin to write data to Beijing?
From what I found, this generally requires a dedicated fiber link between the data centers.
