
Optimization experience of a production accident

Mar 12, 2017 pm 04:24 PM

After a routine promotional event, customer service started passing along a steady stream of complaints: users reported that they could not open the web page or the APP when competing for bids, and by the time the page opened, the bids had already been snatched up. At first we did not pay much attention. I thought: isn't that just what competing for bids is like, the same as trying to grab a Xiaomi phone? As the event went on, more and more users protested strongly. Users who had received interest-rate coupons or cash coupons could not grab any bids, and believed that the platform was cheating them by deliberately preventing the coupons from being used in order to save resources.

Analysis process

In fact, users had been giving this kind of feedback all along, and customer service had been placating them with the example of grabbing Xiaomi phones. This time the feedback was too strong to ignore, so we started paying attention. We have three front-end products: the APP, the official website, and H5. The APP is used the most and the official website second. H5 is rarely used day to day, but its traffic rises sharply during events (events are mostly H5 games, and H5 is also convenient for promotion and marketing). Each of the three front-end products is load-balanced by LVS across two back-end web servers (as shown below). This time the feedback came almost entirely from the website and the APP, so we focused on those four servers.

[Figure: user access diagram]

First, I suspected that the network bandwidth was saturated, so I asked a network engineer to monitor it with tools. During bidding, peak bandwidth usage was only about 70%, so that was ruled out. Next I suspected that the web servers could no longer cope. Checking the load with the top command showed that the two official-website servers spiked to about 6-8 at the moment of bidding and slowly returned to normal afterwards, while the two APP servers peaked at 10-12 and then returned to normal.

Tracing the web servers' business logs showed that the database update layer was reporting that no new database connections could be obtained, or that the connections had been used up. We assumed that the database's maximum number of connections was too low, so we raised the MySQL maximum number of connections to 3 times its previous value. During the next round of bidding, the business logs no longer showed errors related to database connections, but many users still reported that pages could not be opened during bidding.

We continued to trace the web servers. Running the command (ps -ef|grep httpd|wc -l) during bidding showed about 1,000 httpd connections, and the maximum number of connections set in the Apache configuration file turned out to be 1024 (Apache's default maximum is 256). So the number of connections during bidding had reached the limit: many users could not obtain an HTTP connection, which left the page unresponsive or the APP waiting indefinitely. We therefore raised the maximum number of connections in the Apache configuration file to 1024*3.
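As a rough illustration of the check described above, the sketch below compares an observed connection count with the configured limit. The function name `is_saturated` and the 95% threshold are assumptions made for illustration; they are not part of the original investigation.

```python
def is_saturated(active_connections: int, max_connections: int,
                 threshold: float = 0.95) -> bool:
    """Return True when the observed count is at or near the configured limit.

    threshold is a hypothetical cut-off: 0.95 means "within 5% of the limit".
    """
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    return active_connections >= max_connections * threshold


# Figures quoted in the article: about 1,000 connections against a limit of 1024,
# and the same count against the raised limit of 1024*3.
print(is_saturated(1000, 1024))
print(is_saturated(1000, 1024 * 3))
```

With the article's figures, about 1,000 connections against a limit of 1024 counts as saturated, while the same count against 1024*3 does not.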

Continuing to observe, we saw that the number of Apache connections still soared to between 2,600 and 2,800 during bidding. According to customer service, many users were still reporting bidding problems, though slightly fewer than before, and a few users reported that a bid they had already grabbed was rolled back in the end. We then turned to the database servers, using the top command and MySQL Workbench to view the various load figures of the MySQL master and slave. The result was shocking (as shown below): the master's indicators had all hit their peak, while the slave was under almost no pressure.

[Figure: load on the MySQL master and slave]

Tracing the code showed that all the business code of the three front ends connected to the master, and only the back-office query functions used the slave, so we started refactoring immediately. Except for the queries made during the bidding process, all queries on other pages or in other business flows were switched to the slave. After the change, the pressure on the master dropped significantly and the pressure on the slave began to rise, as shown below:

[Figure: load on the MySQL master and slave after the change]
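The read/write split described above can be sketched roughly as follows. This is a minimal illustration, not the platform's actual code: the function name `choose_database`, the `in_bidding_flow` flag, and the list of write keywords are all assumptions. Queries in the bidding flow stay on the master, as the article describes, presumably so that they do not read stale data from a lagging slave.

```python
# Statement prefixes treated as writes (an illustrative, non-exhaustive list).
WRITE_PREFIXES = ("insert", "update", "delete", "replace")


def choose_database(sql: str, in_bidding_flow: bool) -> str:
    """Return "master" or "slave" for a given SQL statement.

    Writes and anything in the bidding flow go to the master;
    every other query is routed to the slave.
    """
    statement = sql.lstrip().lower()
    if in_bidding_flow or statement.startswith(WRITE_PREFIXES):
        return "master"
    return "slave"


print(choose_database("SELECT * FROM notice", in_bidding_flow=False))
print(choose_database("SELECT * FROM bid WHERE id = 1", in_bidding_flow=True))
print(choose_database("UPDATE bid SET amount = 0", in_bidding_flow=False))
```

The table names in the usage lines are made up; only the routing rule matters here.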

According to customer service, after the change the problem of bids being rolled back had almost disappeared, and the problem of pages failing to open or opening slowly during bidding had eased somewhat, but some users were still reporting it. From the analysis above, we concluded that:

  • 1 The two load-balanced servers have reached their processing limit; more servers need to be added to share the load.

  • 2 The pressure on the MySQL master has dropped significantly, but the pressure on the slave has increased. The current one-master, one-slave setup needs to be changed to a one-master, multiple-slave model.

  • 3 To solve these problems completely, the platform as a whole needs to be optimized, for example: business optimization (removing hotspots in the business), adding caching, and making pages static (you can follow the front-end optimization rules from Yahoo and Google; there are many test sites online for evaluating this).

I wrote an optimization report based on these findings; see below:

Optimization Report

1 Background

As the company's business has grown, business volume and user numbers have surged. The official website's PV has risen from the initial xxx-xxx to the current xxx-xxxx, and the number of active APP users has increased significantly, which poses a greater challenge to the platform's current technical architecture. In particular, bid sources have been tight recently, so bids fill up faster and faster and the pressure on the servers keeps growing. The current system architecture therefore needs to be upgraded to support larger user numbers and business volumes.

2 User access diagram

[Figure: user access diagram]

Currently the platform has three user-facing products: the official website, the APP, and the small web page. Of these, the official website and the APP are under relatively high pressure.

3 Existing problems

The problems users encounter when competing for bids are concentrated in the following areas:
1. The web page or APP cannot be opened.
2. The website or APP is slow to open.
3. During bidding, after the transfer succeeded, the update failed because of heavy server pressure, and the money was refunded.
4. The database connections were exhausted, so the investment record could not be added after the bid was full, and the bid's progress was rolled back.

4 Analysis

An in-depth analysis of recent server parameters, concurrency, and system logs leads to the following conclusions:
1. The servers for the official website and the APP are under huge pressure during bidding, and the APP's problem is the more prominent. At peak bidding time, a single APP server has reached close to 2,600 Apache connections, which is near Apache's maximum processing capacity.
2. The database server is under huge pressure, mainly in two periods:

1) When the platform runs events, visits to the official website, small web pages, and APP increase dramatically, so the volume of data queries rises sharply. Once the database reaches its processing limit, problems such as the website opening slowly appear.

2) When users compete for bids. The pressure here falls into two stages: before bidding and during bidding. Before bidding, because bids fill up very quickly, users open the bidding page in advance and refresh it continuously, which increases the query pressure on the database. If a very large number of users are competing, the database connections are used up before bidding even starts. During bidding, a single purchase involves changes to and queries on about 15 tables. Each bid has a quota of 10 million, and about 100-200 people purchase before it is full. Taking the median of 150 people, the data has to be updated 2,000-3,000 times within a few seconds (updates only, excluding queries). This produces heavy concurrency, which may cause update failures or connection timeouts and thus affect users' bidding and the normal filling of bids.

5 Solution

1. Web server solution

Schematic diagram of a single user accessing web services


[Figure]

Currently the official website and the APP each use two servers for load balancing, and each server runs Apache for server-side processing. Each Apache instance can handle at most about 2,000 connections, so in theory the website or the APP can currently handle about 4,000 user requests. Supporting 10,000 simultaneous requests would take 5 Apache servers each; with two servers each today, that is a shortfall of 6 web servers.

Access diagram after upgrading the servers

[Figure]

2. Database solution

Current database deployment plan



[Figure]

1) Master-slave separation relieves 80% of the query pressure on the master. At present, the platform's official website and APP both connect to the MySQL master, which doubles the pressure on it. Migrating all the queries in these services to the slave can greatly reduce the pressure on the master.

2) Add cache servers. When queries on the slave reach their peak, master-slave synchronization is also affected, which in turn affects transactions. Queries that users run frequently should therefore be cached to reduce the request pressure on the database. Three cache servers need to be added to build a Redis cluster.
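The idea of caching frequently used queries can be sketched as a read-through cache with an expiry time. This is a minimal sketch under stated assumptions: an in-memory dictionary stands in for the Redis cluster, and the class name `TTLCache`, the method `get_or_load`, the key, and the 5-second lifetime are all hypothetical.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """In-memory stand-in for a cache server: entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        """Return the cached value, calling loader (the database query) only on a miss."""
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = loader()
        self._store[key] = (now + self._ttl, value)
        return value


def load_bid_list() -> list:
    # Placeholder for a real query against the slave database.
    return ["bid-1", "bid-2"]


cache = TTLCache(ttl_seconds=5.0)
print(cache.get_or_load("bid_list", load_bid_list))  # first call runs the loader
print(cache.get_or_load("bid_list", load_bid_list))  # second call is served from cache
```

A short lifetime keeps pages reasonably fresh while still absorbing the repeated refreshes that users make just before bidding opens.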

3. Other optimizations
1) Make the official website's homepage static. According to cnzz statistics, the homepage accounts for about 15% of the site's total visits. Data on the homepage that does not change often should be served statically so that the official website opens more smoothly.

2) Tune the Apache servers: enable gzip compression, configure a reasonable number of connections, and so on.

3) Remove the update hotspot in the investment process: the bid progress table. Every time a purchase succeeds or fails, the bid progress table has to be updated, and concurrent updates from multiple threads can run into problems such as optimistic-lock conflicts. Drop these updates during the process and save the progress information to the bid progress table only after the bid is full, which relieves the pressure on the database during investment.
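Item 3) above can be illustrated with a small sketch that tracks progress in memory and writes it out once, when the bid fills. This is only one possible reading of the idea: the class `BidProgress`, its `persist` callback, and the in-process lock are assumptions. With several web servers, the counter would have to live in shared storage rather than in one process.

```python
import threading
from typing import Callable


class BidProgress:
    """Track how much of a bid has been purchased without writing on every purchase."""

    def __init__(self, total: int, persist: Callable[[int], None]) -> None:
        self._total = total
        self._invested = 0
        self._persist = persist  # hypothetical hook that writes the progress table
        self._lock = threading.Lock()

    def invest(self, amount: int) -> bool:
        """Accept the purchase if it fits; persist progress only when the bid is full."""
        with self._lock:
            if amount <= 0 or self._invested + amount > self._total:
                return False
            self._invested += amount
            is_full = self._invested == self._total
        if is_full:
            self._persist(self._invested)
        return True


writes = []
bid = BidProgress(total=100, persist=writes.append)
bid.invest(60)         # accepted, nothing written yet
bid.invest(50)         # rejected: would exceed the quota
bid.invest(40)         # accepted, bid is now full, one write happens
print(writes)
```

The progress table is written once per bid instead of once per purchase, which is the hotspot the report proposes to remove.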

6 Server upgrade plan

1. The biggest pressure on the platform comes from the database. The current one-master, one-slave setup needs to be changed to one master and four slaves. The large volume of queries generated by the official website, APP, and small web pages is distributed across three slaves through a virtual IP, and back-office management queries go to the fourth slave. Three new database servers are needed.
Schematic diagram after the database upgrade
[Figure]

2. Add caching to reduce the pressure on the database. Two new cache servers with large memory are needed.
[Figure]

3. Three new web servers are needed to spread user access requests.

The APP needs two new servers.
The APP servers are under the greatest pressure during bidding, so two new servers are needed. Schematic diagram after the configuration is completed:
[Figure]

The official website needs one new server.
The official website is also under some pressure during bidding, so one new server is needed. The completed diagram is as follows:
[Figure]

In total, 8 servers need to be purchased, two of which require large memory (64G or more).
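The sizing arithmetic in the report can be checked with a few lines. The figures (10,000 simultaneous requests, about 2,000 connections per Apache server, and the 3 + 2 + 3 purchase plan) come from the report; the names `servers_needed` and `UPGRADE_PLAN` are made up for this sketch.

```python
import math


def servers_needed(target_requests: int, per_server_capacity: int) -> int:
    """Smallest number of servers whose combined capacity covers the target."""
    return math.ceil(target_requests / per_server_capacity)


# Web tier: 10,000 simultaneous requests at about 2,000 connections per server.
needed_per_product = servers_needed(10_000, 2_000)

# Purchase plan from the server upgrade section.
UPGRADE_PLAN = {
    "database slaves": 3,
    "cache servers": 2,
    "web servers": 3,  # two for the APP, one for the official website
}
total_to_purchase = sum(UPGRADE_PLAN.values())

print(needed_per_product, total_to_purchase)
```

The plan buys fewer web servers than the theoretical sizing suggests, because the database and cache changes are expected to take much of the pressure off the web tier.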


Note: After all the optimization plans were put into production, the problems were solved and bidding became worry-free!


Author: Pure Smile
Source: http://www.php.cn/
Copyright belongs to the author, please indicate the source when reprinting.
