Large website architecture evolution and knowledge system


WBOY

Jul 13, 2016 am 10:37 AM


Several articles have already described how the architectures of large websites such as LiveJournal and eBay evolved. They are well worth reading, but I feel they focus on the result of each evolution and say little about why each step was necessary. In addition, many students seem to have trouble understanding why a website needs such complex technology, so I decided to write this article. Here I describe a fairly typical path of architectural evolution for an ordinary website growing into a large one, together with the knowledge you need to master along the way. I hope it gives students who want to work in the Internet industry some preliminary concepts :). If you find mistakes in the article, please point them out and offer suggestions, so that it can really serve as a starting point for discussion.

The first step in architecture evolution: physically separating the webserver and database

At first, acting on some idea, you build a website on the Internet. The host may even be rented at this stage, but since this article is only concerned with how the architecture evolves, assume you already have one hosted machine with a certain amount of bandwidth. Because the website has some distinctive features, it attracts visitors. Gradually you notice that the load on the system keeps rising and responses keep getting slower. What stands out most is that the database and the application interfere with each other: when the application has a problem the database tends to suffer, and when the database has a problem the application tends to suffer too. So the site enters its first stage of evolution: physically separating the application and the database onto two machines. This places no new technical demands on you, but you find that it really works. The system returns to its previous response time and supports higher traffic, and the database and the application no longer affect each other.

Look at the diagram of the system after this step is completed:

This step involves these knowledge systems:

This step of the evolution places essentially no demands on your technical knowledge.

The second step of architecture evolution: adding page caching

The good times don't last. As more and more people visit, you find that responses have started to slow down again. Looking for the cause, you find that too many operations hit the database, so competition for database connections is fierce and responses slow down. You cannot simply open more database connections, because that would put heavy pressure on the database machine, so you consider a caching mechanism to reduce both the competition for connections and the read load on the database. At this point you might first choose Squid or a similar mechanism to cache the relatively static pages in the system (for example, pages that are updated only every day or two). Generating static pages is another option. Either way the program itself needs no changes, yet the load on the webserver and the competition for database connections drop substantially. So you start using Squid to cache the relatively static pages.

Look at the diagram of the system after this step is completed:

This step involves these knowledge systems:

Front-end page caching technology such as Squid. To use it well, you need a deep understanding of how Squid is implemented and of its cache invalidation algorithms.
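
To make the idea of page caching concrete, here is a minimal sketch in Python of a cache that keeps a rendered page for a fixed time before asking the backend again. It is not how Squid works internally; the `PageCache` class, the `render` callback, and the time-to-live value are all hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class PageCache:
    """Tiny time-to-live cache for rendered pages (illustrative, not Squid's algorithm)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be exercised without sleeping
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str, render: Callable[[str], str]) -> str:
        now = self.clock()
        entry = self._store.get(url)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # cache hit: the backend is not touched
        html = render(url)  # miss or expired entry: ask the backend once
        self._store[url] = (now, html)
        return html
```

With a time-to-live of a day, a page that changes every day or two is rendered once per day instead of once per request, which is where the relief for the webserver and the database comes from.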

The third step of architecture evolution: adding page fragment caching

After adding Squid, the overall speed of the system does improve and the load on the webserver starts to fall. But as traffic keeps growing, the system becomes a little slow again. Having tasted the benefits of caching with Squid, you start to wonder whether the relatively static parts of dynamic pages could be cached as well, so you consider a page fragment caching strategy such as ESI. So you start using ESI to cache the relatively static fragments of dynamic pages.

Look at the diagram of the system after this step is completed:

This step involves these knowledge systems:

Page fragment caching technology such as ESI. To use it well, you also need to understand how ESI is implemented.
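
As a rough illustration of fragment caching, the sketch below assembles a page from a template that contains `<esi:include src="..."/>` tags, fetching each fragment only once and reusing it afterwards. It covers only this one tag and ignores expiry; the `assemble` function, the `fetch` callback, and the plain dictionary used as a cache are assumptions made for the example, not part of any ESI product.

```python
import re
from typing import Callable, Dict

# Matches <esi:include src="..."/> with optional whitespace before the closing slash.
ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')


def assemble(template: str, fetch: Callable[[str], str], cache: Dict[str, str]) -> str:
    """Replace each include tag with its fragment, reusing cached fragments."""

    def resolve(match):
        src = match.group(1)
        if src not in cache:  # fetch a fragment only the first time it is needed
            cache[src] = fetch(src)
        return cache[src]

    return ESI_INCLUDE.sub(resolve, template)
```

The dynamic part of the page is still generated on every request, while the header, navigation, or other slow-changing fragments come from the cache.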

The fourth step of architecture evolution: data caching

After using technologies such as ESI to improve caching once more, the load on the system drops further. But as traffic grows, the system still starts to slow down. On investigation you may find places where the same data is fetched repeatedly, such as user information. You start to wonder whether this data could be cached too, so you cache it in local memory. Once the change is made, the result matches expectations: response time recovers and the load on the database falls considerably once again.

Look at the diagram of the system after this step is completed:

This step involves these knowledge systems:

Caching technology, including the Map data structure, caching algorithms, and the implementation of whichever framework you choose.
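
A local data cache of this kind can be sketched as a map with an eviction rule. The example below uses least-recently-used eviction, which is only one of several possible caching algorithms; the `LRUCache` class and the `loader` callback (standing in for a database query such as "load user by id") are hypothetical.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable


class LRUCache:
    """In-memory cache that evicts the least recently used entry when full."""

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get_or_load(self, key: Hashable, loader: Callable[[Hashable], Any]) -> Any:
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            return self._data[key]
        value = loader(key)  # cache miss: go to the database once
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # drop the least recently used entry
        return value
```

Note that this sketch never invalidates an entry when the underlying row changes, so a real cache would also need an expiry or invalidation rule.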

The fifth step of architecture evolution: adding a webserver

The good times don't last. As traffic grows, the load on the webserver machine climbs quite high during peak periods, and you start to consider adding a webserver. This also addresses availability, since with a single webserver the site becomes unusable if that machine goes down. With these considerations in mind, you decide to add a webserver. Doing so raises some problems, typically:
1. How to distribute requests across the two machines. The solutions usually considered are the load balancing that comes with Apache, or a software load balancer such as LVS;
2. How to keep state such as user sessions synchronized. The options considered include writing it to the database, writing it to shared storage, keeping it in cookies, or synchronizing session data between servers;
3. How to keep cached data, such as the user data cached earlier, synchronized. The mechanisms usually considered are cache synchronization or a distributed cache;
4. How to keep features such as file upload working. The mechanism usually considered is a shared file system or shared storage;
After solving these problems, you finally have two webservers, and the system finally returns to its previous speed.

Look at the diagram of the system after this step is completed:

This step involves these knowledge systems:

Load balancing technology (including but not limited to hardware load balancing, software load balancing, load balancing algorithms, Linux forwarding protocols, and the implementation details of the chosen technology); active/standby technology (including but not limited to ARP spoofing and Linux heartbeat); state and cache synchronization technology (including but not limited to cookies, the UDP protocol, state broadcast, and the implementation details of the chosen cache synchronization technology); shared file technology (including but not limited to NFS); and storage technology (including but not limited to storage devices).
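
The simplest load balancing algorithm, round robin, can be sketched in a few lines. This is only an illustration of the algorithm, not of how Apache or LVS implement it; the `RoundRobinBalancer` class and the server names used with it are hypothetical, and health checking is reduced to manually marking a server up or down.

```python
from typing import List, Set


class RoundRobinBalancer:
    """Hand out servers in turn, skipping any that are marked as down."""

    def __init__(self, servers: List[str]):
        if not servers:
            raise ValueError("at least one server is required")
        self.servers = list(servers)
        self.down: Set[str] = set()
        self._next = 0

    def mark_down(self, server: str) -> None:
        self.down.add(server)

    def mark_up(self, server: str) -> None:
        self.down.discard(server)

    def pick(self) -> str:
        # Try each server at most once per call, starting where we left off.
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server not in self.down:
                return server
        raise RuntimeError("no healthy server available")
```

Because consecutive requests from the same user can land on different machines, this is exactly the point at which session and cache synchronization become a problem.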

The sixth step of architecture evolution: splitting the database

After enjoying a period of rapid traffic growth, you find that the system is slowing down again. What is it this time? On investigation, you find that some write and update operations compete fiercely for database connections, which slows the system down. What now? The options at this point include database clustering and splitting the database into several databases. Some databases do not support clustering very well, so splitting the database becomes the more common strategy. Splitting the database means the existing program has to be modified. After a round of changes to implement the split, the goal is achieved, and the system is even faster than before.

Look at the diagram of the system after this step is completed:

This step involves these knowledge systems:

This step mainly requires dividing the business sensibly so that the database can be split accordingly. It places no other demands on specific technical details;

At the same time, as data volume grows and the database is split, database design, tuning, and maintenance must be done better, so these skills are still in high demand.
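
Splitting the database along business lines can be pictured as a lookup from business area to database. The sketch below assumes three hypothetical areas and made-up connection strings; the names in `DATABASES` and the `database_for` helper are illustrative only.

```python
from typing import Dict

# Hypothetical mapping from business area to the database that now owns its tables.
DATABASES: Dict[str, str] = {
    "user": "db-user.internal/users",
    "order": "db-order.internal/orders",
    "product": "db-product.internal/products",
}


def database_for(area: str) -> str:
    """Return the connection string for a business area, failing loudly if unknown."""
    try:
        return DATABASES[area]
    except KeyError:
        raise ValueError(f"no database configured for business area {area!r}") from None
```

Every piece of application code that used to talk to the single database now has to ask which database owns the data it needs, which is why this step requires changing the program.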

The seventh step of architecture evolution: table sharding, DAL and distributed cache

As the system keeps running, the amount of data grows substantially, and you find that queries are still somewhat slow even after splitting the database. So, following the same idea, you start splitting tables as well, which inevitably requires some changes to the program. At this point you may find that the application itself has to know the rules for splitting databases and tables, which is still rather complex. That raises the question of whether to add a general framework that handles data access across the split databases and tables; in eBay's architecture this corresponds to the DAL. This evolution takes a relatively long time, and the general framework may well not be started until the table splitting is finished. At the same stage you may also find problems with the earlier cache synchronization scheme: the data volume is now too large to keep the cache locally and synchronize it, so a distributed cache is needed. After another round of investigation and hard work, a large share of the cached data is finally moved into the distributed cache.

Look at the diagram of the system after this step is completed:

This step involves these knowledge systems:

Table splitting is again mostly a matter of dividing the business; the techniques involved include dynamic hashing and consistent hashing;

The DAL involves more complex techniques, such as database connection management (timeouts, exceptions), control of database operations (timeouts, exceptions), and encapsulation of the rules for splitting databases and tables;
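
Consistent hashing, mentioned above, is one way to decide which table shard or cache node holds a given key so that adding a node moves only a fraction of the keys. Below is a minimal sketch using virtual nodes; the `ConsistentHashRing` class, the choice of SHA-256, and the default of 100 virtual nodes per node are assumptions for the example rather than anything prescribed by the article.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List


def _hash(key: str) -> int:
    # Any well-distributed hash works; SHA-256 is used here for convenience.
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Maps keys to nodes; adding or removing a node only moves nearby keys."""

    def __init__(self, nodes: Iterable[str], replicas: int = 100):
        self.replicas = replicas  # virtual nodes per real node
        self._ring: Dict[int, str] = {}
        self._keys: List[int] = []  # sorted positions on the ring
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self.replicas):
            position = _hash(f"{node}#{i}")
            self._ring[position] = node
            bisect.insort(self._keys, position)

    def remove(self, node: str) -> None:
        for i in range(self.replicas):
            position = _hash(f"{node}#{i}")
            del self._ring[position]
            self._keys.remove(position)

    def node_for(self, key: str) -> str:
        if not self._keys:
            raise LookupError("ring is empty")
        # First virtual node clockwise from the key, wrapping around at the end.
        index = bisect.bisect(self._keys, _hash(key)) % len(self._keys)
        return self._ring[self._keys[index]]
```

A DAL would hide a rule like `node_for` behind its data access interface, so that the application no longer needs to know where each row lives.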

The eighth step of architecture evolution: adding more webservers

With the database and table splitting done, the load on the database drops to a relatively low level, and you go back to happily watching traffic rise every day. Then one day you find that the system is slowing down again. You check the database first and its load is normal. Then you check the webservers and find that Apache is blocking a large number of requests, while the application server handles each individual request quickly. It seems the number of requests is simply too high, so requests have to queue and responses are slow. This is easy to deal with: generally you have some money by now, so you add more webservers. While adding them, several challenges may arise:
1. Apache's software load balancing or LVS can no longer handle the sheer volume of web traffic (number of connections, network throughput, and so on). If funds allow, the solution is to buy hardware load balancers such as F5, NetScaler, or Athelon. If not, the solution is to divide the applications logically and spread them across separate software load-balanced clusters;
2. Some of the existing solutions for state synchronization, file sharing, and so on may become bottlenecks and need improving. At this point you may write a distributed file system tailored to the website's business needs;
After this work, you enter an era of seemingly perfect, unlimited scalability: whenever traffic increases, the solution is simply to add more webservers.

Look at the diagram of the system after this step is completed:

This step involves these knowledge systems:

At this stage the number of machines keeps growing, the data volume keeps growing, and the demands on system availability keep rising. This calls for a deeper understanding of the technologies in use, and for building more customized products based on the website's needs.

The ninth step of architecture evolution: separating reads from writes, and cheap storage

Then one day you find that this perfect era is coming to an end, and the database nightmare returns. Because so many webservers have been added, database connections are once again in short supply. The databases and tables have already been split, so you analyze the load on the database and may find that the ratio of reads to writes is very high. The usual idea at this point is to separate reads from writes, although this is not easy to implement. You may also find that storing some data in the database is wasteful, or uses too many database resources. So the evolution at this stage is likely to be read/write separation, together with writing some cheaper storage solutions, such as BigTable.

Look at the diagram of the system after this step is completed:

This step involves these knowledge systems:

Read/write separation requires a thorough understanding of database replication, standby, and similar strategies, and also requires some technology of your own;

A cheap storage solution requires a thorough understanding of how the OS stores files, and also of how file handling is implemented in the language you use.
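
The read/write separation described in this step can be sketched as a small router that sends writes to the primary database and spreads reads over replicas. The `ReadWriteRouter` class, the server names, and the rule of treating any statement that starts with `SELECT` as a read are simplifying assumptions; a real implementation also has to deal with transactions and replication lag.

```python
import itertools
from typing import List


class ReadWriteRouter:
    """Send writes to the primary and rotate reads across replicas."""

    def __init__(self, primary: str, replicas: List[str]):
        self.primary = primary
        # With no replicas configured, every statement falls back to the primary.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql: str, require_fresh: bool = False) -> str:
        is_read = sql.lstrip().lower().startswith("select")
        if is_read and not require_fresh and self._replicas is not None:
            return next(self._replicas)
        return self.primary
```

The `require_fresh` flag is there because a replica may lag behind the primary, so a read that must see a write made a moment ago should still go to the primary.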

The tenth step of architecture evolution: Entering the era of large-scale distributed applications and the dream era of cheap server farms

After the long and painful process above, the perfect era finally arrives again: adding webservers keeps supporting higher and higher traffic. For a large website, popularity is undoubtedly important, and as popularity rises, demand for all kinds of features grows explosively. You suddenly realize that the web application deployed on the webservers has become very large. When multiple teams change it at the same time, it is quite inconvenient, and reuse is poor: almost every team ends up duplicating some work. Deployment and maintenance are troublesome too, because copying and starting a huge application package on N machines takes a long time, and problems are hard to track down. Worse, a bug in one part of the application can make the whole site unavailable. There are other issues as well, such as difficulty in tuning (the application on each machine has to do everything, so targeted tuning is impossible). Based on this analysis, you make up your mind to split the system by responsibility, and a large-scale distributed application is born. This step usually takes a long time because it brings many challenges:
1. Once the system is split into a distributed application, it needs a high-performance, stable communication framework that supports several different communication and remote-call styles;
2. Splitting a huge application takes a long time, and requires sorting out the business and keeping system dependencies under control;
3. How to operate and maintain this huge distributed application (dependency management, health management, error tracking, tuning, monitoring and alerting, and so on).
After this step, the system architecture enters a relatively stable stage. You can also start using large numbers of cheap machines to support the huge traffic and data volume, and combine this architecture with the experience gained from all the earlier evolutions to support ever-growing traffic by other means.

Look at the diagram of the system after this step is completed:

This step involves these knowledge systems:

This step involves a great deal of knowledge. It requires a thorough understanding of communication, remote calls, messaging mechanisms, and so on, with a clear grasp of each at the level of theory, hardware, operating system, and the implementation of the language used.

Operations also involves a great deal of knowledge. In most cases you need to master distributed parallel computing, reporting, monitoring technology, rules and policies, and so on.

Put this way, it is really not that hard. The classic evolution of a website's architecture looks roughly like the above, although the solution adopted at each step, and the order of the steps, may differ. Because websites are in different businesses, they also have different specialized technical needs. This post describes the evolution mainly from the architectural point of view, and many technologies are not mentioned here, such as database clustering, data mining, and search. In a real evolution you will also rely on measures such as upgrading hardware, improving the network environment, modifying the operating system, and CDN mirroring to support more traffic, so real-world paths will differ in many ways. A large website also has to do far more than the above: security, operations, business operations, services, storage, and so on. Building a large website well is really not easy. I wrote this article mainly in the hope that it will prompt more write-ups on the evolution of large website architectures :).

PS: Finally, here are a few articles on the evolution of the LiveJournal architecture:
Looking at large-scale website performance optimization methods from the background development of LiveJournal
http://blog.zhangjianfeng.com/article/743
In addition, you can find more information about the current LiveJournal website architecture here: http://www.danga.com/words/.
