First of all, you need to understand why a system adopts a distributed architecture at all.
With the development of the Internet, the performance bottlenecks of traditional monolithic applications have become increasingly prominent. These bottlenecks appear in two main places:
1. Application service layer: as the number of users grows, concurrency grows with it, and a single application instance cannot handle that volume of concurrent requests.
2. Underlying database layer: as the business grows, the load on the database keeps increasing until the database itself becomes the bottleneck.
Accordingly, I think the problem can be tackled from these two directions.
Application service layer:
There are several solutions for the application service layer:
Application system cluster:
The simplest form is a server cluster, for example a Tomcat cluster. The most prominent problem with clustering an application is session sharing. It can be solved with server plug-ins, or through middleware such as Redis.
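A minimal sketch of the Redis-style approach, assuming a plain in-process dict stands in for Redis (SharedSessionStore and AppNode are hypothetical names, not a real library API). The point it shows is that session state lives outside any single node, so the load balancer may send each request to any node and the session is still found.

```python
from __future__ import annotations

import secrets
import time


class SharedSessionStore:
    """Stand-in for an external session store such as Redis (a dict here)."""

    def __init__(self, ttl_seconds: float = 1800.0):
        self._data: dict[str, tuple[dict, float]] = {}
        self._ttl = ttl_seconds

    def get(self, session_id: str) -> dict | None:
        entry = self._data.get(session_id)
        if entry is None:
            return None
        session, expires_at = entry
        if time.monotonic() > expires_at:
            # Drop expired sessions lazily, similar in spirit to a key TTL.
            del self._data[session_id]
            return None
        return session

    def put(self, session_id: str, session: dict) -> None:
        self._data[session_id] = (session, time.monotonic() + self._ttl)


class AppNode:
    """One application server (for example one Tomcat instance) behind a load balancer."""

    def __init__(self, name: str, store: SharedSessionStore):
        self.name = name
        self.store = store  # every node talks to the same shared store

    def login(self, user: str) -> str:
        session_id = secrets.token_hex(16)
        self.store.put(session_id, {"user": user})
        return session_id  # in a web app this would go back to the client as a cookie

    def whoami(self, session_id: str) -> str | None:
        session = self.store.get(session_id)
        return session["user"] if session else None


store = SharedSessionStore()
node_a = AppNode("tomcat-a", store)
node_b = AppNode("tomcat-b", store)

sid = node_a.login("alice")  # first request is handled by node A
print(node_b.whoami(sid))    # next request hits node B and still finds the session: alice
```

If each node kept sessions in its own memory instead, node B would not recognize the session created on node A, which is exactly the problem session sharing solves.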
Service-based splitting:
Service-based splitting is currently a very popular approach; everyone is talking about microservices. Splitting a traditional project into services decouples them, so each service can evolve independently and be scaled horizontally on its own. The classic problem that service splitting introduces is distributed transactions. Commonly used solutions include reliable-message eventual consistency, TCC (Try-Confirm-Cancel) compensating transactions, and best-effort notification. For details, you can refer to this blog post: distributed transaction solution
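To make the TCC idea concrete, here is a minimal in-memory sketch assuming two hypothetical participants (an account service and an inventory service) and a toy coordinator; none of these names come from a real framework. Each participant reserves resources in Try; if every Try succeeds the coordinator calls Confirm on all of them, otherwise it calls Cancel on those that already tried. Real TCC frameworks must also handle idempotent retries, timeouts, and coordinator crashes, which this sketch leaves out.

```python
class InsufficientFunds(Exception):
    pass


class AccountService:
    """Hypothetical participant: freezes money in Try."""

    def __init__(self, balance: int):
        self.balance = balance
        self.frozen: dict[str, int] = {}

    def try_(self, tx_id: str, amount: int) -> None:
        if self.balance < amount:
            raise InsufficientFunds(tx_id)
        self.balance -= amount       # take money out of the spendable balance
        self.frozen[tx_id] = amount  # and hold it as a reservation

    def confirm(self, tx_id: str) -> None:
        self.frozen.pop(tx_id, None)  # the reservation becomes a real deduction

    def cancel(self, tx_id: str) -> None:
        self.balance += self.frozen.pop(tx_id, 0)  # release the reserved money


class InventoryService:
    """Hypothetical participant: reserves stock in Try."""

    def __init__(self, stock: int):
        self.stock = stock
        self.reserved: dict[str, int] = {}

    def try_(self, tx_id: str, qty: int) -> None:
        if self.stock < qty:
            raise RuntimeError(f"out of stock for {tx_id}")
        self.stock -= qty
        self.reserved[tx_id] = qty

    def confirm(self, tx_id: str) -> None:
        self.reserved.pop(tx_id, None)

    def cancel(self, tx_id: str) -> None:
        self.stock += self.reserved.pop(tx_id, 0)


def place_order(tx_id, account, inventory, price, qty) -> bool:
    """Toy coordinator: Try every participant, then Confirm all or Cancel the tried ones."""
    tried = []
    try:
        account.try_(tx_id, price * qty)
        tried.append(account)
        inventory.try_(tx_id, qty)
        tried.append(inventory)
    except Exception:
        for participant in reversed(tried):
            participant.cancel(tx_id)
        return False
    for participant in tried:
        participant.confirm(tx_id)
    return True


account = AccountService(balance=100)
inventory = InventoryService(stock=1)
print(place_order("tx-1", account, inventory, price=30, qty=1))  # True
print(place_order("tx-2", account, inventory, price=30, qty=1))  # False: out of stock
print(account.balance, inventory.stock)  # 70 0
```

The second order fails in the inventory Try, so the coordinator cancels the account reservation and the balance returns to 70 instead of leaking 30 into a frozen state.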
Underlying database layer:
If the system's performance pressure is in the database, it can be addressed with read-write separation and with splitting data across multiple databases and tables (sharding). Since I don't have enough experience in this area, please refer to other material, for example:
Mycat database sharding and table sharding middleware
MySQL high-availability read-write separation cluster
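The core idea of horizontal sharding can be sketched as a routing function from a sharding key to a physical database and table. This assumes 2 databases with 4 tables each, modulo routing on user_id, and a hypothetical naming scheme order_db_N.t_order_M; middleware such as Mycat applies its own configured rules, so this is only an illustration, not how Mycat is configured.

```python
from dataclasses import dataclass

DB_COUNT = 2        # hypothetical number of physical databases
TABLES_PER_DB = 4   # hypothetical number of tables in each database


@dataclass(frozen=True)
class ShardLocation:
    database: str
    table: str


def locate(user_id: int) -> ShardLocation:
    """Map a sharding key (user_id) to a physical database and table by modulo."""
    slot = user_id % (DB_COUNT * TABLES_PER_DB)  # 0..7 across all physical tables
    db_index = slot // TABLES_PER_DB             # which database holds this slot
    table_index = slot % TABLES_PER_DB           # which table inside that database
    return ShardLocation(f"order_db_{db_index}", f"t_order_{table_index}")


def build_select(user_id: int) -> str:
    loc = locate(user_id)
    # Parameterized query; a DB driver would bind user_id to the placeholder.
    return f"SELECT * FROM {loc.database}.{loc.table} WHERE user_id = %s"


for uid in (1, 6, 13):
    print(uid, locate(uid))  # the same user_id always maps to the same table
```

Plain modulo keeps routing trivial, but changing DB_COUNT or TABLES_PER_DB later moves most rows, and queries that do not include the sharding key have to be sent to every table and merged.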
Thank you for the invitation. I am no expert, and I feel the question is too broad to answer well.
However, as far as I know, technical interviews usually go deeper step by step, with the interviewer probing how deep the candidate's technical knowledge goes. So, in principle, the further you can take a topic, the stronger you appear. Many high-concurrency and distributed-processing problems are really matters of experience: the right approach differs completely across business scenarios and data volumes, and there is no universal solution.
So in an interview it is very important to understand the business scenario the interviewer describes; it is the key input for analyzing performance bottlenecks. By the barrel principle (a barrel holds only as much water as its shortest stave allows), some key point is bound to be the bottleneck.
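The barrel principle can be put as simple arithmetic: when every request must pass through a chain of stages, end-to-end throughput is capped by the slowest stage. The stage names and capacities below are made-up numbers for illustration only.

```python
def find_bottleneck(capacities: dict[str, int]) -> tuple[str, int]:
    """Return the stage with the lowest capacity (requests/second) and that capacity.

    Every request passes through every stage, so the system can never be faster
    than its slowest stage: the shortest stave of the barrel.
    """
    stage = min(capacities, key=capacities.get)
    return stage, capacities[stage]


# Hypothetical capacities, in requests per second.
stages = {"load_balancer": 50_000, "app_cluster": 12_000, "database": 3_000}
print(find_bottleneck(stages))  # ('database', 3000)
```

With these numbers, doubling the application cluster changes nothing; only raising the database's capacity raises overall throughput, which is why locating the bottleneck comes first.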
To give a simple example: as the business grows, a server cluster connected directly to the database hits a performance bottleneck. How should this be solved?
First, analyze where the bottleneck actually is. Start with the database itself: is the schema design reasonable, are the indexes actually being used (check the SQL execution plan), and can the database be split horizontally or vertically to share the load?
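Checking whether an index is used means reading the execution plan. MySQL does this with EXPLAIN; as a self-contained stand-in, this sketch uses SQLite through Python's built-in sqlite3 module and its EXPLAIN QUERY PLAN statement. The exact plan wording differs between SQLite versions and from MySQL, but the pattern is the same: a full table scan before the index exists, an index search after. The table and index names are hypothetical.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params=()) -> str:
    """Return the human-readable detail text of SQLite's EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)  # the detail is the last column


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"user{i}") for i in range(1000)],
)

sql = "SELECT * FROM users WHERE email = ?"
before = query_plan(conn, sql, ("user42@example.com",))
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = query_plan(conn, sql, ("user42@example.com",))

print("before:", before)  # a full table scan: the plan text contains SCAN
print("after: ", after)   # an index search: the plan text mentions idx_users_email
```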
It is also necessary to consider whether a distributed, read-write separated database can be used, which in turn raises issues such as data synchronization and data distribution.
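A minimal sketch of application-side read-write routing, assuming one primary and two replicas identified by hypothetical names and a naive check on the statement's first keyword. Writes go to the primary and reads rotate round-robin across the replicas. Because replicas usually apply the primary's changes with some delay (the data synchronization issue above), a read issued right after a write may be stale, so the sketch lets a caller force a read to the primary.

```python
import itertools


class ReadWriteRouter:
    """Route SQL statements to a primary (writes) or to replicas (reads)."""

    WRITE_PREFIXES = ("INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP")

    def __init__(self, primary: str, replicas: list[str]):
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql: str, force_primary: bool = False) -> str:
        # Naive classification by first keyword; real middleware parses the SQL.
        is_write = sql.lstrip().upper().startswith(self.WRITE_PREFIXES)
        if is_write or force_primary or self._replicas is None:
            return self.primary
        # Round-robin over replicas, which may lag behind the primary.
        return next(self._replicas)


router = ReadWriteRouter("mysql-primary", ["mysql-replica-1", "mysql-replica-2"])
print(router.route("INSERT INTO orders VALUES (1)"))             # mysql-primary
print(router.route("SELECT * FROM orders"))                      # mysql-replica-1
print(router.route("SELECT * FROM orders"))                      # mysql-replica-2
print(router.route("SELECT * FROM orders", force_primary=True))  # mysql-primary
```

A prefix check misses cases such as SELECT ... FOR UPDATE, which must also go to the primary; that is one reason this kind of routing is often delegated to middleware.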
Once the database layer has been analyzed, move on to the application layer. Caching is generally used to improve query performance, which raises issues such as cache hit rate, cache updates (keeping the cache consistent with the database), and how keys are hashed across multiple cache nodes.
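On the multi-node hashing issue: with naive hash(key) % N, adding or removing one cache node remaps most keys at once and causes a wave of cache misses. Consistent hashing is one common remedy. This is a minimal sketch with virtual nodes; it uses MD5 from hashlib only as a stable, well-spread hash, and the node names are hypothetical.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # MD5 is used only for an evenly spread, process-independent hash, not for security.
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, nodes: list[str], vnodes: int = 100):
        self.vnodes = vnodes
        self._ring: list[tuple[int, str]] = []  # sorted (position, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each physical node owns many points on the ring to even out the load.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def get_node(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring is empty")
        # First virtual node clockwise from the key's position, wrapping at the end.
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        if idx == len(self._ring):
            idx = 0
        return self._ring[idx][1]


ring = ConsistentHashRing(["cache-1", "cache-2", "cache-3"])
keys = [f"user:{i}" for i in range(10_000)]
before = {k: ring.get_node(k) for k in keys}
ring.add_node("cache-4")
moved = [k for k in keys if ring.get_node(k) != before[k]]
print(f"{len(moved) / len(keys):.1%} of keys moved")  # every moved key now lives on cache-4
```

Adding a node only takes over the ring segments in front of its own virtual nodes, so keys never move between the existing nodes; only the share claimed by the newcomer has to be reloaded.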
In general, understand the business scenario first, then work out a solution to the specific problem. For example, should Sina Weibo use a push model or a pull model? With push, when a big V (a verified account with a huge following) with tens of millions of followers posts, must the post be delivered to tens of millions of users? With pull, will every feed refresh become slow for a user who follows a very large number of accounts?
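A toy sketch of the two feed models, assuming in-memory dicts for follow relations and timelines; every name here is hypothetical. Push (fan-out on write) copies each new post into every follower's inbox, so reads are cheap but a big V's post costs one write per follower. Pull (fan-out on read) keeps posts only in the author's own outbox and merges the followees' outboxes at read time, so writes are cheap but a user who follows many accounts pays at read time. A commonly described compromise is to push for ordinary accounts and pull for big V accounts.

```python
import heapq
import itertools
from collections import defaultdict

_clock = itertools.count()  # monotonically increasing fake timestamp

followers = defaultdict(set)  # author -> users who follow the author
following = defaultdict(set)  # user -> authors the user follows
outbox = defaultdict(list)    # author -> own posts, newest first, as (-ts, author, text)
inbox = defaultdict(list)     # user -> pushed posts, newest first (push model only)


def follow(user: str, author: str) -> None:
    followers[author].add(user)
    following[user].add(author)


def post(author: str, text: str, push: bool) -> int:
    """Store a post; in push mode also fan it out. Returns the number of inbox writes."""
    item = (-next(_clock), author, text)
    outbox[author].insert(0, item)
    if not push:
        return 0
    for user in followers[author]:
        inbox[user].insert(0, item)  # one write per follower: expensive for a big V
    return len(followers[author])


def read_push(user: str, limit: int = 10) -> list[str]:
    return [text for _, _, text in inbox[user][:limit]]


def read_pull(user: str, limit: int = 10) -> list[str]:
    # Merge already-sorted outboxes; the cost grows with how many accounts the user follows.
    merged = heapq.merge(*(outbox[a] for a in following[user]))
    return [text for _, _, text in itertools.islice(merged, limit)]


follow("alice", "bigv")
follow("bob", "bigv")
follow("alice", "carol")
print(post("bigv", "hello fans", push=True))  # 2 inbox writes
post("carol", "lunch", push=False)
print(read_push("alice"))  # ['hello fans']
print(read_pull("alice"))  # ['lunch', 'hello fans']
```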
There are many kinds of distribution. For example, if you split your project into multiple modules, each running in its own JVM and cooperating through RPC calls, that is distributed. If a single Redis server cannot handle the concurrency, or does not have enough memory for your cache, you can build a Redis cluster; that is also distributed. If a single database cannot cope with your data volume or concurrency, you can split it into multiple databases and tables and implement transactions through JTA; that is distributed too. Beyond that, log synchronization, load balancing, using HDFS to store data backups and logs, and adding Elasticsearch to analyze and search logs are all topics worth discussing.
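For the Redis cluster case, it helps to know how keys are spread. Redis Cluster divides the key space into 16384 hash slots and computes a key's slot as CRC16 (the XMODEM variant) of the key modulo 16384; if the key contains a non-empty hash tag in braces, only the text inside the first such tag is hashed, so related keys can be forced into the same slot. The sketch below re-implements that calculation in Python purely for illustration; on a real cluster the CLUSTER KEYSLOT command returns the slot, and cluster-aware clients do the routing.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM: polynomial 0x1021, initial value 0, no reflection, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_part(key: str) -> str:
    """Return the part of the key that is hashed, following the hash tag rule."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end != start + 1:  # an empty tag "{}" means: hash the whole key
            return key[start + 1:end]
    return key


def key_slot(key: str) -> int:
    return crc16_xmodem(key_hash_part(key).encode("utf-8")) % 16384


# Keys sharing the hash tag {user1000} land in the same slot, so multi-key commands work.
print(key_slot("{user1000}.following") == key_slot("{user1000}.followers"))  # True
```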
When someone asks a broad question and the answerer responds just as broadly, the answer will fall flat. It is better to answer in a structured, logical way.
If it were me, I would take one of two approaches:
1. Pick a project I have worked on, discuss its bottlenecks with the interviewer, then bring in the relevant theory, and finally present a solution. This makes it easy to talk about your own experience, and it is best to have some back-and-forth with the interviewer. Listen carefully to the interviewer's questions during the conversation and make sure you understand exactly what is being asked.
2. Show your hand first: raise the classic distributed-systems issues, such as traffic load and business guarantees, and then give a business scenario for each problem.