Through interviews and invited articles, we ask veterans of the operations and maintenance field to share their insights and debate one another, in the hope of building some forward-looking consensus and helping the industry move ahead.
In this issue, we invite Chen Cunli, general manager of the System Operation and Maintenance Department at Du Xiaoman. He has spent most of his 20-year career in the Internet industry. During his time in the Baidu Operations and Maintenance Department, his team members called him "Commander Chen" because of his leadership style. Today we invite "Commander Chen" to share his views.
This is the 5th issue of the "Operation and Maintenance Forum", which aims to be both down-to-earth and high-level. Let's start!
Q: You joined Baidu very early, and later moved with Du Xiaoman when it became independent. We understand that many of the colleagues around you have followed you for a long time and have been through many business operations tests with you. I believe everyone is interested in how you keep a group of people united in a job as demanding as operations and maintenance. I would like to hear your thoughts.
Answer: I take that as a compliment, and I am grateful. In 2000, I started my career doing computer training, and then worked in a state-owned enterprise for 3 years. In 2004, I started my Internet career in Beijing. Looking back on more than 20 years of professional experience, I built many teams from scratch. All told, more than a thousand colleagues have probably worked in the operations departments I led, and 300 to 400 of them fought several tough battles alongside me. In 2018, I joined Du Xiaoman and once again formed a team from scratch, the team I lead today. In fact, it is painful every time to leave an existing team and colleagues to build a new team from nothing. But I see that many of my former colleagues are now doing very well in work and life. Some of them went on to push the limits of the industry after leaving my team. Of course, they earn more than I do, and I am genuinely happy for them. If I were to describe how I lead a team, I would summarize three points:
Q: Many people think that engineers are worthless if they don't write code. What do you think about this? Do you have any advice on how engineers who don't write code can keep improving?
Answer: Military management is a useful reference here. People gave me the nickname "Commander", which is probably related to how often I borrow military methods in my work. In my opinion, this question is the same as asking whether every soldier should go to the battlefield and shoot: soldiers must know how to use basic weapons, and ideally they practice regularly. But wars are not won only by soldiers firing weapons. Wars are also fought over logistics and supplies, over better weapons, and over the justice of the cause. Those who do logistics, weapons research, or propaganda are all indispensable, and whatever their position, they should carry out their job responsibilities to the fullest and leave the rest to the commander. Coming back to the question, engineers must first understand how their position is defined within the company, then compare that with how they see themselves, and try to match the two. If the two do not match, it is better to move to a position that does.
Q: You have experienced the growth and the ups and downs of many businesses, large and small, at Baidu and Du Xiaoman. Do you think the concepts and methods of business operations differ across stages and sizes? Are there any principles or methodologies that guide decision-making?
Answer: This is a good question. The difficulties you face change completely with scale: maintaining 10,000 machines is a very different problem from maintaining 100 machines.
When maintaining 100 machines, we may not need a tool that quickly detects machine faults and repairs them automatically. At the industry's typical machine failure rate, the work can be done by hand, and people will find the workload just right: not tiring, but enough to keep busy. When maintaining 10,000 machines, relying only on manual work means we will be too busy to inspect each machine. Add the need to coordinate maintenance windows with suppliers and business teams, and we will be so busy that we forget to eat. So my advice is: if you want a good balance between life and work, a small company is a good choice. If you want to improve your technical capabilities and broaden your vision, you must go where the scale and the traffic are large, because that is where you can train yourself.
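To put rough numbers on the scale argument above, here is a minimal Python sketch. The 5% annual failure rate and the function name are assumptions chosen purely for illustration; the interview does not give a figure, and real rates vary by hardware and environment.

```python
def expected_failures(fleet_size: int, annual_failure_rate: float, days: float = 365.0) -> float:
    """Expected number of machine failures over `days`.

    Assumes failures are independent and occur at a constant annual rate,
    which is a simplification.
    """
    return fleet_size * annual_failure_rate * days / 365.0


# Hypothetical annual failure rate, for illustration only (not from the interview).
ASSUMED_AFR = 0.05

for fleet in (100, 10_000):
    per_year = expected_failures(fleet, ASSUMED_AFR)
    per_week = expected_failures(fleet, ASSUMED_AFR, days=7)
    print(f"{fleet:>6} machines: ~{per_year:.0f} failures/year, ~{per_week:.2f} failures/week")
```

Under this assumed rate, 100 machines produce about five failures a year, which a person can handle by hand, while 10,000 machines produce about five hundred, which is where automated detection and repair start to pay off.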
Let's look at it from another angle. Businesses have different goals at different stages of development, and the corresponding operations concepts and methods differ just as much. For many companies in the early stage, simply surviving is an achievement. They want to deploy and launch quickly, because the business must compete for the market and can only keep developing if it survives first, so they rarely consider long-term planning. If operations comes to the boss at this point and says we should plan for business growth over the next ten years and build infrastructure to match, that is unrealistic. But if a business already has millions or even tens of millions of core users, it will very likely focus on the end-user experience. At that point, operations must design the entire underlying architecture and facilities around the end-user experience, and any work that improves user experience will receive the boss's support. Of course, the boss will also pay attention to other issues, such as the ratio of input to output and whether it is sustainable (the ratio of business growth rate to resource input). It should also be noted that industries differ enormously. Finance and the Internet, for example, are very different.
In summary: technology serves the business. Any technology that helps the business develop will be given resources. Whatever the work is, start from the perspective of "how to make the company better." Only when the company does well can you do well, and only when your team does well can you do well.
Q: Do you think there are any common practices in the operation and maintenance industry that are actually wrong? Why?
Answer: I have not thought deeply about what the industry is doing wrong. Each company has its own practical problems, so it is difficult to evaluate.
However, one thing I would like to mention is that I have never limited myself to operations work. Operations is an area I am good at, and it is the foundation that keeps the basic user experience running for the company, but I usually prefer to pay attention to other questions: What does the company's business urgently need right now? What do the company's core users need? We give priority to whatever they need, because from my perspective, when it comes to service stability, every company carries a lot of debt and has to pay it back slowly.
Q: Some of the hot technical directions at the moment include FinOps, observability, ChatGPT, and so on. What do you think of these directions? Are they hype, or do they have real value? How should operations staff respond?
Answer: I personally think these directions are very good. If you only talk about them, it is just hype. Only by actually implementing them can you turn them into real productivity. These approaches achieved good results at Baidu in the past, and they may be easier to implement in a large environment, because the amount of data and the depth of talent are greater. But if someone has only 100 machines and still talks about FinOps, it is probably hype, and the same applies to the other directions.
Q: With the development of cloud, the traditional operation and maintenance positions that only do Ops will disappear in the long run. Do you agree with this view? Do you have any suggestions for the transformation path of such friends?
Answer: Operations positions will not disappear, and the need for them will only become more important, but you do need to think carefully about whether the work will still be done by people.
In a software project, operations is a critical link, but whether this link is handled by people or by machines depends on how technology develops. It is just like the street sweeping mentioned above: as long as there are streets and people living on them, the demand for street sweeping will not disappear, and it remains strong. But the work may eventually be done by unmanned machines, and manual sweeping has already been gradually replaced by sweeper trucks driven by people. We must be aware of this, and also of another point: operations is extremely complex, far more complex than sweeping roads. From the way cloud services have matured over so many years, you can feel that this is a long process. I would rather suggest that this process, in which operations revolutionizes itself, be led and designed by operations people themselves, so that eventually we become the owners of the "operations" product.
Q: Many friends complain about unfair company performance ratings on Maimai. Do you have any suggestions for them? In addition, as a manager, can you share how you design the performance appraisal mechanism?
Answer: This topic is relatively sensitive, and it is also one that operations colleagues are eager to discuss. The following opinions are only my personal career experience and do not represent the views of any company.
Here is my personal view. Performance is something you earn yourself. Whether your performance is good depends on how much outstanding contribution you have brought to the company and what qualitative changes you have made in your own work through your own efforts. Performance is usually ranked relatively, so it is relatively fair, but absolute fairness is difficult to achieve.
When we talk about performance, we might as well put ourselves in the shoes of the company's bosses. One role makes money for the company, and another spends money to maintain the basic user experience for the company. Only by making more money can the company pay everyone, so the outcome is obvious.
Of course, this is also related to the different hardships that people endure. Some people say that there are five kinds of hardship in life. The first is physical hardship, which is mostly about working overtime, and many traditional operations jobs involve this kind. The second is the hardship of thinking, which demands thorough planning and precise work. The third is the hardship of enduring loneliness, which requires a person to keep learning quietly: while others are eating, drinking, and having fun, he spends a lot of time constantly learning new things. The fourth is the hardship of swallowing one's pride: to keep customers happy, one has to set aside one's dignity and serve everyone as if they were one's ancestors. The fifth I will leave for everyone to guess. Don't claim that you can endure every kind of hardship. Different roles come with different hardships, and a good mindset is the foundation of good health.
Finally, I wish everyone can achieve good performance through their own efforts. The above opinions are just my personal experience and do not represent any company.
This public account covers SRE-related topics from all angles. It is run by Qin Xiaohui, founder and developer of Open-Falcon and Nightingale, author of the Geek Time column "Operation and Maintenance Monitoring System Practical Notes", and partner at Kuaimao Nebula (a startup focused on unified monitoring and stability assurance; feel free to get in touch if you have related needs).