


After more than ten years in operation and maintenance, there have been countless moments when I felt like I was still a novice...
Once upon a time, when I was a fresh computer science graduate, I browsed many job postings on recruitment websites. I was confused by the dazzling array of technical positions: R&D engineer, operation and maintenance engineer, test engineer...
During college, my grades in my major courses were mediocre, I had no technical vision to speak of, and I had no clear idea which technical direction to pursue.
Then a senior said to me: "Do operation and maintenance. You don't have to write code every day to do operation and maintenance. You just need to be able to play with Linux! It's much easier than doing development!"
I chose to believe him...
I have now been in the industry for more than ten years. I have suffered a lot, taken a lot of blame, killed servers, and lived through department layoffs. If someone tells me now that operation and maintenance is easier than development, I will block him without hesitation...
Basic operation and maintenance work is simple, but the defining feature of operation and maintenance as a whole is its complexity
In my opinion, operation and maintenance work may be one of the most complex technical jobs, requiring handling a large number of technical details, integration and configuration of different platforms, and solving various complex problems and failures. Therefore, operation and maintenance personnel are required to have a wide range of skills and knowledge to cope with changing technical and business needs:
Operation and maintenance often involves running complex platforms. What an enterprise needs to manage and monitor is usually not a single platform or system but many of them. These systems may come from different vendors and use different protocols and technologies, spanning servers, storage, networks, applications, and more.
Complex configuration management is also one of the difficulties in operation and maintenance work. Configuration management involves a large number of tasks such as system installation, configuration updates, software installation and updates, etc. These tasks need to be coordinated and executed throughout the system.
The management of large-scale clusters is also not simple. Large enterprises need to manage thousands of servers, which requires powerful tools and automation technology. Operations staff need automated tools to manage configuration, updates, monitoring and reporting.
Operation and maintenance security issues cannot be ignored either. Operations and maintenance personnel need to protect the company's assets and data and ensure the security of the system. This may include firewalls, intrusion detection systems, security patch management, etc.
Operation and maintenance also requires rich troubleshooting experience. Faults are a routine part of the work. When a problem occurs in a system, operation and maintenance personnel need to locate the fault quickly and take measures to restore service.
Continuous learning is the most basic requirement for operation and maintenance personnel. The pace at which operation and maintenance tools and technologies evolve is staggering. New technologies and tools keep emerging, and operation and maintenance personnel have to keep learning and updating their knowledge to keep up.
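To make the configuration management and automation points above a little more concrete, here is a minimal sketch of the "desired state" idea that many automation tools are built around: compare what each server should look like with what it actually looks like, and apply only the difference. Everything in it is hypothetical and for illustration only. The `Host` class, the `plan_changes` and `converge` functions, and the configuration keys are made-up names, and the hosts are in-memory objects rather than real servers.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical in-memory stand-in for a managed server."""
    name: str
    config: dict = field(default_factory=dict)


def plan_changes(desired: dict, actual: dict) -> dict:
    """Return only the settings whose current value differs from the desired one."""
    return {key: value for key, value in desired.items() if actual.get(key) != value}


def converge(hosts: list, desired: dict) -> dict:
    """Bring every host to the desired state and report which keys changed.

    Running it a second time changes nothing, which is the idempotency
    property that configuration management tools aim for.
    """
    report = {}
    for host in hosts:
        changes = plan_changes(desired, host.config)
        # Assumption: a real tool would push these changes over SSH or an agent.
        host.config.update(changes)
        report[host.name] = sorted(changes)
    return report


if __name__ == "__main__":
    fleet = [
        Host("web-01", {"ntp_server": "10.0.0.1", "nginx_version": "1.24"}),
        Host("web-02", {"ntp_server": "10.0.0.9"}),
    ]
    desired_state = {"ntp_server": "10.0.0.1", "nginx_version": "1.24"}
    print(converge(fleet, desired_state))  # first run: only web-02 needs changes
    print(converge(fleet, desired_state))  # second run: nothing left to change
```

The same loop works whether the fleet has two hosts or two thousand, which is why this style of automation matters more as a cluster grows.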
Operation and maintenance is a high-risk profession. Is a career in operation and maintenance really complete if you have never killed a server?
If we talk about high-risk occupations, operation and maintenance can definitely be counted as one. Even in many large companies, downtime accidents caused by manual operations of operation and maintenance often occur:
Pacific Petroleum Company cyber attack (2021): In May 2021, the U.S.-based Pacific Petroleum Company suffered a ransomware attack that caused its network and servers to malfunction and shut down. According to reports, the incident began when an employee accidentally opened a malicious link.
GitLab outage (2017): In January 2017, code hosting provider GitLab suffered a serious data loss incident in which some user data was permanently lost. According to GitLab's later official statement, the cause was an engineer accidentally deleting data on a production database server.
Walmart server outage (2019): In November 2019, the servers of American retail giant Walmart went down multiple times within an hour, leaving the company's website, applications, and payment systems unable to work properly. The incident was reportedly caused by an employee's error during routine server maintenance.
Microsoft Azure cloud service outage (2020): In September 2020, Microsoft's Azure cloud service experienced a global outage, causing many customers' applications and services to not work properly. It was later confirmed that the incident was due to a network configuration error.
Operation and maintenance may also face all kinds of force majeure, including natural disasters:
Philippine typhoon (2013): In November 2013, the Philippines was hit by Typhoon Haiyan, the strongest typhoon to hit the country since 1947. The typhoon left more than 6,000 people dead or missing and wreaked havoc on the country's infrastructure. It also knocked out data centers and servers in the Philippines that many international businesses relied on.
U.S. hurricane (2012): In October 2012, Hurricane Sandy struck the East Coast of the United States, causing large-scale power outages, communication interruptions, and flooding. The disaster also caused data center and server outages for some well-known companies and service providers, including Amazon, Google, and Netflix.
With no clear direction for career development, operation and maintenance people often end up lost in the workplace...
A lack of hard skills may be the biggest problem operation and maintenance people face. As technology advances, the work demands continuous learning of new skills and tools to keep up with changing market needs. People who have worked in operation and maintenance for many years may find that their skills have fallen behind what the market wants, which leaves them confused and overwhelmed.
The poor environment is really not the fault of operation and maintenance people. Compared with other technical fields, the career path in operation and maintenance is relatively vague. In some organizations, operation and maintenance engineers are treated as little more than a "logistics department" and lack the status and treatment that other technical teams enjoy. For example, they do not receive the recognition and rewards they deserve. This feeds negative feelings and, to some extent, leaves operation and maintenance engineers unsure about their career prospects.
We walk with our heads down and have no time to look up at the sky. The essence of operation and maintenance work is to ensure the stability and reliability of systems, so engineers must stay vigilant and focused at all times. That makes the job very stressful, especially during system failures or emergencies. Worn out by the daily grind, we have no time left to think about where our careers are going.
So we often ask ourselves: how can we develop our operation and maintenance careers better?
The book "The Long View" by Brian Fetherstonhaugh describes general patterns of career development. The principles it lays out may give us the answer:
Think in terms of the next 45 years. If you plan over a longer time span, such as 45 years, you will stop fretting over any single short-term gain or loss. And if you have a clear career plan, it is easier to overcome difficulties and persevere.
What we have to do is map out the development paths open to operation and maintenance engineers, so that we can excel in a specialized technical niche
Transformation to DevOps: At some point, the so-called "DevOps is dead" argument became popular in tech circles. However, DevOps was never simply about making developers do operation and maintenance and leaving operation and maintenance people with nowhere to go.
Operation and maintenance work is already difficult, so stop creating panic for us.
What real DevOps needs is an internal DevOps platform and a dedicated team to maintain it, not a pile of scattered open source tools that programmers have to wrangle themselves, and not developers doing operation and maintenance work. A true DevOps team should bring development and operation and maintenance closely together, share responsibility, and collaborate to improve IT performance and empower the business.
The transformation from operation and maintenance to DevOps requires mastering some key tools and practices, such as continuous integration, continuous delivery, automated testing, and containerization. At the same time, the DevOps team should adopt methods such as agile development, iterative development, and continuous delivery. In an enterprise that has established a complete DevOps culture, moving from operation and maintenance into DevOps is a very good development path.
Transformation to AIOps: Similarly, AIOps has always been a good career development path for operation and maintenance. AIOps can help IT operations and maintenance personnel automate some routine, tedious, and low-value operations, such as log analysis, troubleshooting, etc., thus freeing up more time and energy to solve more complex problems.
At the same time, operation and maintenance work involves many aspects, including infrastructure management, application deployment, monitoring, troubleshooting, etc. These tasks require the professional knowledge and experience of human operation and maintenance personnel.
AIOps technology can improve the efficiency and accuracy of IT operations, but it will not completely replace the work of human operations personnel. Instead, they work together to make the entire IT operations team more efficient and productive.
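As a rough illustration of the kind of routine log analysis that AIOps tooling can take over, the sketch below flags minutes whose error count is far above the recent baseline. It is a deliberately simple statistical check rather than how any particular AIOps product works. The function name `find_anomalies`, the `window` and `threshold` parameters, and the sample data are all assumptions made for this example.

```python
from statistics import mean, pstdev


def find_anomalies(error_counts, window=5, threshold=3.0):
    """Return indexes whose value sits far above the mean of the preceding window."""
    anomalies = []
    for i in range(window, len(error_counts)):
        baseline = error_counts[i - window:i]
        mu = mean(baseline)
        sigma = pstdev(baseline)
        # Guard against a perfectly flat baseline, where the deviation is zero.
        spread = sigma if sigma > 0 else 1.0
        if (error_counts[i] - mu) / spread > threshold:
            anomalies.append(i)
    return anomalies


if __name__ == "__main__":
    # Hypothetical errors-per-minute series with one obvious spike.
    errors_per_minute = [2, 3, 2, 4, 3, 2, 3, 40, 3, 2]
    print(find_anomalies(errors_per_minute))
```

A check like this only points at where to look. Deciding whether the spike is a real incident, and what to do about it, is still the job of the human on call.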
Transformation to SRE: Keep learning software development skills, and master the automation tools and the testing, deployment, and monitoring practices used in DevOps. Learn cloud computing and container technology: SREs need to understand cloud platforms and containers and be comfortable with basic cloud services and container management tools such as AWS, Docker, and Kubernetes. Build data analysis skills, and at the same time help establish an SRE culture within the organization around core concepts such as reliability, automation, and experimentation.



