What is observability? Everything a beginner needs to know
The term observability comes from engineering and has become increasingly popular in software development in recent years. Simply put, observability is the ability to understand the internal state of a system from its external outputs. IBM defines observability as:
Generally, observability refers to the degree to which the internal state or condition of a complex system can be understood based on knowledge of its external output. The more observable the system is, the faster and more accurate the process of locating the root cause of a performance issue can be without the need for additional testing or coding.
In cloud computing, observability also refers to the software tools and practices for aggregating, correlating, and analyzing data from distributed application systems and the infrastructure that supports their operation, in order to monitor, troubleshoot, and debug those systems more effectively and so achieve customer experience optimization, service level agreements (SLAs), and other business goals.
As IT architecture becomes more complex, system management and troubleshooting also become more complex. In many scenarios, traditional approaches are no longer sufficient to ensure optimal performance. Observability is often considered a derivative of monitoring. Monitoring often involves tracking a specific set of metrics, such as CPU usage or network traffic, and raising alerts when those metrics exceed thresholds. Monitoring has certain limitations, whereas observability involves collecting and analyzing a wider range of data, providing a more comprehensive view of system behavior.
In software development, observability refers to the ability to understand application behavior and performance based on the data generated by the application, including logs, metrics, traces and other data. By analyzing this data, developers can understand how their application is performing and identify areas for improvement.
Observability Case
Platform security is a practical use case for observability.
Platform security teams receive large amounts of data in multiple formats from multiple sources. Analyzing messy, low-quality data makes it slower to detect vulnerabilities, find new threats, and respond when breaches occur. In addition, when multiple security tools are deployed, they often cannot share information with one another.
The solution is to define observability filters to identify potential security threats and improve the quality of incoming data to be analyzed. The next step is to enrich the data with supporting data from external databases to help analyze and identify security threats. Everything from DNS information to IP addresses to user identifiers can be added.
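The filter-and-enrich approach described above can be sketched roughly as follows. This is only an illustration: the event fields (`src_ip`, `user_id`, `severity`), the `THREAT_INTEL` lookup table, and the severity threshold are hypothetical stand-ins, not part of any particular security product.

```python
from typing import Iterable, Iterator

# Hypothetical stand-in for an external database of supporting data.
THREAT_INTEL = {
    "203.0.113.7": {"dns_name": "bad.example.net", "reputation": "malicious"},
}

# Assumed minimum set of fields an event needs to be worth analyzing.
REQUIRED_FIELDS = ("src_ip", "user_id", "severity")


def observability_filter(events: Iterable[dict], min_severity: int = 3) -> Iterator[dict]:
    """Yield only well-formed events at or above the severity threshold."""
    for event in events:
        if not all(field in event for field in REQUIRED_FIELDS):
            continue  # low-quality record: required fields are missing
        if event["severity"] < min_severity:
            continue  # not interesting enough to analyze
        yield event


def enrich(event: dict) -> dict:
    """Return a copy of the event with any known supporting data attached."""
    extra = THREAT_INTEL.get(event["src_ip"], {})
    return {**event, **extra}


raw_events = [
    {"src_ip": "203.0.113.7", "user_id": "alice", "severity": 5},
    {"src_ip": "198.51.100.2", "user_id": "bob", "severity": 1},
    {"user_id": "carol"},  # malformed: no source address or severity
]
enriched = [enrich(e) for e in observability_filter(raw_events)]
```

Filtering before enrichment keeps the lookups against the external source limited to events that will actually be analyzed.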
Benefits of Observability
One of the major benefits of observability is that it helps developers quickly identify and troubleshoot problems with their applications. By analyzing the telemetry data generated by an application, developers can understand how it performs and identify where performance can be improved. This helps reduce downtime and improves the overall user experience.
Automation improves the timeliness and accuracy of monitoring and control, and it helps reduce overall monitoring and maintenance costs.
Pillars of Observability
Observability is generally considered to be built on three pillars:
Logs
Many processes write logs of their activity. These are generally useful for observability, although in some cases the logging configuration must be adjusted to record more detail before the logs become useful.
Traces
Logs are very useful, but tracing forward and backward from an event is also necessary to see why it occurred and what its consequences were.
Metrics
Metrics are how we measure anomalies and trigger corrective action when necessary. Simply put, you need to know what the normal state looks like and detect deviations from it, so metrics that define normal behavior are a must.
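The idea of defining a normal state and detecting deviations from it can be illustrated with a small sketch. It assumes, purely for illustration, that "normal" is summarized by the mean and standard deviation of past samples and that anything more than three standard deviations away counts as an anomaly. The function names and the sample latencies are hypothetical.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Summarize normal behavior as (mean, sample standard deviation)."""
    return mean(samples), stdev(samples)


def is_anomalous(value, baseline, max_sigma=3.0):
    """Return True when value deviates from the baseline by more than max_sigma."""
    avg, spread = baseline
    if spread == 0:
        return value != avg  # a perfectly flat baseline: any change is a deviation
    return abs(value - avg) / spread > max_sigma


# Illustrative latency samples (milliseconds) collected during normal operation.
normal_latency_ms = [100, 102, 98, 101, 99, 100, 103, 97]
baseline = build_baseline(normal_latency_ms)

slow_request_flagged = is_anomalous(250, baseline)
typical_request_flagged = is_anomalous(101, baseline)
```

A fixed threshold would also work. The point is only that the alerting rule is derived from measured normal behavior rather than guessed.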
Implementation of Observability
Observability can also be implemented using some older tools, but they have limitations in applicability and coverage. Achieving observability requires its own toolbox of techniques and tools covering the three pillars: logs, traces, and metrics.
These tools allow managers, operations staff, and developers to collect and analyze data from a variety of sources, including application code, infrastructure, and user behavior. By using these tools together, system administrators can gain a complete view of the behavior and performance of the whole environment or of a single system, which helps them identify and resolve problems more accurately and quickly.
Instrumentation
The first step is to deploy tools that measure the performance of the entire environment or of individual systems. These tools need to cover logs, metrics, and traces so that they collect data about system behavior and performance. Connecting network management and control systems as well improves observability further.
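As a rough illustration of what instrumentation covering all three pillars might look like inside application code, the sketch below wraps a function so that each call produces a log line, a duration metric, and a simple span record. It uses only the Python standard library. The `instrumented` decorator, the in-memory `METRICS` and `SPANS` stores, and the `checkout` function are hypothetical, and a real deployment would more likely rely on a dedicated telemetry library.

```python
import logging
import time
import uuid
from collections import defaultdict
from functools import wraps

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("app")

METRICS = defaultdict(list)  # metric name -> recorded durations in seconds
SPANS = []                   # finished span records, one per call


def instrumented(func):
    """Emit a log line, a duration metric, and a span for every call."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        span = {"trace_id": uuid.uuid4().hex, "name": func.__name__, "status": "ok"}
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception:
            span["status"] = "error"
            logger.exception("%s failed", func.__name__)
            raise
        finally:
            # Runs on success and on failure, so every call is recorded.
            span["duration_s"] = time.perf_counter() - start
            METRICS[f"{func.__name__}.duration_s"].append(span["duration_s"])
            SPANS.append(span)
            logger.info("%s finished status=%s", func.__name__, span["status"])
    return wrapper


@instrumented
def checkout(total):
    """Hypothetical business function used to exercise the decorator."""
    return round(total * 1.2, 2)


result = checkout(10.0)
```

Keeping the instrumentation in a decorator means the business logic itself stays unchanged, which makes it easier to add or remove measurement later.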
Collect
After you install the instrumentation, you need to collect the data generated by the system. Tools such as logging frameworks, metric collection systems, and tracing libraries can be used to collect data.
You need to review the data provided by each tool and decide which data to store, which can be safely ignored, and which to discard.
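One way to make the store, ignore, or discard decision explicit is a small classification function applied to each incoming record. The rules below are hypothetical examples only: the `level` and `source` fields, the `healthcheck` source name, and the interpretation of "ignore" (let the record pass without retaining it) versus "discard" (drop it as noise) are all assumptions made for this sketch.

```python
from collections import Counter
from enum import Enum


class Action(Enum):
    STORE = "store"      # keep for later querying
    IGNORE = "ignore"    # safe to skip, not retained
    DISCARD = "discard"  # known noise, dropped outright


def classify(record: dict) -> Action:
    """Apply illustrative rules, most important first."""
    if record.get("level") in {"ERROR", "WARNING"}:
        return Action.STORE
    if record.get("source") == "healthcheck":
        return Action.DISCARD
    if record.get("level") == "DEBUG":
        return Action.IGNORE
    return Action.STORE  # default: keep anything not explicitly excluded


records = [
    {"level": "ERROR", "source": "api", "msg": "timeout"},
    {"level": "INFO", "source": "healthcheck", "msg": "ok"},
    {"level": "DEBUG", "source": "api", "msg": "cache miss"},
    {"level": "INFO", "source": "api", "msg": "request served"},
]
stored = [r for r in records if classify(r) is Action.STORE]
summary = Counter(classify(r).value for r in records)
```

Defaulting to `STORE` is a deliberately cautious choice here: data that no rule mentions is kept until someone decides it is not needed.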
Storage
Defining how to store the collected data is the next step. Storing data in a centralized location, such as a database or data lake, makes it easier to query or analyze the data later. Cloud storage is very useful in this regard. Many businesses use tiered systems in which new data is immediately available, while historical data remains in an online repository for some time. Automatic retrieval systems can access older data that has been archived offline.
Regular data backups should be part of daily operating procedures. Where you draw the line between immediate, online, and offline storage will depend on business needs.
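The boundaries between immediate, online, and offline storage can be expressed as configurable age limits. The sketch below is a minimal illustration; the 7-day and 90-day cutoffs and the `storage_tier` function are assumptions, not recommendations, and should be set according to business needs.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical demarcation points between tiers.
IMMEDIATE_MAX_AGE = timedelta(days=7)
ONLINE_MAX_AGE = timedelta(days=90)


def storage_tier(recorded_at: datetime, now: datetime) -> str:
    """Return the tier a record belongs to, based on its age."""
    age = now - recorded_at
    if age <= IMMEDIATE_MAX_AGE:
        return "immediate"
    if age <= ONLINE_MAX_AGE:
        return "online"
    return "offline"


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
tiers = [storage_tier(now - timedelta(days=d), now) for d in (1, 30, 365)]
```

Passing `now` in as a parameter, rather than reading the clock inside the function, keeps the rule easy to test and to re-run against historical data.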
Analysis
Next you can start analyzing the collected data to understand the behavior and performance of your system. The analysis process involves the use of tools such as dashboards, alerting systems, and machine learning models.
You can analyze data in real time to identify and manage changes in usage, such as the impact of a marketing campaign on your e-commerce application. You can also analyze historical trends. For example, the peak carpet-buying season in the Northern Hemisphere is usually in the fall, around early October. Historical analysis will reveal similar patterns in your own business.
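A historical trend analysis of this kind can be as simple as grouping past records by calendar month and looking for the peak. The sketch below uses invented sample figures purely to show the mechanics. The `monthly_totals` and `peak_month` helpers are hypothetical names.

```python
from collections import defaultdict
from datetime import date


def monthly_totals(daily_sales):
    """Sum sales per calendar month (1-12) across all years in the data."""
    totals = defaultdict(float)
    for day, amount in daily_sales:
        totals[day.month] += amount
    return dict(totals)


def peak_month(daily_sales):
    """Return the month number with the highest combined sales."""
    totals = monthly_totals(daily_sales)
    return max(totals, key=totals.get)


# Invented sample data for illustration only, not real sales figures.
sales = [
    (date(2022, 7, 15), 120.0),
    (date(2022, 10, 3), 480.0),
    (date(2023, 7, 20), 150.0),
    (date(2023, 10, 5), 510.0),
    (date(2023, 12, 1), 200.0),
]
busiest = peak_month(sales)
```

Combining several years in one total smooths out one-off events, which is usually what you want when looking for a seasonal pattern.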
Visualization
Visualization is key. Data can be presented in various forms, such as charts and graphs, which help identify trends and patterns in system behavior. Many visualization tools are available; even Microsoft Excel can do the job.
Overall, achieving observability requires a combination of tools, processes, and best practices that allow you to understand the behavior and performance of your system at both a holistic and granular level. This helps corporate and departmental decision-makers identify and resolve problems faster.
Conclusion
Observability is a powerful concept that can help developers gain insights into the behavior and performance of their applications. By collecting and analyzing telemetry data, developers can quickly identify and resolve issues, improving the overall user experience and reducing downtime.