
In-depth exploration of the knowledge system in the field of monitoring

Jan 01, 2024 07:17 PM

Introduction: Monitoring is one of the most important parts of operations and maintenance (O&M), and indeed of the entire product life cycle. Before a fault occurs, it provides timely warnings; after a fault occurs, it provides detailed data for tracing and locating the problem. The industry offers many good open-source products, and adopting an open-source monitoring system is usually the most time-saving, labor-saving, and efficient option. Readers who are new to monitoring should come away from this article with a clearer picture of the whole monitoring system.
1. Monitoring goals

Let's first look at what monitoring is, why it matters, and what it aims to achieve. Everyone works in a different industry, company, business, and position, so understandings of monitoring differ. What matters is that monitoring should be approached from the perspective of the company's business, not from the perspective of any particular monitoring technology.

Continuous, real-time monitoring of the system: at its core, this is what monitoring means;

Real-time feedback on the current state of the system: when we monitor a piece of hardware or a system, we need to see its current state in real time, whether it is normal, abnormal, or failed;

Ensuring service reliability and security: the purpose of monitoring is to keep systems, services, and the business running normally;

Ensuring continuous, stable business operation: with thorough monitoring, even when a fault occurs we receive the alarm as early as possible and can handle it quickly, which keeps the business running continuously and stably;


2. Monitoring methods

Now that we understand why monitoring matters and what it is for, let's look at how to monitor.

Understand the monitored objects: do you understand what you want to monitor? For example, how does a CPU work?

Performance baseline indicators: which attributes of the object do we want to monitor? For example, CPU usage, load, user-mode time, kernel-mode time, and context switches.

Alarm threshold definition: what counts as a fault that requires an alarm? For example, what CPU load is considered high? What proportions of user-mode and kernel-mode time are acceptable?

Troubleshooting process: after receiving a fault alarm, how do we handle it? Is there a more efficient process?
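To make alarm threshold definition concrete, here is a minimal Python sketch of a rule with warning and critical levels that only alarms after several consecutive breaching samples, which helps avoid alarm flapping. The class name, the levels, and the three-sample requirement are illustrative assumptions, not Zabbix's own logic (in Zabbix, similar conditions are written as trigger expressions).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ThresholdRule:
    """Hypothetical alarm rule with warning/critical levels and a persistence requirement."""
    name: str
    warning: float
    critical: float
    consecutive: int = 3  # samples in a row that must breach before we alarm
    _recent: List[str] = field(default_factory=list, init=False, repr=False)

    def level_of(self, value: float) -> str:
        """Classify a single sample."""
        if value >= self.critical:
            return "critical"
        if value >= self.warning:
            return "warning"
        return "ok"

    def observe(self, value: float) -> str:
        """Record a sample and return the level to alarm on ("ok" until a breach persists)."""
        self._recent.append(self.level_of(value))
        self._recent = self._recent[-self.consecutive:]  # keep only the window we need
        if len(self._recent) < self.consecutive:
            return "ok"
        if all(level == "critical" for level in self._recent):
            return "critical"
        if all(level != "ok" for level in self._recent):
            return "warning"
        return "ok"


rule = ThresholdRule("cpu_run_queue_per_core", warning=3.0, critical=5.0)
for sample in [2.5, 3.5, 6.0, 5.5, 5.2]:
    print(sample, rule.observe(sample))
```

With these samples the rule stays quiet for the first three readings, reports a warning once three non-ok samples have accumulated, and escalates to critical only when three critical samples arrive in a row.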

3. The core of monitoring

We have covered monitoring methods, monitored objects, performance indicators, alarm threshold definitions, and the fault-handling process. So what is the core of monitoring?

Discovering problems: when the system fails, an alarm fires and we receive the fault information;

Locating problems: an alarm email usually names the failed host and describes the failure, and we need to analyze that content. For example, if a server cannot be reached, is it a network problem or excessive load? Has it been unreachable for a long time, or did some developer's activity trigger a firewall ban policy? We need to analyze the specific cause of the failure;

Solving problems: once we understand the cause, we resolve the fault according to its priority;

Reviewing problems: after resolving a major fault, we summarize its cause and how to prevent it, so that it does not recur;

4. Monitoring tools

Next, we need to choose monitoring tools that suit the company's business. Here is a brief classification of monitoring tools.

Older monitoring tools:

MRTG (Multi Router Traffic Grapher) is software for drawing network traffic graphs. It was developed by Tobias Oetiker and Dave Rand in Olten, Switzerland, and is licensed under the GPL. The first version of MRTG was released in 1995. It is written in Perl and runs across platforms. It collects data over the SNMP protocol and presents the collected data on web pages as GIF or PNG graphs.

Ganglia is a cross-platform, scalable, high-performance distributed monitoring system for high-performance computing systems such as clusters and grids. It uses a hierarchical design, builds on a wide range of technologies, and stores data with RRDtool. It has a visual interface and is suited to automated monitoring of cluster systems. Its carefully designed data structures and algorithms keep the overhead that monitoring imposes on the monitored nodes very low. Thousands of clusters currently use it, and it can easily handle a cluster of 2,000 nodes.

Cacti (the English plural of "cactus") is a graphical network-traffic monitoring and analysis tool built on PHP, MySQL, SNMP, and RRDtool. It collects data via snmpget and draws graphs with RRDtool, but users do not need to understand RRDtool's complex parameters. It provides powerful data and user management: each user can be restricted to specific views of the graph tree, host devices, and individual graphs. It can also authenticate users against LDAP and supports custom templates. Its historical data display and monitoring features are quite good.

Through templates, Cacti makes monitoring configurations reusable across different devices; it also offers customizable graphing and strong computation capabilities (data overlay).

Nagios is an enterprise-grade monitoring system that monitors the running state of services and network information, checks the status of specified local or remote hosts and services, and sends notifications when something goes wrong.

Nagios runs on Linux and UNIX platforms. It also provides a web interface that lets system administrators view network status, system problems, and system-related logs.

Nagios focuses on monitoring service availability and can trigger alarms based on the state of monitored indicators.

Nagios still holds a certain market share. However, it has not kept pace with the times and can no longer meet changing monitoring needs: its architectural scalability and ease of use leave room for improvement, and its advanced features are only available in the commercial edition, Nagios XI.

Smokeping is mainly used to monitor network performance, including regular ping, web server performance, DNS query performance, SSH performance, and so on. It is also built on RRDtool. Its graphs are notably attractive: packet loss and latency are shown with colors and shading, and multiple graphs can be stacked together. Its author also developed MRTG and RRDtool.

Smokeping’s website is: http://tobi.oetiker.cn/hp

OpenTSDB is an open-source monitoring system that stores all time series data in HBase (without downsampling), forming a distributed, scalable time series database. It supports second-level data collection and permanent storage, enables capacity planning, and integrates easily with existing alarm systems.

OpenTSDB can collect metrics from large clusters (including the network devices, operating systems, and applications in them), then store, index, and serve them, making the data easier to understand, for example through web pages and graphs.

Leading monitoring tools:

Zabbix is a distributed monitoring system that supports multiple collection methods and clients. It has a dedicated agent and also supports protocols such as SNMP, IPMI, JMX, Telnet, and SSH. It stores the collected data in a database, analyzes and organizes it, and triggers an alarm when conditions are met. Its flexible scalability and rich feature set are unmatched by other monitoring systems, and its overall functionality is excellent. Compared with the systems above, Zabbix stands out for its rich features, scalability, support for secondary development, and ease of use; with a little study, readers can build their own monitoring system.

Xiaomi's monitoring system, Open-Falcon, aims to be the most open and easy-to-use enterprise-grade monitoring product for Internet companies.

Third-party monitoring tools:

There are many good third-party monitoring services on the market, such as Monitoring Bao, Monitoring Easy, and Tingyun, and many cloud vendors offer their own monitoring. We will not introduce them here (to avoid sounding like an advertisement); if you want to learn about them, consult their official websites.

5. Monitoring process

With so many options, which monitoring tool is the most suitable? I recommend a few open-source tools: Zabbix, Open-Falcon, and LEPUS (dedicated to database monitoring).

This article, however, builds the whole monitoring ecosystem on Zabbix.

The overall Zabbix workflow is as follows:

Data collection: Zabbix collects data from systems via SNMP, its agent, ICMP, SSH, IPMI, and so on;

Data storage: Zabbix stores data in MySQL and can also use other database back ends;

Data analysis: when we need to review and analyze a fault afterwards, Zabbix provides graphs, timelines, and related information that help pinpoint where the fault occurred;

Data display: through the web interface (there are also mobile apps, and a custom web interface can be developed in Java or PHP);

Monitoring and alarming: phone, email, WeChat, and SMS alarms, an alarm escalation mechanism, and so on (virtually any notification channel can be used);

Alarm handling: when an alarm arrives, we handle it according to the severity of the fault, such as important and urgent, or important but not urgent, and work with the relevant people to resolve it quickly;

6. Monitoring indicators

We have covered monitoring methods, goals, processes, and the available tools. You may still wonder what exactly we should monitor, so here is a breakdown:

6.1 Hardware Monitoring

In the early days, we did computer room inspections, checking the indicator lights on hardware to see whether anything was faulty. This wasted manpower and was repetitive, non-technical work.

Today we can monitor hardware details through IPMI and set alarm thresholds for CPU, memory, disks, temperature, fans, voltage, and so on (we can define reasonable alarm ranges ourselves).

IPMI Monitoring Hardware Service Reference Material
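As a sketch of how IPMI readings can feed alarms, the Python below parses the pipe-separated output of `ipmitool sdr` and keeps only sensors whose status is not normal. It assumes the three-column "name | reading | status" layout and that the statuses `ok` and `ns` need no alarm; the sample text is made up, and other ipmitool subcommands or BMC firmware may format output differently.

```python
import subprocess
from typing import List, Tuple

Sensor = Tuple[str, str, str]  # (name, reading, status)

# Statuses treated as "no alarm": ok = normal, ns = no reading (assumed mapping).
QUIET_STATUSES = {"ok", "ns"}


def parse_sdr(output: str) -> List[Sensor]:
    """Split `ipmitool sdr`-style lines of the form "name | reading | status"."""
    sensors = []
    for line in output.splitlines():
        parts = [part.strip() for part in line.split("|")]
        if len(parts) == 3:
            sensors.append((parts[0], parts[1], parts[2]))
    return sensors


def abnormal_sensors(output: str) -> List[Sensor]:
    """Return the sensors whose status should raise a hardware alarm."""
    return [s for s in parse_sdr(output) if s[2] not in QUIET_STATUSES]


def read_sdr() -> str:
    """Run ipmitool locally; assumes ipmitool is installed and the BMC is accessible."""
    return subprocess.run(["ipmitool", "sdr"], capture_output=True, text=True, check=True).stdout


SAMPLE = """\
CPU Temp         | 45 degrees C      | ok
FAN1             | 0 RPM             | cr
PS2 Status       | 0x00              | ns
"""
print(abnormal_sensors(SAMPLE))
```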

6.2 System Monitoring

Small and medium-sized enterprises run almost entirely on Linux servers, so we must monitor system resource usage. System monitoring is the foundation of the monitoring system.

Main objects to monitor:

CPU: the important concepts are context switches, the run queue, and utilization.

These are also the key indicators for CPU monitoring.

As a rule of thumb, the run queue should not exceed 3 per processor, the user-mode/kernel-mode split of CPU utilization should be about 70/30, and idle time should be around 50%. Context switch rates should be judged in light of how busy the system is.

Commonly used tools for CPU include: htop, top, vmstat, mpstat, dstat, glances

Zabbix provides a system monitoring template: Zabbix Agent Interface
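To make the CPU rules of thumb checkable, here is a minimal Python sketch that turns two samples of Linux's /proc/stat into user/system/idle percentages and applies the thresholds stated earlier in this section. The function names are hypothetical, counting iowait as idle is a choice rather than a standard, and reading "idle around 50%" as a lower bound is an assumption.

```python
from typing import Dict, List


def read_cpu_times(path: str = "/proc/stat") -> List[int]:
    """Return the aggregate "cpu" counters (in jiffies) from /proc/stat; Linux only."""
    with open(path) as stat:
        fields = stat.readline().split()
    return [int(x) for x in fields[1:]]


def cpu_percentages(before: List[int], after: List[int]) -> Dict[str, float]:
    """Convert two /proc/stat samples into user/system/idle percentages."""
    # Fields: user nice system idle iowait irq softirq steal (guest time is already in user).
    delta = [a - b for a, b in zip(after[:8], before[:8])]
    total = sum(delta) or 1
    return {
        "user": 100 * (delta[0] + delta[1]) / total,   # user + nice
        "system": 100 * delta[2] / total,
        "idle": 100 * (delta[3] + delta[4]) / total,   # idle + iowait
    }


def check_cpu_rules(run_queue: float, cpus: int, pct: Dict[str, float]) -> List[str]:
    """Apply the rules of thumb: run queue <= 3 per CPU, ~70/30 user/kernel, ~50% idle."""
    problems = []
    if run_queue / cpus > 3:
        problems.append(f"run queue per CPU is {run_queue / cpus:.1f} (> 3)")
    busy = pct["user"] + pct["system"]
    if busy and pct["user"] / busy < 0.7:
        problems.append(f"user share of busy time is {pct['user'] / busy:.0%} (< 70%)")
    if pct["idle"] < 50:
        problems.append(f"idle is {pct['idle']:.0f}% (< 50%)")
    return problems
```

On a live host you might call `read_cpu_times()` twice about a second apart, pass `os.getloadavg()[0]` as a rough stand-in for the run queue (on Linux the load average also counts tasks in uninterruptible sleep), and use `os.cpu_count()` for `cpus`.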

Memory: usually we monitor memory usage and swap usage. Zabbix can also plot memory usage over time, which helps reveal problems such as a memory leak in a service.

Commonly used tools for memory include: free, top, vmstat, glances
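The memory and swap percentages can be derived from /proc/meminfo; the Python sketch below does so, parsing a made-up sample for illustration. It assumes a kernel that reports MemAvailable (Linux 3.14 and later), and the function names are hypothetical; the Zabbix agent already has built-in memory items, so this only shows where the numbers come from.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}; most values are in kB."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            info[key.strip()] = int(rest.split()[0])
    return info


def memory_usage(info: Dict[str, int]) -> Dict[str, float]:
    """Memory and swap usage in percent; MemAvailable treats reclaimable cache as free."""
    mem_used_pct = 100 * (1 - info["MemAvailable"] / info["MemTotal"])
    swap_total = info.get("SwapTotal", 0)
    swap_used_pct = 100 * (1 - info.get("SwapFree", 0) / swap_total) if swap_total else 0.0
    return {"mem_used_pct": mem_used_pct, "swap_used_pct": swap_used_pct}


SAMPLE = """\
MemTotal:        8000000 kB
MemFree:          500000 kB
MemAvailable:    2000000 kB
SwapTotal:       2000000 kB
SwapFree:        1500000 kB
"""
print(memory_usage(parse_meminfo(SAMPLE)))  # on a real host: open("/proc/meminfo").read()
```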

Disk and network IO: IO is divided into disk IO and network IO. Beyond the more detailed data collected during performance tuning, routine monitoring focuses on disk usage, disk throughput, and how busy the disk is with writes; on the network side, it tracks network interface traffic.

Commonly used tools include: iostat, iotop, df, iftop, sar, glances
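A minimal Python sketch of the daily disk checks: space usage via `shutil.disk_usage` and inode usage via `os.statvfs`, since a filesystem can run out of inodes while free space remains. Percentages may differ slightly from df, which computes Use% as used / (used + available) and so accounts for root-reserved blocks; the 85% limit and the function names are hypothetical.

```python
import os
import shutil
from typing import Dict, List


def disk_report(path: str = "/") -> Dict[str, float]:
    """Space and inode usage (percent) of the filesystem containing `path`; Unix only."""
    usage = shutil.disk_usage(path)
    st = os.statvfs(path)
    # Some filesystems report f_files == 0 (no fixed inode table); treat that as 0% used.
    inode_pct = 100 * (st.f_files - st.f_ffree) / st.f_files if st.f_files else 0.0
    return {"space_used_pct": 100 * usage.used / usage.total, "inode_used_pct": inode_pct}


def over_limit(report: Dict[str, float], limit: float = 85.0) -> List[str]:
    """Names of the metrics above the (hypothetical) alarm limit."""
    return [name for name, value in report.items() if value > limit]


print(disk_report("/"))
```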

Other system monitoring covers the listening ports of running processes, the number of processes, logged-in users, open files, and so on (see Zabbix's built-in Template OS Linux for details).

6.3 Application Monitoring

After hardware and system monitoring, the next step is to log in to each server, see which services it runs, and monitor all of them.

Application service monitoring is another important part of the monitoring system. Services such as LVS, HAProxy, Docker, Nginx, PHP, Memcached, Redis, MySQL, and RabbitMQ all need to be monitored with Zabbix.

I have written about the detailed setup of service monitoring before, so I will not go through each service here.

Zabbix provides application service monitoring: Zabbix Agent UserParameter
Zabbix provides Java monitoring: Zabbix JMX Interface
Percona provides MySQL database monitoring: percona-monitoring-plugins
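As an example of the UserParameter approach, the script below reads Nginx's stub_status page and prints one metric for the Zabbix agent. It assumes stub_status is enabled at the hypothetical URL `http://127.0.0.1/nginx_status`, that the page has the standard four-line layout, and that the agent configuration contains a line such as `UserParameter=nginx.status[*],/usr/local/bin/nginx_status.py $1` (the key name and script path are made up for illustration).

```python
import re
import sys
import urllib.request
from typing import Dict

STATUS_URL = "http://127.0.0.1/nginx_status"  # hypothetical location of the stub_status page


def parse_stub_status(text: str) -> Dict[str, int]:
    """Parse the four-line output of nginx's stub_status module."""
    lines = text.strip().splitlines()
    stats = {"active": int(lines[0].split(":")[1])}
    accepts, handled, requests = (int(x) for x in lines[2].split())
    stats.update(accepts=accepts, handled=handled, requests=requests)
    for name, value in re.findall(r"(\w+):\s*(\d+)", lines[3]):
        stats[name.lower()] = int(value)  # reading / writing / waiting
    return stats


def fetch_metric(name: str, url: str = STATUS_URL) -> int:
    """Fetch the status page and return one metric."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_stub_status(resp.read().decode())[name]


if __name__ == "__main__" and len(sys.argv) == 2:
    # Called by the Zabbix agent as: nginx_status.py <metric>, e.g. "active" or "requests"
    print(fetch_metric(sys.argv[1]))
```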

6.4 Network Monitoring

For an e-commerce website serving users across the country, we also need to keep track of network conditions in every region and computer room at all times.

Network monitoring must be considered when building a monitoring platform, especially when there are multiple computer rooms. The network between computer rooms, the network within each computer room, and connectivity from across the country all need attention. How do we obtain this information? With the network monitoring tool Smokeping.

Smokeping is the work of Tobi Oetiker, the author of RRDtool. Written in Perl, it is mainly used to monitor network performance, web server performance, DNS query performance, and so on. It uses RRDtool for graphing and supports a distributed setup in which data collected by multiple agents is aggregated centrally.
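Smokeping's graphs boil down to a few numbers per probe round: how many pings were lost and how the round-trip times are distributed, with the median drawn as the main line. The Python sketch below computes those figures from a list of round-trip times; it assumes the RTTs were already collected (for example with ping or fping), the function name is hypothetical, and it does not reproduce Smokeping's own code.

```python
import statistics
from typing import Dict, List, Optional


def summarize_round(rtts_ms: List[Optional[float]]) -> Dict[str, Optional[float]]:
    """Summarize one probe round; None marks a ping that got no reply."""
    replies = sorted(r for r in rtts_ms if r is not None)
    sent = len(rtts_ms)
    return {
        "loss_pct": 100 * (sent - len(replies)) / sent if sent else 100.0,
        "median_ms": statistics.median(replies) if replies else None,
        "min_ms": replies[0] if replies else None,
        "max_ms": replies[-1] if replies else None,
    }


print(summarize_round([10.0, 12.0, None, 11.0, None]))
```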

Because your own monitoring points are relatively few, you can also use commercial monitoring services such as Monitoring Bao, Tingyun, Keynote, and Borui. These providers can also help you monitor the status of your CDN.

6.5 Traffic Analysis

Website traffic analysis is a skill that O&M staff must master. For an e-commerce company, for example:

Through statistics and analysis of order sources, we can tell whether our advertising on a particular website has achieved the expected results.

We can distinguish the number of visitors from different regions, and even the transaction volume of goods.

Tools such as Baidu Statistics, Google Analytics, and webmaster tools only require embedding a JavaScript snippet in the page.

However, the data always stays in the vendor's hands and customization is inconvenient, so you can instead use an open-source analytics tool such as Piwik (now called Matomo).

6.6 Log monitoring

As the system runs, the operating system generates system logs, and applications generate access logs, error logs, operation logs, and network logs. We can use ELK for log monitoring.

For log monitoring, the most common requirements are collection, storage, query, and display.

The open-source community has a corresponding project for each: Logstash (collection), Elasticsearch (storage and search), and Kibana (display).

Together these three are called the ELK Stack, that is, the combination of Elasticsearch, Logstash, and Kibana.

Once logs are collected, any exception after a deployment or update shows up immediately in Kibana.

Of course, you can also filter error logs through Zabbix to generate alerts.
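A minimal sketch of error-log filtering in Python: the code below counts responses by status class in an Nginx access log and raises an alarm when 5xx responses exceed a share of traffic. It assumes Nginx's default `combined` log format and a hypothetical 1% limit; inside Zabbix, a comparable check could use the agent's `log[]` item with a regular expression plus a trigger.

```python
import re
from collections import Counter
from typing import Iterable

# Matches nginx's default "combined" format up to and including the status code.
LINE_RE = re.compile(r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" (\d{3}) ')


def status_classes(lines: Iterable[str]) -> Counter:
    """Count requests per status class ("2xx", "4xx", "5xx", ...); skip unparsable lines."""
    counts: Counter = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[match.group(1)[0] + "xx"] += 1
    return counts


def error_alarm(counts: Counter, max_5xx_pct: float = 1.0) -> bool:
    """True when 5xx responses exceed max_5xx_pct percent of all parsed requests."""
    total = sum(counts.values())
    return total > 0 and 100 * counts["5xx"] / total > max_5xx_pct


LOG = [
    '192.0.2.1 - - [01/Jan/2024:19:17:00 +0800] "GET / HTTP/1.1" 200 612 "-" "curl/8.0"',
    '192.0.2.2 - - [01/Jan/2024:19:17:01 +0800] "GET /x HTTP/1.1" 404 153 "-" "curl/8.0"',
    '192.0.2.3 - - [01/Jan/2024:19:17:02 +0800] "POST /pay HTTP/1.1" 502 157 "-" "curl/8.0"',
]
counts = status_classes(LOG)
print(dict(counts), error_alarm(counts))
```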

6.7 Security Monitoring

There are many open-source security products for Linux, such as iptables at layer 4 and web protection at layer 7 via Nginx + Lua or a WAF, with the relevant logs finally collected into the ELK Stack and the different attack types displayed graphically. But this is always time-consuming, and in my opinion the results are not very good. In that case, we can turn to a third-party service provider.

Third-party vendors provide comprehensive vulnerability libraries covering services, backdoors, databases, configuration checks, CGI, SMTP, and more.

They detect host and web application vulnerabilities comprehensively and, combining their own research with industry sharing, update 0-day vulnerability coverage promptly to eliminate the latest security risks.

6.8 API Monitoring

As APIs become more and more important, we clearly also need data that tells us whether the APIs we provide are working properly.
Monitor the API's GET, POST, PUT, DELETE, HEAD, and OPTIONS requests. Availability, correctness, and response time are the three major performance indicators.
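A minimal sketch of an API probe covering availability, correctness, and response time, using only Python's standard library. The function name, the result keys, and the optional `validate` callback for checking the response body are illustrative assumptions; a real setup would run the probe on a schedule and push the results into the monitoring system.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Dict, Optional


def check_api(url: str, method: str = "GET", expected_status: int = 200,
              validate: Optional[Callable[[bytes], bool]] = None,
              timeout: float = 5.0) -> Dict[str, object]:
    """Probe one endpoint and report availability, correctness, and response time."""
    request = urllib.request.Request(url, method=method)
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            status, body = resp.status, resp.read()
    except urllib.error.HTTPError as err:   # the server answered, just not with 2xx
        status, body = err.code, err.read()
    except OSError:                         # URLError, timeouts, refused connections
        return {"available": False, "correct": False, "elapsed_ms": None}
    elapsed_ms = (time.perf_counter() - start) * 1000
    correct = status == expected_status and (validate is None or validate(body))
    return {"available": True, "correct": correct, "elapsed_ms": elapsed_ms}
```

For example, `check_api("https://api.example.com/health", validate=lambda b: b"ok" in b)` would treat the endpoint as correct only if it returns 200 and the body contains "ok" (the URL is a placeholder).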

6.9 Performance Monitoring

Comprehensively monitor web page performance: DNS response time, HTTP connection establishment time, page performance index, response time, availability, element size, and so on.
Zabbix provides URL monitoring: Zabbix Web Monitoring
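To illustrate splitting page performance into phases, the sketch below times DNS resolution and TCP connection establishment separately with Python's `socket` module. It measures only these two phases (not TLS, time to first byte, or rendering), the function name is hypothetical, and resolver caching on the probing host can make repeated DNS timings look artificially fast.

```python
import socket
import time
from typing import Dict


def connection_timings(host: str, port: int = 80, timeout: float = 5.0) -> Dict[str, float]:
    """Time DNS resolution and TCP connection establishment separately, in milliseconds."""
    t0 = time.perf_counter()
    family, socktype, proto, _, addr = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)[0]
    t1 = time.perf_counter()
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)
        sock.connect(addr)
        t2 = time.perf_counter()
    return {"dns_ms": (t1 - t0) * 1000, "connect_ms": (t2 - t1) * 1000}
```

A call such as `connection_timings("www.example.com", 443)` would return both phases for that site; running it from several locations is what gives the national view described above.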

6.10 Business Monitoring

A monitoring platform without business indicators is incomplete. In our monitoring system we must monitor important business indicators and set thresholds for alarm notifications.

For example, in the e-commerce industry:

How many orders are generated per minute;

How many users are registered per minute;

How many active users are there every day;

How many promotion activities are there every day;

How many users each promotion attracts;

How much traffic does the promotion bring in;

How much profit does the promotion bring?

These and other important indicators can be added to Zabbix and displayed on a Zabbix screen.
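Business indicators such as orders per minute are often checked against a recent baseline rather than a fixed number. The Python sketch below flags a sudden drop or spike relative to the mean of the previous 15 minutes, and builds a `zabbix_sender` command (its `-z`, `-s`, `-k`, and `-o` options set the server, host, key, and value) for pushing the figure into a Zabbix trapper item. The window, the 50% tolerance, and the server, host, and key names are hypothetical.

```python
from statistics import mean
from typing import List, Optional


def orders_anomaly(per_minute: List[int], window: int = 15, tolerance: float = 0.5) -> Optional[str]:
    """Compare the latest minute with the mean of the previous `window` minutes."""
    if len(per_minute) <= window:
        return None  # not enough history yet
    baseline = mean(per_minute[-window - 1:-1])
    latest = per_minute[-1]
    if baseline == 0:
        return "spike" if latest > 0 else None
    change = (latest - baseline) / baseline
    if change <= -tolerance:
        return "drop"
    if change >= tolerance:
        return "spike"
    return None


def sender_command(server: str, host: str, key: str, value: float) -> List[str]:
    """Build a zabbix_sender call that pushes one value to a Zabbix trapper item."""
    return ["zabbix_sender", "-z", server, "-s", host, "-k", key, "-o", str(value)]


history = [100] * 15 + [40]
print(orders_anomaly(history))  # latest minute is 60% below the 15-minute baseline
print(sender_command("zabbix.example.com", "shop-web01", "biz.orders_per_minute", history[-1]))
```

The command list can be passed to `subprocess.run` from a cron job; comparing against a rolling baseline is one way to catch the sudden rises and falls in order volume that fixed thresholds miss.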

7. Monitoring and alarm

There are many ways to send fault alarms; the most common are SMS, email, and WeChat.
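For the email channel, a custom alert script might build and send the message as in the Python sketch below. Zabbix ships a built-in email media type, so a script like this is only needed for custom delivery; the function names, subject format, addresses, and SMTP host are assumptions, and a production script would also handle authentication and TLS (for example with `smtplib.SMTP.starttls`).

```python
import smtplib
from email.message import EmailMessage
from typing import List


def build_alarm_email(host: str, problem: str, severity: str,
                      sender: str, recipients: List[str]) -> EmailMessage:
    """Compose a plain-text alarm email."""
    msg = EmailMessage()
    msg["Subject"] = f"[{severity.upper()}] {host}: {problem}"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content(f"Host: {host}\nProblem: {problem}\nSeverity: {severity}\n")
    return msg


def send_alarm(msg: EmailMessage, smtp_host: str = "localhost", port: int = 25) -> None:
    """Deliver the message through an SMTP relay (assumed reachable, no auth)."""
    with smtplib.SMTP(smtp_host, port, timeout=10) as smtp:
        smtp.send_message(msg)


msg = build_alarm_email("web01", "Nginx is down", "high", "zabbix@example.com", ["ops@example.com"])
print(msg["Subject"])
```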

8. Alarm handling

How do we handle faults after an alarm? First, some can be handled automatically through the alarm escalation mechanism: for example, if the Nginx service goes down, an escalation step can restart Nginx automatically. For serious failures, we usually assign different O&M staff according to the severity of the failure and the affected business. Of course, different business forms, architectures, and services may call for different approaches; there is no one-size-fits-all model.
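The escalation idea can be sketched as a list of steps that fire once a problem has lasted long enough: first an automatic restart, then paging the on-call engineer, then notifying a team lead. Zabbix models this with action operations and escalation steps; the Python below only illustrates the logic, and the step timings, messages, and the `systemctl restart nginx` remediation (which assumes systemd and sufficient privileges) are hypothetical.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    after_minutes: int           # how long the problem must have lasted before this step runs
    action: Callable[[], str]    # performs the step and returns a log line


def restart_nginx() -> str:
    """Hypothetical self-healing action; assumes systemd and sufficient privileges."""
    result = subprocess.run(["systemctl", "restart", "nginx"], capture_output=True, text=True)
    return f"restart nginx: exit code {result.returncode}"


class Escalation:
    """Runs each step once, in order, as the problem's duration passes its threshold."""

    def __init__(self, steps: List[Step]) -> None:
        self.steps = sorted(steps, key=lambda s: s.after_minutes)
        self.next_step = 0
        self.log: List[str] = []

    def tick(self, minutes_open: int) -> None:
        """Call periodically while the problem is unresolved; stop calling once it recovers."""
        while (self.next_step < len(self.steps)
               and minutes_open >= self.steps[self.next_step].after_minutes):
            self.log.append(self.steps[self.next_step].action())
            self.next_step += 1


plan = Escalation([
    Step(0, restart_nginx),
    Step(5, lambda: "page the on-call engineer"),
    Step(30, lambda: "notify the team lead"),
])
```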

9. Monitoring in interviews

Operations interviews often include questions about monitoring. How should you answer? Here is a simple outline based on this article.

Hardware monitoring. Monitor routers and switches via SNMP (you can ask the vendors how to set this up); server temperature and other hardware metrics can be monitored via IPMI. If you have no hardware of your own and everything runs in the cloud, skip this step.

System monitoring. For example, CPU load, context switches, memory usage, disk reads and writes, disk usage, and disk inode usage. These all need properly tuned triggers, because the default thresholds are set too low and cause frequent alarms.

Service monitoring. For example, if the company uses a LAMP-style architecture: Nginx has its own status module, PHP also exposes a status page, MySQL can be monitored with Percona's official tools, and Redis statistics can be obtained from its own INFO command and filtered. The approach is similar everywhere: either the service exposes its own status, or you write scripts that collect what you want to monitor, together with alarms and graphs.
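As an example of collecting service status, the sketch below parses the text returned by Redis's INFO command (for instance from `redis-cli info`) into a dictionary and derives a cache hit ratio from the `keyspace_hits` and `keyspace_misses` fields. The function names are hypothetical and the sample text is abbreviated; each value could then be exposed to Zabbix through a UserParameter.

```python
from typing import Dict, Optional, Union

Value = Union[int, float, str]


def parse_redis_info(text: str) -> Dict[str, Value]:
    """Parse "key:value" lines from Redis INFO, skipping "# Section" headers and blanks."""
    info: Dict[str, Value] = {}
    for line in text.splitlines():
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, raw = line.split(":", 1)
        for cast in (int, float):
            try:
                info[key] = cast(raw)
                break
            except ValueError:
                continue
        else:
            info[key] = raw  # non-numeric values such as versions stay strings
    return info


def hit_ratio(info: Dict[str, Value]) -> Optional[float]:
    """Share of key lookups that hit; None when there have been no lookups yet."""
    hits, misses = info.get("keyspace_hits", 0), info.get("keyspace_misses", 0)
    total = hits + misses
    return hits / total if total else None


SAMPLE = ("# Server\r\nredis_version:7.2.4\r\n\r\n# Clients\r\nconnected_clients:12\r\n"
          "# Stats\r\nkeyspace_hits:90\r\nkeyspace_misses:10\r\n")
info = parse_redis_info(SAMPLE)
print(info["connected_clients"], hit_ratio(info))
```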

Network monitoring. If you run cloud hosts in a single computer room, you can skip network monitoring. If you span multiple computer rooms, Smokeping is recommended for network monitoring. Or leave it to your network engineers; every field has its specialists.

Security monitoring. On cloud hosts, consider the provider's built-in security protection; you can also use iptables. For physical hardware, a hardware firewall is recommended. In the cloud, you can buy anti-DDoS protection so that an attack does not take you down for a whole day. At the system level, the basics (permissions, passwords, backup, and recovery) must be done well. For the web, Nginx + Lua can implement a web-level firewall, or you can use the integrated OpenResty.

Web monitoring. There is a lot to say about web monitoring. For example, you can use built-in web monitoring to track page latency, JS response time, download time, and so on. Here I recommend professional commercial services such as Monitoring Bao or Tingyun, since they have monitoring points all over the country. (Multi-computer-room setups deserve a separate discussion.)

Log monitoring. For the web, you can monitor Nginx 50x and 40x errors in its logs, as well as PHP ERROR logs. These requirements boil down to collection, storage, query, and display, which the open-source ELK Stack handles: Logstash (collection), Elasticsearch (storage and search), and Kibana (display).

Business monitoring. Everything above ultimately serves to keep the business running; only then does our monitoring make sense. Business-level monitoring therefore requires meetings with developers and directors to agree on which key business indicators to monitor; these can then be collected with simple scripts, with triggers set at the end.

Traffic analysis. We usually analyze logs with tools like awk and sed, which is not very convenient for counting IPs, PV (page views), and UV (unique visitors). Alternatively, you can embed tracking code from Baidu Statistics, Google Analytics, or commercial vendors; to keep the data private, you can also use Piwik for traffic analysis.
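The awk-style counting can be sketched in Python: the code below counts page views (PV), approximates unique visitors (UV) by distinct client IPs, and lists the busiest IPs. It assumes Nginx's default `combined` log format; counting UV by IP is a simplification (users behind NAT share an IP), and real PV counts usually exclude static assets.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Client IP at the start of a "combined"-format line; the rest only validates the shape.
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} ')


def traffic_stats(lines: Iterable[str], top: int = 3) -> Dict[str, object]:
    """PV = parsed requests, UV ~ distinct client IPs, plus the most frequent IPs."""
    per_ip: Counter = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            per_ip[match.group(1)] += 1
    return {"pv": sum(per_ip.values()), "uv": len(per_ip), "top_ips": per_ip.most_common(top)}


LOG = [
    '192.0.2.1 - - [01/Jan/2024:19:17:00 +0800] "GET / HTTP/1.1" 200 612 "-" "Mozilla/5.0"',
    '192.0.2.1 - - [01/Jan/2024:19:17:05 +0800] "GET /item/1 HTTP/1.1" 200 900 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [01/Jan/2024:19:17:09 +0800] "GET / HTTP/1.1" 200 612 "-" "Mozilla/5.0"',
    'not a log line',
]
print(traffic_stats(LOG))
```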

Visualization. Use Zabbix screens and some third-party libraries to improve the interface. We also need to know when order volume suddenly rises or falls, in other words, when a big wave of traffic suddenly arrives: where did it come from? Was it a promotion or an attack? The monitoring platform can also be used to map out the business relationships between systems.

Automated monitoring. With this much work, we certainly cannot add monitoring item keys one by one. Zabbix's active and passive modes help here, but it is best done through the Zabbix API.
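A minimal sketch of driving Zabbix through its JSON-RPC API with Python's standard library. It assumes Zabbix 6.4 or later, which accepts an API token in an `Authorization: Bearer` header (older versions pass a session ID in an `auth` field of the request body instead); the URL, token, and host name are placeholders, and although `host.get` is a real API method, check its parameters against the API reference for your version.

```python
import json
import urllib.request
from typing import Any, Dict


def zabbix_request(api_url: str, token: str, method: str,
                   params: Dict[str, Any], req_id: int = 1) -> urllib.request.Request:
    """Build a JSON-RPC 2.0 request for the Zabbix API (Bearer-token auth, Zabbix 6.4+)."""
    body = json.dumps({"jsonrpc": "2.0", "method": method, "params": params, "id": req_id})
    return urllib.request.Request(
        api_url,
        data=body.encode(),
        headers={"Content-Type": "application/json-rpc", "Authorization": f"Bearer {token}"},
        method="POST",
    )


def call(req: urllib.request.Request) -> Any:
    """Send the request and return its "result", raising if the API reports an error."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        reply = json.load(resp)
    if "error" in reply:
        raise RuntimeError(reply["error"])
    return reply["result"]


req = zabbix_request(
    "http://zabbix.example.com/api_jsonrpc.php",  # placeholder URL
    "YOUR_API_TOKEN",                             # placeholder token
    "host.get",
    {"output": ["hostid", "host"], "filter": {"host": ["web01"]}},
)
# call(req) needs a reachable Zabbix server; host.get returns a list of matching host objects.
```

Bulk automation, such as registering hosts from an inventory, would send methods like `host.create` through the same two functions.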
Summary

To build a truly complete monitoring system, current open-source software alone is not enough. Companies with the resources have begun developing their own monitoring systems, such as Xiaomi's open-source Open-Falcon. There are also good open-source monitoring frameworks such as Sensu, which, combined with InfluxDB and Grafana, can be used to build a monitoring platform tailored to your own enterprise.

Of course, what I have covered is still basic; my experience is limited, and these ideas only go so far. The above are some methods and experiences I wanted to share about monitoring. (Veterans, please go easy on me.)


