As cloud computing evolves, the IT infrastructure that carries business workloads keeps growing, the dependencies between applications become more complex, and ever larger volumes of log data are generated. How an enterprise collects, stores, analyzes, and processes that log data has become an important indicator of how far its systems have been digitalized. Traditional IT operations and maintenance solutions struggle under these conditions. For DevOps teams, resolving a single problem can take hours of searching, comparing, and analyzing: engineers must review logs, monitoring data, and other related information to find the root cause. For SecOps teams, in-depth analysis means digging root causes and anomalies out of hundreds of terabytes of data quickly. The process is time-consuming and cumbersome, and can require a large investment of people and resources.
Solving these problems calls for a new generation of AIOps solutions. Such a solution uses fused data analysis to deliver automation and full-stack observability across the data link, and provides easier-to-use reports and diagnostic rules so that what you see is what you get. With AI support, anomalies can be detected automatically and more efficiently, and root causes can be located quickly. AIOps has fundamentally changed operations and maintenance work.
SLS automates full-stack data collection
SLS provides out-of-the-box reports and diagnostic rules
SLS offers an open and compatible data ecosystem
Alibaba Cloud Log Service (SLS) is committed to building efficient, observable operations and maintenance solutions. Drawing on years of operations experience and the support of large language models, SLS continues to strengthen its position in this field. SLS recently released a foundation model for intelligent operations and maintenance that covers observability data such as logs, traces, and metrics, and supports functions such as anomaly detection, log text tokenization and labeling, and high-latency analysis of trace requests. The model provides plug-and-play anomaly detection, automatic labeling, classification, and root cause analysis. In a production environment, it can locate the root cause among thousands of requests within seconds, with an accuracy of over 95%.
In addition, SLS supports human-assisted fine-tuning. The Log Service platform natively supports annotation feedback on Log, Metric, and Trace data, so customers can label and correct results as they work and accumulate datasets that fit their specific scenarios. With these annotation capabilities, customers can build up high-quality operations and maintenance labels from scratch, which lays the groundwork for training root cause diagnosis models later. In the future, customers will be able to fine-tune domain-specific models on their own annotated data, deploy them quickly, and run them as private model services. The feature combines automatic annotation with human-assisted fine-tuning and allows manual annotations to be corrected; the model is then fine-tuned automatically on that feedback to improve accuracy in the customer's scenario.
SLS also acts as an intelligent assistant by helping users generate query statements. It has released the Alibaba Cloud CloudLens Copilot large model to support the maintenance and operation of cloud facilities. Copilot uses NL2Query technology based on large language models to understand the user's query intent and improve the accuracy of query results. Users do not need to know complex SQL or query syntax, because natural-language questions are converted into SQL queries and visual charts. It also builds scenario-specific knowledge graphs and keeps learning, with ongoing model tuning and knowledge base updates that steadily improve the accuracy and quality of its answers.
For game service systems with complex calls and dependencies, we propose the following solution. We use the Trace data from the services to generate topology maps automatically, then analyze and diagnose high latency, high error rates, system hotspots, and bottlenecks, which shortens problem handling time and reduces system latency.
With the automatically generated topology map, we can quickly locate the root causes of anomalies and performance bottlenecks in massive Trace data without manual intervention. This approach makes anomaly localization in large-scale distributed systems more efficient and supports root cause localization at the level of thousands of requests per second. In a production environment, the accuracy of this solution can reach 95%.
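The article does not show how SLS derives the topology, so the following is only a minimal sketch of the general idea, not the SLS implementation. It assumes each trace span carries a span ID, a parent span ID, a service name, a duration, and an error flag; the `Span` class, the field names, and the sample service names are all hypothetical. Spans are joined to their parents to form service-to-service edges, and each edge is summarized by call count, error rate, and average latency.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """Hypothetical, simplified trace span (not the SLS data model)."""
    span_id: str
    parent_id: Optional[str]
    service: str
    duration_ms: float
    error: bool = False


def build_topology(spans):
    """Aggregate parent -> child service calls into edges with basic stats."""
    by_id = {s.span_id: s for s in spans}
    stats = defaultdict(lambda: {"calls": 0, "errors": 0, "total_ms": 0.0})
    for span in spans:
        parent = by_id.get(span.parent_id)
        if parent is None or parent.service == span.service:
            continue  # root span or in-process call: no cross-service edge
        edge = stats[(parent.service, span.service)]
        edge["calls"] += 1
        edge["errors"] += int(span.error)
        edge["total_ms"] += span.duration_ms
    return {
        key: {
            "calls": e["calls"],
            "error_rate": e["errors"] / e["calls"],
            "avg_ms": e["total_ms"] / e["calls"],
        }
        for key, e in stats.items()
    }


def slowest_edge(topology):
    """Return the (caller, callee) pair with the highest average latency."""
    return max(topology, key=lambda k: topology[k]["avg_ms"])


# Made-up sample data for illustration only.
sample_spans = [
    Span("1", None, "gateway", 120.0),
    Span("2", "1", "matchmaking", 100.0),
    Span("3", "2", "inventory-db", 90.0, error=True),
    Span("4", None, "gateway", 30.0),
    Span("5", "4", "matchmaking", 20.0),
]
sample_topology = build_topology(sample_spans)
```

In this sample, `slowest_edge(sample_topology)` points at the call from `matchmaking` to `inventory-db`, which is also the only edge with errors. A real system would work on far larger volumes and use more robust statistics than a plain average, such as latency percentiles.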
Traditional AIOps techniques, such as anomaly detection and root cause localization, have two main problems:
To address these problems, SLS has launched general-purpose model capabilities for intelligent operations and maintenance. We have developed separate foundation models for logs, traces, and metrics, and provide out-of-the-box anomaly detection, root cause analysis, and automatic labeling. The model can locate root causes among thousands of requests within seconds, with over 95% accuracy in production environments. We choose different pre-training tasks for different data types.
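The article does not describe the internals of the SLS models, so the following is not the SLS algorithm. It is a minimal sketch of what metric anomaly detection means in general, using a classical trailing-window z-score as a stand-in: a point is flagged when it deviates from the mean of the preceding window by more than a chosen number of standard deviations. The function name, parameters, and sample latency values are hypothetical.

```python
from statistics import mean, stdev


def detect_anomalies(values, window=5, threshold=3.0, min_sigma=1e-9):
    """Return indices of points far from the trailing-window mean.

    A point at index i is flagged when its distance from the mean of the
    previous `window` points exceeds `threshold` standard deviations.
    `min_sigma` avoids division by zero on a perfectly flat window.
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = max(stdev(history), min_sigma)
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Made-up request latencies in milliseconds, with one spike at index 6.
latency_ms = [100, 102, 98, 101, 99, 100, 250, 101, 99, 100]
flagged = detect_anomalies(latency_ms)
```

Here `flagged` contains only index 6, the 250 ms spike. The first `window` points are never flagged because they have no full history, and a spike inflates the statistics of the windows that follow it. These are the kinds of limitations that pre-trained models aim to reduce.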
Domain-specific foundation models are available immediately, with no cumbersome deployment process. A single click is enough to start using them, which greatly lowers the barrier to using the basic functions of Log Service. Customers do not need to fine-tune the model for specific scenarios; the general foundation model provided by Log Service gives good results as-is.
Alibaba Cloud CloudLens Copilot uses large models to support the maintenance and operation of cloud facilities. It addresses three problems users face: unfamiliarity with SLS syntax, a lack of business domain knowledge, and a shortage of high-quality question-and-answer corpora.
As AI capabilities improve, the intelligent analysis capabilities of SLS will improve across the board. SLS aims to use data and algorithms to support AIOps innovation, with the following benefits:
Customers can use metric anomaly detection, intelligent tokenization of log text, and high-latency diagnosis of trace links directly in the Log Service console, so the models are available wherever they work.
Domain-specific foundation models are prepared in advance and can be used directly; there is no tedious deployment process, and a single click starts them.
The domain-specific large language model released this time significantly lowers the barrier to using the basic capabilities of Log Service: it helps generate query statements and serves as an intelligent assistant.
Customers do not need to fine-tune the model for specific scenarios and can obtain good results simply by using the general foundation model provided by Log Service.
The Log Service platform natively supports annotation and feedback for Log, Metric, and Trace data, so customers can annotate quickly during use and accumulate datasets that fit their specific scenarios.
Backed by Alibaba Cloud's computing power, the general foundation model provided by Log Service can be scaled out and migrated between services quickly.
In the future, customers will be able to fine-tune domain-specific models and quickly deploy them in parallel to create private model services.
Original link: https://developer.aliyun.com/article/1396326?utm_content=g_1000386345
Please do not copy or reprint this article. This article is the original content of Alibaba Cloud and may not be reproduced without permission