Technological waves such as cloud computing, big data, artificial intelligence, and blockchain have continuously energized fintech innovation. At the same time, new economic forms represented by the digital economy have disrupted traditional financial business models and the underlying technologies they rely on, bringing profound change and major challenges.
Against a complex international backdrop, the country has set higher requirements for secure, reliable, independent and controllable technology. The urgent need now is to strengthen the financial industry's in-house research and development capability for information systems and to reduce dependence on commercial products.
Because the financial industry touches people's livelihoods, a business failure can have a serious impact on public opinion. Ensuring system stability is therefore especially important. However, financial enterprises undergoing digital transformation face unpredictability, uncontrollability, and high complexity in their businesses, which makes ensuring system stability considerably harder.
So how should financial companies plan their stability governance? How can they use the characteristics and advantages of cloud native architecture to keep business systems stable? Recently, 51CTO interviewed Zhu Jianfeng, a cloud native solutions expert at NetEase Shufan, who drew on long-term practical experience with large state-owned banks to offer the financial industry suggestions for improving system stability.
Cloud native upgrades in the financial industry face many challenges
As competition in the industry has intensified in recent years, major financial institutions have pursued personalized, scenario-based financial services and want to build open finance. In addition, regulators require independent and controllable IT. As a result, financial companies have been evolving from large monolithic and bus-based architectures toward microservices and cloud native architecture.
At the recent WOT Global Technology Innovation Conference, many heads of technology departments from banking, securities, insurance and other fields said that cloud native architecture cannot be implemented overnight. The relevant capabilities and norms have to be improved continuously as the architecture evolves until they become part of the enterprise's organizational culture and technical system. Enterprises need to take stock comprehensively and evolve gradually, moving from the periphery to the core and from innovative businesses to traditional ones. For financial enterprises, this means starting with agile, Internet-oriented businesses, splitting them into microservices and transforming them to cloud native, and then working inward to the steady-state business of the core systems.
As more enterprises begin migrating to cloud native architecture, stability assurance for cloud native is receiving more and more attention, and enterprises encounter many challenges in achieving it.
Zhu Jianfeng said that financial companies face two main kinds of challenge in ensuring cloud native stability. The first concerns system resilience: changes in external access traffic can overload the system, and high-availability designs that do not meet standards leave the system insufficiently resilient. The second concerns system observability: when observability is insufficient, operations staff cannot detect the varied risks and failures of cloud services as soon as they occur. Examples include changes to the production environment (human error, failed changes), coding defects (failures in code quality, program logic, or application architecture), and failures of the platform hardware or network that the business depends on. Problems cannot be detected and located quickly, which ultimately causes business losses.
Therefore, the cloud native technology base is the key direction in which financial enterprises' business systems are evolving. Financial enterprises need to apply cloud native characteristics to business scenarios to strengthen the observability, application resilience, high availability, fault self-healing and other capabilities of traditional cloud services, thereby eliminating uncertainty and giving business systems an extra layer of stability protection.
The stability value of cloud native is underestimated, and state-owned banks have already tried it
As we all know, traditional operating environments lean on manual operations, depend heavily on personal experience, and are generally hard to standardize. The essential difference of cloud native architecture lies in containers and the ability to orchestrate and schedule them. Containerization provides a standardized environment for running applications. Monitoring alerts, abnormal events and other data in the cloud native environment are also stored in standardized formats and, combined with K8s, enable fault self-healing and automated operations. A risk prediction platform built on cloud native technology therefore naturally offers more intelligent, automated, and standardized stability assurance, and it can also serve as a more effective tool platform for business applications running in traditional virtual machine environments.
However, most of the industry's expectations for cloud native still focus on how to migrate business to a cloud native architecture, and that migration is costly and slow. Leading customers in the financial industry with a strong appetite for innovation are already taking action. In particular, some large state-owned banks with strong technical capabilities, backed by cloud native stability assurance, are combining the migration with their distributed architecture transformation and their plans to move core business to small machines. Many companies with thinner technical reserves, by contrast, remain in wait-and-see mode.
Taking all this together, Zhu Jianfeng believes that the additional observability and fault self-healing capabilities that cloud native architecture provides are underestimated. Technical tool platforms built on a cloud native PaaS base inherit cloud native capabilities and advantages. Compared with traditional virtual machines and physical machines, they can go further toward intelligent operations, something that most people have not yet fully appreciated.
Therefore, before moving their business to cloud native, enterprises may wish to first migrate their technical tool platforms to a cloud native architecture and use stability assurance technology to empower, in turn, the businesses still running on traditional architecture (including steady-state business). In fact, some financial customers working with NetEase Shufan have cautiously adopted this strategy for certain businesses.
The system stability assurance trilogy covers both pre-incident and in-incident impact
Murphy's Law states that "anything that can go wrong has a high probability of going wrong." It means that any event with a probability greater than zero cannot be assumed never to happen. The essence of the law is that even when the probability of something happening is low, it should not be taken lightly, and precautions should be taken to prevent adverse effects.
So how should the stability of a business system be built and improved? Following the event life cycle, Zhu Jianfeng divides the path to stronger stability and risk assurance into three parts: before the incident, provide risk prediction to reduce the probability of failure; during the incident, stop losses quickly through fault awareness and automatic root cause analysis to reduce the impact of failure; after the incident, improve the tracking of fault remediation so that the stability goals are met.
In the pre-incident stage, risk prediction and middleware inspection, combined with full-link stress testing, chaos engineering, and traffic capture and replay, uncover possible risks in the test environment ahead of time and produce an analysis report. Meanwhile, the production environment is inspected regularly so that potential risks there are found promptly.
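
As a rough illustration of the regular inspection described for the pre-incident stage, the following Python sketch checks metric snapshots against a small set of rules and reports the risks it finds. Everything in it is assumed for illustration: the rule names, metric names, thresholds and the `inspect` function are hypothetical and are not taken from NetEase Shufan's platform.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Rule:
    """One piece of expert experience expressed as a checkable condition."""
    name: str
    metric: str
    is_risky: Callable[[float], bool]
    advice: str


@dataclass(frozen=True)
class Finding:
    """A risk found on one inspected target."""
    target: str
    rule: str
    value: float
    advice: str


# Hypothetical rules; the thresholds are illustrative, not from the article.
RULES: List[Rule] = [
    Rule("disk-nearly-full", "disk_used_ratio", lambda v: v >= 0.85,
         "expand the volume or clean up data before it fills"),
    Rule("cache-memory-pressure", "cache_mem_used_ratio", lambda v: v >= 0.90,
         "scale the cache or review eviction settings"),
    Rule("slow-storage-writes", "write_latency_ms", lambda v: v > 50.0,
         "ask the storage team to check the underlying disks"),
]


def inspect(snapshots: Dict[str, Dict[str, float]],
            rules: List[Rule] = RULES) -> List[Finding]:
    """Apply every rule to every target, skipping metrics a target lacks."""
    findings: List[Finding] = []
    for target, metrics in sorted(snapshots.items()):
        for rule in rules:
            value = metrics.get(rule.metric)
            if value is not None and rule.is_risky(value):
                findings.append(Finding(target, rule.name, value, rule.advice))
    return findings


if __name__ == "__main__":
    # Made-up sample data standing in for metrics collected by monitoring.
    sample = {
        "cache-node-1": {"cache_mem_used_ratio": 0.93, "write_latency_ms": 12.0},
        "db-node-1": {"disk_used_ratio": 0.60, "write_latency_ms": 80.0},
    }
    for f in inspect(sample):
        print(f"[RISK] {f.target}: {f.rule} (value={f.value}) -> {f.advice}")
```

Keeping each rule as data rather than as hard-coded branches is one way to let the rule set grow as reviews add new experience; a real inspection would run on a schedule and read metrics from the monitoring system instead of a literal dictionary.
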
During the incident, three-dimensional monitoring collects deep system metrics, and the standardized data lets root cause analysis detect faults promptly, locate the root cause, and produce an analysis report, so that problems are discovered within one minute and located within five.
The post-incident stage centers on the review. Its main purpose is to distill the experience gained before and during the incident into an expert rule base.
However, if an enterprise's IT team has limited capabilities and limited funding, should it focus on the stage before, during or after the incident? Zhu Jianfeng said that every enterprise's situation is different. If budget and manpower are insufficient, the enterprise should focus on reducing occurrences before the incident and reducing impact during it. Beforehand, continuous inspections, risk assessments, fault drills and similar practices shift abnormal risks left, and algorithms are introduced to predict risks in advance and reduce potential risk. During the incident, systematic monitoring makes it possible to locate the root cause quickly once a fault occurs and to apply rate limiting, fallback, or self-healing strategies based on the fault characteristic template, minimizing the impact.
Accumulated expert experience lowers the barrier to stability assurance
It is understood that, for enterprise cloud native stability assurance, NetEase Shufan provides capabilities across the full event life cycle, with modules including fault drills, service governance, risk prediction, three-dimensional monitoring, root cause analysis, fault self-healing, and an expert rule base. So what is NetEase Shufan's core competitiveness?
Zhu Jianfeng told 51CTO that the core value of NetEase Shufan's cloud native stability assurance platform lies in its accumulated expert experience, which is also part of the company's digital assets. On the one hand, NetEase runs large-scale Internet businesses within the group and has accumulated a great deal of expert experience, which covers 70% to 80% of Internet scenarios and can be reused to support agile businesses in industries such as finance. On the other hand, NetEase Shufan is working with a number of leading financial companies, including major state-owned banks, to jointly build an expert knowledge base for the financial industry and keep enriching expert experience in financial scenarios, providing stability assurance for agile financial businesses.
"Based on this expert experience base, NetEase Shufan turns expert experience and the fault library into code, so that machines, through algorithms, reduce the dependence on 'human' experience in system assurance and lower the barrier to stability assurance."
Expert experience becomes useful in stability assurance scenarios through decision-making. On the one hand, expert experience is run directly through a rule engine. On the other hand, technologies such as AIGC and AIOps assist enterprises' decision-making, continuously improving how well-founded and effective the diagnostic recommendations are. This is NetEase Shufan's next step in stability assurance and is currently under internal verification.
Providing transformation tools to safeguard financial system stability
In serving financial enterprises' technical architecture transformation, NetEase Shufan's positioning is fairly clear. First, it provides financial enterprises with a powerful tool for transformation: technical base products that are independent and controllable, stable and reliable, optimized through large-scale practice, technologically leading, and able to keep evolving. Second, it adheres to the principles of open source, openness and no lock-in, providing lightweight, decoupled, modular tool products that complement the enterprise's existing IT planning, so that digital transformation can proceed in small, fast steps.
One financial enterprise frequently experienced cache unavailability, which indirectly made its business unavailable. The company had a low degree of automation and weak observability, and it could not find the root cause. After adopting NetEase Shufan's cloud native stability assurance platform, the company discovered jitter in the underlying storage through stability inspection and pinpointed an SSD write-through disk failure, so the problem was detected in time and the storage team was notified to investigate and handle it.
In addition, jitter in the underlying storage also affects the middleware running on the corresponding virtual machines and physical machines. Based on its cloud native practice, NetEase Shufan has designed matching multi-site active-active and fault self-healing capabilities for each category of middleware. If abnormal jitter occurs, these capabilities migrate traffic to a stable cluster in time to avoid the risk.
Zhu Jianfeng emphasized that the core requirement of large enterprises is to stop losses quickly when problems occur. When the underlying storage jitters, troubleshooting the problem and restoring the storage takes a very long time. Quickly discovering the problem through stability inspection and resolving the incident automatically is the way to stop losses quickly during the incident.
Conclusion
The financial industry has always been an important area in which NetEase Shufan continues to invest and to advance real-world implementation.
Drawing on NetEase's Internet technology and its experience serving the financial industry, NetEase Shufan offers cloud native PaaS middleware such as microservice governance, API gateway, container platform, distributed cache, messaging, and search, together with the full-stack capabilities of related cloud native distributed products. It has already helped two of the four major state-owned banks and more than a dozen of China's top 100 financial enterprises transform and upgrade to a cloud native distributed architecture, build full life cycle management of API assets, create an enterprise-level technology base suited to the characteristics of financial business, cope with the challenges of complex business scenarios, and accelerate financial business innovation. In the field of cloud native stability assurance, NetEase Shufan will continue to work with financial companies to keep enriching the accumulated experience of DBA and SRE experts, improving how well-founded and effective the diagnostic recommendations are and, combined with intelligent decision-making, helping financial companies meet their needs for business stability and growth.