Introduction
libco is a C/C++ coroutine library used at large scale in the WeChat backend, where it has been running stably on tens of thousands of machines since 2013. libco was first open-sourced in 2013 as one of Tencent's six major open source projects. We have recently made a major update, which is available at https://github.com/tencent/libco. libco supports an agile, synchronous-style programming model for backend development while giving the system high concurrency.
Features supported by libco
Multi-process and multi-threaded services can be converted into coroutine services without intruding on business logic, improving concurrency a hundredfold;
Support for the CGI framework, making it easy to build web services (New);
Support for gethostbyname, mysqlclient, ssl and other commonly used third-party libraries (New);
Optional shared-stack mode, which lets a single machine easily handle tens of millions of connections (New);
A complete and concise coroutine programming interface:
– pthread-like interface design: coroutines can be created and resumed through simple, clear interfaces such as co_create and co_resume;
– __thread-like coroutine-private variables, and the coroutine semaphore co_signal (New) for communication between coroutines;
– A non-language-level lambda implementation that, combined with coroutines, lets background asynchronous tasks be written and executed in place (New);
– A small, lightweight network framework based on epoll/kqueue, and a high-performance timer based on a timing wheel.
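To make the create/resume/yield idea concrete, here is a minimal sketch built on POSIX ucontext. It is not libco's implementation and does not use libco's API: the names MiniCo, mini_create, mini_resume and mini_yield are hypothetical, and the sketch assumes a single thread and a Linux-like system that provides <ucontext.h>.

```cpp
#include <ucontext.h>

#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical mini-coroutine for illustration only (not libco's types).
struct MiniCo {
    ucontext_t ctx{};                    // saved execution state of the coroutine
    ucontext_t caller{};                 // state of whoever resumed it last
    std::vector<char> stack;             // private, heap-allocated running stack
    std::function<void(MiniCo&)> body;   // the coroutine's entry function
    bool done = false;                   // set once body has returned
};

// Hand-off slot used when a coroutine runs for the first time
// (assumption: single-threaded use).
static MiniCo* g_starting = nullptr;

static void trampoline() {
    MiniCo* co = g_starting;
    co->body(*co);
    co->done = true;
    // Returning from here continues at uc_link, i.e. co->caller.
}

std::unique_ptr<MiniCo> mini_create(std::function<void(MiniCo&)> body,
                                    std::size_t stack_size = 64 * 1024) {
    auto co = std::make_unique<MiniCo>();
    co->body = std::move(body);
    co->stack.resize(stack_size);
    getcontext(&co->ctx);
    co->ctx.uc_stack.ss_sp = co->stack.data();
    co->ctx.uc_stack.ss_size = co->stack.size();
    co->ctx.uc_link = &co->caller;
    makecontext(&co->ctx, trampoline, 0);
    return co;
}

// Run the coroutine until it yields or finishes.
void mini_resume(MiniCo& co) {
    assert(!co.done);
    g_starting = &co;
    swapcontext(&co.caller, &co.ctx);
}

// Called from inside the coroutine: give control back to the resumer.
void mini_yield(MiniCo& co) {
    swapcontext(&co.ctx, &co.caller);
}
```

A caller would create a coroutine with mini_create, call mini_resume to run it up to its next mini_yield, do other work, and call mini_resume again to continue where it left off.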
Background of libco
In the early days of WeChat, business requirements were complex and changeable and the backend had to iterate quickly, so most modules adopted a half-synchronous, half-asynchronous model: the access layer was asynchronous, while the business logic layer used a synchronous multi-process or multi-threaded model. The business logic could handle only tens to hundreds of concurrent requests. As WeChat's business grew, the system became larger and larger, and each module became vulnerable to jitter in backend services and the network.
Choosing an approach to asynchronous transformation
To improve the concurrency of the WeChat backend, the usual approach would be to convert every service in production to an asynchronous model. That requires a huge amount of work: everything from the framework to the business logic code has to be rewritten, which is time-consuming, labor-intensive and risky. So we started to consider coroutines.
But using coroutines raised the following challenges:
The industry had no experience of applying coroutines at large scale in C/C++ environments;
How to control coroutine scheduling;
How to handle synchronous style API calls, such as Socket, mysqlclient, etc.;
How to deal with the use of existing global variables and thread private variables;
In the end, we solved all of these problems with libco and achieved a non-invasive asynchronous transformation of the business logic. We used libco to convert hundreds of WeChat backend modules to coroutines and make them asynchronous, and the business logic code remained essentially unchanged throughout. Today, most services in the WeChat backend use a multi-process or multi-threaded coroutine model. Their concurrency has improved qualitatively compared to before, and libco has become the cornerstone of the WeChat backend framework.
libco framework
The libco framework is divided into three layers: the interface layer, the system function hook layer, and the event-driven layer.
Synchronous style API processing
For synchronous-style APIs, mainly synchronous network calls, libco's primary task is to stop these waits from tying up resources and so improve the concurrency of the system. A typical backend network service goes through steps such as connect, write, and read to complete one network interaction. When these APIs are called synchronously, the entire thread is suspended while it waits for the network.
Although the synchronous programming style performs poorly under concurrency, it has clear advantages: the code logic is clear, it is easy to write, and it supports rapid business iteration and agile development. To keep these advantages without modifying the business logic code already in production, libco takes over (hooks) the network call interfaces and registers the yielding and resumption of a coroutine as events and callbacks in asynchronous network I/O. When business processing reaches a synchronous network request, the libco layer registers that request as an asynchronous event, the current coroutine gives up the CPU, and the CPU is handed to other coroutines. libco automatically resumes the coroutine when the network event occurs or the request times out.
We have taken over most synchronous-style APIs through hooks, and libco schedules each coroutine to resume execution at the appropriate time.
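The control flow described above (attempt the call, and if it would block, wait for an event or a timeout and then retry) can be sketched as follows. This is a simplified illustration, not libco's hook code: read_like_hook is a hypothetical name, and where a coroutine library would register the event with its event loop and yield to other coroutines, this sketch simply blocks in poll().

```cpp
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not a libco function): a read() that looks synchronous
// to the caller but runs on a non-blocking descriptor underneath.
// Returns the number of bytes read, 0 on EOF, or -1 with errno set
// (ETIMEDOUT if no data arrived within timeout_ms).
ssize_t read_like_hook(int fd, void* buf, std::size_t len, int timeout_ms) {
    assert(timeout_ms >= 0);

    // Switch the descriptor to non-blocking mode so read() never stalls the thread.
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) return -1;
    if (!(flags & O_NONBLOCK) && fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) {
        return -1;
    }

    for (;;) {
        ssize_t n = read(fd, buf, len);
        if (n >= 0) return n;                    // data or EOF
        if (errno == EINTR) continue;            // interrupted: just retry
        if (errno != EAGAIN && errno != EWOULDBLOCK) return -1;

        // The call would block. A coroutine library would register
        // (fd, POLLIN, timeout) with its event loop and yield here;
        // this sketch waits in poll() instead.
        struct pollfd pfd = {fd, POLLIN, 0};
        int ready = poll(&pfd, 1, timeout_ms);
        if (ready == 0) {
            errno = ETIMEDOUT;
            return -1;
        }
        if (ready < 0 && errno != EINTR) return -1;
        // Otherwise the descriptor is readable (or we were interrupted): retry.
    }
}
```

Note that the timeout in this sketch restarts after each interrupted wait; a production implementation would track the remaining time.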
Support for tens of millions of coroutines
By default, libco gives each coroutine its own running stack: when a coroutine is created, a fixed-size block is allocated from the heap to serve as that stack. If we use one coroutine per incoming connection at the front end, then for a service with a massive number of connections the concurrency ceiling is easily bounded by memory. For this reason, libco also provides a shared-stack mode, in which several coroutines can be configured to share the same running stack. When switching between coroutines on the same shared stack, the contents of the current running stack must be copied into the outgoing coroutine's private memory. To reduce the number of these copies, the shared stack is copied only when the switch is between different coroutines; as long as the occupant of the shared stack has not changed, no copy is needed.
libco's shared-stack mode lets a single machine easily handle tens of millions of connections simply by creating enough coroutines. Using shared-stack mode, we created 10 million coroutines on a machine with 2 × E5-2670 v3 @ 2.30GHz and 128G of memory, with every 100,000 coroutines using 128k of memory. Once stable, the whole echo service consumed approximately 66G of memory in total.
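The copy-on-occupant-change rule can be illustrated with a small simulation. This is a sketch under simplifying assumptions, not libco's code: SharedStack, Co and occupy are hypothetical names, the "stack" is an ordinary byte buffer, and each coroutine's used region is assumed to sit at the top of the buffer (as it would on a downward-growing stack).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical coroutine record: only what the simulation needs.
struct Co {
    std::vector<char> saved;  // private copy of the stack contents while suspended
    std::size_t used = 0;     // bytes of the shared stack in use when last running
};

// Hypothetical shared stack: one buffer, at most one occupant at a time.
struct SharedStack {
    std::vector<char> buffer;
    Co* occupant = nullptr;
    std::size_t copies = 0;   // number of save-out copies performed so far
    explicit SharedStack(std::size_t size) : buffer(size) {}
};

// Make `next` the occupant of the shared stack before it runs.
void occupy(SharedStack& ss, Co& next) {
    if (ss.occupant == &next) return;  // same occupant: no copy needed

    if (ss.occupant != nullptr) {
        // Save the previous occupant's used region into its private memory.
        Co& prev = *ss.occupant;
        assert(prev.used <= ss.buffer.size());
        prev.saved.assign(ss.buffer.end() - static_cast<std::ptrdiff_t>(prev.used),
                          ss.buffer.end());
        ++ss.copies;
    }
    if (!next.saved.empty()) {
        // Restore the next coroutine's saved contents to the top of the buffer.
        std::memcpy(ss.buffer.data() + ss.buffer.size() - next.saved.size(),
                    next.saved.data(), next.saved.size());
    }
    ss.occupant = &next;
}
```

In this model the private copy is only as large as the stack region a coroutine actually uses, which is why many mostly idle coroutines can share one fixed-size stack cheaply.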
Coroutine private variables
When a multi-process program is converted into a multi-threaded program, we can use __thread to adapt global variables quickly. For the coroutine environment, we created the coroutine variable ROUTINE_VAR, which greatly reduces the work of converting code to coroutines.
Because coroutines are essentially executed serially within a thread, thread-private variables can suffer from reentrancy problems. For example, when we define a __thread variable, the intent is for each execution flow to have exclusive use of it. But once the execution environment is migrated to coroutines, the same thread-private variable may be operated on by multiple coroutines, so one coroutine can overwrite another's value. For this reason, during the libco asynchronous transformation we changed most thread-private variables into coroutine-level private variables. Coroutine-private variables behave as follows: when the code runs in a multi-threaded, non-coroutine environment, the variable is thread-private; when the code runs in a coroutine environment, the variable is coroutine-private. The underlying implementation detects the running environment automatically and returns the correct value.
Coroutine-private variables were decisive in moving the existing synchronous environment to an asynchronous one. We also provide a very simple way to define them: a single line of declaration code is enough.
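The "thread-private outside a coroutine, coroutine-private inside one" behavior can be sketched as below. This is an illustrative model only and not how ROUTINE_VAR is implemented: CoEnv, t_current_co, ScopedCo and RoutineLocal are hypothetical names, and the "current coroutine" is tracked with a plain thread-local pointer that a real scheduler would maintain.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical stand-in for a coroutine handle.
struct CoEnv {
    int id;
};

// Which coroutine is running on this thread; nullptr means plain thread context.
// (Assumption: a real scheduler would update this on every switch.)
thread_local CoEnv* t_current_co = nullptr;

// Marks a coroutine as current for the lifetime of the guard (simulation aid).
struct ScopedCo {
    explicit ScopedCo(CoEnv* co) : prev(t_current_co) {
        assert(co != nullptr);
        t_current_co = co;
    }
    ~ScopedCo() { t_current_co = prev; }
    CoEnv* prev;
};

// A variable that is per-coroutine inside a coroutine and per-thread otherwise.
// Declare instances as thread_local so the fallback value is thread-private.
template <typename T>
class RoutineLocal {
public:
    T& get() {
        if (t_current_co == nullptr) return thread_value_;  // non-coroutine context
        return per_co_[t_current_co];                        // value-initialized on first use
    }

private:
    T thread_value_{};
    std::unordered_map<CoEnv*, T> per_co_;
};
```

With a declaration such as `thread_local RoutineLocal<int> counter;`, each coroutine that calls `counter.get()` sees its own value, while code running outside any coroutine sees the thread's value.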
Hook method of gethostbyname
Existing network services may need to query DNS for the real address through the system's gethostbyname API. During the coroutine transformation, we found that the socket-family functions we had hooked did not cover gethostbyname: when a coroutine calls gethostbyname, it waits for the result synchronously, which delays the other coroutines in the same thread. We studied glibc's gethostbyname source code and found two reasons. First, the hook did not take effect because glibc waits for events through its internal __poll rather than the ordinary poll. Second, glibc defines a thread-private variable, and switching between coroutines can re-enter it and produce inaccurate data. In the end, we made gethostbyname asynchronous under coroutines by hooking __poll and by defining coroutine-private variables.
gethostbyname is the synchronous DNS query interface provided by glibc. The industry has many excellent asynchronous alternatives to gethostbyname, but they require introducing a third-party library and require the underlying layer to provide an asynchronous callback notification mechanism. With hooks, libco makes gethostbyname asynchronous without modifying the glibc source code.
Coroutine semaphore
In a multi-threaded environment, threads often need to synchronize with one another; for example, one thread may have to wait for a signal from another before it continues. We usually solve this with pthread_cond_signal. In libco, we define the coroutine semaphore co_signal to handle synchronization between coroutines. Through co_cond_signal and co_cond_broadcast, a coroutine can notify one waiting coroutine or wake up all waiting coroutines.
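The difference between waking one waiter and waking all of them can be sketched with a simple wait queue. This is a model of the idea rather than libco's implementation: CoCondSketch is a hypothetical class, and each waiting coroutine is represented by a callback that would resume it.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical coroutine condition: a FIFO queue of "resume me" callbacks.
class CoCondSketch {
public:
    // A coroutine that wants to wait enqueues a way to resume itself,
    // then (in a real library) yields until that callback is invoked.
    void wait(std::function<void()> resume) {
        assert(resume != nullptr);
        waiters_.push_back(std::move(resume));
    }

    // Wake the longest-waiting coroutine, if any. Returns false if none waited.
    bool signal() {
        if (waiters_.empty()) return false;
        std::function<void()> resume = std::move(waiters_.front());
        waiters_.pop_front();
        resume();
        return true;
    }

    // Wake every coroutine that is waiting right now. Returns how many were woken.
    std::size_t broadcast() {
        std::deque<std::function<void()>> ready;
        ready.swap(waiters_);  // waiters added during wake-up wait for the next round
        for (auto& resume : ready) resume();
        return ready.size();
    }

    std::size_t waiting() const { return waiters_.size(); }

private:
    std::deque<std::function<void()>> waiters_;
};
```

Because all coroutines of a thread run serially, a structure like this needs no locking as long as it is only used from coroutines on the same thread.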
Summary
libco is an efficient C/C++ coroutine library that provides a complete coroutine programming interface, hooks for the commonly used socket-family functions, and more, allowing businesses to iterate rapidly using a synchronous programming model. Having run stably for years, libco plays a pivotal role as the cornerstone of WeChat's backend framework.