Druid is an open-source distributed data store for real-time data analysis, designed for high performance, low latency, and scalability. To further improve Druid's performance and reliability, the Druid development team added caching. This article introduces the essentials of Druid caching.
1. Druid cache overview
Druid cache is divided into two types: one is the result cache on the Broker, and the other is the data cache on the Historical node. The role of caching is mainly to reduce the time it takes Druid to query data and reduce the query load.
The result cache on the Broker is a cache of query results. Once a result is cached, subsequent identical queries can be answered directly from the cache. The result cache is stored on the Broker's local disk, and the lifetime of a cached result is configurable (5 minutes by default). Result caching is generally used in scenarios that require fast query responses.
The data cache on the Historical node is a cache of data blocks. The Historical node is responsible for storing data blocks. When it receives a query request and the requested data block is already in its local cache, it reads the block directly from the cache and returns the result. If the data block is not in the cache, the Historical node must fetch it from other nodes in the cluster or from the data source, and then cache it. Data caching is one of Druid's most important features and can greatly improve query performance and response speed in many scenarios.
2. How to use Druid cache
You need to pay attention to the following points when using cache in Druid:
Druid does not enable caching by default, and you need to explicitly specify the cache when querying. When querying, you can enable result caching or data block caching by setting corresponding parameters. The query parameters are as follows:
(1) useResultCache: set to true to enable result caching; the default is false.
(2) useCache: set to true to enable data block caching; the default is false.
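As a minimal sketch of how these flags might be passed, the Python snippet below builds a native timeseries query and sets both flags in its "context" object. The flag names are the ones listed in this article and may differ in your Druid version; the datasource name and interval are hypothetical placeholders.

```python
import json


def build_timeseries_query(use_result_cache: bool, use_cache: bool) -> dict:
    """Build a native timeseries query dict with cache flags in its context."""
    return {
        "queryType": "timeseries",
        "dataSource": "page_views",  # hypothetical datasource name
        "granularity": "hour",
        "intervals": ["2024-01-01/2024-01-02"],  # hypothetical interval
        "aggregations": [{"type": "count", "name": "rows"}],
        "context": {
            # Flag names as listed in this article (assumption: check your version)
            "useResultCache": use_result_cache,
            "useCache": use_cache,
        },
    }


query = build_timeseries_query(use_result_cache=True, use_cache=True)
payload = json.dumps(query, indent=2)  # JSON body to send to the Broker
print(payload)
```

Keeping the flags in one helper makes it easy to switch caching on or off per query while testing.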
Druid’s cache is configurable, and users can set the size, life cycle and other parameters of the cache according to their actual needs. The parameters of the cache configuration are as follows:
(1) QueryCacheSize: The maximum size of the result cache, the default value is 500MB;
(2) segmentQueryCacheSize: The maximum size of the data block cache, the default is 0;
(3) resultCacheMaxSizeBytes: The maximum size of a single query result cache, the default is 10485760 bytes (10MB);
(4) resultCacheExpire: The life cycle of the query result cache, the default is 5 minutes.
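The four settings above can be collected in one place and sanity-checked before they are applied. The following is only an illustrative sketch: the CacheSettings class and its methods are hypothetical helpers, not part of Druid, and the defaults are simply the values quoted in this article.

```python
from dataclasses import dataclass
from typing import List

MB = 1024 * 1024


@dataclass(frozen=True)
class CacheSettings:
    """Hypothetical container holding the defaults quoted in this article."""
    query_cache_size: int = 500 * MB             # QueryCacheSize
    segment_query_cache_size: int = 0            # segmentQueryCacheSize
    result_cache_max_size_bytes: int = 10485760  # resultCacheMaxSizeBytes (10MB)
    result_cache_expire_seconds: int = 5 * 60    # resultCacheExpire

    def validate(self) -> List[str]:
        """Return a list of human-readable problems; empty means consistent."""
        problems = []
        if self.result_cache_max_size_bytes > self.query_cache_size:
            problems.append("single-result limit exceeds total result cache size")
        if self.result_cache_expire_seconds <= 0:
            problems.append("result cache lifetime must be positive")
        return problems

    def can_cache_result(self, result_bytes: int) -> bool:
        """True if a single result of this size fits under the per-result limit."""
        return 0 <= result_bytes <= self.result_cache_max_size_bytes


defaults = CacheSettings()
print(defaults.validate())                 # no problems with the quoted defaults
print(defaults.can_cache_result(2 * MB))   # a 2MB result is under the 10MB limit
```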
3. Cache optimization
The optimization of Druid cache mainly includes the following points:
When the cache reaches its maximum capacity, or when certain other conditions are met, part of the cache must be cleared. By default, Druid clears expired entries to free up space. In addition, users can define their own clearing strategy by implementing the corresponding interface.
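To make the clearing behavior concrete, here is a small Python sketch of a size-bounded cache that removes expired entries first and, if it is still over capacity, falls back to evicting the least recently used entry. This is an illustration of the general idea only, not Druid's actual implementation; the ExpiringCache class, its parameters, and the LRU fallback are all assumptions.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional


class ExpiringCache:
    """Hypothetical cache: drop expired entries first, then least recently used."""

    def __init__(self, max_entries: int, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_entries = max_entries
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so the behavior can be tested
        self._entries: OrderedDict = OrderedDict()  # key -> (expires_at, value)

    def put(self, key: str, value: Any) -> None:
        now = self.clock()
        self._entries.pop(key, None)  # re-inserting makes the key most recent
        self._entries[key] = (now + self.ttl_seconds, value)
        self._evict(now)

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if expires_at <= self.clock():
            del self._entries[key]  # expired: treat as a miss
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return value

    def _evict(self, now: float) -> None:
        # Step 1: clear entries whose lifetime has passed.
        expired = [k for k, (exp, _) in self._entries.items() if exp <= now]
        for key in expired:
            del self._entries[key]
        # Step 2: if still over capacity, drop the least recently used entries.
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)


cache = ExpiringCache(max_entries=2, ttl_seconds=300)
cache.put("query-a", [1, 2, 3])
print(cache.get("query-a"))
```

A custom clearing strategy would replace the second step of `_evict` with a different rule, for example evicting the largest entry first.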
The setting of the cache size directly affects the storage capacity and efficiency of the cache. If the cache size is set too small, the cache will not be able to store enough data blocks or query results, thus affecting the performance of Druid queries; if the cache size is set too large, too many memory resources will be occupied, resulting in reduced query performance. Therefore, it needs to be adjusted according to the actual scenario to achieve optimal performance.
If the cache life cycle is set too long, the memory occupied by the cache will not be released for a long time, which affects Druid query performance; if it is set too short, the cache hit rate drops, which also affects query performance. Therefore, the cache life cycle needs to be adjusted according to the actual scenario to achieve optimal performance.
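The effect of the life cycle on the hit rate can be estimated with a toy simulation. The sketch below replays the access times of a single repeated query against different lifetimes; the hit_rate function and the access trace are hypothetical and model only expiry, not cache size.

```python
from typing import Sequence


def hit_rate(access_times: Sequence[float], ttl_seconds: float) -> float:
    """Fraction of accesses served from cache for one repeatedly issued query.

    A miss stores the result, which then expires ttl_seconds after it was stored.
    """
    if not access_times:
        return 0.0
    hits = 0
    expires_at = None
    for t in access_times:
        if expires_at is not None and t < expires_at:
            hits += 1  # result is still fresh
        else:
            expires_at = t + ttl_seconds  # miss: recompute and cache
    return hits / len(access_times)


# Hypothetical trace: the same query issued every 60 seconds for 10 minutes.
trace = list(range(0, 600, 60))
for ttl in (30, 120, 300):
    print(f"ttl={ttl}s hit rate={hit_rate(trace, ttl):.1f}")
```

With this trace, a 30-second lifetime never produces a hit because every entry expires before the next access, while longer lifetimes serve a growing share of requests from the cache at the cost of holding results in memory longer.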
Summary:
Druid caching is an important way to optimize Druid query performance. Result caching and data block caching each have different advantages and disadvantages, and users need to choose the appropriate caching method based on specific scenarios. When using Druid cache, you need to pay attention to cache enablement and configuration, and adjust and optimize it according to actual scenarios.