When asked “Which one is better?”, the usual answer is “it depends”. This is not a black-and-white question; most of the time the right choice depends on the situation.
For lightweight applications, caching with built-in data structures can be a good choice. Because no network round trip is involved, access is efficient, and the structures are simple to use from code. However, most built-in data structures are not thread-safe, which means you have to handle locking and thread synchronization yourself. Thread synchronization is beyond the scope of this discussion, but it is not easy to get right: many people either fail to lock where needed or lock far too much, and performance suffers. In a genuinely high-concurrency environment, even a brief dip in performance can let requests pile up on the server faster than they can be handled, eventually crashing the application.
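As a concrete illustration of the "built-in structure plus your own locking" approach, here is a minimal sketch of an in-process LRU cache built on `LinkedHashMap`'s access-order mode, with coarse-grained synchronization. The class name and capacity are made up for the example; a real application would tune both.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal in-process LRU cache on top of LinkedHashMap's access-order mode.
// LinkedHashMap itself is NOT thread-safe, so every access goes through
// one coarse lock (the instance monitor) — simple, but a contention point.
public class LocalLruCache<K, V> {
    private final Map<K, V> map;

    public LocalLruCache(int capacity) {
        // accessOrder=true makes iteration order least-recently-used first,
        // so removeEldestEntry evicts the LRU entry once we exceed capacity.
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    public synchronized V get(K key) { return map.get(key); }

    public synchronized void put(K key, V value) { map.put(key, value); }

    public synchronized int size() { return map.size(); }

    public static void main(String[] args) {
        LocalLruCache<String, Integer> cache = new LocalLruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");          // touch "a" so "b" becomes the eldest entry
        cache.put("c", 3);       // exceeds capacity 2, evicts "b"
        System.out.println(cache.get("a")); // 1
        System.out.println(cache.get("b")); // null (evicted)
    }
}
```

Note the single `synchronized` lock: under heavy concurrency this becomes exactly the bottleneck described above, which is one reason people move to a dedicated cache once load grows.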
So once your application reaches a certain scale, you should consider a distributed cache. Besides the locking problem, which the server side has already solved for you, it also addresses the issues you mentioned: the cache cannot be shared between processes, and it cannot scale horizontally (what do you do when the cache is too large to fit on one server?). Under otherwise equal conditions, migrating from an in-process cache to a distributed cache will not shorten response times or raise QPS; in most cases it lengthens response times somewhat (network transmission plus serialization and deserialization). What you get in exchange, and what matters more, is horizontal scalability: put bluntly, you can handle more requests simply by adding servers. Horizontal scaling is the key problem most distributed databases set out to solve.
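To make "adding servers" concrete: distributed cache clients typically spread keys across nodes so that capacity grows with the node count. A minimal sketch of one common technique, a consistent-hash ring, is below; the node names, replica count, and hash function are illustrative assumptions, not any particular client's implementation.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch of consistent hashing: keys map to points on a ring, each key is
// owned by the next node clockwise. Adding a node only remaps the keys
// between it and its predecessor, instead of reshuffling everything.
public class HashRing {
    private final SortedMap<Integer, String> ring = new TreeMap<>();

    public void addNode(String node, int virtualReplicas) {
        // Virtual replicas smooth out the key distribution across nodes.
        for (int i = 0; i < virtualReplicas; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    public String nodeFor(String key) {
        int h = hash(key);
        // First ring point at or after the key's hash; wrap around if none.
        SortedMap<Integer, String> tail = ring.tailMap(h);
        return tail.isEmpty() ? ring.get(ring.firstKey()) : tail.get(tail.firstKey());
    }

    private int hash(String s) {
        // FNV-1a kept non-negative; a real client would use a stronger hash.
        int h = 0x811c9dc5;
        for (char c : s.toCharArray()) { h ^= c; h *= 16777619; }
        return h & 0x7fffffff;
    }

    public static void main(String[] args) {
        HashRing ring = new HashRing();
        ring.addNode("cache-1", 100);
        ring.addNode("cache-2", 100);
        // Every key deterministically lands on one of the two nodes.
        System.out.println(ring.nodeFor("user:42"));
    }
}
```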
Of course, introducing NoSQL inevitably adds complexity and extra development work. But a NoSQL database also brings additional benefits, such as high availability and consistency for the cached data.
I'm afraid all this may sound confusing. What I want to convey is that introducing NoSQL for caching may not be as wonderful as you imagine: it brings both problems and advantages. What you have to do is evaluate, for your own application scenario, whether the advantages are what you actually need and whether you can tolerate the downsides that come with them, and then decide.
It depends on the size of the data. With a small amount of data, built-in data structures are the cheapest and most efficient option.
With a large data volume, you should use a ready-made NoSQL service. Hand-written components may contain fatal bugs; time-tested software is the better choice, and such software already ships with the features that large data volumes require.
Neither extreme is ideal; it may be better to use a caching framework such as Ehcache.
Only applications with a small amount of data use data-structure caches, since everything is held in memory. With a large amount of data, you will certainly have to use NoSQL.
You can refer to the design of HBase.
HBase's physical storage is based on HDFS's MapFile format (newer versions implement their own HFile, but the principle is basically the same). A MapFile is a read-only key-value file: once written, it cannot be modified, and all keys must be written in sorted order. Given those constraints, HBase first stores data in a MemStore, and only after the MemStore accumulates a certain amount of data does it persist it to HDFS as a MapFile or HFile.
So your cache can be implemented as an in-memory data structure plus a NoSQL persistence mechanism; the two approaches complement each other.
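The MemStore idea above can be sketched in a few lines: buffer writes in an in-memory map and flush them to a slower persistent store once the buffer reaches a threshold, with reads checking the fresh buffer before falling back to storage. Here the "persistent store" is just another map standing in for a NoSQL backend, and all names and the threshold are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Toy version of the MemStore pattern: an in-memory write buffer that is
// flushed to a persistent store when it reaches flushThreshold entries.
public class BufferedStore {
    private final Map<String, String> memBuffer = new HashMap<>();
    private final Map<String, String> persistent = new HashMap<>(); // stand-in for NoSQL
    private final int flushThreshold;

    public BufferedStore(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    public synchronized void put(String key, String value) {
        memBuffer.put(key, value);
        if (memBuffer.size() >= flushThreshold) flush();
    }

    public synchronized String get(String key) {
        // Check the fresh buffer first, then fall back to the persisted data,
        // mirroring HBase's MemStore-then-HFile lookup order.
        String v = memBuffer.get(key);
        return v != null ? v : persistent.get(key);
    }

    private void flush() {
        persistent.putAll(memBuffer);
        memBuffer.clear();
    }

    public static void main(String[] args) {
        BufferedStore store = new BufferedStore(2);
        store.put("k1", "v1");               // buffered only
        store.put("k2", "v2");               // hits threshold, flushes both
        store.put("k3", "v3");               // buffered again
        System.out.println(store.get("k1")); // v1 (served from persistent map)
        System.out.println(store.get("k3")); // v3 (served from buffer)
    }
}
```

In a real system the flush would write sorted, immutable files (as HBase does) and would need a write-ahead log so buffered data survives a crash; this sketch only shows the buffering-plus-fallback read path.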
I won't comment on how easy that is to implement, but if you load a huge number of objects straight into the JVM, your hair will turn gray waiting for the program to start. Initializing and reloading everything on every startup is unspeakably painful.