Sharding - Java backend code implementation and best practices after database sharding and table cutting
过去多啦不再A梦
过去多啦不再A梦 2017-06-23 09:12:49
0
5
919

In the current business, as some tables are getting larger and larger, the pressure is very high when reading (the demand for writing is relatively small), so on the database side, we decided to cut some tables with particularly large amounts of data, but in the back-end code There are a lot of codes/queries that need to be joined to these tables. How do you solve this situation?

For example, we now have a SampleTable with about 100 million pieces of data. We have divided it into about 16 different tables based on logic: SampleTable 1, SampleTable2...SampleTable31,
There was a query in the previous code, which was similar to:

select * from  SampleTable join test_table

Do we need to execute this query multiple times and aggregate the data as the return result?

select * from  SampleTable1 join test_table

Is there a better method or library recommendation? Is there any beset practice or sample code?

If we want to split multiple tables into different database servers later, do we need to add database connections for different DBs in the back-end code?

The basic idea and sharding strategy of database Sharding
This article is more about the strategy of database sharding. Can anyone provide an actual project code sample?
Database sharding and JPA
what -to-do-instead-of-sql-joins-while-scaling-horizontally

Some answers on stackoverflow

过去多啦不再A梦
过去多啦不再A梦

reply all(5)
大家讲道理

You can consider introducing database middleware
sharding-jdbc client level
mycat-server server level

世界只因有你

A friend recommended Spark, which supports SQl-style queries and returns results in about 0.5 seconds for 100 million pieces of data

ringa_lee

Only for the current situation in our project: when dividing tables, it falls to a specific table according to the hash algorithm, and then when fetching, first obtain the distribution position of the data according to the algorithm, and then it is a normal selection

漂亮男人

Join table query is not recommended
1. Database resources are relatively precious, and join table query will take up a lot of memory, resulting in a decrease in database performance
2. Data is not supported in multiple database instances, the split database situation cannot be handled, and the scalability is poor

The common approach is to divide the join table query into multiple single table queries, and then summarize the results in the application.
1. Can solve the above problem of joining table query
2. For multiple queries, the intermediate results of each query can also be processed in the program, which is a flexibility.
3. The application can also be expanded at any time, making it more flexible

If it is an offline scenario, it is recommended to use the MR (mapreduce) framework to handle it, such as hadoop, etc. Accordingly, the data needs to be written to HDFS.

学霸

http://blog.csdn.net/tianyale...
Detailed explanation of sub-database and sub-table

Latest Downloads
More>
Web Effects
Website Source Code
Website Materials
Front End Template
About us Disclaimer Sitemap
php.cn:Public welfare online PHP training,Help PHP learners grow quickly!