Now there is a flow table structure in the project as follows
id sdkVersion jarVersion countryCode imei createTime
The previous requirement was to find the total number by grouping sdkVersion, jarVersion, countryCode and the total number after imei sorting. The approximate sql is as follows:
select sdkVersion,jarVersion,countryCode,count(*),count(distinct imei) from xxx
where createTime = 'xxxx-xx-xx'
group by sdkVersion,jarVersion,countryCode
Find out all the data of the previous day and summarize it into a table. The structure is roughly as follows
id sdkVersion jarVersion countryCode count(*) count(distinct imei) createTime
The current requirement is to query the combination of any latitude, that is,
group by sdkVersion
group by jarVersion
group by countryCode
group by sdkVersion, countryCode
and so on. Combination, if we follow the previous daily summary plan, we will have to create many tables for different latitude combinations. Is there any good solution to solve this problem? Or can it be solved using a specialized statistical framework?
You can check out the PipelineDB streaming database
apache kylin, sub-second olap
For daily summary, real-time requirements are not high, and 500W records are still within the processing range. View + scheduled plan can meet the requirements, and there is no need to build multiple tables.
It is best for the questioner to explain what bottlenecks or pain points there are. After all, mysql is a mature product, and there are certain risks in switching to cutting-edge technology.
Write the stored procedure and run it regularly every day