Mysql solution for counting 5 million+ daily table data?
<code>请教: 现在有每天的日表数据(一天生成一张), 每张表数据大概在500w左右。 需要从每天的日表数据中统计:根据appid统计ip数,同时ip需要去重。 大概的sql是:</code>
select appid, count(distinct(ip)) from log0812_tb where iptype = 4 group by appid;
<code>然后将统计的appid 和 ip数,放入到另一张统计表中。 1、直接执行sql的话,肯定超时了(系统仅配置了400ms读取时间)。 2、如果将数据都取出到内存中再做操作,内存又不足了,给的内存只有50M。。。(不为难程序员的需求不是好公司) 请问,还有优化的解决方案吗? 谢谢 </code>
Reply content:
<code>请教: 现在有每天的日表数据(一天生成一张), 每张表数据大概在500w左右。 需要从每天的日表数据中统计:根据appid统计ip数,同时ip需要去重。 大概的sql是:</code>
select appid, count(distinct(ip)) from log0812_tb where iptype = 4 group by appid;
<code>然后将统计的appid 和 ip数,放入到另一张统计表中。 1、直接执行sql的话,肯定超时了(系统仅配置了400ms读取时间)。 2、如果将数据都取出到内存中再做操作,内存又不足了,给的内存只有50M。。。(不为难程序员的需求不是好公司) 请问,还有优化的解决方案吗? 谢谢 </code>
Let’s first talk about the possible optimizations in the table below:
Make a combined index (appid, ip)
IP stores integers, not strings
If it still times out, then try to read the data into the memory, but your memory is only 50M, then you can try to use HyperLogLog. The memory consumed is very small, but the statistical data will be slightly biased, about 2%
Finally, it is best not to store this kind of log data in sql. You can choose some nosql such as hbase and mongodb, which can meet your needs very well
@manong
Thank you, the two optimization solutions you mentioned are both good.
I built a joint index of typeid, appid, and ip, so that this statement is executed through the index query without returning the table, and the time is controlled below 1.5s, which is effective.
As for the HyperLogLog algorithm, I just roughly checked it and didn’t put it into practice, but thank you for the recommendation.
I use another method to process: schedule tasks to process these 5 million+ data in batches. After deduplicating the data taken twice, do array_diff to compare the second different data, and then sum it to get the total count. In this way, the time can also be controlled below 1s. A trick here is to convert the array of the first comparison into a string and then store it in the array. Convert the string to array for the second comparison. This will save a lot of memory, because after trying it, nested arrays are better than long characters. Arrays of string values consume memory.

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics



The future of PHP will be achieved by adapting to new technology trends and introducing innovative features: 1) Adapting to cloud computing, containerization and microservice architectures, supporting Docker and Kubernetes; 2) introducing JIT compilers and enumeration types to improve performance and data processing efficiency; 3) Continuously optimize performance and promote best practices.

Create a database using Navicat Premium: Connect to the database server and enter the connection parameters. Right-click on the server and select Create Database. Enter the name of the new database and the specified character set and collation. Connect to the new database and create the table in the Object Browser. Right-click on the table and select Insert Data to insert the data.

MySQL is an open source relational database management system. 1) Create database and tables: Use the CREATEDATABASE and CREATETABLE commands. 2) Basic operations: INSERT, UPDATE, DELETE and SELECT. 3) Advanced operations: JOIN, subquery and transaction processing. 4) Debugging skills: Check syntax, data type and permissions. 5) Optimization suggestions: Use indexes, avoid SELECT* and use transactions.

PHP and Python each have their own advantages, and the choice should be based on project requirements. 1.PHP is suitable for web development, with simple syntax and high execution efficiency. 2. Python is suitable for data science and machine learning, with concise syntax and rich libraries.

MySQL and SQL are essential skills for developers. 1.MySQL is an open source relational database management system, and SQL is the standard language used to manage and operate databases. 2.MySQL supports multiple storage engines through efficient data storage and retrieval functions, and SQL completes complex data operations through simple statements. 3. Examples of usage include basic queries and advanced queries, such as filtering and sorting by condition. 4. Common errors include syntax errors and performance issues, which can be optimized by checking SQL statements and using EXPLAIN commands. 5. Performance optimization techniques include using indexes, avoiding full table scanning, optimizing JOIN operations and improving code readability.

You can create a new MySQL connection in Navicat by following the steps: Open the application and select New Connection (Ctrl N). Select "MySQL" as the connection type. Enter the hostname/IP address, port, username, and password. (Optional) Configure advanced options. Save the connection and enter the connection name.

You can open phpMyAdmin through the following steps: 1. Log in to the website control panel; 2. Find and click the phpMyAdmin icon; 3. Enter MySQL credentials; 4. Click "Login".

PHP is not dying, but constantly adapting and evolving. 1) PHP has undergone multiple version iterations since 1994 to adapt to new technology trends. 2) It is currently widely used in e-commerce, content management systems and other fields. 3) PHP8 introduces JIT compiler and other functions to improve performance and modernization. 4) Use OPcache and follow PSR-12 standards to optimize performance and code quality.
