MySQL solution for counting daily tables with 5 million+ rows? - PHP Tutorial - php.cn


WBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWB

Aug 18, 2016 am 09:15 AM

mysql php

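A question:
I have daily log tables (a new table is generated every day), and each table holds roughly 5 million rows.
From each daily table I need to count, per appid, the number of IPs, with duplicate IPs removed.
The SQL is roughly: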


select appid, count(distinct(ip)) from log0812_tb where iptype = 4 group by appid;

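The resulting appid and IP counts are then written into a separate statistics table.

1. Running the SQL directly will certainly time out (the system only allows a 400 ms read time).
2. Pulling all the data into memory and processing it there runs out of memory, since only 50 MB is available... (a company whose requirements don't make life hard for programmers isn't a good company)

Is there a better, optimized solution?
Thanks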


Reply content:


First, some possible optimizations on the table itself:

1. Create a composite index on (appid, ip)
2. Store IPs as integers, not strings
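Storing the address as an integer makes each index entry smaller than the dotted string. A minimal PHP sketch, assuming IPv4 data (reading iptype = 4 in the question as IPv4 is our assumption) and 64-bit PHP, where ip2long() returns a value from 0 to 4294967295 that fits an INT UNSIGNED column. The helper names and the index name idx_appid_ip are hypothetical:

```php
<?php
// Suggested index (index name is hypothetical):
//   ALTER TABLE log0812_tb ADD INDEX idx_appid_ip (appid, ip);
// with the ip column declared as INT UNSIGNED instead of VARCHAR.

// Convert a dotted IPv4 string to an unsigned integer; null if the input is not valid IPv4.
function ipv4ToInt(string $ip): ?int
{
    $n = ip2long($ip);
    // On 64-bit PHP, ip2long() returns 0..4294967295, matching INT UNSIGNED.
    return $n === false ? null : $n;
}

// Convert a stored integer back to dotted notation for display.
function intToIpv4(int $n): string
{
    return long2ip($n);
}

echo ipv4ToInt('192.168.0.1'), PHP_EOL; // 3232235521
echo intToIpv4(3232235521), PHP_EOL;    // 192.168.0.1
```

The conversion would happen once, when the log rows are written, so the daily query never has to parse strings.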

If it still times out, you will have to process the data in memory, and with only 50 MB you can try HyperLogLog. It uses very little memory, but the result is an estimate with an error of around 2% (the exact error depends on how many registers you configure).
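HyperLogLog keeps a fixed array of small registers per counter instead of the IPs themselves. Below is a minimal, unoptimized PHP sketch; the class name and the use of md5 as the hash are our assumptions, not a library API. It uses 2^11 = 2048 one-byte registers packed into a string (about 2 KB per counter), and the theoretical standard error for that size is roughly 1.04/sqrt(2048), about 2.3%:

```php
<?php
// Minimal HyperLogLog sketch (illustrative only, not a library API).
// 2^P registers, one byte each, packed into a string to keep memory small.
final class HyperLogLog
{
    private const P = 11;                    // 2^11 = 2048 registers
    private const M = 1 << self::P;
    private const REST_BITS = 32 - self::P;  // hash bits left after the register index

    private string $registers;

    public function __construct()
    {
        $this->registers = str_repeat("\0", self::M);
    }

    public function add(string $value): void
    {
        // 32-bit hash from the first 8 hex digits of md5 (assumed to mix IPs well enough).
        $hash = hexdec(substr(md5($value), 0, 8));
        $index = $hash >> self::REST_BITS;             // top P bits pick the register
        $rest = $hash & ((1 << self::REST_BITS) - 1);  // remaining 21 bits

        // Rank = 1-based position of the leftmost 1-bit in $rest (22 if $rest is 0).
        $rank = 1;
        $bit = 1 << (self::REST_BITS - 1);
        while ($rank <= self::REST_BITS && ($rest & $bit) === 0) {
            $rank++;
            $bit >>= 1;
        }

        // Each register keeps the maximum rank seen, so duplicates change nothing.
        if ($rank > ord($this->registers[$index])) {
            $this->registers[$index] = chr($rank);
        }
    }

    public function count(): int
    {
        $m = self::M;
        $alpha = 0.7213 / (1 + 1.079 / $m);
        $sum = 0.0;
        $zeros = 0;
        for ($i = 0; $i < $m; $i++) {
            $r = ord($this->registers[$i]);
            $sum += 1 / (1 << $r);  // 2^-r
            if ($r === 0) {
                $zeros++;
            }
        }
        $estimate = $alpha * $m * $m / $sum;

        // Small-range correction (linear counting) while many registers are still empty.
        // The large-range correction for 32-bit hashes is omitted; it only matters
        // far above 5 million distinct values.
        if ($estimate <= 2.5 * $m && $zeros > 0) {
            $estimate = $m * log($m / $zeros);
        }
        return (int) round($estimate);
    }
}
```

In a daily job you would keep one counter per appid, for example `$counters[$appid] ??= new HyperLogLog(); $counters[$appid]->add($ip);`, so memory grows with the number of appids rather than with the number of rows.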

Finally, it is best not to keep this kind of log data in a SQL database. A NoSQL store such as HBase or MongoDB would meet your needs well.

@manong
Thank you, the two optimization solutions you mentioned are both good.

I built a composite index on iptype, appid and ip, so this statement is answered entirely from the index without going back to the table rows. That brought the query time below 1.5 s, which is a clear improvement.
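For reference, the covering index and the daily aggregation could be generated as below. This is a sketch: the log<MMDD>_tb naming pattern is inferred from log0812_tb in the question, and the index name idx_type_app_ip and the statistics table ip_stats (stat_date, appid, ip_count) are hypothetical. Because (iptype, appid, ip) contains every column the query touches, MySQL can typically resolve it from the index alone, which EXPLAIN reports as "Using index" in the Extra column:

```php
<?php
// Daily table name, assuming the log<MMDD>_tb pattern seen in the question.
function dailyTable(DateTimeInterface $day): string
{
    return 'log' . $day->format('md') . '_tb';
}

// Covering index: iptype for WHERE, appid for GROUP BY, ip for COUNT(DISTINCT ...).
function coveringIndexSql(string $table): string
{
    return "ALTER TABLE `$table` ADD INDEX idx_type_app_ip (iptype, appid, ip)";
}

// Aggregate directly into the (hypothetical) stats table in one statement,
// so no log rows have to be pulled into PHP memory.
// $date comes from format(), never from user input, so interpolating it is safe here.
function dailyStatsSql(string $table, DateTimeInterface $day): string
{
    $date = $day->format('Y-m-d');
    return "INSERT INTO ip_stats (stat_date, appid, ip_count) "
        . "SELECT '$date', appid, COUNT(DISTINCT ip) FROM `$table` "
        . "WHERE iptype = 4 GROUP BY appid";
}

$day = new DateTimeImmutable('2016-08-12');
$table = dailyTable($day); // log0812_tb
echo coveringIndexSql($table), PHP_EOL;
echo dailyStatsSql($table, $day), PHP_EOL;
```

The index only needs to be created once per daily table, ideally right after the table is created and before it fills up.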

As for HyperLogLog, I only skimmed it and haven't tried it in practice, but thanks for the recommendation.

I also used another method: a scheduled task processes the 5 million+ rows in batches. Each fetched batch is deduplicated, array_diff against the previously fetched data picks out the IPs that are new in this batch, and the counts of new IPs are summed to get the total. This also keeps the time below 1 s. One trick: after a comparison, convert the array of already-seen IPs into a string and store that, then convert the string back into an array for the next comparison. This saves a lot of memory, because in my tests an array of string values consumes far more memory than one long string.
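The batch approach described above can be sketched roughly as follows. This is our reading of the description, not the poster's code: countDistinctAcrossBatches and exampleBatches are hypothetical names, and $batches stands for whatever the scheduled task fetches per run (for example one appid's IPs in fixed-size chunks). Already-seen IPs are kept as one comma-separated string and only exploded while comparing:

```php
<?php
// Count distinct IPs across batches (illustrative sketch of the approach above).
// $batches: iterable of arrays of IP strings, e.g. chunks fetched by a scheduled task.
function countDistinctAcrossBatches(iterable $batches): int
{
    $seen = '';   // all IPs counted so far, packed into one comma-separated string
    $total = 0;

    foreach ($batches as $batch) {
        $batch = array_unique($batch);                        // dedupe inside this batch
        $previous = $seen === '' ? [] : explode(',', $seen);  // unpack only for the comparison
        $new = array_diff($batch, $previous);                 // IPs not seen in earlier batches
        unset($previous);                                     // drop the temporary array again

        $total += count($new);
        if ($new !== []) {
            // Pack the newly seen IPs back into the string.
            $seen .= ($seen === '' ? '' : ',') . implode(',', $new);
        }
    }
    return $total;
}

// Generator standing in for the batched database reads.
function exampleBatches(): Generator
{
    yield ['1.1.1.1', '2.2.2.2', '1.1.1.1'];
    yield ['2.2.2.2', '3.3.3.3'];
    yield [];
}

echo countDistinctAcrossBatches(exampleBatches()), PHP_EOL; // 3
```

Note that explode() still builds the full array during each comparison, so the saving applies to the state held between batches, not to the peak during a comparison.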

