How to implement deep paging over hundreds of millions of rows with MySQL, ES, and MongoDB


Guanhui

Jul 27, 2020 pm 05:24 PM

mysql

## Interview Questions & Real Experience

Interview question: how do you implement deep paging when the data volume is large?

You may run into this question in an interview or while preparing for one. Most answers boil down to sharding databases and tables and building indexes. That is the standard, correct answer, but reality is harsh, so the interviewer will usually follow up: the schedule is tight and the team is short-staffed, so how do you implement deep paging now?

At this point, candidates without hands-on experience tend to freeze. So hear me out.

Painful Lessons

First, let's be clear: deep paging can be done, but random jumps to deep pages must be banned outright.

[Figure: pagination screenshot]

Take a guess: if I click page 142360, will the service blow up?

Databases such as MySQL and MongoDB cope reasonably well. They are purpose-built databases, and if the query is handled badly the worst outcome is usually that it is slow. With ES the situation is different. We have to loop over the Search After API to fetch the data, which raises memory-usage concerns. If the code is not written carefully, it can run straight into an out-of-memory error.

Why random deep page jumps cannot be allowed

Let's briefly look, from a technical point of view, at why random deep page jumps cannot be allowed, or in other words, why deep paging is not recommended.

MySQL

The basic principle of paging:

SELECT * FROM test ORDER BY id DESC LIMIT 10000, 20;


LIMIT 10000, 20 means scanning 10,020 rows that match the conditions, discarding the first 10,000, and returning the last 20. With LIMIT 1000000, 100, 1,000,100 rows have to be scanned. In a highly concurrent application where every query scans more than one million rows, it would be strange if the service did not blow up.
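The arithmetic above can be sketched in a few lines. This is only an illustration of the rule of thumb stated in the text (rows read = offset + page size); the function name `rows_scanned` and the page size of 20 used for page 142360 are assumptions, not something the original specifies.

```python
def rows_scanned(page: int, page_size: int) -> int:
    """Rows a `LIMIT offset, size` query has to read: the offset plus one page."""
    offset = (page - 1) * page_size   # rows that are read and then thrown away
    return offset + page_size


print(rows_scanned(501, 20))       # LIMIT 10000, 20     -> 10020
print(rows_scanned(10001, 100))    # LIMIT 1000000, 100  -> 1000100
print(rows_scanned(142360, 20))    # page 142360, assuming 20 rows per page -> 2847200
```

The cost grows linearly with the page number, which is why a single click on a very deep page is so much more expensive than a click on page 2.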

MongoDB

The basic principle of paging:

db.t_data.find().limit(5).skip(5);


Similarly, as the page number grows, the number of documents skipped by skip grows too. The skip is implemented by iterating the cursor, so CPU consumption becomes very noticeable. When page numbers are very large and requests are frequent, the service will inevitably blow up.

Elasticsearch

From a business perspective, Elasticsearch is not a typical database; it is a search engine. If the data we want does not show up under the filter conditions, paging deeper will not find it either. Taking a step back, if we do use ES as a database for queries, paging will certainly hit the max_result_window limit. See that? The official documentation tells you that, by default, from + size may not exceed 10,000.

Query process:

To query page 501 with 10 items per page, the client sends a request to one of the nodes.
That node broadcasts the request to every shard, and each shard fetches its own first 5010 hits.
The shard results are returned to the node, which merges them and takes the first 5010 hits overall.
The last 10 of those hits are returned to the client.
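The query process above can be simulated in plain Python. This is a rough sketch, not Elasticsearch code: the three shards, the round-robin distribution of ids, and the function names are all assumptions made for illustration.

```python
import heapq


def shard_top_k(shard_docs, k):
    """Each shard returns its own first k hits (ascending sort key)."""
    return sorted(shard_docs)[:k]


def coordinate(shards, page, page_size):
    """Simulate the node that receives a from/size request."""
    offset = (page - 1) * page_size
    k = offset + page_size                       # every shard must supply this many hits
    partials = [shard_top_k(s, k) for s in shards]
    collected = sum(len(p) for p in partials)    # hits the node has to hold and merge
    merged = list(heapq.merge(*partials))[:k]    # global first k hits
    return merged[offset:], collected


# Hypothetical data: 3 shards, 30000 ids spread round-robin.
shards = [list(range(i, 30000, 3)) for i in range(3)]
hits, collected = coordinate(shards, page=501, page_size=10)
print(hits)        # the 10 hits of page 501
print(collected)   # 3 shards x 5010 hits = 15030
```

To return 10 hits, the node in this simulation has to collect 15030, and that number scales with both the page depth and the shard count.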

This shows why the offset has to be limited. In addition, a deep page jump done with a scrolling approach such as the Search After API still has to scroll through thousands of hits per request, and may need to scroll through millions or tens of millions of hits in total just to reach the final 20. You can imagine how inefficient that is.
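A minimal sketch of such a scrolling loop is shown here. It does not call Elasticsearch: `fetch` is a hypothetical stand-in for one Search After request (taking the previous cursor value and a size), and the batch size of 1000 is an assumption. The point is that only one batch is kept in memory at a time, and that the number of round trips still grows with the page depth.

```python
from typing import Callable, List, Optional, Tuple

Fetch = Callable[[Optional[int], int], List[int]]


def jump_to_page(fetch: Fetch, page: int, page_size: int,
                 batch: int = 1000) -> Tuple[List[int], int]:
    """Walk forward with a cursor until the requested page is reached.

    Returns the page and the number of round trips that were needed.
    """
    to_skip = (page - 1) * page_size
    cursor: Optional[int] = None
    trips = 0
    while to_skip > 0:
        hits = fetch(cursor, min(batch, to_skip))
        trips += 1
        if not hits:
            return [], trips          # ran out of data before reaching the page
        cursor = hits[-1]             # sort value of the last hit seen
        to_skip -= len(hits)          # the batch itself is discarded here
    trips += 1
    return fetch(cursor, page_size), trips


# Hypothetical in-memory data source with contiguous ids 1..100000.
DATA = list(range(1, 100_001))


def fake_fetch(after: Optional[int], size: int) -> List[int]:
    start = 0 if after is None else after   # ids are 1-based and contiguous
    return DATA[start:start + size]


rows, trips = jump_to_page(fake_fetch, page=4001, page_size=20)
print(rows[0], rows[-1], trips)   # 80001 80020 81
```

Even in this toy setup, reaching page 4001 takes 81 requests to return 20 rows.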

Realign with the product team

As the saying goes, problems that technology cannot solve should be solved by the business!

During my internship I fell for the product team's line that deep paging with page jumps simply had to be implemented. Now it is time to set things right, and the following changes need to be made on the business side:

Add default filter conditions wherever possible, such as a time range, to reduce the amount of data displayed

Change how page jumps are presented: switch to scrolling display, or allow page jumps only within a small range

[Figure: reference example of scrolling display]

[Figure: reference example of small-range page jumps]
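One way to enforce small-range page jumps on the server side is to compute the window of pages a request may target. This is only a sketch: the radius of 3 pages and the names `page_window` and `is_jump_allowed` are assumptions for illustration.

```python
from typing import List, Optional


def page_window(current: int, radius: int = 3,
                last_page: Optional[int] = None) -> List[int]:
    """Page numbers the pager may link to: `radius` pages either side of `current`."""
    first = max(1, current - radius)
    last = current + radius
    if last_page is not None:
        last = min(last, last_page)   # do not offer pages past the known end
    return list(range(first, last + 1))


def is_jump_allowed(current: int, target: int, radius: int = 3,
                    last_page: Optional[int] = None) -> bool:
    """Reject any jump that leaves the small window around the current page."""
    return target in page_window(current, radius, last_page)


print(page_window(10))                # [7, 8, 9, 10, 11, 12, 13]
print(is_jump_allowed(10, 142360))    # False
```

With a check like this, a request for page 142360 from page 10 is refused before it ever reaches the data store.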

## General solution

The quick, short-term solution mainly comes down to the following points:

MySQL

Original paging SQL:

# Page 1
SELECT * FROM `year_score` where `year` = 2017 ORDER BY id limit 0, 20;
# Page N (the application computes the offset (N - 1) * 20; LIMIT does not accept expressions)
SELECT * FROM `year_score` where `year` = 2017 ORDER BY id limit (N - 1) * 20, 20;


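Using the context from the previous page, this is rewritten as:

# XXXX stands for a value that is already known, e.g. the last id of the previous page
SELECT * FROM `year_score` where `year` = 2017 and id > XXXX ORDER BY id limit 20;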


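As mentioned in the earlier article on SQL optimization and diagnosis (没内鬼，来点干货！SQL优化和诊断), LIMIT stops the query as soon as enough matching rows have been found, so the total number of rows scanned with this approach drops sharply and efficiency goes up dramatically!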
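The rewritten query can be tried end to end with a small script. This is a sketch under stated assumptions: an in-memory SQLite database stands in for MySQL, the `score` column and the sample rows are invented, and `next_page` is a hypothetical helper. The `WHERE ... AND id > ? ORDER BY id LIMIT ?` shape is the one from the rewritten SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE year_score (id INTEGER PRIMARY KEY, year INTEGER, score INTEGER)"
)
# Hypothetical sample data: odd ids belong to 2017, even ids to 2018.
conn.executemany(
    "INSERT INTO year_score (id, year, score) VALUES (?, ?, ?)",
    [(i, 2017 if i % 2 else 2018, i % 100) for i in range(1, 1001)],
)


def next_page(conn, year, last_id, page_size=20):
    """Fetch the page that follows `last_id`, seeking by id instead of using an offset."""
    return conn.execute(
        "SELECT id, year, score FROM year_score "
        "WHERE year = ? AND id > ? ORDER BY id LIMIT ?",
        (year, last_id, page_size),
    ).fetchall()


last_id = 0          # nothing seen yet, so start before the first id
pages = []
for _ in range(3):
    rows = next_page(conn, 2017, last_id)
    pages.append([row[0] for row in rows])
    last_id = rows[-1][0]   # remember where this page ended

print(pages[0][0], pages[0][-1])   # 1 39
print(pages[2][0], pages[2][-1])   # 81 119
```

Each call reads only the rows of its own page, regardless of how far into the result set the reader already is. The trade-off is that the client must carry the last id forward, which is why this fits scrolling and small-range jumps rather than random page jumps.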

ES

The approach is the same as for MySQL. With it we can use the FROM-TO API as we please, without having to worry about the maximum limit.

MongoDB

The approach is basically similar; the basic code is as follows:

[Figure: MongoDB paging code]
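The original code is not reproduced here, so the following is not the author's code, only a sketch of the same seek-by-id idea applied to MongoDB. It builds the query pieces without connecting to a server; the helper name `build_page_query` and the choice of `_id` as the sort key are assumptions.

```python
from typing import Any, Dict, List, Optional, Tuple


def build_page_query(
    last_id: Optional[Any], page_size: int = 20
) -> Tuple[Dict[str, Any], List[Tuple[str, int]], int]:
    """Return (filter, sort, limit) for the page that follows `last_id`."""
    query_filter: Dict[str, Any] = {}
    if last_id is not None:
        query_filter["_id"] = {"$gt": last_id}   # seek past the previous page
    return query_filter, [("_id", 1)], page_size


first = build_page_query(None)
following = build_page_query(12345)
print(first)       # ({}, [('_id', 1)], 20)
print(following)   # ({'_id': {'$gt': 12345}}, [('_id', 1)], 20)
```

With PyMongo, these pieces would typically be passed as `collection.find(query_filter).sort(sort).limit(limit)`, where `collection` would be something like the `t_data` collection used earlier. This replaces a growing `skip` with a range condition on an indexed field.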
