Query Optimization Strategies for Large Tables
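Suppose you have a large table holding millions, tens of millions, or even hundreds of millions of rows, and queries against it must return within ten seconds. What can you do, and how should you optimize?
In my current project, one table holds more than 10 million rows and over 3 GB of data. We need to run statistical queries against it. Because nothing had been optimized, queries on this table were extremely slow and users were frustrated, so I took part in optimizing it.
Here is a comparable example. Given the table structure below, we want to count the people born on a given day, the people in a given city, or the people in a given city born on a given day.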
CREATE TABLE `population` (
  `population_id` bigint(64) NOT NULL AUTO_INCREMENT COMMENT 'Population table ID',
  `name` varchar(128) COLLATE utf8_bin DEFAULT NULL COMMENT 'Name',
  `city` varchar(32) COLLATE utf8_bin DEFAULT NULL COMMENT 'City',
  `birthday` date DEFAULT NULL COMMENT 'Date of birth',
  PRIMARY KEY (`population_id`)
);

Count the people in a given city born on a given day:
SELECT COUNT(*) FROM population WHERE city='广州' AND birthday = '2014-11-02';

Count the people in a given city:
SELECT COUNT(*) FROM population WHERE city='广州';

Count the people born on a given day:
SELECT COUNT(*) FROM population WHERE birthday = '2014-11-02';
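We came up with two optimizations.
(1). Optimize the indexes
Adding indexes improved query performance dramatically: the common queries dropped from tens of seconds to a few seconds.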
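One option is to create the following two single-column indexes:
ALTER TABLE `population`
  ADD INDEX `fk_city` (`city`),
  ADD INDEX `fk_birthday` (`birthday`);
Alternatively, create the following two composite indexes. Because MySQL can use the leftmost prefix of a composite index, these two indexes together serve all three queries above:
ALTER TABLE `population`
  ADD INDEX `fk_index1` (`city`, `birthday`),
  ADD INDEX `fk_index2` (`birthday`, `city`);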
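To see which index serves which query, it helps to look at the query plan. The following is a minimal sketch, assuming SQLite (via Python's built-in sqlite3 module) as a stand-in for MySQL; the index names idx_city_birthday and idx_birthday_city are made up for this illustration, and on MySQL you would run EXPLAIN against the real table instead.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL so the demo runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE population ("
    " population_id INTEGER PRIMARY KEY,"
    " name TEXT, city TEXT, birthday TEXT)"
)
# Two composite indexes with the columns in opposite order (hypothetical names).
conn.execute("CREATE INDEX idx_city_birthday ON population (city, birthday)")
conn.execute("CREATE INDEX idx_birthday_city ON population (birthday, city)")


def plan(sql):
    """Return the plan text SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


plan_both = plan(
    "SELECT COUNT(*) FROM population "
    "WHERE city = '广州' AND birthday = '2014-11-02'"
)
plan_city = plan("SELECT COUNT(*) FROM population WHERE city = '广州'")
plan_birthday = plan(
    "SELECT COUNT(*) FROM population WHERE birthday = '2014-11-02'"
)

for label, text in [
    ("city + birthday", plan_both),
    ("city only", plan_city),
    ("birthday only", plan_birthday),
]:
    print(label, "->", text)
```

Each single-column filter can only use the index whose leftmost column matches it, which is why both column orders are created.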
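(2). Use an intermediate table
Index optimization cuts query time considerably, but once the data reaches a certain volume, and an index lookup still matches millions of rows, queries remain slow. Indexes therefore do not solve the problem at its root. Our table keeps growing by one to two million rows per month, so index tuning is only a temporary fix that works for smaller data volumes. As the data grows, the gains from indexes quickly hit their limit and another solution is needed.
Based on our business requirements, we created an intermediate table, population_statistics, which stores the aggregated statistics from population. Queries read directly from population_statistics. Note that every insert, delete, or update on population must also update population_statistics, otherwise the two tables become inconsistent!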
CREATE TABLE `population_statistics` (
  `population_statistics_id` bigint(64) NOT NULL AUTO_INCREMENT COMMENT 'Population statistics table ID',
  `city` varchar(128) COLLATE utf8_bin DEFAULT NULL COMMENT 'City',
  `birthday` date DEFAULT NULL COMMENT 'Date of birth',
  `total_count` int(32) DEFAULT NULL COMMENT 'Number of people',
  PRIMARY KEY (`population_statistics_id`),
  KEY `fk_city` (`city`),
  KEY `fk_birthday` (`birthday`)
);

Count the people in a given city born on a given day:
SELECT total_count FROM population_statistics WHERE city='广州' AND birthday = '2014-11-02';

Count the people in a given city:
SELECT SUM(total_count) FROM population_statistics WHERE city='广州';

Count the people born on a given day:
SELECT SUM(total_count) FROM population_statistics WHERE birthday = '2014-11-02';
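For a given city and day, population may hold thousands or even tens of thousands of rows, whereas population_statistics holds at most one. In other words, the statistics table is only a few thousandths the size of population, and combined with the indexes, queries become dramatically faster.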
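One way to keep the two tables consistent is to let the database maintain the counters itself. The sketch below is an illustration only, assuming SQLite triggers as a stand-in for MySQL triggers or application-level updates; the trigger names are hypothetical, the statistics table is simplified to a (city, birthday) primary key, and city and birthday are assumed to be non-NULL.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; trigger names are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE population (
    population_id INTEGER PRIMARY KEY,
    name TEXT, city TEXT, birthday TEXT
);
CREATE TABLE population_statistics (
    city TEXT NOT NULL,
    birthday TEXT NOT NULL,
    total_count INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (city, birthday)  -- at most one row per city and day
);

-- Insert: create the counter row if it is missing, then increment it.
CREATE TRIGGER population_ai AFTER INSERT ON population
BEGIN
    INSERT OR IGNORE INTO population_statistics (city, birthday, total_count)
    VALUES (NEW.city, NEW.birthday, 0);
    UPDATE population_statistics SET total_count = total_count + 1
    WHERE city = NEW.city AND birthday = NEW.birthday;
END;

-- Delete: decrement the counter of the removed row's group.
CREATE TRIGGER population_ad AFTER DELETE ON population
BEGIN
    UPDATE population_statistics SET total_count = total_count - 1
    WHERE city = OLD.city AND birthday = OLD.birthday;
END;

-- Update: move one unit from the old group to the new group.
CREATE TRIGGER population_au AFTER UPDATE OF city, birthday ON population
BEGIN
    UPDATE population_statistics SET total_count = total_count - 1
    WHERE city = OLD.city AND birthday = OLD.birthday;
    INSERT OR IGNORE INTO population_statistics (city, birthday, total_count)
    VALUES (NEW.city, NEW.birthday, 0);
    UPDATE population_statistics SET total_count = total_count + 1
    WHERE city = NEW.city AND birthday = NEW.birthday;
END;
""")


def count_by_city(city):
    """Read the city total from the statistics table only."""
    row = conn.execute(
        "SELECT COALESCE(SUM(total_count), 0) FROM population_statistics "
        "WHERE city = ?",
        (city,),
    ).fetchone()
    return row[0]


def raw_count_by_city(city):
    """Reference value computed from the base table, for comparison."""
    row = conn.execute(
        "SELECT COUNT(*) FROM population WHERE city = ?", (city,)
    ).fetchone()
    return row[0]


conn.executemany(
    "INSERT INTO population (name, city, birthday) VALUES (?, ?, ?)",
    [
        ("A", "广州", "2014-11-02"),
        ("B", "广州", "2014-11-02"),
        ("C", "深圳", "2014-11-02"),
    ],
)
conn.execute("UPDATE population SET city = '深圳' WHERE name = 'B'")
conn.execute("DELETE FROM population WHERE name = 'A'")
```

After these writes, count_by_city and raw_count_by_city should agree for every city. Triggers add cost to every write, so a periodic batch refresh may be preferable when slightly stale statistics are acceptable.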
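Below is a summary of common optimization approaches for large tables.
1. Index optimization
Build sensible, efficient indexes to speed up queries.
I recommend reading my blog post on indexes:
http://blog.csdn.net/brushli/article/details/39677387
2. SQL optimization
Structure and tune SQL statements so that queries run as efficiently as possible. In many cases this means taking into account how the indexes will be used.
I recommend reading the same blog post on indexes:
http://blog.csdn.net/brushli/article/details/39677387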
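3. Horizontal table splitting
If the data has some natural partitioning characteristic, such as time, the table can be split into several tables by time period.
For example, split by year, by quarter, or by month. At query time, break the query up by time period, run it against each relevant table, and merge the results.
Or split by region: store each region's data in its own table, break the query up by region, and merge the results.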
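The split-query-and-merge step can be sketched as follows. This is a minimal illustration, assuming one table per birth year in SQLite; the population_<year> naming scheme and the helper functions are hypothetical, not part of the original project.

```python
import sqlite3
from datetime import date

# Assumption: one table per year, with SQLite standing in for MySQL.
conn = sqlite3.connect(":memory:")
YEARS = (2013, 2014, 2015)


def table_for(year):
    """Name of the per-year table (hypothetical naming scheme)."""
    return f"population_{year}"


for year in YEARS:
    conn.execute(
        f"CREATE TABLE {table_for(year)} ("
        " population_id INTEGER PRIMARY KEY,"
        " name TEXT, city TEXT, birthday TEXT)"
    )


def insert_person(name, city, birthday):
    """Route the row to the table that matches its birth year."""
    conn.execute(
        f"INSERT INTO {table_for(birthday.year)} (name, city, birthday) "
        "VALUES (?, ?, ?)",
        (name, city, birthday.isoformat()),
    )


def count_born_between(start, end):
    """Query only the tables that overlap [start, end], then merge the counts."""
    total = 0
    for year in range(start.year, end.year + 1):
        if year not in YEARS:
            continue  # no table exists for that year, so nothing to count
        (n,) = conn.execute(
            f"SELECT COUNT(*) FROM {table_for(year)} "
            "WHERE birthday BETWEEN ? AND ?",
            (start.isoformat(), end.isoformat()),
        ).fetchone()
        total += n
    return total


insert_person("A", "广州", date(2013, 5, 1))
insert_person("B", "广州", date(2013, 7, 1))
insert_person("C", "深圳", date(2014, 11, 2))
insert_person("D", "深圳", date(2015, 1, 1))

mid_range = count_born_between(date(2013, 6, 1), date(2014, 12, 31))
only_2015 = count_born_between(date(2015, 1, 1), date(2015, 12, 31))
```

A query limited to 2015 touches a single table, so its cost no longer depends on how much data the other years hold.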
4. Vertical table splitting
Split the table into several tables by column: frequently used columns go in one table, while rarely used or large columns go in another. The database reads data in fixed-size blocks (pages), typically from ten-odd to a few dozen KB (16 KB by default in InnoDB). After splitting by column, a single I/O usually retrieves many more rows, so query efficiency improves.
For example, consider a member table with the following structure:
CREATE TABLE `member` (
  `member_id` bigint(64) NOT NULL AUTO_INCREMENT COMMENT 'Member table ID',
  `name` varchar(128) COLLATE utf8_bin DEFAULT NULL COMMENT 'Member name',
  `age` int(32) DEFAULT NULL COMMENT 'Member age',
  `introduction` text COLLATE utf8_bin COMMENT 'Member introduction',
  PRIMARY KEY (`member_id`)
);
introduction is a large column that stores the member's introduction. It can seriously hurt query performance, so it can be moved out into a table of its own:
CREATE TABLE `member` (
  `member_id` bigint(64) NOT NULL AUTO_INCREMENT COMMENT 'Member table ID',
  `name` varchar(128) COLLATE utf8_bin DEFAULT NULL COMMENT 'Member name',
  `age` int(32) DEFAULT NULL COMMENT 'Member age',
  PRIMARY KEY (`member_id`)
);

CREATE TABLE `member_introduction` (
  `member_introduction_id` bigint(64) NOT NULL AUTO_INCREMENT COMMENT 'Member introduction table ID',
  `member_id` bigint(64) DEFAULT NULL COMMENT 'Member ID',
  `introduction` text COLLATE utf8_bin COMMENT 'Member introduction',
  PRIMARY KEY (`member_introduction_id`),
  KEY `fk_member_id` (`member_id`),
  CONSTRAINT `fk_member_id` FOREIGN KEY (`member_id`) REFERENCES `member` (`member_id`)
);
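With the table split this way, list-style queries read only the narrow member table, and the large column is joined in only when a single member's details are needed. The sketch below illustrates that access pattern, assuming SQLite as a stand-in for MySQL; the helper functions add_member, list_members, and member_detail are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; helper names are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE member (
    member_id INTEGER PRIMARY KEY,
    name TEXT,
    age INTEGER
);
CREATE TABLE member_introduction (
    member_introduction_id INTEGER PRIMARY KEY,
    member_id INTEGER REFERENCES member (member_id),
    introduction TEXT
);
CREATE INDEX fk_member_id ON member_introduction (member_id);
""")


def add_member(name, age, introduction):
    """Write the narrow row and the large text into their separate tables."""
    cur = conn.execute(
        "INSERT INTO member (name, age) VALUES (?, ?)", (name, age)
    )
    conn.execute(
        "INSERT INTO member_introduction (member_id, introduction) "
        "VALUES (?, ?)",
        (cur.lastrowid, introduction),
    )
    return cur.lastrowid


def list_members(min_age):
    """Listing reads only the narrow table; the text column is never touched."""
    return conn.execute(
        "SELECT member_id, name, age FROM member "
        "WHERE age >= ? ORDER BY member_id",
        (min_age,),
    ).fetchall()


def member_detail(member_id):
    """The detail view joins in the introduction for one member only."""
    return conn.execute(
        "SELECT m.name, m.age, i.introduction "
        "FROM member AS m "
        "LEFT JOIN member_introduction AS i ON i.member_id = m.member_id "
        "WHERE m.member_id = ?",
        (member_id,),
    ).fetchone()


alice_id = add_member("Alice", 30, "x" * 10000)  # large introduction text
bob_id = add_member("Bob", 20, "short")
adults_over_25 = list_members(25)
bob = member_detail(bob_id)
```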
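5. Build intermediate tables, trading space for time
In some cases an intermediate table can speed up queries. See the example at the beginning of this article for details.
6. Cache data in memory, trading space for time
Load data that is read often but rarely modified into memory and query it directly from there.
Popular caching technologies such as Memcache, Redis, and Ehcache can be used.
7. Use other supporting technologies
Solr: a Java-based search engine technology built on Lucene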
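The caching idea can be sketched with a small in-process cache. This is an illustration only: the TTLCache class and slow_city_count loader below are hypothetical, and a plain dictionary stands in for Memcache, Redis, or Ehcache, whose real client APIs are not shown here.

```python
import time


class TTLCache:
    """Tiny in-process cache; a stand-in for Memcache, Redis or Ehcache."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.store = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        """Return the cached value, calling loader() only on a miss or expiry."""
        entry = self.store.get(key)
        now = self.clock()
        if entry is not None and entry[0] > now:
            return entry[1]
        value = loader()
        self.store[key] = (now + self.ttl, value)
        return value

    def invalidate(self, key):
        """Drop a key after the underlying data has been modified."""
        self.store.pop(key, None)


calls = []  # records how often the slow loader actually runs


def slow_city_count():
    """Hypothetical loader; in practice this would run the SQL query."""
    calls.append(1)
    return 12345


cache = TTLCache(ttl_seconds=60)
first = cache.get_or_load("count:广州", slow_city_count)
second = cache.get_or_load("count:广州", slow_city_count)  # served from memory
cache.invalidate("count:广州")  # e.g. after the table was modified
third = cache.get_or_load("count:广州", slow_city_count)  # loaded again
```

The expiry time bounds how stale a cached value can get, and explicit invalidation on writes covers the data that does change occasionally.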
