首页 数据库 mysql教程 Parallel Query for MySQL with Shard-Query_MySQL

Parallel Query for MySQL with Shard-Query_MySQL

Jun 01, 2016 pm 01:16 PM

While Shard-Query can work over multiple nodes, this blog post focuses on using Shard-Query with a single node.  Shard-Query can add parallelism to queries which use partitionedtables.  Very large tables can often be partitioned fairly easily. Shard-Query can leverage partitioning to add paralellism, because each partition can be queried independently. Because MySQL 5.6 supports the partition hint, Shard-Query can add parallelism to any partitioning method (even subpartioning) on 5.6 but it is limited to RANGE/LIST partitioning methods on early versions.

The output from Shard-Query is from the commandline client, but you can use MySQL proxy to communicate with Shard-Query too.

In the examples I am going to use the schema from the Star Schema Benchmark.  I generated data for scale factor 10, which means about 6GB of data in the largest table. I am going to show a few different queries, and explain how Shard-Query executes them in parallel.

Here is the DDL for the lineorder table, which I will use for the demo queries:

CREATE TABLE IF NOT EXISTS lineorder( LO_OrderKey bigint not null, LO_LineNumber tinyint not null, LO_CustKey int not null, LO_PartKey int not null, LO_SuppKey int not null, LO_OrderDateKey int not null, LO_OrderPriority varchar(15), LO_ShipPriority char(1), LO_Quantity tinyint, LO_ExtendedPrice decimal, LO_OrdTotalPrice decimal, LO_Discount decimal, LO_Revenue decimal, LO_SupplyCost decimal, LO_Tax tinyint, LO_CommitDateKey int not null, LO_ShipMode varchar(10), primary key(LO_OrderDateKey,LO_PartKey,LO_SuppKey,LO_Custkey,LO_OrderKey,LO_LineNumber)) PARTITION BY HASH(LO_OrderDateKey) PARTITIONS 8;
登录后复制

CREATETABLEIFNOTEXISTSlineorder

(

LO_OrderKeybigintnotnull,

LO_LineNumbertinyintnotnull,

LO_CustKeyintnotnull,

LO_PartKeyintnotnull,

LO_SuppKeyintnotnull,

LO_OrderDateKeyintnotnull,

LO_OrderPriorityvarchar(15),

LO_ShipPrioritychar(1),

LO_Quantitytinyint,

LO_ExtendedPricedecimal,

LO_OrdTotalPricedecimal,

LO_Discountdecimal,

LO_Revenuedecimal,

LO_SupplyCostdecimal,

LO_Taxtinyint,

LO_CommitDateKeyintnotnull,

LO_ShipModevarchar(10),

primarykey(LO_OrderDateKey,LO_PartKey,LO_SuppKey,LO_Custkey,LO_OrderKey,LO_LineNumber)

)PARTITIONBYHASH(LO_OrderDateKey)PARTITIONS8;

Notice that the lineorder table is partitioned by HASH(LO_OrderDateKey) into 8 partitions.  I used 8 partitions and my test box has 4 cores. It does not hurt to have more partitions than cores. A number of partitions that is two or three times the number of cores generally works best because it keeps each partition small, and smaller partitions are faster to scan. If you have a very large table, a larger number of partitions may be acceptable. Shard-Query will submit a query to Gearman for each partition, and the number of Gearman workers controls the parallelism.

The SQL for the first demo is:

SELECT COUNT(DISTINCT LO_OrderDateKey) FROM lineorder;
登录后复制

SELECTCOUNT(DISTINCTLO_OrderDateKey)FROMlineorder;

Here is the explain from regular MySQL:

mysql> explain select count(distinct LO_OrderDateKey) from lineorder/G*************************** 1. row *************************** id: 1select_type: SIMPLEtable: lineorder type: indexpossible_keys: PRIMARYkey: PRIMARYkey_len: 25ref: NULL rows: 58922188Extra: Using index1 row in set (0.00 sec)
登录后复制

mysql>explainselectcount(distinctLO_OrderDateKey)fromlineorder/G

***************************1.row***************************

          id:1

  select_type:SIMPLE

        table:lineorder

        type:index

possible_keys:PRIMARY

          key:PRIMARY

      key_len:25

          ref:NULL

        rows:58922188

        Extra:Usingindex

1rowinset(0.00sec)

So it is basically a full table scan. It takes a long time:

mysql> select count(distinct LO_OrderDateKey) from lineorder;+---------------------------------+| count(distinct LO_OrderDateKey) |+---------------------------------+|2406 |+---------------------------------+1 row in set (4 min 48.63 sec)
登录后复制

mysql>selectcount(distinctLO_OrderDateKey)fromlineorder;

+---------------------------------+

|count(distinctLO_OrderDateKey)|

+---------------------------------+

|                            2406|

+---------------------------------+

1rowinset(4min48.63sec)

Shard-Query executes this query differently from MySQL. It sends a query to each partition, in parallel like the following queries:

Array([0] => SELECT LO_OrderDateKey AS expr_2839651562FROM lineorderPARTITION(p0)AS `lineorder` WHERE 1=1AND 1=1GROUP BY LO_OrderDateKey[1] => SELECT LO_OrderDateKey AS expr_2839651562FROM lineorderPARTITION(p1)AS `lineorder` WHERE 1=1AND 1=1GROUP BY LO_OrderDateKey[2] => SELECT LO_OrderDateKey AS expr_2839651562FROM lineorderPARTITION(p2)AS `lineorder` WHERE 1=1AND 1=1GROUP BY LO_OrderDateKey[3] => SELECT LO_OrderDateKey AS expr_2839651562FROM lineorderPARTITION(p3)AS `lineorder` WHERE 1=1AND 1=1GROUP BY LO_OrderDateKey[4] => SELECT LO_OrderDateKey AS expr_2839651562FROM lineorderPARTITION(p4)AS `lineorder` WHERE 1=1AND 1=1GROUP BY LO_OrderDateKey[5] => SELECT LO_OrderDateKey AS expr_2839651562FROM lineorderPARTITION(p5)AS `lineorder` WHERE 1=1AND 1=1GROUP BY LO_OrderDateKey[6] => SELECT LO_OrderDateKey AS expr_2839651562FROM lineorderPARTITION(p6)AS `lineorder` WHERE 1=1AND 1=1GROUP BY LO_OrderDateKey[7] => SELECT LO_OrderDateKey AS expr_2839651562FROM lineorderPARTITION(p7)AS `lineorder` WHERE 1=1AND 1=1GROUP BY LO_OrderDateKey)
登录后复制
Array(

    [0]=>SELECTLO_OrderDateKeyASexpr_2839651562

FROMlineorder  PARTITION(p0)  AS`lineorder`  WHERE1=1  AND1=1  GROUPBYLO_OrderDateKey

    [1]=>SELECTLO_OrderDateKeyASexpr_2839651562

FROMlineorder  PARTITION(p1)  AS`lineorder`  WHERE1=1  AND1=1  GROUPBYLO_OrderDateKey

    [2]=>SELECTLO_OrderDateKeyASexpr_2839651562

FROMlineorder  PARTITION(p2)  AS`lineorder`  WHERE1=1  AND1=1  GROUPBYLO_OrderDateKey

    [3]=>SELECTLO_OrderDateKeyASexpr_2839651562

FROMlineorder  PARTITION(p3)  AS`lineorder`  WHERE1=1  AND1=1  GROUPBYLO_OrderDateKey

    [4]=>SELECTLO_OrderDateKeyASexpr_2839651562

FROMlineorder  PARTITION(p4)  AS`lineorder`  WHERE1=1  AND1=1  GROUPBYLO_OrderDateKey

    [5]=>SELECTLO_OrderDateKeyASexpr_2839651562

FROMlineorder  PARTITION(p5)  AS`lineorder`  WHERE1=1  AND1=1  GROUPBYLO_OrderDateKey

    [6]=>SELECTLO_OrderDateKeyASexpr_2839651562

FROMlineorder  PARTITION(p6)  AS`lineorder`  WHERE1=1  AND1=1  GROUPBYLO_OrderDateKey

    [7]=>SELECTLO_OrderDateKeyASexpr_2839651562

FROMlineorder  PARTITION(p7)  AS`lineorder`  WHERE1=1  AND1=1  GROUPBYLO_OrderDateKey

)

You will notice that there is one query for each partition.  Those queries will be sent to Gearman and executed in parallel by as many Gearman workers as possible (in this case 4.)  The output of the queries go into a coordinator table, and then another query does a final aggregation.  That query looks like this:

SELECT COUNT(distinct expr_2839651562) AS `count`FROM `aggregation_tmp_73522490`
登录后复制

SELECTCOUNT(distinctexpr_2839651562)AS`count`

FROM`aggregation_tmp_73522490`

The Shard-Query time:

select count(distinct LO_OrderDateKey) from lineorder;Array([count ] => 2406)1 rows returnedExec time: 0.10923719406128
登录后复制

selectcount(distinctLO_OrderDateKey)fromlineorder;

Array(

    [count]=>2406

)1rowsreturned

Exectime:0.10923719406128

That isn’t a typo, it really issub-secondcompared tominutesin regular MySQL.

This is because Shard-Query usesGROUP BYto answer this query and a loose index scanof the PRIMARY KEY is possible:

mysql> explain partitions SELECT LO_OrderDateKey AS expr_2839651562-> FROM lineorderPARTITION(p7)AS `lineorder` WHERE 1=1AND 1=1GROUP BY LO_OrderDateKey-> /G*************************** 1. row *************************** id: 1select_type: SIMPLEtable: lineorder partitions: p7 type: rangepossible_keys: PRIMARYkey: PRIMARYkey_len: 4ref: NULL rows: 80108Extra: Using index for group-by1 row in set (0.00 sec)
登录后复制

mysql>explainpartitionsSELECTLO_OrderDateKeyASexpr_2839651562

    ->FROMlineorder  PARTITION(p7)  AS`lineorder`  WHERE1=1  AND1=1  GROUPBYLO_OrderDateKey

    ->/G

***************************1.row***************************

          id:1

  select_type:SIMPLE

        table:lineorder

  partitions:p7

        type:range

possible_keys:PRIMARY

          key:PRIMARY

      key_len:4

          ref:NULL

        rows:80108

        Extra:Usingindexforgroup-by

1rowinset(0.00sec)

Next another simple query will be tested, first on regular MySQL:

mysql> select count(*) from lineorder;+----------+| count(*) |+----------+| 59986052 |+----------+1 row in set (4 min 8.70 sec)
登录后复制

mysql>selectcount(*)fromlineorder;

+----------+|count(*)|+----------+|59986052|+----------+

1rowinset(4min8.70sec)

Again, the EXPLAIN shows a full table scan:

mysql> explain select count(*) from lineorder/G*************************** 1. row *************************** id: 1select_type: SIMPLEtable: lineorder type: indexpossible_keys: NULLkey: PRIMARYkey_len: 25ref: NULL rows: 58922188Extra: Using index1 row in set (0.00 sec)
登录后复制

mysql>explainselectcount(*)fromlineorder/G

***************************1.row***************************

          id:1

  select_type:SIMPLE

        table:lineorder

        type:index

possible_keys:NULL

          key:PRIMARY

      key_len:25

          ref:NULL

        rows:58922188

        Extra:Usingindex

1rowinset(0.00sec)

Now, Shard-Query can’t do anything special to speed up this query, except to execute it in parallel, similar to the first query:

[0] => SELECT COUNT(*) AS expr_3190753946FROM lineorder PARTITION(p0) AS `lineorder` WHERE 1=1 AND 1=1[1] => SELECT COUNT(*) AS expr_3190753946FROM lineorder PARTITION(p1) AS `lineorder` WHERE 1=1 AND 1=1[2] => SELECT COUNT(*) AS expr_3190753946FROM lineorder PARTITION(p2) AS `lineorder` WHERE 1=1 AND 1=1[3] => SELECT COUNT(*) AS expr_3190753946FROM lineorder PARTITION(p3) AS `lineorder` WHERE 1=1 AND 1=1...
登录后复制

[0]=>SELECTCOUNT(*)ASexpr_3190753946

FROMlineorderPARTITION(p0)AS`lineorder`WHERE1=1AND1=1

[1]=>SELECTCOUNT(*)ASexpr_3190753946

FROMlineorderPARTITION(p1)AS`lineorder`WHERE1=1AND1=1

[2]=>SELECTCOUNT(*)ASexpr_3190753946

FROMlineorderPARTITION(p2)AS`lineorder`WHERE1=1AND1=1

[3]=>SELECTCOUNT(*)ASexpr_3190753946

FROMlineorderPARTITION(p3)AS`lineorder`WHERE1=1AND1=1

...

The aggregation SQL is similar, but this time the aggregate function is changed to SUM to combine the COUNT from each partition:

SELECT SUM(expr_3190753946) AS ` count `FROM `aggregation_tmp_51969525`
登录后复制

SELECTSUM(expr_3190753946)AS`count`

FROM`aggregation_tmp_51969525`

And the query is quite a bit faster at 140.24 second compared with MySQL’s 248.7 second result:

Array([count ] => 59986052)1 rows returnedExec time: 140.24419403076
登录后复制
Array(

[count]=>59986052

)1rowsreturned

Exectime:140.24419403076

Finally, I want to look at a more complex query that uses joins and aggregation.

mysql> explain select d_year, c_nation,sum(lo_revenue - lo_supplycost) as profitfrom lineorderjoin dim_dateon lo_orderdatekey = d_datekeyjoin customeron lo_custkey = c_customerkeyjoin supplieron lo_suppkey = s_suppkeyjoin parton lo_partkey = p_partkeywherec_region = 'AMERICA'and s_region = 'AMERICA'and (p_mfgr = 'MFGR#1'or p_mfgr = 'MFGR#2')group by d_year, c_nationorder by d_year, c_nation;+----+-------------+-----------+--------+---------------+---------+---------+--------------------------+------+---------------------------------+| id | select_type | table | type | possible_keys | key | key_len | ref| rows | Extra |+----+-------------+-----------+--------+---------------+---------+---------+--------------------------+------+---------------------------------+|1 | SIMPLE| dim_date| ALL| PRIMARY | NULL| NULL| NULL |5 | Using temporary; Using filesort ||1 | SIMPLE| lineorder | ref| PRIMARY | PRIMARY | 4 | ssb.dim_date.D_DateKey | 89 | NULL||1 | SIMPLE| supplier| eq_ref | PRIMARY | PRIMARY | 4 | ssb.lineorder.LO_SuppKey |1 | Using where ||1 | SIMPLE| customer| eq_ref | PRIMARY | PRIMARY | 4 | ssb.lineorder.LO_CustKey |1 | Using where ||1 | SIMPLE| part| eq_ref | PRIMARY | PRIMARY | 4 | ssb.lineorder.LO_PartKey |1 | Using where |+----+-------------+-----------+--------+---------------+---------+---------+--------------------------+------+---------------------------------+5 rows in set (0.01 sec)
登录后复制

mysql>explainselectd_year,c_nation,  sum(lo_revenue-lo_supplycost)asprofit  fromlineorder  

joindim_date  onlo_orderdatekey=d_datekey  

joincustomer  onlo_custkey=c_customerkey  

joinsupplier  onlo_suppkey=s_suppkey  

joinpart  onlo_partkey=p_partkey  

where  c_region='AMERICA'  ands_region='AMERICA'  

and(p_mfgr='MFGR#1'  orp_mfgr='MFGR#2')  

groupbyd_year,c_nation  orderbyd_year,c_nation;

+----+-------------+-----------+--------+---------------+---------+---------+--------------------------+------+---------------------------------+

|id|select_type|table    |type  |possible_keys|key    |key_len|ref                      |rows|Extra                          |

+----+-------------+-----------+--------+---------------+---------+---------+--------------------------+------+---------------------------------+

|  1|SIMPLE      |dim_date  |ALL    |PRIMARY      |NULL    |NULL    |NULL                    |    5|Usingtemporary;Usingfilesort|

|  1|SIMPLE      |lineorder|ref    |PRIMARY      |PRIMARY|4      |ssb.dim_date.D_DateKey  |  89|NULL                            |

|  1|SIMPLE      |supplier  |eq_ref|PRIMARY      |PRIMARY|4      |ssb.lineorder.LO_SuppKey|    1|Usingwhere                    |

|  1|SIMPLE      |customer  |eq_ref|PRIMARY      |PRIMARY|4      |ssb.lineorder.LO_CustKey|    1|Usingwhere                    |

|  1|SIMPLE      |part      |eq_ref|PRIMARY      |PRIMARY|4      |ssb.lineorder.LO_PartKey|    1|Usingwhere                    |

+----+-------------+-----------+--------+---------------+---------+---------+--------------------------+------+---------------------------------+

5rowsinset(0.01sec)

Here is the query on regular MySQL:

mysql> select d_year, c_nation,sum(lo_revenue - lo_supplycost) as profitfrom lineorderjoin dim_dateon lo_orderdatekey = d_datekeyjoin customeron lo_custkey = c_customerkeyjoin supplieron lo_suppkey = s_suppkeyjoin parton lo_partkey = p_partkeywherec_region = 'AMERICA'and s_region = 'AMERICA'and (p_mfgr = 'MFGR#1'or p_mfgr = 'MFGR#2')group by d_year, c_nationorder by d_year, c_nation;+--------+---------------+--------------+| d_year | c_nation| profit |+--------+---------------+--------------+| 1992 | ARGENTINA | 102741829748 |...| 1998 | UNITED STATES |61345891337 |+--------+---------------+--------------+35 rows in set (11 min 56.79 sec)
登录后复制

mysql>selectd_year,c_nation,  sum(lo_revenue-lo_supplycost)asprofit  fromlineorder  joindim_date  onlo_orderdatekey=d_datekey  joincustomer  onlo_custkey=c_customerkey  joinsupplier  onlo_suppkey=s_suppkey  joinpart  onlo_partkey=p_partkey  where  c_region='AMERICA'  ands_region='AMERICA'  and(p_mfgr='MFGR#1'  orp_mfgr='MFGR#2')  groupbyd_year,c_nation  orderbyd_year,c_nation;

+--------+---------------+--------------+

|d_year|c_nation      |profit      |

+--------+---------------+--------------+

|  1992|ARGENTINA    |102741829748|

...

|  1998|UNITEDSTATES|  61345891337|

+--------+---------------+--------------+

35rowsinset(11min56.79sec)

Again, Shard-Query splits up the query to run over each partition (I won’t bore you with the details) and it executes the query faster than MySQL, in 343.3 second compared to ~720:

Array([d_year] => 1998[c_nation] => UNITED STATES[profit] => 61345891337)35 rows returnedExec time: 343.29854893684
登录后复制
Array(

    [d_year]=>1998

    [c_nation]=>UNITEDSTATES

    [profit]=>61345891337

)35rowsreturned

Exectime:343.29854893684

I hope you see how using Shard-Query can speed up queries without using sharding, on just a single server. All you really need to do is add partitioning.

You can get Shard-Query from GitHub at http://github.com/greenlion/swanhart-tools

Please note: Configure and install Shard-Query as normal, but simply use one node and set thecolumnoption (the shard column) to “nocolumn” or false, because you are not required to use a shard column if you are not sharding.

本站声明
本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系admin@php.cn

热AI工具

Undresser.AI Undress

Undresser.AI Undress

人工智能驱动的应用程序,用于创建逼真的裸体照片

AI Clothes Remover

AI Clothes Remover

用于从照片中去除衣服的在线人工智能工具。

Undress AI Tool

Undress AI Tool

免费脱衣服图片

Clothoff.io

Clothoff.io

AI脱衣机

AI Hentai Generator

AI Hentai Generator

免费生成ai无尽的。

热门文章

R.E.P.O.能量晶体解释及其做什么(黄色晶体)
2 周前 By 尊渡假赌尊渡假赌尊渡假赌
仓库:如何复兴队友
4 周前 By 尊渡假赌尊渡假赌尊渡假赌
Hello Kitty Island冒险:如何获得巨型种子
4 周前 By 尊渡假赌尊渡假赌尊渡假赌

热工具

记事本++7.3.1

记事本++7.3.1

好用且免费的代码编辑器

SublimeText3汉化版

SublimeText3汉化版

中文版,非常好用

禅工作室 13.0.1

禅工作室 13.0.1

功能强大的PHP集成开发环境

Dreamweaver CS6

Dreamweaver CS6

视觉化网页开发工具

SublimeText3 Mac版

SublimeText3 Mac版

神级代码编辑软件(SublimeText3)

减少在Docker中使用MySQL内存的使用 减少在Docker中使用MySQL内存的使用 Mar 04, 2025 pm 03:52 PM

本文探讨了Docker中的优化MySQL内存使用量。 它讨论了监视技术(Docker统计,性能架构,外部工具)和配置策略。 其中包括Docker内存限制,交换和cgroups

mysql无法打开共享库怎么解决 mysql无法打开共享库怎么解决 Mar 04, 2025 pm 04:01 PM

本文介绍了MySQL的“无法打开共享库”错误。 该问题源于MySQL无法找到必要的共享库(.SO/.DLL文件)。解决方案涉及通过系统软件包M验证库安装

如何使用Alter Table语句在MySQL中更改表? 如何使用Alter Table语句在MySQL中更改表? Mar 19, 2025 pm 03:51 PM

本文讨论了使用MySQL的Alter Table语句修改表,包括添加/删除列,重命名表/列以及更改列数据类型。

在 Linux 中运行 MySQl(有/没有带有 phpmyadmin 的 podman 容器) 在 Linux 中运行 MySQl(有/没有带有 phpmyadmin 的 podman 容器) Mar 04, 2025 pm 03:54 PM

本文比较使用/不使用PhpMyAdmin的Podman容器直接在Linux上安装MySQL。 它详细介绍了每种方法的安装步骤,强调了Podman在孤立,可移植性和可重复性方面的优势,还

什么是 SQLite?全面概述 什么是 SQLite?全面概述 Mar 04, 2025 pm 03:55 PM

本文提供了SQLite的全面概述,SQLite是一个独立的,无服务器的关系数据库。 它详细介绍了SQLite的优势(简单,可移植性,易用性)和缺点(并发限制,可伸缩性挑战)。 c

在MacOS上运行多个MySQL版本:逐步指南 在MacOS上运行多个MySQL版本:逐步指南 Mar 04, 2025 pm 03:49 PM

本指南展示了使用自制在MacOS上安装和管理多个MySQL版本。 它强调使用自制装置隔离安装,以防止冲突。 本文详细详细介绍了安装,起始/停止服务和最佳PRA

如何为MySQL连接配置SSL/TLS加密? 如何为MySQL连接配置SSL/TLS加密? Mar 18, 2025 pm 12:01 PM

文章讨论了为MySQL配置SSL/TLS加密,包括证书生成和验证。主要问题是使用自签名证书的安全含义。[角色计数:159]

哪些流行的MySQL GUI工具(例如MySQL Workbench,PhpMyAdmin)是什么? 哪些流行的MySQL GUI工具(例如MySQL Workbench,PhpMyAdmin)是什么? Mar 21, 2025 pm 06:28 PM

文章讨论了流行的MySQL GUI工具,例如MySQL Workbench和PhpMyAdmin,比较了它们对初学者和高级用户的功能和适合性。[159个字符]

See all articles