Parallel Query for MySQL with Shard-Query
While Shard-Query can work over multiple nodes, this blog post focuses on using Shard-Query with a single node. Shard-Query can add parallelism to queries that use partitioned tables. Very large tables can often be partitioned fairly easily. Shard-Query can leverage partitioning to add parallelism, because each partition can be queried independently. Because MySQL 5.6 supports the partition hint, Shard-Query can add parallelism to any partitioning method (even subpartitioning) on 5.6, but it is limited to RANGE/LIST partitioning methods on earlier versions.
The output from Shard-Query is from the command-line client, but you can use MySQL Proxy to communicate with Shard-Query too.
In the examples I am going to use the schema from the Star Schema Benchmark. I generated data for scale factor 10, which means about 6GB of data in the largest table. I am going to show a few different queries, and explain how Shard-Query executes them in parallel.
Here is the DDL for the lineorder table, which I will use for the demo queries:
CREATE TABLE IF NOT EXISTS lineorder (
  LO_OrderKey bigint not null,
  LO_LineNumber tinyint not null,
  LO_CustKey int not null,
  LO_PartKey int not null,
  LO_SuppKey int not null,
  LO_OrderDateKey int not null,
  LO_OrderPriority varchar(15),
  LO_ShipPriority char(1),
  LO_Quantity tinyint,
  LO_ExtendedPrice decimal,
  LO_OrdTotalPrice decimal,
  LO_Discount decimal,
  LO_Revenue decimal,
  LO_SupplyCost decimal,
  LO_Tax tinyint,
  LO_CommitDateKey int not null,
  LO_ShipMode varchar(10),
  primary key (LO_OrderDateKey, LO_PartKey, LO_SuppKey, LO_Custkey, LO_OrderKey, LO_LineNumber)
) PARTITION BY HASH(LO_OrderDateKey) PARTITIONS 8;
Notice that the lineorder table is partitioned by HASH(LO_OrderDateKey) into 8 partitions. I used 8 partitions and my test box has 4 cores. It does not hurt to have more partitions than cores. A number of partitions that is two or three times the number of cores generally works best because it keeps each partition small, and smaller partitions are faster to scan. If you have a very large table, a larger number of partitions may be acceptable. Shard-Query will submit a query to Gearman for each partition, and the number of Gearman workers controls the parallelism.
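To make the partition and worker arithmetic concrete, here is a minimal Python sketch. It is not Shard-Query's own code (Shard-Query is written in PHP and dispatches through Gearman). It assumes MySQL's documented rule that PARTITION BY HASH stores a row in partition MOD(expr, number of partitions), and it uses a thread pool as a stand-in for the Gearman workers. The helper names `partition_for` and `build_partition_query` and the sample date keys are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_PARTITIONS = 8  # PARTITIONS 8 in the DDL above
NUM_WORKERS = 4     # stand-in for the number of Gearman workers


def partition_for(order_date_key: int) -> str:
    """Name of the partition a row lands in under PARTITION BY HASH."""
    return f"p{order_date_key % NUM_PARTITIONS}"


def build_partition_query(name: str) -> str:
    """Hypothetical task: the per-partition query a worker would run."""
    return f"SELECT COUNT(*) FROM lineorder PARTITION({name})"


# Made-up date keys, only to show how values map to partitions.
sample_keys = [19920101, 19920102, 19920103]
placement = {key: partition_for(key) for key in sample_keys}

# Eight tasks are submitted, but at most four run at any moment.
names = [f"p{i}" for i in range(NUM_PARTITIONS)]
with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
    queries = list(pool.map(build_partition_query, names))

print(placement)
print(len(queries), "queries dispatched")
```

With more partitions than workers, a worker that finishes a small partition simply picks up the next one, which is why the partition count does not have to match the core count.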
The SQL for the first demo is:
SELECT COUNT(DISTINCT LO_OrderDateKey) FROM lineorder;
Here is the explain from regular MySQL:
mysql> explain select count(distinct LO_OrderDateKey) from lineorder\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: lineorder
         type: index
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 25
          ref: NULL
         rows: 58922188
        Extra: Using index
1 row in set (0.00 sec)
So it is basically a full table scan. It takes a long time:
mysql> select count(distinct LO_OrderDateKey) from lineorder;
+---------------------------------+
| count(distinct LO_OrderDateKey) |
+---------------------------------+
|                            2406 |
+---------------------------------+
1 row in set (4 min 48.63 sec)
Shard-Query executes this query differently from MySQL. It sends one query to each partition, in parallel, like the following:
Array
(
    [0] => SELECT LO_OrderDateKey AS expr_2839651562
           FROM lineorder PARTITION(p0) AS `lineorder` WHERE 1=1 AND 1=1 GROUP BY LO_OrderDateKey
    [1] => SELECT LO_OrderDateKey AS expr_2839651562
           FROM lineorder PARTITION(p1) AS `lineorder` WHERE 1=1 AND 1=1 GROUP BY LO_OrderDateKey
    [2] => SELECT LO_OrderDateKey AS expr_2839651562
           FROM lineorder PARTITION(p2) AS `lineorder` WHERE 1=1 AND 1=1 GROUP BY LO_OrderDateKey
    [3] => SELECT LO_OrderDateKey AS expr_2839651562
           FROM lineorder PARTITION(p3) AS `lineorder` WHERE 1=1 AND 1=1 GROUP BY LO_OrderDateKey
    [4] => SELECT LO_OrderDateKey AS expr_2839651562
           FROM lineorder PARTITION(p4) AS `lineorder` WHERE 1=1 AND 1=1 GROUP BY LO_OrderDateKey
    [5] => SELECT LO_OrderDateKey AS expr_2839651562
           FROM lineorder PARTITION(p5) AS `lineorder` WHERE 1=1 AND 1=1 GROUP BY LO_OrderDateKey
    [6] => SELECT LO_OrderDateKey AS expr_2839651562
           FROM lineorder PARTITION(p6) AS `lineorder` WHERE 1=1 AND 1=1 GROUP BY LO_OrderDateKey
    [7] => SELECT LO_OrderDateKey AS expr_2839651562
           FROM lineorder PARTITION(p7) AS `lineorder` WHERE 1=1 AND 1=1 GROUP BY LO_OrderDateKey
)
You will notice that there is one query for each partition. Those queries are sent to Gearman and executed in parallel by as many Gearman workers as are available (in this case, 4). The output of the queries goes into a coordinator table, and then another query does the final aggregation. That query looks like this:
SELECT COUNT(distinct expr_2839651562) AS `count`
FROM `aggregation_tmp_73522490`
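The two-step rewrite of COUNT(DISTINCT) can be sketched in a few lines of Python. This is only an illustration of the data flow, with plain lists standing in for the partitions and the coordinator table; the sample keys and the name `partition_query` are made up for the example.

```python
NUM_PARTITIONS = 8

# Made-up LO_OrderDateKey values, including duplicates.
rows = [19920101, 19920101, 19920102, 19920109, 19920109, 19920110, 19920103]

# Distribute rows the way HASH(LO_OrderDateKey) with 8 partitions would.
partitions = [
    [key for key in rows if key % NUM_PARTITIONS == p]
    for p in range(NUM_PARTITIONS)
]


def partition_query(partition):
    """SELECT key ... GROUP BY key: one row per distinct key in the partition."""
    return sorted(set(partition))


# Scatter: every partition contributes its own distinct keys.
coordinator = []
for partition in partitions:
    coordinator.extend(partition_query(partition))

# Gather: COUNT(DISTINCT expr) over the coordinator table.
count = len(set(coordinator))
print(count)
```

The final step still de-duplicates, so the result stays correct even when the table is partitioned on a different column and the same key shows up in several partitions.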
The Shard-Query time:
select count(distinct LO_OrderDateKey) from lineorder;
Array
(
    [count] => 2406
)
1 rows returned
Exec time: 0.10923719406128
That isn’t a typo, it really is sub-second compared to minutes in regular MySQL.
This is because Shard-Query uses GROUP BY to answer this query and a loose index scan of the PRIMARY KEY is possible:
mysql> explain partitions SELECT LO_OrderDateKey AS expr_2839651562
    -> FROM lineorder PARTITION(p7) AS `lineorder` WHERE 1=1 AND 1=1 GROUP BY LO_OrderDateKey
    -> \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: lineorder
   partitions: p7
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 80108
        Extra: Using index for group-by
1 row in set (0.00 sec)
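The difference between the two plans (key_len 25 over roughly 59 million rows versus key_len 4 with "Using index for group-by") comes down to skipping. The Python sketch below illustrates the idea of a loose index scan, with a sorted list standing in for the index and `bisect_right` playing the role of an index seek. It is a conceptual model, not how InnoDB is implemented, and the function name `loose_index_scan` is hypothetical.

```python
from bisect import bisect_right


def loose_index_scan(index_keys):
    """Return the distinct keys of a sorted list, plus the number of seeks used.

    Instead of reading every entry, jump straight past each run of duplicates.
    """
    distinct = []
    seeks = 0
    pos = 0
    while pos < len(index_keys):
        key = index_keys[pos]
        distinct.append(key)
        # Seek to the first entry greater than `key`.
        pos = bisect_right(index_keys, key, pos)
        seeks += 1
    return distinct, seeks


# Made-up index: three distinct date keys, 1000 entries each.
index_keys = [19920101] * 1000 + [19920102] * 1000 + [19920103] * 1000
distinct, seeks = loose_index_scan(index_keys)
print(distinct, seeks)
```

The work grows with the number of distinct keys rather than the number of rows, which is why a column with a couple of thousand distinct values over tens of millions of rows benefits so much.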
Next another simple query will be tested, first on regular MySQL:
mysql> select count(*) from lineorder;
+----------+
| count(*) |
+----------+
| 59986052 |
+----------+
1 row in set (4 min 8.70 sec)
Again, the EXPLAIN shows a full table scan:
mysql> explain select count(*) from lineorder\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: lineorder
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 25
          ref: NULL
         rows: 58922188
        Extra: Using index
1 row in set (0.00 sec)
Now, Shard-Query can’t do anything special to speed up this query, except to execute it in parallel, similar to the first query:
[0] => SELECT COUNT(*) AS expr_3190753946
       FROM lineorder PARTITION(p0) AS `lineorder` WHERE 1=1 AND 1=1
[1] => SELECT COUNT(*) AS expr_3190753946
       FROM lineorder PARTITION(p1) AS `lineorder` WHERE 1=1 AND 1=1
[2] => SELECT COUNT(*) AS expr_3190753946
       FROM lineorder PARTITION(p2) AS `lineorder` WHERE 1=1 AND 1=1
[3] => SELECT COUNT(*) AS expr_3190753946
       FROM lineorder PARTITION(p3) AS `lineorder` WHERE 1=1 AND 1=1
...
The aggregation SQL is similar, but this time the aggregate function is changed to SUM to combine the COUNT from each partition:
SELECT SUM(expr_3190753946) AS `count`
FROM `aggregation_tmp_51969525`
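A small Python sketch shows why the outer aggregate has to become SUM. Each partition returns a single row holding its own count, so counting the rows of the coordinator table would only report the number of partitions. The partition sizes below are made up, and `count_rows` is a hypothetical stand-in for the per-partition query.

```python
# Made-up partitions of different sizes.
partitions = [list(range(n)) for n in (5, 7, 6, 4, 8, 5, 6, 9)]


def count_rows(partition):
    """Per-partition SELECT COUNT(*): returns one number."""
    return len(partition)


# The coordinator table holds one partial count per partition.
partial_counts = [count_rows(p) for p in partitions]

total = sum(partial_counts)       # SELECT SUM(expr) FROM the coordinator table
not_total = len(partial_counts)   # COUNT(*) on the coordinator table would give this

print(total, not_total)
```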
And the query is quite a bit faster at 140.24 seconds compared with MySQL’s 248.7 seconds:
Array
(
    [count] => 59986052
)
1 rows returned
Exec time: 140.24419403076
Finally, I want to look at a more complex query that uses joins and aggregation.
mysql> explain select d_year, c_nation,
       sum(lo_revenue - lo_supplycost) as profit
       from lineorder
       join dim_date
         on lo_orderdatekey = d_datekey
       join customer
         on lo_custkey = c_customerkey
       join supplier
         on lo_suppkey = s_suppkey
       join part
         on lo_partkey = p_partkey
       where
         c_region = 'AMERICA'
         and s_region = 'AMERICA'
         and (p_mfgr = 'MFGR#1'
         or p_mfgr = 'MFGR#2')
       group by d_year, c_nation
       order by d_year, c_nation;
+----+-------------+-----------+--------+---------------+---------+---------+--------------------------+------+---------------------------------+
| id | select_type | table     | type   | possible_keys | key     | key_len | ref                      | rows | Extra                           |
+----+-------------+-----------+--------+---------------+---------+---------+--------------------------+------+---------------------------------+
|  1 | SIMPLE      | dim_date  | ALL    | PRIMARY       | NULL    | NULL    | NULL                     |    5 | Using temporary; Using filesort |
|  1 | SIMPLE      | lineorder | ref    | PRIMARY       | PRIMARY | 4       | ssb.dim_date.D_DateKey   |   89 | NULL                            |
|  1 | SIMPLE      | supplier  | eq_ref | PRIMARY       | PRIMARY | 4       | ssb.lineorder.LO_SuppKey |    1 | Using where                     |
|  1 | SIMPLE      | customer  | eq_ref | PRIMARY       | PRIMARY | 4       | ssb.lineorder.LO_CustKey |    1 | Using where                     |
|  1 | SIMPLE      | part      | eq_ref | PRIMARY       | PRIMARY | 4       | ssb.lineorder.LO_PartKey |    1 | Using where                     |
+----+-------------+-----------+--------+---------------+---------+---------+--------------------------+------+---------------------------------+
5 rows in set (0.01 sec)
Here is the query on regular MySQL:
mysql> select d_year, c_nation,
       sum(lo_revenue - lo_supplycost) as profit
       from lineorder
       join dim_date
         on lo_orderdatekey = d_datekey
       join customer
         on lo_custkey = c_customerkey
       join supplier
         on lo_suppkey = s_suppkey
       join part
         on lo_partkey = p_partkey
       where
         c_region = 'AMERICA'
         and s_region = 'AMERICA'
         and (p_mfgr = 'MFGR#1'
         or p_mfgr = 'MFGR#2')
       group by d_year, c_nation
       order by d_year, c_nation;
+--------+---------------+--------------+
| d_year | c_nation      | profit       |
+--------+---------------+--------------+
|   1992 | ARGENTINA     | 102741829748 |
...
|   1998 | UNITED STATES |  61345891337 |
+--------+---------------+--------------+
35 rows in set (11 min 56.79 sec)
Again, Shard-Query splits up the query to run over each partition (I won’t bore you with the details) and it executes the query faster than MySQL, in 343.3 seconds compared to roughly 717 seconds (11 min 56.79 sec):
Array
(
    [d_year] => 1998
    [c_nation] => UNITED STATES
    [profit] => 61345891337
)
35 rows returned
Exec time: 343.29854893684
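The post does not show the rewritten queries for this case. Assuming the same pattern as the COUNT(*) example (each partition computes a partial SUM per group, and the final query sums those partial results again, grouped by the same columns), the merge step might look like the following Python sketch. The partial results are made-up numbers, not benchmark output.

```python
from collections import defaultdict

# Made-up partial results: each partition returns
# (d_year, c_nation, partial_profit) rows for the groups it contains.
partial_results = [
    [(1992, "ARGENTINA", 100), (1998, "UNITED STATES", 40)],
    [(1992, "ARGENTINA", 250), (1992, "BRAZIL", 30)],
    [(1998, "UNITED STATES", 60)],
]

# Final aggregation: SUM(partial_profit) ... GROUP BY d_year, c_nation.
merged = defaultdict(int)
for partition_rows in partial_results:
    for d_year, c_nation, partial_profit in partition_rows:
        merged[(d_year, c_nation)] += partial_profit

# ORDER BY d_year, c_nation.
final_rows = [
    (d_year, c_nation, profit)
    for (d_year, c_nation), profit in sorted(merged.items())
]
print(final_rows)
```

Because the same group can appear in several partitions, the final GROUP BY is what folds the partial rows back into one row per (d_year, c_nation).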
I hope you see how using Shard-Query can speed up queries without using sharding, on just a single server. All you really need to do is add partitioning.
You can get Shard-Query from GitHub at http://github.com/greenlion/swanhart-tools
Please note: Configure and install Shard-Query as normal, but simply use one node and set the column option (the shard column) to “nocolumn” or false, because you are not required to use a shard column if you are not sharding.
