Parallel Query for MySQL with Shard-Query
While Shard-Query can work over multiple nodes, this blog post focuses on using Shard-Query with a single node. Shard-Query can add parallelism to queries which use partitioned tables. Very large tables can often be partitioned fairly easily. Shard-Query can leverage partitioning to add parallelism, because each partition can be queried independently. Because MySQL 5.6 supports the partition hint, Shard-Query can add parallelism to any partitioning method (even subpartitioning) on 5.6, but it is limited to RANGE/LIST partitioning methods on earlier versions.
The Shard-Query output shown here comes from its command-line client, but you can also use MySQL Proxy to communicate with Shard-Query.
In the examples I am going to use the schema from the Star Schema Benchmark. I generated data for scale factor 10, which means about 6GB of data in the largest table. I am going to show a few different queries, and explain how Shard-Query executes them in parallel.
Here is the DDL for the lineorder table, which I will use for the demo queries:
    CREATE TABLE IF NOT EXISTS lineorder
    (
      LO_OrderKey bigint not null,
      LO_LineNumber tinyint not null,
      LO_CustKey int not null,
      LO_PartKey int not null,
      LO_SuppKey int not null,
      LO_OrderDateKey int not null,
      LO_OrderPriority varchar(15),
      LO_ShipPriority char(1),
      LO_Quantity tinyint,
      LO_ExtendedPrice decimal,
      LO_OrdTotalPrice decimal,
      LO_Discount decimal,
      LO_Revenue decimal,
      LO_SupplyCost decimal,
      LO_Tax tinyint,
      LO_CommitDateKey int not null,
      LO_ShipMode varchar(10),
      primary key(LO_OrderDateKey,LO_PartKey,LO_SuppKey,LO_Custkey,LO_OrderKey,LO_LineNumber)
    ) PARTITION BY HASH(LO_OrderDateKey) PARTITIONS 8;
Notice that the lineorder table is partitioned by HASH(LO_OrderDateKey) into 8 partitions. I used 8 partitions and my test box has 4 cores. It does not hurt to have more partitions than cores. A number of partitions that is two or three times the number of cores generally works best because it keeps each partition small, and smaller partitions are faster to scan. If you have a very large table, a larger number of partitions may be acceptable. Shard-Query will submit a query to Gearman for each partition, and the number of Gearman workers controls the parallelism.
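To make the partition layout concrete, here is a minimal Python sketch of how MySQL's PARTITION BY HASH places rows: the partition number is the partitioning expression modulo the number of partitions. The sample date keys (in YYYYMMDD form) and the function name `partition_for` are illustrative assumptions, not part of the benchmark data or of Shard-Query.

```python
# Sketch of PARTITION BY HASH placement: partition = MOD(expr, num_partitions).
NUM_PARTITIONS = 8  # matches PARTITIONS 8 in the DDL above


def partition_for(lo_orderdatekey, num_partitions=NUM_PARTITIONS):
    """Return the default partition name (p0..p7) that would hold this key."""
    return f"p{lo_orderdatekey % num_partitions}"


# Hypothetical date keys, used only for illustration.
sample_keys = [19920101, 19920102, 19920103, 19980802]
placement = {key: partition_for(key) for key in sample_keys}

for key, part in placement.items():
    print(key, "->", part)
```

Every row with the same LO_OrderDateKey lands in the same partition, so each partition can be scanned on its own without coordinating with the others.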
The SQL for the first demo is:
    SELECT COUNT(DISTINCT LO_OrderDateKey) FROM lineorder;
Here is the explain from regular MySQL:
    mysql> explain select count(distinct LO_OrderDateKey) from lineorder\G
    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: lineorder
             type: index
    possible_keys: PRIMARY
              key: PRIMARY
          key_len: 25
              ref: NULL
             rows: 58922188
            Extra: Using index
    1 row in set (0.00 sec)
So it is basically a full table scan. It takes a long time:
    mysql> select count(distinct LO_OrderDateKey) from lineorder;
    +---------------------------------+
    | count(distinct LO_OrderDateKey) |
    +---------------------------------+
    |                            2406 |
    +---------------------------------+
    1 row in set (4 min 48.63 sec)
Shard-Query executes this query differently from MySQL. It sends one query per partition, in parallel, like the following:
    Array
    (
        [0] => SELECT LO_OrderDateKey AS expr_2839651562
               FROM lineorder PARTITION(p0) AS `lineorder` WHERE 1=1 AND 1=1 GROUP BY LO_OrderDateKey
        [1] => SELECT LO_OrderDateKey AS expr_2839651562
               FROM lineorder PARTITION(p1) AS `lineorder` WHERE 1=1 AND 1=1 GROUP BY LO_OrderDateKey
        [2] => SELECT LO_OrderDateKey AS expr_2839651562
               FROM lineorder PARTITION(p2) AS `lineorder` WHERE 1=1 AND 1=1 GROUP BY LO_OrderDateKey
        [3] => SELECT LO_OrderDateKey AS expr_2839651562
               FROM lineorder PARTITION(p3) AS `lineorder` WHERE 1=1 AND 1=1 GROUP BY LO_OrderDateKey
        [4] => SELECT LO_OrderDateKey AS expr_2839651562
               FROM lineorder PARTITION(p4) AS `lineorder` WHERE 1=1 AND 1=1 GROUP BY LO_OrderDateKey
        [5] => SELECT LO_OrderDateKey AS expr_2839651562
               FROM lineorder PARTITION(p5) AS `lineorder` WHERE 1=1 AND 1=1 GROUP BY LO_OrderDateKey
        [6] => SELECT LO_OrderDateKey AS expr_2839651562
               FROM lineorder PARTITION(p6) AS `lineorder` WHERE 1=1 AND 1=1 GROUP BY LO_OrderDateKey
        [7] => SELECT LO_OrderDateKey AS expr_2839651562
               FROM lineorder PARTITION(p7) AS `lineorder` WHERE 1=1 AND 1=1 GROUP BY LO_OrderDateKey
    )
You will notice that there is one query for each partition. Those queries will be sent to Gearman and executed in parallel by as many Gearman workers as possible (in this case, 4). The output of the queries goes into a coordinator table, and then another query does a final aggregation. That query looks like this:
    SELECT COUNT(distinct expr_2839651562) AS `count`
    FROM `aggregation_tmp_73522490`
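The scatter-gather flow described above can be sketched in a few lines of Python. This is only a model under stated assumptions: a thread pool stands in for the Gearman workers, an in-memory dictionary stands in for the partitions, a plain list stands in for the coordinator table, and all function names and sample keys are hypothetical rather than Shard-Query's own.

```python
from concurrent.futures import ThreadPoolExecutor

# Simulated partitions: partition name -> LO_OrderDateKey values stored there.
# The data is made up; in the post each partition is queried through MySQL.
PARTITIONS = {f"p{i}": [] for i in range(8)}
for key in [19920101, 19920101, 19920102, 19920103, 19920109, 19920110, 19920110]:
    PARTITIONS[f"p{key % 8}"].append(key)


def partition_sql(name):
    """Build per-partition query text in the shape shown above."""
    return (f"SELECT LO_OrderDateKey AS expr FROM lineorder PARTITION({name}) "
            f"GROUP BY LO_OrderDateKey")


def run_on_partition(name):
    """Stand-in for a worker running partition_sql(name): GROUP BY yields distinct keys."""
    return sorted(set(PARTITIONS[name]))


def count_distinct_parallel(workers=4):
    # Scatter: one task per partition, at most `workers` running at once.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(run_on_partition, PARTITIONS))
    # Gather: append every partial row to a "coordinator table".
    coordinator = [row for rows in partial_results for row in rows]
    # Final aggregation: COUNT(DISTINCT expr) over the coordinator table.
    return len(set(coordinator))


print(partition_sql("p0"))
print(count_distinct_parallel())
```

With 8 tasks and 4 workers, the pool simply picks up the next partition as soon as a worker is free, which is why having more partitions than cores does no harm.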
The Shard-Query time:
    select count(distinct LO_OrderDateKey) from lineorder;
    Array
    (
        [count] => 2406
    )
    1 rows returned
    Exec time: 0.10923719406128
That isn’t a typo; it really is sub-second, compared to minutes in regular MySQL.
This is because Shard-Query uses GROUP BY to answer this query, and a loose index scan of the PRIMARY KEY is possible:
    mysql> explain partitions SELECT LO_OrderDateKey AS expr_2839651562
        -> FROM lineorder PARTITION(p7) AS `lineorder` WHERE 1=1 AND 1=1 GROUP BY LO_OrderDateKey
        -> \G
    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: lineorder
       partitions: p7
             type: range
    possible_keys: PRIMARY
              key: PRIMARY
          key_len: 4
              ref: NULL
             rows: 80108
            Extra: Using index for group-by
    1 row in set (0.00 sec)
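To see why "Using index for group-by" is so much cheaper than reading every index entry, here is a conceptual Python model of a loose index scan. It is not how InnoDB is implemented; it only illustrates the idea of jumping from one distinct leading key to the next in a sorted composite index. The function name and the sample index are assumptions made for the illustration.

```python
from bisect import bisect_right


def distinct_leading_keys(sorted_index):
    """Collect distinct first-column values by jumping past each group.

    sorted_index is a list of tuples sorted like a composite PRIMARY KEY.
    Returns (distinct_values, number_of_jumps).
    """
    values, jumps, pos = [], 0, 0
    while pos < len(sorted_index):
        current = sorted_index[pos][0]
        values.append(current)
        # Jump to the first entry whose leading column is greater than `current`.
        pos = bisect_right(sorted_index, (current, float("inf")), lo=pos)
        jumps += 1
    return values, jumps


# Hypothetical index on (LO_OrderDateKey, LO_PartKey): 3 dates x 1000 parts.
index = sorted((date, part)
               for date in (19920101, 19920102, 19920103)
               for part in range(1000))
values, jumps = distinct_leading_keys(index)
print(values, jumps, len(index))
```

The work grows with the number of distinct dates rather than with the number of rows, which matches the small `rows` estimate and the `key_len` of 4 in the EXPLAIN output above.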
Next, I will test another simple query, first on regular MySQL:
    mysql> select count(*) from lineorder;
    +----------+
    | count(*) |
    +----------+
    | 59986052 |
    +----------+
    1 row in set (4 min 8.70 sec)
Again, the EXPLAIN shows a full table scan:
    mysql> explain select count(*) from lineorder\G
    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: lineorder
             type: index
    possible_keys: NULL
              key: PRIMARY
          key_len: 25
              ref: NULL
             rows: 58922188
            Extra: Using index
    1 row in set (0.00 sec)
Now, Shard-Query can’t do anything special to speed up this query, except to execute it in parallel, similar to the first query:
    [0] => SELECT COUNT(*) AS expr_3190753946
           FROM lineorder PARTITION(p0) AS `lineorder` WHERE 1=1 AND 1=1
    [1] => SELECT COUNT(*) AS expr_3190753946
           FROM lineorder PARTITION(p1) AS `lineorder` WHERE 1=1 AND 1=1
    [2] => SELECT COUNT(*) AS expr_3190753946
           FROM lineorder PARTITION(p2) AS `lineorder` WHERE 1=1 AND 1=1
    [3] => SELECT COUNT(*) AS expr_3190753946
           FROM lineorder PARTITION(p3) AS `lineorder` WHERE 1=1 AND 1=1
    ...
The aggregation SQL is similar, but this time the aggregate function is changed to SUM to combine the COUNT from each partition:
    SELECT SUM(expr_3190753946) AS `count`
    FROM `aggregation_tmp_51969525`
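The COUNT-to-SUM rewrite is one instance of a general pattern: each partition produces a partial aggregate, and a second function combines the partials. The Python sketch below illustrates that pattern. Only the COUNT row comes from this post; the SUM, MIN and MAX rows are standard properties of those aggregates and are my assumption about how a tool like Shard-Query would combine them. The per-partition counts are invented.

```python
# How an aggregate in the original query is re-aggregated over partial results.
# COUNT -> SUM is the rewrite shown above; the other rows are assumptions based
# on the general properties of these aggregates, not taken from Shard-Query.
COMBINE = {
    "COUNT": sum,
    "SUM": sum,
    "MIN": min,
    "MAX": max,
}


def combine(aggregate, partials):
    """Combine per-partition partial results for the named aggregate."""
    return COMBINE[aggregate.upper()](partials)


# Hypothetical per-partition COUNT(*) results for 8 partitions.
partial_counts = [7, 9, 8, 6, 10, 7, 9, 8]
print(combine("COUNT", partial_counts))
```

AVG is deliberately missing from the table: averaging the per-partition averages gives the wrong answer when partitions differ in size, so it would need a per-partition SUM and COUNT that are combined separately.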
And the query is quite a bit faster, at 140.24 seconds compared with MySQL’s 248.7 seconds (roughly 1.8 times faster):
    Array
    (
        [count] => 59986052
    )
    1 rows returned
    Exec time: 140.24419403076
Finally, I want to look at a more complex query that uses joins and aggregation.
    mysql> explain select d_year, c_nation, sum(lo_revenue - lo_supplycost) as profit
           from lineorder
           join dim_date on lo_orderdatekey = d_datekey
           join customer on lo_custkey = c_customerkey
           join supplier on lo_suppkey = s_suppkey
           join part on lo_partkey = p_partkey
           where c_region = 'AMERICA'
           and s_region = 'AMERICA'
           and (p_mfgr = 'MFGR#1' or p_mfgr = 'MFGR#2')
           group by d_year, c_nation
           order by d_year, c_nation;
    +----+-------------+-----------+--------+---------------+---------+---------+--------------------------+------+---------------------------------+
    | id | select_type | table     | type   | possible_keys | key     | key_len | ref                      | rows | Extra                           |
    +----+-------------+-----------+--------+---------------+---------+---------+--------------------------+------+---------------------------------+
    |  1 | SIMPLE      | dim_date  | ALL    | PRIMARY       | NULL    | NULL    | NULL                     |    5 | Using temporary; Using filesort |
    |  1 | SIMPLE      | lineorder | ref    | PRIMARY       | PRIMARY | 4       | ssb.dim_date.D_DateKey   |   89 | NULL                            |
    |  1 | SIMPLE      | supplier  | eq_ref | PRIMARY       | PRIMARY | 4       | ssb.lineorder.LO_SuppKey |    1 | Using where                     |
    |  1 | SIMPLE      | customer  | eq_ref | PRIMARY       | PRIMARY | 4       | ssb.lineorder.LO_CustKey |    1 | Using where                     |
    |  1 | SIMPLE      | part      | eq_ref | PRIMARY       | PRIMARY | 4       | ssb.lineorder.LO_PartKey |    1 | Using where                     |
    +----+-------------+-----------+--------+---------------+---------+---------+--------------------------+------+---------------------------------+
    5 rows in set (0.01 sec)
Here is the query on regular MySQL:
    mysql> select d_year, c_nation, sum(lo_revenue - lo_supplycost) as profit
           from lineorder
           join dim_date on lo_orderdatekey = d_datekey
           join customer on lo_custkey = c_customerkey
           join supplier on lo_suppkey = s_suppkey
           join part on lo_partkey = p_partkey
           where c_region = 'AMERICA'
           and s_region = 'AMERICA'
           and (p_mfgr = 'MFGR#1' or p_mfgr = 'MFGR#2')
           group by d_year, c_nation
           order by d_year, c_nation;
    +--------+---------------+--------------+
    | d_year | c_nation      | profit       |
    +--------+---------------+--------------+
    |   1992 | ARGENTINA     | 102741829748 |
    ...
    |   1998 | UNITED STATES |  61345891337 |
    +--------+---------------+--------------+
    35 rows in set (11 min 56.79 sec)
Again, Shard-Query splits up the query to run over each partition (I won’t bore you with the details), and it executes the query faster than MySQL: 343.3 seconds compared to ~720 seconds.
    Array
    (
        [d_year] => 1998
        [c_nation] => UNITED STATES
        [profit] => 61345891337
    )
    35 rows returned
    Exec time: 343.29854893684
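The post skips the details of how the grouped query is recombined, so the following Python sketch is only an assumption about the general approach, in the same spirit as the COUNT(*) example: each partition returns a partial SUM per (d_year, c_nation) group, and the final step adds the partials per group and sorts them like the ORDER BY. The function name and the profit figures are invented for illustration.

```python
from collections import Counter


def merge_grouped_sums(partials):
    """Merge per-partition {(d_year, c_nation): profit} maps, sorted like ORDER BY."""
    totals = Counter()
    for partial in partials:
        totals.update(partial)  # Counter.update adds values for matching keys
    return sorted(totals.items())


# Hypothetical partial results from two partitions.
p0 = {(1992, "ARGENTINA"): 100, (1992, "BRAZIL"): 50}
p1 = {(1992, "ARGENTINA"): 25, (1993, "CANADA"): 10}

for (d_year, c_nation), profit in merge_grouped_sums([p0, p1]):
    print(d_year, c_nation, profit)
```

Because the join to the dimension tables happens inside each per-partition query, only the small grouped results need to be merged at the end.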
I hope you see how using Shard-Query can speed up queries without using sharding, on just a single server. All you really need to do is add partitioning.
You can get Shard-Query from GitHub at http://github.com/greenlion/swanhart-tools
Please note: Configure and install Shard-Query as normal, but simply use one node and set the column option (the shard column) to “nocolumn” or false, because you are not required to use a shard column if you are not sharding.
