Why MySQL slows down without specifying an index
P粉787806024
P粉787806024 2023-09-08 00:38:24
0
1
613

I'm trying to optimize a SQL query, but I'd like to know how to do it correctly.

SELECT 
    r0_.*
FROM ride r0_ use index (ride_booking_id_IDX)
LEFT JOIN booking b1_ ON r0_.booking_id = b1_.id 
LEFT JOIN spot s2_ ON r0_.from_spot_id = s2_.id 
LEFT JOIN spot s3_ ON r0_.to_spot_id = s3_.id 
WHERE b1_.start_at <= '2023-04-21' 
    AND b1_.end_at >= '2023-04-20' 
    AND b1_.paid_at IS NOT NULL 
    AND b1_.cancelled_at IS NULL 
    AND ((s2_.zone_id = 1 OR s3_.zone_id = 1)) 
    AND s2_.type = 'parking';

Here, I force the index (booking_id, from_spot_id, to_spot_id), which causes the query to execute within 25 seconds of the nearest date for about 100 milliseconds!

The

booking table has about 2 million rows, and the ride table has about 5 million rows.

However, I can see it scanning more rows using the forced index:

id Choose a type surface Partition type Possible keys key key_len refer to OK Filtered additional
1 Simple b1_ scope Main, booking_id_end_IDX, booking_id_IDX, booking_id_start_IDX, IDX_E00CEDDEB75363F7, IDX_E00CEDDE37D3107C, IDX_E00CEDDEDEA4208C, booking_paid_at_IDX, booking_cancelled_at_IDX IDX_E00CEDDE37D3107C 6 111456 6.6 Use index conditions; use location
1 Simple r0_ refer to ride_booking_id_IDX ride_booking_id_IDX 109 ector.b1_.id 1 100.0
1 Simple s2_ eq_ref main,IDX_B9327A739F2C3FAB,spot_type_IDX main 4 ector.r0_.from_spot_id 1 72.52 place of use
1 Simple s3_ eq_ref main main 4 ector.r0_.to_spot_id 1 100.0 place of use

Compared to the same query without using indexes:

id Choose a type surface Partition type Possible keys key key_len refer to OK Filtered additional
1 Simple s2_ refer to main,IDX_B9327A739F2C3FAB,spot_type_IDX spot_type_IDX 767 constant 161 100.0 Use index conditions
1 Simple r0_ refer to IDX_9B3D7CD0ABAF30D3, IDX_9B3D7CD03301C60, ride_booking_id_IDX, booking_from_spot_to_spot_IDX IDX_9B3D7CD0ABAF30D3 5 ector.s2_.id 392 100.0
1 Simple b1_ eq_ref Main, booking_id_end_IDX, booking_id_IDX, booking_id_start_IDX, IDX_E00CEDDEB75363F7, IDX_E00CEDDE37D3107C, IDX_E00CEDDEDEA4208C, booking_paid_at_IDX, booking_cancelled_at_IDX main 108 ector.r0_.booking_id 1 5.0 place of use
1 Simple s3_ eq_ref main main 4 ector.r0_.to_spot_id 1 100.0 place of use

As far as I can tell, the dates I'm using to compare against start_at and end_at are the reason why the query is noticeably faster.

So I try to isolate the slowest parts to smaller queries:

Select * from booking b where b.start_at < '2021-01-01' 和 b.end_at > '2021-01-01';

On the table booking I have two indexes (start_at) and (end_at) which help this query run faster as you approach the max and min values Faster (since the index will filter most rows, leaving very few rows to scan).

However, when I take a random value far enough in the past, it becomes much slower. The above query takes 10 seconds to run because it only uses one of the two indexes as expected and I don't know why the explanation for merge_index doesn't show up on such a simple query:

id Choose a type surface Partition type Possible keys key key_len refer to OK Filtered additional
1 Simple b scope IDX_E00CEDDEB75363F7,IDX_E00CEDDE37D3107C IDX_E00CEDDEB75363F7 6 1147319 50 Use index conditions; use location

Since I can't have an index that satisfies both range conditions, I tried splitting the query in half

SELECT * from booking b
INNER JOIN booking b2 use index(booking_id_start_IDX)  ON b.id = b2.id and b2.start_at < '2021-01-01'
INNER JOIN booking b3 use index(booking_id_end_IDX) ON b.id = b3.id and b3.end_at > '2021-01-01';

This query runs significantly faster, taking approximately 600 milliseconds. However, due to the simplicity of the query and the fact that it returns about 7k rows, I would expect it to be in the double digits at best.

I don't understand why the query doesn't automatically select my indexes (id, start_at) and (id, end_at)? What am I missing?

I know I could partition the table to get better results, but I have foreign keys that cannot be deleted, so that's not a solution. Should I consider another schema and have a table that holds the reservation dates separately without any foreign keys and have the reservations table reference it so I can partition the reservations table? Is it possible to use foreign keys to reference a partitioned subscription table at subscription time?

The Mysql engine is running in AWS with the following version: 8.0.mysql_aurora.3.02.2

The output of SELECT @@optimizer_switch is:

index_merge=on, index_merge_union=on, index_merge_sort_union=on, index_merge_intersection=on, engine_condition_pushdown=on, index_condition_pushdown=on, mrr=on, mrr_cost_based=on, block_nested_loop=on, batched_key_access=off, materialization=on, semijoin=on, loosescan=on, firstmatch=on, duplicateweedout=on, subquery_materialization_cost_based=on, use_index_extensions=on, condition_fanout_filter=on, provided_merge=on, use_invisible_indexes=off, skip_scan=on, hash_join=on, subquery_to_derivative=off , prefer_ordering_index=on, hypergraph_optimizer=off, derivative_condition_pushdown=on

P粉787806024
P粉787806024

reply all(1)
P粉018653751

Your index (id, start_at) is not selected because there is no fixed id to search for.

Depending on your use case, you may want to create one index on start_at and another on end_at. Afterwards, a simple query SELECT * from booking b where b.start_at '2021-01-01'; will take effect immediately; depending on the search criteria, MySQL may use a MERGE INDEX optimization operation index or both.

If you wish to use a single index, you need to choose the order of the fields carefully, since indexes are used in the same order in which they are defined.

Edit: After the OPs edit, here are my thoughts.

This explains things pretty well. Assuming you SELECT *, MySQL will be forced to read the entire table. Try selecting only id, in which case it will most likely use the index; as it will be an index-covered query.

Latest Downloads
More>
Web Effects
Website Source Code
Website Materials
Front End Template