I'm observing strange behavior that I'm trying to understand.
MySQL version: 5.7.33 I have the following query:
select * from a_table where time>='2022-05-10' and guid in (102,512,11,35,623,6,21,673);
a_table
has a primary key on time, guid
, and an index on guid
The query I wrote above has very good performance, according to the explain plan it is using index condition;using location;using MRR
When I increase the number of values in the in
clause, the performance is affected significantly.
After some drills, I got a rough number. For values less than ~14500, the interpretation scheme is the same as above. For quantities above this, only the explain plan uses where
and takes forever to run my query.
In other words, for example, if I put 14,000 values in the in
clause, the explain plan would have the expected 14,000 rows. However, if I put 15,000 values in the in
clause, the explanation has 221200324 rows. I don't even have that many rows in my entire table.
I'm trying to understand this behavior and know if there is any way to fix this.
Thanks
UnderstandLimit memory for range optimization.
When
IN()
has a large number of values in the predicate, it uses more memory during the query optimization step. This has been considered a problem in some cases, so recent versions of MySQL set a maximum memory limit (default is 8MB).If the optimizer finds that it requires more memory than the limit, and there are no other conditions in the query to optimize against, it will give up trying to optimize and resort to a table scan. I deduce that your table statistics actually show that the table has about 221 million rows (although the table statistics are an inaccurate estimate).
I can't say I know the exact formula for how much memory a given list of values requires, but based on the behavior you observed, we can guess that considering 14k items, each item averages about 600 bytes effectively, but more is invalid.
You can set
range_optimizer_max_mem_size = 0
to disable memory limits. This creates the risk of overcommitting memory, but it avoids the optimizer "giving up". We set this value on all MySQL instances in our previous job because we couldn't educate developers to avoid creating huge lists of values in their queries.