Problems encountered when filtering a result set with group by in MySQL, and their solutions
1. Application scenario
There are two tables.
Article table t_posts (one article has many messages):
oid, posts_name
Message table t_comment (many messages belong to one article):
oid, posts_id, msg_content, create_time
2. Requirements analysis
Query the latest reply content of each article
3. Writing the SQL
select tp.oid, tp.posts_name, tc.msg_content, tc.create_time from t_posts tp left join t_comment tc on tp.oid = tc.posts_id group by tp.oid having create_time = max(create_time)
Suppose there are two articles, A and B, with the following reply records (stored in the database in this order):
A has a reply record with time: 2019-09-10
A has a reply record with time: 2019-09-11
B has a reply record with time: 2019-09-01
B has a reply record with time: 2019-09-09
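To make the scenario concrete, here is a minimal sketch that loads the sample data into a throwaway database and asks for the newest reply time per article. It uses Python's built-in sqlite3 module only as a convenient stand-in for MySQL; the column types, the numeric ids and the message texts are assumptions, since the article only names the columns.

```python
import sqlite3

# Hypothetical column types, ids and message texts; the article only names the columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_posts (oid INTEGER PRIMARY KEY, posts_name TEXT);
    CREATE TABLE t_comment (
        oid INTEGER PRIMARY KEY,
        posts_id INTEGER,
        msg_content TEXT,
        create_time TEXT
    );
    INSERT INTO t_posts VALUES (1, 'A'), (2, 'B');
    INSERT INTO t_comment VALUES
        (1, 1, 'A first',  '2019-09-10'),
        (2, 1, 'A second', '2019-09-11'),
        (3, 2, 'B first',  '2019-09-01'),
        (4, 2, 'B second', '2019-09-09');
""")

# Standard aggregate query: the newest reply time of each article.
latest_times = conn.execute(
    "SELECT posts_id, MAX(create_time) FROM t_comment "
    "GROUP BY posts_id ORDER BY posts_id"
).fetchall()
print(latest_times)
```

The aggregate itself is unproblematic. The difficulty discussed in the rest of the article is getting the other columns of the same row (msg_content) along with it.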
Running the SQL above returns a result set with a large number of records missing, so the result is wrong. After some research, I learned that MySQL executes having after group by: the rows are grouped first and then filtered. Because each article has two or more message records, the grouped result keeps only one message per article as the group's row, in practice the first one read (MySQL does not guarantee which).
With having create_time = max(create_time), max(create_time) is the maximum time within each group, that is 2019-09-11 and 2019-09-09, while the rows kept after grouping carry 2019-09-10 and 2019-09-01. Neither row satisfies the condition, so the SQL above loses results.
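The reasoning above can be traced by hand with a small simulation. This is only a sketch in plain Python, not MySQL itself: it assumes that the first row of each group is the one kept after grouping, which is the behavior described here but not something MySQL guarantees.

```python
from itertools import groupby

# (article, create_time) in the stored order given in the article.
rows = [
    ("A", "2019-09-10"),
    ("A", "2019-09-11"),
    ("B", "2019-09-01"),
    ("B", "2019-09-09"),
]

kept = []
for article, group in groupby(rows, key=lambda r: r[0]):
    group = list(group)
    representative = group[0]             # assumption: first row stands for the group
    group_max = max(t for _, t in group)  # what max(create_time) evaluates to
    # having create_time = max(create_time), applied to the representative row
    if representative[1] == group_max:
        kept.append(representative)

print(kept)  # [] because both groups are filtered out
```

Article A is compared as 2019-09-10 against 2019-09-11 and article B as 2019-09-01 against 2019-09-09, so neither survives the filter.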
4. Modifying the SQL
Since we know that after grouping, duplicate rows are merged into the one with the smallest rownum, can we modify the SQL as follows?
-- newly added: the trailing order by clause
select tp.oid, tp.posts_name, tc.msg_content, tc.create_time from t_posts tp left join t_comment tc on tp.oid = tc.posts_id group by tp.oid having create_time = max(create_time) order by tc.create_time desc
After running it, it still does not work, which shows that order by is executed after group by and having.
Then I wondered: can I drop having create_time = max(create_time) and use order by directly to decide which row each group keeps?
select tp.oid, tp.posts_name, tc.msg_content, tc.create_time from t_posts tp left join t_comment tc on tp.oid = tc.posts_id group by tp.oid order by tc.create_time desc
The result is still wrong: order by does not affect the grouping. Duplicate rows are still merged into the one with the smallest rownum, and only then is the result sorted.
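The order of the two steps can be sketched in plain Python. As before, this is a simulation under the assumption that grouping keeps the first row of each article in stored order; it is not output from MySQL.

```python
# (article, create_time) in the stored order given in the article.
rows = [
    ("A", "2019-09-10"),
    ("A", "2019-09-11"),
    ("B", "2019-09-01"),
    ("B", "2019-09-09"),
]

# Step 1 (group by): keep one row per article, assumed to be the first one seen.
grouped = {}
for article, create_time in rows:
    grouped.setdefault(article, create_time)

# Step 2 (order by create_time desc): sort whatever survived the grouping.
result = sorted(grouped.items(), key=lambda item: item[1], reverse=True)
print(result)  # [('A', '2019-09-10'), ('B', '2019-09-01')]
```

Sorting only reorders the rows that grouping has already chosen, so the newest replies (2019-09-11 and 2019-09-09) never reach the output.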
5. Final version
Because order by only takes effect after group by, is it possible to sort the result set first and group it afterwards?
select * from ( select tp.oid, tp.posts_name, tc.msg_content, tc.create_time from t_posts tp left join t_comment tc on tp.oid = tc.posts_id order by tc.create_time desc ) t group by t.oid
It still does not work, even though the subquery should have been sorted first.
Checking with explain, I found that the order by in the subquery had been optimized away. Solutions:
- Use limit in the subquery, for example limit 9999
- Use a where condition in the subquery that keeps only the latest message of each article, for example tc.create_time = (select max(c.create_time) from t_comment c where c.posts_id = tp.oid)
select * from ( select tp.oid, tp.posts_name, tc.msg_content, tc.create_time from t_posts tp left join t_comment tc on tp.oid = tc.posts_id order by tc.create_time desc limit 9999 ) t group by t.oid
Done
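The second option, filtering on max(create_time) instead of relying on sort order, can be sketched as follows. This is a swapped-in variant rather than the query used above: it joins each article to its newest reply time, uses only standard SQL, and is run here through Python's built-in sqlite3 as a stand-in for MySQL. The column types, ids, message texts and the extra article C (which has no replies) are assumptions added for illustration.

```python
import sqlite3

# Hypothetical schema details and sample rows; article C is added to show
# that articles without replies are kept by the left joins.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_posts (oid INTEGER PRIMARY KEY, posts_name TEXT);
    CREATE TABLE t_comment (
        oid INTEGER PRIMARY KEY,
        posts_id INTEGER,
        msg_content TEXT,
        create_time TEXT
    );
    INSERT INTO t_posts VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO t_comment VALUES
        (1, 1, 'A first',  '2019-09-10'),
        (2, 1, 'A second', '2019-09-11'),
        (3, 2, 'B first',  '2019-09-01'),
        (4, 2, 'B second', '2019-09-09');
""")

query = """
    SELECT tp.oid, tp.posts_name, tc.msg_content, tc.create_time
    FROM t_posts tp
    LEFT JOIN (
        -- newest reply time per article
        SELECT posts_id, MAX(create_time) AS latest_time
        FROM t_comment
        GROUP BY posts_id
    ) latest ON latest.posts_id = tp.oid
    LEFT JOIN t_comment tc
        ON tc.posts_id = latest.posts_id
       AND tc.create_time = latest.latest_time
    ORDER BY tp.oid
"""
latest_rows = conn.execute(query).fetchall()
for row in latest_rows:
    print(row)
```

This form does not depend on which row the grouping happens to keep. If two replies of one article share the same newest create_time, both are returned, so a tie-breaker may be needed in that case.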
Additional notes:
Difference between MySQL 5.5 and MySQL 5.7: in version 5.7, if limit is not used in the subquery, its order by is optimized away before the outer group by runs.