The behavior of the GROUP BY clause in SQL without aggregate functions
When a GROUP BY query is executed without aggregate functions, it is not obvious which rows MySQL will return. Understanding the underlying behavior is critical to debugging such queries and ensuring that their results are reliable.
For example, for the query SELECT * FROM emp GROUP BY dept, the server returns Jill and Fred instead of Jack and Tom. This is due to an optimization technique used by MySQL: columns that are not named in the GROUP BY clause are neither sorted nor grouped, and their values are simply taken from one row of each group. The result is only reliable if the omitted columns (name and salary in this case) have the same value in every row of a group.
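The scenario can be reproduced with a small table. This is a minimal sketch: the column types and the salary values are assumptions made for illustration, and only the names and departments are taken from the example above.

```sql
-- Hypothetical sample data; column types and salary values are assumed.
CREATE TABLE emp (
    name   VARCHAR(20),
    dept   CHAR(1),
    salary INT
);

INSERT INTO emp (name, dept, salary) VALUES
    ('Jack', 'a', 2),
    ('Jill', 'a', 1),
    ('Tom',  'b', 2),
    ('Fred', 'b', 1);

-- name and salary are neither grouped nor aggregated, so MySQL may take
-- their values from any row of each dept group.
-- With ONLY_FULL_GROUP_BY enabled (the default since MySQL 5.7.5),
-- this statement is rejected with an error instead of returning rows.
SELECT * FROM emp GROUP BY dept;
```

Which row represents each department is not guaranteed, so no expected output is shown here.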
If the omitted column has different values within the group, the selection is undefined, meaning the server can return any row from the group. This behavior is clearly stated in the MySQL documentation:
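<code>When using this feature, all rows in each group should have the same values for the columns that are omitted from the GROUP BY part. The server is free to return any value from the group, so the results are indeterminate unless all values are the same.</code>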
This behavior causes problems when * is used in a SELECT clause instead of an explicit column list. In the example provided, the query SELECT A.*, MIN(A.salary) AS min_salary FROM emp AS A GROUP BY A.dept may not return the lowest-salary row for each department: min_salary is computed correctly, but the columns expanded from A.* can come from any row in the group if their values differ within it.
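If only the per-department minimum is needed, a deterministic version of that query selects nothing but the grouping column and the aggregate. The sketch below assumes the emp table has the dept and salary columns used in the example.

```sql
-- Every selected column is either in GROUP BY or inside an aggregate,
-- so each output value is well defined.
SELECT A.dept,
       MIN(A.salary) AS min_salary
FROM emp AS A
GROUP BY A.dept;
```

This form is also accepted when the ONLY_FULL_GROUP_BY SQL mode is enabled, because no column depends on an arbitrary row of the group.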
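To avoid indeterminate results, it is strongly recommended to list the required columns explicitly in the SELECT clause and to make sure that every selected column is either named in GROUP BY, wrapped in an aggregate function, or has the same value within each group. When whole rows are needed, a query such as SELECT A.* FROM emp AS A WHERE A.salary = (SELECT MAX(B.salary) FROM emp B WHERE B.dept = A.dept), which returns the highest-paid row or rows in each department (replace MAX with MIN for the lowest-paid), should be used instead of relying on undefined behavior.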
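An equivalent way to fetch whole rows is to join the table against a grouped subquery. The sketch below shows the lowest-salary variant; the alias M and the column alias min_salary are names chosen here for illustration, and the emp columns are assumed to be those used in the examples above.

```sql
-- Compute the minimum salary per department in a derived table,
-- then join back to emp to retrieve the complete matching rows.
SELECT A.*
FROM emp AS A
JOIN (
    SELECT dept, MIN(salary) AS min_salary
    FROM emp
    GROUP BY dept
) AS M
  ON M.dept = A.dept
 AND M.min_salary = A.salary;
```

As with the correlated subquery, a department in which several employees share the minimum salary yields one row per such employee.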