Returning the value of the previous row based on a union CTE sorted by date is slow.
P粉517814372
2023-07-24 17:47:03
<p>从以下的体育比赛表格中:</p>
<table class="s-table">
<thead>
<tr>
<th>id_</th>
<th>p1_id</th>
<th>p2_id</th>
<th>match_date</th>
<th>p1_stat</th>
<th>p2_stat</th>
</tr>
</thead>
<tbody>
<tr>
<td>852666</td>
<td>1</td>
<td>2</td>
<td>01/01/1997</td>
<td>1301</td>
<td>249</td>
</tr>
<tr>
<td>852842</td>
<td>1</td>
<td>2</td>
<td>13/01/1997</td>
<td>2837</td>
<td>2441</td>
</tr>
<tr>
<td>853471</td>
<td>2</td>
<td>1</td>
<td>05/05/1997</td>
<td>1474</td>
<td>952</td>
</tr>
<tr>
<td>4760</td>
<td>2</td>
<td>1</td>
<td>25/05/1998</td>
<td>1190</td>
<td>1486</td>
</tr>
<tr>
<td>6713</td>
<td>2</td>
<td>1</td>
<td>18/01/1999</td>
<td>2084</td>
<td>885</td>
</tr>
<tr>
<td>9365</td>
<td>2</td>
<td>1</td>
<td>01/11/1999</td>
<td>2894</td>
<td>2040</td>
</tr>
<tr>
<td>11456</td>
<td>1</td>
<td>2</td>
<td>15/05/2000</td>
<td>2358</td>
<td>1491</td>
</tr>
<tr>
<td>13022</td>
<td>1</td>
<td>2</td>
<td>14/08/2000</td>
<td>2722</td>
<td>2401</td>
</tr>
<tr>
<td>29159</td>
<td>1</td>
<td>2</td>
<td>26/08/2002</td>
<td>431</td>
<td>2769</td>
</tr>
<tr>
<td>44915</td>
<td>1</td>
<td>2</td>
<td>07/10/2002</td>
<td>1904</td>
<td>482</td>
</tr>
</tbody>
</table>
<p>对于选定的比赛id_,我希望返回两位选手在他们各自上一场比赛中的统计数据,无论该选手在上一场比赛中是p1还是p2。请参考下面的预期输出,其中id_ = 11456:</p>
<table class="s-table">
<thead>
<tr>
<th>id_</th>
<th>p1_id</th>
<th>p2_id</th>
<th>match_date</th>
<th>p1_stat</th>
<th>p2_stat</th>
<th>p1_prev_stat</th>
<th>p2_prev_stat</th>
</tr>
</thead>
<tbody>
<tr>
<td>11456</td>
<td>1</td>
<td>2</td>
<td>15/05/2000</td>
<td>2358</td>
<td>1491</td>
<td>2040</td>
<td>2894</td>
</tr>
</tbody>
</table>
<p>在这个大小的表上,以下的SQL语句完全正常运行:</p>
<pre class="brush:php;toolbar:false;">WITH cte_1 AS (
(
SELECT
id_,
match_date,
p1_id AS player_id,
p1_stat AS stat
FROM
test.match_table
UNION ALL
SELECT
id_,
match_date,
p2_id AS player_id,
p2_stat AS stat
FROM
test.match_table
)
),
cte_2 AS (
SELECT
id_,
player_id,
LAG(stat) OVER (
PARTITION BY player_id
ORDER BY
match_date,
id_
) AS prev_stat
FROM
cte_1
)
SELECT
m.*,
cte_p1.prev_stat AS p1_prev_stat,
cte_p2.prev_stat AS p2_prev_stat
FROM
test.match_table AS m
JOIN cte_2 AS cte_p1 ON cte_p1.id_ = m.id_
AND cte_p1.player_id = m.p1_id
JOIN cte_2 AS cte_p2 ON cte_p2.id_ = m.id_
AND cte_p2.player_id = m.p2_id
WHERE m.id_ = 11456
ORDER BY m.match_date</pre>
<p>然而,实际的表格有130万行,大约需要12秒的时间。根据这个回答,问题似乎是CTE加载了所有或大部分的表格行,而不仅仅是需要的行。然而,解决方案并没有涵盖这种情况。</p><p>是否有人能提供一些建议,以改善性能?、</p><p>以下是创建小表格的SQL语句:</p><p><br /></p>
<pre class="brush:php;toolbar:false;">CREATE TABLE `match_table` (
`id_` int NOT NULL AUTO_INCREMENT,
`p1_id` int NOT NULL,
`p2_id` int NOT NULL,
`match_date` date NOT NULL,
`p1_stat` int DEFAULT NULL,
`p2_stat` int DEFAULT NULL,
PRIMARY KEY (`id_`),
KEY `ix__p1_id` (`p1_id`),
KEY `ix__p2_id` (`p2_id`),
KEY `ix__match_date` (`match_date`),
KEY `ix__comp` (`p1_id`, `p2_id`, `match_date`)
);
INSERT INTO `match_table`
VALUES
(
4760, 2, 1, '1998-05-25', 1190, 1486
),
(6713, 2, 1, '1999-01-18', 2084, 885),
(
9365, 2, 1, '1999-11-01', 2894, 2040
),
(
11456, 1, 2, '2000-05-15', 2358, 1491
),
(
13022, 1, 2, '2000-08-14', 2722, 2401
),
(
29159, 1, 2, '2002-08-26', 431, 2769
),
(
44915, 1, 2, '2002-10-07', 1904, 482
),
(
852666, 1, 2, '1997-01-01', 1301, 249
),
(
852842, 1, 2, '1997-01-13', 2837, 2441
),
(
853471, 2, 1, '1997-05-05', 1474, 952
);</pre>
First, making sure you have the correct indexes on your database is key. You should make sure that the id_, p1_id, p2_id, and match_date fields are indexed because they are used in your query.
Secondly, instead of using two CTEs (common expressions), use the LAG() function directly in the main query. This avoids the need for duplicate rows, thus speeding up queries. Here are the methods you can use:
This query performs the following operations:
However, if this approach does not meet your performance needs, consider creating a summary table. The table tracks each player's statistics for each game. Whenever new match results are added, the summary table is updated. This takes up more storage space and affects write performance, but greatly improves read performance.