Returning the value of the previous row based on a union CTE sorted by date is slow.
P粉517814372
P粉517814372 2023-07-24 17:47:03
0
1
385
<p>从以下的体育比赛表格中:</p> <table class="s-table"> <thead> <tr> <th>id_</th> <th>p1_id</th> <th>p2_id</th> <th>match_date</th> <th>p1_stat</th> <th>p2_stat</th> </tr> </thead> <tbody> <tr> <td>852666</td> <td>1</td> <td>2</td> <td>01/01/1997</td> <td>1301</td> <td>249</td> </tr> <tr> <td>852842</td> <td>1</td> <td>2</td> <td>13/01/1997</td> <td>2837</td> <td>2441</td> </tr> <tr> <td>853471</td> <td>2</td> <td>1</td> <td>05/05/1997</td> <td>1474</td> <td>952</td> </tr> <tr> <td>4760</td> <td>2</td> <td>1</td> <td>25/05/1998</td> <td>1190</td> <td>1486</td> </tr> <tr> <td>6713</td> <td>2</td> <td>1</td> <td>18/01/1999</td> <td>2084</td> <td>885</td> </tr> <tr> <td>9365</td> <td>2</td> <td>1</td> <td>01/11/1999</td> <td>2894</td> <td>2040</td> </tr> <tr> <td>11456</td> <td>1</td> <td>2</td> <td>15/05/2000</td> <td>2358</td> <td>1491</td> </tr> <tr> <td>13022</td> <td>1</td> <td>2</td> <td>14/08/2000</td> <td>2722</td> <td>2401</td> </tr> <tr> <td>29159</td> <td>1</td> <td>2</td> <td>26/08/2002</td> <td>431</td> <td>2769</td> </tr> <tr> <td>44915</td> <td>1</td> <td>2</td> <td>07/10/2002</td> <td>1904</td> <td>482</td> </tr> </tbody> </table> <p>对于选定的比赛id_,我希望返回两位选手在他们各自上一场比赛中的统计数据,无论该选手在上一场比赛中是p1还是p2。请参考下面的预期输出,其中id_ = 11456:</p> <table class="s-table"> <thead> <tr> <th>id_</th> <th>p1_id</th> <th>p2_id</th> <th>match_date</th> <th>p1_stat</th> <th>p2_stat</th> <th>p1_prev_stat</th> <th>p2_prev_stat</th> </tr> </thead> <tbody> <tr> <td>11456</td> <td>1</td> <td>2</td> <td>15/05/2000</td> <td>2358</td> <td>1491</td> <td>2040</td> <td>2894</td> </tr> </tbody> </table> <p>在这个大小的表上,以下的SQL语句完全正常运行:</p> <pre class="brush:php;toolbar:false;">WITH cte_1 AS ( ( SELECT id_, match_date, p1_id AS player_id, p1_stat AS stat FROM test.match_table UNION ALL SELECT id_, match_date, p2_id AS player_id, p2_stat AS stat FROM test.match_table ) ), cte_2 AS ( SELECT id_, player_id, LAG(stat) OVER ( PARTITION BY player_id ORDER BY match_date, id_ ) AS prev_stat FROM cte_1 ) SELECT m.*, cte_p1.prev_stat AS p1_prev_stat, cte_p2.prev_stat AS p2_prev_stat FROM test.match_table AS m JOIN cte_2 AS cte_p1 ON cte_p1.id_ = m.id_ AND cte_p1.player_id = m.p1_id JOIN cte_2 AS cte_p2 ON cte_p2.id_ = m.id_ AND cte_p2.player_id = m.p2_id WHERE m.id_ = 11456 ORDER BY m.match_date</pre> <p>然而,实际的表格有130万行,大约需要12秒的时间。根据这个回答,问题似乎是CTE加载了所有或大部分的表格行,而不仅仅是需要的行。然而,解决方案并没有涵盖这种情况。</p><p>是否有人能提供一些建议,以改善性能?、</p><p>以下是创建小表格的SQL语句:</p><p><br /></p> <pre class="brush:php;toolbar:false;">CREATE TABLE `match_table` ( `id_` int NOT NULL AUTO_INCREMENT, `p1_id` int NOT NULL, `p2_id` int NOT NULL, `match_date` date NOT NULL, `p1_stat` int DEFAULT NULL, `p2_stat` int DEFAULT NULL, PRIMARY KEY (`id_`), KEY `ix__p1_id` (`p1_id`), KEY `ix__p2_id` (`p2_id`), KEY `ix__match_date` (`match_date`), KEY `ix__comp` (`p1_id`, `p2_id`, `match_date`) ); INSERT INTO `match_table` VALUES ( 4760, 2, 1, '1998-05-25', 1190, 1486 ), (6713, 2, 1, '1999-01-18', 2084, 885), ( 9365, 2, 1, '1999-11-01', 2894, 2040 ), ( 11456, 1, 2, '2000-05-15', 2358, 1491 ), ( 13022, 1, 2, '2000-08-14', 2722, 2401 ), ( 29159, 1, 2, '2002-08-26', 431, 2769 ), ( 44915, 1, 2, '2002-10-07', 1904, 482 ), ( 852666, 1, 2, '1997-01-01', 1301, 249 ), ( 852842, 1, 2, '1997-01-13', 2837, 2441 ), ( 853471, 2, 1, '1997-05-05', 1474, 952 );</pre>
P粉517814372
P粉517814372

reply all(1)
P粉274161593

First, making sure you have the correct indexes on your database is key. You should make sure that the id_, p1_id, p2_id, and match_date fields are indexed because they are used in your query.

Secondly, instead of using two CTEs (common expressions), use the LAG() function directly in the main query. This avoids the need for duplicate rows, thus speeding up queries. Here are the methods you can use:

SELECT 
    m.*, 
    LAG(m.p1_stat) OVER (PARTITION BY m.p1_id ORDER BY m.match_date, m.id_) AS p1_prev_stat, 
    LAG(m.p2_stat) OVER (PARTITION BY m.p2_id ORDER BY m.match_date, m.id_) AS p2_prev_stat 
FROM 
    test.match_table AS m 
WHERE 
    m.id_ = 11456
ORDER BY 
    m.match_date;

This query performs the following operations:

  • Removed the union operation to avoid doubling the number of rows.
  • Skips the steps of joining the original table twice.
  • Apply the LAG() function directly in the main query to obtain previous statistics.

However, if this approach does not meet your performance needs, consider creating a summary table. The table tracks each player's statistics for each game. Whenever new match results are added, the summary table is updated. This takes up more storage space and affects write performance, but greatly improves read performance.

Latest Downloads
More>
Web Effects
Website Source Code
Website Materials
Front End Template