Incorrect results when using last_value()
P粉486743671
P粉486743671 2024-03-31 11:23:07
0
1
455

This is the form:

id area Variety price
1 Alexander Valley Cabernet Sauvignon 35
2 Alexander Valley Cabernet Sauvignon 45
3 Alexander Valley Merlot 19
4 California Sauvignon Blanc 8
5 California Pinot Noir 17

I want to find out the cheapest and most expensive varieties in each region, so the output should be:

area expensive Cheap
Alexander Valley Cabernet Sauvignon Merlot
California Pinot Noir Sauvignon Blanc

I was able to get the correct results using both first_value()

SELECT
  DISTINCT region,
  FIRST_VALUE(variety) OVER (PARTITION BY region ORDER BY price DESC) AS expensive,
  FIRST_VALUE(variety) OVER (PARTITION BY region ORDER BY price) AS cheapest
FROM wine_list

I think it is equivalent to the following query

SELECT
  DISTINCT region,
  FIRST_VALUE(variety) OVER (PARTITION BY region ORDER BY price DESC) AS expensive,
  LAST_VALUE(variety) OVER (PARTITION BY region ORDER BY price DESC) AS cheapest
FROM wine_list

But now my output is:

areaexpensiveCheapAlexander ValleyCabernet SauvignonCabernet SauvignonAlexander ValleyCabernet SauvignonMerlotCaliforniaPinot NoirPinot NoirCaliforniaPinot NoirSauvignon Blanc
Why is my output wrong? I'm confused.

P粉486743671
P粉486743671

reply all(1)
P粉253800312

The default window for

FIRST_VALUE and LAST_VALUE is ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. ie. This is the first response. The last value is "so far".

However, you want it to apply to the entire dataset, so you must explicitly describe the window range:

SELECT DISTINCT
  region,
  FIRST_VALUE(variety) OVER 
    (PARTITION BY region ORDER BY price DESC
     ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS expensive,
  LAST_VALUE(variety) OVER 
     (PARTITION BY region ORDER BY price DESC
      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS cheapest
FROM wine_list;
Latest Downloads
More>
Web Effects
Website Source Code
Website Materials
Front End Template