This is the form:
id | area | Variety | price |
---|---|---|---|
1 | Alexander Valley | Cabernet Sauvignon | 35 |
2 | Alexander Valley | Cabernet Sauvignon | 45 |
3 | Alexander Valley | Merlot | 19 |
4 | California | Sauvignon Blanc | 8 |
5 | California | Pinot Noir | 17 |
I want to find out the cheapest and most expensive varieties in each region, so the output should be:
area | expensive | Cheap |
---|---|---|
Alexander Valley | Cabernet Sauvignon | Merlot |
California | Pinot Noir | Sauvignon Blanc |
I was able to get the correct results using both first_value()
SELECT DISTINCT region, FIRST_VALUE(variety) OVER (PARTITION BY region ORDER BY price DESC) AS expensive, FIRST_VALUE(variety) OVER (PARTITION BY region ORDER BY price) AS cheapest FROM wine_listI think it is equivalent to the following query
SELECT DISTINCT region, FIRST_VALUE(variety) OVER (PARTITION BY region ORDER BY price DESC) AS expensive, LAST_VALUE(variety) OVER (PARTITION BY region ORDER BY price DESC) AS cheapest FROM wine_listBut now my output is:
expensive | Cheap | |
---|---|---|
Cabernet Sauvignon | Cabernet Sauvignon | |
Cabernet Sauvignon | Merlot | |
Pinot Noir | Pinot Noir | |
Pinot Noir | Sauvignon Blanc |
The default window for
FIRST_VALUE
andLAST_VALUE
isROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
. ie. This is the first response. The last value is "so far".However, you want it to apply to the entire dataset, so you must explicitly describe the window range: