How to Select the First Row of Each Group in a Spark DataFrame?

Susan Sarandon
Release: 2025-01-23 13:16:12

Select the first row of each group

To retrieve the first row of each group under a given sort order (in the examples here, the row with the highest TotalValue within each Hour), you can use any of the following methods:
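The snippets in this article refer to a DataFrame named df with the columns Hour, Category, and TotalValue, and they use the $"..." column syntax, which needs spark.implicits._ in scope. The article does not show how df is built, so the following is only a minimal sketch for experimenting: the local session settings and all row values are assumptions made up for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Local session for experimentation; in spark-shell a `spark` value already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("first-row-per-group")
  .getOrCreate()

// Enables the $"column" syntax and toDF on local collections.
import spark.implicits._

// Hypothetical sample rows: (Hour, Category, TotalValue).
val df = Seq(
  (0, "cat26", 30.9),
  (0, "cat13", 22.1),
  (1, "cat67", 28.5),
  (1, "cat4", 26.8),
  (2, "cat56", 39.6),
  (2, "cat40", 29.7)
).toDF("Hour", "Category", "TotalValue")

df.show()
```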

Window function

import org.apache.spark.sql.functions.{row_number, max, broadcast}
import org.apache.spark.sql.expressions.Window

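// Window that partitions by Hour and orders each partition by TotalValue, highest first
val w = Window.partitionBy($"Hour").orderBy($"TotalValue".desc)

// Add a row-number column; the first row of each group gets rn = 1
val dfTop = df.withColumn("rn", row_number().over(w))

// Keep only the rows numbered 1, then drop the helper column
dfTop.where($"rn" === 1).drop("rn")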
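
One detail worth keeping in mind with the window approach: when two rows in the same group share the highest TotalValue, row_number still numbers only one of them 1, and which one is not determined by the ordering alone. A possible way to make the choice repeatable is to add a secondary sort key. The sketch below assumes Category is an acceptable tie-breaker; the data and the names tied, wTies, and firstPerHour are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("window-ties")
  .getOrCreate()
import spark.implicits._

// Hypothetical data: hour 0 has two rows tied on TotalValue.
val tied = Seq(
  (0, "b", 10.0),
  (0, "a", 10.0),
  (1, "c", 5.0)
).toDF("Hour", "Category", "TotalValue")

// Category is a secondary sort key, so the winner among tied rows is deterministic.
val wTies = Window
  .partitionBy($"Hour")
  .orderBy($"TotalValue".desc, $"Category".asc)

// Exactly one row per Hour survives the filter.
val firstPerHour = tied
  .withColumn("rn", row_number().over(wTies))
  .where($"rn" === 1)
  .drop("rn")

firstPerHour.orderBy($"Hour").show()
```

With this ordering, hour 0 resolves to the row whose Category sorts first among the tied rows.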

Plain SQL aggregation followed by a join

// Aggregate to find the maximum TotalValue for each hour
val dfMax = df.groupBy($"Hour".as("max_hour")).agg(max($"TotalValue").as("max_value"))

// Join the original DataFrame back to the aggregate; broadcast hints that dfMax is small
val dfTopByJoin = df.join(broadcast(dfMax), ($"Hour" === $"max_hour") && ($"TotalValue" === $"max_value"))

// Drop the helper columns
dfTopByJoin.drop("max_hour").drop("max_value")
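
Unlike the window approach, the join keeps every row that matches the group maximum, so a group with tied maxima yields more than one row. The following sketch illustrates this on hypothetical data and shows one possible way to reduce the result to a single row per group with dropDuplicates. Which of the tied rows dropDuplicates keeps should be treated as arbitrary. The names tied, joined, and onePerHour are made up for the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{broadcast, max}

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("join-ties")
  .getOrCreate()
import spark.implicits._

// Hypothetical data: hour 0 has two rows tied on the maximum TotalValue.
val tied = Seq(
  (0, "b", 10.0),
  (0, "a", 10.0),
  (1, "c", 5.0)
).toDF("Hour", "Category", "TotalValue")

// Maximum TotalValue per hour, with aliased names to avoid ambiguity in the join.
val dfMax = tied
  .groupBy($"Hour".as("max_hour"))
  .agg(max($"TotalValue").as("max_value"))

// Both tied rows of hour 0 match the join condition and are kept.
val joined = tied
  .join(broadcast(dfMax), ($"Hour" === $"max_hour") && ($"TotalValue" === $"max_value"))
  .drop("max_hour", "max_value")

// Keep a single (arbitrary) row for each Hour.
val onePerHour = joined.dropDuplicates("Hour")

joined.show()
onePerHour.show()
```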

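Ordering over structs

import org.apache.spark.sql.functions.struct

// Pack TotalValue and Category into a struct; TotalValue comes first so it drives the comparison
val vs = struct($"TotalValue", $"Category").alias("vs")

// Group by Hour and take the maximum struct in each group (referenced by its column name after the select)
val dfTopByStruct = df.select($"Hour", vs).groupBy($"Hour").agg(max($"vs").alias("vs"))

// Unpack Category and TotalValue from the winning struct
dfTopByStruct.select($"Hour", $"vs.Category", $"vs.TotalValue")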
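
The struct approach relies on Spark comparing structs field by field, from left to right, which is why TotalValue is placed first. A side effect is that ties on TotalValue are broken by the next field, Category. The sketch below demonstrates that behavior on hypothetical data; the names tied and topByStruct are made up for the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{max, struct}

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("struct-ordering")
  .getOrCreate()
import spark.implicits._

// Hypothetical data: hour 0 has two rows tied on TotalValue.
val tied = Seq(
  (0, "b", 10.0),
  (0, "a", 10.0),
  (1, "c", 5.0)
).toDF("Hour", "Category", "TotalValue")

val topByStruct = tied
  // TotalValue is the first struct field, so it is compared first.
  .select($"Hour", struct($"TotalValue", $"Category").alias("vs"))
  .groupBy($"Hour")
  // Among equal TotalValue entries, the larger Category wins the max.
  .agg(max($"vs").alias("vs"))
  .select($"Hour", $"vs.Category", $"vs.TotalValue")

topByStruct.orderBy($"Hour").show()
```

If a different tie-breaking rule is needed, the field order or the fields themselves would have to change accordingly.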

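Typed Dataset API

// Case class describing one row of the DataFrame
case class Record(Hour: Integer, Category: String, TotalValue: Double)

// Convert the DataFrame to a typed Dataset[Record]
val dfRecords = df.as[Record]

// Group by Hour and reduce each group to the record with the largest TotalValue.
// In Spark 2.0 and later the grouped Dataset exposes reduceGroups, which returns (key, record) pairs.
val dfTopRecords = dfRecords.groupByKey(_.Hour).reduceGroups((x, y) => if (x.TotalValue > y.TotalValue) x else y)

// Keep only the records and convert back to a DataFrame
dfTopRecords.map(_._2).toDF()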
