How to Keep Non-Aggregated Columns After a Spark DataFrame GroupBy?

Susan Sarandon
Release: 2024-12-31 14:33:11

How to Preserve Non-Aggregated Columns in Spark DataFrame GroupBy

When aggregating data using DataFrame's groupBy method, the resulting DataFrame only contains the group-by key and the aggregated values. However, in some cases, it may be desirable to also include non-aggregated columns from the original DataFrame in the result.

Limitation of Spark SQL

Spark SQL follows the pre-SQL:1999 convention, which does not allow an aggregation query to select columns that are neither grouping keys nor wrapped in an aggregate function. The reason is that a group usually contains several rows, so a non-aggregated column has several candidate values and the result is not well-defined. Systems that do accept such queries differ in which value they return.
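To make the limitation concrete, here is a minimal sketch written as a Scala script (or for pasting into spark-shell). The local session, the toy data, and the view name `people` are assumptions made for illustration and do not come from the article. It only checks that the analyzer rejects a query selecting a column outside the GROUP BY clause.

```scala
import org.apache.spark.sql.{AnalysisException, SparkSession}
import scala.util.Try

// Local session and sample rows are hypothetical, for illustration only.
val session = SparkSession.builder().master("local[1]").appName("groupby-limitation").getOrCreate()
import session.implicits._

val people = Seq((1, "Alice", 30), (2, "Bob", 30), (3, "Carol", 25)).toDF("id", "name", "age")
people.createOrReplaceTempView("people")

// "name" is neither a grouping key nor inside an aggregate function,
// so the query fails during analysis, before any job runs.
val attempt = Try(session.sql("SELECT age, name, count(id) FROM people GROUP BY age"))
val rejected = attempt.failed.toOption.exists(_.isInstanceOf[AnalysisException])

println(s"Query rejected by the analyzer: $rejected")
```

The two rows with age 30 show why: the group has both "Alice" and "Bob" as candidate names, and nothing in the query says which one to return.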

Solution:

To preserve non-aggregated columns in a Spark DataFrame groupBy, there are several options:

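  1. Join Original DataFrame: Aggregate first, then join the result back to the original DataFrame on the grouping key to recover the remaining columns. Every original row is kept and carries its group's aggregate.
// Count ids per age; Spark names the aggregate column "count(id)".
val aggregatedDf = df.groupBy(df("age")).agg(Map("id" -> "count"))
// Join back on "age" to bring in the non-aggregated columns.
val joinedDf = aggregatedDf.join(df, Seq("age"), "left")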
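  2. Use first/last or Window Functions: Wrap the extra column in an arbitrary aggregate such as first or last to get one row per group, or compute the aggregate over a window so that every original row is kept. The window approach can be computationally expensive in certain scenarios, because all rows of each partition are shuffled and retained.
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{count, first}

// Plain aggregate: one row per age, keeping an arbitrary name from each group.
val aggregatedDf = df.groupBy(df("age")).agg(count(df("id")).as("id_count"), first(df("name")).as("name"))
// Window: keeps every original row and attaches the per-age count to it.
val windowSpec = Window.partitionBy(df("age"))
val windowedDf = df.withColumn("id_count", count(df("id")).over(windowSpec))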
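The options above can be tried end to end with the following sketch. It is self-contained and meant as an illustration under stated assumptions, not a definitive implementation: the local session, the sample rows, and the column name `id_count` are hypothetical. Only the columns id, name, and age are taken from the snippets above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{count, first}

// Local session for experimentation (assumption: run as a script or in spark-shell).
val session = SparkSession.builder().master("local[1]").appName("keep-columns").getOrCreate()
import session.implicits._

// Hypothetical sample data: two people share age 30.
val df = Seq((1, "Alice", 30), (2, "Bob", 30), (3, "Carol", 25)).toDF("id", "name", "age")

// Option 1: aggregate, then join back so every original row carries its group's count.
val counts = df.groupBy("age").agg(count("id").as("id_count"))
val joined = df.join(counts, Seq("age"), "left")

// Option 2: one row per group, keeping an arbitrary name via first().
val onePerGroup = df.groupBy("age").agg(count("id").as("id_count"), first("name").as("name"))

joined.orderBy("id").show()
onePerGroup.orderBy("age").show()
```

The join keeps all three input rows, while the first-based aggregate collapses them to one row per age. Which name first returns for age 30 depends on row order after the shuffle, so it should be treated as arbitrary.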
