How to Get the Top N Items per Group in a Spark DataFrame?

Linda Hamilton
Release: 2024-12-23 01:57:15

Get Top N Items per Group Using Spark DataFrame GroupBy

When working with Spark DataFrames, you may need to group data by a specific column and retrieve the top N items within each group. This article demonstrates how to achieve this in Scala, taking inspiration from a Python example.

Consider a DataFrame with a user column, an item column, and a rating column:

user1 item1 rating1
user1 item2 rating2
user1 item3 rating3
user2 item1 rating4
...
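To experiment with the approach, it helps to have a concrete DataFrame in this shape. The following is a minimal sketch, not part of the original example: the object name `SampleRatings`, the column name `item`, the local `SparkSession` settings, and all numeric ratings are assumptions made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SampleRatings {
  // Builds a small DataFrame with (user, item, rating) columns.
  // The numeric ratings are invented; the original only names them rating1, rating2, ...
  def build(spark: SparkSession): DataFrame = {
    import spark.implicits._ // needed for .toDF on a Seq of tuples
    Seq(
      ("user1", "item1", 4.5),
      ("user1", "item2", 3.0),
      ("user1", "item3", 5.0),
      ("user2", "item1", 2.5)
    ).toDF("user", "item", "rating")
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (assumed setup)
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("sample-ratings")
      .getOrCreate()

    val df = build(spark)
    df.printSchema()
    df.show()

    spark.stop()
  }
}
```

The resulting `df` can stand in for the DataFrame used in the rest of this article.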

Scala Solution

To retrieve the top N items for each user group, you can leverage a window function in conjunction with the orderBy and where operations. Here's the implementation:

// Import required functions and classes
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{rank, desc}
// Enables the $"column" syntax; assumes a SparkSession named `spark` is in scope
import spark.implicits._

// Number of top items to keep per group (replace ??? with a concrete value)
val n: Int = ???

// Window that partitions rows by user and orders each partition by rating, highest first
val w = Window.partitionBy($"user").orderBy(desc("rating"))

// Rank each row within its user partition
val rankedDF = df.withColumn("rank", rank().over(w))

// Keep only the rows whose rank falls within the top N
val topNDF = rankedDF.where($"rank" <= n)

Alternative Option

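With rank, rows that tie on rating share the same rank, so the filter can return more than N rows for a group. If you need exactly N rows per group and do not care which of the tied rows is kept, use row_number instead of rank:

import org.apache.spark.sql.functions.row_number

val topNDF = df.withColumn("row_num", row_number().over(w)).where($"row_num" <= n)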
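The practical difference between rank and row_number only shows up when ratings tie. The sketch below counts how many rows each variant keeps for the same input. It is illustrative only: the object name `TieBehaviour`, the helper `topNCounts`, the column name `item`, and the ratings are all hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, desc, rank, row_number}

object TieBehaviour {
  // Hypothetical helper: returns (rows kept by rank, rows kept by row_number)
  // when filtering each user's items down to the top n by rating.
  def topNCounts(df: DataFrame, n: Int): (Long, Long) = {
    val w = Window.partitionBy(col("user")).orderBy(desc("rating"))

    val byRank = df
      .withColumn("rank", rank().over(w))
      .where(col("rank") <= n)

    val byRowNumber = df
      .withColumn("row_num", row_number().over(w))
      .where(col("row_num") <= n)

    (byRank.count(), byRowNumber.count())
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("tie-behaviour")
      .getOrCreate()
    import spark.implicits._

    // Made-up ratings: item2 and item3 tie for second place
    val df = Seq(
      ("user1", "item1", 5.0),
      ("user1", "item2", 4.0),
      ("user1", "item3", 4.0),
      ("user1", "item4", 3.0)
    ).toDF("user", "item", "rating")

    val (rankRows, rowNumberRows) = topNCounts(df, 2)
    println(s"rank: $rankRows rows, row_number: $rowNumberRows rows")

    spark.stop()
  }
}
```

With this input and n = 2, rank assigns 1, 2, 2, 4 and keeps three rows, while row_number assigns 1, 2, 3, 4 and keeps two. Which of the two tied items row_number keeps is not determined by the ordering on rating alone, so add a secondary sort column to the window if a stable choice matters.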

By using this approach, you can efficiently retrieve the top N items for each user group in your DataFrame.

source:php.cn