How to Generate Sequential Row Numbers in Spark RDDs, Similar to SQL's `row_number()`?

Barbara Streisand
Release: 2024-12-20 05:40:09

How to Replicate SQL's Row Numbering in Spark RDDs

Understanding the Problem

You want to assign a sequential row number to each entry in a Spark RDD, restarting the count for every value of a key column and ordering the rows within each key by specific columns. This mirrors SQL's row_number() over (partition by ... order by ...), applied to data that starts out as an RDD.

Your Initial Attempt

Your initial attempt used sortByKey and zipWithIndex. That combination assigns one running index across the whole RDD, so the numbers do not restart for each key. The attempt also treated sortBy as unavailable on RDDs and collected the data first, which yields a local collection rather than an RDD.
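To see why a sort followed by zipWithIndex falls short, here is a minimal sketch on a local Scala Seq. The local collection and the values in rows are assumptions made for illustration, standing in for the RDD; Seq.zipWithIndex attaches a running index in a way analogous to RDD.zipWithIndex.

```scala
// Hypothetical data: (key, value) pairs, used only for this illustration.
val rows = Seq(("b", 5), ("a", 2), ("a", 4), ("b", 9))

// Sort by key, then attach ONE running index to the whole sorted sequence.
val globallyIndexed = rows.sortBy(_._1).zipWithIndex

// The index keeps counting across keys instead of restarting for key "b".
globallyIndexed.foreach(println)
```

The rows for key "b" continue the count begun by key "a", whereas a partitioned row number would start over for each key.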

Solution Using Spark 1.4

Data Preparation

Create an RDD of tuples of the form (K, col1, col2, col3), where the key K is itself a pair.

val sample_data = Seq(((3,4),5,5,5),((3,4),5,5,9),((3,4),7,5,5),((1,2),1,2,3),((1,2),1,4,7),((1,2),2,2,3))
val temp1 = sc.parallelize(sample_data)

Generating Partitioned Row Numbers

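Convert the RDD to a DataFrame, then apply rowNumber over a window partitioned by key and ordered by col2 (descending) and col3 (ascending). In Spark 1.x, window functions require a HiveContext.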

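import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._
import sqlContext.implicits._ // enables toDF on RDDs; assumes the spark-shell's sqlContext

// rowNumber() is the Spark 1.4/1.5 name; from Spark 1.6 on it is row_number()
val w = Window.partitionBy("key").orderBy(desc("col2"), col("col3"))
temp1.toDF("key", "col1", "col2", "col3").withColumn("rownum", rowNumber().over(w))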

Example Output

+-----+----+----+----+------+
|  key|col1|col2|col3|rownum|
+-----+----+----+----+------+
|[1,2]|   1|   4|   7|     1|
|[1,2]|   1|   2|   3|     2|
|[1,2]|   2|   2|   3|     3|
|[3,4]|   5|   5|   5|     1|
|[3,4]|   7|   5|   5|     2|
|[3,4]|   5|   5|   9|     3|
+-----+----+----+----+------+

Rows within a key that tie on both col2 and col3 may receive their numbers in either order.
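If the result has to stay an RDD rather than a DataFrame, one possible alternative is to group by key and number each group yourself. The sketch below is not the method shown above but a swapped-in technique: numberRows is a hypothetical helper, and the logic is exercised on a local copy of the sample values so that it runs without a cluster. The assumed RDD form appears in the trailing comment.

```scala
// Hypothetical helper (not part of Spark): number the rows of ONE key group,
// ordered by col2 descending, then col3 ascending, starting at 1.
def numberRows(rows: Iterable[(Int, Int, Int)]): Seq[((Int, Int, Int), Int)] =
  rows.toSeq
    .sortBy { case (_, col2, col3) => (-col2, col3) }
    .zipWithIndex
    .map { case (row, idx) => (row, idx + 1) }

// Local stand-in holding the same values as sample_data.
val sample = Seq(((3,4),5,5,5), ((3,4),5,5,9), ((3,4),7,5,5),
                 ((1,2),1,2,3), ((1,2),1,4,7), ((1,2),2,2,3))

// Local equivalent of grouping by key and numbering each group.
val numbered: Map[(Int, Int), Seq[((Int, Int, Int), Int)]] =
  sample
    .groupBy(_._1)
    .map { case (key, group) => key -> numberRows(group.map(r => (r._2, r._3, r._4))) }

// On the RDD itself, the assumed equivalent would be:
//   temp1.map { case (k, c1, c2, c3) => (k, (c1, c2, c3)) }
//        .groupByKey()
//        .flatMapValues(numberRows)
```

Keep in mind that groupByKey gathers all rows of a key into memory on a single executor, so this approach suits data where no single key holds a very large number of rows.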

source:php.cn