How to Split a Vector Column into Individual Columns in PySpark?

Mary-Kate Olsen
Release: 2024-11-03 12:25:29

PySpark: Split Vector Into Columns

In PySpark, you may have a DataFrame with a vector column that needs to be split into separate columns, one per dimension. The right approach depends on your Spark version:

For Spark >= 3.0.0

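Starting with Spark 3.0.0, the most convenient way to extract vector components is the vector_to_array function from pyspark.ml.functions, which converts the vector column into an array column that can be indexed: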

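<code class="python">from pyspark.ml.functions import vector_to_array
from pyspark.sql.functions import col

# Convert the vector column to an array column that supports indexing
df = df.withColumn("xs", vector_to_array("vector"))

# Pick the first three dimensions for illustration
result = df.select(["word"] + [col("xs")[i] for i in range(3)])</code>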

For Spark < 3.0.0

Method 1: RDD Conversion

One approach involves converting the DataFrame to an RDD and extracting the vector components manually:

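<code class="python"># Flatten each row into a tuple: (word, component 0, component 1, ...)
rdd = df.rdd.map(lambda row: (row.word,) + tuple(row.vector.toArray().tolist()))
# Only "word" is named here; the vector components are auto-named _2, _3, ...
result = rdd.toDF(["word"])</code>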

Method 2: UDF Creation

Alternatively, you can create a user-defined function (UDF) and apply it to the vector column:

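<code class="python">from pyspark.sql.functions import col, udf
from pyspark.sql.types import ArrayType, DoubleType

@udf(ArrayType(DoubleType()))
def to_array(vector):
    # Convert the ML vector to a plain list of Python floats
    return vector.toArray().tolist()

result = df.withColumn("xs", to_array(col("vector"))).select(["word"] + [col("xs")[i] for i in range(3)])</code>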

source:php.cn