How do you Convert VectorUDTs into Columns in PySpark?

Patricia Arquette
Release: 2024-10-31 18:34:01
Original
173 people have browsed it

How do you Convert VectorUDTs into Columns in PySpark?

Unraveling VectorUDTs into Columns Using PySpark

In PySpark, you may encounter the need to extract individual dimensions from vector columns stored as VectorUDTs. To accomplish this, you can leverage various approaches based on your Spark version.

Spark >= 3.0.0

PySpark 3.0.0 brings built-in functionality for this task:

<code class="python">from pyspark.ml.functions import vector_to_array

df.withColumn("xs", vector_to_array("vector")).select(["word"] + [col("xs")[i] for i in range(3)])</code>
Copy after login

This concisely converts the vector into an array and projects the desired columns.

Spark < 3.0.0

Pre-3.0.0 Spark versions require more intricate approaches:

RDD Conversion:

<code class="python">df.rdd.map(lambda row: (row.word,) + tuple(row.vector.toArray().tolist())).toDF(["word"])</code>
Copy after login

UDF Method:

<code class="python">from pyspark.sql.functions import udf, col
from pyspark.sql.types import ArrayType, DoubleType

def to_array(col):
    return udf(lambda v: v.toArray().tolist(), ArrayType(DoubleType()))(col)

df.withColumn("xs", to_array(col("vector"))).select(["word"] + [col("xs")[i] for i in range(3)])</code>
Copy after login

Note: For increased performance, ensure asNondeterministic is used with the UDF (requires Spark 2.3 ).

Scala Equivalent

For the Scala equivalent of these approaches, refer to "Spark Scala: How to convert Dataframe[vector] to DataFrame[f1:Double, ..., fn: Double)]."

The above is the detailed content of How do you Convert VectorUDTs into Columns in PySpark?. For more information, please follow other related articles on the PHP Chinese website!

source:php.cn
Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
Latest Articles by Author
Popular Tutorials
More>
Latest Downloads
More>
Web Effects
Website Source Code
Website Materials
Front End Template
About us Disclaimer Sitemap
php.cn:Public welfare online PHP training,Help PHP learners grow quickly!