How to Split a Vector Column into Rows in PySpark?
Splitting a Vector Column into Rows in PySpark
In PySpark, splitting a column containing vector values into separate columns for each dimension is a common task. This article will guide you through different approaches to achieve this:
Spark 3.0.0 and Above
Spark 3.0.0 introduced the vector_to_array function, simplifying this process:
<code class="python">from pyspark.ml.functions import vector_to_array df = df.withColumn("xs", vector_to_array("vector"))</code>
You can then select the desired columns:
<code class="python">df.select(["word"] + [col("xs")[i] for i in range(3)])</code>
Spark Less Than 3.0.0
Approach 1: Converting to RDD
<code class="python">def extract(row): return (row.word, ) + tuple(row.vector.toArray().tolist()) df.rdd.map(extract).toDF(["word"]) # Vector values will be named _2, _3, ...</code>
Approach 2: Using a UDF
<code class="python">from pyspark.sql.functions import udf, col from pyspark.sql.types import ArrayType, DoubleType def to_array(col): def to_array_(v): return v.toArray().tolist() return udf(to_array_, ArrayType(DoubleType())).asNondeterministic()(col) df = df.withColumn("xs", to_array(col("vector")))</code>
Select the desired columns:
<code class="python">df.select(["word"] + [col("xs")[i] for i in range(3)])</code>
By implementing any of these methods, you can effectively split a vector column into individual columns, making it easier to work with and analyze your data.
The above is the detailed content of How to Split a Vector Column into Rows in PySpark?. For more information, please follow other related articles on the PHP Chinese website!

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics

Solution to permission issues when viewing Python version in Linux terminal When you try to view Python version in Linux terminal, enter python...

How to avoid being detected when using FiddlerEverywhere for man-in-the-middle readings When you use FiddlerEverywhere...

When using Python's pandas library, how to copy whole columns between two DataFrames with different structures is a common problem. Suppose we have two Dats...

How to teach computer novice programming basics within 10 hours? If you only have 10 hours to teach computer novice some programming knowledge, what would you choose to teach...

How does Uvicorn continuously listen for HTTP requests? Uvicorn is a lightweight web server based on ASGI. One of its core functions is to listen for HTTP requests and proceed...

Fastapi ...

Using python in Linux terminal...

Understanding the anti-crawling strategy of Investing.com Many people often try to crawl news data from Investing.com (https://cn.investing.com/news/latest-news)...
