How to Convert PySpark String Columns to Date Format?

Mary-Kate Olsen
Release: 2024-11-25 15:17:15

Converting PySpark String to Date Format

Suppose you have a PySpark DataFrame with a string column holding dates in the MM-dd-yyyy format, and converting that column with the to_date function returns nulls. This happens because to_date, when called without a format, expects ISO-style input such as yyyy-MM-dd. This article shows two ways to address the issue.

Updated Recommendation (Spark 2.2+):

For Spark versions 2.2 and above, the preferred approach is to use the to_date or to_timestamp functions, which accept an optional format argument as of that release. The format describes the layout of the input string, so the column can be converted directly to a date or timestamp:

from pyspark.sql.functions import to_timestamp

df = spark.createDataFrame([('1997-02-28 10:30:00',)], ['t'])
df.select(to_timestamp(df.t, 'yyyy-MM-dd HH:mm:ss').alias('dt')).collect()

# Output:
# [Row(dt=datetime.datetime(1997, 2, 28, 10, 30))]

Original Answer (Spark < 2.2):

For earlier Spark versions, you can use the following method without the need for a user-defined function (UDF):

from pyspark.sql.functions import unix_timestamp, from_unixtime

df = spark.createDataFrame(
    [("11/25/1991",), ("11/24/1991",), ("11/30/1991",)], 
    ['date_str']
)

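df2 = df.select(
    'date_str',
    # Parse with a four-digit year pattern; from_unixtime returns a string,
    # so cast the result to get a real timestamp column.
    from_unixtime(unix_timestamp('date_str', 'MM/dd/yyyy')).cast('timestamp').alias('date')
)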

print(df2)

# Output:
# DataFrame[date_str: string, date: timestamp]

df2.show(truncate=False)

# Output:
# +----------+-------------------+
# |date_str  |date               |
# +----------+-------------------+
# |11/25/1991|1991-11-25 00:00:00|
# |11/24/1991|1991-11-24 00:00:00|
# |11/30/1991|1991-11-30 00:00:00|
# +----------+-------------------+

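In this method, unix_timestamp parses the string column into a Unix timestamp (seconds since the epoch) according to the supplied pattern, and from_unixtime converts that number back into a date-time value rendered in the default yyyy-MM-dd HH:mm:ss layout. Note that from_unixtime itself returns a string, so a cast to timestamp or date is needed to obtain a true temporal column.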

source:php.cn