Converting PySpark String to Date Format
You have a PySpark DataFrame with a string column holding dates such as 11/25/1991 (MM/dd/yyyy format), and you need to convert it to a proper date column.
Solution:
To convert a PySpark string column to a date column, use the to_date function, which accepts a format argument since Spark 2.2 (see the sketch below). If you're on an older version of Spark (< 2.2), use the alternative approach that follows.
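For Spark 2.2 and later, a minimal sketch looks like this. It assumes an active SparkSession bound to the name spark and reuses the sample data from the alternative example further down:

from pyspark.sql.functions import col, to_date

# Example DataFrame with string dates
df = spark.createDataFrame(
    [("11/25/1991",), ("11/24/1991",), ("11/30/1991",)],
    ["date_str"]
)

# Parse the strings directly into a date column
df2 = df.withColumn("date", to_date(col("date_str"), "MM/dd/yyyy"))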
Alternative Approach for Spark < 2.2:
Use a combination of the unix_timestamp and from_unixtime functions, then cast the result to a date:
from pyspark.sql.functions import unix_timestamp, from_unixtime

# Example DataFrame with string dates
df = spark.createDataFrame(
    [("11/25/1991",), ("11/24/1991",), ("11/30/1991",)],
    ["date_str"]
)

# Parse the strings as timestamps, then cast to date
df2 = df.select(
    "date_str",
    from_unixtime(unix_timestamp("date_str", "MM/dd/yyyy")).cast("date").alias("date")
)
This creates a new column named date containing proper date values parsed from the string column.
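To confirm the conversion, you can inspect the schema and a few rows. With the sample data above, the output should look roughly like this:

df2.printSchema()
# root
#  |-- date_str: string (nullable = true)
#  |-- date: date (nullable = true)

df2.show()
# +----------+----------+
# |  date_str|      date|
# +----------+----------+
# |11/25/1991|1991-11-25|
# |11/24/1991|1991-11-24|
# |11/30/1991|1991-11-30|
# +----------+----------+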