Spark Equivalent of IF Then ELSE
In this example, we aim to add a new column "Class" to the "iris_spark" DataFrame based on the values of an existing categorical column, "iris_class", which has three distinct categories.
The following attempt, however, raises an error:
iris_spark_df = iris_spark.withColumn( "Class", F.when(iris_spark.iris_class == 'Iris-setosa', 0, F.when(iris_spark.iris_class == 'Iris-versicolor',1)).otherwise(2))
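The error message indicates that when() takes only two arguments, a condition and a value. In the code above, the nested F.when(...) call is passed to the outer when() as a third argument.
To fix this, chain the calls instead of nesting them inside when(). With when and col imported from pyspark.sql.functions, the correct structure is either: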
(when(col("iris_class") == 'Iris-setosa', 0) .when(col("iris_class") == 'Iris-versicolor', 1) .otherwise(2))
or
(when(col("iris_class") == 'Iris-setosa', 0) .otherwise(when(col("iris_class") == 'Iris-versicolor', 1) .otherwise(2)))
These expressions are equivalent to the SQL CASE statements:
CASE WHEN (iris_class = 'Iris-setosa') THEN 0 WHEN (iris_class = 'Iris-versicolor') THEN 1 ELSE 2 END
and
CASE WHEN (iris_class = 'Iris-setosa') THEN 0 ELSE CASE WHEN (iris_class = 'Iris-versicolor') THEN 1 ELSE 2 END END
respectively.
The general syntax of when() in Spark is:
when(condition, value).when(...)
or
when(condition, value).otherwise(...)
Note that the Hive conditional expression IF(condition, if-true, if-false) has no direct equivalent in the DataFrame API. It can only be used in raw SQL, which in older Spark versions required Hive support.