In Apache Spark, you can concatenate DataFrame columns using either raw SQL or, since Spark 1.5.0, the concat function in the DataFrame API.
To concatenate columns using raw SQL, employ the CONCAT function:
In Python:
```python
df = sqlContext.createDataFrame([("foo", 1), ("bar", 2)], ("k", "v"))
df.registerTempTable("df")
sqlContext.sql("SELECT CONCAT(k, ' ', v) FROM df")
```
In Scala:
```scala
import sqlContext.implicits._

val df = sc.parallelize(Seq(("foo", 1), ("bar", 2))).toDF("k", "v")
df.registerTempTable("df")
sqlContext.sql("SELECT CONCAT(k, ' ', v) FROM df")
```
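One detail worth keeping in mind: SQL's CONCAT returns NULL if any of its arguments is NULL. A minimal pure-Python sketch of that behavior (illustrative only; `sql_concat` is a made-up helper name, not a Spark API):

```python
def sql_concat(*args):
    """Mimic SQL CONCAT semantics: return None if any argument is
    None, otherwise join the stringified arguments."""
    if any(a is None for a in args):
        return None
    return "".join(str(a) for a in args)

# Same rows as the DataFrame above
rows = [("foo", 1), ("bar", 2)]
print([sql_concat(k, " ", v) for k, v in rows])  # ['foo 1', 'bar 2']
print(sql_concat("foo", None))                   # None
```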
Since Spark 1.5.0, you can use the concat function with the DataFrame API:
In Python:
```python
from pyspark.sql.functions import concat, col, lit

df.select(concat(col("k"), lit(" "), col("v")))
```
In Scala:
```scala
import org.apache.spark.sql.functions.{concat, lit}

df.select(concat($"k", lit(" "), $"v"))
```
There's also the concat_ws function, which takes a string separator as its first argument and, unlike concat, skips NULL values instead of propagating them:
```python
from pyspark.sql.functions import concat_ws, col

df.select(concat_ws("-", col("k"), col("v")))
```
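The null-skipping behavior of concat_ws can be sketched in pure Python as follows (illustrative only; `concat_ws_like` is a made-up helper, not a Spark API):

```python
def concat_ws_like(sep, *args):
    """Mimic Spark's concat_ws: drop None arguments and join the
    remaining stringified values with the separator. Unlike concat,
    the result is never None."""
    return sep.join(str(a) for a in args if a is not None)

print(concat_ws_like("-", "foo", 1))     # foo-1
print(concat_ws_like("-", "foo", None))  # foo
```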
The above is the detailed content of How to Concatenate Columns in an Apache Spark DataFrame?.