SparkSQL Subquery Support
Since version 2.0, Spark SQL supports both correlated and uncorrelated subqueries. In earlier versions, subquery support was limited.
Before 2.0, Spark supported subqueries in the FROM clause, in the same way as Hive 0.12 and earlier:
SELECT col FROM (SELECT * FROM t1 WHERE bar) t2
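To make the FROM-clause example concrete, here is a minimal sketch that runs the same query with Python's built-in sqlite3 module as a stand-in engine (this is not Spark). The table `t1`, its columns `col` and `bar`, and the sample rows are assumptions for illustration, with `bar` treated as a boolean flag stored as 0/1.

```python
import sqlite3

# Stand-in engine: Python's built-in sqlite3, NOT Spark.
conn = sqlite3.connect(":memory:")
# Hypothetical table: bar is assumed to be a 0/1 boolean flag.
conn.execute("CREATE TABLE t1 (col TEXT, bar INTEGER)")
conn.executemany("INSERT INTO t1 VALUES (?, ?)",
                 [("x", 1), ("y", 0), ("z", 1)])

# The subquery in FROM produces a derived table aliased t2;
# the outer query then selects col from that intermediate result.
rows = conn.execute(
    "SELECT col FROM (SELECT * FROM t1 WHERE bar) t2 ORDER BY col"
).fetchall()
print(rows)  # [('x',), ('z',)]
```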
Subqueries in the WHERE clause, however, were not supported before Spark 2.0. The reasons were performance concerns and the fact that any such subquery can be rewritten as a JOIN, so no functionality was lost.
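The JOIN rewrite mentioned above can be checked on a small example. The sketch below is a hedged illustration that again uses Python's sqlite3 as a stand-in engine (not Spark); the tables `l` and `r` and their rows are hypothetical. It compares an `IN` subquery in WHERE with an equivalent join against the distinct keys of `r`.

```python
import sqlite3

# Stand-in engine: sqlite3 (stdlib), NOT Spark; used only to compare results.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE l (a INTEGER, b TEXT);
    CREATE TABLE r (c INTEGER);
    INSERT INTO l VALUES (1, 'one'), (2, 'two'), (3, 'three'), (NULL, 'null');
    INSERT INTO r VALUES (1), (1), (3), (4);
""")

# WHERE-clause subquery (the form unsupported before Spark 2.0).
subquery = """
    SELECT l.a, l.b FROM l
    WHERE l.a IN (SELECT c FROM r)
    ORDER BY l.a
"""

# JOIN rewrite. DISTINCT on the right-hand keys keeps each row of l
# at most once, even though r contains the key 1 twice.
join_rewrite = """
    SELECT l.a, l.b FROM l
    JOIN (SELECT DISTINCT c FROM r) AS r2 ON l.a = r2.c
    ORDER BY l.a
"""

via_subquery = conn.execute(subquery).fetchall()
via_join = conn.execute(join_rewrite).fetchall()
print(via_subquery)  # [(1, 'one'), (3, 'three')]
print(via_subquery == via_join)  # True
```

Spark SQL and HiveQL also offer `LEFT SEMI JOIN`, which expresses "keep rows of `l` that have a match in `r`" directly without the DISTINCT; SQLite has no semi-join syntax, hence the rewrite here. The equivalence shown is for `IN`; `NOT IN` treats NULLs differently and needs more care.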
In Spark 2.0 and later, both correlated and uncorrelated subqueries are supported. Examples include:
SELECT * FROM l WHERE EXISTS (SELECT * FROM r WHERE l.a = r.c)
SELECT * FROM l WHERE l.a IN (SELECT c FROM r)
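The two examples differ in that the EXISTS subquery is correlated (it refers to the outer row through `l.a`), while the IN subquery is uncorrelated (it can be evaluated once, independently of `l`). The minimal sketch below runs both with Python's sqlite3 as a stand-in engine, not Spark; the column layout and sample rows of `l` and `r` are assumptions. On this data the two queries select the same rows.

```python
import sqlite3

# Stand-in engine: sqlite3 (stdlib), NOT Spark; tables l and r are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE l (a INTEGER);
    CREATE TABLE r (c INTEGER);
    INSERT INTO l VALUES (1), (2), (3);
    INSERT INTO r VALUES (2), (3), (5);
""")

# Correlated: the inner query references the outer row via l.a.
correlated = conn.execute(
    "SELECT * FROM l WHERE EXISTS (SELECT * FROM r WHERE l.a = r.c) ORDER BY a"
).fetchall()

# Uncorrelated: the inner query does not depend on l.
uncorrelated = conn.execute(
    "SELECT * FROM l WHERE l.a IN (SELECT c FROM r) ORDER BY a"
).fetchall()

print(correlated)    # [(2,), (3,)]
print(uncorrelated)  # [(2,), (3,)]
```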
Note, however, that as of Spark 2.0 the same logic cannot be expressed with the DataFrame DSL; subqueries are available only through SQL queries.