How Can I Retrieve Specific Query Results Instead of an Entire Table in Apache Spark 2.0.0?

Susan Sarandon

Published: 2024-11-30 03:22:14

Retrieving Query Results Instead of Table Data in Apache Spark 2.0.0

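In Apache Spark 2.0.0, you can load the result set of a specific query from an external database instead of loading an entire table into Spark. This is useful for improving performance and reducing the amount of data a Spark application has to process.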

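With PySpark, you can pass a subquery as the dbtable option of the JDBC reader. Spark places the dbtable value in the FROM clause of the query it sends, so the subquery must be wrapped in parentheses and given an alias. The subquery is executed on the external database, and only its result is loaded into Spark. For example, the following code retrieves the result of a query rather than loading the entire schema.tablename table: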

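from pyspark.sql import SparkSession

# Create a Spark session, or reuse the active one
spark = SparkSession\
    .builder\
    .appName("spark play")\
    .getOrCreate()

# The parenthesized, aliased subquery in dbtable runs on the database,
# so only the columns foo and bar are transferred to Spark.
# "port", "username", and "password" are placeholders for real values.
df = spark.read\
    .format("jdbc")\
    .option("url", "jdbc:mysql://localhost:port")\
    .option("dbtable", "(SELECT foo, bar FROM schema.tablename) AS tmp")\
    .option("user", "username")\
    .option("password", "password")\
    .load()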

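By specifying a subquery as the dbtable option, you can select only the columns you are interested in and, by adding a WHERE clause to the subquery, only the rows you need. This can significantly improve performance, especially when working with large tables.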
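To illustrate the row filtering mentioned above, here is a minimal sketch that builds the dbtable string with a WHERE clause. The helper names build_dbtable_subquery and load_query, and the filter foo > 10, are hypothetical and only serve as an example; they are not part of Spark. The string-building part runs without Spark, while load_query assumes a reachable database and a JDBC driver on the classpath.

```python
def build_dbtable_subquery(columns, table, where=None, alias="tmp"):
    """Return a parenthesized, aliased subquery for the JDBC dbtable option.

    Hypothetical helper for illustration. The arguments are inserted into the
    SQL text unchanged, so they must come from trusted code, not user input.
    """
    if not columns:
        raise ValueError("at least one column is required")
    query = "SELECT {} FROM {}".format(", ".join(columns), table)
    if where:
        query += " WHERE {}".format(where)
    # Spark puts the dbtable value in a FROM clause, so a subquery
    # needs surrounding parentheses and an alias.
    return "({}) AS {}".format(query, alias)


def load_query(spark, url, dbtable, user, password):
    """Load the subquery result into a DataFrame.

    Assumes `spark` is an existing SparkSession and the database is reachable.
    """
    return (
        spark.read.format("jdbc")
        .option("url", url)
        .option("dbtable", dbtable)
        .option("user", user)
        .option("password", password)
        .load()
    )


# Select two columns and only the rows matching the (example) filter.
dbtable = build_dbtable_subquery(
    ["foo", "bar"], "schema.tablename", where="foo > 10"
)
print(dbtable)
```

The resulting string can be passed to load_query together with the same url, user, and password values used in the example above, so that both the column selection and the row filter are evaluated by the database before any data reaches Spark.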
