A user recently reported a problem when using Jupyter Notebook on Docker to connect to PySpark: during the connection they ran into a PostgreSQL-related error. In this article we walk through how to connect PySpark running in a Dockerized Jupyter Notebook to PostgreSQL, explain why the error occurs, and provide the steps that resolve it. We hope it will be helpful to everyone.
I encountered this problem: py4j.protocol.Py4JJavaError: An error occurred while calling o124.save. : org.postgresql.util.PSQLException: Connection to localhost:5432 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
It happens when I run the PySpark code below in a Jupyter notebook, with everything running on Docker, while PostgreSQL is installed on my local machine (Windows).
from pyspark.sql import SparkSession
from pyspark.sql.functions import lit, col, explode
import pyspark.sql.functions as f

# Register the PostgreSQL JDBC driver jar with the Spark session
spark = SparkSession.builder \
    .appName("ETL Pipeline") \
    .config("spark.jars", "./postgresql-42.7.1.jar") \
    .getOrCreate()

# Split each line into words and count occurrences
df = spark.read.text("./Data/WordData.txt")
df2 = df.withColumn("splitedData", f.split("value", " "))
df3 = df2.withColumn("words", explode("splitedData"))
wordsDF = df3.select("words")
wordCount = wordsDF.groupBy("words").count()

# JDBC connection settings for the PostgreSQL instance on the local machine
driver = "org.postgresql.Driver"
url = "jdbc:postgresql://localhost:5432/local_database"
table = "word_count"
user = "postgres"
password = "12345"

wordCount.write.format("jdbc") \
    .option("driver", driver) \
    .option("url", url) \
    .option("dbtable", table) \
    .mode("append") \
    .option("user", user) \
    .option("password", password) \
    .save()

spark.stop()
I tried editing postgresql.conf to add listen_addresses = 'localhost' and editing pg_hba.conf to add the line host all all 0.0.0.0/0 md5, but it didn't work for me, so I don't know what to do.
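A note on why those edits are not enough: listen_addresses and pg_hba.conf only control which connections the PostgreSQL server on Windows will accept, but inside the Jupyter container localhost refers to the container itself, not to the Windows host, so the JDBC URL never reaches the server at all. A quick way to confirm this from a notebook cell is a plain socket test. This is only a diagnostic sketch; host.docker.internal is the hostname Docker Desktop on Windows provides for reaching the host and is an assumption here, not something from the original code:

import socket

# Diagnostic sketch: check which hostnames can reach port 5432 from inside
# the Jupyter container. "localhost" is the container itself; on Docker
# Desktop for Windows, "host.docker.internal" points at the host machine.
for host in ("localhost", "host.docker.internal"):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(3)
    try:
        s.connect((host, 5432))
        print(host, "is reachable on 5432")
    except OSError as e:
        print(host, "is not reachable:", e)
    finally:
        s.close()

If host.docker.internal turns out to be reachable, pointing the JDBC URL at it instead of localhost is one possible fix; the approach below, running PostgreSQL in its own container, is the one that worked here.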
I solved the problem by also installing PostgreSQL on Docker (using the official image at https://hub.docker.com/_/postgres/ to create a container just for Postgres) and then creating a Docker network so the PySpark container and the PostgreSQL container can reach each other:
docker network create my_network
This command creates the Postgres container:
docker run --name postgres_container --network my_network -e POSTGRES_PASSWORD=12345 -d -p 5432:5432 postgres:latest
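One detail worth noting: a fresh postgres container only contains the default postgres database, while the code above writes to local_database. Assuming the container name used here, the database can be created with psql inside the container (alternatively, adding -e POSTGRES_DB=local_database to the docker run command above makes the image create it at startup):

docker exec -it postgres_container psql -U postgres -c "CREATE DATABASE local_database;"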
And this one creates the Jupyter PySpark container:
docker run --name jupyter_container --network my_network -it -p 8888:8888 -v C:\home\work\path:/home/jovyan/work jupyter/pyspark-notebook:latest
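With both containers attached to my_network, the Jupyter container can reach PostgreSQL by the Postgres container's name instead of localhost. Continuing the script from the question, a minimal sketch of the adjusted connection settings (container name, database, table, and credentials taken from the commands and code above) looks like this:

# Point the JDBC URL at the Postgres container's name on my_network
# instead of localhost (which would be the Jupyter container itself).
driver = "org.postgresql.Driver"
url = "jdbc:postgresql://postgres_container:5432/local_database"
table = "word_count"
user = "postgres"
password = "12345"

wordCount.write.format("jdbc") \
    .option("driver", driver) \
    .option("url", url) \
    .option("dbtable", table) \
    .mode("append") \
    .option("user", user) \
    .option("password", password) \
    .save()

After this change, the word_count table is written into local_database in the postgres_container instance rather than into the PostgreSQL installation on Windows, and the connection refused error no longer occurs.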