


Java hangs when calling a Python Spark program: how to fix Runtime.getRuntime().exec() blocking
Analysis and solution: Python code that hangs when called from Java
When calling Python code from Java, you may run into hard-to-diagnose problems, such as the program hanging and never continuing. This article analyzes one specific case and provides a solution.
Problem description: The developer uses Java's Runtime.getRuntime().exec() method to run a Python script, and the Python script uses Spark for data processing. On the Java side, the script's output is read through the Process object, but once the Python script reaches sorted_word_count.take(20), the Java program hangs and cannot continue.
The Python script is as follows:
import json
import sys

from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.appName("read from java backend").master("local[*]").getOrCreate()

# Get the parameter passed from Java
comment = sys.argv[1]
# Convert the JSON string to a Python object
comment = json.loads(comment)
# Convert the comment list to an RDD
comment_rdd = spark.sparkContext.parallelize(comment)
# Convert the RDD to a DataFrame
df = spark.createDataFrame(comment_rdd.map(lambda x: Row(**x)))
# Load the stop-word list
stop_words = spark.sparkContext.textFile("c:/users/10421/downloads/baidu_stopwords.txt").collect()
# ... (some code is omitted here) ...
# Count the occurrences of each word
word_count = df.rdd.map(lambda x: (x.word, 1)).reduceByKey(lambda x, y: x + y)
sorted_word_count = word_count.sortBy(lambda x: x[1], ascending=False)
top_20_words = sorted_word_count.take(20)
column = 0
for row in top_20_words:
    print(row[column])
The Java code snippet is as follows:
Process process = Runtime.getRuntime().exec(args1);
// Get the program's output
InputStream inputStream = process.getInputStream();
BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream, "gb2312"));
// ... (some code is omitted here) ...
Problem analysis: Testing showed that the Java program hangs when the Python script executes sorted_word_count.take(20). This call blocks until Spark finishes processing and returns the result. Reads on the stream returned by process.getInputStream() are blocking, so if the Python program does not write to standard output in time, the Java program waits for it and appears stuck. The Javadoc for Process describes the related risk in the other direction: because some platforms provide only a limited buffer for the child's output, failing to read it promptly can cause the child process itself to block.
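A minimal sketch of one common way to avoid this kind of wait on the Java side: read each of the child's output streams on its own thread until end-of-stream, so that no pipe is left unread while the Spark job runs. The class name StreamDrainer and its methods are hypothetical names chosen for illustration, and the ByteArrayInputStream only stands in for process.getInputStream(); this is not the original author's code.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class StreamDrainer {

    /** Reads every line from the stream until end-of-stream and returns them in order. */
    static List<String> readLines(InputStream in) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    /** Drains the stream on a dedicated daemon thread so the caller is not blocked. */
    static CompletableFuture<List<String>> drainAsync(InputStream in) {
        CompletableFuture<List<String>> result = new CompletableFuture<>();
        Thread worker = new Thread(() -> {
            try {
                result.complete(readLines(in));
            } catch (RuntimeException e) {
                result.completeExceptionally(e);
            }
        });
        worker.setDaemon(true);
        worker.start();
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for the child's stdout (hypothetical data, not real Spark output).
        InputStream fake =
            new ByteArrayInputStream("spark\nhadoop\n".getBytes(StandardCharsets.UTF_8));
        List<String> lines = drainAsync(fake).join();
        System.out.println(lines); // prints [spark, hadoop]
    }
}
```

In a real caller, one drainAsync call would be made for process.getInputStream() and another for process.getErrorStream() right after the process starts, and the results would be collected only after both calls have been issued.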
Solution: The most likely cause is the character encoding. The original code reads the Python output as gb2312, which may not match the encoding the Python script actually writes and can stall the read. Changing the Java code to read the Python output as UTF-8, and to read the error stream as well as the standard output, resolves the problem.
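Modified Java code:
// errorStream is assumed to come from process.getErrorStream()
InputStream errorStream = process.getErrorStream();
BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream, "UTF-8"));
BufferedReader reader2 = new BufferedReader(new InputStreamReader(errorStream, "UTF-8"));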
Reading both the input stream and the error stream as UTF-8 in the Java code resolves the hang. Note that the Python script must also make sure its output is encoded as UTF-8. If the problem persists, check how long the Spark job takes to run and whether the Python script contains other potentially blocking operations.
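The points above can be sketched on the Java side as follows, under a few assumptions. PYTHONIOENCODING is a standard CPython environment variable that sets the encoding of the interpreter's standard streams. ProcessBuilder is used here instead of Runtime.getRuntime().exec() because it allows setting the child's environment and merging stderr into stdout. The class PythonLauncher, its methods, and the script name word_count.py are hypothetical and do not come from the original code.

```java
import java.util.List;
import java.util.concurrent.TimeUnit;

public class PythonLauncher {

    /** Builds (but does not start) a ProcessBuilder for the given command line. */
    static ProcessBuilder configure(List<String> command) {
        ProcessBuilder pb = new ProcessBuilder(command);
        // Ask the Python interpreter to encode stdout/stderr as UTF-8,
        // matching the UTF-8 readers on the Java side.
        pb.environment().put("PYTHONIOENCODING", "utf-8");
        // Merge stderr into stdout so a single reader drains both.
        pb.redirectErrorStream(true);
        return pb;
    }

    /** Waits up to timeoutSeconds for exit; kills the process and returns false on timeout. */
    static boolean waitOrKill(Process process, long timeoutSeconds) throws InterruptedException {
        if (process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            return true;
        }
        process.destroyForcibly();
        return false;
    }

    public static void main(String[] args) {
        // "python" and "word_count.py" are placeholder names for illustration only.
        ProcessBuilder pb = configure(List.of("python", "word_count.py", "[]"));
        System.out.println(pb.environment().get("PYTHONIOENCODING")); // prints utf-8
        System.out.println(pb.redirectErrorStream());                 // prints true
    }
}
```

The bounded wait in waitOrKill turns an indefinite hang into a visible timeout, which helps when checking whether the Spark job is merely slow. The child's output still has to be read on another thread while waiting, otherwise the process can block on a full pipe until the timeout expires.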