I am trying to use Spring Data JPA streaming as described on this blog. However, I can't monitor the process or its progress through any logs. Should I see multiple SQL queries printed in the log while the process extracts data in batches? If not, how do I know that the rows are not all loaded at once?
Other blogs, like this one and this one, suggested that for MySQL I should set HINT_FETCH_SIZE to Integer.MIN_VALUE, which I thought might be the solution, but doing so throws the following exception:
2024-01-29 14:40:20.843 WARN 78247 --- [nio-8080-exec-1] o.h.engine.jdbc.spi.SqlExceptionHelper: SQL Error: 0, SQLState: S1000
2024-01-29 14:40:20.843 ERROR 78247 --- [nio-8080-exec-1] o.h.engine.jdbc.spi.SqlExceptionHelper: Streaming result set com.mysql.cj.protocol.a.result.ResultsetRowsStreaming@4ca63fa5 is still active. No statements shall be issued while any streaming result set is open and in use on a given connection. Before trying more queries, make sure you have called .close() on any active streaming result sets.
End time: 48
org.springframework.orm.jpa.JpaSystemException: Unable to extract ResultSet; nested exception is org.hibernate.exception.GenericJDBCException: Unable to extract ResultSet
    at org.springframework.orm.jpa.vendor.HibernateJpaDialect.convertHibernateAccessException(HibernateJpaDialect.java:331)
This is my repository code:
@QueryHints(value = {
        @QueryHint(name = org.hibernate.jpa.QueryHints.HINT_FETCH_SIZE, value = "" + Integer.MIN_VALUE),
        @QueryHint(name = org.hibernate.jpa.QueryHints.HINT_CACHEABLE, value = "false"),
        @QueryHint(name = org.hibernate.jpa.QueryHints.HINT_READONLY, value = "true"),
})
@Query("SELECT s FROM Salary s")
Stream<Salary> findAllStream();
What I'd like is confirmation that the above is the correct way to use streaming queries in Spring Data JPA, since I can't reliably monitor the streaming behaviour myself.
Update
The above exception was caused by calling the findAllStream method twice within the same calling method. Removing one of the calls fixed the exception.
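The driver message in the log says that no other statement may run on a connection while a streaming result set is still open. The sketch below does not use Spring or MySQL. It imitates that rule with a hypothetical `FakeConnection` class (an assumption made purely for illustration) and shows that closing each stream with try-with-resources frees the connection for the next query.

```java
import java.util.stream.IntStream;
import java.util.stream.Stream;

// Hypothetical stand-in for a JDBC connection that allows only one open
// streaming result at a time. This is not a real driver class.
class FakeConnection {
    private boolean streaming = false;

    Stream<Integer> streamRows(int count) {
        if (streaming) {
            throw new IllegalStateException(
                "A streaming result is still active on this connection");
        }
        streaming = true;
        // The onClose handler runs when the stream is closed,
        // for example by try-with-resources.
        return IntStream.range(0, count).boxed().onClose(() -> streaming = false);
    }
}

class StreamCloseDemo {
    // Consumes two streams in a row, closing each one before opening the next.
    static long sumTwice(FakeConnection conn, int count) {
        long total = 0;
        for (int pass = 0; pass < 2; pass++) {
            try (Stream<Integer> rows = conn.streamRows(count)) {
                total += rows.mapToLong(Integer::longValue).sum();
            }
        }
        return total;
    }

    // Opens a second stream while the first is still open and reports
    // whether the fake connection rejected it.
    static boolean secondOpenFails(FakeConnection conn) {
        try (Stream<Integer> first = conn.streamRows(3)) {
            conn.streamRows(3);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }
}
```

In a Spring Data JPA application the equivalent would presumably be to wrap each findAllStream() call in its own try-with-resources block inside a read-only transaction, since the returned Stream holds the underlying result set until it is closed.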
I can't find any log configuration to show whether the data is being fetched in batches. But I did find a way to test performance locally.
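One thing that may explain the missing log lines: a streaming query is still a single SQL statement whose result is read incrementally, so SQL logging would not be expected to show one query per batch. A rough way to watch progress from the application side is to count rows as they pass through the stream and print heap usage at intervals. The helper below is a minimal sketch. `StreamProgress`, `withProgress` and `usedHeapMb` are hypothetical names, and the `IntStream` in `main` merely stands in for the repository call.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class StreamProgress {
    // Wraps a stream so that onProgress receives the running row count
    // each time another `interval` rows have passed through.
    static <T> Stream<T> withProgress(Stream<T> source, long interval,
                                      LongConsumer onProgress) {
        AtomicLong seen = new AtomicLong();
        return source.peek(row -> {
            long n = seen.incrementAndGet();
            if (n % interval == 0) {
                onProgress.accept(n);
            }
        });
    }

    // Approximate heap currently in use, in megabytes.
    static long usedHeapMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
    }

    public static void main(String[] args) {
        // Stand-in for repository.findAllStream(); no database is involved here.
        Stream<Integer> rows = IntStream.range(0, 1_000_000).boxed();
        try (Stream<Integer> monitored = withProgress(rows, 250_000,
                n -> System.out.println(n + " rows, ~" + usedHeapMb() + " MB heap in use"))) {
            monitored.forEach(row -> { /* process the row */ });
        }
    }
}
```

If the driver were buffering the whole result set, the heap figure would be expected to be high already at the first progress line. With real streaming it should stay roughly flat as the count climbs, though garbage collection timing makes these numbers approximate.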
To test the streaming functionality, I needed access to a database containing millions of records. I used the Docker image https://www.php.cn/link/7092d5eb1bbca1a22bdc69ba3f517e68 to get the MySQL employees sample data.
After setting up the Docker image, I had trouble connecting MySQL Workbench to the server. It looks like the image is not configured to accept SSL connections by default. I had to disable the Use SSL flag, which is found under the SSL tab in MySQL Workbench, to be able to establish a connection.
The connection string in the application must also be configured as follows:
spring.datasource.url=jdbc:mysql://localhost:3307/employees?verifyServerCertificate=false&useSSL=false&requireSSL=false
The employees database includes a table named salaries, which has approximately 2.8 million rows.
For testing, I wrote a small Spring Data JPA application that has the following methods in the repository class and a simple controller to call them:
@Override
List<Salary> findAll();

@QueryHints(value = {
        @QueryHint(name = org.hibernate.jpa.QueryHints.HINT_FETCH_SIZE, value = "" + Integer.MIN_VALUE),
        @QueryHint(name = org.hibernate.jpa.QueryHints.HINT_CACHEABLE, value = "false"),
        @QueryHint(name = org.hibernate.jpa.QueryHints.HINT_READONLY, value = "true"),
})
@Query("SELECT s FROM Salary s")
Stream<Salary> findAllStream();
I then wrote a small piece of code that converts each row read into a JSON object and writes it to a file using multiple threads. This simulates the kind of processing done in real-life cases.
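The original processing code is not shown, so the following is only a sketch of what such a step might look like. The `SalaryRow` type, its field names, and the `JsonExport` class are assumptions for illustration, not the author's entity or code. A semaphore caps the number of rows waiting in the thread pool, because an unbounded task queue would otherwise hold every row in memory and cancel out the benefit of streaming.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.stream.Stream;

class JsonExport {
    // Hypothetical row type; the field names are assumptions.
    static final class SalaryRow {
        final int empNo;
        final int salary;
        SalaryRow(int empNo, int salary) { this.empNo = empNo; this.salary = salary; }
    }

    static String toJson(SalaryRow row) {
        return "{\"empNo\":" + row.empNo + ",\"salary\":" + row.salary + "}";
    }

    // Converts each row to a JSON line on a worker thread and writes it to `out`.
    // Line order in the file is not guaranteed.
    static void export(Stream<SalaryRow> rows, Path out, int threads)
            throws IOException, InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        // Bounds the number of rows queued or in progress at any moment.
        Semaphore inFlight = new Semaphore(threads * 4);
        try (BufferedWriter writer = Files.newBufferedWriter(out)) {
            rows.forEach(row -> {
                inFlight.acquireUninterruptibly();
                pool.execute(() -> {
                    try {
                        String json = toJson(row);
                        synchronized (writer) { // one line at a time
                            writer.write(json);
                            writer.newLine();
                        }
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    } finally {
                        inFlight.release();
                    }
                });
            });
            pool.shutdown();
            // Wait for the queued work before the writer is closed.
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

With the repository above, the call would look roughly like export(repository.findAllStream().map(...), path, 4) inside a transaction, where the mapping from Salary to SalaryRow is again hypothetical.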
This is what I observed.
When using the list method, memory usage increased significantly. The initial query took most of the time, but once all the data was loaded, the actual data processing completed quickly.
When using the stream method, the impact on memory usage was almost unnoticeable. Overall, however, the processing time was similar to, or even worse than, that of the list method.
Conclusion
My findings above lead me to conclude that the Stream return type of a repository method should only be used when there is a risk of running out of memory, i.e. getting an OutOfMemoryError. Otherwise, if your application already runs on a large enough server, the overall impact on memory usage will be barely noticeable and only temporary, provided your process completes quickly.
[Image: memory usage statistics from the IntelliJ profiler]