Traditionally, fetching large amounts of data can strain memory resources, as it often involves loading the entire result set into memory.
=> Stream query methods offer a solution by providing a way to process data incrementally using Java 8 Streams. This ensures that only a portion of the data is held in memory at any time, enhancing performance and scalability.
In this blog post, we'll dive deep into how stream query methods work in Spring Data JPA, explore their use cases, and demonstrate their implementation.
For this guide, we’re using:
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-jpa</artifactId>
    </dependency>
NOTE: For more detailed examples, please visit my GitHub repository.
Stream query methods in Spring Data JPA allow us to return query results as a Stream instead of a List or other collection types. This approach provides several benefits:
Efficient Resource Management: Data is processed incrementally, reducing memory overhead.
Lazy Processing: Results are fetched and processed on-demand, which is ideal for scenarios like pagination or batch processing.
Integration with Functional Programming: Streams fit with Java's functional programming features, enabling operations like filter, map, and collect.
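To make the lazy, on-demand behavior above concrete, here is a minimal sketch using plain Java Streams, with no database involved. The class name `LazyStreamDemo`, the `pulled` counter, and the `orderAmounts` source are hypothetical names used only for illustration. The sketch counts how many elements are actually pulled from an unbounded source when the pipeline short-circuits with `limit`.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LazyStreamDemo {

    // Counts how many elements were actually pulled from the source.
    static final AtomicInteger pulled = new AtomicInteger();

    // An unbounded source (hypothetical stand-in for a large result set).
    // Nothing is produced until a terminal operation runs.
    static Stream<Integer> orderAmounts() {
        return Stream.iterate(1, n -> n + 1)
                .peek(n -> pulled.incrementAndGet());
    }

    // Keeps only amounts >= minAmount, formats them, and stops after 'limit' matches.
    static List<String> firstLargeAmounts(int minAmount, int limit) {
        pulled.set(0);
        // Streams are AutoCloseable; closing matters most when a stream wraps a resource.
        try (Stream<Integer> amounts = orderAmounts()) {
            return amounts
                    .filter(a -> a >= minAmount)
                    .map(a -> "amount=" + a)
                    .limit(limit)
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        System.out.println(firstLargeAmounts(5, 3));
        System.out.println("elements pulled: " + pulled.get());
    }
}
```

Even though the source is unbounded, the pipeline stops as soon as three matching elements have been collected, so only the elements needed to produce the result are ever generated.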
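=> Let's imagine that we are developing an e-commerce application and want to stream customers together with the orders they placed on or after a given start date, then filter and group the results in Java code.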
Entities
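    import jakarta.persistence.*;
    import lombok.Getter;
    import lombok.Setter;
    import java.util.List;

    @Setter
    @Getter
    @Entity(name = "tbl_customer")
    public class Customer {

        @Id
        @GeneratedValue(strategy = GenerationType.IDENTITY)
        private Long id;

        private String name;

        private String email;

        // One customer owns many orders; loaded lazily unless fetched explicitly
        @OneToMany(mappedBy = "customer", cascade = CascadeType.ALL, fetch = FetchType.LAZY)
        private List<Order> orders;
    }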
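    import jakarta.persistence.*;
    import lombok.Getter;
    import lombok.Setter;
    import java.time.LocalDateTime;

    @Setter
    @Getter
    @Entity(name = "tbl_order")
    public class Order {

        @Id
        @GeneratedValue(strategy = GenerationType.IDENTITY)
        private Long id;

        private Double amount;

        private LocalDateTime orderDate;

        // Owning side of the Customer <-> Order relationship
        @ManyToOne
        @JoinColumn(name = "customer_id")
        private Customer customer;
    }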
Repository
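    import jakarta.persistence.QueryHint;
    import org.hibernate.jpa.AvailableHints;
    import org.springframework.data.jpa.repository.JpaRepository;
    import org.springframework.data.jpa.repository.Query;
    import org.springframework.data.jpa.repository.QueryHints;
    import org.springframework.data.repository.query.Param;
    import java.time.LocalDateTime;
    import java.util.stream.Stream;

    public interface CustomerRepository extends JpaRepository<Customer, Long> {

        @Query("""
                SELECT c FROM tbl_customer c
                JOIN FETCH c.orders o
                WHERE o.orderDate >= :startDate
                """)
        @QueryHints(
                @QueryHint(name = AvailableHints.HINT_FETCH_SIZE, value = "25")
        )
        Stream<Customer> findCustomerWithOrders(@Param("startDate") LocalDateTime startDate);
    }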
NOTE:
The JOIN FETCH ensures orders are eagerly loaded.
The @QueryHints annotation is used to provide additional hints to the JPA provider (e.g., Hibernate) to optimize query execution.
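=> For example, when my query returns 100 records, a fetch size of 25 asks the JDBC driver to load them in batches of 25 rows (about four round trips) instead of all at once. How strictly the hint is honored depends on the driver.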
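With the fetch size of 25 used in the repository, the batching arithmetic for 100 records can be sketched with a small simulation. This is not how Hibernate or any JDBC driver is implemented; `FetchSizeSketch` and `BatchingCursor` are hypothetical names, and the "database" is just a counter. The sketch only illustrates the idea that rows arrive in chunks of at most the fetch size, so the buffer never holds the whole result.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class FetchSizeSketch {

    /** Hypothetical stand-in for a database cursor that refills its buffer fetchSize rows at a time. */
    static class BatchingCursor implements Iterator<Integer> {
        private final int totalRows;
        private final int fetchSize;
        private final List<Integer> buffer = new ArrayList<>();
        private int nextRow = 0;   // next row id to load from the simulated database
        private int position = 0;  // read position inside the current buffer
        int roundTrips = 0;        // number of simulated trips to the database
        int maxBuffered = 0;       // largest number of rows held in memory at once

        BatchingCursor(int totalRows, int fetchSize) {
            if (fetchSize <= 0) {
                throw new IllegalArgumentException("fetchSize must be positive");
            }
            this.totalRows = totalRows;
            this.fetchSize = fetchSize;
        }

        @Override
        public boolean hasNext() {
            if (position < buffer.size()) {
                return true;               // rows left in the current batch
            }
            if (nextRow >= totalRows) {
                return false;              // result set exhausted
            }
            // Buffer is empty: simulate one round trip that loads the next batch.
            buffer.clear();
            position = 0;
            int end = Math.min(nextRow + fetchSize, totalRows);
            while (nextRow < end) {
                buffer.add(nextRow++);
            }
            roundTrips++;
            maxBuffered = Math.max(maxBuffered, buffer.size());
            return true;
        }

        @Override
        public Integer next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return buffer.get(position++);
        }
    }

    // Consumes every row and reports how many batches were needed.
    static int roundTripsFor(int totalRows, int fetchSize) {
        BatchingCursor cursor = new BatchingCursor(totalRows, fetchSize);
        while (cursor.hasNext()) {
            cursor.next();
        }
        return cursor.roundTrips;
    }

    public static void main(String[] args) {
        System.out.println("round trips: " + roundTripsFor(100, 25));
    }
}
```

In this simulation, 100 rows with a fetch size of 25 need four batches, and at most 25 rows sit in the buffer at any moment.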
Service
The service class processes the data using two parameters, startDate and minOrderAmount. Note that we don't filter by order amount in the SQL query: the data is loaded as a stream, then filtered and grouped in our Java code.
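The filter-and-group step can be sketched as follows. This is a self-contained illustration under stated assumptions, not the original service class: the records `Customer`, `Order`, and `Line` are simplified stand-ins for the JPA entities, the static `findCustomerWithOrders` method imitates the repository query over an in-memory list, and grouping by customer name with a summed amount is an assumed choice of grouping. In a real Spring Data JPA service, the stream returned by the repository should be consumed inside a transaction (typically `@Transactional(readOnly = true)`) and closed, for example with try-with-resources as shown here.

```java
import java.time.LocalDateTime;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CustomerOrderSummarySketch {

    // Simplified, hypothetical stand-ins for the entities (no JPA mapping).
    record Order(double amount, LocalDateTime orderDate) {}
    record Customer(String name, List<Order> orders) {}
    record Line(String customer, double amount) {}

    // Imitates CustomerRepository.findCustomerWithOrders(startDate) over an in-memory list.
    static Stream<Customer> findCustomerWithOrders(List<Customer> table, LocalDateTime startDate) {
        return table.stream()
                .map(c -> new Customer(c.name(), c.orders().stream()
                        .filter(o -> !o.orderDate().isBefore(startDate))
                        .toList()))
                .filter(c -> !c.orders().isEmpty());
    }

    // Filters orders by minOrderAmount in Java and sums the remaining amounts per customer.
    static Map<String, Double> totalsByCustomer(List<Customer> table,
                                                LocalDateTime startDate,
                                                double minOrderAmount) {
        // try-with-resources closes the stream once processing is finished.
        try (Stream<Customer> customers = findCustomerWithOrders(table, startDate)) {
            return customers
                    .flatMap(c -> c.orders().stream()
                            .filter(o -> o.amount() >= minOrderAmount)
                            .map(o -> new Line(c.name(), o.amount())))
                    .collect(Collectors.groupingBy(
                            Line::customer,
                            TreeMap::new,
                            Collectors.summingDouble(Line::amount)));
        }
    }

    // Small made-up data set used only for this illustration.
    static List<Customer> sampleData() {
        return List.of(
                new Customer("Alice", List.of(
                        new Order(120.0, LocalDateTime.of(2024, 3, 1, 10, 0)),
                        new Order(30.0, LocalDateTime.of(2024, 3, 5, 9, 30)),
                        new Order(200.0, LocalDateTime.of(2023, 12, 1, 8, 0)))),
                new Customer("Bob", List.of(
                        new Order(80.0, LocalDateTime.of(2024, 2, 10, 14, 0)))));
    }

    public static void main(String[] args) {
        LocalDateTime startDate = LocalDateTime.of(2024, 1, 1, 0, 0);
        System.out.println(totalsByCustomer(sampleData(), startDate, 50.0));
    }
}
```

Because each customer flows through the pipeline one at a time, only the running totals per customer need to stay in memory, not the full list of customers and orders.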
Controller
Testing
=> To create test data, you can run the following script from my source code or insert your own data.
src/main/resources/dummy-data.sql
Request:
Response:
=> You can use the IntelliJ Profiler to monitor memory usage and execution time. For more details on how to add and test with a large data set, see my GitHub repository.
Small Dataset (10 customers, 100 orders)
Large Dataset (10,000 customers, 100,000 orders)
Performance Metrics
Metric | Stream | List |
---|---|---|
Initial Fetch Time | Slightly slower (due to lazy loading) | Faster (all at once) |
Memory Consumption | Low (incremental processing) | High (entire dataset in memory) |
Processing Overhead | Efficient for large datasets | May cause memory issues for large datasets |
Batch Fetching | Supported (with fetch size) | Not applicable |
Error Recovery | Graceful with early termination | Limited, as data is preloaded |
Spring Data JPA stream query methods offer an elegant way to process large datasets efficiently while leveraging the power of Java Streams. By processing data incrementally, they reduce memory consumption and integrate seamlessly with modern functional programming paradigms.
What are your thoughts on stream query methods? Share your experiences and use cases in the comments below!
See you in the next posts. Happy Coding!