Monday, March 19, 2007

Unit testing, a lot of developers don't get it


As a developer I quite often stumble across code that I would like to rewrite. The reasons why I want to rewrite the piece of code, may differ from case to case, reduce coupling and remove duplication are well known reasons. Lately I have been doing a lot of refactoring and one thing that stands out is how much terrible test code it is out there. It seems as if developers think that if they just have written a test for it, and the code coverage if good, then everything is good. This is dead wrong. I have seen people backing away from changing bad code just because they were backed by brittle and hard to understand tests.

Doing refactoring requires that tests are present. In systems using code written less than two or maybe three years ago, it seems to be quite rare that the code is not been backed by tests to some extension. But the way tests are written varies greatly. Having tests is a necessary but not sufficient prerequisite to do effective refactoring. In the latter cases where I have been doing requirements it strikes me as if a lot of developers do not understand that the same principles that they apply when writing system-code, should also be applied when writing test-code. It is all code, and should be treated as such. I have seen by far too much duplication, lengthy pieces of code iterating over collections just to add a conditional and assert for the particular element they where looking for, and by far to much cut-copy paste, ecpecialy in test-fixtures.

I am am doing a Lightening Talk on how to simplify unit testing on Oslo XP Meetup on 26 March. I will discuss small things you can use to write more fluent tests, tests that are easier to read and that requires less code. When I was reading up on different aspects on Unit testing, I came over this great article that was linked from unit.org. It really explains in sufficient details code smells in test-code and how to refactor them. Read it and distribute it to your peer developers.





Sunday, March 04, 2007

Effective pagination using Hibernate

This article presents with an solution on how to page through large amounts of data effectively without taking down your database and without keeping it all in memory.

Last major update 07.09.2008

I have now added references that describes how to retrieve the source-code for the examples. All source-code is released under the Creative Commons License as can be seen from the Licensing details, the source code can be retrieved from http://svitjod.rymdweb.com/repository/com/zenior/sandbox/ which contains the source-code in maven form. To include a dependency to the source code just include:

<repository>

  <id>internal</id>
  <name>Svitjod</name>
  <layout>default</layout>
  <url>http://svitjod.rymdweb.com/repository</url>
  <snapshots>
    <enabled>false</enabled>
  </snapshots>

</repository>

and 

 <dependency>
  <groupId>com.zenior.sandbox</groupId>
  <artifactId>example-domain</artifactId>
  <version>1.0-SNAPSHOT</version>
  <scope>compile</scope>
  </dependency>

 <dependency>
  <groupId>com.zenior.sandbox</groupId>
  <artifactId>hibernate-sandbox</artifactId>
  <version>1.0-SNAPSHOT</version>
  <scope>compile</scope>
  </dependency>

To your project pom.xml. 

If do not use maven, you may just download the source code by looking for files ending with -sources in the maven repository mentioned above.

Abstract

Dealing effectively with large resultsets is important. Batch applications may require to process millions of rows as part of a transaction.  The brute force approach would be read everything up in memory before iterating over theelements. This approach will effectively kill performance and use far to much memory. One common approach when using Hibernate is to use options in the Criteria interface called setFirstResult/setMaxResult. This will generate SQL (at least for Oracle) where the database will actually have to read and throw away data before returning them to the client. Another approach is to use ScrollableResults, but here you will have to keep the connection open during the entire pagination process. By retrieving a collection of all the primary keys for the matching object, you will need to perform an extra query before actually retrieving the data, but it is very easy to generate effective SQL to retrieve using this approach. This is by far the best solution in terms of database-load and memory footprint, and it is easy to implement in a portable manner.

The problem

In order to have a rich domain model, we want relationships between domain objects to be implemented as relationships between the Java classes. If we are going to be able to do that without killing performance, we have be smart in how we deal with relationships having high cardinality.

Most importantly we must iterate through a large resultset without keeping all data in memory. We also want the approach to generate simple SQL, and avoid using esoteric options that make the SQL difficult to optimize or difficult to port. We might also need adjustments to the mapping strategy and relationships to retrieve data effectively.

It is assumed that the reader is familiar with Hibernate and with common ORM issues like n+1 selects.

The example

An order application processing an order with more than 100.000 order lines.We will see how different solutions differ with respect to memory consumption, resource usage in general, effectiveness in terms of execution and differences in how the SQL is generated. The solution will be using Java 6, Hibernate3.2.0GA with annotations, the Spring framework and Oracle XE.

I will test three different approaches that don't use custom code and for each approach say something about execution time, memory consumption and explain the inner details.
As a domain model for the test I have been using a domain-model representing a generic order-domain as shown in figure 1.

Figure 1
The model is simplified but representative.

Test fixture

I have used Hibernate and Springs AbstractTransactionalSpringContextTeststopopulate the test fixture that has been inserted into the database. Idon't want the unit test to include timing for actually setting up the data.Therefore I have executed the fixture-setup by adjusting the test-case slightly so that the fixture is actually committed, by using the following approach:
  1 protected void onSetUpInTransaction() throws Exception {
2 if (!isTestDataPresent()) {
3 createOrderWithOrderLines(1000 * 1000);
4 setComplete();
5 endTransaction();
6 }
7 }
Listing 1
By using the AbstractTransactionalSpringContextTests I can actually force the test itself to commit the data, which may sometimes be useful when inspecting persistent side-effects to data when executing tests. The methods setComplete() and endTransaction() will force the execution to commit after the test is finished. To make it idempotent, I have added a method that checks f the test-data is already there.

The brute-force approach yields the following test-results on my workstation, which is a custom built AMD Athlon 64 X2 4600, 4GB Ram, Dual Raptor 150GB SATA in a Raid 0 setup. The tests have been launched from IntelliJ 6.0.4 and with the Oracle XE instance running on the same machine. The test runner is launched in a spawned process using a maximum of 1 GB RAM (-Xmx1G).


Brute force

In the first approach, we process the entire order without thinking about memory consumption when we retrieve OrderLine instances from Hibernate.
This solution basically uses the one-to-many relational mapping and leaves it up to Hibernate to retrieve the data.

The mapping is defined as follows:
  1   @OneToMany(mappedBy = "order", cascade = CascadeType.ALL, fetch = FetchType.LAZY)
2 public Set getOrderLines() {
3 return orderLines;
4 }
Listing 2
And on the OrderLine side of the bi-directional relationship, we have:
  1  @ManyToOne(fetch = FetchType.EAGER)
2 public Order getOrder() {
3 return order;
4 }

Listing 3

The test that gathers timing and resource consumption information is a simple JUnit-test method:

 

  1 public void testRetrieveOrderBruteForce() {
2 Order order = (Order) hibernateTemplate.get(Order.class, orderId);
3 ResourceMonitor monitor = new ResourceMonitor("order.getOrderLines()");
4 assertEquals(1000*1000,order.getOrderLines().size());
5 System.out.println(monitor.stop());
6 }

Listing 4


Time elapsed
Memory usage delta
79,5s
501000kb
Table 1 Test results brute-force approach

As we can see, it is quite fast to retrieve all the data in just a single select. We have retrieved 1000000 OrderLine instances in close to 80-seconds. The problem with this approach is that it is memory intensive. On most 32-bit systems, there is a practical upper limit of around 2GB heap, which will make it difficult to have more than three such queries running simultaneously [1].
In addition to this, when using a StatefulSession in Hibernate to keep track of updated fields, keeping this much data in memory at the same time will eventually halt your system or make it very slow [2]. Note that making the relationship lazy (which is the default behavior in Hibernate) will not change that fact that everything is fetched using one single select.

Using Criteria search limited by firstResult and maxResult

In order to retrieve the OrderLine side of the relationship using a Criteria query, we clearly cannot have a Hibernate-managed relationship between Order and OrderLine, we will have to manage this relationship ourselves. The most visible consequence of this will be that the Set<OrderLine> property of Order will have to be removed.This is unfortunate because it removes an important aspect of the domain-model, which is the explicit composition that constitutes the relationship.


When dealing with cases like this, I would like to be explicit about the relationship also in the domain of the Order-class, so other than removing the getOrderLines()method from the Order class, I would prefer changing it to something like this shown in listing 5.
  1 public Set<OrderLine> getOrderLines(HibernateTemplate template) {
2 ....
3 }
Listing 5

Here it is still quite clear that Order has a relationship to OrderLine. Still, however, we will have to implement the paging mechanism using a Colllection-implementation, which preserves a pretty decent domain-model with respect to an explicit relationship between Order and OrderLine. This might turn out to be a good thing, because it will probably easier to refactor the getOrderLines method to be extended to take a HibernateTemplate as an argument instead of in lining some other implementation whenever we need to retrieve OrderLine instances belonging to an Order.
An even better approach here would be to de-couple the dependency to Hibernate completely.

I discovered this after implementing the getOrderLines() - method, taking hibernateTemplate as a parameter. The example-domain module that contains the order-domain related classes, did not have a dependency to Hibernate, which I did not want to introduce because I wanted the example-domain to be as lean as possible. In order to compare multiple strategies for paging using the same domain-classes I had to introduce an interface that made plugging different strategies as in listing 6.
 1 public Collection getOrderLines(CollectionBuilder collectionBuilder) {
2 return collectionBuilder.build(this);
3 }


1 public interface CollectionBuilder {
2 /**
3 * Build the collection using information from the given instance.
4 * @param instance
5 * @return
6 */
7 Collection build(Object owner);
8 }

Listing 6

Using the Collection-builder interface, I could easily provide a Hibernate based implementation of the different strategies without introducing a Hibernate dependency into the domain-classes. Here it would be even more visible that the one-to-many collection is owned by an order instance, signified by the owner parameter of the build-method.

To support paging by restricting the underlying resultset Hibernate uses a mechanism by using the Dialect interface. This mechanism varies greatly between different RDBMS'es. The Orcale9Dialect uses a construct in oracle called rownum restrictions, which may be found in the getLimitString-method of the Dialect implementation given in listing 7 (listed only in part).
  1 public String getLimitString(String sql, boolean hasOffset) {
2 ....
10 StringBuffer pagingSelect = new StringBuffer( sql.length()+100 );
11 if (hasOffset) {
12 pagingSelect.append("select * from ( select row_.*, rownum rownum_ from ( ");
13 }
14 else {
15 pagingSelect.append("select * from ( ");
16 }
17 pagingSelect.append(sql);
........
30 }
31
Listing 7

Without careful indexing and dependent upon the underlying RDBMS you might end up with a situation that may force the RDBMS to scan all the data for the entire set of OrderLine items each time in order to return the subset limited by setFirstResult and setMaxResult.
Here is an implementation of a Collection implementation that may page the underlying data  based upon limiting the page size in the way mentioned above:

  1 public class PagingCollection extends AbstractCollection {
2 private HibernateTemplate hibernateTemplate;
3 private HibernateCriteriaSpecification specificationCriteria;
4 private int pageSize = 2000;
5 private Integer collectionSize;
6 private Criteria queryCriteria;
7
8
9
public PagingCollection(HibernateTemplate hibernateTemplate, HibernateCriteriaSpecifiaa criteria, int pageSize)
10 this.specificationCriteria = criteria;
11 this.hibernateTemplate = hibernateTemplate;
12 this.pageSize = pageSize;
13 this.queryCriteria = criteria.populate(createCriteria());
14 }
15
16
public Iterator iterator() {
17 return new PagedIterator(size());
18 }
19
20
public int size() {
21 if (collectionSize == null) {
22 Criteria criteria = specificationCriteria.populate(createCriteria());
23 criteria.setProjection(Projections.countDistinct("id"));
24 collectionSize = (Integer) criteria.uniqueResult();
25 }
26 return collectionSize;
27 }
28
29
private Criteria createCriteria() {
30 Criteria criteria = (Criteria) hibernateTemplate.execute(new HibernateCallback() {
31 public Object doInHibernate(Session session) throws HibernateException, SQLException {
32 return session.createCriteria(specificationCriteria.getPersistentClass());
33 }
34 });
35 return criteria;
36 }
37
38
private class PagedIterator implements Iterator {
39 public boolean lastPage = false;
40 private int offset;
41 private List currentCollection;
42 private Iterator currentIterator;
43 private int size;
44
45
46
public PagedIterator(int size) {
47 this.size = size;
48 this.currentCollection = loadPage(0);
49 this.currentIterator = currentCollection.iterator();
50 }
51
52
protected List loadPage(int offset) {
53 if (offset > size) {
54 throw new NoSuchElementException();
55 }
56 this.offset = offset;
57 queryCriteria.setFirstResult(offset);
58 queryCriteria.setMaxResults(pageSize);
59 currentCollection = queryCriteria.list();
60 if ( (this.offset + pageSize )>= size) {
61 lastPage = true;
62 }
63 return currentCollection;
64 }
65
66
public boolean hasNext() {
67 return size() > 0 && (currentIterator.hasNext() || !lastPage);
68 }
69
70
public Object next() {
71 if (!currentIterator.hasNext()) {
72 unloadPage(currentCollection);
73 currentCollection = loadPage(offset + pageSize);
74 currentIterator = currentCollection.iterator();
75 }
76 return currentIterator.next();
77 }
78
79
protected void unloadPage(List currentCollection) {
80 for (Object o : currentCollection) {
81 hibernateTemplate.evict(o);
82 }
83 }
84
85
public void remove() {
86 throw new UnsupportedOperationException();
87 }
88 }
89 }
Listing 8

The test case looks as follows, note that we have to iterate the entire set if we want to poll everything out from the database. The size() method just returns an aggregated value by issuing a count.
  1  public void testRetrieveOrderLinesUsingPaging() {
2 Order order = (Order) hibernateTemplate.get(Order.class, orderId);
3 Collection orderLines = order.getOrderLines(hibernateTemplate, 50000);
4
5
ResourceMonitor monitor = new ResourceMonitor("order.getOrderLines()");
6 for (Object orderLine : orderLines) {
7 //
8 OrderLine line = (OrderLine) orderLine;
9 }
10 assertEquals(1000 * 1000, orderLines.size());
11 monitor.stop();
12 System.out.println(monitor);
13
14
}
Listing 9

Without tuning the underlying database with respect to query optimization, the test result looks like this:
Time elapsed
Memory usage delta
381s
29791kb
Table 2 Test result using setFirstResult/setMaxResult

Note that the footprint is reduced to below one tenth, but the response-time is not satisfactory. In this case a new query will be issued across each page boundary, in this case each page being 50.000 elements. It turns out that executing the SQL to retrieve one single page is almost as expensive as retrieving all rows in a single select. To understand why, we have to examine the SQL.
 1 SELECT *
2 FROM
3 (SELECT row_.*,
4 rownum rownum_
5 FROM
6 (SELECT *
7 FROM orderline o
8 WHERE o.order_id = order2_.id)
9 FROM orderline this_ LEFT
10 OUTER JOIN customer_order order2_ ON this_.order_id = order2_.id LEFT
11 WHERE this_.order_id = 10)
12 row_
13 WHERE rownum <= 100000)
14 WHERE rownum_ > 50000;
Listing 10

The SQL in listing 10 has been simplified from what Hibernate generates, but the most important attribute is the nested select with the rownum-restriction, which is shown. The query above will retrieve the second page in the select, the rows ranging from 50000 to 100000. Note the nested query where the first (innermost) select will return all rows with a rownum less than 100000. This is followed by a filter on the inner result, returning only the rows counting from 50000 and above. This means that the RDBMS will actually read and throw away most of the rows for each query. For this to be implemented consistently, we also need to make sure that the same select will return the same rows repeatedly. This can only be done if we add an order by clause to the select above (which is not included in this case). See [3] for a discussion on how rownum works in Oracle.

Using this approach for large results will effectively use up all your database resources.

Using ScrollableResults

The approach that uses ScrollableReults relies on the underlying JDBC-driver providing support for some kind of cursor that may allow you to retrieve data in an incremental manner without keeping everything in memory. A ScrollableResults can beretrieved from a Criteria implementation by using one of the Criteria.scroll(..) methods.Support for this may vary. Old versions of the JDBC-drivers in Oracle has implemented this as a client side cache that keeps everything in memory.Whereas newer-versions implements this using server-side cursors.

The implementation that uses the ScrollableResult is straightforward. The CollectionBuilder implementation is shown in Listing 11 and the actual Collection implementation in Listing 12.
 1 public class OrderLineScrolleableResultSetCollectionBuilder implements CollectionBuilder {
2 private HibernateTemplate hibernateTemplate;
3 private int pageSize;
4
5
public OrderLineScrolleableResultSetCollectionBuilder(HibernateTemplate hibernateTemplate, int pageSize) {
6 this.hibernateTemplate = hibernateTemplate;
7 this.pageSize = pageSize;
8 }
9
10
public Collection build(Object instance) {
11 Order order = (Order) instance;
12 return new ScrollableResultsCollection(hibernateTemplate, new HibernateCriteriaSpecification(OrderLine.class, order, "order"), pageSize);
13 }
14 }
15
Listing 11
  1 public class ScrollableResultsCollection extends AbstractCollection {
2 private HibernateTemplate hibernateTemplate;
3 private HibernateCriteriaSpecification specificationCriteria;
4 private int pageSize = 2000;
5 private Integer collectionSize = null;
6 private Criteria queryCriteria;
7
8
9
public ScrollableResultsCollection(HibernateTemplate hibernateTemplate, HibernateCriteriaSpecification criteria, int pageSize) {
10 this.specificationCriteria = criteria;
11 this.hibernateTemplate = hibernateTemplate;
12 this.pageSize = pageSize;
13 this.queryCriteria = criteria.populate(createCriteria());
14 }
15
16
public Iterator iterator() {
17 return new PagedIterator(size());
18 }
19
20
public synchronized int size() {
21 if (collectionSize == null) {
22 Criteria criteria = specificationCriteria.populate(createCriteria());
23 criteria.setProjection(Projections.countDistinct("id"));
24 collectionSize = (Integer) criteria.uniqueResult();
25
26
}
27 return collectionSize;
28 }
29
30
private Criteria createCriteria() {
31 Criteria criteria = (Criteria) hibernateTemplate.execute(new HibernateCallback() {
32 public Object doInHibernate(Session session) throws HibernateException, SQLException {
33 return session.createCriteria(specificationCriteria.getPersistentClass());
34 }
35 });
36 return criteria;
37 }
38
39
private class PagedIterator implements Iterator {
40 public boolean lastPage = false;
41 private int offset;
42 private List currentCollection;
43 private Iterator currentIterator;
44 private int size;
45 private ScrollableResults scrollebleResults;
46
47
48
public PagedIterator(int size) {
49 this.size = size;
50 this.scrollebleResults = queryCriteria.scroll(ScrollMode.FORWARD_ONLY);
51 this.currentCollection = loadPage(0);
52 this.currentIterator = currentCollection.iterator();
53
54
}
55
56
protected List loadPage(int offset) {
57 if (offset > size) {
58 throw new NoSuchElementException();
59 }
60 this.offset = offset;
61 currentCollection = new ArrayList(pageSize);
62 System.out.println("Loading offset ["+offset+"]");
63 if ((this.offset + pageSize) >= size) {
64 lastPage = true;
65 }
66 int scrollSize = lastPage ? size - offset: pageSize;
67 for (int i = 0; i < scrollSize; i++) {
68 if (!scrollebleResults.next()) {
69 throw new NoSuchElementException();
70 }
71 currentCollection.add(scrollebleResults.get(0));
72 }
73
74
currentIterator = currentCollection.iterator();
75 return currentCollection;
76 }
77
78
public boolean hasNext() {
79 return size() > 0 && (currentIterator.hasNext() || !lastPage);
80 }
81
82
public Object next() {
83 if (!currentIterator.hasNext()) {
84 unloadPage(currentCollection);
85 currentCollection = loadPage(offset + pageSize);
86 currentIterator = currentCollection.iterator();
87 }
88 return currentIterator.next();
89 }
90
91
protected void unloadPage(List currentCollection) {
92 for (Object o : currentCollection) {
93 hibernateTemplate.evict(o);
94 }
95 }
96
97
public void remove() {
98 throw new UnsupportedOperationException();
99 }
100 }
101 }
102
Listing 12
The ScrollableResults -interface is basically just a low-level approach in mapping row-based data into domain-objects. The instantiated and mapped objects are retrieved using the ScrollableResult.get(..)method. The get-method retrieves objects index from 0. It may be possible to retrieve several objects from a result-set, because hibernate generates queries that provides data for several objects when using outer-join fetching to avoid multiple roundtrips to the database.

Let us say that we extend the model presented in Figure 1 with a one-to-many relation relation to OrderLineDetail as shown in Figure 2.

Figure 2

In tis case, to get all the data from OrderDetail using one select and outer-join fetching, will make the returned select return four rows in total with the data for OrderLine repeated once for each instance caused by the outer-join with OrderDetail. This means that we lack a layer of abstraction here. We cannot implement this in a generic manner. We will actually have to alter the implementation of the strategy implemented to retrieve OrderLine instances just because we did a small change to the domain-model, adding the OrderDetail -relation.

The ScrollableResults is backed by a ResultSet and needs the connection to be open during the iteration of the query. This will pose some challenges when implementing paging spanning multiple web-requests, typically paging in a web-page, because the ScrollableResult will need to be valid and usable across requests.

Even though the approach is fast and lean on memory as can be seen by looking at the test-results in table 3, these limitations will probably get you in to problems except from in the simplest cases.

Time elapsed
Memory usage delta
89s
26009

Table 3


The solution

The most effective solution is to use the primary key as a paging criterion.This enables us to rely of first class constructs like a between range query which is simple for the RDBMS to optimize, the primary key of the queried entity will most likely be indexed already.
Retrieving data using a range query on the primary key is a two-step process. First one have to retrieve the collection of primary-keys, followed by a step to generate the intervals to properly identify a proper subset of the data,followed by the actual queries against the data.
  1 package com.zenior.sandbox.hibernate.pagination;
2
3
import no.zenior.domain.HibernateCriteriaSpecification;
4 import org.springframework.orm.hibernate3.HibernateTemplate;
5
6
import java.util.AbstractCollection;
7 import java.util.ArrayList;
8 import java.util.Collection;
9 import java.util.Iterator;
10
11
/**
12 * Collection implementation that iterates using a range query for available identifiers
13 * for the queried collection.
14 * </br>
15 *
16 * @author Bjørn Bjerkeli
17 */
18 public class IdIntervalCollection extends AbstractCollection {
19 private static final int DEFAULT_PAGESIZE = 2000;
20 private HibernateTemplate hibernateTemplate;
21 private HibernateCriteriaSpecification finderSpecification;
22 private int pageSize = DEFAULT_PAGESIZE;
23 private IdIntervalCollectionBuilder pager;
24 private int size;
25
26
/**
27 * @param hibernateTemplate
28 * @param finderSpecification
29 */
30 public IdIntervalCollection(HibernateTemplate hibernateTemplate,
31 HibernateCriteriaSpecification finderSpecification) {
32 this(hibernateTemplate, finderSpecification, DEFAULT_PAGESIZE);
33 }
34
35
/**
36 * @param hibernateTemplate
37 * @param finderSpecification
38 * @param pageSize
39 */
40 public IdIntervalCollection(HibernateTemplate hibernateTemplate,
41 HibernateCriteriaSpecification finderSpecification, int pageSize) {
42 this.hibernateTemplate = hibernateTemplate;
43 this.finderSpecification = finderSpecification;
44 this.pageSize = pageSize;
45 initialize();
46 }
47
48
/**
49 * Initialize the instance of BucketIdIntervalPager.
50 */
51 private void initialize() {
52 Collection identifiers = HibernateRepositoryTool.idProjection(hibernateTemplate, finderSpecification);
53 this.pager = new IdIntervalCollectionBuilder(pageSize);
54 for (Object identifier : identifiers) {
55 pager.add((Long) identifier);
56 }
57 this.size = identifiers.size();
58 hibernateTemplate.clear();
59 }
60
61
62
63
public int size() {
64 return size;
65 }
66
67
public Iterator iterator() {
68 return new Iterator() {
69 private Collection currentPage = new ArrayList();
70 private Iterator currentIterator = currentPage.iterator();
71 private Iterator currentIdIntervalIterator = new ArrayList(pager.buildIdIntervals()).iterator();
72 private IdInterval currentIdInterval;
73
74
public void remove() {
75 throw new UnsupportedOperationException();
76 }
77
78
public boolean hasNext() {
79 return ((size > 0) && currentIdIntervalIterator.hasNext())
80 || currentIterator.hasNext();
81 }
82
83
public Object next() {
84 if (!currentIterator.hasNext()) {
85 unloadPage(currentPage);
86 currentIdInterval = (IdInterval) currentIdIntervalIterator.next();
87 loadPage(currentIdInterval);
88 }
89
90
return currentIterator.next();
91 }
92
93
protected void unloadPage(Collection currentPage) {
94 for (Object o : currentPage) {
95 hibernateTemplate.evict(o);
96 }
97 }
98
99
/**
100 * Load the data corresponding to the idInterval.
101 */
102 protected void loadPage(IdInterval currentIdInterval) {
103 finderSpecification.setIdInterval(currentIdInterval);
104 currentPage = HibernateRepositoryTool.find(hibernateTemplate, finderSpecification);
105 currentIterator = currentPage.iterator();
106 }
107 };
108 }
109 }
110
111
Listing 13

The key here is basically to implement the iteration process in two steps, one to retrieve identifiers, and one to actually provide the iterator with an id-interval that is used tospecify a between range-query. Retrieving the set of identifiers is easy, the HibernateCriteriaSpecification that uses Criteria for search has a Projections.id()factory that returns all identifiers for a given Criteria-query. Basedupon the entire set of identifiers we are able to construct buckets of identifiers each representing a logical and proper page of data.
  1  public void testRetrieveOrderLinesUsingPagingIdInterval() {
2 final Order order = (Order) hibernateTemplate.get(Order.class, orderId);
3 Collection orderLines = order.getOrderLines(new IdIntervalPagedCollectionBuilder(hibernateTemplate, 50000));
4
5
ResourceMonitor monitor = new ResourceMonitor("order.getOrderLines()");
6 for (Object orderLine : orderLines) {
7 //
8 OrderLine line = (OrderLine) orderLine;
9 }
10 assertEquals(1000 * 1000, orderLines.size());
11 monitor.stop();
12 System.out.println(monitor);
13
14 }
Listing 14

The IdIntervalPagedCollectionBuilder is the pluggable implementation of the CollectionBuilder interface described in listing 6.
 1 public class IdIntervalPagedCollectionBuilder implements CollectionBuilder {
2 private HibernateTemplate hibernateTemplate;
3 private int pageSize;
4
5
public IdIntervalPagedCollectionBuilder(HibernateTemplate hibernateTemplate, int pageSize) {
6 this.hibernateTemplate = hibernateTemplate;
7 this.pageSize = pageSize;
8 }
9
10
public Collection build(Object instance) {
11 return new IdIntervalCollection(hibernateTemplate,
12 new HibernateCriteriaSpecification(OrderLine.class, instance, "order"), pageSize);
13 }
14 }


Listing 15

The three examples have almost the same amount of code, but rely onquite different approaches. The test-result for the last test is remarkable.
Time elapsed
Memory Usage Delta
99,5s
29402
This approach is almost as fast as the brute-force version. The memory consumption is about one tenth. By selecting the appropriate page-size for this implementation, you may alter the ratio between execution time and memory consumption. This version is also stateless, it does not keep references to resources like the ScrollableResult version does, nor does it strain the database like the version using setFirstResult/setMaxResult.

Limitations

We need a primary key, preferably a surrogate key, that has a natural ordering so that we can implement the paging concept by using a between search for the identifiers. Another limitation is the need to actually retrieve all the identifiers before performing the search, which means two select-statements instead of possibly only one.

The Author

The Author is working as a senior consultant in a Norwegian consulting company, Zenior AS, and is an active member of the Norwegian JUG, javaBin.

Licensing

Creative Commons License
Hibernate paging using id-intervals by Bjørn Bjerkeli is licensed under a Creative Commons Attribution-Share Alike 3.0 Norway License.
Based on a work at svitjod.rymdweb.com.

References

[1] Memory limitation on 32 bit systems, http://www.theserverside.com/discussions/thread.tss?thread_id=19442
[2] Hibernate batch processing http://www.hibernate.org/hib_docs/v3/reference/en/html/batch.html
[3] Oracle rownum http://www.oracle.com/technology/oramag/oracle/06-sep/o56asktom.html