Effective pagination using Hibernate

This article presents with an solution on how to page through large amounts of data effectively without taking down your database and without keeping it all in memory.

Last major update 07.09.2008

I have now added references that describes how to retrieve the source-code for the examples. All source-code is released under the Creative Commons License as can be seen from the Licensing details, the source code can be retrieved from http://svitjod.rymdweb.com/repository/com/zenior/sandbox/ which contains the source-code in maven form. To include a dependency to the source code just include:

<id>internal</id>
<name>Svitjod</name>
<layout>default</layout>
<url>http://svitjod.rymdweb.com/repository</url>
<snapshots>
<enabled>false</enabled>
</snapshots>

</repository>

and

<dependency>
<groupId>com.zenior.sandbox</groupId>
<artifactId>example-domain</artifactId>
<version>1.0-SNAPSHOT</version>
<scope>compile</scope>
</dependency>

<dependency>
<groupId>com.zenior.sandbox</groupId>
<artifactId>hibernate-sandbox</artifactId>
<version>1.0-SNAPSHOT</version>
<scope>compile</scope>
</dependency>

To your project pom.xml.

If do not use maven, you may just download the source code by looking for files ending with -sources in the maven repository mentioned above.

Abstract

Dealing effectively with large resultsets is important. Batch applications may require to process millions of rows as part of a transaction. The brute force approach would be read everything up in memory before iterating over theelements. This approach will effectively kill performance and use far to much memory. One common approach when using Hibernate is to use options in the Criteria interface called setFirstResult/setMaxResult. This will generate SQL (at least for Oracle) where the database will actually have to read and throw away data before returning them to the client. Another approach is to use ScrollableResults, but here you will have to keep the connection open during the entire pagination process. By retrieving a collection of all the primary keys for the matching object, you will need to perform an extra query before actually retrieving the data, but it is very easy to generate effective SQL to retrieve using this approach. This is by far the best solution in terms of database-load and memory footprint, and it is easy to implement in a portable manner.

The problem

In order to have a rich domain model, we want relationships between domain objects to be implemented as relationships between the Java classes. If we are going to be able to do that without killing performance, we have be smart in how we deal with relationships having high cardinality.

Most importantly we must iterate through a large resultset without keeping all data in memory. We also want the approach to generate simple SQL, and avoid using esoteric options that make the SQL difficult to optimize or difficult to port. We might also need adjustments to the mapping strategy and relationships to retrieve data effectively.

It is assumed that the reader is familiar with Hibernate and with common ORM issues like n+1 selects.

The example

An order application processing an order with more than 100.000 order lines.We will see how different solutions differ with respect to memory consumption, resource usage in general, effectiveness in terms of execution and differences in how the SQL is generated. The solution will be using Java 6, Hibernate3.2.0GA with annotations, the Spring framework and Oracle XE.

I will test three different approaches that don't use custom code and for each approach say something about execution time, memory consumption and explain the inner details.
As a domain model for the test I have been using a domain-model representing a generic order-domain as shown in figure 1.
Order Domain

Figure 1
The model is simplified but representative.

Test fixture

I have used Hibernate and Springs AbstractTransactionalSpringContextTeststopopulate the test fixture that has been inserted into the database. Idon't want the unit test to include timing for actually setting up the data.Therefore I have executed the fixture-setup by adjusting the test-case slightly so that the fixture is actually committed, by using the following approach:

  1 protected void onSetUpInTransaction() throws Exception {
  2         if (!isTestDataPresent()) {
  3             createOrderWithOrderLines(1000 * 1000);
  4             setComplete();
  5             endTransaction();
  6         }
  7 }

Listing 1
By using the AbstractTransactionalSpringContextTests I can actually force the test itself to commit the data, which may sometimes be useful when inspecting persistent side-effects to data when executing tests. The methods setComplete() and endTransaction() will force the execution to commit after the test is finished. To make it idempotent, I have added a method that checks f the test-data is already there.

The brute-force approach yields the following test-results on my workstation, which is a custom built AMD Athlon 64 X2 4600, 4GB Ram, Dual Raptor 150GB SATA in a Raid 0 setup. The tests have been launched from IntelliJ 6.0.4 and with the Oracle XE instance running on the same machine. The test runner is launched in a spawned process using a maximum of 1 GB RAM (-Xmx1G).

Brute force

In the first approach, we process the entire order without thinking about memory consumption when we retrieve OrderLine instances from Hibernate.
This solution basically uses the one-to-many relational mapping and leaves it up to Hibernate to retrieve the data.

The mapping is defined as follows:

  1   @OneToMany(mappedBy = "order", cascade = CascadeType.ALL, fetch = FetchType.LAZY)
  2     public Set getOrderLines() {
  3         return orderLines;
  4     }

Listing 2
And on the OrderLine side of the bi-directional relationship, we have:

  1  @ManyToOne(fetch = FetchType.EAGER)
  2     public Order getOrder() {
  3         return order;
  4     }

Listing 3

The test that gathers timing and resource consumption information is a simple JUnit-test method:

  1 public void testRetrieveOrderBruteForce() {
  2         Order order = (Order) hibernateTemplate.get(Order.class, orderId);
  3         ResourceMonitor monitor = new ResourceMonitor("order.getOrderLines()");
  4         assertEquals(1000*1000,order.getOrderLines().size());
  5         System.out.println(monitor.stop());
  6 }

Listing 4

Time elapsed	Memory usage delta
79,5s	501000kb

Table 1 Test results brute-force approach

As we can see, it is quite fast to retrieve all the data in just a single select. We have retrieved 1000000 OrderLine instances in close to 80-seconds. The problem with this approach is that it is memory intensive. On most 32-bit systems, there is a practical upper limit of around 2GB heap, which will make it difficult to have more than three such queries running simultaneously [1].
In addition to this, when using a StatefulSession in Hibernate to keep track of updated fields, keeping this much data in memory at the same time will eventually halt your system or make it very slow [2]. Note that making the relationship lazy (which is the default behavior in Hibernate) will not change that fact that everything is fetched using one single select.

Using Criteria search limited by firstResult and maxResult

In order to retrieve the OrderLine side of the relationship using a Criteria query, we clearly cannot have a Hibernate-managed relationship between Order and OrderLine, we will have to manage this relationship ourselves. The most visible consequence of this will be that the Set<OrderLine> property of Order will have to be removed.This is unfortunate because it removes an important aspect of the domain-model, which is the explicit composition that constitutes the relationship.

When dealing with cases like this, I would like to be explicit about the relationship also in the domain of the Order-class, so other than removing the getOrderLines()method from the Order class, I would prefer changing it to something like this shown in listing 5.

  1 public Set<OrderLine> getOrderLines(HibernateTemplate template) {
  2         ....
  3 }

Listing 5

Here it is still quite clear that Order has a relationship to OrderLine. Still, however, we will have to implement the paging mechanism using a Colllection-implementation, which preserves a pretty decent domain-model with respect to an explicit relationship between Order and OrderLine. This might turn out to be a good thing, because it will probably easier to refactor the getOrderLines method to be extended to take a HibernateTemplate as an argument instead of in lining some other implementation whenever we need to retrieve OrderLine instances belonging to an Order.
An even better approach here would be to de-couple the dependency to Hibernate completely.

I discovered this after implementing the getOrderLines() - method, taking hibernateTemplate as a parameter. The example-domain module that contains the order-domain related classes, did not have a dependency to Hibernate, which I did not want to introduce because I wanted the example-domain to be as lean as possible. In order to compare multiple strategies for paging using the same domain-classes I had to introduce an interface that made plugging different strategies as in listing 6.

 1 public Collection getOrderLines(CollectionBuilder collectionBuilder) {
 2         return collectionBuilder.build(this);
 3 }


 1 public interface CollectionBuilder {
 2     /**
 3      * Build the collection using information from the given instance.
 4      * @param instance
 5      * @return
 6      */
 7     Collection build(Object owner);
 8 }

Listing 6

Using the Collection-builder interface, I could easily provide a Hibernate based implementation of the different strategies without introducing a Hibernate dependency into the domain-classes. Here it would be even more visible that the one-to-many collection is owned by an order instance, signified by the owner parameter of the build-method.

To support paging by restricting the underlying resultset Hibernate uses a mechanism by using the Dialect interface. This mechanism varies greatly between different RDBMS'es. The Orcale9Dialect uses a construct in oracle called rownum restrictions, which may be found in the getLimitString-method of the Dialect implementation given in listing 7 (listed only in part).

  1 public String getLimitString(String sql, boolean hasOffset) {
  2 ....
 10 		StringBuffer pagingSelect = new StringBuffer( sql.length()+100 );
 11 if (hasOffset) {
 12 			pagingSelect.append("select * from ( select row_.*, rownum rownum_ from ( ");
 13 		}
 14 else {
 15 			pagingSelect.append("select * from ( ");
 16 		}
 17 		pagingSelect.append(sql);
         ........
 30 }
 31

Listing 7

Without careful indexing and dependent upon the underlying RDBMS you might end up with a situation that may force the RDBMS to scan all the data for the entire set of OrderLine items each time in order to return the subset limited by setFirstResult and setMaxResult.
Here is an implementation of a Collection implementation that may page the underlying data based upon limiting the page size in the way mentioned above:

  1 public class PagingCollection extends AbstractCollection {
  2     private HibernateTemplate hibernateTemplate;
  3     private HibernateCriteriaSpecification specificationCriteria;
  4     private int pageSize = 2000;
  5     private Integer collectionSize;
  6     private Criteria queryCriteria;
  7 
  8 
  9     public PagingCollection(HibernateTemplate hibernateTemplate, HibernateCriteriaSpecifiaa criteria, int pageSize)
 10         this.specificationCriteria = criteria;
 11         this.hibernateTemplate = hibernateTemplate;
 12         this.pageSize = pageSize;
 13         this.queryCriteria = criteria.populate(createCriteria());
 14     }
 15 
 16     public Iterator iterator() {
 17         return new PagedIterator(size());
 18     }
 19 
 20     public int size() {
 21         if (collectionSize == null) {
 22             Criteria criteria = specificationCriteria.populate(createCriteria());
 23             criteria.setProjection(Projections.countDistinct("id"));
 24             collectionSize = (Integer) criteria.uniqueResult();
 25         }
 26         return collectionSize;
 27     }
 28 
 29     private Criteria createCriteria() {
 30         Criteria criteria = (Criteria) hibernateTemplate.execute(new HibernateCallback() {
 31             public Object doInHibernate(Session session) throws HibernateException, SQLException {
 32                 return session.createCriteria(specificationCriteria.getPersistentClass());
 33             }
 34         });
 35         return criteria;
 36     }
 37 
 38     private class PagedIterator implements Iterator {
 39         public boolean lastPage = false;
 40         private int offset;
 41         private List currentCollection;
 42         private Iterator currentIterator;
 43         private int size;
 44 
 45 
 46         public PagedIterator(int size) {
 47             this.size = size;
 48             this.currentCollection = loadPage(0);
 49             this.currentIterator = currentCollection.iterator();
 50         }
 51 
 52         protected List loadPage(int offset) {
 53             if (offset > size) {
 54                 throw new NoSuchElementException();
 55             }
 56             this.offset = offset;
 57             queryCriteria.setFirstResult(offset);
 58             queryCriteria.setMaxResults(pageSize);
 59             currentCollection = queryCriteria.list();
 60             if ( (this.offset + pageSize )>= size) {
 61                 lastPage = true;
 62             }
 63             return currentCollection;
 64         }
 65 
 66         public boolean hasNext() {
 67             return size() > 0 && (currentIterator.hasNext() || !lastPage);
 68         }
 69 
 70         public Object next() {
 71             if (!currentIterator.hasNext()) {
 72                 unloadPage(currentCollection);
 73                 currentCollection = loadPage(offset + pageSize);
 74                 currentIterator = currentCollection.iterator();
 75             }
 76             return currentIterator.next();
 77         }
 78 
 79         protected void unloadPage(List currentCollection) {
 80             for (Object o : currentCollection) {
 81                 hibernateTemplate.evict(o);
 82             }
 83         }
 84 
 85         public void remove() {
 86             throw new UnsupportedOperationException();
 87         }
 88     }
 89 }

Listing 8

The test case looks as follows, note that we have to iterate the entire set if we want to poll everything out from the database. The size() method just returns an aggregated value by issuing a count.

  1  public void testRetrieveOrderLinesUsingPaging() {
  2         Order order = (Order) hibernateTemplate.get(Order.class, orderId);
  3         Collection orderLines = order.getOrderLines(hibernateTemplate, 50000);
  4 
  5         ResourceMonitor monitor = new ResourceMonitor("order.getOrderLines()");
  6         for (Object orderLine : orderLines) {
  7             //
  8             OrderLine line = (OrderLine) orderLine;
  9         }
 10         assertEquals(1000 * 1000, orderLines.size());
 11         monitor.stop();
 12         System.out.println(monitor);
 13 
 14  }

Listing 9

Without tuning the underlying database with respect to query optimization, the test result looks like this:

Time elapsed	Memory usage delta
381s	29791kb

Table 2 Test result using setFirstResult/setMaxResult

Note that the footprint is reduced to below one tenth, but the response-time is not satisfactory. In this case a new query will be issued across each page boundary, in this case each page being 50.000 elements. It turns out that executing the SQL to retrieve one single page is almost as expensive as retrieving all rows in a single select. To understand why, we have to examine the SQL.

 1 SELECT *
 2 FROM
 3   (SELECT row_.*,
 4      rownum rownum_
 5    FROM
 6     (SELECT * 
 7        FROM orderline o
 8        WHERE o.order_id = order2_.id)
 9      FROM orderline this_ LEFT
10      OUTER JOIN customer_order order2_ ON this_.order_id = order2_.id LEFT
11      WHERE this_.order_id = 10)
12   row_
13    WHERE rownum <= 100000)
14 WHERE rownum_ > 50000;

Listing 10

The SQL in listing 10 has been simplified from what Hibernate generates, but the most important attribute is the nested select with the rownum-restriction, which is shown. The query above will retrieve the second page in the select, the rows ranging from 50000 to 100000. Note the nested query where the first (innermost) select will return all rows with a rownum less than 100000. This is followed by a filter on the inner result, returning only the rows counting from 50000 and above. This means that the RDBMS will actually read and throw away most of the rows for each query. For this to be implemented consistently, we also need to make sure that the same select will return the same rows repeatedly. This can only be done if we add an order by clause to the select above (which is not included in this case). See [3] for a discussion on how rownum works in Oracle.

Using this approach for large results will effectively use up all your database resources.

Using ScrollableResults

The approach that uses ScrollableReults relies on the underlying JDBC-driver providing support for some kind of cursor that may allow you to retrieve data in an incremental manner without keeping everything in memory. A ScrollableResults can beretrieved from a Criteria implementation by using one of the Criteria.scroll(..) methods.Support for this may vary. Old versions of the JDBC-drivers in Oracle has implemented this as a client side cache that keeps everything in memory.Whereas newer-versions implements this using server-side cursors.

The implementation that uses the ScrollableResult is straightforward. The CollectionBuilder implementation is shown in Listing 11 and the actual Collection implementation in Listing 12.

 1 public class OrderLineScrolleableResultSetCollectionBuilder implements CollectionBuilder {
 2     private HibernateTemplate hibernateTemplate;
 3     private int pageSize;
 4 
 5     public OrderLineScrolleableResultSetCollectionBuilder(HibernateTemplate hibernateTemplate, int pageSize) {
 6         this.hibernateTemplate = hibernateTemplate;
 7         this.pageSize = pageSize;
 8     }
 9 
10     public Collection build(Object instance) {
11         Order order = (Order) instance;
12         return new ScrollableResultsCollection(hibernateTemplate, new HibernateCriteriaSpecification(OrderLine.class, order, "order"), pageSize);
13     }
14 }
15

Listing 11

  1 public class ScrollableResultsCollection extends AbstractCollection {
  2     private HibernateTemplate hibernateTemplate;
  3     private HibernateCriteriaSpecification specificationCriteria;
  4     private int pageSize = 2000;
  5     private Integer collectionSize = null;
  6     private Criteria queryCriteria;
  7 
  8 
  9     public ScrollableResultsCollection(HibernateTemplate hibernateTemplate, HibernateCriteriaSpecification criteria, int pageSize) {
 10         this.specificationCriteria = criteria;
 11         this.hibernateTemplate = hibernateTemplate;
 12         this.pageSize = pageSize;
 13         this.queryCriteria = criteria.populate(createCriteria());
 14     }
 15 
 16     public Iterator iterator() {
 17         return new PagedIterator(size());
 18     }
 19 
 20     public synchronized int size() {
 21         if (collectionSize == null) {
 22             Criteria criteria = specificationCriteria.populate(createCriteria());
 23             criteria.setProjection(Projections.countDistinct("id"));
 24             collectionSize = (Integer) criteria.uniqueResult();
 25 
 26         }
 27         return collectionSize;
 28     }
 29 
 30     private Criteria createCriteria() {
 31         Criteria criteria = (Criteria) hibernateTemplate.execute(new HibernateCallback() {
 32             public Object doInHibernate(Session session) throws HibernateException, SQLException {
 33                 return session.createCriteria(specificationCriteria.getPersistentClass());
 34             }
 35         });
 36         return criteria;
 37     }
 38 
 39     private class PagedIterator implements Iterator {
 40         public boolean lastPage = false;
 41         private int offset;
 42         private List currentCollection;
 43         private Iterator currentIterator;
 44         private int size;
 45         private ScrollableResults scrollebleResults;
 46 
 47 
 48         public PagedIterator(int size) {
 49             this.size = size;
 50             this.scrollebleResults = queryCriteria.scroll(ScrollMode.FORWARD_ONLY);
 51             this.currentCollection = loadPage(0);
 52             this.currentIterator = currentCollection.iterator();
 53 
 54         }
 55 
 56         protected List loadPage(int offset) {
 57             if (offset > size) {
 58                 throw new NoSuchElementException();
 59             }
 60             this.offset = offset;
 61             currentCollection = new ArrayList(pageSize);
 62             System.out.println("Loading offset ["+offset+"]");
 63             if ((this.offset + pageSize) >= size) {
 64                 lastPage = true;
 65             }
 66             int scrollSize = lastPage ? size - offset: pageSize;
 67             for (int i = 0; i < scrollSize; i++) {
 68                 if (!scrollebleResults.next()) {
 69                     throw new NoSuchElementException();
 70                 }
 71                 currentCollection.add(scrollebleResults.get(0));
 72             }
 73 
 74             currentIterator = currentCollection.iterator();
 75             return currentCollection;
 76         }
 77 
 78         public boolean hasNext() {
 79             return size() > 0 && (currentIterator.hasNext() || !lastPage);
 80         }
 81 
 82         public Object next() {
 83             if (!currentIterator.hasNext()) {
 84                 unloadPage(currentCollection);
 85                 currentCollection = loadPage(offset + pageSize);
 86                 currentIterator = currentCollection.iterator();
 87             }
 88             return currentIterator.next();
 89         }
 90 
 91         protected void unloadPage(List currentCollection) {
 92             for (Object o : currentCollection) {
 93                 hibernateTemplate.evict(o);
 94             }
 95         }
 96 
 97         public void remove() {
 98             throw new UnsupportedOperationException();
 99         }
100     }
101 }
102

Listing 12
The ScrollableResults -interface is basically just a low-level approach in mapping row-based data into domain-objects. The instantiated and mapped objects are retrieved using the ScrollableResult.get(..)method. The get-method retrieves objects index from 0. It may be possible to retrieve several objects from a result-set, because hibernate generates queries that provides data for several objects when using outer-join fetching to avoid multiple roundtrips to the database.

Let us say that we extend the model presented in Figure 1 with a one-to-many relation relation to OrderLineDetail as shown in Figure 2.

Figure 2

In tis case, to get all the data from OrderDetail using one select and outer-join fetching, will make the returned select return four rows in total with the data for OrderLine repeated once for each instance caused by the outer-join with OrderDetail. This means that we lack a layer of abstraction here. We cannot implement this in a generic manner. We will actually have to alter the implementation of the strategy implemented to retrieve OrderLine instances just because we did a small change to the domain-model, adding the OrderDetail -relation.

The ScrollableResults is backed by a ResultSet and needs the connection to be open during the iteration of the query. This will pose some challenges when implementing paging spanning multiple web-requests, typically paging in a web-page, because the ScrollableResult will need to be valid and usable across requests.

Even though the approach is fast and lean on memory as can be seen by looking at the test-results in table 3, these limitations will probably get you in to problems except from in the simplest cases.

Time elapsed	Memory usage delta
89s	26009

Table 3

The solution

The most effective solution is to use the primary key as a paging criterion.This enables us to rely of first class constructs like a between range query which is simple for the RDBMS to optimize, the primary key of the queried entity will most likely be indexed already.
Retrieving data using a range query on the primary key is a two-step process. First one have to retrieve the collection of primary-keys, followed by a step to generate the intervals to properly identify a proper subset of the data,followed by the actual queries against the data.

  1 package com.zenior.sandbox.hibernate.pagination;
  2 
  3 import no.zenior.domain.HibernateCriteriaSpecification;
  4 import org.springframework.orm.hibernate3.HibernateTemplate;
  5 
  6 import java.util.AbstractCollection;
  7 import java.util.ArrayList;
  8 import java.util.Collection;
  9 import java.util.Iterator;
 10 
 11 /**
 12  * Collection implementation that iterates using a range query for available identifiers
 13  * for the queried collection.
 14  * </br>
 15  *
 16  * @author Bjørn Bjerkeli
 17  */
 18 public class IdIntervalCollection extends AbstractCollection {
 19     private static final int DEFAULT_PAGESIZE = 2000;
 20     private HibernateTemplate hibernateTemplate;
 21     private HibernateCriteriaSpecification finderSpecification;
 22     private int pageSize = DEFAULT_PAGESIZE;
 23     private IdIntervalCollectionBuilder pager;
 24     private int size;
 25 
 26     /**
 27      * @param hibernateTemplate
 28      * @param finderSpecification
 29      */
 30     public IdIntervalCollection(HibernateTemplate hibernateTemplate,
 31                                 HibernateCriteriaSpecification finderSpecification) {
 32         this(hibernateTemplate, finderSpecification, DEFAULT_PAGESIZE);
 33     }
 34 
 35     /**
 36      * @param hibernateTemplate
 37      * @param finderSpecification
 38      * @param pageSize
 39      */
 40     public IdIntervalCollection(HibernateTemplate hibernateTemplate,
 41                                 HibernateCriteriaSpecification finderSpecification, int pageSize) {
 42         this.hibernateTemplate = hibernateTemplate;
 43         this.finderSpecification = finderSpecification;
 44         this.pageSize = pageSize;
 45         initialize();
 46     }
 47 
 48     /**
 49      * Initialize the instance of BucketIdIntervalPager.
 50      */
 51     private void initialize() {
 52         Collection identifiers = HibernateRepositoryTool.idProjection(hibernateTemplate, finderSpecification);
 53         this.pager = new IdIntervalCollectionBuilder(pageSize);
 54         for (Object identifier : identifiers) {
 55             pager.add((Long) identifier);
 56         }
 57         this.size = identifiers.size();
 58         hibernateTemplate.clear();
 59     }
 60 
 61 
 62 
 63     public int size() {
 64         return size;
 65     }
 66 
 67     public Iterator iterator() {
 68         return new Iterator() {
 69             private Collection currentPage = new ArrayList();
 70             private Iterator currentIterator = currentPage.iterator();
 71             private Iterator currentIdIntervalIterator = new ArrayList(pager.buildIdIntervals()).iterator();
 72             private IdInterval currentIdInterval;
 73 
 74             public void remove() {
 75                 throw new UnsupportedOperationException();
 76             }
 77 
 78             public boolean hasNext() {
 79                 return ((size > 0) && currentIdIntervalIterator.hasNext())
 80                         || currentIterator.hasNext();
 81             }
 82 
 83             public Object next() {
 84                 if (!currentIterator.hasNext()) {
 85                     unloadPage(currentPage);
 86                     currentIdInterval = (IdInterval) currentIdIntervalIterator.next();
 87                     loadPage(currentIdInterval);
 88                 }
 89 
 90                 return currentIterator.next();
 91             }
 92 
 93             protected void unloadPage(Collection currentPage) {
 94                 for (Object o : currentPage) {
 95                     hibernateTemplate.evict(o);
 96                 }
 97             }
 98 
 99             /**
100              * Load the data corresponding to the idInterval.
101              */
102             protected void loadPage(IdInterval currentIdInterval) {
103                 finderSpecification.setIdInterval(currentIdInterval);
104                 currentPage = HibernateRepositoryTool.find(hibernateTemplate, finderSpecification);
105                 currentIterator = currentPage.iterator();
106             }
107         };
108     }
109 }
110 
111

Listing 13

The key here is basically to implement the iteration process in two steps, one to retrieve identifiers, and one to actually provide the iterator with an id-interval that is used tospecify a between range-query. Retrieving the set of identifiers is easy, the HibernateCriteriaSpecification that uses Criteria for search has a Projections.id()factory that returns all identifiers for a given Criteria-query. Basedupon the entire set of identifiers we are able to construct buckets of identifiers each representing a logical and proper page of data.

  1  public void testRetrieveOrderLinesUsingPagingIdInterval() {
  2         final Order order = (Order) hibernateTemplate.get(Order.class, orderId);
  3         Collection orderLines = order.getOrderLines(new IdIntervalPagedCollectionBuilder(hibernateTemplate, 50000));
  4 
  5         ResourceMonitor monitor = new ResourceMonitor("order.getOrderLines()");
  6         for (Object orderLine : orderLines) {
  7             //
  8             OrderLine line = (OrderLine) orderLine;
  9         }
 10         assertEquals(1000 * 1000, orderLines.size());
 11         monitor.stop();
 12         System.out.println(monitor);
 13         
 14 }

Listing 14

The IdIntervalPagedCollectionBuilder is the pluggable implementation of the CollectionBuilder interface described in listing 6.

 1 public class IdIntervalPagedCollectionBuilder implements CollectionBuilder {
 2     private HibernateTemplate hibernateTemplate;
 3     private int pageSize;
 4 
 5     public IdIntervalPagedCollectionBuilder(HibernateTemplate hibernateTemplate, int pageSize) {
 6         this.hibernateTemplate = hibernateTemplate;
 7         this.pageSize = pageSize;
 8     }
 9 
10     public Collection build(Object instance) {
11         return new IdIntervalCollection(hibernateTemplate, 
12                 new HibernateCriteriaSpecification(OrderLine.class, instance, "order"), pageSize);
13     }
14 }

Listing 15

The three examples have almost the same amount of code, but rely onquite different approaches. The test-result for the last test is remarkable.

Time elapsed	Memory Usage Delta
99,5s	29402

This approach is almost as fast as the brute-force version. The memory consumption is about one tenth. By selecting the appropriate page-size for this implementation, you may alter the ratio between execution time and memory consumption. This version is also stateless, it does not keep references to resources like the ScrollableResult version does, nor does it strain the database like the version using setFirstResult/setMaxResult.

Limitations

We need a primary key, preferably a surrogate key, that has a natural ordering so that we can implement the paging concept by using a between search for the identifiers. Another limitation is the need to actually retrieve all the identifiers before performing the search, which means two select-statements instead of possibly only one.

The Author

The Author is working as a senior consultant in a Norwegian consulting company, Zenior AS, and is an active member of the Norwegian JUG, javaBin.

Licensing

Hibernate paging using id-intervals by Bjørn Bjerkeli is licensed under a Creative Commons Attribution-Share Alike 3.0 Norway License.
Based on a work at svitjod.rymdweb.com.

References

[1] Memory limitation on 32 bit systems, http://www.theserverside.com/discussions/thread.tss?thread_id=19442
[2] Hibernate batch processing http://www.hibernate.org/hib_docs/v3/reference/en/html/batch.html
[3] Oracle rownum http://www.oracle.com/technology/oramag/oracle/06-sep/o56asktom.html

11 comments:

Anonymous said...: Hi Bjørn,

Very nice article! I'd like to know if you plan to release the full source code. I'm interesting to test your approches on MySQL database and check out if the limit statement used by MySQL is more efficient than loading all ids.

I work on a huge database containing combinatorial datas. Some select returns more than 10.000.000 records, and your SOLUTION cannot be mine. 10.000.000 mean at least 10.000.000 x long (8 bytes) = 80 MB.

Hope that you'll add a link to the source ;o)

Regards.; 12:27 AM
saden1 said...: What library is the ResourceMonitor class you are using from?

Thanks.; 12:27 AM
Anonymous said...: It's very sad to see a total lack of disregard for:
1) The original purpose of Hibernate
2) The way DBMSs work when it comes to ordering and sorting

Unfortunately, I see the same syndrome over and over again.

See http://www.javaperformancetuning.com/news/interview041.shtml
"There are two usage models that are problematic. The first is where people try to apply ORM where it is not really suitable. The main example of where ORM - and indeed Java - is not suitable is the case of processing data in bulk. It is simply never going to be efficient to fetch millions of rows from the database, into your JVM, and then update them one at a time. Don't use Java for this. Use a stored procedure. "; 10:29 AM
Anonymous said...: Pardon my English - should be "total disregard" or "total lack of understanding of"; 10:30 AM
Bjørn Bjerkeli said...: >t's very sad to see a total lack of >disregard for:
>1) The original purpose of Hibernate

Well, even if the hibernate developers main intent was not to support batch oriented scenarios, it supports it well if used when it is meaningful.

>2) The way DBMSs work when it comes to >ordering and sorting

Well, one of the points of writing this article is exactly that, use the database for operations that it supports well, like sorting like you mention here. Red it closer and I believe it should be clear. I might however try to elaborate some more on issues related to batch-processing using Hibernate. That is possibly the topic of a separate article.

I believe that the main point that Gavin is stating in the article is that it makes no sense to read a bunch of data into a java-program to update the data in a row-oriented way, and then write it back again, unless you have some requirements when it comes to integration when processing these rows.

One case here is in business integration scenarios, which was my requirement when using the approach demonstrated in the article. I created millions of rows in a java-based batch application from processing text-based input, and or each row I was processing, I had to integrate with a bunch of legacy systems. Try doing this in a stored procedure.

Stating that you cannot create a batch application using Hibernate and java is mildly put a naive simplification. It works, and quite a few people has done it successfully, but you have to know when to do it, and when not to do it.; 8:48 PM
Anonymous said...: could you provide me the source code for this article; 10:55 AM
Anonymous said...: Hi Bjorn,

is there a chance that you publish the complete sources?.

looks very helpfull.; 4:37 AM
Anonymous said...: Hi Bjorn,

is there a chance that you publish the complete sources?.

looks very helpfull.; 12:57 PM
Anonymous said...: It's very sad to see a total disregard for:
1) Someone posting a well thought-out solution to a common problem (pagination) which can help others in their software dev.
2) Not reading the post before adding comments to simply spout irrelevant opinions.

Indeed the hibernate guys themselves to say that hibernate is not appropriate for mass updates or inserts. However this article is in regards to pagination, which is a read activity, which hibernate is pretty damn good at!

I bet you go around to all the forums just trying to bag people out.; 9:44 PM
Anonymous said...: It’s very good work…. Very usefull…. Thankyou for giving it….
eviction law firm broward; 10:09 AM
Unknown said...: In our Norton Com Setup, we provide the best techniques to resolve each and every norton.com/setup of all users.
www.norton.com/setup and Norton Setup

Enter Key Office Setup, after purchasing MS Office from visit www.office.com/setup, sign in to your Microsoft account then enter product key for office.com/setup

We provide Best Microsoft Office Setup services in the Town! For more info, click www.office.com/setup. Are you facing troubles with Office Com Setup We understand that Microsoft Office installation is usually complicated enough to baffle individuals/companies.; 5:59 PM

Seaview, a Developers Perspective

Sunday, March 04, 2007