Search This Blog

Wednesday, 1 August 2012

batch-size =n

In our previous example we saw how a default fetch strategy(fetch = "select") could result in N+1 selection problem for Hibernate Associations. One way to solve this problem would be to use batch-size property
In this case Hibernate works on initializing Book sets for multiple shelves, when the set for a single Shelf is to fetched. Thus in a single book select, multiple collections are initialized. The hbm files for Shelf is now as below:
<?xml version="1.0"?>
<!DOCTYPE hibernate-mapping PUBLIC "-//Hibernate/Hibernate Mapping DTD 3.0//EN"
"http://hibernate.sourceforge.net/hibernate-mapping-3.0.dtd">
<hibernate-mapping package="com.collection.smart">
    <class name="Shelf" table="SHELF">
        <id name="id" type="integer">
            <column name="ID" />
            <generator class="native" />
        </id>
        <property name="code" type="string">
            <column name="CODE" length="50" not-null="true" />
        </property>
        <set name="allBooks" cascade="all-delete-orphan" 
            inverse="true" batch-size="3" >
            <key column="SHELF_ID" not-null="true" />
            <one-to-many class="Book" />
        </set>

    </class>
</hibernate-mapping>
Observe the set element. The fetching property is still unchanged. It is the default select value. However the batch-size has been set to three here. Now if were to execute a select query against The Shelf class:
public static void testBatchedLoad() {

    Session session = sessionFactory.openSession();
    Transaction transaction = session.beginTransaction();
    System.out.println("Testing Query select");
    session = sessionFactory.openSession();
    transaction = session.beginTransaction();
    @SuppressWarnings("unchecked")
    List<Shelf> shelfs = session.createQuery("from Shelf shelf 
         where shelf.id <= 5 ").list();
    //will return four books from database    
    for (Shelf shelf : shelfs) {
        System.out.println("shelf class is " + shelf.getClass());
        Set<Book> books = shelf.getAllBooks();            
        System.out.println("The Collection class is " + shelf.getAllBooks().getClass());
        System.out.println("Total books is " + books.size());
    }
    
    transaction.commit();
    session.close();
}
The HQL query loads a list of Shelf entities and then displays the number of books on each shelf.
Testing Query select
Hibernate: 
    /* 
from
    Shelf shelf 
where
    shelf.id <= 5  */ 
    select
        shelf0_.ID as ID1_,
        shelf0_.CODE as CODE1_ 
    from
        SHELF shelf0_ 
    where
        shelf0_.ID<=5
shelf class is class com.collection.smart.Shelf
The Collection class is class org.hibernate.collection.PersistentSet
Hibernate: 
    /* load one-to-many com.collection.smart.Shelf.allBooks */ 
    select
        allbooks0_.SHELF_ID as SHELF3_1_,
        allbooks0_.ID as ID1_,
        allbooks0_.ID as ID0_0_,
        allbooks0_.Name as Name0_0_,
        allbooks0_.shelf_id as shelf3_0_0_ 
    from
        BOOK allbooks0_ 
    where
        allbooks0_.SHELF_ID in (
            ?, ?, ?
        )
Total books is 2
shelf class is class com.collection.smart.Shelf
The Collection class is class org.hibernate.collection.PersistentSet
Total books is 1
shelf class is class com.collection.smart.Shelf
The Collection class is class org.hibernate.collection.PersistentSet
Hibernate: 
    /* load one-to-many com.collection.smart.Shelf.allBooks */ 
    select
        allbooks0_.SHELF_ID as SHELF3_1_,
        allbooks0_.ID as ID1_,
        allbooks0_.ID as ID0_0_,
        allbooks0_.Name as Name0_0_,
        allbooks0_.shelf_id as shelf3_0_0_ 
    from
        BOOK allbooks0_ 
    where
        allbooks0_.SHELF_ID=?
Total books is 1
shelf class is class com.collection.smart.Shelf
The Collection class is class org.hibernate.collection.PersistentSet
Total books is 1
As can be seen above the select query executed to load the Book collection object is an IN query. It is designed to take three ids as we have a batch-size of 3.
A total of 4 shelves were loaded by the HQL query. When we tried to access the set for the first book, Hibernate pre-fetched the collection for the first three shelves. It then executed only one more select clause to load the last collection. Thus the number of secondary selects fired has been reduced by the batch-size. Instead of N+1 select queries we now have only (N/batch-size) +1 queries.
If the batch-size was 5 then the books select query for the above example would be
Testing Query select
    /* load one-to-many com.collection.smart.Shelf.allBooks */ 
    select
        allbooks0_.SHELF_ID as SHELF3_1_,
        allbooks0_.ID as ID1_,
        allbooks0_.ID as ID0_0_,
        allbooks0_.Name as Name0_0_,
        allbooks0_.shelf_id as shelf3_0_0_ 
    from
        BOOK allbooks0_ 
    where
        allbooks0_.SHELF_ID in (
            ?, ?, ?, ?
        )
As can be seen Hibernate will fire only one query replacing the four question marks with the ids of the four shelf objects.Thus if we had 18 Shelf records in the result-set then the collections would be pre-fetched as below: (batch-size = 5)
Query 1 : 5 records (for shelfs with id 1-5)
Query 2 : 5 records (for shelfs with id 6-10)
Query 3 : 5 records (for shelfs with id 11-15)
Query 4 : 3 records (for shelfs with id 16-18).
4 secondary queries instead of 18 secondary queries
This does not imply that Hibernate can create any number of loaders. E.g for a batch size of 20 and 119 records the assumption would be
  • 5 queries for fetching 20 records each ( 1 to 100)
  • 1 query for fetching 19 records ( 101 to 119)
But this is not the case. Hibernate actually generates 6 queries
  • 5 queries for fetching 20 records each ( 1 to 100)
  • 1 query for fetching 10 records (101 to 110)
  • 1 query for fetching last 9 records (110 to 119)
I have searched high and low for the exact algorithm and the best explanation was in this discussion.
Batch fetching functionality is not just limited to collections. You can also use it for associations. Consider that you fire a query to load a list of books. Now if were to access the Shelf reference for each book, it wold result in n queries for n books. or the N+1 Problem.
However if you add the following attribute in your Shelf.hbm file:
<class name="Shelf" table="SHELF" batch-size="5">
Now when we iterate over the set the Shelf records will be loaded by Hibernate in batches of 6.
In the next post we shall attempt a two query solution for the whole N+1 problem.

1 comment: