
Thursday, 16 August 2012

fetch = join and the Cartesian Product Problem

When we apply the join fetch strategy, Hibernate fetches the associated data using SQL joins. We saw this earlier for collections and for associations. However, this leads to a problem when the same entity has more than one collection mapped with fetch="join".
Consider the example of a Basket that holds fruits and vegetables.
import java.util.HashSet;
import java.util.Set;

public class Basket {
    private Integer id;
    private String color;
    private Set<Fruit> fruits = new HashSet<Fruit>();
    private Set<Vegetable> vegetables = new HashSet<Vegetable>();
    // getters and setters omitted
}
The Fruit and Vegetable classes are simple POJOs:
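public class Fruit {
    private Integer id;
    private String name;
    private Basket basket; // the basket this fruit belongs to (BASKET_ID)
    // getters and setters omitted
}
public class Vegetable {
    private Integer id;
    private String name;
    private Basket basket; // the basket this vegetable belongs to (BASKET_ID)
    // getters and setters omitted
}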
The Hibernate mapping (hbm) file for Basket is shown below:
<?xml version="1.0"?>
<!DOCTYPE hibernate-mapping PUBLIC "-//Hibernate/Hibernate Mapping DTD 3.0//EN"
    "http://hibernate.sourceforge.net/hibernate-mapping-3.0.dtd">
<hibernate-mapping package="">
    <class name="Basket" table="BASKET">
        <id name="id" type="integer">
            <column name="ID" />
            <generator class="native" />
        </id>
        <property name="color" type="string">
            <column name="COLOR" length="50" not-null="true" />
        </property>
        <!-- both collections are loaded eagerly with an outer join -->
        <set name="fruits" cascade="all-delete-orphan" inverse="true" fetch="join">
            <key column="BASKET_ID" not-null="true" />
            <one-to-many class="Fruit" />
        </set>
        <set name="vegetables" cascade="all-delete-orphan" inverse="true" fetch="join">
            <key column="BASKET_ID" not-null="true" />
            <one-to-many class="Vegetable" />
        </set>
    </class>
</hibernate-mapping>

The SQL that Hibernate executes when a Basket is loaded is:
    select
        basket0_.ID as ID0_2_,
        basket0_.COLOR as COLOR0_2_,
        fruits1_.BASKET_ID as BASKET3_4_,
        fruits1_.ID as ID4_,
        fruits1_.ID as ID1_0_,
        fruits1_.Name as Name1_0_,
        fruits1_.basket_id as basket3_1_0_,
        vegetables2_.BASKET_ID as BASKET3_5_,
        vegetables2_.ID as ID5_,
        vegetables2_.ID as ID2_1_,
        vegetables2_.Name as Name2_1_,
        vegetables2_.basket_id as basket3_2_1_ 
    from
        BASKET basket0_ 
    left outer join
        FRUIT fruits1_ 
            on basket0_.ID=fruits1_.BASKET_ID 
    left outer join
        VEGETABLE vegetables2_ 
            on basket0_.ID=vegetables2_.BASKET_ID 
    where
        basket0_.ID=?
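To see why the two left outer joins multiply rows, the join for a single basket can be mimicked in memory. This is a minimal sketch, not how MySQL or Hibernate actually execute the query: the class JoinFetchSimulation, its Row type and the joinRows method are hypothetical names, and the entities are reduced to made-up ids. Each fruit row from the first join is repeated once for every vegetable matched by the second join, and an empty collection still yields one row with NULLs because the joins are outer joins.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class JoinFetchSimulation {

    // One row of the joined result set. A null id stands for the NULL
    // columns a left outer join produces when nothing matches.
    static final class Row {
        final int basketId;
        final Integer fruitId;
        final Integer vegetableId;

        Row(int basketId, Integer fruitId, Integer vegetableId) {
            this.basketId = basketId;
            this.fruitId = fruitId;
            this.vegetableId = vegetableId;
        }

        @Override
        public String toString() {
            return "BASKET.ID=" + basketId + ", FRUIT.ID=" + fruitId + ", VEGETABLE.ID=" + vegetableId;
        }
    }

    // Hypothetical helper mimicking
    // BASKET left outer join FRUIT left outer join VEGETABLE for one basket.
    static List<Row> joinRows(int basketId, List<Integer> fruitIds, List<Integer> vegetableIds) {
        // A left outer join keeps the basket row even when a collection is
        // empty, so an empty collection contributes a single NULL match.
        List<Integer> fruits = fruitIds.isEmpty() ? Arrays.asList((Integer) null) : fruitIds;
        List<Integer> vegetables = vegetableIds.isEmpty() ? Arrays.asList((Integer) null) : vegetableIds;

        List<Row> rows = new ArrayList<>();
        for (Integer fruitId : fruits) {              // first join: one row per fruit
            for (Integer vegetableId : vegetables) {  // second join: each fruit row repeated per vegetable
                rows.add(new Row(basketId, fruitId, vegetableId));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // One fruit and one vegetable: a single row.
        System.out.println(joinRows(1, Arrays.asList(1), Arrays.asList(1)).size());

        // Two fruits and two vegetables: every fruit appears with every vegetable.
        for (Row row : joinRows(1, Arrays.asList(1, 2), Arrays.asList(1, 2))) {
            System.out.println(row);
        }
    }
}
```

Running main prints 1 for the single-fruit, single-vegetable basket, then four rows for the basket with two of each, one for every fruit/vegetable pair.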
I ran the same query directly in MySQL. The result set contained the following data:
As can be seen above, the basket had one fruit and one vegetable, so the result set contained a single row. (This is similar to the case of one-to-one and many-to-one associations.)
Next, I added another fruit and another vegetable to the basket. The query result changed to the following:
As the result set above shows, there is a lot of redundant data: each fruit row is repeated once for every vegetable, so two fruits and two vegetables already produce 2 x 2 = 4 rows. The two joins applied in the database multiply the number of rows. The data set here was small, but consider a Basket with 100 fruits and 100 vegetables: the result set then has 1 x 100 x 100 = 10,000 rows, even though only 200 child entities are involved. All of this data is sent over the network to Hibernate, which then has to discard the redundant information to build the actual object graph.
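Hibernate's actual result-set handling is more involved, but the cost of the 100 x 100 case can be sketched in plain Java. In this hypothetical sketch, the class ResultSetAssembly, its Row and AssembledBasket types and the joinedRows and assemble methods are made-up names, not Hibernate API, and each entity is reduced to its id. joinedRows produces the rows the two joins would return for one basket, and assemble folds them back into two sets, roughly the way the Basket's fruits and vegetables Sets end up holding each child only once.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ResultSetAssembly {

    // One joined row, reduced to the ids that matter here.
    static final class Row {
        final int basketId;
        final int fruitId;
        final int vegetableId;

        Row(int basketId, int fruitId, int vegetableId) {
            this.basketId = basketId;
            this.fruitId = fruitId;
            this.vegetableId = vegetableId;
        }
    }

    // Simplified stand-in for Basket: only the ids of its fruits and vegetables.
    static final class AssembledBasket {
        final Set<Integer> fruitIds = new LinkedHashSet<>();
        final Set<Integer> vegetableIds = new LinkedHashSet<>();
    }

    // The rows the two joins return for basket 1 with the given number of
    // fruits and vegetables (both assumed to be at least 1).
    static List<Row> joinedRows(int fruitCount, int vegetableCount) {
        List<Row> rows = new ArrayList<>();
        for (int fruitId = 1; fruitId <= fruitCount; fruitId++) {
            for (int vegetableId = 1; vegetableId <= vegetableCount; vegetableId++) {
                rows.add(new Row(1, fruitId, vegetableId));
            }
        }
        return rows;
    }

    // Collapses the flat rows back into one basket. Set.add ignores ids that
    // were already seen, which is where the redundant data is thrown away.
    static AssembledBasket assemble(List<Row> rows) {
        AssembledBasket basket = new AssembledBasket();
        for (Row row : rows) {
            basket.fruitIds.add(row.fruitId);
            basket.vegetableIds.add(row.vegetableId);
        }
        return basket;
    }

    public static void main(String[] args) {
        List<Row> rows = joinedRows(100, 100);
        AssembledBasket basket = assemble(rows);
        System.out.println("rows received:       " + rows.size());                // 10000
        System.out.println("distinct fruits:     " + basket.fruitIds.size());     // 100
        System.out.println("distinct vegetables: " + basket.vegetableIds.size()); // 100
    }
}
```

The number of rows transferred grows with the product of the two collection sizes, while the useful data grows only with their sum, which is why the gap widens quickly as the collections grow.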
As long as the collections are small, join fetching works fine, but as the data grows, eagerly loading parallel collections this way becomes a performance problem.

