Learning the code way: fetch = join and the Cartesian Product Problem

Thursday, 16 August 2012

fetch = join and the Cartesian Product Problem

When we apply the join fetch strategy, Hibernate fetches the associated data using SQL joins. We saw this earlier for collections and associations. This leads to a problem when the same entity has multiple collections mapped with fetch = join.
Consider the example of a Basket that holds fruits and vegetables.

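import java.util.HashSet;
import java.util.Set;

public class Basket {
    private Integer id;
    private String color;
    // Two independent one-to-many collections, both mapped with fetch="join"
    private Set<Fruit> fruits = new HashSet<Fruit>();
    private Set<Vegetable> vegetables = new HashSet<Vegetable>();
    // getters and setters omitted
}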

The Fruit and Vegetable classes are simple POJOs:

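public class Fruit {
    private Integer id;
    private String name;
    // Owning side of the relationship (the set in Basket is inverse="true")
    private Basket basket;
    // getters and setters omitted
}

public class Vegetable {
    private Integer id;
    private String name;
    // Owning side of the relationship (the set in Basket is inverse="true")
    private Basket basket;
    // getters and setters omitted
}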

The Hibernate mapping file (hbm.xml) for Basket is as follows:

<?xml version="1.0"?>
<!DOCTYPE hibernate-mapping PUBLIC "-//Hibernate/Hibernate Mapping DTD 3.0//EN"
"http://hibernate.sourceforge.net/hibernate-mapping-3.0.dtd">
<hibernate-mapping package="com.collection.smart.cart_problem">
    <class name="Basket" table="BASKET">
        <id name="id" type="integer">
            <column name="ID" />
            <generator class="native" />
        </id>
        <property name="color" type="string">
            <column name="COLOR" length="50" not-null="true" />
        </property>
        <set name="fruits" cascade="all-delete-orphan" inverse="true" fetch="join" >
            <key column="BASKET_ID" not-null="true" />
            <one-to-many class="Fruit" />
        </set>
        <set name="vegetables" cascade="all-delete-orphan" inverse="true" fetch="join" >
            <key column="BASKET_ID" not-null="true" />
            <one-to-many class="Vegetable" />
        </set>

    </class>
</hibernate-mapping>

The SQL that Hibernate executes when a Basket is loaded is:

select
        basket0_.ID as ID0_2_,
        basket0_.COLOR as COLOR0_2_,
        fruits1_.BASKET_ID as BASKET3_4_,
        fruits1_.ID as ID4_,
        fruits1_.ID as ID1_0_,
        fruits1_.Name as Name1_0_,
        fruits1_.basket_id as basket3_1_0_,
        vegetables2_.BASKET_ID as BASKET3_5_,
        vegetables2_.ID as ID5_,
        vegetables2_.ID as ID2_1_,
        vegetables2_.Name as Name2_1_,
        vegetables2_.basket_id as basket3_2_1_ 
    from
        BASKET basket0_ 
    left outer join
        FRUIT fruits1_ 
            on basket0_.ID=fruits1_.BASKET_ID 
    left outer join
        VEGETABLE vegetables2_ 
            on basket0_.ID=vegetables2_.BASKET_ID 
    where
        basket0_.ID= ?
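To see why two parallel LEFT OUTER JOINs multiply rows, here is a small in-memory sketch in plain Java (no Hibernate or database involved). It mimics the query above for a single basket: both joins match only on the basket id, so every fruit row is paired with every vegetable row. The class name `ParallelJoinSketch` and the fruit and vegetable names are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch (no Hibernate, no database) of the two LEFT OUTER JOINs above.
public class ParallelJoinSketch {

    // One result-set row; a null name stands for the NULL columns that a
    // LEFT OUTER JOIN produces when a collection has no matching row.
    public static final class Row {
        final int basketId;
        final String fruit;
        final String vegetable;

        Row(int basketId, String fruit, String vegetable) {
            this.basketId = basketId;
            this.fruit = fruit;
            this.vegetable = vegetable;
        }

        @Override
        public String toString() {
            return basketId + " | " + fruit + " | " + vegetable;
        }
    }

    // Joins one basket to its fruits and vegetables. Both joins match on the
    // basket id only, so nothing links a particular fruit to a particular vegetable.
    public static List<Row> join(int basketId, List<String> fruits, List<String> vegetables) {
        // An empty collection still contributes one all-NULL row (outer join semantics).
        List<String> f = fruits.isEmpty() ? Arrays.asList((String) null) : fruits;
        List<String> v = vegetables.isEmpty() ? Arrays.asList((String) null) : vegetables;
        List<Row> rows = new ArrayList<>();
        for (String fruit : f) {        // first join: one row per fruit
            for (String veg : v) {      // second join: each fruit row repeated per vegetable
                rows.add(new Row(basketId, fruit, veg));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // One fruit, one vegetable: a single row.
        join(1, Arrays.asList("Apple"), Arrays.asList("Carrot")).forEach(System.out::println);
        System.out.println("---");
        // Two of each: 2 x 2 = 4 rows.
        join(1, Arrays.asList("Apple", "Mango"), Arrays.asList("Carrot", "Potato"))
                .forEach(System.out::println);
    }
}
```

With one fruit and one vegetable the sketch yields a single row; with two of each it yields 2 × 2 = 4 rows, with every fruit and every vegetable appearing twice.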

I executed the same query directly in MySQL. The result set showed the following data:

As can be seen above, the basket had one fruit and one vegetable, so the result set had a single record. (This is similar to the case of one-to-one and many-to-one associations.)
I then added one more fruit and one more vegetable, so the basket now held two of each. The query result changed to the following:

As can be seen in the above result set, there is a lot of redundant data. The two parallel joins pair every fruit row with every vegetable row, so the number of rows is the product of the collection sizes. The above data set was small, but consider a Basket with 100 fruits and 100 vegetables: the result set then has 1 × 100 × 100 = 10,000 rows for a single basket. All of this data is sent over the network to Hibernate, which has to discard the redundant information to build the actual object graph.
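The clean-up described above can also be sketched in plain Java. This is an illustration under stated assumptions, not Hibernate's actual result-set handling: the joined rows are reduced to (fruit id, vegetable id) pairs, both collections are assumed non-empty, and collecting the ids into Sets is what throws the duplicates away. `CartesianCleanupSketch`, `joinedRows`, `Graph` and `rebuild` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Sketch only: NOT Hibernate's real result-set processing.
public class CartesianCleanupSketch {

    // Fruit and vegetable ids produced by the double join for one basket.
    // Assumes both collections are non-empty (no NULL rows to handle).
    public static List<int[]> joinedRows(int fruitCount, int vegetableCount) {
        List<int[]> rows = new ArrayList<>();
        for (int fruitId = 1; fruitId <= fruitCount; fruitId++) {
            for (int vegetableId = 1; vegetableId <= vegetableCount; vegetableId++) {
                rows.add(new int[] {fruitId, vegetableId});
            }
        }
        return rows;
    }

    // The object graph that survives once duplicates are thrown away.
    public static final class Graph {
        public final Set<Integer> fruitIds = new LinkedHashSet<>();
        public final Set<Integer> vegetableIds = new LinkedHashSet<>();
    }

    // Walks every row; Set.add ignores ids already seen, so repeated copies are discarded.
    public static Graph rebuild(List<int[]> rows) {
        Graph graph = new Graph();
        for (int[] row : rows) {
            graph.fruitIds.add(row[0]);
            graph.vegetableIds.add(row[1]);
        }
        return graph;
    }

    public static void main(String[] args) {
        List<int[]> rows = joinedRows(100, 100);
        Graph graph = rebuild(rows);
        System.out.println("rows transferred: " + rows.size());               // 10000
        System.out.println("fruits kept: " + graph.fruitIds.size());          // 100
        System.out.println("vegetables kept: " + graph.vegetableIds.size());  // 100
    }
}
```

For 100 fruits and 100 vegetables, 10,000 rows are read to recover just 200 objects: the rows grow with the product of the collection sizes, while the useful data grows only with their sum.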

As long as the collections are small, join fetching works fine, but as the data grows, eagerly loading parallel collections this way becomes a performance problem.

8 comments:

Anonymous, 29 April 2014 at 16:45
reason?
Robin Varghese, 1 June 2014 at 05:12
see the join sequence
Anonymous, 20 August 2014 at 10:55
What do you mean by reason
Unknown, 14 July 2017 at 13:02
This problem does not exist anymore with JPA/Hibernate 5.2.10.
The same code runs 3 queries:

Hibernate:
    select
        basket0_.id as id1_0_,
        basket0_.name as name2_0_
    from
        basket basket0_
    where
        basket0_.id=?
Hibernate:
    select
        vegetables0_.basketid as basketid3_3_0_,
        vegetables0_.id as id1_3_0_,
        vegetables0_.id as id1_3_1_,
        vegetables0_.basketid as basketid3_3_1_,
        vegetables0_.name as name2_3_1_
    from
        vegetable vegetables0_
    where
        vegetables0_.basketid=?
Hibernate:
    select
        fruits0_.basketid as basketid3_1_0_,
        fruits0_.id as id1_1_0_,
        fruits0_.id as id1_1_1_,
        fruits0_.basketid as basketid3_1_1_,
        fruits0_.name as name2_1_1_
    from
        fruit fruits0_
    where
        fruits0_.basketid=?
Unknown, 14 July 2017 at 14:29
In fact, the problem does not occur when you use JPQL, which overrides the fetch = join behaviour with secondary selects.