Rick

Rick
Rick

Tuesday, March 30, 2010

Optimizing JPA calls (using JPA, ROO, Hibernate, Maven, Spring and Tomcat)

I created a proof of concept for our new REST based JSON API.

I used ROO, Spring 3 and JPA.

(This is just a POC and not a real app at this point.)

One of the issues is the number of queries we make for a relatively simple access call to get a list of the top level categories and a list of categories.

The current implementation of the POC does not use any caching.

NOTE:
If you are using ROO, Spring, JPA Maven and/or the Maven Tomcat plugin, I think this will help you setup caching and log4j. It will also help you optimize you object retrieval with JPA. Thus, I post it here where people can see it, find it, and use it. If it helps, you good. Feel free to provide feedback.

As an exercise in architecture design, I wanted to show what it would take to optimize the queries and to add 2nd level caching. Then we can decide if it makes sense for future JSON APIs.

{
"outer" : {
"categoryList" : {
"categoryItem" : [ {
"id" : 1,
"count":2,
"name" : "Games"
}, {
"id" : 2,
"count":4
"name" : "Utils"
} ]
}
}
}
We need to look up the categories by spoken language.

As JPA goes, the Category domain object is fairly complex, as it uses a self join through a join table. But JPA does a good job of hiding the complexity and making representing this object fairly straight forward as follows:

@Entity
@RooJavaBean
@RooToString
@Table(name = "ad_categories")
@RooEntity(versionField = "", finders = { "findCategorysByCommonLanguage" })
@JsonIgnoreProperties( { "parent", "commonLanguage", "subcategories" })
public class Category {

private String name;

@Id
@GeneratedValue(strategy = GenerationType.AUTO)
@Column(name = "id")
private Long id;

@ManyToOne(targetEntity = Category.class)
@JoinTable(name = "ad_category_hierarchy", joinColumns =
@JoinColumn(name = "childid"), inverseJoinColumns = @JoinColumn(name = "parentid"))
private Category parent;

@ManyToOne(targetEntity = CommonLanguage.class)
@JoinColumn(name = "languageid")
private CommonLanguage commonLanguage;

@Formula("(select count(*) from ad_category_hierarchy ah join " +
"ad_categories ac on ac.id = ah.childid where ah.parentid = id)")
private int count;

@ManyToMany(cascade = CascadeType.ALL)
@JoinTable(name = "ad_category_hierarchy", joinColumns = @JoinColumn(name = "parentid"), inverseJoinColumns = @JoinColumn(name = "childid"))
private Set subcategories = new HashSet();

public static Query findCategoriesByCommonLanguageAndParent(
CommonLanguage commonLanguage, Category parent) {
if (commonLanguage == null)
throw new IllegalArgumentException(
"The commonLanguage argument is required");

EntityManager em = Category.entityManager();
Query q = null;

if (parent == null) {
q = em
.createQuery("SELECT Category FROM Category AS category WHERE category.commonLanguage = :commonLanguage AND category.parent is null");
} else {
q = em
.createQuery("SELECT Category FROM Category AS category WHERE category.commonLanguage = :commonLanguage AND category.parent = :parent");

}
q.setParameter("commonLanguage", commonLanguage);
if (parent != null) {
q.setParameter("parent", parent);
}
return q;
}
}
The above is a fairly naked, first attempt of using JPA to manage a category domain object.

Here is the controller we use to return a JSON response to the client:

@RooWebScaffold(path = "category", automaticallyMaintainView = true, formBackingObject = Category.class)
@RequestMapping("/category/**")
@Controller
public class CategoryController {

/**
* List Categories is used to return a JSON payload of cateogories.
*
*
* @param locale the locale in the request (currently we only use the language from the locale)
* @param parentId If the parentId is 0 then just look up the top level categories, else return categories for that parent
* @param response gives up control over what response code we send to the client.
* @return
*/
@SuppressWarnings("unchecked")
@RequestMapping(value = "/category/listCategories/locale/{locale}/category/{parentId}", method = RequestMethod.GET)
public @ResponseBody List listCategories(@PathVariable String locale,
@PathVariable Long parentId,
HttpServletResponse response) {

/* Locale is required. */
if (locale == null || locale.trim().length()==0) {
response.setStatus(HttpServletResponse.SC_BAD_REQUEST);
//TODO log this
return null;
}

/* If the parentId is 0, then set it to null, otherwise look up the parent category. */
Category parent = parentId==0 ? null : Category.findCategory(parentId);

/* Extract the language part of the locale to look up the common language. */
String language = locale.split("_")[0];
CommonLanguage commonLanguage = (CommonLanguage) CommonLanguage.findCommonLanguagesByCode(language).getSingleResult();

/* Look up the list of categories. */
List categories = Category.findCategoriesByCommonLanguageAndParent(commonLanguage, parent).getResultList();
return categories;
}

}
Focus mainly on this part of the code:

Category parent = parentId==0 ? null : Category.findCategory(parentId);

String language = locale.split("_")[0];
CommonLanguage commonLanguage = (CommonLanguage) CommonLanguage.findCommonLanguagesByCode(language).getSingleResult();

List categories = Category.findCategoriesByCommonLanguageAndParent(commonLanguage, parent).getResultList();
return categories;
It is fairly easy to instrument Hibernate to show you exactly what SQL it is producing to return a call as follows:

persistence.xml



When hitting the above JSON end point, we get the following SQL queries.

URL:

http://localhost:8080/appcatalog/category/listCategories/locale/en_US/category/0

Output:

[
{
"name": "Other",
"id": 1,
"count": 0
},
{
"name": "Books",
"id": 2,
"count": 14
},
{
"name": "Business",
"id": 3,
"count": 11
},
{
"name": "School",
"id": 4,
"count": 11
},
{
"name": "Games",
"id": 5,
"count": 9
},
{
"name": "Finance",
"id": 6,
"count": 9
},
{
"name": "Food",
"id": 7,
"count": 5
},
...
]
The above uses the following SQL:

Mar 30, 2010 11:45:53 AM org.apache.catalina.core.ApplicationContext log
INFO: Initializing Spring FrameworkServlet 'appcatalog'
Hibernate:
/* SELECT
CommonLanguage
FROM
CommonLanguage AS commonlanguage
WHERE
commonlanguage.code = :code */ select
commonlang0_.id as id0_,
commonlang0_.code as code0_,
commonlang0_.description as descript3_0_,
commonlang0_.name as name0_
from
common_language commonlang0_
where
commonlang0_.code=?
Hibernate:
/* SELECT
Category
FROM
Category AS category
WHERE
category.commonLanguage = :commonLanguage
AND category.parent is null */ select
category0_.id as id2_,
category0_.languageid as languageid2_,
category0_.name as name2_,
category0_1_.parentid as parentid3_,
(select
count(*)
from
ad_category_hierarchy ah
join
ad_categories ac
on ac.id = ah.childid
where
ah.parentid = category0_.id) as formula0_
from
ad_categories category0_
left outer join
ad_category_hierarchy category0_1_
on category0_.id=category0_1_.childid
where
category0_.languageid=?
and (
category0_1_.parentid is null
)
If we specify a parent id category as follows:

URL: http://localhost:8080/appcatalog/category/listCategories/locale/en_US/category/15

Then we get the following:

Hibernate:
/* load com.mycompany.appcatalog.domain.Category */ select
category0_.id as id2_2_,
category0_.languageid as languageid2_2_,
category0_.name as name2_2_,
category0_1_.parentid as parentid3_2_,
(select
count(*)
from
ad_category_hierarchy ah
join
ad_categories ac
on ac.id = ah.childid
where
ah.parentid = category0_.id) as formula0_2_,
commonlang1_.id as id0_0_,
commonlang1_.code as code0_0_,
commonlang1_.description as descript3_0_0_,
commonlang1_.name as name0_0_,
category2_.id as id2_1_,
category2_.languageid as languageid2_1_,
category2_.name as name2_1_,
category2_1_.parentid as parentid3_1_,
(select
count(*)
from
ad_category_hierarchy ah
join
ad_categories ac
on ac.id = ah.childid
where
ah.parentid = category2_.id) as formula0_1_
from
ad_categories category0_
left outer join
ad_category_hierarchy category0_1_
on category0_.id=category0_1_.childid
left outer join
common_language commonlang1_
on category0_.languageid=commonlang1_.id
left outer join
ad_categories category2_
on category0_1_.parentid=category2_.id
left outer join
ad_category_hierarchy category2_1_
on category2_.id=category2_1_.childid
where
category0_.id=?
Hibernate:
/* SELECT
CommonLanguage
FROM
CommonLanguage AS commonlanguage
WHERE
commonlanguage.code = :code */ select
commonlang0_.id as id0_,
commonlang0_.code as code0_,
commonlang0_.description as descript3_0_,
commonlang0_.name as name0_
from
common_language commonlang0_
where
commonlang0_.code=?
Hibernate:
/* SELECT
Category
FROM
Category AS category
WHERE
category.commonLanguage = :commonLanguage
AND category.parent = :parent */ select
category0_.id as id2_,
category0_.languageid as languageid2_,
category0_.name as name2_,
category0_1_.parentid as parentid3_,
(select
count(*)
from
ad_category_hierarchy ah
join
ad_categories ac
on ac.id = ah.childid
where
ah.parentid = category0_.id) as formula0_
from
ad_categories category0_
left outer join
ad_category_hierarchy category0_1_
on category0_.id=category0_1_.childid
where
category0_.languageid=?
and category0_1_.parentid=?
Ok so in the first case (top level cat) we do the following:

Select on common language
Select on category with inner select and an outer join (quite expensive in terms of db calls)
In the second case things get pretty exciting, we do the following:

Select on category with an inner join and count them 1, 2, 3, 4 outer joins
Select on Common language
Select on category with inner select and outer join
What immediate comes to mind is that I can get rid of the calls to load commonlanguage by rolling the lang code into the category query as follows:

public static Query findCategoriesByLanguageCodeAndParent(
String languageCode, Category parent) {
if (languageCode == null)
throw new IllegalArgumentException(
"The commonLanguage argument is required");

EntityManager em = Category.entityManager();
Query q = null;

if (parent == null) {
q = em
.createQuery("SELECT Category FROM Category AS category WHERE category.commonLanguage.code = :languageCode AND category.parent is null");
} else {
q = em
.createQuery("SELECT Category FROM Category AS category WHERE category.commonLanguage.code = :languageCode AND category.parent = :parent");

}
q.setParameter("languageCode", languageCode);
if (parent != null) {
q.setParameter("parent", parent);
}
return q;
}

...

public @ResponseBody List listCategories(@PathVariable String locale,
@PathVariable Long parentId,
HttpServletResponse response) {

/* Locale is required. */
if (locale == null || locale.trim().length()==0) {
response.setStatus(HttpServletResponse.SC_BAD_REQUEST);
//TODO log this
return null;
}

/* If the parentId is 0, then set it to null, otherwise look up the parent category. */
Category parent = parentId==0 ? null : Category.findCategory(parentId);

/* Extract the language part of the locale to look up the category by language code. */
String languageCode = locale.split("_")[0];

/* Look up the list of categories. */
List categories = Category.findCategoriesByLanguageCodeAndParent(languageCode, parent).getResultList();
return categories;
}
My first knee jerk reaction was to add a second level cache but after looking at the SQL, the solution became very clear. (Simple refactoring..)

Ok... all of the above did was switch the order. Hibernate still loaded the common language. We need to make it lazy load it on demand.

I went ahead and marked all of the relationships as lazy as we can override the lazy load behavior with the Criteria API or with JPA QL later if needed.

Here is an updated set of queries:

First case (load top level cats):

URL: http://localhost:8080/appcatalog/category/listCategories/locale/en_US/category/0

Hibernate:
/* SELECT
Category
FROM
Category AS category
WHERE
category.commonLanguage.code = :languageCode
AND category.parent is null */ select
category0_.id as id2_,
category0_.languageid as languageid2_,
category0_.name as name2_,
category0_1_.parentid as parentid3_,
(select
count(*)
from
ad_category_hierarchy ah
join
ad_categories ac
on ac.id = ah.childid
where
ah.parentid = category0_.id) as formula0_
from
ad_categories category0_
left outer join
ad_category_hierarchy category0_1_
on category0_.id=category0_1_.childid,
common_language commonlang1_
where
category0_.languageid=commonlang1_.id
and commonlang1_.code=?
and (
category0_1_.parentid is null
)
We got the hits to the database go down to 1.

Now for the second case:

URL: http://localhost:8080/appcatalog/category/listCategories/locale/en_US/category/2

Hibernate:
/* load com.mycompany.appcatalog.domain.Category */ select
category0_.id as id2_0_,
category0_.languageid as languageid2_0_,
category0_.name as name2_0_,
category0_1_.parentid as parentid3_0_,
(select
count(*)
from
ad_category_hierarchy ah
join
ad_categories ac
on ac.id = ah.childid
where
ah.parentid = category0_.id) as formula0_0_
from
ad_categories category0_
left outer join
ad_category_hierarchy category0_1_
on category0_.id=category0_1_.childid
where
category0_.id=?
Hibernate:
/* SELECT
Category
FROM
Category AS category
WHERE
category.commonLanguage.code = :languageCode
AND category.parent = :parent */ select
category0_.id as id2_,
category0_.languageid as languageid2_,
category0_.name as name2_,
category0_1_.parentid as parentid3_,
(select
count(*)
from
ad_category_hierarchy ah
join
ad_categories ac
on ac.id = ah.childid
where
ah.parentid = category0_.id) as formula0_
from
ad_categories category0_
left outer join
ad_category_hierarchy category0_1_
on category0_.id=category0_1_.childid,
common_language commonlang1_
where
category0_.languageid=commonlang1_.id
and commonlang1_.code=?
and category0_1_.parentid=?
So the db hit count goes down to 2 from 3 and we are no longer doing a five way join.

We are still loading the top level category and then loading its children. It would behoove us to setup an object cache for categories due to the natural fan out of a hierarchal list of taxonomy like an app catalog.

After we setup the cache, we should get the hits to the db to 1 per use case.

Here are the new lazy fields for JPA:

public class Category {

private String name;

@Id
@GeneratedValue(strategy = GenerationType.AUTO)
@Column(name = "id")
private Long id;

@ManyToOne(targetEntity = Category.class, fetch=FetchType.LAZY)
@JoinTable(name = "ad_category_hierarchy", joinColumns =
@JoinColumn(name = "childid"), inverseJoinColumns = @JoinColumn(name = "parentid"))
private Category parent;

@ManyToOne(targetEntity = CommonLanguage.class, fetch=FetchType.LAZY)
@JoinColumn(name = "languageid")
private CommonLanguage commonLanguage;

@Formula("(select count(*) from ad_category_hierarchy ah join " +
"ad_categories ac on ac.id = ah.childid where ah.parentid = id)")
private int count;

@ManyToMany(cascade = CascadeType.ALL, fetch=FetchType.LAZY)
@JoinTable(name = "ad_category_hierarchy", joinColumns = @JoinColumn(name = "parentid"), inverseJoinColumns = @JoinColumn(name = "childid"))
private Set subcategories = new HashSet();
Ok before we setup the second level cache, we need to turn on log4j logging so we can see that it is setup.

We use the maven tomcat plugin. Tomcat uses log4j internally so it is difficult to get tomcat to respect your log4j setting.

To get it to respect them, you have to go nuclear and setup log4j at the JVM level in order to do that you need to setup the MAVEN_OPTS environment variable, e.g.,

cat.sh
rickhightower$ cat env.sh
export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/Home
export CATALINA_HOME=`pwd`/tomcat

export MAVEN_OPTS="-Xmx1024m -XX:MaxPermSize=512m -Dlog4j.configuration=file:/Users/rickhightower/proto/application-catalog/src/main/resources/META-INF/spring/log4j.properties"
Then in the log4j.properties turn on Hibernate cache logging.

log4j.properties
log4j.rootLogger=error, stdout, R

log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout

# Print the date in ISO 8601 format
log4j.appender.stdout.layout.ConversionPattern=%d [%t] %-5p %c - %m%n

log4j.appender.R=org.apache.log4j.RollingFileAppender
log4j.appender.R.File=application.log

log4j.appender.R.MaxFileSize=100KB
# Keep one backup file
log4j.appender.R.MaxBackupIndex=1

log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%p %t %c - %m%n

log4j.logger.org.hibernate.cache=DEBUG
When it comes to debugging Hibernate and Spring issues you cannot fly blind. You have to setup logging. If you don't know how, then you don't know how to use Spring and Hibernate.

Then you need to tell Hibernate about ehcache configuration:

persistence.xml






Notice we setup hibernate.cache.use_second_level_cache to true. Then we configure a provider and a location for the cache setup file.

Then you need to setup a ehcache file to configure the cache as follows:

ehcache-ac.xml



name="com.mycompany.appcatalog.domain.Category"
maxElementsInMemory="1000"
eternal="false"
timeToIdleSeconds="0"
timeToLiveSeconds="3600"
overflowToDisk="false"
/>

maxElementsInMemory="10000"
eternal="false"
timeToIdleSeconds="120"
timeToLiveSeconds="120"
overflowToDisk="true"
maxElementsOnDisk="10000000"
diskPersistent="false"
diskExpiryThreadIntervalSeconds="120"
memoryStoreEvictionPolicy="LRU"
/>

Then lastly, we need to enable caching for the particular object see @Cache annotation below:

Category.java
@Entity
@Cache(usage = CacheConcurrencyStrategy.READ_ONLY )
public class Category {

private String name;

@Id
@GeneratedValue(strategy = GenerationType.AUTO)
@Column(name = "id")
private Long id;
Kafka and Cassandra support, training for AWS EC2 Cassandra 3.0 Training