Rick

Rick
Rick

Tuesday, March 30, 2010

Optimizing JPA calls (using JPA, ROO, Hibernate, Maven, Spring and Tomcat)

I created a proof of concept for our new REST based JSON API.

I used ROO, Spring 3 and JPA.

(This is just a POC and not a real app at this point.)

One of the issues is the number of queries we make for a relatively simple access call to get a list of the top level categories and a list of categories.

The current implementation of the POC does not use any caching.

NOTE:
If you are using ROO, Spring, JPA Maven and/or the Maven Tomcat plugin, I think this will help you setup caching and log4j. It will also help you optimize you object retrieval with JPA. Thus, I post it here where people can see it, find it, and use it. If it helps, you good. Feel free to provide feedback.

As an exercise in architecture design, I wanted to show what it would take to optimize the queries and to add 2nd level caching. Then we can decide if it makes sense for future JSON APIs.

{
"outer" : {
"categoryList" : {
"categoryItem" : [ {
"id" : 1,
"count":2,
"name" : "Games"
}, {
"id" : 2,
"count":4
"name" : "Utils"
} ]
}
}
}
We need to look up the categories by spoken language.

As JPA goes, the Category domain object is fairly complex, as it uses a self join through a join table. But JPA does a good job of hiding the complexity and making representing this object fairly straight forward as follows:

@Entity
@RooJavaBean
@RooToString
@Table(name = "ad_categories")
@RooEntity(versionField = "", finders = { "findCategorysByCommonLanguage" })
@JsonIgnoreProperties( { "parent", "commonLanguage", "subcategories" })
public class Category {

private String name;

@Id
@GeneratedValue(strategy = GenerationType.AUTO)
@Column(name = "id")
private Long id;

@ManyToOne(targetEntity = Category.class)
@JoinTable(name = "ad_category_hierarchy", joinColumns =
@JoinColumn(name = "childid"), inverseJoinColumns = @JoinColumn(name = "parentid"))
private Category parent;

@ManyToOne(targetEntity = CommonLanguage.class)
@JoinColumn(name = "languageid")
private CommonLanguage commonLanguage;

@Formula("(select count(*) from ad_category_hierarchy ah join " +
"ad_categories ac on ac.id = ah.childid where ah.parentid = id)")
private int count;

@ManyToMany(cascade = CascadeType.ALL)
@JoinTable(name = "ad_category_hierarchy", joinColumns = @JoinColumn(name = "parentid"), inverseJoinColumns = @JoinColumn(name = "childid"))
private Set subcategories = new HashSet();

public static Query findCategoriesByCommonLanguageAndParent(
CommonLanguage commonLanguage, Category parent) {
if (commonLanguage == null)
throw new IllegalArgumentException(
"The commonLanguage argument is required");

EntityManager em = Category.entityManager();
Query q = null;

if (parent == null) {
q = em
.createQuery("SELECT Category FROM Category AS category WHERE category.commonLanguage = :commonLanguage AND category.parent is null");
} else {
q = em
.createQuery("SELECT Category FROM Category AS category WHERE category.commonLanguage = :commonLanguage AND category.parent = :parent");

}
q.setParameter("commonLanguage", commonLanguage);
if (parent != null) {
q.setParameter("parent", parent);
}
return q;
}
}
The above is a fairly naked, first attempt of using JPA to manage a category domain object.

Here is the controller we use to return a JSON response to the client:

@RooWebScaffold(path = "category", automaticallyMaintainView = true, formBackingObject = Category.class)
@RequestMapping("/category/**")
@Controller
public class CategoryController {

/**
* List Categories is used to return a JSON payload of cateogories.
*
*
* @param locale the locale in the request (currently we only use the language from the locale)
* @param parentId If the parentId is 0 then just look up the top level categories, else return categories for that parent
* @param response gives up control over what response code we send to the client.
* @return
*/
@SuppressWarnings("unchecked")
@RequestMapping(value = "/category/listCategories/locale/{locale}/category/{parentId}", method = RequestMethod.GET)
public @ResponseBody List listCategories(@PathVariable String locale,
@PathVariable Long parentId,
HttpServletResponse response) {

/* Locale is required. */
if (locale == null || locale.trim().length()==0) {
response.setStatus(HttpServletResponse.SC_BAD_REQUEST);
//TODO log this
return null;
}

/* If the parentId is 0, then set it to null, otherwise look up the parent category. */
Category parent = parentId==0 ? null : Category.findCategory(parentId);

/* Extract the language part of the locale to look up the common language. */
String language = locale.split("_")[0];
CommonLanguage commonLanguage = (CommonLanguage) CommonLanguage.findCommonLanguagesByCode(language).getSingleResult();

/* Look up the list of categories. */
List categories = Category.findCategoriesByCommonLanguageAndParent(commonLanguage, parent).getResultList();
return categories;
}

}
Focus mainly on this part of the code:

Category parent = parentId==0 ? null : Category.findCategory(parentId);

String language = locale.split("_")[0];
CommonLanguage commonLanguage = (CommonLanguage) CommonLanguage.findCommonLanguagesByCode(language).getSingleResult();

List categories = Category.findCategoriesByCommonLanguageAndParent(commonLanguage, parent).getResultList();
return categories;
It is fairly easy to instrument Hibernate to show you exactly what SQL it is producing to return a call as follows:

persistence.xml



When hitting the above JSON end point, we get the following SQL queries.

URL:

http://localhost:8080/appcatalog/category/listCategories/locale/en_US/category/0

Output:

[
{
"name": "Other",
"id": 1,
"count": 0
},
{
"name": "Books",
"id": 2,
"count": 14
},
{
"name": "Business",
"id": 3,
"count": 11
},
{
"name": "School",
"id": 4,
"count": 11
},
{
"name": "Games",
"id": 5,
"count": 9
},
{
"name": "Finance",
"id": 6,
"count": 9
},
{
"name": "Food",
"id": 7,
"count": 5
},
...
]
The above uses the following SQL:

Mar 30, 2010 11:45:53 AM org.apache.catalina.core.ApplicationContext log
INFO: Initializing Spring FrameworkServlet 'appcatalog'
Hibernate:
/* SELECT
CommonLanguage
FROM
CommonLanguage AS commonlanguage
WHERE
commonlanguage.code = :code */ select
commonlang0_.id as id0_,
commonlang0_.code as code0_,
commonlang0_.description as descript3_0_,
commonlang0_.name as name0_
from
common_language commonlang0_
where
commonlang0_.code=?
Hibernate:
/* SELECT
Category
FROM
Category AS category
WHERE
category.commonLanguage = :commonLanguage
AND category.parent is null */ select
category0_.id as id2_,
category0_.languageid as languageid2_,
category0_.name as name2_,
category0_1_.parentid as parentid3_,
(select
count(*)
from
ad_category_hierarchy ah
join
ad_categories ac
on ac.id = ah.childid
where
ah.parentid = category0_.id) as formula0_
from
ad_categories category0_
left outer join
ad_category_hierarchy category0_1_
on category0_.id=category0_1_.childid
where
category0_.languageid=?
and (
category0_1_.parentid is null
)
If we specify a parent id category as follows:

URL: http://localhost:8080/appcatalog/category/listCategories/locale/en_US/category/15

Then we get the following:

Hibernate:
/* load com.mycompany.appcatalog.domain.Category */ select
category0_.id as id2_2_,
category0_.languageid as languageid2_2_,
category0_.name as name2_2_,
category0_1_.parentid as parentid3_2_,
(select
count(*)
from
ad_category_hierarchy ah
join
ad_categories ac
on ac.id = ah.childid
where
ah.parentid = category0_.id) as formula0_2_,
commonlang1_.id as id0_0_,
commonlang1_.code as code0_0_,
commonlang1_.description as descript3_0_0_,
commonlang1_.name as name0_0_,
category2_.id as id2_1_,
category2_.languageid as languageid2_1_,
category2_.name as name2_1_,
category2_1_.parentid as parentid3_1_,
(select
count(*)
from
ad_category_hierarchy ah
join
ad_categories ac
on ac.id = ah.childid
where
ah.parentid = category2_.id) as formula0_1_
from
ad_categories category0_
left outer join
ad_category_hierarchy category0_1_
on category0_.id=category0_1_.childid
left outer join
common_language commonlang1_
on category0_.languageid=commonlang1_.id
left outer join
ad_categories category2_
on category0_1_.parentid=category2_.id
left outer join
ad_category_hierarchy category2_1_
on category2_.id=category2_1_.childid
where
category0_.id=?
Hibernate:
/* SELECT
CommonLanguage
FROM
CommonLanguage AS commonlanguage
WHERE
commonlanguage.code = :code */ select
commonlang0_.id as id0_,
commonlang0_.code as code0_,
commonlang0_.description as descript3_0_,
commonlang0_.name as name0_
from
common_language commonlang0_
where
commonlang0_.code=?
Hibernate:
/* SELECT
Category
FROM
Category AS category
WHERE
category.commonLanguage = :commonLanguage
AND category.parent = :parent */ select
category0_.id as id2_,
category0_.languageid as languageid2_,
category0_.name as name2_,
category0_1_.parentid as parentid3_,
(select
count(*)
from
ad_category_hierarchy ah
join
ad_categories ac
on ac.id = ah.childid
where
ah.parentid = category0_.id) as formula0_
from
ad_categories category0_
left outer join
ad_category_hierarchy category0_1_
on category0_.id=category0_1_.childid
where
category0_.languageid=?
and category0_1_.parentid=?
Ok so in the first case (top level cat) we do the following:

Select on common language
Select on category with inner select and an outer join (quite expensive in terms of db calls)
In the second case things get pretty exciting, we do the following:

Select on category with an inner join and count them 1, 2, 3, 4 outer joins
Select on Common language
Select on category with inner select and outer join
What immediate comes to mind is that I can get rid of the calls to load commonlanguage by rolling the lang code into the category query as follows:

public static Query findCategoriesByLanguageCodeAndParent(
String languageCode, Category parent) {
if (languageCode == null)
throw new IllegalArgumentException(
"The commonLanguage argument is required");

EntityManager em = Category.entityManager();
Query q = null;

if (parent == null) {
q = em
.createQuery("SELECT Category FROM Category AS category WHERE category.commonLanguage.code = :languageCode AND category.parent is null");
} else {
q = em
.createQuery("SELECT Category FROM Category AS category WHERE category.commonLanguage.code = :languageCode AND category.parent = :parent");

}
q.setParameter("languageCode", languageCode);
if (parent != null) {
q.setParameter("parent", parent);
}
return q;
}

...

public @ResponseBody List listCategories(@PathVariable String locale,
@PathVariable Long parentId,
HttpServletResponse response) {

/* Locale is required. */
if (locale == null || locale.trim().length()==0) {
response.setStatus(HttpServletResponse.SC_BAD_REQUEST);
//TODO log this
return null;
}

/* If the parentId is 0, then set it to null, otherwise look up the parent category. */
Category parent = parentId==0 ? null : Category.findCategory(parentId);

/* Extract the language part of the locale to look up the category by language code. */
String languageCode = locale.split("_")[0];

/* Look up the list of categories. */
List categories = Category.findCategoriesByLanguageCodeAndParent(languageCode, parent).getResultList();
return categories;
}
My first knee jerk reaction was to add a second level cache but after looking at the SQL, the solution became very clear. (Simple refactoring..)

Ok... all of the above did was switch the order. Hibernate still loaded the common language. We need to make it lazy load it on demand.

I went ahead and marked all of the relationships as lazy as we can override the lazy load behavior with the Criteria API or with JPA QL later if needed.

Here is an updated set of queries:

First case (load top level cats):

URL: http://localhost:8080/appcatalog/category/listCategories/locale/en_US/category/0

Hibernate:
/* SELECT
Category
FROM
Category AS category
WHERE
category.commonLanguage.code = :languageCode
AND category.parent is null */ select
category0_.id as id2_,
category0_.languageid as languageid2_,
category0_.name as name2_,
category0_1_.parentid as parentid3_,
(select
count(*)
from
ad_category_hierarchy ah
join
ad_categories ac
on ac.id = ah.childid
where
ah.parentid = category0_.id) as formula0_
from
ad_categories category0_
left outer join
ad_category_hierarchy category0_1_
on category0_.id=category0_1_.childid,
common_language commonlang1_
where
category0_.languageid=commonlang1_.id
and commonlang1_.code=?
and (
category0_1_.parentid is null
)
We got the hits to the database go down to 1.

Now for the second case:

URL: http://localhost:8080/appcatalog/category/listCategories/locale/en_US/category/2

Hibernate:
/* load com.mycompany.appcatalog.domain.Category */ select
category0_.id as id2_0_,
category0_.languageid as languageid2_0_,
category0_.name as name2_0_,
category0_1_.parentid as parentid3_0_,
(select
count(*)
from
ad_category_hierarchy ah
join
ad_categories ac
on ac.id = ah.childid
where
ah.parentid = category0_.id) as formula0_0_
from
ad_categories category0_
left outer join
ad_category_hierarchy category0_1_
on category0_.id=category0_1_.childid
where
category0_.id=?
Hibernate:
/* SELECT
Category
FROM
Category AS category
WHERE
category.commonLanguage.code = :languageCode
AND category.parent = :parent */ select
category0_.id as id2_,
category0_.languageid as languageid2_,
category0_.name as name2_,
category0_1_.parentid as parentid3_,
(select
count(*)
from
ad_category_hierarchy ah
join
ad_categories ac
on ac.id = ah.childid
where
ah.parentid = category0_.id) as formula0_
from
ad_categories category0_
left outer join
ad_category_hierarchy category0_1_
on category0_.id=category0_1_.childid,
common_language commonlang1_
where
category0_.languageid=commonlang1_.id
and commonlang1_.code=?
and category0_1_.parentid=?
So the db hit count goes down to 2 from 3 and we are no longer doing a five way join.

We are still loading the top level category and then loading its children. It would behoove us to setup an object cache for categories due to the natural fan out of a hierarchal list of taxonomy like an app catalog.

After we setup the cache, we should get the hits to the db to 1 per use case.

Here are the new lazy fields for JPA:

public class Category {

private String name;

@Id
@GeneratedValue(strategy = GenerationType.AUTO)
@Column(name = "id")
private Long id;

@ManyToOne(targetEntity = Category.class, fetch=FetchType.LAZY)
@JoinTable(name = "ad_category_hierarchy", joinColumns =
@JoinColumn(name = "childid"), inverseJoinColumns = @JoinColumn(name = "parentid"))
private Category parent;

@ManyToOne(targetEntity = CommonLanguage.class, fetch=FetchType.LAZY)
@JoinColumn(name = "languageid")
private CommonLanguage commonLanguage;

@Formula("(select count(*) from ad_category_hierarchy ah join " +
"ad_categories ac on ac.id = ah.childid where ah.parentid = id)")
private int count;

@ManyToMany(cascade = CascadeType.ALL, fetch=FetchType.LAZY)
@JoinTable(name = "ad_category_hierarchy", joinColumns = @JoinColumn(name = "parentid"), inverseJoinColumns = @JoinColumn(name = "childid"))
private Set subcategories = new HashSet();
Ok before we setup the second level cache, we need to turn on log4j logging so we can see that it is setup.

We use the maven tomcat plugin. Tomcat uses log4j internally so it is difficult to get tomcat to respect your log4j setting.

To get it to respect them, you have to go nuclear and setup log4j at the JVM level in order to do that you need to setup the MAVEN_OPTS environment variable, e.g.,

cat.sh
rickhightower$ cat env.sh
export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/Home
export CATALINA_HOME=`pwd`/tomcat

export MAVEN_OPTS="-Xmx1024m -XX:MaxPermSize=512m -Dlog4j.configuration=file:/Users/rickhightower/proto/application-catalog/src/main/resources/META-INF/spring/log4j.properties"
Then in the log4j.properties turn on Hibernate cache logging.

log4j.properties
log4j.rootLogger=error, stdout, R

log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout

# Print the date in ISO 8601 format
log4j.appender.stdout.layout.ConversionPattern=%d [%t] %-5p %c - %m%n

log4j.appender.R=org.apache.log4j.RollingFileAppender
log4j.appender.R.File=application.log

log4j.appender.R.MaxFileSize=100KB
# Keep one backup file
log4j.appender.R.MaxBackupIndex=1

log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%p %t %c - %m%n

log4j.logger.org.hibernate.cache=DEBUG
When it comes to debugging Hibernate and Spring issues you cannot fly blind. You have to setup logging. If you don't know how, then you don't know how to use Spring and Hibernate.

Then you need to tell Hibernate about ehcache configuration:

persistence.xml






Notice we setup hibernate.cache.use_second_level_cache to true. Then we configure a provider and a location for the cache setup file.

Then you need to setup a ehcache file to configure the cache as follows:

ehcache-ac.xml



name="com.mycompany.appcatalog.domain.Category"
maxElementsInMemory="1000"
eternal="false"
timeToIdleSeconds="0"
timeToLiveSeconds="3600"
overflowToDisk="false"
/>

maxElementsInMemory="10000"
eternal="false"
timeToIdleSeconds="120"
timeToLiveSeconds="120"
overflowToDisk="true"
maxElementsOnDisk="10000000"
diskPersistent="false"
diskExpiryThreadIntervalSeconds="120"
memoryStoreEvictionPolicy="LRU"
/>

Then lastly, we need to enable caching for the particular object see @Cache annotation below:

Category.java
@Entity
@Cache(usage = CacheConcurrencyStrategy.READ_ONLY )
public class Category {

private String name;

@Id
@GeneratedValue(strategy = GenerationType.AUTO)
@Column(name = "id")
private Long id;

Friday, March 26, 2010

Customize what parts of the object you want in the JSON Stream

This page has examples of how to customize what parts of the Java object gets output to the JSON stream.

in reference to: JacksonMixInAnnotations - FasterXML Wiki (view on Google Sidewiki)

test

test

in reference to:

"So in this case we have installed Hibernate as object-relational mapping (ORM)-provider. Hibernate is one of three ORM providers which Roo currently offers. EclipseLink and OpenJPA represent the alternative choices. In a similar fashion we have chosen the Hypersonic in-memory database database as our target database. Hypersonic is a convenient database for Roo application development because it relieves the developer from having to install and configure a production scale database."
- Chapter 2. Beginning With Roo: The Tutorial (view on Google Sidewiki)

Using ROO with MySQL and Maven Tomcat plugin

I spent some time trying to get Roo to work with both MySQL and the Tomcat plugin in such a way that I could do integration tests from Eclipse and deploy to Tomcat and use JNDI.

This posting is really more about how to use the Maven Tomcat plugin, it just so happens that I ran across this whilst using Roo.

The Roo tutorial seems a bit terse when it comes to these topics, which is a shame. I think Roo would be more useful if they helped setup the Maven Tomcat plugin and tried to configure the JDBC bits. Perhaps this will help document the problem and at least on possible solution.

I used Roo to setup MySQL (see guide) and then I tried to run mvn tomcat:run. It failed. Then I spent some time trying to figure out how to best configure the Maven Tomcat plugin and without breaking integration tests that I wrote.

Along the way, I learned quite a bit about the maven tomcat plugin, Roo and Spring 3. I figured I would share this with like minded folks on my team and beyond.

Here is what I came up for the pom.ml.

I could not get the XML stuff to show up correctly on blogger.. go here to see the code listings:

http://www.jroller.com/RickHigh/entry/using_roo_with_mysql_and

Roo added the following to pom.xml:



com.mysql.jdbc
com.springsource.com.mysql.jdbc
5.1.6



By the way Roo fills your application with artifact ids that are prepended with com.springsource. I am not sure why it does this, but it probably has something to do with having their own repositories and wanting to control the jar files that get used for customer support reason. It is either that or a secret plot for taking over the world.

Roo also added a tomcat plugin. The problem is that I need a context.xml file for tomcat so I can setup my JNDI datasource.

I reconfigured the tomcat plugin as follows in pom.xml:


org.codehaus.mojo

tomcat-maven-plugin

1.0-beta-1



tomcat





testing

yes











com.mysql.jdbc

com.springsource.com.mysql.jdbc

5.1.6









Notice that I add mysql jar file to the list of dependencies for the plugin so that Tomcat can find mysql and I can avoid the dreaded Class not found exception for the MySQL driver.

By the way, I tried many things before getting to this point. This is just the first one that worked. I am sure there are other ways, and even possible better ways. I got the above through trial and error and reading the maven tomcat plugin documents (sparse) and the tomcat manual.


Also notice that I have a tomcat property set. This tomcat folder will be under the project root directory, ${project.dir}/tomcat. This was the key for adding the JNDI datasource to my database.

In the ${project.dir}/tomcat dir I have a conf directory. The conf directory has the following files:


$ pwd
/Users/rickhightower/proto/application-catalog/tomcat/conf

$ ls
context.xml tomcat-users.xml web.xml


The tomcat-users.xml and web.xml were generated by the tomcat plugin under /target/tomcat. I just copied them here.

The context.xml is where I set up the JDBC as follows:




WEB-INF/web.xml

type="javax.sql.DataSource"
driverClassName="com.mysql.jdbc.Driver"
url="jdbc:mysql://localhost:3306/mydb?autoReconnect=true"
username="mydb" password="mydb"
maxActive="20" maxIdle="10" maxWait="-1" />







(Now would be a good time to add the applicationContext.xml file to the watched resources (note to self)).

Notice that I have configured the Resource for my database.

In my application context file, I just add the following:




The problem is that now my code is somewhat tied to running in a container which is exactly what I don't want. I want to be able to run this application context without change in my integration tests.

I want to use this dataSource when I am testing and such.








To achieve this in Spring 3, I can use SPeL (Spring Expression Language) as follows:


ref="#{ systemProperties['ac.testing']=='yes' or systemProperties['ac.development']=='yes' ? 'dataSourceDBCP' : 'dataSourceJNDI'}"/>


The above basically states that if ac.testing or ac.development system properties are set to yes then use dataSourceDBCP otherwise use dataSourceJNDI.

I like SPeL.

Then to run with dataSourceDBCP instead of dataSourceJNDI (for whatever reason), you can pass a system environment variable as follows:


$ mvn tomcat:run -Dac.testing="yes"


You could setup a maven profile or something some such so that you don't ever have to pass this whilst running this from maven.
Kafka and Cassandra support, training for AWS EC2 Cassandra 3.0 Training