Dublin Tech

Monday, January 2, 2012

Make your JAXB cleaner with the MOXy implementation

The principal advantage of using JAXB when marshalling and unmarshalling XML is the programming model. Simply annotate a few POJOs and use the JAXB APIs, and you can serialise to XML and deserialise from XML very easily. You don't need to worry about the specifics of how the XML is marshalled or unmarshalled. Everything is much simpler than alternatives such as DOM and SAX.

Now data in XML files tends to be hierarchical in nature. For example, consider this XML file:

In this case, the person Barack Obama has a car which is a Green Ford Focus. Here, we see the hierarchical characteristics of XML. The Car is under the Person. In a more sophisticated example, a Person could have a Car, which has a Car Radio, which has an Amplifier, which has Transistors etc. But let's stick with our simpler case for the moment. Suppose we want to unmarshall this XML file using JAXB. We want all the person details (firstname, lastname etc.) and the model of the car belonging to the person. We create a Person POJO and a Car POJO and annotate as appropriate.

To unmarshall this we simply do:

This all seems very simple - especially when you consider that the Car entity doesn't even need any annotations! However, the Car only has one attribute and it can seem like overkill to have a POJO class for something we only want one attribute from! Remember this is a simple example; imagine if the hierarchical structure was much deeper. Something like an outer entity containing an entity, which contained another entity, which contained yet another entity, and all we wanted was the outer entity and one attribute from the very deepest nested entity. It's essentially the same problem but even more overkill. We would have to ensure there was a POJO class for everything in the hierarchy - even for entities which we wanted nothing from. No-one likes code bloat. So what can we do?

Well, the first thing to remember is that JAXB is a specification for which there are many implementations (e.g. JaxMeAPI, MOXy, Metro). If we were to use the JAXB reference implementation (shipped with the JDK), there is not much we can do: we have to have a Car and a Person POJO. However, if we use the MOXy implementation from EclipseLink, we can use some of its extensions to help us. More specifically, we can use the MOXy @XmlPath extension, which is inspired by XPath.

Let's see it in action. Here is the updated Person POJO.

So where's the Car POJO gone? Well it's deleted. We don't need it anymore. Bye bye.
Using the MOXy @XmlPath annotation we do not need the Car POJO. This annotation resides in the org.eclipse.persistence.oxm.annotations package, and getting it on your classpath is really simple. If you are a Maven user just add:

To tell your JDK to use MOXy for the JAXB implementation at runtime, put a file named jaxb.properties in the same directory as your JAXB POJOs. It contains one line:

To ensure you are using the MOXy implementation just check the JAXB context:

You should see something like:

After that there are no changes. The exact same unmarshalling code can be used.
One reason why I really like this extension is that it means less code. This usually means cleaner and more maintainable code. This becomes even more obvious in more complex scenarios where entities are nested much deeper in the hierarchical structure than in this simple example. It doesn't matter if you are using something like XJC to generate your POJOs; you still have code bloat.

Remember, JAXB set out to be a cleaner programming model than JAXP alternatives such as SAX and DOM, but in scenarios with deep hierarchies, the proliferation of classes using JAXB doesn't make it convincingly cleaner. Remember, it would be quite easy to ignore the entities you don't want using DOM and XPath or even just using SAX.
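To make the DOM and XPath comparison concrete, here is a minimal sketch that pulls just two values out of a person document without any POJOs at all. The XML layout and the element names (person, firstName, car, model) are assumptions made for illustration, as are the class and method names; it uses only the JAXP classes that ship with the JDK.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class CarModelLookup {

    // Hypothetical document: the element names are assumed for this sketch.
    static final String PERSON_XML =
        "<person>"
        + "<firstName>Barack</firstName>"
        + "<lastName>Obama</lastName>"
        + "<car><colour>Green</colour><make>Ford</make><model>Focus</model></car>"
        + "</person>";

    /** Parses the XML into a DOM tree and returns the string value at the given XPath. */
    static String valueAt(String xml, String expression) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document document = builder.parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Only the addressed node is read; every other element is simply ignored.
            return xpath.evaluate(expression, document);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String firstName = valueAt(PERSON_XML, "/person/firstName");
        String model = valueAt(PERSON_XML, "/person/car/model");
        System.out.println(firstName + " drives a " + model);
    }
}
```

Nothing here needs a Car class, which is the point of the comparison. The price is that the values come back as loose strings rather than as a populated object.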

MOXy swings the battle for cleanliness back to JAXB by providing the ability to use XPath expressions for anything in our XML file.

Note: MOXy has just been included as a JAXB implementation for WebLogic 12c.

References:
1. MOXy project page
2. Blaise Doughan's blog

Tuesday, December 27, 2011

JAXB, SAX, DOM Performance

This post investigates the performance of unmarshalling an XML document to Java objects using a number of different approaches. The XML document is very simple. It contains a collection of Person entities.

There is a corresponding Person Java object for the Person entity in the XML
...

and a PersonList object to represent a collection of Persons.

The approaches investigated were:

Various flavours of JAXB
SAX
DOM

In all cases, the objective was to get the entities in the XML document to the corresponding Java objects. The JAXB annotations on the Person and PersonList POJOs are used in the JAXB tests. The same classes can be used in the SAX and DOM tests (the annotations will just be ignored). Initially the reference implementations for JAXB, SAX and DOM were used. The Woodstox StAX parser was then used. This would have been called in some of the JAXB unmarshalling tests.

The tests were carried out on my Dell Laptop, a Pentium Dual-Core CPU, 2.1 GHz running Windows 7.

Test 1 - Using JAXB to unmarshall a Java File.

Test 1 illustrates how simple the programming model for JAXB is. It is very easy to go from an XML file to Java objects. There is no need to get involved with the nitty gritty details of marshalling and parsing.

Test 2 - Using JAXB to unmarshall a StreamSource

Test 2 is similar to Test 1, except this time a StreamSource object wraps around a File object. The StreamSource object gives a hint to the JAXB implementation to stream the file.

Test 3 - Using JAXB to unmarshall a StAX XMLStreamReader

Again similar to Test 1, except this time an XMLStreamReader instance wraps a FileReader instance which is unmarshalled by JAXB.

Test 4 - Just use DOM

This test uses no JAXB and instead just uses the JAXP DOM approach. This means straight away more code is required than any JAXB approach.
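
As a rough idea of what the DOM approach involves, here is a minimal sketch that walks a parsed document and builds Person objects by hand. The post's real Person class and XML layout are not shown, so the Person stand-in, the element names (persons, person, firstName, lastName) and the sample data are all assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class DomPersonReader {

    // Hypothetical sample input; the original test files are not reproduced here.
    static final String SAMPLE_XML =
        "<persons>"
        + "<person><firstName>Ada</firstName><lastName>Lovelace</lastName></person>"
        + "<person><firstName>Alan</firstName><lastName>Turing</lastName></person>"
        + "</persons>";

    /** Hypothetical stand-in for the post's Person class. */
    static final class Person {
        final String firstName;
        final String lastName;

        Person(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    /** Loads the whole document into memory, then copies each person element into a Person. */
    static List<Person> read(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document document = builder.parse(new InputSource(new StringReader(xml)));
            NodeList nodes = document.getElementsByTagName("person");
            List<Person> persons = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element element = (Element) nodes.item(i);
                persons.add(new Person(childText(element, "firstName"), childText(element, "lastName")));
            }
            return persons;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    /** Returns the text of the first descendant with the given tag, or null if there is none. */
    private static String childText(Element parent, String tag) {
        NodeList matches = parent.getElementsByTagName(tag);
        return matches.getLength() == 0 ? null : matches.item(0).getTextContent();
    }

    public static void main(String[] args) {
        for (Person person : read(SAMPLE_XML)) {
            System.out.println(person.firstName + " " + person.lastName);
        }
    }
}
```

Because the full tree is built before any Person is created, this style holds the document and the objects in memory at the same time.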

Test 5 - Just use SAX

Test 5 uses no JAXB and uses SAX to parse the XML document. The SAX approach involves more code and more complexity than any JAXB approach. The developer has to get involved with the parsing of the document.
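
The extra complexity of SAX comes from having to track parser state yourself. The following is a minimal sketch of a handler that does so; the Person stand-in, the element names (persons, person, firstName, lastName) and the sample data are assumptions, since the post's own classes and files are not shown.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class SaxPersonReader {

    // Hypothetical sample input; the original test files are not reproduced here.
    static final String SAMPLE_XML =
        "<persons>"
        + "<person><firstName>Ada</firstName><lastName>Lovelace</lastName></person>"
        + "<person><firstName>Alan</firstName><lastName>Turing</lastName></person>"
        + "</persons>";

    /** Hypothetical stand-in for the post's Person class. */
    static final class Person {
        String firstName;
        String lastName;
    }

    /** Collects Person objects as parse events stream past. */
    private static final class PersonHandler extends DefaultHandler {
        final List<Person> persons = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private Person current;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            text.setLength(0); // start collecting text for the new element
            if ("person".equals(qName)) {
                current = new Person();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called several times per element, so append rather than assign.
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (current == null) {
                return; // outside any person element
            }
            switch (qName) {
                case "firstName": current.firstName = text.toString().trim(); break;
                case "lastName": current.lastName = text.toString().trim(); break;
                case "person": persons.add(current); current = null; break;
                default: break;
            }
        }
    }

    static List<Person> read(String xml) {
        try {
            PersonHandler handler = new PersonHandler();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.persons;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        for (Person person : read(SAMPLE_XML)) {
            System.out.println(person.firstName + " " + person.lastName);
        }
    }
}
```

No tree is ever built: each Person is assembled from events and the parser moves on. That is consistent with SAX being the leanest option, but the bookkeeping in the handler is exactly the code JAXB saves you from writing.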

The tests were run 5 times for each of 3 files, which contain a collection of Person entities. The first file contained 100 Person entities and was 5K in size. The second contained 10,000 entities and was 500K in size, and the third contained 250,000 Person entities and was 15 Meg in size. In no cases was any XSD used, or any validations performed. The results are given in result tables where the times for the different runs are comma separated.
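The post does not show how the runs were timed, so the following is only a sketch of one way to collect five timings and print them comma separated, as in the result tables. The class and method names are hypothetical, and the Runnable passed in stands for whichever unmarshalling approach is being measured.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

final class RunTimer {

    /** Runs the task the given number of times and returns each elapsed time in milliseconds. */
    static List<Long> timeRuns(int runs, Runnable task) {
        List<Long> millis = new ArrayList<>();
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            millis.add(TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start));
        }
        return millis;
    }

    /** Formats the timings as a comma separated list, one entry per run. */
    static String format(List<Long> millis) {
        return millis.stream().map(String::valueOf).collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        // Placeholder workload; swap in the JAXB, DOM or SAX call being measured.
        Runnable workload = () -> {
            StringBuilder builder = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                builder.append(i);
            }
        };
        System.out.println(format(timeRuns(5, workload)));
    }
}
```

Keeping every run, rather than averaging, makes the slow first run visible instead of hiding it in a mean.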

TEST RESULTS
The tests were first run using JDK 1.6.26, 32 bit and the reference implementation for SAX, DOM and JAXB shipped with JDK was used.

Unmarshall Type	100 Persons time (ms)	10K Persons time (ms)	250K Persons time (ms)
JAXB (Default)	48,13, 5,4,4	78, 52, 47,50,50	1522, 1457, 1353, 1308,1317
JAXB(Streamsource)	11, 6, 3,3,2	44, 44, 48,45,43	1191, 1364, 1144, 1142, 1136
JAXB (StAX)	18, 2,1,1,1	111, 136, 89,91,92	2693, 3058, 2495, 2472, 2481
DOM	16, 2, 2,2,2	89,50, 55,53,50	1992, 2198, 1845, 1776, 1773
SAX	4, 2, 1,1,1	29, 34, 23,26,26	704, 669, 605, 589,591

JDK 1.6.26 Test comments

The first time unmarshalling happens is usually the longest.
The memory usage for JAXB and SAX is similar. It is about 2 Meg for the file with 10,000 persons and 36 - 38 Meg for the file with 250,000. DOM memory usage is far higher. For the 10,000 persons file it is 6 Meg; for the 250,000 persons file it is greater than 130 Meg.
The performance times for pure SAX are better, particularly for very large files.

The exact same tests were run again, using the same JDK (1.6.26) but this time the Woodstox implementation of StAX parsing was used.

Unmarshall Type	100 Persons time (ms)	10K Persons time (ms)	250K Persons time (ms)
JAXB (Default)	168,3,5,8,3	294, 43, 46, 43, 42	2055, 1354, 1328, 1319, 1319
JAXB(Streamsource)	11, 3,3,3,4	43,42,47,44,42	1147, 1149, 1176, 1173, 1159
JAXB (StAX)	30,0,1,1,0	67,37,40,37,37	1301, 1236, 1223, 1336, 1297
DOM	103,1,1,1,2	136,52,49,49,50	1882, 1883, 1821, 1835, 1822
SAX	4, 2, 2,1,1	31,25,25,38,25	613, 609, 607, 595, 613

JDK 1.6.26 + Woodstox test comments

Again, the first time unmarshalling happens is usually proportionally longer.
Again, memory usage for SAX and JAXB is very similar. Both are far better than DOM. The results are very similar to the first test run.
The JAXB (StAX) approach time has improved considerably. This is due to the Woodstox implementation of StAX parsing being used.
The performance times for pure SAX are still the best, particularly for large files.

The exact same tests were run again, but this time I used JDK 1.7.02 and the Woodstox implementation of StAX parsing.

Unmarshall Type	100 Persons time (ms)	10,000 Persons time (ms)	250,000 Persons time (ms)
JAXB (Default)	165,5, 3, 3,5	611,23, 24, 46, 28	578, 539, 511, 511, 519
JAXB(Streamsource)	13,4, 3, 4, 3	43,24, 21, 26, 22	678, 520, 509, 504, 627
JAXB (StAX)	21,1,0, 0, 0	300,69, 20, 16, 16	637, 487, 422, 435, 458
DOM	22,2,2,2,2	420,25, 24, 23, 24	1304, 807, 867, 747, 1189
SAX	7,2,2,1,1	169,15, 15, 19, 14	366, 364, 363, 360, 358

JDK 7 + Woodstox test comments:

The performance times for JDK 7 overall are much better. There are some anomalies - the first time the 100 persons file and the 10,000 persons file are parsed.
The memory usage is slightly higher. For SAX and JAXB it is 2 - 4 Meg for the 10,000 persons file and 45 - 49 Meg for the 250,000 persons file. For DOM it is higher again. 5 - 7.5 Meg for the 10,000 person file and 136 - 143 Meg for the 250,000 persons file.

Note: W.R.T. all tests

No memory analysis was done for the 100 persons file. The memory usage was just too small, so it would have been pointless information.
The first time to initialise a JAXB context can take up to 0.5 seconds. This was not included in the test results as it only took this long the very first time. After that the JVM initialises the context very quickly (consistently < 5ms). If you notice this behaviour with whatever JAXB implementation you are using, consider initialising at start up.
These tests use a very simple XML file. In reality there would be more object types and more complex XML. However, these tests should still provide some guidance.

Conclusions:

The performance times for pure SAX are slightly better than JAXB but only for very large files. Unless you are using very large files the performance differences are not worth worrying about. The programming model advantages of JAXB win out over the complexity of the SAX programming model. Don't forget JAXB also provides random access like DOM does. SAX does not provide this.
Performance times look a lot better with Woodstox, if JAXB / StAX is being used.
Performance times with 64 bit JDK 7 look a lot better. Memory usage looks slightly higher.

Wednesday, December 7, 2011

Ant versus Maven

There are many ways to organise build systems for Java projects. The two most predominant are probably still Ant and Maven. Debates between the two tend to go around in circles, with the balance now swinging towards Maven since IDE support has got better (particularly in Eclipse). My own view is that if you do not have a good architecture which is modular in nature and which separates concerns that should be separated, you'll run into trouble no matter what you use. The emphasis should always be on good architecture first and foremost.

That said, I made this short video which illustrates some of the arguments you hear from Maven-ites and Ant-ists. It is a debate between Maeve and Anthony. Maeve is arguing for Maven; Anthony is arguing for Ant. Obviously, it's impossible to cover every single argument but the video includes some of the principal ones. Get some popcorn and enjoy.

Saturday, December 3, 2011

Musing on mis-usings: 'Powerful use, Damaging misuse'.

There's an old phrase attributed to the former British Prime Minister Benjamin Disraeli which states there are three types of lies: "lies, damn lies and statistics". The insinuation here is that statistics are so easy to make up they are unreliable. However, statistics are extensively used in empirical science so surely they have some merit? In fact, they have a lot of merit. But only when they are used correctly. The problem is they are easy to misuse. And when misused, misinformation happens which in turn does more harm than good.

There are strong parallels to this narrative in the world of software engineering. Object-oriented languages introduced the notion of inheritance, a clever idea to promote code reuse. However, inheritance - when misused - can easily lead to complex hierarchies and can make it difficult to change objects. The misuse of inheritance can wreak havoc and since all it takes to use inheritance (in Java) is to be able to spell the word "extends", it's very easy to wreak such havoc if you don't know what you are doing. A similar story can be told with polymorphism and with design patterns. We all know the case of someone hell bent on using a pattern and thinking more about the pattern than the problem they are trying to solve. Even if they understand the difference between a Bridge and an Adapter it is still quite possible that some part of the architecture may be over engineered. Perhaps it's worth bearing in mind that every single one of the GoF design patterns is already in the JDK, so if you really want one in your architecture you don't have to look very far - otherwise only use one when it makes sense to use it.

This 'Powerful use, damaging misuse' anti-pattern is ubiquitous in Java systems. Servlet Filters are a very handy feature for manipulating requests and responses, but that's all they are meant to do. There is nothing in the language to stop a developer treating the Filter as a classical object, adding public APIs and business logic to the Filter. Of course filters are never meant to be used this way and when they are, trouble inevitably happens. But the key point is that it's easy for a developer to take such a powerful feature, misuse it and damage architectures. 'Powerful use, damaging misuse' happens very easily with Aspects, even Exceptions (we have all seen cases where exceptions were thrown and it would have made more sense to just return a boolean) and with many other features.
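
The exceptions case is easy to picture. Below is a small sketch with a made-up discount code check, written once with an exception and once with a boolean; the class, the method names and the code format are all hypothetical and only serve to show the contrast.

```java
import java.util.regex.Pattern;

final class DiscountCodes {

    // Hypothetical rule: four capital letters followed by two digits.
    private static final Pattern FORMAT = Pattern.compile("[A-Z]{4}[0-9]{2}");

    /** Misuse: an ordinary, expected outcome (a mistyped code) is reported by throwing. */
    static void checkOrThrow(String code) {
        if (code == null || !FORMAT.matcher(code).matches()) {
            throw new IllegalArgumentException("Invalid discount code: " + code);
        }
    }

    /** Simpler: the caller asks a yes/no question and gets a yes/no answer. */
    static boolean isValid(String code) {
        return code != null && FORMAT.matcher(code).matches();
    }

    public static void main(String[] args) {
        String input = "nope";

        // Exception style: control flow for a routine case is hidden in a catch block.
        try {
            checkOrThrow(input);
            System.out.println("accepted");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected");
        }

        // Boolean style: same decision, one line, nothing to catch.
        System.out.println(isValid(input) ? "accepted" : "rejected");
    }
}
```

Both versions reach the same decision. The exception version just makes every caller pay for a try/catch to handle something that was never exceptional.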

When it is so easy to make mistakes, inevitably they will happen. The Java compiler isn't going to say - 'wait a sec do you really understand this concept?' and codestyle tools aren't sophisticated enough to spot misuse of advanced concepts. In addition, no company has the time to get the most senior person to review every line of code. And even the most Senior Engineer will make mistakes.

Now, much of what has been written here is obvious and has already been well documented. Powerful features generally have to be well understood to be properly used. The question I think worth asking is if there is any powerful feature or engineering concept in a Java centric architecture which is not so easy to misuse? I suggest there is at least one, namely: Encapsulation. Firstly, let's consider if encapsulation didn't exist. Everything would be public or global (as in JavaScript). As soon as access scope narrows, encapsulation is happening, which is usually a good thing. Is it possible to make an architecture worse by encapsulating behaviour? Well it's damn hard to think of a case where it could. If you make a method private, it may be harder to unit test. But is it really? It's always easy to unit test the method which calls it, which will be in the same class and logical unit.
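
As a tiny illustration of testing a private method through its caller, consider this sketch. The class, the 20 percent rate and the use of whole cents are all made up for the example.

```java
final class PriceCalculator {

    private static final int TAX_PERCENT = 20; // hypothetical rate

    /** Public entry point: the only method callers and tests need to know about. */
    public static int totalCents(int netCents) {
        return netCents + taxCents(netCents);
    }

    /** Private helper: free to change or disappear without breaking any caller. */
    private static int taxCents(int netCents) {
        return netCents * TAX_PERCENT / 100;
    }

    public static void main(String[] args) {
        // Exercising totalCents also exercises taxCents; no direct access is needed.
        System.out.println(totalCents(1000));
    }
}
```

A test that checks totalCents(1000) covers the private helper as well, so narrowing its scope costs nothing in testability.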

There's a lesson to be learnt here. As soon as you design anything which something else uses, whether it be a core component in your architecture, a utility library class or a REST API you are going to tell the world about, ask yourself:

How easy is it for people to misuse this? Is it at the risky levels of inheritance or the safer levels of encapsulation?
What are the consequences of misuse?
And what can you do to minimise misuse and its consequences?

Aim to increase 'powerful use' and minimise 'damaging misuse'!