Sunday, 18 July 2010

Testable Java

Lately I have been modifying code that other people have written. There are some things about writing Java that I consider fairly basic but which, I find, not everyone does. I will elaborate.

We use Spring Framework a lot. Spring supports the dependency injection concept and, used properly, makes life much simpler. To use it properly the idea is to 'inject the dependencies' (well, obviously, you'd think).

Dependency Injection

One of the huge advantages with this approach is that the code is more testable. You might have a class A which depends on class B. Without dependency injection you'd probably hard code a 'new B()' in class A, or have B a static class, and then call it. That means when you want to test A you necessarily have to test B. You can't mock it out.

It is much better to create a triplet of:
  • B the interface
  • BImpl the production code
  • BMock a mock implementation that does very little
Then you add a field of type B to class A with a setter and getter. When you write a unit test for A you set BMock into it so A calls BMock rather than BImpl when it is being tested.

Otherwise you find that to just test one class you have to do a massive amount of work to set up all the dependencies and their dependencies etc. Let's say B is a DAO which requires a database. BMock does not, it just returns some hard coded values. Now A's unit test (which uses BMock) needs no database. That's a help.
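As a minimal sketch of the triplet, assuming some made-up names (CustomerDao stands in for B, CustomerDaoMock for BMock and GreetingService for A; none of these come from a real project), it might look like this. The production CustomerDaoImpl, which would talk to the database, is left out.

```java
// Hypothetical names throughout: CustomerDao plays the role of B,
// CustomerDaoMock is BMock, and GreetingService is A.
interface CustomerDao {
    String findName(int customerId);
}

// BMock: needs no database, it just returns a hard-coded value.
class CustomerDaoMock implements CustomerDao {
    @Override
    public String findName(int customerId) {
        return "Test Customer " + customerId;
    }
}

// A: depends only on the interface and is handed an implementation via the setter.
class GreetingService {
    private CustomerDao customerDao;

    public CustomerDao getCustomerDao() {
        return customerDao;
    }

    public void setCustomerDao(CustomerDao customerDao) {
        this.customerDao = customerDao;
    }

    public String greet(int customerId) {
        return "Hello, " + customerDao.findName(customerId);
    }
}
```

A unit test for GreetingService would create it, call setCustomerDao(new CustomerDaoMock()) and then check that greet(7) returns "Hello, Test Customer 7", all without a database.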

Sometimes there is more than one BImpl. For example, you might want a version of B that fetches data from a web service or LDAP. A should not need to know anything about this (this is called encapsulation and is fundamental to OO concepts).

The instantiation of these classes is done by Spring. An XML file defines the relationships between the classes, or you can use annotations. But this post is more about Java than Spring. The point I must make is that your Java code is not doing the instantiation, otherwise it would have to know which B to instantiate.

It is worth including your mock classes in your distribution. We often find other internal projects using our classes (eg writing their equivalent of class A) and they need to use BMock to test their code. So it makes sense to bundle BMock in the same jar file as BImpl.

It makes even more sense to produce two jars and have them test against one and deploy the other, but that is more work.

Static Classes

So, is there no place for a class with a few static methods? Is that always the wrong answer?

Actually no. You should still use statics (I like to call them Utils classes) when the following criteria are met:
  • The class has no non-static internal storage (well, obviously, otherwise the static methods would not do what you want)
  • It needs to be called from lots of places, and getting it injected everywhere that needs it would be tedious.
  • It has no dependencies of its own. You can relax about JRE classes of course. You might even relax about 3rd party classes. But it should not depend on any classes you wrote yourself and anything it does depend on should be small/simple.
Consider Spring's StringUtils as an example. It has some static methods that manipulate Strings. Lots of code calls it and it depends on nothing else. Also there is only one StringUtils. There aren't several variations with different code needing different kinds of StringUtils.
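To make the criteria concrete, here is a small sketch of a class that meets them. TextUtils and its methods are invented for illustration and are not Spring's StringUtils.

```java
// Hypothetical Utils class: static methods only, no stored state,
// and no dependencies beyond the JRE.
final class TextUtils {
    private TextUtils() {
        // not meant to be instantiated
    }

    /** Returns true if the value is null, empty, or only whitespace. */
    static boolean isBlank(String value) {
        return value == null || value.trim().isEmpty();
    }

    /** Upper-cases the first character; null and "" are returned unchanged. */
    static String capitalize(String value) {
        if (value == null || value.isEmpty()) {
            return value;
        }
        return Character.toUpperCase(value.charAt(0)) + value.substring(1);
    }
}
```

There is nothing here worth mocking, so callers can use it directly without hurting their own testability.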

But pretty much everything else needs to be an Interface/Impl/Mock triplet.

Configuration

Sometimes you want to inject values into your classes rather than other classes. You might have a class that sends messages to a web service and you want to configure that externally. This allows people to adjust the final address used at deployment time. There are several ways to do this:
  • Spring allows you to load properties from one or more properties files and use them in your bean definitions.
  • MaduraConfiguration allows you to specify the values in an Apache Configuration file and (again) inject them into your classes by specifying them in the bean definition.
No doubt other people have other variations. But one thing you should never do is inject your configuration scheme into your class and have the class call it. The class should be unaware of where the configuration information came from. All it knows is that something called its setter and told it the value.

That has two advantages. First, you can change the way you configure things without having to change the configured classes. They never knew what was doing the configuring so there is no need for them to change.

Second, you can avoid writing a mock configuration source for your class to call if it never calls anything to do configuration. This saves a little work.
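A sketch of what such a class might look like follows. MessageSender, endpointUrl and describeSend are illustrative names only, and the real web service call is replaced by a method that just describes what would be sent.

```java
// Hypothetical class: it knows its endpoint, but not where the value came from.
class MessageSender {
    private String endpointUrl;

    // Called by whatever does the configuring: Spring, a unit test, anything.
    public void setEndpointUrl(String endpointUrl) {
        this.endpointUrl = endpointUrl;
    }

    public String getEndpointUrl() {
        return endpointUrl;
    }

    /** Describes the send; a real class would call the web service here. */
    public String describeSend(String message) {
        return "POST " + endpointUrl + " <- " + message;
    }
}
```

In production the bean definition supplies the value from whichever configuration source you chose. In a unit test you simply call setEndpointUrl with a test address, and no mock configuration source is needed.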

Conclusion

This isn't the whole story but hopefully it gives an idea of how to make the code you write more testable.

Saturday, 10 July 2010

I wouldn't start from here... (solving serialization problems)

Last week we had a problem with a live system, therefore it was an emergency. The problem was around serialization. The application uses Spring Webflow and, to allow the user to save where they were up to and restore it later, we serialize the Webflow conversation to the database. And there lay the problem.

We used Java object serialization which, as you probably know, stores a binary representation of the Java object. The Webflow conversation object has a bunch of other objects attached to it, including application objects. It works fine as long as those objects do not change, and there lay the problem.

A new version of the application was deployed and one of those objects changed. So immediately any of the older conversations refused to restore, causing all kinds of problems.

The way you're supposed to handle this is to ensure that all your Serializable objects have a serialVersionUID field added. Eclipse flags classes that implement the Serializable interface for this reason and you really ought to take the hint and add a serialVersionUID. Java uses the serialVersionUID to figure out that even though the class has changed you still want it treated the same because the serialVersionUID is the same (unless you changed it deliberately).
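For reference, this is roughly what the declaration looks like, together with a plain serialization round trip. SavedAddress and SerializationRoundTrip are made-up names standing in for an application object and a helper; they are not taken from the application described here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical application object that ends up attached to a saved conversation.
class SavedAddress implements Serializable {
    // Declared explicitly so that a compatible change to the class (adding a
    // field, say) does not change the version Java checks when restoring.
    private static final long serialVersionUID = 1L;

    private final String street;

    SavedAddress(String street) {
        this.street = street;
    }

    String getStreet() {
        return street;
    }
}

// Hypothetical helper: writes an object to bytes and reads it back.
class SerializationRoundTrip {
    static byte[] toBytes(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    static Object fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }
}
```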

Now, while that is all fine, it is not much help if you've already serialized the objects. At this point I'd like to point out that I had nothing to do with this part of the design, just so you know. I only got involved with solving the problem. But it is a bit like the old joke where you ask an Irishman how to get to Dublin and he says 'well, I wouldn't start from here!'

To solve this I figured we'd need to restore the objects into the old version, then write them to some neutral format, then serialise from there to the new format. Initially I was thinking I would need to run two passes on this. The first to write the neutral format to a file, and second to read the file. The two passes would be implemented in two different jar files so that the two versions of the same objects would not get mixed up. But someone suggested we could make this simpler by messing about with the class loader and I remembered madura-bundle which would provide a neat way to manage this.

Madura-bundle is kind of a simplified OSGi that is closely tied with Spring. You can create a bundle containing your objects and a spring beans file specifying some of them as beans. Then you have a main program with a main spring beans file, and you inject bundled beans into that. Then, and this is the interesting bit, you can switch from one bundle to another. The bundled beans injected into your main program change automatically.

So for this problem I loaded the objects from the two versions of the application into two different bundles, then I could restore the serialised objects from the database under bundle-1, switch to bundle-2 and serialise them back. Easy.

I used xstream for my neutral format. So, having unpacked the objects from the database I called xstream to get them into an XML string. Then, after I switched bundles, I used xstream to create the new version objects which I could then serialise to the database.

Xstream worked pretty much first time and it handles complex structures where an object is cross referenced several times. Where XML would typically repeat the object definition, xstream stores an XPath when it finds a repeated object. I did have a small issue with one of the webflow objects which implements Externalizable rather than Serializable. That object has a protected constructor which, naturally, gives xstream a problem. To get around this I had to implement a converter for the org.springframework.webflow.engine.impl.FlowSessionImpl class.

Xstream's architecture uses a bunch of standard converter classes which handle pretty much everything as far as I can see. But when necessary you can add your own converter. In this case all I had to do was extend
com.thoughtworks.xstream.converters.reflection.ExternalizableConverter
and override the unmarshal method to instantiate the FlowSessionImpl class explicitly. I put the converter in the same package as FlowSessionImpl so it overcame the protected constructor.

The code ended up like this. To unpack from the blob I used this method:
public String getXMLFromBlob(InputStream is)
{
    XStream xstream = new XStream();
    Object ret = null;
    if (is != null)
    {
        try {
            ObjectInputStream oji =
                new ClassLoaderObjectInputStream(
                    this.getClass().getClassLoader(),is);
            ret = oji.readObject();
        } catch (Exception e) {
               throw new RuntimeException(e);
        }
    }
    return xstream.toXML(ret);
}
I put this into the first madura bundle. All it does is take the input stream, which comes from the blob, and convert the object it contains to XML. I did have to implement an extension of the standard ObjectInputStream because the standard one locked on to the main class loader. The extension just uses the following to override the resolveClass method:
protected Class resolveClass(ObjectStreamClass desc)
    throws IOException, ClassNotFoundException{
    try{
        String name = desc.getName();
        return Class.forName(name, false, classLoader);
    }
    catch(ClassNotFoundException e){
        return super.resolveClass(desc);
    }
}
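The rest of the extension is not shown above. A minimal sketch of the whole class, assuming a constructor that takes the class loader and the stream in the order used by getXMLFromBlob and a field named classLoader as the override expects, could be:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectStreamClass;

// Sketch only: the constructor and field are assumptions based on how the
// class is called from getXMLFromBlob.
class ClassLoaderObjectInputStream extends ObjectInputStream {
    private final ClassLoader classLoader;

    ClassLoaderObjectInputStream(ClassLoader classLoader, InputStream in) throws IOException {
        super(in);
        this.classLoader = classLoader;
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
            throws IOException, ClassNotFoundException {
        try {
            // Look in the supplied loader (the bundle's) first.
            return Class.forName(desc.getName(), false, classLoader);
        } catch (ClassNotFoundException e) {
            // Fall back to the standard lookup, which also handles primitives.
            return super.resolveClass(desc);
        }
    }
}
```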

To get the new objects from the XML I switch to the other bundle and then call this method:
public Object getObjectFromXML(String xml) {
    XStream xstream = new XStream();
    xstream.setClassLoader(this.getClass().getClassLoader());
    xstream.registerConverter(
    new org.springframework.webflow.engine.impl.FlowSessionConverter(
    xstream.getMapper()));
    return xstream.fromXML(xml);
}
This also shows how the custom converter is registered with xstream. The object returned is always the Webflow Conversation object, plus the other objects that were attached to it.

So that's how things are serialized at the detail level. The process is controlled by this code:
getBundleManager().setBundle("bundle-1.0");
InputStream is = new java.io.ByteArrayInputStream(bytes);
String xml = getTranslator().getXMLFromBlob(is);
getBundleManager().setBundle("bundle-2.0");
Object obj = getTranslator().getObjectFromXML(xml);
Not very much to it really. The setBundle call, obviously, selects a different bundle and this switches back and forth between them as we process each record. I have not shown reading and writing the blob in the database but that is easy enough to figure out.

Although this does provide a solution to the problem you should not get into this situation in the first place.
  1. Do not serialise objects to a database like this. Put them into a neutral format such as XML (perhaps generated from xstream) instead. This feature of Java is best confined to sending objects between applications rather than persisting them.
  2. If you must serialize binary objects then declare the serialVersionUID field in every class. This is not completely foolproof. You can change the object enough to break it. But you'll probably be okay.
As part of this exercise I had a need to find the serial version of the class so that I could tell if I had an old one or a new one. My understanding is that Java uses the declared serialVersionUID if there is one, and otherwise computes a default from the structure of the class (its name, modifiers, interfaces, fields, constructors and methods). In my case there was no serialVersionUID so I needed some way to find out what it was. I used this:
ObjectStreamClass objStreamClass = ObjectStreamClass.lookup(obj.getClass());
long serialVersion = objStreamClass.getSerialVersionUID();
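Wrapped up as something you can run, the lookup might be sketched like this. The three class names are invented for the example. Note that ObjectStreamClass.lookup returns null for a class that is not Serializable, so the sketch checks for that.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical classes: one declares a serialVersionUID, one leaves it to Java.
class WithDeclaredUid implements Serializable {
    private static final long serialVersionUID = 42L;
}

class WithoutDeclaredUid implements Serializable {
    String name;
}

class SerialVersionLookup {
    /** Returns the serialVersionUID Java will use for the class, declared or computed. */
    static long serialVersionOf(Class<?> type) {
        ObjectStreamClass objStreamClass = ObjectStreamClass.lookup(type);
        if (objStreamClass == null) {
            throw new IllegalArgumentException(type.getName() + " is not Serializable");
        }
        return objStreamClass.getSerialVersionUID();
    }
}
```

For WithDeclaredUid this returns 42. For WithoutDeclaredUid it returns whatever default Java computes, which is the value you need to compare when deciding whether a stored object came from the old or the new version of a class.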