Thursday, 4 March 2010

Failsafe

I have just spent several days hunting down what should have been a trivial problem but, because it was well hidden, it turned into a monster. There was no need for this to happen; there are ways to avoid it.

The actual problem was in a third-party EJB. It threw a NullPointerException, which it turned into an EJBException and passed to the client application. The first difficulty is that the source we have for the EJB doesn't actually match what we run. It is obviously not open source, but we do have a copy of an older version which is near enough. That is not an intrinsic problem, but it becomes one if the thing that logs the exception loses the stack trace. I was getting a stack trace okay: it told me there was a null pointer, but it only went down to the point where the exception was logged, not where it happened.

You'll appreciate, I'm sure, that stepping through the code without up-to-date source is awkward. My debug environment doesn't tell me local variable values if I don't have source, so I couldn't tell much at all. But, mostly by comparing the broken system with a similar but working system, I was able to track down a minor difference in the way the database was set up.

It has made me think about throwing exceptions. Usually I just do the obvious and it works, but I notice PMD has been warning me about some of my exception handling and I am going to take more notice of its advice now. If this exception had been better reported I would have been able to find the problem faster.

This is really about failing safely, or at least helpfully. You're probably not going to handle every exception case that can happen, especially when you are fed stuff from external systems like databases and web services. So it is worth making sure the exceptions you throw carry decent stack traces.
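To make that concrete, here is a minimal sketch in plain Java of the difference between wrapping an exception in a way that loses the original stack trace and wrapping it in a way that keeps it. Everything in it is made up for illustration: ServiceException is a hypothetical stand-in for something like EJBException, and lookupLength is a hypothetical method that fails the way a missing database value might. It is not the code from the EJB in question, just the general pattern as I understand it.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class FailsafeDemo {

    /** Hypothetical stand-in for a container exception such as EJBException. */
    static class ServiceException extends RuntimeException {
        ServiceException(String message) {
            super(message);
        }

        ServiceException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Hypothetical lookup that fails when the database hands back null. */
    static int lookupLength(String valueFromDatabase) {
        return valueFromDatabase.length(); // NullPointerException if the value is null
    }

    /** Lossy: only the text of the original exception survives, not its stack trace. */
    static void lossy() {
        try {
            lookupLength(null);
        } catch (NullPointerException e) {
            throw new ServiceException("lookup failed: " + e);
        }
    }

    /** Preserving: the original exception travels along as the cause. */
    static void preserving() {
        try {
            lookupLength(null);
        } catch (NullPointerException e) {
            throw new ServiceException("lookup failed", e);
        }
    }

    /** Renders a stack trace, including any "Caused by" sections, as a string. */
    static String traceOf(Throwable t) {
        StringWriter buffer = new StringWriter();
        t.printStackTrace(new PrintWriter(buffer, true));
        return buffer.toString();
    }

    public static void main(String[] args) {
        try {
            lossy();
        } catch (ServiceException e) {
            // This trace starts where the wrapper was created, inside lossy().
            System.out.println(traceOf(e));
        }
        try {
            preserving();
        } catch (ServiceException e) {
            // This trace also has a "Caused by" section reaching into lookupLength().
            System.out.println(traceOf(e));
        }
    }
}
```

In the first case the trace tells you a null pointer happened somewhere, which is roughly what I was looking at for several days. In the second, the cause carries the frames from the place where the failure actually occurred, so whoever reads the log has somewhere to start.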