Failing Fast

Raise your hand if you have seen a null reference error in your development or production error logs.

An error like this tells us that we are calling a method or property on an object that is currently null. The stack trace may help us locate where the error occurred. However, an ambiguous error like this can be hard to troubleshoot for several reasons.

  • If the stack trace does not have line numbers and multiple local variables exist within the method, it can be hard to determine which variable was null.

  • If the variable was an input parameter, it can be hard to determine in which method within the call stack the null reference was introduced.
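The second point can be illustrated with a minimal sketch. It is written in Python, where `None` plays the role of null, and every name in it (`Customer`, `find_customer`, `build_greeting`, `greet`) is hypothetical. The missing value is introduced in one function but only blows up two calls later.

```python
class Customer:
    """Hypothetical domain object used only for this illustration."""

    def __init__(self, name):
        self.name = name


def find_customer(customers, customer_id):
    # dict.get returns None for an unknown key instead of raising,
    # so this is where the "null" quietly enters the program.
    return customers.get(customer_id)


def build_greeting(customer):
    # The failure surfaces here, as an AttributeError on None,
    # far away from the lookup that actually caused it.
    return "Hello, " + customer.name


def greet(customers, customer_id):
    customer = find_customer(customers, customer_id)
    return build_greeting(customer)
```

Calling `greet({}, 42)` raises an `AttributeError` inside `build_greeting`, even though the real problem is the failed lookup in `find_customer`.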

Failing Fast Enhances Maintainability

In my last post, I talked about the various factors that define software quality. One of those, maintainability, is very important, because it reduces the effort required to locate and fix an error in an operational program. Using a technique like failing fast can enable you and your team to locate errors more quickly in both development and production.

So, what does it mean to fail fast?

It means that rather than writing code that ignores or band-aids an issue (for example, by setting default values) and lets the program limp along for the rest of its execution, you fail “immediately and visibly” the moment you become aware that there is an issue.
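As a rough sketch of the difference, compare a band-aid default with a fail-fast check. The example is in Python, and the `discount_rate` setting and both function names are assumptions made up for illustration.

```python
def discount_rate_with_default(settings):
    # Band-aid: a missing setting silently becomes 0.0, and the program
    # limps along producing wrong totals with no visible error.
    return settings.get("discount_rate", 0.0)


def discount_rate_fail_fast(settings):
    # Fail fast: stop immediately and say exactly what is wrong.
    if "discount_rate" not in settings:
        raise KeyError("settings is missing required key 'discount_rate'")
    return settings["discount_rate"]
```

With the first version, a misconfigured environment looks healthy until someone notices the numbers are off. With the second, the misconfiguration is reported at the first call.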

Assertions are the Key

To fail fast, the key is to use assertions in your code. An assertion is code that checks for a condition and fails if the condition is not met. In the case of the null object reference, you can check for null before using the object and throw a descriptive exception if the check fails.
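A minimal sketch of such a check, in Python with `None` standing in for null, might look like this. `FakePaymentGateway`, `get_payment`, and `payment_amount` are hypothetical names, not part of any real library. The sketch raises an explicit exception rather than using Python's `assert` statement, because `assert` statements are skipped when Python runs with the `-O` flag.

```python
class FakePaymentGateway:
    """Hypothetical stand-in for a real gateway; returns None for unknown IDs."""

    def __init__(self, payments):
        self._payments = payments

    def get_payment(self, payment_id):
        return self._payments.get(payment_id)


def payment_amount(gateway, payment_id):
    payment = gateway.get_payment(payment_id)
    # Assertion: fail here, with a message naming the missing thing,
    # instead of letting a None travel further into the program.
    if payment is None:
        raise ValueError(f"gateway returned no payment for id {payment_id!r}")
    return payment["amount"]
```

The error message now names both the collaborator and the offending input, which a bare null reference error does not.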

For null references, I like to check in places where two methods or classes interact:

  • Constructor: verify any dependencies passed in by another class
  • Method: check any input parameters passed in by another method or class
  • Method: check the response from another method or class, such as the result of a gateway call
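The three checkpoints above can be sketched together in one small class. This is an illustrative Python example under assumed names (`OrderService`, `InMemoryRepository`, `place_order`, `save`), not code from any particular framework.

```python
class InMemoryRepository:
    """Hypothetical dependency that hands back an incrementing order id."""

    def __init__(self):
        self._orders = []

    def save(self, order):
        self._orders.append(order)
        return len(self._orders)


class OrderService:
    def __init__(self, repository):
        # Constructor: verify the dependency passed in by another class.
        if repository is None:
            raise ValueError("repository must not be None")
        self._repository = repository

    def place_order(self, order):
        # Method: check the input parameter passed in by the caller.
        if order is None:
            raise ValueError("order must not be None")
        order_id = self._repository.save(order)
        # Method: check the response from the collaborating class.
        if order_id is None:
            raise RuntimeError("repository.save returned no order id")
        return order_id
```

Each check sits at a boundary between two pieces of code, so a failure points directly at the side of the boundary that broke the agreement.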

Drying Up Your Assertions

After using these kinds of assertions throughout your code, you will want to DRY (don’t repeat yourself) up your code by creating reusable assertions.
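One way to do that, sketched in Python, is a small helper that performs the check and returns the value so it can be used inline. The names `require_not_none` and `ReportBuilder` are assumptions chosen for this illustration.

```python
def require_not_none(value, name):
    """Return value unchanged, or raise if it is None."""
    if value is None:
        raise ValueError(f"{name} must not be None")
    return value


class ReportBuilder:
    def __init__(self, formatter, writer):
        # Each dependency is checked and assigned in a single line.
        self._formatter = require_not_none(formatter, "formatter")
        self._writer = require_not_none(writer, "writer")
```

The helper compares against `None` explicitly rather than testing truthiness, so legitimate values such as `0` or an empty string are not rejected by mistake.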

You might also look into third-party libraries that are available as NuGet packages, such as Magnum by Chris Patterson, which provides a Guard class for these kinds of common assertions.

Another option, if you are using the .NET framework, is Microsoft’s Code Contracts. These allow the developer to specify pre- and post-conditions that can also be seen in the documentation of a given method. A discussion of their use is outside the scope of this article.

Is Failing Fast Robust?

At this point, some may object that failing fast will create fragile software. However, when code fails fast, errors are more likely to be found during development and testing rather than in production, where it really counts. And if a bug does escape into production, your team is more likely to fix it quickly, because the failure occurs closer to where the issue originated.

Another objection to this style of programming is that it is more likely to cause the system to crash in front of the user. One way to mitigate this is to use a global error handler that displays a user-friendly, generic error message to the user while providing the error details to developers through logs or email. In non-interactive applications such as batch processes or Windows services, you don’t have to display a message; after the error is handled globally, operations continue with the next action or transaction.
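For the batch case, a minimal sketch of such a handler in Python could look like the following. The function name `process_all`, the logger name, and the shape of a "transaction" are all assumptions. Individual steps still fail fast, and the outer loop records the details for developers and moves on.

```python
import logging

logger = logging.getLogger("batch")


def process_all(transactions, handle):
    """Run handle on every transaction; log failures and keep going."""
    failed = []
    for transaction in transactions:
        try:
            handle(transaction)
        except Exception:
            # logger.exception records the message plus the full traceback,
            # so developers keep the detail that failing fast produced.
            logger.exception("transaction %r failed", transaction)
            failed.append(transaction)
    return failed
```

The handler sits at the top of the batch loop only. Code inside `handle` should not catch and hide its own errors, or the benefit of failing fast is lost.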

Other Resources

If you are interested in reading more about the concept of failing fast, check out James Shore’s article “Fail Fast,” written for IEEE Software magazine, which discusses the same topic.

June 20, 2015
  • The precondition asserts in your code are the way to go, especially in service calls. Good tip.

    • Hi Bobby – I don’t know if you got my previous reply, because I think that I must have just made it an additional comment, so I am writing it again to thank you for commenting.

      I’m intrigued – can you tell me more? Are you talking about a web service call and doing validation to know whether or not it was a bad request as a precondition?

      If so, I agree that that is a good place for this kind of fail fast. We have had a recent example where the team that had built a service was filling in defaults for every part of the expected input object even to the point of new-ing up the actual input parameter if it was null. We didn’t know that there was an issue in one of our scenarios of calling that service until production. If they had failed fast, we could have addressed this in pre-production.
