
TOAD FTW! Evaluating a Test Suite: A TOAD Thought Experiment


The AND is important!

Disclaimer: In my career I have done most of the things I describe below, but never tied them all into one project. The following should be possible:

Testing a System

Suppose we have a software system of reasonable complexity. Suppose our system is composed of a front end with a user interface (UI) and a back end, that is, a client and a server. The front end and the back end communicate via an application programming interface (API) of some sort. This is a common architecture for many software systems.

Suppose further that our front end and our back end are well-designed. They have unit tests, and meet whatever definition of quality you would want to apply.

Because our system is reasonably complex, we want a suite of end-to-end tests that exercises the entire software-and-data stack, demonstrates that users can do the things they need to do, and shows that the front end and the back end are communicating properly in the service of the users' requests.

In a system of reasonable complexity, there are a finite number of paths through the UI. We create a suite of tests that exercise these paths. Suppose for this thought experiment that our tests exercise all of the paths through the application available to the user. (In other places I have called this sort of test design "feature coverage". In this case we have 100% feature coverage.) The tests operate on a carefully chosen, well-known set of data. The tests are designed well: each test navigates a path through the application that changes the application state at least once, makes at least one assertion about the changed state, and reports any unexpected state it encounters.
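As a minimal sketch of that test shape, the Python below uses a hypothetical in-memory stand-in for the application. The names FakeApp, SEED_DATA, add_to_cart and cart_total are invented for illustration; a real end-to-end test would drive the actual UI rather than call methods directly.

```python
class FakeApp:
    """Hypothetical stand-in for the system under test."""

    def __init__(self, seed_data):
        # The catalog is the carefully chosen, well-known data set.
        self.catalog = dict(seed_data)
        self.cart = []

    def add_to_cart(self, item):
        # Changes application state; rejects items outside the known data.
        if item not in self.catalog:
            raise KeyError(item)
        self.cart.append(item)

    def cart_total(self):
        return sum(self.catalog[item] for item in self.cart)


# Assumed seed data: item name -> price.
SEED_DATA = {"widget": 5, "gadget": 7}


def test_add_to_cart_updates_total():
    app = FakeApp(SEED_DATA)

    # Navigate one path that changes state at least once.
    app.add_to_cart("widget")
    app.add_to_cart("gadget")

    # Assert on the changed state; a failure here reports unexpected state.
    assert app.cart == ["widget", "gadget"]
    assert app.cart_total() == 12
```

The point is only the pattern: known data in, a state change, then an assertion about the new state.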

Now we have a question: how good is our end-to-end test suite? We believe that the tests cover all the interactions that a user would experience in operating the application, and we think we can expose all the errors or mistakes that a user might encounter. But how can we know that for sure?

Observability at a High Level

The Wikipedia article on Observability says that a system is observable if we "...can determine the behavior of the entire system from the system's outputs." Each of our end-to-end tests changes the state of the system and then asserts something about the changed state. We have already accomplished a certain level of observability, and because we have good feature coverage, we can say quite a lot about the state of what is probably the most important part of the system: everything that users can do.

But we can't know about states that potentially exist but that we did not exercise. However, we can think about our architecture, and we can make our systems more observable than they are. Examining the front end of the system, it is likely that our end-to-end tests have in fact exercised all of the calls to the back end that exist. Since we stipulated that this is code of high quality, it is unlikely that there is dead code in the form of unused calls to the back end of the system.

The back end is more difficult to reason about. It is entirely possible that the back end is capable of supplying more information than the front end is capable of consuming. There is also a chance that there may be paths through the application that we failed to discern, and there could be data that cause problems we have failed to anticipate.

So our system is in fact observable because we can change its state and infer its status from those changes. We can be reasonably sure that we are exercising the capabilities of the front end to the greatest extent possible. And we can say that our test suite is probably pretty good, but the possibility of "unknown unknowns" still exists.

Observability at a Deep Level

Google "software observability" and you will find a wealth of tools, approaches, and other material on the subject. A simple description of software observability is that code is instrumented in such a way as to expose state changes in the application. The records of these state changes can then be consumed to analyze the behavior of the system, with the goal of exposing problems in behavior, performance, errors, etc.
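To make "instrumented" concrete, here is a minimal sketch in Python. It assumes an in-memory list as the event sink and a hypothetical decorator named emits; real systems would typically send these records to a logging or telemetry pipeline instead.

```python
import functools
import time

# Hypothetical in-memory sink; stands in for a real telemetry pipeline.
EVENTS = []


def emits(event_name):
    """Decorator that records a state-change event after a successful call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            # Only reached if func did not raise.
            EVENTS.append({
                "event": event_name,
                "source": func.__qualname__,
                "ts": time.time(),
            })
            return result
        return wrapper
    return decorator


class Account:
    """Illustrative back-end object whose state changes we want to expose."""

    def __init__(self):
        self.balance = 0

    @emits("account.deposit")
    def deposit(self, amount):
        self.balance += amount
```

Calling Account().deposit(10) changes the balance and appends one record to EVENTS, which is the raw material for the analysis described here.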


This is an essay on TOAD: Testing, Observability AND DevOps. We want to be able to evaluate how effective our suite of end-to-end tests is, and we think TOAD can help. So far we have a good grasp of a certain kind of end-to-end testing, and two kinds of observability. That second level of observability is where DevOps comes in.

Our code is instrumented such that it emits detailed notices of state changes in the system. We can consume and analyze those notices and gain a good understanding of the behavior of the system. We can create a profile of the behavior of the production system over a period of time that shows things like the kinds of state changes occurring across the platform and across the code base.
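One simple way to picture such a profile is a count of events per source. The sketch below assumes each notice is a dictionary with "source" and "event" keys; that shape and the sample records are invented for illustration, not taken from any real system.

```python
from collections import Counter


def build_profile(events):
    """Count how often each (source, event) pair occurred."""
    return Counter((e["source"], e["event"]) for e in events)


# Made-up sample notices standing in for a period of production traffic.
production_events = [
    {"source": "orders", "event": "order.created"},
    {"source": "orders", "event": "order.created"},
    {"source": "orders", "event": "order.cancelled"},
    {"source": "billing", "event": "invoice.sent"},
]

production_profile = build_profile(production_events)
```

A real profile would likely carry more dimensions (timing, host, code location), but even a plain count shows which kinds of state changes occur and where.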

We can run our end-to-end test suite against the same instrumented code and generate the same kind of profile of the system's behavior under our suite of tests. When we compare the production profile to the test profile, we expect to see the same kinds of behavior and state changes in both profiles. When this is true, we can say with certainty that our suite of end-to-end tests is valuable and its coverage is excellent. Note that the instrumented code in production does not participate in that first kind of observability that we noted, where we make detailed assertions about the state of the system. The production profile only tells us what happened. The test system profile tells us that the same things happened when we tested the system, but only the test system tells us that the correct things happened.

But what if the production profile shows activity in areas that the test profile does not? Remember that the design of our tests means that parts of the system, the back end in particular, are necessarily something of a black box. Our test suite has no mechanism to discover dead code or unsuspected communication channels. This is where our second kind of observability and our DevOps operations can illuminate parts of the system that we may have neglected to put under test. We designed the best end-to-end tests that we could, and we can use TOAD to tell us if we did it right the first time, or if we need to create more test coverage than we have.
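The comparison of the two profiles can be sketched as a set difference. The code below assumes each profile is a mapping from a (source, event) pair to a count; the function name compare_profiles and the sample data are hypothetical.

```python
def compare_profiles(production, test):
    """Split (source, event) keys into shared, untested, and test-only."""
    prod_keys = set(production)
    test_keys = set(test)
    return {
        # Seen in production but never produced by the test suite.
        "untested": sorted(prod_keys - test_keys),
        # Produced by the tests but never seen in production.
        "test_only": sorted(test_keys - prod_keys),
        # Seen in both profiles.
        "shared": sorted(prod_keys & test_keys),
    }


# Made-up profiles for illustration.
production_profile = {
    ("orders", "order.created"): 120,
    ("billing", "invoice.sent"): 80,
    ("reports", "export.started"): 3,
}
test_profile = {
    ("orders", "order.created"): 4,
    ("billing", "invoice.sent"): 2,
    ("admin", "user.purged"): 1,
}

gaps = compare_profiles(production_profile, test_profile)
```

With this sample data, the "untested" list contains the reports export, which is the kind of gap that would prompt new tests, and "test_only" contains the admin purge, which might point at code that production never exercises.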