

Showing posts from 2019

TOAD Goes To 11

TOAD is Testing, Observability And DevOps. We think these three things are related. What would happen if we took each of the three aspects of TOAD and put strong emphasis on each in turn? What happens if we take our testing effort as far as it can go, and "dial it to 11"? Observability to 11? DevOps to 11? The meaning of "dial it to 11" will differ from organization to organization: it might mean hiring staff, investing in tools, or simply emphasizing the mission more than before.

If we dial testing to 11, I think two things will result. First, the number of observable incidents in production will likely go down, because the emphasis on testing means that more problems will be found and fixed before they are deployed to production. Second, the rate of deployments (DevOps) could increase, because fewer observable incidents make each deployment safer. So: increase the T in TOAD to decrease the O and increase the D.

With testing at 11, now…

TOAD: RRRAR! Rollbacks, Replays, Reverts, and Regressions

TOAD is

Testing
Observability
And
DevOps

My last blog post was about why TOAD ideas are important and how they interact with each other. But even the most excellent TOAD systems occasionally release bugs to production. This post describes how to understand these problems and how to address them when they happen. And there is an experience report at the end!

Most of the time we learn about problems in the production system because the system is observable. We see a performance problem, a data problem, or a space problem, and we can use observability tools to narrow down and ultimately fix the cause(s) of the observed problem. There is also an entire class of problems that typical observability tools cannot discover; only testing the system reveals them. When we do find such problems in production, what can we do about them?
Rollback and Replay

The most drastic response to a production problem is to "roll back" the system to the last known good state by means …
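The following minimal sketch shows one way to choose the "last known good state". It assumes the deployment history is an ordered list of releases, each flagged as verified good or not. `Release`, `last_known_good`, and `rollback` are hypothetical names, and `deploy` is a stand-in for whatever deployment mechanism you actually use. The sketch only selects the code version to redeploy. Rolling back or replaying data is a separate problem.

```python
from dataclasses import dataclass


@dataclass
class Release:
    """One deployed version of the system (hypothetical record shape)."""
    version: str
    healthy: bool  # True if this release was verified good in production


def last_known_good(history: list[Release], current: str) -> Release:
    """Return the most recent healthy release deployed before `current`.

    `history` is ordered oldest to newest. Raises ValueError if `current`
    is not in the history or if no earlier healthy release exists.
    """
    versions = [r.version for r in history]
    idx = versions.index(current)  # ValueError if current is unknown
    for release in reversed(history[:idx]):
        if release.healthy:
            return release
    raise ValueError(f"no known good release before {current}")


def rollback(history: list[Release], current: str, deploy) -> str:
    """Redeploy the last known good release via the supplied `deploy` callable."""
    target = last_known_good(history, current)
    deploy(target.version)
    return target.version


if __name__ == "__main__":
    history = [
        Release("1.0", healthy=True),
        Release("1.1", healthy=True),
        Release("1.2", healthy=False),  # bad build
        Release("1.3", healthy=False),  # current release, broken in production
    ]
    deployed = []
    print(rollback(history, "1.3", deployed.append))  # prints 1.1
```

Skipping every unhealthy release, not just the current one, matters because a bug is often introduced a release or two before anyone notices it.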

TOAD FTW! Evaluating a Test Suite: A TOAD Thought Experiment

TOAD is

Testing
Observability
AND
DevOps
The AND is important!

Disclaimer: In my career I have done most of the things I describe below, but never tied them all into one project. The following should be possible:
Testing a System

Suppose we have a software system of reasonable complexity. Suppose our system consists of a front end with a user interface (UI) and a back end, that is, a client and a server. The front end and the back end communicate via an application programming interface (API) of some sort. This is a common architecture for many software systems.

Suppose further that our front end and our back end are well-designed. They have unit tests, and meet whatever definition of quality you would want to apply.

Because our system is reasonably complex, we want a suite of end-to-end tests that exercises the entire software-and-data stack and demonstrates both that users can do the things they need to do and that the front end and the back end are communicatin…
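Here is a minimal end-to-end test sketch for the thought experiment, written in Python with only the standard library. It assumes a toy back end that exposes a hypothetical `/items` JSON API. A hypothetical `Client` class stands in for the front end, and there is no real UI. The test starts the server, drives it only through the client, and checks a user-visible outcome. A break anywhere along the client, API, or server path therefore makes it fail.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Back end: a toy server exposing a tiny JSON API (hypothetical /items endpoint).
ITEMS: list[str] = []


class ApiHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/items":
            self._send(200, ITEMS)
        else:
            self._send(404, {"error": "not found"})

    def do_POST(self):
        if self.path == "/items":
            length = int(self.headers.get("Content-Length", 0))
            name = json.loads(self.rfile.read(length))["name"]
            ITEMS.append(name)
            self._send(201, {"name": name})
        else:
            self._send(404, {"error": "not found"})

    def _send(self, status, payload):
        body = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging during tests


# Front end stand-in: talks to the back end only through the API.
class Client:
    def __init__(self, base_url: str):
        self.base_url = base_url

    def add_item(self, name: str) -> dict:
        req = urllib.request.Request(
            f"{self.base_url}/items",
            data=json.dumps({"name": name}).encode(),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())

    def list_items(self) -> list:
        with urllib.request.urlopen(f"{self.base_url}/items") as resp:
            return json.loads(resp.read())


def run_end_to_end_test() -> bool:
    """Start the server, drive it through the client, and check a user-level outcome."""
    server = HTTPServer(("127.0.0.1", 0), ApiHandler)  # port 0 picks a free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        client = Client(f"http://127.0.0.1:{server.server_port}")
        client.add_item("widget")
        return "widget" in client.list_items()
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(run_end_to_end_test())
```

The test asserts on what a user would see (the item appears in the list), not on internal state. That keeps it meaningful even if the back end's storage changes.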

My Remote Retrospective Process

Possibly the most important aspect of an agile process is the retrospective. A retrospective is usually a team meeting held at the end of every agile iteration or sprint. While there are any number of ways to run retrospectives, the goal is to discuss three questions:

     What is working well?
     What is not working well?
     What should we change?

The first question tends to be the easiest to answer, and it is tempting to just skip it, but that is dangerous. It is every bit as important to celebrate the team's success as it is to grapple with the team's issues and problems. Positive reinforcement is more powerful than negative reinforcement.

The second question also tends to be easy to answer in teams where the members feel safe to discuss work honestly. It is important to surface problems so that the whole team is aware of the current issues and why those issues affect the work.

The third question tends to be hard to answer. There is an entire …