
Test Automation Heuristic: Minimum Data


When designing automated UI tests, one thing I've learned to do over the years is to start by creating the most minimal valid records in the system. Doing this illuminates assumptions in various parts of the system that are likely not being caught in unit tests.

As I have written elsewhere, I make every effort to set up test data for UI tests by way of the system API. (Even if the "API" is raw SQL, it is still good practice.) That way when your browser hits the system to run the test, all the data it needs are right in place.
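Since the "API" may be raw SQL, here is a minimal sketch of seeding a record that way before the browser ever loads, using Python's built-in sqlite3 module. The `users` table, its columns, and the `create_minimal_user` helper are hypothetical. A real system would use its own schema or service API.

```python
import sqlite3

def create_minimal_user(conn: sqlite3.Connection, last_name: str) -> int:
    """Insert a user with only the required column populated; return its id."""
    cur = conn.execute("INSERT INTO users (last_name) VALUES (?)", (last_name,))
    conn.commit()
    return cur.lastrowid

# Hypothetical schema: only last_name is NOT NULL; everything else is optional.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        last_name TEXT NOT NULL,
        first_name TEXT,
        email TEXT,
        phone TEXT
    )"""
)

user_id = create_minimal_user(conn, "Nakamura")
row = conn.execute(
    "SELECT last_name, first_name, email, phone FROM users WHERE id = ?",
    (user_id,),
).fetchone()
print(row)  # ('Nakamura', None, None, None)
```

The UI test would then open the page for `user_id` and exercise it, with no setup clicks through the browser.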

For example (and this is a mostly true example!) say that you have a record in your system for User, and the only required field on the User record is "Last Name". If you start designing your tests with a record having only "Last Name" on it, you will quickly uncover places in the system that may assume the existence of a field "First Name", or "Address", or "Email", or "Phone". For some background on this sort of thing, see the well-known article "Falsehoods Programmers Believe About Names".
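To illustrate the kind of hidden assumption a last-name-only record exposes, here is a hypothetical sketch. The `User` class and the two greeting functions are invented for illustration and are not taken from any real system. The naive version silently assumes a first name exists; the tolerant version uses whatever name parts are present.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class User:
    last_name: str                    # the only required field
    first_name: Optional[str] = None
    email: Optional[str] = None

def greeting_naive(user: User) -> str:
    # Hidden assumption: every user has a first name.
    return "Hello, " + user.first_name + " " + user.last_name

def greeting_tolerant(user: User) -> str:
    # Join only the name parts that actually exist.
    parts = [p for p in (user.first_name, user.last_name) if p]
    return "Hello, " + " ".join(parts)

minimal = User(last_name="Nakamura")
print(greeting_tolerant(minimal))  # Hello, Nakamura

try:
    greeting_naive(minimal)
except TypeError as exc:
    # A fully populated test record would never reach this branch.
    print("naive version failed:", exc)
```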

I ran into a particularly interesting case of this some time ago while testing with a record whose field contained an empty string. I reached a part of the system where I should have been able to delete the record, but the empty field prevented the operation. What was happening was that the system expected to find a string of text in that field, and it called split() on that string using a particular delimiter character, expecting to get back an array of substrings divided at that character.
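The post doesn't say what language the system was written in, so here is a hypothetical Python reproduction of the same failure. The "REGION-NUMBER" field format and the `account_number` function are assumptions for illustration. The key detail is that splitting an empty string on a separator returns a one-element list containing the empty string, so asking for the second element fails. (Java's `String.split` behaves the same way: when the separator doesn't appear, the result holds just the original string.)

```python
def account_number(account_code: str) -> str:
    # Hypothetical field format "REGION-NUMBER", e.g. "EU-1042".
    # Hidden assumption: the "-" separator is always present.
    return account_code.split("-")[1]

print("EU-1042".split("-"))       # ['EU', '1042']
print("".split("-"))              # [''] (one empty element, not an empty list)
print(account_number("EU-1042"))  # 1042

try:
    account_number("")  # the minimal-data case
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: list index out of range
```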

First of all, I expected to be able to delete the record; there was no reason the absence of this particular string should have prevented that. Failing that, I would have expected an error message along the lines of "Information in Field X cannot be read or is missing" or some such. But the system did not catch the exception, since it never anticipated it, and the message I got instead was the language-level error "ARRAY INDEX OUT OF BOUNDS", which of course is not very helpful.
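Both expectations can be sketched in a few lines, again with hypothetical names (`FieldFormatError`, `parse_account_code`, `delete_record`, and a plain dict standing in for storage). Deletion needs only the record's id and never touches the fragile field. Parsing validates the field and raises an error that names the field and the expected format, rather than letting an index error escape.

```python
from typing import Dict, Optional, Tuple

class FieldFormatError(ValueError):
    """A stored field is missing or malformed."""

def parse_account_code(code: Optional[str]) -> Tuple[str, str]:
    parts = (code or "").split("-")
    # Expect exactly two non-empty parts: REGION and NUMBER.
    if len(parts) != 2 or not all(parts):
        raise FieldFormatError(
            f"Account code {code!r} is missing or not in REGION-NUMBER form"
        )
    return parts[0], parts[1]

def delete_record(store: Dict[int, dict], record_id: int) -> None:
    # Deleting needs only the id; it does not read account_code at all.
    store.pop(record_id, None)

store = {1: {"last_name": "Nakamura", "account_code": ""}}
delete_record(store, 1)
print(store)  # {}

try:
    parse_account_code("")
except FieldFormatError as exc:
    print(exc)  # Account code '' is missing or not in REGION-NUMBER form
```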

There is an argument that goes something like "But no user would ever do that", to which the canonical QA answer is "I'm a user and I just did it". That reply is facile and oversimplified. The underlying issue is that once your system is in the hands of your customers, you cannot predict what they are going to try to do with it, and you owe it to them to be as helpful as possible. A much better answer is that you expect this system to scale to many users and many records of many kinds, and at that scale the probability of any given rare event happening approaches certainty. There is a very good essay on this from long ago in 2004 called "One In A Million Is Next Tuesday".
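To put numbers on "approaches certainty": if a rare event has probability p on each operation, and operations are independent (a simplifying assumption), the chance of seeing it at least once in n operations is 1 - (1 - p)^n. For a one-in-a-million event over a million operations that is about 63%, and over ten million operations it is above 99.99%. A small sketch of the arithmetic:

```python
import math

def chance_at_least_once(p: float, trials: int) -> float:
    """P(event occurs at least once in `trials` independent tries) = 1 - (1 - p)**trials."""
    # log1p/expm1 keep precision when p is tiny.
    return -math.expm1(trials * math.log1p(-p))

one_in_a_million = 1e-6
for trials in (1_000, 1_000_000, 10_000_000):
    print(f"{trials:>10,} operations: {chance_at_least_once(one_in_a_million, trials):.4f}")
```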

Of course not every operation can proceed with minimal data. You can't change an address from one value to another if you have no starting address and no address to change to. But starting your design with the most minimal legal set of data for the system is a great way to uncover assumptions about what data should and should not exist for your UI operations.
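One hypothetical way to keep the minimal record as the default while still supporting tests that genuinely need more, such as the address change, is a small builder. It starts from the smallest valid record and layers on only the fields a given test asks for. The `User` fields and the `minimal_user` name are illustrative assumptions; `dataclasses.replace` is the standard-library function that copies a dataclass with selected fields changed.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class User:
    last_name: str                  # the only required field
    first_name: Optional[str] = None
    address: Optional[str] = None

def minimal_user(**overrides) -> User:
    """Start from the smallest valid User; add only what a test needs."""
    return replace(User(last_name="Nakamura"), **overrides)

basic = minimal_user()                       # most tests start here
movable = minimal_user(address="12 Elm St")  # address-change tests need a starting address
```

Because every test begins from the same minimal record, any field a test adds is an explicit, visible statement of what that operation actually requires.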

And finally, designing and building out test data takes time and thought. By starting with the minimum set, your tests are up and running and finding issues as quickly as they possibly can be.