Test Heuristic: Managing Test Data

Managing test data is one of the most difficult parts of good testing practice, yet it receives surprisingly little attention from the testing community. Here are a few approaches that I find useful for both automated testing and exploratory testing.

Antipattern: Do Not Create Test Data In The UI

But first I want to point out what I think is a mistake that many testers make. When you are ready to test a new feature, run a regression test, or demonstrate some aspect of the system, the data you need to work with should already be in place. Having to create test data from scratch, by hand in the UI, before meaningful testing can begin is an antipattern (and a waste of time!), whether for automated testing or for exploratory testing.

Create Test Data With An API

This is usually my favorite approach to managing test data. In an automated test, I usually have a 'setup' step that uses whatever API or APIs exist to create the set of records that the test will use. After the test runs, a 'teardown' step deletes all of the data created in the setup step. This keeps my test environment clean and ready for use by any test in any order. I often mark the records I create this way with a particular string based on random numbers or a date-time value, so I always know for certain that I am working with the data I expect to find.
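As a rough illustration of that setup and teardown pattern, here is a minimal Python sketch. Everything in it is an assumption for the sake of the example: `FakeRecordsApi` is a hypothetical in-memory stand-in for whatever API your application really exposes, and `make_run_tag` is just one possible way to build a marker string from a date-time value plus random characters.

```python
import unittest
import uuid
from datetime import datetime, timezone


class FakeRecordsApi:
    """Hypothetical in-memory stand-in for the application's real API."""

    def __init__(self):
        self._records = {}
        self._next_id = 1

    def create(self, name):
        record_id = self._next_id
        self._next_id += 1
        self._records[record_id] = {"id": record_id, "name": name}
        return record_id

    def delete(self, record_id):
        self._records.pop(record_id, None)

    def search(self, text):
        return [r for r in self._records.values() if text in r["name"]]


# Plays the role of the shared test environment in this sketch.
API = FakeRecordsApi()


def make_run_tag():
    """Build a marker from a UTC timestamp plus a few random hex characters."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S")
    return f"testdata-{stamp}-{uuid.uuid4().hex[:8]}"


class CustomerSearchTest(unittest.TestCase):
    def setUp(self):
        # Create the records this test needs, each marked with the run tag.
        self.tag = make_run_tag()
        self.created_ids = [
            API.create(f"{self.tag}-customer-{i}") for i in range(3)
        ]

    def tearDown(self):
        # Delete exactly what setUp created, leaving the environment clean.
        for record_id in self.created_ids:
            API.delete(record_id)

    def test_search_finds_only_this_runs_records(self):
        found = API.search(self.tag)
        self.assertEqual(len(found), 3)
```

Because every record carries the run tag, the test can search for its own data without caring what else happens to be in the environment.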

I take this approach with exploratory testing as well. At times I will have a dedicated script that uses an API to load a set of data into a test environment, but more often I will simply comment out the teardown steps in whatever automated test is appropriate, then use that partial automated test to lay down a set of data for exploratory testing, without having to set it all up by hand. This is also a handy approach when designing automated tests.
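A variation on commenting out the teardown is to make it switchable. The sketch below is only one way to do that, under stated assumptions: `seeded_records` is a hypothetical helper, the `create` and `delete` callables stand in for your real API calls, and the `KEEP_TEST_DATA` environment variable is a name invented for this example.

```python
import os
from contextlib import contextmanager


@contextmanager
def seeded_records(create, delete, names, keep=None):
    """Create records, yield their ids, and delete them unless asked to keep them.

    `create` and `delete` are placeholders for real API calls.
    `KEEP_TEST_DATA` is a made-up variable name for this sketch.
    """
    if keep is None:
        keep = os.environ.get("KEEP_TEST_DATA") == "1"
    ids = [create(name) for name in names]
    try:
        yield ids
    finally:
        if not keep:
            # Normal automated run: clean up everything we created.
            for record_id in ids:
                delete(record_id)
        # Otherwise the data stays in place for exploratory testing.
```

Running the same code with `keep=True`, or with `KEEP_TEST_DATA=1` set in the environment, leaves the data behind for a hands-on session.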

There can be drawbacks to using an API to set up and tear down data. In some cases the API can be quite slow. In other cases the nature of the data relationships among records can be unmanageably complex. Sometimes a different approach to managing test data is warranted.

Existing Data Store

It may be possible to save a 'golden image' of a set of useful test data so that it can be deployed at will to a test environment. You could deploy the whole set of expected test data to a fresh test environment, or you could replace data that you have altered until it is no longer useful with a fresh copy of the pristine data that you need to work with.

I once worked with an application that would happily address a number of different data stores, including SQLite for development purposes. This made swapping test data sets in and out quite simple.
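With a file-based store such as SQLite, restoring a golden image can be as simple as copying a file. The following is a minimal sketch, not the setup of the application mentioned above: the `customers` table, the sample rows, and the function names are all invented for illustration, and it assumes no connection to the working database is open while the copy runs.

```python
import shutil
import sqlite3


def build_golden_image(path):
    """Create a small pristine database file (hypothetical schema and rows)."""
    conn = sqlite3.connect(path)
    try:
        with conn:  # commits on success
            conn.execute(
                "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)"
            )
            conn.executemany(
                "INSERT INTO customers (name) VALUES (?)",
                [("Alice",), ("Bob",)],
            )
    finally:
        conn.close()


def restore_golden_image(golden_path, working_path):
    """Overwrite the working database with the pristine copy."""
    shutil.copyfile(golden_path, working_path)


def customer_names(path):
    """Read back the customer names, ordered by id."""
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute("SELECT name FROM customers ORDER BY id")
        return [row[0] for row in rows]
    finally:
        conn.close()
```

After a testing session has altered the working file, calling `restore_golden_image` again puts the pristine data back.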

Extract From Production

While this is a practice I have seen almost exclusively in the mainframe world, it is an approach that may be appropriate in some situations. You may be able to create some sort of Extract, Transform, Load (ETL) operation that pulls actual production data from a production instance into your test environment.
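To make the shape of such an operation concrete, here is a small sketch using SQLite connections as stand-ins for the production and test data stores. The `customers` table, its columns, and the row limit are assumptions for this example. The transform step shown here, replacing real email addresses with placeholders, is only one illustration of what a transform might do.

```python
import sqlite3


def extract(prod_conn, limit):
    """Pull a bounded sample of rows from the (stand-in) production store."""
    cursor = prod_conn.execute(
        "SELECT id, name, email FROM customers ORDER BY id LIMIT ?", (limit,)
    )
    return cursor.fetchall()


def transform(rows):
    """Illustrative transform: swap real emails for placeholder addresses."""
    return [(rid, name, f"user{rid}@example.test") for rid, name, _email in rows]


def load(test_conn, rows):
    """Write the transformed rows into the test store."""
    with test_conn:  # commits on success
        test_conn.execute(
            "CREATE TABLE IF NOT EXISTS customers "
            "(id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
        )
        test_conn.executemany(
            "INSERT OR REPLACE INTO customers VALUES (?, ?, ?)", rows
        )


def run_etl(prod_conn, test_conn, limit=1000):
    """Run extract, transform, and load in order; return the row count."""
    rows = transform(extract(prod_conn, limit))
    load(test_conn, rows)
    return len(rows)
```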