Tuesday, November 30, 2010

UI test smells: if() and for() and files

I read with interest Matt Archer's blog post entitled "How test automation with Selenium or Watir can fail"

He shows a couple of examples that, while perfectly valid, are poor sorts of tests to execute at the UI level.  Conveniently, Mr. Archer's tests are so well documented that it is possible to point to exactly to where the smells come in. 

The test in the first example describes a UI element that has only two possible states: either "two-man lift" or "one-man lift", depending on the weight of an item.  In a well-designed test for a well-designed UI, it should be possible for the test to validate that only those two states are available to the user of the UI, and then stop.  

But Mr. Archer's test actually reaches out to the system under test in order to open a file whose contents may be variable or even arbitrary, iterates over the contents of the file, and attempts to check that the proper value is displayed in the UI based on the contents of the file.  Mr. Archer himself notes a number of problems with this approach, but he fails to note the biggest problem:  this test is expensive.  The contents of the file could be large, the contents of the file could be corrupt, and since each entry generates only one of two states in the UI, all but two checks made by this test will be unnecessary. 

Mr. Archer goes on to note a number of ways that this test is fragile.  I suggest that the cases involving bad data in the source file are excellent charters for exploratory testing, but a poor idea for an automated test.  An automated UI test that simply checks that the two states are available without reference to any outside test data is perfectly adequate for this situation. 

In his second example, Mr. Archer once again iterates over a large number of source records in order to achieve very little.  Again, exercising the same UI elements 100 times using different source data is wasteful, since all the UI test should be doing is checking that the UI functions correctly.  However, there is an interesting twist in Mr. Archer's second example that he fails to notice.   If Mr. Archer were to randomly pull a single source record from his list of 100 records for each run of the test, he would get the coverage that he seems to desire for his 100 records over the course of many hundreds of individual runs of the test.  I took a similar approach in a test harness I once built for an API test framework, and I described that work in the June 2006 issue of Better Software magazine, in a piece called "Old School Meets New Wave". 

Both of Mr. Archer's examples violate two well-known UI test design principles.  The first principle is called the "Testing Pyramid for Web Apps".   As far as I can tell, this pyramid was invented simultaneously and independently by both Jason Huggins (inventor of Selenium) and by Mike Cohn.  (Jason's pyramid image is around the web, but I nabbed it from here.)




Any test that reaches out to the file system or to a database is going to belong to the middle tier of business-logic functional tests.  And even then, most practitioners would probably use a mock rather than an actual file, depending on context.   While it is not always possible for UI tests to avoid the business-logic tier completely, it should be the case that UI tests are in fact focused on testing *only* the UI.  Loops and conditionals in UI tests are an indication that something is being tested that is not just part of the UI.  Business logic tests should to the greatest extent possible be executed "behind the GUI".  From years of experience I can say with authority that UI tests that exercise business logic become both expensive to run and expensive to maintain, if they are maintainable at all.

The other principle violated by these examples is that highest-level tests should never have loops or conditionals.  The well-known test harness Fitnesse does not allow loops or conditionals in its own UI.  Whatever loops or conditionals may be required by the tests represented in the Fitnesse UI must be coded as part of the fixture for each test.  For a detailed discussion of this design pattern, see this post by Gojko Adzic: "How to implement loops in Fitnesse test fixtures

4 comments:

Gojko Adzic said...

I have a different take on the whole thing of UI test design now that, for me, makes such errors obvious. I call it the principle of symmetric change

Jared said...

I don't think we can say the pyramid was 'invented', perhaps observed and described. Craig Larman spoke to us regarding a similar test automation strategy around 2003, and it's been a fairly standard idea (if not practice) in the teams I've worked in since 2004.

I'll try and dig up some other references (if only because someone sent me a link to http://blogs.agilefaqs.com/2011/02/01/inverting-the-testing-pyramid/# and I was annoyed by their lack of attribution).

TestWithUs said...
This comment has been removed by a blog administrator.
The Geeks said...
This comment has been removed by a blog administrator.