Tuesday, November 30, 2010

UI test smells: if() and for() and files

I read with interest Matt Archer's blog post entitled "How test automation with Selenium or Watir can fail"

He shows a couple of examples that, while perfectly valid, are poor sorts of tests to execute at the UI level.  Conveniently, Mr. Archer's tests are so well documented that it is possible to point exactly to where the smells come in.

The test in the first example describes a UI element that has only two possible states: either "two-man lift" or "one-man lift", depending on the weight of an item.  In a well-designed test for a well-designed UI, it should be possible for the test to validate that only those two states are available to the user of the UI, and then stop.  

But Mr. Archer's test actually reaches out to the system under test in order to open a file whose contents may be variable or even arbitrary, iterates over the contents of the file, and attempts to check that the proper value is displayed in the UI based on the contents of the file.  Mr. Archer himself notes a number of problems with this approach, but he fails to note the biggest problem:  this test is expensive.  The contents of the file could be large, the contents of the file could be corrupt, and since each entry generates only one of two states in the UI, all but two checks made by this test will be unnecessary. 

Mr. Archer goes on to note a number of ways that this test is fragile.  I suggest that the cases involving bad data in the source file are excellent charters for exploratory testing, but a poor idea for an automated test.  An automated UI test that simply checks that the two states are available without reference to any outside test data is perfectly adequate for this situation. 
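
A minimal sketch in Python of what such a test might look like. The FakeItemPage class, its method names, and the 25 kg threshold are all hypothetical stand-ins, not anything from Mr. Archer's post; a real test would drive a browser through a page object with the same shape.

```python
class FakeItemPage:
    """Hypothetical stand-in for a browser-driven page object."""

    def __init__(self, threshold_kg=25):
        # The threshold is an assumption made up for this illustration.
        self._threshold_kg = threshold_kg
        self._label = None

    def show_item(self, weight_kg):
        # A real page object would navigate the browser to an item here.
        if weight_kg >= self._threshold_kg:
            self._label = "two-man lift"
        else:
            self._label = "one-man lift"

    def lift_label(self):
        return self._label


def check_both_lift_states(page):
    """Two straight-line checks: no file, no loop, no conditional."""
    page.show_item(weight_kg=1)
    light = page.lift_label()
    page.show_item(weight_kg=1000)
    heavy = page.lift_label()
    assert light == "one-man lift"
    assert heavy == "two-man lift"
    return {light, heavy}
```

The test uses one obviously light item and one obviously heavy item, and it stops once both states have been seen.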

In his second example, Mr. Archer once again iterates over a large number of source records in order to achieve very little.  Again, exercising the same UI elements 100 times using different source data is wasteful, since all the UI test should be doing is checking that the UI functions correctly.  However, there is an interesting twist in Mr. Archer's second example that he fails to notice.   If Mr. Archer were to randomly pull a single source record from his list of 100 records for each run of the test, he would get the coverage that he seems to desire for his 100 records over the course of many hundreds of individual runs of the test.  I took a similar approach in a test harness I once built for an API test framework, and I described that work in the June 2006 issue of Better Software magazine, in a piece called "Old School Meets New Wave". 
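
The random-record idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the harness from the magazine piece: the function names and the use of a per-run seed are hypothetical. Passing the seed in, and logging it with each run, means a failing run can be replayed against the same record.

```python
import random


def pick_record(records, seed):
    """Return one record for this run; the same seed always picks the same record."""
    return random.Random(seed).choice(records)


def coverage_after(records, runs, first_seed=0):
    """Return the set of distinct records exercised over a number of runs.

    Records must be hashable here (for example, record ids).
    """
    return {pick_record(records, first_seed + i) for i in range(runs)}
```

With uniform random picks from 100 records, the expected number of runs needed to touch every record at least once is roughly 500, which fits the "many hundreds of runs" estimate, while each individual run exercises the UI only once.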

Both of Mr. Archer's examples violate two well-known UI test design principles.  The first principle is called the "Testing Pyramid for Web Apps".   As far as I can tell, this pyramid was invented simultaneously and independently by both Jason Huggins (inventor of Selenium) and by Mike Cohn.  (Jason's pyramid image is around the web, but I nabbed it from here.)




Any test that reaches out to the file system or to a database is going to belong to the middle tier of business-logic functional tests.  And even then, most practitioners would probably use a mock rather than an actual file, depending on context.   While it is not always possible for UI tests to avoid the business-logic tier completely, it should be the case that UI tests are in fact focused on testing *only* the UI.  Loops and conditionals in UI tests are an indication that something is being tested that is not just part of the UI.  Business logic tests should to the greatest extent possible be executed "behind the GUI".  From years of experience I can say with authority that UI tests that exercise business logic become both expensive to run and expensive to maintain, if they are maintainable at all.

The other principle violated by these examples is that the highest-level tests should never have loops or conditionals.  The well-known test harness Fitnesse does not allow loops or conditionals in its own UI.  Whatever loops or conditionals may be required by the tests represented in the Fitnesse UI must be coded as part of the fixture for each test.  For a detailed discussion of this design pattern, see this post by Gojko Adzic: "How to implement loops in Fitnesse test fixtures".
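
As a rough illustration of that separation, here is a sketch in Python rather than in Fitnesse itself. The table, the run_table fixture, the lift_label function, and the 25 kg threshold are all hypothetical. The top-level test is a flat table with no control flow; the loop and the conditional live only in the fixture, and the business rule is exercised behind the GUI.

```python
# Hypothetical business rule; the 25 kg threshold is an assumption for illustration.
TWO_MAN_THRESHOLD_KG = 25


def lift_label(weight_kg):
    """Business logic under test, called directly rather than through the UI."""
    if weight_kg >= TWO_MAN_THRESHOLD_KG:
        return "two-man lift"
    return "one-man lift"


# The top-level test: a declarative table with no loops or conditionals.
LIFT_TABLE = [
    # (weight_kg, expected label)
    (1, "one-man lift"),
    (24.9, "one-man lift"),
    (25, "two-man lift"),
    (80, "two-man lift"),
]


def run_table(table, function_under_test):
    """Fixture: the only place where a loop or conditional appears."""
    failures = []
    for argument, expected in table:
        actual = function_under_test(argument)
        if actual != expected:
            failures.append((argument, expected, actual))
    return failures
```

An empty list from run_table(LIFT_TABLE, lift_label) means every row passed; adding a case means adding a row, not writing more control flow.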

Sunday, November 21, 2010

close reading/critical thinking

The last Weekend Testers (Australia/New Zealand) was brilliant. Let me urge you to read Marlena Compton's report and the transcript of the session.

This sort of practical implementation of critical theory is long overdue in the testing community, and the WTANZ crew did a great job of using a well-known theoretical tool to analyze and dissect some real problems in some real work.

Compare what WTANZ did with Zeger Van Hese's recent demonstration of deconstruction.

This sort of work, bringing reputable and sophisticated critical theory to bear on actual testing and QA activity, is a field wide open, barely explored, and long overdue. 

May we see much more of it soon.

Tuesday, November 16, 2010

more on certs, more numbers

I noticed (thanks Twitterverse) that there was an interview with Rex Black over on the UTest blog.  In that interview he reveals a very interesting number:

"...the ISTQB has issued over 160,000 certifications in the last ten years."

Using the numbers from my previous post:  if we assume that there are about 3,000,000 software testers in the world right now, and if all 160,000 certifications belonged to those testers, that would mean about 5 certifications for every 100 software testers.

I would be willing to bet that there were about the same number of testers ten years ago:  Y2K was just over and the value of dedicated testers had been shown.   But as Alan Page and others have noted, there is a lot of turnover, a lot of churn, among those practicing software testing. 

My numbers start to get a little sketchy here, and I don't have anything to back them, so consider this a thought experiment:  as noted above, let's say that there were about 3 million testers a decade ago and there are still 3 million testers today.  Let's say half of today's testers have started since 2000.   This gives us a field of 4.5 million testers who could have acquired a certification in the last decade.  This makes for about 3.5 certifications for every 100 testers who could have earned one.

I think it is an excellent bet that a significant fraction of those 160,000 certifications were issued in the UK, Australia, and New Zealand.   Just to make it even, call it about 1/3, put 60,000 certs in those regions, leaving 100,000 for the rest of the world.  That brings us down to about 2 certs per 100 testers.  
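
For anyone who wants to check the arithmetic, here it is spelled out in Python. Every input is one of the guesses from this thought experiment, not measured data, and the variable names are just labels for those guesses.

```python
def certs_per_100(certifications, testers):
    """Certifications per 100 testers."""
    return 100 * certifications / testers


CERTS = 160_000                        # ISTQB figure quoted in the interview
TESTERS_NOW = 3_000_000                # assumed number of testers today
NEW_SINCE_2000 = TESTERS_NOW // 2      # assumed: half of today's testers are new
POOL = TESTERS_NOW + NEW_SINCE_2000    # 4,500,000 testers over the decade
CERTS_UK_AUS_NZ = 60_000               # assumed share issued in those regions

all_at_once = round(certs_per_100(CERTS, TESTERS_NOW), 1)            # 5.3
over_decade = round(certs_per_100(CERTS, POOL), 1)                   # 3.6
rest_of_world = round(certs_per_100(CERTS - CERTS_UK_AUS_NZ, POOL), 1)  # 2.2
```

Note that the last figure still divides by all 4.5 million testers, so it is, if anything, a low estimate for the rest of the world.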

But that still seems high to me.  I might have missed something.  Regardless, it still looks like a pretty small market, and I'd bet the market has been shrinking a lot with the rise of agile adoption and the economic downturn.  

Thursday, November 11, 2010

an object of interest

I bought this recently at a guerilla art show:



Here it is hanging in my office:


The poster caught my eye because I've loved the Alice books all my life and I re-read them often. I am especially fond of the Tenniel illustrations, and the one for Jabberwocky is a favorite.

The poster also caught my eye because of the odd and interesting typeface. The story behind that typeface is fascinating. I asked the artist to send me that story in email so that I could have it written down:

The story is this:
Just south of where I grew up (near Green Bay, WI) is the Hamilton Wood Type Museum. A while back, I visited armed with a few sheets of printmaking paper with the goal of printing some or all of the Jabberwocky poem from some original wood type. During the course of the 19th and 20th century Hamilton made wood type for advertisements and headlines and circuses and had gone on to accumulate wood type from other manufacturers who had given in to market pressures or the eventual obsolescence of the letterpress industry. What remained when I visited were cases and cases of cabinets full of uncategorized type.. roughly 1000 different type faces and sizes. I spent a better part of a day just finding a type face I liked, using a rather capricious method to determine "the one": the style of the lower case 'g'. I estimate the type to have been produced in the early 20th century, probably for about 20 years, if that. It is an obscure, unnamed type face. I set the type but realized that by choosing according to lower case 'g', I had picked a case that only let me set 3 lines of text. This was all that was left of this type in existence. So, I printed the top three lines first on a large flat bed cylinder press called a Vandercook 325G (incidentally I have the exact same model press in my shop here), disassembled the text and composed the 4th line. When I returned to Colorado, I replicated the illustration from Through the Looking Glass and then added that to the print.
That's the story.
Enjoy
Dan


Zeger Van Hese is a Belgian software tester who, like me, is interested in critical theory and what application critical theory might have to the work of creating software. The other day he mentioned in passing a seminal work by Walter Benjamin, The Work of Art in the Age of Mechanical Reproduction, that I had not read in many years.

In the light of Benjamin's work, my poster is a strange object indeed. While it was created in a process of mechanical reproduction, it was created only once. The means to create it are lost in an anonymous bin in an obscure warehouse somewhere in Wisconsin. And even if someone were dedicated enough to find that one particular bin, there is not enough of this particular wood type left to print all four lines from Jabberwocky.

My poster would have been a strange item even for 1936, when Benjamin wrote about mechanical reproduction. But to have such a thing on my wall in 2010 is, for me, astonishing.

Saturday, November 06, 2010

XP, Scrum, Lean, and QA

Before I do this, two things up front: for one thing, I am a crappy programmer. I read code far better than I write it, and I read non-OO code better than I read OO code. Also, I am writing as someone who knows a lot about Quality Assurance and testing, and very little about the hands-on day-to-day work of modern programming. So here goes:

As a QA guy, I know this: long before Scrum and XP and the Agile Manifesto, people working in Computer Science and software development knew three things about software quality: having an institutional process for code review always improves software quality; having automated unit tests (for some value of "unit") with code coverage improves software quality; and having frequent, fast builds in a useful feedback loop improves software quality. Sorry, I don't have references handy; I read most of this stuff about a decade ago. Maybe some heavy CS people could leave references in the comments.

The XP practices simply institutionalize those three well-known practices, and for the time, dialed them up to 11. Pair programming is just a very efficient way to accomplish code review. TDD is just a very efficient way to accomplish unit test coverage. CI is just a very efficient way to get a fast feedback loop for the build.

There is nothing magical about these practices, and I have worked on agile teams that don't do pair programming but do awesome code review. I have worked on agile teams whose microtest suite runs heavily to integration tests instead of traditional unit tests. I have worked on agile teams with a dedicated build guy. I started my career working in an incredibly well-designed COBOL code base. No objects in sight. Had I known then what I know now about test automation, I could have written an awesome unit/integration test framework for that system. The XP practices themselves are not sacred. The principles behind those practices are.

But the XP practices themselves are just a small piece of having a successful agile team. In musical terms, these are the equivalent of knowing scales and chords, just basic technical practices necessary to get along in the business. Of course they are not necessary: The Ramones and Tom Petty have only a basic grasp of the technical aspects of music, but they cranked out some monster hits. Put any of those guys in a jazz jam session or a symphony orchestra and they would be completely lost. There is some nasty software out in the world that makes a lot of money.

I like Scrum, for a number of reasons. For one thing, it has an aesthetic appeal. The concepts of developing, then releasing, then holding a retrospective speak to me strongly, not least because they map closely to the ideas from the performing arts of practice, perform, rehearse.

I also like Scrum because of its emphasis on human interaction rather than institutional process. Scrum is lightweight by design, and leaves much room for people to act as people with other people. Scrum favors mature, intelligent adults.

Finally, I like Scrum because it is a process derived directly from the actual practice of creating software. It is described in plain English and it relies on no special concepts. It was crafted out of whole cloth by good developers in a tough spot.

I dislike Lean/kanban for the same reasons. As a mature adult, I resent having any of my activities identified as "waste". I resent not having the end of an iteration to celebrate. I resent being treated as a station in a production chain.

Unlike Scrum, the Lean principles were not derived from the actual work of software development. They came from automobile manufacturing, and were overlaid on software development in what I consider to be a poor fit. Putting on my QA hat again, there are two other popular software development methodologies that came directly from manufacturing, and the state of those methodologies is instructive. One of them is ISO9000. The fatal flaw of ISO9000 is that once a process is in place, it becomes very difficult and expensive to change that process. This is fine in manufacturing, but it is death to a reasonable software development process. The other methodology from manufacturing is Six Sigma. Six Sigma is very expensive, and while it might yield information valuable to managers, it provides no benefit to those actually doing the day-to-day work of software development. I am not aware of any manufacturing processes shown conclusively to improve the hands-on work of software development.

XP and Scrum are not nearly enough to guarantee a successful software project. For a comparable situation, just because a band has a rehearsal schedule and some gigs does not guarantee that they will be international superstars. Brian Marick at one point talked a lot about four principles that also increase the chance of a successful software project: skill and discipline, ease and joy. I won't explain those here; interested readers can find that work themselves.

But beyond even skill, discipline, ease and joy, a successful software project requires that we as creators of the software reach out and interact with the world in a way that changes the lives of those who encounter our software. It is an awesome power. In some cases, we can make our users' lives a living hell. But it's a lot more fun to make everyone happy.