Sunday, February 26, 2012

deja vu: code, culture, and QA



Some years ago I had the privilege of making some suggestions for Brian Marick's book Everyday Scripting based on the first article I ever wrote for Better Software magazine.  That article appeared in 2004, and I just recently ran into a similar situation at work. 

Wikipedia is localized for well over 100 languages.  I had only been working at Wikimedia Foundation a couple of weeks when I heard that discrepancies between the localized message files from version to version could cause problems when upgrading.  I didn't know what kind of problems, but since we're upgrading all the Wikipedia wikis to version 1.19, that sounded like sort of a big deal, so I followed up.

It turns out that changes to the localization files are essentially undocumented, no tools exist to monitor such changes, and we simply did not know anything about discrepancies in those files.  So I decided it would be useful to look into that.

You can find the Wikipedia localization files for version 1.19 here  and for version 1.18 here if you want to follow along.  

Since there are well over 100 files in each directory and each file has thousands of lines, checking for discrepancies manually is impossible. From one of the senior people on the Wikimedia dev staff I got a few examples of places in these files where discrepancies would cause big problems. (See technical note at the end.) I've cleaned the code up quite a bit since then (one-off scripts don't have to be DRY, right?), but here's what I did to find discrepancies for one of those examples:

In a directory called 'mediawiki' I have one directory 'lang118' and another 'lang119'.  In those directories are all of the Messages*.php files for each version.  What I want to do is read each file in each version, identify the contents of the $namespaceNames array, and compare those contents for every file in each directory. 

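# Run from the directory above 'mediawiki'.
path119 = 'mediawiki/lang119/'
path118 = 'mediawiki/lang118/'

# Each entry will be a file name followed by that file's $namespaceNames text.
r119namespaceNames_array = []
r118namespaceNames_array = []

def get_values(path, array_name)
  Dir.foreach(path) do |name|
    # Skip '.', '..' and any subdirectories; only read files.
    unless File.directory?("#{path}#{name}")
      text = File.read("#{path}#{name}")
      # scan leaves its last match in $~, which is stored as a string below.
      text.scan(/namespaceNames.+?\)/m)
      array_name << name + $~.to_s
    end
  end
end

get_values(path119, r119namespaceNames_array)
get_values(path118, r118namespaceNames_array)

# 1.19 entries with no identical 1.18 entry: changed text or a file new in 1.19.
mismatch = r119namespaceNames_array - r118namespaceNames_array
disc = mismatch.length.to_s
puts "number of files with discrepancies in $namespaceNames array is #{disc}"

# Print each file name without the ".php" extension and the array text after it.
mismatch.each do |string|
  file = string.split(".php")
  puts file[0]
end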
 
 
This script runs from the directory above 'mediawiki'. It defines the paths to where the localization files live, and defines two arrays to hold the values to be compared. For each directory it calls the 'get_values' method, which puts the name of each file plus the contents of that file's $namespaceNames array into the appropriate array. Subtracting the 1.18 array from the 1.19 array leaves every 1.19 entry that has no identical counterpart in 1.18, and with that the script knows how many files have mismatches, and what the names of those files are.
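To make the subtraction concrete, here is a toy version with made-up file names and array text (not real MediaWiki data), using the same "file name + matched text" shape that get_values builds. Array#- keeps each element of the left-hand array that has no exact match on the right, so a file whose text changed and a file that exists only in 1.19 both survive; a file that exists only in 1.18 shows up only if you subtract in the other direction.

```ruby
# Made-up entries in the "file name + matched text" shape that get_values builds.
r119 = [
  "MessagesAa.phpnamespaceNames = array( NS_MEDIA => 'Media' )",
  "MessagesBb.phpnamespaceNames = array( NS_MEDIA => 'Medya' )", # text changed
  "MessagesCc.phpnamespaceNames = array( NS_MEDIA => 'Media' )"  # new in 1.19
]
r118 = [
  "MessagesAa.phpnamespaceNames = array( NS_MEDIA => 'Media' )",
  "MessagesBb.phpnamespaceNames = array( NS_MEDIA => 'Media' )",
  "MessagesDd.phpnamespaceNames = array( NS_MEDIA => 'Media' )"  # gone in 1.19
]

# Keep only the part before ".php", as the script does when printing.
def file_names(entries)
  entries.map { |entry| entry.split(".php").first }
end

changed_or_new  = file_names(r119 - r118)
changed_or_gone = file_names(r118 - r119)
p changed_or_new   # => ["MessagesBb", "MessagesCc"]
p changed_or_gone  # => ["MessagesBb", "MessagesDd"]
```

Running both subtractions is a cheap way to catch files that were dropped between versions as well as files that were added or changed.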

Reading this script should be fairly straightforward for anyone who knows a little bit of Ruby.  Note a few things, though: 

* 'unless' is equivalent to 'if not', and the script needs to not check directories, only files
* File.read is the same as Perl's "slurp", it puts the entire contents of the file into the variable 'text'
* the 'scan' method takes a regular expression for an argument. Here the regular expression is saying "give me all the text that begins with the string 'namespaceNames' and ends with the string ')'". I had forgotten that '.+' is 'greedy' and will match past the terminating string, so using '.+?' prevents that; thanks to Charley Baker for the reminder. The 'm' at the end of the regex lets '.' match newlines as well, which is necessary because each value of the $namespaceNames array is on its own line and I want to match all of them in one fell swoop. After 'scan' runs, Ruby's special variable $~ holds the last match, and '$~.to_s' turns it into the string that gets stored.
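A self-contained way to see the greedy/non-greedy difference and what the 'm' flag does, using a made-up fragment shaped like part of a Messages*.php file. For brevity this sketch uses String#[] with a regex, which returns the first match, rather than scan and $~; the regular expressions themselves are the ones from the script.

```ruby
# A made-up fragment shaped like part of a Messages*.php file.
php = <<~'PHP'
  $namespaceNames = array(
    NS_MEDIA   => 'Media',
    NS_SPECIAL => 'Special',
  );

  $linkTrail = '/^([a-z]+)(.*)$/sD';
PHP

greedy = php[/namespaceNames.+\)/m]   # '.+' runs on to the LAST ')' in the text
lazy   = php[/namespaceNames.+?\)/m]  # '.+?' stops at the FIRST ')' it reaches
no_m   = php[/namespaceNames.+?\)/]   # without 'm', '.' cannot cross a newline

puts lazy                             # 'namespaceNames' through the array's closing ')'
p greedy.include?("linkTrail")        # => true
p no_m                                # => nil
```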

The output from this script looks like this:


number of files with discrepancies in $namespaceNames array is 16
MessagesEn_ca
MessagesEn_rtl
MessagesFrp
MessagesIg
MessagesMk
MessagesMzn
MessagesNb
MessagesNds_nl
MessagesNo
MessagesOr
MessagesOs
MessagesQug
MessagesSa
MessagesSr_ec
MessagesWar
MessagesYue
 
At this point it made sense to just look at the problem files with my eyeballs and see what was in their $namespaceNames arrays.  With a little help from diff(), that's what I did.  I reported the discrepancies I found on a public mail list for Wikimedia tech issues.
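diff is the right tool for eyeballing the files, but the same idea can be sketched in a few lines of Ruby once the two $namespaceNames blocks have been extracted. This sketch is not diff: it ignores line order and only reports lines that appear in one version and not the other. The method name line_changes and the sample values are made up for illustration.

```ruby
# Report lines that appear in only one version of an extracted block.
# Unlike diff, this ignores line order; leading/trailing whitespace is trimmed.
def line_changes(old_text, new_text)
  old_lines = old_text.lines.map(&:strip).reject(&:empty?)
  new_lines = new_text.lines.map(&:strip).reject(&:empty?)
  { removed: old_lines - new_lines, added: new_lines - old_lines }
end

# Made-up 1.18 and 1.19 values for one file, for illustration only.
old_block = <<~'PHP'
  NS_MEDIA => 'Media',
  NS_TALK  => 'Talk',
PHP
new_block = <<~'PHP'
  NS_MEDIA => 'Media',
  NS_TALK  => 'Discussion',
PHP

changes = line_changes(old_block, new_block)
changes[:removed].each { |line| puts "- #{line}" }
changes[:added].each   { |line| puts "+ #{line}" }
```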

A couple of interesting things happened because of that.  Again, keep in mind that I am a total n00b with these systems.  While I have a little more information now, I had no idea of what the consequences of such discrepancies would be.

I got an answer on the mail list from a senior Wikimedia dev person who analyzed the discrepancies I reported and said in effect "everything's fine, we are good to upgrade based on these examples". And while there are several other areas in these localization files that could cause issues, my example suggests that the technical risk of upgrading to 1.19 is low.

But then some days later in a conversation on IRC, a different senior Wikimedia dev person said in effect "whoa, whoa, whoa, if we release these changes without at least some review from the language communities affected, we are going to be in for big trouble".

As I write this I do not know if the localization files for Wikipedia will be upgraded next week or not; that decision is not in my hands. However, I am immensely pleased that as a total n00b I was able to provide concrete examples of the data in question to inform that decision.

I decided to write about this for a number of reasons:

To my mind, nothing in this story has anything to do with "testing".  For some time now I have been saying that "QA is not evil", and to me, this was an exercise in pure Software Quality Assurance.  Since my official title at Wikimedia is "QA Lead", this makes me happier than you would imagine.

One of the great neglected areas of software projects is the state of the actual data in applications, be it held in files or databases or whatever.  One of the most important skills QA/testing people can bring to bear on a software project is the ability to isolate critical chunks of data from enormous data stores.  That was true when I wrote "Is Your Haystack Missing a Needle" in 2004, it was true when Brian published "Everyday Scripting" in 2007 and it remains true today.  If as a QA/testing person you don't know how to read a bunch of files and do regular expressions (and for that matter do SQL queries too), you owe it to yourself and to your projects to learn. (Frankly, I hadn't done this kind of thing in a long, long time, and it felt great to get back on that horse.)

Finally, I wrote this because all of the data and all of the conversations we had were completely open and public. I could give you a link to the email thread where I published the detailed discrepancies and got the reply, and I could publish a link to the IRC log where people discussed the cultural risks of upgrading the localization files. The only reason I don't is that they're not germane to the story. I so enjoy working in an open culture.

Technical notes:

My original script checked for discrepancies among four arrays:  $namespaceNames, $namespaceAliases, $magicWords, and $specialPageAliases.  The $magicWords array was trickier, and I had to do this:

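# Fragment from inside the per-file loop; @@path118 and
# @@nsn118magicWords_array are defined elsewhere in the original script.
text = File.read("#{@@path118}#{name}")
text.scan(/magicWords.+?\);/m)
# Only record files where scan found a $magicWords array.
if $~.to_s.length > 0
  array = $~.to_s
  # Strip all whitespace so whitespace-only differences don't count.
  array_no_space = array.gsub(/\s+/, "")
  @@nsn118magicWords_array << name + array_no_space
end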

First, $magicWords is an array of arrays: each inner array ends with ')', so I match up to the terminating ');' of the outer array instead of just ')'. Second, some of the files don't contain the $magicWords array at all, hence the length check. Third, I found some random differences in whitespace between versions in many, many files, so I eliminated all the whitespace in the strings in question with 'array.gsub(/\s+/,"")'. The comparison only became valid once those things happened.
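The fragment above depends on variables defined elsewhere, so here is a standalone sketch of the same three ideas (stop at ');', skip files without the array, strip whitespace) that can be run on its own. The method name magic_words_entry and the sample PHP are made up, and it uses String#[] instead of scan and $~. One caveat of stripping all whitespace: a change that only adds or removes spaces inside a quoted value would also be hidden.

```ruby
# Build the comparison entry for one file's $magicWords array, or nil when
# the file has none. Same regex and whitespace stripping as the fragment.
def magic_words_entry(name, text)
  block = text[/magicWords.+?\);/m]   # stop at the outer array's ');'
  return nil if block.nil?
  name + block.gsub(/\s+/, "")        # drop all whitespace before comparing
end

# Made-up 1.18 and 1.19 versions that differ only in whitespace.
v118 = <<~'PHP'
  $magicWords = array(
    'redirect' => array( '0', '#REDIRECT' ),
    'notoc'    => array( '0', '__NOTOC__' ),
  );
PHP
v119 = <<~'PHP'
  $magicWords = array(
      'redirect' => array('0', '#REDIRECT'),
      'notoc' => array( '0', '__NOTOC__' ),
  );
PHP

p magic_words_entry("MessagesXx.php", v118) == magic_words_entry("MessagesXx.php", v119)  # => true
p magic_words_entry("MessagesYy.php", "no magic words in this file")                      # => nil
```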

Saturday, February 04, 2012

Who I Am and Where I Am, early 2012

I've been pretty quiet in recent times, but that's going to change somewhat in 2012, so I thought I'd write this(*) to catch up.

As of last week, I am the QA Lead for the Wikimedia Foundation. My job there will be to create, codify, and execute the software testing and quality assurance regimes for the software that powers Wikipedia and its associated properties.

I've worked at some other interesting places, among them Thoughtworks and Socialtext. I like open source and wikis. I have been a dedicated telecommuter/remote worker since 2006. Depending on when you read this, I'm in either Santa Fe NM or Durango CO, or somewhere else.

I have written about software a lot. Most of my writing in recent times has been for SearchSoftwareQuality.com (warning: registration wall), but I've also written a lot for StickyMinds.com and a couple of articles for PragPub. I wrote a chapter for Beautiful Testing.

I created the Writing About Testing peer conference and associated mail list and wiki. I like to think WAT has had some influence on the testing/QA field over the last couple of years. I've given presentations at a couple of Agile conferences, once at PNSQC, a couple of AWTAs, and some smaller peer conferences from time to time. I attended two GTACs.

I've been using browser automation tools (Selenium and Watir) since they existed. I know a lot about their history and something about how to use them well. As a programmer I am slow and simple. What programming I do is usually in Ruby.

I play fretless electric bass guitar with a couple of jazz bands, but outside of the WAT conference I'm better known for playing irreverent tunes on a cheap green ukulele at software conferences. I'm a pretty good musician.

I'm @chris_mcmahon on Twitter, christopher.mcmahon at gmail and gplus. I ignore LinkedIn and I don't have a Facebook page, don't try to reach me there.

* props to Warren Ellis, from whom I stole the idea of this post.

Sunday, December 04, 2011

Just Fix It

On the writing-about-testing mail list recently was a discussion of defect tracking.  Given a good enough code base and a mature dev team, I think defect tracking is mostly unnecessary, and it's worth talking about why that is. 

Some time ago there was a popular meme in the agile testing community that goes "Just Fix It", but I haven't heard it mentioned in some time, and I think it's worth reviving the discussion. The idea behind Just Fix It is to bypass the overhead of creating a defect report, sending that report through some sort of triage process, and only then addressing the problem the report describes. You save a lot of time and overhead if you Just Fix It.

For some time now I have specialized in testing at the UI, a combination of functional testing and UX work.  In my experience, in a good code base, important defects found at the UI level are almost always what I think of as "last mile" issues, where the underlying code has changed in some way but the hook for that code into the UI has been mangled or overlooked.  These are cases where unit tests are almost certainly passing, but the app is broken anyway.  Some examples:

  •  Explanatory text has disappeared or no longer describes function accurately.
  •  A widget that used to function no longer does.  For example, a Submit button no longer makes anything happen.
  •  A call to some underlying function is no longer correct.  For example, a Search function that used to return results no longer does.

While a Just Fix It culture is not necessarily agile, examples of Just Fix It are easier to describe in a typical agile situation.

Small Team

Many agile teams share a single space, making communication easy and instantaneous. In such a situation, a conversation like this might happen:

Tester: "Hey, the froobnozzle stopped froobing, anyone know anything about that?"
Dev: "Wow, I didn't realize my last commit would break the froob function, I'll Just Fix It."

This is a canonical example of what Lisa Crispin calls the "whole team approach", where testers, devs, and everyone else are working on the same stories at the same time in the same place.

And if it's appropriate, there's no real reason a tester couldn't Just Fix It themselves.

Large Team

But some teams are too large for a conversation like this to be practical.  Assume a collocated team with a really big story board with dozens of story cards all being moved around a lot.  Say a tester finds an issue with the froobnozzle. 

Tester: grabs a red sticky note and writes a brief description of the froobnozzle problem. Puts the sticky note on the froobnozzle story card.
...minutes later...
Dev: whoa, a red card on my story, better Just Fix It. 

Distributed Team

Distributed teams tend to have really sophisticated issue-tracking systems in place, where stories are represented in software of some sort, where they can be assigned, have their status changed, etc. etc.  If a distributed team is small enough, a tester will know that Joe is working on the froobnozzle story, so:

Tester to Joe on IM: "hi Joe, I think you might have just broken the froobnozzle."
Joe the Dev:  "whoa, good catch, I'll Just Fix It."

Large Distributed Team

On a large distributed team, identifying who might be in a position to Just Fix It can be complicated.  One strategy is to read the commit logs upon identifying a defect to see who or what may have caused the problem.  Another strategy might be to review all the stories in play to discover who might be working on the froobnozzle this iteration. 

But sometimes these sorts of approaches are too complicated or take too much time.  One pattern I have seen on several occasions in large distributed teams is to designate a knowledgeable person on the dev staff, or possibly a Scrum Master type, to represent the whole dev team for questions about behavior or function.  I have seen this role called the Face, and the Ninja, and the Disturbed.  

Tester:  "hi Face, I just discovered that the froobnozzle got broken within the last day or so."
Face: "whoa, let me check that for you"
Face: "good catch, Joe broke that two commits ago, he's Just Fixing It"

Defect Found in Production

A customer probably reported it.  The fix is deployed to production within minutes or maybe hours of the report.  (Again, a good code base allows this.)

I worry that too often "root cause analysis" is a synonym for "blame". Defects in production are almost certainly a process problem, and the place to address process problems is in retrospectives or similar conversations.

Besides, if your team is releasing so many defects to production that you have to track them, you have bigger problems.

Won't Fix

True story:  just this week I was refactoring some Selenium tests and discovered a bug.  This was in a part of the application that is not exposed to customers, it is only for internal users employed by my company.  The bug was that attempting to enter a duplicate record causes an unhandled exception and the user is presented with an ugly stack trace.  This was an old bug, and was not part of the work of the current iteration.

I work on a large distributed team.  As I noted above, we have a sophisticated issue-tracking system in place.  All of the work we do is documented and tracked in this system.  We have no designated defect-tracking system, just a single monolithic sophisticated issue-tracking software application.

Upon finding the bug, I had a conversation with the dev who knows about that part of the code.  We agreed that this was a no-harm-no-foul situation, no data corrupted, minimal inconvenience to the user, no customer exposure.  We agreed that Just Fixing It right this minute wasn't very important.

So I created a new issue in the issue tracking system and assigned it to the dev who knows about that part of the code.  This issue has the same visibility and status as every other issue in the system.  My bug report issue will be treated the same as every other issue in the system, included in the backlog, and prioritized to be worked on with every other issue in the backlog. 

I don't even really think of that particular issue as a defect.  It's just a description of the state of a part of the application, some work that we might choose to do at some point.  I'm sure we'll Just Fix It pretty soon. 





Friday, September 30, 2011

a selenesse trick

Selenesse is the mashup between Fitnesse and Selenium I helped work on some time ago.  I keep encountering this pattern in my selenesse tests, so I thought I'd share it...

Every once in a while a test needs some sort of persistent quasi-random bit of data.  In the example below I'm adding a new unique "location" and then checking that my new location appears selected in a selectbox.   This is also a neat trick for testing search, or anywhere else you need to add a bit of data to the system and then retrieve that exact bit of data from somewhere else in the system. 

| note | eval javascript inline FTW! |
| type; | location_name | javascript{RN='TestLocation' + Math.floor(Math.random()*9999999999);while (String(RN).length < 14) { RN= RN+'0';}} |
| note | instantiate a variable and assign the new value to it in a single line FTW! |
| $LOCATION= | getValue | location_name |
| note | click a thingie to add the new data to the system |
| click | location_submit_button ||
| waitForTextNotPresent | Adding ||
| note | check that the newly added data appears as the selected entry in a selectbox FTW! |
| check | getSelectedLabel | location_site_id | $LOCATION |
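If you drive the browser from Ruby (Watir, or Selenium's Ruby bindings) rather than from a Selenesse table, the value generator in the inline JavaScript translates directly. This is a sketch, not part of Selenesse: unique_test_value is a hypothetical helper name, and the prefix, the random range, and the pad-to-14-characters loop simply mirror the JavaScript in the table above.

```ruby
# Mirrors the inline JavaScript: a fixed prefix, a random integer in
# 0..9_999_999_998, then trailing zeros until the string is 14 characters long.
def unique_test_value(prefix = "TestLocation", rng = Random.new)
  value = prefix + rng.rand(9_999_999_999).to_s
  value << "0" while value.length < 14
  value
end

location = unique_test_value
puts location  # type this into the form, then compare it with the selected label later
```

Keeping the value in an ordinary local variable plays the same role as $LOCATION in the table. Note that with the 12-character prefix 'TestLocation', the padding loop only changes anything when the random number is a single digit.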

Thursday, September 08, 2011

more UI test design (once more from Alan Page)

Before it gets lost in history, I want to riff off Alan Page once again; he made some excellent points. But as someone who has been designing and creating GUI (browser) tests for a long time, I'd like to address some of those points and also point out some bits of ROI that Alan missed.

Making UI tests not fragile is design work.  It requires an understanding of the architecture of the application.  In the case of a web app, the UI test designer is really required to understand things like how AJAX works, standards for identifying elements on pages, how the state of any particular page may or may not change, how user-visible messages are managed in the app, etc. etc.  Without this sort of deep understanding of software architecture, UI tests are bound to be fragile.

I've said before that UI tests should be a shallow layer that tests only presentation, and that rarely tests logic, math, or any underlying manipulation of data.  If tests are designed in this way, then they will be robust and maintainable over the life of the app.

UI test design is a skill.  Designing such tests is no harder or easier than any other activity that requires skill and understanding.  The tools with which I am familiar provide enough power to create reasonable, robust, maintainable tests.

Finally, I think Alan is missing one aspect of automated UI tests that I find the most valuable of all. 

From green-screen mainframe systems to bleeding-edge web applications, in my experience every software system suffers from one particular sort of error that is always extremely difficult to see when testing manually:  when something goes missing. 

A search that used to return results no longer does.  A widget that used to be on the page no longer is.  A bit of text critical to the user's work goes missing. 

Actions that cause errors are easy to find when testing manually, as are errors of presentation.  Elements and functions that used to exist but no longer do are difficult to find:  it is not easy for a human being to see the absence of a thing, but such errors stick out like sore thumbs in an automated UI test suite.

In my experience, this is one of the most valuable aspects of automated UI testing, and one of the best reasons to invest in UI test automation.  The absence of a thing in the UI is simply not detectable with unit tests or with integration tests.  That critical bit of function that doesn't manage to cross the last interface to the UI is only detectable at the UI itself, and automated UI tests are very, very good at detecting errors where something has gone missing. 
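The reason automation catches absence is that the test carries an explicit inventory of what should be there. That idea can be sketched without a browser at all. This sketch assumes the page HTML is already in hand as a string (a browser driver would normally supply it) and that the elements worth guarding have stable id attributes; the ids and page are made up, and a real suite would ask the driver whether each element exists rather than scraping HTML with a regular expression.

```ruby
# Inventory of elements the user needs on this (made-up) page.
EXPECTED_IDS = %w[search_box search_button results help_text].freeze

# Return every expected id that does not appear as an id="..." attribute.
def missing_elements(html, expected = EXPECTED_IDS)
  present = html.scan(/\bid="([^"]+)"/).flatten
  expected - present
end

page = <<~HTML
  <form>
    <input id="search_box" type="text">
    <button id="search_button">Search</button>
  </form>
  <div id="results"></div>
HTML

p missing_elements(page)  # => ["help_text"]
```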

Tuesday, August 30, 2011

Automated Test Design (riffing/ripping off Alan Page)

Alan just posted this: http://angryweasel.com/blog/?p=325

This is the nicest test example I've seen in a long time, and I think it bears a little more analysis.

If I were in charge of the development of this app, I would make automated testing happen on 3 levels.

First, there would have to be some sort of test for generating a single random number. Which seems easy enough, but at this level, you really have to understand exactly what a call to rand() (or whatever) on your particular OS is going to do. I once wrote a script in Perl (praise Allah just a toy, not production code) that returned perfectly random numbers on OSX but returned exactly the same values every time on Windows. You'd better test that your call to rand() really returns random numbers, regardless of the OS it is running on.
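A minimal sketch of that lowest-level check in Ruby, assuming a six-sided die as the unit of randomness (the post doesn't say). It checks that every face turns up and nothing else does, and that the counts are roughly even; the tolerance of 150 either side of the expected 1000 is an arbitrary choice. It also shows one way the same-values-every-time symptom can arise: identically seeded generators produce identical sequences.

```ruby
# Roll a six-sided die n times with the given generator; tally each face.
def tally_rolls(rng, n)
  counts = Hash.new(0)
  n.times { counts[rng.rand(1..6)] += 1 }
  counts
end

counts = tally_rolls(Random.new, 6_000)
all_faces    = counts.keys.sort == (1..6).to_a                  # nothing outside 1..6, nothing missing
roughly_even = counts.values.all? { |c| (850..1150).cover?(c) } # ~1000 expected per face
puts "all faces seen: #{all_faces}, roughly even: #{roughly_even}"

# Same seed, same sequence: handy for reproducing a failure, harmful if
# every run quietly ends up with the same seed.
a = Random.new(42)
b = Random.new(42)
p Array.new(10) { a.rand(1..6) } == Array.new(10) { b.rand(1..6) }  # => true
```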

At a higher level, you'd want to do exactly what Alan talks about in looping 100,000 times (although 100K seems like overkill to me). This is what I think of as an "integration test". You're going to have some method or possibly even an API call or REST endpoint like "generate_five_random_numbers_and_add_the_results". You're going to want to exercise that wrapper enough to convince yourself that the numbers are right and that the math is right.
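Here is a rough sketch of that integration-level loop. generate_five_random_numbers_and_add_the_results borrows the hypothetical name from the paragraph above; its body is a stand-in, and treating the app as five dice (values 1..6) plus a total is an assumption based on the six textboxes, not something the post confirms. Because the stand-in computes the sum itself, these checks cannot fail here; against the real wrapper they would be genuinely independent.

```ruby
# Hypothetical stand-in for the wrapper under test: five values in 1..6
# (an assumption about the app) plus their sum.
def generate_five_random_numbers_and_add_the_results(rng = Random.new)
  dice = Array.new(5) { rng.rand(1..6) }
  [dice, dice.sum]
end

# Call the wrapper many times and keep every result that breaks the rules.
def bad_results(iterations, rng = Random.new)
  failures = []
  iterations.times do
    dice, total = generate_five_random_numbers_and_add_the_results(rng)
    ok = dice.size == 5 &&
         dice.all? { |d| (1..6).cover?(d) } &&
         total == dice.sum
    failures << [dice, total] unless ok
  end
  failures
end

puts "#{bad_results(10_000).size} bad results out of 10000"
```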

For some time now I've been making my living writing automated UI/browser tests along with doing ET. Here's my take on a UI test for Alan's app:

Open the page
Check that text "Total" exists.
Check that the 6 textboxes exist.
Click the Roll button.
Check that the state of the page changes (however that happens).
Check that there are values in each textbox (even this might be too fancy, depending on the app and the test framework).

Also getting fancy, you could check that the 6 textboxes exist in the correct order. (Selenium's "glob:" feature makes this pretty painless.)

Doing math in UI tests isn't very smart. Doing data comparison over multiple runs in UI tests isn't very smart. Do that stuff at lower levels, where you can take advantage of programming power and speed. UI tests are a shallow layer where you simply check that the user has all the stuff they need to get the job done.

Monday, December 20, 2010

where ideas come from

Today StickyMinds published my article with expanded descriptions of the "10 Frontiers for Software Testing" that I suggested as starting points for those interested in attending the second Writing About Testing conference.

Since I announced the CFP for the first WAT conference in October 2009, I have published several dozen articles on software and software testing.  (I actually lost count: it is well over thirty but fewer than fifty individual pieces.)

My friend Charley Baker asked me recently where I get the ideas for so many articles.   It is an interesting question, and worth answering:

The most important source of ideas is simply everyday work.  As I go about doing my job, it happens fairly often that a situation crops up that I think would be of general interest to the community of software testers and developers.  So I write it down and I make it public.  Articles about bugs, bug reports, test design, architecture, workflow, telecommuting, frameworks, war stories all come from noticing the details of the everyday work.

Here is the story of the very first software article I ever published: I have been following Brian Marick's work for a long time now. Brian used to be the editor of Better Software magazine, and he would occasionally solicit articles for the magazine on his blog. In March 2004 Brian asked for submissions for a piece along the lines of "add(ing) a scripting language to her manual testing portfolio." In particular, I recall that Brian wanted an article suitable for beginners with an example of a testing problem that could only be solved by scripting a little program.

I had written book reviews before, but I had never published a piece about software. I had just encountered a situation at work that was a perfect example of what Brian wanted. I was working for a company that was switching from shipping whole custom-built servers to shipping installation CDs for COTS hardware. The installation CDs contained more than 4000 files. The switch was a little bumpy, and at one point we very nearly shipped an installation CD missing 4 critical files of the 4000. I had been teaching myself Perl (so I was a beginner myself), and I wrote a little Perl script to recursively compare the contents of large directories, so that it would be easy to see if a few files had gone missing. I described what I had done, Brian published it in Better Software, and in one of the highlights of my career as a writer, that article (with me as a character!) became the basis of the first example in Brian's book Everyday Scripting with Ruby. (Get the book: it will make you a better coder, no matter your level of skill.) The article was titled "Is Your Haystack Missing a Needle".
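The original script was Perl and isn't reproduced here; as a rough Ruby sketch of the same idea, collect the relative path of every file in a known-good tree and subtract the paths found in the new tree. The method names are made up, files are compared by path only (not content), and Dir.glob skips dotfiles by default.

```ruby
require "tmpdir"
require "fileutils"

# Relative paths of every file (not directory) under root, at any depth.
def relative_files(root)
  Dir.glob("**/*", base: root).reject { |rel| File.directory?(File.join(root, rel)) }
end

# Files present in the reference tree but absent from the candidate tree.
def missing_files(reference_root, candidate_root)
  relative_files(reference_root) - relative_files(candidate_root)
end

# Demo with throwaway directories standing in for a known-good build and a CD image.
Dir.mktmpdir do |tmp|
  good = File.join(tmp, "good")
  cd   = File.join(tmp, "cd")
  %w[bin/setup lib/core.so docs/README].each do |rel|
    FileUtils.mkdir_p(File.dirname(File.join(good, rel)))
    File.write(File.join(good, rel), "x")
  end
  FileUtils.cp_r(good, cd)                 # cd does not exist yet, so this copies good to cd
  File.delete(File.join(cd, "lib/core.so"))
  p missing_files(good, cd)  # => ["lib/core.so"]
end
```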

Another source of ideas for software articles comes from having some Very Large Idea that evolves over a long time. At Bret Pettichord's Austin Workshop on Test Automation in 2007, in a moment of inspiration, I gave a five-minute lightning talk demonstrating an example of using the artistic language of critical theory (in particular, New Criticism) to evaluate the quality of a piece of software. The talk got an enthusiastic reaction from the people in the room, mixed with some skepticism as I recall. It struck me at the time as being an odd idea, but the more I considered it, the more it made sense. I wrote a long paper on the subject and submitted the paper to the CAST 2008 conference, but it was rejected. I published it on my blog, and I still refer to it now and then. My thinking on the subject has matured and expanded since then, so if you'd like to see the latest example, look at PragPub magazine for November of this year. In 2008 I was a lonely voice on the subject. Today I have colleagues, and it is nice to see others considering critical theory applied to software as well.

Finally, every once in a while, I manage to do something really unusual, something that will actually change people's minds about how they go about their work. In 2006 I was working for Thoughtworks on an EAI project. Our code base had great unit test coverage and integration test coverage, and as the QA guy, I was not finding defects in what we were creating. But we had to interact with a legacy database, and we were often surprised by unusual or corrupt historical data. I made it my business to expose as much of that bad data as I could. I wrote a little Ruby script that would do quasi-random queries in the database, request the same data from the API we were building, and compare the results, running within an infinite loop. I found a significant number of issues in this way, where the API we were building failed to handle data we never expected to find in the database. To my knowledge, no one had ever published anything describing a situation like this.
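A much-reduced sketch of the shape of that kind of script follows. It is not the original: everything in it is a hypothetical stand-in. An in-memory hash plays the legacy database (including one record nobody expected), api_fetch plays the API under construction, and the loop is bounded instead of infinite.

```ruby
# Hypothetical stand-ins: a hash plays the legacy database, including one
# historical record nobody expected, and api_fetch plays the API being built.
LEGACY_DB = {
  1 => { name: "Acme Corp", opened: "1998-04-01" },
  2 => { name: "Widgets Ltd", opened: "2001-11-15" },
  3 => { name: nil, opened: "0000-00-00" }
}.freeze

def api_fetch(id)
  row = LEGACY_DB.fetch(id)
  { name: row[:name].strip, opened: row[:opened] }  # assumes name is never nil
end

# Pick records at random, fetch each one both ways, and note any disagreement
# or exception. This loop is bounded; the script described above ran forever.
def compare_random_records(iterations, rng = Random.new)
  problems = {}
  iterations.times do
    id = LEGACY_DB.keys.sample(random: rng)
    begin
      problems[id] = "mismatch" unless api_fetch(id) == LEGACY_DB[id]
    rescue StandardError => e
      problems[id] = e.class.name
    end
  end
  problems
end

p compare_random_records(100)
```

Recording exceptions alongside plain mismatches matters here: with data nobody expected, the API is as likely to blow up as to return a wrong answer.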

So I wrote a draft of an article on the subject and submitted it to Brian at Better Software.  Nearly all of my articles have been published with only minor editorial changes, but this draft was a hot mess.  Any reasonable editor would have rejected it outright.  What Brian did instead was to dissect the piece, pull out the essential concepts, and make diagrams showing what I had failed to describe well.  He sent me some diagrams, I made some corrections, he sent me some more diagrams.  Once the diagrams were correct, I re-wrote the piece from scratch as a description of Brian's diagrams.  I've always thought he should have had co-author credit for that piece. It was called "Old School Meets New Wave" and it had some really goofy artwork, a photo of a skinny punk kid with a pink mohawk overlaid on a black-and-white fifties dude with a fedora.

It ended up being one of the best articles of my career.  Some time later a tester named Paul Carvalho told me that he had created and gotten funded a testing effort at his company based on the concepts in that article.  Sometimes writing really can change the world.  It has happened to me a couple of times since then, but that article was the first time I knew I had made a difference to someone else by writing about software.  (Paul, if you read this, I hope I didn't garble your story, it was a long time ago we had that conversation.)

From about 1998 until the middle of the decade, the field of software testing and software development experienced any number of radical shifts, among them the increased value placed on the role of the tester because of Y2K testing, the rise of open source, the rise of the agile movement, the rise of dynamic programming languages, and more. But by late 2009 my own sense was that the public discourse on software testing in particular had become stale and outdated. I started the writing-about-testing mail list and the WAT conference in an attempt to encourage new voices and new ideas in the public discourse on software testing. A little over a year later, I think we have had some influence. Since the first WAT conference, Alan Page, Matt Heusser, and others have begun calling for some examination of what the future of software testing holds.

New ideas in our field come from three places.  They come from beginners who stumble upon some beautifully simple idea and are moved to tell the world about what they have done.   They come from people who think about the work on a really grand scale over a long period of time and build a body of work to support that grand idea.  And they come from people who truly make a breakthrough of some sort and are moved to explain that breakthrough to everyone.

So Charley, that is where my ideas come from.

(UPDATED: fixed garbled links)