Skip to main content

validate order with regular expressions

Today I wrote some UI tests to validate that certain text appears on pages in a particular order. Since I've done this now in two different jobs using two different tools, I thought it might be of interest to others.

There are a couple of reasons you might want to do this. For example, a page might have a list of records sorted by certain aspects of the record. Here's a crude example:


|name |date|description|
|chris|july|tester |
|tracy|june|gardener |


One set of critical tests validates that name/date/description appear in the right order; that chris/july/tester appears in the right order; and that tracy/june/gardener appears in the right order.

Another critical test is that chris appears above tracy in the display.

The first challenge for the tester is to identify and be able to address the smallest part of the page that contains the text under consideration. Every tool does this in a slightly different way, but Firebug is your friend here.

The next challenge is to make sure that your particular framework knows how to interpret regular expressions. I have seen this in place in Selenium-RC and SWAT, and I have rolled my own in Watir. (But for all I know Watir might have this kind of thing automatically by now. It's been some time since I've used Watir on a real project.)

Now, to check the horizontal order, you construct a test that looks something like


|assert_text|id=text_container|name.+date.+description|
|assert_text|id=text_container|chris.+july.+tester |
|assert_text|id=text_container|tracy.+june.+gardener |


To test that chris appears before tracy on the page:


|assert_text|id=text_container|chris[\W\w]+tracy|


Regular expressions are for matching text. The first set of tests is fairly straightforward. The "." operator says "match any character". The "+" operator says "match 1 or more instances of whatever". This means that text like "namedate" would cause the test to fail (which is exactly what we want) but text like "name foo date" would pass (which might not be what we want but it's still a decent test, we could write a fancier regex if that happened to be critical).

The second test is a little trickier. The trouble is that by default "." does not match any newline characters. And since chris has to appear above tracy in the page, there will always be a newline involved. Most if not all languages have a way to tell the "." operator to match a newline, but some of them are really awkward. (I'm looking at you, .NET.)

So instead we make a little workaround: the \W means "match anything that's not a word" and "\w" means "match anything that is a word". "Word" is defined as any letters plus any numbers plus "_", but it doesn't matter. [\d\D] (for digits) would have worked as well, since the point is that one of those expressions will always match anything we encounter between one interesting string ("chris") and the other interesting string ("tracy").