tag:blogger.com,1999:blog-201651672024-03-12T19:43:24.368-06:00Chris McMahon's BlogQA is not evilChris McMahonhttp://www.blogger.com/profile/16339861008904941703noreply@blogger.comBlogger181125tag:blogger.com,1999:blog-20165167.post-41670613482761522132022-10-31T09:22:00.001-06:002022-10-31T09:26:11.886-06:00What is ReST? (and a bunch of other stuff about API testing)<p style="text-align: left;"><br /><br />"ReST" in the context of a software Application Programming Interface (API) stands for "Representational State Transfer". Like many software design patterns, ReST is a <b>pattern</b>, not an <b>implementation</b>. Before we dig deep into ReST though, let us talk about APIs in general. <br /><br />When we in the QA/test world talk about APIs, we usually mean APIs exposed to users in order to manipulate software systems efficiently from the outside. These are public-facing APIs, as opposed to internal APIs set up at boundaries between disparate subsystems within the application itself. There are any number of public-facing API schemes that are not ReST. SOAP is still widely in use. CORBA was a pioneer design in public APIs. AJAX (Asynchronous JavaScript And XML) is an API that enables things like sliding maps in a browser. And applications may have their own proprietary public APIs: the Salesforce API is a notable example; while similar to SQL and some other things, the Salesforce API is unlike anything else in the software world. <br /><br />A ReST API is typically implemented as JSON data sent over HTTP. ReST APIs sending XML over HTTP are also feasible. However, not every JSON-over-HTTP or XML-over-HTTP API is "ReSTful". I once worked with an XML-over-HTTP API whose only method was GET. This is not ReSTful. <br /><br />The essence of a ReST API is: the client wants the system to be in a particular state. The server will either make the system conform to that state, or else it will tell the client why that is not possible. 
The "representation" of the "state" is "transferred" from the client to the server. The client says "Be this way" and the server either says "OK" or else it tells the client why that can't happen. This is ReST. (In the case of a GET request, the state of the system is transferred from the server to the client.)<br /><br />For example, the client may want User X to exist in the system. The client may issue a GET request asking "Does User X exist?" and the server will reply either "no" or may supply information about a User X that exists. If User X does not exist, the client will then issue a POST request with appropriate information to create User X on the system. <br /><br />The client may want User X to buy Product A. The client may issue a GET request asking "Does User X own Product A?". The server will reply "no". The client will issue a POST request with the details by which User X purchases Product A. <br /><br />(I deliberately omit discussion of HTTP PUT and DELETE commands; let us just cover the basics.)<br /><br />The essence of the ReST design pattern is that it is the client that determines the state of the system. The server merely implements what the client asks, or else explains to the client why this is not possible. <br /><br />This is the genius of the ReST design pattern: some clever people discovered that the state of the server-side system could be communicated to the client solely by means of HTTP status codes. So the client will tell the server "Be this way" and the server will reply in one of a few limited ways: <br /><br /></p><ul style="text-align: left;"><li>20X: OK it worked</li><li>30X: what you're trying to do exists, but it lives somewhere else and you have to ask over there</li><li>40X: the server can't do that and it's your fault</li><li>50X: the server can't do that and it's my fault</li></ul><p><br />The <b>mechanism</b> of a ReST API is a simple transaction between a web client and a web server. 
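The "ensure this state" flow and the status-code contract can be captured in a short sketch. This is Python invented purely for illustration; `FakeServer` is a toy stand-in for a real HTTP service, not any particular framework:

```python
# A toy stand-in for a ReST server: it holds state and answers
# with HTTP-style status codes. Invented for illustration only.
class FakeServer:
    def __init__(self):
        self.users = {}

    def get(self, name):                 # "Does User X exist?"
        return (200, self.users[name]) if name in self.users else (404, None)

    def post(self, name, details):       # "Make User X be this way"
        self.users[name] = details
        return (201, details)

def ensure_user(server, name, details):
    """The client transfers its desired state; the server conforms or refuses."""
    status, _ = server.get(name)
    if status == 404:                    # user absent: ask the server to create it
        status, _ = server.post(name, details)
    return status

server = FakeServer()
print(ensure_user(server, "X", {"owns": []}))   # 201: created
print(ensure_user(server, "X", {"owns": []}))   # 200: already in the desired state
```

Note that the client never tells the server *how* to create the user; it only describes the state it wants and reads the status code that comes back.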
Behind the web server could be literally anything. The web server's job is to translate the client's request into any number of actions on the whole back end system to accomplish what the client is asking. Whether those changes actually occur correctly is the challenge for the API tester. (Hint: those 50X errors are where the bugs are.)<br /><br />Note that access to the API is a significant tool in the toolbox of the User Interface (UI) tester. Tell the API to set up User X and buy Product A: then check in the UI that User X owns Product A. You are a test wizard. <br /><br />Bonus story! In 2008 the ideas behind ReST were fairly new and very popular, but not well understood. I had been working with a really well-designed ReST API for some time. I was at a conference and got into a conversation with a developer who was trying to find an answer to the question "How do you make LOGIN ReSTful?" and we butted heads. Authentication is an entirely different problem than setting system state. He never got it. May as well ask "How do you make a grapefruit be a blueberry?" Do not confuse the purpose of a ReST API with APIs dedicated to different purposes. <br /><br /><br /><br /><br /></p>Chris McMahonhttp://www.blogger.com/profile/16339861008904941703noreply@blogger.comtag:blogger.com,1999:blog-20165167.post-73658804170266402432022-09-16T12:04:00.000-06:002022-09-16T12:04:21.957-06:00The Most Dangerous Bug<p><br />There was a thread on Twitter recently from people who caused major problems in software systems, and I was reminded of the worst bug I've ever seen. Long ago I had published this story with a major tech media vendor, but that article seems to have succumbed to bit rot, so I am going to tell that story here. It happened more than twenty years ago when I had only been in QA for a couple of years, but it is seared into my mind... <br /><br />In the late 1990s I worked for a company that handled 911 land line location software for major telcos. 
So when you choke on that chicken bone and call 911, you can't talk but they send the ambulance to where you live anyway. We handled about 75% of all the telephone numbers in the USA. <br /><br />So from time to time in those days, there would be an area code split, where a new area code is added to a populous area and phone numbers start getting the new area code where they had a different area code before. In the 911 world, we would maintain records for both old and new area codes for a certain amount of time, but eventually we would delete the numbers from the old area codes to save space on the system and only use the new area codes. This was a completely routine operation. <br /><br />Except for this one time. We arrived at the point where we intended to delete the obsolete numbers from the system for a large Midwestern US state. There was a particular bit of code that identified the obsolete numbers. That code was run by analysts, not by programmers or sysadmins. This code was put in the hands of a poor newbie analyst. They configured it incorrectly, and it identified for deletion all of the old numbers AND ALL OF THE NEW NUMBERS. This list was delivered to our sysadmin to execute. <br /><br />The sysadmin was named Kevin. No one liked Kevin. He was not a nice person. Kevin took one look at this file and said "Whoa this file is WAY too big something is wrong". The managers told Kevin to delete the numbers anyway. Kevin resisted. Kevin was threatened with being fired, so he started running the numbers through the delete process. <br /><br />The delete process was a script that had been created by sysadmins (not developers) in the earliest days of the company, and it had never gone through formal QA (which was where I worked. I had never seen this script.) 
Because this is 911, we always make backups and copies, and this is how the script worked: <br /><br />READ THE NUMBER<br />WRITE THE NUMBER TO A FILE FOR BACKUP<br />DELETE THE NUMBER PERMANENTLY FROM THE NETWORK <br />REPEAT<br /><br />This is where it gets interesting: this software ran on the Tandem computer, an old mainframe system (it still exists; after many acquisitions it is known today as HP NonStop, and it is still in use in certain industries.) The thing about Tandem is that you have to declare the size of a file at the time you create it, and you can't exceed the size that you declare. <br /><br />I'm sure you can see where this is going. <br /><br />Kevin ran the numbers through the script. The script wrote the backup records to the file. The file filled up. Because the script had never been subjected to our rigorous development process, it had never occurred to its creators to catch the error when the file filled up. So the script did this: <br /><br />READ THE NUMBER<br />WRITE THE NUMBER TO A FILE FOR BACKUP<br />GET AN ERROR THAT THE FILE WAS FULL<br />FAIL TO CATCH THE ERROR<br />DELETE THE NUMBER PERMANENTLY FROM THE NETWORK <br />REPEAT<br /><br />We deleted most of the 911 location records for a major Midwestern US state. Being 911, we had backups of the data, but we had deleted so many records that our original idea was that it would be faster to give someone a physical tape and put them on an airplane from Colorado to the Midwest in order to restore the records locally from the tape. Ultimately one of our more brilliant programmers devised a compression scheme on the spot that let us update the records over the network. <br /><br />We were so very thankful that no major disasters happened in the 36 hours or so that the 911 location information was missing. A big fire or a chemical spill or something like that would have been a problem of epic, historical proportions. 
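The failure mode is easy to reproduce in miniature. This Python sketch is invented for illustration (the real script ran on a Tandem): a fixed-capacity backup stands in for the pre-sized file, and one flag toggles the missing error check:

```python
def delete_numbers(numbers, backup, capacity, check_errors):
    """Back up each number, then delete it. 'capacity' mimics a Tandem
    file whose maximum size is fixed at creation time."""
    deleted = []
    for number in numbers:
        ok = len(backup) < capacity       # the write fails once the file is full
        if ok:
            backup.append(number)
        if check_errors and not ok:
            break                          # the missing safeguard: stop on a failed write
        deleted.append(number)             # permanent deletion from the network
    return deleted

numbers = ["555-0001", "555-0002", "555-0003", "555-0004", "555-0005"]

backup = []
print(len(delete_numbers(numbers, backup, capacity=3, check_errors=False)))  # 5: all deleted, only 3 backed up
backup = []
print(len(delete_numbers(numbers, backup, capacity=3, check_errors=True)))   # 3: deletion stops when the backup fails
```

One uncaught error is the entire difference between a routine maintenance job and deleting most of a state's 911 records.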
<br /><br /></p>Chris McMahonhttp://www.blogger.com/profile/16339861008904941703noreply@blogger.comtag:blogger.com,1999:blog-20165167.post-73438845221316465152021-01-29T11:39:00.006-07:002021-01-29T11:44:40.060-07:00Using API Clients in the Service of End to End Tests<p>It often happens that someone creating automated testing programs using Selenium or a similar framework that controls the UI (User Interface) eventually finds it necessary to address a non-Selenium API (Application Programming Interface) of some sort. You might need to create, update, or delete test data. You might need to modify a system setting. You might need to determine the current state of the system. You might need to have your test send or receive a signal from some other part of the deployment process. Often the only way to do these things is through use of an API.</p><p><br /></p><h2 style="text-align: left;"><span style="font-size: large;"><b><span>API Defined </span></b></span></h2><p><br />An API is a relationship between a client and a server. The server controls some aspect of the system, and the client wants to see or change some aspect of the system that the server controls. <br /><br />Selenium-webdriver itself operates as an API. The server for the Selenium API is called WebDriver, and WebDriver exists only on the browsers that Selenium is automating. Your script controls the Selenium client that communicates with WebDriver in the browser instance. Selenium is a tool that automates browsers, nothing more. Working with APIs beyond WebDriver requires an understanding of what those broader APIs provide, what the test code requires, and how to make those connections. Knowing these approaches to APIs other than Selenium/WebDriver is important work in test automation.<br /><br />There are many kinds of APIs in the world of application programming, and you as a developer will have to understand which APIs you need to address, and how to address them. 
For example, the most common type of API a tester will encounter is a ReST (Representational State Transfer) API, which sends data, usually in JSON or XML format, over an HTTP connection between the API host and the consuming client. Less popular today than ReST is SOAP, an API that tends to offer a wider range of possible actions than other kinds of APIs.<br /><br />Just as there are generic ReST and SOAP APIs, there are also application-specific APIs: Wikipedia has its own API; Salesforce has its own API. In some cases the API is not a single monolithic API, but a disparate collection of APIs for different services.</p><p><br /></p><h2 style="text-align: left;"><span style="font-size: large;"><b><span>Designing Browser Tests Using APIs</span></b></span></h2><p><br />API calls from an end-to-end test script answer different needs and serve different contexts. One context might be that your test needs to know something about the state of the system before it can proceed: is a service down? Is a setting in place? <br /><br />Or your API calls might make tests run more quickly or make tests less prone to failure. For example, the API call might set up a known set of test data for a test and then tear down that data after the test runs. A typical example is to create a user account with specific properties that you do not want to create via some tedious and irrelevant browser-based registration.<br /><br /></p><h2 style="text-align: left;"><span style="font-size: large;"><b><span>Implementing API Client Calls From Within Test Code</span></b></span></h2><p><br />The code you will use to address an API has nothing to do with the code you use to drive the UI. Using the ReST example, if you are working in a language like C# or Java, you may have to create your own instance of a ReST API client using your language's HTTP library and your language's JSON or XML parser. 
Dynamic languages like Python and Ruby tend to have shared libraries that make this easier, but there is no skipping the work involved in understanding how these API clients function in the context of your particular language and operating system.<br /><br />There is another approach to addressing an API besides using the native clients in your programming language. There are any number of command-line options for addressing APIs, from the most basic curl utility to the comprehensive clients offered by, for example, Postman. Every programming language offers the ability to exit the language and address the command shell directly. (This is often called "shelling out".) If creating an API client in your own language is too troublesome, it may be worth simply exiting the program, invoking an API client on the local host like a curl command or something similar, then capturing the results of the shell-based API call. This has the benefit of being relatively easy in almost every programming language, and may provide a useful set of tools beyond just browser testing. The drawback is that the utilities you need must exist and be configured properly in the environment in which you run your test scripts.<br /><br />Finally, consider unusual approaches. Every test environment is unique, and your test environment may offer opportunities to address your API that are not obvious. For example, one of the authors of this paper occasionally needed to do a simple operation by way of the API. The API server required an extensive security regimen involving passwords, tokens, third-party authentication, really an extremely secure transaction. 
Instead of programming that overhead into the test framework, we knew that the API offered a Swagger interface on a web page on a local server, so we simply used Selenium to navigate to the Swagger web page, log in as a regular user, and then perform the API transaction on the Swagger web page with Selenium automation. So sometimes you actually CAN address an API with Selenium automation!<br /><br />Which brings us to our last consideration: authentication and security. In some cases, security for test environments may be relaxed and authentication may be simple or not required at all. But that may not be the case. Besides understanding what information your API provides and how your API client can address that information, you also need to understand and honor whatever security measures may be in place.</p><p style="text-align: left;"><br /></p><h2 style="text-align: left;"><span style="font-size: large;"><b><span>Using APIs Gives Tests Power</span></b></span></h2><p><br />Selenium itself is an API client: it controls the browser. Using API calls, tests can observe the state of the system, and can control the state of the system being tested. Tests can create, read, update and delete test data. Using APIs, tests can reach into other aspects of the build and deploy systems in the sense of DevOps. This power over system state and test data and interprocess communication gives your browser tests the flexibility and reliability to be an integral part of a Testing Observability And DevOps (TOAD) environment. 
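The "shelling out" approach described earlier can be sketched in a few lines of Python. For the sketch to be self-contained, `echo` stands in for a real curl command (the actual command and URL would depend on your API); the pattern of invoking a shell utility and capturing its output is the same:

```python
import json
import subprocess

# Stand-in for a real API call such as:
#   curl -s https://example.test/api/users/X
# Here 'echo' fakes the server's JSON response so the sketch is self-contained.
result = subprocess.run(
    ["echo", '{"user": "X", "owns": ["Product A"]}'],
    capture_output=True, text=True, check=True,
)

payload = json.loads(result.stdout)      # capture and parse the shell command's output
print(payload["owns"])                   # ['Product A']
```

The trade-off noted above applies: this is easy in almost any language, but it only works if the utility you invoke exists and is configured in the environment where the tests run.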
<br /><br />Feb 2021<br /><br /></p><ul style="text-align: left;"><li> Benjamin Hofmann</li><li> Tim Western</li><li> Chris McMahon</li><li> Thanks to The Testing Observability And DevOps (#TOAD) API Working Group</li></ul><p><br /><b><br />This work is published under the Creative Commons “CC BY-SA” Attribution-ShareAlike license.</b> <br /><br /><br /></p>Chris McMahonhttp://www.blogger.com/profile/16339861008904941703noreply@blogger.comtag:blogger.com,1999:blog-20165167.post-19863274217064620142020-05-29T11:47:00.009-06:002024-02-21T11:36:06.623-07:00Easy fake paella recipe<p>Real paella is made in one pan. This recipe is "fake" because the seasoned rice is prepared separately from the other ingredients and added at the end. The advantage is that we can include a bigger variety of fresh vegetables. Feel free to vary the amounts below for your own taste.</p><p> <br />
Invest in preparation. Once you start cooking, this process goes very quickly. <br />
Seasoned rice:<br />
</p><ul>
<li>2 cups long grain white rice. (I like jasmine rice, but any will do)</li>
<li>1 spice bottle of saffron (about 1 gram, get the biggest amount for the least cost you can find, more is better)</li>
<li>Dried oregano, a healthy shake</li>
<li>Cumin to taste (I go easy on cumin, too much smells like armpits)</li>
<li>Black pepper, just a bit</li>
</ul>
Meat and seafood:<br />
<ul>
<li>1/2 pound or more savory sausage links, like spicy Italian, Andouille, or Spanish chorizo, cut in thin medallions (Note that sausage is required because all the other ingredients rely on a little sausage grease in the pot. If you omit sausage, add a little oil to cook the chicken and seafood)</li>
<li>1 pound boneless skinless chicken breast, cut in bite size pieces</li>
<li>1 pound shrimp, shelled and cleaned (fish or other seafood would also work, but I always use shrimp)</li>
</ul>
A variety of fresh vegetables with a variety of textures, such as: <br />
<ul>
<li>Florets from 2 or 3 broccoli stalks</li>
<li>1 or 2 red bell peppers, cut in bite size pieces</li>
<li>Green peas, fresh or frozen (peas are traditional in real paella)</li>
<li>Mild pitted olives, like Cerignola or Castelvetrano (strong olives like Kalamata are not recommended)</li>
<li>Green beans or asparagus, cut in bite size pieces </li>
</ul>
The order in which the ingredients are prepared is critical. Messing this up will make for an unpleasant outcome:<br />
<br />
Cook the rice. For 2 cups of rice I use about 4 cups of water, but this varies with altitude and preference. The goal is to have rice that is more dry and less wet, but it will be tasty regardless. However you cook your rice, stir in all the saffron, the oregano, cumin and black pepper immediately and allow the spices to cook with the rice. Stir occasionally to mix in the spices as the rice cooks.<br />
<br />
When the rice is nearly done, or completely done and off the heat, cook the meat and seafood. Use a very large pot, this pot will eventually hold all of the paella: <br />
<br />
On high heat, stirring constantly: <br />
<ul>
<li>Brown the sausage medallions thoroughly. </li>
<li>When the sausage is cooked, add the cut up chicken to the pot</li>
<li>When the chicken is cooked, add the shrimp to the pot</li>
<li>When the shrimp is cooked or nearly cooked, add the vegetables and olives to the pot</li>
<li>Still stirring constantly, cook this mixture on high heat until the vegetables are barely tender</li>
<li>Add the cooked rice and mix thoroughly</li>
</ul>
<br />
Serve and salt to taste. Store refrigerated.<br />
<br />
To cook leftovers, put paella with a splash of water in a pan with a lid over low heat. <br />
<br />
<br />
<br />Chris McMahonhttp://www.blogger.com/profile/16339861008904941703noreply@blogger.comtag:blogger.com,1999:blog-20165167.post-16758284772065621032020-03-18T10:40:00.000-06:002020-03-18T10:40:05.884-06:003 recipes for a pandemicThese are three of my favorite large scale recipes. They all feature complete and concentrated nutrition. They all involve a significant commitment of labor and time, so I learned to make them in large batches. Also, they can all be made in a concentrated form, and reconstituted at serving time by adding water. The marinara and green chile are particularly amenable to freezing and refreezing over and over. Enjoy... <br /><br /><b>Marinara: </b><br />5-6 pounds fresh red tomatoes (or more), diced<br />1 head garlic, skinned and crushed<br />1 small yellow onion, diced<br />4 oz fresh basil, diced<br />4+ oz grated fresh parmesan cheese (optional, see vegan note below)<br />1 or 2 small cans tomato paste<br /><br />olive oil<br />dried oregano to taste<br />black pepper to taste<br />salt to taste<br /><br />Dice the onion and crush or dice the garlic into a large pot. Cover with olive oil and cook until onion is transparent. Dice the tomatoes and fresh basil and add to cooked onion/garlic. Stir. Add tomato paste, grated parmesan, oregano and black pepper. Simmer at least one hour. Add salt to taste. Add water if necessary, or to stretch the recipe. Add salt to taste.<br /><br />Serve with pasta and protein, as pizza sauce, in lasagna, etc. <br /><br />Vegan note: if you leave out the parmesan, add a bit more olive oil for texture.<br /><br />(I published a version of this recipe on the writing-about-testing mail list in 2009. Today's version is better and a lot less context-driven)<br /><br /><b>Reverse-Engineered Pork green chile stew</b>: (If you are in or around New Mexico in autumn, this recipe is based on one bushel of fresh green chile, roasted cleaned and diced. )<br /><br />3-4 pounds (minimum) frozen, bottled, or canned (or fresh!) 
roasted green New Mexico chile<br />1 pound ground pork<br />1 pound potatoes (at least) cut into bite-sized pieces.<br />1 head garlic, skinned and crushed<br />1 small onion, diced<br /><br />oregano (Mexican oregano preferred)<br />cumin to taste<br />black pepper to taste<br />salt to taste<br /><br />Brown the pork thoroughly in a large pot. Add the diced onion and crushed garlic, stir until onion is transparent. Add the green chile. Add the potatoes and cover with water. Add oregano and cumin and black pepper. <br /><br />Simmer until potatoes are done. Add salt to taste. <br /><br />Serve with blue corn tortilla chips, fresh avocado, and cotija cheese, or use as a side dish for breakfast, or for other New Mexican themed food. <br /><br />Note: for a tasty but less bulky alternative, omit the potatoes and instead add a thickener: in a separate container stir together white flour and cold water into a thin paste. Stirring constantly, add the flour paste to the hot stew until you get the desired thickness. This is much closer to the original recipe I reverse-engineered from Olde Tymer's in Durango CO. <br /><br /><b>Reverse-Engineered Chicken Soup</b> (pasta or rice)<br /><br />1 large (1 pound+) chicken breast or about 2 pounds chicken thighs (or more)<br />1 small onion, diced<br />1 head garlic, skinned and crushed<br />3-4 large carrots or more smaller carrots, diced<br />4-5 celery stalks, equivalent to the amount of carrot, diced<br />
fresh parsley, diced<br /><br />black pepper to taste<br />salt to taste<br /><br />100% Durum wheat pasta, or wild rice mix<br /><br />Cover chicken with extra water, boil at least 30 minutes until chicken is thoroughly cooked. Pierce chicken skin while cooking, to release fat into the cook water. Remove chicken from water and let cool. Keep the chicken water for stock.<br /><br />Add diced carrot, celery, onion, parsley, garlic to stock. Bring back to boil. <br /><br />Clean and dice chicken meat, discarding bones and skin. Add diced chicken back to boiling stock. <br /><br />Add several handfuls of pasta or a big shot of wild rice mix. The more of either you add, the more water you will need, so this recipe can make a whole lot of soup. If using pasta, cook only until pasta is done before making the first serving. Pasta will disintegrate over time with heat. Rice can cook much longer. <br /><br />Serve with a side salad or a grilled-cheese sandwich. <br /><br />Note: everything in this soup was on the label of Campbell's Chicken Soup as of 30 years ago.Chris McMahonhttp://www.blogger.com/profile/16339861008904941703noreply@blogger.comtag:blogger.com,1999:blog-20165167.post-40920930304006518932019-08-15T11:18:00.000-06:002019-08-15T15:52:12.604-06:00Recipe for a healthy QA/test relationship with the rest of the org (TOAD!)Like any recipe, you can tweak this for your own situation.<br />
<br />
First, you need a shared test environment to work in. I recommend beginning with a persistent test environment intended to model your production environment as closely as possible. While you may eventually evolve the ability to share a test environment with commonly-configured VMs or containers, having a persistent shared test environment at the beginning gives everyone the experience of keeping something running that looks a lot like the production system. (TOAD! DevOps!) For that matter, proper use of feature flags and such could make production itself a perfectly fine shared test environment. Sharing the responsibility of keeping the shared test environment up, running, and valuable is a big part of what makes the recipe work.<br />
<br />
Second, you need some end-to-end tests (Testing!) to run against the shared test environment. You need to set up test data, tear down test data, type the typies and click the clickies in the UI to use the data, and figure out what to assert so that you know the system continues to work the way you expect. The tests themselves are not all that valuable right away-- the most important thing is that you emphasize the *design* of the end-to-end tests, and include everyone in the end-to-end test design process. How you "include everyone" might vary-- it may be as limited as asking questions of dev and ops (DevOps!) about how the data work, or how the UI is hooked up to the API. Or it could be mob programming on the end-to-end tests. Your org is your org, but make sure to share the creation of the tests. <br />
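One common shape for such a test: API calls create the test data and later remove it, with the browser steps in the middle. This Python sketch uses an invented in-memory `FakeApi` in place of a real API client, with comments marking where Selenium would do the typies and clickies:

```python
class FakeApi:
    """Invented stand-in for your real API client."""
    def __init__(self):
        self.users = {}
    def create_user(self, name):
        self.users[name] = {"owns": []}
    def buy(self, name, product):
        self.users[name]["owns"].append(product)
    def delete_user(self, name):
        del self.users[name]

api = FakeApi()

# Setup via the API: fast and reliable, no tedious UI registration.
api.create_user("X")
api.buy("X", "Product A")

# ... here Selenium would log in as User X and assert that the UI
# shows Product A among User X's purchases ...
assert "Product A" in api.users["X"]["owns"]

# Teardown via the API so the shared environment stays clean for everyone.
api.delete_user("X")
assert "X" not in api.users
```

The setup and teardown bracket the UI steps, so the test leaves the shared environment the way it found it.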
<br />
But you have to make sure to get code changes to the shared test environment right away in order to run your end-to-end tests against the changes. You need a deploy pipeline straight from the code repository to the test environment (DevOps!). Having built this, of course you could just as easily deploy straight to production, but that would be irresponsible. You need to monitor your test env (Observability!) and run your regression tests (Testing AND Observability!) before you commit to updating the production environment.<br />
<br />
Over time your end-to-end tests, your test data schemes, your test environment monitoring become more valuable. You have well-designed tests with reporting schemes that tell you exactly what kind of problems you have as soon as you have them.<br />
<br />
That is the recipe.<br />
* Have a shared test environment<br />
* Use that environment for deploying, testing, monitoring, and reporting<br />
* Make it all better over time<br />
<br />
<br />Chris McMahonhttp://www.blogger.com/profile/16339861008904941703noreply@blogger.comtag:blogger.com,1999:blog-20165167.post-88928961511369700352019-06-21T10:17:00.000-06:002019-06-21T10:21:44.012-06:00TOAD Goes To 11<br />
TOAD is Testing, Observability And Devops. We think these three things are related. What would happen if we took each of the three aspects of TOAD and put strong emphasis on each in turn? What happens if we take our testing effort as far as it can go, and "dial it to 11"? Observability to 11? Devops to 11? The meaning of "dial to 11" will be different for different organizations: it might mean hiring staff, or investing in tools, or even just emphasizing the mission more than before. <br />
<br />
If we dial testing to 11, I think two things will result. For one thing, the number of observable incidents in production will likely go down, because the emphasis on testing means that more problems will be found and fixed before deploying to prod. I also think that the rate of deployments (Devops) could increase, because the decrease in observable incidents will make deployments safer. So: increase the T in TOAD to decrease the O and increase the D. <br />
<br />
With testing at 11, now we can dial Devops to 11. We can increase the rate that we deploy code to production, because testing has made deployment safer. With D at 11, what happens to T and O? I suspect the number of observable incidents will increase. We reduced the number of problems we deploy to prod, but now we are deploying more often, so it makes sense that the number of problems in production will increase.<br />
<br />
With Devops at 11, now it is clear that we need to dial our Observability to 11. With more frequent deployments, it stands to reason that we would have more incidents to observe. We probably also have trickier and more sophisticated problems in production too, because our testing has been at 11 for a while now. So now we invest in observability because it makes sense. <br />
<br />
So all of TOAD is dialed up to 11 now. Or is it? We know from our observability efforts that the number of observable incidents in production has increased. Clearly it is now time to revisit our Testing, because we can see it is no longer at 11, Observability and Devops have left it behind. <br />
<br />
In summary: <br />
<br />
Invest in Testing to make better Devops possible while reducing Observable problems. <br />
<br />
Then invest in Devops to get better features to production faster. <br />
<br />
Then invest in Observability because the deploy rate is much faster. <br />
<br />
Then repeat the cycle because improvements in any one area mean that the other areas will need to keep pace. In practice, I think we can dial all the parts of TOAD as high as they will go more or less simultaneously, but I think it is helpful to consider the effects of changing any one of Testing, Observability And Devops on the other two practices. <br />
<br />Chris McMahonhttp://www.blogger.com/profile/16339861008904941703noreply@blogger.comtag:blogger.com,1999:blog-20165167.post-16234764942507725182019-06-18T11:27:00.000-06:002019-06-18T11:30:27.391-06:00TOAD: RRRAR! Rollbacks, Replays, Reverts, and RegressionsTOAD is<br />
<br />
Testing<br />
Observability<br />
And<br />
DevOps<br />
<br />
My <a href="http://chrismcmahonsblog.blogspot.com/2019/05/evaluating-test-suite-toad-thought.html" target="_blank">last blog post</a> was about why TOAD ideas are important and how TOAD ideas interact with each other. But even the most excellent TOAD systems occasionally release bugs to production. This post describes how to understand these problems and how to address them when they happen. And there is an experience report at the end! <br />
<br />
Most of the time we learn about problems in the production system because the system is observable. We see a performance problem, or a data problem, or a space problem, and we can use observability tools to narrow down and ultimately fix the cause(s) of the observed problem. There is also an entire class of problems that cannot be discovered with typical observability tools, but only by testing the system. Given that we find such problems in production, what can we do about it? <br />
<h3>
Rollback and Replay</h3>
The most drastic response to a production problem is to "roll back" the system to the last known good state by means of restoring a system backup or backups of some sort. Organizations that adopt a rollback strategy to mitigate production problems will typically have the ability to execute all the transactions since the rollback time a second time, against the repaired system, in order to bring the state of the system back to the current time. This strategy is expensive, and not all that common. (If the organization has to roll back but is not capable of replaying all the system transactions, users are forced to re-do everything they already did once, which is not very nice.)<br />
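The replay strategy can be sketched in a few lines of Ruby. This is a hedged illustration with a toy in-memory store; a real system would restore database backups and replay a durable transaction log, but the shape is the same: snapshot, log, restore, re-apply.

```ruby
# Hypothetical sketch of rollback-and-replay: keep a snapshot plus an
# append-only transaction log; to repair, restore the snapshot and then
# re-apply every logged transaction against the repaired system.
class ReplayableStore
  def initialize
    @state = {}     # current system state
    @snapshot = {}  # last known good state
    @log = []       # transactions recorded since the snapshot
  end

  def apply(key, value)
    @log << [key, value]  # record the transaction before applying it
    @state[key] = value
  end

  def take_snapshot
    @snapshot = @state.dup
    @log.clear
  end

  # Roll back to the snapshot, then replay the logged transactions so
  # users do not have to redo everything they already did once.
  def rollback_and_replay
    @state = @snapshot.dup
    @log.each { |key, value| @state[key] = value }
    @state
  end
end
```

The expensive part in real life is not this loop; it is capturing every production transaction durably enough that the replay is trustworthy.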
<h3>
Revert</h3>
A more common strategy to repair a problem in production is to identify the particular commit that introduced the problem and revert it, restoring the system to a state from which we can attempt whatever the problem code was trying to achieve, but in a way that does not disrupt production. It is also possible to skip the revert step completely and simply push new code that corrects the problem; this is typically called a "hot fix", and the net result is the same as a revert plus a new push.<br />
<h3>
Regression</h3>
When a feature that used to work no longer works, we tend to call this a "regression" problem, but "regression" is really a misnomer. It is rare in a TOAD system to break a feature by introducing older, broken code as the word "regression" implies. It is also rare that a feature that used to work but no longer does can be fixed by putting older code in place of newer; much more likely is that we will have to fix the feature by adding even more code. In a TOAD system, a "regression" bug is almost certainly going to be a "progression" bug, caused by adding new code that seemed reasonable but whose implications in production we failed to understand properly. <br />
<h3>
Regression Testing with TOAD</h3>
As I noted in my previous post, Devops describes the nature of the development process from idea to production. Observability tells us what has happened on the system. Only the testing part of TOAD describes what the system *should* do. Put another way, the T in TOAD is our model of how the system should work; it is the part of TOAD that asserts things about the behavior of the system before the system runs, when we then check that our assertions are correct. There is a class of production problems that cannot be identified by observability but only by testing. When we release one of these problems to production, it is a sign that our model of the system as embodied in our tests is either incomplete or not understood correctly. It follows that every behavior problem released to production that cannot be identified with observability tools should prompt a change to our testing. While we may call this change a "regression test", it actually represents an update to our understanding of our model of the system's behavior. <br />
<h3>
An Experience Report</h3>
In early 2012 I was hired by the Wikimedia Foundation to create a QA and testing practice. Part of this work was to create a shared test environment called "beta labs". (Although it has evolved wildly since 2012, the <a href="https://www.mediawiki.org/wiki/Beta_Cluster" target="_blank">beta cluster still exists</a>!) Production Wikipedia is a fiendishly complex environment, and emulating it in the beta labs test environment was a hard problem. In late 2012 beta labs had a lot of problems and glitches, and improving it as a model for Wikipedia was one of my top priorities. <br />
<br />
The Ops team for Wikipedia has always been exceptionally talented and effective. Wikipedia has had observability baked in from its earliest days. (Testing came along much later, but is a big part of the culture today.) Wikipedia ran for its first decade with essentially no formal testing at all. This was possible because they had from the very first an architecture and a culture where they made reverting easy. Any commit that caused an observable problem could be reverted instantly. (During my tenure there it was possible to earn a t-shirt saying "I broke Wikipedia/But I fixed it")<br />
<br />
But some problems are not amenable to observability tools in the Ops sense. In late 2012 we started hearing reports from Wikipedia editors that sub-headings in Wikipedia articles were being rendered in wildly inappropriate sizes: very large, very small, and everything in between. <br />
<br />
I immediately recognized these reports. I and a number of other responsible people had seen these corrupt article headers in beta labs in late November 2012. That allowed us to narrow the time window to find the offending commit. Unfortunately, even though several people had seen the problem in the test environment, none of us *believed* that it was a real problem, because we did not yet trust beta labs to be an accurate model of Wikipedia. (That bug was my first real sign that we were on the right track with beta labs, and I *always* paid attention to bugs there afterward!)<br />
<br />
The story behind how that bug made it to production is instructive. The source of the bug was a pull request for the Wikipedia editor from a contributor/developer in Europe. They were lobbying hard to get the PR merged, but the PR had not passed code review after several revisions, and a number of responsible developers were not willing to merge that PR, because they suspected it might not be completely sound. Late in the evening on Thanksgiving Day, while the US Wikimedia Foundation staff were on holiday, the author of the PR persuaded a European developer to approve the PR and merge it. No one noticed. Merging that PR caused the changed code to be deployed to beta labs automatically, where I and my colleagues saw the bug but did not believe the test environment. From beta labs the code went to production automatically. <br />
<br />
This was a bug that could not be identified with normal observability tools, but only by testing-- and the testing failed. The bug was active for a number of days before we were able to revert it, but we were not able to revert all the corrupt sub-headers the bug created. One of our developers had to painstakingly find and fix every one of a couple of hundred bad headers on Wikipedia articles. <br />
<h3>
RRRAR!</h3>
Testing, Observability, and DevOps are interlocking practices that support each other in order to get good code to production quickly. But even so, things can go wrong. Before things go wrong, we should have a plan in place, and know our options: Rollback, Replay, Revert, and Regression are all strategies to consider. <br />
<br />
<br />
<br />Chris McMahonhttp://www.blogger.com/profile/16339861008904941703noreply@blogger.comtag:blogger.com,1999:blog-20165167.post-72905591218799007762019-05-22T13:35:00.000-06:002019-05-22T19:39:21.495-06:00TOAD FTW! Evaluating a Test Suite: A TOAD Thought Experiment <h2>
TOAD is</h2>
Testing<br />
Observability<br />
AND<br />
DevOps<br />
The AND is important!<br />
<br />
Disclaimer: In my career I have done most of the things I describe below, but never tied them all into one project. The following should be possible:
<br />
<h2 id="testing_a_system">
Testing a System</h2>
Suppose we have a software system of reasonable complexity. Suppose our
system is comprised of a front end with a user interface (UI) and a back
end, thus a client and a server. The front end and the back end communicate
via an application programming interface (API) of some sort. This is a
common architecture of many software systems.<br />
<br />
Suppose further that our front end and our back end are well-designed.
They have unit tests, and meet whatever definition of quality you would
want to apply.<br />
<br />
Because our system is reasonably complex, we want to have a suite of
end-to-end tests that exercises the entire software-and-data stack, that
demonstrates that the users can do the things they need to do, and that
the front end and the back end are communicating properly in the service
of the users' requests.<br />
<br />
In a system of reasonable complexity, there are a finite number of paths
through the UI. We create a suite of tests that exercise these paths.
Suppose for this thought experiment that our tests exercise all of the
paths through the application available to the user. (In other places I
have called this sort of test design "feature coverage". In this case we
have 100% feature coverage.) The tests operate on a carefully chosen
well-known set of data. The tests are designed well: each test navigates
a path through the application that changes the application state at
least one time; each test makes at least one assertion about the changed
state of the application, and reports any unexpected state it
encounters.<br />
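As a hedged illustration of that design rule, here is a plain-Ruby sketch with a hypothetical in-memory application standing in for the real UI: each test navigates a path, changes application state at least once, and makes at least one assertion about the changed state.

```ruby
# A hypothetical stand-in for the application under test. In a real
# suite these calls would drive a browser; the design rule is the same.
class FakeApp
  attr_reader :state

  def initialize
    @state = { logged_in: false, cart: [] }
  end

  def log_in
    @state[:logged_in] = true
  end

  def add_to_cart(item)
    raise "must log in first" unless @state[:logged_in]
    @state[:cart] << item
  end
end

def test_purchase_path
  app = FakeApp.new
  app.log_in                    # navigate the path...
  app.add_to_cart("Product A")  # ...changing application state
  # every test asserts on the changed state and reports the unexpected
  raise "unexpected state" unless app.state[:cart] == ["Product A"]
  true
end
```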
<br />
Now we have a question: how good is our end-to-end test suite? We
believe that the tests cover all the interactions that a user would
experience in operating the application, and we think we can expose all
the errors or mistakes that a user might encounter-- but how can we know
that for sure?
<br />
<h2 id="observability_at_a_high_lvel">
Observability at a High Level</h2>
In the Wikipedia article on Observability it says that a system is
observable if we "...can determine the behavior of the entire system
from the system's outputs." Each of our end-to-end tests changes the
state of the system and then asserts something about the changed state.
We have already accomplished a certain level of observability, and
because we have good feature coverage, we can say quite a lot about the
state of what is probably the most important part of the system--
everything that users can do.<br />
<br />
But we can't know about states that potentially exist but that we did
not exercise. However, we can think about our architecture, and we can
make our systems more observable than they are. Examining the front end
of the system, it is likely that our end-to-end tests have in fact
exercised all of the calls to the back end that exist. Since we
stipulated that this is code of high quality, it is unlikely that there
is dead code in the form of unused calls to the back end of the system.<br />
<br />
The back end is more difficult to reason about. It is entirely possible
that the back end is capable of supplying more information than the
front end is capable of consuming. There is also a chance that there may
be paths through the application that we failed to discern, and there
could be data that cause problems we have failed to anticipate.<br />
<br />
So our system is in fact observable because we can change its state and
infer its status from those changes. We can be reasonably sure that we
are exercising the capabilities of the front end to the greatest extent
possible. And we can say that our test suite is probably pretty good--
but the possibility of "unknown unknowns" still exists.
<br />
<h2 id="observability_at_a_deep_level">
Observability at a Deep Level</h2>
Google "software observability" and find a wealth of tools, approaches,
and other material on the subject. A simple description of software
observability is that code is instrumented in such a way as to expose
state changes in the application. The records of these state changes may
be consumed so as to analyze the behavior of the system, with the goal
of exposing problems in behavior, performance, errors, etc.
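A minimal sketch of that idea in Ruby, assuming a toy in-process event sink; real observability tooling would ship these records to a log pipeline or tracing backend, and the component and method names here are hypothetical.

```ruby
# Instrumented code emits a record for each state change; the records
# can later be consumed to analyze the behavior of the system.
module Instrumented
  EVENTS = []  # toy in-process sink; a real system would export these

  def self.record(component, action)
    EVENTS << { component: component, action: action, at: Time.now }
  end
end

class AccountService
  def create_user(name)
    Instrumented.record(:account_service, :create_user)
    { name: name, created: true }
  end
end

AccountService.new.create_user("X")
# Instrumented::EVENTS now holds a record of the state change.
```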
<br />
<h2 id="toad_ftw">
TOAD FTW</h2>
This is an essay on TOAD, Testing, Observability AND DevOps. We want to
be able to evaluate how effective our suite of end-to-end tests is, and
we think TOAD can help. So far we have a good grasp of a certain kind
of end-to-end testing, and two kinds of observability. That second level
of observability is where DevOps comes in.<br />
<br />
Our code is instrumented such that it emits detailed notices of state
changes in the system. We can consume and analyze those notices and gain
a good understanding of the behavior of the system. We can create a
profile of the behavior of the production system over a period of time
that shows things like the kinds of state changes occurring across the
platform and across the code base.<br />
<br />
We can run our end-to-end test suite with the same instrumented code and
generate the same profile of the behavior of the system under our suite
of tests. When we compare the production profile to the test profile,
we expect to see the same sort of instances of behavior and state
changes in both profiles. When this is true, we can say with certainty
that our suite of end-to-end tests is valuable and its coverage is excellent. Note that the
instrumented code in production does not participate in that first kind
of observability that we noted, where we make detailed assertions about
the state of the system. The production profile only tells us what
happened. The test system profile tells us that the same things happened
when we tested the system, but only the test system tells us that the <b>correct</b> things happened.<br />
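One hedged way to sketch that profile comparison in Ruby, using made-up event names and treating a "profile" as a simple tally of event types seen over some period:

```ruby
require "set"

# Build a behavior profile: a tally of each event type observed.
def profile(events)
  events.group_by { |e| e }.transform_values(&:length)
end

# Hypothetical event streams from the instrumented code.
production_events = %w[login search checkout checkout bulk_export]
test_events       = %w[login search checkout]

prod_profile = profile(production_events)
test_profile = profile(test_events)

# Behavior present in production but absent from the test profile points
# at parts of the system we may have neglected to put under test.
coverage_gap = prod_profile.keys.to_set - test_profile.keys.to_set
# coverage_gap contains "bulk_export"
```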
<br />
But what if the production profile shows activity in areas that the test
profile does not? Remember that the design of our tests means that parts
of the system, the back end in particular, are necessarily
something of a black box. Our test suite has no mechanism to discover
dead code or unsuspected communication channels. This is where our
second kind of observability and our DevOps operations can illuminate
parts of the system that we may have neglected to put under test. We
designed the best end-to-end tests that we could, and we can use TOAD to
tell us if we did it right the first time, or if we need to create more
test coverage than we have.Chris McMahonhttp://www.blogger.com/profile/16339861008904941703noreply@blogger.comtag:blogger.com,1999:blog-20165167.post-37345623535700682702019-02-23T09:54:00.001-07:002019-02-23T10:37:53.229-07:00My Remote Retrospective ProcessPossibly the most important aspect of an agile process is the
retrospective. A retrospective usually happens in a team meeting,
generally at the end of every agile iteration or sprint. While there are
any number of ways to run retrospectives, the object is to discuss
three questions:<br />
<br />
What is working well?<br />
What is not working well?<br />
What should we change?<br />
<br />
The first question tends to be the easiest to answer, and it is tempting
to just skip it, but that is dangerous. It is every bit as important to
celebrate the team's success as it is to grapple with the team's issues
and problems. Positive reinforcement is more powerful than negative
reinforcement.<br />
<br />
The second question also tends to be easy to answer in teams where the
members feel safe to discuss work honestly. It is important to surface
problems so that the whole team is aware of the current issues and why
those issues affect the work.<br />
<br />
The third question tends to be hard to answer. There is an entire body
of literature devoted to running retrospectives, most of which focuses
on dealing with that third question.<br />
<br />
Before I suggest any answers, I would like to emphasize that a
retrospective is an explicit invitation to change the development
process for the whole team. And in fact the whole team <i>should</i> change
its process over time, as a direct result of discussion in
retrospectives and along the way. If you start doing your agile process
"by the book" and a year later you still have the same process in place,
you are doing it wrong.<br />
<br />
The most effective retrospective change process I ever saw came from Thoughtworks. That was with a (mostly) co-located team. I've adapted the process slightly for a remote team and it works something like this:<br />
<br />
The retrospective happens at the end of a sprint in a conference call with everyone attending. One person acts as facilitator (so far just me, but I intend to start recruiting others now that our retrospectives are becoming routine) and shares their desktop open to a wiki page that they edit in real time. I keep the discussion open and the tone light and friendly.<br />
<br />
We ask the "what went well?" question and write down what people say in a list. As facilitator, I try to pay attention to who might be more shy, or new to the team or such, and ask them specifically about their experience in the last sprint. On one recent sprint, I specifically asked one person who is notoriously grumpy to name something that went well, and he really came through! I do not "go around the room" or enforce any kind of structure. I find this open format (with some friendly support from the facilitator) over time encourages participation from those who might be less inclined to contribute, whether they're shy, or skeptical, or for whatever other reason not excited about the exercise.<br />
<br />
As a result of answering "What is not working well?" the team creates a
bullet list of current issues and problems. Say for example the team
identifies five problems like:<br />
<br />
Problem 1: X is unpleasant<br />
Problem 2: Y is broken<br />
Problem 3: Z is missing<br />
Problem 4: Definitions are vague<br />
Problem 5: Process Q is onerous<br />
<br />
Note that while some problems will be persistent over several
iterations, other problems will pop up and then go away as they are
addressed.<br />
<br />
What we did at Thoughtworks was, having created the list of problems, we
gave everyone on the team a certain number of votes; say three votes
each for the five problems. Everyone on the team could cast their three
votes to address whatever problems they felt were most important. One
person might cast all three votes for "Problem 3: Z is missing" because
they are blocked by the missing Z. Another person might cast one vote
for Problem 4, one vote for Problem 5, and one vote for Problem 3,
creating now four votes for Problem 3. The problem with the most votes at the end of the exercise is the most important problem affecting the whole team right now.<br />
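The voting exercise itself is easy to sketch in Ruby; the names and ballots here are hypothetical, with each person distributing three votes across the problem list.

```ruby
# Each team member gets three votes to cast across the problem list;
# votes may be stacked on a single problem.
ballots = {
  "alice" => ["Z is missing", "Z is missing", "Z is missing"],
  "bob"   => ["Definitions are vague", "Process Q is onerous", "Z is missing"],
  "carol" => ["X is unpleasant", "Z is missing", "Definitions are vague"],
}

tally = Hash.new(0)
ballots.each_value { |votes| votes.each { |problem| tally[problem] += 1 } }

# The problem with the most votes is the one hurting the team most now.
top_problem = tally.max_by { |_, count| count }.first
# top_problem is "Z is missing", with 5 votes
```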
<br />
This is easily done on a whiteboard with a co-located team. For a remote team, my first attempt at a voting process was with a table on the wiki. It didn't work very well. Since then we have been using a <a href="https://marketplace.atlassian.com/apps/1210786/multivote-for-confluence" target="_blank">plugin for the Confluence wiki called "MultiVote"</a> that <i>almost</i> does what I want but not quite-- it won't <a href="https://community.atlassian.com/t5/Marketplace-Apps-questions/Feature-request-vote-multiple-times-for-the-same-id/qaq-p/975587" target="_blank">allow one person to cast all their votes for a single problem</a>. I'm surprised that more than ten years after I did this on a whiteboard at Thoughtworks, I am unable to find software that does it today. But the existing MultiVote plugin is close enough that it accomplishes what I'm after: to expose the problems that are currently annoying the most people. <br />
<br />
<br />
When the votes are tallied, the team commits some time to investigating the problem with the highest number of
votes. The team does not necessarily solve the problem, but we commit to looking for a
solution or workaround, or to somehow furthering a solution if one is
possible at all. At Thoughtworks the biggest problem every iteration would be assigned to a "champion" to lead this, but right now in my current team no one person has any particular responsibility. It still works pretty well.<br />
<br />
The problem-identifying and problem-solving activity is repeated for every iteration. And-- this is very important-- at every retrospective, we revisit the problems from the previous iteration and discuss whether the problems with the most votes have been solved, or have gone away, or if they need to carry over into the current iteration. This looking back is really important. It closes the feedback loop. Otherwise, people would get a sense that they have problems, but nothing is ever done about them. Over time,
this proved to be an extremely powerful agent for positive change.<br />
<br />
(I usually revisit last iteration's problems between the "what went well?" part and the "what didn't go well?" part, but sometimes I do it at the very end of the retrospective instead.)<br />
<br />
It is important to note that this activity is powerful <i>over time</i>. For
example, as the only QA person on the team at Thoughtworks, I often had problems that
were different than those of the many developers on the team. So I would
introduce a problem to the list, I would cast my votes for it, and some
other issue would top the list to be addressed. But on the next
iteration, if I still had the problem, my problem would begin to affect
others on the team, and would gradually gather more votes and be more
important.<br />
<br />
My favorite example was that this team had an issue creating test data
for the application. I flagged it as a problem for myself, but it was
not immediately a problem for anyone else. But as our code progressed,
more and more developers and others on the team also found a need to
create test data, and the problem eventually became the #1 issue for
everyone. We assigned various champions to the problem, even eventually a
Computer Science Ph.D./architect/technical project manager, and even he could not
solve it. Eventually we solved our test data problem on that team by
hiring a whole new developer with this particular area of expertise.<br />
<br />
I started this saying "a retrospective usually happens in a team
meeting" but I'll end it by saying that change is the essence of agile
practice, and we can make changes at any time for any reason.
Retrospectives are an important tool to guide change in a productive
way, but they are not necessarily the only path to change.<br />
<div class="adL">
<br /></div>
Chris McMahonhttp://www.blogger.com/profile/16339861008904941703noreply@blogger.comtag:blogger.com,1999:blog-20165167.post-16490651260047982062018-08-09T13:29:00.000-06:002018-08-09T13:29:12.309-06:00Why I like Cucumber (beyond BDD)<br />There is a sentiment in software development that if you do not have a working BDD practice then Cucumber is just unnecessary overhead. I understand this position, but I disagree, based on my own experience and my own practice. I find Cucumber is particularly valuable in automated browser tests.<br /><br />I've been using Cucumber for automated browser tests at work just about every day for the last six years or so. At the same time, I have never worked with a full-on BDD team. Beyond BDD, here are three aspects of Cucumber I find particularly valuable:<br /><br />
<ul>
<li>Cucumber creates a low barrier to entry for anyone at any time to contribute and understand the project.</li>
<li>Cucumber's Given/When/Then syntax provides a design guide particularly well suited for browser tests, especially using "When" in a particular way.</li>
<li>When a test fails, Cucumber provides a plain-English description of what the test does that may not be immediately apparent from the code or the nature of the failure.</li>
</ul>
<br />I tend to write browser test suites of significant size and scope, whose working life extends to years. Over the course of years, people will join the project and leave the project, people will have various levels of engagement with the test suite. Cucumber allows someone new not only to quickly grasp the nature of the tests, but also the nature of the project itself. Well-written Cucumber Scenarios are an effective way to describe the behavior that the user sees. I've used Cucumber Features to get interns started, I've used Cucumber features as a starting point for developers familiar with other languages, and I've used Cucumber features to explain current function to people on the team not involved in the day-to-day development work. <br /><br />I also value Cucumber as a design guide. Of course a Given step represents "setup", a condition that must be in place for the test to be meaningful. And of course a "Then" step should always contain an assertion about the final state of the application being tested. But I think that the "When" step has a particular meaning for browser tests that may not be true for other kinds of testing.<br />
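As a hedged sketch of that design guide, here is what the convention might look like in Ruby step definitions. The Given/When/Then stubs and the tiny runner below stand in for the cucumber gem's DSL so the sketch is self-contained, and the page interactions are hypothetical; in a real suite the steps would drive a browser.

```ruby
# Minimal stand-ins for the cucumber gem's step-definition DSL.
STEPS = {}
def Given(pattern, &body); STEPS[pattern] = body; end
def When(pattern, &body);  STEPS[pattern] = body; end
def Then(pattern, &body);  STEPS[pattern] = body; end

Given("a logged-in user") do
  @session = { logged_in: true, cart: [] }  # setup only, no assertions
end

When("the user adds an item to the cart") do
  @session[:cart] << "Product A"            # exactly one action (a state change)
end

Then("the cart contains the item") do
  raise "cart is wrong" unless @session[:cart] == ["Product A"]  # assertion
end

# A tiny runner: execute the scenario's steps against a shared "world"
# object so instance variables persist across steps, as Cucumber does.
def run_scenario(*steps)
  world = Object.new
  steps.each { |s| world.instance_exec(&STEPS.fetch(s)) }
  true
end
```

Enforcing setup-only Givens, single-action Whens, and asserting Thens is the convention; the plumbing above is just enough to show it running.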
<br />End-to-end browser tests are the only kind of tests that change the state of the entire application each time an element on a web page is changed. I treat When steps as verbs. Just as every Then step should have an assertion, every When step should have an action, like click(), select(), enter(). A test suite that enforces a convention that every Given step is part of setup, and every When step contains an action, and every Then step contains an assertion turns out to be a powerful and surprisingly maintainable set of tests, even when the suite grows large. <br /><br />Finally, I tend to write tests whose working life is long, typically years long. In a span of years, an application can change significantly with old tests still being valuable. Institutional knowledge of the behavior of the application will be gained and lost over a span of years. <br /><br />I find that when one of my tests fails after a year, or two years, or three years, the cause of the failure may not be immediately apparent from the failure message or the from the code that caused it. Just as a well-written Cucumber Scenario is valuable to someone new, when a test fails after a long time, it is also valuable as a plain-English description of what the test was intended to accomplish. Chris McMahonhttp://www.blogger.com/profile/16339861008904941703noreply@blogger.comtag:blogger.com,1999:blog-20165167.post-19373928589076025742018-08-03T14:09:00.000-06:002018-08-03T14:09:38.526-06:00Watir: the first five years<br /><br />Some of the Watir community has been discussing the history of the project. Here I try to set down some notable things that I remember about that time. <br /><br />We have to start the story of Watir with Ruby. 
The first English documentation for Ruby was published in 2000, and it garnered a lot of interest, particularly as it was so amenable to creating Domain Specific Languages (DSLs), which caught the attention of a number of people working in testing, and in the Agile world. Python 2.0 also came out in 2000, and the dominant scripting language at the time was Perl.<br /><br />Of all the browsers available at that time, only Internet Explorer exposed an API for automating browser actions, via a COM interface. In 2001, Chris Morris published a Ruby library that exploited IE's COM interface called WTR, for Web Testing in Ruby.<br /><br />At that time, Open Source software was often seen as inferior or even downright suspicious, and was poorly understood by most businesses. Test automation products were exclusively proprietary, and the market was dominated by Mercury Interactive and Borland SilkTest. These products were demonstrably flawed, but had no competition. Watir, and later Selenium, would change that. <br /><br />Somewhere around 2002 or 2003 Bret Pettichord and Brian Marick created a one-day training session called "Scripting for Testers" that was based on Chris Morris' IE driver. Brian had written a simple timeclock application, and they wrote some structured Ruby around Chris' driver code to show examples of how to go about testing the timeclock application in an automated way. The Ruby interactive shell IRB was also featured in the training. (Aside: I downloaded that code and tried to get it to work, but it was always pretty buggy and difficult to get running. I tried to persuade my managers at the time to let me attend Scripting for Testers, but failed on several occasions. I would later end up teaching SfT myself, once at STAREast, once at STARWest, and once at Agile2006. There are a couple of interesting things that happened at the Agile2006 SfT. This was the only SfT I taught solo, Bret was not there. 
Owen Rogers, whom I had never met, had agreed to assist me, but he canceled at the last minute, and in fact contacted the conference organizers without my knowledge attempting to cancel the workshop. I stopped that, and led the training myself. The level of skill and knowledge at Agile2006 was significantly more sophisticated than at the STAR conferences, and the students moved very quickly through the material. Just as I was starting to feel overwhelmed as the sole instructor, Elisabeth Hendrickson and Michael Bolton happened to stop by, and they helped me out for a while. )<br /><br />In 2004, Paul Rogers in Calgary had a child. During his parental leave, while caring for his new infant, he re-wrote the entire WTR codebase from scratch. Somewhere around this time they also decided to change the project name from WTR to Watir, "Web Application Testing In Ruby." Where WTR had been buggy and difficult to use, Paul's rewrite was a massive improvement. Watir began to see a lot of interest from the Ruby community, the Agile community, and the testing community. (Aside: Watir was the first browser automation tool, commercial or open source, that supported iframes. I was testing an iframe-based product at the time, and I submitted a failing test to the Watir mail list. As I recall, Paul told me it would take him about three days to add the iframe support. I replied that I had been waiting for this for three years, and to take his time.)<br /><br />Today it is hard to understand just how dominant the test automation vendors were at that time, and how obscure the new open source alternatives were. One of those years in the early 2000s, Mercury had their user conference in Las Vegas. Elton John was the entertainment, they were that big. Bret, and a growing number of other technical testers, were adamantly against these companies and their practices, and Watir, and later Selenium, were the means to subvert them. 
Note that Selenium got its name because Selenium is an antidote for mercury poisoning, and Watir because water ruins silk. By about 2003, Brian Marick had become the Technical Editor for the magazine Software Testing and Quality Engineering, which later changed its name from STQE to Better Software. (The STQE acronym is the basis for SQE's web publication 'Sticky Minds'.) STQE/Better Software supplied a platform for a lot of us to evangelize open source test tools. I published my first professional article in Better Software in 2004. <br /><br />Bret hired me at Thoughtworks in 2005, where I met Jason Huggins and got a look at an early version of Selenium. Ironically perhaps, I spent my tenure at Thoughtworks doing no UI testing at all, but rather spent my time testing in an API. It would not be until 2012, seven years later, that I would use Watir professionally again, thanks to Željko Filipin and Jeff "Cheezy" Morgan. (I did maintain an amateur interest in Ruby, Selenium, and Watir, and I followed the work that Jari Bakken did with the first Watir wrapper for Selenium WebDriver, which was absolutely brilliant. With some help from Jari, in 2010 I used Selenium/webdriver in Ruby to win an iPad from SauceLabs in a contest for "best use of Selenium that is not testing". Perhaps my most far-reaching contribution to the overall Selenium community was in 2011 when I posted on Twitter "if I ever meet Jari Bakken I'm going to buy him a steak bigger than his head." Today Selenium people celebrate with large steaks. I'll also mention here that I believe that Watir was a strong influence on the Selenium version 1 API, in that Selenium v1 supplied a rich set of automatable actions. Webdriver changed the design goals of the Selenium project. The Selenium/webdriver API had only about a third as many methods as Selenium RC. 
It was Jari's work re-imagining Watir as a superset of Selenium actions that kept the project relevant, and also made Watir a model for other modern rich browser automation projects.)<br /><br />Finally, I would like to emphasize that Watir has never really been about code, Ruby or otherwise. I believe an early Watir slogan went something like "test tools by testers for testers". Watir more than anything else is a particular approach to browser test design that simply did not exist before. Watir has certainly evolved, and each of the Watir maintainers has taken the project in particular directions, but the Watir community is a group of people who value ease and elegance in browser automation design. Chris McMahonhttp://www.blogger.com/profile/16339861008904941703noreply@blogger.comtag:blogger.com,1999:blog-20165167.post-74702336896707141232018-01-22T11:57:00.000-07:002018-01-22T11:57:52.401-07:00Long Term Remote Pair Programming a Complex Project<br />This is the story of a really great project I did while working for Salesforce.org. I have done a significant amount of remote pair programming over the last ten years, but this project was extraordinary in a number of ways. For one thing, it was a really complicated problem that demanded a technically advanced solution. For another thing, it took almost an entire year to finish-- one hour per week. <br />
<h4>
The Problem</h4>
I will try to give you the background in a way such that your eyes don't glaze over: in order to work with data in a Salesforce instance via the API, you address "Objects" and "Fields". (These are actually tables in a database that may be addressed by a poor and crippled version of SQL.) For example, here is a description of <a href="https://developer.salesforce.com/docs/atlas.en-us.object_reference.meta/object_reference/sforce_api_objects_account.htm" target="_blank">the Account object whose first field is AccountNumber.</a> <br />
<br />
If you are a developer on the Salesforce platform, you can add your own Field to the Account object, but you have to append "<b>__c</b>" to it, like "<b>MyField__c</b>". You can also create your own Object, with the similar convention "<b>MyObject__c</b>". But when you get really serious about developing on Salesforce, you also have to create a "namespace" as a unique identifier for your "package", which would then be "<b>foo__MyObject__c</b>" and thus also "<b>foo__MyField__c</b>". (Sorry about that, I'm done now, I will spare you any more tedious detail.)<br /><br />My tests had to set up and tear down data via the Salesforce API. There is <a href="https://github.com/restforce/restforce" target="_blank">a low-level Ruby client for the Salesforce API</a>, but it demands the literal names of the Objects and Fields. My problem was that at runtime, the tests had no way to know the state of namespaces of the custom Objects and Fields in the target environment. Any of these conditions could be true, or false:<br /><br />
<ul>
<li>Custom objects with no namespace</li>
<li>Custom fields with no namespace</li>
<li>Custom objects and fields with arbitrary namespace</li>
<li>Custom objects and fields with multiple namespaces</li>
<li>Custom fields with multiple arbitrary namespaces on objects with arbitrary namespaces or no namespaces</li>
</ul>
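To make the ambiguity concrete, here is a toy sketch with made-up names ('foo' stands in for any package namespace; the helper is hypothetical, not part of SFDO-API) showing how the literal API name of the same logical Object shifts with the namespace state of the target org:

```ruby
# Hypothetical helper: build the literal API name a Salesforce client
# would need, given whatever namespace (if any) the target org uses.
def api_name(object, namespace = nil)
  namespace ? "#{namespace}__#{object}__c" : "#{object}__c"
end

api_name('MyObject')         # => "MyObject__c"      (org with no namespace)
api_name('MyObject', 'foo')  # => "foo__MyObject__c" (namespaced org)
```

A call hard-coded against either literal name works in only one of these environments, which is exactly the problem the rest of this post is about.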
<h4>
A Little Help From My Friends</h4>
I do not consider myself a particularly good programmer. As a QA person, I have written only a fraction of the amount of code that a line programmer has. I have, however, <b><i>read</i></b> (and debugged!) an enormous amount of code over the last twenty years, such that I can tell a good solution from a poor solution, a good programming idea from a bad programming idea.<br />
<br />
All of my ideas to solve my test data problem were bad. I needed help from someone who was a better Ruby programmer than me, and who also knew Salesforce better than me. <br /><br />My colleagues introduced me to <a href="https://twitter.com/codefriar" target="_blank">Kevin Poorman</a>. I explained what I was trying to do, and Kevin graciously agreed to help me-- for one hour per week. <br /><br />So every Thursday afternoon Kevin and I would join a teleconference session. It took almost a year to solve my test data API problem completely. This was my first experience with metaprogramming, and it produced <a href="https://rubygems.org/gems/SFDO-API" target="_blank">my first Ruby gem, SFDO-API</a>. <br />
<h4>
One Bite at a Time</h4>
...is of course the punch line to the old joke "How do you eat an elephant?" but it is also a good approach to tackling big projects in short sessions. The big problem was finding out at run time what namespaces, if any, were on the particular fields and objects we needed to work with. Then we needed to be able to <br /><br />
<ul>
<li>create instances of Objects with Fields with those namespaces</li>
<li>update instances of Objects with Fields with those namespaces</li>
<li>query the target environment using those namespaces</li>
<li>delete instances of Objects with the proper namespaces</li>
</ul>
<br />The good way to handle this sort of situation is to use metaprogramming, particularly Ruby's "method_missing" feature. Ultimately we had <a href="https://github.com/SalesforceFoundation/SFDO-API/blob/master/lib/SFDO/API.rb#L215" target="_blank">methods for <b>delete_all_MyObject</b>, <b>create_MyObject</b>, and a method "<b>select_api</b>"</a> that would parse the namespaces inside the select() query. (It turned out that we got the '<b>update_api</b>' method for free without needing method_missing to generate it.) If you want to know the details about how SFDO-API works in practice, <a href="https://github.com/SalesforceFoundation/SFDO-API/blob/master/README.md" target="_blank">the README is pretty good</a>. <br /><br />We created these open-ended methods inside of method_missing one by one: Objects first, then Fields once Objects were working, one hour per week at a time.<br />
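As an illustration of the technique, here is a simplified sketch, not the actual SFDO-API code: ApiClient, its namespace handling, and the stand-in method bodies are all hypothetical. The point is only to show how method_missing can mint namespace-aware create_* and delete_all_* methods at runtime:

```ruby
# Simplified sketch of generating API methods with method_missing.
class ApiClient
  def initialize(namespace = nil)
    @prefix = namespace ? "#{namespace}__" : ''
  end

  # 'MyObject' -> 'foo__MyObject__c' (or 'MyObject__c' with no namespace)
  def full_name(bare)
    "#{@prefix}#{bare}__c"
  end

  # Mint create_<Object> and delete_all_<Object> methods at call time.
  def method_missing(name, *args)
    case name.to_s
    when /\Acreate_(\w+)\z/
      create_record(full_name(Regexp.last_match(1)), args.first || {})
    when /\Adelete_all_(\w+)\z/
      delete_records(full_name(Regexp.last_match(1)))
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    name.to_s.match?(/\A(create|delete_all)_\w+\z/) || super
  end

  private

  # Stand-ins for the real Restforce calls.
  def create_record(api_name, fields)
    "POST #{api_name}"
  end

  def delete_records(api_name)
    "DELETE #{api_name}"
  end
end
```

So `ApiClient.new('foo').create_MyObject(Name: 'x')` resolves to a call against `foo__MyObject__c`, while the same test code running against a non-namespaced org resolves to `MyObject__c`, with no conditionals in the tests themselves.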
<h4>
The Weekly Routine</h4>
Besides solving my test data problem, I also wanted to learn how metaprogramming in Ruby actually works, because it was something I had read about but that I had never done in practice. I find that it is almost always the case that on a remote pair programming session, one person is teaching and one person is learning. I always have the learner do the typing, while the teacher does the navigating. Getting that stuff under your fingers as you learn it is critically important. <br /><br />Kevin and I quickly fell into a routine. We never knew where we would end up at the end of any given session, but every week before the session started I would review the current state of our code and isolate the next logical problem to tackle. I tried really hard to isolate bits of the next problem small enough to be solved in an hour. I was usually successful. As I said, I have read an extraordinary amount of code for a QA person, so I could visualize the next logical step to tackle, even if I had no idea how we would tackle it. So every Thursday when Kevin joined our meeting, we would immediately dive into the next conceptual challenge. <br /><br />And after the hour was up, I would spend some time tidying the code. Sometimes I would add comments to remind future us of where we'd stopped in the previous session. As I got better and began to master some of these new concepts, I would sometimes right away fix bugs we had left behind. Then I would put away the code until the next week's session. I mentioned from time to time that "<a href="https://twitter.com/chris_mcmahon/status/836698211653857280" target="_blank">Kevin wrote the good parts, the rest is mine</a>" but actually I did get a whole lot better as time went on.<br />
<h4>
The Big Picture</h4>
Our first commit was July 2016, and my <a href="https://github.com/SalesforceFoundation/SFDO-API/commit/37a151421fb400f116999c3ba151cd55aa4b35c0" target="_blank">last bug fix was May 2017</a>, so a little less than a year in total. Maybe forty or so Thursday afternoon sessions. About forty hours of Kevin's time, and maybe a little more than double that of my time, because I spent time each week setting up the session problems beforehand and tidying up afterward. <br /><br />Sometimes I wonder if maybe it would have been better just to knock out the whole project in a single push over a couple of weeks instead of over a year, but had we done it that way I am sure I would never have been able to understand the code as deeply as I do having lived with it and studied it and watched it grow week after week.<br /><br />Since I fixed the last bug in May 2017 this code has been called thousands of times for test data from at least four repositories, and as far as I can tell has not failed. I am really proud of this. <br />
<h4>
PS: Technical Details Of The Last Bug Fix</h4>
It took me several hours to isolate the problem and figure out how to fix it. The problem was a particular edge case when the non-namespaced string for one Field was a substring of the non-namespaced string for a different field. For example, a field MyAddress is a substring of NewMyAddressValue. Until I fixed that bug, every time I tried to address the NewMyAddressValue field it would come out looking something like "foo__Newfoo__MyAddress__cValue__c". It was really hard to see the difference between gsub() and sub() here; it took me a while to figure out where the problem was. Chris McMahonhttp://www.blogger.com/profile/16339861008904941703noreply@blogger.comtag:blogger.com,1999:blog-20165167.post-44797731263867262102018-01-04T08:11:00.000-07:002018-01-05T19:22:34.245-07:00Who I Am and Where I Am January 2018<br />
As of January 2018 I resigned my position as "Senior Member of the Technical Staff, Quality Assurance" <a href="http://www.salesforce.org/" target="_blank">at Salesforce.org</a>. I have more than twenty years experience in testing user interfaces and APIs across a wide variety of platforms. If you would like to contact me, my <a href="https://twitter.com/chris_mcmahon/" target="_blank">DMs on Twitter are open</a> or by email at christopher dot mcmahon at gmail. I do not use Facebook, LinkedIn, or Skype. <br />
<br />
I have been working remotely for more than ten years. I enjoy telecommuting; it suits me nicely. In the past decade I have lived all over the western United States, including some time in Hawaii. <br />
<br />
Here are some points from my career that help tell the story of how I came to be here today: <br />
<br />
In 1997 I started testing 911 telecom location services, life-critical software for police/fire/ambulance dispatching for most of the USA. I tested these systems through Y2K and beyond. We saved the world. Don't let anyone tell you otherwise.<br />
<br />
In 2004 I was, as far as anyone knows, the first person ever to point the open source browser test automation tool Watir at a production system. Because of this, <a href="http://watir.com/" target="_blank">Watir</a> was the first ever automation tool, proprietary or open source, to support the automation of iframes and frames. Although it is radically different than it was in 2004, I still use Watir today; it has been my mainstay for the last six years.<br />
<br />
Also in 2004 I published my first professional article for <a href="https://www.stickyminds.com/resources/magazine-articles" target="_blank">Better Software magazine</a>. I would go on to publish many dozens of other professional articles about software development, testing, and methodology for a number of media vendors.<br />
<br />
In 2005 I gave my first conference presentations, <a href="http://www.uploads.pnsqc.org/proceedings/pnsqc2005.pdf" target="_blank">at PNSQC</a> (large PDF) and at <a href="https://starwest.techwell.com/" target="_blank">STARWest</a>. My PNSQC presentation became <a href="https://docs.freebsd.org/doc/6.1-RELEASE/usr/share/doc/en/articles/wp-toolbox/" target="_blank">official documentation for FreeBSD</a>. I was an early adopter of open source software and the Wiki Way; my presentation at PNSQC was a real-time installation and configuration of the <a href="http://twiki.org/" target="_blank">Twiki wiki</a> on FreeBSD that attendees could edit immediately.<br />
<br />
Also in 2005 I joined <a href="https://www.thoughtworks.com/" target="_blank">Thoughtworks</a>, where I worked with <a href="http://pettichord.com/" target="_blank">Bret Pettichord</a> and met <a href="https://twitter.com/hugs" target="_blank">Jason Huggins</a>, who was working on the project that would become <a href="http://www.seleniumhq.org/" target="_blank">Selenium</a>. Even before Thoughtworks I had been an early adopter of agile methods. I gave talks at the Agile2006, Agile2009, and Agile2013 conferences. <br />
<br />
After thirteen months at Thoughtworks I began working remotely. An early highlight of my remote career was working at Socialtext, an Enterprise wiki company. Many of my colleagues at Socialtext have become influential in their later pursuits.<br />
<br />
I created and hosted the Writing About Testing conference in 2009, and repeated it in 2010. I think that WAT has had a <a href="http://chrismcmahonsblog.blogspot.com/2010/02/writing-about-testing-listconf-update.html" target="_blank">positive influence on discourse in the field of software testing</a> in the time since. <br />
<br />
In 2012, as a direct result of my having met <a href="https://twitter.com/chzy" target="_blank">Jeff "Cheezy" Morgan</a> at a peer conference, <a href="http://filipin.eu/" target="_blank">Željko Filipin</a> and I founded the QA/testing and browser test automation practice at the Wikimedia Foundation, testing the software that powers Wikipedia. The end of my tenure at WMF <a href="http://mollywhite.net/wikimedia-timeline/" target="_blank">came at a difficult time</a>.<br />
<br />
In 2015 I founded the QA/testing and browser test automation practice at what was then the Salesforce Foundation and is today called Salesforce.org. I wrote<a href="https://rubygems.org/gems/SFDO-API" target="_blank"> my first Ruby gem</a> in the service of this project, a wrapper for the Salesforce API that uses metaprogramming to handle API calls for "objects" and "fields" that may or may not have arbitrary "namespace" values in the target Salesforce environment.<br />
<br />
From<a href="http://chrismcmahonsblog.blogspot.com/2016/06/who-i-am-and-where-i-am-june-2016.html" target="_blank"> time</a> to <a href="http://chrismcmahonsblog.blogspot.com/2013/03/who-i-am-and-where-i-am-march-2013.html" target="_blank">time</a> I find it helpful to write these "Who I Am and Where I Am" pieces. I hope my story has been of interest. Chris McMahonhttp://www.blogger.com/profile/16339861008904941703noreply@blogger.comtag:blogger.com,1999:blog-20165167.post-63907034521562212622017-12-21T13:07:00.003-07:002017-12-21T19:13:44.327-07:00Test Heuristic: Managing Test DataManaging test data is one of the most difficult parts of good testing practice, but I think that managing test data receives surprisingly little attention from the testing community. Here are a few approaches that I think are useful to consider for both automated testing and for exploratory testing. <br />
<br />
<h4>
Antipattern: Creating Test Data In The UI</h4>
<br />
But first I want to point out what I think is a mistake that many testers make. When you are ready to test a new feature or you are ready to do a regression test, or you are ready to demonstrate some aspect of the system, the data you need to work with should already be in place for you to use. Having to create test data from scratch before meaningful testing begins is an anti-pattern (and a waste of time!), whether for automated testing or for exploratory testing. <br />
<br />
<h4>
Create Test Data With An API</h4>
<br />
This is usually my favorite approach to managing test data. Considering this scenario in the context of an automated test, I usually will have a 'setup' step in the test that will use whatever API or APIs exist to create a set of records that the subsequent test will use. Then after the test runs, I have a teardown step that deletes all of the data created in the setup step. This keeps my test environment clean and ready for use by any test in any order. I often mark the records I create this way with a particular string based on random numbers or a date-time value, so I always know for certain that I am working with the data I expect to find. <br />
<br />
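The setup/teardown-plus-marker pattern described above can be sketched in a few lines. This is a hedged illustration, not any particular framework: the in-memory RECORDS store and the *_via_api helpers are stand-ins for real API calls against the system under test.

```ruby
require 'securerandom'

# Stand-in for the system under test's data store.
RECORDS = []

# Stand-ins for real API calls.
def create_account_via_api(name)
  RECORDS << { name: name }
end

def delete_accounts_via_api(marker)
  RECORDS.reject! { |r| r[:name].include?(marker) }
end

# Setup lays down records tagged with a random marker; teardown removes
# exactly those records, even if the test body raises, so the test
# environment stays clean and tests can run in any order.
def with_test_data
  marker = "qa-#{SecureRandom.hex(4)}"
  create_account_via_api("Test Account #{marker}")
  yield marker
ensure
  delete_accounts_via_api(marker)
end
```

Commenting out the body of the `ensure` clause is the exploratory-testing trick mentioned below: the marked data stays behind for manual use.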
I take this approach with exploratory testing as well. At times I will have a dedicated script that will use an API to load a set of data into a test environment, but more often I will simply comment out the tear-down steps in whatever automated test is appropriate, then use my partial automated test to lay down a set of data that I can use for exploratory testing, without my having to set it all up by hand. This is also a handy approach when designing automated tests. <br />
<br />
There can be drawbacks to using an API to set up and tear down data. In some cases the API can be quite slow. In other cases the nature of the data relationships among records can be unmanageably complex. Sometimes a different approach to managing test data is warranted. <br />
<h4>
</h4>
<h4>
Existing Data Store</h4>
<br />
It may be possible to save a '<a href="http://chrismcmahonsblog.blogspot.com/2017/02/use-golden-image-to-test-big-ball-of.html" target="_blank">golden image</a>' of a set of useful test data such that the set of data can be deployed at will to a test environment. In this case you could deploy the whole set of expected test data to a fresh test environment, or you could swap out a set of data that you have altered until it is no longer useful with the fresh set of pristine data that you need to work with. <br />
<br />
I once worked with an application that would happily address a number of different data stores, including <a href="https://www.sqlite.org/" target="_blank">SQLite</a> for development purposes. This made swapping test data sets in and out quite simple. <br />
<br />
<h4>
Extract From Production</h4>
<br />
While this is a practice I have seen almost exclusively in the mainframe world, it is an approach that may be appropriate in some situations. It may be that you could create some sort of Extract/Transform/Load (ETL) operation that would pull actual production data from a production instance into your test environment. Chris McMahonhttp://www.blogger.com/profile/16339861008904941703noreply@blogger.comtag:blogger.com,1999:blog-20165167.post-79388156508879289692017-12-14T17:05:00.001-07:002017-12-15T14:35:57.860-07:00Automated UI Test Heuristic: Sleep and Wait<h4>
Wait </h4>
<br />
In modern web applications, elements on the page may come into existence as a result of actions that the user may take, or they may disappear from the page as a result of actions that the user may take. In order to test modern web applications in the browser, it is rare that tests do not have to <a href="http://www.seleniumhq.org/docs/04_webdriver_advanced.jsp" target="_blank">wait for some condition or another to be true or false</a> before the test can proceed properly. <br />
<br />
In general, there are two kinds of waiting in automated browser tests. In <a href="http://www.rubydoc.info/gems/watir/Watir/Wait" target="_blank">the language of Watir,</a> these are "wait_until" and "wait_while". If you are not using Watir, you have probably already implemented similar methods in your own framework. <br />
<br />
My experience is that wait_until type methods are more prevalent than wait_while type methods. Your test needs to wait until a field is available to fill in, then it needs to wait until the "Save" button is enabled after filling in the field. Your test needs to wait until a modal dialog appears in order to dismiss it, then it needs to wait until the modal dialog is gone before the test proceeds. <br />
<br />
But wait_while style methods are still important. Your test may wait while a spinner is on the page. Your test may wait while an interim message like "Processing..." is on the page. <br />
<br />
And of course there are variations and elaborations on these. A sometimes useful method in modern Watir is "wait_until_present" which waits until an element on the page is both visible and allows interaction. <br />
<br />
These methods poll the page over and over again (typically multiple times per second) for a set amount of time until the condition they are waiting for is true before allowing the test to proceed. I often call these "polling waits" as I think that is a better description of the actual function of the wait. Polling waits are far and away your best choice for handling situations where elements on the page that your tests depend on come and go in response to actions in the pages. <br />
<br />
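For readers not using Watir, the mechanism of a polling wait is simple to hand-roll. This is a bare-bones sketch of the idea, not Watir's actual implementation:

```ruby
require 'timeout'  # for the Timeout::Error class

# Re-check a condition every `interval` seconds until it holds,
# giving up with an error once `timeout` seconds have elapsed.
def wait_until(timeout: 10, interval: 0.2)
  deadline = Time.now + timeout
  until yield
    raise Timeout::Error, "condition not met within #{timeout}s" if Time.now >= deadline
    sleep interval
  end
  true
end

# wait_while is the mirror image: proceed once the condition stops holding.
def wait_while(timeout: 10, interval: 0.2, &condition)
  wait_until(timeout: timeout, interval: interval) { !condition.call }
end
```

A test would then call something like `wait_until { browser.button(id: 'save').enabled? }`, where `browser` and the locator are hypothetical examples.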
<h4>
Sleep</h4>
<br />
But sometimes a polling wait just does not work. Web pages can do strange things, and while it should be somewhat rare, simply stopping your test for some number of seconds may be the best thing to do. <br />
<br />
For example, your web page may kick off a data update of some sort on the back end of the system for which there is no notification. While it may be appropriate to have the test issue API calls until the data update is complete, it may be more convenient simply to have the test sleep for a short time. <br />
<br />
Or it may be that the condition you would wait for is so ephemeral that you can't capture it in order to wait for it. In this case, a one- or two-second sleep might be appropriate. <br />
<br />
This is hard to explain, but from time to time I have seen web pages that will, for example, briefly manifest an interim page or otherwise send faulty signals from the browser, such that you can see Selenium try to find an element in the wrong DOM. On rare occasions you may want to have a test sleep while the server decides what the DOM presented to the user should be. <br />
<br />
<h4>
Prefer Polling Waits, Keep Sleep Waits Short</h4>
<br />
In general, it is best to prefer polling waits over sleep waits. On those occasions when you are forced to use sleep waits, I find that a good rule of thumb is to have sleep waits of no more than one or two seconds, maybe three seconds under difficult circumstances. Sleep waits of small duration will not hurt the performance of your test suite too badly. And of course you should take regular passes through your battery of tests to refactor sleep waits to polling waits whenever you can. Chris McMahonhttp://www.blogger.com/profile/16339861008904941703noreply@blogger.comtag:blogger.com,1999:blog-20165167.post-68406677089634058882017-11-26T13:23:00.000-07:002017-11-26T13:23:23.774-07:00Test Automation Heuristic: No ConditionalsA conditional statement is the use of if() (or its relatives) in code. Not using if() is a fairly well-known test design principle. I will explain why that is, but I am also going to point out some less well-known places where it is tempting to use if(), but where other flow constructs are better choices. My code looks like Ruby for convenience, but these principles are true in any language.<br />
<h4>
</h4>
<h4>
No Conditionals in Test Code</h4>
This is the most basic misuse of if() in test code:<br />
<br />
<pre>if (x is true)
  test x
elsif (y is true)
  test y
else
  do whatever comes next
</pre>
It is a perfectly legal thing to do, but consider: if this test passes, you have no way of knowing which branch of the statement your test followed. If "x" or "y" had some kind of problem, you would never know it, because your conditional statement allows your test to bypass those situations.<br />
<br />
Far better is to test x and y separately: <br />
<pre>def test_x
  (bring about condition x)
  (do an x thing)
  (do another x thing)
end

def test_y
  (bring about condition y)
  (do a y thing)
  (do another y thing)
end
</pre>
<h4>
No Conditionals in Framework Code</h4>
It is also good to avoid conditionals in other parts of your test framework. One example in my own framework is where I choose which browser to run my tests for. It is tempting to do something like<br />
<br />
<pre>if ENV['SELENIUM_BROWSER'] == 'firefox'
  caps.platform = 'Windows 10'
  caps.version = '53.0'
elsif ENV['SELENIUM_BROWSER'] == 'chrome'
  caps.platform = 'Windows 10'
  caps.version = '58.0'
# etc.
end
</pre>
In this instance, I think that a case() statement is not only more readable, but also more definitive as to what the environment variable options must be. Although a case() statement may have 'else' as an option, a case() statement directs the code exactly where you want it to be, instead of falling through a series of elsif() conditions to the end of the chain.<br />
<br />
<pre>case ENV['SELENIUM_BROWSER']
when 'internet_explorer'
  caps.platform = 'Windows 10'
  caps.version = '11.103'
when 'chrome'
  caps.platform = 'Windows 10'
  caps.version = '58.0'
when 'firefox'
  caps.platform = 'Windows 10'
  caps.version = '53.0'
else
  puts 'Environment variable SELENIUM_BROWSER is not valid'
end
</pre>
<h4>
No Conditionals for Irrelevant Conditions</h4>
From time to time, test automation has to accommodate local conditions that have no bearing on the test itself. When this happens, it is tempting to say "if (stuff is in the way)...". In these cases, I prefer a try/catch operation (in Ruby, this is begin/rescue) over a conditional. I have a couple of examples: <br />
<br />
One system I work in has an intermittent bug. I can't fix this bug, it is part of a third-party framework my application uses, and my team has no access to the source code, nor do we have influence with the people providing the framework. I have to work around it. Rather than using a conditional to check if the bogus condition is in place before handling it, I just attempt to handle it, and if the attempt fails, I move on. The code (shown as a step in Cucumber) looks like this:<br />
<br />
<pre>When(/^I click the Action button$/) do
  begin # REMOVE BOGUS ERROR MODAL DIALOG IF IT EXISTS
    on(FooPage).close_this_window_button
    @browser.refresh
  rescue
    # no modal dialog this time; nothing to dismiss
  end
  on(FooPage).action_button
end
</pre>
As another example, I work in test environments where it is impossible to delete a User. Instead of using a conditional to check if the User record I need is in place before the setup steps create it, I just attempt to create the record, and if I fail, print an informative message and move on:<br />
<br />
<pre>begin
  create_user_via_api("Test_user #{@random_string}")
rescue
  puts 'User already exists'
end
</pre>
In my system, this is actually more efficient than using a conditional to check for the existence of the User record I need.<br />
<h4>
No Conditionals When Controlling the Test Environment</h4>
One last pattern I follow is to treat my test environments <a href="https://en.wikipedia.org/wiki/Representational_state_transfer" target="_blank">in a REST-like fashion</a>. REST stands for "Representational State Transfer", and it is used to describe a particular approach to managing software systems. <br />
<br />
In a REST-like approach, you have a client and a server. The client sends a message to the server saying in essence "Be the way I want you to be". The server may return one of several responses, saying "OK" or "You can't do that" or "I can't do that". (In HTTP, these are <a href="https://en.wikipedia.org/wiki/List_of_HTTP_status_codes" target="_blank">respectively 200, 400, and 500 response codes</a>.)<br />
<br />
We can adopt the same approach when managing our test environments. One example from my own experience is that I have to manage test environments where, when I first encounter the system, a drop-down list of applications exists containing a number of options that are not of interest, and one option that is the application I want to test.<br />
<br />
To simplify, let's say my drop-down list contains options for "Foo App", "My App", and "Bar App". Let's also say that the default choice in a fresh test environment is "Foo App", but in a test environment I have already used, I have already chosen "My App", and it becomes the default choice after that. <br />
<br />
A naive use of a conditional would do something like <br />
<pre>if app == "Foo App"
  select "My App"
end
</pre>
But if for some reason the default option is set to "Bar App", then this code breaks down.<br />
<br />
A less naive use of a conditional would do something like <br />
<pre>if app != "My App"
  select "My App"
end
</pre>
This is better, because it does not matter what wrong choice is the default, the code will always select "My App" if it is not already selected. <br />
<br />
But that is not how I prefer to manage my test environments. In this case I would simply select "My App" every time I hit any instance of this system. In a REST-like approach, I tell the system to be the way I want it to be regardless of what state it is in before I encounter it. <br />
<br />
There is a valid argument that this is less efficient than the 'if app != "My App"' approach, but I do this because there are many things I may not know about the state of any particular instance of this environment before my test code begins manipulating my application. Always starting by selecting "My App" forces the system to call the server and refresh the page with the latest conditions, regardless of what state the page may be in when I first encounter it, regardless of what my test code may have done before it starts to manipulate my application. This REST-like approach to system state guarantees the most consistent behavior for the automated tests that follow. <br />
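The contrast can be modeled in a few lines. AppPicker here is a toy stand-in for the real page and its drop-down, not real framework code; the point is that asserting the desired state converges from any starting state with no conditional at all.

```ruby
# Toy model of the REST-like "be this way" approach.
class AppPicker
  attr_reader :selected

  def initialize(default)
    @selected = default
  end

  def select(app)
    @selected = app  # idempotent: same end state whatever was selected before
  end
end

['Foo App', 'Bar App', 'My App'].each do |default|
  picker = AppPicker.new(default)
  picker.select('My App')  # no conditional; works from every starting state
end
```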
<br />Chris McMahonhttp://www.blogger.com/profile/16339861008904941703noreply@blogger.comtag:blogger.com,1999:blog-20165167.post-54943824266469443402017-11-24T11:07:00.001-07:002017-11-24T11:10:46.015-07:00UI Test Heuristic: Don't Repeat Your PathsThere is a principle in software development "DRY" for "<a href="https://en.wikipedia.org/wiki/Don%27t_repeat_yourself" target="_blank">Don't Repeat Yourself</a>", meaning that duplicate bits of code should be abstracted into methods so you only do one thing one way in one place. I have a similar guideline for automated UI tests, DRYP, for "Don't Repeat Your Paths".<br />
<br />
I discuss DRYP in the context of UI test automation, but it applies to UI exploratory testing as well. Following this guideline in the course of exploratory testing helps avoid a particular instance of the "mine field problem". <br />
<h4>
Antipattern: Repeating Paths to Test Business Logic</h4>
I took this example from the <a href="https://cucumber.io/docs/reference" target="_blank">Cucumber documentation</a>: <br />
<br />
<pre>Scenario Outline: feeding a suckler cow
  Given the cow weighs "&lt;weight&gt;" kg
  When we calculate the feeding requirements
  Then the energy should be "&lt;energy&gt;" MJ
  And the protein should be "&lt;protein&gt;" kg

  Examples:
    | weight | energy | protein |
    | 450    | 26500  | 215     |
    | 500    | 29500  | 245     |
    | 575    | 31500  | 255     |
    | 600    | 37000  | 305     |
</pre>
<br /><br />Cucumber is great for UI tests, but Cucumber is great for lots of other kinds of tests also. This example is a really poor UI test, for two reasons: good UI tests take a single path through the application, and good UI tests should not test business logic. <br /><br />The point of a UI test is to navigate from one point in the application to a final point in the application in order to demonstrate that the application allows users to accomplish the things they need to accomplish. In the example above, more than one pass through this path in the application is pointless; if the first pass fails, all the subsequent passes will also fail, yielding no new information. And if the underlying mathematical calculations are in fact wrong, a UI test is a really bad place to have to discover that.<br />
<h4>
Antipattern: Repeating Paths to Test Boundaries, Equivalence Classes, and Errors</h4>
Here is another related example that is a variation on the testing-business-logic antipattern. <br />
<br />
<pre>Scenario Outline: deposit amounts
Given a user has an account
When we deposit &lt;amount&gt;
Then we should see &lt;result&gt;
Examples:
| amount | result |
| -1 | error |
| 0 | error |
| 1 | success |
| 1000 | success |
| 1001 | error |
</pre>
<br />In the first example, the point of the test was to check the underlying calculation engine. In this example, the point of the test is to test <a href="https://en.wikipedia.org/wiki/Boundary-value_analysis" target="_blank">boundary conditions</a> and <a href="https://en.wikipedia.org/wiki/Equivalence_partitioning" target="_blank">equivalence class partitioning</a>. These tests are done far more efficiently at the API level or the method level. Doing this sort of thing in the UI is terribly expensive.<br />
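At the method level, the whole boundary table costs one fast pass. A sketch, assuming a hypothetical <code>valid_deposit?</code> rule with limits of 1 and 1000 (the rule and its limits are invented for illustration):

```ruby
# Hypothetical validation rule -- the real limits belong to the application.
def valid_deposit?(amount)
  amount >= 1 && amount <= 1000
end

# The entire boundary/equivalence table from the Scenario Outline,
# checked in a single pass with no browser and no repeated UI path.
{ -1 => false, 0 => false, 1 => true, 1000 => true, 1001 => false }.each do |amount, expected|
  raise "boundary failure at #{amount}" unless valid_deposit?(amount) == expected
end
```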
<h4>
Use Unique Single Paths to Test Errors </h4>
However, there is a legitimate argument for testing the appearance of error messages in the UI. My design approach for testing the appearance of error messages in the UI falls into two categories:<br />
<br />
The most convenient situation is when I am certain that all the error messages are handled by the same error processing code. In this case, I am confident that having demonstrated the proper appearance of one error, all the other errors will behave in a similar way, and I do not have to repeat my path through the application to generate more than one error. <br />
<br />
The inconvenient case is when I know that different errors are handled by different routines in the system. Even here, instead of going around and around the same path in the application, I prefer to create individual paths that target individual errors:<br />
<br />
<pre>Background:
Given I have an account
And the account contains "$100"
Scenario: negative balance
When I withdraw "$101"
Then I should see the negative balance error
Scenario: exceed balance limit
When I deposit "$901"
Then I should see the balance limit exceeded error
</pre>
<br />
Note that in this example, the Background steps will have been accomplished by way of some sort of API or data-load operation. The individual Scenarios will in fact represent distinct paths through the application, since each path shares the smallest number of steps, and each path ends in a different and unique state.<br />
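As a sketch of what that API-driven setup might look like: a payload builder that a step definition could POST before the browser ever starts. The endpoint and payload shape here are invented for illustration, not any particular system's API:

```ruby
require "json"

# Hypothetical payload builder: in a real suite this JSON would be POSTed to
# the application's data-load or account-creation endpoint, so the Background
# steps cost almost nothing compared to driving the UI.
def account_payload(balance_cents:)
  JSON.generate(account: { balance_cents: balance_cents })
end

# A Cucumber step definition might use it like:
#   Given("I have an account") { api.post("/accounts", account_payload(balance_cents: 10_000)) }
payload = account_payload(balance_cents: 10_000)
raise unless JSON.parse(payload)["account"]["balance_cents"] == 10_000
```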
<h4>
DRYP in practice</h4>
A good automated UI test navigates a path through the application in order to demonstrate that your users continue to be able to accomplish the things they need to accomplish in your application. <br />
<br />
A good suite of automated UI tests will each navigate as unique a path as possible through your application in the service of your users. <br />
<br />
A good suite of automated UI tests manifests what I call "feature coverage". By "feature coverage" I mean that the test suite executes <a href="https://www.stickyminds.com/article/test-case-design-automated-ui-tests" target="_blank">a web of unique navigation paths</a> in a way that makes it very difficult to be unaware of the behavior of any given feature in your application. <br />
<br />
<br />
<br />Chris McMahonhttp://www.blogger.com/profile/16339861008904941703noreply@blogger.comtag:blogger.com,1999:blog-20165167.post-78459466058515040662017-11-18T10:53:00.001-07:002017-11-18T10:53:25.202-07:00Test Automation Heuristic: Minimum Data<br />
When designing automated UI tests, one thing I've learned to do over the years is to start by creating the most minimal valid records in the system. Doing this illuminates assumptions in various parts of the system that are likely not being caught in unit tests.<br />
<br />
As I have written elsewhere, I make every effort to set up test data for UI tests by way of the system API. (Even if the "API" is raw SQL, it is still good practice.) That way when your browser hits the system to run the test, all the data it needs are right in place.<br />
<br />
For example (and this is a mostly true example!) say that you have a record in your system for User, and the only required field on the User record is "Last Name". If you start designing your tests with a record having only "Last Name" on it, you will quickly uncover places in the system that may assume the existence of a field "First Name", or "Address", or "Email", or "Phone". For some background on this sort of thing see the well-known article "<a href="http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/" target="_blank">Falsehoods Programmers Believe About Names</a>"<br />
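A sketch of the principle, assuming a hypothetical User record where only the last name is required (the record shape and the <code>display_name</code> helper are invented for illustration):

```ruby
# Hypothetical minimal record: the system requires only last_name.
User = Struct.new(:last_name, :first_name, :email, keyword_init: true)

# Code that assumes first_name exists breaks on the minimal record;
# a defensive join tolerates it. Exposing that assumption is the point.
def display_name(user)
  [user.first_name, user.last_name].compact.join(" ")
end

minimal = User.new(last_name: "Mononym")
raise unless display_name(minimal) == "Mononym"
```

Run your UI paths against records like <code>minimal</code> first, and the places that silently assume a first name, an address, or an email tend to announce themselves quickly.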
<br />
I ran into a particularly interesting case of this some time ago, testing a record in which one field contained an empty string. I encountered a part of the system where I should have been able to delete the record, but the empty field prevented the operation. What was happening was that the system expected to find a string of text in that field on the record, and it expected to do a split() operation on a particular character in that string, which would return an array of strings split on that one character.<br />
<br />
First of all, I expected to be able to delete the record; there was no reason the absence of this particular string should have prevented that. Failing that, I would have expected an error message along the lines of "Information in Field X cannot be read or is missing" or some such. But the system did not catch that exception since it didn't expect it, and the message I got instead was the language-level error "ARRAY INDEX OUT OF BOUNDS", which of course is not very helpful.<br />
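In Ruby terms, the failure pattern looked something like this reconstruction (not the original code, which was in another language; the field format and the error text are illustrative):

```ruby
# The record's field held an empty string instead of the expected "left:right" text.
field = ""

parts = field.split(":") # splitting an empty string yields an empty array

# The buggy code effectively indexed blindly, surfacing a raw language-level error:
begin
  parts.fetch(1) # raises IndexError -- the unhelpful "array index" failure
rescue IndexError
  # this low-level message is what the user effectively saw
end

# A friendlier system validates first and reports something usable:
message = parts.size < 2 ? "Information in Field X cannot be read or is missing" : parts[1]
raise unless message == "Information in Field X cannot be read or is missing"
```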
<br />
There is an argument that goes something like "But no user would ever do that", to which the canonical QA answer is "I'm a user and I just did it", but that reply is facile and over-simple. The underlying issue is that once your system gets into the hands of your customers, you cannot predict what your customers are going to try to do in the system. You owe it to them to be as helpful as possible. The much better answer is that you expect this system to scale to many users and many records of many kinds, and that the probability of any one event happening at scale approaches certainty. There is a very good essay on this from long ago in 2004 called <a href="https://blogs.msdn.microsoft.com/larryosterman/2004/03/30/one-in-a-million-is-next-tuesday/" target="_blank">One In A Million Is Next Tuesday</a>.<br />
<br />
Of course not every operation can proceed with minimal data. You can't change an address from one value to another if you have no starting address and no address to change to. But starting your design with the most minimal legal set of data for the system is a great way to uncover assumptions about what data should and should not exist for your UI operations.<br />
<br />
And finally, designing and building out test data takes time and thought. By starting with the minimum set, your tests are up and running and finding issues as quickly as they possibly can be.<br />
<br />
<br />
<br />Chris McMahonhttp://www.blogger.com/profile/16339861008904941703noreply@blogger.comtag:blogger.com,1999:blog-20165167.post-72780378408453988442017-11-11T15:13:00.000-07:002017-11-11T15:28:27.874-07:00Selenium Implementation PatternsRecently on Twitter <a href="https://twitter.com/sarahmei/status/929219106053562368" target="_blank">Sarah Mei and Marlena Compton started a conversation</a> about "...projects that still use selenium..." to which <a href="https://twitter.com/marlenac/status/929223948629262336" target="_blank">Marlena replied</a> "My job writing selenium tests convinced me to do whatever it took to avoid it but still write tests..." A number of people in the Selenium community commented (favorably) on the thread, but I liked Simon Stewart's reply where he said "<a href="https://twitter.com/shs96c/status/929386113168322560" target="_blank">The traditional testing pyramid has a sliver of space for e2e tests – where #selenium shines. Most of your tests should use something else.</a>" I'd like to talk about the nature of that "sliver of space".<br />
<br />
In particular, I want to address two areas: where software projects evolve to a point where investing in browser tests makes sense in the life of the project; and also where the architecture of particular software projects make browser tests the most economical choice to achieve a desired level of test coverage.<br />
<br />
<h4>
What Is A Browser Test Project? </h4>
<br />
Also recently <a href="https://twitter.com/joshin4colours" target="_blank">Josh Grant</a> wrote a blog post "<a href="http://simplythetest.tumblr.com/post/167348508070/how-big-is-your-ui-automation-project" target="_blank">How big is your UI automation project?</a>" where he defines Selenium projects in terms of size. In Josh's terms, here I am discussing "medium" projects that expect to become "enterprise" size (or bigger!)<br />
<br />
Given projects of this scale, we can't talk about using Selenium alone. Using Selenium alone for projects of this size would be madness, destined to fail. Regardless of what language you use with Selenium, you will also need:<br />
<br />
<br />
<ul>
<li>an abstraction layer or convenience methods that wrap Selenium itself. The most well-known instance of such a wrapper is the <a href="http://watir.com/" target="_blank">Watir project in Ruby</a>, but others exist. (I've often said that if you don't use Watir already you will eventually write Watir yourself.)</li>
<li>an assertion framework for checking test results</li>
<li>a logging mechanism for reporting and collating test results</li>
<li>a test runner. Cucumber is today probably the best-known test runner, but this could be a keyword-driven table framework, or some other test management system</li>
</ul>
<br />
<br />
This is why I rarely talk about "testing with Selenium" and instead talk about "browser testing". A successful browser test project needs a lot more structure than just Selenium.<br />
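As a toy illustration of that layering: a page-object-style wrapper plus an assertion, with a stand-in driver object where Selenium would be. None of this is Watir's or Selenium's real API; every class here is invented to show the structure:

```ruby
# Stand-in for the Selenium/Watir layer -- just enough to show the layering.
class FakeDriver
  def initialize(dom)
    @dom = dom
  end

  def text_of(id)
    @dom.fetch(id)
  end
end

# The abstraction layer: pages expose intent ("greeting"), not locators.
class HomePage
  def initialize(driver)
    @driver = driver
  end

  def greeting
    @driver.text_of("greeting")
  end
end

page = HomePage.new(FakeDriver.new("greeting" => "Hello, tester"))
# The assertion layer: in a real suite this would be minitest or RSpec,
# and a test runner like Cucumber would orchestrate and log the results.
raise "greeting missing" unless page.greeting == "Hello, tester"
```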
<br />
<h4>
Selenium As a Logical Evolutionary Step</h4>
<br />
Over the last decade or so I have been part of four projects that followed a similar path to adopting a browser testing practice. I rarely hear others describe this path.<br />
<br />
These are projects that start with just a few developers and a very high level of quality from the beginning of the project. Unit test coverage is high, code review/pair programming is essential practice, technical standards and project standards in general start high and are kept high. In three of the four cases in my history, the code is open source and subject to public scrutiny. The number of users grows, the budget for software development grows too.<br />
<br />
As the project gets larger, the team finds that there are risks to the project that unit tests alone do not cover, so they institute integration tests that exercise the data store and interactions between system states and parts of the system that could interact in undesirable ways. The project continues to grow and to be successful.<br />
<br />
Then the team realizes that there is a class of potential problems that can only be identified at the user interface and that the only way to address this particular class of risk is to have a browser test practice. This is where I make a living.<br />
<br />
In the evolution of projects like I describe here, at the time a browser test practice is called for, the practice demands a high level of quality and sophistication. There is little leeway or tolerance for waste, or worse, failure. The browser test practice has to be expert from the beginning in order to be accepted in the existing culture of quality, and the scale of the project has to be significant in order to provide the value that the team expects.<br />
<br />
Note that in three of the four instances of this evolutionary pattern that I have experienced personally, it is only after implementing a browser test practice that the team identifies a final level of risk, and goes about creating (or reifying) an exploratory testing practice. There is in fact a market for excellent exploratory testers on projects of very high quality. What I think many people find unusual is that the role of exploratory tester is the last role added to the whole practice.<br />
<br />
From what I gather, what I describe here is not the experience of most people in the testing community. I hope my description here opens the discourse.<br />
<br />
<h4>
Selenium As a Logical Strategic Choice</h4>
<br />
Some applications are difficult to test thoroughly at levels below the user interface, so the most efficient test approach is to test at the user interface. In other cases, a browser testing practice is indicated because the nature of the application dictates that the user interface has to be complicated, and simplifying the user interface would be detrimental to the project.<br />
<br />
For the first case, where the underlying architecture of the project makes browser testing a logical choice for test coverage, my favorite reference is David Heinemeier Hansson's essay from 2014 "<a href="http://david.heinemeierhansson.com/2014/tdd-is-dead-long-live-testing.html" target="_blank">TDD is dead. Long live testing.</a>" In it he argues that unit testing is a good first step that will eventually evolve into a robust set of "system tests". (<a href="https://twitter.com/titusfortner" target="_blank">Titus Fortner</a>, who maintains the Selenium Ruby bindings and also the Watir project, likes to refer to this testing approach as "DOM-to-database".) Selenium is the agent by which DHH's system tests happen.<br />
<br />
DHH, of course, is the author of Ruby on Rails. I am not an expert on Rails, but I have been told that unit testing in Rails is difficult. But if you take a close look at how Rails actually works, a Rails app is in fact nothing but a collection of DOM-to-database operations. It makes sense to test Rails apps in this way.<br />
<br />
Another example comes from Wikipedia. The core of Wikipedia is written in PHP. As with Rails, good unit testing in PHP is more difficult than in many other languages. When good unit testing is impractical, it makes sense to adopt a testing practice at a higher level, which is exactly what <a href="https://twitter.com/zeljkofilipin" target="_blank">Željko Filipin</a> and I did at the Wikimedia Foundation starting in 2012. Our original project was in Ruby. As I understand it, they are porting the project to JavaScript now that Selenium bindings are fully supported in that language, but the design goals remain the same. I wish them well.<br />
<br />
Finally, it may be that the nature of the application itself demands a complicated and complex UI, and testing that complexity must happen in the UI. One application I worked on was a coursework application for art students, where the controls for manipulating visual materials were the entire reason for the existence of the application, so all the function was in the UI. Another application I work on today is pushing the edge of the features available in a UI framework provided by a third party. It is complex and sophisticated and we don't have access to the framework internals.<br />
<br />
<h4>
The Selenium Sliver</h4>
<br />
With the <a href="https://www.w3.org/TR/webdriver/" target="_blank">WebDriver standard</a> being backed by Mozilla, Google, Microsoft, Salesforce, etc., I don't think Selenium will become obsolete any time soon. Selenium solves a real problem that software projects of the highest quality continue to encounter.<br />
<br />
But it may be that the scope of that problem is more narrow than most people think.Chris McMahonhttp://www.blogger.com/profile/16339861008904941703noreply@blogger.comtag:blogger.com,1999:blog-20165167.post-1753293643204716162017-09-25T20:06:00.000-06:002017-09-25T20:06:03.238-06:00Watir is What You Use Instead When Local Conditions Make Automated Browser Testing Otherwise Difficult.<br />
I spent last weekend in Toronto talking to <a href="https://twitter.com/titusfortner" target="_blank">Titus Fortner</a>, <a href="https://twitter.com/chzy" target="_blank">Jeff "Cheezy" Morgan</a>, <a href="https://twitter.com/bpettichord" target="_blank">Bret Pettichord</a>, and a number of other experts involved with the Watir project. There are a few things you should know:<br />
<br />
The primary audience and target user group for Watir is people who use programming languages other than Ruby, and also people who do little or no programming at all. Let's say that again:<br />
<br />
<h3>
<b>The most important audience for Watir is not Ruby programmers </b></h3>
<br />
Let's talk about "local conditions":<br />
<br />
<h4>
<b>it may be that the language in which you work does not support Selenium</b></h4>
<br />
I have been involved with Watir since the very beginning, but I started using modern Watir with the Wikimedia Foundation to test Wikipedia software. The main language of Wikipedia is PHP, in which Selenium is not fully supported, and in which automated testing in general is difficult. Watir/Ruby was a great choice to do browser testing. At the time we started the project, there were no Selenium bindings for JavaScript. Now that Selenium is fully supported in Node.js/webdriver.io, I understand that WMF is porting the Watir-based tests in 20+ repositories to Node/webdriver.io. It is very exciting.<br />
<br />
Today I use Watir at Salesforce.org, the philanthropic arm of the much larger company Salesforce.com. Salesforce has its own proprietary programming language, Apex, which does not have Selenium bindings and never will. Watir is the best choice to do automated browser testing in this environment.<br />
<br />
<h4>
<b>it may be that even if your language supports Selenium, automated browser testing is still really hard</b></h4>
Selenium is fully supported in C#, but many Windows organizations do not use C#. Even when they do, the amount of scaffolding required for robust browser testing can be daunting. Creating an assertion framework, a logging structure, a Page Object model, a test runner, all of these take significant investment. The Watir project allows you to short-circuit all that work.<br />
<br />
Java and Python are in a similar situation. Watir and Ruby provide an off-the-shelf answer that is remarkably well-documented, with a proven history of excellent design and implementation decisions as well as an active and polite user community. (Watir as a concept is actually older than Selenium!)<br />
<br />
<h4>
<b>it may be that the browser test automation staff do not have access to the feature code</b></h4>
Certainly this is an anti-pattern, but it is sometimes the case that the people who need to do browser test automation are not in regular contact with the people creating the code that needs testing.<br />
<br />
<h2>
<b>Two Paths for Watir</b></h2>
<div>
<b><br /></b></div>
Right now there are two approaches to using Watir, each with its own design philosophy, each with its own (overlapping) set of tools.<br />
<h4>
Simple, straightforward, and easy; but powerful</h4>
The first approach is exemplified by the project at <a href="https://github.com/titusfortner/watir_install" target="_blank">watir_install</a>, that I will call Team Titus. Team Titus wants to provide a single installation that yields an instantly usable Watir testing instance with no configuration, with examples in place for anyone to follow. Regardless of what programming language you find most comfortable, Team Titus wants to provide the most readable, understandable framework possible.<br />
<br />
The second approach is exemplified by the project at <a href="https://github.com/cheezy/page-object" target="_blank">page-object</a>, that I will call Team Cheezy. Team Cheezy wants to provide the most powerful application of Watir for browser testing possible. Team Cheezy gives you a framework that needs to be tweaked a little to work, and has more layers requiring understanding. To understand the internals for Team Cheezy you need to know something about how Ruby works.<br />
<br />
You should know that the barriers to entry for Team Cheezy were already low; Team Titus wants to make them even lower, but at the cost of some power in the framework.<br />
<br />
Note that these approaches are not incompatible. They do not compete, they cooperate. Bits of each framework can be switched in and out.<br />
<br />
Me, I started with Team Cheezy about six years ago and I'm sticking with it. I need that power. However, my heart is with Team Titus, and I intend to contribute as that project grows.<br />
<br />
<br />
<h3>
Choosing between Team Titus and Team Cheezy</h3>
<br />
<br />
<style>
table {
font-family: arial, sans-serif;
border-collapse: collapse;
width: 100%;
}
td, th {
border: 1px solid #dddddd;
text-align: left;
padding: 8px;
}
tr:nth-child(even) {
background-color: #dddddd;
}
</style>
<table style="width: 100%;">
<tbody>
<tr>
<th>Team Titus (watir_install)</th>
<th>Team Cheezy (page-object)</th>
</tr>
<tr>
<td>Don't care about Cucumber</td>
<td>Want Cucumber</td>
</tr>
<tr>
<td>Need easy install</td>
<td>Willing to tweak some directories and settings</td>
</tr>
<tr>
<td>Only encounter standard HTML and DOM</td>
<td>Needs to address non-standard page elements</td>
</tr>
<tr>
<td>Probably don't need to scale across multiple repos</td>
<td>Needs to work across multiple repos</td>
</tr>
<tr>
<td>Want simple internal code to analyze</td>
<td>Willing to learn some Ruby magic for internals</td>
</tr>
</tbody></table>
<br />
<br />
I went to Toronto to talk to these people because I saw changes happening and I needed to know the context of those changes. I learned what I needed to know:<br />
<br />
<h3>
<ul>
<li>Everyone agrees that the primary audience and most important users for Watir are people who need to do automated browser testing BUT who are not primarily Ruby programmers.</li>
<li>We need to provide a framework that works for everyone upon installing it.</li>
<li>We need to provide power to people who need power.</li>
<li>We need to make it easy to start, but also easy to improve.</li>
</ul>
</h3>
<div>
<br /></div>
<div>
<br /></div>
<h2>
Appendix: Why not target Ruby programmers? </h2>
<br />
For historical reasons, the Ruby web development community mostly uses Capybara and not Watir. We might argue which is better, but it would be pointless. Watir has always been intended for use by everyone, not just Ruby programmers, and we carry on that design philosophy.<br />
<br />
<br />Chris McMahonhttp://www.blogger.com/profile/16339861008904941703noreply@blogger.comtag:blogger.com,1999:blog-20165167.post-26353037524000801702017-02-20T16:42:00.000-07:002017-02-20T16:42:47.966-07:00Use "Golden Image" to test Big Ball Of Mud software systems<br />
So I had a <a href="https://twitter.com/chris_mcmahon/status/833777942907850752" target="_blank">brief conversation</a> on Twitter with Noah Sussman about testing a software system designed as a "Big Ball Of Mud" (BBOM).<br />
<br />
We could talk about the technical definition of BBOM, but in practical terms a BBOM is a system where we understand and expect that changing one part of the system is likely to cause unknown and unexpected results in other, unrelated parts of the system. Such systems are notoriously difficult to test, but I tested them long ago in my career, and I was surprised that Noah hadn't encountered the approach of using a <a href="https://www.techopedia.com/definition/29456/golden-image" target="_blank">"Golden Image"</a> to accomplish that.<br />
<br />
Let's assume that we're creating an automated system here. Every part of the procedure I describe can be automated. <br />
<br />
First you need some tests. And you'll need a test environment. BBOM systems come in many different flavors, so I won't specify a test environment too closely. It might be a clone of the production system, or a version of prod with less data. It might be something different than that.<br />
<br />
Then you need to be able to make a more-or-less exact copy of your test environment. This may mean putting your system on a VM or a Docker image, or it may be a matter of simply copying files. However you accomplish it, you need to be able to make faithful "Golden Image" copies of your test environment at a particular point in time.<br />
<br />
Now you are ready to do some serious testing of a BBOM system using Golden Images:<br />
<br />
Step One: Your test environment <b><i>right now</i></b> is your Golden Image. Make a copy of your Golden Image.<br />
<br />
Step Two: Install the software to be tested on the <b><i>copy</i></b> of your Golden Image. Run your tests. If your tests pass, deploy the changes to production. Check to make sure that you don't have to roll back any of the production changes. If your tests fail or if your changes to production get rolled back, go back to Step One.<br />
<br />
Step Three: the <b><i>copy</i></b> of your first Golden Image with the successful changes is your <b><i>new</i></b> Golden Image. You may or may not want to discard the now obsolete original Golden Image; see Step Five below.<br />
<br />
Step Four: Add more tests for the system. Repeat the procedure at Step One.<br />
<br />
Step Five (optional) You may want to be able to compare aspects of a current Golden Image test environment with previous versions of the Golden Image. Differences in things like test output behavior, file sizes, etc. may be useful information in your testing practice.<br />
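The whole loop is scriptable. Here is a file-copy sketch in Ruby; the directory layout and the always-green test block are placeholders for a real deploy-and-test step (which might instead clone a VM or a Docker image):

```ruby
require "fileutils"
require "tmpdir"

# Steps One through Three as a file-copy sketch: copy the Golden Image,
# change and test the copy, and promote the copy only if the tests pass.
def promote_if_green(golden_dir, work_dir)
  FileUtils.rm_rf(work_dir)
  FileUtils.cp_r(golden_dir, work_dir)      # Step One: copy the Golden Image
  return golden_dir unless yield(work_dir)  # Step Two: deploy and test the copy
  work_dir                                  # Step Three: the copy is the new Golden Image
end

Dir.mktmpdir do |tmp|
  golden = File.join(tmp, "golden")
  FileUtils.mkdir_p(golden)
  File.write(File.join(golden, "config"), "v1")

  # The block stands in for "install the changes and run the test suite".
  promoted = promote_if_green(golden, File.join(tmp, "candidate")) { |_dir| true }
  raise unless promoted.end_with?("candidate")
end
```

When the block returns false, the original Golden Image stays current, which is exactly the "go back to Step One" branch of the procedure.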
<br />
<br />
<br />
<br />
<br />Chris McMahonhttp://www.blogger.com/profile/16339861008904941703noreply@blogger.comtag:blogger.com,1999:blog-20165167.post-39202036788980349252017-02-13T18:21:00.001-07:002017-02-13T18:21:44.311-07:00Open Letter about Agile Testing Days cancelling US conferenceI sent the following via the email contact pages of Senator John McCain, Senator Jeff Flake, and Representative Martha McSally of Arizona in regard to Agile Testing Days cancelling their US conference on 13 February.<br />
<br />
<br />
<br />
Agile Testing Days is a top-tier tech conference about software testing and Quality Assurance in Europe. They had planned their first conference in the USA to be held in Boston MA, with a speaker lineup from around the world. They cancelled the entire conference on 13 February because of the "current political situation" in the USA. Here is their statement: https://agiletestingdays.us/<br />
<br />
Although I was not scheduled to attend or to speak at this particular conference, it is at conferences such as Agile Testing Days that the best ideas in my field are presented, it is from conferences such as Agile Testing Days that many of my peers get those ideas, and I rely on conversations with those who do speak and attend in order to stay current in my field.<br />
<br />
As a resident of Arizona, I am affected directly when such conferences are cancelled. I have enough expertise and skill to live anywhere I choose. I choose to live in Arizona, but my work absolutely depends on the free flow of people and information across national and state borders.<br />
<br />
It is shameful that such a prestigious and respected multi-national software organization finds it necessary to cancel their first ever conference in the USA because of the outrageous policies of the current administration. I urge you to take measures to make organizations such as Agile Testing Days and their attendees and speakers feel safe and welcome, as they should be.<br />
<br />
Chris McMahon<br />
Senior Member of Technical Staff, Quality Assurance<br />
Salesforce.org<br />
Tucson, AZChris McMahonhttp://www.blogger.com/profile/16339861008904941703noreply@blogger.comtag:blogger.com,1999:blog-20165167.post-67671722933879415392016-08-03T08:24:00.000-06:002016-08-03T08:24:55.048-06:00Sanction Keith KlainI have asked the Association for Software Testing to make <a href="http://chrismcmahonsblog.blogspot.com/2016/07/open-letter-to-association-for-software.html" target="_blank">a statement regarding the relationship of AST with their former Board officer Keith Klain in regard to the lawsuit for fraud </a>filed against Klain by his former employer Doran Jones.<br /><br />The lawsuit alleges that Klain did a number of reprehensible things. What should concern the AST and the software testing community in particular is that Klain is alleged to have been a party to sabotaging and undermining the training in software testing given to disadvantaged people by Per Scholas in New York City. <br /><br />Klain filed a "<a href="https://drive.google.com/file/d/0B1m53I1i5QluMFY5ZnJkeDhVVk0/view?usp=sharing" target="_blank">LETTER addressed to Judge Analisa Torres from A. Goldenberg dated July 20, 2016 re: Request for a Pre-Motion Conference Regarding Anticipated Motion to Dismiss</a>". Of concern to the software testing community is that this letter in no way disputes or denies Klain's behavior alleged in the lawsuit; this letter merely attempts to make the case that Klain's abominable behavior should not meet the letter of the statute itself for actual fraud. (1)<br /><br />Then Doran Jones answered that letter with one of its own "<a href="https://drive.google.com/file/d/0B1m53I1i5QluWWRzd1RtbDl5emc/view?usp=sharing" target="_blank">FIRST LETTER addressed to Judge Analisa Torres from JASON H. KISLIN dated August 1, 2016 re: Plaintiff's Response to Defendant Klain's Request for a Pre-Motion Conference</a>." 
(2), which concludes in part:<br /><br />"In addition, should the Court dismiss Doran Jones' CFAA claim, its remaining state law claims -- for breach of contract, breach of the duty of good faith and fair dealing, breach of fiduciary duty, tortious interference with prospective economic advantage, tortious interference with contractual relations, misappropriation of trade secrets, fraudulent inducement, and declaratory judgment -- will be properly before this Court because they do not require the Court to rule on novel or unsettled issues of state law."<br /><br />This is reprehensible at best, immoral at worst. Not only should the AST sanction Keith Klain, but TechWell, SoftwareTestPro, and the conference organizations at which Klain presents should seriously reconsider their current and past association with Klain, and anyone else with whom he has any ties should consider the value of that association as well. <br /><br /><br /><br />(1) Anyone may get direct access to court documents for this case for free or for a nominal fee by creating an account at <a href="https://www.pacer.gov/" target="_blank">PACER</a> The case number is Case 1:16-cv-02843-AT . Notices of actions are <a href="https://www.pacermonitor.com/public/case/11224381/DORAN_JONES,_INC_v_Per_Scholas,_Inc_et_al" target="_blank">available publicly for free</a>. <br />(2) This PDF does not render in my browser, but seems OK after downloading. For an original copy see (1) above.Chris McMahonhttp://www.blogger.com/profile/16339861008904941703noreply@blogger.comtag:blogger.com,1999:blog-20165167.post-45779334288292735132016-07-14T08:23:00.000-06:002016-07-14T08:33:10.044-06:00Open letter to "CDT Test Automation" reviewers<br />
To: <br />
<br />
<a href="https://twitter.com/veretax" target="_blank">Tim Western</a> <br />
<a href="https://twitter.com/alanpage" target="_blank">Alan Page</a> <br />
<a href="https://twitter.com/keithklain" target="_blank">Keith Klain</a><br />
<a href="https://twitter.com/qualityfrog" target="_blank">Ben Simo</a> <br />
<a href="https://twitter.com/PaulHolland_TWN" target="_blank">Paul Holland</a> <br />
<a href="https://twitter.com/eviltester" target="_blank">Alan Richardson</a> <br />
Christin Wiedemann<br />
<a href="https://twitter.com/agareev" target="_blank">Albert Gareev</a> <br />
<a href="https://twitter.com/noahsussman" target="_blank">Noah Sussman</a><br />
<a href="https://www.linkedin.com/in/joe-quaratella-1a13524" target="_blank">Joseph Quaratella</a> <br />
<br />
Apropos of <a href="http://chrismcmahonsblog.blogspot.com/2016/06/reviewing-context-driven-approach-to.html" target="_blank">my criticism of "Context Driven Approach to Automation in Testing"</a> (I reviewed version 1.04), I ask you to join me in condemning publicly both the tone and the substance of that paper.<br />
<br />
If you do support the paper, I ask you to do so publicly. <br />
<br />
And regardless of your view, I request that you ask the authors of the paper bearing your names <a href="http://www.satisfice.com/articles/cdt-automation.pdf" target="_blank">to remove that paper from public view</a> as well as to remove the <a href="http://qualityremarks.com/wp-content/uploads/2016/02/cdt-automation.pdf" target="_blank">copy that Keith Klain hosts here</a>. For the reasons I pointed out, this paper is an impediment to reasonable discussion and it has no place in the modern discourse about test automation.Chris McMahonhttp://www.blogger.com/profile/16339861008904941703noreply@blogger.com