
agile COBOL is no oxymoron

Dr. Brian Hanks of Fort Lewis College said on Twitter just now:

"Agile Cobol - the new oxymoron"

I have to disagree. In the late 90s I worked testing a life-critical 911 data-validation system written in COBOL and C. I was the tester when we migrated from key-sequenced data files (basically VSAM, or very close) to an SQL database (albeit one with no referential integrity; we had to write that into the code).

When I joined the company, system releases were chaos. By the time I left, system releases were orderly, done on a regular basis, with great confidence. We evolved what was essentially an agile process to regularly ship running, tested features that our customers wanted.

But we broke every rule in the book (of the time) to do it. We had customers talking to developers all the time, we had sysadmins reviewing features before release, we had testers reading code and running debuggers, we (quietly) ignored test plans in 5-inch binders.

The Agile Manifesto validated this way of working. It was the first public acknowledgment that we were not alone in thinking that it took people working together all the time to release good software. Suddenly I had something to point to and say "See? Someone else thinks that working this way is a good idea!" As the Agile Manifesto became more widely known, any number of old-school mainframe people came out of the woodwork to talk about how they had been doing similar work for a long time. The Manifesto did not come into being from nowhere.

I want to talk about a few test techniques I used in a couple of COBOL code bases that are creative analogs of modern techniques, but using old tools.

Human Unit Testing

My mainframe career was spent entirely on Tandem systems. Tandem had a GREAT debugger, and a GREAT proprietary scripting language. As a tester, the first thing I did whenever new code got checked in was to crank up the debugger, set a breakpoint at the new code, and step back and forth, modifying the values of variables, trying to cause a failure. When I did cause a failure, I would step back out to the system level and use the UI or the batch processes to attempt to generate the data that would cause the failure. Most of the time this was possible. Then I filed a bug report.

It is worth noting, now that the statute of limitations has probably run out: I was completely confident that our system would survive Y2K. Not because of the system test plan in the 5-inch binder (a worthless and stupid exercise required by someone in another part of the building whom we never talked to, and one that I ignored outright) but because I had personally examined the use of every instance of every date in every place in the entire code base by hand, in the debugger. That took significant time, but it was worth it.

Debugger + macros + scripting = automated unit testing

In hindsight I should have done a lot more of this.

Tandem's scripting language allowed you to start a program, call the debugger, and load something called a 'macro'. The macro would drive the debugger, letting you set breakpoints and modify and capture values over the course of the run.
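For flavor, here is a rough sketch of the idea in modern Perl. The debugger name and its command vocabulary below are invented stand-ins (I don't have the Tandem syntax at hand); the point is the shape of the technique: a script starts the debugger, feeds it commands, and checks what comes back.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use IPC::Open2;

    # Start a hypothetical command-line debugger under script control,
    # the way a Tandem macro drove the debugger during a run.
    my $pid = open2(my $from_dbg, my $to_dbg, 'dbg', 'VALIDATE-CALLER');

    print $to_dbg "break VALIDATE-DATE\n";       # stop at the code under test
    print $to_dbg "run\n";
    print $to_dbg "set WS-EVENT-DATE 000229\n";  # force an edge-case value
    print $to_dbg "continue\n";
    print $to_dbg "print WS-STATUS\n";           # capture the result
    close $to_dbg;

    # Scan the captured output and flag a failure, unit-test style.
    while (my $line = <$from_dbg>) {
        print "FAIL: leap-day date rejected\n" if $line =~ /WS-STATUS\s*=\s*ERR/;
    }
    waitpid $pid, 0;

String enough of those together and you have, for all practical purposes, an automated unit test suite.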

I used this technique a lot to accomplish what Jonathan Kohl calls "Computer Assisted Testing": use automation to handle the tedious steps of bringing the system to a state where something interesting is just about to happen, then let human beings take over.

If I knew then what I know now, I could have built some awesome standalone regression test suites using this technique.

Golden Snapshot

I worked in another COBOL code base that was a disaster. Actually, the whole company was a mess. To give a sense of how bad it was: the ball of mud had grown too big for the compiler. Whenever the main procedure had to be cut down to get it to compile, the devs didn't peel off a module or a functional area; they just cut lines 5000-6000 or whatever and called it a day. Utter chaos.

And improving the code base would have threatened the jobs of the most senior devs. The best developers in the company were given impossible assignments, sabotaged along the way, blamed for the failure, and then demoted. Not nice.

The code was so bad that automated testing at the unit or integration level was literally impossible. So I did two things.

First, we designed a "smoke test" set of manual test cases. We had the best tester in the shop run each test very carefully by hand. At the end of each reference test run we took a snapshot of all the system files and stored them in directories named for each test case.

For subsequent manual runs, the tester would execute the steps, then (figuratively) press a button that diffed the saved files against the current system files. Any discrepancy between the two sets of files was cause to suspect a bug.
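The "button" amounted to a recursive file comparison. A minimal sketch in Perl, assuming the one-snapshot-directory-per-test-case layout described above (everything else here is illustrative):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::Find;
    use File::Compare qw(compare);
    use File::Spec;

    # Compare a saved golden snapshot directory against the current system files.
    my ($golden, $current) = @ARGV;
    die "usage: $0 GOLDEN_DIR CURRENT_DIR\n" unless $golden and $current;

    my $discrepancies = 0;
    find({ no_chdir => 1, wanted => sub {
        return unless -f $File::Find::name;
        my $rel  = File::Spec->abs2rel($File::Find::name, $golden);
        my $live = File::Spec->catfile($current, $rel);
        if (not -e $live) {
            print "MISSING: $rel\n";
            $discrepancies++;
        }
        elsif (compare($File::Find::name, $live) != 0) {
            print "DIFFERS: $rel\n";
            $discrepancies++;
        }
    } }, $golden);

    print $discrepancies
        ? "$discrepancies discrepancies: suspect a bug\n"
        : "files match the golden snapshot\n";
    exit($discrepancies ? 1 : 0);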

Second, I used a tool called VersaTest from a company called Ascert to begin automating the stuff that the manual testers did. The tool was basically a stand-in for the UI: it let me drive the guts of the system directly, bypassing the UI entirely. To this day, I have never had a better customer support experience than I did with Ascert. I was calling them constantly, and one of their customer support people became a real friend I still speak with now and then.

This is pretty much a last-resort automation option. For one thing, it is extremely expensive up front. For another thing, debugging failures is depressingly hard.

Stubs

I was teaching myself Perl at this time also, which came in really handy on several occasions.

At one point I was assigned to a small side project. One of the better devs had run afoul of his peers again and been sentenced to work on this not-sexy side project. This dev is responsible for one of my favorite quotes of all time. After we'd been working together for a while and the project was starting to flesh out a bit, I noticed that his COBOL was quite a lot better than the code in the main code base and the code I'd seen from other devs. He said "Just because it's COBOL doesn't mean it HAS to be ugly".

This project was a transaction system. Our code would send a transaction, and an outside party would send back one of 4 messages: Success; Failure; Queued for Later Processing; or Later Processing Complete. But their test interface was really flaky and was down most of the time. I wrote a little Perl script that would reply to our system with one of the 4 messages depending on whether the last digit in the transaction number in the record was 1, 2, 3, or 4. It was simple and rather clever, and that little script saw use for the whole life of the project. This little side project was also neat because the team was me, the dev, and a business guy, and we were iterating fast, all together, all using the code, all analyzing the work at the same time.
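The original script is long gone, so this reconstruction is from memory and assumes a line-oriented exchange on stdin/stdout; the reply strings are my paraphrase of the four messages:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Canned replies keyed on the last digit of the transaction number.
    my %reply = (
        1 => 'SUCCESS',
        2 => 'FAILURE',
        3 => 'QUEUED FOR LATER PROCESSING',
        4 => 'LATER PROCESSING COMPLETE',
    );

    while (my $record = <STDIN>) {
        chomp $record;
        my ($digit) = $record =~ /(\d)\D*$/;    # last digit in the record
        my $msg = defined $digit ? ($reply{$digit} // 'FAILURE') : 'FAILURE';
        print "$msg\n";
    }

The trick, such as it was, is that the test data itself selects the reply, so the team could provoke any of the four paths on demand without the flaky partner interface ever being up.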

Meanwhile, in the big sexy part of the company, there was a highly visible project to do transactions with a very important partner, but our dev schedule was ahead of theirs, so interaction testing was some way out. When I was informed that the code was "done", I read the spec and implemented a little Perl network server that could read and write messages over TCP/IP. I pointed our code at it, and our code immediately fell over dead. I started reporting bugs on the code. The dev actually complained to my boss about this. He could not understand how someone could report bugs when the other side of the transaction was not complete. One day, though, he turned all the way around and started being nice to me. I suspect someone had reviewed his "done" code and chewed him out.
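That throwaway server needed nothing fancy. A sketch of the skeleton, with the port number and newline-delimited framing as assumptions (the real message layout came from the spec):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use IO::Socket::INET;

    # Fake partner: accept connections, read newline-delimited messages,
    # and answer each one so the system under test gets real traffic.
    my $server = IO::Socket::INET->new(
        LocalPort => 7777,
        Listen    => 5,
        ReuseAddr => 1,
    ) or die "cannot listen: $!";

    while (my $conn = $server->accept) {
        while (my $msg = <$conn>) {
            chomp $msg;
            print STDERR "received: $msg\n";   # log traffic for bug reports
            print $conn "ACK $msg\n";          # minimal well-formed reply
        }
        close $conn;
    }

Even a stub this dumb is enough to find crashes in connection handling and message parsing long before the other side exists.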

No Oxymoron

So it is possible to have automated unit testing, integration testing, and system testing in a COBOL/mainframe environment. Furthermore, not only is it possible to develop COBOL according to the Agile Manifesto; I think the points made by the Manifesto came out of the cultures of successful mainframe projects in the first place.

Agile software development has been around for a long, long time. It just didn't have a name until those guys got together at Snowbird and cranked up the Agile movement.