Why The Andromeda Strain is the greatest debugging tutorial ever made

Note: This is an unfinished "slush-pile" article. It may change drastically or be deleted in the near future. And "near future" could be five years later.

Michael Crichton is famous for Jurassic Park, but while he was a med student he wrote the greatest tutorial ever made for debugging computer programs. Later it was made into a movie by Robert Wise--director of The Day the Earth Stood Still, The Sound of Music, and Star Trek: The Motion Picture. The Andromeda Strain was big-budget for its time and followed the book closely enough to preserve the key lessons covered below. The story is about finding and deciphering an extra-terrestrial bug, but it's also a modern parable about debugging, and that's what I'd like to explain.

Lesson 1: Record what you're going to test before you test it

We use bug tracking systems because most of the work is simply figuring out what the bug is, how it manifests and how to reproduce it. 

A classic problem with bugs—depending on how deep they are in the code—is their tendency to have multiple or mis-directing symptoms. For example, a date arithmetic bug might affect financial calculations (closing quarters too early, amortizing over too many or too few compounding periods, etc.), or customer service (reporting the wrong business hours for a particular day), or system stability (running scheduled tasks at the wrong time).
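To make that concrete, here is a minimal sketch of one shallow date bug surfacing as two unrelated-looking symptoms. The helper names (add_months_naive, add_months) are made up for illustration, and the "every month is 30 days" shortcut is just an assumed example of the kind of mistake meant here.

```python
from datetime import date, timedelta
import calendar


def add_months_naive(d: date, months: int) -> date:
    # Buggy on purpose: pretends every month is exactly 30 days long.
    return d + timedelta(days=30 * months)


def add_months(d: date, months: int) -> date:
    # Calendar-aware: shift the month index, then clamp the day
    # to the length of the target month (Jan 31 + 1 month -> Feb 28).
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


# Symptom 1 (finance): the third quarter closes two days early.
q3_start = date(2023, 7, 1)
print(add_months_naive(q3_start, 3), "vs", add_months(q3_start, 3))

# Symptom 2 (scheduling): a "first of the month" job fires on January 31.
job = date(2023, 1, 1)
print(add_months_naive(job, 1), "vs", add_months(job, 1))
```

Two bug reports from two departments, one cause. Nothing in either symptom says "month length" until you see them side by side.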

Code is built on code and deep bugs are discovered by recognizing patterns on the surface. Recording everything you know about a bug gives you and your team a chance to see those patterns line up and point, like an arrow, to the source. 

Lesson 2: "That's crazy, I didn't know buzzards fly at night"

Buzzards flying at night, wounds that don't bleed, five quarts of blood turned to powder: such sights are suspicious as hell. Even if you don't know how a phenomenon might be connected to a bug, you should slot it in the back of your mind or write it on a slip of paper. Your subconscious will put it all together, eventually.

Lesson 3: "Assign Gunner Wilson, that's if he's not crocked someplace"

Every shop of sufficient size has a Gunner Wilson: that guy who seems a bit flaky but has an amazing talent for seeing things the way you never could. He or she might be on your QA or testing team, or might be another programmer, or might just be manning the phones in Customer Support, but they're the one who has that uncanny knack of spotting what you will always miss.

Lesson 4: "We've had experiences with scientists before"

Management is frequently made of people who don't understand or appreciate programming or debugging, and have a particular allergy to time and expense overruns. However, their life revolves around documentation, for it affords delegation, buck-passing and plausible deniability. If you are faced with management decisions that hinder your work then you should document the hell out of what you're doing. 

Lesson 5: "This took time, regardless of what made her do it, it took time!"

Users can rarely tell you what they did to expose a bug, but they sure know one when they see it, and only sometimes file a report. More often than not they will take bizarre and even self-destructive steps in response: they keep rebooting, save after every few keystrokes, drink Sterno. Many don't understand that it's a fixable problem and will assume that it's either something they did wrong or the uncontrollable whim of the gods.

But the things users do will have recordable effects: files, log entries, mutated database records, so make sure they have timestamps. Anything users do will take time, and what computers do tends to happen much faster, so you can see the difference between a user's activity and an algorithm's activity based on how close they occur on the timeline.
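Here is a rough sketch of that timeline idea. The function name label_by_gap and the 200 millisecond cut-off are my own assumptions, not a rule; pick a threshold that suits your system, and treat the labels as hints rather than proof.

```python
from datetime import datetime, timedelta


def label_by_gap(timestamps, machine_gap=timedelta(milliseconds=200)):
    """Label each event after the first by how soon it followed the previous one.

    machine_gap is an assumed threshold: gaps at or below it are treated as
    too fast for a person, so they were probably produced by code.
    """
    ordered = sorted(timestamps)
    labels = []
    for prev, cur in zip(ordered, ordered[1:]):
        gap = cur - prev
        kind = "machine-speed" if gap <= machine_gap else "human-speed"
        labels.append((cur, gap, kind))
    return labels


# Hypothetical timestamps pulled from files, log entries or database rows.
events = [
    datetime(2024, 1, 1, 9, 0, 0),
    datetime(2024, 1, 1, 9, 0, 4),          # 4 s later: someone clicked
    datetime(2024, 1, 1, 9, 0, 4, 50000),   # 50 ms later: code reacting
    datetime(2024, 1, 1, 9, 0, 4, 90000),   # 40 ms later: still code
]
for when, gap, kind in label_by_gap(events):
    print(when.time(), gap, kind)
```

None of this works if the records have no timestamps, or if the timestamps only resolve to the nearest second, which is the argument for recording them finely and everywhere.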

Lesson 6: "We don't do anything until we get that kid out of here and into a controlled situation"

Resist the urge to treat a symptom of a bug before you know what caused it. "Maybe eating is part of the disease process!" Even if the solution seems obvious and the need to supply it is dire, you're gambling with entropy: maybe your quick "obvious" fix will not only erase the information you need to diagnose the problem correctly, but also destroy the data you'd need to revert to a stable state.

Lesson 7: Eliminate contaminants from your debugging environment

It's unlikely you'll need to go to your office naked, but the tools you take into your debugging environment should be. On Twitter I use the "Works On My Machine" logo as a humorous reminder of the #1 cause of irreproducible bugs: your workstation is set up just so and you obliviously wrote your program to suit it.

Many software companies keep a zoo of devices that their products are sold to run on, and keep them clean of any detritus that developers tend to pick up in their own personal workstations: custom keyboard shortcuts, browsers, scripts, doodads, viruses, drivers for the USB missile launcher you got from ThinkGeek, etc. Another common practice is the machine image, which can be used to restore a virtual machine to a known and sterile state. The bug may be caused by an interaction with another piece of software or customized settings--which you can introduce as you eliminate other causes--but it's far easier to isolate the bug in your code if you're sure that nothing else is interfering.

Lesson 8: "The only way you might possibly break your suit is with a scalpel, and a surgeon isn't likely to do that"

You can make a bad situation worse by being cocky: writing code that you think will solve a problem and stuffing it into a pile together with other fixes and features. It's good to have a source-control system such as Git or Subversion (but not Visual SourceSafe, abandon that ship while you can), but on top of that you need something like GitFlow or any consistent practice for managing branches of the source. Code branches can breed like weeds, and anything that imposes a little bit of discipline will help you.

A surgeon, or a programmer, should be unlikely to cut indiscriminately. The boss asked for Feature X and Bugfix Y. Never combine them in a single commit. Do not slash and burn, do not let the meat and vegetables touch each other on the plate. Contamination is death.

Lesson 9: "Nothing can happen, sir, I'm faster than the Hands"

Most bugs are garden-variety bugs, and they're solved quickly by experienced programmers. It makes them cocky. 

Nasty, persistent, and near-invisible bugs will destroy whole companies when cocky programmers think they can cruise by and fix everything by just being awesome.

One of the talents that make great developers more productive than the average hunt-n-pecker is their ability to accurately intuit what a bug probably is and take shortcuts to find it, but it's a circus trick that you can't rely on. Don't get over-confident when you recognise a Power-Of-Two and realise someone tried to assign an unsigned int to a signed int, or didn't encode a quote-mark. You are going to forget a feature that the boss asked for last year, you are going to forget a critical detail about the data model, you are going to misunderstand a concept that came from a third party.
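For anyone who hasn't met the Power-Of-Two tell yet, this is roughly what it looks like. It's a small sketch using Python's struct module to imitate what a 32-bit assignment does in a lower-level language; the helper name as_signed_32 is made up for the example.

```python
import struct


def as_signed_32(value: int) -> int:
    """Reinterpret the bits of an unsigned 32-bit value as a signed 32-bit int."""
    return struct.unpack("<i", struct.pack("<I", value))[0]


# Values below 2**31 survive the trip unchanged...
print(as_signed_32(2**31 - 1))   # 2147483647

# ...but from 2**31 upward they come back negative.
print(as_signed_32(2**31))       # -2147483648
print(as_signed_32(2**32 - 1))   # -1
```

Seeing -2147483648 in a report is a strong hint, and that's all it is. Confirm it before you change anything.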

Don't make a code change until you understand what you're changing and why. Be grindingly thorough with the help of Computer Number 1.

Lesson 10: "You're saying Stone's Ninety Million Dollar Facility was knocked out by a sliver of paper?"

This is the most informative scene ever shot in cinema history.

It makes the "bone-to-spaceship" scene in Stanley Kubrick's 2001: A Space Odyssey look like an accident in the editing room.

"These were highly trained electronics men, Senator, looking for an electronic fault. The trouble was purely mechanical of the simplest kind, but for them it was like trying to see an elephant through a microscope. The sliver had peeled from the roll and wedged between the bell and striker, preventing the bell from ringing."

Stop what you're doing, go get a cup of tea, sit down somewhere comfortable. No, really. All that stuff above is nothing compared to this scene. It is the most important scene a programmer can ever see in a movie. It is the most useful scene an engineer will ever see in a film, and I don't care how many times you watch Galloping Gertie. Shut up and pay attention.

You will know who the experienced programmers are in your audience when this scene appears and they punch the air and say "now that's what I'm talking about!"

That's how good this scene is. Rent or buy the film, watch it up to this scene, and re-play it until you want to stick your head in a bucket of ice water.

If I had 2 hours to teach a new programmer everything I knew about problem solving, then I'd show them this movie and pause it right after this scene, rewind and replay it as many times as it took until they got it.

When I was in High School my Computer Science teacher--the greatest teacher I ever had--was a renaissance man named James Vagliardo. On the first day of class, in the first year, he asked each of us new students to give him a spoon with our name written on paper taped to the handle. We asked him why and what it meant, but he just smiled in a way that curled his handlebar moustache. "Everything is related to Everything," he would say.

Everything is related to Everything.

Computer programs do not run in a vacuum.

Everything affects Everything.

The buzzards flying at night, the Sterno drinker, the baby crying, the plane crash.

If you can't find the bug in your code within an hour or two then it's probably not in your code. It might be exacerbated by something in your code or correctable by changing something in your code but it's not in your code. And don't swing that analytic spotlight onto random targets, just let your brain work and do its thing.

Stand up, step away from your workstation, go for a walk, have lunch and think about something else; think about the beach you're taking the family to for the weekend, that pond you're going to dig for your garden, the birthday toys you should buy for your daughter, that football game from last night.

Impossible problems are always caused by things that would never occur to us to think about, but are related by the dumbest, stupidest connections. Therefore the only way to solve them is to think about dumb and stupid things. Your subconscious is very poor at analysis, but it is fantastic at connecting the illogical. Let it do its thing.

Lesson 11: "Something's wrong, it's not registering"   "Yes it is, sir, it's just registering double-zero, double-zero"

A null or zero result doesn't necessarily mean the diagnostic tool failed, it might mean that it's accurately recording a zero outcome.

Lots of diagnostic tools keep log files that, if nothing else, can reveal the fact that they're working. If you don't get the memory dump you were expecting but the log file indicates that the tool kept running and making measurements, then maybe it's not because it failed to dump the memory, but that there wasn't anything to dump.
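The same confusion is easy to write into your own code. A minimal sketch, assuming a hypothetical reading that is None when no measurement was recorded and an integer otherwise:

```python
from typing import Optional


def describe_naive(reading: Optional[int]) -> str:
    # Buggy on purpose: `not reading` is true for None AND for 0,
    # so an honest zero gets reported as a broken tool.
    if not reading:
        return "tool failed"
    return f"measured {reading}"


def describe(reading: Optional[int]) -> str:
    # Only the absence of a reading counts as "nothing recorded".
    if reading is None:
        return "no measurement recorded"
    return f"measured {reading}"


print(describe_naive(0))   # tool failed
print(describe(0))         # measured 0
print(describe(None))      # no measurement recorded
```

Double-zero, double-zero is a result. Treat it as one until the logs say otherwise.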

Lesson 12: "I heard! I heard! I've been... busy!"

Ruth, the scientist quoted above, has epilepsy. It's brought on by flashing red lights, like the computer's UI for "Zero Growth" in a petri-dish culture. It makes her blank out and forget what she's just seen. She kept her epilepsy a secret because "insurance, prejudice, all that crap".

As many as 8% of the men among your users--and your testers and developers--may be color blind, and anyone can miss something else, just because. QA and beta testers miss things.

Just because the user, tester or co-worker didn't report a symptom doesn't mean it didn't happen.

Lesson 13: "Jeremy, these are biological warfare maps!"

Major Hollywood movies go to the n-th degree for the sake of sellable drama--everything endangers the whole world! But while your assignment is unlikely to be as severe it's still subject to politics and business.

I work in a field where all of my code must be audited by a government regulatory body, which means I can't do anything so clever that the regulator can't figure it out, because they'll just reject it and I have to re-write it all over again. 

So everything has to read like a bed-time story, with lots of inline documentation (comments in the code), and meaningful names for variables and functions. 

You cannot have maintainable code if you hide its purpose with either deliberate obfuscation or lzy fnctn nms and sht. We've got more than 80 characters per line, now. Code must make it clear what it is and what it does.
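A tiny before-and-after, as a sketch. Both functions are invented for the example and do exactly the same thing; only one of them can be read by an auditor without a phone call to the author.

```python
from datetime import date


def chk(d):
    return d.weekday() < 5


def is_weekday(day: date) -> bool:
    """Return True for Monday to Friday, False for Saturday and Sunday.

    date.weekday() numbers the days from Monday = 0 to Sunday = 6.
    Public holidays are deliberately out of scope for this sketch.
    """
    first_weekend_day = 5  # Saturday
    return day.weekday() < first_weekend_day


print(chk(date(2024, 1, 5)), is_weekday(date(2024, 1, 5)))   # a Friday
print(chk(date(2024, 1, 6)), is_weekday(date(2024, 1, 6)))   # a Saturday
```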

Lesson 14: "The defense system is perfect, Mark, it'll even bury our mistakes!"

Lorem ipsum.

Lesson 15: "Cut the panel off! Cut it off!"

First of all, let me say that
Anything Is Appropriate As Long As You Know What You Are Doing™

When the wise man came down from the mountain at the beckoning of the frightened villagers, he threw apart his arms and sang to the heavens "Experience Trumps Procedure!"

The final lesson to learn from the movie is to drop protocol if and when your best judgement says so. This goes for trademarked methodologies, books, blogs, tutorials (including this one), and user manuals. Anything beyond that (corporate policy, law, etc.) is your call and you take the risk.

The catch is that experience is expensive, and it costs a fortune to get to the point where you can exercise this lesson: if you don't have a grey beard--if you don't have at least 10 years of programming and debugging experience under your belt--then don't even think about breaking the rules; you're not wise enough, yet.

This lesson isn't for those who've already learned it, it's for those who shall, someday.