Note: This is an unfinished "slush-pile" article. It may change drastically or be deleted in the near future.
Michael Crichton is famous for Jurassic Park, but when he was a med student he wrote The Andromeda Strain, the greatest tutorial ever made for debugging computer programs. Later it was made into a movie by Robert Wise (director of The Day the Earth Stood Still, The Sound of Music, and Star Trek: The Motion Picture), with special effects by Douglas Trumbull, the mastermind behind the "Stargate" sequence in 2001: A Space Odyssey and the special effects of Blade Runner. The film was big-budget for its time and followed the book closely enough to preserve the key lessons covered below. In essence, the plot is about finding an extraordinary extraterrestrial bug and figuring out how to deal with it, and it was made with so much attention to detail and procedure that I skirted corporate policy and snagged a couple of lunch breaks to screen it for my department full of young programmers. Got called on the carpet for it, too.
The re-make by A&E was horrible and you shouldn't watch it.
A classic problem with bugs, depending on how deep they are in the code, is their tendency to have multiple or misleading symptoms. For example, a date arithmetic bug might affect financial calculations (closing quarters too early, amortizing over too many or too few compounding periods, etc.), customer service (reporting the wrong business hours for a particular day), or system stability (running scheduled tasks at the wrong time).
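To make the date-arithmetic example concrete, here is a minimal Python sketch in which one shared helper assumes every month is exactly 30 days long. All function names are hypothetical, invented for illustration. The same root cause shows up as a quarter that closes a day early and a monthly job that runs a day late: two symptoms that look unrelated until you trace both back to the helper.

```python
import calendar
from datetime import date, timedelta


def add_months_buggy(d: date, months: int) -> date:
    # HYPOTHETICAL shared helper with a deep bug: it treats every month
    # as exactly 30 days instead of doing real calendar arithmetic.
    return d + timedelta(days=30 * months)


def add_months(d: date, months: int) -> date:
    # Correct version: move the month index, then clamp the day to the
    # length of the target month (e.g. Jan 31 + 1 month -> Feb 29 in 2024).
    index = d.month - 1 + months
    year, month = d.year + index // 12, index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


# Two unrelated features that both lean on the same helper.
def quarter_close(quarter_start: date, add=add_months_buggy) -> date:
    # Finance: the last day of the quarter that begins on quarter_start.
    return add(quarter_start, 3) - timedelta(days=1)


def next_monthly_run(last_run: date, add=add_months_buggy) -> date:
    # Scheduler: the same day of the month, one month later.
    return add(last_run, 1)


if __name__ == "__main__":
    q1 = date(2024, 1, 1)
    print(quarter_close(q1), quarter_close(q1, add_months))          # 2024-03-30 2024-03-31
    run = date(2024, 2, 1)
    print(next_monthly_run(run), next_monthly_run(run, add_months))  # 2024-03-02 2024-03-01
```

Nobody in Finance would think to compare notes with whoever owns the scheduler, which is exactly why writing down every symptom matters.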
Code is built on code, and deep bugs are discovered by recognizing patterns on the surface. Recording everything you know about a bug gives you and your team a chance to see those patterns line up and point, like an arrow, to the source.
Buzzards flying at night, wounds that don't bleed, five quarts of blood turned to powder: such sights are suspicious as hell. Even if you don't know how a phenomenon might be connected to a bug, file it in the back of your mind or write it on a slip of paper. Your subconscious will put it all together, eventually.
Management is frequently made up of people who don't understand or appreciate programming or debugging, and who have a particular allergy to time and expense overruns. However, their lives revolve around documentation, because it affords delegation, buck-passing and plausible deniability. If you are faced with Management Decisions that hinder your work, then document the crap out of what you're doing.
Users can rarely tell you what they did to expose a bug, but they sure know one when they see it, and only sometimes file a report. More often than not they take bizarre and even self-destructive steps in response: they keep rebooting, save after every few keystrokes, drink Sterno. Many don't understand that it's a fixable problem and assume that it's either something they did wrong or the uncontrollable whim of the gods.
But the things users do have recordable effects (files, log entries, mutated database records), many of which carry timestamps. Anything a user does takes time, typically seconds between actions, while what computers do tends to happen much faster, so you can tell a user's activity from an algorithm's activity by how closely the events cluster on the timeline.
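One way to apply this to a pile of timestamped records is to split them into bursts wherever the pause between consecutive events exceeds some threshold. The sketch below assumes a one-second threshold and a heuristic that a lone event is human-paced while a tight cluster is software reacting to whatever kicked it off; the event labels are invented sample data, not output from any real system.

```python
from datetime import datetime, timedelta


def split_bursts(events, gap=timedelta(seconds=1)):
    """Group (timestamp, label) events into bursts.

    A new burst starts whenever the pause since the previous event is
    longer than `gap`. The one-second default is an assumption; tune it
    to your system.
    """
    bursts = []
    for ts, label in sorted(events):
        if bursts and ts - bursts[-1][-1][0] <= gap:
            bursts[-1].append((ts, label))
        else:
            bursts.append([(ts, label)])
    return bursts


def describe(burst):
    # Heuristic: a lone event is human-paced; several events packed into
    # one burst are the software reacting to whatever started the burst.
    kind = "human-paced" if len(burst) == 1 else "machine-paced"
    first_ts, first_label = burst[0]
    return f"{first_ts:%H:%M:%S.%f} {kind:13} {len(burst)} event(s), first: {first_label}"


# Invented sample timeline: a user clicks around, then one click
# triggers a millisecond-scale cascade of database and cache activity.
base = datetime(2024, 5, 1, 9, 0, 0)
events = [
    (base, "user logged in"),
    (base + timedelta(seconds=7), "invoice 1042 opened"),
    (base + timedelta(seconds=19), "Save clicked"),
    (base + timedelta(seconds=19, milliseconds=3), "invoice row updated"),
    (base + timedelta(seconds=19, milliseconds=5), "audit record inserted"),
    (base + timedelta(seconds=19, milliseconds=9), "cache invalidated"),
]

for burst in split_bursts(events):
    print(describe(burst))
```

The first event of each machine-paced burst is usually the best candidate for "what the user actually did."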
Resist the urge to treat a symptom of a bug before you know what caused it. "Maybe eating is part of the disease process!" Even if the solution seems obvious and the need to supply it is dire, you're gambling with entropy: maybe your quick "obvious" fix will not only erase the information you need to diagnose the problem correctly, but also destroy the data you'd need to revert to a stable state.
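A cheap habit that protects against this gamble is to snapshot the evidence before touching anything. The Python sketch below copies a single file (a directory would need `shutil.copytree` instead) into an evidence folder with a UTC timestamp and records its SHA-256 digest; the function name, folder name and the `orders.db` file in the usage line are hypothetical.

```python
import hashlib
import shutil
from datetime import datetime, timezone
from pathlib import Path


def snapshot(path, dest_dir="evidence"):
    """Copy `path` into `dest_dir` before anyone "fixes" it.

    Returns the path of the copy and its SHA-256 digest, so you can later
    show the evidence wasn't altered.
    """
    src = Path(path)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    copy = dest / f"{src.name}.{stamp}"
    shutil.copy2(src, copy)  # copy2 also carries over modification times
    digest = hashlib.sha256(copy.read_bytes()).hexdigest()
    return copy, digest
```

Usage is one line before the quick fix, e.g. `copy, digest = snapshot("orders.db")`. If the fix makes things worse, you still have both the diagnostic information and a state to revert to.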
It's unlikely you'll need to go to your office naked, but the tools you take into your debugging environment should. On Twitter I use the "Works On My Machine" logo as a humorous reminder of the #1 cause of irreproducible bugs: your workstation is set up just so, and you obliviously wrote your program to suit it.
Many software companies keep a zoo of the devices their products are sold to run on, and keep them clean of the detritus developers tend to accumulate on their own workstations: custom keyboard shortcuts, browsers, scripts, doodads, viruses, drivers for your USB missile launcher, etc. Another common practice is the machine image, which can be copied to a DVD and used to restore a real or virtual machine to a known, sterile state. The bug may be caused by an interaction with another piece of software or with customized settings (which you can reintroduce as you eliminate other causes), but it's far easier to isolate the bug in your code if you're sure that nothing else is interfering.
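Short of a clean machine, you can at least sterilize the process environment. The Python sketch below is a lightweight, process-level approximation of that idea, not a substitute for a machine image: it runs a command with only a whitelist of environment variables, so you can check whether your program silently depends on something set up on your workstation. `MY_APP_DEBUG` is a hypothetical variable standing in for such a customization.

```python
import os
import subprocess
import sys


def run_scrubbed(args, keep=("PATH", "SYSTEMROOT")):
    """Run a command with only a whitelist of environment variables.

    SYSTEMROOT is kept because Windows programs often fail without it;
    on other systems it is simply absent and skipped.
    """
    env = {name: os.environ[name] for name in keep if name in os.environ}
    return subprocess.run(args, env=env, capture_output=True, text=True)


# Simulate a workstation customization the program quietly relies on.
os.environ["MY_APP_DEBUG"] = "1"  # hypothetical variable name
probe = [sys.executable, "-c",
         "import os; print(os.environ.get('MY_APP_DEBUG', 'unset'))"]

normal = subprocess.run(probe, capture_output=True, text=True)  # inherits everything
scrubbed = run_scrubbed(probe)
print(normal.stdout.strip(), scrubbed.stdout.strip())  # 1 unset
```

If the program behaves differently in the two runs, you've found a dependency worth investigating; then add variables back one at a time until the behavior flips.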
Lesson 8: "The only way you might possibly break your suit is with a scalpel, and a surgeon isn't likely to do that"
Don't take it for granted that you, as a professional programmer, can be so careful as not to make a bad situation worse.
The garden-variety bug tends to fall quickly to the talents of a good programmer, even when he takes shortcuts. It's the nasty, persistent, near-invisible bugs that laugh at your expediency.
One of the specific talents that make great developers so much more productive than the average hunt-n-pecker is their ability to accurately intuit what a bug probably is and take shortcuts in their diagnosis to find it. If you haven't gotten to that point yet, then the only way to acquire such skills is to be grindingly thorough with the help of Computer Number 1.
If you haven't been a programmer for at least 10 years, then don't take shortcuts when diagnosing bugs. If you've been a programmer for more than 10 years and you get stumped, then you probably don't need me to state the obvious, but I will anyway: go back to being a noob. Sweat a little bit more. Start at 100 angstroms and work your way up.
Lesson 10: "You're saying Stone's Ninety Million Dollar Facility was knocked out by a sliver of paper?"
This is the most informative scene ever shot in cinema history.
It makes the "bone-to-spaceship" scene in Stanley Kubrick's 2001: A Space Odyssey look like an accident in the editing room.
"These were highly trained electronics men, Senator, looking for an electronic fault. The trouble was purely mechanical of the simplest kind, but for them it was like trying to see an elephant through a microscope. The sliver had peeled from the roll and wedged between the bell and striker, preventing the bell from ringing."
Stop what you're doing, go get a cup of tea, sit down somewhere comfortable and think deeply about this scene. It is the most important scene a programmer can ever see in a movie. It is the most applicable lesson an engineer will ever learn in their life.
You will know who the experienced programmers are in your audience when this scene appears and they yell, punch the air, slap their knees, jab a finger at the screen and say "now that's what I'm talking about!"
If I had 2 hours to teach a new programmer everything I knew about problem solving, then I'd show them this movie, pause it right after this scene, and rewind and replay it as many times as it took for them to get it.
When I was in high school my computer science teacher (the greatest teacher I ever had) was a renaissance man named James Vagliardo, who introduced me to Douglas Hofstadter's Gödel, Escher, Bach. On the first day of class, in the first year, he asked each of us new students to give him a spoon with our name written on paper taped to the handle. We asked him why and what it meant, but he just smiled in a way that curled his handlebar mustache into a mischievous question mark at the edge of his grin. "Everything is related to Everything," he would say; it became his catchphrase.
Everything is related to Everything.
Computer programs do not run in a vacuum.
Everything affects Everything.
The buzzards flying at night, the Sterno drinker, the baby crying, the plane crash, everything.
And to make sense of it all, you have to deliberately not try to make sense of it all. Your brain is divided into parts that make sense of details and parts that make sense of intuitions, and hard problems get solved by giving the passive side, the intuitive side, enough time, space and information to do its job and hand you the answer, as if it were writing on the back of a business card and sliding it under your newspaper.
If you can't find the bug in your code within an hour or two then it's probably not in your code. It might be exacerbated by something in your code or correctable by changing something in your code but it's not in your code. And don't swing that analytic spotlight onto random targets, just let your brain work and do its thing.
Stand up, step away from your workstation, go for a walk, have lunch and think about something else; think about the beach you're taking the family to for the weekend, that pond you're going to dig for your garden, think about the birthday toys you should buy for your daughter, think about that football game from last night.
Impossible problems are always caused by things that would never occur to us to think about, but are related by the dumbest, stupidest connections. Therefore the only way to solve them is to think about dumb and stupid things. Your subconscious is very poor at analysis, but it is fantastic at connecting the illogical. Let it do its thing.
Lesson 11: "Something's wrong, it's not registering" "Yes it is, sir, it's just registering double-zero, double-zero"
A null or zero result doesn't necessarily mean the diagnostic tool failed; it might mean that it's accurately recording a zero outcome.
Lots of diagnostic tools keep log files that, if nothing else, can reveal the fact that they're working. If you don't get the memory dump you were expecting, but the log file indicates that the tool kept running and making measurements, then maybe it didn't fail to dump the memory; maybe there wasn't anything to dump.
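That reasoning can be made explicit. The Python sketch below decides what an empty dump means by counting heartbeat lines in the tool's log; the `"sample ok"` marker, the function name and the log lines are all hypothetical, so substitute whatever your own tool writes each time it takes a measurement.

```python
from typing import Optional


def interpret_dump(dump: Optional[bytes], log_lines: list[str],
                   heartbeat: str = "sample ok") -> str:
    """Tell a broken tool apart from a tool that honestly measured nothing.

    `heartbeat` is a HYPOTHETICAL marker; substitute whatever line your
    diagnostic tool writes each time it takes a measurement.
    """
    if dump:  # got real data; nothing to explain
        return "dump captured"
    beats = sum(1 for line in log_lines if heartbeat in line)
    if beats:
        return f"tool ran ({beats} samples); empty dump is a real zero"
    return "no sign the tool ran; suspect the tool"


# Invented example log: the tool kept sampling and kept finding nothing.
log = [
    "09:00:00 profiler started",
    "09:00:05 sample ok leaked_objects=0",
    "09:00:10 sample ok leaked_objects=0",
]
print(interpret_dump(b"", log))
print(interpret_dump(None, ["09:00:00 profiler started"]))
```

The point of the design is that "no data" and "data that says zero" end up as two different answers instead of one ambiguous blank.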
Ruth, one of the team's scientists, has epilepsy that can be brought on by flashing red lights. It makes her blank out and forget what she's just seen, but she has a deeply personal reason to keep it secret: "insurance, prejudice, all that crap!"
Roughly 8% of men and about 0.5% of women of Northern European ancestry have some form of color blindness, so a real share of your users, and of your testers and developers, may be color blind, and some of them won't even know it. Some of them do not want you to know. Just think: if you couldn't see the color arbitrarine, but everyone else could, and they took it for granted so much that it never occurred to them to ask you about it, couldn't you easily go through life without ever knowing? What if you did find out, but realized it could limit your career?
You've played around with random number generators. You've dabbled with a few genetic algorithms yanked from the back pages of Dr. Dobb's Journal. You remember your high-school biology lessons on basic genetics. So plot that against humanity: your customers, your users, your co-workers, or you yourself might have inherited a condition that you're either oblivious to or afraid of disclosing. Others are likely to have similar problems.
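Here is that plot done literally, as a small Python sketch. It computes the chance that at least one of n people has a trait of prevalence p, first in closed form, 1 - (1 - p)^n, and then with a seeded random number generator as a sanity check. It assumes people are independent draws, and p = 0.08 is only an illustrative value taken from the color-blindness figure for men; a mixed group would have a lower rate.

```python
import random


def p_at_least_one(p: float, n: int) -> float:
    """Closed form: chance that at least one of n people has a trait of prevalence p."""
    return 1 - (1 - p) ** n


def simulate(p: float, n: int, trials: int = 20_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the same probability, assuming independent people."""
    rng = random.Random(seed)
    hits = sum(any(rng.random() < p for _ in range(n)) for _ in range(trials))
    return hits / trials


# Illustrative numbers: p = 0.08, and a 10-person test team.
print(round(p_at_least_one(0.08, 10), 3))  # 0.566
print(simulate(0.08, 10))                  # close to the closed-form value
```

Under those assumptions, a ten-person team is more likely than not to include someone who sees your UI differently than you do, whether or not they ever mention it.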
Just because the user, tester or co-worker didn't report a symptom doesn't mean it didn't happen.
Major Hollywood movies go to the nth degree for the sake of sellable drama (everything endangers the whole world!). But while your assignment is unlikely to be as severe, it's still subject to politics and business.
First of all, let me say that:
Anything Is Appropriate As Long As You Know What You Are Doing™
When the wise man came down from the mountain at the beckoning of the frightened villagers, he threw his arms wide and sang to the heavens "Experience Trumps Procedure!"
The final lesson to learn from the movie is to drop protocol if and when your best judgement says so. This goes for trademarked methodologies, books, blogs, tutorials (including this one), and user manuals. Anything beyond that (corporate policy, law, etc.) is your call and you take the risk.
The catch is that experience is expensive, and it costs a fortune to get to the point where you can exercise this lesson: if you don't have a grey beard (at least 10 years of programming and debugging experience under your belt), then don't even think about breaking the rules; you're not wise enough yet.
This lesson isn't for those who've already learned it, it's for those who shall, someday.