Problem Solving Tips From the Trenches

Two weeks ago, my colleague and I did battle with a rather insidious bug related to ODBC database connections failing. It delayed our software release by a week, but we managed to find a workaround/solution and there was much rejoicing. The root cause was a defect in a DB2 driver where the act of installing a Windows service caused it to drop connections that were established in other processes on the same host. Yikes.

Solving problems like these can be exhausting, but it's an incredibly important skill to master on your software engineering career journey. I certainly have not mastered it yet, but I thought I'd share my hard-earned experiences here. These tips apply to bugs anywhere on the difficulty spectrum:

Instrument properly

You won't get far if your application code isn't set up with adequate logging. This can be done in many ways and there are lots of foot-guns, so my advice is to focus on simply providing yourself with enough meaningful information at the right time with the right context.
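As a minimal sketch of what "the right context" can look like, here is one way to set it up with Python's standard logging module. The logger name, the open_connection function, and the EXAMPLE_DSN value are made up for illustration; the point is only that each line carries a timestamp, a level, the process ID, and the specifics of what was being attempted.

```python
import io
import logging


def build_logger(name: str, stream) -> logging.Logger:
    """Return a logger whose lines carry time, level, process ID, and logger name."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()  # avoid duplicate lines if this is called twice
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(levelname)s pid=%(process)d %(name)s %(message)s"
    ))
    logger.addHandler(handler)
    return logger


def open_connection(logger: logging.Logger, dsn: str, attempt: int) -> None:
    """Hypothetical stand-in for a real connection call; it always fails here."""
    try:
        raise ConnectionError("link dropped")
    except ConnectionError:
        # logger.exception records the message plus the full traceback.
        logger.exception("connect failed dsn=%s attempt=%d", dsn, attempt)


# An in-memory buffer keeps the example self-contained; a real application
# would more likely write to a file or to stderr.
buffer = io.StringIO()
log = build_logger("app.db", buffer)
open_connection(log, dsn="EXAMPLE_DSN", attempt=3)
output = buffer.getvalue()
print(output)
```

Including the process ID is a small thing, but it is the kind of detail that helps when a problem involves more than one process on the same host.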

You also must understand how your host OS and its services/daemons log things in addition to your own application. Ignore logs from your OS or other dependencies at your own peril.

Eliminate the usual suspects

As you gain experience with problem solving / debugging, you will build up your own list of outlier things to check first. Eliminate these usual suspects first to save yourself time and suffering. Here are my go-to things to rule out:

Prioritize reproducibility, eliminate variables, and inventory assumptions

Once you have eliminated the usual suspects the next step is to do everything in your power to ensure you have repeatable steps to reproduce the issue. This almost always involves the systematic elimination of variables one-at-a-time, which is the core skill in any kind of debugging. Don't ever get tempted to alter several variables at once for the sake of time; for me it's always been a net negative.
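To make the one-at-a-time idea concrete, here is a rough sketch that assumes you have one known-good setup and one known-bad setup. Every name in it (find_culprits, reproduces_bug, the setup keys and values) is invented for illustration, and reproduces_bug stands in for whatever your real reproduction steps are.

```python
from typing import Callable, Dict, List


def find_culprits(
    good: Dict[str, str],
    bad: Dict[str, str],
    fails: Callable[[Dict[str, str]], bool],
) -> List[str]:
    """Change one variable at a time from the good baseline.

    Returns the variables that trigger the failure on their own.
    """
    culprits = []
    for key in sorted(bad):
        if good.get(key) == bad[key]:
            continue  # same in both setups, so it cannot explain the difference
        trial = dict(good)
        trial[key] = bad[key]  # exactly one change per experiment
        if fails(trial):
            culprits.append(key)
    return culprits


# Hypothetical setups; the keys and values are invented for illustration.
good_setup = {"driver": "v10", "pooling": "on", "service_installed": "no"}
bad_setup = {"driver": "v11", "pooling": "off", "service_installed": "yes"}


def reproduces_bug(setup: Dict[str, str]) -> bool:
    """Stand-in for running your real reproduction steps against one setup."""
    return setup["service_installed"] == "yes"


culprits = find_culprits(good_setup, bad_setup, reproduces_bug)
print(culprits)
```

This only finds variables that cause the failure by themselves. If the bug needs two changes together, every single-change trial will pass, which is itself a useful clue that the variables interact.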

The variables you home in on need to be scrutinized using Occam's razor. If you find yourself googling for an obscure error message and find no results, it is far more likely that you are barking up the wrong tree than that nobody in the entire universe has ever had this problem.

You should also document any assumptions you are making along the way. For example, it's easy to assume that your issue doesn't depend on which Linux distro you are using, but this is not always true. As you progress through eliminating variables, revisit your key assumptions and consider them as variables to eliminate.
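One lightweight way to inventory assumptions is to record the environment next to a plain list of what you are taking for granted. The sketch below uses Python's standard platform module; the snapshot_environment helper and the wording of the assumptions are illustrative only, not a prescribed format.

```python
import platform
import sys
from typing import Dict


def snapshot_environment() -> Dict[str, str]:
    """Collect basic facts about the machine and runtime the bug was observed on."""
    return {
        "os": platform.system(),  # e.g. "Linux", "Windows", "Darwin"
        "os_release": platform.release(),
        "machine": platform.machine(),
        "python": platform.python_version(),
        "executable": sys.executable,
    }


# Assumptions written down as plain statements, each marked untested until checked.
# The wording of these entries is illustrative only.
assumptions = {
    "The issue does not depend on the OS or distro": "untested",
    "The issue does not depend on the runtime version": "untested",
}

environment = snapshot_environment()
for key, value in environment.items():
    print(f"{key}: {value}")
for statement, status in assumptions.items():
    print(f"[{status}] {statement}")
```

Pasting this output into your notes or ticket gives you something to revisit later, when an assumption needs to be promoted to a variable and tested.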

Prepare to hit brick walls

Just like learning computer programming, debugging this stuff usually involves dead ends, rabbit holes, and failed experiments. It's incredibly important to have the right frame-of-mind when you encounter such things and take appropriate action. Here is what works for me:

You are not an electrical engineer designing circuit boards dealing with physical constraints. Your constraints are mostly in your mind so it's incredibly important to learn emotional intelligence, avoid traps like the sunk cost fallacy, and figure out what works best for you.

Ask for help

Your ego should not be so fragile that you take things personally, but it also should not be so big that it prevents you from asking for help. The best engineers I have ever worked with knew precisely when to ask for help. This may mean asking a friend, posting a question on Stack Overflow, or contacting a vendor's support team. Try to master this timing by looking back on your own experiences and asking yourself if reaching out for help sooner would have been the right thing to do (or not).

We software engineers operate on top of many layers of abstraction. Your problem may be caused by a layer you don't understand, and you can never be expected to deeply understand all of them. While it's often good to have some level of understanding of the layers beneath yours (to give you hints and instincts), you will need to seek help from experts in their own respective layers.

Communicate with the outside world appropriately

The context in which you are trying to solve these bugs matters a lot. Your blood pressure readings will be different when fixing a bug in a prototype app versus a mission-critical business application with no safety net on Friday at 6:00pm. Who is affected by the issue and how to manage their expectations is very important, almost as important as how efficiently you resolve it.

People are surprisingly understanding when it comes to technical glitches. Things happen, especially in the complex world we live in. They primarily want to know that it is being worked on and, once fixed, that steps are being taken to prevent it from happening again. So post something on your status page (or whatever channel is appropriate) and follow up with a postmortem once everything is fixed. You may be tempted to move on with life once the issue is fixed (because it can be exhausting), but taking the time to communicate effectively will maintain the trust you are working hard to build with your users and colleagues.

Too often I see vendors magically fix critical issues with no acknowledgement or explanation. This erodes my confidence in their ability to manage their own systems and makes me think they aren't interested in learning from their mistakes.

It's also important to properly deal with the knowledge you acquired while solving the issue when you are a member of an engineering team. That could be writing up a root-cause fix to be prioritized later or a document on how to resolve an issue if it comes up again. The worst thing you can do is hoard this knowledge so that you're the only person who can fix it in the future. Doing so is unprofessional and whatever job security you think you will get will be negated by your inability to take sick/vacation time and your team's inability to make progress because you will become a process bottleneck.


There are many things to master when it comes to solving software bugs in the real world. Having a systematic/scientific approach, using common sense, maintaining a healthy mindset, knowing when to ask for help, and communicating effectively are all important skills.

To quote Bill Paxton from the 1986 movie Aliens: "Is this going to be a stand-up fight, sir, or another bug hunt?"

We will all face many bugs in our travels, some of which will push us to our limits... And there will always be more. So be prepared, take ownership, apply what you learn, and enjoy the challenge.
