Debugging

Debugging is an art and not a science. There is no step-by-step procedure or a single recipe for success when debugging a problem. Asking the following questions can help to understand and identify the nature of the problem and how best to solve it:

Is the problem easily reproducible?
Is there a reproducer or test that can trigger the bug consistently?
When the bug is triggered, are there any panic, error, or debug messages in the dmesg?
Is reproducing the problem time-sensitive?

When developing software in an embedded environment, the most likely scenario when testing a new hardware interface is… nothing happens. Unless things work perfectly, it is difficult to know where to begin looking for problems. With a logic analyzer, one can capture and visualize any data that is being transmitted.

For example, when working on software to drive a serial port, it is possible to determine whether anything is being transmitted, and if so, what. This becomes especially important where the embedded processor is communicating with an external device - where every command requires a transmitting and receiving a specific binary sequence. A logic analyzer provides the key to observing the actual communication events (if any!).

One of the most useful techniques for debugging software is to print messages to a terminal.

Serial communication through USART (Universal Synchronous Asynchronous Receiver Transmitter), in order to access a terminal on the board.

Asynchronous serial - debugging

How to approach debugging?

Make sure that you are working on code that is built cleanly—without warnings
You need to gather all the relevant data - in some cases, you may need to watch the user who reported the bug in action to get a sufficient level of detail
The best way to start fixing a bug is to make it reproducible. After all, if you can’t reproduce it, how will you know if it is ever fixed? Failing Test Before Fixing Code - be able to reproduce the case with one command
Read the Damn Error Message
- Bad Results - What if it’s not a crash? What if it’s just a bad result? Understand the definition of fault, error, and failure (from ECE 716)
Use a debugger - ensure you also see the incorrect value in the debugger
- Make sure you know how to move up and down the call stack and examine the local stack environment
- Keep notes of your process
Sensitivity to Input Values - get a copy of the data that makes your program fail, and make sure that it fails in your environment as well
Regressions Across Releases - look back into your release git history and see what release was the last one that worked fine and see what changes were introduced right before the bug
Use binary chop (binary search) - with the stack trace errors log, the releases between the current one and the last working, and with input data
Logging and tracing
- Debuggers generally focus on the state of the program now. Sometimes, you need more—you must watch the state of a program or a data structure over time.
- Tracing statements are those little diagnostic messages you print to the screen or to a file that say things such as “got here” and “value of x = 2.”
Rubber Ducking - A very simple but particularly useful technique for finding the cause of a problem is to explain it to someone else.
Don’t Assume It—Prove It Don’t gloss over a routine or piece of code involved in the bug because you “know” it works. Prove it. Prove it in this context, with this data, with these boundary conditions. - basically, what you learned at ECE 653

Systematic debugging:

Debugging
Tracing: is a specialized use of logging to record information about a program’s execution.
Profiling: measuring an application or system by running a profiler analysis tool. Profiling tools can focus on many aspects: function call times and count, memory usage, CPU load, and resource usage

Debugging Tools and Techniques

Software

Tracing - print statements to the terminal, logs
GDB
Kernel - KGDB
Stack trace analysis (kernel - scripts/decode_stacktrace.sh)

Hardware

Oscilloscope
Logic Analyzer

Debugging Checklist

checklist

Frequently Asked Questions

How do you approach debugging? System debugging

The 13 Golden Rules of Debugging by Sebastian Fischmeister

Understand the requirements: make sure you build the right program, have the right environment and tests.
Make it fail: find a clear error state; will also serve as an input for regression test; example: assert statements; with timing error, place a guard that checks whether something has been completed.
Simplify the test case: complicated test cases complicate reasoning about the bug.
Read the right error message
Check the plug: check that all the connections are correct.
Separate facts from interpretetion: revisit the processor specification; recheck some of your assumed knowledge.
Divide and conquer: reduce the test case, test one assumption after the other.
Match the tool to the bug: learn special tools.
One change at a time: this is particularly deadly.
Keep an audit trail: use a version control system with local branching.
Get a fresh view: explain the code to your car - seriously!
If you didn’t fix it, it ain’t fixed: changing code that works can only introduce new bugs.
Cover your bugfix with a regression test: the easiest way to build up a test suite.