What are the best pieces of advice to remember when you start debugging code?
1. Take Error Messages with a Grain of Salt
- Why it's good advice: Error messages often point to symptoms rather than the root cause of a bug. Sometimes, they show where the error manifested but not where it originated. Interpreting error messages should involve careful thought, especially if the message is vague or non-specific.
- How it helps: Approaching error messages skeptically builds the habit of digging deeper into the issue rather than relying solely on what the system reports.
<aside>
💡
Check Assumptions Early: Beyond being skeptical of error messages, examine your own assumptions before you focus on the errors themselves. Validate the expected values of inputs, outputs, and intermediate steps early. It’s easy to overlook basic mistakes such as incorrectly formatted input data or a wrong initial configuration.
</aside>
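As a minimal sketch of checking assumptions early, the snippet below validates input records before any other work is done. The record layout (`id`, `value`) and the function name `validate_records` are hypothetical, chosen only for illustration.

```python
def validate_records(records):
    """Return a list of problems found in the input; an empty list means the assumptions hold."""
    problems = []
    for i, rec in enumerate(records):
        if "id" not in rec:
            problems.append(f"record {i}: missing 'id'")
        value = rec.get("value")
        # bool is a subclass of int in Python, so it is excluded explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"record {i}: 'value' is {value!r}, expected a number")
    return problems


# Hypothetical sample input with two deliberate mistakes.
records = [
    {"id": 1, "value": 3.5},
    {"id": 2, "value": "4.0"},  # a string where a number is expected
    {"value": 7},               # no id
]

problems = validate_records(records)
for problem in problems:
    print(problem)
```

Running a check like this before reading any error output can rule out malformed input as the cause, or confirm it, in a few seconds.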
2. Read the Error from the Bottom Up
- Why it's good advice: In many environments (Python tracebacks, for example), the last lines of a stack trace show the point of failure and the exception message, while earlier lines show the calls leading up to it. This is especially helpful when the error occurs deep within a chain of function calls or within a large framework. Some languages, such as Java, print the most recent frame first, so check which order your tool uses.
- How it helps: Starting from the end that shows the point of failure often gives a more immediate clue of where things went wrong. It can help you avoid getting bogged down in less relevant frames elsewhere in the stack trace.
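To illustrate with Python specifically, the sketch below triggers an error a few calls deep and captures the traceback as text. The function names (`parse_price`, `total`, `build_report`) are made up for this example.

```python
import traceback


def parse_price(text):
    return float(text)  # raises ValueError for non-numeric text


def total(prices):
    return sum(parse_price(p) for p in prices)


def build_report(prices):
    return f"Total: {total(prices):.2f}"


try:
    build_report(["1.50", "abc"])
except ValueError:
    trace_lines = traceback.format_exc().strip().splitlines()

# Python prints the most recent call last, so the final line names the
# exception and the lines just above it belong to the frame that failed.
print(trace_lines[0])   # Traceback (most recent call last):
print(trace_lines[-1])  # ValueError: could not convert string to float: 'abc'
```

Reading upward from the last line leads from `parse_price` back through `total` to `build_report`, which is the path the bad value took.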
<aside>
💡
- Use Logging and Debuggers: While following the flow of the data, add logging or breakpoints to inspect what’s happening at critical points in the pipeline or scripts. This way, you can inspect intermediate steps, which can often reveal where things are going wrong.
- Start with Simple Tests: Before delving into a complex system, break the problem down into smaller, manageable tests, or even unit tests, where the pipeline allows it.
</aside>
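The two tips in the callout above can be combined in a small sketch: log the values at the entry and exit of a function, then pin its behaviour down with a simple test on a known input. The logger name and the `normalize` function are hypothetical.

```python
import logging

logging.basicConfig(
    level=logging.DEBUG,
    format="%(levelname)s %(funcName)s: %(message)s",
)
log = logging.getLogger("pipeline_debug")  # hypothetical logger name


def normalize(values):
    """Scale values so that the largest becomes 1.0."""
    log.debug("input: %r", values)
    peak = max(values)
    result = [v / peak for v in values]
    log.debug("peak=%r output=%r", peak, result)
    return result


# A small, fast test on a known input, run before debugging anything larger.
assert normalize([2, 4, 8]) == [0.25, 0.5, 1.0]
```

With the log level set to `DEBUG`, each call prints its input and output, so an unexpected intermediate value shows up at the step that produced it. A debugger breakpoint on the `peak = max(values)` line would give the same view interactively.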
3. Follow the Flow of Data from Input to Output
- Why it's good advice: Debugging often involves verifying that data flows correctly through the entire system. By tracing the data from where it enters the system (input) to where it leaves (output), you can spot where data might be mishandled, corrupted, or transformed incorrectly.
- How it helps: This process mimics how a program is executed in practice and ensures that every transformation or computation is inspected in the correct sequence. It’s a logical way to identify issues without making guesses.
<aside>
💡
Testing code end-to-end: Running a process from start to finish helps identify where an issue occurs in larger systems. It shows whether the problem is a localized bug or a systemic one that spans multiple parts of the application.
</aside>
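One way to follow data from input to output is to record the result of every transformation in order. The sketch below assumes a pipeline that can be expressed as a list of plain functions; `trace_stages` and the three stage functions are hypothetical.

```python
def trace_stages(data, stages):
    """Run each stage in order and record (stage name, output) after every step."""
    history = [("input", data)]
    for stage in stages:
        data = stage(data)
        history.append((stage.__name__, data))
    return history


def strip_blanks(lines):
    return [ln.strip() for ln in lines if ln.strip()]


def to_numbers(lines):
    return [int(ln) for ln in lines]


def double(numbers):
    return [n * 2 for n in numbers]


history = trace_stages([" 1", "", "2 ", "3"], [strip_blanks, to_numbers, double])
for name, value in history:
    print(f"{name:>12}: {value!r}")
```

Reading the printed history from top to bottom shows the data at each step, so the first line that looks wrong marks the transformation to inspect.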
4. When Debugging Pipelines, Start from the Pipeline and Dive into the Scripts
- Why it's good advice: Debugging a complex system, like a pipeline, is trickier since it involves multiple interconnected components. By focusing first on the pipeline’s higher level (its structure), you ensure that the correct data is being passed between different scripts and stages. From there, drilling down into each script, following the flow of information and order of execution, helps isolate the problem systematically.
- How it helps: Complex systems require a methodical approach. If you debug in a top-down manner, starting with the pipeline itself, you can ensure that you’re looking in the right places. It prevents wasting time debugging individual scripts that may be functioning correctly but are fed with bad data due to an earlier issue in the pipeline.
<aside>
💡
- Systematic approach to debugging: A structured method (e.g., following the flow of data) helps avoid chaotic debugging sessions. It makes problem-solving more consistent and tends to give better results, especially for those new to debugging.
- Iterate Gradually: If possible, make changes in small increments and test after each step. This helps confirm when a bug is introduced or fixed, which is especially useful when working with a multi-stage pipeline.
</aside>
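The top-down approach to pipelines can be sketched as a check on each hand-off between stages, run before opening any individual script. Everything here is hypothetical: the stage functions stand in for real scripts, and `clean` contains a deliberate bug for illustration.

```python
def first_bad_stage(data, stages):
    """Run (name, function, check) stages in order.

    Return the name of the first stage whose output fails its check,
    or None if every hand-off looks right.
    """
    for name, func, check in stages:
        data = func(data)
        if not check(data):
            return name
    return None


def load(_):
    return [{"user": "a", "age": "31"}, {"user": "b", "age": "n/a"}]


def clean(rows):
    # Deliberate bug: non-numeric ages are passed through unchanged.
    return [
        {**r, "age": int(r["age"]) if r["age"].isdigit() else r["age"]}
        for r in rows
    ]


def summarize(rows):
    ages = [r["age"] for r in rows if isinstance(r["age"], int)]
    return {"mean_age": sum(ages) / len(ages)}


stages = [
    ("load", load, lambda rows: len(rows) > 0),
    ("clean", clean, lambda rows: all(isinstance(r["age"], int) for r in rows)),
    ("summarize", summarize, lambda out: "mean_age" in out),
]

print(first_bad_stage(None, stages))  # clean
```

Here `summarize` would run without raising and return a plausible-looking number, so debugging it in isolation would be wasted effort. The check at the pipeline level points at `clean` as the stage that first hands over bad data.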