Memory – Part 5: Debugging Tools

Introduction

Here we are! We spent four articles explaining what memory is, how to deal with it and what kinds of problems you can expect from it. Even the best developers write bugs. A commonly cited estimate is a few tens of bugs per thousand lines of code, which is huge. As a consequence, even if you have mastered all the concepts covered in this series, you will probably still write a few memory-related bugs.

Memory-related bugs may be particularly hard to spot and fix. Let’s take the following program as an example:

This program is supposed to take a message as argument and print “hello <message>!” (the default message being “world”).

The behavior of this program is undefined: it is buggy, yet it will probably not crash. The function build_message returns a pointer to memory allocated in its own stack frame. Because of how the stack works, that memory is very likely to be overwritten by a later function call, possibly by fputs itself. As a consequence, if fputs internally uses enough stack memory to overwrite the message, the output will be corrupted (and the program may even crash); otherwise, the program will print the expected message. Moreover, the program may overflow its buffer, because it uses the unsafe sprintf function, which places no limit on the number of bytes written.
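For comparison, here is a hedged sketch of one way to fix both bugs: let the caller own the buffer, so the result outlives the call, and bound the write with snprintf. The signature below and the value of MAX_LINE_SIZE are assumptions for illustration, not necessarily what the program above uses.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed value: the text does not say what MAX_LINE_SIZE is. */
#define MAX_LINE_SIZE 64

/* Formats "hello <message>!" into `out`, a buffer owned by the caller.
 * Unlike a pointer into build_message's own stack frame, this memory stays
 * valid after the function returns, and snprintf never writes more than
 * out_size bytes (terminator included).
 * Returns out on success, or NULL if the text had to be truncated. */
const char *build_message(char *out, size_t out_size, const char *message)
{
    int len;

    assert(out != NULL && out_size > 0 && message != NULL);
    len = snprintf(out, out_size, "hello %s!", message);
    if (len < 0 || (size_t)len >= out_size) {
        return NULL; /* encoding error, or the message did not fit */
    }
    return out;
}
```

A main function would then declare char line[MAX_LINE_SIZE], pass argv[1] (or "world" when no argument is given), and only call fputs after checking that the result is not NULL.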

So, the behavior of the program varies depending on the size of the message given on the command line, the value of MAX_LINE_SIZE and the implementation of fputs. What is annoying with this kind of bug is that the result may not be obvious: the program “works” well enough in simple use cases and will only fail the day it receives a parameter with the right properties to exhibit the issue. That is why it is important for developers to be comfortable with tools that help them validate (or debug) memory management.

In this last article, we will cover some free tools that we believe should be part of the minimal toolkit of any C (and C++) developer.

Debugger

The first of these tools is the debugger. On Linux this will probably be gdb. Most developers know at least the basics of gdb: inspecting a backtrace (bt, up, down, frame <id>, …), adding a breakpoint (break <function|line>, continue, …), executing step by step (step, next, fin, …), inspecting memory (print <expr>, call <func>, x/<FMT> <addr>, …), … The debugger is most developers' tool of choice when a program crashes with a segmentation fault: it automatically catches the signal and lets you inspect the state of the program at that very instant. A lot of segmentation faults are obvious (uninitialized pointer, NULL pointer dereference, …) and require little work with the debugger.

Less well known, however, is the ability to set a watchpoint: a dynamic breakpoint that interrupts the program every time the value of an expression changes. This is extremely useful to find the origin of a memory corruption: place a watchpoint on the memory that gets corrupted, and the program will be interrupted each time its content changes. This has very little impact on performance because, as long as you do not monitor too many addresses, the watchpoint is handled directly by the hardware (x86 CPUs, for instance, provide four debug registers for this purpose).
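To try watchpoints on a small case, here is a hypothetical program (not taken from this article) in which memory gets clobbered through an unintended alias rather than an overflow, so its behavior stays well-defined. In a program that calls settings_new() and then settings_set_greeting(), setting `watch defaults.greeting[0]` in gdb before the call and then typing `continue` should stop inside settings_set_greeting and print the old and new values of that byte, which points straight at the culprit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct settings {
    char greeting[16];
};

/* Shared defaults that nobody is supposed to modify. */
static struct settings defaults = { "hello" };

/* Deliberate bug: meant to return a fresh copy of the defaults, but it
 * hands out the shared object itself. */
struct settings *settings_new(void)
{
    return &defaults;
}

/* Copies at most 15 characters plus the terminator. It looks harmless,
 * but through the alias above it overwrites defaults.greeting, and a
 * watchpoint on that memory fires on the first byte written here. */
void settings_set_greeting(struct settings *s, const char *greeting)
{
    size_t i;

    assert(s != NULL && greeting != NULL);
    for (i = 0; i + 1 < sizeof(s->greeting) && greeting[i] != '\0'; i++)
        s->greeting[i] = greeting[i];
    s->greeting[i] = '\0';
}

const char *default_greeting(void)
{
    return defaults.greeting;
}
```

The copy is written as an explicit loop rather than a libc call so that the watchpoint triggers in this function's own code, not deep inside the libc.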

Back to the example from the introduction: fputs prints the string that its first argument points to, yet the string actually printed is not the one built by build_message. Here is a small debugging session:

  • First we set a breakpoint on build_message and check that sprintf properly built our message.

  • In order to be notified whenever the message gets modified, we place a watchpoint on the first character of the string and let the program continue. The debugger tells us that it successfully set a hardware watchpoint, which is good news, because a software watchpoint would have a much more noticeable impact on performance.

  • The watchpoint then interrupts the program. The debugger prints the old and new values, and we can easily inspect the program state. A quick look at the backtrace shows that we are somewhere in the dynamic linker's code (probably resolving the fputs symbol).

Here, the debugger tells us where the memory gets modified, but understanding why still requires some knowledge of what is going on: the dynamic linker simply uses the stack, and reuses the memory left behind by build_message's stack frame. The debugger provides raw information; the developer remains in charge of the analysis. More generally, the debugger is a good tool whenever you know what to look for.

valgrind

valgrind is a kind of Swiss Army knife for C/C++ developers. It provides various tools such as a memory checker (memcheck), a memory profiler (massif), a cache profiler (cachegrind), a CPU profiler (callgrind), some thread checkers (helgrind, DRD, tsan), …

valgrind is basically a virtual machine that monitors every interaction with the operating system and the virtualized hardware. To achieve this, it takes an unmodified executable, translates every single CPU instruction into an instrumented version and intercepts every system call. It is extremely configurable: you can define the exact desired behavior of your virtual machine: the number of cores, the size of the caches, the behavior of the system calls (for system calls whose behavior varies from one kernel version to another)… The main drawback, however, is that since the code is not executed directly, valgrind has a significant overhead and causes a substantial slowdown, ranging from 5x to 50x depending on the tool and the chosen options.

Running valgrind is easy. It requires no modification of your program or of your build system in order to work (although making some code valgrind-aware can help). The most basic incantation is just: valgrind --tool=<toolname> <your program and its arguments>.
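As a hedged illustration of what "valgrind-aware" can mean: memcheck cannot see inside a custom allocator that carves blocks out of one big array, so a use-after-free in such a pool goes unnoticed. The client-request macros VALGRIND_MAKE_MEM_NOACCESS and VALGRIND_MAKE_MEM_UNDEFINED from <valgrind/memcheck.h> let the allocator tell memcheck which bytes are valid, and they cost almost nothing when the program is not running under valgrind. The pool itself (struct pool, pool_alloc, pool_free) is a hypothetical sketch, and the no-op fallback for systems without the valgrind headers is our own addition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Use the real client requests when the valgrind headers are available,
 * otherwise fall back to no-op macros so the code still builds. */
#if defined(__has_include)
#  if __has_include(<valgrind/memcheck.h>)
#    include <valgrind/memcheck.h>
#    define HAVE_VALGRIND_MEMCHECK 1
#  endif
#endif
#ifndef HAVE_VALGRIND_MEMCHECK
#  define VALGRIND_MAKE_MEM_NOACCESS(addr, len)  ((void)(addr), (void)(len))
#  define VALGRIND_MAKE_MEM_UNDEFINED(addr, len) ((void)(addr), (void)(len))
#endif

#define POOL_SLOTS 4
#define SLOT_SIZE  32

/* Hypothetical fixed-size pool: memcheck only sees one big object here. */
struct pool {
    unsigned char storage[POOL_SLOTS][SLOT_SIZE];
    bool used[POOL_SLOTS];
};

void pool_init(struct pool *p)
{
    for (size_t i = 0; i < POOL_SLOTS; i++)
        p->used[i] = false;
    /* Nothing is handed out yet: any access to the storage is an error. */
    VALGRIND_MAKE_MEM_NOACCESS(p->storage, sizeof(p->storage));
}

void *pool_alloc(struct pool *p)
{
    for (size_t i = 0; i < POOL_SLOTS; i++) {
        if (!p->used[i]) {
            p->used[i] = true;
            /* Addressable but uninitialized, like fresh malloc memory. */
            VALGRIND_MAKE_MEM_UNDEFINED(p->storage[i], SLOT_SIZE);
            return p->storage[i];
        }
    }
    return NULL;
}

void pool_free(struct pool *p, void *ptr)
{
    for (size_t i = 0; i < POOL_SLOTS; i++) {
        if (ptr == (void *)p->storage[i]) {
            assert(p->used[i]); /* catches double frees */
            p->used[i] = false;
            /* Later accesses through stale pointers get reported. */
            VALGRIND_MAKE_MEM_NOACCESS(ptr, SLOT_SIZE);
            return;
        }
    }
    assert(!"pointer does not belong to this pool");
}
```

memcheck also provides dedicated mempool requests (VALGRIND_CREATE_MEMPOOL and related macros) for exactly this situation; the two macros above are simply the smallest way to show the idea.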

memcheck

memcheck is the default tool of valgrind. It’s a memory checker that tracks every single memory access and allocation looking for management errors such as:

  • accessing unallocated memory
  • making the program's behavior depend on uninitialized memory
  • leaking allocated memory

To do this, the first thing memcheck does is maintain a registry of all allocated memory. Every time a new chunk of memory is allocated, memcheck keeps track of it by recording the returned pointer, the amount of memory allocated and the backtrace of the allocation. Additionally, it surrounds each block with redzones, small areas that are never handed out to the program, so that out-of-bounds accesses are easy to detect.

Needless to say, it also intercepts every deallocation in order to keep its registry up to date. A deallocation does not immediately remove the entry from the registry: memcheck marks it as deallocated and records the deallocation backtrace. By keeping deallocated memory in quarantine, it ensures that use-after-free accesses are reported as such, since that memory cannot be reused for another purpose too quickly.
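The bookkeeping described above can be sketched as a toy registry. This is a hypothetical simplification, not memcheck's code: a `site` string stands in for the allocation backtrace, redzones are left out, and the quarantine is a small FIFO that only calls the real free() once QUARANTINE_SIZE other blocks have been freed after it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_BLOCKS      64
#define QUARANTINE_SIZE 4

enum block_state { BLOCK_UNKNOWN, BLOCK_LIVE, BLOCK_FREED };

struct block {
    void *ptr;
    size_t size;
    const char *site;       /* hypothetical stand-in for the allocation backtrace */
    enum block_state state;
};

static struct block registry[MAX_BLOCKS];
static size_t registry_len;
static size_t quarantine[QUARANTINE_SIZE]; /* registry indices, FIFO ring */
static size_t quarantine_head, quarantine_len;

void *track_malloc(size_t size, const char *site)
{
    void *p;

    if (registry_len == MAX_BLOCKS)
        return NULL;                /* toy limit: entries are never recycled */
    p = malloc(size ? size : 1);
    if (p != NULL)
        registry[registry_len++] = (struct block){ p, size, site, BLOCK_LIVE };
    return p;
}

static struct block *find_block(const void *p)
{
    for (size_t i = 0; i < registry_len; i++) {
        if (registry[i].state != BLOCK_UNKNOWN && registry[i].ptr == p)
            return &registry[i];
    }
    return NULL;
}

/* Returns 0, or -1 for a double free or a pointer never handed out. */
int track_free(void *p)
{
    struct block *b = find_block(p);
    size_t idx;

    if (b == NULL || b->state != BLOCK_LIVE)
        return -1;
    b->state = BLOCK_FREED;         /* keep the entry to recognize later uses */
    idx = (size_t)(b - registry);
    if (quarantine_len < QUARANTINE_SIZE) {
        quarantine[(quarantine_head + quarantine_len) % QUARANTINE_SIZE] = idx;
        quarantine_len++;
    } else {
        /* Quarantine full: really release the oldest freed block. */
        struct block *old = &registry[quarantine[quarantine_head]];

        free(old->ptr);
        old->ptr = NULL;
        old->state = BLOCK_UNKNOWN;
        quarantine[quarantine_head] = idx;
        quarantine_head = (quarantine_head + 1) % QUARANTINE_SIZE;
    }
    return 0;
}

enum block_state track_state(const void *p)
{
    struct block *b = find_block(p);

    return b != NULL ? b->state : BLOCK_UNKNOWN;
}

/* What memcheck does at exit: every block still live is a leak. */
size_t track_count_leaks(void)
{
    size_t n = 0;

    for (size_t i = 0; i < registry_len; i++)
        n += registry[i].state == BLOCK_LIVE;
    return n;
}
```

Because a quarantined block is not really released, its address cannot be handed out again by malloc, so a later access through a stale pointer still maps to a FREED entry instead of to some unrelated live block.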

At the end of the program's execution, memcheck dumps its registry: every entry that is not marked as deallocated is a leaked allocation. Each leaked allocation is reported along with whether the memory is still referenced. Memory that nothing in the program points to anymore is considered definitely lost.
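A hypothetical pair of functions shows the difference between the two main categories. Run under valgrind --leak-check=full (with a binary built with -g and little optimization), the block allocated in count_vowels_leaky should be reported as definitely lost, because its only pointer disappears when the function returns, while the block kept in last_name counts as still reachable; still-reachable blocks are only listed in detail with --show-leak-kinds=all.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Still pointed to by a global at exit: "still reachable". */
static char *last_name;

void remember_name(const char *name)
{
    size_t len;

    assert(name != NULL);
    len = strlen(name) + 1;
    free(last_name);
    last_name = malloc(len);
    if (last_name != NULL)
        memcpy(last_name, name, len);
}

const char *remembered_name(void)
{
    return last_name;
}

/* The only pointer to the copy is a local variable: once the function
 * returns, nothing references the block any more ("definitely lost"). */
size_t count_vowels_leaky(const char *s)
{
    size_t len = strlen(s) + 1, n = 0;
    char *copy = malloc(len);

    if (copy == NULL)
        return 0;
    memcpy(copy, s, len);
    for (const char *c = copy; *c != '\0'; c++)
        n += strchr("aeiouAEIOU", *c) != NULL;
    return n; /* deliberate bug: missing free(copy) */
}
```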

Additionally, for every single allocated byte, memcheck also maintains an initialization state: memory is considered initialized if, and only if, its value is the result of a computation that uses only initialized bytes. As soon as an uninitialized byte is used in a computation, the result is undefined too, and if the program's behavior depends on that result, the program's own behavior is considered undefined and memcheck reports it.
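A classic case for this definedness tracking is structure padding. In the hypothetical header_encode below, copying a partially uninitialized struct is not an error by itself for memcheck, which only complains once undefined bytes influence the program: a conditional jump ("Conditional jump or move depends on uninitialised value(s)") or a system call argument ("Syscall param write(buf) points to uninitialised byte(s)"), typically when the encoded buffer is later written to a file. Zeroing the struct first makes every byte defined, padding included.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical on-disk header. On common ABIs the compiler inserts
 * padding bytes between `tag` and `length`. */
struct header {
    char tag;
    uint32_t length;
};

/* Copies the raw bytes of a header into `out`. Without the memset, the
 * padding bytes would be uninitialized: memcheck stays silent while they
 * are merely copied, but reports them once they reach write(2). */
void header_encode(unsigned char out[sizeof(struct header)], char tag,
                   uint32_t length)
{
    struct header h;

    assert(out != NULL);
    memset(&h, 0, sizeof(h)); /* defines every byte, padding included */
    h.tag = tag;
    h.length = length;
    memcpy(out, &h, sizeof(h));
}
```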

Overall, at the cost of a massive slowdown and some memory overhead, memcheck detects most errors related to dynamic allocation. However, it is far less effective at detecting errors involving static or stack-allocated memory, because memcheck has very little insight into the program's internals: it does not know about the individual variables laid out on the stack, and thus cannot check that a stack-allocated buffer does not overflow onto a nearby variable.

A good standard is to require all code to be memcheck-clean (or valgrind-clean): a program is not good enough if it produces errors when run under valgrind. That does not guarantee the program is bug-free, but it does ensure that allocations are handled correctly, at least on the code paths that were exercised. However, that standard is often hard to meet because, for real-life programs, memcheck's slowdown can reach 40x, which makes it impractical to run frequently. Thankfully, tools such as ASan (covered later in this post) can fill that role.

The documentation of memcheck is full of small examples, so let’s stop paraphrasing the upstream documentation and see what memcheck produces on our small buggy program:

This is a bit more meaningful than the debugging session in gdb. It tells us that fputs is calling strlen (which is obviously needed to compute the length of the string to print), but that strlen reads memory just below the stack pointer (two bytes below it, in fact). This still requires some analysis, but this time it is quite easy: we are computing the length of a string that lives in the stack area, but partly outside the active part of the stack, in a frame that no longer exists.

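A last useful trick is valgrind's ability to interact with a debugger. Start your program with valgrind --db-attach=yes <yourprogram>: every time memcheck reports an error, you will be asked whether you would like to debug that error in a debugger. Note that --db-attach was removed in valgrind 3.11; recent versions provide an embedded gdbserver instead: run valgrind --vgdb=yes --vgdb-error=0 <yourprogram>, then connect gdb to it with target remote | vgdb.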

massif

massif is a different kind of tool: a memory profiler. It also tracks memory allocations and deallocations, but instead of checking every memory access, it builds a timeline of memory usage. At some selected moments of the program (such as when its memory usage peaks), it records how much memory was allocated from each backtrace.

At exit, it writes its report to a file named massif.out.<pid> by default. The report is a list of snapshots of how allocated memory is distributed. It is hard to read as-is, but tools such as ms_print turn it into something easier to understand. The output of ms_print starts with an ASCII-art histogram that visually shows memory usage:

The # column marks the peak of memory usage, while the @ columns mark the detailed snapshots available in the report. If your graph looks like this one, steadily growing until the end, you probably have a memory leak in your program, and you should consider fixing it.

The diagram is followed by a table containing the memory usage at every snapshot. It looks like this:

The first 14 rows here (snapshots 0 to 13) are plain snapshots that only report heap consumption, while snapshot 14 is followed by a detailed allocation report. We can see that at that point, most of the allocated memory came from loading the configuration.

Address Sanitizer

Address Sanitizer (or ASan) is a much more recent tool. It was started by Google to provide good memory checking for large projects such as WebKit or Chromium, without the performance cost of memcheck. ASan still slows the program down, but by a factor of about 2, not 40. The tradeoff is that ASan does not detect some errors that memcheck catches, such as uses of uninitialized variables or leaks, but on the other hand it detects many more errors involving static or stack memory. ASan was first introduced in LLVM/clang 3.1 and has since made its way into GCC, starting with GCC 4.8.

ASan is made of two parts: a compiler pass and a runtime. The runtime reserves a shadow memory: a huge region of virtual memory that stores one byte for every 8-byte word of application memory. A shadow byte of 0 means that all 8 bytes of the word are addressable, a value k between 1 and 7 means that only the first k bytes are, and negative values mark memory that must not be accessed at all (redzones, freed memory, …), with a distinct value for each kind. The runtime also replaces the allocators (malloc, free, …) so that it can track allocations and deallocations, record the backtraces of who allocated and freed each block, and poison the redzones it places around every block. Just like memcheck, it puts deallocated memory in quarantine in order to detect use-after-free accesses.

Then, each time memory is accessed, the associated shadow byte is checked, and if the access is not allowed, ASan reports the error and aborts the program. Because ASan stops at the first error, it effectively forces the program to be ASan-clean.
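The shadow encoding and the per-access check can be modelled in a few lines. This is a toy under stated assumptions, not ASan's code: it covers a 64-byte arena and uses offsets instead of addresses, whereas the real runtime maps any address to its shadow byte with shadow = (addr >> 3) + offset. A shadow value of 0 means the whole 8-byte granule is addressable, k in 1..7 means only its first k bytes are, and negative values mean nothing is (here -6 and -3, i.e. 0xfa and 0xfd, which ASan's report legend uses for heap redzones and freed heap memory). The check in toy_access_ok follows the formula from ASan's design documentation, for accesses of 1 to 8 bytes that stay within a single granule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ARENA_SIZE 64
#define GRANULE    8

#define SHADOW_HEAP_REDZONE (-6) /* 0xfa: heap redzone */
#define SHADOW_FREED        (-3) /* 0xfd: freed heap region */

static int8_t shadow[ARENA_SIZE / GRANULE]; /* one byte per 8-byte granule */

/* Start with every granule unaddressable, as if the arena were redzone. */
void toy_poison_all(void)
{
    for (size_t i = 0; i < ARENA_SIZE / GRANULE; i++)
        shadow[i] = SHADOW_HEAP_REDZONE;
}

/* Make [offset, offset + size) addressable. Blocks start on a granule
 * boundary, as ASan's allocator guarantees. */
void toy_unpoison(size_t offset, size_t size)
{
    assert(offset % GRANULE == 0 && offset + size <= ARENA_SIZE);
    for (size_t i = 0; i < size / GRANULE; i++)
        shadow[offset / GRANULE + i] = 0;              /* whole granule */
    if (size % GRANULE != 0)
        shadow[(offset + size) / GRANULE] = (int8_t)(size % GRANULE); /* first k bytes */
}

/* Mark a block as freed: any later access must be reported. */
void toy_mark_freed(size_t offset, size_t size)
{
    for (size_t g = offset / GRANULE; g * GRANULE < offset + size; g++)
        shadow[g] = SHADOW_FREED;
}

/* The check inserted before an access of `size` bytes at `offset`. */
bool toy_access_ok(size_t offset, size_t size)
{
    int k = shadow[offset / GRANULE];

    assert(size >= 1 && offset % GRANULE + size <= GRANULE);
    if (k == 0)
        return true;                                   /* fast path */
    /* Partial granule: the last byte touched must be below k.
     * Negative k (redzone, freed) always fails. */
    return (int)(offset % GRANULE + size - 1) < k;
}
```

The fast path matters for performance: most accesses hit a granule whose shadow byte is 0, so they cost a single load and comparison.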

Overall, the ASan runtime is less feature-complete than valgrind: it cannot detect accesses to uninitialized memory, and early versions did not detect memory leaks either (recent versions bundle LeakSanitizer, which reports leaks at exit). However, most of ASan's power comes from its compiler-side component. Requiring recompilation may seem inconvenient, but it allows a much closer integration with the program itself. On the other hand, ASan only checks code that has been instrumented, so it will not catch errors that occur inside uninstrumented third-party libraries (for example, in the libc), apart from common functions such as memcpy or strlen that its runtime intercepts.

The main role of the compiler pass is to wrap every single memory access in a small branch that checks, by looking at the shadow memory, that the access is allowed. But because it lives in the compiler, it has access to a lot of information, such as which memory is being accessed and how variables (or structure members) are laid out, and it can also change that layout. This is where ASan shines: it can insert redzones between global variables, or between variables placed on the stack, so that bad accesses to those variables are easy to detect.
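To see why compiler-inserted redzones catch stack overflows that memcheck misses, here is a toy frame layout. The rule is an assumption made for illustration (a 32-byte redzone before the first local, each local aligned to 32 bytes and followed by at least 32 bytes of redzone); GCC and clang use a similar scheme with their own details, and poison those redzones in the shadow memory on function entry. layout_frame and owner_of are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

#define REDZONE 32 /* assumed redzone size and alignment */

struct local_var {
    size_t size;
    size_t offset; /* filled in by layout_frame() */
};

static size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) / align * align;
}

/* Places the locals after a left redzone, each followed by padding up to
 * the next 32-byte boundary plus one full redzone. Returns the frame size. */
size_t layout_frame(struct local_var *vars, size_t count)
{
    size_t off = REDZONE;

    for (size_t i = 0; i < count; i++) {
        vars[i].offset = off;
        off += round_up(vars[i].size, REDZONE) + REDZONE;
    }
    return off;
}

/* Returns the index of the local that contains byte `off`, or -1 when the
 * byte lies in a redzone: an access there is what ASan would report. */
int owner_of(const struct local_var *vars, size_t count, size_t off)
{
    for (size_t i = 0; i < count; i++) {
        if (off >= vars[i].offset && off < vars[i].offset + vars[i].size)
            return (int)i;
    }
    return -1;
}
```

With a char name[10] followed by an int count, writing name[10] lands in a redzone instead of silently overwriting count, which is exactly the kind of stack overflow memcheck cannot see.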

ASan can detect both issues in our example, but since both occur inside libc functions, it will not do so as-is. At Intersec, we have our own implementation of sprintf, so it gets instrumented by ASan along with the rest of the program. Here is ASan's output with an overly long string passed as argument (after running asan_symbolize.py on the output to get the symbol names):

Doing the same thing with a short string and a reimplementation of fputs gives the same kind of result:

Still, as in the previous examples, this provides a hint rather than a full answer to what is wrong with the program.

Conclusion

Memory is a fundamental resource for any computer program, but it is hard to understand and manage. Tools exist to help developers and system administrators, but making sense of their output still requires some brain juice.

This series of articles tried to cover a wide range of subjects; a lot more could be said (and a lot more has already been said by others). The topics we selected are what we consider the minimal toolkit for both developers and system administrators, both in terms of raw knowledge and of understanding the various limitations. We just hope this has been helpful.