Using malloc_debug to Find Memory Related Bugs
There are plenty of ways to introduce subtle bugs into your code that are hard to find and fix. In this post I'd like to introduce you to malloc_debug, a heap implementation with added debug helpers, and outline how it can be used to find some of these problems.
Enabling malloc_debug
Since the malloc_debug heap implementation does a lot of unconditional error checking and validation it isn't used by default. Instead it is part of libroot_debug.so. To run an application with libroot_debug.so instead of libroot.so, which automatically makes the app use the debug heap, you need to export the environment variable LD_PRELOAD with a value of "libroot_debug.so". The easiest way to do so is to run your app from a Terminal like this:
LD_PRELOAD=libroot_debug.so myApplication
Helpful Kernel Debugger Output
In general, when your app crashes or enters the debugger, the kernel will output relevant information to the syslog. This info is most often more to the point and easier to understand than what gdb will tell you in userland. I therefore recommend always keeping a Terminal with the syslog output open while debugging. The easiest way is to run tail on the syslog like this:
tail -F /var/log/syslog
Bug: Using Uninitialized Memory
One of the things malloc_debug does for you is to always initialize memory blocks it returns to you to a known value: 0xcc. This helps you find cases where you use allocated memory uninitialized.
Common Occurrence
Commonly this happens when forgetting to initialize class members or structure fields.
Example
class TheClass {
    ...
    BWindow *fWindow;
};

TheClass::TheClass()
    : fMemberA(0),
    fMemberB(NULL)
{
}

...

void
TheClass::SomeMethod()
{
    if (fWindow == NULL)
        fWindow = new BWindow(...);
    fWindow->Show();
    ...
}

In this example the fWindow member of the class was not initialized in the constructor. When it is later used, as in the method SomeMethod(), the assumption is that fWindow happens to be initialized to 0. This may or may not be the case; it depends on whatever the memory happened to contain before.
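For comparison, here is a minimal sketch of the same class with every member initialized. The Window struct is a hypothetical stand-in for BWindow so that the code builds outside of Haiku, and MemberA() and HasVisibleWindow() are helpers added only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for BWindow so the sketch compiles anywhere.
struct Window {
    bool shown;
    Window() : shown(false) {}
    void Show() { shown = true; }
};

class TheClass {
public:
    TheClass();
    ~TheClass();

    void SomeMethod();
    int MemberA() const { return fMemberA; }
    bool HasVisibleWindow() const;

private:
    int fMemberA;
    Window* fWindow;
};

// Every member appears in the initializer list, including fWindow.
TheClass::TheClass()
    : fMemberA(0),
    fWindow(NULL)
{
}

TheClass::~TheClass()
{
    delete fWindow;
}

void
TheClass::SomeMethod()
{
    // The NULL check is reliable now because fWindow starts out as NULL.
    if (fWindow == NULL)
        fWindow = new Window();
    fWindow->Show();
}

bool
TheClass::HasVisibleWindow() const
{
    return fWindow != NULL && fWindow->shown;
}
```

With the pointer initialized, the first call to SomeMethod() always creates the window, with or without the debug heap.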
How To Spot
Data allocated by malloc() and friends, as well as by new (including the storage for your object members), is normally not initialized, so the results of running the buggy code depend on what happens to be in these memory ranges at the time you run it. Without the debug heap this usually manifests as random misbehaviour that changes from run to run: sometimes it crashes, sometimes it doesn't. With the debug heap the memory returned by the allocation functions is always initialized to 0xcc. If you compare uninitialized memory to NULL, for example, all of these checks will now reliably fail, and if you try to execute or access uninitialized pointers your application will reliably crash with a segfault. If you look at the syslog output of such a crash you can easily spot a line similar to this:
KERN: vm_page_fault: vm_soft_fault returned error 'Permission denied' on fault at 0xccccccd4
Recommendations
Always initialize all your variables. The only exception is static variables, because these are always initialized to 0. I personally still initialize them to make it absolutely obvious, but that's a matter of taste (or coding style policy).
Bug: Using Already Freed Memory
Another thing malloc_debug does for you is to always fill memory you return to the heap by means of free() or delete with a different known value: 0xdeadbeef. This helps you find cases where you use already freed memory.
Common Occurrence
Most often this happens when holding pointers to the same memory block in different locations of an application, for example when keeping multiple lists and forgetting to remove a pointer from one of them.
It can also happen if a process runs in an order that was not expected, for example when event handlers are called while or after their target is freed (due to missing locking).
Sometimes it is also just a simple oversight like still accessing data inside an object just freed.
Example
element_data *
unlink_and_free(linked_list_element *previous, linked_list_element *element)
{
    previous->next = element->next;
    free(element);
    return element->data;
}

void
function()
{
    ...
    element_data *data = unlink_and_free(...);
    if (data->has_something)
        ...
}
When reading closely this is pretty obvious. Still it can easily occur, especially when refactoring code. The thing is that code like this will often work just fine due to the allocator not necessarily doing anything with the memory and not giving it back out quickly enough. Crashes coming from such bugs can be very rare and therefore frustrating to analyze.
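A minimal sketch of one way to repair the function: read everything that is still needed out of the element before handing it back to the heap. The two struct definitions are assumptions made up for this sketch, since the example above does not show them.

```cpp
#include <cassert>
#include <cstdlib>

// Assumed layouts; the example above only uses these fields.
struct element_data {
    bool has_something;
};

struct linked_list_element {
    linked_list_element* next;
    element_data* data;
};

element_data*
unlink_and_free(linked_list_element* previous, linked_list_element* element)
{
    previous->next = element->next;

    // Copy the data pointer while the element is still valid.
    element_data* data = element->data;
    free(element);
    return data;
}
```

The element is not touched again after free(), so the result no longer depends on what the allocator does with the block.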
How To Spot
When freeing memory, malloc_debug will overwrite the block you hand in before returning. The pattern used is 0xdeadbeef, and when accessing pointers in freed memory blocks the app will crash with a segfault. The syslog will contain a line similar to this:
KERN: vm_page_fault: vm_soft_fault returned error 'Permission denied' on fault at 0xdeadbeef
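The fault address will not always be exactly the fill pattern. The earlier example faulted at 0xccccccd4, which is 0xcccccccc plus 8 and is consistent with accessing a field 8 bytes into an object through an uninitialized pointer. The following sketch classifies a fault address accordingly. The function name, the enum and the 4 KiB slack are all assumptions made for illustration, and it assumes 32 bit addresses like the ones in the syslog lines above.

```cpp
#include <cassert>
#include <cstdint>

enum FaultHint {
    kUnknown,
    kUninitializedPointer,  // at or shortly after 0xcccccccc
    kFreedPointer           // at or shortly after 0xdeadbeef
};

// Arbitrary slack to allow for field offsets within an object.
static const uint32_t kSlack = 0x1000;

FaultHint
classify_fault_address(uint32_t address)
{
    if (address >= 0xcccccccc && address - 0xcccccccc < kSlack)
        return kUninitializedPointer;
    if (address >= 0xdeadbeef && address - 0xdeadbeef < kSlack)
        return kFreedPointer;
    return kUnknown;
}
```

In practice you would do this check by eye when reading the syslog; the code only spells out the rule of thumb.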
Recommendations
The fix depends on the actual bug. If it is a concurrency issue, introduce proper locking. If memory management seems to get difficult, consider introducing reference counting or other more advanced memory management techniques. Sometimes it is enough to take a good long look at the function the stack trace points to.
Bug: Double Free
A double free means freeing the same memory twice, either by calling free() twice on the same address or by using delete on an already deleted object.
Common Occurrence
Most often this is just an oversight. It can happen easily when using self-deleting objects and then deleting manually again on exit.
Example
void
TheClass::Recycle()
{
    ...
    delete this;
}

void
main_function()
{
    TheClass *object = new TheClass();
    function_that_will_indirectly_cause_a_recycle_of_the_object(object);
    delete object;
}

With names like these the bug is pretty obvious; in real code it is usually not that easy to see.
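One way to make this kind of bug structurally impossible is shared ownership, where nobody calls delete directly. The sketch below uses std::shared_ptr from the standard library as a portable stand-in for reference counting. It assumes a C++11 compiler and is not taken from any Haiku API; gLiveObjects and the function name are made up for illustration.

```cpp
#include <cassert>
#include <memory>

// Counts live objects, only to make the effect observable.
static int gLiveObjects = 0;

class TheClass {
public:
    TheClass() { gLiveObjects++; }
    ~TheClass() { gLiveObjects--; }
};

// Stand-in for the code path that used to "recycle" the object.
// It drops its own reference instead of deleting the object.
void
function_that_releases_its_reference(std::shared_ptr<TheClass> object)
{
    object.reset();
}
```

The object is destroyed exactly once, when the last reference goes away, no matter which side lets go first.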
How To Spot
A double free will cause the debugger to be invoked directly. The debugger message will be similar to this:
free(): address 0x18006000 already exists in page free list (double free)
Recommendations
Remove the superfluous free() / delete calls. If memory management gets difficult a solution might be to switch to reference counting or other more advanced techniques.
Bug: Misaligned Free / Free of Unallocated Memory
This covers using free() on an address that is offset from the originally returned address, and using free() on an address that wasn't allocated by the heap at all.
Common Occurrence
Misaligned frees can happen when doing pointer arithmetic.
Example
void
some_function()
{
    char *string = strdup("hello");
    while (string[0] != 'l')
        string++;
    ...
    free(string);
}
In the example above the string variable has been advanced and therefore doesn't point to the same address the allocation returned.
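A minimal sketch of the usual repair: walk a second pointer and leave the one returned by strdup() alone, so that free() receives the original address. The function name and return value are made up for illustration, and the loop also stops at the end of the string, which the example above does not do.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Returns the offset of the first 'l' in text, or -1 if there is none.
int
offset_of_first_l(const char* text)
{
    char* string = strdup(text);
    if (string == NULL)
        return -1;

    // Advance a separate cursor; string keeps the address from strdup().
    const char* cursor = string;
    while (*cursor != '\0' && *cursor != 'l')
        cursor++;

    int offset = *cursor == 'l' ? (int)(cursor - string) : -1;

    free(string);
    return offset;
}
```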
How To Spot
A misaligned free will cause the debugger to be invoked. The debugger message will be similar to this:
free(): address 0x1800102a does not fall on allocation boundary for page base 0x18001000 and element size 20
address - (int((address - base) / elementSize) * elementSize)
A free of memory that wasn't allocated by the heap produces a message similar to this:
free(): free failed for address 0x8000
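The numbers in the allocation boundary message can be checked by hand. The sketch below assumes that elements are laid out back to back starting at the page base, which is an assumption suggested by the wording of the message and not something confirmed by the heap source. The struct and function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

struct BoundaryInfo {
    uint32_t elementStart;       // where the allocation presumably begins
    uint32_t offsetIntoElement;  // how far the freed pointer is off
};

BoundaryInfo
locate_element(uint32_t address, uint32_t pageBase, uint32_t elementSize)
{
    uint32_t offsetInPage = address - pageBase;

    BoundaryInfo info;
    info.offsetIntoElement = offsetInPage % elementSize;
    info.elementStart = address - info.offsetIntoElement;
    return info;
}
```

For the message above, 0x1800102a minus 0x18001000 is 42, and 42 modulo 20 is 2. The pointer is therefore 2 bytes past 0x18001028, which matches the two increments needed to get from 'h' to the first 'l' in "hello".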
Recommendations
Review the places where you do pointer arithmetic and make proper copies of the originally returned addresses where necessary.
Bug: Overwriting Memory Past the Allocation
Overwriting memory past the allocated size doesn't necessarily lead to a segfault. If the write stays within the heap address range it will usually just corrupt whatever is overwritten. Sadly malloc_debug cannot tell you the exact place where this corruption happens. In some cases, however, it can tell you that it happened sooner than you would otherwise notice.
Common Occurrence
Most often this is the classic buffer overrun, caused by writing more data into an allocated space than fits. It can also happen when (reinterpret) casting memory blocks to the wrong type and then using fields that aren't actually there.
Example
void
some_function(const char *inputString)
{
    char *buffer = new char[64];
    strcpy(buffer, inputString);
    ...
    delete[] buffer;
}
The function uses the unsafe strcpy() instead of strncpy() and therefore doesn't tell the function how much space is available in the buffer. Depending on the length of the input string, memory will be corrupted.
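A minimal sketch of a bounded copy follows. It uses snprintf() because that function is available everywhere and always null-terminates the destination when the size is nonzero. The helper name copy_bounded and the constant kBufferSize are made up for illustration.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

static const size_t kBufferSize = 64;

// Copies inputString into buffer without writing past bufferSize bytes.
// Returns true if the whole string fit, false if it was truncated.
bool
copy_bounded(char* buffer, size_t bufferSize, const char* inputString)
{
    // snprintf() returns the length the full output would have had.
    int needed = snprintf(buffer, bufferSize, "%s", inputString);
    return needed >= 0 && (size_t)needed < bufferSize;
}
```

Checking the return value matters: a silently truncated string may be safe for the heap but can still be a logic bug.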
How To Spot
When not using interval based wall checking (see below) this form of corruption will be detected on free() / delete of the allocation where certain extra data stored on allocation is verified. Either of these two messages can occur depending on how far the memory has been corrupted:
someone wrote beyond small allocation at 0x18003050; size: 64 bytes; allocated by 9084; value: 0x18003000
If the overwriting progresses further it is possible that another debugger call happens:
leak check info has invalid size 1633771873 for element size 80, probably memory has been overwritten past allocation size
Recommendations
Check for places where unsafe string handling is done, such as with strcpy, strcat and sprintf, and replace these calls with their bounded counterparts (strncpy, strlcat, snprintf). Keep in mind that strncpy does not null-terminate the destination when the source fills the whole buffer. Also review the sizes passed to memcpy and memset. Consider using higher level abstractions for string handling like BString or C++ strings.
Advanced Use of malloc_debug
There are also features present in malloc_debug that aren't enabled by default because of their performance overhead. These features include interval based wall checking and heap validation. There are also reporting features that allow you to request that heap info is dumped for you to manually analyze.
Accessing Advanced Features
The malloc_debug functions are declared in the malloc_debug.h header in the posix header directory. Depending on how recent your installation is, the header might not be there yet; it was introduced in r35431. Note that you can't just copy that header to an older installation and use these features: the API was redesigned when the header was added, so the functions aren't compatible. Since this API is not present in the normal libroot, you need to explicitly link against libroot_debug.so when building your app with these function calls. So to use it, include the header and add the library to your linker flags:
#include <malloc_debug.h>
-lroot_debug
Interval Based and Manual Wall Checking
Wall checking can be done for you at a certain interval automatically, or you can trigger wall checking manually if you already have a more specific suspicion.
extern status_t heap_debug_start_wall_checking(int msInterval);
extern status_t heap_debug_stop_wall_checking();
These functions are used to start and stop interval based wall checking. The start function takes an interval in milliseconds (so a value of 1000 will cause a wall check every second). Note that these checks have a certain overhead, so you might not want to use them with too small an interval.
extern void heap_debug_validate_walls();
This will trigger a manual validation of the wall values of all allocations. It does the same as the interval based wall checker does.
When either of these methods detects that a wall value has been overwritten, a debugger call will be triggered. The output line is the same as for wall checking on free (see above).
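To illustrate the general idea behind a wall check, here is a small portable sketch. It is not the malloc_debug implementation: the wall value, the function names and the layout (a single wall placed directly behind the requested size) are all assumptions chosen to keep the example short.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Illustrative wall value, not the one malloc_debug uses.
static const uint32_t kWallValue = 0xabadcafe;

// Allocates size usable bytes followed by a wall value.
void*
guarded_alloc(size_t size)
{
    uint8_t* block = (uint8_t*)malloc(size + sizeof(kWallValue));
    if (block == NULL)
        return NULL;
    memcpy(block + size, &kWallValue, sizeof(kWallValue));
    return block;
}

// Returns true if the wall behind the allocation is still intact.
bool
wall_intact(const void* allocation, size_t size)
{
    uint32_t wall;
    memcpy(&wall, (const uint8_t*)allocation + size, sizeof(wall));
    return wall == kWallValue;
}
```

A write even one byte past the requested size changes the wall, which is what a later check picks up. Unlike this sketch, a real heap keeps track of the allocation size itself, so the caller does not have to pass it in.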
Paranoid and Manual Validation
The heap has extensive validation functions which validate pretty much every aspect of the internal heap implementation. If memory corruption is going on and hits any of the data used by the allocator, validation will most likely detect it. This feature was mainly used while developing the heap implementation, but since it can also uncover random memory corruption it has been made available.
extern void heap_debug_set_paranoid_validation(bool enabled);
Using this function paranoid validation can be enabled and disabled again. By default it is disabled. Paranoid validation means that after every heap operation (allocation, reallocation, free) a full validation of the corresponding heap will be done. Note that this is very performance intensive, especially when the heap usage gets bigger.
extern void heap_debug_validate_heaps();
With that function a manual validation of all heaps can be triggered (internally the heap implementation uses different heap classes for the different allocation sizes). If you suspect a specific code part to be problematic you could for example run a wall check and a validation after running through that code.
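Checking right after a suspect code part could be wrapped in a small helper like the one below. The helper name run_checked is made up for illustration. On Haiku it calls the two functions declared above, which requires linking against libroot_debug.so. On other systems the sketch falls back to stand-ins that only count calls, so that it still builds; those stand-ins are not part of any real API. A C++11 compiler is assumed.

```cpp
#include <cassert>

#ifdef __HAIKU__
#include <malloc_debug.h>
#else
// Stand-ins so the sketch builds elsewhere; they only count calls.
static int gWallChecks = 0;
static int gHeapValidations = 0;
static void heap_debug_validate_walls() { gWallChecks++; }
static void heap_debug_validate_heaps() { gHeapValidations++; }
#endif

// Runs the suspect code, then checks all walls and validates all heaps.
template<typename Function>
void
run_checked(Function suspectCode)
{
    suspectCode();
    heap_debug_validate_walls();
    heap_debug_validate_heaps();
}
```

Wrapping smaller and smaller pieces of code this way narrows down where a corruption is introduced.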
Dumping Heap and Allocation Info
There are two functions that can be used to dump allocations done by the application and to dump general info about the heap. Note that the dumped info also includes allocations done by the system during startup of the application and by the system classes used by the application.
extern void heap_debug_dump_allocations(bool statsOnly, thread_id thread);
The heap keeps certain info when allocating memory. Currently this is only the allocation size and the allocating thread. In the future this might be extended by stack traces. Using this function the allocations can be dumped to stdout. If statsOnly is true it will only print a few stats and not all allocations. If thread is >= 0 it is interpreted as a filter and only allocations done by that thread are dumped. If you want to dump all allocations just provide -1 as the thread argument.
extern void heap_debug_dump_heaps(bool dumpAreas, bool dumpBins);
Dumping the heap info gives an idea about the internal structure of the heap and how it is currently used. Usually this info is of little value to the app developer directly and it was added mostly for completeness' sake.
