API Design is Hard, Finding Bugs (Can be Made) Easy!

Blog post by mmlr on Fri, 2011-12-23 19:34

Puh, time has passed again and the signals from my side might have been a bit confusing with only the last blog post in mind. Therefore I’m going to explain what provoked that flurry of seemingly unrelated commits and how the KeyStore API is coming along.

Starting Out with the KeyStore

Originally I started out with my mind set on the bigger topic of the KeyStore API and its backend in the registrar. As mentioned before I’ve started from a draft API which I intended to rework a bit to make it more flexible. The draft API based most things around BPassword, which was somehow mixing more generic things in to try and be much more than just about passwords. Reviewing that it wasn’t possible for me to shake off that feeling of “it’s a bit messy”. It looked as if the original idea was to have a relatively slim BPassword class that did passwords which then grew feature by feature until it wasn’t really about passwords anymore. My immediate thought was to separate things a lot more and divide the generic parts into a BKey base class with subclasses for the individual key types (passwords, certificates, possibly auth devices, etc.) to make things generic but still relatively easy to use. My thought was that an app using the API will still need a certain understanding of what kind of key it is working with (you can’t usually use a certificate to login at a web form; using passwords/WEP keys/WPA pre-shared-keys vs. certificates requires a rather different setup for the wpa_supplicant for things to work; things like that…). Therefore I figured it’d not make too much sense to try and put all the different methods into a single class. Still some parts like the generic meta data about a BKey (owner, creation time) should be shared and uniformly accessible, so there should be a common base class that ideally would also provide generic means to manage flat data that will be required by most subclasses.

The design I was going with therefore involves a generic BKey base class and (for now) a BPasswordKey convenience class that wraps the generic data store of BKey to easily set and retrieve strings (passwords, WEP keys, pre-shared-keys for example). The existing BPasswordType that previously indicated the use of a password (web, network, volume) was split into BKeyType and BKeyPurpose. The BKeyType actually determines the type of the key, i.e. what subclass it is (B_KEY_TPYE_GENERIC for BKey, B_KEY_TYPE_PASSWORD for BPasswordKey, B_KEY_TYPE_CERTIFICATE for the eventual BCertificateKey, etc.) while the BKeyPurpose shall indicate the “domain” of the key (there is a generic, web, network and volume purpose for now). The idea behind the purpose is that browsers would only look for “web” keys when trying to auto log in at a website while the network manager would only look for “network” keys when trying to join a network. This design should allow for future extension, like maybe a special case of a certificate key subclass that’d use a smartcard as a backend or a fingerprint sensor providing a key sublcass. I’m not yet sure that the simplistic “getter method” is sufficient for that though.

The BKeyStore API which is used to enumerate and manage BKeys gained support for BKeyType and BKeyPurpose and was generally transformed from anything “password” to “key”. Other than that it stayed pretty much as it was in the draft already.

While it sounds relatively obvious now that I write it down, it took me a while to come up with and to try and think about the different use cases. I have yet to integrate it into an actual application (in this case most likely the wpa_supplicant for network passwords) and see if the API is actually easy to use.

As far as the implementation goes, pretty much all of it is still stubbed out. The basic functions (the easy part) are there, but the whole storage backend (which will need to be added to the registrar and get encrypted) is yet to be implemented.

Even though I wrote this up in one go I didn’t actually do things in one run. Once I came up with those initial design ideas I started to rework the draft API, but I soon got distracted by some large other thing…

Guarded Heap

Rebooting one day I ran into that FreeBSD network driver compatibility layer bug where it would crash in “if_initname” due to some Haiku specifics mysteriously getting messed up. This happened to me a couple of times and some cursory debugging I did some time ago wasn’t enough to turn any actual issue up. I’ve discussed this issue with Rene Gollent, who happened to encounter that crash far more frequently than me (it only happened in like 1 of 50 boots for me, if that). He added some debugging output already to further investigate the issue, but as usual, once debug output was there, the bug didn’t show up anymore.

From looking at the crash it looked like some already freed memory was used which only happened to be triggered in some situations. Having played extensively with the old kernel heap, making it into a debug heap, bringing it into the userland as malloc_debug and having recently helped adding allocation tracking in the slab I have some background when it comes to adding debug features to memory allocation. A thing that I always had in the back of my mind (and I actually added to malloc_debug in userland) was to add guard pages in the kernel.

The basic idea is to align each allocation so that the end of the allocation resides at a page boundary, and having that next page marked as a guard page so that any access to it, be that read or write, trigger a debug action (a panic in the kernel or a debugger call in userland). This already worked in userland, since you can use mprotect() to mark individual pages of an area with different protections (read-only, write-only, both or none as used for the guard pages). An access beyond the allocation (as in buffer overruns) would then cause a debugger call. Although this was already present in malloc_debug, it was added more like an afterthought and it couldn’t possibly work in the kernel anyway. Therefore I went out and wrote a very simplistic heap that would basically only support guard pages, but for the kernel. Additionally to adding guard pages after allocations (and beyond what malloc_debug does) it would also guard any yet unallocated or already freed memory, so it would not only point out an access beyond an allocation, but also panic on accessing free memory. While there is already a debug feature in the normal heap that overwrites freed memory with a pattern (0xdeadbeef) which makes spotting such cases easier, it is still not guaranteed that something will actually crash with that. If it only reads that memory it might just lead to bogus things happening which are extremely hard to track down. Think about a list implementation that might access freed memory but simply compares two pointers to check for the empty case: 0xdeadbeef == 0xdeadbeef, so it may not crash immediately or at all and it is later way harder to figure out where things actually went wrong. Having things outright crash or panic immediately where an invalid memory access happens is far more obvious.

When we work in the kernel the situation isn’t quite as easy as in userland. The kernel doesn’t support page protections for kernel areas. Neither can you read-protect a page, as kernel code is always allowed to read pages. Additionally the protection is only enforced once a page fault occurs and a page is mapped in. For fully locked pages (as needs to be the case for the kernel heap) this wouldn’t ever happen, so even if we set page protections, they wouldn’t be enforced. The only real way to read and write protect a certain page is to actually unmap it, so that it will always generate a fault. As easy as that sounds, mapping and unmapping pages isn’t exactly trivial and in many situations isn’t allowed at all. Since the heap is used from pretty much everywhere, including some critical sections inside the kernel that actually deal with page mapping, we can’t really mess around and map/unmap pages on demand. Some trickery is needed…

Since we are going to use this heap implementation only for debugging and since we have a certain set of conditions we can rely on for the heap use case, we can go a level deeper and have the thing we need happen. Basically what we want is to cause a page fault exception when a certain page is accessed so that the page fault handler has a chance of evaluating whether or not the access is valid or not. Since we can’t actually map/unmap pages on demand we do only the part that actually causes the fault: We clear and set the presence bit for the page in question directly on the translation map. The translation map basically tells the CPU what physical page of RAM backs a certain virtual page and also indicates if a mapping currently exists or not. For the pages we want protected, we now directly clear the presence bit, telling the CPU that a page fault is needed to resolve an access to a certain virtual address. Obviously doing that under normal circumstances without doing the rest of the page mapping/unmapping would seriously mess things up. In our case however we know that our memory is fully locked, hence the pages we are using and are in the translation map do not change and the mapping stays valid. Through clearing and re-setting the presence bit we only “turn the pages on and off” in a sense.

Additional to that some more support was needed, like the mentioned lack of support for actually setting individual page protections for kernel areas. We still don’t really support that, but we now allow the protection bits to be mapped for kernel areas, so that they work good enough in this context. With those two things in place it was therefore possible to actually make working guard pages.

As mentioned the heap is used from pretty much everywhere. This includes the VM code, where memory allocations are done for creating areas. Since we use areas as the basis for the guarded heap, growing the heap means creating a new area and adding it to the managed memory pool. This introduces a few cases where we might get called recursively and cases where we may not be able to grow at all. This isn’t actually solved in the guarded heap. It basically only tries to keep a certain amount of free space available for these cases so that it is less likely to run out of memory in a situation where it can not do anything about it. Since we only use it for debugging, that limitation seemed acceptable though.

Obviously the performance of the guarded heap is really bad. It is intentionally not optimized to keep the complexity as low as possible (as to not introduce bugs and for not wasting too much time). The memory overhead this whole thing has is ridiculous as well of course. Even for a single byte allocation (actually even a 0 byte allocation), at least two full pages (4096 bytes) are used (one or more for the actual allocation and one for the guard page). This makes memory use explode. Even if you have a lot of RAM you’ll eventually run out of kernel address space (which is limited to about 2GB). Again as this is used only as a debug helper those limitations are acceptable.

Bug Hunting Time! (again)

With the guarded heap in place things were immediately crashing, mostly pointing out various places where things were reading past allocations or used already freed memory. Often those are bugs that in 99% of the cases do not cause any harm, but might just sometimes go wrong depending on what surrounds an allocation. This is then a case of those mysterious bug reports that are near impossible to reproduce. There are two cases that frequently happen.

One thing is the 0 termination of strings. A proper C string is simply a run of arbitrary bytes that is marked by a 0 bytes at the end. If you don’t take care to add or preserve that mark, then things may go wrong. I say “may” as it pretty much depends on what follows your allocation. It is quite likely that there is a 0 byte laying around somewhere past your allocation, so usually reading past the allocation would maybe result in some string with garbled random extra data. But there’s an edge case: If your allocation happens to reside at the end of a heap/slab area that happens to not be followed by any more mapped memory you will crash.

The other thing is deleting an object but then using storage that belonged to that object to return something. As an example you might be done with some transaction and want to delete the transaction object and then return the transaction status. So you: delete transaction; return transaction->Status(); … which is obviously invalid, but not always easy to spot. Before we re-introduced overwriting freed memory with 0xdeadbeef in the slab, this would work just fine almost always since the time between the delete and the return is relatively short it is likely that the memory wouldn’t have been touched in the meantime. In the unfortunate case where the thread was rescheduled after the delete but before the return and someone else actually acquired the now freed memory and overwrite it, you would either crash or return bogus data though. The static code analysis Coverity does (that’s where the CIDs come from in the commit messages if you wondered, those are Coverity defect ids that may hint at a problem) is relatively good at detecting these cases (even though there are unfortunately a lot of cases where it mistakes intended behaviour as an error as well and where the logic is apparently to complex for it to see an error).

The result of identifying and fixing the issues you can see in detail in the commit log, starting with the introduction of the helper functions and later the guarded heap itself from hrev43387 and the various fixes up to hrev43482. As can be seen there I later made the slab replaceable with the guarded heap as well to find issues that may lay in the networking code (that makes heavy use of the slab for the various net buffers). And it did indeed reveal a minor issue that is now corrected as well.

After having the kernel and modules up and running with the guarded heap another big target was getting the guarded heap into userland as well. The idea was that with it available there some of the harder to reproduce app_server memory corruptions might be easier to find. Porting it to userland was comparatively easy, as from there you don’t need to mess with all the low level conditions you run into at the kernel level. Also mprotect() provides for a very easy way to guard the managed memory without having to revert to any tricks. The only problematic thing was that the registrar and the app_server crashed with it immediately due to some uncovered issues. While basically a good thing, that’s what I wrote it for in the end, it made getting at the actual crash info a bit difficult. You see the convenient stack trace that is produced when a userland app crashes isn’t done in the kernel. It is the debug_server who prints it out. Now, since the registrar and the app_server are dependencies of the debug_server, the latter wasn’t running yet and could not yet be started. After thinking about the problem for way too long and trying to enable the disabled (due to deadlock danger) kernel side stack trace, I finally realized that I could simply drop into KDL to get a stack trace…

Fixing the issues involved mostly the first thing mentioned above, namely running past string ends due to some places not ensuring a proper termination. It also required to introduce a more restrictive UTF8 handling routine that made sure to not read beyond a certain length, so that fixed with buffers could properly be handled.

While resolving some issues I continually took a look at Coverity whenever I encountered a use-after-free case. And sure enough many of them showed up there. So before trying to reproduce all of these manually I decided to go through the other related defects Coverity listed. Many of those could easily be fixed and quite a lot were also marked as false positives due to only conditionally freed memory.

After that run of commits I was able to use a full install with the guarded heap (the kernel as well as the userland). Due to the excessive waste of memory you can’t really run it for long as you’ll simply exhaust the available pages, and if not that, then the available address space after a while. But for immediate bug hunting it was enough.

Back to Work

Well that was fun, but I still had that key_store branch sitting there in a rather messy state of being halfway rewritten. So I sat down and finished the initial rewrite only to discover some cases that weren’t really practical. So after finishing that redesign another one was due again. The refinements were getting smaller and smaller and now I think the API should be quite usable.

The next step on that topic will now be to implement the rest of the backend and then see whether or not the API is really any good when finally using it from the wpa_supplicant to store and retrieve network keys.

In Closing

Yeah, I realize that this blog post has gotten pretty long again. I’m still working on keeping it short, but often when I start writing, so many details come into my mind that it is really hard to sort them out…

For anyone wondering I will be working, although probably at a reduced amount of hours a day, the coming week. So there only remains one thing to say (for those who it applies to): Happy holidays :-)