axeld's blog

Almost, but not there yet

Blog post by axeld on Tue, 2010-12-14 22:08

As you might have noticed, the WiFi encryption bounty ends tomorrow. Obviously, this is a good time to give an overview of what I did in the past weeks. Unfortunately, and I should say this before you get too excited, the most interesting part of the bounty, the wpa_supplicant, does not work yet. I've ported it to Haiku, but so far it has resisted my attempts to find out where the problem is located; in the hours I've put into debugging, I've found a couple of potential causes, but there is at least one more to be found, and fixed.

WPA encryption progress

Blog post by axeld on Wed, 2010-10-13 21:03

I've been working on getting WPA encryption to work on Haiku. While I haven't been able to invest as much time as I had hoped, I made a bit of progress that at least justifies a small status update.

The first part of the WPA related work actually happened while I was working on the network stack for Haiku, Inc.: it was not possible to inject packets into the network. While you could easily monitor all incoming packets, there was no mechanism to send packets for arbitrary protocols. I've extended the AF_LINK protocol to provide that mechanism.

Then I started the actual work on porting wpa_supplicant, the standard tool for enabling WPA/WPA2 encryption almost everywhere. Since we're using the net80211 WLAN stack as part of our FreeBSD driver compatibility layer, a lot of code from the existing BSD driver backend could be reused. There are actually only two differences between Haiku and your favorite BSD for this kind of work:

  • ioctl() needs a length parameter that tells the kernel how large the structure passed in is. In BSD, this information is encoded in the control constant itself, but this is not the case for Haiku. While the BSD way is certainly more convenient, Haiku's solution is safer and more flexible.
  • the BSDs use AF_ROUTE sockets to pass network notifications to interested user software. In Haiku, we already have a system wide interprocess messaging service, so that is what is used instead. The only downside of this approach is that the API is C++ only, which is probably something that won't be accepted upstream.
After I had the driver backend ported, the next thing was to implement the layer 2 packet support, as the wpa_supplicant needs a way to receive and send packets outside of the usual protocols. This part now uses the AF_LINK method I described above, although there seems to be a problem left that I need to look into: select() always reports the socket as readable. For now, due to the event based implementation of the wpa_supplicant, this only causes a bit more CPU usage than you would want to have. I am not sure anything actually relies on receiving something yet (or whether that works at all, as nothing has been sent or received so far).
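The symptom can be reproduced with any level-triggered select() loop; a minimal sketch with plain POSIX sockets (nothing Haiku specific), showing how a socket that stays flagged readable keeps waking the event loop:

```python
import select
import socket

# Simulate the symptom: a level-triggered select() reports a socket readable
# on every call until the pending data is drained. If the kernel wrongly
# flags a socket readable, an event loop like wpa_supplicant's spins on it.
a, b = socket.socketpair()
a.send(b"event")

wakeups = 0
for _ in range(3):
    readable, _, _ = select.select([b], [], [], 0)
    if b in readable:
        wakeups += 1   # the handler runs, but nothing is read: still readable

print(wakeups)  # 3: every iteration is a wakeup, burning CPU
```

This is also why the bug is merely annoying rather than fatal: the loop spins, but no data is lost.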

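To illustrate the first difference listed above: a sketch of how the BSDs pack the argument size into the ioctl command constant (constants simplified from BSD's <sys/ioccom.h>; the request itself is hypothetical). This is exactly the information Haiku passes as a separate length parameter instead:

```python
# BSD-style ioctl command encoding (simplified): the argument size lives in
# bits 16..28 of the command word, so the kernel recovers it by masking.
IOCPARM_MASK = 0x1fff          # 13 bits of parameter length
IOC_INOUT = 0xc0000000         # direction bits: copy in and out

def _IOWR(group, num, size):
    return IOC_INOUT | ((size & IOCPARM_MASK) << 16) | (ord(group) << 8) | num

cmd = _IOWR('i', 27, 32)       # a hypothetical request taking a 32-byte struct
size = (cmd >> 16) & IOCPARM_MASK
print(size)                    # the kernel recovers the size: 32
```

The downside of the BSD scheme is visible here: the size is fixed at compile time and limited to 13 bits, whereas an explicit length parameter can describe any structure.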
Once I had the wpa_supplicant working in theory, more problems surfaced, as usual. One is that the FreeBSD driver layer does not receive a few ioctl() calls that need to be forwarded to it for the wpa_supplicant use case. That one only took a bit of debugging, and I've added a quick hack that forwards the needed calls for now; since I'm not happy with it, I haven't committed the changes yet.

A more serious problem was that Colin did an iterative port of the net80211 stack: he added and implemented only the parts that were needed at the time. It seems that WPA needs a few more of them; so far I've only added the xauth module, which allowed me to actually receive node association events from the stack. After that, it still fails for some reason, but I haven't figured out why.

I will only be able to work a few hours per week in the coming weeks, so I hope I'll get something useful done before the deadline hits. In any case, once I have the wpa_supplicant working, I will commit what I have done so far. Since the wpa_supplicant currently needs private Haiku headers, and the changes I'm planning to do are unlikely to be accepted upstream, I will probably end up adding all of its sources to the Haiku repository. Once I have WPA working, I will continue with extending the existing configuration tools to support WLAN as well. This will also lead to further integration of the wpa_supplicant; it will probably end up being an add-on for the net_server.

Network Stack Update

Blog post by axeld on Tue, 2010-07-27 10:20

First of all, thanks a lot for your generous donations! It was really stunning to see how much money could be raised in such a short time. And since it's been some time since my last commit, I thought it would be a good idea to report what I'm currently up to. But first, let's have a look at what I spent most of last week on:

  • I started to fix some annoying bugs in the FreeBSD compatibility layer. Now it's possible to unload the networking stack completely again, and the "callout" implementation should provide more accurate timing. Oh, and booting over the network didn't work anymore with FreeBSD drivers, either. Originally, I wanted to find out why Haiku would instantly reboot on one of my machines, but the problem mysteriously vanished once I started looking into it.
  • Next on the list were some minor bugs, mostly having to do with routing; some could crash the system, others would just render your network inaccessible. Most of those bugs were reported by Atis Elsts, one of our current GSoC students, who is working on implementing IPv6. I've recently committed his work in progress to our repository in order to ease reviewing his patches, and to give what he has done so far a bit more exposure.
  • Then I started to integrate the ICMP patches that two former students (Ivo Vachkov, GSoC 2007, and Yin Qiu, HCD 2008) produced, by pretty much rewriting them. Looking back, that money was not well spent: neither student joined the project, nor was the quality of their work really acceptable. It took me three days to rework it, and it still has some issues, like introducing an IPv4 specific error mechanism into the protocol agnostic stack. I doubt it would have taken much more time to write it from scratch. I will continue to work on this later, though, and address its remaining shortcomings. The current state is that we can produce ICMP error messages (and will in most appropriate places), and also forward those errors to userland applications. For example, if you send a UDP packet (through a connected socket) to a port that is not served, the server will answer with an ICMP port unreachable error (even if that server is Haiku), and your application will retrieve the appropriate error code from its next socket interaction.
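The UDP example above can be tried from userland on any stack with this error mechanism (Linux behaves the same way). A sketch; the port number is an arbitrary pick that is hopefully unserved on your machine:

```python
import errno
import socket
import time

# A connected UDP socket to an unserved port: the ICMP port unreachable
# reply is stored on the socket and surfaces as ECONNREFUSED on a later
# socket interaction, exactly as described above.
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.connect(("127.0.0.1", 9))        # assumption: nothing serves port 9 here
s.send(b"ping")                    # triggers the ICMP error on loopback
time.sleep(0.1)                    # give the stack a moment to deliver it
try:
    s.send(b"ping again")          # the stored error surfaces here
    result = "no error reported"
except OSError as e:
    result = errno.errorcode[e.errno]
print(result)                      # typically ECONNREFUSED
s.close()
```

Note that the error only reaches the application because the socket is connected; an unconnected socket has no single peer the error could be attributed to.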

For the last three days, I've been working on changing some stack internals, which turned out to be a bit more work than I originally anticipated: the network stack currently in Haiku only allows a single address per interface. That was quite okay for IPv4, but becomes problematic with the adoption of IPv6, since it's common there for an interface to have more than a single address (the feature is also required by the relevant RFCs). Originally, I had thought that simple aliasing of interfaces would do (the plan was that several interfaces could use the same driver, but have different names that would only be joined when displaying them through ipconfig), but even though other systems seem to actually do that, it's quite a limited approach. Besides, the network stack support for this was utterly broken. That's probably what you get when you don't consider a feature important.

Now I've learned that aliasing, the way it was planned and implemented, won't do. Since Atis would also welcome the ability to have several addresses per interface, as this would ease (or rather, given the brokenness of the current solution, make possible) implementing Neighbor Discovery for IPv6 (a.k.a. NDP, see RFC 4861), I thought it would be a good idea to spend my contract time implementing this.

The rest of this blog entry goes into the technical details, so if you're not interested in those, feel free to skip it.

Each net_interface can now have any number of net_interface_address items, each of which stores a local interface address as well as its network mask and broadcast address (if any). Therefore, a net_buffer and a net_route can no longer point directly to the net_interface. Instead, they now have a pointer to the net_interface_address structure, which also features pointers to its net_interface as well as to its net_domain, so that no information is lost. The net_protocols (the transport and network layers in OSI terms) were therefore very easy to port to the new architecture, apart from the little annoyance that the datalink layer can no longer figure out the interface address a buffer belongs to.
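As a toy model of the new ownership chain (the class names mirror the structures above; the fields are illustrative, not Haiku's actual layout), a buffer that holds a net_interface_address can still reach both the interface and the domain:

```python
# Toy model: a net_buffer/net_route references an interface *address*, which
# in turn points back to its net_interface and net_domain, so no information
# is lost by dropping the direct interface pointer.
class InterfaceAddress:
    def __init__(self, interface, domain, local, mask, broadcast=None):
        self.interface, self.domain = interface, domain
        self.local, self.mask, self.broadcast = local, mask, broadcast

class Interface:
    def __init__(self, name):
        self.name, self.addresses = name, []

    def add_address(self, address):
        self.addresses.append(address)

eth0 = Interface("eth0")
addr = InterfaceAddress(eth0, "inet", "10.0.0.5", "255.255.255.0", "10.0.0.255")
eth0.add_address(addr)

# a net_buffer would now carry 'addr' instead of a direct interface pointer:
print(addr.interface.name, addr.domain)  # eth0 inet
```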

The meaty bits were hidden in the datalink layer and the stack internals. Before, the net_datalink_protocols were instantiated once per interface. Since an interface had one address and one specific domain (like IPv4), this worked out nicely. Now, however, an interface is domain agnostic, so this nice and simple model was no longer adequate. Furthermore, modules like ARP have to know and maintain the local addresses of the interfaces so that they can be used for outgoing ARP requests and replies. This requires a more flexible approach, and the net_datalink_protocols are now instantiated per domain per interface. That means whenever the first address of a specific domain is added to an interface, the protocol chain is created and attached to the interface for that domain only. I've currently implemented this with a simple array lookup per interface, but it will do the job. I've also taken the opportunity to introduce a bit more C++ into the stack internals: there is now an Interface class that takes over the private parts of a net_interface. I've also changed interfaces and their addresses to a reference counting based solution, which was still on my TODO list from when I originally implemented the stack. Furthermore, the datalink module needed a number of new methods in order to give the other layers enough information to deal with the new situation.
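The lazy per-domain chain creation can be sketched like this (the module names are made up, and the dictionary stands in for the per-interface array lookup mentioned above):

```python
# Sketch: the datalink protocol chain for a domain is built when the first
# address of that domain is added to the interface, and reused afterwards.
class Interface:
    def __init__(self, name):
        self.name = name
        self.addresses = []                  # (domain, address) pairs
        self.chains = {}                     # domain -> datalink protocol chain

    def add_address(self, domain, address):
        if domain not in self.chains:        # first address of this domain:
            resolver = "arp" if domain == "inet" else "ndp"
            self.chains[domain] = [resolver, "ethernet_frame"]
        self.addresses.append((domain, address))

eth0 = Interface("eth0")
eth0.add_address("inet", "192.168.0.2")
eth0.add_address("inet", "192.168.0.3")      # reuses the existing inet chain
eth0.add_address("inet6", "fe80::1")         # builds a second, IPv6 chain
print(len(eth0.chains), len(eth0.addresses))  # 2 3
```

The point of the design is visible in the last three lines: three addresses, but only two protocol chains, one per domain actually in use.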

The next problem was the ioctl interface that is used to configure the stack. With more than a single address per interface, it's no longer adequate. In FreeBSD, for example, there is no direct way to iterate over the addresses of an interface due to this. You can collect the information by looking at the routing table, though, and that's what each application interested in this data has to do (luckily, they provide a non-POSIX extension getifaddrs() that does the job for you). I haven't decided yet what I will do in Haiku, but I guess I will add some dedicated ioctls for this issue.

I'm still kind of in the middle of the changes, but I hope that I can start to commit the first bits later today, or tomorrow. Likely, those commits will break the network stack, Haiku in general, and introduce severe regressions. I will then start looking into WPA, and see if I can make enough time to apply for the WiFi bounty at Haikuware, leaving the regressions in place to prepare for my future contracts. You know the deal :-)

Once I'm done with the current rework (at the moment I need to adapt the routing code to the other changes), I will continue completing the error mechanism, and introduce a nice C++ API to access the stack internals, in order to hide the ioctl interface from everyone but those that really want it. I will probably do the latter only after having had a look at WPA, though, since that might bring further API needs to the table.

Why BFS needs chkbfs

Blog post by axeld on Fri, 2007-10-05 09:16

You are probably aware of the existence of chkbfs. This tool checks the file system for errors, and corrects them, if possible.
Nothing is perfect, of course, but you might still be asking yourself why a journaling file system comes with such a tool at all.

In fact, it wasn't originally included or planned for the first releases of the new BFS file system. It was added because there is a real need for it, and you are advised to run it after having experienced some BeOS crashes.

app_server Memory Management Revisited

Blog post by axeld on Thu, 2006-03-23 08:30

I recently looked into why BeIDE's interface only had green squares where its icons should have been (bug #313). The function importing the client's bitmap data did not work correctly, and while playing with it, the app_server suddenly crashed, and continued to do so in a reproducible way.

How was this possible? Bitmaps are located in a memory pool shared between the app_server and an application. Unfortunately, the app_server put those bitmaps into larger, arbitrarily allocated areas, and put the structures managing that space into those areas as well, like a userland memory allocator would do. However, if a client clobbered memory outside of its space in those areas (and that's what buggy clients do all the time), those structures could easily be broken, which caused the app_server to crash the next time it tried to use them. Also, since all applications shared the same area, they could easily clobber each other's bitmaps as well.

But there were even more disadvantages to the way client memory was managed: the client would clone the whole area once for each bitmap therein. For an application like Tracker, with potentially tons of icons (which are bitmaps), that wasted huge amounts of address space: if the area was 1 MB large and contained 500 icons, Tracker would have cloned that area 500 times, once for each icon, wasting 500 MB of address space. If you have a folder with many image thumbnails, the maximum limit (2 GB per application) could have been reached with ease. Not a very good idea.
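The address space arithmetic from the Tracker example, spelled out (the numbers are the ones used in the paragraph above):

```python
# Old scheme: the whole shared area is cloned once per bitmap it contains.
# New scheme: each shared area is cloned exactly once per client.
area_size_mb = 1      # size of one shared bitmap area
icons = 500           # bitmaps living in that area

old_cost_mb = icons * area_size_mb   # 500 clones of the 1 MB area
new_cost_mb = area_size_mb           # a single clone of the same area
print(old_cost_mb, new_cost_mb)      # 500 1
```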

Another problem of the previous solution was memory fragmentation and contention: if many applications were allocating server memory at the same time, their memory would be spread out over the available areas, and since it was a single shared resource, all applications needed to reserve their memory one after the other, for every single allocation. If one of these applications quit, its memory had to be freed again, leaving holes in the area. Of course, the app_server needed to create quite a few areas, and with memory fragmentation like this, it would waste much more memory and address space, which is a real concern in the app_server.

Anyway, the new solution works quite differently: the app_server now tries to have a single area per application; if that application dies, that area can be freed instantly, without having to worry about other applications. To achieve this, the client reserves a certain address range for the app_server, which makes sure that the area can be resized if required; on the server's side, the area is always exactly as large as needed. Since the app_server doesn't reserve space for the client, it has to work with fully relocatable memory: if an area cannot be resized in the app_server (because there are other areas in its way), it can be relocated to another address where it fits. If that's not possible, a new area is created, and the client is triggered to clone it. Of course, every area is now only cloned once in the client, too.

The structures that manage the allocations and free space in these areas are now separated from the memory itself, and are not reachable by the client, with the desired effect that the app_server can no longer be crashed that easily this way. Contention is now limited to within a single application, which should be much more acceptable.
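A minimal illustration of why moving the bookkeeping out of the shared pool helps (the structures are made up for the sketch; real allocator metadata is of course richer):

```python
# The client can scribble over every byte of the shared pool without
# touching the allocator's free list, because that list now lives in
# server-only memory instead of inside the pool itself.
pool = bytearray(64)                  # memory the client can reach and break
free_blocks = [(0, 16), (32, 32)]     # (offset, size), kept on the server side

pool[:] = b"\xff" * len(pool)         # a buggy client clobbers the whole pool

print(free_blocks)                    # the server's bookkeeping is intact
```

Under the old scheme, the equivalent of free_blocks lived inside pool, so the scribble above would have corrupted it and crashed the server on the next allocation.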

As an additional bonus, the new solution should be much faster due to the vastly reduced number of area creations and clones. The allocator itself is pretty simple, though, and could probably be improved further; however, it works pretty nicely so far.

APM Support

Blog post by axeld on Sat, 2006-02-04 13:03

As of a few days ago, we have a working APM driver in our kernel. APM stands for Advanced Power Management; it's a service provided by the computer's firmware, commonly called the BIOS in the x86 world. The latest APM standard, version 1.2, is already almost 10 years old. Today's computers still support it, even though the preferred way to get similar services (among others) is now ACPI, the Advanced Configuration and Power Interface. Thanks to Nathan Whitehorn's efforts and Intel's example implementation, we even have the beginnings of ACPI support in Haiku as well.

But let's go back to APM. Theoretically, it can be used to put your system into one of several power states, like suspend or power off. You can also read out battery information from your laptop, such as the estimated remaining power. It even supports throttling the CPU on some laptops, although it only differentiates between full speed and slower speed.

The driver doesn't do much yet, but it should let you shut down your computer. In addition, it follows the standard and periodically polls for APM events; such an event is generated, for example, when you connect the AC adapter to your laptop.

By default, the driver is currently disabled, but that might change once I have a better picture of which hardware it doesn't run on yet. I have successfully tested it on four different systems over here, but I also have one negative report.

If you're interested in testing Haiku's APM support yourself, you can add the line "apm true" to your kernel settings file. When you then enter "shutdown -q" in the Terminal, the system should be turned off. If an error comes back, APM couldn't be enabled for some reason. If nothing happens, your computer's APM implementation is probably not that good. In some rare cases, your computer may refuse to boot with APM enabled; in this case, you can disable APM in the safe mode settings of the boot loader. If it really doesn't work, I would be very interested in the serial debug output, in case you can retrieve it.

In other news, we now also have syslog support in the kernel, as well as on-screen debug output during boot. The former can be enabled with the line "syslog_debug_output true" in the kernel settings file, while the latter can be enabled in the safe mode settings of the boot loader. "syslog" is a system logging service that currently stores its output under /var/log/syslog. Note that you must shut down the system gracefully to make sure the log is actually written to disk.
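For reference, both switches mentioned in this post go into the kernel settings file; a fragment (the option lines are exactly as described above; where that file lives depends on your install):

```
# kernel settings file
apm true                   # enable the APM driver
syslog_debug_output true   # send kernel debug output to the syslog
```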

Sorry, Volume Is Busy!

Blog post by axeld on Sun, 2006-01-15 19:43

If you've used BeOS, you're probably familiar with the above message when trying to unmount a volume. From time to time, some application keeps accessing a volume, and you can't determine which application that is. It might be caused by a running live query, but it might also be caused by buggy background applications that forget to close a file.

I've just given you back control over your volumes in Haiku: you can force unmounting such a volume; applications still trying to access it will get an error back. Forcing an unmount requires extra user interaction, though, so it's not the preferred solution.

To remove one of the problems, live queries shouldn't prevent unmounting a volume at all: it doesn't make any sense for them to stop the normal unmounting process. This can hardly be in the interest of an application that is querying for something.

On the other hand, we should try to improve the user's perception of a busy volume: instead of saying "sorry, busy", it should say something like "Sorry, the application Tracker is still accessing the volume." For the user this makes an important difference: especially now that they have the power to force unmounting a volume, it gives them the information they need to decide what they really want to do.

As a side effect, we'd get a tool that can determine which applications have which files open, making it possible to report an application's misbehaviour back to its developers. Or even better, to give developers the possibility to monitor the performance of their applications.

Well, at least you have the power now, control comes next :-)
