Tomorrow, I’ll have a short look at implementing Hyper Threading support - not the full monty, but it would still be nice to have it start using the two logical processors in my system; after all, I bought this system with enabling Hyper Threading in Haiku in mind.
After that excursion, I will start looking at the app_server again. I had started to refactor the code some weeks ago, but got interrupted, and didn’t find the time to continue this effort. I hope to make the app_server stable in the next few weeks - it’s one of the biggest show stoppers for Haiku right now: the kernel should decide when the time for a reboot has come (read KDL), not some bloody userland application :-)
Anyway, the app_server is the single most important application running under Haiku, and it’s in many regards as critical as the kernel. When the Application Kit/Interface Kit/app_server triumvirate works as expected, we should be able to run almost every R5 or Haiku application under Haiku. And that should enable us sooner or later to start distributing official alpha releases - not that we’ll be able to work with these in a productive manner, but it’ll be a major step forward.
I just got Haiku booting on an SMP machine. Unfortunately, I am not really sure which change exactly triggered this - I had tried so many things, and all of a sudden it started to work after I disabled setting up the APIC (the advanced programmable interrupt controller) to use ExtINT delivery mode. That probably doesn’t tell you anything, I know, but it’s still remarkable that this code was originally disabled as well.
It took me quite a number of hours to get it working, so it’s a bit frustrating not to know what was actually responsible for the hiccup - but not frustrating enough to make me start an investigation into this topic for now…
Anyway, our SMP support is pretty weak right now - it only covers virtual wire mode, which is only one of the two modes every compatible IA-32 MP system should support. We don’t yet support running the system in so-called symmetric I/O mode - that would require us to do some more APIC programming for interrupt redirection, which I obviously didn’t need to do to get my machine up and running. Bad for Haiku, but good for me :-)
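To give you an idea of what symmetric I/O mode would additionally require, here is a rough sketch of programming an I/O APIC redirection entry. The register offsets are the documented ones, but the base address and all names are just assumptions for illustration - this is not code from our tree:

#include <stdint.h>

// The I/O APIC is accessed through an index/data register pair; 0xfec00000
// is the usual default physical base (a real kernel would map it first).
static volatile uint32_t* sIOAPIC = (volatile uint32_t*)0xfec00000;

enum {
    IO_APIC_REGSEL = 0,    // index register at offset 0x00
    IO_APIC_WINDOW = 4,    // data window at offset 0x10 (in uint32 units)
};

static void
ioapic_write(uint8_t reg, uint32_t value)
{
    sIOAPIC[IO_APIC_REGSEL] = reg;
    sIOAPIC[IO_APIC_WINDOW] = value;
}

// Route "irq" to interrupt vector "vector" on the CPU with the given APIC
// ID. Each 64 bit redirection entry occupies two registers from 0x10 on.
static void
ioapic_route_irq(uint8_t irq, uint8_t vector, uint8_t apicID)
{
    uint8_t reg = 0x10 + irq * 2;
    ioapic_write(reg + 1, (uint32_t)apicID << 24);    // destination field
    ioapic_write(reg, vector);
        // fixed delivery, physical destination, edge triggered, unmasked
}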
Next on the list are some more SMP related changes, as some things like call_all_cpus() are not yet working. I expect to finish the remaining SMP work tomorrow, and that’s when all the testing can begin on your end. A serial debugging cable (with a second machine) would be very helpful, though, in order to get me useful information about what went wrong. Your effort is surely appreciated!
Older SMP machines could or even should work now, but I would be surprised if the same could be said for current SMP machines - since I don’t have access to such a machine, though, I can’t find out myself right now.
Even though I usually don’t work on weekends, I had to this time, since I didn’t manage to work 8 hours on Friday.
Unfortunately, I still haven’t got SMP to work. I’ve investigated the issue, and came to the simple conclusion that the APIC interrupts don’t reach their target (it just took me some time to get there and to exclude all other possible causes). I can trigger such an interrupt manually, so the second CPU is set up correctly, but its APIC doesn’t seem to be. You don’t understand a word of what I just said? Well, let’s just say one CPU doesn’t manage to talk to the other CPU (through the APIC, the “advanced programmable interrupt controller”).
Basically, the boot process is halted as soon as the first CPU tries to tell the second CPU what to do, and then waits for an answer - until you stop it.
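For the curious: “telling the other CPU what to do” boils down to writing the local APIC’s interrupt command register (ICR). A minimal sketch, with the standard register offsets from Intel’s manuals - the base address and all names are my own, for illustration only:

#include <stdint.h>

// Default local APIC base; a real kernel maps it into virtual memory and
// reads the actual address from the IA32_APIC_BASE MSR.
static volatile uint8_t* sLocalAPIC = (volatile uint8_t*)0xfee00000;

#define APIC_ICR_LOW            0x300
#define APIC_ICR_HIGH           0x310
#define APIC_DELIVERY_STATUS    (1 << 12)    // 1 = send pending

static inline uint32_t
apic_read(uint32_t offset)
{
    return *(volatile uint32_t*)(sLocalAPIC + offset);
}

static inline void
apic_write(uint32_t offset, uint32_t value)
{
    *(volatile uint32_t*)(sLocalAPIC + offset) = value;
}

// Send interrupt "vector" to the CPU with the given APIC ID. If the
// target's APIC is not set up correctly, the delivery status bit never
// clears - which looks exactly like the hang described above.
static void
send_ipi(uint8_t targetAPICID, uint8_t vector)
{
    apic_write(APIC_ICR_HIGH, (uint32_t)targetAPICID << 24);
    apic_write(APIC_ICR_LOW, vector);    // fixed delivery mode

    while (apic_read(APIC_ICR_LOW) & APIC_DELIVERY_STATUS)
        ;    // wait until the IPI has been accepted
}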
I haven’t been able to find a bug in the initialization code yet, and I haven’t even started looking at the I/O APIC, but I still hope I can figure out what’s going wrong on Monday.
It took a bit longer to get the dual machine up and running again - it has two 500 MHz PIIIs and the hard drive is a bit older as well, so it took about two hours to update the source repository and get it compiled.
While waiting for the machine to complete its task, I had the time to look into some other known issues in our code, and to clean up the signaling code a bit. We are now able to handle signals with interrupts turned on, some minor bugs went away, and there is now support for sigsuspend() and sigpending() - nothing earth-shaking, but definitely a step in the right direction.
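In case you have never used them: sigsuspend() atomically replaces the signal mask and waits for a signal, while sigpending() tells you which blocked signals have arrived in the meantime. A small, standard POSIX example of both:

#include <signal.h>
#include <stdio.h>
#include <string.h>

static void
handler(int signal)
{
    (void)signal;    // we only need to be interrupted
}

int
main(void)
{
    struct sigaction action;
    memset(&action, 0, sizeof(action));
    action.sa_handler = handler;
    sigaction(SIGUSR1, &action, NULL);

    // block SIGUSR1; if it arrives now, it just stays pending
    sigset_t blocked;
    sigemptyset(&blocked);
    sigaddset(&blocked, SIGUSR1);
    sigprocmask(SIG_BLOCK, &blocked, NULL);

    sigset_t pending;
    sigpending(&pending);
    if (sigismember(&pending, SIGUSR1))
        puts("SIGUSR1 already pending");

    // atomically unblock everything and sleep until a signal arrives
    sigset_t waitMask;
    sigemptyset(&waitMask);
    sigsuspend(&waitMask);
    puts("got a signal");
    return 0;
}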
There were some other distractions, so I only played around with SMP briefly - I am just sure now that it still doesn’t work :-)
Shortly after the second CPU steps in, both CPUs seem to come to a halt. I don’t know yet what’s causing this, but it seems to be a single basic problem - let’s just hope I don’t waste too much time searching for it.
I’m done implementing sub transactions for now - I haven’t yet tested detaching sub transactions, but everything seems to work fine. Time will tell :-)
A complete Tracker build has now dropped from 13.5 minutes to 5.4 minutes - that’s great, but BeOS R5 does the same job on this machine in around 2.5 minutes, so while this is an improvement, we still have a long road ahead of us. I can only guess where we lose those 3 minutes for now, but I am sure we’ll find out well before R1. One of the responsible components should be the caching system, as it still only reads single blocks/pages on demand, instead of doing bigger reads and read-ahead.
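Just to illustrate the difference: instead of fetching one block per cache miss, the cache could read a whole run of blocks at once. The following is purely hypothetical - none of these functions exist in this form, and the names and window size are made up:

#include <stddef.h>
#include <stdint.h>

struct block_cache;    // opaque, stands in for the real cache

// assumed low-level helpers (sketch only)
void* cache_lookup(block_cache* cache, uint64_t blockNumber);
void cache_insert(block_cache* cache, uint64_t blockNumber, const void* data);
int read_blocks(block_cache* cache, uint64_t first, uint32_t count,
    void* buffer);

static const uint32_t kReadAheadBlocks = 32;    // assumed window size

// On a miss, read a whole window of blocks so that sequential access
// (like a compiler walking over its source files) hits the cache.
void*
get_cached_block(block_cache* cache, uint64_t blockNumber, size_t blockSize)
{
    if (void* block = cache_lookup(cache, blockNumber))
        return block;

    uint8_t* buffer = new uint8_t[kReadAheadBlocks * blockSize];
    if (read_blocks(cache, blockNumber, kReadAheadBlocks, buffer) != 0) {
        delete[] buffer;
        return NULL;
    }

    for (uint32_t i = 0; i < kReadAheadBlocks; i++)
        cache_insert(cache, blockNumber + i, buffer + i * blockSize);

    delete[] buffer;
    return cache_lookup(cache, blockNumber);
}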
Anyway, since Adi is still working on the app_server, my next assignment is getting Haiku to work again on SMP machines. While it may seem like a luxury right now, having an SMP machine to test a multi-threaded system on is almost mandatory. Let’s see how many related bugs have sneaked into the system - I only know about one particular piece of code that won’t work well on those machines (and I am to blame for that one, out of pure laziness).
The machine I am testing on is a dual PIII (with Intel BX chipset) that was generously donated (or lent :-)) by Ingo, one of the best developers we have on the team.
A small update on the BFS incompatibility: I’ve now ported the original logging structure to the R5 version of BFS as well, so that tools like bfs_shell can now successfully mount “dirty” volumes, too. I also found another bug in Be’s implementation, and needed to cut down the log entry array by one to make it work with larger transactions.
Now I am working on implementing sub transactions. If you have tried out Haiku and compiled some stuff or just redirected some shell output to a file, you undoubtedly are aware that this takes ages on the current system.
The reason for this is that BFS starts a new transaction for every write to a file that enlarges its file size - and that’s indeed a very common case. Since writing back a transaction also includes flushing the drive caches, this isn’t a very cheap operation - it slows down BFS a lot.
The original approach taken by Be Inc. was to combine several smaller transactions into one bigger transaction - problem solved. The downside of this approach is that you lose the ability to undo a transaction: if you need to revert some actions, you have to manually undo exactly those changes of the big transaction that would have belonged to the small one.
That works, but it also complicates the code a lot, and is an open invitation for all kinds of bugs (which is one more reason why file systems take ages to mature).
In Haiku, we introduce the concept of a sub transaction: you can start a transaction in the context of the current transaction, and then abort only the sub transaction instead of the whole thing. As soon as the sub transaction is acknowledged, its changes are merged with the parent transaction - at that point, you cannot revert its changes anymore, you can only still revert the whole transaction.
The only downside of this approach is that it uses more memory, as it has to store the changes of the sub transaction alongside those of the parent. The largest transaction that is possible with a standard BFS volume currently consists of 4096 blocks - so even the worst case should be acceptable.
If a sub transaction grows too much, it can be detached from its parent - since the parent transaction itself is done already, it can safely be written back to disk.
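In code, the concept could look roughly like this - the function names are modeled after our block cache API, but take this as a sketch of the idea, not the actual interface:

#include <stdint.h>

// assumed cache interface (sketch)
int32_t cache_start_sub_transaction(void* cache, int32_t parentID);
void cache_abort_sub_transaction(void* cache, int32_t id);
void cache_end_transaction(void* cache, int32_t id);

// stands in for the actual writes that enlarge the file
int write_file_blocks(void* cache, int32_t transactionID);

int
append_to_file(void* cache, int32_t parentID)
{
    // start a sub transaction inside the already running transaction
    int32_t id = cache_start_sub_transaction(cache, parentID);

    int status = write_file_blocks(cache, id);
    if (status != 0) {
        // only the sub transaction is rolled back; everything the parent
        // transaction contains so far stays intact
        cache_abort_sub_transaction(cache, id);
        return status;
    }

    // acknowledging the sub transaction merges its changes into the
    // parent - from here on, only the whole transaction can be reverted
    cache_end_transaction(cache, id);
    return 0;
}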
I hope to finish implementing sub transactions and have BFS use them by some time tomorrow. Depending on the number of bugs I add to the code, it might also go faster, though :-)
It turns out the BFS logging code is not that intelligent - it uses block_runs in the log area, but it doesn’t actually make use of them. In other words: it only accepts block_runs with a length of 1 - which effectively kills the whole idea of using them. The log is now just as space consuming as the plain block number arrays I had before, but without the binary search capability we had earlier.
While our code could now use block_runs the way they are meant to be used, I have disabled joining adjacent blocks into a single block_run to keep our BFS fully compatible with Be’s in this regard. If we someday leave compatibility with the current BFS behind, we can enable it again, of course.
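For illustration, the joining that is now disabled would work roughly like this - extend the previous run whenever the new block directly follows it within the same allocation group (all names here are mine):

#include <stdint.h>

struct block_run {
    uint32_t allocation_group;
    uint16_t start;
    uint16_t length;
};

// Add a block (given as allocation group and offset) to the run array of
// a log entry; returns the new number of runs.
static uint32_t
log_add_block(block_run* runs, uint32_t count, uint32_t group,
    uint16_t offset)
{
    if (count > 0) {
        block_run& last = runs[count - 1];
        if (last.allocation_group == group
            && last.start + last.length == offset
            && last.length < 65535) {
            // join with the previous run - this is what Be's implementation
            // never accepts, so we keep it disabled for compatibility
            last.length++;
            return count;
        }
    }

    runs[count].allocation_group = group;
    runs[count].start = offset;
    runs[count].length = 1;
    return count + 1;
}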
While this is probably just a flaw in the implementation of the original BFS, it’s not the first time we have to live with a sub-optimal solution in order to retain compatibility. The good thing is: since we should now be 100% compatible with BFS, it should also be the last of these surprises.
This morning, I went through analyzing the BFS log area structure. Turns out it’s very different from what I did for our BFS.
Our current log structure looks like this:
block 1 - n:
    uint64      number of blocks
    off_t[]     array of block numbers
block n+1 - m:
    real block data
While the one from BFS looks like this:
block 1:
    uint32      number of runs
    uint32      max. number of runs
    block_run[] array of block runs
block 2 - m:
    real block data
BFS only has one header block, so it can only store a certain number of blocks per log entry. On the other hand, it uses block runs instead of single block numbers, which potentially compacts the block array, but also makes lookups a lot more expensive.
Like a block number, a block run is 8 bytes wide, and is a composed data type that looks like this:
struct block_run {
    uint32  allocation_group;
    uint16  start;
    uint16  length;
};
BFS divides the whole volume into allocation groups, each of which combines up to 65536 blocks. This way, a single block_run can also represent a number of consecutive blocks. The structure is used a lot throughout BFS, so it’s not surprising to find it in the log area as well.
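That also makes resolving a block_run to absolute block numbers cheap; since BFS stores the allocation group size as a power of two in the super block, it’s essentially a shift and an addition (the parameter names here are assumptions):

#include <stdint.h>

// agShift: log2 of the blocks per allocation group, from the super block.
// A run then covers run_to_block(...) up to run_to_block(...) + length - 1.
static inline uint64_t
run_to_block(uint32_t allocationGroup, uint16_t start, uint32_t agShift)
{
    return ((uint64_t)allocationGroup << agShift) + start;
}

This is also why lookups in the log get more expensive: to find a single block, you have to walk the runs linearly instead of binary searching a sorted array of block numbers.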
So I will now convert our BFS to use that same format, so that you can safely mount uncleanly unmounted volumes from both operating systems, BeOS and Haiku, in both directions.
First of all, I successfully booted Haiku from CD-ROM on several machines today. It took a bit longer than I thought, as no emulator I have access to seems to support multi-session CDs, and not every BIOS I have works by the book. The boot device selection is still very simplistic, so the system might not end up booting completely from CD if you just inserted the disc and didn’t choose “Boot from CD-ROM” in the boot loader - you’ll have to bear with that for now. I’ll probably fix it tomorrow.
Anyway, you can build your own bootable CD image with the “makehaikufloppy” script that’s now in our top-level directory (it’s still rough, and you have to build the whole system manually or via “makehdimage” first). You just need “mkisofs”, and can use the resulting image like this:
$ mkisofs -b boot.image -c boot.catalog -R -o <output-ISO-image> <path-to-directory-with-boot-image-and-other-stuff-that-should-go-into-the-boot-session>
As those of you who attended the Haiku presentation at BeGeistert are aware, we initially had some problems getting Haiku to run.
The reason behind these problems was an incompatibility between our version of BFS and Be’s: the log area is currently written differently in the two implementations. As soon as you mount a dirty volume with the wrong operating system, chances are that blocks are written in random order to your hard drive, thus corrupting the file system - and as Haiku currently crashes rather often, you get into this situation faster than you’d like.
Since we expect early adopters to dual boot into BeOS (and for good reason), we should definitely get rid of this annoying risk of losing data. It will also be much easier to track down the real remaining problems in BFS once all these simple traps are removed.
Therefore, this will be the next thing for me to work on. I will start by working out Be’s logging format, and then make ours compatible. Thanks to the wonders of “fsh” (no reboot necessary to uncleanly unmount a disk), this shouldn’t take too long - I expect to get it done sometime tomorrow.
Everything is in place now, and the boot loader even passes all the information the kernel needs to be able to boot from a CD. It’s not working yet, though, as the VFS only evaluates the partition offset of the boot volume, and nothing more.
There’s probably only a tiny bit left, so I’ll try to finish it tomorrow - in my spare time, as I usually don’t work during the weekend :-)
We have also agreed not to make demo CDs (images only, of course) available before the whole system runs a bit more stably. Compared to a hard disk image, a CD image is likely to be tested by a lot more people - and therefore, the first impression shouldn’t be too bad.