Sub-Transactions

Blog post by axeld on Wed, 2005-10-19 15:38

A small update to the BFS incompatibility: I’ve now ported the original logging structure to the R5 version of BFS as well, so that tools like bfs_shell can now successfully mount “dirty” volumes, too. I also found another bug in Be’s implementation, and needed to shrink the log entry array by one entry to make it work with larger transactions.

Now I am working on implementing sub transactions. If you have tried out Haiku and compiled some stuff or just redirected some shell output to a file, you undoubtedly are aware that this takes ages on the current system.
The reason for this is that BFS starts a new transaction for every write to a file that enlarges its file size - and that’s indeed a very common case. Since writing back a transaction also includes flushing the drive caches, this isn’t a very cheap operation - it slows down BFS a lot.

The original approach taken by Be Inc. was to combine several smaller transactions into one bigger transaction - problem solved. The downside to this approach is that you lose the ability to undo a single small transaction. If you need to undo some actions, you have to manually revert the changes in the big transaction that would have belonged to the small one.
That works, but it also complicates the code a lot and invites all kinds of bugs (and that’s one more reason why file systems take ages to become mature).

In Haiku, we introduce the concept of a sub transaction: you can start a transaction in the context of the current transaction, and then abort only the sub transaction instead of the whole thing. As soon as the sub transaction is acknowledged, its changes are merged with the parent transaction. From that point on, you can no longer revert its changes on their own; you can only revert the whole transaction.
The only downside of this approach is that it uses more memory, as it has to store the changes of the sub transaction alongside those of the parent. The largest transaction that is possible with a standard BFS volume currently consists of 4096 blocks - so even the worst case should be acceptable.
If a sub transaction grows too large, it can be detached from its parent - since the parent transaction itself is already complete, it can safely be written back to disk.
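
To make the life cycle described above a bit more concrete, here is a minimal in-memory sketch of it in C++. It is only an illustration under assumptions: the class Transaction, the type BlockMap, and the methods Write(), StartSub(), AbortSub(), AcknowledgeSub() and DetachSub() are hypothetical names made up for this example and are not the actual Haiku block cache or BFS interface. Changed blocks are modelled as strings keyed by block number, and "writing back to disk" is modelled by simply handing the parent's blocks to the caller.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical model: changed blocks keyed by block number.
using BlockMap = std::map<uint64_t, std::string>;

class Transaction {
public:
    // Record a changed block in the sub transaction if one is active,
    // otherwise in the parent transaction.
    void Write(uint64_t block, const std::string& data) {
        (fHasSub ? fSub : fParent)[block] = data;
    }

    // Start a sub transaction in the context of the current transaction.
    void StartSub() {
        fSub.clear();
        fHasSub = true;
    }

    // Abort only the sub transaction; the parent's changes stay untouched.
    void AbortSub() {
        fSub.clear();
        fHasSub = false;
    }

    // Acknowledge the sub transaction: its changes are merged into the
    // parent and can no longer be reverted on their own.
    void AcknowledgeSub() {
        for (const auto& entry : fSub)
            fParent[entry.first] = entry.second;
        fSub.clear();
        fHasSub = false;
    }

    // Detach the sub transaction: the parent is complete, so its blocks
    // are returned for write-back, and the sub transaction's changes
    // carry on as the new current transaction.
    BlockMap DetachSub() {
        BlockMap toWrite = std::move(fParent);
        fParent = std::move(fSub);
        fSub.clear();
        fHasSub = false;
        return toWrite;
    }

    // The extra memory cost: sub transaction changes are stored
    // alongside those of the parent.
    std::size_t BlockCount() const { return fParent.size() + fSub.size(); }

    const BlockMap& Parent() const { return fParent; }

private:
    BlockMap fParent;
    BlockMap fSub;
    bool fHasSub = false;
};
```

In this sketch, a write that enlarges a file would call StartSub(), do its Write() calls, and then either AcknowledgeSub() on success or AbortSub() on failure, without having to undo anything by hand in the parent. Keeping a second map for the sub transaction is what costs the additional memory mentioned above.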

I hope to finish implementing sub transactions and start using them in BFS by some time tomorrow. Depending on the number of bugs I add to the code, it might also go faster, though :-)