Blog post by axeld on Wed, 2005-12-14 09:47

Sure, you too! Since Stephan made a BDirectWindow based version of our app_server that directly uses the hardware frame buffer and acceleration features, we noticed that it felt much faster there than on real hardware. How could that be?

The reason is actually very simple. Parts of our rendering pipeline like text output isn’t optimized to use 32/64-bit memory access - that means it doesn’t make full use of the memory bus. While we’d like to change this for the future, Intel introduced a functionality called write-combining in something like 1998 that is supposed to optimize write access to something like a frame buffer. Instead of directly writing the bytes back to the buffer instantly, the CPU waits until you have written 32 sequential bytes, and then writes them back at once, in a single burst. Enabling write-combining is therefore even a good idea if you already have a optimized your graphics output, although the effect is less noticeable in that case.

This brings us back to MTRR, “memory type range register”, just in case you asked whatever that would be :-) Using them, you can specify that the CPU should access a part of memory in a specific way - like write-combining, but there are other options, too. In BeOS and Haiku, they can only be specified in the map_physical_memory() call (via the B_MTR_* flags). Graphics drivers usually try to map their frame buffer in write-combining mode, at least all of ours do that, so they are directly benefiting from the new functionality.

MTRR is a CPU dependent feature that is programmed using the machine state registers. Luckily Intel and AMD uses the exact same mechanism here, and thus, we support it for processors of both vendors. We’ll make sure that it is supported for other brands like VIA or Transmeta as well.

Even though the app_server has lots of potential optimizations left, it already feels pretty well now. You can still manage to lock it up, but those problems should go away soon.