Be Newsletters - Volume 4: 1999

Issue 4-38, September 22, 1999

Be Engineering Insights: When the Fast Case Is Not Enough

By R. Jason Sams

Most programs where speed counts have a "fast case" code path somewhere in the code. This is a path which is optimized for a particular state of the program. Examples of this would be an aligned memcpy or gourad triangles in OpenGL. The developer has picked a state combination that he thinks is important and has created either an optimized C or asm path for that case. This often speeds up that case considerably. However, it has some problems.

The big problem with using fast cases is that not all cases are fast. An example of this would be setting any one of the color masks in GL to false. This will cut performance by ~25%, because you now miss what the designer considered a fast case and you hit the generic slow case. It's not that this case need be any slower, but either that it wasn't considered important or there wasn't time to write a fast case for it.

So, why not write fast cases for every path (or my path)? I often see the question "Why isn't drawing with XYZ enabled faster?" To give you an idea how many paths would have to be optimized just for triangle drawing, here's a brief list of state that needs to be considered.

State               Possible Settings
Color Masks                16
Depth Mask                  2
Depth Test                  2
Depth Function              8
Alpha Test                  2
Alpha Function              8
Stencil Test                2
Stencil Function            8
Stencil Pass Op             8
Stencil Fail Op             8
Total                 8388608 !!!!

This table doesn't include texturing, stipple, fog, or blending, among others. You can see how choosing five or six of the above combinations can easily miss the needs of many applications.

Another solution is needed. One that works well is to dynamically generate the fast path for a given case as needed. This gives much greater flexibility in covering more cases and generally improving performance, because you can literally have a separate code path for every case.

Dynamic assembly is not the same as self-modifying code. It's much more closely related to a JIT (just-in-time compiler) than self-modifying code. With dynamic assembly, the code is not modified after it's created until it is no longer needed. This avoids the big problem with self-modifying code; that is, that most modern processors will at best stall for a few hundred cycles, or at worst not maintain coherency between the data and instruction caches. When the code is created we need only update the processor state once for many thousands of uses.

The next problem that comes up is creating the asm code. Many of you are probably wondering how to go about adding an assembler to your program. The answer is that you don't really need to do so for most programs. Let's take a simple case of a blitter.

We want the following abilities:

Source is 8-bit index or 32-bit RGBA.
Destination is 32-bit RGBA.
Support the eight standard alpha test functions.
Support replace and xor write modes.

From the above list you get 2*8*2 paths. With 32 code paths it might be practical to implement every case by hand, but it would quickly get out of control if you added more destination or source types, or any other features.

To implement this the problem can be broken into these components, or steps:

Some function initialization.
Loop initialization.
Loading of the source.
Alpha test.
Write destination.
Continue loop.
Clean up and return.

Here is some sample code for each component (x86):

Initialization
    Push edi
    Push esi
    Mov edi, [esp+12]   ; Destination
    Mov esi, [esp+ 16]  ; Source
    Mov ecx, [esp+ 20]  ; Count


Load source index
    Movzx  edx, byte [esi]
    Mov edx, [theTable + edx]


Load source RGBA
    Mov edx, [esi]


Alpha test
    Mov eax, edx
    Shr eax, 24
    Cmp eax, [alphaRefrence]
    ## Jcc toContinueLoop


Write destination
    Mov [edi], edx


Continue loop
    Dec ecx
    ## Jnz toLoop


Cleanup and Return
    Pop esi
    Pop edi
    Ret

Once you have the above parts ready (probably in an assembly file of components) you can assemble them to make a function. The easiest way to do this (on x86) is simply to allocate some memory large enough to hold the largest case and copy the pieces into place from your parts file to the memory allocation. Here's the example for color index and alpha function of write on not equal:

Allocate memory for function.
Copy initialization code.
Note the current index into the allocation for loop fix up later.
Copy source index loader.
Copy alpha test code.
Make space for conditional jump and note the location.
Copy destination writer code.
Fix up alpha test conditional jump. The function is not equal so we choose JE as our conditional test and insert it at the address noted from step 6 above to jump to the current location.
Copy continue loop code.
Fix up loop jump. We need a JNZ to jump from the current address to the address noted in step 3 above.
Copy clean up and return code.

After following these steps you should have a blitter fully optimized for all the above cases. The trickiest part of dynamic assembly is carefully choosing the register allocation, since all your components are constrained by the choices you make. The other difficulty is dealing with all the conditional jumps. Because they are relative, the offsets need to be calculated at assembly time, unless the jump and its destination are contained within one code component. Calls are also relative and need a similar fix up if used.

I hope this will help you to write code that is very fast and covers all the needed cases. Please note that while this example is for x86, the techniques should also work well on other processors. Thanks for reading.

Be Engineering Insights: Long Live the Input Recorder, and 104-Key Keyboards, Too

By Steven Black

Greetings one and all, whether you read my articles or ignore them! Today I'll talk about two new and different items. One's a little script which provides an unsupported keymap-independent solution for 104-key keyboards. The other is a generic input filter that enhances the Input Recorder.

First, my 104-key keyboard solution. Because we need access to the Option ^[1] key in BeOS, and 101-key keyboards lack an equivalent key, we've made the right Control key behave like an Option key by default. While most people don't notice this as a problem, if you have a 104-key keyboard and habitually use your right Control key, it's inconvenient.

We have a "keymap" command line program which allows users to modify their own keymaps. Although it's fairly straight- forward, I wouldn't call it the easiest thing for a novice to use.

There was an item on BeWare which remapped the right Control and Option keys to be as labeled, but it mysteriously disappeared. I hadn't noticed its disappearance until I recommended it to someone. When I realized it was gone, I whipped up this script, as I felt bad about recommending something that didn't exist:

<ftp://ftp.be.com/pub/contrib/util/104-keymap-script.zip>

This script reads in your current keymap, adjusts it slightly, and writes it out to a file in your home directory. (It auto- matically loads the modified keymap, but doesn't automatically overwrite the file in your home directory.) There are command line options to alter both of these behaviors. Check out --help for details.

The script does a read-modify-write, with your current keymap. The only changes are to the right Option and right Control keys, and only if your current keymap has those two keys mapped to their defaults. (For example, if you're using a custom keymap with the right Control behaving like number lock, it won't touch that key.)

And now for Part II of my article. This is an input filter which sends an application-defined message to an application, based upon a hot key combination.

While there are already applications which launch programs via key combinations, there is currently no generic solution for simply sending messages to applications. Because I wanted this for the Input Recorder, I decided to take the friendlier approach and whip up a generic solution for the problem.

The latest InputRecorder zip file has a WarmKeysFilter. The install script now also installs this:

<ftp://ftp.be.com/pub/samples/input server/InputRecorder.zip>

As you'll see if you look at the source, the Input Recorder sends a message to the filter, informing it of the requested hot key information. One new, interesting use for this is that it lets you start and stop recording—even if the InputRecorder window is minimized or on a different workspace.

The Input Recorder uses the WarmKeys filter to enable the user to press Play—(Modifier+P), stop, (Modifier+S)—or Record -- (Modifier+R) -- when the application is not active or minimized. The Input Recorder filter was modified to pass through the WarmKeys messages, so you can press Modifier+S to stop, even when input is locked out. For the modifier, it uses the Menu key on x86 systems keyboards, and the right Control on PPC systems.

If you change which modifier key is used by WarmKeys, (for example, you want to use the menu key on your BeBox, or use a 101-key keyboard on your x86 machine) you'll also want to make a change in Input Recorder filter, so that it can't lock out the WarmKeys messages.

If there were a way to make the WarmKeys filter always get the keyboard event before the InputRecorder filter, it would always catch and process hot keys, and the InputFilter wouldn't need to know anything about passing through hot key events. However, this would also prevent the InputRecorder from recording hot key events.

Currently, there is no logic in the WarmKeys filter. The InputRecorder requests its three keys; if someone already has events attached to those keys, the request fails. While the InputRecorder could get a message informing it of the status of the request, there is no logic in place in the InputRecorder to deal with this.

Clearly, randomly requesting keys won't work in the long term. Some sort of method is required that allows the user to assign which keys do what when dealing with more applications.

In the WarmKeys filter, I've included ways to check to see who is receiving events from keys. There are also ways to verify that your hot keys were given to you properly.

However, I do see this as an excellent third-party opportunity. There's already at least one hot key enabling technology available on BeWare, although at the time I'm writing this, there's no solution which allows you to send application-defined messages.

The little filter does everything that I wanted done superbly. The Input Recorder is most grateful for this evolutionary leap in its development.

Developers' Workshop: Translation Kit Part 1: File Panel Fun

By Daniel Switkin

Welcome to Part I of a two-part look at the Translation Kit, a subject near and dear to my heart. In this installment, I'll look at a common task: how to modify a file panel in order to save an image in any available format.

I hear someone in the crowd yell, "But that's simple! Just subclass BFilePanel, and then start adding some controls to the window, and... uh, wait a second. What do you mean BFilePanel isn't a descendant of BWindow?" And so the matter of message handling comes up, as does font sensitivity, and with a few other gotchas along the way, you suddenly have a newsletter article worth reading again. So grab the code, and let's get to it! If the code is not there yet, it will be soon.

<ftp://ftp.be.com/pub/samples/translation_kit/TranslatorPanel.zip>

BFilePanel is a strange beast. Its headers live in the Storage Kit, but you have to link against libtracker.so. It also doesn't inherit from anything, it simply owns a window. The primary problem this poses is that when you add your own controls, you don't have a BHandler around to receive their messages. You could add all your controls to a view (which is a BHandler), but that would require them to be adjacent in a way that they all fit into that parent view. You could also handle the messages in the application's looper (where a BFilePanel's messages go by default), but then your code is spread all over, and you don't have access to private members of your file panel.

The solution I've chosen (which is not the only one) is to build the save dialog by subclassing both BFilePanel and BHandler. This provides a MessageReceived() hook where it's most needed, and allows me to place controls anywhere in the window. Of course, handlers need to be attached to loopers, and conveniently the file panel's window fits the bill. To set this up, do:

TranslatorSavePanel::TranslatorSavePanel() : BFilePanel(),
    BHandler() {

    Window()->AddHandler(this);
}

Look at the source to see all the arguments and gory details. The one important thing to do before you start adding controls is to move the existing ones out of the way. The Be Book documents the names of all the standard parts of a file panel, so you can FindView() and then MoveBy() as appropriate. When you add your own controls, make sure they're positioned relative to the existing pieces:

// Build the "Settings" button relative to the cancel button
BRect rect = cancel->Frame();
rect.right = rect.left - 10;
float width = be_plain_font->StringWidth("Settings...")
    + 20;
rect.left = (width > 75) ? (rect.right - width) :
    (rect.right - 75);
BButton *settings = new BButton(rect, "Settings",
    "Settings...", new BMessage(SAVE_FILE_PANEL_SETTINGS),
    B_FOLLOW_RIGHT | B_FOLLOW_BOTTOM,
    B_WILL_DRAW | B_NAVIGABLE);
background->AddChild(settings);
settings->SetTarget(this);

Now to add a list of all the available translators. Some people like to add this to the menu bar on top; personally, I like to add a BMenuField above the file name. One way to do this is to subclass BMenuItem and add a translator id variable. This allows the desired output format to be queried by finding the currently marked menu item. When the user chooses to save a file, this id is included in the file panel's message, and is passed to Translate() to complete the request. Building the menu looks like this:

void TranslatorSavePanel::BuildMenu() {
    formatpopupmenu = new BPopUpMenu("FormatPopUpMenu");
    BTranslatorRoster *roster = BTranslatorRoster::Default();
    translator_id *translators;
    int32 count;

    // Find all translators on the system
    roster->GetAllTranslators(&translators, &count);
    const translation_format *format;
    int32 format_count;

    for (int x = 0; x < count; x++) {
        // Determine which formats this one can write
        roster->GetOutputFormats(translators[x], &format,
            &format_count);
        for (int y = 0; y < format_count; y++) {
            // Check if this is an image translator
            if (format[y].group == B_TRANSLATOR_BITMAP) {
                // If this format saves to some native
                // format build a menu item for it
                if (format[y].type == B_TRANSLATOR_BITMAP)
                    continue;
                TranslatorMenuItem *item = new
                    TranslatorMenuItem(format[y].name,
                    new BMessage(SAVE_FILE_PANEL_FORMAT),
                    translators[x]);
                item->SetTarget(this);
                formatpopupmenu->AddItem(item);
                break;
            }
        }
    }
    delete [] translators;

    // Pick the first translator in the list
    TranslatorMenuItem *item = (TranslatorMenuItem*)
        formatpopupmenu->ItemAt(0);
    if (item != NULL) item->SetMarked(true);
}

Make sure you don't delete the default translator roster—you're just borrowing it.

The settings button tries to build a configuration view for the selected translator. It passes a pointer to a BRect(0, 0, 239, 239) to BTranslatorRoster::MakeConfigurationView() in case the translator doesn't set the size of the rect itself. It then builds a window according to this rect, and ensures that the view is the right size and in the right place. The layout and sizing of this view has been standardized for the next BeOS release—I'll go into details of that in my next article.

The rest of the app is fairly simple: it draws a bitmap if there is one, or "Drop an image here" if there isn't. It also handles loading and saving of images. Loading an image is as simple as grabbing an entry ref from a message:

entry_ref ref;
if (message->FindRef("refs", &ref) != B_OK) return;
BEntry entry(&ref);
BPath path(&entry);
BBitmap *new_bitmap =
    BTranslationUtils::GetBitmapFile(path.Path());

Saving is also pretty easy, with only one catch. After recovering your information from the message, and creating a BFile, watch out for BBitmapStream taking your bitmap down with it:

BTranslatorRoster *roster = BTranslatorRoster::Default();
BBitmapStream stream(bitmap);
// If the id is no longer valid or the translator fails
// for any other reason, catch it here
if(roster->Translate(id, &stream, NULL, &file, 0) != B_OK) {
    BAlert *alert = new BAlert(NULL,
        "Could not save the image.", "OK");
    alert->Go();
}
// Reclaim the ownership of the bitmap, otherwise it would
// be deleted when stream passes out of scope
stream.DetachBitmap(&bitmap);

Stay tuned for Part II of our examination of the Translation Kit: a prefabricated translator add-on for reading and writing images.

A Whiff of Infinity...

By Jean-Louis Gassée

As the title of today's column does not suggest, my topic is Life After the PC. As for the title itself, I'll attempt to connect it to the topic without too many unseemly contortions or painful puns.

First of all, Life After the PC is a misnomer. PCs will be with us for a long time. We've progressed from the first crawling instances of the species in 1967, to the affordable microprocessors that started the current explosion in the late 70s, to the personal supercomputers of today (to quote a captain of our industry). Prospects for their continuing evolution are good: faster better cheaper processors and memories, I/O, displays, networks...

So, why even mention Life After the PC if so many refinements are at hand? Because PCs are becoming "free." Though as with cell phones, "free" actually means a different form of payment. That is, you commit to use a service, wireless telephony or Internet access, for a number of months, and you get a free phone or a free PC. Good services and good devices, where "free" amounts to a form of loan and risk-taking by the service provider. For PCs to achieve that status means we're now in a mature, commoditized market. One might joke that the girth of Office 2000 running on Windows 2000 requires a 600 Mhz Pentium III, but many consumers have figured out that a competent 366 Mhz Celeron PC, minus the Compuserve rebate, is "free" and perfectly capable of running Windows applications.

Note the last part of the sentence, Windows applications. Once upon a time, there was an appliance called a Word Processor. That's all it did and, for a while it processed words better than PCs running clunky imitation word processing software. But it died because PCs gave off this wonderful whiff of infinity. Customers perceived the infinite potential for PCs to run all kinds of software. Some useful, some not, some utilitarian, some fun, some reliable, some awful. But the abundance, the fun, the dream won over the seriousness, the finite fitness-to-purpose of word processing appliances.

So, if PCs are a) "free" and b) "infinite," how come we're so enthusiastic about the future of Web appliances? The answer lies in another infinity. Just as PCs are surrounded by myriads of applications, the Web offers another kind of infinity of content—information, entertainment, services, even applications available from servers somewhere in the clouds. Some believe that PCs will absorb this new Web dimension and offer both infinities rolled into one. This is the philosophy behind Microsoft's unified user interface for the desktop and the Net, with everything represented as a Web page. It was an interesting experiment. And, yes, one can hope PCs will become simpler, shedding costly, clunky, vestigial organs such as serial, parallel, and keyboard ports and the ISA bus. It might happen—the irresistible drive to offer add-ons, graphics and sound cards, new-fangled DVD-ROM or CR-RW units might subside—but when? In other words, some think PCs will be simultaneously even more protean and even simpler than they are today. Procrustean might be a better word.

I believe that discussing PCs and Web appliances in an either/or manner isn't helpful. We want both. Knowledgeable PC users know what PCs do well; where they're too complex and cumbersome, users will want both PCs and Internet appliances. Normal human beings who've stayed away from PCs because of their perceived Swiss Army knife complexity will be happy to have devices dedicated to one task. They'll appreciate drilling the Web and enjoying one kind of infinity without being encumbered by the other, or being constricted by the finite range of the older style appliances.

Which points to the need for a better word than "appliance" for the new genre we're discussing. Here in the land of neologism—and Silicon Valley is the capital of buzzwords—I know this immigrant is about to learn a new noun. "Wa" would be nice—sounds Japanese and actually means something like harmony. In any case, we need something short and crisp, like PIM, PDA, or NC. Suggestions?

^[1] The key I refer to as the Option Key, is only labeled "Option" on Mac-style keyboards. On 104-key keyboards, it often sports the logo of an alternate operating system (e.g., a "Window"). It's in the same location on all keyboards, between the Alt/Command key and the Control key. I refer to it as the Option key, because in BeOS its identifier is named B_OPTION_KEY.