Be Newsletters - Volume 5: 2000

Issue 5-12, March 22, 2000

Be Engineering Insights: Ego Surfing

By George Hoffman

If you do a Google search for "George Hoffman," the whole first page of hits is for some "Captain George Hoffman" from New Jersey who not only is no relation to me but is also dead. If you just search for "Hoffman," I'm not even in the first hundred hits (I didn't look any farther). If you search for "geh," there are a bunch of hits for the George Eastman House, followed by a lot of stuff that must be total garbage because *it* isn't even in *Engish*.

In fact, the only Google search I know of for me in which I actually appear on the first page of hits is for "George Earle Hoffman" (my full, unfortunate name). This turns up two references to two separately maintained lists of GNUStep volunteers. I'm listed as working on a GNUStep test suite, which must be the result of an e-mail question I sent to the project maintainer in 1994. Let this be a warning to you all: Open Source is catching. It could happen to you!

What's my point? I recently decided that there simply isn't enough with my name on it. In the grand scheme of things, I've done nothing with my life. My existence is meaningless. Compared to the likes of Stephen Hawking, Ani DiFranco, or Regis Philbin, I am as an ant to the mighty cockroach. Now, this is a realization that everyone (except maybe The Regis) comes to at one point or another in their lives. Some people deal with it gracefully, and some people grab automatic weapons and head for the nearest clock tower.

How did I deal with this personal crisis? Well, I'd like to say that I rose to the challenge and adopted a Cause: steadfastly dedicated my life to preserving the rain forests, or feeding starving children, or freeing Kevin Mitnick, or organizing sit-ins or walk-outs or walkathons, or saving baby seals, or clubbing baby seals, or being outraged at people who have genitals or who use them in a certain way, or one of those other things that somehow always results in the blocking off of perfectly good roads which happen to lead where I want to go. But what I actually did was move to Millbrae, buy a red sports car, and start dating a girl who is far too young for me.

I could have sworn there was a point to all that. Oh, yes! I also wrote some code.

You see, you may think that what Be engineers are good at is writing an operating system. This is an imprecise statement. You may think that we are good at doing things the "right" way. This, while arguably true, is also not quite right. What Be engineers are best at is taking some problem which is laughably easy to solve and making it complicated enough to interest us. It's not enough, for example, for us to be able to play a movie. We have to be able to play 30 of them, in parallel, on eight CPUs, while surfing the Web and playing Quake II. (You may have heard this referred to in our marketing literature as "sexy"... we engineers know that's just marketing speak for "butt-ass complicated.") Who buys an eight-processor machine and then watches 30 movies on it all at the same time? Beats me. They told us they could sell it, so we made it.

You may have heard that recently they told us they can sell it even better if it's smaller and can browse the Web really, *really* well. Web browsers are complicated beasts, but not nearly complicated enough for us. Our browser has to be totally ripped. So a number of us have been working on the Next Generation BeOS Browser, based on the latest Opera Browser technology, and oh man, it doesn't get much more "sexy" than this.

One thing the astute engineer will notice about the Web is that it is fairly large. There is thus a lot of data that can be attained by examining it. Furthermore, this data comes in small packets: things like GIF images, HTML pages, style sheets, Flash movies, etc. For the traditional monolithic browser, this kind of data is not too hard to manage, but for a highly multithreaded browser like the one we're creating, there arises a question of locking granularity. If I'm accessing a given resource, say, a GIF image, in one thread of a multithreaded application, how do I control access to that resource? I need a way to guarantee that there will be no destructive interference between two threads that desire access to a resource.

I could use semaphores, of course. Let's say I decide to give each resource a semaphore, and use it to mutex access to the resource. There are two problems with this approach: one, if access to the resource happens often, I may take a sizable performance hit, because I need to enter the kernel twice for every pass through the critical section. Second, I use up a semaphore, a kernel object, for each and every resource. Now, I could switch to the benaphore, a locking primitive invented by our own Benoit Schillings, which has been discussed in this newsletter many times. Benaphores use an extra atomic variable to eliminate the need for entering the kernel in low-contention situations.

This does much to help solve the first problem, especially as contention will be low for this type of fine-grain locking, but the second issue is still just as bad. Solving this second problem is of great concern when dealing with the kind of small devices to which we're expanding our platform support, as RAM is a valuable and scarce resource on these devices. Each semaphore takes up 33 bytes at a minimum, plus whatever overallocation malloc makes for a structure of this size, plus four more bytes in the kernel for a pointer to the semaphore in the global semaphore list and another eight in user space to store the semaphore ID and the benaphore atomic variable. This adds up to around (depending on the behavior of malloc) 52 bytes for every benaphore. There's also a hard limit, set by the kernel, on the number of semaphores that can be allocated.

This is a locking primitive we're talking about here. Not a terribly exciting piece of code. But let's make it as complicated as possible: let's say I have a bunch of resources—oh, say, twenty thousand items -- and I need to be able to lock each and every one of them separately. Memory is precious. Contention is low. What's the ideal solution? In this situation, semaphores (or benaphores) aren't even an option without some changes to the kernel. I cleverly chose the figure twenty thousand for a reason: the BeOS kernel allows a system-wide maximum of 16384 semaphores in any configuration. Even if we could use benaphores, by our estimation the benaphores alone would consume almost a megabyte of RAM!

We could use some kind of many-to-one scheme where one semaphore is used to lock many resources. But then we may be locking things we don't need, and we may introduce dependencies between resources that we'd rather avoid. What we'd like is an ultra-lightweight locking primitive. It needs to take up the minimum amount of memory, perform well in the anticipated situation (i.e., low contention), and, above all, it needs to include my name somewhere in it.

Enter the gehnaphore! The gehnaphore is a locking primitive that uses only a single 32-bit atomic variable. For synchronization, it uses the fairly obscure send_data()/receive_data() API. These calls provide a one-message-deep message queue which is associated with each thread. Some people look at these calls and see a largely deprecated API which was subsumed by the generalized port API. I, on the other hand, see an opportunity to flex my monstrous ego.

Gehnaphores use the compare-and-swap operation to perform atomic updates to the 32-bit variable. This operation can easily and efficiently be implemented on multiprocessor PowerPC or Intel machines, and actually has its own instruction on the CISCy Pentium. The code I've included only has the Intel implementation, but the PowerPC version is equally trivial.

There are two trade offs to using gehnaphores. First, you probably should not depend on being able to play nice with other code that uses send_data()/receive_data(); this should not be a big concern in most code. Second, and more importantly, there will be a performance hit under contention: if there are N threads waiting for a gehnaphore when it is released by the holding thread, there will be N context switches (or N-1 more than you might expect) before the next thread in line can access the critical section. This is because threads in the queue must "chain" the lock release to the first thread in line for the lock, which is actually the *last* thread in the chain. Their performance is thus virtually identical to benaphores for a one or two thread situation, but gets worse and worse as the amount of contention goes up. This is a significant enough reason to not use gehnaphores for general purpose locking that might be subjected to high contention, but is acceptable under the constraints we've been given.

Gehnaphores use only four bytes per lock, and require kernel calls only when they cause blocking. They thus have the performance advantages of benaphores (aside from the caveat listed above) but use one thirteenth of the memory. They also do not require a kernel call to create and destroy a semaphore (the 32-bit value need only be initialized correctly), or take up an entry in the kernel semaphore list. They are probably ideal for most ultra-fine-grain locking situations. Basically, they totally kick the benaphore's ass. Heh, heh, heh...

I've also written a multiple-reader/one-writer gehnaphore which uses a whopping sixteen bytes. For very fine-grained locking, something like this is probably overkill, but there might be situations where it would make sense to use them. I'll leave it as one of my famous "exercises for the reader" to implement this (you can put the solution in the same directory as all the code you wrote to extend QuickPaint™).

The code follows. You'll need to pull out the assembly snippet into a .S assembly file and link with it in order to get the compare_and_swap32 function. Remember: changing the name of the class is a violation of the licensing agreement. Yeah, I know you were planning on it.

#include <OS.h>
#include <SupportDefs.h>

/*
#
# int32 compare_and_swap32(int32 *location, int32 oldValue,
int32 newValue);
#

.align 8
.globl compare_and_swap32
compare_and_swap32: pushl %ebx # Save these
pushl %edi
movl 12(%esp), %edi # Get location
movl 16(%esp), %eax # Get old value
movl 20(%esp), %ebx # Get new value
lock
cmpxchgl %ebx, (%edi)
sete %cl # get success
xorl %eax, %eax
movb %cl, %al
popl %edi
popl %ebx
ret
*/

class Gehnaphore {
private:
  int32 m_lockValue;
  public:
  Gehnaphore() { m_lockValue = -1; };
  void Lock();
  void Unlock();
};

inline bool cmpxchg32(int32 *atom, int32 *value, int32
newValue)
{
  int32 success = compare_and_swap32(atom, *value, newValue);
  if (!success) *value = *atom;
    return success;
};

void Gehnaphore::Lock()
{
  int32 chainedThread=-1, myThread=find_thread(NULL), msg=-1;
  bool success = false;

  while (!success) {
    while (!success && (chainedThread == -1))
      success = cmpxchg32(&m_lockValue,&chainedThread,0);

    while (!success && (chainedThread != -1))
    success = cmpxchg32(&m_lockValue,&chainedThread,myThread);
  };

  do {
    if (chainedThread != -1)
      msg = receive_data(NULL,NULL,0);

    if ((chainedThread > 0) && (chainedThread != msg))
      send_data(chainedThread,msg,NULL,0);
  } while ((chainedThread > 0) && (chainedThread != msg));
}

void Gehnaphore::Unlock()
{
  int32 chainedThread=0,myThread=find_thread(NULL);
  bool success;

  success = cmpxchg32(&m_lockValue,&chainedThread,-1);
  if (!success && (chainedThread == myThread))
    success = cmpxchg32(&m_lockValue,&chainedThread,-1);

  if (!success) send_data(chainedThread,myThread,NULL,0);
}

Developers' Workshop: Writing a USB Video Camera Driver, Part 1.01

By Todd Thomas

Part 2 of this series is unfortunately not ready yet. We will publish no code before its time. Please stay tuned to this channel.

I do have one correction to last week's article. The recommended device-naming scheme begins numbering at "1"; i.e., the CPiA driver should publish /dev/video/usb/CPiA/1 for the first camera, /dev/video/usb/CPiA/2 for the second, and so on.

While you're waiting for Part 2, I recommend reading and rereading the article at <http://www.b500.com/bepage/driver.html>. In my humble opinion, it's still the best general treatment of writing BeOS drivers there is. And I was paid nothing for that endorsement.

Also, if you have Maui, you can install the CPiA driver from Part 1 and make it do a few tricks. To install the CPiA driver, put it in /boot/home/config/add-ons/kernel/drivers/bin, then make a link to it from /boot/home/config/add-ons/kernel/drivers/dev/video/usb/. In other words, when you're done you should have something similar to

$ dir /boot/home/config/add-ons/kernel/drivers/dev/video/usb/ total 0
lrwxrwxrwx 1 tt users 0 Mar 22 04:51 CPiA ->
/boot/home/config/add-ons/kernel/drivers/bin/CPiA*

P.S.: for Maui, I've added a rule named driverinstall to the makefile-engine located in /boot/develop/etc. If you're using the makefile-engine, and you set DRIVER_PATH in your makefile to the desired path of your driver relative to /dev, you can type make driverinstall to properly install your driver. For example, check out the CPiA makefile. You'll see the line

DRIVER_PATH = video/usb

Once you've installed CPiA, try using BeOS's nifty rescan command on it when you have serial debugging turned on. Invoke it from the command line like so:

$ rescan CPiA

rescan causes the kernel to reload a driver. In so doing it calls several of the driver's exported functions. With CPiA's copious serial debug output, you should get a good idea of when and how these and other CPiA functions are called. rescan takes the name (not the path) of the driver binary as its parameter. You can also run the test harness (test.cpp) to see some of CPiA's hook functions being called.

If you don't have Maui, you can try rescanning whichever drivers you find in /boot/beos/system/add-ons/kernel/drivers/bin, but most shipping drivers are not nearly as verbose as CPiA. Some are downright speechless.

Hot tip: did you know that you can look at serial debugging output even if you don't have anything hooked up to the serial port? The command tail shows you the end of a file, and if you use the -f option it continues showing lines as they are added to the file. If you turned on serial debugging at boot by hitting Delete or F1, you can type

tail -f /var/log/syslog

into a terminal to watch serial debug output as it happens. Cool!

Warm tip: check out the kernel settings file in /boot/home/config/settings/kernel/drivers/sample/. If you make it an active settings file by moving it up to the drivers directory, you can control the behavior of syslog and serial debug output without having to remember which keys to hold down when.

If I Worked for Be...

By Jean-Louis Gassée

As you can imagine, I get a good deal of e-mail. Real e-mail-- what's left after a CTRL-T on offers for descramblers; various life-enhancing regimens, substances, or surgery; get-rich schemes; spying or counterspying techniques. When we screw up, I hear about it, and I'm grateful. Strongly worded feedback gives us a chance to correct a specific situation, or to discover a more systemic problem. Silent spite can kill a business, so I view complaints as opportunities.

Other mail is congratulatory. Since we know we still have much to do and much to prove, encouragement along the way is appreciated. Such mail sometimes comes from unexpected places: recently we got mail from the Greek army. We also get suggestions, ranging from porting our OS to various processors to writing drivers for a hot card to helping revive the Amiga. These, too, are much appreciated. We don't agree with all of them; some we strongly disagree with, because we must focus our resources according to certain priorities.

In any event, we welcome the input for several reasons. For one, we are, of necessity, myopic, and having an outside perspective is healthy. Furthermore, how many good ideas does one really need per year? One or two, maybe. I know it's fashionable to complain about e-mail volume, but personally, I prefer to filter it myself and retain the portion that brings feedback and ideas.

With this in mind, I'll share part of a message I got from Steve Klingsporn on his Top Ten Reasons for liking BeOS (at some point, I'll get to his unabbreviated list). Steve worked at Be many moons ago and has kept a strong, informed interest in our work. Steve had this to say about the introduction of BeOS 5...

"If I worked for Be, I would do the following:

Introduce BeOS 5 with the free version amidst as much fanfare as possible.
Spend a little money on web and print and maybe radio advertising—plenty of volunteers available to help -- to get the word out.
Put out press releases touting the power in having the same operating system on recent PCs and on new types of devices/IAs.
Follow up strong on the BeOS 5 free release with the release of BeOS Pro and BONE/OpenGL/Java/etc., two to four months thereafter.
Continue pursuit of BeIA adoptees. Move BeIA technologies to BeOS where appropriate and have common/new APIs.
Tie the desktop and device versions together in a really data-compatible way, maybe with a unified object/messaging format."

This is very much our philosophy for introducing BeOS 5. Details may vary here and there, but basically Steve is right. We're proud of our product and we want BeOS 5 to reach as many current and potential new users as possible. In doing this we want to broadly advertise our technology, including its strengths for media-rich Internet Appliances. As one of our recent press releases claims, we have more than 100,000 preregistrations for the free, downloadable version of BeOS 5. We look forward to a good launch. And lots more e-mail...