Issue 3-30, July 29, 1998

Be Engineering Insights: "Threads Don't Like Playing Ping Pong," Part I

By Pierre Raynaud-Richard

The main purpose of the engineering articles is to deliver simple and accurate technical information that's meaningful to as many people as possible. Thus, simple articles are better than complex ones—a rule I follow as often as I can.

But not today.

With apologies to the average developer who is never going to deal with the subtleties I'm about to discuss, I just can't stop myself! After many general information articles on various subjects, I felt an irresistible urge to write some strong technical content. You've been warned, now continue at your own risk.

At the heart of the BeOS graphics system lies the application server. It's been the theater of continuous cleaning, reorganization, and optimization over the last 12 months, and more is to come, at least until Release 6.

Recently, we discovered a nasty interaction between the app server and the kernel's thread scheduler. This problem had very serious consequences, and solving it brought a significant performance improvement to Release 4. It also pointed out an interesting class of problems that some of you may encounter in the future (if you haven't already).

So in this article, we're going to study different mechanisms you can use to synchronize threads, and describe their behavior under well-defined stress conditions. For that, we used one or two "worker" threads executing the same reference task again and again. They had to use a synchronization mechanism to guarantee that only one thread executed the task at a time (from now on, we'll call the reference task the "critical section").

To simulate the CPU noise generated by other unrelated threads, we sometimes added an additional dummy thread. The dummy thread didn't vie for the critical section—in other words, it didn't interfere directly with the other two threads.
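
In sketch form, the test setup looks something like this (the kernel calls are real BeOS APIs; the names and scaffolding are our reconstruction, not the actual test code):

    #include <OS.h>

    static sem_id g_sem;

    /* Busy-wait stand-in for the 6.1us reference task. */
    static void do_critical_section(double us)
    {
        bigtime_t end = system_time() + (bigtime_t)us;
        while (system_time() < end)
            ;
    }

    /* A worker fights for the critical section, forever. */
    static int32 worker(void *data)
    {
        for (;;) {
            acquire_sem(g_sem);
            do_critical_section(6.1);
            release_sem(g_sem);
        }
        return 0;
    }

    /* The dummy just burns CPU; it never touches g_sem. */
    static int32 dummy(void *data)
    {
        for (;;)
            ;
        return 0;
    }

    int main()
    {
        g_sem = create_sem(1, "critical section");
        resume_thread(spawn_thread(worker, "worker 1", B_NORMAL_PRIORITY, NULL));
        resume_thread(spawn_thread(worker, "worker 2", B_NORMAL_PRIORITY, NULL));
        resume_thread(spawn_thread(dummy, "dummy", B_NORMAL_PRIORITY, NULL));
        snooze(2000000);    /* let the test run for about two seconds */
        return 0;
    }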

All measurements were done on a Pentium II 350MHz in conditions as stable and similar as possible, with most of the servers, applications and services disabled, but with some daemons left running. In this environment, all code that we executed was always available in the level 2 cache (if not in the level 1!). In the real world, of course, this isn't a realistic assumption, but as you will see, studying the perfect case is hard enough already...

TEST 1—An Ideal World

In this test, one B_NORMAL_PRIORITY thread repeatedly executed a 6.1us ("us" = "microsecond") task that's protected by a simple semaphore:

while (true) {
  acquire_sem(g_sem);          /* lock */
  do_critical_section(6.1);    /* the 6.1us reference task */
  release_sem(g_sem);          /* unlock */
}

We ran the loop for about two seconds and recorded the latency (time consumption) for each lock and for each unlock. We then plotted the results on separate graphs (i.e. one for locking, another for unlocking). These graphs represent a very rich source of information, and we're going to be studying them throughout most of this article.

The graph's horizontal axis is latency in microseconds; the vertical axis is the hit count, that is, the number of locks/unlocks at a particular latency. Both axes are logarithmic (successive ticks double) and elided where necessary (to fit the format of an email, and to keep this article's length down). All graphs are best viewed in a monospaced font.

This first graph shows the locking latency:

-------------One thread, semaphore, locking-------------
32768|    A
     |   AA
16384|   AA
     |   AA
 8192|   AA
     |   AA
 4096|   AA
     |  AAA
     .  ...
     |  AAA
  128|  AAA
     |  AAA   B
   64|  AAA   BB
     |  AAA   BB
   32|  AAA  BBBB
     |  AAA  BBBB
   16|  AAA  BBBB
     |  AAA  BBBBB           C
    8|  AAA  BBBBB           C
     |  AAAA BBBBB           C
    4|  AAAA BBBBB           CC
     |  AAAA BBBBB           CC
    2|  AAAA BBBBB        # CCC          #    #
     |  AAAABBBBBB ##    ## CCC #        #  ###            #             #
     +!-------!-------!-------!-------!-------!-------!-------!--...--!--
      0       8       16      32      64      128     256     512     2048
  • The (A) spike shows the case where the semaphore is acquired without blocking. This happens 99.6% of the time with an average latency of 3.5us.

  • In the (B) spike, a scheduler interruption occurs while acquire_sem() is executing, but without rescheduling (i.e., acquire_sem() continues after the interruption). Probability: 0.3%; average latency: 8.5us. So both (A) and (B) represent the same phenomenon, except that (B) is delayed 5.0us by the scheduler interruption. This delay is just one part of the more generic "scheduler noise," so it doesn't carry any relevant information by itself. In further comments, we'll call this phenomenon the "scheduler echo"; we'll identify it when needed, but we won't spend any more time commenting on it.

  • (C) is another component of the noise, weaker than the "scheduler echo," but with a higher latency. It's probably the signature of a daemon (most likely a real-time priority thread) that runs regularly (for example, 60 times a second) for a short amount of time (15 to 20us). Once we add the two necessary context-switch latencies, we're left with what we will call the "daemon echo." In this case it's an "echo" of (A), with an additional latency of 27us.

  • What's left—the scattered "#"s—is the rest of the noise generated by the scheduler or by other daemons whenever they used some CPU. Although the noise delay can reach 3000us, this is actually reasonable, since it reflects quite accurately the maximum scheduling quantum. It contains even less relevant information than the two previously described "echoes," so from now on we won't pay any attention to it.

Here's the graph for unlocking:

-------------One thread, semaphore, unlocking------------------
32768|    A
     |   AA
16384|   AA
     |   AA
 8192|   AA
     |   AA
 4096|   AA
     |  AAA
     .  ...
     |  AAA
  256|  AAA
     |  AAA    B
  128|  AAA   BB
     |  AAA   BBB
   64|  AAA  BBBB
     |  AAA  BBBBB
   32|  AAA  BBBBB
     |  AAA  BBBBB
   16|  AAA  BBBBB            C
     |  AAA  BBBBB            C
    8|  AAA  BBBBB           CC
     |  AAA  BBBBBB          CC
    4|  AAAA BBBBBB  #       CCC
     |  AAAA BBBBBB ##      CCCC
    2|  AAAABBBBBBB ##      CCCC         ## # #                    #
     |  AAAABBBBBBB###      CCCCC        ## ### #         #        #
     +!-------!-------!-------!-------!-------!-------!-----...--!-----
      0       8       16      32      64      128     256        2048
  • (A) Again, the simple, non-rescheduling release is the overwhelming winner: 99.06% and 3.5us.

  • (B), the "scheduler echo" of (A), has a higher probability (0.86%) than in the analogous locking test, but that's because the unlocking measurement includes the noise associated with both the unlocking and the critical section. This is an artifact of measuring the locking and unlocking with only two timestamps:

    while (true) {
        TIME_STAMP();                /* t0 */
        acquire_sem(g_sem);
        TIME_STAMP();                /* t1: locking latency = t1 - t0 */
        do_critical_section(6.1);
        release_sem(g_sem);
        /* unlocking latency = (next t0 - t1) - 6.1us */
    }
    

    This allowed us to reduce the impact of time-measurement on the system, as we can just subtract the latency of the critical section from the second latency to get the unlocking latency. But there was no way to avoid increasing the magnitude of the noise. Still, it's only noise -- as long as we can identify it (and ignore it), we don't care about its magnitude.

  • (C) is the "daemon echo" of (A).

Round-trip: More than 99% of the time, a trip through the entire loop (acquire the semaphore, perform the critical section, release the semaphore) takes about 13.1us (3.5 acquisition, 6.1 critical section, 3.5 release).

TEST 2—An Idealer World

We replaced the semaphore with a benaphore. You can look at one of the previous Engineering Insights articles for more detailed information about the benaphore: Be Engineering Insights: Benaphores.

while (true) {
    if (atomic_add(&g_lock, 1) > 0)     /* lock was already held: block */
        acquire_sem(g_ben);
    do_critical_section(6.1);
    if (atomic_add(&g_lock, -1) > 1)    /* someone is blocked: wake them */
        release_sem(g_ben);
}
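
For this scheme to work, the benaphore's semaphore must start out empty (count zero); it's only touched under contention. A minimal setup sketch, with the variable names used above:

    static int32 g_lock = 0;    /* atomic counter; 0 means the lock is free */
    static sem_id g_ben;

    void init_benaphore(void)
    {
        /* Created with a count of 0: the first contended
           acquire_sem() must block until someone releases. */
        g_ben = create_sem(0, "benaphore");
    }
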
-------------One thread, benaphore, locking-------------
65536|A
     |A
32768|AA
     |AA
     ...
     |AA
   16|AA    B
     |AA   BBB
    8|AA   BBB
     |AAA  BBB
    4|AAA  BBB
     |AAA BBBBB               C
    2|AAA BBBBB              CC
     |AAA BBBBB              CC          #
     +!-------!-------!-------!-------!-------!--
      0       8       16      32      64      128

As expected, the probability and timing are much improved:

  • (A) represents the main, fast case of the benaphore (when we don't have to touch the semaphore at all), with an outrageously good probability of 99.96%. Better still, as expected, the latency is down to 0.3us (12x faster than the semaphore).

  • (B) is the "scheduler echo" of (A). Its probability is very low because (A) runs for such a short time that it's rarely interrupted by the scheduler.

  • Same thing for (C), a very weak "daemon echo" of (A).

-------------One thread, benaphore, unlocking-------------
65536|A
     |AA
     ...
     |AA
  256|AAA  BB
     |AAA  BBB
  128|AAA  BBB
     |AAA  BBB
   64|AAA  BBBB
     |AAA  BBBB
   32|AAA  BBBB
     |AAA BBBBB              C
   16|AAA BBBBB              CC
     |AAA BBBBB              CC
    8|AAA BBBBB              CC
     |AAA BBBBBB             CC
    4|AAA BBBBBB             CCC
     |AAA BBBBBBB            CCC         #
    2|AAA BBBBBBBBB          CCC         ##  ##   #       #         #
     |AAA BBBBBBBBBBB  ## # CCCC         ##  ## ### ##    #         #
     +!-------!-------!-------!-------!-------!-------!-----...--!-----
      0       8       16      32      64      128     256        2048
  • (A) is the logical mirror of the previous (A), with a similarly good latency of 0.3us.

  • (B), the "scheduler echo" of (A), is stronger than the previous (B). This was expected, because we get the noise for both the unlocking and the critical section.

  • Same comment for both (C), the "daemon echo" of (A), and (#) the other noise.

Round-trip: Greater than 99% probability of a 6.7us total round-trip -- pretty good for a 6.1us critical section. But, then again, this is an ideal—and so unrealistic—test.

NOTE: In all our tests, the unlocking statistics are broadly similar to one or the other of the two already shown. Unlocking never blocks, so, as you might expect, it doesn't really affect overall system performance. Since there isn't much more to be learned from them, we won't show unlocking graphs for the other tests; we'll just comment on them when necessary.

TEST 3—A Small Dose of Reality

Let's see what happens when we introduce a second B_NORMAL_PRIORITY thread. We'll have two equivalent threads fighting over the lock. First, we use a semaphore:

-------------Two threads, semaphore, locking-------------
16384|                    A
     .                    .
     |                    A
 2048|                   AA
     |    B              AA
 1024|   BB              AA
     .   ..              ..
  256|   BB              AA C
     |   BB              AA CC
  128|  BBB              AA CC
     |  BBB              AA CC
   64|  BBB              AA CC
     .  ...              .....
   16|  BBB              AACCC     D
     |  BBB              AACCCC    D
    8|  BBB              AACCCC   DD          #                     #
     |  BBB              AACCCC   DD          #                     #
    4|  BBB   #          AACCCC   DDD      #  #                     #
     |  BBB   #          AACCCC  DDDD      #  #                   # #
    2|  BBB   #       #  AACCCCC DDDDD     #  ##                  # #
     |  BBB # ##     ##AAAACCCCCDDDDDD     ## ##     #   ##       # #
     +!-------!-------!-------!-------!-------!-------!-----...--!----
      0       8       16      32      64      128     256        2048

NOTE: This is a graph of only one of the threads. The profile of the other thread is very similar, as expected, with the same spikes in terms of both probability and latency:

(A) : 24.138us (85.677%) against 24.168us (85.285%)
(B) :  3.545us (11.741%) against  3.543us (12.159%)
(C) : 29.059us ( 2.303%) against 28.878us ( 2.236%)
(D) : 52.718us ( 0.137%) against 51.378us ( 0.157%)

So from now on, we won't comment about the second thread at all.

  • (A) The most common case is no longer unblocked acquisition. Instead, the threads are starting to "ping pong": 85.6% of the time, the thread must wait to acquire the semaphore, with a latency of 24.1us. That time includes the first thread blocking, a context switch, the second thread unblocking, the second thread executing the critical section, the second thread blocking, another context switch, and finally our first thread unblocking. That also implies that the context switch takes about 5us, a very impressive number, but not a very realistic one, as in this case everything clearly happens in the level 1 cache.

  • (B) is the unblocked acquisition spike. The latency is logically the same as in the one-thread test (3.5us), but the probability has dropped to 11.7%. This is easy to explain: whenever the scheduler stops the threads at a point where neither owns the semaphore, whichever thread starts first is free from any synchronization issue, as if it were alone. And that lasts until the scheduler suspends it again, which most likely puts the system back into "ping pong" mode.

  • (C & D) are the "scheduler echo" and "daemon echo" of (A).

Round-trip: 24.1 + 6.1 + 5.0 = 35.2us for 7/8 of the iterations. The "ping pong" clearly affected the performance, but it's not a total disaster compared to the 13.1 * 2 = 26.2us the one-thread semaphore test would need for the same two iterations (don't forget, we now have two threads working in parallel!).

TEST 4—The Benaphore Curiosity

Mostly out of curiosity, let's see what happens when we use a benaphore in the two-thread context.

-------------Two threads, benaphore, locking-------------

16384|                    A
     .                    .
     |                    A
 2048|B                   AA
     |B                   AA
 1024|BB                  AA
     |BB                 AAA
  512|BB  C              AAA
     |BB  C              AAA
  256|BB CC              AAA D
     ... ..              ... .
   32|BB CC              AAADD
     |BBCCC              AAADD
   16|BBCCC              AAADD     E
     |BBCCC              AAADDD    E
    8|BBCCC              AAADDD  # E
     |BBCCC              AAADDD  # EE
    4|BBCCC              AAADDD  #EEE
     |BBCCC   #          AAADDD  #EEE       #                       #
    2|BBCCCC  #         AAAADDD  #EEE       # ###                   #
     |BBCCCC  ##       AAAAADDDD #EEE     ####### #  #    ##       ##
     +!-------!-------!-------!-------!-------!-------!-----...--!----
      0       8       16      32      64      128     256        2048
  • (A, D & E) The ping pong mode and its two "echoes" comprise 82% of the iterations with a latency of 24.7us (+5.0us and +27.0us for the echoes).

  • (B) Non-blocking acquisition comprises 14.3% at 0.31us (a result similar to the non-blocking case in the two-thread semaphore test, only faster, since we now have a benaphore).

  • (C) This aberration is the result of a race condition in which the benaphore becomes a semaphore: Thread A sets the benaphore counter (bc) to 1 and enters the critical section. Thread B comes along, sets bc to 2, and heads for acquire_sem()—BUT—the scheduler switches to thread A before thread B gets to actually invoke acquire_sem(). Thread A leaves the critical section, decrements bc to 1, and calls release_sem(). Since no one is actually waiting for the semaphore (thread B never got that far), thread A sets the semaphore counter to 1. After that, thread A increments bc to 2 and calls acquire_sem(). Since the semaphore has already been released, thread A acquires it without blocking. Then it passes through the critical section, decrements bc to 1, releases the semaphore... and so on, until the scheduler decides it's B's turn. In other words, all those acquire_sem() and release_sem() calls were unnecessary, but they don't really hurt all that much (it's still better than playing ping pong).
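
    The same sequence as a step-by-step trace (our annotation; bc and the return values follow the benaphore code above):

    bc = benaphore counter, sem = semaphore count; both start at 0.

    A: atomic_add(&bc, 1) returns 0, bc = 1; skips acquire_sem() and
       enters the critical section.
    B: atomic_add(&bc, 1) returns 1, bc = 2; heads for acquire_sem(),
       but the scheduler switches back to A first.
    A: atomic_add(&bc, -1) returns 2, bc = 1; calls release_sem();
       nobody is waiting, so the semaphore count goes to 1.
    A: atomic_add(&bc, 1) returns 1, bc = 2; calls acquire_sem(),
       which succeeds immediately (sem drops back to 0), and A re-enters
       the critical section without ever blocking.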

As expected, in ping pong mode the benaphore is slightly slower than the semaphore (24.7us vs. 24.1us), but the non-blocking case is still much faster (0.31us vs. 3.5us). The unlocking test (not shown) also satisfies this expectation.

Round-trip: A little bit better than the semaphore. The improvement in non-blocking performance makes up for the degraded ping pong. In real life, it's not clear that we would get a similar behavior, but in any case, even if the benaphore gets slower than the semaphore, it won't be by much.

First part conclusion:

When two threads have to fight for ownership of a shared critical section, performance can be significantly reduced. The snappy little benaphore degenerates into a semaphore, and CPU efficiency drops by a small factor—the two-thread tests are 1.4 to 2.5 times less efficient than the one-thread tests. This would seem bad enough already, but who knows what's going to happen once we move those tests from an ideal and isolated environment into a more realistic one. Clearly a story for next week...


Developers' Workshop: Mucking With Menus

By Owen Smith

Foreword

Harlan Ellison once noted that first impressions are deadly, dangerous, and deceiving. Since this is my debut Newsletter article, I'm aware of what's at stake. I've had my hair cut. I've shaved my face raw and put a styptic pen in my pocket for the nicks. I'm wearing that same collared shirt, spiffy tie, and cotton slacks that almost lost me the job when I interviewed here. All this so that you'll walk away from this column with that warm, fuzzy, BeOS-coding kind of feeling. So much for introductions.

Delicatessens, Laminates, and American Consumers

An odd phenomenon occurs after midnight on summer weekends here in the land of suburban sprawl. In eerie emulation of the coastal grunion runs, hordes of Hollywood zombies issue forth from their movie theater coffins. At that time of night activities are scarce—absent certain chemical and hormonal addictions, and/or a zoot suit and the irrepressible urge to dance.

For the rest of us there's the 24-hour delicatessen. Some go for the food; some go to avoid the grunion. Me, I head to the deli to marvel at one thing: a laminated, folded sheet of cardboard stock which, when unfolded, stands two feet high and three feet wide. This item is not just a menu—it's a symbol of the bewildering diversity of American culture, of the consumer's RIGHT TO CHOOSE. Confronted by this formidable selection, I forego exercising my birthright and always get the corned beef Reuben and the vanilla Coke.

So what has that got to do with the BeOS, you ask? The connection is menus. These are not just the cornerstone of the contemporary GUI; they also show off BMessages and object-oriented design to good effect. And, they provide a safe, wholesome springboard to the wonderful world of the Interface Kit (see the foreword on the Importance of First Impressions).

To whet your appetite for menus, I've concocted an example of an application that tweaks menus a bit more than the average Be application. Appropriately enough, it's called MenuWorld. Go ahead, download it:

ftp://ftp.be.com/pub/samples/interface_kit/MenuWorld.zip

The following notes boil down the essence of this example.

The Four Corners

Ninety percent of menu behavior is defined in four classes (a sketch putting them together follows the list):

  1. BMessage:
    A command that is executed when a menu item is selected. If you're unfamiliar with BMessage, check The Be Book's Application Kit documentation (BMessage.html). A romantic relationship awaits you.

  2. BMenu:
    A subclass of BView that displays a pop-up or pull-down menu. The most commonly used functions here are AddItem(), AddSeparatorItem(), and RemoveItem(). Note that AddItem() takes either a single item or a whole submenu. You've seen hierarchical menus before; you know the drill. Also note that BMenus appear and disappear in pop-up BWindows, which makes an interesting flourish to apply to other pop-up UI widgets—such as tool tips.

  3. BMenuBar:
    A subclass of BMenu that specifically displays a pull-down menu bar. This is often added to a window in the window's constructor, and automatically sizes itself quite nicely.

  4. BMenuItem:
    Represents one drawable item in the menu. Although it's not actually a view (it uses the BMenu it's contained in for drawing), it's capable of invoking a command, as any decent BInvoker can. This is an excellent candidate for subclassing if you've got some custom item drawing to do (and I did; see the MenuWorld class BitmapMenuItem).

Primping and Preening

There are three different kinds of menus as far as drawing goes: column-drawn (standard), row-drawn (e.g., menu bars), and custom-drawn (called "matrix," though there's no restriction on how the items are arranged). You specify which one you're using when you create the menu. MenuWorld proudly presents all three methods.

If you're arranging menu items in a column, things couldn't be easier. The menu works with your menu items to figure out how big to make the menu, and how to arrange the items within the menu. The menu also takes care of drawing submenu triangles and shortcut symbols. All you need to do is create your menu items and add them to the menu. Working with row-by-row organization is similar.

If you're doing a custom arrangement of items, things get more interesting. Here are some points to consider:

  1. You need to calculate the bounds of the menu, and the frame of each item in the menu. This can't be done directly with standard BMenuItems, so you'd better be ready to derive your own menu item subclass, and provide some ability to calculate the item's frame based on either its contents, the menu's dimensions, or both. (Note that blindly calling the menu item's GetContentSize() function won't work—it's protected, and can only be called by the BMenu class, not by you!) In MenuWorld, the Test menu displays icons, whose sizes are well defined, so this is not difficult. Each icon is represented by the simple class BitmapMenuItem, and the class TestMenuBuilder does the actual work of building the menu items and menu. (A bare-bones item subclass is sketched after this list.)

  2. The padding that would be inserted around normal BMenuItems in a column arrangement to take care of the submenu triangles and shortcuts is noticeably missing in a row or custom arrangement. So, if you want those submenu wedges and shortcuts, or the hefty left and right margins that these imply, you're going to need to cook them up yourself.
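
For instance, an icon-drawing item in the spirit of MenuWorld's BitmapMenuItem might look roughly like this (our sketch, not the sample's code); in a matrix menu you'd then place it yourself with AddItem(item, frame):

    #include <Bitmap.h>
    #include <Menu.h>
    #include <MenuItem.h>

    class IconItem : public BMenuItem {
    public:
        IconItem(BBitmap *icon, BMessage *msg)
            : BMenuItem("", msg), fIcon(icon) {}

    protected:
        // The owning BMenu calls this to size the item; remember,
        // it's protected, so outside code can't.
        virtual void GetContentSize(float *width, float *height)
        {
            *width = fIcon->Bounds().Width() + 1;
            *height = fIcon->Bounds().Height() + 1;
        }

        // Items have no view of their own; they draw into their menu.
        virtual void DrawContent()
        {
            Menu()->SetDrawingMode(B_OP_OVER);
            Menu()->DrawBitmap(fIcon, ContentLocation());
        }

    private:
        BBitmap *fIcon;
    };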

ZPG and Other Population Paradigms

Stuffing your menus with items is easy, and can be done in several places:

  1. For many purposes, it's enough to add the items once at the beginning, generally when your window is being built. The File menu in MenuWorld is built up this way (see MenuWindow::BuildFileMenu()).

  2. Simple item changes traceable to a single action can be done at any time. For instance, when the Add Menu button is clicked, the window immediately adds the menu to the menu bar. Similarly, when the Delete button is pressed while a top-level menu is selected, the window immediately removes the menu from the menu bar. You can easily extend this approach to swap in and out different menu bars from the window as needed (the OLE paradigm of menu-document relations comes hurtling to mind, but try doing this in MFC under any other circumstances!). In MenuWorld, clicking on the check box Hide User Test Items swaps the menu bar that contains the user menu items for a menu bar which doesn't.

  3. If you have a fixed set of menus in the menu bar, you may want to defer touching up the menu items until the menus are actually being shown (i.e., in BWindow::MenusBeginning()). Here you can enable/disable items, or even completely build up the contents of menus. If your menus have especially volatile contents, or the interactions that cause menu items to be enabled or disabled are particularly complicated, this is a great way to go. MenuWorld demonstrates the approach by dynamically building up the contents of each user-specified menu in MenuWindow::MenusBeginning(), sketched below.
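
In skeleton form, that last approach looks something like this (MyWindow, fUserMenu, and MSG_USER_ITEM are our stand-ins for MenuWorld's real names):

    #include <Menu.h>
    #include <MenuItem.h>
    #include <Window.h>

    const uint32 MSG_USER_ITEM = 'user';    // hypothetical shared command code

    class MyWindow : public BWindow {
    public:
        MyWindow(BRect frame)
            : BWindow(frame, "MenuWorld-ish", B_TITLED_WINDOW, 0) {}

        // The window calls this just before its menus are shown.
        virtual void MenusBeginning()
        {
            // Throw away last time's items...
            BMenuItem *item;
            while ((item = fUserMenu->RemoveItem((int32)0)) != NULL)
                delete item;
            // ...then rebuild the menu to match the current state.
            fUserMenu->AddItem(new BMenuItem("Current thing",
                new BMessage(MSG_USER_ITEM)));
        }

    private:
        BMenu *fUserMenu;    // assumed to be created with the menu bar
    };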

Magick Mother Invocation

Since BMenuItems inherit from BInvoker, you can pull many of the same tricks with them as you can with controls and other invokers. In particular, your messages can be arbitrarily complex, and you don't need to dispatch them to the main window. For instance, the File->About... menu item dispatches to the global application object. Also note that each user item contains the same message ID, so they all get handled in the same place (a separate field identifies the actual name of the item which sent the message); the test items are set up in a similar manner. Again, try achieving that same kind of flexibility with MFC!
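
Continuing the hypothetical names from the sketch above, the shared-ID trick amounts to stamping each message with the item's name and letting one handler sort them out:

    // When building a user item, record its name in the message:
    static void AddUserItem(BMenu *menu, const char *name)
    {
        BMessage *msg = new BMessage(MSG_USER_ITEM);
        msg->AddString("item_name", name);
        menu->AddItem(new BMenuItem(name, msg));
    }

    // One handler then serves every user item:
    void MyWindow::MessageReceived(BMessage *message)
    {
        switch (message->what) {
            case MSG_USER_ITEM:
            {
                const char *name;
                if (message->FindString("item_name", &name) == B_OK) {
                    // act on the item, identified by name
                }
                break;
            }
            default:
                BWindow::MessageReceived(message);
                break;
        }
    }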

There is a price to this flexibility, however. Assuming MenuWorld were a full-featured menu editor, let's say we wanted to save the menus we've created and use them in our application, à la Visual C++ menu resource editing. The Be way to do this is to archive the Menu Bar and its constituents into a BMessage, then write the flattened BMessage data to a BResources object.
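
In outline, the archiving step might look like this (the resource ID and name are arbitrary choices of ours):

    #include <MenuBar.h>
    #include <Message.h>
    #include <Resources.h>

    // Archive the bar deeply (so its menus and items come along),
    // flatten the archive, and stash the bytes in a resource.
    static status_t SaveMenuBar(BMenuBar *bar, BResources *resources)
    {
        BMessage archive;
        status_t err = bar->Archive(&archive, true);
        if (err != B_OK)
            return err;

        ssize_t size = archive.FlattenedSize();
        char *buffer = new char[size];
        err = archive.Flatten(buffer, size);
        if (err == B_OK)
            err = resources->AddResource(B_MESSAGE_TYPE, 1, buffer,
                size, "menu bar");
        delete [] buffer;
        return err;
    }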

The problem, which ties in to a recent missive on BeDevTalk, is that the BMenuItems do not store their targets—indeed, they have no inherent way to identify a target once they've been archived and reconstituted. One thing you can do pretty easily is to set the target of each menu item to the main window when you instantiate them. However, if you want to set special BMenuItem targets in MenuWorld, coming up with an archiving mechanism is non-trivial. Some third-party apps on BeWare, such as AppSketcher and Interface Elements, address this problem:

http://www.be.com/beware/Development.html#Cat_Interface%20Creation

That's it for this week. I leave you with a German blessing: "Froehlich sei das Codeverbessern; guten Appetit!" ("May your code-improving be merry; enjoy your meal!")


BeDevTalk Summary

BeDevTalk is an unmonitored discussion group in which technical information is shared by Be developers and interested parties. In this column, we summarize some of the active threads, listed by their subject lines as they appear, verbatim, in the mail.

To subscribe to BeDevTalk, visit the mailing list page on our web site: http://www.be.com/aboutbe/mailinglists.html.

NEW

Subject: Random not so random?

The opening parry concerned the proper seeding of the random number generator: call srandom() (or srand()) only once, and before any random() (rand()) calls. The safest bet is to set the seed in main().

Other details popped out:

  • The high bits are more random than the low bits. If you want to restrict the range of the random number, you should divide rather than mask (or mod); see the sketch after this list.

  • If you want a random number that's bigger than 16 bits, combine multiple random() calls.

  • random() is more random than rand(). But is it truly random? Some folks think it's about as random as you can get. Others say that because the implementation varies between compilers, you shouldn't rely on random(). Instead, you should find a good algorithm and roll it yourself.
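
A quick illustration of the seeding and range advice (plain ANSI C library calls; the die-roll framing is ours):

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main()
    {
        // Seed exactly once, before the first call; main() is the
        // safest place for it.
        srand((unsigned)time(NULL));

        // A die roll, 1..6, by division: this keeps the more-random
        // high bits...
        int good = 1 + (int)(6.0 * rand() / (RAND_MAX + 1.0));

        // ...whereas modding (or masking) keeps the low bits.
        int bad = 1 + rand() % 6;

        // If rand() yields only ~15 bits on your platform, combine
        // calls to build a wider value.
        unsigned long wide = ((unsigned long)rand() << 15) ^ rand();

        printf("%d %d %lu\n", good, bad, wide);
        return 0;
    }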

Subject: objects in areas

How do you put an object in a shared area so more than one application can use it? For example, how do you share a BBitmap? As asked by Ingo Weinhold:

The only way I see is to archive the BBitmap object and flatten this archive to the area. [However,] each application has to unflatten and reinstantiate the object, which won't save any memory...

So let's assume it's possible...other applications could clone the area, but the logical addresses may differ. An object that has stored pointers to address its data will get into trouble if its logical address changes. Correct?

First things first: No, you can't share an object (not safely and predictably, anyway). As for creating an object at a specific address, Jon Watte reminds us that C++ provides a "placement" allocation.
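
For the record, a bare-bones sketch of placement new (the struct is a stand-in, and Ingo's caveat stands: any pointers the object stores won't survive a different mapping address):

    #include <new>

    struct Shared {
        int width, height;    // plain data, no pointers (see caveat)
    };

    // Construct an object at an address you chose (say, inside a
    // cloned area) instead of letting operator new pick one.
    static Shared *ConstructAt(void *address)
    {
        return new (address) Shared;    // runs the constructor in place
    }

    // Tear it down by hand; never 'delete', since the memory belongs
    // to the area, not to the heap.
    static void DestroyAt(Shared *object)
    {
        object->~Shared();
    }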

Subject: (Sigh) BeOS on PPC

Announcing a grass roots movement to petition Apple to release G3 specs to Be. The devil's advocate response from Geoffrey Clements (which, read literally, depicts a bizarre anatomy):

Who cares, really? Yeah a G3 running the Mac OS will kick a Pentium II running Windows's ass. But a dual Pentium II running the BeOS will Kick the G3's ass.

Yes, but with a proviso:

BeOS is a faster operating system, so long as it doesn't get klunkier in each release. Its been rather noticeable that BeBoxen that could zip speedily along in DR8.2 (BeOS in 1996) are almost unusably slow in R3. I'm speculating that one of these slowdowns is due to the MIME typing file handling system.

The proposed solution: Make MIME-typing optional. "What?!?" quoth Chris Herborth:

How could the filesystem's MIME file attributes slow down the display and applications in R3? The filesystem in R3 is _orders_of_magnitude_ faster than the one in DR8. And it wasn't the only thing that changed between DR8 and R3.

If the file typing is made optional, we'll have the same useless filesystem as Windows and UNIX; our files won't know what they are, and we'll have to rely on stupid file extensions for everything.

Subject: Performance followup

In a spawn of the BeOS on G3 thread, listeners discussed system performance. Is it degrading? Is it the app server's fault? MIME types? Anti-aliased fonts? After some discussion, Be stepped in.

THE BE LINE (from George Hoffman):

The graphics team is well aware of the state of app_server performance. Not all of the perceived slowdown from PR2 to R3 is the fault of the app_server, but there are hard numbers which do indicate a slight slowdown in graphics performance. ... One of the big priorities we've had for R4 is performance, and I'm pleased to say that the R4 development app_server is quite a bit faster than R3, and, more importantly, _feels_ faster.

(Also, see Pierre Raynaud-Richard's article, above.)

Subject: Broken assembly debugger on Intel R3.1?

Delesley S. Hutchins is having trouble with Be's (ahem) streamlined debugger:

...I can't get the debugger to work... I type "help", and nothing happens... it doesn't even go back to the debugger prompt.

From the scept'red isle, that other Eden, demi-paradise, royal throne of kings and Tinky Winky, Simon Clarke suggests, modishly:

"Do you have

export BEDEBUG=true

set in your user setup environment file? I had the same problem if I went into the debugger from the alert that a crashed app shows. However, with the debugger on all the time it works fine."

Subject: Tips on maniacal code optimization

Q:

What's the first step along the path to optimization?

A:

"Profile it first!"

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License.