Issue 3-37, September 16, 1998

Be Engineering Insights: Doing More Work Than You Should

By Dominic Giampaolo

Recently I've been working on the disk cache code in the BeOS and I thought that my travails in trying to optimize it would make a good Newsletter article.

The BeOS disk cache was written about a year and a half ago. I was in a bit of a hurry because I was also trying to complete the file system at the same time. The cache code works and meets the needs of the file system but the implementation left something to be desired. I decided that for BeOS Release 4, rewriting a big chunk of it to clean it up and take advantage of the new scatter-gather primitives would be a good thing, and should even improve performance.

First some background. The BeOS disk cache is a two-part data structure. There is a hash table indexed by device and block number as well as an LRU (least-recently-used)-ordered doubly linked list that keeps track of all the disk blocks in the cache. The hash table is used to quickly look up a block to see if it's in the cache. The linked list is ordered by how recently the blocks have been used. Most recently used blocks are at the head of the list and older blocks at the tail. The linked list decides who to kick out of the cache when the cache is full and needs to be flushed. This is a pretty standard design for a disk cache. The problem wasn't the overall design but rather the implementation.
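To make the design concrete, here is a minimal sketch of such a two-part cache structure. The names and the toy hash function are mine, not the actual BeOS code, but the shape is the same: a hash table for lookup, and a doubly linked LRU list whose tail supplies eviction victims.

```c
#include <stdlib.h>

#define HASH_SIZE 64

/* One cached disk block: lives in a hash chain (for fast lookup)
   and in a doubly linked LRU list (for choosing eviction victims). */
typedef struct cache_block {
    int dev, bnum;                   /* hash key: device + block number */
    struct cache_block *hnext;       /* hash chain link */
    struct cache_block *prev, *next; /* LRU list: head = most recent */
    void *data;
} cache_block;

static cache_block *hash_table[HASH_SIZE];
static cache_block *lru_head, *lru_tail;

static unsigned hash(int dev, int bnum) {
    return ((unsigned)dev * 31u + (unsigned)bnum) % HASH_SIZE;
}

/* Move a block to the head of the LRU list (it was just used). */
static void touch(cache_block *b) {
    if (b == lru_head) return;
    /* unlink from wherever it currently sits */
    if (b->prev) b->prev->next = b->next;
    if (b->next) b->next->prev = b->prev;
    if (b == lru_tail) lru_tail = b->prev;
    /* push onto the head */
    b->prev = NULL;
    b->next = lru_head;
    if (lru_head) lru_head->prev = b;
    lru_head = b;
    if (!lru_tail) lru_tail = b;
}

/* Look a block up by (dev, bnum); returns NULL on a cache miss. */
cache_block *lookup(int dev, int bnum) {
    cache_block *b;
    for (b = hash_table[hash(dev, bnum)]; b; b = b->hnext)
        if (b->dev == dev && b->bnum == bnum) { touch(b); return b; }
    return NULL;
}

/* Add a block to the cache and mark it most recently used. */
cache_block *insert(int dev, int bnum) {
    unsigned h = hash(dev, bnum);
    cache_block *b = calloc(1, sizeof(*b));
    b->dev = dev;
    b->bnum = bnum;
    b->hnext = hash_table[h];
    hash_table[h] = b;
    touch(b);
    return b;
}

/* The eviction victim is whatever sits at the LRU tail. */
cache_block *victim(void) { return lru_tail; }
```

Looking up a block also "touches" it, which is what keeps recently used blocks away from the eviction end of the list.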

Because there were no scatter-gather primitives when the cache was originally written, the cache has to use a temporary block of memory to copy cache blocks into before writing them to disk. The idea was that if many consecutive disk blocks are being flushed, it made more sense to do one single write than many individual disk writes. Had the cache done individual disk writes for consecutive disk blocks it would have avoided the memcpy() to the temporary buffer but it would have also performed poorly. This is where scatter-gather seemed to offer a great advantage: the cache could do a single large I/O even though all the cache blocks weren't contiguous in memory.
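The idea can be sketched with the POSIX scatter-gather primitive writev() (the BeOS kernel primitives were different, but the principle is identical): several buffers at unrelated addresses go to disk in one operation, with no staging memcpy().

```c
#include <sys/uio.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

/* Write three non-contiguous "cache blocks" to a file in a single
   gather operation, instead of memcpy()'ing them into one temporary
   buffer first (or issuing three separate writes). */
ssize_t gather_write(int fd) {
    static char block_a[4] = "AAA";  /* blocks at unrelated addresses */
    static char block_b[4] = "BBB";
    static char block_c[4] = "CCC";
    struct iovec iov[3];

    iov[0].iov_base = block_a; iov[0].iov_len = 3;
    iov[1].iov_base = block_b; iov[1].iov_len = 3;
    iov[2].iov_base = block_c; iov[2].iov_len = 3;

    return writev(fd, iov, 3);   /* one I/O, three source buffers */
}
```

The data lands on disk contiguously, exactly as if it had been copied into one buffer first, but without spending memory bandwidth on the copy.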

The same problem occurs when the cache tries to do read-ahead (but it's worse). On read-ahead the lack of scatter-gather primitives means that the cache first has to read the data into a temporary buffer, copy it to the appropriate cache blocks, and then finally copy it to the user buffer. These extra memcpy()'s seemed grossly inefficient and always were a source of disappointment to me. It seemed that the cache performance would improve significantly if I could eliminate the memcpy()'s.

In addition, I looked at the huge LRU list of all disk blocks and felt certain I could improve the cache performance if I separated all the disk blocks into individual lists depending on their state (clean, dirty, or locked in the cache). This way it seemed that deciding which blocks to flush would require traversing through fewer blocks, which would definitely be more efficient.

With these general principles in mind I set about rewriting the disk cache. After mucking about with the code and a bit of debugging (in a user-level test program) I emerged with a clean, shiny new cache. Blocks were separated into clean, dirty, and locked lists; there were no extra memcpy()'s; and the code seemed a shining example of good software engineering.


Then I put the code into the kernel to see how my changes affected real-world performance.

After a little more debugging (whoops) it was ready for testing. I was very eager to see the results.

I ran the tests and...(drumroll please)...it was slower.

I cringed. How could this be? I have these nicely organized lists, I'm not doing any extra memcpy()'s, and the code is so much cleaner! How could it be slower?

I started to look for explanations. Perhaps because I was using scatter-gather I/O now, the extra calls to lock_memory() and get_memory_map() in the disk drivers were eroding my performance (remember the cardinal rule of software engineering: if it doesn't work or is slow, it's someone else's fault). I took a trip down to Brian Swetland's office (our resident SCSI god) to discuss the bad news with him. He instrumented the SCSI driver to measure the cost of the VM-related calls and disappointment struck again: there was indeed extra overhead associated with lock_memory() and get_memory_map() calls, but it was insignificant compared with the cost of the I/O.

Brian also implemented another performance monitoring tool that showed the size and amount of time used by each I/O through the SCSI driver. Looking at the output surprised me and provided the first clue about what was causing the problem. The list of blocks being written by the file system was poorly organized—many writes were happening to individual disk blocks. This surprised me because I had spent a good deal of time looking at traces of the cache before its first release to make sure that it would be a good citizen and flush data as contiguously as possible. Obviously, this was no longer happening.

I went back and used my test program (which is just BFS running as a user program) and looked at the I/O traces again. Clearly, they were not optimal. Then it dawned on me what was happening: in my effort to clean up the cache I broke the original single list of blocks into three separate lists.

Originally, when deciding who to kick out of the cache, the code to select victims would step through all the blocks loaded, which inevitably meant good-sized runs of contiguous disk blocks could be found (because read-ahead always reads in contiguous chunks of disk). With the list of blocks separated into three lists, the code that tries to pick victims to flush only scans the dirty list, so it would not be as likely to find contiguous runs of disk blocks.

To understand what happened, consider the following example. First, the file system asks to read a single block in, say, block 1000. The cache in turn performs read-ahead and reads in 32 extra blocks (1000 through 1031). Now let's say that the file system allocates and modifies three blocks, 1001, 1010, and 1025 (assume the other blocks were already allocated). When the file system is done with those three blocks and they are eventually flushed, the new cache code will have to do three separate I/O's to flush blocks 1001, 1010, and 1025, because they are the only blocks on the dirty list. The old cache code would instead do a single write of all blocks from 1001 through 1025 because it would find all the blocks on its single list of blocks and even though most of the blocks were clean, doing a single write is much faster than doing three individual writes.
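The effect of the split lists is easy to quantify with a toy model. These helper functions are mine, not the cache code itself, but they reproduce the arithmetic of the example: three scattered dirty blocks cost three writes if only the dirty list is consulted, but one write if the clean blocks still in the cache can pad out the span.

```c
/* Dirty-only flushing: one write per run of *consecutive* dirty
   block numbers (the array must be sorted). */
int writes_dirty_only(const int *dirty, int n) {
    int i, writes = 0;
    for (i = 0; i < n; i++)
        if (i == 0 || dirty[i] != dirty[i - 1] + 1)
            writes++;
    return writes;
}

/* Single-list flushing: if every block between the first and last
   dirty block is still in the cache (clean or dirty), the whole
   span [dirty[0] .. dirty[n-1]] can go out as one write. */
int writes_with_clean_filler(const int *dirty, int n,
                             int cache_lo, int cache_hi) {
    if (n == 0) return 0;
    if (dirty[0] >= cache_lo && dirty[n - 1] <= cache_hi)
        return 1;                /* one contiguous span covers them all */
    return writes_dirty_only(dirty, n);
}
```

With the article's numbers, dirty blocks {1001, 1010, 1025} inside a cached range of 1000-1031 cost three I/O's under the new scheme versus one under the old one.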

After I realized this, the solution was simple: I merged the clean and dirty lists into a "normal" list and re-ran my tests. As expected, the performance numbers were back to normal and sometimes a bit faster.

I still wasn't satisfied though. Why was there no speed boost now that the spiffy new cache code was using scatter-gather and avoiding as many as three memcpy()'s per I/O? The answer is simple: the cost of the I/O's so far outweighed the cost of the memcpy()'s that eliminating them made no difference in performance. Although the absolute performance numbers did not increase, by eliminating the memcpy()'s the cache is now much friendlier to the rest of the system and isn't using memory bandwidth (and CPU time) that could be better spent doing something else. So while the performance numbers may not have changed, the cache is still "better."

The title of this article, "Doing More Work Than You Should," applies in two ways. First, writing more data to disk in one transaction is doing more work but is faster than writing less data in separate transactions. Second, even though the old cache was doing a lot more work than it should have with all its extra memcpy()'s, there was no noticeable performance difference in the speed of file system benchmarks since that extra work was lost in the noise when compared to the cost of the disk I/O.

There are two main things you can learn from this. First, if your application uses an on-disk data structure, think about the layout of the structure on disk. If there are lots of small pieces that require seeking around (such as a B+tree), it can be slower to access than a larger, more contiguous data structure. For example, I just wrote a test program that reads 1024 random 1k chunks from a 1 megabyte file, and another that reads 20 megabytes contiguously (1k at a time). The random reads (1 megabyte in total) took 9.5 seconds versus 3.5 seconds for the contiguous read of 20 megabytes (and reading the 20 meg file 64k at a time is even faster). Depending on how you access an on-disk data structure and how big it is, it may make sense to use the brute-force approach and just store everything linearly and read through it each time.
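The two access patterns look something like the following sketch. This is illustrative only: absolute times depend entirely on the hardware, and a modern OS cache may hide the seek costs for small files.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read `count` random 1K chunks from `fp` (a file of `fsize` bytes),
   returning total bytes read.  On a real disk, the seek between every
   read is what makes this pattern slow. */
long random_reads(FILE *fp, long fsize, int count) {
    char buf[1024];
    long total = 0;
    int i;
    for (i = 0; i < count; i++) {
        long off = (rand() % (fsize / 1024)) * 1024;
        fseek(fp, off, SEEK_SET);
        total += (long)fread(buf, 1, sizeof(buf), fp);
    }
    return total;
}

/* Read the whole file sequentially, 1K at a time: no seeking, so the
   disk (and any read-ahead) can stream data at full speed. */
long sequential_read(FILE *fp) {
    char buf[1024];
    long total = 0;
    size_t n;
    fseek(fp, 0, SEEK_SET);
    while ((n = fread(buf, 1, sizeof(buf), fp)) > 0)
        total += (long)n;
    return total;
}
```

Timing each function with a large file (larger than the OS cache) is what reveals the seek penalty the article describes.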

The second lesson is an old one that I know but didn't think about when expecting a performance gain from my cache rewrite. You have to know the relative costs of the operations your program does if you want to optimize it. You can spend a lot of time optimizing a particular part of your program but if it only accounts for 1% of the total time, no matter how much you optimize it you won't improve the overall performance of your program. In my case, eliminating the memcpy()'s didn't affect the performance of the cache because they were a small amount of time relative to the time it took to do the disk I/O.
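This is just Amdahl's law. As a worked example (my arithmetic, not from the article): if a part of the program accounts for 1% of total runtime, even eliminating it entirely speeds the whole program up by barely 1%.

```c
/* Amdahl's law: overall speedup when a fraction `f` of the runtime
   is accelerated by a factor of `s`. */
double overall_speedup(double f, double s) {
    return 1.0 / ((1.0 - f) + f / s);
}
```

With f = 0.01 and s effectively infinite, the result is about 1.01x: invisible in a benchmark, exactly as happened with the eliminated memcpy()'s.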

The flip side is that in the case of disk I/O, it may pay to copy your data around to make it contiguous before writing it to disk, since the cost of doing the memcpy() is small compared to the cost of the I/O.

I hope this article will help people better understand where and what the costs are when doing disk I/O. Knowing how to structure your I/O can have a significant impact on the I/O performance of your app.


Be Engineering Insights: Code Maintenance for the Millennium and a DBUG Update

By Fred Fish

As programmers, we all generally strive to write code that is as bug-free as possible and easily maintainable. Because we completely understand the code we write (or at least we should), we sometimes fail to appreciate how hard it may be for future code maintainers to reach even a basic level of understanding of the overall structure of a large application and the flow of control that takes place when it runs. One thing we can do to improve the ongoing maintenance process is to build some "internal instrumentation" into the application from the start.

Internal instrumentation is already a familiar concept to most programmers, since it is usually the first debugging technique learned. Typically, "print statements" are inserted in the source code at interesting points, the code is recompiled and executed, and the resulting output is examined in an attempt to determine where the problem is. An example of this would be something like:

#include <stdio.h>
main (argc, argv)
int argc;
char *argv[];
{
  printf ("argv[0] = %s\n", argv[0]);
  /*
  * Rest of program
  */
  printf ("== done ==\n");
}

Eventually, and usually after at least several iterations, the problem will be found and corrected. At this point, the newly inserted print statements must be dealt with. One obvious solution is to simply delete them all. Beginners usually do this a few times, until they tire of repeating the entire process every time a new bug pops up. The second most obvious solution is to somehow disable the output, either through the source code comment facility, creation of a debug variable to be switched on or off, or by using the C preprocessor. Below is an example of all three techniques:

#include <stdio.h>

int debug = 0;

main (argc, argv)
int argc;
char *argv[];
{
  /* printf ("argv = %x\n", argv) */
  if (debug) printf ("argv[0] = %s\n", argv[0]);
  /*
  * Rest of program
  */
#ifdef DEBUG
  printf ("== done ==\n");
#endif
}

Each technique has its advantages and disadvantages with respect to dynamic versus static activation, source code overhead, recompilation requirements, ease of use, program readability, etc. Overuse of the preprocessor solution leads to problems with source code readability and maintainability when multiple #ifdef symbols are to be defined or undefined based on specific types of debug desired.

My solution to this problem is a package I wrote in 1984 and subsequently released into the public domain when I saw how useful it was to myself and others. This package, known as "DBUG," hasn't changed much in the last 10 years or so, though there have been a few variants of it floating around on the Internet recently.

Motivated by a desire to see it support multithreaded applications, and by a very real need to find a problem that was preventing the latest version of Bash from working on BeOS as a boot shell, I recently modified the DBUG runtime to link into the BeOS kernel and instrument some portions of the kernel so I could better understand what was happening inside the kernel at boot time and fix the problem with bash. Since the BeOS kernel itself is multithreaded, I had to make substantial changes to the DBUG code that got linked into the kernel. Encouraged by that success, I retrofitted the changes into the mainline DBUG sources and the new package is now available for use by BeOS application writers, with a few caveats, but more on that later.
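To make the idea concrete, here is a drastically simplified sketch of how nesting-aware trace macros of this general kind can be built. This is not the actual DBUG implementation: the real package keeps per-process (and, in the new version, per-thread) state, parses the option flags, prints file names and line numbers, and much more.

```c
#include <stdio.h>

/* Global nesting depth and on/off switch.  A multithreaded version,
   like the one described above, needs per-thread state instead. */
static int trace_depth = 0;
static int trace_on = 1;

static void trace_indent(void) {
    int i;
    for (i = 0; i < trace_depth; i++)
        fputs("| ", stdout);
}

#define TRACE_ENTER(name)                                            \
    do { if (trace_on) {                                             \
        trace_indent(); printf(">%s\n", (name)); trace_depth++;      \
    } } while (0)

#define TRACE_LEAVE(name)                                            \
    do { if (trace_on) {                                             \
        trace_depth--; trace_indent(); printf("<%s\n", (name));      \
    } } while (0)

/* Recursive factorial, instrumented with the trace macros. */
int fact(int n) {
    int r;
    TRACE_ENTER("factorial");
    r = (n > 1) ? n * fact(n - 1) : 1;
    TRACE_LEAVE("factorial");
    return r;
}
```

Calling fact(3) with tracing enabled prints nested ">factorial"/"<factorial" pairs, one indentation level per recursion, much like the "t" option output shown below.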

Let's take a quick look at how we instrument code with the DBUG package. Consider a simple-minded factorial program which is implemented recursively to better demonstrate some of the DBUG package features. There are two source files, main.c and factorial.c:

/* ============== main.c ============== */

#include <stdio.h>
#include "dbug.h"

int
main (argc, argv)
int argc;
char *argv[];
{
  register int result, ix;
  extern int factorial (), atoi ();

  DBUG_ENTER ("main");
  DBUG_PROCESS (argv[0]);
  DBUG_PUSH_ENV ("DBUG");
  for (ix = 1; ix < argc && argv[ix][0] == '-'; ix++) {
    switch (argv[ix][1]) {
      case '#':
       DBUG_PUSH (&(argv[ix][2]));
       break;
    }
  }
  for (; ix < argc; ix++) {
    DBUG_PRINT ("args", ("argv[%d] = %s", ix, argv[ix]));
    result = factorial (atoi (argv[ix]));
    printf ("%d\n", result);
    fflush (stdout);
  }
  DBUG_RETURN (0);
}

/* ============== factorial.c ============== */

#include <stdio.h>
#include "dbug.h"

int factorial (value)
register int value;
{
  DBUG_ENTER ("factorial");
  DBUG_PRINT ("find", ("find %d factorial", value));
  if (value > 1) {
    value *= factorial (value - 1);
  }
  DBUG_PRINT ("result", ("result is %d", value));
  DBUG_RETURN (value);
}

On BeOS, we might create the "factorial" application by running the following, where "$" is our shell prompt:

$ mwcc -c -DDBUG main.c
$ mwcc -c -DDBUG factorial.c
$ mwcc -o factorial main.o factorial.o -ldbug

This assumes that we have put dbug.h someplace where the compiler will find it, and also put the runtime library (libdbug.a) where it will be found by the linker. If we then run factorial we get something like the following output:

$ factorial 1 2 3 4 5
1
2
6
24
120

To enable various features of the internal instrumentation provided by the DBUG package, we have several ways of telling the DBUG runtime what sort of information we want to see as the application executes. From the command line, the easiest way to do this is to pass it various flags via the "-#" option. As an example, to enable function tracing, we would use the "t" option:

$ factorial -#t 2 3
| >factorial
| | >factorial
| | <factorial
| <factorial
2
| >factorial
| | >factorial
| | | >factorial
| | | <factorial
| | <factorial
| <factorial
6

Note that entering a function produces a line with ">funcname", leaving it produces "<funcname", and the nesting level is shown graphically. We can turn on additional output by using the "d" option, which in its most primitive form produces something like:

$ factorial -#d 2 3
main: args: argv[2] = 2
factorial: find: find 2 factorial
factorial: find: find 1 factorial
factorial: result: result is 1
factorial: result: result is 2
2
main: args: argv[3] = 3
factorial: find: find 3 factorial
factorial: find: find 2 factorial
factorial: find: find 1 factorial
factorial: result: result is 1
factorial: result: result is 2
factorial: result: result is 6
6

Of course, we can use multiple options at the same time, including some additional ones that do things like print file names, line numbers of the corresponding source code for particular instrumentation output, etc. As one last example with our factorial program, before we move on to other things, consider:

$ factorial -#d:t:F:L 6
      main.c:  23: | args: argv[2] = 6
 factorial.c:   7: | >factorial
 factorial.c:   8: | | find: find 6 factorial
 factorial.c:   7: | | >factorial
 factorial.c:   8: | | | find: find 5 factorial
 factorial.c:   7: | | | >factorial
 factorial.c:   8: | | | | find: find 4 factorial
 factorial.c:   7: | | | | >factorial
 factorial.c:   8: | | | | | find: find 3 factorial
 factorial.c:   7: | | | | | >factorial
 factorial.c:   8: | | | | | | find: find 2 factorial
 factorial.c:   7: | | | | | | >factorial
 factorial.c:   8: | | | | | | | find: find 1 factorial
 factorial.c:  12: | | | | | | | result: result is 1
 factorial.c:  13: | | | | | | <factorial
 factorial.c:  12: | | | | | | result: result is 2
 factorial.c:  13: | | | | | <factorial
 factorial.c:  12: | | | | | result: result is 6
 factorial.c:  13: | | | | <factorial
 factorial.c:  12: | | | | result: result is 24
 factorial.c:  13: | | | <factorial
 factorial.c:  12: | | | result: result is 120
 factorial.c:  13: | | <factorial
 factorial.c:  12: | | result: result is 720
 factorial.c:  13: | <factorial
720
      main.c:  28: <main

While testing the new multithreaded support, I added the ability for the DBUG runtime to emit its output to the serial port, and added a handful of DBUG_* macros to the BeBounce demo program. Starting this modified BeBounce now produces the following output at the serial port:

main.cpp:   80: | >TBounceApp::TBounceApp
main.cpp:   97: | | count: found 1 apps with our signature
main.cpp:  106: | | ball: we are first instance and ball is in our court
main.cpp:  288: | | >TWindow::TWindow
main.cpp:  303: | | | ball: our window has the ball, so add it
main.cpp:  376: | | | >TWindow::AddBall
main.cpp:  382: | | | | ball: adding a new ball
main.cpp:  648: | | | | >Tball::TBall
main.cpp:  659: | | | | | ball: fSleep 0, fPercentRemaining 0.000000
main.cpp:  664: | | | | <Tball::TBall
main.cpp:  797: | | | | >TBall::SetGap
main.cpp:  798: | | | | | ball: start -1.000000, end -1.000000
main.cpp:  803: | | | | <TBall::SetGap
main.cpp:  386: | | | <TWindow::AddBall
main.cpp:  585: | | | >TWindow::DrawOffScreen
main.cpp:  672: | | | | >TBall::Draw
main.cpp:  681: | | | | <TBall::Draw
main.cpp:  612: | | | <TWindow::DrawOffScreen
main.cpp:  330: | | <TWindow::TWindow
main.cpp:  138: | <TBounceApp::TBounceApp
main.cpp:   60: | run: start new bebounce running

...

If you've been debugging your apps on x86 BeOS with plain old printfs, you may want to consider using the DBUG package to include instrumentation that will be invaluable to future maintainers as well as making your current debugging task much easier. Sometime in the next couple of weeks I'll nail down the last couple of annoying bugs that make the multithreaded support less useful than it could be, and put a new version of DBUG into BeWare, along with an example program like BeBounce. Until then, you can play with the alpha version available at:

<ftp://ftp.ninemoons.com:pub/geekgadgets/be/i586/alpha/dbug.tgz>

Feel free to offer suggestions for ways to improve this package, particularly the multithreaded support: fnf@be.com


Developers Workshop: Called Himself a King

By Doug Fulton
Q:

I noticed that the second volume of the Be Book (Advanced Topics) has finally arrived on the shelves of the local tree-killers. I'd love to assimilate the wisdom therein, but I'm opposed to the ecoterrorism that's waged by the mongers of pulp and ink. Moreover, I'm cheap and don't want to spring for the $39.95. Can you tell me everything that's in the book so I don't have to buy it?

-- Dunvallo Molmutius, Trinovant, Britain

A:

We share your concern, Mr. Molmutius, and we think you'll be happy to hear that "Advanced Topics" is only printed on paper made from free-range trees. As for scooping the press, this column doesn't permit the reprinting of an entire 500-page book, but we can take a quick look at the Midi Kit.

Because the edge of the table is always closer than it appears in the mirror, the Midi Kit documentation has gotten short shrift the last couple of releases. Although a number of MIDI-related columns have been published here in the last year or so, some of the finer details (such as bugs) have gone undocumented, and one of the Kit's most amusing classes (BSamples) hasn't even been mentioned. For the full story, you'll have to head over to O'Reilly (http://www.oreilly.com/catalog/beadv/), or wait for the on-line version to appear on the Be website. In the meantime I will give you some highlights from the synthesis section of the Midi Kit chapter.

BSynth

The BeOS includes a 16-channel General MIDI software synthesizer designed by HeadSpace Inc. (http://www.headspace.com/). In addition to realizing MIDI data, the synthesizer can also play back audio sample data. The synthesizer is represented by the BSynth class.

Any application that wants to use the synthesizer must include a BSynth object; however, most applications won't need to create the object directly: The BMidiSynth, BMidiSynthFile, and BSamples classes create a BSynth object for you. You can only have one BSynth object in your app; it's represented by the global be_synth object.

The synthesizer can generate as many as 32 voices simultaneously, where a voice is a MIDI note or a stream of audio data (a BSamples object). By default the BSynth allocates 28 voices for MIDI and 4 for samples; you can change this allotment through BSynth::SetVoiceLimits().

BSynth doesn't have any API for actually playing MIDI data. To play MIDI data, you need an instance of BMidiSynth or BMidiSynthFile.

BMidiSynth

If you want to send MIDI data to the synthesizer, you have to create an instance of BMidiSynth. BMidiSynth derives from BMidi, so you can play notes on it directly by calling NoteOn(), NoteOff(), etc.

BMidiSynth doesn't spray MIDI messages, so it doesn't do any good to connect other BMidi objects to its output. In other words, don't do this:

/* --- DON'T DO THIS ---  It's meaningless. */
midiSynth.Connect(someOtherMidiObject);

Before using your BMidiSynth, you have to call EnableInput(). The function enables the object's input and tells the synthesizer whether it should load the "synth file" (this is the file that contains the synthesizer's instrument definitions). If you tell EnableInput() not to load the file, you'll have to load the instruments that you want yourself.

On a slow machine, loading the entire file can take a very long time, so you may want to load the instruments yourself as they're needed. For example, here we load a single instrument, then play a note. We also have to send a ProgramChange() message to tell the BMidiSynth object to use our loaded instruments on the proper channels:

/* Enable input, but don't load the synth file. */
midiSynth.EnableInput(true, false);

/* Load an instrument. */
midiSynth.LoadInstrument(B_TINKLE_BELL);

/* Associate the instrument with a MIDI channel. */
midiSynth.ProgramChange(1, B_TINKLE_BELL);

/* Play. */
midiSynth.NoteOn(1, 84, 100);
snooze(1000000);
midiSynth.NoteOff(1, 84, 100);

To use the MIDI Channel 10 percussion instruments, you must load all instruments:

/* I want percussion, therefore... */
midiSynth.EnableInput(true, true);

NOTE: BMidiSynth's MuteChannel(), GetMuteMap(), SoloChannel(), and GetSoloMap() functions are broken. Don't use them.

BMidiSynthFile

If you want to realize the contents of a MIDI file, you have to use an instance of BMidiSynthFile (BMidiSynthFile derives from BMidiSynth). *Don't* try to play a MIDI file by connecting a BMidiStore to a BMidiSynth.

You should create a different BMidiSynthFile object for each MIDI file that you want to mix together. Although it's possible for a single BMidiSynthFile to load and play more than one file at a time, you shouldn't rely on this feature.

You don't have to call EnableInput() when you use a BMidiSynthFile; the function that loads the MIDI file (LoadFile()) calls it for you, loading just those instruments that are called for by the file.

BMidiSynthFile is different from other BMidi objects in that it doesn't have a run loop. The lack of a run loop shouldn't affect the way you write your code, but you should be aware that the thread isn't there, so you don't go looking for it.

Furthermore, BMidiSynthFile doesn't implement the Run() function. Starting and stopping the object's performance (activities that are normally handled in the Run() function) are handled by the synthesizer in its own synthesis thread. If you create a BMidiSynthFile subclass, don't try to resurrect the Run() function—leave it as a no-op.

BSamples

The BSamples class lets you add a stream of audio samples into the MIDI mix. When you create a BSamples object, it automatically creates a BSynth object and puts it in "samples only" mode. Unfortunately, this mode is broken. The easiest way around this bug is to construct a BMidiSynth/BMidiSynthFile object (either before or after you create your BSamples -- it doesn't matter which). If you don't need the extra object, you can immediately destroy it; the fix is effected by the object's construction.

The object's brain is in its Start() function:

void Start( void *samples,
            int32  frameCount,
            int16  sampleSize,
            int16  channelCount,
            double samplingRate,
            int32  loopStart,
            int32  loopEnd,
            double volume,
            double stereoPan,
            int32  hookArg,
            sample_loop_hook loopHook,
            sample_exit_hook exitHook  )
  • samples is a pointer to the audio data itself. The data is assumed to be little-endian linear.

  • frameCount is the number of frames of audio data.

  • sampleSize is the size of a single sample, in bytes (1 or 2).

  • channelCount is the number of channels of data (1 or 2).

  • samplingRate is the rate at which you want the data played back, expressed as frames-per-second. The range of valid values is [0, ~65 kHz]. You can change the object's sampling rate on the fly through SetSamplingRate() (fool your friends).

  • loopStart and loopEnd specify the first and last frames that are in the "loop section." The loop section can be any valid section of frames within the sound data (i.e. [0, frameCount - 1] inclusive). Everything up to the beginning of the loop section is the attack section; everything after the loop section is the release section.

    When the sound is played, the attack section is heard, then the loop section is repeated until the object is told to Stop(), or until the loopHook function (defined below) returns false, at which point the release section is played. If you don't want the sound to loop, set the loop arguments to 0.

    Currently, the release section is automatically faded out over a brief period of time. If your release section is designed to do a slow fade (for example) you probably won't hear it.

  • volume is an amplitude scalar.

  • stereoPan locates the sound stereophonically, where -1.0 is hard left, 0.0 is center, and 1.0 is hard right. Notice that if this is a stereo sound, a stereoPan value of (say) -1.0 completely attenuates the right channel—it doesn't move the right channel into the left channel.

  • hookArg is an arbitrary value that's passed to the loopHook and exitHook functions.

  • loopHook is a hook function that's called each time the loop section is about to repeat. If the function returns true, the loop is, indeed, repeated. If it returns false, the release section is played and the sound stops. If you don't supply a loopHook, the loop is automatically repeated (unless the loop is set to 0).

  • exitHook is called when the sound is all done playing, regardless of how it stopped (whether through Stop(), a loopHook return of false, or because the BSamples object was deleted).
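The stereoPan behavior described above can be modeled as a simple pair of per-channel gains. The linear law below is my own assumption (Be's actual pan curve isn't specified here), but it captures the documented point: a hard pan attenuates one channel rather than mixing it into the other.

```c
/* Attenuation-style panning as the text describes it: pan = -1.0 is
   hard left (right channel fully attenuated), 0.0 is center, 1.0 is
   hard right.  NOTE: the linear curve is an assumed model; channels
   are only scaled, never mixed into each other. */
void pan_gains(double pan, double *left_gain, double *right_gain) {
    *left_gain  = (pan <= 0.0) ? 1.0 : 1.0 - pan;
    *right_gain = (pan >= 0.0) ? 1.0 : 1.0 + pan;
}
```

So for a stereo sound panned hard left, the left channel plays at full volume and the right channel goes silent; the right channel's content is simply lost, not relocated.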

When you tell a BSamples to Start(), it starts playing immediately. You can stop it through the Stop() function, and you can pause and resume it through Resume() and Pause() -- respectively (really!). Until further notice, the Pause() and Resume() functions are backwards: To pause a sound, call Resume(). To resume it, call Pause(). Sorry about that.


The Internet Coming of Age?

By Jean-Louis Gassée

It's nice when the venerable older media sagely proclaim that the Internet is now official just because a salacious opus, hidden behind a legal fig leaf, spread itself around the world like a case of CTD -- cyber-transmitted disease.

The event or, rather, the reaction to it, tends to prove the old rule that carnality is what brings new media into the mainstream. Centuries ago, the mail was denounced for facilitating illicit romances. We can buy reprints of more recent art nouveau images promoting the newly invented telephone. I recall one, split diagonally in two; on one side is a dark, sensitive, long-eyelashed male murmuring sweet nothings to the female in period drag gracing the other half...

We remember how the VCR got started, with "educational" videos. The Minitel was a home information terminal offered free to French consumers by the state telecommunication monopoly in the late seventies, in return for renouncing printed phone books. France Telecom had an incredible deal for you: if you built a Minitel server, it would bill the user on your behalf for the time spent online, say investigating the merits of Michelin tires, and only keep 20% to 25% of the gross. The Minitel followed the usual route to mainstream popularity and, as a result, France Telecom was accused of being the largest smut peddler in the Western world.

Now the French government has been unseated as the top state smut monger. Since last Friday, our own federal government has become the uncontested world leader in peddling online smut. But does that constitute a coming of age for the Internet? Let's be serious. I thought the way the network infrastructure resisted the latest stock market downs and ups was a much better cause for celebration. Last year, a "false" market correction caused highly visible disruptions; this year, the Net e-trading doomsday stories were nowhere to be heard.

Does this resilience show that our new printing press has achieved respectable standing in the community? Probably. It's imperfect, but we can rely on it for serious activities such as tracking packages, street directions, research, buying groceries, books, stocks, games, downloading music...

Speaking of downloading, and if we have to use human development metaphors, the record shows that the Net spent a couple of decades gestating in government and university research labs. It was born for us with the browser as the Web, and it demonstrated fairly robust qualities a good two or three years before Ken Starr's measly 244 KB zip file arrived. Downloading the multimegabyte Navigator and Explorer archives is a far manlier challenge.

Now we've clear-cut the original IP protocols and the telephone infrastructure—which, along with the arrival of the licentious Starr report, is a sure sign of the maturity of the medium. The next two stages of Internet development ought to take us much further, to using the Net without thinking, just as we expect to find and use telephones everywhere. One of these stages might be the advent of real Internet appliances. The other might be real, pervasive high-speed access, not undelivered promises.

Creative Commons License
Legal Notice
This work is licensed under a Creative Commons Attribution-Non commercial-No Derivative Works 3.0 License.