Be Newsletters - Volume 5: 2000

Issue 5-4, January 26, 2000

Be Engineering Insights: An Overview of Media File Formats and Codecs

By Dominic Giampaolo

“The wonderful thing about standards is that there are so many to choose from.”

I can't think of a better way to introduce an article about "standard" media file formats. In this article I'd like to provide a pretty detailed description of the different file formats and codecs available for storing audio and video data. I'll start with a high-level view of what is going on and then drill down to the details. My hope is that this will help people better understand what is happening when they read or write a media file.

Before we begin though, let me state that you don't strictly need to know any of this information. The BMediaFile and BMediaTrack objects handle all the grungy details for you. However, if you do know how media file formats work, it may help you understand why the APIs were designed the way they were and why they behave in certain ways.

File Formats First

The first thing to understand is the difference between a "file format" and a "codec." A file format is nothing more than a container to store data. A file format contains additional information about the movie (such as its size, the frame rate, etc.) but the file format does not typically specify that the audio or video must be encoded in a specific way. A codec, on the other hand, is a way of encoding audio or video data. How a piece of data is encoded is (typically) not dependent on the file format it's stored in.

As an example, consider an AVI file. An AVI file can store audio and video encoded with numerous different encoders. Another way to say it is that an AVI file is a way to store audio and video in a file, but you can encode that audio/video data any way you wish. Encoding audio or video data transforms it from a very high-bandwidth stream of raw data into a stream of compressed data. While you could do anything with a stream of compressed data, it is typically stored in a common file format such as AVI or QuickTime.

The following table lists the media file formats that BeOS supports:

File Format Name     Type of Data Supported
------------------   ----------------------
AVI                  audio, video
QuickTime            audio, video, misc.
MPEG system stream   audio, video, misc.
DV stream            audio, video
AIFF                 audio
8SVX                 audio
WAV                  audio
AVR                  audio
AU                   audio

There are many other file formats (such as FLI/FLC, ASF, etc.) but currently the BeOS does not support them.

Most of the file formats listed in the table (MPEG being the main exception) share a similar structure, although each one uses its own terminology to describe the specifics. The basic idea is that a file is made up of "chunks" of data, and if you don't understand a chunk you can skip over it. This is typically done with a 4-byte identifier that describes the type of chunk and a 4-byte (or sometimes 8-byte) size. A diagram helps to illustrate:

file offset   data
------------+---------------------+
          0 |  0x000012b8         |
------------+---------------------+
          4 | 'moov' (0x6d6f6f76) |
------------+---------------------+

This represents the first 8 bytes of a QuickTime file. QuickTime stores its chunks with the size first and then the identifier (and as one would expect, AVI does it the opposite way). The identifier is known as a "four-character code" or "fourcc." A fourcc is often a mnemonic or clever name that makes it easy to identify when looking at a hex-dump of a file. In this example the ASCII characters 'moov' have the value 0x6d6f6f76 when treated as a 4-byte integer (on a big-endian system).

This "moov" chunk is 0x12b8 bytes long (4792 in decimal). Because we know the size, even if we don't know anything about a moov chunk we can correctly skip over it and parse the next item (of course, if you skip the moov chunk you won't be able to do very much!).
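The skip-what-you-don't-understand rule is easy to express in code. Below is a minimal C++ sketch that walks the top-level chunks (QuickTime calls them "atoms") of a file already loaded into memory. The helper names ReadBE32, AtomInfo, and ListTopLevelAtoms are hypothetical, not part of any Be API. The sketch assumes every atom uses the plain 32-bit size shown above; QuickTime's special size values (0, and 1, which announces a 64-bit size) are deliberately not handled. Note that a QuickTime atom's size counts its own 8-byte header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One top-level QuickTime atom: its fourcc and where it sits in the buffer.
struct AtomInfo {
    std::string type;   // e.g. "moov", "mdat"
    size_t offset;      // offset of the atom's 8-byte header
    uint32_t size;      // total size, including the header
};

// Read a 32-bit big-endian integer (QuickTime stores sizes this way).
static uint32_t ReadBE32(const uint8_t *p) {
    return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
           (uint32_t(p[2]) << 8) | uint32_t(p[3]);
}

// Walk the top-level atoms: size first, then the fourcc. Atoms are skipped
// purely by their size, whether or not we understand them.
// Stops at the first atom that is malformed or runs past the end of data.
std::vector<AtomInfo> ListTopLevelAtoms(const std::vector<uint8_t> &file) {
    std::vector<AtomInfo> atoms;
    size_t pos = 0;
    while (file.size() - pos >= 8) {
        uint32_t size = ReadBE32(&file[pos]);
        // Sizes below 8 (including the special values 0 and 1) are not
        // handled in this sketch.
        if (size < 8 || size > file.size() - pos)
            break;
        std::string type(reinterpret_cast<const char *>(&file[pos + 4]), 4);
        atoms.push_back({type, pos, size});
        pos += size;   // skip to the next atom
    }
    return atoms;
}
```

Because unknown atoms are skipped by size alone, the same loop copes with files containing atom types it has never heard of.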

The concept of a chunked file, with an identifier and size for each chunk, is a simple but good way to store data that needs to be parsed by many different types of programs. Having well-identified chunks and the ability to skip over them if they are unknown makes for a robust file format.

In addition to a linear series of chunks in a file, both QuickTime and AVI allow a chunk to contain other nested chunks. That is, certain chunk identifiers indicate that the chunk is a container that has other chunks within it. The size of a container chunk covers all the chunks inside it. For example, an AVI file may contain a 'LIST' chunk that contains other chunks. The size of the LIST chunk would allow you to skip over it if you wanted. An example in a QuickTime file would be the 'trak' chunk, which contains many other chunks that describe the media track.
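AVI files are RIFF files, which use the opposite layout: fourcc first, then a little-endian size that does not include the 8-byte chunk header, with odd-sized chunks padded to an even length. The following C++ sketch (WalkRiff and ReadLE32 are hypothetical names) recurses into 'RIFF' and 'LIST' containers, whose data begins with a 4-byte list type such as 'hdrl', and records each chunk it meets together with its nesting depth.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Read a 32-bit little-endian integer (RIFF/AVI sizes are little-endian).
static uint32_t ReadLE32(const uint8_t *p) {
    return uint32_t(p[0]) | (uint32_t(p[1]) << 8) |
           (uint32_t(p[2]) << 16) | (uint32_t(p[3]) << 24);
}

// Walk the RIFF chunks in data[begin, end), appending "depth:fourcc" to out.
// Container chunks ('RIFF' and 'LIST') hold a 4-byte list type followed by
// nested chunks, so we recurse into them; anything else is simply skipped.
void WalkRiff(const std::vector<uint8_t> &data, size_t begin, size_t end,
              int depth, std::vector<std::string> &out) {
    size_t pos = begin;
    while (pos + 8 <= end) {
        std::string id(reinterpret_cast<const char *>(&data[pos]), 4);
        uint32_t size = ReadLE32(&data[pos + 4]);   // excludes the header
        size_t body = pos + 8;
        if (size > end - body)
            break;                                  // truncated or corrupt
        out.push_back(std::to_string(depth) + ":" + id);
        if ((id == "RIFF" || id == "LIST") && size >= 4)
            WalkRiff(data, body + 4, body + size, depth + 1, out);
        pos = body + size + (size & 1);             // data plus pad byte
    }
}
```

Skipping a container is the same operation as skipping any other chunk: jump over its size. Only a parser that cares about its contents needs to recurse.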

To get a better feel for the layout of a file format I suggest that you fire up DiskProbe and scan through some media files. You'll quickly see how they're laid out. It's also helpful to have the file format spec handy for serious perusal (a good source of specs is http://www.wotsit.org/, or just fire off a search on http://www.google.com/).

The Other File Format (MPEG)

The MPEG file format is actually not one but three formats. You can have a raw video stream, a raw audio stream, or a "system" stream (which mixes multiple channels of audio and video together). The MPEG file formats are designed to be a continuous stream that you can start receiving and begin decoding after you find a sync token. With streaming (or broadcasting if you prefer) as a major design goal, MPEG does not have many of the features that other media file formats have.

The different MPEG file formats all have some common deficiencies. The first is that none of them has a good header identifier that is even moderately unique. Instead, MPEG files have header identifiers like 0x000001ba (system stream), 0x000001b3 (video stream), and the 12-bit sync word 0xfff (audio stream). These values are atrocious identifiers because they are far too easy to encounter in chunks of non-MPEG data.

Next, the structure of MPEG streams is tightly bit-packed and does not lend itself well to easy parsing. For example, to extract the clock value (the 33-bit system clock reference) from the pack header of a system stream, you have to do this:

// stream points at the five bytes that follow the 0x000001ba pack start
// code; the result counts ticks of a 90 kHz clock, not microseconds
t = ((((bigtime_t)(stream[0] & 0x0e)) << 29) |
     (((bigtime_t)(stream[1] & 0xff)) << 22) |
     (((bigtime_t)(stream[2] & 0xfe)) << 14) |
     (((bigtime_t)(stream[3] & 0xff)) << 7)  |
     (((bigtime_t)(stream[4] & 0xfe)) >> 1));

As Dave Bort would say: Crazy.
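For anyone who wants to check the bit-twiddling, here is a self-contained sketch of the pack-header clock extraction together with its inverse, so the two can be round-tripped. The bit layout in the comment follows the MPEG-1 systems syntax; verify it against the specification before relying on it. The function names are hypothetical. Because the clock runs at 90 kHz, a conversion to microseconds (the unit of BeOS's bigtime_t) is included.

```cpp
#include <cassert>
#include <cstdint>

// Decode the 33-bit system clock reference (SCR) from the 5 bytes that
// follow an MPEG-1 pack start code (0x000001ba):
//   byte 0: 0010 SCR[32..30] 1
//   byte 1: SCR[29..22]
//   byte 2: SCR[21..15] 1
//   byte 3: SCR[14..7]
//   byte 4: SCR[6..0] 1
// The trailing 1 bits are marker bits and must be masked off.
uint64_t DecodeScr(const uint8_t *p) {
    return (uint64_t(p[0] & 0x0e) << 29) |
           (uint64_t(p[1])        << 22) |
           (uint64_t(p[2] & 0xfe) << 14) |
           (uint64_t(p[3])        << 7)  |
           (uint64_t(p[4] & 0xfe) >> 1);
}

// Inverse of DecodeScr: pack a 33-bit value with the '0010' prefix and the
// marker bits set. Handy for testing the decoder.
void EncodeScr(uint64_t scr, uint8_t *p) {
    p[0] = uint8_t(0x20 | ((scr >> 29) & 0x0e) | 1);
    p[1] = uint8_t(scr >> 22);
    p[2] = uint8_t(((scr >> 14) & 0xfe) | 1);
    p[3] = uint8_t(scr >> 7);
    p[4] = uint8_t(((scr << 1) & 0xfe) | 1);
}

// Convert 90 kHz clock ticks to microseconds (1,000,000 / 90,000 = 100 / 9).
int64_t ScrToMicroseconds(uint64_t scr) {
    return int64_t(scr * 100 / 9);
}
```

A round trip through EncodeScr and DecodeScr returning the original value is a quick way to catch an off-by-one shift, which is easy to make with this layout.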

Another area that poses big problems when dealing with MPEG is that a lot of MPEG data is not well formatted. There's what the spec says and then there is what's done. This is the kind of problem you come to expect when dealing with data files generated by lots of programs, but MPEG seems to have the problem in spades. The highly varied interpretations of what the spec says make trying to parse MPEG files an interesting job.

Getting beyond my childlike whining, MPEG does have a notion of chunks of data, although the size of each chunk is not usually specified as a 4-byte value in the stream of data. Instead, the size is either implied or determined by continuing to read the data until you find the start of the next chunk.
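Because MPEG chunks carry no size field, a parser has to find each 00 00 01 start-code prefix and infer a chunk's length from where the next one begins. A simplified C++ sketch follows (MpegUnit and SplitAtStartCodes are hypothetical names). It treats any 00 00 01 byte sequence as a start code, which also shows why such weak identifiers are so easy to trip over in non-MPEG data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One unit found by scanning: where its start code begins, which code it is
// (the byte after the 00 00 01 prefix), and its implied length.
struct MpegUnit {
    size_t offset;
    uint8_t code;     // e.g. 0xba = pack header, 0xb3 = video sequence header
    size_t length;    // distance to the next start code (or end of data)
};

// Find every start code; each unit's length is implied by the position of
// the following start code, since MPEG does not store it.
std::vector<MpegUnit> SplitAtStartCodes(const std::vector<uint8_t> &data) {
    std::vector<MpegUnit> units;
    for (size_t i = 0; i + 3 < data.size(); ++i) {
        if (data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1) {
            if (!units.empty())
                units.back().length = i - units.back().offset;
            units.push_back({i, data[i + 3], 0});
            i += 3;   // continue scanning after this start code
        }
    }
    if (!units.empty())
        units.back().length = data.size() - units.back().offset;
    return units;
}
```

Note that a unit's length is only known once the next start code has been found, which is part of why MPEG is awkward to index without reading through the data.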

The inherent streaming nature of MPEG means that there are no global indices of file positions to frames or to where the sync points are in the file. This complicates life for BeOS because a program often wants to know how long a media file is so it can display a properly sized slider or progress bar. To make that work, BMediaFile pre-parses the MPEG video stream so that it knows where sync points are in the file and roughly how many frames there are. The disadvantage of doing this is that it can take a long time to read through all the data first before decoding any of the frames. Our current approach of pre-parsing the entire MPEG stream is not something we're happy with and we plan to rewrite it.

From a media programmer's standpoint the important lesson here is that the number of frames returned from BMediaTrack->CountFrames() is only an approximate number. BMediaTrack->Duration() is also only an approximate value. The true frame count and duration may be more or less than reported because of the imprecise nature of file formats like MPEG. Therefore, when you write a piece of code to process all the frames in a file you should write the loop this way:

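while (true) {
    int64 frameCount;
    media_header mh;
    // buffer must be big enough for the track's decoded output format
    status_t err = track->ReadFrames(buffer, &frameCount, &mh);
    if (err == B_LAST_BUFFER_ERROR)      // all done!
        break;
    else if (err != B_OK) {              // handle other errors
        ....
        break;
    }
    // render the frameCount frames now in buffer...
}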

or some variant of that approach (a good example is what the tcode program does). You should NEVER code a for loop that counts up to the number of frames returned by CountFrames().

Another interesting twist is that in QuickTime and some AVI files a single frame may be displayed for longer than one frame period. For example, a frame may stay on screen for several seconds instead of having its data repeated in the file. Therefore, if you need to display a scroll bar or slider, the Duration() of the track is a better estimate of how long the track really is than CountFrames(). The correct thing to do is to use Duration() as an estimate and dynamically adjust scrollbars and sliders until you get a B_LAST_BUFFER_ERROR.

It's important to remember that CountFrames() and Duration() are not guaranteed to be precise and thus should NOT be used as an upper limit in a for or while loop. This imprecision is certainly frustrating for programmers but it's a fact of life that needs to be dealt with.
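To make the difference concrete, here is a small, self-contained simulation. FakeTrack, kOk, and kLastBufferError are hypothetical stand-ins for BMediaTrack, B_OK, and B_LAST_BUFFER_ERROR (the real BeOS constants have different values), and the fake ReadFrames() is simplified to hand out one frame per call. Its CountFrames() deliberately disagrees with the number of frames it actually delivers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins so the sketch compiles without BeOS headers.
// These are NOT the real BeOS values.
typedef int32_t status_t;
const status_t kOk = 0;
const status_t kLastBufferError = -1;   // plays the role of B_LAST_BUFFER_ERROR

// A fake track whose frame-count estimate is wrong, as can happen with MPEG.
class FakeTrack {
public:
    FakeTrack(int64_t actual, int64_t estimate)
        : fActual(actual), fEstimate(estimate), fNext(0) {}

    int64_t CountFrames() const { return fEstimate; }

    // Delivers one frame per call until the real data runs out.
    status_t ReadFrames(int64_t *frameCount) {
        if (fNext >= fActual) {
            *frameCount = 0;
            return kLastBufferError;
        }
        ++fNext;
        *frameCount = 1;
        return kOk;
    }

private:
    int64_t fActual, fEstimate, fNext;
};

// The recommended loop: stop on the end-of-data error, never on CountFrames().
int64_t ProcessAllFrames(FakeTrack &track) {
    int64_t processed = 0;
    while (true) {
        int64_t count = 0;
        status_t err = track.ReadFrames(&count);
        if (err == kLastBufferError)   // all done!
            break;
        if (err != kOk)                // any other error: give up
            break;
        processed += count;            // "render" the frames
    }
    return processed;
}
```

Whether the estimate is too high or too low, the loop processes exactly the frames that exist. A for loop bounded by CountFrames() would either stop early or run past the end.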

From Chunks To Tracks

Now that we understand the concept of chunks, it's time to talk about what's in some of those chunks. In QuickTime and AVI there is a notion of a media track that contains audio or video data. In a QuickTime file the 'trak' chunk indicates a media track. In AVI there is the 'avih' chunk, which contains global information about the entire movie, and a 'strl' list for each stream whose 'strh' header identifies the stream type ('vids' for video, 'auds' for audio) and describes the track.

The QuickTime file format is much more flexible than AVI and allows any number of tracks, each with its own frame rate and data type. QuickTime tracks have extensive information about the presentation of the track's contents. Video tracks even have a 3x3 transformation matrix that can be applied to the video before compositing it into the final presentation!

Audio tracks in all file formats have the necessary information for playing the audio: sample rate; size of the samples (8 or 16 bit); mono, stereo, or multichannel, etc.

The track header also specifies how the track's data is encoded. For example, Cinepak-encoded data is identified with a fourcc of 'cvid', and Indeo-5 encoded data with a fourcc of 'iv50'. You can find a very complete list of fourcc codes used in AVI files at:

http://www.fourcc.org/
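Fourcc values are easiest to handle as 32-bit integers. Here is a small sketch, with hypothetical helper names, that packs four characters with the first one in the most significant byte. That reproduces the 'moov' == 0x6d6f6f76 value given earlier. Windows code commonly packs fourccs the other way round (first character in the least significant byte), so compare raw bytes when mixing the two conventions.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Pack four characters into a 32-bit value, first character in the most
// significant byte (the value the bytes have when read as big-endian).
constexpr uint32_t MakeFourCC(char a, char b, char c, char d) {
    return (uint32_t(uint8_t(a)) << 24) | (uint32_t(uint8_t(b)) << 16) |
           (uint32_t(uint8_t(c)) << 8) | uint32_t(uint8_t(d));
}

// Turn a packed fourcc back into text, e.g. for debugging output.
std::string FourCCToString(uint32_t code) {
    std::string s;
    for (int shift = 24; shift >= 0; shift -= 8)
        s += char((code >> shift) & 0xff);
    return s;
}
```

For example, MakeFourCC('c', 'v', 'i', 'd') gives 0x63766964, the integer a codec lookup table might be keyed on.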

Audio track information usually uses a single integer to identify which codec to use. In a WAV file, for example, raw audio is indicated by the integer value 1, and MS-ADPCM (Microsoft's Adaptive Differential Pulse Code Modulation) is identified by a 2. AVI uses the same identifiers as WAV. QuickTime, of course, uses its own set of fourccs.

Another good reference for AVI/WAV file codec id's is RFC-2361:

http://www.faqs.org/rfcs/rfc2361.html
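In a WAV file the codec identifier is the first field of the 'fmt ' chunk, stored as a 16-bit little-endian value. A minimal sketch with hypothetical helper names; it lists only a few well-known identifiers, and RFC 2361 has the full registry.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Read the 16-bit little-endian format tag at the start of a 'fmt ' chunk body.
uint16_t ReadFormatTag(const uint8_t *fmtBody) {
    return uint16_t(fmtBody[0] | (fmtBody[1] << 8));
}

// Map a WAV/AVI audio format tag to a readable name (a small subset only).
std::string AudioCodecName(uint16_t formatTag) {
    switch (formatTag) {
        case 0x0001: return "PCM (raw)";
        case 0x0002: return "MS-ADPCM";
        case 0x0007: return "mu-law";
        case 0x0011: return "IMA ADPCM";
        case 0x0055: return "MPEG-1 Layer 3";
        default:     return "unknown";
    }
}
```

A real reader would map the tag to a decoder rather than a string, but the lookup is the same idea.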

In addition to the information about a track, a file format will also have information about where in the file the data can be found. Typically this takes the form of a table that indexes frames and file positions. Using this information, you can extract the stream of data that makes up a particular track. In the abstract, most media files look something like this:

+--------+------------+------------+-----+------------+------
| header |   track 1  |   track 2  | ... |  track 1   | ...
|  info  | data chunk | data chunk |     | data chunk |
+--------+------------+------------+-----+------------+------

What you effectively have are interleaved chunks of data for each track. In practice, you usually have a single audio track and a single video track. Because the data rate of audio is far lower than that of video, you often have a single audio chunk followed by several video chunks, followed by another audio chunk, and so on.

If we were to split out the data chunks for each track and put them together, we would have a contiguous stream of data for each track. Again, it's important to remember that the file format doesn't care what the data is in a track or how it's encoded, it only cares about the ordering of the chunks of data and what frame (or frames) they correspond to.
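Here is a sketch of that splitting step, using a simplified, hypothetical index in which each entry names a track plus the position and size of one data chunk. The bytes are copied untouched; only their order and location come from the file format.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One entry of a simplified, hypothetical index: which track a data chunk
// belongs to and where it lives in the file.
struct IndexEntry {
    int track;
    size_t offset;
    size_t size;
};

// Gather each track's chunks, in index order, into one contiguous stream.
// The contents are not interpreted; that is the decoder's job.
std::map<int, std::string> SplitTracks(const std::string &file,
                                       const std::vector<IndexEntry> &index) {
    std::map<int, std::string> streams;
    for (const IndexEntry &e : index)
        streams[e.track] += file.substr(e.offset, e.size);
    return streams;
}
```

With a header followed by alternating audio and video chunks, each track comes out as a single stream ready to be handed to its decoder.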

From Tracks To Your Eyes And Ears

Now that we know what a track is, let's discuss how we get from the bits of data that belong to a track to something that we see on screen (or hear). If you're programming on BeOS you simply use the BMediaTrack object to get access to the data that makes up a track. The BMediaTrack object will let you access the decoded data for the track with ReadFrames(), or you can choose to get the encoded data with ReadChunk().

What's going on behind the scenes is that when you want to decode a frame from a media file, you have to first look up in the index where that frame's data lives on disk. Once you know that, you can seek there, read the encoded data into memory and hand it off to a decoder. This process (and doing it efficiently) is the bulk of the work that BMediaFile and BMediaTrack do.
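In code, that look-up, seek, and read step might look like the sketch below. The FrameLocation index layout (one offset and size per frame, roughly what an AVI index provides) and the function name are hypothetical, and a std::istream stands in for the media file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical AVI-style index entry: where one frame's encoded data lives.
struct FrameLocation {
    uint64_t offset;
    uint32_t size;
};

// Look up a frame in the index, seek to it, and read its encoded bytes into
// 'out'. Returns false if the frame is out of range or the read comes up
// short. The bytes would next be handed to a decoder.
bool ReadEncodedFrame(std::istream &in, const std::vector<FrameLocation> &index,
                      size_t frame, std::string &out) {
    if (frame >= index.size())
        return false;
    const FrameLocation &loc = index[frame];
    in.clear();   // reset any earlier EOF state before seeking
    in.seekg(static_cast<std::streamoff>(loc.offset));
    out.resize(loc.size);
    in.read(&out[0], loc.size);
    return in.gcount() == static_cast<std::streamsize>(loc.size);
}
```

Frames can be fetched in any order, which is what makes a file with a complete index easy to seek in.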

The index that maps frames to on-disk locations can be very simple, as it is in AVI, or much more complex, as it is in QuickTime. In QuickTime each frame can have a different duration, can appear anywhere in the file, and QuickTime index/mapping tables support complex arrangements of the data (presumably to allow it to be optimized for slow devices such as CD-ROMs). In most cases the QuickTime approach is massive overkill. Aside from the ability to change the duration of individual frames, the extra functionality that QuickTime offers is almost never used.
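QuickTime stores per-frame durations compactly as runs of (frame count, duration) pairs. Below is a sketch that finds the frame on screen at a given media time under that scheme. The struct and function names are hypothetical, and times are in the track's own time units. A frame held for a long time is simply a run of length 1 with a large duration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// A run of consecutive frames that all share the same duration.
struct DurationRun {
    uint32_t count;
    uint32_t duration;   // in track time units
};

// Return the frame being displayed at 'time', or -1 if 'time' is past the
// end. Walks the runs, subtracting each run's total length until 'time'
// falls inside one.
int64_t FrameAtTime(const std::vector<DurationRun> &runs, uint64_t time) {
    int64_t firstFrame = 0;
    for (const DurationRun &r : runs) {
        uint64_t runLength = uint64_t(r.count) * r.duration;
        if (time < runLength)
            return firstFrame + int64_t(time / r.duration);
        time -= runLength;
        firstFrame += r.count;
    }
    return -1;
}
```

With runs of {10 frames x 100}, {1 frame x 3000}, {5 frames x 100}, frame 10 stays on screen from time 1000 to 3999, and the file needs to store its data only once.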

Decoders

Above we described conceptually how we get the encoded data out of a file format. Once we've retrieved the encoded data for a particular frame, we hand it off to a decoder. The decoder's job is to convert that encoded data into a decoded format that the client (the person calling BMediaTrack->ReadFrames()) can deal with. Decoders for most simple video compression formats read the chunk, decode it into the output buffer, and are done. More complex codecs have state information that they must maintain from frame to frame. Indeo-5 and MPEG are two examples of encodings that require lots of state information to decode a frame.

Let's now delve into a few more details about different video and audio codecs. This is the list of encoded media formats supported by the upcoming BeOS Release 5. There are, of course, many more encodings out there—these are just the ones we support:

Encoded Video Format   Comment
--------------------   ------------------------------------------------
Cinepak                expensive to encode; fast to decode
PJPEG/MJPEG            decent quality; common hardware capture format
MPEG-1                 good quality; very widespread
Indeo-5                decent quality; can encode in real time
DV                     constant bitrate; FireWire/i.Link video format
Apple Video            not widely used; low quality
MS Video               not widely used; low quality
MS RLE                 not widely used; low quality
raw                    heavy bandwidth requirements; perfect quality

Encoded Audio Format   Comment
--------------------   ------------------------------------------------
MS-ADPCM               compresses 16-bit audio into 4-bit; good quality
CCITT-ADPCM            compresses 16-bit audio into 4-bit; good quality
ima4                   compresses 16-bit audio into 4-bit; good quality
ulaw                   compresses 12-bit audio into 8-bit; OK quality
raw                    perfect quality
MPEG-1 layers 1, 2, 3  great compression (10:1); insanely popular

Video Codec Background

Before we jump into the details of each video codec, let's step back for a second and discuss some general terminology. A video codec encodes data using an algorithm to transform an uncompressed frame or field of video. The algorithm may operate on each frame independently or it may require information about previous frames. If an encoder encodes each frame independently, then when you want to decode a frame you can do it without having to look at any other frames. In this situation each frame is treated as a keyframe (i.e., it can be decoded completely without requiring other data) and, therefore, you can seek to any frame in the file (aka a perfectly seekable file).

If the encoding algorithm requires information about previous frames to encode the current frame, then when decoding the data you must first decode the prior frames leading up to the frame you want. This type of encoder will output a "keyframe" every so often as a sync point in the file. Remember: a keyframe can be decoded independently of any other frames. So, for example, the output of an encoder could be this: 1 keyframe, 10 delta frames, 1 keyframe, 10 delta frames, etc. That means that you can't seek to any arbitrary frame; you have to seek to a keyframe and then play forward to reach the frame you want.
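The keyframe-plus-deltas idea can be shown with a deliberately toy codec (not any real video format): a keyframe stores every pixel value, and a delta frame stores per-pixel differences from the previous frame. To decode frame N, the sketch backs up to the closest keyframe at or before N and applies the deltas forward. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy encoded frame: a keyframe holds the whole picture; a delta frame holds
// per-pixel differences from the previous decoded frame.
struct EncodedFrame {
    bool keyframe;
    std::vector<int> data;
};

// Decode frame 'target': find the nearest keyframe at or before it, then
// apply every delta from there forward. Assumes frame 0 is a keyframe.
std::vector<int> DecodeFrame(const std::vector<EncodedFrame> &frames,
                             size_t target) {
    size_t start = target;
    while (start > 0 && !frames[start].keyframe)
        --start;                                   // back up to a sync point
    std::vector<int> picture = frames[start].data;
    for (size_t i = start + 1; i <= target; ++i)   // play forward
        for (size_t p = 0; p < picture.size(); ++p)
            picture[p] += frames[i].data[p];
    return picture;
}
```

The cost of reaching a frame grows with its distance from the previous keyframe, which is the trade-off encoders make when choosing how often to emit keyframes.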

Sophisticated encoding algorithms typically emit a keyframe every 10-20 frames and encode the frames in between as deltas, taking advantage of the temporal coherence between successive frames of video. This is why you cannot seek to an arbitrary position in most video files. BMediaFile has support for seeking to the closest keyframe (ahead of or behind) to the frame you wanted. This feature did not work very well in Release 4.5 but will work much better in Release 5. If you require seeking to the exact frame you requested, seek to the keyframe before it and then call ReadFrames() repeatedly until you get to the frame you want.
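Given a sorted list of keyframe positions, finding the closest keyframe in either direction is a binary search. The sketch below uses hypothetical helper names (this is not the BMediaTrack API) and also computes how many frames must be read and discarded after a backward seek to land on the exact frame.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Closest keyframe at or before 'frame'; -1 if there is none.
int64_t KeyframeBackward(const std::vector<int64_t> &keys, int64_t frame) {
    auto it = std::upper_bound(keys.begin(), keys.end(), frame);
    return it == keys.begin() ? -1 : *(it - 1);
}

// Closest keyframe at or after 'frame'; -1 if there is none.
int64_t KeyframeForward(const std::vector<int64_t> &keys, int64_t frame) {
    auto it = std::lower_bound(keys.begin(), keys.end(), frame);
    return it == keys.end() ? -1 : *it;
}

// Frames to read and discard after seeking backward to reach 'frame' exactly.
int64_t FramesToSkip(const std::vector<int64_t> &keys, int64_t frame) {
    int64_t key = KeyframeBackward(keys, frame);
    return key < 0 ? -1 : frame - key;
}
```

With keyframes at 0, 15, 30, and 45, asking for frame 35 lands on frame 30 going backward (or 45 going forward), and five frames must then be read to reach 35.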

Video Codec Descriptions

Video codecs each have different properties that make them more or less suited to different tasks. Cinepak, for example, is definitely not a real-time capture type of encoder. It can take several seconds to encode a frame. A stream of Cinepak data always starts with a keyframe and is followed by delta frames and then usually another keyframe (then the cycle repeats). Some Cinepak-encoded videos have only a single keyframe, the first frame; everything else is deltas. Decoding Cinepak data, though, takes very little time, making it a good playback format (for lower-end machines, etc.).

PJPEG (aka Photo-JPEG) encodes each frame as a JPEG image. This works reasonably well but does not take advantage of the temporal coherence in video. You can find fast (real time, even) PJPEG encoders (such as the one that ships with personalStudio). MJPEG is JPEG encoding of fields of video (remember, though, that fields are not the same as frames: in interlaced video each field holds only every other line of the picture). PJPEG is a good format for editing, because each frame is independent (i.e., a keyframe), so perfect seeking is possible.

One (simplistic) way to describe MPEG-1 is to think of it as PJPEG frames interspersed with deltas between the PJPEG frames. MPEG-1 encodes frames into one of three types: I-frames, P-frames, or B-frames (there are also D-frames, but these are extremely rare). I-frames are keyframes, P-frames are "predicted" from an earlier I- or P-frame, and B-frames are bi-directional frames, predicted from frames on both sides. You can only seek to I-frames (and if an MPEG-1 file had only I-frames it would be roughly equivalent to a PJPEG file).

Indeo-5 encoding is similar to MPEG in that it uses temporal coherence to improve the compression rate, but there isn't a lot of information about the specifics of Indeo-5 encoding. Indeo-5 encoding can be done in real time if you have the right encoder (not currently on BeOS). The quality of Indeo-5 is decent, although not as good as a good MPEG-1 encoder. Indeo-5 is a good distribution format but not the best as an editing format.

DV (Digital Video) encoding is the encoding used by DV cameras (duh!) that communicate with a computer over IEEE 1394 (aka Sony i.Link, aka FireWire). DV is a good encoding: it's high quality; it has a constant bit-rate (roughly 3.6 megabytes/sec); and it can be decoded in real time in software. DV is also a good format for editing because every frame is independent.

The Apple Video, MS Video, and MS RLE formats are all old video encodings that don't have good quality compared to the newer codecs. Essentially they're all variants of run-length encoding.

Raw video isn't really an "encoding" but it is an option for storing video if you have the disk bandwidth. Typically, you can capture raw 320x240 video to a standard IDE hard disk. Capturing 640x480 video at 16 bits per pixel and 30 frames per second, however, requires about 17.6 megabytes/second (640 x 480 x 2 bytes x 30 = 18,432,000 bytes/second). That means that you need either a super-fast single drive or a striped disk setup. You can store raw video in a variety of pixel formats (RGB-32, RGB-16, YUV-422, etc.). Both AVI and QuickTime support raw video; MPEG does not. Raw video is perfectly seekable and is the best format for editing if you can handle it.
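The arithmetic behind those figures, as a tiny sketch with hypothetical helper names. Here "megabytes" means 2^20 bytes, and the frame rate is an explicit parameter because it changes the answer directly.

```cpp
#include <cassert>
#include <cstdint>

// Bytes per second needed to store uncompressed video.
uint64_t RawVideoBytesPerSecond(uint32_t width, uint32_t height,
                                uint32_t bytesPerPixel, uint32_t framesPerSecond) {
    return uint64_t(width) * height * bytesPerPixel * framesPerSecond;
}

// Convert bytes to megabytes (2^20 bytes).
double ToMegabytes(uint64_t bytes) {
    return double(bytes) / (1024.0 * 1024.0);
}
```

At 320x240 with the same pixel depth and frame rate the requirement drops to a quarter, 4,608,000 bytes/second.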

Audio Codec Descriptions

The MS-ADPCM, CCITT-ADPCM and ima4 encoders are all variants of Adaptive Differential Pulse Code Modulation (ADPCM). Essentially, they encode each 16-bit sample as a 4-bit value, a 4:1 reduction; the encoding is lossy, but the quality is good. The encoding process isn't terribly CPU intensive and decoding is quite cheap. You probably wouldn't want to use these formats for editing but for final distribution they work well.
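Here is a compact sketch of IMA ADPCM, the algorithm underneath ima4. QuickTime's block framing and per-block headers are omitted, and the function names are hypothetical. MS-ADPCM uses a different predictor, so this is not a drop-in MS-ADPCM codec. Each 16-bit sample becomes a 4-bit code (a sign bit plus three magnitude bits measured in units of the current step size), and the step size adapts so the codes track loud and quiet passages alike.

```cpp
#include <cassert>
#include <cstdint>

// IMA ADPCM step sizes and step-index adjustments.
static const int kStepTable[89] = {
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17,
    19, 21, 23, 25, 28, 31, 34, 37, 41, 45,
    50, 55, 60, 66, 73, 80, 88, 97, 107, 118,
    130, 143, 157, 173, 190, 209, 230, 253, 279, 307,
    337, 371, 408, 449, 494, 544, 598, 658, 724, 796,
    876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066,
    2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358,
    5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767};
static const int kIndexTable[16] = {
    -1, -1, -1, -1, 2, 4, 6, 8, -1, -1, -1, -1, 2, 4, 6, 8};

// State shared (and updated identically) by encoder and decoder.
struct AdpcmState {
    int predictor = 0;   // last reconstructed sample
    int index = 0;       // position in kStepTable
};

// Apply one 4-bit code to the state; returns the reconstructed sample.
static int16_t ApplyNibble(AdpcmState &s, uint8_t nibble) {
    int step = kStepTable[s.index];
    int diff = step >> 3;
    if (nibble & 4) diff += step;
    if (nibble & 2) diff += step >> 1;
    if (nibble & 1) diff += step >> 2;
    s.predictor += (nibble & 8) ? -diff : diff;
    if (s.predictor > 32767) s.predictor = 32767;
    if (s.predictor < -32768) s.predictor = -32768;
    s.index += kIndexTable[nibble];
    if (s.index < 0) s.index = 0;
    if (s.index > 88) s.index = 88;
    return int16_t(s.predictor);
}

// Encode one 16-bit sample as a 4-bit code.
uint8_t EncodeSample(AdpcmState &s, int16_t sample) {
    int step = kStepTable[s.index];
    int diff = sample - s.predictor;
    uint8_t nibble = 0;
    if (diff < 0) { nibble = 8; diff = -diff; }
    if (diff >= step) { nibble |= 4; diff -= step; }
    if (diff >= (step >> 1)) { nibble |= 2; diff -= step >> 1; }
    if (diff >= (step >> 2)) { nibble |= 1; }
    ApplyNibble(s, nibble);   // track exactly what the decoder will see
    return nibble;
}

// Decode one 4-bit code back to a 16-bit sample.
int16_t DecodeSample(AdpcmState &s, uint8_t nibble) {
    return ApplyNibble(s, nibble & 0x0f);
}
```

The decoder only ever sees the codes, so the encoder runs the same state update on its side. That keeps the two predictors identical, and it is also why decoding costs little more than a table lookup and a few adds per sample.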

uLaw encoding compresses 12-bit audio into an 8-bit quantity. The quality is usually poor (phone-line quality) but encoding and decoding are extremely cheap. Old timers may remember that the original SPARCstation 1 had a /dev/audio that spat out (and accepted) uLaw-encoded data. And before I get flamed, yes, the NeXT Cube did it too. BeOS only supports decoding this format.
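Decoding uLaw really is cheap: each byte expands to a 16-bit sample with a few bit operations. The sketch below follows the widely used G.711 reference formulation, and the function name is hypothetical. The stored byte is inverted and holds a sign bit, a 3-bit segment (exponent), and a 4-bit mantissa.

```cpp
#include <cassert>
#include <cstdint>

// Expand one 8-bit uLaw byte to a 16-bit linear sample.
int16_t MuLawToLinear(uint8_t u) {
    u = uint8_t(~u);                              // bytes are stored inverted
    const int kBias = 0x84;                       // 132
    int magnitude = ((u & 0x0f) << 3) + kBias;    // mantissa plus bias
    magnitude <<= (u & 0x70) >> 4;                // scale by the segment
    return int16_t((u & 0x80) ? (kBias - magnitude) : (magnitude - kBias));
}
```

In practice a decoder would fill a 256-entry table from this function once and then decode each byte with a single lookup.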

MPEG-1 audio is a sophisticated encoding scheme that uses psycho-acoustic models, DCT transforms, and Huffman encoding to compress audio, typically around 10 to 1. MPEG-1 audio has three layers, called, appropriately enough, layers 1, 2, and 3. MPEG-1 layer 3 audio is commonly known as MP3. There is also an unofficial "MPEG 2.5" extension of layer 3 for low sample rates and bit rates, but it is not terribly common. MPEG-1 audio can be encoded with a wide variety of options (different data rates, stereo/mono, different sample sizes and sample rates). By far the most common form of MPEG-1 audio data is 128 kbps, 44.1 kHz stereo (it's the format that 90% of your pirated MP3s are in). MPEG-1 audio is typically quite expensive to encode. Decoding doesn't take a lot of CPU time relative to video but it's much more than any of the other audio encodings. MPEG-1 audio is a great format to distribute audio in because of its wide acceptance and excellent compression ratio.
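The roughly 10:1 figure is easy to check for the common 128 kbps, 44.1 kHz stereo case. A tiny sketch with hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>

// Bit rate of uncompressed PCM audio.
uint64_t PcmBitsPerSecond(uint32_t sampleRate, uint32_t bitsPerSample,
                          uint32_t channels) {
    return uint64_t(sampleRate) * bitsPerSample * channels;
}

// How many times smaller the encoded stream is than its PCM source.
double CompressionRatio(uint64_t pcmBitsPerSecond, uint64_t encodedBitsPerSecond) {
    return double(pcmBitsPerSecond) / double(encodedBitsPerSecond);
}
```

16-bit stereo at 44.1 kHz is 1,411,200 bits/second (176,400 bytes/second), a little over 11 times the 128,000 bits/second of a typical MP3 stream.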

As with video, raw audio provides perfect quality. Raw audio can be stored at a variety of sample rates (11,025 Hz, 22,050 Hz, 32,000 Hz, 44,100 Hz, 48,000 Hz) and in a variety of sample sizes (8-bit, 16-bit, or even 32-bit). Fortunately, raw audio doesn't have the bandwidth requirements of video, so storing it on a standard hard disk is easy. Editing audio is always best done with raw audio.

Wrapping Up

That about covers our tour of media file formats and codecs. I know that this is a bit short but the subject really is vast and we don't have the space to write all that should be written about it.

The two most important points that I hope people take away from this article are these:

MPEG makes life hard because you can't know exactly how many frames are in a file until you've read them all. Therefore, you must code processing loops to end when they receive the error B_LAST_BUFFER_ERROR (even if you're not planning to deal with MPEG files).
Most video formats (and some audio formats) use keyframes, which makes seeking imprecise. When seeking is imprecise you may ask to seek to frame 35 but only get frame 30.

Developers' Workshop: The BeOS—The Rescue OS

By Daniel Switkin

I recently went hiking with my Nikon digital camera. After capturing a few memorable shots, I headed home to examine my handiwork. I downloaded the images to my hard drive and deleted them from the Compact Flash card. I happened to be viewing one particular image in ArtPaint and in Retouch, a photo manipulator I'm writing. I decided to scale it down to 640x480 from 1600x1200 to post it on the web. I then promptly overwrote my original file by being too quick with my keyboard shortcuts.

So here's the setup: the image was gone from the camera, it was overwritten on disk, undo was off in ArtPaint (these are 8 meg images), and my app doesn't have a save feature yet. The only place the original image existed was in a BBitmap in Retouch. A good challenge. Not to be thwarted, I did the following:

#include <OS.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <File.h>

// Team ID of the app_server, as reported by ps. It can change from
// boot to boot, so check ps before running this.
#define APP_SERVER_TEAM         15
#define APP_IN_TROUBLE          "Retouch"

int main(int argc, char **argv) {
    int32 cookie = 0;
    area_info info;

    // Walk every memory area owned by the app_server team.
    while (get_next_area_info(APP_SERVER_TEAM, &cookie, &info) == B_OK) {
        // Only look at areas created on behalf of the app in trouble.
        if (strstr(info.name, APP_IN_TROUBLE) == NULL)
            continue;

        printf("\nName is %s\nArea ID is %ld\nSize is %lu or 0x%lx\n"
            "Address is %p\n", info.name, (long)info.area,
            (unsigned long)info.size, (unsigned long)info.size,
            info.address);

        // The bitmap data lives in the read-write heap area.
        if (strstr(info.name, "RWHeap") == NULL)
            continue;

        printf("Found RWHeap for %s\n", APP_IN_TROUBLE);

        // Map a read-only view of that area into our own address space.
        void *address = NULL;
        area_id cloned_area = clone_area("Clone", &address, B_ANY_ADDRESS,
            B_READ_AREA, info.area);
        if (cloned_area < 0) {
            printf("Clone failed: %s\n", strerror(cloned_area));
            return 1;
        }

        area_info cloned_info;
        if (get_area_info(cloned_area, &cloned_info) != B_OK) {
            printf("get_area_info failed\n");
            delete_area(cloned_area);
            return 1;
        }

        // Dump the whole area to disk so it can be searched offline.
        BFile file("/boot/home/src/Fun/rescue.rw",
            B_WRITE_ONLY | B_CREATE_FILE);
        if (file.InitCheck() == B_OK) {
            ssize_t written = file.Write(cloned_info.address,
                cloned_info.size);
            printf("Wrote %ld bytes out of %lu bytes\n", (long)written,
                (unsigned long)cloned_info.size);
        } else
            printf("Could not create file\n");

        delete_area(cloned_area);
    }
    return 0;
}

This dumped the entire contents of the read-write area for Retouch out of the App Server into a file. Running ps told me what team number to search for. So now the raw data of my photograph was on disk, which is good, but somewhere in a 46 megabyte file, which is bad. Hmmm.

I fired up Magnify and found the RGB values of the first three pixels in the top left corner. After converting these to hex, adding 0xff for the alpha channel, and writing them out as BGRA (the byte order of a B_RGB32 bitmap in memory), I ran:

hd rescue.rw | grep -1 "a3 a5 94 ff a3 a9 95 ff"

to find every occurrence of the first two pixels. This only turned up a few hits, and the third pixel narrowed it down to one location. I then hacked up the following to write a Targa header, seek into the rescue.rw file, and dump the image contents to my new image:
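The same search can be done without hd and grep. Here is a minimal sketch with hypothetical names. It builds the byte pattern a B_RGB32 bitmap would hold for the known pixels (blue, green, red, then 0xff alpha) and scans an in-memory copy of the dump for every match. Unlike grepping hd output, it cannot miss a match that straddles two lines of the hex dump.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rgb { uint8_t r, g, b; };

// Byte pattern a B_RGB32 bitmap would contain for these pixels:
// blue, green, red, then an opaque alpha byte.
std::vector<uint8_t> ToBgraPattern(const std::vector<Rgb> &pixels) {
    std::vector<uint8_t> pattern;
    for (const Rgb &p : pixels) {
        pattern.push_back(p.b);
        pattern.push_back(p.g);
        pattern.push_back(p.r);
        pattern.push_back(0xff);
    }
    return pattern;
}

// Offsets of every occurrence of 'pattern' in 'dump'.
std::vector<size_t> FindAll(const std::vector<uint8_t> &dump,
                            const std::vector<uint8_t> &pattern) {
    std::vector<size_t> hits;
    if (pattern.empty())
        return hits;
    auto it = dump.begin();
    while ((it = std::search(it, dump.end(), pattern.begin(), pattern.end()))
           != dump.end()) {
        hits.push_back(size_t(it - dump.begin()));
        ++it;   // keep looking after this match
    }
    return hits;
}
```

Since B_RGB32 pixels are 4 bytes each, a real search could also discard hits whose offset (relative to the start of the area) is not a multiple of 4, which helps narrow down candidates like the ones found here.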

#include <File.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

// Offset of the first pixel in rescue.rw, found with hd and grep.
#define LOCATION        0x01d4f368
// 1600 x 1200 pixels x 4 bytes per pixel (B_RGB32).
#define SIZE            7680000

int main(int argc, char **argv) {
    BFile in("/boot/home/src/Fun/rescue.rw", B_READ_ONLY);
    if (in.InitCheck() != B_OK) {
        printf("Could not load file\n");
        return 1;
    }

    // Minimal 18-byte Targa header; multi-byte fields are little endian.
    unsigned char header[18];
    memset(header, 0, 18);
    header[2] = 2;              // image type 2: uncompressed true color
    header[12] = 1600 % 256;    // width, low byte
    header[13] = 1600 / 256;    // width, high byte
    header[14] = 1200 % 256;    // height, low byte
    header[15] = 1200 / 256;    // height, high byte
    header[16] = 32;            // bits per pixel
    header[17] = 0x28;          // 8 alpha bits (0x08), top-left origin (0x20)

    BFile out("/boot/home/src/Fun/rescue.tga",
        B_WRITE_ONLY | B_CREATE_FILE);
    if (out.InitCheck() != B_OK || out.Write(header, 18) != 18) {
        printf("Could not write file\n");
        return 1;
    }
    if (in.Seek(LOCATION, SEEK_SET) != LOCATION) {
        printf("Could not seek to 0x%x\n", LOCATION);
        return 1;
    }

    // Copy the pixel data across in 64 KB pieces.
    const int chunk_size = 1 << 16;
    char *buffer = (char *)malloc(chunk_size);
    if (buffer == NULL) {
        printf("Could not allocate memory\n");
        return 1;
    }

    int remaining = SIZE;
    while (remaining > 0) {
        int size = remaining < chunk_size ? remaining : chunk_size;
        if (in.Read(buffer, size) != size ||
                out.Write(buffer, size) != size) {
            printf("Short read or write with %d bytes left\n", remaining);
            free(buffer);
            return 1;
        }
        remaining -= size;
    }

    free(buffer);
    return 0;
}

Conveniently, true color Targa data is little endian, so I could dump the data directly (this is also why taking a screenshot in BeOS writes a .tga file).

What Now?

By Jean-Louis Gassée

First, two words of apology. One for being unable to answer all the e-mail I've received in recent days at my own jlg@be.com account, or through the info@be.com feed—I get to keep an eye on the daily flow of queries. I hope this column will help address most questions regarding last week's announcement. Second, the timing and manner of disclosure. The SEC frowns upon what is called "selective disclosure," a practice by which a subset of "select" individuals get material information before the market at large. In other words, we have to make sure we disseminate material information via a medium that provides timely and reasonably broad dissemination.

In our case, we use a press release on Business Wire. Perish the thought, but should an enthusiastic CEO discuss product plans in too much detail at some industry conference, a publicly traded company would have to rush out a press release the same day in order to put all buyers and sellers of its stock on the same footing.

But enough of that. Let's go right to the heart of the matter: Why are we shifting resources to Internet appliances, and what does it mean for what is commonly referred to as "the desktop?" Once upon a time, four or five years ago, if memory serves, Vint Cerf, one of the true fathers of the Internet, was on the cover of Boardwatch magazine. The professorial-looking Mr. Cerf proudly modeled a t-shirt bearing a simple motto: "IP on Everything". I remember thinking, right, Coke, with a capital C, has gone to his head, as in pinging the proverbial soft-drink machine in a university dormitory. What's next? IP-enabled refrigerators?

Cut to January 2000 football commercials where the repairman comes to your house for your refrigerator. But it's not on the fritz. Not yet, goes the penetrating answer. The not-so-subtle subtext here is that we've entered the "everything connected" era where, yes, your IP-enabled fridge will report incipient trouble and get it fixed before the contents of the freezer spoil.

We agree, this is the post-PC revolution, a new phase for our industry, when all the objects in our daily lives will be connected to the Internet, with or without wires. At each previous phase, mainframes to minis, minis to personal computers, we've seen tremendous growth in the number of people and devices. We'll see a similar phase change in the post-PC era. Last spring, Michael Dell saw two billion devices connected to the Net by 2002 or 2003, with PCs accounting for 700 million of the number. Since then, most industry analysts have upped the forecast and agreed that Vint Cerf's "IP on Everything" vision was becoming a market reality. "Everything" should probably refer to objects ranging from watches to TVs, from cars to video recorders, and from stereos on Net steroids to refrigerators, security systems, wireless tablets, PDAs, telephones, and whiteboards.

Let's immediately qualify this by referring to the early days of marketing a new invention, the telephone. The (urban?) legend has it that the telephone was promoted as a means to listen to opera performances and theater plays. The concept of a worldwide web of telephone wires allowing anyone to call anyone any time was unimagined. How could anyone have seen the consequences of the telephone on our lives? It didn't change our DNA, but it is interwoven, "webbed," says the thesaurus, into our lives. Now, we're beginning to weave a new generation of IP-enabled devices into our culture. As we do this, we have to keep in mind the difficulties in foreseeing the impact of the telephone and expect similar surprises with "IP on Everything".

Moving on to BeOS, we have OS technology that combines many desirable features for the new breed of applications. Unlike embedded systems running under the hood of a car, these new appliances need strong multimedia capabilities. BeOS offers a small footprint and a modern, robust, modular, customizable solution for these applications. As disclosed in several of last quarter's announcements, customers and partners have validated our offering and we're planning a more formal announcement of what we refer to as Stinger, a complete software solution for Internet appliances. We'll be providing details at the upcoming introduction event.

So far, we have an exciting emerging market and a product for it. Looking at the market again, we see no 800-pound gorilla monopolizing it. Rather, we see a very fluid situation, fast growth, and we see the opportunity to become a mainstream player. That is why we've decided to shift our resources to that opportunity, to the goal of establishing BeOS as the premier OS platform for media-rich Internet devices. "Our resources," in the previous sentence, includes the desktop BeOS. In support of our Internet appliances effort, the desktop BeOS plays two roles, both vital. The first role is the development system for Stinger-based products, offering the advantages of a native environment, already well-tested, and, if I may say so, well-liked. Then, by offering a free version of BeOS available for download, we advertise our technology on the widest billboard known to humankind, the Internet. The goals are to gain visibility, market testing and feedback and to inspire developers to create new types of Internet appliance devices using Be technology. As a result, we'll continue to issue updates and new releases for the desktop BeOS in support of its role in our appliances strategy. For example, as new drivers and features are developed for Stinger-based products, BeOS desktop will gain driver compatibility and features.

I realize there is concern that we'll "ditch" the desktop, and I accept the fact that a shift in strategy always creates uncertainty. Only our actions over time can allay those concerns.

Fortunately, it's not entirely up to us. We intend to work with publishers and other partners to make commercial versions of BeOS 5 available through retail channels. This allows us to refocus the energies we previously applied to our own retail distribution efforts. We've received a number of calls from all over the world expressing interest in BeOS 5. Several software developers are interested in bundling BeOS with their applications and others want to publish and ship BeOS 5 itself. I've even heard a comment to the effect that one company wants to be the Red Hat of the BeOS. I like the sentiment, and I'll let the legal eagles have fun with the putative motto.

As I have been required to do in the past, I must inform you that many of the statements I have made here are forward-looking in nature. That is, statements that are not historical facts are "forward-looking statements," including without limitation my statements regarding the future growth of the Internet appliance market, future availability and performance of Internet appliances and third party applications, plans for product development and release, the future capabilities of our products or other products mentioned herein, the market acceptance of our products, and our ability to penetrate and capture the emerging Internet appliance markets. Actual events or results may differ materially as a result of risks facing Be Incorporated or actual results differing from the assumptions underlying such statements. Such risks and assumptions include, but are not limited to, risks related to the growth of the market for Internet appliances, our ability to establish and maintain strategic relationships, our ability to develop and engineer modifications to BeOS, and the competition and market acceptance of BeOS. All such forward-looking statements are expressly qualified in their entirety by the "Risk Factors" and other cautionary statements included in Be Incorporated's prospectus, filed pursuant to Rule 424(b) of the Securities Act of 1933 on July 20, 1999 (Commission File No. 333-77855), and other public filings with the Securities and Exchange Commission.