Issue 3-40, October 7, 1998

Be Engineering Insights: Mining The Net... (Part 1 of N)

By Benoît Schillings

A while back, I started playing with the Internet in a new way. Instead of plodding along as usual in a browser, I wanted to bring all that data and functionality into my own programs. To do that, I started working on a set of objects that would let me mine the Web selectively, for my own purposes.

There's a lot to steal from the net. It's pretty easy, for instance, to write a piece of code that can find someone's phone number or display a map of a location. The Internet begins to look different—like the ultimate software candy store—when you start thinking of it as a giant subroutine to *your* program.

This week we'll start with some basic stuff—how to extract data from the Net. This was what I did first when I began working on the project. My idea was to get a program to display the latest satellite picture of California on my desktop.

In the next issue, we'll look at some more elaborate things you can make the Internet do for you, like controlling Alta Vista or HotBot from your application, or finding someone's e-mail address or phone number.

For now, let's start with a basic building block, which I call a site_getter. The site_getter knows how to play with basic low-level network access to get an arbitrary object from a web site.

#include <Be.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <netdb.h>

enum NetResult {
  NET_NO_ERROR       =  0,
  NET_ERROR          = -1,
  NET_NO_SOCKETS     = -2,
  NET_UNKNOWN_HOST   = -3,
  NET_CANT_CONNECT   = -4,
  NET_CANT_SEND      = -5,
  NET_CANT_RECV      = -6,
  NET_TIMEOUT        = -7,
  NET_IS_CLOSED      = -8,
  NET_ALREADY_CLOSED = -9,
  NET_DONT_DOWNLOAD  = -10,

  NET_GET_URL        = 'nget',
  HTTP_ERROR         = -101
};

#define MAX_BUF 512000  // Fixed-size download buffer. This
                        // should really be dynamic; for now,
                        // do not load anything bigger than that

class  site_getter {
public:
              site_getter();
virtual       ~site_getter();

virtual int   Fetch(char *site_path);
        char  *GetData(long *size);
        void  WaitForData();
virtual int   doit0();

private:
    ulong   LookupHost(const char *host);
    int     Connect(char *host);
    int     Request(long msocket,
                    char *usrstring, char *cpath);
    long    FillBuffer(long msocket, char *buffer);

    long    done_sem;
    char    full_path[512];
    char    done;
    char    *fbuffer;
    long    total;
    long    msocket;
};

// Load the next chunk of data from the site once the
// connection is open. The caller must leave room for at
// least 4096 bytes in buffer.
long site_getter::FillBuffer(long msocket, char *buffer)
{
  long  size;

  size = recv(msocket, buffer, 4096, 0);

  printf("got %ld\n", size);
  return size;
}

site_getter::site_getter()
{
  done_sem = create_sem(0, "site_getter");
  fbuffer = 0;
  done = 0;
  msocket = -1;
}

site_getter::~site_getter()
{
  free((char *)fbuffer);
  delete_sem(done_sem);
}

// Little jumping board to start the internal thread
long init_p(void *p)
{
  site_getter  *g;

  g = (site_getter *)p;

  g->doit0();
  return 0;
}

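// Separate the web site (host name) from the internal path
// on the web site. site and path must each hold 256 bytes.
void parse(char *raw, char *site, char *path)
{
  long  p = 0;

  // skip the scheme, up to the "//"
  while(((*raw != '/') || (*(raw + 1) != '/')) && (*raw)) {
    raw++;
  }

  // only step over the "//" if it was actually found
  if (*raw) {
    raw += 2;
  }

  // copy the host name, up to the next '/'
  while(*raw != '/' && *raw && p < 255) {
    site[p] = *raw;
    raw++;
    p++;
  }
  site[p] = 0;

  // drop whatever is left of an over-long host name
  while(*raw != '/' && *raw) {
    raw++;
  }

  strncpy(path, raw, 255);
  path[255] = 0;
  if (strlen(path) == 0) {
    path[0] = '/';
    path[1] = 0;
  }
}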

// This is the main function for the site_getter object.
// It spawns a thread responsible for all of its
// functionality; that way the object can access a web site
// in the background and let you open multiple connections
// at the same time to multiple sites!
int site_getter::Fetch(char *site_path)
{
  thread_id  tid;

  done = 0;
  strncpy(full_path, site_path, sizeof(full_path) - 1);
  full_path[sizeof(full_path) - 1] = 0;

  tid = spawn_thread(init_p, "site_getter",
    B_NORMAL_PRIORITY, this);
  if (tid < 0) {
    return NET_ERROR;         // Can't start the thread
  }
  resume_thread(tid);
  return 0;
}

// Find the actual address of a site.
// Returns the IP address in host byte order,
// or 0 if the lookup fails.
ulong site_getter::LookupHost(const char *host)
{
  ulong result = 0;

  hostent* h = gethostbyname(host);
  if (h && h->h_addr) {
    result = *(ulong *)(h->h_addr);
  }

  return ntohl(result);
}

// Open the connection with the web server.
// Returns the socket, or a negative NetResult code on failure.
int site_getter::Connect(char *host)
{
  sockaddr_in  addr;
  ulong  result;
  int    s;

  result = LookupHost(host);
  if (result == 0) {
    printf("cannot resolve %s\n", host);
    return NET_UNKNOWN_HOST;
  }

  memset(&addr, 0, sizeof(addr));
  addr.sin_family = AF_INET;
  addr.sin_port = htons(80);
  addr.sin_addr.s_addr = htonl(result);

  s = socket(AF_INET,SOCK_STREAM,IPPROTO_TCP);
  if (s < 0) {
    return NET_NO_SOCKETS;    // Can't get a socket
  }

  if (connect(s,(sockaddr *)&addr,sizeof(sockaddr_in)) < 0) {
    closesocket(s);
    return NET_CANT_CONNECT;  // Can't connect! Geez!
  }
  return s;
}

// Send the HTTP request.
// We will pretend we are a web browser!
int site_getter::Request(long msocket, char *usrstring,
  char *cpath)
{
  static  const char *HTTPVERS = " HTTP/1.0\r\n";
  static  const char *USER_AGENT =
    "User-Agent: Mozilla/2.0 (compatible; NetPositive; BeOS)\r\n";
  static  const char *ACCEPT =
    "Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*\r\n";
  static  const char *HOST = "Host: ";

  char  req[2048];

  // Refuse anything that would not fit in the request buffer
  if (strlen(cpath) + strlen(usrstring) > 1024) {
    return NET_CANT_SEND;
  }

  // Build a GET request; usrstring is the host name
  req[0] = 0;
  strcat(req, "GET ");
  strcat(req, cpath);
  strcat(req, HTTPVERS);
  strcat(req, USER_AGENT);
  strcat(req, ACCEPT);
  strcat(req, HOST);
  strcat(req, usrstring);
  strcat(req, "\r\n\r\n");    // a blank line ends the headers

  printf("str = %s\n", req);
  long result = NET_NO_ERROR;

  result = send(msocket, req, strlen(req), 0);
  if (result < 0) {
    result = NET_CANT_SEND;
  }
  return result;
}

char *site_getter::GetData(long *size)
{
  *size = total;
  return fbuffer;
}

// Body of the fetch thread: connect, send the request, and
// read the reply into fbuffer.
int site_getter::doit0()
{
  char  *pb;
  long  size;
  char  site[256];
  char  path[256];

  total = 0;
  parse(full_path, site, path);

  if (fbuffer == 0) {
    fbuffer = (char *)malloc(MAX_BUF);
  }

  if (msocket < 0) {
    msocket = Connect(site);
    if (msocket < 0) {
      // Connect() returned an error code, not a socket,
      // so there is nothing to close
      msocket = -1;
      done = 1;
      release_sem(done_sem);
      return -1;
    }
  }
  pb = fbuffer;
  Request(msocket, site, path);

  // Read until the server closes the connection or the
  // buffer is nearly full. Note that the data starts with
  // the HTTP response header.
  // Not really certain about the retry on an empty read,
  // but it seems to help!

  do {
    size = FillBuffer(msocket, pb);

    if (size == 0) {
      snooze(32000);
    }

    if (size < 0) {
      break;
    }

    total += size;
    pb += size;

    // stop while there is still room for one more chunk
    if (total > (MAX_BUF-4096)) {
      break;
    }
  } while((total == 0) || (size != 0));

  closesocket(msocket);
  msocket = -1;

  done = 1;
  release_sem(done_sem);
  return 0;
}

void  site_getter::WaitForData()
{
  acquire_sem(done_sem);
  release_sem(done_sem);
}

// To use this object, just start it on a given URL and
// wait for the data. For instance:
int main()
{
  site_getter  *a_getter;
  char    *data;
  long    data_size;

  a_getter = new site_getter();

  a_getter->Fetch(
    "http://www.wrh.noaa.gov/wrhq/CURRENT/VIS1MTR.GIF");

  a_getter->WaitForData();

  data = a_getter->GetData(&data_size);

  printf("got %ld bytes !\n", data_size);

  delete a_getter;
  return 0;
}

Well, now that we have a basic object that can fetch data from the Net, we'll be able to start looking at that data.
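
One thing to keep in mind before looking at that data: the buffer handed back by GetData() holds the whole HTTP response, so the image bytes are preceded by a status line and headers. Here is a minimal sketch, in portable C++, of how the two might be separated. The names HttpParts and split_http_response are hypothetical and not part of site_getter; the sketch only assumes that the header block ends with the blank line (CR LF CR LF) that HTTP specifies.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper type: not part of the site_getter class.
struct HttpParts {
  std::string header;     // status line and headers, without the blank line
  const char *body;       // points into the original buffer; 0 if none found
  long        body_size;  // number of body bytes
};

// Split a raw HTTP response at the first blank line.
// The buffer may hold binary data, so lengths are passed explicitly.
HttpParts split_http_response(const char *data, long size)
{
  HttpParts parts;
  parts.body = 0;
  parts.body_size = 0;

  static const char kSeparator[] = "\r\n\r\n";
  const long kSepLen = 4;

  for (long i = 0; i + kSepLen <= size; i++) {
    if (std::memcmp(data + i, kSeparator, kSepLen) == 0) {
      parts.header.assign(data, i);
      parts.body = data + i + kSepLen;
      parts.body_size = size - (i + kSepLen);
      return parts;
    }
  }

  // No blank line found: treat everything as an incomplete header.
  parts.header.assign(data, size);
  return parts;
}
```

With something like this, the caller could pass the pointer and size from GetData() and hand only the body to an image decoder.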


Developers Workshop: Modal Muddle

By Owen Smith

In my last Newsletter article I created an application, Doodle, which illustrated differences between the Be way of doing things and the Windows way. A few days after the article appeared, I was approached by a Ms. Morgan le Be, BeOS hacker and mistress of the black arts.

"I'm a little disappointed with the way you handled modal dialogs," she said. This was a typical Morgan response.

"Ah, but I used the same kind of scheme MFC applications use, and the same approach BAlert uses, so it should work fine!" I retorted.

"Just take a look at your fiendish creation," replied the smug Ms. M. She pointed at the Pen Widths dialog. "Try dragging that dialog over your document window." To my dismay, the document window didn't redraw correctly! The document view was blank, and the menu and scroll bars showed garbage.

Obviously something was wrong. I asked Morgan what she'd suggest. "Make the dumb thing asynchronous, silly. Instead of waiting for the dialog to finish, just pack its data into a BMessage and post that to the document."

But that would defeat my goal of showing how to do modal dialogs on BeOS. Clearly, my first attempt was not going to satisfy our discerning audience of geeks. So, what follows is a grossly magnified look at that marvel of interface design: modal dialogs. The code for a slightly revised Doodle is at:

ftp://ftp.be.com/pub/samples/intro/doodle.zip

This time I've added a MultiLocker class and abstracted the modal processing loop into its own file, syncutil.cpp. More about this in a moment.

User Behavior vs. Code Structure

Before I begin, some clarification. There are two more or less orthogonal issues lumped together in the Windows conception of modal and modeless dialogs.

The first issue is how the user interacts with the dialog. A dialog is modal when it prevents the user from performing certain activities while it's being used. A modeless dialog imposes no restrictions on the user's activities. There's a continuum of possibilities here. By far the most common ones are, for modal, a dialog that prevents the user from working within any other application windows while the dialog is running; and for modeless, a dialog that allows the user to interact with all the application windows while the dialog is running.

The other issue is how the dialog is actually coded. There are two possibilities: synchronous or asynchronous invocation. In a synchronous situation, one or more of your main threads blocks until the dialog is finished, at which point it picks up where it left off. There are varying degrees of synchronicity you can implement, depending on how many threads in your application you've blocked. In asynchronous behavior, one of your main threads invokes the dialog (by spawning a dialog thread, for instance), and then goes on its merry way. Later, the dialog informs you when it's finished, and you pick up where you left off when you invoked the dialog.

In the MFC library, you don't really have a choice of which code structure to use—modal dialogs are always invoked synchronously, and modeless dialogs are always invoked asynchronously. In the BeOS, however, these issues are separate. Modal/modeless behavior is implemented by creating a window with or without a modal "feel," but you have a choice of whether you want to block your thread (or threads) waiting for the dialog to finish (synchronous), or whether you want your thread to continue processing while the modal dialog is running, and get the data in some other way (asynchronous).

So, there are two questions to address when coding dialogs on the BeOS:

  1. Should your dialog be modal or modeless?

  2. Should you use synchronous or asynchronous design when creating your dialog?

Defending Modality

First, let's address the question of modal vs. modeless dialogs. Some UI zealots crusade against modal dialogs in applications because they limit the user's options. At the risk of igniting a UI jihad, I believe that modals are useful when it's desirable to limit the user's options. For instance, it might be useful to keep your user from closing a window and clobbering a document's data while you're waiting for the user to save the document.

Modal dialogs can also help simplify interactions between windows in your application. If the dialog's data depends on the active window, it can be difficult tracking the dialog's data in a modeless situation, where different windows may be activated—especially with Focus Follows Mouse turned on.

At the same time, it's good to make dialogs that are as unintrusive as possible. The BeOS gives you quite a bit of control over just how 'modal' you want a window to be (whether it blocks a specific set of windows in an application, all windows in an application, or—shame on you!—the entire system).

For further debate on this fascinating topic, please consult your local UI religious fanatic. Meanwhile, let's assume that you've decided to use modal dialogs and consider how to design them: synchronously or asynchronously?

Syncrosimplicity

Why are modal dialogs so popular in Windows? In a word, simplicity. Here's what you have to do to implement a modal, synchronous dialog:

  1. Throw an instance of a CDialog-derived class onto the stack.

  2. Stuff the dialog with the data you want it to be initialized with.

  3. Execute DoModal(). DoModal() then:

    1. Initializes and starts the dialog.

    2. Waits for the user to dismiss the dialog (the synchronous part).

    3. Returns a value which tells you the ID of the command (i.e. button) that was used to dismiss the dialog (e.g. IDOK, IDCANCEL).

  4. Grab the data from the dialog once it's finished.

  5. Get on with your life; the dialog is destroyed once you leave its scope.

The nice thing about this approach is that all the synchronous dialog behavior is contained in one function call, and you get direct access to the dialog's data after it's done. It would be nice if we could apply that simplicity to modal dialogs on the BeOS as well.
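
The five steps above can be sketched in portable C++ rather than MFC. Everything here is a hypothetical stand-in: PenWidthDialog is not a CDialog, the DialogResult values only play the role of IDOK and IDCANCEL, and a worker thread simulates the user. The point is just the shape of the call: one blocking function, then direct access to the dialog's data.

```cpp
#include <future>
#include <thread>

// Hypothetical stand-ins for IDOK / IDCANCEL.
enum DialogResult { kDialogOK = 1, kDialogCancel = 2 };

// Hypothetical dialog class; a worker thread plays the part of the user.
class PenWidthDialog {
public:
  int thin_width = 1;    // data set before, and read back after, DoModal()
  int thick_width = 5;

  // Runs the dialog and blocks the caller until it is dismissed.
  DialogResult DoModal()
  {
    std::promise<DialogResult> dismissed;
    std::future<DialogResult> result = dismissed.get_future();

    // The "dialog thread": edits the data, then reports how it was closed.
    std::thread dialog([this, &dismissed]() {
      thick_width = 8;                 // simulated user edit
      dismissed.set_value(kDialogOK);  // simulated click on OK
    });

    DialogResult r = result.get();     // the synchronous part: wait here
    dialog.join();
    return r;
  }
};

// Steps 1 to 5 from the list above.
int choose_thick_width(int current)
{
  PenWidthDialog dlg;                  // 1. dialog on the stack
  dlg.thick_width = current;           // 2. initialize its data
  if (dlg.DoModal() != kDialogOK) {    // 3. run it and wait
    return current;                    //    cancelled: keep the old value
  }
  return dlg.thick_width;              // 4. grab the data
}                                      // 5. dialog destroyed at end of scope
```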

The Fine Art of Doing Nothing

The drawback to implementing synchronous dialogs is that you're blocking the calling thread. "But where's the problem there? When I'm running the modal dialog, my application doesn't do anything!" However, that's not the case. Even if your background windows don't respond to user events, they may still need to respond to:

  1. Drawing updates (if the modal dialog or other applications' windows move).

  2. Pulse events.

  3. Events sent by other threads or applications.

What happens if you block all of the threads in your entire application, ignoring all these events? You get the Ultimate Modal Dialog: background tasks in your application stop, your windows aren't redrawn correctly, and your message queue backs up. This will earn you howls of derision from the modern computer sophisticate—especially BeOS aficionados, who are used to having things happen Right Now, Where I Want It, and Not a Care About Anything Else in the World.

Synchronicity Among Threads

"I seem to remember that Windows applications only have one thread by default. Why, then, doesn't your application grind to a halt when you run a modal dialog in Windows?" When you run a modal dialog in MFC, your thread doesn't actually block. Instead, it enters a special message processing loop (CWnd::RunModalLoop), continuing to dispatch messages to all your needy windows, while your dialog runs. But you probably have better things to do with your time than write a full-featured message processing loop, just to get synchronous dialogs implemented!

The good news is, you don't have to jump through that hoop to get synchronous dialogs implemented in the BeOS. Because each window runs in its own thread, they can process messages independently of each other. On the other hand, since you're in a multithreaded situation, there are synchronization issues to consider.

To keep your app responsive while a modal dialog is running, consider which of these three cases applies when you run a synchronous dialog in the BeOS:

  1. The calling thread is unrelated to your window and application threads (i.e. they don't share access to any data). In this case, doing a synchronous call will have no effect on the rest of your system, so you can relax.

  2. The calling thread is a window thread. In this case, just blocking the window thread would be bad since the window would no longer respond to messages or update events. At the bare minimum, you'll want the window to be able to draw itself. You can do this by telling the window to update itself periodically while you're blocking. (This is the same trick that BAlert uses; see WaitForDelete in syncutil.cpp.)

  3. The calling thread is not a window thread, but one of your windows shares data with this thread. Your window will be able to run normally, but you need to make sure that it will be able to acquire access to the data it needs to perform its operations.

Note that if the calling thread is a BLooper, then while it's blocking, it won't be able to process messages in a timely manner. This will cause delays in message processing and can result in the message queue overflowing. It's up to you to determine whether this is an acceptable risk.
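
The trick from case 2 can be sketched in portable C++: instead of blocking outright, wait on the "dialog finished" signal with a timeout, and service your own updates each time the timeout expires. The helper name wait_with_updates and the demo function are hypothetical; this is not the WaitForDelete code from syncutil.cpp, which uses BeOS calls and may be organized differently.

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <future>
#include <thread>

// Hypothetical helper: block until `finished` is ready, but wake up every
// `interval` so the caller can service its own pending work (for a window
// thread, that would be redrawing itself). Returns the number of update
// passes that ran.
int wait_with_updates(std::future<void> &finished,
                      std::chrono::milliseconds interval,
                      const std::function<void()> &update)
{
  int passes = 0;
  for (;;) {
    std::future_status status = finished.wait_for(interval);
    if (status != std::future_status::timeout) {
      break;               // ready (or deferred, in which case get() runs it)
    }
    update();
    passes++;
  }
  finished.get();
  return passes;
}

// Usage sketch: a worker thread stands in for the dialog. It stays "open"
// until the caller has redrawn three times, then signals that it is done.
int demo_blocking_with_updates()
{
  std::promise<void> dialog_closed;
  std::future<void> finished = dialog_closed.get_future();
  std::atomic<int> redraws(0);

  std::thread dialog([&dialog_closed, &redraws]() {
    while (redraws.load() < 3) {
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    dialog_closed.set_value();
  });

  int passes = wait_with_updates(finished, std::chrono::milliseconds(5),
                                 [&redraws]() { redraws++; });
  dialog.join();
  return passes;
}
```

The shorter the interval, the more promptly the blocked window repaints, at the cost of waking up more often.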

Wherefore Doodle?

Of the cases described above, Doodle falls into category 3. Recall that in Doodle, there is the application thread, a thread for each window, and a thread for each open document. Also, recall that in the previous implementation of Doodle, access to the document's data was protected using the looper's lock.

Let's take the simplified case where there's one application thread, one open window, and one open document. When Pen>Pen Widths is invoked synchronously, the document thread is the one that blocks. However, when it blocks, the window ceases to update correctly. Why?

There are two kinds of window events, occurring while the modal dialog is running, that depend on access to the document's data: (a) UpdateUI, which is called by the application thread's Pulse task to update the menu states, and (b) Draw, which the window's thread calls in response to window update events. Both of these functions attempt to access the document by locking the built-in looper lock before proceeding.

But Pen>Pen Widths gets called from the document's MessageReceived, and the looper is automatically locked while handling messages. Thus, the document is locked for the entire duration of the synchronous dialog, so the windows block waiting for access to the document. The result? Window garbage.

To the Locksmith

The way around this problem in Doodle is to change the way the locking mechanisms work. First, let me show you how NOT to solve the problem. "Simple," I said at first. "I'll just unlock the document looper, run the dialog synchronously, and then relock the document looper afterwards. Nobody outside will know the difference, and once I've unlocked, the windows are free to play with the document."

I asked Morgan le Be what she thought of this clever plan. "Nope," said Morgan. "What if the looper was locked multiple times before MessageReceived?" Well, I could unlock multiple times and then relock the same number of times, but that's forbidden: Never unlock something that you yourself didn't lock.

In coming up with a better locking mechanism, I realized that: (a) the windows only need to *read* the data to draw it, and (b) the looper locking mechanism doesn't have to be associated with the document's data at all. I can solve this problem by introducing a new, separate lock for the document's data. For maximum flexibility, I'll implement the data lock by using a multiple reader/single writer lock, provided by Stephen in a recent Newsletter article:

Developers Workshop: Yet Another Locking Article

Even though the looper is locked, the document's data can be read, and the windows can continue with their merry business.
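
To make the idea concrete, here is a minimal sketch of a document whose data has its own multiple reader/single writer lock. It is not the MultiLocker class from the article mentioned above: std::shared_mutex (C++17) is swapped in as a stand-in, and StrokeDocument, AddStroke, and CopyStrokes are hypothetical names.

```cpp
#include <mutex>
#include <shared_mutex>
#include <vector>

// Hypothetical document whose data has its own lock, separate from any
// looper lock. std::shared_mutex stands in for a MultiLocker-style lock.
class StrokeDocument {
public:
  // Writers take the lock exclusively.
  void AddStroke(int width)
  {
    std::unique_lock<std::shared_mutex> write_lock(data_lock_);
    strokes_.push_back(width);
  }

  // Readers share the lock, so several windows can read at the same time,
  // even while some other lock (such as the looper's) is held elsewhere.
  std::vector<int> CopyStrokes() const
  {
    std::shared_lock<std::shared_mutex> read_lock(data_lock_);
    return strokes_;
  }

private:
  mutable std::shared_mutex data_lock_;
  std::vector<int> strokes_;
};
```

A drawing routine would call something like CopyStrokes() and never touch the looper lock, so a document thread that is blocked inside a synchronous dialog no longer holds up redraws.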

Asynchronous Dialogs

As I mentioned before, an asynchronous dialog lets your thread continue to function while the dialog is running, which means that you don't have to worry so much about other threads waiting for you. However, getting data out of asynchronous dialogs is akin to normal asynchronous window communication, and is usually a bit more complicated than synchronous dialogs. What do you have to do when you create an asynchronous dialog?

  1. Create a persistent instance of a dialog—or, if it's already running (a possibility for modeless dialogs), find it and make it active.

  2. Stuff the dialog with the data you want it to be initialized with, and make sure the dialog knows who it's supposed to communicate with. You could hand the dialog a BMessage and a BMessenger to eliminate dependency between your application and dialog.

  3. Start the dialog using ShowWindow().

  4. Get on with your life, making sure that you're not doing anything behind the dialog's back that it doesn't expect or can't handle (the asynchronous part).

  5. When the dialog is ready to give you data, you need to find some way of getting the dialog's data. There are several ways to do this:

    1. Have the dialog modify your data directly.

    2. Have the dialog signal you when it's ready (e.g. via a message). You then grab the data directly from the dialog. Hopefully, nothing untoward will have happened to the dialog's data in the interim.

    3. Have the dialog send you the new data in a message. It's a little tricky to do this in Windows since you're only given WPARAM and LPARAM arguments to work with. In comparison, the BeOS gives you the BMessage class, which is often a quite convenient way of solving this problem.

  6. When the dialog is finished (or if its target window or document is destroyed) send it a B_QUIT_REQUESTED message. You could Lock() and Quit() the dialog directly, but doing so could lead to deadlocks if the dialog uses locked, direct access to the target.

For asynchronous dialogs, you create the dialog in a different place from where you handle it, and getting the data from the dialog requires a bit more thought. On the BeOS you can use BMessages and BInvokers to get data back to your application and eliminate some of the dependency between the dialog and your application. For many people, Morgan included, this approach is actually simpler than worrying about thread blocking issues.
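
The message-based variant (the dialog sends you the new data in a message) can be sketched in portable C++ as follows. Message and MessageQueue are hypothetical stand-ins for a BMessage and a looper's queue, kPenWidthsChanged is an invented command code, and a worker thread simulates the dialog and the user.

```cpp
#include <condition_variable>
#include <map>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

// Hypothetical stand-in for a BMessage: a command code plus named values.
struct Message {
  int what;
  std::map<std::string, int> values;
};

// Hypothetical stand-in for a looper's message queue.
class MessageQueue {
public:
  void Post(const Message &msg)
  {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push(msg);
    }
    ready_.notify_one();
  }

  // Blocks until a message is available, then removes and returns it.
  Message Next()
  {
    std::unique_lock<std::mutex> lock(mutex_);
    ready_.wait(lock, [this]() { return !queue_.empty(); });
    Message msg = queue_.front();
    queue_.pop();
    return msg;
  }

private:
  std::mutex mutex_;
  std::condition_variable ready_;
  std::queue<Message> queue_;
};

const int kPenWidthsChanged = 1;  // hypothetical command code

// Document side: start the dialog, keep going, and pick up the result
// when it arrives as a message.
int run_async_dialog_demo()
{
  MessageQueue document_queue;

  // The "dialog": runs on its own thread and replies with a message.
  std::thread dialog([&document_queue]() {
    Message reply;
    reply.what = kPenWidthsChanged;
    reply.values["thick"] = 8;     // simulated user choice
    document_queue.Post(reply);
  });

  // The caller is not blocked on the dialog itself; in a real message
  // loop, other messages could be handled before this one arrives.
  Message msg = document_queue.Next();
  dialog.join();
  return (msg.what == kPenWidthsChanged) ? msg.values["thick"] : -1;
}
```

Because the dialog only knows about the queue and the message layout, it does not need to know anything else about the document it reports to.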

Conclusion

Here are some parting guidelines to help you decide between synchronous and asynchronous dialogs:

  • Use synchronous dialogs if you really must stop your thread until the dialog has finished. Keep in mind that it's up to you to find a way to keep your application responsive to time-critical messages (drawing updates in particular) while your calling thread is blocked.

  • Consider synchronous dialogs if you want to be scrupulous about protecting the data that's exchanged between application and dialog, or if the data can't easily be packaged into a BMessage.

  • In all other cases, use asynchronous dialogs. They let your calling thread continue to be responsive, and when you use BMessages to exchange data, you can eliminate dependencies between the dialog and your application. The drawback is that data exchange is a bit trickier, especially if your data doesn't fit well into a BMessage.


When Is It Done?

By Jean-Louis Gassée

The "it" in the title is the BeOS, and that's easy. The matter of "doneness" is more complicated, because it can be seen from at least two perspectives, which I'll explore here.

From experience we know that an operating system is an evolving entity. Looking at Windows, Unix, RSTS, VMS, or the Mac OS, we realize these creations live much longer than their makers ever intended or dreamed they would. Unix is a spectacular case of OS longevity. Designed as a single-user version of Multics (hence the name Unix), it has prospered and proliferated on servers and workstations as Solaris, AIX, A/UX, SCO Unix, AT&T Unix, FreeBSD, Linux, and Irix, to name a few.

"Invented in the late 1960s for a small computer with a 64K-byte address space," to quote Corey Satten, of the University of Washington, in his "Brief Introduction to Unix," this operating system shows that 30 years or so after its birth, few people think it's "done." In fact, as the enthusiasm for Linux shows, it looks as if predictions of Unix's demise at the hands of Windows NT were premature. In release after release, the many flavors of Unix keep adding features and hardware coverage. Each hardware platform, from Intel to Alpha and from PowerPC to MIPS, has several versions of Unix available. At its present distance from its Multics roots, Unix could now stand for Universal. So universal, in fact, that the BeOS includes a Posix "layer," a program interface that allows us to benefit from many Unix programs and utilities.

Another family of systems—younger than Unix—also demonstrates the evolving capability of system software: DOS and Windows. I haven't tried lately, but I bet most original DOS programs still run nicely on Windows 98 or Windows NT. NT isn't exactly an evolutionary relative of DOS (itself the son of CP/M) but 17 years after the birth of the IBM PC, Windows 98 is a true descendant of DOS.

In a similar evolutionary sense, we hope the much younger BeOS is far from "done." In fact, we plan to demonstrate with our upcoming Release 4 that we inhabit a part of the growth curve where returns on hardware still increase. That is, with BeOS Release 4, on the same hardware, speed and function both increase. More mature systems inhabit another part of the curve where, in general, on the same hardware, the price of additional features in a new release is a decrease in performance.

The word "done" has an additional meaning in our context. I'll use Windows again for comparison. In the summer of 1983, Bernard Vergnes gave me my first demo of Windows on a CGA screen, with tiled windows. This was shortly after the difficult launch of the Lisa, and the even less auspicious kickoff of VisiOn. Windows was by no means "done." It took a few more years, probably until Windows 3.0 in 1990, if memory serves.

Others will argue that Windows reached the "done" stage with 3.11, with Windows for Workgroups and the inclusion of reasonable networking features. I could take Macintosh examples as well, but as I've been involved with some stages of its development, including HFS (a good thing) and the System 6.0 MultiFinder (not so good), I probably lack the necessary distance.

Now, for the dangerous question. Using the frame of reference just described, where are we? Are we at the summer of '83 stage or the Windows 3.0 stage, where most people could use and enjoy Windows? In some ways, we'll reach further than Windows 3.0. Again, I'm not comparing features but relative development stages; in other respects we'll still exist at an earlier stage.

By "further" I mean the BeOS will be in many respects more robust than Windows was at that stage—as a legacy-free OS must be. And by "earlier stage," I refer to the other side of legacy-free: Windows ran legacy applications; we, on the other hand, must continue to work with developers and generate a growing number of BeOS-specific applications that make full use of the platform.

I'm excited by the work the engineers have done for our next release, and by the plans for subsequent releases. This is the best project I've ever been a part of. Of course, in view of my rather close participation, one could say that my statement lacks objectivity. Perhaps it does, but I've always been accused of one kind of parental feeling or other for the projects I've been involved with, so I'll take the liberty of believing in the relative part of the feeling. In any case, developers and customers are the ones who will judge whether I'm right or not.

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License.