OpenBeOS Newsletter
Issue 21, 30 Jun 2002


  In This Issue:

  • Automating tasks with xicon and TermShell by Daniel Reinhold
  • Project changes on the way by Michael Phipps
  • Virtual Memory in the OpenBeOS kernel by Michael Phipps


Automating tasks with xicon and TermShell by Daniel Reinhold
Recently, I've become somewhat preoccupied with our new build process. As I played around with the commands, I started creating shell scripts to automate the process. While I'm hardly a bash guru, I've managed to slap together some basic stuff. For example, to download the latest source tree, I use this:
# Update the entire OpenBeOS source tree
# from the CVS repository (anonymous checkout)
SourceTree=/boot/home/local
cd $SourceTree
echo ==================================
echo CVS: updating local source tree...
echo ==================================
echo
export CVSROOT=:pserver:anonymous@cvs.open-beos.sourceforge.net:/cvsroot/open-beos
echo -e "\n" | cvs login
echo "(checking cvs repository for changes...)"
cvs -z3 co -P openbeos
echo
echo update complete.
From this, you can see that my local copy of the OpenBeOS source tree is located at
/boot/home/local/openbeos.

Using this same directory structure, it's useful to automate the builds too.

# Rebuild the entire OpenBeOS source tree
SourceTree=/boot/home/local
cd $SourceTree
echo
echo ==================================
echo Jam: rebuilding all binaries...
echo ==================================
echo
cd openbeos
jam
echo
echo build complete.
These two scripts could be combined, but I don't care to do that myself. I don't necessarily run a build after every CVS update, or vice versa. Creating a master script to call both could be handy, though.
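
If you did want one, a minimal sketch might look like this (it assumes the two scripts above are saved as update-tree and build-tree in the scripts folder described below -- those filenames are my own invention, pick whatever you like):

# Update the source tree, then rebuild everything
ScriptDir=/boot/home/scripts

sh $ScriptDir/update-tree
sh $ScriptDir/build-tree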

To organize all the scripts I use, I created the folder /boot/home/scripts, which makes them easy to find. I also put a link to this directory on the desktop, so that I can quickly bring up the list.


Simplify launching the scripts

Ya know... it's always bugged me that when you double click on a script file in Tracker, it doesn't automatically open up a new Terminal window and execute it there. Seems so logical, but that's not how it is. Naturally, I can open up my own Terminal windows, but gosh, that's a lot of work (*grin*). And running the script from an already open Terminal just clutters it up with all that extra text.

But hey, I'm a programmer -- how hard could it be to fix up my own solution? Having come across a BeTips item on Terminal configuration options, I felt compelled to give it a try. Here's the method that I decided to go with:

#include <stdio.h>      // FILE, fopen, fprintf, fgetc, fputc, sprintf
#include <stdlib.h>     // system

// size of the Terminal window (cols x rows)
#define GEOMETRY "130x40"

void
launch_terminal (char *srcfile, char *tmpfile)
    {
    FILE *fps = fopen (srcfile, "r");
    FILE *fpt = fopen (tmpfile, "w");

    if (fpt && fps)
        {
        int  c;
        char buf[600];

        // put shebang at the top
        fprintf (fpt, "#!/bin/sh\n");

        // copy in remainder of source script
        while ((c = fgetc (fps)) != EOF)
            {
            fputc (c, fpt);
            }
        fclose (fps);

        // put in 'read' command to delay Terminal exit
        fprintf (fpt, "\necho\n");
        fprintf (fpt, "echo \"Press ENTER to exit this Terminal window\"\n");
        fprintf (fpt, "read -e\n");
        fclose (fpt);   // flush the temp script before running it

        // run the script via Terminal
        sprintf (buf, "Terminal -t \"TermShell: %s\""
                      " -geom " GEOMETRY " %s", srcfile, tmpfile);
        system (buf);
        }
    else
        {
        // clean up if only one of the two files opened
        if (fps) fclose (fps);
        if (fpt) fclose (fpt);
        }
    }
The idea here is to create a new temp script file and run it in place of the original. That way, I don't have to worry whether the original script has a shebang line at the top, or if it has a ".sh" extension or not, or is associated with some other program. Yep, any old text file with some bash commands in it will do. It also allows me to put in a delay prompt (using the 'read' command) because, otherwise, Terminal would exit as soon as the program it launched was finished.

My first version using this idea was a small C program called 'term' that received the script name on the command line. Thus I could use "term foo" to run the foo script. It worked fine, but... well, I still couldn't launch 'term' from Tracker, even when specifying 'term' as the preferred app for the script.

The problem is that Tracker doesn't pass the arguments on the command line; instead, it sends them to the app as entry_refs contained in a BMessage. In short, I needed a C++ program.
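
Just to sketch what's involved, here's a bare-bones example of the BApplication hook that receives those entry_refs. The app signature is a made-up placeholder, and the printf is just for illustration -- a call to launch_terminal() from the earlier listing would slot into the loop:

#include <Application.h>
#include <Message.h>
#include <Entry.h>
#include <Path.h>
#include <stdio.h>

class RefsApp : public BApplication
    {
    public:
        RefsApp ()
            : BApplication ("application/x-vnd.example-RefsApp")  // placeholder signature
            {}

        // Tracker delivers double-clicked files here as entry_refs,
        // not as command line arguments
        virtual void RefsReceived (BMessage *msg)
            {
            entry_ref ref;

            for (int32 i = 0; msg->FindRef ("refs", i, &ref) == B_OK; i++)
                {
                BPath path (&ref);
                printf ("script to launch: %s\n", path.Path ());
                // something like launch_terminal() would be called here
                }

            PostMessage (B_QUIT_REQUESTED);  // done, shut the app down
            }
    };

int main ()
    {
    RefsApp app;
    app.Run ();
    return 0;
    }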

Hey, no problem. I wrote one up called TermShell. However, while investigating how to go about writing it, I came across a dandy program called xicon that pretty much handles this already.


Xicon

Xicon was written by Pete Goodeve and is pretty damn slick. If you do any shell scripting at all, then you simply must download this program. It comes with its own associated type and icon image. In fact, it relies on script files having the required file type -- it doesn't run normal text files by default. Pete includes a conversion script to make any current script you have lying around usable by xicon.

What about creating new script files? Well, you could create a new text file and, after you've finished putting in all the commands, run it through the conversion script. But it makes more sense to me to create a new Tracker template:

  • Right click in a Tracker window and select New::Edit Templates...
  • Copy one of the included xicon scripts to this special window
  • Rename the file to something simple and meaningful like 'shellscript'
  • Open the file with a text editor, delete all the text, and then save it
Now, when you right click in a Tracker window, you can choose New::shellscript (or whatever you called it). By default, the newly created file will open with xicon, which won't do anything because the file is still blank. Slightly annoying, but no biggie. Then just edit the new script with any text editor. When finished, the script can be double clicked and executed. Sweet.

This technique does require that you run OpenTracker, not the original Tracker that comes with BeOS (which does not have the Template feature). But my goodness, you should be using OpenTracker anyway -- it's far too good to pass up.


TermShell

One annoyance with xicon is that, while it solves the problem of launching from Tracker, it doesn't handle command line args, so running it as a command from Terminal does no good. The source code to xicon is included with the download, so you could modify it if you like. For my purposes, I keep TermShell around. It's useful in that it will accept the script name to launch as either a command line arg or an entry_ref from Tracker. However, unlike xicon, it doesn't handle passing parameters to the scripts.

I've posted the source code for TermShell for anyone who might be interested. The binary is included as well; however, if you recompile, you'll need to reset the program's attributes to insert the app signature and handled file types. Instructions for this are included.


Source Code:
TermShell.zip

 
Project changes on the way by Michael Phipps 
I am not a fish...

therefore, I do not scale well. :-)

When we first started OBOS, I began by listing all of the teams. Then, as people came along and expressed ability and interest, I made them team leaders. People would email me expressing interest in the project, and I would determine where they should go based on their skills, their interests, and our current staffing needs. I also gave everyone CVS access, under the mentality of "Let a Thousand Flowers Bloom".

The Team Lead (TL) concept worked out very well. While there has been some turnover (mostly due to real life intervening), many of our original team leads are still around and doing great work. The other concepts, though, have been outgrown over time. I pruned our CVS access list a while ago, for example, to include basically only people who had committed code. I don't add people to CVS unless their TL asks me to. That occurs when people are "proven" - when they have submitted enough code that we can be reasonably confident in their coding ability and their seriousness about the project. We added a recruiter because my answering a hundred emails a day was too much.

In the same vein, it is time for another change. We will no longer have team membership lists in the same way. Instead there will be "regular contributors" and TLs. The TLs will regularly post a list of things that need to be done. If you want to help, grab something off of the list and look into it. When you are sure and confident (maybe you've even written a little code), let the TL know, then dive into it. When you are done, you send the patch to the TL, who reviews it and accepts or rejects it. No more need to join, commit, or send a half dozen emails to get things done. It's a lot less stress on the TLs, since they don't have to deal with all of that - or with people disappearing (as they do, with open source projects).

The changes to the website will appear over the next few weeks. The changes in the way we do things can happen (nearly) immediately. Most teams have web pages or TODO lists in CVS or something similar. Again - we will organize this a little better over time. But I think that this will be an improvement. Admin work is not really a natural fit for engineer types. I would like to publicly thank our TLs, both past and present, along with Deej, Daniel, Aaron and all of the other folks who have helped with the admin work.

In other news... We are nearly ready, finally, to reorganize our CVS setup. For those of you who are newer to the project, I will summarize: I created a CVS setup back in September or so to hold all of the new source code. Over time, it became clear to most everyone that a different design was required. We had a volunteer design one and begin to implement it and move all of the code over. But real life intervened, and he had almost no spare time left to contribute. So we have existed in limbo for a few months.

Recently, things have come to a head and Erik (TL of the Interface Kit) agreed to take this task on for himself. He has nearly completed the scripting to make this smooth and easy. With a bit of luck, we will have a new hierarchy shortly. Then the build team will go to work moving everything over to the new Jam build system.

We have come a long way, both code wise and organizationally. And we probably have further to go than any of us can imagine. Watch for more good stuff in the near future. And if you have any suggestions, ideas, thoughts, or comments, you know how to find us...

 
Virtual Memory in the OpenBeOS kernel by Michael Phipps 

Virtual memory is a tricky and arcane thing. Everyone's implementation is a little bit different, tailored to the usage characteristics of their operating system. The OpenBeOS VM system is no exception.


What is VM?

In a nutshell, virtual memory is the system by which memory and disk pages are managed. When a program reads from a disk, it asks the kernel for the information. The kernel asks the VM system. The VM system looks to see if the information is already in memory. If so, it passes a reference to that information back to the caller (caching). If not, it finds a free space in memory, loads the page, and then passes a reference back to the caller.

The paging system runs periodically, looking for altered pages of memory that have disk space associated with them. It writes the pages back to disk and marks them as unaltered. You might ask why the system would do this. The answer comes from low memory situations. When you start to run very low on memory, the system starts to look for pages that it has a copy of on disk; it marks those pages as "unused". So what happens when you need that data again? The system allocates another memory chunk for it, loads it into place, and adjusts the addressing so that your process can find it.
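
As a toy sketch of that write-back pass in C (this is not the kernel's actual code, and write_page_to_disk() is just a stand-in):

enum { PAGE_CLEAN, PAGE_MODIFIED };

struct page
    {
    int state;              // PAGE_CLEAN or PAGE_MODIFIED
    int has_backing_store;  // is there disk space behind this page?
    };

static void
write_page_to_disk (struct page *p)
    {
    (void) p;   // stand-in for the real disk write
    }

void
pager_pass (struct page *pages, int npages)
    {
    int i;

    for (i = 0; i < npages; i++)
        {
        // only altered pages with disk space behind them get written
        if (pages[i].state == PAGE_MODIFIED && pages[i].has_backing_store)
            {
            write_page_to_disk (&pages[i]);
            pages[i].state = PAGE_CLEAN;    // reclaimable if memory runs low
            }
        }
    }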


What does VM get me?

After all, a gigabyte of memory is pretty cheap now. Why bother with paging in and out and all of that? One answer is performance. By automatically caching every single disk read and write, the operating system runs faster. Furthermore, it provides a mechanism for different processes to write to the same file at the same time without breaking. Finally, with functions like fork(), two processes can have different perspectives on what was once the same memory.


The major components of the OpenBeOS VM system

There are seven key "objects" (structures, really) in the VM that hold all of the information about the VM system. They represent both the physical memory actually present on the machine and the virtual memory layout.

The physical side of things:

  • vm_page - represents one physical page (usually 4k) of real, physical memory. It holds all of the details about the page - the address in physical memory, its state (clear, free, in use, busy, etc.)
  • vm_cache_ref - essentially the bridge object between the virtual perspective on memory and the physical side. Each vm_page has one and only one vm_cache_ref, but each vm_cache_ref can be pointed to by multiple vm_pages.
  • vm_cache - a collection of vm_pages and their vm_store (see below).
  • vm_store - an API or interface to one of several different implementations. A vm_store is a data structure that contains the data necessary to support swapping in and out (where appropriate). The most familiar implementation is vm_store_vnode, which uses a file (vnode) to swap pages in and out.

You could easily conceive of these four objects as one "mega" object, the physical chunk. This mega object has many pages, knows how to read/write from disk, knows its pages' addresses, knows how to look for pages that are unused, etc. It is called by the process_space "mega object" (described below) to allocate space. There would be one instance of this physical chunk per process.
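
As a rough sketch in C (the field names here are illustrative guesses, not the actual OpenBeOS kernel definitions), the physical side might hang together like this:

// Illustrative only -- not the real kernel structures
typedef struct vm_page
    {
    unsigned long        phys_addr;  // physical address of this (usually 4k) page
    long                 offset;     // offset of the page within its cache
    int                  state;      // clear, free, in use, busy, etc.
    struct vm_cache_ref *cache_ref;  // one and only one per page
    struct vm_page      *next;       // next page in the same cache
    } vm_page;

typedef struct vm_store
    {
    // an API/interface: each implementation (vm_store_vnode, etc.)
    // supplies its own read and write operations
    int   (*read_page)  (struct vm_store *store, long offset, vm_page *page);
    int   (*write_page) (struct vm_store *store, long offset, vm_page *page);
    void  *private_data;             // e.g. the vnode for vm_store_vnode
    } vm_store;

typedef struct vm_cache
    {
    vm_page         *page_list;      // the collection of vm_pages
    vm_store        *store;          // knows how to swap them in and out
    struct vm_cache *source;         // next cache up a cache chain (see below)
    } vm_cache;

typedef struct vm_cache_ref
    {
    vm_cache *cache;                 // the bridge from virtual to physical
    } vm_cache_ref;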

The virtual side of things:

  • vm_region - a named "area": a data object that represents a virtual memory block in much the same way that vm_page represents a physical memory block (except that a region can be any size).
  • vm_virtual_map - contains the list of regions held in the address space.
  • vm_address_space - contains a pointer to the vm_translation_map, a processor-dependent data structure used to tell the processor what to map where. The vm_address_space counts page faults, is used to ensure that there is a sufficient "working set" (amount of physical memory), and is the lowest-level structure in the "common" VM area.

These objects could be considered as a mega object as well - a "process_space" object. These three structures hold all of the information about the process's virtual mapping. This mega object would contain system calls for allocating and freeing memory, memory mapping files, etc.
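
Sketched the same way, building on the structures above (again, illustrative field names, not the real definitions):

// Illustrative only -- not the real kernel structures
typedef struct vm_region
    {
    char              name[32];      // regions are named "areas"
    unsigned long     base;          // starting virtual address
    unsigned long     size;          // can be any size, unlike a page
    vm_cache_ref     *cache_ref;     // where its backing pages live
    struct vm_region *next;
    } vm_region;

typedef struct vm_virtual_map
    {
    vm_region *region_list;          // the regions held in this address space
    } vm_virtual_map;

typedef struct vm_address_space
    {
    vm_virtual_map             virtual_map;
    struct vm_translation_map *translation_map;  // CPU-dependent page tables
    int                        fault_count;      // counts page faults
    unsigned long              working_set;      // target amount of physical memory
    } vm_address_space;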

Before we discuss how these components work together, there are a couple of concepts that we need to talk about.


Lazy allocation

Lazy allocation means that the system waits until a resource is actually needed before allocating it. This is helpful for two reasons. The first is that if a huge amount of memory is allocated, the system doesn't have to go and set it all up right away. The second is that if memory is allocated but not used, the physical memory is still available to others (without swapping).
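
You can see the effect from user land with the BeOS create_area() call: with B_NO_LOCK, physical pages are handed out only as they are touched. Something like this:

#include <OS.h>
#include <stdio.h>

int
main ()
    {
    void *base;

    // reserve 16 MB; with B_NO_LOCK no physical pages are committed yet
    area_id area = create_area ("lazy demo", &base, B_ANY_ADDRESS,
                                16 * 1024 * 1024, B_NO_LOCK,
                                B_READ_AREA | B_WRITE_AREA);
    if (area < B_OK)
        {
        printf ("create_area failed\n");
        return 1;
        }

    // only now, on first touch, does a page fault allocate
    // (and zero) the first physical page
    ((char *) base)[0] = 1;

    delete_area (area);
    return 0;
    }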


Cache chains, shadow copies, and copy-on-write

Consider the following scenario: process A calls fork() and spawns process B, which means B has access to A's memory. In fact, the only way B knows that it is B is from the return value of fork(). But what happens if B starts to write into A's memory? Isn't that a hotbed of race conditions and confusion? Yes - unless, as POSIX requires, you make B's memory separate from A's.

One very simple way to do that would be to copy all of A's memory onto different physical pages. The problem is that this takes a long time, stops both A and B from running until the copies are all made, takes up a lot of memory, and is often wasteful - B may never touch most of those pages.

So, instead, we mark the pages as "copy-on-write": when either process writes to a page, a private duplicate is made and the write goes to that copy. B's virtual memory starts out pointing to the same caches as A's. When a copy-on-write occurs, a cache is added to B's "cache chain". When a lookup needs to find the physical memory for a virtual address, we start at the bottom of the cache chain (B's private memory first) and proceed up (to A's memory) until an entry is found. If we move all the way through the cache chain and no entry is ever found, a new page must be allocated and set as B's private memory. When a copy-on-write occurs and an entry is placed in B's private space, it "shadows" the original, and is therefore called a "shadow copy".
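
Using the sketch structures from earlier, a cache chain lookup might look roughly like this (again illustrative, not the real code):

// Walk from the bottom of the chain (B's private shadow cache)
// up toward A's original cache until a page at this offset is found
vm_page *
cache_chain_lookup (vm_cache *cache, long offset)
    {
    for (; cache != NULL; cache = cache->source)
        {
        vm_page *page;

        for (page = cache->page_list; page != NULL; page = page->next)
            {
            if (page->offset == offset)
                return page;    // a shadow copy, or the original
            }
        }

    return NULL;    // nothing anywhere: caller allocates a fresh private page
    }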


Putting the pieces together

So, given all of these objects and concepts, how does virtual memory work? When you allocate memory using a kernel function (i.e. not malloc), a new vm_region is created. It is added to the vm_virtual_map, and it gets a vm_cache_ref, vm_store and vm_cache. Assuming lazy allocation is used (a flag to allocation), when the first byte is accessed, a page fault occurs. The VM system looks through the cache and finds that there is no memory allocated for this page. So it allocates a page, sets it to zeros, and enters the physical page information in the processor-specific tables so that this page fault will not occur again. Execution of the user land code is then allowed to continue.
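
In sketch form, using the structures from earlier (the helper functions here are hypothetical stand-ins, declared only so the sketch is complete):

// hypothetical helpers
vm_region *find_region (vm_address_space *aspace, unsigned long addr);
vm_page   *allocate_physical_page (void);
void       zero_page (vm_page *page);
void       insert_page (vm_cache *cache, vm_page *page, long offset);
void       map_page (struct vm_translation_map *map, unsigned long vaddr,
                     unsigned long paddr);

// Illustrative page fault path -- greatly simplified
void
vm_soft_fault (vm_address_space *aspace, unsigned long fault_addr)
    {
    vm_region *region = find_region (aspace, fault_addr);
    long       offset = fault_addr - region->base;
    vm_page   *page   = cache_chain_lookup (region->cache_ref->cache, offset);

    if (page == NULL)
        {
        // lazily allocated and never touched: hand out a zeroed page
        page = allocate_physical_page ();
        zero_page (page);
        insert_page (region->cache_ref->cache, page, offset);
        }

    // enter the mapping in the processor-specific tables so this
    // fault does not occur again, then let user code continue
    map_page (aspace->translation_map, fault_addr, page->phys_addr);
    }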

Now let's assume that memory becomes very full. All of the pages for this process are set to "unmarked". The CPU sets the marked bit when the physical page is accessed. Some period of time later, a kernel process looks at the memory blocks. If they are backed by disk space and they are still unmarked, they are written to disk and the memory is set as "free".

Finally, let's assume that an attempt is made to read one of those "swapped out" pages. A page fault occurs. When the VM system looks through the cache, it sees that there is indeed a cache for that memory and that it is swapped out. It finds a free page of memory and asks the vm_store to read the data back in and fill the physical page.
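
Tacked onto the sketch above, that case is one more branch after the cache chain lookup (PAGE_SWAPPED_OUT and PAGE_ACTIVE are illustrative state values, and grab_free_physical_page() is another hypothetical helper):

    // ...inside vm_soft_fault(), after the cache chain lookup:
    if (page != NULL && page->state == PAGE_SWAPPED_OUT)
        {
        vm_cache *cache = region->cache_ref->cache;

        // find a free page of physical memory, then ask the vm_store
        // to read the data back in and fill it
        page->phys_addr = grab_free_physical_page ();
        cache->store->read_page (cache->store, offset, page);
        page->state = PAGE_ACTIVE;
        }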