Be Newsletters - Volume 5: 2000

Issue 5-15, April 12, 2000

Be Engineering Insights: Writing Video Drivers for BeOS

By Andrew Kimpton

Like many engineers, I get a lot of my motivation from having something new on my desk, a gadget, widget (or toy, as my wife prefers to call it). I recently bought a new Sony laptop, which like most on the market uses a Neomagic graphics controller. BeOS runs just fine on this notebook. Mine even came with two 4GB partitions on the 8GB drive--made just for multiple OS's!

BeOS supports the Neomagic family of graphics controllers from the old Neomagic 128 (aka 2070) through to the newer Neomagic 256AV (aka 2200) and the variants in between. Unfortunately, that support is achieved through the "old" style app server add-ons, rather than the newer (and blessed) app server accelerants. Whilst it works it's not "current" and doesn't do some things I'd like - such as centering the display when the resolution is less than the size of the LCD panel, or supporting DPMS to turn the display off when not in use. So, given the hardware and motivation, here's an example of an app server accelerant you can write for the Neomagic chips.

Oh, yes...one last "wrinkle" - there's no public documentation available on these chips, so our information is derived from the sources contained within the XFree86 X-Windows server for Un*x, and from previous experience with programming VGA controllers (the sort of stuff I'm afraid you can't easily find in a single book; you have to get "apprenticed" to a master and learn it that way 8-)

This is a fairly large project, so it'll be split up into a few newsletter articles. Initially we'll look at the significant parts of the kernel level hardware driver, then the parts of the app server accelerant necessary for basic framebuffer access. We'll also cover a couple of bells and whistles: subsequent articles will deal with adding support for a hardware cursor and handling hardware acceleration of 2D operations.

The Kernel Driver

For video cards the kernel driver is a fairly simple beast (in most cases), providing a mechanism to establish that a card is installed and to map sections of the card's memory into the address space for others to access. The driver may also need to provide a couple of convenience routines for accessing VGA registers and possibly an interrupt handler, if you need to do something special, such as catch the vertical blanking moment.

Taking a look through the source for the driver (driver.c), init_hardware() simply walks the list of PCI devices to establish whether something significant to this driver is installed. init_driver() is a little more interesting. It allocates some per-driver storage and creates a lock that we can use to serialize access to the driver. Once again, it looks for installed hardware by calling probe_devices(). Finally, it optionally adds some extra commands to the kernel debugger. This is a useful trick that lets us print out significant information from within the kernel debugger; it's perfectly feasible to use this technique from any driver.

If the hardware is not installed, init_driver() will never be called and the commands never added to the kernel debugger. probe_devices() again walks the list of installed hardware, building a /dev/graphics entry for each piece of hardware that is significant to this driver. The name of the entry follows a set of rules defined by Trey Boudreau, which allows the output of ls /dev/graphics to be decoded by humans to determine installed hardware and its location on the PCI bus. publish_devices() simply uses the array created during probe_devices() to tell the OS what should be presented in /dev/graphics.

Only two of the remaining functions deserve any special attention: the open and control functions. The first time this device is opened (generally by the app server) nm_opened() will call first_open(). first_open() will allocate an area that will be shared between the driver and the accelerant. It will contain pointers to certain parts of the card and other useful pieces of configuration information, such as the cardinfo structure globally referenced as "ci" in both the driver and the accelerant. first_open() will also call map_physical_memory() to map the device's framebuffer RAM and memory-mapped registers into accessible memory so that the app server and its accelerant can write to them. You would set up and install an interrupt handler inside first_open() if you needed one.
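The naming rules themselves aren't spelled out here, so the following is only a sketch of the idea behind the /dev/graphics entry names described above: pack the PCI identity and location of a device into a name a human can decode. The struct, the function name, and the exact field order and widths are assumptions made for illustration, and the IDs in the usage note are just example values.

```c
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical subset of what a PCI scan reports for one device. */
typedef struct {
    uint16_t vendor_id;
    uint16_t device_id;
    uint8_t  bus;
    uint8_t  device;
    uint8_t  function;
} pci_location;

/* Builds a name (relative to /dev) of the form
 *   graphics/VVVV_DDDD_BBddff
 * where V = vendor ID, D = device ID, B = bus, d = device, f = function.
 * This layout is an assumption, not the documented rule.
 * Returns 0 on success, -1 if the buffer is too small. */
int make_device_name(const pci_location *loc, char *buf, size_t len)
{
    int n = snprintf(buf, len, "graphics/%04X_%04X_%02X%02X%02X",
                     (unsigned)loc->vendor_id, (unsigned)loc->device_id,
                     (unsigned)loc->bus, (unsigned)loc->device,
                     (unsigned)loc->function);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With vendor 0x10C8, device 0x0005, bus 0, device 1, function 0, this sketch produces graphics/10C8_0005_000100, which is the kind of string publish_devices() would hand to the OS.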

nm_control() is where most of the work during operation is performed. This function provides one standard ioctl() selector and three private ones. The standard selector required for all graphics device drivers is B_GET_ACCELERANT_SIGNATURE. During startup the app server scans and opens each entry in /dev/graphics and then calls ioctl() with B_GET_ACCELERANT_SIGNATURE for the opened device. Graphics devices should then return a string with the name of the accelerant for this device. The app server will load this accelerant and continue the initialization process through the accelerant.
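To make the signature handshake concrete, here is a minimal, portable sketch of a control function that answers the signature request. It is not the driver's actual code: the constants are stand-ins for values that really come from the BeOS headers, the accelerant file name is hypothetical, and the real hook also receives a per-open cookie that is left out here.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for B_GET_ACCELERANT_SIGNATURE and the BeOS status codes. */
enum { SKETCH_GET_ACCELERANT_SIGNATURE = 1 };
enum { SKETCH_OK = 0, SKETCH_DEV_INVALID_IOCTL = -1 };

/* Hypothetical accelerant file name. */
#define SKETCH_ACCELERANT_NAME "neomagic.accelerant"

/* Simplified control hook: copy the accelerant name into the caller's
 * buffer, or reject anything we don't understand. */
int sketch_control(uint32_t op, void *buf, size_t len)
{
    switch (op) {
    case SKETCH_GET_ACCELERANT_SIGNATURE:
        if (buf == NULL || len < sizeof(SKETCH_ACCELERANT_NAME))
            return SKETCH_DEV_INVALID_IOCTL;
        /* sizeof includes the terminating NUL. */
        memcpy(buf, SKETCH_ACCELERANT_NAME, sizeof(SKETCH_ACCELERANT_NAME));
        return SKETCH_OK;
    default:
        return SKETCH_DEV_INVALID_IOCTL;
    }
}
```

The caller (the app server, in the real system) then treats the returned string as the name of the add-on to load.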

The three private ioctl() selectors in our driver are actually quite standard and will probably be needed by any driver. The first returns the ID of the area that holds the card_info structure: since the structure is stored in an area, anyone who knows the ID can clone the area and share it between multiple applications. It's this mechanism that allows the accelerant to share data with the driver. Lastly, we need to be able to read and write VGA registers. These registers reside at the bottom of the legacy I/O address space (at locations such as 0x3d4). They can only be written to or read from by kernel software, so we provide two ioctl() selectors that allow arbitrary access for byte reads and writes to this area.
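The three private selectors might be dispatched along the following lines. This is a sketch under stated assumptions: the selector values, the two parameter structs, and the fixed area ID are all invented for illustration, and the register space is simulated with an array so the code runs outside the kernel.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical private selector values. */
enum { SKETCH_GET_GLOBALS = 100, SKETCH_READ_VGA = 101, SKETCH_WRITE_VGA = 102 };

/* Hypothetical parameter blocks passed through ioctl(). */
typedef struct { int32_t shared_area; } sketch_get_globals;
typedef struct { uint16_t port; uint8_t value; } sketch_vga_access;

static uint8_t fake_regs[0x400];     /* simulated register space */
static int32_t shared_area_id = 42;  /* pretend result of creating the area */

int sketch_private_control(uint32_t op, void *buf)
{
    if (buf == NULL)
        return -1;
    switch (op) {
    case SKETCH_GET_GLOBALS:
        /* Hand the area ID to the accelerant so it can clone the area. */
        ((sketch_get_globals *)buf)->shared_area = shared_area_id;
        return 0;
    case SKETCH_READ_VGA: {
        sketch_vga_access *a = (sketch_vga_access *)buf;
        if (a->port >= sizeof fake_regs)
            return -1;
        a->value = fake_regs[a->port];   /* a real driver reads the hardware here */
        return 0;
    }
    case SKETCH_WRITE_VGA: {
        sketch_vga_access *a = (sketch_vga_access *)buf;
        if (a->port >= sizeof fake_regs)
            return -1;
        fake_regs[a->port] = a->value;   /* a real driver writes the hardware here */
        return 0;
    }
    default:
        return -1;
    }
}
```

Keeping the register selectors this small is deliberate: the driver only moves single bytes, and all knowledge of what the registers mean stays in the accelerant.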

The remainder of the kernel driver is largely standard or even unimplemented. Reading and writing to the graphics device doesn't make as much sense as you might at first think. Other functions simply reverse the work of their partners (nm_open() and nm_close()). For more information on general driver "things," Todd Thomas's recent articles on USB drivers are a good source (Developers' Workshop: Writing a USB Video Camera Driver, Part 1 and Part 1.01).

And now on to the app server accelerant. An accelerant has one primary entry point - get_accelerant_hook() - which returns the addresses of other functions as requested by the app server.

The functions are in five groups:

Accelerant initialization and "cloning" - used mostly by the Game Kit for BWindowScreen.
Mode configuration - determining supported screen resolutions and depths, setting a given mode, handling the palette for 8-bit (256 color) modes, and handling Display Power Management System (DPMS).
Cursor management - setting the cursor shape and mask, and moving the location of the cursor onscreen.
Synchronization - reporting which of the app server's BLIT requests have completed.
2D Acceleration - carrying out BLIT requests to use hardware features (if available) to perform fast fills or copy areas of the screen.

At the early stage of accelerant development only the init function and mode configuration are mandatory in order to actually see the desktop on screen. If the clone functions are not implemented BWindowScreen will not work, and without an implementation of hardware cursor functions BDirectWindow will not work fully. Also, there is a significant performance penalty for not implementing the 2D acceleration features, since the system CPU will have to do work that could be off-loaded onto the graphics chip. But hey! You'll see "something"!
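The "return only what you implement" behaviour of the entry point can be sketched as a simple switch. Everything named here is a stand-in: the feature codes replace the real B_* constants from the BeOS headers, the two implemented functions are dummies, and the real get_accelerant_hook() returns a void * and takes a second data argument that this simplified version omits.

```c
#include <stdint.h>
#include <stddef.h>

/* Stand-in feature codes, one or two per group of functions. */
enum {
    SKETCH_INIT_ACCELERANT = 0,
    SKETCH_ACCELERANT_MODE_COUNT,
    SKETCH_MOVE_CURSOR,
    SKETCH_FILL_RECTANGLE
};

/* Generic function pointer; the caller casts it back to the real type. */
typedef void (*sketch_hook)(void);

/* Dummy implementations of the two mandatory pieces. */
static int      demo_init(int fd)     { return fd >= 0 ? 0 : -1; }
static uint32_t demo_mode_count(void) { return 3; }

/* Return the function for a feature, or NULL if it isn't implemented.
 * Cursor and 2D acceleration fall through to NULL at this stage. */
sketch_hook sketch_get_accelerant_hook(uint32_t feature)
{
    switch (feature) {
    case SKETCH_INIT_ACCELERANT:       return (sketch_hook)demo_init;
    case SKETCH_ACCELERANT_MODE_COUNT: return (sketch_hook)demo_mode_count;
    default:                           return NULL;
    }
}
```

A caller casts the result back before use, for example `uint32_t (*count)(void) = (uint32_t (*)(void))sketch_get_accelerant_hook(SKETCH_ACCELERANT_MODE_COUNT);`, and must treat NULL as "do it without hardware help".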

So let's plough on with the work. Our init() function uses the GETGLOBALS ioctl selector to retrieve the area_id of the Card Info structure the hardware driver set up. We call clone_area() with that area_id so that we have a shared area of memory that the driver and accelerant can communicate through. We then set some basic information in that structure, such as memory size, etc. We also build a list of available display modes that we can use later during the mode setting process.

Mode configuration can often seem the most complicated part of the process, and can also be the most frustrating, since problems at this stage will nearly always result in either no display at all or an unreadable display. The app server calls four functions to handle mode configuration. _get_accelerant_mode_count() finds how many different modes are available, and _get_mode_list() returns a complete list of all available modes. _propose_display_mode() is called when small adjustments have been made to a previously chosen mode. These adjustments are the sort of thing that would result from using the slider in the Screen Preferences panel to adjust the refresh rate, and the call can actually be ignored - as is the case in this driver. Finally, _set_display_mode() does all the real work; in this sample driver it ultimately calls SetupCRTC(). Let's walk through SetupCRTC() and look at what it does.
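The mode count and mode list calls are easiest to picture as two views of one table built during initialization. The sketch below assumes a much smaller mode structure than the real display_mode, an invented candidate table, and one plausible filtering rule (the framebuffer must hold a full screen); none of these come from the driver source.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical mode description. */
typedef struct { uint16_t width, height; uint8_t bits_per_pixel; } sketch_mode;

/* Invented candidate table. */
static const sketch_mode candidates[] = {
    { 640, 480,  8}, { 640, 480, 16},
    { 800, 600,  8}, { 800, 600, 16},
    {1024, 768,  8}, {1024, 768, 16}
};
#define MAX_MODES (sizeof candidates / sizeof candidates[0])

static sketch_mode mode_list[MAX_MODES];
static uint32_t    mode_count;

/* Called once from init: keep only modes whose framebuffer fits in memory. */
void sketch_build_mode_list(uint32_t framebuffer_bytes)
{
    mode_count = 0;
    for (size_t i = 0; i < MAX_MODES; i++) {
        uint32_t needed = (uint32_t)candidates[i].width * candidates[i].height
                        * (uint32_t)(candidates[i].bits_per_pixel / 8);
        if (needed <= framebuffer_bytes)
            mode_list[mode_count++] = candidates[i];
    }
}

/* First call: how many entries the caller must make room for. */
uint32_t sketch_get_mode_count(void) { return mode_count; }

/* Second call: copy the list into the caller's buffer. */
void sketch_get_mode_list(sketch_mode *out)
{
    memcpy(out, mode_list, mode_count * sizeof *out);
}
```

The two-step shape matters: the caller asks for the count first so that it can allocate a buffer of the right size before asking for the list.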

Much current graphics chip programming still carries lots of legacy from VGA (and even earlier) display standards. All of our clock values (particularly the value for horizontal timing) need to be converted from "pixel" values to "character" values; this means dividing by 8, since characters are 8 pixels wide. For simplicity, we'll also extract the values for vertical timing into local variables, to make the code a little easier to read.

In order to have the electron beam of a CRT paint an image, the beam (a pretty analog device) needs a certain amount of setup time before and after the actual drawing area. This could be considered to be when the beam is scanning in space outside the edge of the picture tube. There is also a certain amount of time required for the beam to retrace from the right edge of the screen to the left edge, and from the bottom to the top. So hsyncstart and hsyncend correspond to "edges" of the area. hdisp is the actual time the beam is "painting" visible data, and htotal is the total time required to draw one line and retrace back to the beginning of the next.
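The pixel-to-character conversion described above amounts to a handful of divisions. The sketch below uses hypothetical struct and function names, and it only performs the division by 8 and a basic ordering check; any further per-register adjustment the hardware needs is not shown.

```c
#include <stdint.h>

/* Hypothetical timing description: horizontal values in pixels,
 * vertical values in lines. */
typedef struct {
    uint16_t h_display, h_sync_start, h_sync_end, h_total;
    uint16_t v_display, v_sync_start, v_sync_end, v_total;
} sketch_timing;

/* Horizontal values expressed in 8-pixel "characters". */
typedef struct { uint16_t hdisp, hsyncstart, hsyncend, htotal; } sketch_h_chars;

/* Visible area first, then sync start, sync end, and finally the full line. */
int sketch_h_timing_is_sane(const sketch_timing *t)
{
    return t->h_display    <= t->h_sync_start &&
           t->h_sync_start <  t->h_sync_end   &&
           t->h_sync_end   <= t->h_total;
}

/* Convert pixel counts to character counts: characters are 8 pixels wide. */
sketch_h_chars sketch_to_chars(const sketch_timing *t)
{
    sketch_h_chars c;
    c.hdisp      = (uint16_t)(t->h_display    / 8);
    c.hsyncstart = (uint16_t)(t->h_sync_start / 8);
    c.hsyncend   = (uint16_t)(t->h_sync_end   / 8);
    c.htotal     = (uint16_t)(t->h_total      / 8);
    return c;
}
```

For a line with 800 visible pixels, sync from pixel 840 to 968, and 1056 pixels in total, the character values come out as 100, 105, 121, and 132. Vertical values stay in lines and need no conversion.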

Driving all of this is a clock that needs to be correctly programmed to give the appropriate pulse rate to drive the whole analog system. Fortunately for us, in this example we're driving an LCD screen directly (as opposed to driving an external LCD panel through a standard VGA connector), so programming the clock is unnecessary (we'll cover it in another article dealing with simultaneous display on the LCD and an external monitor).

The code should be self-explanatory but there are some points worth noting: The Neomagic chip uses the basic standard VGA registers and adds some extensions. VGA registers are programmed by writing an index value to a particular location and then writing the 1-byte data value to the location after the index. After the data has been written the index is assumed to have incremented by one, so writing to the data location a second time will write to the next register in the table. There are four main groups of registers. The attribute and sequence registers generally contain the standard values seen in this sample. The CRT controller and Extension registers need to be programmed on a per mode basis in many cases, and it is these registers that are often extended for additional features (as can be seen in the Neomagic chip).
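The index-then-data access pattern can be shown with a small simulation. The port numbers 0x3D4 and 0x3D5 are the standard VGA CRT controller index and data ports; everything else (the fake port functions, the register array, the helper names) is invented so the sketch runs without hardware. This version writes the index explicitly before every data byte rather than relying on any automatic increment.

```c
#include <stdint.h>
#include <stddef.h>

#define CRTC_INDEX 0x3D4   /* CRT controller index port */
#define CRTC_DATA  0x3D5   /* CRT controller data port (index + 1) */

/* Simulated hardware: a register file selected through the index port. */
static uint8_t crtc_regs[256];
static uint8_t crtc_selected;

static void fake_outb(uint16_t port, uint8_t value)
{
    if (port == CRTC_INDEX)
        crtc_selected = value;             /* choose a register */
    else if (port == CRTC_DATA)
        crtc_regs[crtc_selected] = value;  /* write the chosen register */
}

static uint8_t fake_inb(uint16_t port)
{
    return (port == CRTC_DATA) ? crtc_regs[crtc_selected] : crtc_selected;
}

/* Write one indexed register: index first, then the data byte. */
void crtc_write(uint8_t index, uint8_t value)
{
    fake_outb(CRTC_INDEX, index);
    fake_outb(CRTC_DATA, value);
}

/* Read one indexed register. */
uint8_t crtc_read(uint8_t index)
{
    fake_outb(CRTC_INDEX, index);
    return fake_inb(CRTC_DATA);
}

/* Program a run of consecutive registers from a per-mode table. */
void crtc_write_table(uint8_t first_index, const uint8_t *values, size_t count)
{
    for (size_t i = 0; i < count; i++)
        crtc_write((uint8_t)(first_index + i), values[i]);
}
```

In an accelerant, fake_outb() and fake_inb() would be replaced by calls into the kernel driver's register read and write selectors, since only kernel code can touch the real ports.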

Finally, there are the DAC and palette registers. One section of code in the driver appears to read and discard a value from the DAC register four times before writing 0 to the register. This apparent waste of reads is necessary to "uncover" a particular DAC register before writing a value to it. The palette of the Neomagic chips is 6 bits each for Red, Green, and Blue (except in 24bpp mode when it's 8 bits). The size of the per "gun" values of a palette varies from chip to chip; however if your image seems too dark, washed out, or has a particular colour cast to it, you're probably not shifting the palette values appropriately before writing to them.
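The palette shifting mentioned above is a one-line operation, sketched here with a hypothetical helper name. It assumes the incoming colour components are 8 bits wide and simply drops the low bits the DAC cannot hold.

```c
#include <assert.h>
#include <stdint.h>

/* Scale an 8-bit colour component down to the width of the DAC.
 * dac_bits is 6 for a classic VGA-style palette and 8 when the DAC
 * takes full-width values. */
uint8_t sketch_scale_to_dac(uint8_t value8, unsigned dac_bits)
{
    assert(dac_bits >= 1 && dac_bits <= 8);
    return (uint8_t)(value8 >> (8 - dac_bits));
}
```

With a 6-bit DAC, full intensity (255) becomes 63. Writing the unshifted 255 instead would leave the hardware to ignore or wrap the upper bits, which is one way to end up with the wrong brightness or colour cast described above.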

After talking extensively about the indexed style of VGA registers, it's worth noting that some vendors have switched to more "regular" memory-mapped registers for their PCI/AGP video controllers (thankfully). 3Dfx is one good example of this, and is a company which has published its register specifications.

Our example driver is at this point functional: it displays a picture, allows you to choose different display modes, and even supports DPMS. However, it's not fast, and doesn't have a hardware cursor. Those are both topics for a future article in the next couple of weeks.