Against Directories

Let me start by saying that I am not "against" directories; it is more that I am "rethinking" them. We have been talking on the Glass Elevator list for some time now about queries and how to make them more useful. There is a pretty broad consensus that we should bring more use of queries into R2 and beyond, and that we should help end users use them far more than R5 does. While not a "killer app", queries can be, at the very least, an "attacker app": one of those little features that you really miss when you move to other operating systems. After some serious consideration of what I know about how people use computers, it seems to me that, in many ways, directories may not make sense anymore. Beginning users struggle with them. Many people just don't use them--they fill their desktop with all of their files, or deposit everything into a large, flat "My Documents" folder. Advanced users become frustrated with them because they are forced into only one organizational method. They can't easily search for files except along the lines by which they first organized them.

Take a graphic designer, for example, who organizes her files by client. If Client 1 wants a piece of stock clip art, she has three choices. The first is to put her only copy of that clip art in that client's folder, which makes it difficult to find again. The second is to keep it in a "stock" folder. This defeats her purposes in a few ways. First, the client's folder is no longer "pick it up and go"--she needs to include some of the stock folder as well. Second, she needs to remember which clip art each customer needs. Third, it defeats the "last directory used" behavior that most file requesters have: going automatically to the last folder that the user selected. The graphic designer is now forced to "ping pong" between two or more directories to complete her work.

Links were invented for exactly these reasons, and they are her third option. There are two varieties of links: hard and soft. Soft links are much like shortcuts in Windows: they contain a path to the "real" file. A hard link is a directory entry that points to the same data as another directory entry. Each has its own issues and difficulties. Links are "just different enough" that users need to be aware of them and consider them. For example, if our graphic designer wants to copy files to a CD, she may well need to tell the burning software to include the original file rather than the soft link. Hard links cannot cross volumes: you can't make a hard link on Disk 2 to a file on Disk 1. None of these options, though, makes it feasible to ask questions like "which customers use clip art ABC?"
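
To make the contrast concrete, here is a minimal Python sketch of the designer's files organized by attribute instead of by folder. It assumes a hypothetical in-memory store in which each file carries a multi-valued `client` attribute; the file names and the helpers `files_for_client` and `clients_using` are made up for illustration and are not part of any BeOS API. One copy of the clip art serves both clients, and the reverse question becomes a simple lookup.

```python
# Hypothetical in-memory model: file name -> attributes.
# "client" is multi-valued (a set), so a single copy of the clip art
# can belong to several clients without any links.
files = {
    "abc_clipart.png": {"client": {"Client 1", "Client 3"}},
    "client1_brochure.psd": {"client": {"Client 1"}},
    "client3_logo.svg": {"client": {"Client 3"}},
    "xyz_clipart.png": {"client": {"Client 2"}},
}

def files_for_client(client):
    """The 'pick it up and go' view for one client, stock art included."""
    return sorted(name for name, attrs in files.items() if client in attrs["client"])

def clients_using(name):
    """Answer 'which customers use clip art ABC?' by reading the attribute."""
    return sorted(files[name]["client"])

print(files_for_client("Client 1"))     # ['abc_clipart.png', 'client1_brochure.psd']
print(clients_using("abc_clipart.png"))  # ['Client 1', 'Client 3']
```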

So directories have problems. Our discussion, though, started with all of the "cool things" we could do with queries. Add a tab to the file requester that lets you run a query on the fly for the file you are looking for, and you have an easier system. We considered what metadata should exist by default. The system provides filename, creation date, modification date, MIME type, file size, and so on. Additional possibilities include resolution (for images), MP3 tags, document type (for text files), and anything else that the system can reliably autogenerate. Users could add their own attributes to files to make things easier to find. The core problem is that this creates more work for the end user: someone has to assign the values. Then it hit me--if we convince end users to replace directories with attributes, we decrease their work and increase the system's usefulness. Assigning attributes would take the place of filing each document into a folder rather than being extra work on top of it, and the result could be found along any of its attributes instead of along a single path.
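
As a rough illustration of "anything the system can reliably autogenerate", the sketch below derives a few attributes from a file using only Python's standard library (`os.stat` and `mimetypes.guess_type`) and merges them with user-supplied ones. The attribute names and the helpers `system_attributes` and `all_attributes` are assumptions for this example, not BFS's actual attribute names; creation date is left out because `os.stat` does not report it portably.

```python
import mimetypes
import os
import tempfile
from datetime import datetime, timezone

def system_attributes(path):
    """Attributes the system can derive on its own (illustrative names)."""
    st = os.stat(path)
    mime, _encoding = mimetypes.guess_type(path)
    return {
        "name": os.path.basename(path),
        "size": st.st_size,
        "last_mod_date": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc),
        "mime_type": mime or "application/octet-stream",
    }

def all_attributes(path, user_attrs):
    """Merge user-assigned attributes with system ones.
    System values are applied last, so a user value cannot overwrite them."""
    return {**user_attrs, **system_attributes(path)}

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "memo.txt")
        with open(path, "w") as f:
            f.write("hello")
        attrs = all_attributes(path, {"project": "obos-editorials"})
        print(attrs["size"], attrs["mime_type"], attrs["project"])
```

Letting system values win over user values is one design choice among several; it keeps the autogenerated attributes trustworthy while still letting users add their own.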

This proposal met with some resistance, because directories have been around a very long time and have served people's organizational needs well. I would like to discuss some of the objections here and respond to them.

The first objection is stability: directories are more stable and queries are more dynamic, because directories only change on command. I would respond that queries change in the same way as directories--when you want them to. If you choose a very dynamic parameter for your query (file size, for example, or last_mod_date), you will get query folders whose contents seem random. But that also makes sense--if you told a secretary that you want all of the newest memos on your desk at all times, you wouldn't expect the older ones to remain.
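
The secretary analogy can be sketched as a query whose result depends on when you run it. This is a hypothetical Python model with made-up memo names and dates, and `newest_memos` is an illustrative helper only. Running the same query a few days later drops the older memo, just as the secretary would.

```python
from datetime import datetime, timedelta

# Hypothetical memos with made-up last_mod_date values.
memos = {
    "budget.txt": datetime(2002, 3, 1),
    "hiring.txt": datetime(2002, 3, 10),
    "offsite.txt": datetime(2002, 3, 14),
}

def newest_memos(now, days=7):
    """A 'live' query: memos modified within `days` days before `now`."""
    cutoff = now - timedelta(days=days)
    return sorted(name for name, mod in memos.items() if mod > cutoff)

print(newest_memos(datetime(2002, 3, 15)))  # ['hiring.txt', 'offsite.txt']
print(newest_memos(datetime(2002, 3, 20)))  # ['offsite.txt']
```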

Another objection is identity: directories provide one and only one clear path to each file. That is mostly true, but to be honest, I don't see a lot of value in it. In fact, with hard links, it is not true at all. BeOS doesn't have them, but the Unixes that I have used do. It doesn't disturb me all that much when I see that there are a bunch of hard links to a file, that there is more than one way to get there.

The notion of hierarchy is brought up: the hierarchy of files that directories provide brings value of its own. Related to this is the claim that queries are "opt-out" instead of "opt-in": a query returns every matching file unless you narrow it, while a directory holds only what you put there. Much of this depends on how you structure your queries. Sure, if you say "type=text" and nothing more, you get a lot of files. This is not radically different, though, from going to "My Documents" and running "ls -R", recursively listing everything in your documents. If you instead say "type=text and project=obos-editorials", this is a lot like running "ls -R 'My Documents/obos-editorials'". In a similar vein, people point to the ability to divide the hierarchical tree into subtrees, so that you can, for example, crawl the whole tree in pieces. This is done more powerfully with queries, though. Pick an attribute, divide up its possible values any way that you like, and run separate queries. So long as each file has only one value for that attribute, this works well: every file lands in exactly one piece.
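
The sketch below models the two equivalences from this paragraph in Python: a conjunctive attribute query returning the same files as a recursive listing of a subdirectory, and partitioning a collection by the values of one attribute. The paths, attributes, and the helpers `query`, `ls_r`, and `partition_by` are hypothetical; `ls_r` only imitates what `ls -R` would show in the directory view.

```python
from collections import defaultdict

# Hypothetical files: a path (as a directory tree would store it) plus attributes.
files = [
    {"path": "My Documents/obos-editorials/directories.txt", "type": "text", "project": "obos-editorials"},
    {"path": "My Documents/obos-editorials/queries.txt", "type": "text", "project": "obos-editorials"},
    {"path": "My Documents/taxes/2001.txt", "type": "text", "project": "taxes"},
    {"path": "My Documents/photos/cat.jpg", "type": "image", "project": "photos"},
]

def query(**conditions):
    """AND together attribute=value tests, e.g. query(type="text", project="taxes")."""
    return sorted(f["path"] for f in files
                  if all(f.get(k) == v for k, v in conditions.items()))

def ls_r(prefix):
    """Imitates 'ls -R prefix': every file below that directory."""
    return sorted(f["path"] for f in files if f["path"].startswith(prefix + "/"))

def partition_by(attr):
    """Split the collection into pieces, one per value of a single-valued attribute."""
    pieces = defaultdict(list)
    for f in files:
        pieces[f[attr]].append(f["path"])
    return dict(pieces)

print(query(type="text", project="obos-editorials") == ls_r("My Documents/obos-editorials"))  # True
```

Because every file here has exactly one `project` value, the pieces from `partition_by` are disjoint and together cover every file. With a multi-valued attribute, a file would show up in more than one piece.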

Queries make "drag and drop organization" a little more of a challenge. For example, if you organized your music directories by genre and then artist, you can assign an artist's music to a genre by dragging and dropping it into the genre folder of your choice. That is a little tougher with queries--you would have to assign attributes. For a simple query (genre=blues) this is OK, but for (genre=blues or genre=jazz) it is harder, because dropping a file onto that query doesn't say which value to assign. We discussed various possibilities for a GUI to support this. The issue comes from the extra power of queries: you can make queries that you can't make with folders. One way to solve this problem is the same way that file permissions are handled in Windows: select a bunch of files, assign a set of attributes, and click OK.
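
One way to picture the "select, assign, click OK" approach is a bulk assignment over a multi-valued attribute. In this hypothetical Python model, `genre` is a set on each track, and `assign` and `genre_query` are illustrative helpers, not an existing API. Assigning either value satisfies the (genre=blues or genre=jazz) query, which is exactly why dropping a file onto such a query is ambiguous.

```python
# Hypothetical music library: "genre" is a set, so a track can carry several.
library = {
    "so_what.mp3": {"artist": "Miles Davis", "genre": set()},
    "all_blues.mp3": {"artist": "Miles Davis", "genre": set()},
    "hoochie_coochie_man.mp3": {"artist": "Muddy Waters", "genre": set()},
}

def assign(selection, attr, value):
    """Like multi-selecting files and applying one setting:
    add `value` to the multi-valued attribute `attr` on every selected file."""
    for name in selection:
        library[name][attr].add(value)

def genre_query(*genres):
    """(genre=g1 or genre=g2 ...): files tagged with any of the listed genres."""
    wanted = set(genres)
    return sorted(name for name, attrs in library.items() if attrs["genre"] & wanted)

# The drag-and-drop step becomes: select the Miles Davis tracks, apply genre=jazz.
assign([n for n, a in library.items() if a["artist"] == "Miles Davis"], "genre", "jazz")
assign(["all_blues.mp3", "hoochie_coochie_man.mp3"], "genre", "blues")

print(genre_query("blues", "jazz"))  # ['all_blues.mp3', 'hoochie_coochie_man.mp3', 'so_what.mp3']
```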

Temporarily collecting files with queries is different from doing so with directories. With a directory, I can drag and drop files into a temp directory, then copy the whole thing to a CD. With queries, you would need to assign the files a common attribute that does not get burned to the CD and is removed as the files are burned. This is a bit of a shuffle. It does have an advantage, though: the files stay where they are and still match their original queries. Backups are a good example: you could create a "backed_up" attribute of type datetime. Finding the files that need a new backup would be an easy query: "last_mod_date>backed_up". While the backup application is running, other applications could still access those files.
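
The backup idea translates directly into a query plus an attribute update. This is a Python sketch under stated assumptions: the attribute store, file names, and dates are invented, a missing `backed_up` attribute is treated as "never backed up", and the actual copying is left out. The files are never moved, so any other query that matched them before still matches them afterward.

```python
from datetime import datetime

# Hypothetical attribute store: file name -> attributes.
files = {
    "a.txt": {"last_mod_date": datetime(2002, 1, 5), "backed_up": datetime(2002, 1, 1)},
    "b.txt": {"last_mod_date": datetime(2002, 1, 2), "backed_up": datetime(2002, 1, 3)},
    "c.txt": {"last_mod_date": datetime(2002, 1, 4)},  # never backed up
}

NEVER = datetime.min  # stands in for a missing backed_up attribute

def needs_backup():
    """The query last_mod_date > backed_up."""
    return sorted(name for name, attrs in files.items()
                  if attrs["last_mod_date"] > attrs.get("backed_up", NEVER))

def run_backup(now):
    """Back up every matching file, then stamp its backed_up attribute."""
    done = needs_backup()
    for name in done:
        # Copying the file to the backup medium would happen here (omitted).
        files[name]["backed_up"] = now
    return done
```

After `run_backup` stamps the files, `needs_backup()` comes back empty until something is modified again.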

Finally, dealing with other filesystems (non-BFS) is more complex without directories. The whole directory API must exist to support interfaces with ext, ntfs, iso9660, udf, fat, FTP and others. Some method of manipulating directories must continue to exist as long as we must coexist. I understand and accept that. That doesn't mean that we need to allow others to dictate our future.

Queries have a vast number of advantages over one-dimensional organizational schemes. They let users ask questions in the way that makes sense to them. We have all seen classic Star Trek, in which they use the computer like a librarian--"Show me the files on Midas 5". That is the sort of power that a well-built query system can bring to OBOS: the ability to work the way that you think at the moment, not the way that you (or a developer) thought some time in the past.