Introduction to the (Unnamed) Full Text Searching and Indexing Tool

Blog post by GeneralMaximus on Sun, 2009-05-17 13:29

Hi, I'm Ankur Sethi. I'm a first-year Information Technology student at Indraprastha University, New Delhi. I will be working on a full-text indexing and search application for Haiku Code Drive 2009.

I use Mac OS X as my primary OS. Before I switched to the Mac, I had been an Ubuntu user for four solid years. I first read about Haiku on OSNews back in 2007 (my profile says my account is 1 year 36 weeks old), and I was hooked. What first caught my attention was the incredibly short boot time and the low resource usage. When I read more about what Haiku is like under the hood, this is what I thought: WANT (excuse the meme). I'm waiting for the day I can just pop a Haiku install disk into my PC and use Haiku as my primary OS.

About The Project

The objective is to build an application that can be brought up with a simple keystroke and used to navigate to important documents and applications quickly, thus reducing mouse usage.

Here are my project goals, exactly as I listed them in my GSoC application:

  • Implement a full-text indexing tool which uses a database for indexing files containing textual content, and filesystem attributes for non-text files (MP3s, images, etc.).
  • Create a plugin-based architecture that allows the indexer to index different kinds of textual content (i.e., PDFs, ODFs etc.).
  • Implement a userland process to keep the index in sync as the files change on disk.
  • Create a mechanism to query the index and implement an algorithm to sort search results by relevance.
  • Create a GUI front end for querying the index.
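To make the indexing and relevance goals a bit more concrete, here is a minimal sketch in Python of an inverted index with pluggable text extractors and a simple TF-IDF ranking. This is only an illustration of the general technique, not the project's actual design: the `EXTRACTORS` registry, the `Index` class, and the scoring formula are all assumptions made for the example, and a real indexer would persist its data in a database rather than in memory.

```python
import math
import os
import re
from collections import Counter

# Hypothetical plugin registry: file extension -> function that turns raw
# content into plain text. Real PDF or ODF plugins would be registered here.
EXTRACTORS = {
    ".txt": lambda raw: raw,
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class Index:
    """In-memory inverted index (assumed design, for illustration only)."""

    def __init__(self):
        self.postings = {}  # term -> {path: term frequency}
        self.docs = set()   # paths currently in the index

    def remove(self, path):
        """Drop every posting for a path, e.g. before re-indexing it."""
        for term, hits in list(self.postings.items()):
            hits.pop(path, None)
            if not hits:
                del self.postings[term]
        self.docs.discard(path)

    def add(self, path, raw):
        """Index a file; return False if no extractor handles its type."""
        extractor = EXTRACTORS.get(os.path.splitext(path)[1].lower())
        if extractor is None:
            return False
        self.remove(path)  # keeps the index in sync when a file changes
        for term, tf in Counter(tokenize(extractor(raw))).items():
            self.postings.setdefault(term, {})[path] = tf
        self.docs.add(path)
        return True

    def search(self, query):
        """Return matching paths, highest TF-IDF score first."""
        scores = Counter()
        for term in tokenize(query):
            hits = self.postings.get(term, {})
            if not hits:
                continue
            # Terms that appear in fewer documents get a higher weight.
            idf = math.log(1 + len(self.docs) / len(hits))
            for path, tf in hits.items():
                scores[path] += tf * idf
        return [path for path, _ in scores.most_common()]


index = Index()
index.add("a.txt", "haiku haiku boot time")
index.add("b.txt", "boot disk")
print(index.search("boot haiku"))
```

Calling `add` again for a path that is already indexed replaces its old postings, which is roughly what a background sync process would do when it notices a file has changed.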

I will keep the Haiku world updated on what I'm doing through this blog. Looking forward to a fun summer :)

EDIT: Changed title. Names, anyone?

Comments

Re: Introduction to the (Unnamed) Full Text Searching and ...

Good Luck Ankur.

Name for tool: Lighthouse ;)

Re: Introduction to the (Unnamed) Full Text Searching and ...

What about 'fetch' or 'fetcher'? BTW, Lighthouse sounds really good, I can already see the icon ;)
Good work Ankur !

Re: Introduction to the (Unnamed) Full Text Searching and ...

A name idea: "Wheriku"? From Where and Haiku...
I know, this is very bad :)

Re: Introduction to the (Unnamed) Full Text Searching and ...

Maybe 'Scope'?

Re: Introduction to the (Unnamed) Full Text Searching and ...

Maybe it doesn't need a "product" name. Or a name at all... a name implies it's a separate program rather than part of the whole.

Something like File Detective would suffice.

Re: Introduction to the (Unnamed) Full Text Searching and ...

Oracle perhaps or File-O-Dex XD

Re: Introduction to the (Unnamed) Full Text Searching and ...

You could pay homage to Haiku's ancestor (BeOS) by adopting a name like "ZooKeeper", which was the original name of BeOS's file database system (until Dominic Giampaolo created BeFS, and later Spotlight at Apple).

Re: Introduction to the (Unnamed) Full Text Searching and ...

"mole", or "data mole". Let it burrow deep down into data.

A variant could be "mobot", a robot mole that works restlessly. Would anyone like to make a Haiku-style icon for it?

Re: Introduction to the (Unnamed) Full Text Searching and ...

Byte Extractor (for the most obvious, and worst, acronym).

Or, just plain Diggger.

Re: Introduction to the (Unnamed) Full Text Searching and ...

Full text search + Indexing Tool? DO WANT! :) I've been waiting for this for a long time, especially since I've seen the progress of Spotlight on OS X and Beagle on Linux. Wish you good luck, sir! :)

By the way, this cool new thing could fit very nicely into Haiku's 'old-fashioned' Find window.

Re: Introduction to the (Unnamed) Full Text Searching and ...

All those names sound great :)

I will pick something once I actually start working on the project (from May 29, since finals end on May 28).

Re: Introduction to the (Unnamed) Full Text Searching and ...

Why not call the indexing and searching tool simply "Index"?

I know that might sound a little bit 'obvious', but it makes it clear what it actually does, it's short and snappy, and it fits in with the whole Haiku/word theme (i.e. think the index of a book).

Re: Introduction to the (Unnamed) Full Text Searching and ...

Ooh, an Indian on the Haiku project!! (I am one too)

As for the name: it's for keeping track of objects, thus
Lighthouse (as suggested) or Beacon

Re: Introduction to the (Unnamed) Full Text Searching and ...

Sounds a lot like the SkyOS index feeder :)

It's indeed very nice once you get used to it!

(I can give you a little more info on the SkyOS index feeder if you want.)

Meshing with queries

How will this new indexing system fit in with the old, cool, BeOS query system? Certainly they're related, no?

Re: Meshing with queries

They are related, but if I remember correctly from when I was working with the BeFS code, the query system you mention is built into the file system. This allows querying on file attributes.

The indexing system itself would be a system with knowledge of certain file types to be able to scan (part of) the file content. Based on the ability to read the actual data, it can index certain keywords or extra information like author that may not be available in the filesystem attributes.

So with this system, launching a query should launch both a query on the FS and on the index database.

Since it is similar to the SkyOS index feeder, allow me to post this link that has an example: http://www.skyos.org/?q=node/532

So as long as the client interface to the query system is transparent for both systems, all is fine :)
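As a rough illustration of that idea, here is a small Python sketch in which one client call fans out to both an attribute query and a full-text index and merges the results. Everything in it is a stand-in invented for the example: the dictionaries, the function names, and the sample attribute name are assumptions, not the BeFS query API or the SkyOS index feeder interface.

```python
def attribute_query(attrs_by_path, name, value):
    """Stand-in for a filesystem attribute query: exact match on one attribute."""
    return [path for path, attrs in attrs_by_path.items()
            if attrs.get(name) == value]


def fulltext_query(text_index, term):
    """Stand-in for the content index: look up one term, case-insensitively."""
    return sorted(text_index.get(term.lower(), set()))


def combined_query(attrs_by_path, text_index, name, term):
    """Run both queries and merge them, dropping duplicate paths."""
    seen = set()
    merged = []
    hits = (attribute_query(attrs_by_path, name, term)
            + fulltext_query(text_index, term))
    for path in hits:
        if path not in seen:
            seen.add(path)
            merged.append(path)
    return merged


# Hypothetical sample data: one file matched by attribute, two by content.
attrs = {
    "/music/song.mp3": {"Audio:Artist": "Mole"},
    "/docs/notes.txt": {},
}
text_index = {"mole": {"/docs/notes.txt", "/docs/bio.pdf"}}

print(combined_query(attrs, text_index, "Audio:Artist", "Mole"))
```

The caller only sees `combined_query`, so it does not need to know which of the two systems produced a given hit.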

Re: Introduction to the (Unnamed) Full Text Searching and ...

I like Phil Costin's proposal, but I would simplify it to just "Detective".