hcd2009

Full Text Indexing: Search UI

Blog post by GeneralMaximus on Fri, 2009-07-31 15:34

So far, I have been working on the indexing part of Beacon, which is nearly complete. In the coming weeks, I will be running beacond (now index_server) as a service in the background so that I can find and squash whatever bugs remain in the code. For now, index_server is blazing fast, but that might be because the indexes I test against are just a few megabytes in size. From what I hear, though, CLucene can easily handle indexes which are several gigabytes in size without blinking an eye, so speed might not be an issue as the indexes grow. Anyway, any potential performance bottlenecks will only show themselves once people start using index_server regularly.

And now for the good part: I have a basic search UI for Beacon up and running. For now, it can only perform simple keyword searches. In the future, I would love to integrate full text search into the Tracker "Find" UI, but for now I'm concentrating on improving this simple search tool.

Here is a screenshot:

[Image: screenshot3]

/Files/pg/ is the directory where I'm keeping about 600MB of Project Gutenberg texts for stressing out index_server.

The next step is, of course, writing DataTranslators for a few file formats (as I've said before, PDF is top priority) and writing a simple preferences UI. I hope to post here soon with more good news :)

PS: Anybody interested in learning the CLucene query syntax can take a look at this page (although, in the future, Beacon will take a BeOS query and convert it into a CLucene query transparently).
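
For readers who want a feel for what a simple keyword search might look like as a query string, here is a minimal sketch in Python. It is not Beacon's code. It assumes the classic Lucene query syntax (terms joined with AND, special characters escaped with a backslash), which CLucene is modelled on. The function names and the exact set of escaped characters are illustrative assumptions, so check them against the CLucene query parser before relying on them.

```python
# Characters that carry special meaning in the classic Lucene query syntax.
# Assumption: CLucene's query parser treats the same characters as special.
LUCENE_SPECIAL = set('+-!(){}[]^"~*?:\\&|')


def escape_term(term):
    """Backslash-escape every special character in a single search term."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in term)


def keywords_to_query(keywords):
    """Turn whitespace-separated keywords into an AND query string.

    Hypothetical helper: every keyword must match for a document to be a hit.
    """
    terms = [escape_term(word) for word in keywords.split()]
    return " AND ".join(terms)


if __name__ == "__main__":
    print(keywords_to_query("moby dick"))
    print(keywords_to_query("c++ tutorial"))
```

Joining with AND is only one possible choice. A search tool could just as well join terms with OR and leave the ranking to the search library.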

Full Text Indexing: Status Update

Blog post by GeneralMaximus on Tue, 2009-06-30 13:18

After more than a week of thinking, "Today is the day I'll write that blog post", here I am with a status update on my HCD2009 project. I have only a few more points to add to what Matt has already posted here.

First of all, the previously unnamed full text indexing and search tool now has a name: Beacon. The indexing daemon currently in the works is called beacond. This is what beacond can do right now:

  • Monitor files for changes and add new/modified files to the index. Only plain text files are supported for now.
  • Handle mounting/unmounting of BFS volumes. Start watching volumes when they are mounted, and stop watching them when they are unmounted.
  • Selectively exclude certain folders from being indexed.
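
The folder exclusion in the last point can be illustrated with a small Python sketch. This is not how beacond does it. The function names and the example paths are hypothetical, and the sketch only assumes that the daemon receives a list of changed file paths and a list of excluded folders.

```python
from pathlib import PurePosixPath


def is_excluded(path, excluded_dirs):
    """Return True if `path` is an excluded folder or lies inside one."""
    candidate = PurePosixPath(path)
    for folder in excluded_dirs:
        folder = PurePosixPath(folder)
        # Compare whole path components, so "/a/private2" is not treated
        # as being inside "/a/private".
        if candidate == folder or folder in candidate.parents:
            return True
    return False


def files_to_index(changed_paths, excluded_dirs):
    """Keep only the changed files that are not under an excluded folder."""
    return [p for p in changed_paths if not is_excluded(p, excluded_dirs)]


if __name__ == "__main__":
    changed = [
        "/boot/home/docs/report.txt",
        "/boot/home/private/diary.txt",
    ]
    print(files_to_index(changed, ["/boot/home/private"]))
```

Comparing path components rather than raw string prefixes avoids accidentally excluding sibling folders whose names merely start with the same characters.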

Right now, I'm mostly concerned with polishing beacond. A few short term goals are:

  • Reduce memory usage. Currently, beacond eats up about 60MB of memory, which is way too much for what it does.
  • Perform the actual indexing operation in a separate thread. This is required so that the daemon does not become unresponsive during long indexing operations.
  • Write a small tool which can search the index created by beacond (for demonstration and testing purposes only).
  • Several minor tweaks (properly saving/loading settings, better build system etc.).
  • Write a few DataTranslators so that beacond can be tested with different kinds of files. PDF is top priority.
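
The idea behind moving indexing into its own thread can be sketched as follows. This is a minimal Python illustration of the general worker-thread pattern, not beacond's implementation. The class name IndexWorker and the index_file callback are hypothetical.

```python
import queue
import threading


class IndexWorker:
    """Runs indexing jobs on a background thread so the caller never blocks."""

    def __init__(self, index_file):
        # `index_file` is a hypothetical callable that indexes one path.
        self._index_file = index_file
        self._pending = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, path):
        """Queue a path for indexing and return immediately."""
        self._pending.put(path)

    def stop(self):
        """Finish the jobs queued so far, then shut the worker down."""
        self._pending.put(None)  # sentinel: tells the worker loop to exit
        self._thread.join()

    def _run(self):
        while True:
            path = self._pending.get()
            if path is None:
                break
            self._index_file(path)


if __name__ == "__main__":
    worker = IndexWorker(lambda path: print("indexing", path))
    worker.submit("/Files/pg/example.txt")
    worker.stop()
```

The thread that watches for file changes only calls submit(), which returns at once, so a long indexing run cannot make it unresponsive.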

In the long run, my major goals will be (1) seamlessly integrating Beacon with the existing Find tool in Haiku and (2) supporting more file types. But for now, the focus is on getting the daemon right.

If anybody wishes to check Beacon out, here is the project homepage (hosted on Google Code).

Update on the Web Services Kit and the Haiku Code Drive

Blog post by AntiRush on Fri, 2009-05-29 02:40

As you may have read recently, I've had to withdraw from the code drive this summer. Luckily, another student has stepped up to take over my spot. I hope his project is a success and that I'll be able to jump back in later in the summer.

I've had to reorganize my priorities because I was accepted into an REU program in which I'll do graduate-style research for the summer. I'm looking to attend graduate school, so this was an opportunity I could not pass up. Thankfully, Matt Madia was quite understanding and was even able to locate another student to take over from me.

I will be doing research at the University of Wisconsin until the end of July. The project I'm working on is a drive to collect, create, and evaluate the effectiveness of visualizations of algorithms. The resulting collection will be used across many universities in their CS programs.

With that said, I'm pretty much without spare time for the next few weeks. I've got an eye on the Haiku project though, and I'll be back and contributing when I'm able. Thanks for the opportunity and good luck to everyone this summer!

Implement BFS over FUSE

Blog post by raghuram87 on Thu, 2009-05-28 09:07

I am a BTech 4th year student at the Indian Institute of Technology Madras, Chennai, India.

I will be working on implementing a FUSE-based filesystem for BFS so that BFS partitions can be mounted natively in Linux and other POSIX operating systems.

I enjoy building systems like these where the final outcome is really interesting to watch and useful. I will be keeping the community updated regarding the progress in this blog. Happy coding all! Enjoy your summer!!

Introduction to the (Unnamed) Full Text Searching and Indexing Tool

Blog post by GeneralMaximus on Sun, 2009-05-17 13:29

Hi, I'm Ankur Sethi. I'm a first year Information Technology student at Indraprastha University, New Delhi. I will be working on a full text indexing and search application for Haiku Code Drive 2009.

I use Mac OS X as my primary OS. Before I switched to the Mac, I had been an Ubuntu user for four solid years. I first read about Haiku on OSNews back in 2007 (my profile says my account is 1 year 36 weeks old), and I was hooked. What first caught my attention was the incredibly short boot time, and the low resource usage. When I read up more about what Haiku is like under the hood, this is what I thought: WANT (excuse the meme). I'm waiting for the day I can just pop a Haiku install disk into my PC and use Haiku as my primary OS.

About The Project

The objective is to build an application that can be brought up with a simple keystroke and used to navigate to important documents and applications quickly, thus reducing mouse usage.

Here are my project goals, exactly as I listed them in my GSoC application:

  • Implement a full-text indexing tool which uses a database for indexing files containing textual content, and filesystem attributes for non-text files (MP3s, images, etc.).
  • Create a plugin-based architecture that allows the indexer to index different kinds of textual content (i.e., PDFs, ODFs etc.).
  • Implement a userland process to keep the index in sync as the files change on disk.
  • Create a mechanism to query the index and implement an algorithm to sort search results by relevance.
  • Create a GUI front end for querying the index.

I will keep the Haiku world updated on what I'm doing through this blog. Looking forward to a fun summer :)

EDIT: Changed title. Names, anyone?

Network Services Kit Introduction

Blog post by AntiRush on Sun, 2009-05-17 03:20

Hello Haiku World, I'm Tom Fairfield and I've been chosen to work on a project for the Code Drive this summer. You'll see me around IRC and elsewhere as fairfieldt or AntiRush. I'm a 4th year computer science major at Xavier University in Cincinnati, Ohio.

I've been interested in operating system development for quite some time and Haiku is a great looking project in that regard.

The project I proposed and was chosen to complete is a Network Services Kit for Haiku.

List of project goals:
  • Design an API for the Services Kit.
  • Implement a basic Services Kit that provides this base API and is extensible for any web service.
  • As a byproduct, create various utility classes for HTTP/HTTPS and other web-oriented functions that can be easily reused.
  • Build a Twitter service on top of the base Services Kit.
  • Write a proof-of-concept application that uses the Twitter functions of the Services Kit.
  • Fully document the base Services Kit so that it is easy for future developers to add their own services. This will include both API documentation and a tutorial that follows the implementation of a protocol. Given time, the tutorial will be written in conjunction with another service. Flickr? Facebook?
  • Fully document the Twitter service as well as the application. Again, this will include both API documentation and a tutorial that follows the development of the Twitter application to demonstrate the use of the Services Kit.

The first step is to design and implement the HTTP library. I've been working with Pier Fiorini to design an API for both the HTTP library and the Network Services Kit itself. At this point I've begun writing code for the HTTP side of things.

I'll try to post frequent updates on this blog to keep the community involved. Any comments or suggestions are more than welcome. They can only make my project better!
