Following are descriptions of some of our current projects, with links to project homepages that provide additional information:

PolyBot: A High-Performance Distributed Web Crawler

PolyBot is a scalable web crawler that can download several hundred pages per second. The system is flexible enough to support a variety of crawling applications (e.g., bulk crawlers, focused crawlers, random walkers, page trackers), and is engineered around a number of potential performance bottlenecks (e.g., DNS lookup, robot exclusion checking, URL frontier management). For more information, see the PolyBot Project Homepage.
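To make the bottleneck handling concrete, here is a minimal Python sketch of a URL frontier that caches DNS lookups, caches parsed robots.txt files, and enforces a per-host politeness delay. All names and parameters here are illustrative assumptions for the sketch; PolyBot itself is a separate, high-performance implementation.

    # Sketch of a crawler URL frontier addressing the three bottlenecks
    # named above: DNS lookup, robot exclusion checking, and frontier
    # scheduling. Illustrative only; not PolyBot's actual design.
    import heapq
    import socket
    import time
    import urllib.robotparser
    from collections import defaultdict, deque
    from urllib.parse import urlparse

    class Frontier:
        def __init__(self, delay=1.0):
            self.delay = delay                 # politeness delay per host (seconds)
            self.queues = defaultdict(deque)   # host -> pending URLs
            self.ready = []                    # heap of (next_allowed_time, host)
            self.dns_cache = {}                # host -> IP address
            self.robots_cache = {}             # host -> parsed robots.txt (or None)

        def add(self, url):
            host = urlparse(url).netloc
            if not self.queues[host]:          # host not yet scheduled
                heapq.heappush(self.ready, (time.time(), host))
            self.queues[host].append(url)

        def resolve(self, host):
            # Cache DNS lookups so each host is resolved only once.
            if host not in self.dns_cache:
                self.dns_cache[host] = socket.gethostbyname(host)
            return self.dns_cache[host]

        def allowed(self, url):
            # Cache parsed robots.txt files per host.
            host = urlparse(url).netloc
            if host not in self.robots_cache:
                rp = urllib.robotparser.RobotFileParser(f"http://{host}/robots.txt")
                try:
                    rp.read()
                except OSError:
                    rp = None                  # unreachable: be permissive in this sketch
                self.robots_cache[host] = rp
            rp = self.robots_cache[host]
            return rp is None or rp.can_fetch("*", url)

        def next_url(self):
            # Return a URL whose host's politeness interval has elapsed.
            while self.ready:
                t, host = heapq.heappop(self.ready)
                if t > time.time():
                    heapq.heappush(self.ready, (t, host))
                    return None                # nothing ready yet
                url = self.queues[host].popleft()
                if self.queues[host]:          # reschedule host for its next URL
                    heapq.heappush(self.ready, (time.time() + self.delay, host))
                return url
            return None

A caller would loop over next_url(), skipping URLs for which allowed() returns False, and hand the resolved address to a downloader; the heap-of-hosts structure keeps scheduling cheap even with millions of queued URLs.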

Scalable Webpage Repository

This project is building a scalable storage system for web content that allows efficient retrieval of pages and sites, is robust against crashes, and uses highly optimized compression techniques to store different versions of a page. More information available soon.
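As a rough illustration of version-aware storage, the sketch below compresses a new version of a page against the previous version supplied as a preset deflate dictionary, so near-duplicate versions cost very little space. The function names and the choice of zlib are our own assumptions for the sketch, not the repository's actual interface.

    # Delta-style compression of page versions using a preset zlib
    # dictionary. Illustrative only; the project's optimized compression
    # techniques may differ.
    import random
    import zlib

    def compress_version(page: bytes, prev: bytes = b"") -> bytes:
        c = zlib.compressobj(9, zdict=prev) if prev else zlib.compressobj(9)
        return c.compress(page) + c.flush()

    def decompress_version(blob: bytes, prev: bytes = b"") -> bytes:
        d = zlib.decompressobj(zdict=prev) if prev else zlib.decompressobj()
        return d.decompress(blob)

    # Two versions of a (hard-to-compress) page differing by a small edit.
    random.seed(0)
    v1 = bytes(random.getrandbits(8) for _ in range(20000))
    v2 = v1[:1000] + b"<edited>" + v1[1000:]

    raw = compress_version(v2)            # compressed on its own
    delta = compress_version(v2, prev=v1) # compressed against version 1
    assert decompress_version(delta, prev=v1) == v2
    print(len(raw), len(delta))           # the delta is far smaller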

Peer-to-Peer Search Infrastructure

There has recently been a lot of interest in peer-to-peer systems and other highly resilient, widely distributed networks and applications. We are investigating and implementing a possible future search engine architecture based on an underlying open and highly distributed IR substrate. Applications of this work include search in file-sharing and storage networks, intranet search, and standard large-scale search engines. More details can be found on the Project Homepage.
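One simple way to picture such a distributed IR substrate is a term-partitioned inverted index: each term is hashed to the peer responsible for its posting list, and a multi-term query intersects the lists fetched from the owning peers. The peer names and hashing scheme below are hypothetical, a sketch of the general idea rather than the project's actual protocol.

    # Term-partitioned inverted index over a set of peers (sketch).
    import hashlib
    from collections import defaultdict

    PEERS = ["peer0", "peer1", "peer2", "peer3"]   # hypothetical node IDs

    def owner(term: str) -> str:
        # Map a term to the peer responsible for its posting list.
        h = int(hashlib.sha1(term.encode()).hexdigest(), 16)
        return PEERS[h % len(PEERS)]

    # Each peer holds posting lists only for the terms hashed to it.
    index = {p: defaultdict(set) for p in PEERS}

    def publish(doc_id: str, text: str):
        for term in set(text.lower().split()):
            index[owner(term)][term].add(doc_id)

    def search(query: str) -> set:
        # Fetch each term's postings from its owning peer and intersect.
        postings = [index[owner(t)][t] for t in query.lower().split()]
        return set.intersection(*postings) if postings else set()

    publish("d1", "distributed web search")
    publish("d2", "peer to peer search")
    publish("d3", "distributed peer networks")
    print(search("distributed search"))   # -> {'d1'}

In a real deployment the in-memory dictionary lookups would become network requests routed through the overlay, but the partitioning logic stays the same.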