ODISSEA: An Open and Highly Distributed Search Engine Architecture

Most major search engines are currently based on cluster architectures, with large numbers of low-cost servers located at one or a few sites and connected by high-speed LANs. Recently, there has been considerable interest in using peer-to-peer (P2P) architectures to provide large-scale services, and several groups have proposed scalable substrates for P2P applications, such as Chord, Pastry, Tapestry, or CAN.

In the ODISSEA project, we study the problem of building a P2P-based search engine for massive document collections on top of such a substrate. A prototype of the ODISSEA (Open DIStributed Search Engine Architecture) system is currently under development in our group. ODISSEA provides a highly distributed global indexing and query execution service that can be used for content residing inside or outside of a P2P network. ODISSEA is different from most other approaches to P2P search in that it assumes a two-tier search engine architecture and a global index structure that is distributed over the nodes of the system.
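To make the idea of a global index distributed over the nodes more concrete, here is a minimal sketch that assumes the index is partitioned by term: each term's complete posting list is assigned to the peer that succeeds the term's hash on a Chord-style identifier ring. The class and method names (`TermPartitionedIndex`, `owner`, `lookup`) are hypothetical. The sketch ignores replication, node churn, and ranking, and it is not the ODISSEA implementation.

```python
import bisect
import hashlib


def ring_position(key: str, bits: int = 32) -> int:
    """Hash a string onto a 2**bits identifier ring (SHA-1, as in Chord)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


class TermPartitionedIndex:
    """Hypothetical global inverted index, split by term across peers."""

    def __init__(self, node_names):
        # Place each peer on the ring by hashing its name.
        self.ring = sorted((ring_position(n), n) for n in node_names)
        self.positions = [pos for pos, _ in self.ring]
        # Each peer stores the complete posting lists for the terms it owns.
        self.postings = {n: {} for n in node_names}

    def owner(self, term: str) -> str:
        """Return the successor peer of the term's hash, wrapping around the ring."""
        i = bisect.bisect_left(self.positions, ring_position(term))
        return self.ring[i % len(self.ring)][1]

    def add_document(self, doc_id: int, text: str) -> None:
        # Route every distinct term of the document to the peer that owns it.
        for term in set(text.lower().split()):
            lists = self.postings[self.owner(term)]
            lists.setdefault(term, []).append(doc_id)

    def lookup(self, term: str) -> list:
        # A single peer answers for each term; no broadcast is needed.
        return sorted(self.postings[self.owner(term)].get(term, []))


idx = TermPartitionedIndex(["peer-a", "peer-b", "peer-c", "peer-d"])
idx.add_document(1, "peer to peer search")
idx.add_document(2, "web search engine")
print(idx.owner("search"), idx.lookup("search"))
```

Because every peer can compute `owner(term)` locally, a lookup for a term is routed directly to one peer, at the cost of that peer holding the term's full list. A document-partitioned index would instead have to send every query to every peer.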

The following short paper gives an overview of the proposed system and discusses some basic design choices. It also presents preliminary simulation results for distributed query processing on a terabyte-size collection of web pages, which indicate good scalability for our approach. A more detailed manuscript will be available soon.
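As a rough illustration of why communication cost matters for distributed query processing over a term-partitioned index, here is a minimal sketch of a two-term AND query. It assumes the two posting lists live on different peers and that the shorter list is shipped to the peer holding the longer one, where the intersection is computed. The function names are hypothetical, and this simple strategy is not necessarily the one evaluated in our simulations.

```python
def intersect_sorted(a, b):
    """Merge-style intersection of two sorted lists of document IDs."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def two_term_and(list_x, list_y):
    """Simulate an AND query over two posting lists held on different peers.

    Returns the matching document IDs and the number of postings that
    crossed the network under the assumed ship-the-shorter-list strategy.
    """
    shorter, longer = sorted((list_x, list_y), key=len)
    transferred = len(shorter)  # postings sent to the peer holding `longer`
    return intersect_sorted(shorter, longer), transferred


matches, sent = two_term_and([1, 4, 7, 9, 12, 15], [4, 9, 20])
print(matches, sent)
```

Counting transferred postings gives only a crude proxy for network cost; a ranked query would also need scores to be shipped, which this sketch omits.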

ODISSEA: A Peer-to-Peer Architecture for Scalable Web Search and Information Retrieval. T. Suel, C. Mathur, J. Wu, J. Zhang, A. Delis, M. Kharrazi, X. Long, and K. Shanmugasunderam. 6th International Workshop on the Web and Databases (WebDB), June 2003. PDF
Technical Report (23 pages): PDF
WWW2003 Poster Version (2 pages): PDF HTML