An Open and Highly Distributed Search Engine Architecture
Most major search engines are currently based
on cluster architectures, with large numbers of low-cost servers
located at one or a few locations and connected by high-speed
LANs. Recently, there has been considerable interest in using
peer-to-peer (P2P) architectures to provide large-scale services,
and several groups have proposed scalable substrates for P2P
applications, such as Chord, Pastry, Tapestry, and CAN.
In the ODISSEA project, we study the problem of building
a P2P-based search engine for massive document collections
on top of such a substrate. A prototype of the ODISSEA (Open
DIStributed Search Engine Architecture) system is currently
under development in our group. ODISSEA provides a highly
distributed global indexing and query execution service
that can be used for content residing inside or outside of
a P2P network. ODISSEA is different from most other approaches
to P2P search in that it assumes a two-tier search engine
architecture and a global index structure that is distributed
over the nodes of the system.
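
To make the idea of a global index distributed over the nodes more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the ODISSEA implementation: it assumes the index is partitioned by term, that each term is mapped to a node by hashing onto a ring (in the spirit of substrates such as Chord), and that a multi-term query is answered by intersecting the posting lists of its terms. The names GlobalIndex, ring_position, node_for, add_document, and query are hypothetical.

```python
import hashlib
from bisect import bisect_right
from collections import defaultdict


def ring_position(key: str) -> int:
    """Map a string (node name or term) to a position on the hash ring."""
    return int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16)


class GlobalIndex:
    """Toy term-partitioned inverted index spread over named nodes.

    Assumption: each term's complete posting list lives on exactly one
    node, chosen by hashing the term onto a ring of node positions.
    """

    def __init__(self, node_names):
        self.ring = sorted((ring_position(n), n) for n in node_names)
        self.positions = [pos for pos, _ in self.ring]
        # Per-node storage: term -> set of document ids.
        self.postings = {n: defaultdict(set) for n in node_names}

    def node_for(self, term: str) -> str:
        """Return the first node clockwise from the term's ring position."""
        i = bisect_right(self.positions, ring_position(term)) % len(self.ring)
        return self.ring[i][1]

    def add_document(self, doc_id, text: str) -> None:
        """Index a document by sending each distinct term to its node."""
        for term in set(text.lower().split()):
            self.postings[self.node_for(term)][term].add(doc_id)

    def query(self, terms):
        """Return ids of documents containing all query terms."""
        lists = []
        for term in terms:
            term = term.lower()
            # In a real system this lookup would be a network request.
            lists.append(self.postings[self.node_for(term)].get(term, set()))
        if not lists:
            return set()
        # Intersect starting with the shortest list to keep results small.
        lists.sort(key=len)
        return set.intersection(*lists)
```

In this sketch a two-term query contacts at most two nodes, one per term, regardless of how many nodes hold documents. The cost is that posting lists may need to be shipped between nodes to compute the intersection, which is why distributed query processing is the focus of the simulation results mentioned below.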
The following short paper gives an overview of the proposed
system and discusses some basic design choices. It also presents
some preliminary simulation results for distributed query
processing on a terabyte-size web page collection that indicate
good scalability for our approach. A more detailed manuscript
will be available soon.
ODISSEA: A Peer-to-Peer Architecture for Scalable
Web Search and Information Retrieval. T. Suel, C.
Mathur, J. Wu, J. Zhang, A. Delis, M. Kharrazi, X. Long, and
K. Shanmugasunderam. 6th International Workshop on the Web
and Databases (WebDB), June 2003. PDF
Technical Report (23 pages): PDF
WWW2003 Poster Version (2 pages): PDF