Welcome to the homepage
of the Web Exploration and Search Technology Lab (WestLab) in the
Computer and Information Science Department at Polytechnic University.
The goal of our group is to design new tools and techniques for
searching and analyzing the structure and content of the World Wide
Web.
More details on our work can be found on the project page. Our
work is focused on the following four main areas:
Performance of cluster-based search engines
Current large search engines are based on scalable clusters, i.e.,
large numbers of workstations connected by fast LANs. We are working
to improve the performance and scalability of such engines and to
increase the quality of the results returned to the user. We have
implemented and studied a number of search engine components,
including a scalable high-performance crawler (Polybot), specialized
storage systems, and indexing and query execution software. We are
also looking at new ranking techniques based on link analysis, and
at the integration of term-based, link-based, and other techniques.
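As a rough illustration of what link-based ranking means, the sketch below runs a PageRank-style power iteration over a tiny link graph. It is a minimal example under stated assumptions, not the ranking software described above: the function name `pagerank`, the damping factor of 0.85, and the fixed iteration count are all illustrative choices.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Return a PageRank-style score for every page.

    `links` maps each page to the list of pages it links to.
    The damping factor and iteration count are illustrative assumptions.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without out-links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if not targets:
                continue
            # Each page passes an equal share of its rank to every page it links to.
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank
```

In a combined ranking scheme, a score like this would be one signal among several, mixed with term-based scores for the query at hand.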
Future Distributed Web Search Architectures
We are studying potential alternatives to the current centralized
cluster-based architectures, such as highly distributed and peer-to-peer
architectures and client-based search tools. Our current focus is
on the design of a novel peer-to-peer information retrieval substrate,
and on query execution in widely distributed systems.
Data Extraction, Mining and Discovery, and the Deep Web
With collaborators at Poly and at UC Berkeley, we are looking at
automatic access to and query processing over web-accessible databases,
and at data extraction from unstructured and loosely structured
web pages. We are also looking at techniques for focused crawling,
recrawling, and other strategies for the discovery and monitoring
of web resources.
Optimizing Performance over Slow Wireless Links
In collaboration with the Visual Information Processing Lab, we
are studying delta compression and file synchronization techniques
for efficient storage and replication of collections of similar
files. For example, we are looking at ways to improve basic tools
such as tar+gzip for distributing file collections and the rsync
utility for file synchronization, in the case of very low-bandwidth
links. We are also working on protocol optimization and scheduling
issues in SPAWN, a proxy system for wireless web access that we
have built.
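To make the file synchronization idea concrete, the sketch below shows a simplified rolling weak checksum in the style used by rsync, together with a scan that finds blocks of an old file that reappear in a new one. It is a minimal example under stated assumptions, not the techniques under study here: the function names, the modulus, and the block length are illustrative. Both files are held locally, so a direct byte comparison stands in for the strong hash that rsync uses to confirm a weak match.

```python
MOD = 1 << 16  # illustrative modulus for both checksum halves


def weak_checksum(block):
    """Return the pair (a, b) for a block of bytes."""
    a = sum(block) % MOD
    # b weights the first byte by len(block), the last byte by 1.
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % MOD
    return a, b


def roll(a, b, out_byte, in_byte, block_len):
    """Slide the window one byte: drop out_byte, append in_byte."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - block_len * out_byte + a) % MOD
    return a, b


def find_matches(old, new, block_len):
    """Return (offset_in_new, offset_in_old) pairs for blocks of `old` found in `new`."""
    # Index the weak checksums of the non-overlapping blocks of the old file.
    index = {}
    for start in range(0, len(old) - block_len + 1, block_len):
        key = weak_checksum(old[start:start + block_len])
        index.setdefault(key, []).append(start)

    matches = []
    if len(new) < block_len:
        return matches
    a, b = weak_checksum(new[:block_len])
    pos = 0
    while True:
        for start in index.get((a, b), []):
            # A weak match is only a hint, so confirm it byte by byte.
            if old[start:start + block_len] == new[pos:pos + block_len]:
                matches.append((pos, start))
                break
        if pos + block_len >= len(new):
            break
        # Update the checksum in constant time instead of rehashing the window.
        a, b = roll(a, b, new[pos], new[pos + block_len], block_len)
        pos += 1
    return matches
```

Only the regions of the new file that match no old block would need to be sent in full, which is why this kind of scheme matters most on slow links.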
We may have research projects available for students looking for
Senior Project and MS Thesis topics. If you are a strong and highly motivated
student at Poly who is interested in doing research in the web technology
area, please contact Prof. Torsten Suel to inquire about possible
topics.