Welcome to the homepage of the Web Exploration
and Search Technology Lab (WestLab) in the Computer
and Information Science Department at Polytechnic
University. The goal of our group is to design new tools
and techniques for searching and analyzing the structure and
content of the World Wide Web.
More details on our work can be found on the
project page. Our work is focused on the following four main
areas:
Performance of Cluster-Based Search Engines
Current large search engines are based on scalable clusters,
i.e., large numbers of workstations connected by fast LANs.
We are working to improve the performance and scalability
of such engines and to increase the quality of the results
returned to the user. We have implemented and studied a
number of search engine components, including a scalable high-performance
crawler (Polybot), specialized storage
systems, and indexing and query execution software. We are
also looking at new ranking techniques based on link analysis,
and at the integration of term-based, link-based, and other
techniques.
Future Distributed Web Search Architectures
We are studying potential alternatives to the current centralized
cluster-based architectures, such as highly distributed and
peer-to-peer architectures and client-based search tools.
Our current focus is on the design of a novel peer-to-peer
information retrieval substrate, and on query execution in
widely distributed systems.
Data Extraction, Mining and Discovery, and the Deep Web
With collaborators at Poly and at UC Berkeley, we are looking
at automatic access to and query processing over web-accessible
databases, and at data extraction from unstructured and loosely
structured web pages. We are also looking at techniques for
focused crawling, recrawling, and other strategies for the
discovery and monitoring of web resources.
Optimizing Performance over Slow Wireless Links
In collaboration with the Visual Information Processing Lab,
we are studying delta compression and file synchronization
techniques for efficient storage and replication of collections
of similar files. For example, we are looking at ways to improve
basic tools such as tar+gzip for distributing file collections
and the rsync utility for file synchronization, when operating
over very low-bandwidth links. We are also working on protocol
optimization and scheduling issues in SPAWN, a proxy system
that we have built for wireless web access.
We may have research projects available for
students looking for Senior Project and MS Thesis topics.
If you are a strong and highly motivated student at Poly who
is interested in doing research in the web technology area,
please contact Prof. Torsten Suel to inquire about possible topics.