Welcome to the homepage of the Web Exploration and Search Technology Lab (WestLab) in the Computer and Information Science Department at Polytechnic University. The goal of our group is to design new tools and techniques for searching and analyzing the structure and content of the World Wide Web.

More details on our work can be found on the project page. Our work focuses on the following four main areas:

Performance of Cluster-Based Search Engines

Current large search engines are based on scalable clusters, i.e., large numbers of workstations connected by fast LANs. We are working to improve the performance and scalability of such engines and to increase the quality of the results returned to the user. We have implemented and studied a number of search engine components, including a scalable high-performance crawler (Polybot), specialized storage systems, and indexing and query execution software. We are also looking at new ranking techniques based on link analysis, and at the integration of term-based, link-based, and other techniques.
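
As a rough illustration of what link-based ranking means, the sketch below runs a PageRank-style power iteration over a tiny link graph. It is a generic textbook example, not the lab's own ranking method; the function name `pagerank`, the damping factor of 0.85, and the sample graph are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Score pages by link structure.

    links: dict mapping a page name to the list of pages it links to.
    Returns a dict mapping every page to a score; scores sum to 1.
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score evenly among its out-links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Page with no out-links: spread its score over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical four-page graph: "a" is linked from both "c" and "d".
example_ranks = pagerank({"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]})
```

In this toy graph, page "d" has no incoming links and so ends up with the lowest score, while "a" benefits from two incoming links. A production engine would combine such a link-based score with term-based scores, as the paragraph above notes.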

Future Distributed Web Search Architectures

We are studying potential alternatives to the current centralized cluster-based architectures, such as highly distributed and peer-to-peer architectures and client-based search tools. Our current focus is on the design of a novel peer-to-peer information retrieval substrate, and on query execution in widely distributed systems.

Data Extraction, Mining and Discovery, and the Deep Web

With collaborators at Poly and at UC Berkeley, we are looking at automatic access to and query processing over web-accessible databases, and at data extraction from unstructured and loosely structured web pages. We are also looking at techniques for focused crawling, recrawling, and other strategies for the discovery and monitoring of web resources.

Optimizing Performance over Slow Wireless Links

In collaboration with the Visual Information Processing Lab, we are studying delta compression and file synchronization techniques for efficient storage and replication of collections of similar files. For example, we are looking at ways to improve basic tools such as tar+gzip for distributing file collections and the rsync utility for file synchronization, in the case of very low-bandwidth links. We are also working on protocol optimization and scheduling issues in a proxy system for wireless web access, called SPAWN, that we have built.
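
To give a flavor of the file synchronization problem, the sketch below shows the rolling weak checksum idea that rsync-style tools use to find blocks of an old file inside a new one without rescanning each window from scratch. It is a simplified illustration, not the lab's improved algorithms: the helper names (`weak_checksum`, `roll`, `matching_offsets`) are hypothetical, and candidate matches are confirmed by comparing bytes directly, whereas a real tool would compare a strong hash because the two files live on different machines.

```python
M = 1 << 16  # modulus for both checksum halves


def weak_checksum(block):
    """Return the (a, b) pair for a block of bytes."""
    a = sum(block) % M
    # Earlier bytes get larger weights: len(block) for the first, 1 for the last.
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, block_len):
    """Slide the window one byte to the right in constant time."""
    a = (a - out_byte + in_byte) % M
    b = (b - block_len * out_byte + a) % M
    return a, b


def matching_offsets(old, new, block_len):
    """Return (new_offset, old_offset) pairs where a block of old appears in new."""
    # Index the non-overlapping blocks of the old file by weak checksum.
    table = {}
    for start in range(0, len(old) - block_len + 1, block_len):
        key = weak_checksum(old[start:start + block_len])
        table.setdefault(key, []).append(start)

    matches = []
    if len(new) < block_len:
        return matches

    a, b = weak_checksum(new[:block_len])
    pos = 0
    while True:
        for start in table.get((a, b), []):
            # Simplification: confirm by direct comparison instead of a strong hash.
            if old[start:start + block_len] == new[pos:pos + block_len]:
                matches.append((pos, start))
                break
        if pos + block_len >= len(new):
            break
        a, b = roll(a, b, new[pos], new[pos + block_len], block_len)
        pos += 1
    return matches


# Hypothetical data: the new file is the old one with two short insertions.
old_data = b"abcdefghijklmnop"
new_data = b"XXabcdefghYYijklmnop"
found = matching_offsets(old_data, new_data, 4)
```

Only the bytes of the new file that fall outside the matched blocks would need to be sent over the link; the rest can be rebuilt from the old copy.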

We may have research projects available for students looking for Senior Project and MS Thesis topics. If you are a strong and highly motivated student at Poly who is interested in doing research in the web technology area, please contact Prof. Torsten Suel to inquire about possible topics.