Computer & Information Science Department   Polytechnic University

ATTENTION: THIS WEB SITE HAS MOVED. The pages you are looking at are no longer being maintained. Please go to http://www.poly.edu/cis/ to visit the new site of the Department of Computer and Information Science at Polytechnic University.

Databases And Information Retrieval

(Profs. Delis, Hellerstein, Memon, Suel)

The fourth major research concentration in the department is concerned with the management, querying, and analysis of large data sets, and includes the areas of database systems, data mining, information retrieval, and web search and exploration. Work is performed in several labs and research groups, with an emphasis on algorithmic and architectural issues.

Client-Server Databases: Prof. Delis and his students in the Database Systems Lab are working on architectural and performance issues for client-server databases. Most modern databases are organized following variants of the Client-Server model, where a number of clients (e.g., PCs) interact with one or more servers that use database engines to retrieve data and serve it to the clients. Prof. Delis' work, supported by an NSF Career Award, focuses on performance issues in such architectures, where a naive implementation quickly leads to a performance bottleneck at the server. He has studied the scalability of the standard two-tier Client-Server model, and has proposed a three-tier model that employs a number of optimization techniques, including caching, prefetching, and client clustering, to scale to larger numbers of clients.
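To illustrate why client-side caching relieves the server bottleneck described above, here is a minimal sketch of a client that keeps recently used results in a small least-recently-used (LRU) cache. It is a generic textbook technique and not a description of Prof. Delis' architecture; the class name CachingClient, the fetch_from_server callback, and the capacity parameter are hypothetical names introduced only for this example.

```python
from collections import OrderedDict


class CachingClient:
    """Hypothetical client that caches server results with LRU eviction."""

    def __init__(self, fetch_from_server, capacity=2):
        self._fetch = fetch_from_server   # assumed callback: key -> value
        self._capacity = capacity         # maximum number of cached entries
        self._cache = OrderedDict()       # ordered from least to most recently used
        self.server_requests = 0          # counts round trips to the server

    def get(self, key):
        if key in self._cache:
            # Cache hit: mark the entry as most recently used, no server load.
            self._cache.move_to_end(key)
            return self._cache[key]
        # Cache miss: ask the server and remember the answer.
        self.server_requests += 1
        value = self._fetch(key)
        self._cache[key] = value
        if len(self._cache) > self._capacity:
            # Evict the least recently used entry.
            self._cache.popitem(last=False)
        return value
```

Each cache hit is a request the server never sees, which is the effect a caching tier aims for; prefetching and client clustering would need additional logic that this sketch does not attempt.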

Query Processing and Optimization: Database systems have to be able to efficiently process highly complicated queries on large amounts of data. To achieve this, systems use a variety of techniques, such as highly optimized index structures for accessing the data, or query optimization for finding the best way to execute a query. Several faculty members are working on new techniques in this area, including index structures, cost estimation techniques, approximate query answers, and efficient operations in spatial databases. In particular, Prof. Delis has studied the performance characteristics of common index structures for disk- and memory-resident data, and the efficient implementation of temporal query operations. Prof. Hellerstein is working on new techniques for selectivity and cost estimation of database queries, and has worked on efficient coding schemes for parallel disk architectures (RAID) and methods for generating random range queries. Prof. Suel is working on problems in selectivity estimation, data partitioning, sampling and approximation techniques for query results, and query processing in spatial databases.
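As a concrete illustration of selectivity estimation, the following minimal sketch estimates the fraction of rows that satisfy a range predicate from an equi-width histogram. This is a standard textbook approach chosen for illustration, not the specific techniques developed by the faculty members; the function names and the uniformity assumption within each bucket are assumptions of this example.

```python
def build_histogram(values, num_buckets):
    """Build an equi-width histogram and return (lo, hi, counts)."""
    lo, hi = min(values), max(values)
    counts = [0] * num_buckets
    width = (hi - lo) / num_buckets
    for v in values:
        if width > 0:
            # Clamp so that v == hi falls into the last bucket.
            idx = min(int((v - lo) / width), num_buckets - 1)
        else:
            idx = 0  # all values are identical
        counts[idx] += 1
    return lo, hi, counts


def estimate_selectivity(hist, a, b):
    """Estimate the fraction of values v with a <= v <= b.

    Assumes values are spread uniformly inside each bucket.
    """
    lo, hi, counts = hist
    total = sum(counts)
    if total == 0 or b < a:
        return 0.0
    width = (hi - lo) / len(counts)
    if width == 0:
        return 1.0 if a <= lo <= b else 0.0
    matched = 0.0
    for i, count in enumerate(counts):
        bucket_lo = lo + i * width
        bucket_hi = bucket_lo + width
        # Length of the overlap between [a, b] and this bucket.
        overlap = max(0.0, min(b, bucket_hi) - max(a, bucket_lo))
        matched += count * overlap / width
    return matched / total
```

A query optimizer can use such an estimate to compare execution plans without scanning the data; the estimate degrades when values are skewed within a bucket, which is one reason more refined estimation techniques are an active research topic.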

Intelligent Information Retrieval and Text Mining: Prof. Hellerstein is working on problems in intelligent information retrieval, such as learning to automatically categorize documents by topic, and learning to extract information from documents. Her work in this area, supported by a grant from the National Science Foundation, focuses on learning-based approaches grounded in fundamental results from Computational Learning Theory. Prof. Hellerstein is also leading a reading group of faculty and students focusing on current developments in information retrieval and machine learning.

Web Search and Analysis: One of the most fundamental problems facing the World Wide Web is how to efficiently find the desired information among the more than one billion currently accessible web pages. A large amount of industrial and academic work over the last few years has focused on this problem, and powerful search engines (such as AltaVista and Google) have been built using massive amounts of hardware. However, the basic search problem is far from resolved, and new challenges arise constantly as the web evolves.

Prof. Suel and his students are working on techniques for improving the efficiency of web search. Besides improving the quality of the search results, it is also important to improve computing and storage efficiency in order to keep up with the growth of the web and to allow deployment on more modest hardware. Closely related problems of interest are exploring and analyzing the structure and properties of the web, and efficiently storing and archiving its content. Search engines need to be able to store massive numbers of encountered pages, and many techniques used by current search engines to rank results are based on analyzing and exploiting the hyperlink structure of the web.

Prof. Suel's research in this area is performed in the recently opened Web Exploration and Search Technology Lab (WestLab). It addresses a number of problems in this context, ranging from system building to formal algorithm design and analysis, and includes the storage and compression of large web page collections, efficient data acquisition (crawling), analysis of the web graph and structure, and support for powerful query operations on web archives. Parts of this work are also performed in collaboration with Profs. Delis, Hellerstein, and Memon.