Computer & Information Science Department, Polytechnic University


Redundancy Elimination Within Large Collections of Files

Fred Douglis
IBM T.J. Watson Research Center
Friday, February 13, 2004, 11:00am - 12:00pm
LC 102, Brooklyn Campus, Polytechnic University

Ongoing advancements in technology lead to ever-increasing storage capacities. In spite of this, optimizing storage usage can still provide rich dividends. Several techniques based on delta-encoding and duplicate block suppression have been shown to reduce storage overheads, with varying requirements for resources such as computation and memory. We propose a new scheme for storage reduction that reduces data sizes with an effectiveness comparable to the more expensive techniques, but at a cost comparable to the faster but less effective ones. The scheme, called Redundancy Elimination at the Block Level (REBL), leverages the benefits of compression, duplicate block suppression, and delta-encoding to eliminate a broad spectrum of redundant data in a scalable and efficient manner. REBL also uses super-fingerprints, a technique that reduces the data needed to identify similar blocks and therefore the computational requirements of this process. As a result, REBL encodes more compactly than compression and duplicate suppression while executing faster than generic delta-encoding. For the data sets analyzed, REBL improved on the space reduction of other techniques by factors of 4-23 in the best case.
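
To make the combination of techniques named in the abstract concrete, here is a minimal sketch of a block-level encoder. It is not REBL's actual implementation. The fixed block size, the use of SHA-1, the min-wise feature selection, the single super-fingerprint per block, and the use of a zlib preset dictionary as a stand-in for a real delta-encoder are all assumptions chosen for illustration, and every name in it (`split_blocks`, `super_fingerprint`, `encode`, `decode`) is hypothetical.

```python
import hashlib
import zlib

BLOCK_SIZE = 1024    # assumed fixed block size, illustrative only
SHINGLE = 16         # assumed width of the sliding window, in bytes
NUM_FEATURES = 4     # assumed number of features folded into one key


def split_blocks(data, size=BLOCK_SIZE):
    """Cut data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def super_fingerprint(block):
    """Fold several min-wise features of a block into one short key.

    Blocks that share most of their byte windows are likely to pick the
    same minima, so they are likely to end up with the same key.
    """
    last = max(1, len(block) - SHINGLE + 1)
    shingles = {block[i:i + SHINGLE] for i in range(last)}
    features = []
    for k in range(NUM_FEATURES):
        salt = bytes([k])  # a different salt gives a different hash ordering
        features.append(min(hashlib.sha1(salt + s).digest() for s in shingles))
    return hashlib.sha1(b"".join(features)).hexdigest()


def encode(data):
    """Return one (kind, payload) record per block."""
    blocks = split_blocks(data)
    by_hash = {}   # content hash -> index of the first block with that content
    by_sfp = {}    # super-fingerprint -> index of a fully stored block
    records = []
    for i, block in enumerate(blocks):
        digest = hashlib.sha1(block).hexdigest()
        if digest in by_hash:
            # Duplicate block suppression: store only a reference.
            records.append(("dup", by_hash[digest]))
            continue
        by_hash[digest] = i
        sfp = super_fingerprint(block)
        if sfp in by_sfp:
            # Similar block: compress against the reference block.
            # (A preset dictionary stands in for a real delta-encoder.)
            ref = by_sfp[sfp]
            comp = zlib.compressobj(zdict=blocks[ref])
            records.append(("delta", (ref, comp.compress(block) + comp.flush())))
        else:
            # Nothing similar seen so far: plain compression.
            by_sfp[sfp] = i
            records.append(("full", zlib.compress(block)))
    return records


def decode(records):
    """Rebuild the original bytes from the records produced by encode()."""
    blocks = []
    for kind, payload in records:
        if kind == "dup":
            blocks.append(blocks[payload])
        elif kind == "delta":
            ref, body = payload
            dec = zlib.decompressobj(zdict=blocks[ref])
            blocks.append(dec.decompress(body) + dec.flush())
        else:
            blocks.append(zlib.decompress(payload))
    return b"".join(blocks)
```

Comparing one short key per block, rather than the full feature set, is what keeps the search for similar blocks cheap in this sketch. The trade-off is that two blocks count as similar only when all of their selected features agree, so some near-duplicates will be missed.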

This is joint work with Purushottam Kulkarni, Jason LaVoie, and John M. Tracey.

Biography:
Dr. Fred Douglis is a Research Staff Member at IBM Research in Hawthorne, NY. His research interests include data reduction techniques, Internet service performance, Internet tools, load sharing, and file systems. Before joining IBM, he was with AT&T Labs--Research for many years, where he most recently headed the Distributed Systems Research department. He was the founding chair of the IEEE Computer Society's Technical Committee on the Internet (TCI) and is a past chair of the Computer Society's Technical Committee on Operating Systems (TCOS). In addition to serving on numerous program committees, he serves on the editorial board of IEEE Internet Computing and has been program chair or vice-chair of many conferences, including WWW (1999, 2002, and 2003), the USENIX Symposium on Internet Technologies and Systems (1999), the USENIX Technical Conference (1998), the Symposium on Applications and the Internet (SAINT, 2001), and the Web Caching Workshop (2003). He holds a PhD from the University of California, Berkeley.

For further information, or to meet with the speaker, please contact Torsten Suel at suel@poly.edu.