Computer & Information Science Department   Polytechnic University

ATTENTION: THIS WEB SITE HAS MOVED. The pages you are looking at are no longer being maintained. Please go to http://www.poly.edu/cis/ to visit the new site of the Department of Computer and Information Science at Polytechnic University.

Compressing File Collections with a TSP-Based Approach

TR-CIS-2004-02 (04/29/2004)
Dimitre Trendafilov, Nasir Memon, Torsten Suel

pdf version of this paper

Abstract
Delta compression techniques solve the problem of encoding a given target file with respect to one or more reference files. Recent work has demonstrated the benefits of using such techniques in the context of file collection compression. In these scenarios, files are often better compressed by computing deltas with respect to other similar files from the same collection, as opposed to compressing each file by itself. It is known that the optimal set of such delta encodings, assuming that only a single reference file is used for each target file, can be found by computing an optimal branching on a directed graph.

In this paper we propose two techniques for improving the compression of file collections. The first one utilizes deltas computed with respect to more than one file, while the second one improves the compressibility of batched file collections, such as tar archives, using standard compression tools. Both techniques are based on a reduction to the Traveling Sales Person problem on directed weighted graphs. We present experiments demonstrating the benefits of our methods.