![]() |
![]() |
TR-CIS-2004-02 (04/29/2004)
Dimitre Trendafilov, Nasir Memon, Torsten Suel
Abstract
Delta compression techniques solve the problem of encoding a given target file
with respect to one or more reference files. Recent work
has demonstrated the benefits of using
such techniques in the context of file collection compression. In these
scenarios, files are often better compressed by computing deltas with respect
to other similar files from the same collection, as opposed to compressing
each file by itself. It is known that the optimal set of such delta encodings,
assuming that only a single reference file is used for each target file, can
be found by computing an optimal branching on a directed graph.
In this paper we propose two techniques for improving the compression of file
collections. The first one utilizes deltas computed with respect to more
than one file, while the second one improves the compressibility of batched
file collections, such as tar archives, using standard compression tools.
Both techniques are based on a reduction to the Traveling Sales Person
problem on directed weighted graphs. We present experiments demonstrating
the benefits of our methods.