The zdelta-2.0 Experimental Results
Experimental results comparing zdelta against vcdiff (version 2002), xdelta, and gzip.
Three different classes of experiments are presented here.
- The first experiment observes the compression performance, that is,
the achieved compression ratio and the compression/decompression speed,
on real data. The data for this experiment are the gcc and the
emacs data sets from Benchmark1.
- The second experiment observes how the compression
parameters are affected by the input data similarity. The data for this
experiment are the artificial morph data sets from
Benchmark2; the input file size is
fixed at 1MB and the file similarity is varied.
- The third experiment observes the relation between the
input size and the compression/decompression speed. The data for this
experiment are again the morph data sets from Benchmark2;
this time the file similarity is fixed and the file size is varied.
Experiments exploring potential delta compression applications.
- In experiment 4 we observe zdelta-2.0 performance
for compressing versions of HTML files (Benchmark 3 available here). For comparison we used
gzip to compress each file of the target data set individually. In
addition we define a trivial delta compression, using the
following algorithm: if the target and the reference files are identical, output
1; otherwise output 0 followed by gzip(target).
- In experiment 5 we compress collections of files
with similar structure and content, that is, a set of files from a specific
website.
- In experiment 6 we compress web pages in relation
to 1, 2, 3, or 4 pages linked to them.
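The trivial delta compression used as a baseline in experiment 4 can be sketched in a few lines of Python. This is only an illustration of the rule stated above, not the code used for the experiments: the function names are hypothetical, and representing the 1/0 flag as a single ASCII byte is an assumption, since the description does not say how the flag is encoded.

```python
import gzip


def trivial_delta_encode(reference: bytes, target: bytes) -> bytes:
    """Encode target against reference with the trivial scheme."""
    if target == reference:
        # Identical files: emit only the flag "1" (assumed to be one byte).
        return b"1"
    # Different files: emit the flag "0" followed by gzip(target).
    return b"0" + gzip.compress(target)


def trivial_delta_decode(reference: bytes, delta: bytes) -> bytes:
    """Invert trivial_delta_encode, given the same reference file."""
    if delta[:1] == b"1":
        return reference
    return gzip.decompress(delta[1:])
```

Under this scheme a page that did not change between crawls costs one byte, while a changed page costs about as much as plain gzip; the reference file is never used to shrink a changed page.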
The experiments presented here were conducted on machines with
the following configurations:
NOTE: Machine II and Machine III differ from the machines
used to run the corresponding zdelta-1.0 experiments.
- Machine I:
E450 Sun Enterprise, with 2 ULTRA SPARC iie CPUs
at 400MHz and 4GB of RAM, using 5 SCSI disks in a RAID-5 configuration
- Machine II:
2 x 2.4GHz Xeon CPUs with 2GB of RAM, using 2 IDE Western Digital
WD1200JB disk drives
- Machine III:
Sun Blade 100, with ULTRA SPARC iie CPU at
500MHz and 2GB of RAM with 2 IDE Western Digital WD1000BB disk drives
For more information on the conducted experiments refer to the results section
of [coming soon].
Experiments comparing zdelta-2.0 to other available delta
compression tools:
Experiment 1
| gcc | size | ratio | MachineI compress | MachineI decompress | MachineII compress | MachineII decompress | MachineIII compress | MachineIII decompress |
| uncompressed | 27289 | - | - | - | - | - | - | - |
| gzip | 7563 | 3.61 | 31.1 | 17.9 | 8.52 | 6.11 | 33.8 | 22.8 |
| xdelta | 462 | 59.1 | 21.1 | 15.7 | 4.94 | 3.80 | 26.9 | 22.9 |
| vcdiff 2002 | 287 | 94.9 | 31.9 | 16.3 | 11.3 | 5.83 | 42.2 | 20.6 |
| zdelta (STDIO) | 227 | 120 | 37.6 | 18.0 | 9.20 | 6.0 | 43.4 | 20.7 |
| zdelta (direct) | 227 | 120 | 29.6 | 10.0 | 5.66 | 2.41 | 32.7 | 11.6 |
Compressed sizes and running times for the gcc files
(sizes in KB and times in seconds)
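The ratio column appears to be the uncompressed size divided by the compressed size. The page does not state this definition, so it is an assumption here, but the following sketch recomputes the gcc ratios from the sizes in the table on that basis. The variable and function names are illustrative only.

```python
# Sizes in KB, copied from the gcc table above.
UNCOMPRESSED_KB = 27289
COMPRESSED_KB = {
    "gzip": 7563,
    "xdelta": 462,
    "vcdiff 2002": 287,
    "zdelta": 227,
}


def compression_ratio(original_kb: float, compressed_kb: float) -> float:
    """Assumed definition: original size over compressed size."""
    return original_kb / compressed_kb


ratios = {
    tool: compression_ratio(UNCOMPRESSED_KB, size)
    for tool, size in COMPRESSED_KB.items()
}

for tool, ratio in ratios.items():
    print(f"{tool}: {ratio:.3g}")
```

Recomputed this way, gzip, xdelta, and zdelta match the table to three significant figures. vcdiff comes out at about 95.1 rather than the listed 94.9, which may simply reflect the sizes being shown in whole KB.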
| emacs | size | ratio | MachineI compress | MachineI decompress | MachineII compress | MachineII decompress | MachineIII compress | MachineIII decompress |
| uncompressed | 27327 | - | - | - | - | - | - | - |
| gzip | 8577 | 3.19 | 35.8 | 22.5 | 10.2 | 7.73 | 40.6 | 29.6 |
| xdelta | 2132 | 12.8 | 30.1 | 21.2 | 10.2 | 7.73 | 40.6 | 29.6 |
| vcdiff 2002 | 1819 | 15.0 | 34.8 | 20.8 | 11.8 | 7.50 | 44.6 | 26.3 |
| zdelta (STDIO) | 1419 | 19.3 | 49.7 | 22.7 | 12.7 | 7.51 | 57.7 | 25.5 |
| zdelta (direct) | 1419 | 19.3 | 39.4 | 12.5 | 8.22 | 3.08 | 44.5 | 13.9 |
Compressed sizes and running times for the emacs files
(sizes in KB and times in seconds)
Experiment 2
Compressed size (in KB) versus file similarity
Compression time (in seconds) versus file similarity
Decompression time (in seconds) versus file similarity
Experiment 3
Compression time (in seconds) versus file size
Decompression time (in seconds) versus file size
Experiments exploring potential delta compression applications:
Experiment 4
| Benchmark version | Benchmark size | gzip size | gzip ratio | trivial delta size | trivial delta ratio | vcdiff 2002 size | vcdiff 2002 ratio | zdelta size | zdelta ratio |
| Version2 (August 15) | 143MB | 35.1MB | 4.08 | 12.6MB | 11.4 | 1.9MB | 74.5 | 1.46MB | 97.7 |
| Version3 (September 2) | 143MB | 35.0MB | 4.08 | 16.4MB | 8.69 | 2.8MB | 50.7 | 2.15MB | 64.2 |
| Version4 (October 27) | 140MB | 34.3MB | 4.09 | 21.0MB | 6.66 | 5.02MB | 27.9 | 3.86MB | 36.3 |
Compressed sizes for different versions of a web crawl consisting
of 10000 HTML files. Reference Version1 crawled on August 13.
| Benchmark version | Benchmark size | gzip size | gzip ratio | vcdiff 2002 size | vcdiff 2002 ratio | zdelta size | zdelta ratio |
| Version2 (August 15) | 54.2MB | 12.6MB | 4.30 | 1.78MB | 30.5 | 1.39MB | 39.0 |
| Version3 (September 2) | 71.8MB | 16.4MB | 4.37 | 2.69MB | 26.7 | 2.09MB | 34.3 |
| Version4 (October 27) | 91.1MB | 21.0MB | 4.33 | 4.33MB | 18.5 | 3.81MB | 23.9 |
Compressed sizes for different versions of a web crawl consisting
of 10000 HTML files, compressing only non-identical pages.
Reference Version1 crawled on August 13.
Experiment 5
| Benchmark | Size (MB) | Files | cat+gzip ratio | zdelta ratio |
| cbc | 10.59 | 530 | 5.83 | 10.0 |
| cbsnews | 8.27 | 218 | 5.06 | 15.4 |
| csmonitor | 16.58 | 344 | 5.06 | 17.3 |
| ebay | 2.20 | 100 | 6.78 | 10.9 |
| thomas | 2.85 | 105 | 6.39 | 9.73 |
| usatoday | 8.23 | 344 | 6.26 | 9.17 |
| emacs | 38.86 | 1438 | 3.32 | 4.97 |
| gcc | 25.86 | 917 | 3.87 | 4.72 |
Compression ratios for files with similar content and/or
structure, that is, pages from a given website or source files for a
specific program.
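The page does not define the cat+gzip baseline. One plausible reading, assumed here, is that all files in a collection are concatenated and the result is compressed as a single gzip stream. The sketch below computes the ratio under that assumption; the function name is hypothetical, and it uses Python's gzip module at its default level rather than the gzip command line tool, so absolute numbers would not necessarily match the table.

```python
import gzip
from pathlib import Path
from typing import Iterable, Union


def cat_gzip_ratio(paths: Iterable[Union[str, Path]]) -> float:
    """Concatenate the files, gzip once, return original/compressed size."""
    # "cat": join the raw bytes of every file in the given order.
    data = b"".join(Path(p).read_bytes() for p in paths)
    if not data:
        raise ValueError("the collection is empty")
    # "gzip": compress the concatenation as one stream.
    compressed = gzip.compress(data)
    return len(data) / len(compressed)
```

A call such as `cat_gzip_ratio(sorted(Path("cbc").glob("*.html")))` would give the ratio for one collection, where the directory name is only a placeholder. How zdelta was applied across the files of a collection is not described on this page, so no sketch of that side is attempted.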
Experiment 6
The results will be made available soon.
Copyright © Dimitre Trendafilov 2002.
dtrend01@utopia.poly.edu