Mail Archives: djgpp/1996/11/14/03:38:57
Mark Habersack wrote:
>
> The reason of better compression ratios of tar archives is very simple.
> LZW compression which (or rather its modification) is used by PKZIP and
> some other archivers uses something called a "sliding dictionary". It's
> a structure that holds pairs of data: pattern and its numerical code.
> When the compressor reads the input stream it looks up the dictionary to
> whether the just-read pattern already ocurred in the previously read
> stream of data. If so, then the pattern is replaced with the
> corresponding code read from dictionary - thus you have just compressed
> the input data. This is a simplified description of the LZW algorithm,
> but it's enough to understand what follows. >
>
Just a small point : the algorithm used by PKZip is *not* LZW (that's the
one in the GIF format), but another one called "LZ77".
The LZ mean Lempel-Ziv, the two guys which invented the algorithm. They
actually published two algorithms, on in 1977 (LZ77) and the second one
in 1978 (LZ78), LZW is a derivative of LZ78.
These two algorithm are *not* alike at all :
LZ77 use a "sliding window" to look for dictionnary matches
LZ78 builds a dictionnary incrementally, by adding en entry everytime it
meets a string not yet in the dictionnary.
Historically LZ78 was used in the first compression programs for PC, like
ARC, and LZW if used in the GIF format.
Since then, newer progs like PKZIP, LHA, ARJ... use LZ77.
LZ77 has one very strong point (IMHO): its coding and decoding routines
are very asymetric : coding is a bit complex to program, and the
algorithm is rather slow. But decoding is very simple to program, and
very quick to run. For this reason, it has been used a lot in executable
compression.
Another (not so unimportant) point is that LZW is patented...
Francois
- Raw text -