www.delorie.com/archives/browse.cgi   search  
Mail Archives: djgpp/1996/11/14/03:38:57

Message-ID: <328A467A.1A58@pobox.oleane.com>
Date: Wed, 13 Nov 1996 23:06:50 +0100
From: Francois Charton <deef AT pobox DOT oleane DOT com>
Organization: CCMSA
MIME-Version: 1.0
To: grendel AT ananke DOT amu DOT edu DOT pl
CC: djgpp AT delorie DOT com
Subject: Re: Why not to use 'tar' before packing DJGPP?
References: <32823D97 DOT 44DD AT sabat DOT tu DOT kielce DOT pl> <3282A82E DOT 7EE7 AT cs DOT com> <55vapk$s4l AT news DOT ox DOT ac DOT uk> <babcock DOT 847510845 AT cybercom DOT net> <561pv7$36c AT news DOT ox DOT ac DOT uk> <32851B7C DOT 66E2 AT ananke DOT amu DOT edu DOT pl>

Mark Habersack wrote:
> 
> The reason of better compression ratios of tar archives is very simple.
> LZW compression which (or rather its modification) is used by PKZIP and
> some other archivers uses something called a "sliding dictionary". It's
> a structure that holds pairs of data: pattern and its numerical code.
> When the compressor reads the input stream it looks up the dictionary to
> whether the just-read pattern already ocurred in the previously read
> stream of data. If so, then the pattern is replaced with the
> corresponding code read from dictionary - thus you have just compressed
> the input data. This is a simplified description of the LZW algorithm,
> but it's enough to understand what follows. > 
>


Just a small point : the algorithm used by PKZip is *not* LZW (that's the 
one in the GIF format), but another one called "LZ77".

The LZ mean Lempel-Ziv, the two guys which invented the algorithm. They 
actually published two algorithms, on in 1977 (LZ77) and the second one 
in 1978 (LZ78), LZW is a derivative of LZ78. 

These two algorithm are *not* alike at all : 

LZ77 use a "sliding window" to look for dictionnary matches 

LZ78 builds a dictionnary incrementally, by adding en entry everytime it 
meets a string not yet in the dictionnary.

Historically LZ78 was used in the first compression programs for PC, like 
ARC, and LZW if used in the GIF format. 

Since then, newer progs like PKZIP, LHA, ARJ... use LZ77. 

LZ77 has one very strong point (IMHO): its coding and decoding routines 
are very asymetric : coding is a bit complex to program, and the 
algorithm is rather slow. But decoding is very simple to program, and 
very quick to run. For this reason, it has been used a lot in executable 
compression. 

Another (not so unimportant) point is that LZW is patented...


Francois

- Raw text -


  webmaster     delorie software   privacy  
  Copyright © 2019   by DJ Delorie     Updated Jul 2019