Mail Archives: djgpp/1996/11/11/13:46:46

From: alaric AT abwillms DOT demon DOT co DOT uk (Alaric B. Williams)
Newsgroups: comp.os.msdos.djgpp
Subject: Re: Why not to use 'tar' before packing DJGPP?
Date: Mon, 11 Nov 1996 18:08:11 GMT
Lines: 53
Message-ID: <847735694.6203.0@abwillms.demon.co.uk>
References: <32823D97 DOT 44DD AT sabat DOT tu DOT kielce DOT pl> <3282A82E DOT 7EE7 AT cs DOT com> <55vapk$s4l AT news DOT ox DOT ac DOT uk> <32841395 DOT 48F6 AT cs DOT com>
NNTP-Posting-Host: abwillms.demon.co.uk
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp

"John M. Aldrich" <fighteer AT cs DOT com> wrote:
>> Not that I really understand tar anyway, but those compression ratios
>> look good... maybe you could just tar the source files? People who are
>> interested in the source code of the compiler are more likely to know
>> what they're doing with tar anyway.

>That's not such a bad idea.  After all, if they get the source, then
>they probably already have at least djdev, which contains djtar.  :) 
>Tar is really neat, but the reason it gets good ratios is not because it
>does any compression itself, but because it's much more efficient to
>compress a single tar file than lots of untarred ones.  I imagine you
>would get similar (but not identical) results if you took one of the
>distribution .zip files and re-zipped it.  I have known this to reduce a
>zipfile's size by several percent.

ZIP and similar utilities work by spotting repeated patterns in the
file: the second time a string turns up, they merely insert a pointer
of sorts to the first occurrence. If you rezip a ZIP file, the saving
is generally on the headers and other stuff in the ZIP file, since the
compressed bits have a very high entropy (incompressibility). If you
make a .tar file and then zip it, ZIP can detect repeated strings
/across/ files, which it can't normally do, since it compresses each
file individually.
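
To see the "high entropy" point in practice, here is a rough Python
sketch (not anything PKZIP itself does). It uses the standard zlib
module as a stand-in for ZIP's deflate method; the word list and the
helper names sample_text and recompress_sizes are made up for
illustration.

```python
import random
import zlib


def sample_text(n_words: int = 20000, seed: int = 0) -> bytes:
    """Build some repetitive, text-like data (hypothetical test input)."""
    words = ["tar", "zip", "djgpp", "source", "header", "file",
             "compress", "pattern", "pointer", "string", "entropy", "ratio"]
    rng = random.Random(seed)
    return " ".join(rng.choice(words) for _ in range(n_words)).encode("ascii")


def recompress_sizes(data: bytes) -> tuple[int, int, int]:
    """Return (raw size, size after one pass, size after a second pass)."""
    once = zlib.compress(data, 9)
    twice = zlib.compress(once, 9)  # input here is already-compressed bytes
    return len(data), len(once), len(twice)


if __name__ == "__main__":
    raw, once, twice = recompress_sizes(sample_text())
    print(f"raw: {raw}  compressed once: {once}  compressed twice: {twice}")
```

The first pass should shrink the text a lot; the second pass has
almost no repeated patterns left to find, so it gains next to nothing.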

E.g.: 100 files, each containing the string "Hello, World".
PKZIP will not be able to compress any of them: there are no repeated
strings within a single file, so it will 'store' them.

TAR would create one big file containing "Hello, World" 100 times,
plus the filenames.

Zipping that file, PKZIP would notice the repeated "Hello, World", and
probably replace each copy after the first with a two-byte escape
sequence. The .tar file headers will also be mostly the same (size of
file, probably most of the filename (test1, test2, test3...), etc.),
so they compress immensely as well.
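
The 100-file example can be tried out with a short Python sketch using
the standard zipfile and tarfile modules, all in memory. The filenames
and the helper names make_files, zip_individually and tar_then_zip are
assumptions for illustration, and Python's zipfile is only a stand-in
for PKZIP (it deflates tiny files rather than 'storing' them), so
treat the byte counts as indicative only.

```python
import io
import tarfile
import zipfile


def make_files(count: int = 100) -> dict[str, bytes]:
    """Hypothetical input: `count` small files with identical contents."""
    return {f"test{i:03d}.txt": b"Hello, World" for i in range(count)}


def zip_individually(files: dict[str, bytes]) -> int:
    """Size of a .zip in which every file is its own compressed entry."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return len(buf.getvalue())


def tar_then_zip(files: dict[str, bytes]) -> int:
    """Size of a .zip holding one uncompressed .tar of all the files."""
    tar_buf = io.BytesIO()
    with tarfile.open(fileobj=tar_buf, mode="w") as tf:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    zip_buf = io.BytesIO()
    with zipfile.ZipFile(zip_buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # The whole archive is one entry, so matches can span file boundaries.
        zf.writestr("files.tar", tar_buf.getvalue())
    return len(zip_buf.getvalue())


if __name__ == "__main__":
    files = make_files()
    print("zipped one by one:", zip_individually(files), "bytes")
    print("tarred, then zipped:", tar_then_zip(files), "bytes")
```

The tar-then-zip result should come out well under the file-by-file
one, even though the intermediate .tar is far bigger than the 100
files put together.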

Neither ZIP nor TAR alone can compress the files themselves. The whole
is greater than the sum of the parts!

ABW
--

"Simply drag your mother in law's cellphone number from the
Address Book to the Laser Satellite icon, and the Targeting
Wizard will locate her. Then follow the onscreen prompts for
gigawattage and dispersion pattern..."

(Windows for Early Warning and Defence User's manual P385)

Alaric B. Williams Internet : alaric AT abwillms DOT demon DOT co DOT uk
<A HREF="http://www.abwillms.demon.co.uk/">Hello :-)</A>
