From: alaric AT abwillms DOT demon DOT co DOT uk (Alaric B. Williams)
Newsgroups: comp.os.msdos.djgpp
Subject: Re: Why not to use 'tar' before packing DJGPP?
Date: Mon, 11 Nov 1996 18:08:11 GMT
Lines: 53
Message-ID: <847735694.6203.0@abwillms.demon.co.uk>
References: <32823D97 DOT 44DD AT sabat DOT tu DOT kielce DOT pl> <3282A82E DOT 7EE7 AT cs DOT com> <55vapk$s4l AT news DOT ox DOT ac DOT uk> <32841395 DOT 48F6 AT cs DOT com>
NNTP-Posting-Host: abwillms.demon.co.uk
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp

"John M. Aldrich" wrote:

>> Not that I really understand tar anyway, but those compression ratios
>> look good... maybe you could just tar the source files? People who are
>> interested in the source code of the compiler are more likely to know
>> what they're doing with tar anyway.

>That's not such a bad idea. After all, if they get the source, then
>they probably already have at least djdev, which contains djtar. :)

>Tar is really neat, but the reason it gets good ratios is not because it
>does any compression itself, but because it's much more efficient to
>compress a single tar file than lots of untarred ones. I imagine you
>would get similar (but not identical) results if you took one of the
>distribution .zip files and re-zipped it. I have known this to reduce a
>zipfile's size by several percent.

ZIP and similar utilities work by spotting repeated patterns in the
file, and the second time a string arises, merely inserting a
pointer of sorts to the first occurrence.

If you rezip a ZIP file, the saving is generally on the headers and
other stuff in the ZIP file, since the compressed bits have a very
high entropy (incompressibility).

If you make a .tar file then zip it, ZIP can detect repeated strings
/across/ files, which it can't normally do, since it compresses each
file individually.

EG: 100 files containing the string "Hello, World".
PKZIP will not be able to compress any of them; there are no repeated
strings within a single file, so it will 'store' them.

TAR would create a big file with "Hello, World" 100 times and the
filenames in. Zipping that file, PKZIP would notice the repeated
"Hello, World", and probably replace each copy of it with a two-byte
escape sequence. And the .tar file headers will be mostly the same
(size of file, probably most of the filename (test1, test2, test3...),
etc) and produce immense compression.

Neither ZIP nor TAR alone can compress the files themselves. The
whole is greater than the sum of the parts!

ABW

-- 
"Simply drag your mother in law's cellphone number from the
Address Book to the Laser Satellite icon, and the Targeting
Wizard will locate her. Then follow the onscreen prompts for
gigawattage and dispersion pattern..."

(Windows for Early Warning and Defence User's manual P385)

Alaric B. Williams Internet : alaric AT abwillms DOT demon DOT co DOT uk

Hello :-)
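
The 100-file "Hello, World" example above can be tried out with a short script. This is only a rough sketch, not anything from the original thread: it uses Python's zipfile and tarfile modules as stand-ins for PKZIP and tar, builds everything in memory, and the helper names (zip_size, tar_bytes) and the file names test1.txt ... test100.txt are made up for illustration. Python's zipfile deflates each tiny member rather than 'storing' it as PKZIP might, so the exact byte counts will differ from a real PKZIP run; the point is only the comparison between the two totals.

```python
import io
import tarfile
import zipfile


def zip_size(members):
    """Return the size in bytes of a ZIP archive built from {name: bytes}.

    Each member is deflated on its own, as ZIP always does.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return len(buf.getvalue())


def tar_bytes(members):
    """Return an uncompressed tar archive (as bytes) built from {name: bytes}."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        for name, data in members.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# 100 small files, all with the same contents (hypothetical names).
files = {f"test{i}.txt": b"Hello, World" for i in range(1, 101)}

separate = zip_size(files)                # zip the 100 files directly
archive = tar_bytes(files)                # tar alone: bigger than the input
combined = zip_size({"all.tar": archive}) # tar first, then zip the one file

print("raw data:        ", sum(len(d) for d in files.values()), "bytes")
print("zip of 100 files:", separate, "bytes")
print("tar alone:       ", len(archive), "bytes")
print("zip of the tar:  ", combined, "bytes")
```

The tar step on its own makes things larger, since every member gets a header block and padding; it only pays off once the compressor sees all the near-identical headers and contents as one stream.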