From: k3040e4 AT c210 DOT edvz DOT uni-linz DOT ac DOT at (Oberhumer Markus) Message-Id: <199608311034.MAA27344@c210.edvz.uni-linz.ac.at> Subject: LZO 0.20 -- a real-time data compression library To: djgpp-announce AT delorie DOT com Date: Sat, 31 Aug 1996 12:34:05 -0200 (MET DST) Return-Read-To: markus DOT oberhumer AT jk DOT uni-linz DOT ac DOT at Content-Type: text Content-Length: 12354 -----BEGIN PGP SIGNED MESSAGE----- ============================================================================== LZO -- a real-time data compression library ============================================================================== Author : Markus Franz Xaver Johannes Oberhumer http://www.infosys.tuwien.ac.at/Staff/lux/marco/lzo.html Version : 0.20 Date : 11-Aug-1996 Availability ------------ ftp://hpv17.infosys.tuwien.ac.at/pub/mfx/lzo/lzo-0.20.tar.gz ftp://tsx-11.mit.edu/pub/linux/sources/libs/lzo-0.20.tar.gz Archive size is about 99 kB. Introduction ------------ LZO is a data compression library which is suitable for data de-/compression in real-time. This means it favours speed over compression ratio. I named it LZO standing for Lempel-Ziv-Oberhumer. LZO is written in ANSI C. Both the source code and the compressed data format are designed to be portable across platforms, and I intend to make LZO work on each architecture that has at least 32 bit integers. Somewhat limited support for 16 bit MSDOS is implemented as well (using 'huge' pointers). This release includes 7 algorithms (LZO1, LZO1A, LZO1B, LZO1C, LZO1D, LZO1F and LZO1X) each of which can be tuned to different compression levels. I already published LZO1 and LZO1A via the Internet newsgroups comp.compression and comp.compression.research in March 1996. LZO1 and LZO1A are mainly included for compatibility reasons. LZO1D decompression is too slow, and there is no fast compressor anyway. So the algorithms of current interest are LZO1B, LZO1C, LZO1F and LZO1X. Each of these has a slightly different characteristics, and it seems that the new LZO1X algorithm is a good overall choice for a wide range of data. The LZO1B/LZO1C/LZO1F/LZO1X algorithm has the following features: - Decompression is simple and *very* fast. - Requires no memory for decompression. - Requires 64kB of memory for compression. - Allows you to dial up extra compression at a speed cost in the compressor. The speed of the decompressor is not affected. - Algorithm is thread safe. - Algorithm is lossless. - LZO1B is deterministic. Design criteria --------------- LZO was designed with speed in mind. Decompressor speed has been favoured over compressor speed. Real-time decompression should be possible for virtually any application. I expect an implementation of the decompressor in assembler code to run at the third of the speed of a memcpy(). In fact, I first wrote the decompressor of each algorithm thereby defining the compressed data format, verified it with manually created test data and at last added the compressor. Performance ----------- To keep you interested, here is a short overview of the average results when compressing the (in-)famous Calgary Corpus test suite with a blocksize of 256kB, running on my i486 DX2/66. The naming convention of the algorithms goes LZOxx-N, where N is the compression level. Range 1-9 indicates the fast standard levels using 64kB memory for compression. Level 99 offers better compression at the cost of more memory (256kB), and is still reasonable fast. Level 999 achieves nearly optimal compression - but it is slow and uses much memory, and is mainly intended for generating pre-compressed data. +--------------------------------------------------------------------+ | Algorithm Length CxB ComLen %Remn Bits Com K/s Dec K/s | | --------- ------ --- ------ ----- ---- ------- ------- | | LZO1-1 224401 1 118375 53.5 4.28 1704.11 3580.56 | | | | LZO1A-1 224401 1 115988 52.0 4.16 1656.07 3670.48 | | | | LZO1B-1 224401 1 110653 49.6 3.97 1575.28 4411.62 | | LZO1B-2 224401 1 107416 48.6 3.89 1451.90 4437.53 | | LZO1B-3 224401 1 105537 47.9 3.83 1343.32 4422.70 | | LZO1B-4 224401 1 104828 47.4 3.79 1037.02 4514.11 | | LZO1B-5 224401 1 102724 46.7 3.73 943.35 4521.54 | | LZO1B-6 224401 1 101210 46.0 3.68 883.91 4483.46 | | LZO1B-7 224401 1 101388 46.0 3.68 811.64 4572.20 | | LZO1B-8 224401 1 99453 45.2 3.62 731.56 4514.98 | | LZO1B-9 224401 1 99118 45.0 3.60 592.89 4426.57 | | | | LZO1C-1 224401 1 112051 50.2 4.02 1518.80 4392.48 | | LZO1C-2 224401 1 108791 49.1 3.93 1416.69 4413.89 | | LZO1C-3 224401 1 106825 48.4 3.87 1321.23 4389.14 | | LZO1C-4 224401 1 105717 47.7 3.82 1032.84 4494.94 | | LZO1C-5 224401 1 103605 46.9 3.76 940.19 4498.48 | | LZO1C-6 224401 1 102585 46.5 3.72 863.84 4452.71 | | LZO1C-7 224401 1 101937 46.2 3.70 772.97 4548.76 | | LZO1C-8 224401 1 100779 45.6 3.65 708.72 4487.48 | | LZO1C-9 224401 1 100252 45.4 3.63 598.06 4396.90 | | | | LZO1F-1 224401 1 116765 51.5 4.12 1665.56 4805.74 | | | | LZO1X-1 224401 1 110368 49.4 3.95 1601.11 4919.72 | | | | LZO1-99 224401 1 101560 46.7 3.73 473.35 3643.55 | | LZO1A-99 224401 1 99958 45.5 3.64 471.55 3692.92 | | LZO1B-99 224401 1 95399 43.6 3.49 454.05 4461.44 | | LZO1C-99 224401 1 97252 44.1 3.53 455.97 4447.05 | | | | LZO1C-999 224401 1 87762 40.2 3.21 87.57 4714.26 | | LZO1D-999 224401 1 87901 40.0 3.20 84.90 3208.91 | | LZO1F-999 224401 1 89620 40.4 3.23 78.35 5466.84 | | LZO1X-999 224401 1 84521 38.5 3.08 53.40 5203.72 | | | | LZRW1-A 224401 1 123194 55.1 4.41 1564.23 3396.10 | | LZRW2 224401 1 115399 51.5 4.12 1301.08 2379.13 | | LZRW3 224401 1 110942 50.0 4.00 1518.32 1893.92 | | LZRW3-A(2) 224401 1 106308 48.1 3.85 1089.16 1989.55 | | LZRW3-A(4) 224401 1 103126 46.8 3.74 889.30 2060.22 | | LZRW3-A(8) 224401 1 100722 45.8 3.67 607.05 2064.86 | | LZV 224401 1 112511 51.4 4.11 1115.18 4247.76 | | zlib-8/1 224401 1 85302 38.8 3.10 360.44 1084.91 | | memcpy() 224401 1 224401 100.0 8.00 19992.22 20023.29 | +--------------------------------------------------------------------+ More detailed results can be found in the ./doc directory. Short documentation ------------------- LZO is a block compression algorithm - it compresses and decompresses a block of data. Block size must be the same for compression and decompression. LZO compresses a block of data into matches (a LZ77 sliding dictionary) and runs of non-matching literals. LZO takes care about long matches and long literal runs so that it produces good results on highly redundant data and deals accecptable with non-compressible data. When dealing with uncompressible data, LZO expands the input block by a maximum of 16 bytes per 1024 bytes input (probably only by a maximum of 8 bytes, I still have to compute the exact value). Most algorithms come with two decompressors - a fast and a safe one. The safe decompressor is designed such a way that it won't crash your application when it is fed with incorrect or damaged data but will catch all possible overruns and return an error code in this case. When using the safe decompressor you must pass the number of bytes available in 'dst' via the parameter 'dst_len'. I have verified LZO using gcc 2.7.2 with Richard W.M. Jones's bounds checking patches and compressed a lot of files when tuning some parameters. LZO is free of any known bugs. The algorithms -------------- There are (too) many algorithms implemented. But I want to support unlimited backwards compatibility, so I will *not* reduce the LZO distribution in the future (what I had planned some time ago). As the many object files are mostly independent of each other, the size overhead for an executable statically linked with the LZO library is usually pretty low cause the linker will only add the modules that you are actually using. My experiments have shown that LZO1B is good with a large blocksize or with very redundant data, LZO1F is good with a small blocksize or with binary data and that LZO1X is often the best choice of all. LZO1F seems to compress better than LZO1C on these files where LZO1C is better than LZO1B. Beware, your mileage may vary. Portability ----------- I have built and tested LZO successfully on the following platforms: i486 - Linux 1.2.6 - gcc 2.6.3 i486 - MSDOS - djgpp v2 + gcc 2.7.2 i486 - MSDOS - emx + gcc 2.7.2 i486 - MSDOS - Watcom C 10.5 (32 bit) i486 - MSDOS - Borland C 4.0 (16 bit) LZO should work without problems on all machines that have an ANSI C compiler and at least 32 bit integers. The supplied makefile requires GNU make. I need *your* feedback to support more platforms and fix possible portability problems. Please take a look at the file PLATFORM.TXT. The future ---------- Right now my time is very limited as I'm (still :-) working on finishing my diploma thesis (which has nothing to do with compression at all). I consider LZO to be pretty finished right now, and I'd also like to freeze most of the source code except for fixes - but no promises... The following things still need to be done before this could be called version 1: - test on other 32/64 bit architechtures. I do not expect major problems. - thoroughly test on 16 bit systems (already works with Borland C 4.0) - implement the missing LZO1F compression levels - implement the missing LZO1X compression levels - write a more detailed documentation I'm also thinking about these features (contributions are welcome): - implement an ultra fast LZO1X-0 (at the cost of compression ratio) - speed up the compressors using unaligned memory access where possible - a stream interface (compatible to zlib ?) would be nice (and slow ?) - can someone help me setting up automake/autoconf ? - interfaces to Perl and Java Some comments about the source code ----------------------------------- If you want to understand how LZO works, carefully study the sources in the ./extra directory first. Be warned: the main source code in the ./src directory is a real pain to understand as I've experimented with hundreds of slightly different versions. It contains many #if and some gotos, and is completely optimized for speed and not for readability. Code sharing of the different algorithms is implemented by stressing the preprocessor - this can be really confusing. Lot's of marcos and assertions don't make things better. Nevertheless the sources compile very quiet on a variety of compilers with the highest warning levels turned on, even in C++ mode. Thanks ------ Laszlo Molnar Charles W. Sandmann Wolfgang Lugmayr Natascha Copyright --------- LZO is Copyright (C) 1996 Markus Franz Xaver Johannes Oberhumer LZO is distributed under the terms of the GNU Library General Public License (LGPL). See the file COPYING.LIB. Special licenses for commercial and other applications which are not willing to accept the GNU Library General Public License are available by contacting the author. -----BEGIN PGP SIGNATURE----- Version: 2.6.2i iQCVAwUBMh3RaW10fyLu8beJAQHZCwP/bE/eEJvRIsI2B3CtQ1g6fzmmt8ZHikXo wRTdk5gSccTuDIFBYqOqKpMurmEjOKQFlr34e787MbpXZHAWl2yAbh1pPmg9iFZy 7d4uow1a3J/72QIt/ucWavT0mxXzsBl19ylc71E0Auc76xvxg2HgsD5Iax4L70au PG/mQiOsVxo= =LXqt -----END PGP SIGNATURE-----