The Design and Implementation of the FreeBSD Operating System, Second Edition
Now available: The Design and Implementation of the FreeBSD Operating System (Second Edition)


[ source navigation ] [ diff markup ] [ identifier search ] [ freetext search ] [ file search ] [ list types ] [ track identifier ]

FreeBSD/Linux Kernel Cross Reference
sys/contrib/zlib/doc/rfc1950.txt

Version: -  FREEBSD  -  FREEBSD-13-STABLE  -  FREEBSD-13-0  -  FREEBSD-12-STABLE  -  FREEBSD-12-0  -  FREEBSD-11-STABLE  -  FREEBSD-11-0  -  FREEBSD-10-STABLE  -  FREEBSD-10-0  -  FREEBSD-9-STABLE  -  FREEBSD-9-0  -  FREEBSD-8-STABLE  -  FREEBSD-8-0  -  FREEBSD-7-STABLE  -  FREEBSD-7-0  -  FREEBSD-6-STABLE  -  FREEBSD-6-0  -  FREEBSD-5-STABLE  -  FREEBSD-5-0  -  FREEBSD-4-STABLE  -  FREEBSD-3-STABLE  -  FREEBSD22  -  l41  -  OPENBSD  -  linux-2.6  -  MK84  -  PLAN9  -  xnu-8792 
SearchContext: -  none  -  3  -  10 

    1 
    2 
    3 
    4 
    5 
    6 
    7 Network Working Group                                         P. Deutsch
    8 Request for Comments: 1950                           Aladdin Enterprises
    9 Category: Informational                                      J-L. Gailly
   10                                                                 Info-ZIP
   11                                                                 May 1996
   12 
   13 
   14          ZLIB Compressed Data Format Specification version 3.3
   15 
   16 Status of This Memo
   17 
   18    This memo provides information for the Internet community.  This memo
   19    does not specify an Internet standard of any kind.  Distribution of
   20    this memo is unlimited.
   21 
   22 IESG Note:
   23 
   24    The IESG takes no position on the validity of any Intellectual
   25    Property Rights statements contained in this document.
   26 
   27 Notices
   28 
   29    Copyright (c) 1996 L. Peter Deutsch and Jean-Loup Gailly
   30 
   31    Permission is granted to copy and distribute this document for any
   32    purpose and without charge, including translations into other
   33    languages and incorporation into compilations, provided that the
   34    copyright notice and this notice are preserved, and that any
   35    substantive changes or deletions from the original are clearly
   36    marked.
   37 
   38    A pointer to the latest version of this and related documentation in
   39    HTML format can be found at the URL
   40    <ftp://ftp.uu.net/graphics/png/documents/zlib/zdoc-index.html>.
   41 
   42 Abstract
   43 
   44    This specification defines a lossless compressed data format.  The
   45    data can be produced or consumed, even for an arbitrarily long
   46    sequentially presented input data stream, using only an a priori
   47    bounded amount of intermediate storage.  The format presently uses
   48    the DEFLATE compression method but can be easily extended to use
   49    other compression methods.  It can be implemented readily in a manner
   50    not covered by patents.  This specification also defines the ADLER-32
   51    checksum (an extension and improvement of the Fletcher checksum),
   52    used for detection of data corruption, and provides an algorithm for
   53    computing it.
   54 
   55 
   56 
   57 
   58 Deutsch & Gailly             Informational                      [Page 1]
   59 
   60 RFC 1950       ZLIB Compressed Data Format Specification        May 1996
   61 
   62 
   63 Table of Contents
   64 
   65    1. Introduction ................................................... 2
   66       1.1. Purpose ................................................... 2
   67       1.2. Intended audience ......................................... 3
   68       1.3. Scope ..................................................... 3
   69       1.4. Compliance ................................................ 3
   70       1.5.  Definitions of terms and conventions used ................ 3
   71       1.6. Changes from previous versions ............................ 3
   72    2. Detailed specification ......................................... 3
   73       2.1. Overall conventions ....................................... 3
   74       2.2. Data format ............................................... 4
   75       2.3. Compliance ................................................ 7
   76    3. References ..................................................... 7
   77    4. Source code .................................................... 8
   78    5. Security Considerations ........................................ 8
   79    6. Acknowledgements ............................................... 8
   80    7. Authors' Addresses ............................................. 8
   81    8. Appendix: Rationale ............................................ 9
   82    9. Appendix: Sample code ..........................................10
   83 
   84 1. Introduction
   85 
   86    1.1. Purpose
   87 
   88       The purpose of this specification is to define a lossless
   89       compressed data format that:
   90 
   91           * Is independent of CPU type, operating system, file system,
   92             and character set, and hence can be used for interchange;
   93 
   94           * Can be produced or consumed, even for an arbitrarily long
   95             sequentially presented input data stream, using only an a
   96             priori bounded amount of intermediate storage, and hence can
   97             be used in data communications or similar structures such as
   98             Unix filters;
   99 
  100           * Can use a number of different compression methods;
  101 
  102           * Can be implemented readily in a manner not covered by
  103             patents, and hence can be practiced freely.
  104 
  105       The data format defined by this specification does not attempt to
  106       allow random access to compressed data.
  107 
  108 
  109 
  110 
  111 
  112 
  113 
  114 Deutsch & Gailly             Informational                      [Page 2]
  115 
  116 RFC 1950       ZLIB Compressed Data Format Specification        May 1996
  117 
  118 
  119    1.2. Intended audience
  120 
  121       This specification is intended for use by implementors of software
  122       to compress data into zlib format and/or decompress data from zlib
  123       format.
  124 
  125       The text of the specification assumes a basic background in
  126       programming at the level of bits and other primitive data
  127       representations.
  128 
  129    1.3. Scope
  130 
  131       The specification specifies a compressed data format that can be
  132       used for in-memory compression of a sequence of arbitrary bytes.
  133 
  134    1.4. Compliance
  135 
  136       Unless otherwise indicated below, a compliant decompressor must be
  137       able to accept and decompress any data set that conforms to all
  138       the specifications presented here; a compliant compressor must
  139       produce data sets that conform to all the specifications presented
  140       here.
  141 
  142    1.5.  Definitions of terms and conventions used
  143 
  144       byte: 8 bits stored or transmitted as a unit (same as an octet).
  145       (For this specification, a byte is exactly 8 bits, even on
  146       machines which store a character on a number of bits different
  147       from 8.) See below, for the numbering of bits within a byte.
  148 
  149    1.6. Changes from previous versions
  150 
  151       Version 3.1 was the first public release of this specification.
  152       In version 3.2, some terminology was changed and the Adler-32
  153       sample code was rewritten for clarity.  In version 3.3, the
  154       support for a preset dictionary was introduced, and the
  155       specification was converted to RFC style.
  156 
  157 2. Detailed specification
  158 
  159    2.1. Overall conventions
  160 
  161       In the diagrams below, a box like this:
  162 
  163          +---+
  164          |   | <-- the vertical bars might be missing
  165          +---+
  166 
  167 
  168 
  169 
  170 Deutsch & Gailly             Informational                      [Page 3]
  171 
  172 RFC 1950       ZLIB Compressed Data Format Specification        May 1996
  173 
  174 
  175       represents one byte; a box like this:
  176 
  177          +==============+
  178          |              |
  179          +==============+
  180 
  181       represents a variable number of bytes.
  182 
  183       Bytes stored within a computer do not have a "bit order", since
  184       they are always treated as a unit.  However, a byte considered as
  185       an integer between 0 and 255 does have a most- and least-
  186       significant bit, and since we write numbers with the most-
  187       significant digit on the left, we also write bytes with the most-
  188       significant bit on the left.  In the diagrams below, we number the
  189       bits of a byte so that bit 0 is the least-significant bit, i.e.,
  190       the bits are numbered:
  191 
  192          +--------+
  193          |76543210|
  194          +--------+
  195 
  196       Within a computer, a number may occupy multiple bytes.  All
  197       multi-byte numbers in the format described here are stored with
  198       the MOST-significant byte first (at the lower memory address).
  199       For example, the decimal number 520 is stored as:
  200 
  201              0     1
  202          +--------+--------+
  203          |00000010|00001000|
  204          +--------+--------+
  205           ^        ^
  206           |        |
  207           |        + less significant byte = 8
  208           + more significant byte = 2 x 256
  209 
  210    2.2. Data format
  211 
  212       A zlib stream has the following structure:
  213 
  214            0   1
  215          +---+---+
  216          |CMF|FLG|   (more-->)
  217          +---+---+
  218 
  219 
  220 
  221 
  222 
  223 
  224 
  225 
  226 Deutsch & Gailly             Informational                      [Page 4]
  227 
  228 RFC 1950       ZLIB Compressed Data Format Specification        May 1996
  229 
  230 
  231       (if FLG.FDICT set)
  232 
  233            0   1   2   3
  234          +---+---+---+---+
  235          |     DICTID    |   (more-->)
  236          +---+---+---+---+
  237 
  238          +=====================+---+---+---+---+
  239          |...compressed data...|    ADLER32    |
  240          +=====================+---+---+---+---+
  241 
  242       Any data which may appear after ADLER32 are not part of the zlib
  243       stream.
  244 
  245       CMF (Compression Method and flags)
  246          This byte is divided into a 4-bit compression method and a 4-
  247          bit information field depending on the compression method.
  248 
  249             bits 0 to 3  CM     Compression method
  250             bits 4 to 7  CINFO  Compression info
  251 
  252       CM (Compression method)
  253          This identifies the compression method used in the file. CM = 8
  254          denotes the "deflate" compression method with a window size up
  255          to 32K.  This is the method used by gzip and PNG (see
  256          references [1] and [2] in Chapter 3, below, for the reference
  257          documents).  CM = 15 is reserved.  It might be used in a future
  258          version of this specification to indicate the presence of an
  259          extra field before the compressed data.
  260 
  261       CINFO (Compression info)
  262          For CM = 8, CINFO is the base-2 logarithm of the LZ77 window
  263          size, minus eight (CINFO=7 indicates a 32K window size). Values
  264          of CINFO above 7 are not allowed in this version of the
  265          specification.  CINFO is not defined in this specification for
  266          CM not equal to 8.
  267 
  268       FLG (FLaGs)
  269          This flag byte is divided as follows:
  270 
  271             bits 0 to 4  FCHECK  (check bits for CMF and FLG)
  272             bit  5       FDICT   (preset dictionary)
  273             bits 6 to 7  FLEVEL  (compression level)
  274 
  275          The FCHECK value must be such that CMF and FLG, when viewed as
  276          a 16-bit unsigned integer stored in MSB order (CMF*256 + FLG),
  277          is a multiple of 31.
  278 
  279 
  280 
  281 
  282 Deutsch & Gailly             Informational                      [Page 5]
  283 
  284 RFC 1950       ZLIB Compressed Data Format Specification        May 1996
  285 
  286 
  287       FDICT (Preset dictionary)
  288          If FDICT is set, a DICT dictionary identifier is present
  289          immediately after the FLG byte. The dictionary is a sequence of
  290          bytes which are initially fed to the compressor without
  291          producing any compressed output. DICT is the Adler-32 checksum
  292          of this sequence of bytes (see the definition of ADLER32
  293          below).  The decompressor can use this identifier to determine
  294          which dictionary has been used by the compressor.
  295 
  296       FLEVEL (Compression level)
  297          These flags are available for use by specific compression
  298          methods.  The "deflate" method (CM = 8) sets these flags as
  299          follows:
  300 
  301             0 - compressor used fastest algorithm
  302             1 - compressor used fast algorithm
  303             2 - compressor used default algorithm
  304             3 - compressor used maximum compression, slowest algorithm
  305 
  306          The information in FLEVEL is not needed for decompression; it
  307          is there to indicate if recompression might be worthwhile.
  308 
  309       compressed data
  310          For compression method 8, the compressed data is stored in the
  311          deflate compressed data format as described in the document
  312          "DEFLATE Compressed Data Format Specification" by L. Peter
  313          Deutsch. (See reference [3] in Chapter 3, below)
  314 
  315          Other compressed data formats are not specified in this version
  316          of the zlib specification.
  317 
  318       ADLER32 (Adler-32 checksum)
  319          This contains a checksum value of the uncompressed data
  320          (excluding any dictionary data) computed according to Adler-32
  321          algorithm. This algorithm is a 32-bit extension and improvement
  322          of the Fletcher algorithm, used in the ITU-T X.224 / ISO 8073
  323          standard. See references [4] and [5] in Chapter 3, below)
  324 
  325          Adler-32 is composed of two sums accumulated per byte: s1 is
  326          the sum of all bytes, s2 is the sum of all s1 values. Both sums
  327          are done modulo 65521. s1 is initialized to 1, s2 to zero.  The
  328          Adler-32 checksum is stored as s2*65536 + s1 in most-
  329          significant-byte first (network) order.
  330 
  331 
  332 
  333 
  334 
  335 
  336 
  337 
  338 Deutsch & Gailly             Informational                      [Page 6]
  339 
  340 RFC 1950       ZLIB Compressed Data Format Specification        May 1996
  341 
  342 
  343    2.3. Compliance
  344 
  345       A compliant compressor must produce streams with correct CMF, FLG
  346       and ADLER32, but need not support preset dictionaries.  When the
  347       zlib data format is used as part of another standard data format,
  348       the compressor may use only preset dictionaries that are specified
  349       by this other data format.  If this other format does not use the
  350       preset dictionary feature, the compressor must not set the FDICT
  351       flag.
  352 
  353       A compliant decompressor must check CMF, FLG, and ADLER32, and
  354       provide an error indication if any of these have incorrect values.
  355       A compliant decompressor must give an error indication if CM is
  356       not one of the values defined in this specification (only the
  357       value 8 is permitted in this version), since another value could
  358       indicate the presence of new features that would cause subsequent
  359       data to be interpreted incorrectly.  A compliant decompressor must
  360       give an error indication if FDICT is set and DICTID is not the
  361       identifier of a known preset dictionary.  A decompressor may
  362       ignore FLEVEL and still be compliant.  When the zlib data format
  363       is being used as a part of another standard format, a compliant
  364       decompressor must support all the preset dictionaries specified by
  365       the other format. When the other format does not use the preset
  366       dictionary feature, a compliant decompressor must reject any
  367       stream in which the FDICT flag is set.
  368 
  369 3. References
  370 
  371    [1] Deutsch, L.P.,"GZIP Compressed Data Format Specification",
  372        available in ftp://ftp.uu.net/pub/archiving/zip/doc/
  373 
  374    [2] Thomas Boutell, "PNG (Portable Network Graphics) specification",
  375        available in ftp://ftp.uu.net/graphics/png/documents/
  376 
  377    [3] Deutsch, L.P.,"DEFLATE Compressed Data Format Specification",
  378        available in ftp://ftp.uu.net/pub/archiving/zip/doc/
  379 
  380    [4] Fletcher, J. G., "An Arithmetic Checksum for Serial
  381        Transmissions," IEEE Transactions on Communications, Vol. COM-30,
  382        No. 1, January 1982, pp. 247-252.
  383 
  384    [5] ITU-T Recommendation X.224, Annex D, "Checksum Algorithms,"
  385        November, 1993, pp. 144, 145. (Available from
  386        gopher://info.itu.ch). ITU-T X.244 is also the same as ISO 8073.
  387 
  388 
  389 
  390 
  391 
  392 
  393 
  394 Deutsch & Gailly             Informational                      [Page 7]
  395 
  396 RFC 1950       ZLIB Compressed Data Format Specification        May 1996
  397 
  398 
  399 4. Source code
  400 
  401    Source code for a C language implementation of a "zlib" compliant
  402    library is available at ftp://ftp.uu.net/pub/archiving/zip/zlib/.
  403 
  404 5. Security Considerations
  405 
  406    A decoder that fails to check the ADLER32 checksum value may be
  407    subject to undetected data corruption.
  408 
  409 6. Acknowledgements
  410 
  411    Trademarks cited in this document are the property of their
  412    respective owners.
  413 
  414    Jean-Loup Gailly and Mark Adler designed the zlib format and wrote
  415    the related software described in this specification.  Glenn
  416    Randers-Pehrson converted this document to RFC and HTML format.
  417 
  418 7. Authors' Addresses
  419 
  420    L. Peter Deutsch
  421    Aladdin Enterprises
  422    203 Santa Margarita Ave.
  423    Menlo Park, CA 94025
  424 
  425    Phone: (415) 322-0103 (AM only)
  426    FAX:   (415) 322-1734
  427    EMail: <ghost@aladdin.com>
  428 
  429 
  430    Jean-Loup Gailly
  431 
  432    EMail: <gzip@prep.ai.mit.edu>
  433 
  434    Questions about the technical content of this specification can be
  435    sent by email to
  436 
  437    Jean-Loup Gailly <gzip@prep.ai.mit.edu> and
  438    Mark Adler <madler@alumni.caltech.edu>
  439 
  440    Editorial comments on this specification can be sent by email to
  441 
  442    L. Peter Deutsch <ghost@aladdin.com> and
  443    Glenn Randers-Pehrson <randeg@alumni.rpi.edu>
  444 
  445 
  446 
  447 
  448 
  449 
  450 Deutsch & Gailly             Informational                      [Page 8]
  451 
  452 RFC 1950       ZLIB Compressed Data Format Specification        May 1996
  453 
  454 
  455 8. Appendix: Rationale
  456 
  457    8.1. Preset dictionaries
  458 
  459       A preset dictionary is specially useful to compress short input
  460       sequences. The compressor can take advantage of the dictionary
  461       context to encode the input in a more compact manner. The
  462       decompressor can be initialized with the appropriate context by
  463       virtually decompressing a compressed version of the dictionary
  464       without producing any output. However for certain compression
  465       algorithms such as the deflate algorithm this operation can be
  466       achieved without actually performing any decompression.
  467 
  468       The compressor and the decompressor must use exactly the same
  469       dictionary. The dictionary may be fixed or may be chosen among a
  470       certain number of predefined dictionaries, according to the kind
  471       of input data. The decompressor can determine which dictionary has
  472       been chosen by the compressor by checking the dictionary
  473       identifier. This document does not specify the contents of
  474       predefined dictionaries, since the optimal dictionaries are
  475       application specific. Standard data formats using this feature of
  476       the zlib specification must precisely define the allowed
  477       dictionaries.
  478 
  479    8.2. The Adler-32 algorithm
  480 
  481       The Adler-32 algorithm is much faster than the CRC32 algorithm yet
  482       still provides an extremely low probability of undetected errors.
  483 
  484       The modulo on unsigned long accumulators can be delayed for 5552
  485       bytes, so the modulo operation time is negligible.  If the bytes
  486       are a, b, c, the second sum is 3a + 2b + c + 3, and so is position
  487       and order sensitive, unlike the first sum, which is just a
  488       checksum.  That 65521 is prime is important to avoid a possible
  489       large class of two-byte errors that leave the check unchanged.
  490       (The Fletcher checksum uses 255, which is not prime and which also
  491       makes the Fletcher check insensitive to single byte changes 0 <->
  492       255.)
  493 
  494       The sum s1 is initialized to 1 instead of zero to make the length
  495       of the sequence part of s2, so that the length does not have to be
  496       checked separately. (Any sequence of zeroes has a Fletcher
  497       checksum of zero.)
  498 
  499 
  500 
  501 
  502 
  503 
  504 
  505 
  506 Deutsch & Gailly             Informational                      [Page 9]
  507 
  508 RFC 1950       ZLIB Compressed Data Format Specification        May 1996
  509 
  510 
  511 9. Appendix: Sample code
  512 
  513    The following C code computes the Adler-32 checksum of a data buffer.
  514    It is written for clarity, not for speed.  The sample code is in the
  515    ANSI C programming language. Non C users may find it easier to read
  516    with these hints:
  517 
  518       &      Bitwise AND operator.
  519       >>     Bitwise right shift operator. When applied to an
  520              unsigned quantity, as here, right shift inserts zero bit(s)
  521              at the left.
  522       <<     Bitwise left shift operator. Left shift inserts zero
  523              bit(s) at the right.
  524       ++     "n++" increments the variable n.
  525       %      modulo operator: a % b is the remainder of a divided by b.
  526 
  527       #define BASE 65521 /* largest prime smaller than 65536 */
  528 
  529       /*
  530          Update a running Adler-32 checksum with the bytes buf[0..len-1]
  531        and return the updated checksum. The Adler-32 checksum should be
  532        initialized to 1.
  533 
  534        Usage example:
  535 
  536          unsigned long adler = 1L;
  537 
  538          while (read_buffer(buffer, length) != EOF) {
  539            adler = update_adler32(adler, buffer, length);
  540          }
  541          if (adler != original_adler) error();
  542       */
  543       unsigned long update_adler32(unsigned long adler,
  544          unsigned char *buf, int len)
  545       {
  546         unsigned long s1 = adler & 0xffff;
  547         unsigned long s2 = (adler >> 16) & 0xffff;
  548         int n;
  549 
  550         for (n = 0; n < len; n++) {
  551           s1 = (s1 + buf[n]) % BASE;
  552           s2 = (s2 + s1)     % BASE;
  553         }
  554         return (s2 << 16) + s1;
  555       }
  556 
  557       /* Return the adler32 of the bytes buf[0..len-1] */
  558 
  559 
  560 
  561 
  562 Deutsch & Gailly             Informational                     [Page 10]
  563 
  564 RFC 1950       ZLIB Compressed Data Format Specification        May 1996
  565 
  566 
  567       unsigned long adler32(unsigned char *buf, int len)
  568       {
  569         return update_adler32(1L, buf, len);
  570       }
  571 
  572 
  573 
  574 
  575 
  576 
  577 
  578 
  579 
  580 
  581 
  582 
  583 
  584 
  585 
  586 
  587 
  588 
  589 
  590 
  591 
  592 
  593 
  594 
  595 
  596 
  597 
  598 
  599 
  600 
  601 
  602 
  603 
  604 
  605 
  606 
  607 
  608 
  609 
  610 
  611 
  612 
  613 
  614 
  615 
  616 
  617 
  618 Deutsch & Gailly             Informational                     [Page 11]
  619 

Cache object: 988bf43c0665fa61d1b4e902d4a1a4f4


[ source navigation ] [ diff markup ] [ identifier search ] [ freetext search ] [ file search ] [ list types ] [ track identifier ]


This page is part of the FreeBSD/Linux Linux Kernel Cross-Reference, and was automatically generated using a modified version of the LXR engine.