diff contrib/python-zstandard/README.rst @ 42070:675775c33ab6

zstandard: vendor python-zstandard 0.11

The upstream source distribution from PyPI was extracted. Unwanted files
were removed.

The clang-format ignore list was updated to reflect the new source of files.

The project contains a vendored copy of zstandard 1.3.8. The old version
was 1.3.6. This should result in some minor performance wins.

test-check-py3-compat.t was updated to reflect now-passing tests on
Python 3.8.

Some HTTP tests were updated to reflect new zstd compression output.

# no-check-commit because 3rd party code has different style guidelines

Differential Revision: https://phab.mercurial-scm.org/D6199
author Gregory Szorc <gregory.szorc@gmail.com>
date Thu, 04 Apr 2019 17:34:43 -0700
parents 73fef626dae3
children 69de49c4e39c
--- a/contrib/python-zstandard/README.rst	Thu Apr 04 15:24:03 2019 -0700
+++ b/contrib/python-zstandard/README.rst	Thu Apr 04 17:34:43 2019 -0700
@@ -20,9 +20,9 @@
 Requirements
 ============
 
-This extension is designed to run with Python 2.7, 3.4, 3.5, and 3.6
-on common platforms (Linux, Windows, and OS X). x86 and x86_64 are well-tested
-on Windows. Only x86_64 is well-tested on Linux and macOS.
+This extension is designed to run with Python 2.7, 3.4, 3.5, 3.6, and 3.7
+on common platforms (Linux, Windows, and OS X). On PyPy (both PyPy2 and
+PyPy3) we support version 6.0.0 and above.
+x86 and x86_64 are well-tested on Windows. Only x86_64 is well-tested on Linux and macOS.
 
 Installing
 ==========
@@ -215,7 +215,7 @@
 
                # Do something with compressed chunk.
 
-When the context manager exists or ``close()`` is called, the stream is closed,
+When the context manager exits or ``close()`` is called, the stream is closed,
 underlying resources are released, and future operations against the compression
 stream will fail.
 
@@ -251,8 +251,54 @@
 Streaming Input API
 ^^^^^^^^^^^^^^^^^^^
 
-``stream_writer(fh)`` (which behaves as a context manager) allows you to *stream*
-data into a compressor.::
+``stream_writer(fh)`` allows you to *stream* data into a compressor.
+
+Returned instances implement the ``io.RawIOBase`` interface. Only methods
+that involve writing will do useful things.
+
+The argument to ``stream_writer()`` must have a ``write(data)`` method. As
+compressed data is available, ``write()`` will be called with the compressed
+data as its argument. Many common Python types implement ``write()``, including
+open file handles and ``io.BytesIO``.
+
+The ``write(data)`` method is used to feed data into the compressor.
+
+The ``flush([flush_mode=FLUSH_BLOCK])`` method can be called to evict whatever
+data remains within the compressor's internal state into the output object. This
+may result in 0 or more ``write()`` calls to the output object. This method
+accepts an optional ``flush_mode`` argument to control the flushing behavior.
+Its value can be any of the ``FLUSH_*`` constants.
+
+Both ``write()`` and ``flush()`` return the number of bytes written to the
+object's ``write()``. In many cases, small inputs do not accumulate enough
+data to cause a write and ``write()`` will return ``0``.
+
+Calling ``close()`` will mark the stream as closed and subsequent I/O
+operations will raise ``ValueError`` (per the documented behavior of
+``io.RawIOBase``). ``close()`` will also call ``close()`` on the underlying
+stream if such a method exists.
+
+Typical usage is as follows::
+
+   cctx = zstd.ZstdCompressor(level=10)
+   compressor = cctx.stream_writer(fh)
+
+   compressor.write(b'chunk 0\n')
+   compressor.write(b'chunk 1\n')
+   compressor.flush()
+   # Receiver will be able to decode ``chunk 0\nchunk 1\n`` at this point.
+   # Receiver is also expecting more data in the zstd *frame*.
+
+   compressor.write(b'chunk 2\n')
+   compressor.flush(zstd.FLUSH_FRAME)
+   # Receiver will be able to decode ``chunk 0\nchunk 1\nchunk 2``.
+   # Receiver is expecting no more data, as the zstd frame is closed.
+   # Any future calls to ``write()`` at this point will construct a new
+   # zstd frame.
+
+Instances can be used as context managers. Exiting the context manager is
+the equivalent of calling ``close()``, which is equivalent to calling
+``flush(zstd.FLUSH_FRAME)``::
 
    cctx = zstd.ZstdCompressor(level=10)
    with cctx.stream_writer(fh) as compressor:
@@ -260,22 +306,12 @@
        compressor.write(b'chunk 1')
        ...
 
-The argument to ``stream_writer()`` must have a ``write(data)`` method. As
-compressed data is available, ``write()`` will be called with the compressed
-data as its argument. Many common Python types implement ``write()``, including
-open file handles and ``io.BytesIO``.
+.. important::
 
-``stream_writer()`` returns an object representing a streaming compressor
-instance. It **must** be used as a context manager. That object's
-``write(data)`` method is used to feed data into the compressor.
-
-A ``flush()`` method can be called to evict whatever data remains within the
-compressor's internal state into the output object. This may result in 0 or
-more ``write()`` calls to the output object.
-
-Both ``write()`` and ``flush()`` return the number of bytes written to the
-object's ``write()``. In many cases, small inputs do not accumulate enough
-data to cause a write and ``write()`` will return ``0``.
+   If ``flush(FLUSH_FRAME)`` is not called, emitted data doesn't constitute
+   a full zstd *frame* and consumers of this data may complain about malformed
+   input. It is recommended to use instances as a context manager to ensure
+   *frames* are properly finished.
 
 If the size of the data being fed to this streaming compressor is known,
 you can declare it before compression begins::
@@ -310,6 +346,14 @@
         ...
         total_written = compressor.tell()
 
+``stream_writer()`` accepts a ``write_return_read`` boolean argument to control
+the return value of ``write()``. When ``False`` (the default), ``write()`` returns
+the number of bytes that were written to the underlying object. When
+``True``, ``write()`` returns the number of bytes read from the input that
+were subsequently written to the compressor. ``True`` is the *proper* behavior
+for ``write()`` as specified by the ``io.RawIOBase`` interface and will become
+the default value in a future release.
+
 Streaming Output API
 ^^^^^^^^^^^^^^^^^^^^
 
@@ -654,27 +698,63 @@
 ``tell()`` returns the number of decompressed bytes read so far.
 
 Not all I/O methods are implemented. Notably missing is support for
-``readline()``, ``readlines()``, and linewise iteration support. Support for
-these is planned for a future release.
+``readline()``, ``readlines()``, and linewise iteration support. This is
+because streams operate on binary data, not text data. If you want to
+convert decompressed output to text, you can chain an ``io.TextIOWrapper``
+to the stream::
+
+   with open(path, 'rb') as fh:
+       dctx = zstd.ZstdDecompressor()
+       stream_reader = dctx.stream_reader(fh)
+       text_stream = io.TextIOWrapper(stream_reader, encoding='utf-8')
+
+       for line in text_stream:
+           ...
+
+The ``read_across_frames`` argument to ``stream_reader()`` controls the
+behavior of read operations when the end of a zstd *frame* is encountered.
+When ``False`` (the default), a read will complete when the end of a
+zstd *frame* is encountered. When ``True``, a read can potentially
+return data spanning multiple zstd *frames*.
 
 Streaming Input API
 ^^^^^^^^^^^^^^^^^^^
 
-``stream_writer(fh)`` can be used to incrementally send compressed data to a
-decompressor.::
+``stream_writer(fh)`` allows you to *stream* data into a decompressor.
+
+Returned instances implement the ``io.RawIOBase`` interface. Only methods
+that involve writing will do useful things.
+
+The argument to ``stream_writer()`` is typically an object that also implements
+``io.RawIOBase``. But any object with a ``write(data)`` method will work. Many
+common Python types conform to this interface, including open file handles
+and ``io.BytesIO``.
+
+Behavior is similar to ``ZstdCompressor.stream_writer()``: compressed data
+is sent to the decompressor by calling ``write(data)`` and decompressed
+output is written to the underlying stream by calling its ``write(data)``
+method::
 
     dctx = zstd.ZstdDecompressor()
-    with dctx.stream_writer(fh) as decompressor:
-        decompressor.write(compressed_data)
+    decompressor = dctx.stream_writer(fh)
 
-This behaves similarly to ``zstd.ZstdCompressor``: compressed data is written to
-the decompressor by calling ``write(data)`` and decompressed output is written
-to the output object by calling its ``write(data)`` method.
+    decompressor.write(compressed_data)
+    ...
+
 
 Calls to ``write()`` will return the number of bytes written to the output
 object. Not all inputs will result in bytes being written, so return values
 of ``0`` are possible.
 
+Like the ``stream_writer()`` compressor, instances can be used as context
+managers. However, the context manager adds no special behavior here, so
+there is little benefit to using one.
+
+Calling ``close()`` will mark the stream as closed and subsequent I/O operations
+will raise ``ValueError`` (per the documented behavior of ``io.RawIOBase``).
+``close()`` will also call ``close()`` on the underlying stream if such a
+method exists.
+
 The size of chunks written to the destination can be specified::
 
     dctx = zstd.ZstdDecompressor()
@@ -687,6 +767,13 @@
     with dctx.stream_writer(fh) as decompressor:
         byte_size = decompressor.memory_size()
 
+``stream_writer()`` accepts a ``write_return_read`` boolean argument to control
+the return value of ``write()``. When ``False`` (the default), ``write()``
+returns the number of bytes that were written to the underlying stream.
+When ``True``, ``write()`` returns the number of bytes read from the input.
+``True`` is the *proper* behavior for ``write()`` as specified by the
+``io.RawIOBase`` interface and will become the default in a future release.
+
 Streaming Output API
 ^^^^^^^^^^^^^^^^^^^^
 
@@ -791,6 +878,10 @@
    memory (re)allocations, this streaming decompression API isn't as
    efficient as other APIs.
 
+For compatibility with the standard library APIs, instances expose a
+``flush([length=None])`` method. This method is a no-op and has no
+meaningful side-effects, making it safe to call at any time.
+
 Batch Decompression API
 ^^^^^^^^^^^^^^^^^^^^^^^
 
@@ -1147,18 +1238,21 @@
 * search_log
 * min_match
 * target_length
-* compression_strategy
+* strategy
+* compression_strategy (deprecated: same as ``strategy``)
 * write_content_size
 * write_checksum
 * write_dict_id
 * job_size
-* overlap_size_log
+* overlap_log
+* overlap_size_log (deprecated: same as ``overlap_log``)
 * force_max_window
 * enable_ldm
 * ldm_hash_log
 * ldm_min_match
 * ldm_bucket_size_log
-* ldm_hash_every_log
+* ldm_hash_rate_log
+* ldm_hash_every_log (deprecated: same as ``ldm_hash_rate_log``)
 * threads
 
 Some of these are very low-level settings. It may help to consult the official
@@ -1240,6 +1334,13 @@
 MAGIC_NUMBER
     Frame header as an integer
 
+FLUSH_BLOCK
+    Flushing behavior that flushes a zstd block. A decompressor will
+    be able to decode all data fed into the compressor so far.
+FLUSH_FRAME
+    Flushing behavior that ends the current zstd frame. Any new data fed
+    to the compressor will start a new frame.
+
 CONTENTSIZE_UNKNOWN
     Value for content size when the content size is unknown.
 CONTENTSIZE_ERROR
@@ -1261,10 +1362,18 @@
     Minimum value for compression parameter
 SEARCHLOG_MAX
     Maximum value for compression parameter
+MINMATCH_MIN
+    Minimum value for compression parameter
+MINMATCH_MAX
+    Maximum value for compression parameter
 SEARCHLENGTH_MIN
     Minimum value for compression parameter
+
+    Deprecated: use ``MINMATCH_MIN``
 SEARCHLENGTH_MAX
     Maximum value for compression parameter
+
+    Deprecated: use ``MINMATCH_MAX``
 TARGETLENGTH_MIN
     Minimum value for compression parameter
 STRATEGY_FAST
@@ -1283,6 +1392,8 @@
     Compression strategy
 STRATEGY_BTULTRA
     Compression strategy
+STRATEGY_BTULTRA2
+    Compression strategy
 
 FORMAT_ZSTD1
     Zstandard frame format