diff contrib/python-zstandard/README.rst @ 40121:73fef626dae3

zstandard: vendor python-zstandard 0.10.1

This was just released.

The upstream source distribution from PyPI was extracted. Unwanted
files were removed.

The clang-format ignore list was updated to reflect the new source
of files.

setup.py was updated to pass a new argument to python-zstandard's
function for returning an Extension instance. Upstream had to change
to use relative paths because Python 3.7's packaging doesn't seem to
like absolute paths when defining sources, includes, etc. The default
relative path calculation is relative to setup_zstd.py, which is
different from the directory of Mercurial's setup.py.

The project contains a vendored copy of zstandard 1.3.6. The old
version was 1.3.4.

The API should be backwards compatible and nothing in core should
need to be adjusted. However, there is a new "chunker" API that we
may find useful in places where we want to emit compressed chunks of
a fixed size.

There are a pair of bug fixes in 0.10.0 with regard to compressobj()
and decompressobj() when block flushing is used. I actually found
these bugs when introducing these APIs in Mercurial! But existing
Mercurial code is not affected because we don't perform block
flushing.

# no-check-commit because 3rd party code has different style guidelines

Differential Revision: https://phab.mercurial-scm.org/D4911
author Gregory Szorc <gregory.szorc@gmail.com>
date Mon, 08 Oct 2018 16:27:40 -0700
parents b1fb341d8a61
children 675775c33ab6
line wrap: on
line diff
--- a/contrib/python-zstandard/README.rst	Tue Sep 25 20:55:03 2018 +0900
+++ b/contrib/python-zstandard/README.rst	Mon Oct 08 16:27:40 2018 -0700
@@ -196,6 +196,17 @@
 
    with open(path, 'rb') as fh:
        cctx = zstd.ZstdCompressor()
+       reader = cctx.stream_reader(fh)
+       while True:
+           chunk = reader.read(16384)
+           if not chunk:
+               break
+
+           # Do something with compressed chunk.
+
+Instances can also be used as context managers::
+
+   with open(path, 'rb') as fh:
        with cctx.stream_reader(fh) as reader:
            while True:
                chunk = reader.read(16384)
@@ -204,9 +215,9 @@
 
                # Do something with compressed chunk.
 
-The stream can only be read within a context manager. When the context
-manager exits, the stream is closed and the underlying resource is
-released and future operations against the compression stream stream will fail.
+When the context manager exits or ``close()`` is called, the stream is closed,
+the underlying resources are released, and future operations against the
+compression stream will fail.
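+
+For example, a reader used without a context manager can be released explicitly
+(a minimal sketch)::
+
+   with open(path, 'rb') as fh:
+       cctx = zstd.ZstdCompressor()
+       reader = cctx.stream_reader(fh)
+
+       # Drain the compressed stream.
+       while reader.read(16384):
+           pass
+
+       # Explicitly close the stream; subsequent reads will fail.
+       reader.close()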
 
 The ``source`` argument to ``stream_reader()`` can be any object with a
 ``read(size)`` method or any object implementing the *buffer protocol*.
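 
 For example, compressed output can be read directly from an in-memory source
 supplied via the buffer protocol (a minimal sketch)::
 
    cctx = zstd.ZstdCompressor()
    reader = cctx.stream_reader(b'data to compress')
 
    while True:
        chunk = reader.read(16384)
        if not chunk:
            break
 
        # Do something with compressed chunk.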
@@ -419,6 +430,64 @@
    data = cobj.compress(b'foobar')
    data = cobj.flush()
 
+Chunker API
+^^^^^^^^^^^
+
+``chunker(size=None, chunk_size=COMPRESSION_RECOMMENDED_OUTPUT_SIZE)`` returns
+an object that can be used to iteratively feed chunks of data into a compressor
+and produce output chunks of a uniform size.
+
+The object returned by ``chunker()`` exposes the following methods:
+
+``compress(data)``
+   Feeds new input data into the compressor.
+
+``flush()``
+   Flushes all data currently in the compressor.
+
+``finish()``
+   Signals the end of input data. No new data can be compressed after this
+   method is called.
+
+``compress()``, ``flush()``, and ``finish()`` all return an iterator of
+``bytes`` instances holding compressed data. The iterator may be empty. Callers
+MUST iterate through all elements of the returned iterator before performing
+another operation on the object.
+
+All chunks emitted by ``compress()`` will have a length of ``chunk_size``.
+
+``flush()`` and ``finish()`` may return a final chunk smaller than
+``chunk_size``.
+
+Here is how the API should be used::
+
+   cctx = zstd.ZstdCompressor()
+   chunker = cctx.chunker(chunk_size=32768)
+
+   with open(path, 'rb') as fh:
+       while True:
+           in_chunk = fh.read(32768)
+           if not in_chunk:
+               break
+
+           for out_chunk in chunker.compress(in_chunk):
+               # Do something with output chunk of size 32768.
+               pass
+
+       for out_chunk in chunker.finish():
+           # Do something with output chunks that finalize the zstd frame.
+           pass
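+
+``flush()`` is not shown above; a minimal sketch of forcing buffered data out
+mid-stream (without ending the zstd frame) might look like::
+
+   cctx = zstd.ZstdCompressor()
+   chunker = cctx.chunker(chunk_size=32768)
+
+   for out_chunk in chunker.compress(b'input data'):
+       # Emitted chunks (if any) are exactly chunk_size bytes.
+       pass
+
+   # Force out whatever the compressor has buffered. The final flushed
+   # chunk may be smaller than chunk_size.
+   for out_chunk in chunker.flush():
+       pass
+
+   for out_chunk in chunker.finish():
+       # Do something with the chunks that finalize the zstd frame.
+       pass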
+
+The ``chunker()`` API is often a better alternative to ``compressobj()``.
+
+``compressobj()`` will emit output data as it is available. This results in a
+*stream* of output chunks of varying sizes. The consistent output chunk size
+of ``chunker()`` is more appropriate for many usages, such as sending
+compressed data to a socket.
+
+``compressobj()`` may also perform extra memory reallocations in order to
+dynamically adjust the sizes of the output chunks. Since ``chunker()`` output
+chunks are all the same size (except for flushed or final chunks), there is
+less memory allocation overhead.
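+
+For example, a minimal sketch of the socket case mentioned above (``sock`` is
+a hypothetical connected socket)::
+
+   cctx = zstd.ZstdCompressor()
+   chunker = cctx.chunker(chunk_size=32768)
+
+   with open(path, 'rb') as fh:
+       while True:
+           in_chunk = fh.read(32768)
+           if not in_chunk:
+               break
+
+           # Every chunk emitted here is exactly 32768 bytes.
+           for out_chunk in chunker.compress(in_chunk):
+               sock.sendall(out_chunk)
+
+       # The chunks finalizing the frame may include one smaller chunk.
+       for out_chunk in chunker.finish():
+           sock.sendall(out_chunk)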
+
 Batch Compression API
 ^^^^^^^^^^^^^^^^^^^^^
 
@@ -542,17 +611,24 @@
 
    with open(path, 'rb') as fh:
        dctx = zstd.ZstdDecompressor()
-       with dctx.stream_reader(fh) as reader:
-           while True:
-               chunk = reader.read(16384)
-               if not chunk:
-                   break
+       reader = dctx.stream_reader(fh)
+       while True:
+           chunk = reader.read(16384)
+           if not chunk:
+               break
+
+           # Do something with decompressed chunk.
 
-               # Do something with decompressed chunk.
+The stream can also be used as a context manager::
 
-The stream can only be read within a context manager. When the context
-manager exits, the stream is closed and the underlying resource is
-released and future operations against the stream will fail.
+   with open(path, 'rb') as fh:
+       dctx = zstd.ZstdDecompressor()
+       with dctx.stream_reader(fh) as reader:
+           ...
+
+When the context manager exits, the stream is closed, the underlying resources
+are released, and future operations against the stream will fail.
 
 The ``source`` argument to ``stream_reader()`` can be any object with a
 ``read(size)`` method or any object implementing the *buffer protocol*.
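 
 For example, a frame held in memory can be decompressed without a file object
 (a minimal sketch; ``compressed`` is a hypothetical ``bytes`` object holding
 zstd compressed data)::
 
    dctx = zstd.ZstdDecompressor()
    reader = dctx.stream_reader(compressed)
 
    while True:
        chunk = reader.read(16384)
        if not chunk:
            break
 
        # Do something with decompressed chunk.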
@@ -1077,7 +1153,6 @@
 * write_dict_id
 * job_size
 * overlap_size_log
-* compress_literals
 * force_max_window
 * enable_ldm
 * ldm_hash_log