annotate contrib/python-zstandard/README.rst @ 40450:41f0529b5112 stable
commandserver: get around ETIMEDOUT raised by selectors2
selector.select() should exits with an empty event list on timed out, but
selectors2 raises OSError if timeout expires while recovering from EINTR.
Spotted while debugging new chg feature.
author | Yuya Nishihara <yuya@tcha.org> |
---|---|
date | Mon, 03 Dec 2018 21:45:15 +0900 |
parents | 73fef626dae3 |
children | 675775c33ab6 |
================
python-zstandard
================

This project provides Python bindings for interfacing with the
`Zstandard <http://www.zstd.net>`_ compression library. A C extension
and CFFI interface are provided.

The primary goal of the project is to provide a rich interface to the
underlying C API through a Pythonic interface while not sacrificing
performance. This means exposing most of the features and flexibility
of the C API while not sacrificing usability or safety that Python provides.

The canonical home for this project lives in a Mercurial repository run by
the author. For convenience, that repository is frequently synchronized to
https://github.com/indygreg/python-zstandard.

| |ci-status| |win-ci-status|

Requirements
============

This extension is designed to run with Python 2.7, 3.4, 3.5, and 3.6
on common platforms (Linux, Windows, and macOS). x86 and x86_64 are well-tested
on Windows. Only x86_64 is well-tested on Linux and macOS.

Installing
==========

This package is uploaded to PyPI at https://pypi.python.org/pypi/zstandard.
So, to install this package::

   $ pip install zstandard

Binary wheels are made available for some platforms. If you need to
install from a source distribution, all you should need is a working C
compiler and the Python development headers/libraries. On many Linux
distributions, you can install a ``python-dev`` or ``python-devel``
package to provide these dependencies.

Packages are also uploaded to Anaconda Cloud at
https://anaconda.org/indygreg/zstandard. See that URL for how to install
this package with ``conda``.

Performance
===========

zstandard is a highly tunable compression algorithm. In its default settings
(compression level 3), it will be faster at compression and decompression and
will have better compression ratios than zlib on most data sets. When tuned
for speed, it approaches lz4's speed and ratios. When tuned for compression
ratio, it approaches lzma ratios and compression speed, but decompression
speed is much faster. See the official zstandard documentation for more.

zstandard and this library support multi-threaded compression. There is a
mechanism to compress large inputs using multiple threads.

The performance of this library is usually very similar to what the zstandard
C API can deliver. Overhead in this library is due to general Python overhead
and can't easily be avoided by *any* zstandard Python binding. This library
exposes multiple APIs for performing compression and decompression so callers
can pick an API suitable for their need. Contrast this with the compression
modules in Python's standard library (like ``zlib``), which only offer limited
mechanisms for performing operations. The API flexibility means consumers can
choose to use APIs that facilitate zero copying or minimize Python object
creation and garbage collection overhead.

This library is capable of single-threaded throughputs well over 1 GB/s. For
exact numbers, measure yourself. The source code repository has a ``bench.py``
script that can be used to measure things.

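As a rough illustration of "measure yourself", the sketch below times one-shot
compression of the same buffer with ``zlib`` and, when the ``zstandard`` module
can be imported, with ``ZstdCompressor``. It is not the project's ``bench.py``:
the helper name ``time_compress`` and the sample payload are made up for this
example, and the printed numbers depend entirely on the machine and the data.

```python
import time
import zlib

try:
    import zstandard as zstd
except ImportError:  # zstandard is not installed; only zlib is measured
    zstd = None


def time_compress(compress, data, repeat=5):
    """Return (best_seconds, compressed_size) over `repeat` runs."""
    best = float("inf")
    size = 0
    for _ in range(repeat):
        start = time.perf_counter()
        size = len(compress(data))
        best = min(best, time.perf_counter() - start)
    return best, size


# Hypothetical sample payload: repetitive text, which compresses well.
data = b"python-zstandard benchmark line\n" * 50000

results = {"zlib (level 6)": time_compress(lambda d: zlib.compress(d, 6), data)}
if zstd is not None:
    cctx = zstd.ZstdCompressor(level=3)
    results["zstd (level 3)"] = time_compress(cctx.compress, data)

for name, (seconds, size) in results.items():
    print("%s: %.4fs, %d -> %d bytes" % (name, seconds, len(data), size))
```

Taking the best of several runs reduces noise from other processes. For
realistic figures, substitute data that resembles the real workload.
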
API
===

To interface with Zstandard, simply import the ``zstandard`` module::

   import zstandard

It is a popular convention to alias the module as a different name for
brevity::

   import zstandard as zstd

This module attempts to import and use either the C extension or CFFI
implementation. On Python platforms known to support C extensions (like
CPython), it raises an ImportError if the C extension cannot be imported.
On Python platforms known to not support C extensions (like PyPy), it only
attempts to import the CFFI implementation and raises ImportError if that
can't be done. On other platforms, it first tries to import the C extension
then falls back to CFFI if that fails and raises ImportError if CFFI fails.

To change the module import behavior, a ``PYTHON_ZSTANDARD_IMPORT_POLICY``
environment variable can be set. The following values are accepted:

default
   The behavior described above.
cffi_fallback
   Always try to import the C extension then fall back to CFFI if that
   fails.
cext
   Only attempt to import the C extension.
cffi
   Only attempt to import the CFFI implementation.

In addition, the ``zstandard`` module exports a ``backend`` attribute
containing the string name of the backend being used. It will be one
of ``cext`` or ``cffi`` (for *C extension* and *cffi*, respectively).

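The import policy and the ``backend`` attribute can be exercised together. The
following is a minimal sketch, assuming the policy is read when the module is
first imported, so the variable has to be set before that import happens. The
choice of ``cffi_fallback`` is only an example, and the ``try``/``except``
guard is there so the snippet also runs where the package is missing.

```python
import os

# Set the policy before the first import of zstandard in this process.
# setdefault() leaves any value already chosen by the user untouched.
os.environ.setdefault("PYTHON_ZSTANDARD_IMPORT_POLICY", "cffi_fallback")

try:
    import zstandard as zstd
except ImportError:
    # No backend could be imported under the chosen policy.
    zstd = None

if zstd is not None:
    # Reports which implementation was actually loaded.
    print("active backend:", zstd.backend)
else:
    print("zstandard is not importable under the current policy")
```

Setting the variable after ``zstandard`` has already been imported elsewhere in
the process would have no effect on the backend that was loaded.
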
The types, functions, and attributes exposed by the ``zstandard`` module
are documented in the sections below.

.. note::

   The documentation in this section makes references to various zstd
   concepts and functionality. The source repository contains a
   ``docs/concepts.rst`` file explaining these in more detail.

ZstdCompressor
--------------

The ``ZstdCompressor`` class provides an interface for performing
compression operations. Each instance is essentially a wrapper around a
``ZSTD_CCtx`` from the C API.

Each instance is associated with parameters that control compression
behavior. These come from the following named arguments (all optional):

level
   Integer compression level. Valid values are between 1 and 22.
dict_data
   Compression dictionary to use.

   Note: When using dictionary data and ``compress()`` is called multiple
   times, the ``ZstdCompressionParameters`` derived from an integer
   compression ``level`` and the first compressed data's size will be reused
   for all subsequent operations. This may not be desirable if source data
   size varies significantly.
compression_params
   A ``ZstdCompressionParameters`` instance defining compression settings.
write_checksum
   Whether a 4 byte checksum should be written with the compressed data.
   Defaults to False. If True, the decompressor can verify that decompressed
   data matches the original input data.
write_content_size
   Whether the size of the uncompressed data will be written into the
   header of compressed data. Defaults to True. The data will only be
   written if the compressor knows the size of the input data. This is
   often not true for streaming compression.
write_dict_id
   Whether to write the dictionary ID into the compressed data.
   Defaults to True. The dictionary ID is only written if a dictionary
   is being used.
threads
   Enables and sets the number of threads to use for multi-threaded compression
   operations. Defaults to 0, which means to use single-threaded compression.
   Negative values will resolve to the number of logical CPUs in the system.
   Read below for more info on multi-threaded compression. This argument only
   controls thread count for operations that operate on individual pieces of
   data. APIs that spawn multiple threads for working on multiple pieces of
   data have their own ``threads`` argument.

``compression_params`` is mutually exclusive with ``level``, ``write_checksum``,
``write_content_size``, ``write_dict_id``, and ``threads``.

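To make the arguments above concrete, here is a small sketch that builds a
compressor from ``level``, ``write_checksum``, and ``threads`` and round-trips
a payload. The helper name ``roundtrip`` and the chosen values are
illustrative only. ``ZstdDecompressor`` belongs to the same module but is not
described in this section, and multi-threaded compression assumes the
underlying zstd library was built with threading support.

```python
try:
    import zstandard as zstd
except ImportError:  # keeps the sketch importable without the package
    zstd = None


def roundtrip(payload):
    """Compress and decompress `payload`, returning (compressed, restored)."""
    cctx = zstd.ZstdCompressor(
        level=10,             # higher level: smaller output, slower compression
        write_checksum=True,  # lets the decompressor verify the restored data
        threads=2,            # split a single input across two worker threads
    )
    compressed = cctx.compress(payload)
    # One-shot compress() knows the input size, so the frame header carries
    # the content size and decompress() can size its output buffer from it.
    restored = zstd.ZstdDecompressor().decompress(compressed)
    return compressed, restored


if zstd is not None:
    original = b"example payload " * 4096
    compressed, restored = roundtrip(original)
    print("%d -> %d bytes, intact: %s"
          % (len(original), len(compressed), restored == original))
```
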
Unless specified otherwise, assume that no two methods of ``ZstdCompressor``
instances can be called from multiple Python threads simultaneously. In other
words, assume instances are not thread safe unless stated otherwise.

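One way to respect this rule is to give every Python thread its own
compressor. The sketch below does that with ``threading.local``. The helper
names ``get_compressor`` and ``compress_in_thread`` are hypothetical, and this
is only one possible pattern, not something the library prescribes.

```python
import threading

try:
    import zstandard as zstd
except ImportError:  # keeps the sketch importable without the package
    zstd = None

_local = threading.local()


def get_compressor():
    """Return a ZstdCompressor owned exclusively by the calling thread."""
    cctx = getattr(_local, "cctx", None)
    if cctx is None:
        cctx = _local.cctx = zstd.ZstdCompressor()
    return cctx


def compress_in_thread(payload, out, index):
    # Each thread writes only to its own slot, so no lock is needed here.
    out[index] = get_compressor().compress(payload)


results = []
if zstd is not None:
    payloads = [(b"chunk %d " % i) * 1000 for i in range(4)]
    results = [None] * len(payloads)
    workers = [
        threading.Thread(target=compress_in_thread, args=(p, results, i))
        for i, p in enumerate(payloads)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    print([len(r) for r in results])
```
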
Utility Methods
^^^^^^^^^^^^^^^

``frame_progression()`` returns a 3-tuple containing the number of bytes
ingested, consumed, and produced by the current compression operation.

``memory_size()`` obtains the memory utilization of the underlying zstd
compression context, in bytes::

   cctx = zstd.ZstdCompressor()
   memory = cctx.memory_size()

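``frame_progression()`` is easiest to observe while an operation is still in
flight. The sketch below is a hedged example: it uses ``compressobj()``, an
incremental API of the same class that is not covered in this section, to feed
some data and then reads the three counters. The exact values depend on
internal buffering, so none are shown.

```python
try:
    import zstandard as zstd
except ImportError:  # keeps the sketch importable without the package
    zstd = None

progression = None
if zstd is not None:
    cctx = zstd.ZstdCompressor()
    cobj = cctx.compressobj()

    # Feed data without finishing the frame, so an operation is in progress.
    cobj.compress(b"progress " * 100000)

    progression = cctx.frame_progression()
    ingested, consumed, produced = progression
    print("ingested=%d consumed=%d produced=%d" % (ingested, consumed, produced))

    # Finish the frame; flush() returns whatever output is still buffered.
    cobj.flush()
    print("context memory: %d bytes" % cctx.memory_size())
```
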
Simple API
^^^^^^^^^^

``compress(data)`` compresses and returns data as a one-shot operation.::

    cctx = zstd.ZstdCompressor()
    compressed = cctx.compress(b'data to compress')

The ``data`` argument can be any object that implements the *buffer protocol*.

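As a minimal sketch of the one-shot API, assuming the package is importable as
``zstandard``: the ``roundtrip`` helper and the import guard below are
illustrative and not part of the library. The helper accepts any
buffer-protocol object, such as ``bytes``, ``bytearray``, or ``memoryview``.

```python
try:
    import zstandard as zstd
except ImportError:  # assumption: the package may not be installed
    zstd = None


def roundtrip(data):
    """Compress a buffer-protocol object in one shot, then decompress it."""
    cctx = zstd.ZstdCompressor()
    compressed = cctx.compress(data)
    # decompressobj() works whether or not the frame records its content size.
    dobj = zstd.ZstdDecompressor().decompressobj()
    return dobj.decompress(compressed)
```

Calling ``roundtrip(bytearray(b'data to compress'))`` should return the
original payload as ``bytes``.
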
Stream Reader API
^^^^^^^^^^^^^^^^^

``stream_reader(source)`` can be used to obtain an object conforming to the
``io.RawIOBase`` interface for reading compressed output as a stream::

    with open(path, 'rb') as fh:
        cctx = zstd.ZstdCompressor()
        reader = cctx.stream_reader(fh)
        while True:
            chunk = reader.read(16384)
            if not chunk:
                break

            # Do something with compressed chunk.

Instances can also be used as context managers::

    cctx = zstd.ZstdCompressor()
    with open(path, 'rb') as fh:
        with cctx.stream_reader(fh) as reader:
            while True:
                chunk = reader.read(16384)
                if not chunk:
                    break

                # Do something with compressed chunk.

When the context manager exits or ``close()`` is called, the stream is closed,
underlying resources are released, and future operations against the compression
stream will fail.

The ``source`` argument to ``stream_reader()`` can be any object with a
``read(size)`` method or any object implementing the *buffer protocol*.

``stream_reader()`` accepts a ``size`` argument specifying how large the input
stream is. This is used to adjust compression parameters so they are
tailored to the source size.::

    with open(path, 'rb') as fh:
        cctx = zstd.ZstdCompressor()
        with cctx.stream_reader(fh, size=os.stat(path).st_size) as reader:
            ...

If the ``source`` is a stream, you can specify how large ``read()`` requests
to that stream should be via the ``read_size`` argument. It defaults to
``zstandard.COMPRESSION_RECOMMENDED_INPUT_SIZE``.::

    with open(path, 'rb') as fh:
        cctx = zstd.ZstdCompressor()
        # Will perform fh.read(8192) when obtaining data to feed into the
        # compressor.
        with cctx.stream_reader(fh, read_size=8192) as reader:
            ...

The stream returned by ``stream_reader()`` is neither writable nor seekable
(even if the underlying source is seekable). ``readline()`` and
``readlines()`` are not implemented because they don't make sense for
compressed data. ``tell()`` returns the number of compressed bytes
emitted so far.

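The read loop described in this section can be wrapped in a small helper. This
is a sketch under stated assumptions: ``compress_stream`` is a hypothetical
name, the package is assumed to be importable as ``zstandard``, and the
context manager form is used so the reader is released when the loop ends.

```python
try:
    import zstandard as zstd
except ImportError:  # assumption: the package may not be installed
    zstd = None


def compress_stream(source, chunk_size=16384):
    """Pull compressed chunks out of ``source`` until the reader is drained."""
    cctx = zstd.ZstdCompressor()
    chunks = []
    with cctx.stream_reader(source) as reader:
        while True:
            chunk = reader.read(chunk_size)
            if not chunk:
                # An empty read signals that all input has been compressed.
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

For example, ``compress_stream(io.BytesIO(b'...'))`` should return one complete
zstd frame. Because no ``size`` is declared, the frame may not record its
content size, so a streaming decompressor is the safer way to read it back.
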
Streaming Input API
^^^^^^^^^^^^^^^^^^^

``stream_writer(fh)`` (which behaves as a context manager) allows you to *stream*
data into a compressor.::

    cctx = zstd.ZstdCompressor(level=10)
    with cctx.stream_writer(fh) as compressor:
        compressor.write(b'chunk 0')
        compressor.write(b'chunk 1')
        ...

The argument to ``stream_writer()`` must have a ``write(data)`` method. As
compressed data is available, ``write()`` will be called with the compressed
data as its argument. Many common Python types implement ``write()``, including
open file handles and ``io.BytesIO``.

``stream_writer()`` returns an object representing a streaming compressor
instance. It **must** be used as a context manager. That object's
``write(data)`` method is used to feed data into the compressor.

A ``flush()`` method can be called to evict whatever data remains within the
compressor's internal state into the output object. This may result in 0 or
more ``write()`` calls to the output object.

Both ``write()`` and ``flush()`` return the number of bytes written to the
object's ``write()``. In many cases, small inputs do not accumulate enough
data to cause a write and ``write()`` will return ``0``.

If the size of the data being fed to this streaming compressor is known,
you can declare it before compression begins::

    cctx = zstd.ZstdCompressor()
    with cctx.stream_writer(fh, size=data_len) as compressor:
        compressor.write(chunk0)
        compressor.write(chunk1)
        ...

Declaring the size of the source data allows compression parameters to
be tuned. And if ``write_content_size`` is used, it also results in the
content size being written into the frame header of the output data.

The size of the chunks passed to the destination's ``write()`` can be specified::

    cctx = zstd.ZstdCompressor()
    with cctx.stream_writer(fh, write_size=32768) as compressor:
        ...

To see how much memory is being used by the streaming compressor::

    cctx = zstd.ZstdCompressor()
    with cctx.stream_writer(fh) as compressor:
        ...
        byte_size = compressor.memory_size()

The total number of bytes written so far is exposed via ``tell()``::

    cctx = zstd.ZstdCompressor()
    with cctx.stream_writer(fh) as compressor:
        ...
        total_written = compressor.tell()

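Since the destination only needs a ``write(data)`` method, a plain Python
object can stand in for a file. The sketch below assumes the package is
importable as ``zstandard``. ``ChunkCollector`` and ``compress_chunks`` are
hypothetical names introduced for illustration.

```python
try:
    import zstandard as zstd
except ImportError:  # assumption: the package may not be installed
    zstd = None


class ChunkCollector:
    """Hypothetical destination: any object with write(data) will do."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        # Copy the data, since the caller may reuse its buffer.
        self.chunks.append(bytes(data))
        return len(data)


def compress_chunks(chunks, level=10):
    """Feed an iterable of byte chunks through stream_writer()."""
    sink = ChunkCollector()
    cctx = zstd.ZstdCompressor(level=level)
    with cctx.stream_writer(sink) as compressor:
        for chunk in chunks:
            compressor.write(chunk)
    # Leaving the ``with`` block finishes the frame, so the collected
    # pieces now form a complete zstd stream.
    return b"".join(sink.chunks)
```

A custom sink like this is also a convenient way to observe how many
``write()`` calls the compressor actually makes for a given input.
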
Streaming Output API
^^^^^^^^^^^^^^^^^^^^

``read_to_iter(reader)`` provides a mechanism to stream data out of a
compressor as an iterator of data chunks.::

    cctx = zstd.ZstdCompressor()
    for chunk in cctx.read_to_iter(fh):
        # Do something with emitted data.
        pass

``read_to_iter()`` accepts an object that has a ``read(size)`` method or
conforms to the buffer protocol.

Uncompressed data is fetched from the source either by calling ``read(size)``
or by fetching a slice of data from the object directly (in the case where
the buffer protocol is being used). The returned iterator consists of chunks
of compressed data.

If reading from the source via ``read()``, ``read()`` will be called until
it raises or returns an empty bytes object (``b''``). It is perfectly valid for
the source to deliver fewer bytes than were requested by ``read(size)``.

Like ``stream_writer()``, ``read_to_iter()`` also accepts a ``size`` argument
declaring the size of the input stream::

    cctx = zstd.ZstdCompressor()
    for chunk in cctx.read_to_iter(fh, size=some_int):
        pass

You can also control the size of each ``read()`` from the source and
the ideal size of output chunks::

    cctx = zstd.ZstdCompressor()
    for chunk in cctx.read_to_iter(fh, read_size=16384, write_size=8192):
        pass

Unlike ``stream_writer()``, ``read_to_iter()`` does not give direct control
over the sizes of chunks fed into the compressor. Instead, chunk sizes will
be whatever the object being read from delivers. These will often be of a
uniform size.

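One use for the iterator form is to process compressed output without holding
all of it in memory. As a rough sketch, assuming the package is importable as
``zstandard`` (``compressed_digest`` is a hypothetical helper), the following
hashes and measures the compressed stream chunk by chunk:

```python
import hashlib

try:
    import zstandard as zstd
except ImportError:  # assumption: the package may not be installed
    zstd = None


def compressed_digest(source, read_size=16384):
    """Return (compressed_length, sha256_hex) of the compressed stream."""
    cctx = zstd.ZstdCompressor()
    digest = hashlib.sha256()
    total = 0
    for chunk in cctx.read_to_iter(source, read_size=read_size):
        # Each chunk is a piece of compressed output; nothing is accumulated.
        digest.update(chunk)
        total += len(chunk)
    return total, digest.hexdigest()
```

``source`` may be an open binary file, an ``io.BytesIO``, or any other object
with a ``read(size)`` method.
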
Stream Copying API
^^^^^^^^^^^^^^^^^^

``copy_stream(ifh, ofh)`` can be used to copy data between 2 streams while
compressing it.::

    cctx = zstd.ZstdCompressor()
    cctx.copy_stream(ifh, ofh)

For example, say you wish to compress a file::

    cctx = zstd.ZstdCompressor()
    with open(input_path, 'rb') as ifh, open(output_path, 'wb') as ofh:
        cctx.copy_stream(ifh, ofh)

It is also possible to declare the size of the source stream::

    cctx = zstd.ZstdCompressor()
    cctx.copy_stream(ifh, ofh, size=len_of_input)

You can also specify the sizes of the chunks that are ``read()`` from and
``write()`` to the streams::

    cctx = zstd.ZstdCompressor()
    cctx.copy_stream(ifh, ofh, read_size=32768, write_size=16384)

The stream copier returns a 2-tuple of bytes read and written::

    cctx = zstd.ZstdCompressor()
    read_count, write_count = cctx.copy_stream(ifh, ofh)

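The pieces above combine naturally. The following sketch assumes the package
is importable as ``zstandard``; ``compress_buffer`` is a hypothetical helper
that copies between two in-memory streams, declares the input size, and
returns the counts reported by ``copy_stream()``.

```python
import io

try:
    import zstandard as zstd
except ImportError:  # assumption: the package may not be installed
    zstd = None


def compress_buffer(data):
    """Compress ``data`` via copy_stream(); return (compressed, read, written)."""
    ifh = io.BytesIO(data)
    ofh = io.BytesIO()
    cctx = zstd.ZstdCompressor()
    # Declaring the size is optional; it lets compression parameters be tuned.
    read_count, write_count = cctx.copy_stream(ifh, ofh, size=len(data))
    return ofh.getvalue(), read_count, write_count
```

The two counts make it easy to report a compression ratio, for example
``write_count / read_count`` for non-empty input.
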
Compressor API
^^^^^^^^^^^^^^

``compressobj()`` returns an object that exposes ``compress(data)`` and
``flush()`` methods. Each returns compressed data or an empty ``bytes`` object.

The purpose of ``compressobj()`` is to provide an API-compatible interface
with ``zlib.compressobj``, ``bz2.BZ2Compressor``, etc. This allows callers to
swap in different compressor objects while using the same API.

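The interchangeability described above can be sketched with the standard library alone. ``compress_chunks`` is a hypothetical helper that relies only on the shared ``compress(data)`` / ``flush()`` shape; an object from ``ZstdCompressor.compressobj()`` should be usable in the same position.

```python
import bz2
import zlib


def compress_chunks(cobj, chunks):
    """Feed chunks into any object exposing compress(data) and flush()."""
    out = []
    for chunk in chunks:
        out.append(cobj.compress(chunk))  # may legitimately be empty bytes
    out.append(cobj.flush())              # ends the stream, returns the rest
    return b"".join(out)


chunks = [b"raw input 0", b"raw input 1"]
zlib_frame = compress_chunks(zlib.compressobj(), chunks)
bz2_frame = compress_chunks(bz2.BZ2Compressor(), chunks)
# zstd.ZstdCompressor().compressobj() could be passed in the same way.

assert zlib.decompress(zlib_frame) == b"".join(chunks)
assert bz2.decompress(bz2_frame) == b"".join(chunks)
```

Note that the helper keeps every return value; discarding the result of an individual ``compress()`` call would lose output.
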
``flush()`` accepts an optional argument indicating how to end the stream.
``zstd.COMPRESSOBJ_FLUSH_FINISH`` (the default) ends the compression stream.
Once this type of flush is performed, ``compress()`` and ``flush()`` can
no longer be called. This type of flush **must** be called to end the
compression context. If not called, returned data may be incomplete.

A ``zstd.COMPRESSOBJ_FLUSH_BLOCK`` argument to ``flush()`` will flush a
zstd block. Flushes of this type can be performed multiple times. The next
call to ``compress()`` will begin a new zstd block.

Here is how this API should be used::

    cctx = zstd.ZstdCompressor()
    cobj = cctx.compressobj()
    data = cobj.compress(b'raw input 0')
    data = cobj.compress(b'raw input 1')
    data = cobj.flush()

Or to flush blocks::

    cctx = zstd.ZstdCompressor()
    cobj = cctx.compressobj()
    data = cobj.compress(b'chunk in first block')
    data = cobj.flush(zstd.COMPRESSOBJ_FLUSH_BLOCK)
    data = cobj.compress(b'chunk in second block')
    data = cobj.flush()

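The following sketch shows one way to keep the output of every call and to check what a block flush buys you. It assumes the package is importable as ``zstandard`` (the import is guarded) and uses ``decompressobj()`` on the reading side; ``compress_in_two_blocks`` is a hypothetical helper name.

```python
try:
    import zstandard as zstd  # assumption: package is importable under this name
except ImportError:
    zstd = None  # the demo at the bottom is skipped when the package is missing


def compress_in_two_blocks(first, second):
    """Return (bytes emitted up to the block flush, bytes emitted after it)."""
    cobj = zstd.ZstdCompressor().compressobj()
    head = cobj.compress(first) + cobj.flush(zstd.COMPRESSOBJ_FLUSH_BLOCK)
    tail = cobj.compress(second) + cobj.flush()  # default: COMPRESSOBJ_FLUSH_FINISH
    return head, tail


if zstd is not None:
    head, tail = compress_in_two_blocks(b"chunk in first block",
                                        b"chunk in second block")
    dobj = zstd.ZstdDecompressor().decompressobj()
    # Everything fed before the block flush is decodable from ``head`` alone.
    assert dobj.decompress(head) == b"chunk in first block"
    assert dobj.decompress(tail) == b"chunk in second block"
```
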
For best performance results, keep input chunks under 256KB. This avoids
extra allocations for a large output object.

It is possible to declare the input size of the data that will be fed into
the compressor::

    cctx = zstd.ZstdCompressor()
    cobj = cctx.compressobj(size=6)
    data = cobj.compress(b'foobar')
    data = cobj.flush()

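One visible effect of declaring the size, sketched below under the assumption that the compressor is left at its default of writing the content size into the frame header: the resulting frame can be decompressed in one shot. The snippet assumes the package is importable as ``zstandard`` (guarded import) and that the module-level ``frame_content_size()`` helper is available; ``compress_with_declared_size`` is a hypothetical name.

```python
try:
    import zstandard as zstd  # assumption: package is importable under this name
except ImportError:
    zstd = None  # the demo at the bottom is skipped when the package is missing


def compress_with_declared_size(data):
    """Stream ``data`` through compressobj() after declaring its exact length."""
    cobj = zstd.ZstdCompressor().compressobj(size=len(data))
    return cobj.compress(data) + cobj.flush()


if zstd is not None:
    frame = compress_with_declared_size(b"foobar")
    # The declared size ends up in the frame header...
    assert zstd.frame_content_size(frame) == 6
    # ...so one-shot decompression needs no max_output_size.
    assert zstd.ZstdDecompressor().decompress(frame) == b"foobar"
```
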
Chunker API
^^^^^^^^^^^

``chunker(size=None, chunk_size=COMPRESSION_RECOMMENDED_OUTPUT_SIZE)`` returns
an object that can be used to iteratively feed chunks of data into a compressor
and produce output chunks of a uniform size.

The object returned by ``chunker()`` exposes the following methods:

``compress(data)``
   Feeds new input data into the compressor.

``flush()``
   Flushes all data currently in the compressor.

``finish()``
   Signals the end of input data. No new data can be compressed after this
   method is called.

``compress()``, ``flush()``, and ``finish()`` all return an iterator of
``bytes`` instances holding compressed data. The iterator may be empty. Callers
MUST iterate through all elements of the returned iterator before performing
another operation on the object.

All chunks emitted by ``compress()`` will have a length of ``chunk_size``.

``flush()`` and ``finish()`` may return a final chunk smaller than
``chunk_size``.

Here is how the API should be used::

    cctx = zstd.ZstdCompressor()
    chunker = cctx.chunker(chunk_size=32768)

    with open(path, 'rb') as fh:
        while True:
            in_chunk = fh.read(32768)
            if not in_chunk:
                break

            for out_chunk in chunker.compress(in_chunk):
                # Do something with output chunk of size 32768.
                pass

        for out_chunk in chunker.finish():
            # Do something with output chunks that finalize the zstd frame.
            pass

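The size guarantees above can be checked with a small in-memory sketch. It assumes the package is importable as ``zstandard`` (guarded import); ``chunked_compress`` is a hypothetical helper, and random input is used only because it is incompressible and therefore produces plenty of output.

```python
import os

try:
    import zstandard as zstd  # assumption: package is importable under this name
except ImportError:
    zstd = None  # the demo at the bottom is skipped when the package is missing


def chunked_compress(data, chunk_size=1024, read_size=8192):
    """Return (chunks yielded by compress(), chunks yielded by finish())."""
    chunker = zstd.ZstdCompressor().chunker(chunk_size=chunk_size)
    body = []
    for start in range(0, len(data), read_size):
        # extend() exhausts each iterator before the chunker is used again.
        body.extend(chunker.compress(data[start:start + read_size]))
    tail = list(chunker.finish())
    return body, tail


if zstd is not None:
    data = os.urandom(65536)
    body, tail = chunked_compress(data)
    assert all(len(c) == 1024 for c in body)   # compress(): always chunk_size
    assert all(len(c) <= 1024 for c in tail)   # finish(): last one may be short
    dobj = zstd.ZstdDecompressor().decompressobj()
    assert dobj.decompress(b"".join(body + tail)) == data
```
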
The ``chunker()`` API is often a better alternative to ``compressobj()``.

``compressobj()`` will emit output data as it is available. This results in a
*stream* of output chunks of varying sizes. The consistency of the output chunk
size with ``chunker()`` is more appropriate for many usages, such as sending
compressed data to a socket.

``compressobj()`` may also perform extra memory reallocations in order to
dynamically adjust the sizes of the output chunks. Since ``chunker()`` output
chunks are all the same size (except for flushed or final chunks), there is
less memory allocation overhead.

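To illustrate the difference in output shape only, the sketch below regroups a stream of uneven pieces into fixed-size chunks. It is a pure-Python model using ``zlib`` from the standard library as a stand-in for a ``compressobj()``-style producer; it is not how ``chunker()`` is implemented, and ``rechunk`` and ``variable_pieces`` are hypothetical names.

```python
import os
import zlib


def rechunk(pieces, chunk_size):
    """Regroup variable-sized byte strings into chunk_size-sized chunks."""
    buf = bytearray()
    for piece in pieces:
        buf += piece
        while len(buf) >= chunk_size:
            yield bytes(buf[:chunk_size])
            del buf[:chunk_size]
    if buf:
        yield bytes(buf)  # only the final chunk may be shorter


def variable_pieces(data, read_size=4096):
    """Stand-in producer: emits whatever each compress() call returns."""
    cobj = zlib.compressobj()
    for start in range(0, len(data), read_size):
        yield cobj.compress(data[start:start + read_size])
    yield cobj.flush()


data = os.urandom(32768)
chunks = list(rechunk(variable_pieces(data), chunk_size=1024))
assert all(len(c) == 1024 for c in chunks[:-1])
assert zlib.decompress(b"".join(chunks)) == data
```

A caller writing to a socket would have to do this kind of buffering by hand with ``compressobj()``; with ``chunker()`` the uniform sizes come from the API itself.
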
Batch Compression API
^^^^^^^^^^^^^^^^^^^^^

(Experimental. Not yet supported in CFFI bindings.)

``multi_compress_to_buffer(data, [threads=0])`` performs compression of multiple
inputs as a single operation.

Data to be compressed can be passed as a ``BufferWithSegmentsCollection``, a
``BufferWithSegments``, or a list containing bytes-like objects. Each element of
the container will be compressed individually using the configured parameters
on the ``ZstdCompressor`` instance.

The ``threads`` argument controls how many threads to use for compression. The
default is ``0``, which means to use a single thread. Negative values use the
number of logical CPUs in the machine.

The function returns a ``BufferWithSegmentsCollection``. This type represents
N discrete memory allocations, each holding 1 or more compressed frames.

Output data is written to shared memory buffers. This means that unlike
regular Python objects, a reference to *any* object within the collection
keeps the shared buffer and therefore memory backing it alive. This can have
undesirable effects on process memory usage.

The API and behavior of this function are experimental and will likely change.
Known deficiencies include:

* If asked to use multiple threads, it will always spawn that many threads,
  even if the input is too small to use them. It should automatically lower
  the thread count when the extra threads would just add overhead.
* The buffer allocation strategy is fixed. There is room to make it dynamic,
  perhaps even to allow one output buffer per input, facilitating a variation
  of the API to return a list without the adverse effects of shared memory
  buffers.

ZstdDecompressor
----------------

The ``ZstdDecompressor`` class provides an interface for performing
decompression. It is effectively a wrapper around the ``ZSTD_DCtx`` type from
the C API.

Each instance is associated with parameters that control decompression. These
come from the following named arguments (all optional):

dict_data
   Compression dictionary to use.
max_window_size
   Sets an upper limit on the window size for decompression operations in
   kibibytes. This setting can be used to prevent large memory allocations
   for inputs using large compression windows.
format
   Set the format of data for the decoder. By default, this is
   ``zstd.FORMAT_ZSTD1``. It can be set to ``zstd.FORMAT_ZSTD1_MAGICLESS`` to
   allow decoding frames without the 4 byte magic header. Not all decompression
   APIs support this mode.

The interface of this class is very similar to ``ZstdCompressor`` (by design).

Unless specified otherwise, assume that no two methods of ``ZstdDecompressor``
instances can be called from multiple Python threads simultaneously. In other
words, assume instances are not thread safe unless stated otherwise.

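One common way to respect this restriction is to give each thread its own instance. The sketch below does that with ``threading.local``; it is one possible pattern rather than something the library prescribes. It assumes the package is importable as ``zstandard`` (guarded import), and ``thread_decompressor`` and ``decompress_frame`` are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

try:
    import zstandard as zstd  # assumption: package is importable under this name
except ImportError:
    zstd = None  # the demo at the bottom is skipped when the package is missing

_local = threading.local()


def thread_decompressor():
    """Return a ZstdDecompressor owned exclusively by the calling thread."""
    dctx = getattr(_local, "dctx", None)
    if dctx is None:
        dctx = _local.dctx = zstd.ZstdDecompressor()
    return dctx


def decompress_frame(frame):
    """Decompress one frame with the calling thread's own decompressor."""
    return thread_decompressor().decompress(frame)


if zstd is not None:
    originals = [b"payload %d" % i for i in range(32)]
    frames = [zstd.ZstdCompressor().compress(o) for o in originals]
    with ThreadPoolExecutor(max_workers=4) as pool:
        # No instance is ever shared between two threads.
        assert list(pool.map(decompress_frame, frames)) == originals
```
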
Utility Methods
^^^^^^^^^^^^^^^

``memory_size()`` obtains the size of the underlying zstd decompression context,
in bytes::

    dctx = zstd.ZstdDecompressor()
    size = dctx.memory_size()

Simple API
^^^^^^^^^^

``decompress(data)`` can be used to decompress an entire compressed zstd
frame in a single operation::

    dctx = zstd.ZstdDecompressor()
    decompressed = dctx.decompress(data)

By default, ``decompress(data)`` will only work on data written with the content
size encoded in its header (this is the default behavior of
``ZstdCompressor().compress()`` but may not be true for streaming compression). If
compressed data without an embedded content size is seen, ``zstd.ZstdError`` will
be raised.

If the compressed data doesn't have its content size embedded within it,
decompression can be attempted by specifying the ``max_output_size``
argument::

    dctx = zstd.ZstdDecompressor()
    uncompressed = dctx.decompress(data, max_output_size=1048576)

Ideally, ``max_output_size`` will be identical to the decompressed output
size.

If ``max_output_size`` is too small to hold the decompressed data,
``zstd.ZstdError`` will be raised.

If ``max_output_size`` is larger than the decompressed data, the allocated
output buffer will be resized to only use the space required.

b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
594 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
595 Please note that an allocation of the requested ``max_output_size`` will be |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
596 performed every time the method is called. Setting to a very large value could |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
597 result in a lot of work for the memory allocator and may result in |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
598 ``MemoryError`` being raised if the allocation fails. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
599 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
600 .. important:: |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
601 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
602 If the exact size of decompressed data is unknown (not passed in explicitly |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
603 and not stored in the zstandard frame), for performance reasons it is |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
604 encouraged to use a streaming API. |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
605 |
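Because ``max_output_size`` is allocated on every call, it can help to reject
unreasonable sizes before they reach the decompressor. The following is a
minimal sketch under stated assumptions: ``decompress_bounded`` and its
``hard_cap`` parameter are hypothetical names, not part of python-zstandard,
and ``dctx`` is assumed to be any object with the ``decompress(data,
max_output_size=...)`` method described above.

```python
def decompress_bounded(dctx, data, expected_size, hard_cap=64 * 1024 * 1024):
    """One-shot decompression with an explicit, capped output allocation.

    ``dctx`` is assumed to behave like ``zstd.ZstdDecompressor()``.
    ``hard_cap`` is an illustrative safety limit chosen by the caller.
    """
    if expected_size <= 0:
        raise ValueError("expected_size must be positive")
    if expected_size > hard_cap:
        # Refuse up front instead of asking the allocator for a huge buffer.
        raise ValueError(
            "refusing to allocate %d bytes (cap is %d)" % (expected_size, hard_cap)
        )
    return dctx.decompress(data, max_output_size=expected_size)
```

With a real decompressor this would be called as
``decompress_bounded(zstd.ZstdDecompressor(), data, 1048576)``.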
Stream Reader API
^^^^^^^^^^^^^^^^^

``stream_reader(source)`` can be used to obtain an object conforming to the
``io.RawIOBase`` interface for reading decompressed output as a stream::

    with open(path, 'rb') as fh:
        dctx = zstd.ZstdDecompressor()
        reader = dctx.stream_reader(fh)
        while True:
            chunk = reader.read(16384)
            if not chunk:
                break

            # Do something with decompressed chunk.

The stream can also be used as a context manager::

    with open(path, 'rb') as fh:
        dctx = zstd.ZstdDecompressor()
        with dctx.stream_reader(fh) as reader:
            ...

When used as a context manager, the stream is closed and the underlying
resources are released when the context manager exits. Future operations against
the stream will fail.

The ``source`` argument to ``stream_reader()`` can be any object with a
``read(size)`` method or any object implementing the *buffer protocol*.

If the ``source`` is a stream, you can specify how large ``read()`` requests
to that stream should be via the ``read_size`` argument. It defaults to
``zstandard.DECOMPRESSION_RECOMMENDED_INPUT_SIZE``::

    with open(path, 'rb') as fh:
        dctx = zstd.ZstdDecompressor()
        # Will perform fh.read(8192) when obtaining data for the decompressor.
        with dctx.stream_reader(fh, read_size=8192) as reader:
            ...

The stream returned by ``stream_reader()`` is not writable.

The stream returned by ``stream_reader()`` is *partially* seekable.
Absolute and relative positions (``SEEK_SET`` and ``SEEK_CUR``) forward
of the current position are allowed. Offsets behind the current read
position and offsets relative to the end of stream are not allowed and
will raise ``ValueError`` if attempted.

``tell()`` returns the number of decompressed bytes read so far.

Not all I/O methods are implemented. Notably missing are ``readline()``,
``readlines()``, and linewise iteration. Support for these is planned for a
future release.

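Until linewise iteration is available, lines can be assembled on top of
``read(size)``. The sketch below is one possible workaround, not something the
library provides: ``iter_lines`` is a hypothetical helper, and an
``io.BytesIO`` stands in for the reader so the example runs with only the
standard library.

```python
import io


def iter_lines(reader, chunk_size=16384):
    """Yield lines from any object exposing ``read(size)``.

    ``reader`` could be the object returned by ``stream_reader()``; only its
    ``read()`` method is used.
    """
    pending = b""
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        # Everything before the last newline is a complete line.
        *lines, pending = pending.split(b"\n")
        for line in lines:
            yield line + b"\n"
    if pending:
        # Final line that was not terminated by a newline.
        yield pending


# An in-memory stream stands in for a decompressing reader here.
demo = io.BytesIO(b"alpha\nbeta\ngamma")
print(list(iter_lines(demo, chunk_size=4)))
# → [b'alpha\n', b'beta\n', b'gamma']
```

The helper only reads forward, so it stays within the partial seekability
described above.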
Streaming Input API
^^^^^^^^^^^^^^^^^^^

``stream_writer(fh)`` can be used to incrementally send compressed data to a
decompressor::

    dctx = zstd.ZstdDecompressor()
    with dctx.stream_writer(fh) as decompressor:
        decompressor.write(compressed_data)

This behaves similarly to ``zstd.ZstdCompressor``: compressed data is written to
the decompressor by calling ``write(data)`` and decompressed output is written
to the output object by calling its ``write(data)`` method.

Calls to ``write()`` will return the number of bytes written to the output
object. Not all inputs will result in bytes being written, so return values
of ``0`` are possible.

The size of the chunks passed to the destination's ``write()`` method can be
specified::

    dctx = zstd.ZstdDecompressor()
    with dctx.stream_writer(fh, write_size=16384) as decompressor:
        pass

You can see how much memory is being used by the decompressor::

    dctx = zstd.ZstdDecompressor()
    with dctx.stream_writer(fh) as decompressor:
        byte_size = decompressor.memory_size()

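The push model and the meaning of the ``write()`` return value can be
illustrated with a toy class. This is a conceptual sketch only, not the
library's implementation: ``PushDecompressor`` is a hypothetical name, and
``zlib`` is swapped in for zstd so that the example runs with only the
standard library.

```python
import io
import zlib


class PushDecompressor:
    """Toy stand-in for the object returned by ``stream_writer()``."""

    def __init__(self, dest, write_size=16384):
        self._dest = dest
        self._write_size = write_size
        self._dobj = zlib.decompressobj()

    def write(self, data):
        """Decompress ``data``, forward it to ``dest``, return bytes forwarded."""
        output = self._dobj.decompress(data)
        # Forward the output in pieces of at most ``write_size`` bytes.
        for start in range(0, len(output), self._write_size):
            self._dest.write(output[start:start + self._write_size])
        # May be 0 when the input did not complete any output yet.
        return len(output)


payload = b"hello world " * 1000
compressed = zlib.compress(payload)
dest = io.BytesIO()
writer = PushDecompressor(dest, write_size=4096)
# Feed the compressed bytes in small pieces, as a network reader might.
counts = [writer.write(compressed[i:i + 10]) for i in range(0, len(compressed), 10)]
print(sum(counts) == len(payload))
```

Summing the return values gives the total number of decompressed bytes that
reached the destination, which is how a caller could track progress.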
Streaming Output API
^^^^^^^^^^^^^^^^^^^^

``read_to_iter(fh)`` provides a mechanism to stream decompressed data out of a
compressed source as an iterator of data chunks::

    dctx = zstd.ZstdDecompressor()
    for chunk in dctx.read_to_iter(fh):
        # Do something with original data.
        pass

``read_to_iter()`` accepts an object with a ``read(size)`` method that will
return compressed bytes or an object conforming to the buffer protocol that
can expose its data as a contiguous range of bytes.

``read_to_iter()`` returns an iterator whose elements are chunks of the
decompressed data.

The size of each ``read()`` request made to the source can be specified::

    dctx = zstd.ZstdDecompressor()
    for chunk in dctx.read_to_iter(fh, read_size=16384):
        pass

It is also possible to skip leading bytes in the input data::

    dctx = zstd.ZstdDecompressor()
    for chunk in dctx.read_to_iter(fh, skip_bytes=1):
        pass

.. tip::

   Skipping leading bytes is useful if the source data contains extra
   *header* data. Traditionally, you would need to create a slice or
   ``memoryview`` of the data you want to decompress. This would create
   overhead. It is more efficient to pass the offset into this API.

Similarly to ``ZstdCompressor.read_to_iter()``, the consumer of the iterator
controls when data is decompressed. If the iterator isn't consumed,
decompression is put on hold.

When ``read_to_iter()`` is passed an object conforming to the buffer protocol,
the behavior may seem similar to what occurs when the simple decompression
API is used. However, this API works when the decompressed size is unknown.
Furthermore, if feeding large inputs, the decompressor will work in chunks
instead of performing a single operation.

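Because the iterator yields chunks on demand, a consumer can process
arbitrarily large output without holding it all in memory. The sketch below
shows one such consumer. ``digest_chunks`` is a hypothetical helper; it
accepts any iterable of ``bytes``, which is what ``read_to_iter()`` is
described as returning.

```python
import hashlib


def digest_chunks(chunks):
    """Return ``(sha256_hexdigest, total_bytes)`` for an iterable of bytes.

    Each chunk is hashed and then dropped, so memory use stays bounded by
    the size of a single chunk.
    """
    hasher = hashlib.sha256()
    total = 0
    for chunk in chunks:
        hasher.update(chunk)
        total += len(chunk)
    return hasher.hexdigest(), total


# A plain list stands in for ``dctx.read_to_iter(fh)`` in this demonstration.
digest, size = digest_chunks([b"ab", b"cd"])
print(size)
# → 4
```

With a real decompressor the call would look like
``digest_chunks(dctx.read_to_iter(fh))``.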
Stream Copying API
^^^^^^^^^^^^^^^^^^

``copy_stream(ifh, ofh)`` can be used to copy data between two streams while
performing decompression::

    dctx = zstd.ZstdDecompressor()
    dctx.copy_stream(ifh, ofh)

For example, to decompress one file into another::

    dctx = zstd.ZstdDecompressor()
    with open(input_path, 'rb') as ifh, open(output_path, 'wb') as ofh:
        dctx.copy_stream(ifh, ofh)

The sizes of the ``read()`` requests made to the input stream and of the
chunks passed to ``write()`` on the output stream can be specified::

    dctx = zstd.ZstdDecompressor()
    dctx.copy_stream(ifh, ofh, read_size=8192, write_size=16384)

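When decompressing to a file, it may be desirable not to leave a partial
output file behind if decompression fails midway. The sketch below wraps
``copy_stream()`` with a temporary file and a rename. ``decompress_file`` is a
hypothetical helper, and ``dctx`` is assumed to be any object exposing
``copy_stream(ifh, ofh, read_size=..., write_size=...)`` as described above.

```python
import os
import tempfile


def decompress_file(dctx, input_path, output_path,
                    read_size=8192, write_size=16384):
    """Decompress ``input_path`` into ``output_path`` via ``copy_stream()``."""
    out_dir = os.path.dirname(os.path.abspath(output_path))
    # Write next to the final location so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=out_dir)
    try:
        with os.fdopen(fd, 'wb') as ofh, open(input_path, 'rb') as ifh:
            dctx.copy_stream(ifh, ofh,
                             read_size=read_size, write_size=write_size)
        # Only expose the output once it has been written completely.
        os.replace(tmp_path, output_path)
    except BaseException:
        # Clean up the partial temporary file, then re-raise the error.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A call would look like ``decompress_file(zstd.ZstdDecompressor(), input_path,
output_path)``.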
Decompressor API
^^^^^^^^^^^^^^^^

``decompressobj()`` returns an object that exposes a ``decompress(data)``
method. Compressed data chunks are fed into ``decompress(data)`` and
uncompressed output (or an empty ``bytes`` object) is returned. Output from
subsequent calls needs to be concatenated to reassemble the full decompressed
byte sequence.

The purpose of ``decompressobj()`` is to provide an API-compatible interface
with ``zlib.decompressobj`` and ``bz2.BZ2Decompressor``. This allows callers
to swap in different decompressor objects while using the same API.

Each object is single use: once an input frame is decoded, ``decompress()``
can no longer be called.

Here is how this API should be used::

    dctx = zstd.ZstdDecompressor()
    dobj = dctx.decompressobj()
    data = dobj.decompress(compressed_chunk_0)
    data += dobj.decompress(compressed_chunk_1)

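The value of this compatibility is that one loop can drive any of these
decompressor objects. The sketch below demonstrates this with the two standard
library types named above, so it runs without python-zstandard installed.
``decompress_chunks`` and ``split_into_chunks`` are hypothetical helper names.

```python
import bz2
import zlib


def decompress_chunks(dobj, chunks):
    """Feed compressed chunks to ``dobj`` and return the joined output.

    ``dobj`` only needs a ``decompress(data)`` method, so an object from
    ``dctx.decompressobj()`` should be usable here as well.
    """
    parts = []
    for chunk in chunks:
        # Each call may return output or an empty bytes object.
        parts.append(dobj.decompress(chunk))
    return b"".join(parts)


def split_into_chunks(data, size):
    """Split ``data`` into consecutive pieces of at most ``size`` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


payload = b"swap the decompressor, keep the loop\n" * 100
from_zlib = decompress_chunks(
    zlib.decompressobj(), split_into_chunks(zlib.compress(payload), 64))
from_bz2 = decompress_chunks(
    bz2.BZ2Decompressor(), split_into_chunks(bz2.compress(payload), 64))
print(from_zlib == payload and from_bz2 == payload)
# → True
```

Since each object is single use, a fresh decompressor object is created for
every stream.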
By default, calls to ``decompress()`` write output data in chunks of size
``DECOMPRESSION_RECOMMENDED_OUTPUT_SIZE``. These chunks are concatenated
before being returned to the caller. It is possible to define the size of
these temporary chunks by passing ``write_size`` to ``decompressobj()``::

    dctx = zstd.ZstdDecompressor()
    dobj = dctx.decompressobj(write_size=1048576)

.. note::

   Because calls to ``decompress()`` may need to perform multiple
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
791 memory (re)allocations, this streaming decompression API isn't as |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
792 efficient as other APIs. |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
793 |
31799
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
794 Batch Decompression API |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
795 ^^^^^^^^^^^^^^^^^^^^^^^ |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
796 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
797 (Experimental. Not yet supported in CFFI bindings.) |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
798 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
799 ``multi_decompress_to_buffer()`` performs decompression of multiple |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
800 frames as a single operation and returns a ``BufferWithSegmentsCollection`` |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
801 containing decompressed data for all inputs. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
802 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
803 Compressed frames can be passed to the function as a ``BufferWithSegments``, |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
804 a ``BufferWithSegmentsCollection``, or as a list containing objects that |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
805 conform to the buffer protocol. For best performance, pass a |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
806 ``BufferWithSegmentsCollection`` or a ``BufferWithSegments``, as |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
807 minimal input validation will be done for that type. If calling from |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
808 Python (as opposed to C), constructing one of these instances may add |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
809 overhead that cancels out the validation cost saved compared with list |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
810 inputs. For example:: |
31799
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
811 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
812 dctx = zstd.ZstdDecompressor() |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
813 results = dctx.multi_decompress_to_buffer([b'...', b'...']) |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
814 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
815 The decompressed size of each frame MUST be discoverable. It can either be |
31799
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
816 embedded within the zstd frame (``write_content_size=True`` argument to |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
817 ``ZstdCompressor``) or passed in via the ``decompressed_sizes`` argument. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
818 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
819 The ``decompressed_sizes`` argument is an object conforming to the buffer |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
820 protocol which holds an array of 64-bit unsigned integers in the machine's |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
821 native format defining the decompressed sizes of each frame. If this argument |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
822 is passed, it avoids having to scan each frame for its decompressed size. |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
823 This frame scanning can add noticeable overhead in some scenarios. For example:: |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
824 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
825 frames = [...] |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
826 sizes = struct.pack('=QQQQ', len0, len1, len2, len3) |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
827 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
828 dctx = zstd.ZstdDecompressor() |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
829 results = dctx.multi_decompress_to_buffer(frames, decompressed_sizes=sizes) |
31799
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
830 |
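When the number of frames is not fixed, the sizes buffer can be built from a list instead of a hard-coded format string. This is a minimal sketch using only the standard ``struct`` module; ``pack_decompressed_sizes`` is a hypothetical helper name, not part of the library.

```python
import struct


def pack_decompressed_sizes(sizes):
    """Pack an iterable of sizes as native-order unsigned 64-bit integers."""
    sizes = list(sizes)
    # '=' selects native byte order with standard sizes, so each 'Q' is 8 bytes.
    return struct.pack("=%dQ" % len(sizes), *sizes)


buf = pack_decompressed_sizes([7, 1024, 65536])
```

The resulting ``bytes`` object conforms to the buffer protocol and holds one 8-byte entry per frame, in the same order as the frames.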
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
831 The ``threads`` argument controls the number of threads to use to perform |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
832 decompression operations. The default (``0``) or the value ``1`` means to |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
833 use a single thread. Negative values use the number of logical CPUs in the |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
834 machine. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
835 |
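The rule for ``threads`` described above can be restated as a small function. This is only a sketch of the documented behavior, not the library's code; ``effective_threads`` is a hypothetical name, and ``os.cpu_count()`` stands in for "the number of logical CPUs".

```python
import os


def effective_threads(threads=0):
    """Return the thread count implied by a ``threads`` argument."""
    if threads < 0:
        # Negative values: use every logical CPU (fall back to 1 if unknown).
        return os.cpu_count() or 1
    if threads in (0, 1):
        # The default (0) and 1 both mean a single thread.
        return 1
    return threads
```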
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
836 .. note:: |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
837 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
838 It is possible to pass a ``mmap.mmap()`` instance into this function by |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
839 wrapping it with a ``BufferWithSegments`` instance (which will define the |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
840 offsets of frames within the memory mapped region). |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
841 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
842 This function is logically equivalent to performing ``dctx.decompress()`` |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
843 on each input frame and returning the result. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
844 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
845 This function exists to perform decompression on multiple frames as fast |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
846 as possible by having as little overhead as possible. Since decompression is |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
847 performed as a single operation and since the decompressed output is stored in |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
848 a single buffer, extra memory allocations, Python objects, and Python function |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
849 calls are avoided. This is ideal for scenarios where callers know up front that |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
850 they need to access data for multiple frames, such as when *delta chains* are |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
851 being used. |
31799
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
852 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
853 Currently, the implementation always spawns multiple threads when requested, |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
854 even if the amount of work to do is small. In the future, it will be smarter |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
855 about avoiding threads and their associated overhead when the amount of |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
856 work to do is small. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
857 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
858 Prefix Dictionary Chain Decompression |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
859 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
30924
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
860 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
861 ``decompress_content_dict_chain(frames)`` performs decompression of a list of |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
862 zstd frames produced using chained *prefix* dictionary compression. Such |
30924
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
863 a list of frames is produced by compressing discrete inputs where each |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
864 non-initial input is compressed with a *prefix* dictionary consisting of the |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
865 content of the previous input. |
30924
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
866 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
867 For example, say you have the following inputs:: |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
868 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
869 inputs = [b'input 1', b'input 2', b'input 3'] |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
870 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
871 The zstd frame chain consists of: |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
872 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
873 1. ``b'input 1'`` compressed in standalone/discrete mode |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
874 2. ``b'input 2'`` compressed using ``b'input 1'`` as a *prefix* dictionary |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
875 3. ``b'input 3'`` compressed using ``b'input 2'`` as a *prefix* dictionary |
30924
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
876 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
877 Each zstd frame **must** have the content size written. |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
878 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
879 The following Python code can be used to produce a *prefix dictionary chain*:: |
30924
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
880 |
31799
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
881 def make_chain(inputs): |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
882     frames = [] |
30924
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
883 |
31799
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
884     # First frame is compressed in standalone/discrete mode. |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
885     zctx = zstd.ZstdCompressor() |
31799
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
886     frames.append(zctx.compress(inputs[0])) |
30924
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
887 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
888     # Subsequent frames use the previous fulltext as a prefix dictionary |
31799
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
889     for i, raw in enumerate(inputs[1:]): |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
890         dict_data = zstd.ZstdCompressionDict( |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
891             inputs[i], dict_type=zstd.DICT_TYPE_RAWCONTENT) |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
892         zctx = zstd.ZstdCompressor(dict_data=dict_data) |
31799
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
893         frames.append(zctx.compress(raw)) |
30924
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
894 |
31799
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
895     return frames |
30924
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
896 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
897 ``decompress_content_dict_chain()`` returns the uncompressed data of the last |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
898 element in the input chain. |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
899 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
900 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
901 .. note:: |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
902 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
903 It is possible to implement *prefix dictionary chain* decompression |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
904 on top of other APIs. However, this function will likely be faster - |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
905 especially for long input chains - as it avoids the overhead of instantiating |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
906 and passing around intermediate objects between C and Python. |
30924
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
907 |
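To make the chain logic concrete, here is a sketch of the same idea built on standard-library ``zlib`` preset dictionaries (the ``zdict`` argument). This is an analogy, not zstd: it shows the loop a pure-Python implementation would need, where every step must materialize the previous fulltext before the next frame can be decoded. The function names are hypothetical.

```python
import zlib


def make_zlib_chain(inputs):
    """Compress inputs so each one uses the previous input as its dictionary."""
    frames = [zlib.compress(inputs[0])]
    for prev, raw in zip(inputs, inputs[1:]):
        cobj = zlib.compressobj(zdict=prev)
        frames.append(cobj.compress(raw) + cobj.flush())
    return frames


def decompress_zlib_chain(frames):
    """Walk the chain and return the fulltext of the last element."""
    fulltext = zlib.decompress(frames[0])
    for frame in frames[1:]:
        # The previous fulltext is the dictionary for the next frame.
        dobj = zlib.decompressobj(zdict=fulltext)
        fulltext = dobj.decompress(frame) + dobj.flush()
    return fulltext


chain = make_zlib_chain([b"input 1", b"input 2", b"input 3"])
last = decompress_zlib_chain(chain)
```

Each iteration creates a new decompressor object and an intermediate ``bytes`` value in Python, which is the per-frame overhead the note above refers to.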
31799
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
908 Multi-Threaded Compression |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
909 -------------------------- |
30444
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
910 |
31799
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
911 ``ZstdCompressor`` accepts a ``threads`` argument that controls the number |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
912 of threads to use for compression. The way this works is that input is split |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
913 into segments and each segment is fed into a worker pool for compression. Once |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
914 a segment is compressed, it is flushed/appended to the output. |
30444
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
915 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
916 .. note:: |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
917 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
918 These threads are created at the C layer and are not Python threads. So they |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
919 work outside the GIL. It is therefore possible to saturate multiple CPU cores |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
920 from Python. |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
921 |
31799
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
922 The segment size for multi-threaded compression is chosen from the window size |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
923 of the compressor. This is derived from the ``window_log`` attribute of a |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
924 ``ZstdCompressionParameters`` instance. By default, segment sizes are in the 1+MB |
31799
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
925 range. |
30444
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
926 |
31799
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
927 If multi-threaded compression is requested and the input is smaller than the |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
928 configured segment size, only a single compression thread will be used. If the |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
929 input is smaller than the segment size multiplied by the thread pool size or |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
930 if data cannot be delivered to the compressor fast enough, not all requested |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
931 compressor threads may be active simultaneously. |
30444
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
932 |
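The two conditions above amount to a simple bound on parallelism. The sketch below is a rough estimate under the assumption that each segment keeps at most one thread busy; it ignores input delivery speed, and ``max_active_threads`` is a hypothetical helper rather than anything the library exposes.

```python
def max_active_threads(input_size, segment_size, threads):
    """Upper bound on simultaneously busy compression threads."""
    # Ceiling division: number of segments the input is split into.
    segments = max(1, -(-input_size // segment_size))
    return max(1, min(threads, segments))


MiB = 1 << 20
small = max_active_threads(500000, MiB, 4)     # smaller than one segment
medium = max_active_threads(3 * MiB, MiB, 4)   # fewer segments than threads
large = max_active_threads(100 * MiB, MiB, 4)  # enough work for every thread
```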
31799
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
933 Compared to non-multi-threaded compression, multi-threaded compression has |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
934 higher per-operation overhead. This includes extra memory operations, |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
935 thread creation, lock acquisition, etc. |
30444
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
936 |
31799
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
937 Due to the nature of multi-threaded compression using *N* compression |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
938 *states*, the output from multi-threaded compression will likely be larger |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
939 than non-multi-threaded compression. The difference is usually small. But |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
940 there is a CPU/wall time versus size trade-off that may warrant investigation. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
941 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
942 Output from multi-threaded compression does not require any special handling |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
943 on the decompression side. To the decompressor, data generated by a |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
944 single-threaded compressor looks the same as data generated by a multi-threaded |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
945 compressor and does not require any special handling or additional resource |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
946 requirements. |
30444
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
947 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
948 Dictionary Creation and Management |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
949 ---------------------------------- |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
950 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
951 Compression dictionaries are represented with the ``ZstdCompressionDict`` type. |
30444
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
952 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
953 Instances can be constructed from bytes:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
954 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
955 dict_data = zstd.ZstdCompressionDict(data) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
956 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
957 It is possible to construct a dictionary from *any* data. If the data doesn't |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
958 begin with a magic header, it will be treated as a *prefix* dictionary. |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
959 *Prefix* dictionaries allow compression operations to reference raw data |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
960 within the dictionary. |
30924
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
961 |
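The "magic header" check can be illustrated in a few lines. The sketch assumes the dictionary magic number ``0xEC30A437`` (stored little-endian) from the Zstandard format; ``has_dictionary_header`` is a hypothetical helper and only approximates how the library classifies data when no ``dict_type`` is given.

```python
import struct

# Zstandard dictionary magic number, stored little-endian on disk.
ZSTD_DICT_MAGIC = struct.pack("<I", 0xEC30A437)


def has_dictionary_header(data):
    """Return True if ``data`` starts with the zstd dictionary magic."""
    return bytes(data[:4]) == ZSTD_DICT_MAGIC


structured = has_dictionary_header(ZSTD_DICT_MAGIC + b"...")
raw_prefix = has_dictionary_header(b"input 1")
```

Data for which this returns ``False`` would be handled as a *prefix* dictionary under the rule described above.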
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
962 It is possible to force the use of *prefix* dictionaries or to require a |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
963 dictionary header:: |
30444
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
964 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
965 dict_data = zstd.ZstdCompressionDict(data, |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
966 dict_type=zstd.DICT_TYPE_RAWCONTENT) |
30444
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
967 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
968 dict_data = zstd.ZstdCompressionDict(data, |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
969 dict_type=zstd.DICT_TYPE_FULLDICT) |
30444
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
970 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
971 You can see how many bytes are in the dictionary by calling ``len()``:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
972 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
973 dict_data = zstd.train_dictionary(size, samples) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
974 dict_size = len(dict_data) # will not be larger than ``size`` |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
975 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
976 Once you have a dictionary, you can pass it to the objects performing |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
977 compression and decompression:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
978 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
979 dict_data = zstd.train_dictionary(131072, samples) |
30444
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
980 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
981 cctx = zstd.ZstdCompressor(dict_data=dict_data) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
982 for source_data in input_data: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
983     compressed = cctx.compress(source_data) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
984     # Do something with compressed data. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
985 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
986 dctx = zstd.ZstdDecompressor(dict_data=dict_data) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
987 for compressed_data in input_data: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
988     buffer = io.BytesIO() |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
989     with dctx.stream_writer(buffer) as decompressor: |
30444
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
990         decompressor.write(compressed_data) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
991     # Do something with raw data in ``buffer``. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
992 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
993 Dictionaries have unique integer IDs. You can retrieve this ID via:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
994 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
995 dict_id = dict_data.dict_id() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
996 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
997 You can obtain the raw data in the dict (useful for persisting and constructing |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
998 a ``ZstdCompressionDict`` later) via ``as_bytes()``:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
999 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1000 dict_data = zstd.train_dictionary(size, samples) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1001 raw_data = dict_data.as_bytes() |
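The persistence step can be sketched end to end. This is a minimal sketch, not part of the original README: it assumes the package is importable as ``zstandard`` (the vendored copy is imported as ``zstd``), it uses a *prefix* dictionary so that no training data is needed, and ``roundtrip_with_prefix_dict`` is a hypothetical helper name.

```python
try:
    import zstandard as zstd  # assumption: PyPI package name
except ImportError:  # keep the sketch importable without the package
    zstd = None


def roundtrip_with_prefix_dict(prefix, payload):
    """Compress and decompress ``payload`` using ``prefix`` as a raw-content dictionary."""
    d = zstd.ZstdCompressionDict(prefix, dict_type=zstd.DICT_TYPE_RAWCONTENT)

    # Persist the dictionary bytes, then rebuild an equivalent dictionary from them.
    raw = d.as_bytes()
    restored = zstd.ZstdCompressionDict(raw, dict_type=zstd.DICT_TYPE_RAWCONTENT)

    # Compress with the original dictionary, decompress with the restored one.
    compressed = zstd.ZstdCompressor(dict_data=d).compress(payload)
    return zstd.ZstdDecompressor(dict_data=restored).decompress(compressed)
```

The same bytes must be supplied on both sides: a frame compressed against a dictionary can only be decoded when the decompressor is given that dictionary.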
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1002 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1003 By default, when a ``ZstdCompressionDict`` is *attached* to a |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1004 ``ZstdCompressor``, each ``ZstdCompressor`` performs work to prepare the |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1005 dictionary for use. This is fine if only 1 compression operation is being |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1006 performed or if the ``ZstdCompressor`` is being reused for multiple operations. |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1007 But if multiple ``ZstdCompressor`` instances are being used with the dictionary, |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1008 this can add overhead. |
31799
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1009 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1010 It is possible to *precompute* the dictionary so it can readily be consumed |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1011 by multiple ``ZstdCompressor`` instances:: |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1012 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1013 d = zstd.ZstdCompressionDict(data) |
31799
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1014 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1015 # Precompute for compression level 3. |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1016 d.precompute_compress(level=3) |
31799
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1017 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1018 # Precompute with specific compression parameters. |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1019 params = zstd.ZstdCompressionParameters(...) |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1020 d.precompute_compress(compression_params=params) |
31799
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1021 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1022 .. note:: |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1023 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1024 When a dictionary is precomputed, the compression parameters used to |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1025 precompute the dictionary overwrite some of the compression parameters |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1026 specified to ``ZstdCompressor.__init__``. |
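As a rough illustration of sharing one precomputed dictionary across several compressors, the following sketch assumes the package is importable as ``zstandard`` and uses a *prefix* dictionary to avoid training. ``compress_with_shared_dict`` is a hypothetical helper, not a library function.

```python
try:
    import zstandard as zstd  # assumption: PyPI package name
except ImportError:  # keep the sketch importable without the package
    zstd = None


def compress_with_shared_dict(prefix, payloads, level=3):
    """Compress each payload with its own compressor, all sharing one dictionary."""
    d = zstd.ZstdCompressionDict(prefix, dict_type=zstd.DICT_TYPE_RAWCONTENT)

    # Prepare the dictionary once, before any compressor is created.
    d.precompute_compress(level=level)

    frames = []
    for payload in payloads:
        # A fresh compressor per payload; each one reuses the precomputed dictionary.
        cctx = zstd.ZstdCompressor(level=level, dict_data=d)
        frames.append(cctx.compress(payload))

    # Decompress everything again so the caller can verify the round trip.
    dctx = zstd.ZstdDecompressor(dict_data=d)
    return [dctx.decompress(frame) for frame in frames]
```

The level passed to the compressor matches the precompute level here so that the override described in the note above has no visible effect.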
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1027 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1028 Training Dictionaries |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1029 ^^^^^^^^^^^^^^^^^^^^^ |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1030 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1031 Unless using *prefix* dictionaries, dictionary data is produced by *training* |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1032 on existing data:: |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1033 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1034 dict_data = zstd.train_dictionary(size, samples) |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1035 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1036 This takes a target dictionary size and a list of bytes instances, then creates and |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1037 returns a ``ZstdCompressionDict``. |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1038 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1039 The dictionary training mechanism is known as *cover*. More details about it are |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1040 available in the paper *Effective Construction of Relative Lempel-Ziv |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1041 Dictionaries* (authors: Liao, Petri, Moffat, Wirth). |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1042 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1043 The cover algorithm takes parameters ``k`` and ``d``. These are the |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1044 *segment size* and *dmer size*, respectively. The returned dictionary |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1045 instance created by this function has ``k`` and ``d`` attributes |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1046 containing the values for these parameters. If a ``ZstdCompressionDict`` |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1047 is constructed from raw bytes data (a content-only dictionary), the |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1048 ``k`` and ``d`` attributes will be ``0``. |
31799
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1049 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1050 The segment and dmer size parameters to the cover algorithm can either be |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1051 specified manually or ``train_dictionary()`` can try multiple values |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1052 and pick the best one, where *best* means the smallest compressed data size. |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1053 This latter mode is called *optimization* mode. |
31799
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1054 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1055 If none of ``k``, ``d``, ``steps``, ``threads``, ``level``, ``notifications``, |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1056 or ``dict_id`` (basically anything from the underlying ``ZDICT_cover_params_t`` |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1057 struct) are defined, *optimization* mode is used with default parameter |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1058 values. |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1059 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1060 If ``steps`` or ``threads`` are defined, then *optimization* mode is engaged |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1061 with explicit control over those parameters. Specifying ``threads=0`` or |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1062 ``threads=1`` can be used to engage *optimization* mode if other parameters |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1063 are not defined. |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1064 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1065 Otherwise, non-*optimization* mode is used with the parameters specified. |
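The three rules above can be restated as a small decision function. This is only a sketch of the prose, not the library's implementation: ``cover_mode`` is a hypothetical name, and using ``None`` to mean "argument not given" is a modelling assumption.

```python
def cover_mode(k=None, d=None, steps=None, threads=None,
               level=None, notifications=None, dict_id=None):
    """Return the training mode the documented rules would select.

    ``None`` stands for "argument not given" (an assumption of this sketch).
    """
    given = [k, d, steps, threads, level, notifications, dict_id]

    # Rule 1: nothing specified, so optimization runs with default values.
    if all(value is None for value in given):
        return "optimization (default parameters)"

    # Rule 2: steps or threads specified, so optimization runs with those values.
    if steps is not None or threads is not None:
        return "optimization (explicit steps/threads)"

    # Rule 3: anything else uses the supplied parameters directly.
    return "non-optimization"
```

For instance, ``cover_mode(k=64, d=8)`` falls through to the last rule, while ``cover_mode(threads=1)`` selects optimization mode even though no other argument is given.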
31799
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1066 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1067 This function takes the following arguments: |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1068 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1069 dict_size |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1070 Target size in bytes of the dictionary to generate. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1071 samples |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1072 A list of bytes holding samples the dictionary will be trained from. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1073 k |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1074 Parameter to cover algorithm defining the segment size. A reasonable range |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1075 is [16, 2048+]. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1076 d |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1077 Parameter to cover algorithm defining the dmer size. A reasonable range is |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1078 [6, 16]. ``d`` must be less than or equal to ``k``. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1079 dict_id |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1080 Integer dictionary ID for the produced dictionary. Default is 0, which uses |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1081 a random value. |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1082 steps |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1083 Number of steps through ``k`` values to perform when trying parameter |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1084 variations. |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1085 threads |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1086 Number of threads to use when trying parameter variations. Default is 0, |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1087 which means to use a single thread. A negative value can be specified to |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1088 use as many threads as there are detected logical CPUs. |
31799
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1089 level |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1090 Integer target compression level when trying parameter variations. |
31799
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1091 notifications |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1092 Controls writing of informational messages to ``stderr``. ``0`` (the |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1093 default) means to write nothing. ``1`` writes errors. ``2`` writes |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1094 progression info. ``3`` writes more details. And ``4`` writes all info. |
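When ``k`` and ``d`` are chosen by hand, the constraints listed above can be checked before training. The helper below is a hypothetical sketch: the library does not provide it, and the "reasonable" ranges are treated as advice rather than hard limits.

```python
def check_cover_params(k, d):
    """Return a list of problems with hand-picked cover parameters."""
    problems = []

    # Constraint stated in the argument list: d must not exceed k.
    if d > k:
        problems.append("d must be less than or equal to k")

    # Suggested range for the segment size is [16, 2048+], so only a lower bound.
    if k < 16:
        problems.append("k is below the suggested lower bound of 16")

    # Suggested range for the dmer size is [6, 16].
    if not 6 <= d <= 16:
        problems.append("d is outside the suggested range [6, 16]")

    return problems
```

An empty list means the values satisfy everything documented above; it says nothing about how well the resulting dictionary will compress.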
31799
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1095 |
30444
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1096 Explicit Compression Parameters |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1097 ------------------------------- |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1098 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1099 Zstandard offers a high-level *compression level* that maps to lower-level |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1100 compression parameters. For many consumers, this numeric level is the only |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1101 compression setting you'll need to touch. |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1102 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1103 But for advanced use cases, it might be desirable to tweak these lower-level |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1104 settings. |
30444
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1105 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1106 The ``ZstdCompressionParameters`` type represents these low-level compression |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1107 settings. |
30444
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1108 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1109 Instances of this type can be constructed from a myriad of keyword arguments |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1110 (defined below) for complete low-level control over each adjustable |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1111 compression setting. |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1112 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1113 From a higher level, one can construct a ``ZstdCompressionParameters`` instance |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1114 given a desired compression level and target input and dictionary size |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1115 using ``ZstdCompressionParameters.from_level()``. e.g.:: |
30444
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1116 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1117 # Derive compression settings for compression level 7. |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1118 params = zstd.ZstdCompressionParameters.from_level(7) |
30444
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1119 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1120 # With an input size of 1MB |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1121 params = zstd.ZstdCompressionParameters.from_level(7, source_size=1048576) |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1122 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1123 Using ``from_level()``, it is also possible to override individual compression |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1124 parameters or to define additional settings that aren't automatically derived. |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1125 e.g.:: |
30444
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1126 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1127 params = zstd.ZstdCompressionParameters.from_level(4, window_log=10) |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1128 params = zstd.ZstdCompressionParameters.from_level(5, threads=4) |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1129 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1130 Or you can define low-level compression settings directly:: |
30444
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1131 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1132 params = zstd.ZstdCompressionParameters(window_log=12, enable_ldm=True) |
30444
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1133 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1134 Once a ``ZstdCompressionParameters`` instance is obtained, it can be used to |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1135 configure a compressor:: |
30444
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1136 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1137 cctx = zstd.ZstdCompressor(compression_params=params) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1138 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1139 The named arguments and attributes of ``ZstdCompressionParameters`` are as |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1140 follows: |
30444
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1141 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1142 * format |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1143 * compression_level |
30924
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1144 * window_log |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1145 * hash_log |
30924
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1146 * chain_log |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1147 * search_log |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1148 * min_match |
30924
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1149 * target_length |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1150 * compression_strategy |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1151 * write_content_size |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1152 * write_checksum |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1153 * write_dict_id |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1154 * job_size |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1155 * overlap_size_log |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1156 * force_max_window |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1157 * enable_ldm |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1158 * ldm_hash_log |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1159 * ldm_min_match |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1160 * ldm_bucket_size_log |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1161 * ldm_hash_every_log |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1162 * threads |
30924
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1163 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1164 Some of these are very low-level settings. It may help to consult the official |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1165 zstandard documentation for their behavior. Look for the ``ZSTD_p_*`` constants |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1166 in ``zstd.h`` (https://github.com/facebook/zstd/blob/dev/lib/zstd.h). |
30444
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1167 |
30924
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1168 Frame Inspection |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1169 ---------------- |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1170 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1171 Data emitted from zstd compression is encapsulated in a *frame*. This frame |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1172 begins with a 4 byte *magic number* header followed by 2 to 14 bytes describing |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1173 the frame in more detail. For more info, see |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1174 https://github.com/facebook/zstd/blob/master/doc/zstd_compression_format.md. |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1175 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1176 ``zstd.get_frame_parameters(data)`` parses a zstd *frame* header from a bytes |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1177 instance and returns a ``FrameParameters`` object describing the frame. |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1178 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1179 Depending on which fields are present in the frame and their values, the |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1180 length of the frame parameters varies. If insufficient bytes are passed |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1181 in to fully parse the frame parameters, ``ZstdError`` is raised. To ensure |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1182 frame parameters can be parsed, pass in at least 18 bytes. |
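To see why 18 bytes is always enough, note that the length of the header can be worked out from the frame header descriptor byte alone. The sketch below is not python-zstandard's implementation. It uses only the standard library, follows the layout in the format specification linked above, and the helper name ``header_size_with_magic`` is hypothetical.

```python
import struct

# The 4 byte magic number, stored little-endian at the start of a frame.
ZSTD_MAGIC = 0xFD2FB528
MAGIC_BYTES = struct.pack("<I", ZSTD_MAGIC)  # b"\x28\xb5\x2f\xfd"


def header_size_with_magic(data):
    """Return the size of magic number + frame header (hypothetical helper)."""
    if len(data) < 5:
        raise ValueError("need the magic number plus the descriptor byte")
    (magic,) = struct.unpack_from("<I", data, 0)
    if magic != ZSTD_MAGIC:
        raise ValueError("not a zstd frame")

    descriptor = data[4]
    fcs_flag = descriptor >> 6            # frame content size flag (bits 7-6)
    single_segment = (descriptor >> 5) & 1
    dict_id_flag = descriptor & 0x03      # dictionary ID flag (bits 1-0)

    # The window descriptor byte is omitted for single-segment frames.
    window_bytes = 0 if single_segment else 1
    dict_bytes = (0, 1, 2, 4)[dict_id_flag]
    if fcs_flag == 0:
        # A single-segment frame always stores a 1 byte content size.
        fcs_bytes = 1 if single_segment else 0
    else:
        fcs_bytes = (0, 2, 4, 8)[fcs_flag]

    return 4 + 1 + window_bytes + dict_bytes + fcs_bytes


# Smallest and largest possible headers.
smallest = header_size_with_magic(MAGIC_BYTES + b"\x20\x00")
largest = header_size_with_magic(MAGIC_BYTES + b"\xc3")
print(smallest, largest)
```

The largest case is 4 (magic) + 1 (descriptor) + 1 (window) + 4 (dictionary ID) + 8 (content size) = 18 bytes, which matches the "2 to 14 bytes" after the magic number described above.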
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1183 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1184 ``FrameParameters`` instances have the following attributes: |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1185 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1186 content_size |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1187 Integer size of original, uncompressed content. This will be ``0`` if the |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1188 original content size isn't written to the frame (controlled with the |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1189 ``write_content_size`` argument to ``ZstdCompressor``) or if the input |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1190 content size was ``0``. |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1191 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1192 window_size |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1193 Integer size of maximum back-reference distance in compressed data. |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1194 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1195 dict_id |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1196 Integer of dictionary ID used for compression. ``0`` if no dictionary |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1197 ID was used or if the dictionary ID was ``0``. |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1198 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1199 has_checksum |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1200 Bool indicating whether a 4 byte content checksum is stored at the end |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1201 of the frame. |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1202 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1203 ``zstd.frame_header_size(data)`` returns the size of the zstandard frame |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1204 header. |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1205 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1206 ``zstd.frame_content_size(data)`` returns the content size as parsed from |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1207 the frame header. ``-1`` means the content size is unknown. ``0`` means |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1208 an empty frame. The content size is usually correct, but it may not always |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1209 be accurate. |
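As an illustration of the ``-1`` and ``0`` cases, the content size field can be decoded by hand from the frame header. This is a minimal standard-library sketch under the layout in the Zstandard format specification, not the code python-zstandard runs. The name ``content_size_from_header`` and the ``UNKNOWN`` constant are hypothetical.

```python
MAGIC_BYTES = b"\x28\xb5\x2f\xfd"  # little-endian encoding of 0xFD2FB528
UNKNOWN = -1                       # mirrors the "unknown" value described above


def content_size_from_header(data):
    """Decode the content size stored in a frame header (hypothetical helper)."""
    if data[:4] != MAGIC_BYTES:
        raise ValueError("not a zstd frame")
    if len(data) < 5:
        raise ValueError("missing frame header descriptor")

    descriptor = data[4]
    fcs_flag = descriptor >> 6
    single_segment = bool(descriptor & 0x20)
    dict_bytes = (0, 1, 2, 4)[descriptor & 0x03]

    # Skip the optional window descriptor and dictionary ID fields.
    pos = 5 + (0 if single_segment else 1) + dict_bytes

    if fcs_flag == 0:
        if not single_segment:
            return UNKNOWN          # no content size field was written
        fcs_bytes = 1
    else:
        fcs_bytes = (2, 4, 8)[fcs_flag - 1]

    field = data[pos:pos + fcs_bytes]
    if len(field) < fcs_bytes:
        raise ValueError("not enough bytes to read the content size")

    value = int.from_bytes(field, "little")
    if fcs_bytes == 2:
        value += 256                # the 2 byte form stores (size - 256)
    return value


empty_frame = content_size_from_header(MAGIC_BYTES + b"\x20\x00")
no_size = content_size_from_header(MAGIC_BYTES + b"\x00\x00")
print(empty_frame, no_size)
```

Here ``empty_frame`` is ``0`` because the header records a zero-length payload, while ``no_size`` is ``-1`` because the header carries no content size field at all.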
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1210 |
30444
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1211 Misc Functionality |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1212 ------------------ |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1213 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1214 estimate_decompression_context_size() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1215 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1216 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1217 Estimate the memory size requirements for a decompressor instance. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1218 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1219 Constants |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1220 --------- |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1221 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1222 The following module constants/attributes are exposed: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1223 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1224 ZSTD_VERSION |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1225 This module attribute exposes a 3-tuple of the Zstandard version. e.g. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1226 ``(1, 0, 0)`` |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1227 MAX_COMPRESSION_LEVEL |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1228 Integer max compression level accepted by compression functions |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1229 COMPRESSION_RECOMMENDED_INPUT_SIZE |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1230 Recommended chunk size to feed to compressor functions |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1231 COMPRESSION_RECOMMENDED_OUTPUT_SIZE |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1232 Recommended chunk size for compression output |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1233 DECOMPRESSION_RECOMMENDED_INPUT_SIZE |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1234 Recommended chunk size to feed into decompressor functions |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1235 DECOMPRESSION_RECOMMENDED_OUTPUT_SIZE |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1236 Recommended chunk size for decompression output |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1237 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1238 FRAME_HEADER |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1239 bytes containing header of the Zstandard frame |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1240 MAGIC_NUMBER |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1241 Frame header as an integer |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1242 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1243 CONTENTSIZE_UNKNOWN |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1244 Value for content size when the content size is unknown. |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1245 CONTENTSIZE_ERROR |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1246 Value for content size when content size couldn't be determined. |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1247 |
30444
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1248 WINDOWLOG_MIN |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1249 Minimum value for compression parameter |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1250 WINDOWLOG_MAX |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1251 Maximum value for compression parameter |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1252 CHAINLOG_MIN |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1253 Minimum value for compression parameter |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1254 CHAINLOG_MAX |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1255 Maximum value for compression parameter |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1256 HASHLOG_MIN |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1257 Minimum value for compression parameter |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1258 HASHLOG_MAX |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1259 Maximum value for compression parameter |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1260 SEARCHLOG_MIN |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1261 Minimum value for compression parameter |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1262 SEARCHLOG_MAX |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1263 Maximum value for compression parameter |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1264 SEARCHLENGTH_MIN |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1265 Minimum value for compression parameter |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1266 SEARCHLENGTH_MAX |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1267 Maximum value for compression parameter |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1268 TARGETLENGTH_MIN |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1269 Minimum value for compression parameter |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1270 STRATEGY_FAST |
30924
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1271 Compression strategy |
30444
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1272 STRATEGY_DFAST |
30924
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1273 Compression strategy |
30444
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1274 STRATEGY_GREEDY |
30924
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1275 Compression strategy |
30444
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1276 STRATEGY_LAZY |
30924
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1277 Compression strategy |
30444
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1278 STRATEGY_LAZY2 |
30924
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1279 Compression strategy |
30444
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1280 STRATEGY_BTLAZY2 |
30924
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1281 Compression strategy |
30444
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1282 STRATEGY_BTOPT |
30924
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1283 Compression strategy |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1284 STRATEGY_BTULTRA |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1285 Compression strategy |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1286 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1287 FORMAT_ZSTD1 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1288 Zstandard frame format |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1289 FORMAT_ZSTD1_MAGICLESS |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1290 Zstandard frame format without magic header |
30924
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1291 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1292 Performance Considerations |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1293 -------------------------- |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1294 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1295 The ``ZstdCompressor`` and ``ZstdDecompressor`` types maintain state to a |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1296 persistent compression or decompression *context*. Reusing a ``ZstdCompressor`` |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1297 or ``ZstdDecompressor`` instance for multiple operations is faster than |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1298 instantiating a new ``ZstdCompressor`` or ``ZstdDecompressor`` for each |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1299 operation. The differences are magnified as the size of data decreases. For |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1300 example, the difference between *context* reuse and non-reuse for 100,000 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1301 100 byte inputs will be significant (possibly over 10x faster to reuse contexts) |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1302 whereas 10 100,000,000 byte inputs will be more similar in speed (because the |
30924
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1303 time spent doing compression dwarfs time spent creating new *contexts*). |
30444
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1304 |
31799
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1305 Buffer Types |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1306 ------------ |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1307 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1308 The API exposes a handful of custom types for interfacing with memory buffers. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1309 The primary goal of these types is to facilitate efficient multi-object |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1310 operations. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1311 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1312 The essential idea is to have a single memory allocation provide backing |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1313 storage for multiple logical objects. This has 2 main advantages: fewer |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1314 allocations and optimal memory access patterns. This avoids having to allocate |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1315 a Python object for each logical object and furthermore ensures that access of |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1316 data for objects can be sequential (read: fast) in memory. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1317 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1318 BufferWithSegments |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1319 ^^^^^^^^^^^^^^^^^^ |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1320 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1321 The ``BufferWithSegments`` type represents a memory buffer containing N |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1322 discrete items of known lengths (segments). It is essentially a fixed size |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1323 memory address and an array of 2-tuples of ``(offset, length)`` 64-bit |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1324 unsigned native endian integers defining the byte offset and length of each |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1325 segment within the buffer. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1326 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1327 Instances behave like containers. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1328 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1329 ``len()`` returns the number of segments within the instance. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1330 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1331 ``o[index]`` or ``__getitem__`` obtains a ``BufferSegment`` representing an |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1332 individual segment within the backing buffer. That returned object references |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1333 (not copies) memory. This means that iterating all objects doesn't copy |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1334 data within the buffer. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1335 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1336 The ``.size`` attribute contains the total size in bytes of the backing |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1337 buffer. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1338 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1339 Instances conform to the buffer protocol. So a reference to the backing bytes |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1340 can be obtained via ``memoryview(o)``. A *copy* of the backing bytes can also |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1341 be obtained via ``.tobytes()``. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1342 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1343 The ``.segments`` attribute exposes the array of ``(offset, length)`` for |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1344 segments within the buffer. It is a ``BufferSegments`` type. |
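The layout described above, one backing allocation plus an array of ``(offset, length)`` pairs, can be modelled with the standard library. This is only a sketch of the idea, not the ``BufferWithSegments`` type itself. The names ``backing``, ``segments`` and ``segment`` are hypothetical.

```python
import struct

items = [b"foo", b"barbaz", b""]

# One contiguous allocation holds the bytes of every item.
backing = b"".join(items)

# Each segment is an (offset, length) pair of 64-bit unsigned integers in
# native byte order ("=QQ" is 16 bytes per pair).
segments = bytearray()
offset = 0
for item in items:
    segments += struct.pack("=QQ", offset, len(item))
    offset += len(item)

view = memoryview(backing)


def segment(index):
    """Return a zero-copy view of one segment within the backing buffer."""
    off, length = struct.unpack_from("=QQ", segments, index * 16)
    return view[off:off + length]


# Slicing a memoryview references the backing memory; .tobytes() copies it.
recovered = [segment(i).tobytes() for i in range(len(items))]
print(recovered == items)
```

Slicing the ``memoryview`` does not copy data, which is the same property the text attributes to indexing a ``BufferWithSegments`` instance.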
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1345 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1346 BufferSegment |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1347 ^^^^^^^^^^^^^ |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1348 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1349 The ``BufferSegment`` type represents a segment within a ``BufferWithSegments``. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1350 It is essentially a reference to N bytes within a ``BufferWithSegments``. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1351 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1352 ``len()`` returns the length of the segment in bytes. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1353 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1354 ``.offset`` contains the byte offset of this segment within its parent |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1355 ``BufferWithSegments`` instance. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1356 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1357 The object conforms to the buffer protocol. ``.tobytes()`` can be called to |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1358 obtain a ``bytes`` instance with a copy of the backing bytes. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1359 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1360 BufferSegments |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1361 ^^^^^^^^^^^^^^ |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1362 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1363 This type represents an array of ``(offset, length)`` integers defining segments |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1364 within a ``BufferWithSegments``. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1365 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1366 The array members are 64-bit unsigned integers using host/native byte order. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1367 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1368 Instances conform to the buffer protocol. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1369 |
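To make the layout concrete, here is a pure-Python model of such an array. It is a sketch, not the library's implementation. It assumes each segment is stored as two unsigned 64-bit integers in native byte order, packed back to back (``struct`` format ``=QQ``). The helper names ``pack_segments`` and ``iter_segments`` are hypothetical, and ``memoryview`` slices stand in for ``BufferSegment`` objects.

```python
import struct

# One (offset, length) pair: two unsigned 64-bit integers in native byte
# order with standard sizes and no padding (an assumption of this sketch).
SEGMENT = struct.Struct("=QQ")


def pack_segments(pairs):
    """Pack (offset, length) pairs into one contiguous bytes object."""
    return b"".join(SEGMENT.pack(offset, length) for offset, length in pairs)


def iter_segments(backing, segments):
    """Yield a zero-copy memoryview slice of `backing` for each segment."""
    view = memoryview(backing)
    for offset, length in SEGMENT.iter_unpack(segments):
        if offset + length > len(view):
            raise ValueError("segment extends past end of backing buffer")
        yield view[offset:offset + length]


backing = b"foobarbaz"
segments = pack_segments([(0, 3), (3, 3), (6, 3)])
# tobytes() copies the referenced bytes out of the backing buffer.
chunks = [seg.tobytes() for seg in iter_segments(backing, segments)]
```

Each yielded slice refers to the backing bytes without copying them, which mirrors the idea of a segment being a reference to N bytes within a larger buffer.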
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1370 BufferWithSegmentsCollection |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1371 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1372 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1373 The ``BufferWithSegmentsCollection`` type represents a virtual spanning view |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1374 of multiple ``BufferWithSegments`` instances. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1375 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1376 Instances are constructed from 1 or more ``BufferWithSegments`` instances. The |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1377 resulting object behaves like an ordered sequence whose members are the |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1378 segments within each ``BufferWithSegments``. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1379 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1380 ``len()`` returns the number of segments within all ``BufferWithSegments`` |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1381 instances. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1382 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1383 ``o[index]`` and ``__getitem__(index)`` return the ``BufferSegment`` at |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1384 that offset as if all ``BufferWithSegments`` instances were a single |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1385 entity. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1386 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1387 If the object is composed of 2 ``BufferWithSegments`` instances with the |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1388 first having 2 segments and the second having 3 segments, then ``b[0]`` |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1389 and ``b[1]`` access segments in the first object and ``b[2]``, ``b[3]``, |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1390 and ``b[4]`` access segments from the second. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1391 |
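The index mapping described above can be modeled in a few lines of pure Python. This is a toy sketch under stated assumptions, not the library's implementation: ``SegmentCollectionModel`` is a hypothetical name, plain lists stand in for ``BufferWithSegments`` instances, and negative indices are simply rejected.

```python
import bisect
from itertools import accumulate


class SegmentCollectionModel:
    """Toy model: a flat, ordered view over several lists of segments."""

    def __init__(self, *groups):
        if not groups:
            raise ValueError("at least one group is required")
        self._groups = groups
        # Running totals, e.g. groups of sizes 2 and 3 give [2, 5].
        self._ends = list(accumulate(len(g) for g in groups))

    def __len__(self):
        # Total number of segments across all groups.
        return self._ends[-1]

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError("segment index out of range")
        # Find the first group whose running total exceeds the global index.
        group = bisect.bisect_right(self._ends, index)
        start = self._ends[group - 1] if group else 0
        return self._groups[group][index - start]


first = [b"a0", b"a1"]
second = [b"b0", b"b1", b"b2"]
b = SegmentCollectionModel(first, second)
```

Here ``b[0]`` and ``b[1]`` resolve into ``first``, while ``b[2]`` through ``b[4]`` resolve into ``second``, matching the 2 + 3 example in the text.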
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1392 Choosing an API |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1393 =============== |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1394 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1395 There are multiple APIs for performing compression and decompression. This is |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1396 because different applications have different needs and the library wants to |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1397 facilitate optimal use in as many use cases as possible. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1398 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1399 At a high level, APIs are divided into *one-shot* and *streaming*: either you |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1400 operate on all data at once or you operate on it piecemeal. |
31799
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1401 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1402 The *one-shot* APIs are useful for small data, where the input or output |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1403 size is known. (The size can come from a buffer length, file size, or |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1404 stored in the zstd frame header.) A limitation of the *one-shot* APIs is that |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1405 input and output must fit in memory simultaneously. For, say, a 4 GB input, |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1406 this is often not feasible. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1407 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1408 The *one-shot* APIs also perform all work as a single operation. So, if you |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1409 feed it large input, it could take a long time for the function to return. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1410 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1411 The streaming APIs do not have the limitations of the *one-shot* APIs. But the |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1412 price you pay for this flexibility is that they are more complex than a |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1413 single function call. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1414 |
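The trade-off can be sketched as follows. To keep the example runnable without third-party packages, it uses the standard library's ``zlib`` as a stand-in. python-zstandard documents ``ZstdCompressor.compressobj()`` as interface-compatible with ``zlib.compressobj``, so the same loop is expected to apply, but treat that substitution as an assumption. The helper name ``compress_stream`` is hypothetical.

```python
import io
import zlib


def compress_stream(compobj, source, chunk_size=65536):
    """Feed `source` to `compobj` piecemeal, yielding output as it appears.

    `compobj` is any object with zlib-style compress(data) and flush()
    methods; only one input chunk is held in memory at a time.
    """
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        out = compobj.compress(chunk)
        if out:
            yield out
    # flush() emits whatever the compressor is still buffering.
    tail = compobj.flush()
    if tail:
        yield tail


data = b"data to compress " * 10000

# One-shot: a single call; input and output are both fully in memory.
one_shot = zlib.compress(data)

# Streaming: the caller decides how much input is processed per step.
# zlib.compressobj() is a standard-library stand-in for this sketch.
streamed = b"".join(
    compress_stream(zlib.compressobj(), io.BytesIO(data), chunk_size=4096)
)
```

The one-shot call is shorter, while the streaming loop bounds memory use by the chunk size and lets the caller interleave other work between chunks.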
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1415 The streaming APIs put the caller in control of compression and decompression |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1416 behavior by allowing them to directly control either the input or output side |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1417 of the operation. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1418 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1419 With the *streaming input*, *compressor*, and *decompressor* APIs, the caller |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1420 has full control over the input to the compression or decompression stream. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1421 They can directly choose when new data is operated on. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1422 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1423 With the *streaming output* APIs, the caller has full control over the output |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1424 of the compression or decompression stream. It can choose when to receive |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1425 new data. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1426 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1427 When using the *streaming* APIs that operate on file-like or stream objects, |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1428 it is important to consider what happens in that object when I/O is requested. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1429 There is potential for long pauses as data is read from or written to the |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1430 underlying stream (say from interacting with a filesystem or network). This |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1431 could add considerable overhead. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1432 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1433 Thread Safety |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1434 ============= |
31799
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1435 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1436 ``ZstdCompressor`` and ``ZstdDecompressor`` instances have no guarantees |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1437 about thread safety. Do not operate on the same ``ZstdCompressor`` or |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1438 ``ZstdDecompressor`` instance simultaneously from different threads. It is |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1439 fine to have different threads call into a single instance, just not at the |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1440 same time. |
31799
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1441 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1442 Some operations require multiple function calls to complete, e.g. streaming |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1443 operations. A single ``ZstdCompressor`` or ``ZstdDecompressor`` cannot be used |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1444 for simultaneously active operations. For example, you must not start a streaming |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1445 operation when another streaming operation is already active. |
31799
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1446 |
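One way to respect these rules is to give each thread its own instance. The following is a minimal sketch of that pattern, not something the library provides: ``PerThread`` is a hypothetical helper, and ``FakeCompressor`` is a stand-in so the example runs without python-zstandard installed. In real code the factory would presumably be ``zstd.ZstdCompressor``.

```python
import threading


class PerThread:
    """Lazily create one object per thread from a zero-argument factory."""

    def __init__(self, factory):
        self._factory = factory
        self._local = threading.local()

    def get(self):
        try:
            return self._local.instance
        except AttributeError:
            # First call from this thread: build its private instance.
            self._local.instance = self._factory()
            return self._local.instance


class FakeCompressor:
    """Stand-in for a compressor; it only reverses the input bytes."""

    def compress(self, data):
        return data[::-1]


compressors = PerThread(FakeCompressor)
seen = []  # the instance each worker thread ended up using


def worker():
    cctx = compressors.get()
    assert cctx is compressors.get()  # same thread, same instance
    cctx.compress(b"payload")
    seen.append(cctx)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Sharing one instance behind a ``threading.Lock`` is the other obvious option. It uses less memory but serializes all compression, and a lock held per call does not by itself protect a multi-call streaming operation.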
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1447 The C extension releases the GIL during non-trivial calls into the zstd C |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1448 API. Non-trivial calls are notably compression and decompression. Trivial |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1449 calls are things like parsing frame parameters. Where the GIL is released |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1450 is considered an implementation detail and can change in any release. |
31799
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1451 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1452 APIs that accept bytes-like objects don't enforce that the underlying object |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1453 is read-only. However, it is assumed that the passed object is read-only for |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1454 the duration of the function call. It is possible to pass a mutable object |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1455 (like a ``bytearray``) to e.g. ``ZstdCompressor.compress()``, have the GIL |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1456 released, and mutate the object from another thread. Such a race condition |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1457 is a bug in the consumer of python-zstandard. Most Python data types are |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1458 immutable, so unless you are doing something fancy, you don't need to |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1459 worry about this. |
31799
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30924
diff
changeset
|
1460 |
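If a mutable buffer really is shared between threads, one simple defence is to take an immutable snapshot before handing it to a call that may release the GIL. This is a sketch of that idea only: ``snapshot`` is a hypothetical helper, and the copy costs memory proportional to the input.

```python
def snapshot(buf):
    """Return an immutable bytes copy of any bytes-like object."""
    return bytes(buf)


shared = bytearray(b"hello world")
frozen = snapshot(shared)

# A later mutation (possibly from another thread) changes `shared`
# but cannot affect the snapshot passed to the compressor.
shared[0:5] = b"HELLO"
```

Passing ``frozen`` rather than ``shared`` to the compression call removes the race, at the price of one extra copy.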
30444
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1461 Note on Zstandard's *Experimental* API |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1462 ====================================== |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1463 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1464 Many of the Zstandard APIs used by this module are marked as *experimental* |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1465 within the Zstandard project. |
30444
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1466 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1467 It is unclear how Zstandard's C API will evolve over time, especially with |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1468 regard to this *experimental* functionality. We will try to maintain |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1469 backwards compatibility at the Python API level. However, we cannot |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1470 guarantee this for things not under our control. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1471 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1472 Since a copy of the Zstandard source code is distributed with this |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1473 module and since we compile against it, the behavior of a specific |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1474 version of this module should be constant for all of time. So if you |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1475 pin the version of this module used in your projects (which is a Python |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31799
diff
changeset
|
1476 best practice), you should be shielded from unwanted future changes. |
30444
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1477 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1478 Donate |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1479 ====== |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1480 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1481 A lot of time has been invested into this project by the author. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1482 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1483 If you find this project useful and would like to thank the author for |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1484 their work, consider donating some money. Any amount is appreciated. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1485 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1486 .. image:: https://www.paypalobjects.com/en_US/i/btn/btn_donate_LG.gif |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1487 :target: https://www.paypal.com/cgi-bin/webscr?cmd=_donations&business=gregory%2eszorc%40gmail%2ecom&lc=US&item_name=python%2dzstandard&currency_code=USD&bn=PP%2dDonationsBF%3abtn_donate_LG%2egif%3aNonHosted |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1488 :alt: Donate via PayPal |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1489 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1490 .. |ci-status| image:: https://travis-ci.org/indygreg/python-zstandard.svg?branch=master |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1491 :target: https://travis-ci.org/indygreg/python-zstandard |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1492 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1493 .. |win-ci-status| image:: https://ci.appveyor.com/api/projects/status/github/indygreg/python-zstandard?svg=true |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1494 :target: https://ci.appveyor.com/project/indygreg/python-zstandard |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1495 :alt: Windows build status |