annotate mercurial/pure/charencode.py @ 51901:f4733654f144

typing: add `from __future__ import annotations` to most files

Now that py36 is no longer supported, we can postpone annotation evaluation. This means that the quoting is usually optional (for things imported under the guard of `if typing.TYPE_CHECKING:` to avoid circular imports), and there's less overhead on startup[1].

There may be some missing here. I backed out 6000f5b25c9b (which removed the `from __future__ import ...` that was supporting py2), reverted the changes in `contrib/`, `doc/`, and `tests/`, and then ran:

    $ hg status -n --change . | \
        xargs sed -i -e 's/from __future__ import .*$/from __future__ import annotations/'

There were some minor tweaks needed when reviewing (mostly making the spacing around the import consistent, and `mercurial/testing/__init__.py` had a multiline import that wasn't fully rewritten).

[1] https://docs.python.org/3/whatsnew/3.7.html#pep-563-postponed-evaluation-of-annotations
author Matt Harbison <matt_harbison@yahoo.com>
date Mon, 16 Sep 2024 15:36:44 +0200
parents 43adbe03079b
children
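The commit description says postponed evaluation makes quoting optional for names imported under `if typing.TYPE_CHECKING:`. A minimal sketch of that effect follows. It is not taken from the Mercurial code base: the function `double` is hypothetical and the `decimal` import is only a stand-in for a module that would otherwise cause a circular import.

```python
from __future__ import annotations

import typing

if typing.TYPE_CHECKING:
    # Seen only by the type checker; this import never runs, so it cannot
    # take part in a circular import. `decimal` is just a stand-in module.
    from decimal import Decimal


def double(value: Decimal) -> Decimal:
    """Hypothetical helper; `Decimal` needs no quotes in the annotations."""
    return value + value


# The annotations are stored as strings and are not evaluated at definition
# time, which is why the missing runtime import of Decimal is not an error.
print(double.__annotations__)  # {'value': 'Decimal', 'return': 'Decimal'}
print(double(2))  # 4
```

Without the `from __future__` line, defining `double` would raise `NameError`, because `Decimal` is not bound at runtime.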
rev   line source
33782
f5fc54e7e467 encoding: drop circular import by proxying through '<policy>.charencode'
Yuya Nishihara <yuya@tcha.org>
parents:
diff changeset
1 # charencode.py - miscellaneous character encoding
f5fc54e7e467 encoding: drop circular import by proxying through '<policy>.charencode'
Yuya Nishihara <yuya@tcha.org>
parents:
diff changeset
2 #
46819
d4ba4d51f85f contributor: change mentions of mpm to olivia
Raphaël Gomès <rgomes@octobus.net>
parents: 45957
diff changeset
3 # Copyright 2005-2009 Olivia Mackall <olivia@selenic.com> and others
33782
f5fc54e7e467 encoding: drop circular import by proxying through '<policy>.charencode'
Yuya Nishihara <yuya@tcha.org>
parents:
diff changeset
4 #
f5fc54e7e467 encoding: drop circular import by proxying through '<policy>.charencode'
Yuya Nishihara <yuya@tcha.org>
parents:
diff changeset
5 # This software may be used and distributed according to the terms of the
f5fc54e7e467 encoding: drop circular import by proxying through '<policy>.charencode'
Yuya Nishihara <yuya@tcha.org>
parents:
diff changeset
6 # GNU General Public License version 2 or any later version.
f5fc54e7e467 encoding: drop circular import by proxying through '<policy>.charencode'
Yuya Nishihara <yuya@tcha.org>
parents:
diff changeset
7
51901
f4733654f144 typing: add `from __future__ import annotations` to most files
Matt Harbison <matt_harbison@yahoo.com>
parents: 51781
diff changeset
8 from __future__ import annotations
33782
f5fc54e7e467 encoding: drop circular import by proxying through '<policy>.charencode'
Yuya Nishihara <yuya@tcha.org>
parents:
diff changeset
9
33942
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33782
diff changeset
10 import array
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33782
diff changeset
11
43076
2372284d9457 formatting: blacken the codebase
Augie Fackler <augie@google.com>
parents: 34225
diff changeset
12 from .. import pycompat
2372284d9457 formatting: blacken the codebase
Augie Fackler <augie@google.com>
parents: 34225
diff changeset
13
33942
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33782
diff changeset
14
51781
43adbe03079b typing: add type hints to the `charencode` module
Matt Harbison <matt_harbison@yahoo.com>
parents: 48998
diff changeset
15 def isasciistr(s: bytes) -> bool:
33944
f4433f2713d0 encoding: add function to test if a str consists of ASCII characters
Yuya Nishihara <yuya@tcha.org>
parents: 33942
diff changeset
16     try:
f4433f2713d0 encoding: add function to test if a str consists of ASCII characters
Yuya Nishihara <yuya@tcha.org>
parents: 33942
diff changeset
17         s.decode('ascii')
f4433f2713d0 encoding: add function to test if a str consists of ASCII characters
Yuya Nishihara <yuya@tcha.org>
parents: 33942
diff changeset
18         return True
f4433f2713d0 encoding: add function to test if a str consists of ASCII characters
Yuya Nishihara <yuya@tcha.org>
parents: 33942
diff changeset
19     except UnicodeDecodeError:
f4433f2713d0 encoding: add function to test if a str consists of ASCII characters
Yuya Nishihara <yuya@tcha.org>
parents: 33942
diff changeset
20         return False
f4433f2713d0 encoding: add function to test if a str consists of ASCII characters
Yuya Nishihara <yuya@tcha.org>
parents: 33942
diff changeset
21
43076
2372284d9457 formatting: blacken the codebase
Augie Fackler <augie@google.com>
parents: 34225
diff changeset
22
51781
43adbe03079b typing: add type hints to the `charencode` module
Matt Harbison <matt_harbison@yahoo.com>
parents: 48998
diff changeset
23 def asciilower(s: bytes) -> bytes:
45957
89a2afe31e82 formating: upgrade to black 20.8b1
Augie Fackler <raf@durin42.com>
parents: 43554
diff changeset
24     """convert a string to lowercase if ASCII
33782
f5fc54e7e467 encoding: drop circular import by proxying through '<policy>.charencode'
Yuya Nishihara <yuya@tcha.org>
parents:
diff changeset
25
45957
89a2afe31e82 formating: upgrade to black 20.8b1
Augie Fackler <raf@durin42.com>
parents: 43554
diff changeset
26     Raises UnicodeDecodeError if non-ASCII characters are found."""
33782
f5fc54e7e467 encoding: drop circular import by proxying through '<policy>.charencode'
Yuya Nishihara <yuya@tcha.org>
parents:
diff changeset
27     s.decode('ascii')
f5fc54e7e467 encoding: drop circular import by proxying through '<policy>.charencode'
Yuya Nishihara <yuya@tcha.org>
parents:
diff changeset
28     return s.lower()
f5fc54e7e467 encoding: drop circular import by proxying through '<policy>.charencode'
Yuya Nishihara <yuya@tcha.org>
parents:
diff changeset
29
43076
2372284d9457 formatting: blacken the codebase
Augie Fackler <augie@google.com>
parents: 34225
diff changeset
30
51781
43adbe03079b typing: add type hints to the `charencode` module
Matt Harbison <matt_harbison@yahoo.com>
parents: 48998
diff changeset
31 def asciiupper(s: bytes) -> bytes:
45957
89a2afe31e82 formating: upgrade to black 20.8b1
Augie Fackler <raf@durin42.com>
parents: 43554
diff changeset
32     """convert a string to uppercase if ASCII
33782
f5fc54e7e467 encoding: drop circular import by proxying through '<policy>.charencode'
Yuya Nishihara <yuya@tcha.org>
parents:
diff changeset
33
45957
89a2afe31e82 formating: upgrade to black 20.8b1
Augie Fackler <raf@durin42.com>
parents: 43554
diff changeset
34     Raises UnicodeDecodeError if non-ASCII characters are found."""
33782
f5fc54e7e467 encoding: drop circular import by proxying through '<policy>.charencode'
Yuya Nishihara <yuya@tcha.org>
parents:
diff changeset
35     s.decode('ascii')
f5fc54e7e467 encoding: drop circular import by proxying through '<policy>.charencode'
Yuya Nishihara <yuya@tcha.org>
parents:
diff changeset
36     return s.upper()
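As a rough illustration of how the three ASCII helpers above behave, the sketch below restates `isasciistr` and `asciilower` from the listing so that it runs on its own, then calls them on sample byte strings. The sample inputs are made up for the example.

```python
def isasciistr(s: bytes) -> bool:
    """Return True if every byte in s is ASCII (restated from the listing)."""
    try:
        s.decode('ascii')
        return True
    except UnicodeDecodeError:
        return False


def asciilower(s: bytes) -> bytes:
    """Lowercase s, raising UnicodeDecodeError on non-ASCII input."""
    s.decode('ascii')  # used purely as a validation step
    return s.lower()


print(isasciistr(b'README'))             # True
print(isasciistr('é'.encode('utf-8')))   # False
print(asciilower(b'README.TXT'))         # b'readme.txt'

try:
    asciilower('É'.encode('utf-8'))
except UnicodeDecodeError:
    print('non-ASCII input rejected')
```

The `decode('ascii')` call discards its result: `bytes.lower()` already touches only ASCII letters, so the decode exists to reject non-ASCII input rather than to help with the case conversion.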
33942
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33782
diff changeset
37
43076
2372284d9457 formatting: blacken the codebase
Augie Fackler <augie@google.com>
parents: 34225
diff changeset
38
33942
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33782
diff changeset
39 _jsonmap = []
43077
687b865b95ad formatting: byteify all mercurial/ and hgext/ string literals
Augie Fackler <augie@google.com>
parents: 43076
diff changeset
40 _jsonmap.extend(b"\\u%04x" % x for x in range(32))
33942
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33782
diff changeset
41 _jsonmap.extend(pycompat.bytechr(x) for x in range(32, 127))
43077
687b865b95ad formatting: byteify all mercurial/ and hgext/ string literals
Augie Fackler <augie@google.com>
parents: 43076
diff changeset
42 _jsonmap.append(b'\\u007f')
687b865b95ad formatting: byteify all mercurial/ and hgext/ string literals
Augie Fackler <augie@google.com>
parents: 43076
diff changeset
43 _jsonmap[0x09] = b'\\t'
687b865b95ad formatting: byteify all mercurial/ and hgext/ string literals
Augie Fackler <augie@google.com>
parents: 43076
diff changeset
44 _jsonmap[0x0A] = b'\\n'
687b865b95ad formatting: byteify all mercurial/ and hgext/ string literals
Augie Fackler <augie@google.com>
parents: 43076
diff changeset
45 _jsonmap[0x22] = b'\\"'
687b865b95ad formatting: byteify all mercurial/ and hgext/ string literals
Augie Fackler <augie@google.com>
parents: 43076
diff changeset
46 _jsonmap[0x5C] = b'\\\\'
687b865b95ad formatting: byteify all mercurial/ and hgext/ string literals
Augie Fackler <augie@google.com>
parents: 43076
diff changeset
47 _jsonmap[0x08] = b'\\b'
687b865b95ad formatting: byteify all mercurial/ and hgext/ string literals
Augie Fackler <augie@google.com>
parents: 43076
diff changeset
48 _jsonmap[0x0C] = b'\\f'
687b865b95ad formatting: byteify all mercurial/ and hgext/ string literals
Augie Fackler <augie@google.com>
parents: 43076
diff changeset
49 _jsonmap[0x0D] = b'\\r'
33942
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33782
diff changeset
50 _paranoidjsonmap = _jsonmap[:]
43077
687b865b95ad formatting: byteify all mercurial/ and hgext/ string literals
Augie Fackler <augie@google.com>
parents: 43076
diff changeset
51 _paranoidjsonmap[0x3C] = b'\\u003c' # '<' (e.g. escape "</script>")
687b865b95ad formatting: byteify all mercurial/ and hgext/ string literals
Augie Fackler <augie@google.com>
parents: 43076
diff changeset
52 _paranoidjsonmap[0x3E] = b'\\u003e' # '>'
33942
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33782
diff changeset
53 _jsonmap.extend(pycompat.bytechr(x) for x in range(128, 256))
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33782
diff changeset
54
43076
2372284d9457 formatting: blacken the codebase
Augie Fackler <augie@google.com>
parents: 34225
diff changeset
55
51781
43adbe03079b typing: add type hints to the `charencode` module
Matt Harbison <matt_harbison@yahoo.com>
parents: 48998
diff changeset
56 def jsonescapeu8fast(u8chars: bytes, paranoid: bool) -> bytes:
33942
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33782
diff changeset
57     """Convert a UTF-8 byte string to JSON-escaped form (fast path)
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33782
diff changeset
58
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33782
diff changeset
59     Raises ValueError if non-ASCII characters have to be escaped.
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33782
diff changeset
60     """
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33782
diff changeset
61     if paranoid:
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33782
diff changeset
62         jm = _paranoidjsonmap
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33782
diff changeset
63     else:
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33782
diff changeset
64         jm = _jsonmap
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33782
diff changeset
65     try:
43077
687b865b95ad formatting: byteify all mercurial/ and hgext/ string literals
Augie Fackler <augie@google.com>
parents: 43076
diff changeset
66         return b''.join(jm[x] for x in bytearray(u8chars))
33942
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33782
diff changeset
67     except IndexError:
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33782
diff changeset
68         raise ValueError
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33782
diff changeset
69
43076
2372284d9457 formatting: blacken the codebase
Augie Fackler <augie@google.com>
parents: 34225
diff changeset
70
48998
b677bccf74b9 charencode: remove Python 2 support code
Gregory Szorc <gregory.szorc@gmail.com>
parents: 48966
diff changeset
71 _utf8strict = r'surrogatepass'
34225
aa877860d4d7 py3: use 'surrogatepass' error handler to process U+DCxx transparently
Yuya Nishihara <yuya@tcha.org>
parents: 34224
diff changeset
72
43076
2372284d9457 formatting: blacken the codebase
Augie Fackler <augie@google.com>
parents: 34225
diff changeset
73
51781
43adbe03079b typing: add type hints to the `charencode` module
Matt Harbison <matt_harbison@yahoo.com>
parents: 48998
diff changeset
74 def jsonescapeu8fallback(u8chars: bytes, paranoid: bool) -> bytes:
33942
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33782
diff changeset
75     """Convert a UTF-8 byte string to JSON-escaped form (slow path)
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33782
diff changeset
76
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33782
diff changeset
77     Escapes all non-ASCII characters no matter if paranoid is False.
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33782
diff changeset
78     """
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33782
diff changeset
79     if paranoid:
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33782
diff changeset
80         jm = _paranoidjsonmap
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33782
diff changeset
81     else:
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33782
diff changeset
82         jm = _jsonmap
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33782
diff changeset
83     # non-BMP char is represented as UTF-16 surrogate pair
34225
aa877860d4d7 py3: use 'surrogatepass' error handler to process U+DCxx transparently
Yuya Nishihara <yuya@tcha.org>
parents: 34224
diff changeset
84     u16b = u8chars.decode('utf-8', _utf8strict).encode('utf-16', _utf8strict)
43554
9f70512ae2cf cleanup: remove pointless r-prefixes on single-quoted strings
Augie Fackler <augie@google.com>
parents: 43077
diff changeset
85     u16codes = array.array('H', u16b)
33942
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33782
diff changeset
86     u16codes.pop(0)  # drop BOM
43077
687b865b95ad formatting: byteify all mercurial/ and hgext/ string literals
Augie Fackler <augie@google.com>
parents: 43076
diff changeset
87     return b''.join(jm[x] if x < 128 else b'\\u%04x' % x for x in u16codes)