comparison mercurial/utils/stringutil.py @ 38478:96f65bdf0bf4
stringutil: add a new function to do minimal regex escaping
Per https://bugs.python.org/issue29995, re.escape() used to
over-escape regular expression strings, but in Python 3.7 that's been
fixed, which also improved the performance of re.escape(). Since it's
both an output change for us *and* a performance win, let's just
effectively backport the new behavior to hg on all Python versions.
Differential Revision: https://phab.mercurial-scm.org/D3841
author | Augie Fackler <augie@google.com> |
---|---|
date | Tue, 26 Jun 2018 10:33:52 -0400 |
parents | fbb2eddea4d2 |
children | de275ab362cb |
38477:622f79e3a1cb | 38478:96f65bdf0bf4 |
---|---|
20 from .. import ( | 20 from .. import ( |
21 encoding, | 21 encoding, |
22 error, | 22 error, |
23 pycompat, | 23 pycompat, |
24 ) | 24 ) |
| | 25 |
| | 26 # regex special chars pulled from https://bugs.python.org/issue29995 |
| | 27 # which was part of Python 3.7. |
| | 28 _respecial = pycompat.bytestr(b'()[]{}?*+-|^$\\.# \t\n\r\v\f') |
| | 29 _regexescapemap = {ord(i): (b'\\' + i).decode('latin1') for i in _respecial} |
| | 30 |
| | 31 def reescape(pat): |
| | 32 """Drop-in replacement for re.escape.""" |
| | 33 # NOTE: it is intentional that this works on unicodes and not |
| | 34 # bytes, as it's only possible to do the escaping with |
| | 35 # unicode.translate, not bytes.translate. Sigh. |
| | 36 wantuni = True |
| | 37 if isinstance(pat, bytes): |
| | 38 wantuni = False |
| | 39 pat = pat.decode('latin1') |
| | 40 pat = pat.translate(_regexescapemap) |
| | 41 if wantuni: |
| | 42 return pat |
| | 43 return pat.encode('latin1') |
25 | 44 |
26 def pprint(o, bprefix=False): | 45 def pprint(o, bprefix=False): |
27 """Pretty print an object.""" | 46 """Pretty print an object.""" |
28 if isinstance(o, bytes): | 47 if isinstance(o, bytes): |
29 if bprefix: | 48 if bprefix: |
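
The comparison view drops Python indentation, so here is a minimal standalone sketch of the inserted `reescape` logic for Python 3. It is not the code as committed: it assumes plain `str` and `bytes` literals in place of `pycompat.bytestr`, and the names `_RESPECIAL` and `_ESCAPE_MAP` are hypothetical stand-ins for `_respecial` and `_regexescapemap`.

```python
import re

# Same special-character set as the changeset: regex metacharacters plus
# '#' and ASCII whitespace (the latter two matter under re.VERBOSE).
_RESPECIAL = '()[]{}?*+-|^$\\.# \t\n\r\v\f'

# str.translate expects a mapping of {code point: replacement string}.
_ESCAPE_MAP = {ord(c): '\\' + c for c in _RESPECIAL}


def reescape(pat):
    """Backslash-escape only the regex-special characters in pat."""
    if isinstance(pat, bytes):
        # bytes.translate can only map one byte to one byte, so go through
        # str. latin1 maps bytes 0-255 to code points 0-255 one-to-one,
        # which makes the round trip lossless for arbitrary bytes.
        return pat.decode('latin1').translate(_ESCAPE_MAP).encode('latin1')
    return pat.translate(_ESCAPE_MAP)


# '/' is left alone, '.' is escaped.
print(reescape('a/b.c'))
# bytes in, bytes out.
print(reescape(b'foo-bar baz'))
# The escaped pattern matches the original text literally.
print(bool(re.fullmatch(reescape('a.b*c (d)'), 'a.b*c (d)')))
```

The return type follows the input type, which is what lets the function stand in for `re.escape` at call sites that pass either bytes or unicode.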