comparison contrib/byteify-strings.py @ 38395:1d68fd5f614a

byteify-strings: do not rewrite system string literals to u''. It would make things worse on Python 2, because unicode processing is generally slower than byte-string processing. We should just leave system strings unmodified.
author Yuya Nishihara <yuya@tcha.org>
date Thu, 31 May 2018 23:44:35 +0900
parents f701bc936e7f
children 47dd23e6b116
comparison
equal deleted inserted replaced
38394:f701bc936e7f 38395:1d68fd5f614a
25 Returns a generator of possibly rewritten tokens. 25 Returns a generator of possibly rewritten tokens.
26 26
27 The input token list may be mutated as part of processing. However, 27 The input token list may be mutated as part of processing. However,
28 its changes do not necessarily match the output token stream. 28 its changes do not necessarily match the output token stream.
29 """ 29 """
30 sysstrtokens = set()
31
30 # The following utility functions access the tokens list and i index of 32 # The following utility functions access the tokens list and i index of
31 # the for i, t enumerate(tokens) loop below 33 # the for i, t enumerate(tokens) loop below
32 def _isop(j, *o): 34 def _isop(j, *o):
33 """Assert that tokens[j] is an OP with one of the given values""" 35 """Assert that tokens[j] is an OP with one of the given values"""
34 try: 36 try:
60 elif _isop(j, ',') and nested == 0: 62 elif _isop(j, ',') and nested == 0:
61 n -= 1 63 n -= 1
62 64
63 return None 65 return None
64 66
65 def _ensureunicode(j): 67 def _ensuresysstr(j):
66 """Make sure the token at j is a unicode string 68 """Make sure the token at j is a system string
67 69
68 This rewrites a string token to include the unicode literal prefix 70 Remember the given token so the string transformer won't add
69 so the string transformer won't add the byte prefix. 71 the byte prefix.
70 72
71 Ignores tokens that are not strings. Assumes bounds checking has 73 Ignores tokens that are not strings. Assumes bounds checking has
72 already been done. 74 already been done.
73 75
74 """ 76 """
75 st = tokens[j] 77 st = tokens[j]
76 if st.type == token.STRING and st.string.startswith(("'", '"')): 78 if st.type == token.STRING and st.string.startswith(("'", '"')):
77 tokens[j] = st._replace(string='u%s' % st.string) 79 sysstrtokens.add(st)
78 80
79 for i, t in enumerate(tokens): 81 for i, t in enumerate(tokens):
80 # Convert most string literals to byte literals. String literals 82 # Convert most string literals to byte literals. String literals
81 # in Python 2 are bytes. String literals in Python 3 are unicode. 83 # in Python 2 are bytes. String literals in Python 3 are unicode.
82 # Most strings in Mercurial are bytes and unicode strings are rare. 84 # Most strings in Mercurial are bytes and unicode strings are rare.
83 # Rather than rewrite all string literals to use ``b''`` to indicate 85 # Rather than rewrite all string literals to use ``b''`` to indicate
84 # byte strings, we apply this token transformer to insert the ``b`` 86 # byte strings, we apply this token transformer to insert the ``b``
85 # prefix nearly everywhere. 87 # prefix nearly everywhere.
86 if t.type == token.STRING: 88 if t.type == token.STRING and t not in sysstrtokens:
87 s = t.string 89 s = t.string
88 90
89 # Preserve docstrings as string literals. This is inconsistent 91 # Preserve docstrings as string literals. This is inconsistent
90 # with regular unprefixed strings. However, the 92 # with regular unprefixed strings. However, the
91 # "from __future__" parsing (which allows a module docstring to 93 # "from __future__" parsing (which allows a module docstring to
115 # *attr() builtins don't accept byte strings to 2nd argument. 117 # *attr() builtins don't accept byte strings to 2nd argument.
116 if (fn in ('getattr', 'setattr', 'hasattr', 'safehasattr') and 118 if (fn in ('getattr', 'setattr', 'hasattr', 'safehasattr') and
117 not _isop(i - 1, '.')): 119 not _isop(i - 1, '.')):
118 arg1idx = _findargnofcall(1) 120 arg1idx = _findargnofcall(1)
119 if arg1idx is not None: 121 if arg1idx is not None:
120 _ensureunicode(arg1idx) 122 _ensuresysstr(arg1idx)
121 123
122 # .encode() and .decode() on str/bytes/unicode don't accept 124 # .encode() and .decode() on str/bytes/unicode don't accept
123 # byte strings on Python 3. 125 # byte strings on Python 3.
124 elif fn in ('encode', 'decode') and _isop(i - 1, '.'): 126 elif fn in ('encode', 'decode') and _isop(i - 1, '.'):
125 for argn in range(2): 127 for argn in range(2):
126 argidx = _findargnofcall(argn) 128 argidx = _findargnofcall(argn)
127 if argidx is not None: 129 if argidx is not None:
128 _ensureunicode(argidx) 130 _ensuresysstr(argidx)
129 131
130 # It changes iteritems/values to items/values as they are not 132 # It changes iteritems/values to items/values as they are not
131 # present in Python 3 world. 133 # present in Python 3 world.
132 elif opts['dictiter'] and fn in ('iteritems', 'itervalues'): 134 elif opts['dictiter'] and fn in ('iteritems', 'itervalues'):
133 yield t._replace(string=fn[4:]) 135 yield t._replace(string=fn[4:])