comparison mercurial/encoding.py @ 33927:853574db5b12

encoding: add fast path of from/tolocal() for ASCII strings

This is a micro optimization, but it seems worthwhile since to/fromlocal() is
called many times and isasciistr() is cheap and simple.

We boldly assume that any non-ASCII character has at least one 8-bit byte.
This isn't true for some email character sets (e.g. ISO-2022-JP and UTF-7),
but I believe no such encodings are used as a platform default. Shift_JIS,
a major headache, is okay because it should have a leading byte in the
0x80-0xff range.

(with mercurial repo)
  $ export HGRCPATH=/dev/null HGPLAIN=
  $ hg log --time --config experimental.stabilization=all > /dev/null

(original)
  time: real 7.460 secs (user 7.420+0.000 sys 0.030+0.000)
  time: real 7.670 secs (user 7.590+0.000 sys 0.080+0.000)
  time: real 7.560 secs (user 7.510+0.000 sys 0.040+0.000)

(this patch)
  time: real 7.340 secs (user 7.260+0.000 sys 0.060+0.000)
  time: real 7.260 secs (user 7.210+0.000 sys 0.030+0.000)
  time: real 7.310 secs (user 7.260+0.000 sys 0.060+0.000)
author Yuya Nishihara <yuya@tcha.org>
date Sun, 23 Apr 2017 13:06:23 +0900
parents f4433f2713d0
children 6c119dbfd0c0
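The assumption in the commit message (non-ASCII text always produces at least one byte in 0x80-0xff) can be checked against a few codecs. The following is a minimal sketch, not Mercurial code: `is_ascii_bytes` is a hypothetical stand-in for `isasciistr()`, and the sample text and codec list are chosen only for illustration.

```python
def is_ascii_bytes(s: bytes) -> bool:
    """Hypothetical stand-in for isasciistr(): True if all bytes are 7-bit."""
    try:
        s.decode("ascii")
        return True
    except UnicodeDecodeError:
        return False


# U+65E5 U+672C U+8A9E (three kanji), spelled with escapes so that this
# source file itself stays pure ASCII.
NON_ASCII_TEXT = "\u65e5\u672c\u8a9e"


def looks_ascii_after_encoding(text: str, encoding: str) -> bool:
    """Encode text and report whether the resulting bytes are all 7-bit."""
    return is_ascii_bytes(text.encode(encoding))


if __name__ == "__main__":
    # The assumption holds for the first three codecs and fails for the
    # last two, which represent non-ASCII text using 7-bit bytes only.
    for name in ("utf-8", "shift_jis", "euc_jp", "iso2022_jp", "utf-7"):
        print(name, looks_ascii_after_encoding(NON_ASCII_TEXT, name))
```

For ISO-2022-JP and UTF-7 the encoded bytes pass the ASCII check even though the text is not ASCII, which is why the fast path is only safe when such codecs are not the local encoding.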
--- mercurial/encoding.py @ 33926:f4433f2713d0
+++ mercurial/encoding.py @ 33927:853574db5b12
@@ -125,10 +125,13 @@
     >>> l
     'foo: ?'
     >>> fromlocal(l) # magically in utf-8
     'foo: \\xc3\\xa4'
     """
+
+    if isasciistr(s):
+        return s
 
     try:
         try:
             # make sure string is actually stored in UTF-8
             u = s.decode('UTF-8')
@@ -168,10 +171,12 @@
     """
 
     # can we do a lossless round-trip?
     if isinstance(s, localstr):
         return s._utf8
+    if isasciistr(s):
+        return s
 
     try:
         u = s.decode(_sysstr(encoding), _sysstr(encodingmode))
         return u.encode("utf-8")
     except UnicodeDecodeError as inst:
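The shape of the change can be illustrated outside Mercurial with a simplified sketch. Everything here is an assumption made for illustration: the function names, the fixed `latin-1` local encoding, and the `"replace"` error mode are hypothetical, and the real `tolocal()`/`fromlocal()` additionally handle `localstr` round-tripping and error reporting that this sketch omits.

```python
def is_ascii_bytes(s: bytes) -> bool:
    """Hypothetical stand-in for isasciistr(): True if all bytes are 7-bit."""
    try:
        s.decode("ascii")
        return True
    except UnicodeDecodeError:
        return False


def from_local_sketch(s: bytes, local_encoding: str = "latin-1") -> bytes:
    """Convert local-encoded bytes to UTF-8 bytes (simplified)."""
    if is_ascii_bytes(s):
        # Fast path: pure ASCII is identical in UTF-8 and in any
        # ASCII-compatible local encoding, so no transcoding is needed.
        return s
    return s.decode(local_encoding).encode("utf-8")


def to_local_sketch(s: bytes, local_encoding: str = "latin-1") -> bytes:
    """Convert UTF-8 bytes to local-encoded bytes (simplified)."""
    if is_ascii_bytes(s):
        # Same fast path in the opposite direction.
        return s
    return s.decode("utf-8").encode(local_encoding, "replace")


if __name__ == "__main__":
    print(from_local_sketch(b"foo"))
    print(from_local_sketch(b"foo: \xe4"))
    print(to_local_sketch(b"foo: \xc3\xa4"))
```

On the fast path the input object is returned unchanged, so the saving is the decode/encode round trip that would otherwise run for every ASCII-only string.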