comparison mercurial/encoding.py @ 33927:853574db5b12
encoding: add fast path of from/tolocal() for ASCII strings
This is a micro-optimization, but it seems worthwhile since to/fromlocal() is
called many times and isasciistr() is cheap and simple.
We boldly assume that any non-ASCII character is encoded with at least one
8-bit byte (0x80-0xff). This isn't true for some email character sets (e.g.
ISO-2022-JP and UTF-7), but I believe no such encodings are used as a platform
default. Shift_JIS, a major troublemaker, is okay since its non-ASCII
characters start with a byte in the 0x80-0xff range.
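The assumption above can be checked with a minimal sketch. The `isasciistr()` below is an illustrative stand-in written for this example (Python 3), not necessarily the helper's actual implementation, and the sample character is an arbitrary choice.

```python
def isasciistr(s: bytes) -> bool:
    """Illustrative stand-in: True if every byte of s is below 0x80."""
    try:
        s.decode('ascii')
        return True
    except UnicodeDecodeError:
        return False


hiragana_a = '\u3042'  # arbitrary non-ASCII sample character

# Shift_JIS: the lead byte is in 0x80-0xff, so the ASCII check fails as desired.
sjis = hiragana_a.encode('shift_jis')

# ISO-2022-JP and UTF-7: only 7-bit bytes are used, so the ASCII check would
# wrongly treat the encoded text as plain ASCII.
iso2022jp = hiragana_a.encode('iso2022_jp')
utf7 = hiragana_a.encode('utf-7')

for name, data in [('shift_jis', sjis), ('iso2022_jp', iso2022jp), ('utf-7', utf7)]:
    print(name, data, isasciistr(data))
```

With a 7-bit encoding as the local encoding, the fast path would return such bytes unconverted, which is why the patch relies on those encodings not being platform defaults.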
(with mercurial repo)
$ export HGRCPATH=/dev/null HGPLAIN=
$ hg log --time --config experimental.stabilization=all > /dev/null
(original)
time: real 7.460 secs (user 7.420+0.000 sys 0.030+0.000)
time: real 7.670 secs (user 7.590+0.000 sys 0.080+0.000)
time: real 7.560 secs (user 7.510+0.000 sys 0.040+0.000)
(this patch)
time: real 7.340 secs (user 7.260+0.000 sys 0.060+0.000)
time: real 7.260 secs (user 7.210+0.000 sys 0.030+0.000)
time: real 7.310 secs (user 7.260+0.000 sys 0.060+0.000)
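As a rough summary of the timings above, the three "real" samples of each run can be averaged. This is plain arithmetic on the numbers listed, and three samples are too few to support any statistical claim.

```python
from statistics import mean

# "real" times in seconds, copied from the runs above
original = [7.460, 7.670, 7.560]
patched = [7.340, 7.260, 7.310]

saving = mean(original) - mean(patched)
percent = 100 * saving / mean(original)
print(f"mean original: {mean(original):.3f}s, mean patched: {mean(patched):.3f}s")
print(f"saving: {saving:.2f}s ({percent:.1f}%)")
```

The difference works out to about 0.26 seconds, or roughly 3.4% of the original mean.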
author | Yuya Nishihara <yuya@tcha.org> |
---|---|
date | Sun, 23 Apr 2017 13:06:23 +0900 |
parents | f4433f2713d0 |
children | 6c119dbfd0c0 |
comparison
33926:f4433f2713d0 | 33927:853574db5b12 |
---|---|
125 >>> l | 125 >>> l |
126 'foo: ?' | 126 'foo: ?' |
127 >>> fromlocal(l) # magically in utf-8 | 127 >>> fromlocal(l) # magically in utf-8 |
128 'foo: \\xc3\\xa4' | 128 'foo: \\xc3\\xa4' |
129 """ | 129 """ |
| | 130 |
| | 131 if isasciistr(s): |
| | 132 return s |
130 | 133 |
131 try: | 134 try: |
132 try: | 135 try: |
133 # make sure string is actually stored in UTF-8 | 136 # make sure string is actually stored in UTF-8 |
134 u = s.decode('UTF-8') | 137 u = s.decode('UTF-8') |
168 """ | 171 """ |
169 | 172 |
170 # can we do a lossless round-trip? | 173 # can we do a lossless round-trip? |
171 if isinstance(s, localstr): | 174 if isinstance(s, localstr): |
172 return s._utf8 | 175 return s._utf8 |
| | 176 if isasciistr(s): |
| | 177 return s |
173 | 178 |
174 try: | 179 try: |
175 u = s.decode(_sysstr(encoding), _sysstr(encodingmode)) | 180 u = s.decode(_sysstr(encoding), _sysstr(encodingmode)) |
176 return u.encode("utf-8") | 181 return u.encode("utf-8") |
177 except UnicodeDecodeError as inst: | 182 except UnicodeDecodeError as inst: |
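
The effect of the early return can be sketched in isolation. The functions below are hypothetical simplifications of the conversion to UTF-8 (no `localstr` handling, no error handling, encoding passed explicitly), and they use `bytes.isascii()` from Python 3.7+ in place of `isasciistr()`.

```python
def fromlocal_slow(s: bytes, encoding: str) -> bytes:
    """Hypothetical simplification: always decode, then re-encode as UTF-8."""
    return s.decode(encoding).encode('utf-8')


def fromlocal_fast(s: bytes, encoding: str) -> bytes:
    """Same conversion, but return pure-ASCII input unchanged."""
    if s.isascii():
        return s
    return fromlocal_slow(s, encoding)


# For ASCII-compatible local encodings, both paths agree on ASCII input.
for enc in ('latin-1', 'cp1252', 'euc_jp', 'utf-8'):
    assert fromlocal_fast(b'foo: bar', enc) == fromlocal_slow(b'foo: bar', enc)

# Non-ASCII input still goes through the full conversion.
print(fromlocal_fast(b'foo: \xe4', 'latin-1'))  # b'foo: \xc3\xa4'
```

The fast path skips the decode/encode round trip only when the result would be byte-for-byte identical, which holds as long as the local encoding is ASCII-compatible.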