comparison mercurial/encoding.py @ 13940:b7b26e54e37a stable

encoding: avoid localstr when a string can be encoded losslessly (issue2763)

localstr's hash method exists to prevent bogus matching on lossy local encodings. For instance, we don't want 'caf?' to match 'caf?' in an ASCII locale. But when café can be losslessly encoded in the local charset, we can simply use a normal string and avoid the hashing trick.

This avoids using localstr's hash method, which would prevent a match between
author Matt Mackall <mpm@selenic.com>
date Fri, 15 Apr 2011 23:45:41 -0500
parents 120eccaaa522
children e38846a79a23
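
To make the 'caf?' example from the commit message concrete, here is a minimal Python 3 sketch (the changeset itself is Python 2). The two sample names are assumptions chosen for illustration; they do not come from the changeset.

```python
# Two distinct names that differ only in one non-ASCII character.
# Both sample strings are illustrative assumptions.
a = "caf\u00e9"  # ends in e-acute
b = "caf\u00e8"  # ends in e-grave

# With errors="replace", every character the codec cannot
# represent is encoded as b"?".
lossy_a = a.encode("ascii", "replace")
lossy_b = b.encode("ascii", "replace")

print(lossy_a, lossy_b)    # both are b'caf?'
print(lossy_a == lossy_b)  # True: the local forms collide
print(a == b)              # False: the original names differ
```

Comparing only the local byte strings would treat the two names as equal, which is the bogus matching the commit message describes.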
comparison
--- mercurial/encoding.py @ 13937:5f126c01ebfa
+++ mercurial/encoding.py @ 13940:b7b26e54e37a
@@ -93,15 +93,19 @@
     """
 
     for e in ('UTF-8', fallbackencoding):
         try:
             u = s.decode(e) # attempt strict decoding
-            if e == 'UTF-8':
-                return localstr(s, u.encode(encoding, "replace"))
+            r = u.encode(encoding, "replace")
+            if u == r.decode(encoding):
+                # r is a safe, non-lossy encoding of s
+                return r
+            elif e == 'UTF-8':
+                return localstr(s, r)
             else:
-                return localstr(u.encode('UTF-8'),
-                                u.encode(encoding, "replace"))
+                return localstr(u.encode('UTF-8'), r)
+
         except LookupError, k:
             raise error.Abort("%s, please check your locale settings" % k)
         except UnicodeDecodeError:
             pass
     u = s.decode("utf-8", "replace") # last ditch
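
The check this change adds (encode with "replace", decode again, compare with the original text) can be tried in isolation. The following is a minimal Python 3 sketch, not Mercurial's code: the helper name `encodes_losslessly` and the sample strings are assumptions made for illustration.

```python
def encodes_losslessly(u, encoding):
    """Encode text u into the local encoding and report whether the
    result round-trips back to u unchanged.

    Hypothetical helper: it mirrors only the round-trip test in the
    diff above, not the surrounding decoding loop or localstr.
    """
    # Characters the codec cannot represent are encoded as b"?".
    r = u.encode(encoding, "replace")
    # If decoding r gives back u, no character was replaced.
    return r, r.decode(encoding) == u


# Latin-1 can represent e-acute, so the encoding is lossless.
print(encodes_losslessly("caf\u00e9", "latin-1"))  # (b'caf\xe9', True)

# ASCII cannot, so the character is replaced and the check fails.
print(encodes_losslessly("caf\u00e9", "ascii"))    # (b'caf?', False)

# Plain ASCII text always round-trips.
print(encodes_losslessly("cafe", "ascii"))         # (b'cafe', True)
```

In the diff, a successful round trip is the case that returns the plain string `r`; a failed one still goes through `localstr`.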