mercurial-scm/hg: mercurial/posix.py comparison

comparison mercurial/posix.py @ 26876:b8381832ce2b

posix: use getutf8char to handle OS X filename percent-escaping

This replaces an open-coded utf-8 parser that was ignoring subtle issues like overlong encodings.

author	Matt Mackall <mpm@selenic.com>
date	Thu, 05 Nov 2015 17:09:00 -0600
parents	99b6afff09ae
children	8b2fbe3f59b1
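
The "overlong encodings" mentioned in the description are byte sequences that have the shape of a UTF-8 character (a lead byte followed by the right number of continuation bytes) but encode a code point that fits in fewer bytes. A parser that only checks byte ranges accepts them, while a strict decoder rejects them. The Python 3 sketch below illustrates the difference. `shape_only_ok` and `is_strict_utf8` are hypothetical helper names written for this note, not Mercurial functions; the byte ranges in `shape_only_ok` are modelled on the ones visible in the removed code.

```python
def shape_only_ok(seq: bytes) -> bool:
    """Range check only: lead byte plus the expected number of continuation bytes."""
    lead = seq[0]
    if lead < 0x80:
        need = 0          # ASCII
    elif 0xC2 <= lead < 0xE0:
        need = 1          # two-byte sequence
    elif 0xE0 <= lead < 0xF0:
        need = 2          # three-byte sequence
    elif 0xF0 <= lead < 0xF5:
        need = 3          # four-byte sequence
    else:
        return False      # stray continuation byte or invalid lead byte
    tail = seq[1:]
    return len(tail) == need and all(0x80 <= b < 0xC0 for b in tail)


def is_strict_utf8(seq: bytes) -> bool:
    """Ask Python's strict UTF-8 decoder, which rejects overlong forms."""
    try:
        seq.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Three-byte overlong form of "/" (U+002F); the only valid encoding is b"/".
overlong_slash = b"\xe0\x80\xaf"

print(shape_only_ok(overlong_slash))   # True
print(is_strict_utf8(overlong_slash))  # False
```

Bytes that pass the range check but fail strict decoding are appended unescaped, so a later strict `decode('utf-8')` on the result can still raise.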

comparison

-:cf47bdb2183c
+:b8381832ce2b
         try:
             u = path.decode('utf-8')
         except UnicodeDecodeError:
             # OS X percent-encodes any bytes that aren't valid utf-8
             s = ''
-            g = ''
-            l = 0
-            for c in path:
-                o = ord(c)
-                if l and o < 128 or o >= 192:
-                    # we want a continuation byte, but didn't get one
-                    s += ''.join(["%%%02X" % ord(x) for x in g])
-                    g = ''
-                    l = 0
-                if l == 0 and o < 128:
-                    # ascii
-                    s += c
-                elif l == 0 and 194 <= o < 245:
-                    # valid leading bytes
-                    if o < 224:
-                        l = 1
-                    elif o < 240:
-                        l = 2
-                    else:
-                        l = 3
-                    g = c
-                elif l > 0 and 128 <= o < 192:
-                    # valid continuations
-                    g += c
-                    l -= 1
-                    if not l:
-                        s += g
-                        g = ''
-                else:
-                    # invalid
-                    s += "%%%02X" % o
-            # any remaining partial characters
-            s += ''.join(["%%%02X" % ord(x) for x in g])
+            pos = 0
+            l = len(path)
+            while pos < l:
+                try:
+                    c = encoding.getutf8char(path, pos)
+                    pos += len(c)
+                except ValueError:
+                    c = '%%%02X' % ord(path[pos])
+                    pos += 1
+                s += c
             u = s.decode('utf-8')
         # Decompose then lowercase (HFS+ technote specifies lower)
         enc = unicodedata.normalize('NFD', u).lower().encode('utf-8')
         # drop HFS+ ignored characters
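
To try the escaping loop outside Mercurial, here is a minimal, self-contained Python 3 sketch. `get_utf8_char` is a hypothetical stand-in for `encoding.getutf8char`, not Mercurial's implementation. It assumes only what the diff relies on: the helper returns the bytes of one valid character starting at `pos`, or raises `ValueError`. The name `percent_escape_invalid` is also invented for this illustration.

```python
import unicodedata


def get_utf8_char(data: bytes, pos: int) -> bytes:
    """Return the bytes of the UTF-8 character at pos, or raise ValueError.

    Hypothetical stand-in for Mercurial's encoding.getutf8char.
    """
    lead = data[pos]
    if lead < 0x80:
        length = 1
    elif 0xC0 <= lead < 0xE0:
        length = 2
    elif 0xE0 <= lead < 0xF0:
        length = 3
    elif 0xF0 <= lead < 0xF8:
        length = 4
    else:
        raise ValueError("invalid UTF-8 lead byte at offset %d" % pos)
    chunk = data[pos:pos + length]
    # Strict validation: UnicodeDecodeError is a subclass of ValueError, so
    # overlong, truncated, and otherwise malformed sequences all surface here.
    chunk.decode("utf-8")
    return chunk


def percent_escape_invalid(path: bytes) -> bytes:
    """Copy valid characters through and replace every other byte with %XX."""
    out = b""
    pos = 0
    end = len(path)
    while pos < end:
        try:
            c = get_utf8_char(path, pos)
            pos += len(c)
        except ValueError:
            c = b"%%%02X" % path[pos]  # indexing bytes yields an int in Python 3
            pos += 1
        out += c
    return out


escaped = percent_escape_invalid(b"Caf\xc3\x89\xff")
print(escaped)  # b'Caf\xc3\x89%FF'
# The escaped bytes are now valid UTF-8, so the NFD + lowercase step can run.
print(unicodedata.normalize("NFD", escaped.decode("utf-8")).lower().encode("utf-8"))
```

With this stand-in, the overlong sequence `b"\xe0\x80\xaf"` comes out as `b"%E0%80%AF"`, and a truncated sequence such as `b"\xe3\x81"` comes out as `b"%E3%81"`. In both cases the result decodes cleanly.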
