comparison mercurial/encoding.py @ 37990:57b0c7221dba
encoding: fix toutf8b() to resurrect lossy characters even if "\xed" in it
If 's' is a localstr, 's._utf8' must be returned to get the original UTF-8
sequence back. Because of this, it was totally wrong to test '"\xed" not
in s' first: that inspects the lossy local bytes, and the test should have
been either '"\xed" not in s._utf8' or omitted entirely.
This patch moves the localstr handling to the top, since the validity of
's._utf8' should already have been checked by encoding.tolocal().
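The failure mode described above can be reproduced outside Mercurial with a small Python 3 sketch. `localstr` here is a simplified stand-in modeled on Mercurial's class of the same name (local-encoded bytes that also carry the original UTF-8 in `._utf8`). `_escape_utf8b`, `toutf8b_old`, and `toutf8b_new` are hypothetical helpers that mirror only the control flow of the code before and after this patch. The escaping fallback uses Python 3's `surrogateescape`/`surrogatepass` error handlers instead of Mercurial's own byte-by-byte loop, a strict `"utf-8"` decode stands in for `_utf8strict`, and Latin-1 is assumed as the local encoding. `bytes.isascii()` needs Python 3.7 or later.

```python
# Hypothetical, simplified stand-ins for Mercurial internals (Python 3.7+).
class localstr(bytes):
    """Local-encoded bytes that also remember the original UTF-8 in ._utf8."""

    def __new__(cls, u: bytes, l: bytes) -> "localstr":
        s = bytes.__new__(cls, l)
        s._utf8 = u
        return s


def _escape_utf8b(s: bytes) -> bytes:
    # Map each undecodable byte to U+DC80..U+DCFF and emit it as UTF-8.
    return s.decode("utf-8", "surrogateescape").encode("utf-8", "surrogatepass")


def toutf8b_old(s: bytes) -> bytes:
    # Pre-patch control flow: the localstr branch is only reached when
    # the *local* (lossy) bytes happen not to contain b"\xed".
    if not isinstance(s, localstr) and s.isascii():
        return s
    if b"\xed" not in s:
        if isinstance(s, localstr):
            return s._utf8
        try:
            s.decode("utf-8")
            return s
        except UnicodeDecodeError:
            pass
    return _escape_utf8b(bytes(s))


def toutf8b_new(s: bytes) -> bytes:
    # Patched control flow: a localstr always yields its stored UTF-8.
    if isinstance(s, localstr):
        return s._utf8
    elif s.isascii():
        return s
    if b"\xed" not in s:
        try:
            s.decode("utf-8")
            return s
        except UnicodeDecodeError:
            pass
    return _escape_utf8b(s)


if __name__ == "__main__":
    text = "í€"
    # "€" does not exist in Latin-1, so the local form is lossy: b"\xed?"
    s = localstr(text.encode("utf-8"), text.encode("latin-1", "replace"))
    print(toutf8b_old(s))  # b'\xed\xb3\xad?'  (escaped lossy local bytes)
    print(toutf8b_new(s))  # b'\xc3\xad\xe2\x82\xac'  (original UTF-8)
```

Because "í" is `0xED` in Latin-1, the lossy local bytes contain `b"\xed"`, so the old flow skips the localstr branch and escapes `b"\xed?"`; the "€" can no longer be recovered. The new flow returns the stored UTF-8 before looking at the local bytes at all.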
author | Yuya Nishihara <yuya@tcha.org> |
---|---|
date | Sun, 22 Apr 2018 11:38:53 +0900 |
parents | d4c760c997cd |
children | 3ea3c96ada54 |
| line | 37989:bfe8ef6e370e | line | 37990:57b0c7221dba |
|---|---|---|---|
| 502 | `    arbitrary bytes into an internal Unicode format that can be` | 502 | `    arbitrary bytes into an internal Unicode format that can be` |
| 503 | `    re-encoded back into the original. Here we are exposing the` | 503 | `    re-encoded back into the original. Here we are exposing the` |
| 504 | `    internal surrogate encoding as a UTF-8 string.)` | 504 | `    internal surrogate encoding as a UTF-8 string.)` |
| 505 | `    '''` | 505 | `    '''` |
| 506 | | 506 | |
| 507 | `    if not isinstance(s, localstr) and isasciistr(s):` | 507 | `    if isinstance(s, localstr):` |
| | | 508 | `        # assume that the original UTF-8 sequence would never contain` |
| | | 509 | `        # invalid characters in U+DCxx range` |
| | | 510 | `        return s._utf8` |
| | | 511 | `    elif isasciistr(s):` |
| 508 | `        return s` | 512 | `        return s` |
| 509 | `    if "\xed" not in s:` | 513 | `    if "\xed" not in s:` |
| 510 | `        if isinstance(s, localstr):` | | |
| 511 | `            return s._utf8` | | |
| 512 | `        try:` | 514 | `        try:` |
| 513 | `            s.decode('utf-8', _utf8strict)` | 515 | `            s.decode('utf-8', _utf8strict)` |
| 514 | `            return s` | 516 | `            return s` |
| 515 | `        except UnicodeDecodeError:` | 517 | `        except UnicodeDecodeError:` |
| 516 | `            pass` | 518 | `            pass` |
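The docstring's idea of UTF-8b (mapping arbitrary bytes into a Unicode form that re-encodes to the original, then exposing that form as UTF-8) can be illustrated with Python 3's standard `surrogateescape` and `surrogatepass` error handlers. This is a sketch under that assumption, not Mercurial's implementation, which walks the string byte by byte; `to_utf8b` and `from_utf8b` are hypothetical names. Every escaped byte comes out as a three-byte sequence starting with `\xed`, which is presumably why `toutf8b()` can take its fast path when `"\xed"` is absent.

```python
def to_utf8b(b: bytes) -> bytes:
    # Undecodable bytes 0x80..0xFF become lone surrogates U+DC80..U+DCFF,
    # which "surrogatepass" then writes out as 3-byte UTF-8 sequences.
    return b.decode("utf-8", "surrogateescape").encode("utf-8", "surrogatepass")


def from_utf8b(s: bytes) -> bytes:
    # Inverse: read those surrogates back and turn each into its original byte.
    return s.decode("utf-8", "surrogatepass").encode("utf-8", "surrogateescape")


samples = [
    b"plain ascii",
    "héllo".encode("utf-8"),  # valid UTF-8 passes through unchanged
    b"caf\xe9",               # Latin-1 byte, not valid UTF-8: escaped
    b"\xed\xb3\xad",          # already looks like an encoded U+DCED: re-escaped
    bytes(range(256)),
]
for raw in samples:
    # Round-trip holds for arbitrary input bytes.
    assert from_utf8b(to_utf8b(raw)) == raw
```

Note that input which already contains an encoded U+DCxx is escaped again byte by byte rather than passed through, matching the "re-escape existing U+DCxx characters" requirement that keeps the mapping reversible.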