annotate mercurial/py3kcompat.py @ 26117:4dc5b51f38fe

revlog: change generaldelta delta parent heuristic The old generaldelta heuristic was "if p1 (or p2) was closer than the last full text, use it, otherwise use prev". This was problematic when a repo contained multiple branches that were very different. If commits to branch A were pushed, and the last full text was branch B, it would generate a fulltext. Then if branch B was pushed, it would generate another fulltext. The problem is that the last fulltext (and delta'ing against `prev` in general) has no correlation with the contents of the incoming revision, and therefore will always have degenerate cases. According to the blame, that algorithm was chosen to minimize the chain length. Since there is already code that protects against that (the delta-vs-fulltext code), and since it has been improved since the original generaldelta algorithm went in (2011), I believe the chain length criteria will still be preserved. The new algorithm always diffs against p1 (or p2 if it's closer), unless the resulting delta will fail the delta-vs-fulltext check, in which case we delta against prev. Some before and after stats on manifest.d size. internal large repo old heuristic - 2.0 GB new heuristic - 1.2 GB mozilla-central old heuristic - 242 MB new heuristic - 261 MB The regression in mozilla central is due to the new heuristic choosing p2r as the delta when it's closer to the tip. Switching the algorithm to always prefer p1r brings the size back down (242 MB). This is result of the way in which mozilla does merges and pushes, and the result could easily swing the other direction in other repos (depending on if they merge X into Y or Y into X), but will never be as degenerate as before. I future patch will address the regression by introducing an optional, even more aggressive delta heuristic which will knock the mozilla manifest size down dramatically.
author Durham Goode <durham@fb.com>
date Sun, 30 Aug 2015 13:58:11 -0700
parents a7a9d84f5e4a
children 5bfd01a3c2a9
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
11748
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
1 # py3kcompat.py - compatibility definitions for running hg in py3k
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
2 #
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
3 # Copyright 2010 Renato Cunha <renatoc@gmail.com>
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
4 #
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
5 # This software may be used and distributed according to the terms of the
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
6 # GNU General Public License version 2 or any later version.
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
7
21292
a7a9d84f5e4a py3kcompat: drop unused export
Matt Mackall <mpm@selenic.com>
parents: 21291
diff changeset
8 import builtins
11748
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
9
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
10 from numbers import Number
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
11
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
12 def bytesformatter(format, args):
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
13 '''Custom implementation of a formatter for bytestrings.
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
14
17424
e7cfe3587ea4 fix trivial spelling errors
Mads Kiilerich <mads@kiilerich.com>
parents: 11878
diff changeset
15 This function currently relies on the string formatter to do the
11748
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
16 formatting and always returns bytes objects.
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
17
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
18 >>> bytesformatter(20, 10)
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
19 0
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
20 >>> bytesformatter('unicode %s, %s!', ('string', 'foo'))
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
21 b'unicode string, foo!'
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
22 >>> bytesformatter(b'test %s', 'me')
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
23 b'test me'
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
24 >>> bytesformatter('test %s', 'me')
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
25 b'test me'
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
26 >>> bytesformatter(b'test %s', b'me')
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
27 b'test me'
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
28 >>> bytesformatter('test %s', b'me')
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
29 b'test me'
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
30 >>> bytesformatter('test %d: %s', (1, b'result'))
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
31 b'test 1: result'
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
32 '''
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
33 # The current implementation just converts from bytes to unicode, do
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
34 # what's needed and then convert the results back to bytes.
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
35 # Another alternative is to use the Python C API implementation.
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
36 if isinstance(format, Number):
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
37 # If the fixer erroneously passes a number remainder operation to
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
38 # bytesformatter, we just return the correct operation
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
39 return format % args
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
40 if isinstance(format, bytes):
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
41 format = format.decode('utf-8', 'surrogateescape')
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
42 if isinstance(args, bytes):
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
43 args = args.decode('utf-8', 'surrogateescape')
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
44 if isinstance(args, tuple):
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
45 newargs = []
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
46 for arg in args:
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
47 if isinstance(arg, bytes):
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
48 arg = arg.decode('utf-8', 'surrogateescape')
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
49 newargs.append(arg)
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
50 args = tuple(newargs)
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
51 ret = format % args
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
52 return ret.encode('utf-8', 'surrogateescape')
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
53 builtins.bytesformatter = bytesformatter
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
54
11878
8bb1481cf08f py3kcompat: added fake ord implementation for py3k
Renato Cunha <renatoc@gmail.com>
parents: 11748
diff changeset
55 origord = builtins.ord
8bb1481cf08f py3kcompat: added fake ord implementation for py3k
Renato Cunha <renatoc@gmail.com>
parents: 11748
diff changeset
56 def fakeord(char):
8bb1481cf08f py3kcompat: added fake ord implementation for py3k
Renato Cunha <renatoc@gmail.com>
parents: 11748
diff changeset
57 if isinstance(char, int):
8bb1481cf08f py3kcompat: added fake ord implementation for py3k
Renato Cunha <renatoc@gmail.com>
parents: 11748
diff changeset
58 return char
8bb1481cf08f py3kcompat: added fake ord implementation for py3k
Renato Cunha <renatoc@gmail.com>
parents: 11748
diff changeset
59 return origord(char)
8bb1481cf08f py3kcompat: added fake ord implementation for py3k
Renato Cunha <renatoc@gmail.com>
parents: 11748
diff changeset
60 builtins.ord = fakeord
8bb1481cf08f py3kcompat: added fake ord implementation for py3k
Renato Cunha <renatoc@gmail.com>
parents: 11748
diff changeset
61
11748
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
62 if __name__ == '__main__':
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
63 import doctest
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
64 doctest.testmod()
37a70a784397 py3kcompat: added a "compatibility layer" for py3k
Renato Cunha <renatoc@gmail.com>
parents:
diff changeset
65