comparison mercurial/revlog.py @ 38644:43d0619cec90

revlog: enforce chunk slicing down to a certain size

Limit maximum chunk size to 4x final size when reading a revision from a revlog. We only apply this logic when the target size is known from the revlog.

Ideally, revlog's delta chain would be written in a way that does not trigger this extra slicing often. However, having this second guarantee that we won't read unexpectedly large amounts of memory in all cases is important for the future. Future delta chain building algorithms might have good reason to create delta chains with such characteristics. Including this code in core as soon as possible will make Mercurial 4.7 forward-compatible with such improvements.

author Boris Feld <boris.feld@octobus.net>
date Tue, 10 Jul 2018 12:20:57 +0200
parents 967fee55e8d9
children cd1c484e31e8
38643:967fee55e8d9 38644:43d0619cec90
1947 1947
1948 Returns a str holding uncompressed data for the requested revision. 1948 Returns a str holding uncompressed data for the requested revision.
1949 """ 1949 """
1950 return self.decompress(self._getsegmentforrevs(rev, rev, df=df)[1]) 1950 return self.decompress(self._getsegmentforrevs(rev, rev, df=df)[1])
1951 1951
1952 def _chunks(self, revs, df=None): 1952 def _chunks(self, revs, df=None, targetsize=None):
1953 """Obtain decompressed chunks for the specified revisions. 1953 """Obtain decompressed chunks for the specified revisions.
1954 1954
1955 Accepts an iterable of numeric revisions that are assumed to be in 1955 Accepts an iterable of numeric revisions that are assumed to be in
1956 ascending order. Also accepts an optional already-open file handle 1956 ascending order. Also accepts an optional already-open file handle
1957 to be used for reading. If used, the seek position of the file will 1957 to be used for reading. If used, the seek position of the file will
1974 ladd = l.append 1974 ladd = l.append
1975 1975
1976 if not self._withsparseread: 1976 if not self._withsparseread:
1977 slicedchunks = (revs,) 1977 slicedchunks = (revs,)
1978 else: 1978 else:
1979 slicedchunks = _slicechunk(self, revs) 1979 slicedchunks = _slicechunk(self, revs, targetsize)
1980 1980
1981 for revschunk in slicedchunks: 1981 for revschunk in slicedchunks:
1982 firstrev = revschunk[0] 1982 firstrev = revschunk[0]
1983 # Skip trailing revisions with empty diff 1983 # Skip trailing revisions with empty diff
1984 for lastrev in revschunk[::-1]: 1984 for lastrev in revschunk[::-1]:
2077 rawtext = self._cache[2] 2077 rawtext = self._cache[2]
2078 2078
2079 # drop cache to save memory 2079 # drop cache to save memory
2080 self._cache = None 2080 self._cache = None
2081 2081
2082 bins = self._chunks(chain, df=_df) 2082 targetsize = None
2083 rawsize = self.index[rev][2]
2084 if 0 <= rawsize:
2085 targetsize = 4 * rawsize
2086
2087 bins = self._chunks(chain, df=_df, targetsize=targetsize)
2083 if rawtext is None: 2088 if rawtext is None:
2084 rawtext = bytes(bins[0]) 2089 rawtext = bytes(bins[0])
2085 bins = bins[1:] 2090 bins = bins[1:]
2086 2091
2087 rawtext = mdiff.patches(rawtext, bins) 2092 rawtext = mdiff.patches(rawtext, bins)
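
The size cap introduced above can be illustrated with a minimal standalone sketch. It is not Mercurial's `_slicechunk`; `target_size` and `slice_to_size` are hypothetical helpers, and the `(offset, length)` span representation is an assumption made for illustration. The sketch assumes, as the `0 <= rawsize` check suggests, that a negative raw size means the final size is unknown and no cap applies.

```python
def target_size(rawsize, factor=4):
    """Return the read-size cap for a revision, or None if the size is unknown.

    Mirrors the check in the diff: the cap is only set when rawsize >= 0.
    (Hypothetical helper; the factor of 4 comes from the commit message.)
    """
    if rawsize < 0:
        return None
    return factor * rawsize


def slice_to_size(spans, targetsize):
    """Group consecutive (offset, length) spans under a read-size cap.

    Each group covers at most `targetsize` bytes, measured from the first
    offset in the group to the end of its last span. A single span larger
    than the cap still forms its own group. This is an illustrative
    stand-in, not the slicing algorithm used by Mercurial.
    """
    if targetsize is None:
        # No cap known: read everything as one chunk.
        return [list(spans)]

    groups = []
    current = []
    for offset, length in spans:
        if current:
            start = current[0][0]
            # Close the current group if adding this span would exceed the cap.
            if (offset + length) - start > targetsize:
                groups.append(current)
                current = []
        current.append((offset, length))
    if current:
        groups.append(current)
    return groups


# Example: four 10-byte deltas for a revision whose final size is 5 bytes.
spans = [(0, 10), (10, 10), (20, 10), (30, 10)]
cap = target_size(5)
groups = slice_to_size(spans, cap)
```

With a cap of 20 bytes, the four spans are read as two groups of two instead of a single 40-byte read. When the raw size is unknown the cap is `None` and the spans stay in one group, matching the behavior before this change.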