changeset 42479:a0b26fc8fbba

deltas: skip if projected delta size does not match text size constraint

Before computing any delta, we make a basic estimate of the delta size we can expect and of its resulting compressed size. We then check this projected size against the ?? size constraints. This lets us exclude potential base candidates before doing any expensive computation.

This only applies to the intermediate-snapshot case, since this constraint only applies to intermediate snapshots. In practice, we only perform this new check for the manifestlog. The manifestlog combines two properties: it is likely to have delta chain issues, and its diffing/compression is fairly predictable.

The initial author of this changeset is Valentin Gatien-Baron, who provided the initial idea and initial testing. Pierre-Yves David later consolidated the code in the right location and ran more extensive testing.
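The projected-size check can be sketched as a standalone Python function. This is a minimal illustration, not Mercurial's API: the function name `skip_snapshot_base` and its parameter names are hypothetical, chosen to mirror the local variables in the diff (`textlen`, `snapshotdepth`, `upperboundcomp`), and the numbers in the usage block are made up.

```python
def skip_snapshot_base(textlen, base_rawsize, snapshotdepth, upperboundcomp):
    """Return True if even an optimistic delta against this base is too big.

    Hypothetical helper mirroring the arithmetic added to deltas.py; it is
    meant for intermediate-snapshot bases only.
    """
    # A delta must at least encode the bytes the new text adds over the base.
    rawsizedistance = max(textlen - base_rawsize, 0)
    # Assume the best compression ratio the revlog can achieve
    # (upperboundcomp), which gives the smallest plausible compressed delta.
    lowestrealisticdeltalen = rawsizedistance // upperboundcomp
    # An intermediate snapshot at depth d may be at most textlen >> d bytes.
    snapshotlimit = textlen >> snapshotdepth
    return snapshotlimit < lowestrealisticdeltalen


if __name__ == "__main__":
    # Made-up numbers: a 1 MB text, a 200 kB snapshot base,
    # and an assumed best-case compression ratio of 10.
    print(skip_snapshot_base(1_000_000, 200_000, 3, 10))  # False: 80000 <= 125000
    print(skip_snapshot_base(1_000_000, 200_000, 4, 10))  # True: 80000 > 62500
```

Under the assumption that `upperboundcomp` really bounds the achievable compression ratio, a base is only skipped when no realistic delta against it could fit the snapshot limit, so no expensive diff or compression is wasted on it.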
author Valentin Gatien-Baron <vgatien-baron@janestreet.com>, Pierre-Yves David <pierre-yves.david@octobus.net>
date Thu, 25 Apr 2019 22:30:14 +0200
parents bc4373babd04
children 66c27df1be84
files mercurial/revlogutils/deltas.py
diffstat 1 files changed, 19 insertions(+), 0 deletions(-)
--- a/mercurial/revlogutils/deltas.py	Fri Apr 26 00:28:22 2019 +0200
+++ b/mercurial/revlogutils/deltas.py	Thu Apr 25 22:30:14 2019 +0200
@@ -679,6 +679,25 @@
             # if chain already have too much data, skip base
             if deltas_limit < chainsize:
                 continue
+            if sparse and revlog.upperboundcomp is not None:
+                maxcomp = revlog.upperboundcomp
+                basenotsnap = (p1, p2, nullrev)
+                if rev not in basenotsnap and revlog.issnapshot(rev):
+                    snapshotdepth = revlog.snapshotdepth(rev)
+                    # If text is significantly larger than the base, we can
+                    # expect the resulting delta to be proportional to the size
+                    # difference
+                    revsize = revlog.rawsize(rev)
+                    rawsizedistance = max(textlen - revsize, 0)
+                    # use an estimate of the compression upper bound.
+                    lowestrealisticdeltalen = rawsizedistance // maxcomp
+
+                    # check the absolute constraint on the delta size
+                    snapshotlimit = textlen >> snapshotdepth
+                    if snapshotlimit < lowestrealisticdeltalen:
+                        # delta lower bound is larger than accepted upper bound
+                        continue
+
             group.append(rev)
         if group:
             # XXX: in the sparse revlog case, group can become large,