comparison mercurial/revlogutils/deltas.py @ 39493:3ca144f1c8dd

snapshot: search for unrelated but reusable full-snapshot # New Strategy Step: Reusing Snapshot Outside Of Parents' Chain. If no suitable bases were found in the parent's chains, see if we could reuse a full snapshot not directly related to the current revision. Such search can be expensive, so we only search for snapshots appended to the revlog *after* the bases used by the parents of the current revision (the one we just tested). We assume the parent's bases were created because the previous snapshots were unsuitable, so there are low odds they would be useful now. This search gives a chance to reuse a delta chain unrelated to the current revision. Without this re-use, topological branches would keep reopening new full chains. Creating more and more snapshots as the repository grow. In repositories with many topological branches, the lack of delta reuse can create too many snapshots reducing overall compression to nothing. This results in a very large repository and other usability issues. For now, we still focus on creating level-1 snapshots. However, this principle will play a large part in how we avoid snapshot explosion once we have more snapshot levels. # Effects On The Test Repository In the test repository we created, we can see the beneficial effect of such reuse. We need very few level-0 snapshots and the overall revlog size has decreased. The `hg debugrevlog` call, show a "lvl-2" snapshot. It comes from the existing delta logic using the `prev` revision (revlog's tip) as the base. In this specific case, it turns out the tip was a level-1 snapshot. This is a coincidence that can be ignored. Finding and testing against all these unrelated snapshots can have a performance impact at write time. We currently focus on building good deltas chain we build. Performance concern will be dealt with later in another series.
author Boris Feld <boris.feld@octobus.net>
date Fri, 07 Sep 2018 11:17:30 -0400
parents a33f394b2bfd
children e72130f58f5d
comparison
equal deleted inserted replaced
39492:a33f394b2bfd 39493:3ca144f1c8dd
7 # GNU General Public License version 2 or any later version. 7 # GNU General Public License version 2 or any later version.
8 """Helper class to compute deltas stored inside revlogs""" 8 """Helper class to compute deltas stored inside revlogs"""
9 9
10 from __future__ import absolute_import 10 from __future__ import absolute_import
11 11
12 import collections
12 import heapq 13 import heapq
13 import struct 14 import struct
14 15
15 # import stuff from node for others to import from revlog 16 # import stuff from node for others to import from revlog
16 from ..node import ( 17 from ..node import (
605 # no delta for rawtext-changing revs (see "candelta" for why) 606 # no delta for rawtext-changing revs (see "candelta" for why)
606 if revlog.flags(rev) & REVIDX_RAWTEXT_CHANGING_FLAGS: 607 if revlog.flags(rev) & REVIDX_RAWTEXT_CHANGING_FLAGS:
607 continue 608 continue
608 group.append(rev) 609 group.append(rev)
609 if group: 610 if group:
611 # XXX: in the sparse revlog case, group can become large,
612 # impacting performances. Some bounding or slicing mecanism
613 # would help to reduce this impact.
610 yield tuple(group) 614 yield tuple(group)
615
616 def _findsnapshots(revlog, cache, start_rev):
617 """find snapshot from start_rev to tip"""
618 deltaparent = revlog.deltaparent
619 for rev in revlog.revs(start_rev):
620 if deltaparent(rev) == nullrev:
621 cache[nullrev].append(rev)
611 622
612 def _rawgroups(revlog, p1, p2, cachedelta): 623 def _rawgroups(revlog, p1, p2, cachedelta):
613 """Provides group of revision to be tested as delta base 624 """Provides group of revision to be tested as delta base
614 625
615 This lower level function focus on emitting delta theorically interresting 626 This lower level function focus on emitting delta theorically interresting
654 # a base for a new intermediate-snapshot 665 # a base for a new intermediate-snapshot
655 bases = [] 666 bases = []
656 for p in parents: 667 for p in parents:
657 bases.append(deltachain(p)[0]) 668 bases.append(deltachain(p)[0])
658 yield tuple(sorted(bases)) 669 yield tuple(sorted(bases))
670 # No suitable base found in the parent chain, search if any full
671 # snapshots emitted since parent's base would be a suitable base for an
672 # intermediate snapshot.
673 #
674 # It give a chance to reuse a delta chain unrelated to the current
675 # revisions instead of starting our own. Without such re-use,
676 # topological branches would keep reopening new full chains. Creating
677 # more and more snapshot as the repository grow.
678 snapfloor = min(bases) + 1
679 snapshots = collections.defaultdict(list)
680 _findsnapshots(revlog, snapshots, snapfloor)
681 yield tuple(snapshots[nullrev])
659 682
660 # other approach failed try against prev to hopefully save us a 683 # other approach failed try against prev to hopefully save us a
661 # fulltext. 684 # fulltext.
662 yield (prev,) 685 yield (prev,)
663 686