Mercurial > public > mercurial-scm > hg
comparison mercurial/revlogutils/deltas.py @ 39493:3ca144f1c8dd
snapshot: search for unrelated but reusable full-snapshot
# New Strategy Step: Reusing Snapshot Outside Of Parents' Chain.
If no suitable bases were found in the parent's chains, see if we could reuse
a full snapshot not directly related to the current revision. Such search can
be expensive, so we only search for snapshots appended to the revlog *after*
the bases used by the parents of the current revision (the one we just
tested). We assume the parent's bases were created because the previous
snapshots were unsuitable, so there are low odds they would be useful now.
This search gives a chance to reuse a delta chain unrelated to the current
revision. Without this re-use, topological branches would keep reopening new
full chains. Creating more and more snapshots as the repository grow.
In repositories with many topological branches, the lack of delta reuse can
create too many snapshots reducing overall compression to nothing. This
results in a very large repository and other usability issues.
For now, we still focus on creating level-1 snapshots. However, this principle
will play a large part in how we avoid snapshot explosion once we have more
snapshot levels.
# Effects On The Test Repository
In the test repository we created, we can see the beneficial effect of such
reuse. We need very few level-0 snapshots and the overall revlog size has
decreased.
The `hg debugrevlog` call, show a "lvl-2" snapshot. It comes from the existing
delta logic using the `prev` revision (revlog's tip) as the base. In this
specific case, it turns out the tip was a level-1 snapshot. This is a
coincidence that can be ignored.
Finding and testing against all these unrelated snapshots can have a
performance impact at write time. We currently focus on building good deltas
chain we build. Performance concern will be dealt with later in another
series.
author | Boris Feld <boris.feld@octobus.net> |
---|---|
date | Fri, 07 Sep 2018 11:17:30 -0400 |
parents | a33f394b2bfd |
children | e72130f58f5d |
comparison
equal
deleted
inserted
replaced
39492:a33f394b2bfd | 39493:3ca144f1c8dd |
---|---|
7 # GNU General Public License version 2 or any later version. | 7 # GNU General Public License version 2 or any later version. |
8 """Helper class to compute deltas stored inside revlogs""" | 8 """Helper class to compute deltas stored inside revlogs""" |
9 | 9 |
10 from __future__ import absolute_import | 10 from __future__ import absolute_import |
11 | 11 |
12 import collections | |
12 import heapq | 13 import heapq |
13 import struct | 14 import struct |
14 | 15 |
15 # import stuff from node for others to import from revlog | 16 # import stuff from node for others to import from revlog |
16 from ..node import ( | 17 from ..node import ( |
605 # no delta for rawtext-changing revs (see "candelta" for why) | 606 # no delta for rawtext-changing revs (see "candelta" for why) |
606 if revlog.flags(rev) & REVIDX_RAWTEXT_CHANGING_FLAGS: | 607 if revlog.flags(rev) & REVIDX_RAWTEXT_CHANGING_FLAGS: |
607 continue | 608 continue |
608 group.append(rev) | 609 group.append(rev) |
609 if group: | 610 if group: |
611 # XXX: in the sparse revlog case, group can become large, | |
612 # impacting performances. Some bounding or slicing mecanism | |
613 # would help to reduce this impact. | |
610 yield tuple(group) | 614 yield tuple(group) |
615 | |
616 def _findsnapshots(revlog, cache, start_rev): | |
617 """find snapshot from start_rev to tip""" | |
618 deltaparent = revlog.deltaparent | |
619 for rev in revlog.revs(start_rev): | |
620 if deltaparent(rev) == nullrev: | |
621 cache[nullrev].append(rev) | |
611 | 622 |
612 def _rawgroups(revlog, p1, p2, cachedelta): | 623 def _rawgroups(revlog, p1, p2, cachedelta): |
613 """Provides group of revision to be tested as delta base | 624 """Provides group of revision to be tested as delta base |
614 | 625 |
615 This lower level function focus on emitting delta theorically interresting | 626 This lower level function focus on emitting delta theorically interresting |
654 # a base for a new intermediate-snapshot | 665 # a base for a new intermediate-snapshot |
655 bases = [] | 666 bases = [] |
656 for p in parents: | 667 for p in parents: |
657 bases.append(deltachain(p)[0]) | 668 bases.append(deltachain(p)[0]) |
658 yield tuple(sorted(bases)) | 669 yield tuple(sorted(bases)) |
670 # No suitable base found in the parent chain, search if any full | |
671 # snapshots emitted since parent's base would be a suitable base for an | |
672 # intermediate snapshot. | |
673 # | |
674 # It give a chance to reuse a delta chain unrelated to the current | |
675 # revisions instead of starting our own. Without such re-use, | |
676 # topological branches would keep reopening new full chains. Creating | |
677 # more and more snapshot as the repository grow. | |
678 snapfloor = min(bases) + 1 | |
679 snapshots = collections.defaultdict(list) | |
680 _findsnapshots(revlog, snapshots, snapfloor) | |
681 yield tuple(snapshots[nullrev]) | |
659 | 682 |
660 # other approach failed try against prev to hopefully save us a | 683 # other approach failed try against prev to hopefully save us a |
661 # fulltext. | 684 # fulltext. |
662 yield (prev,) | 685 yield (prev,) |
663 | 686 |