Mercurial > public > mercurial-scm > hg
comparison mercurial/changegroup.py @ 39861:db5501d93bcf
changegroup: remove reordering control (BC)
This logic - including the experimental bundle.reorder option -
was originally added in a8e3931e3fb5 in 2011 and then later ported
to changegroup.py.
The intent of this option and associated logic is to control
the ordering of revisions in deltagroups in changegroups. At the
time it was implemented, only changegroup version 1 existed
and generaldelta revlogs were just coming into the world. Changegroup
version 1 requires that deltas be made against the last revision
sent over the wire. Used with generaldelta, this created an
impedance mismatch of sorts and resulted in changegroup producers
spending a lot of time recomputing deltas.
Revision reordering was introduced so outgoing revisions would be
sent in "generaldelta order" and producers would be able to
reuse internal deltas from storage.
Later on, we introduced changegroup version 2. It supported denoting
which revision a delta was against. So we no longer needed to
sort outgoing revisions to ensure optimal delta generation from the
producer. So, subsequent changegroup versions disabled reordering.
We also later made the changelog not store deltas by default. And
we also made the changelog send out deltas in storage order. Why we
do this for changelog, I'm not sure. Maybe we want to preserve revision
order across clones? It doesn't really matter for this commit.
Fast forward to 2018. We want to abstract storage backends. And having
changegroup code require knowledge about how deltas are stored
internally interferes with that goal.
This commit removes reordering control from changegroup generation.
After this commit, the reordering behavior is:
* The changelog is always sent out in storage order (no behavior
change).
* Non-changelog generaldelta revlogs are reordered to always be in DAG
topological order (previously, generaldelta revlogs would be emitted
in storage order for version 2 and 3 changegroups).
* Non-changelog non-generaldelta revlogs are sent in storage order (no
behavior change).
* There exists no config option to override behavior.
The big difference here is that generaldelta revlogs now *always* have
their revisions sorted in DAG order before going out over the wire. This
behavior was previously only done for changegroup version 1. Version 2
and version 3 changegroups disabled reordering because the interchange
format supported encoding arbitrary delta parents, so reordering wasn't
strictly necessary.
I can think of a few significant implications for this change.
Because changegroup receivers will now see non-changelog revisions
in DAG order instead of storage order, the internal storage order of
manifests and files may differ substantially between producer and
consumer. I don't think this matters that much, since the storage
order of manifests and files is largely hidden from users. Only
the storage order of changelog matters (because `hg log` shows the
changelog in storage order). I don't think there should be any
controversy here.
The reordering of revisions has implications for changegroup producers.
Previously, generaldelta revlogs would be emitted in storage order.
And in the common case, the internally-stored delta could effectively
be copied from disk into the deltagroup delta. This meant that emitting
delta groups for generaldelta revlogs would be mostly linear read I/O.
This is desirable for performance. With us now reordering generaldelta
revlog revisions in DAG order, the read operations may use more random
I/O instead of sequential I/O. This could result in performance
loss. But with the prevalence of SSDs and fast random I/O, I'm not
too worried. (Note: the optimal emission order for revlogs is actually
delta encoding order. But the changegroup code wasn't doing that before
or after this change. We could potentially implement that in a later
commit.)
Changegroups in DAG order will have implications for receivers.
Previously, receiving storage order might mean seeing a number of
interleaved branches. This would mean long delta chains, sparse
I/O, and possibly more fulltext revisions instead of deltas, blowing
up storage storage. (This is the same set of problems that sparse
revlogs aims to address.) With the producer now sending revisions in DAG
order, the receiver also stores revisions in DAG order. That means
revisions for the same DAG branch are all grouped together. And this
should yield better storage outcomes. In other words, sending the
reordered changegroup allows the receiver to have better storage
order and for the producer to not propagate its (possibly sub-optimal)
internal storage order.
On the mozilla-unified repository, this change influences bundle
generation:
$ hg bundle -t none-v2 -a
before: time: real 355.680 secs (user 256.790+0.000 sys 16.820+0.000)
after: time: real 382.950 secs (user 281.700+0.000 sys 17.690+0.000)
before: 7,150,228,967 bytes (uncompressed)
after: 7,041,556,273 bytes (uncompressed)
before: 1,669,063,234 bytes (zstd l=3)
after: 1,628,598,830 bytes (zstd l=3)
$ hg unbundle
before: time: real 511.910 secs (user 466.750+0.000 sys 32.680+0.000)
after: time: real 487.790 secs (user 443.940+0.000 sys 30.840+0.000)
00manifest.d size:
source: 274,924,292 bytes
before: 304,741,626 bytes
after: 245,252,087 bytes
.hg/store total file size:
source: 2,649,133,490
before: 2,680,888,130
after: 2,627,875,673
We see the bundle size drop. That's probably because if a revlog
internally isn't storing a delta, it will choose to delta against
the last emitted revision. And on repos with interleaved branches
(like mozilla-unified), the previous revision could be an unrelated
branch and therefore be a large delta. But with this patch, the
previous revision is likely p1 or p2 and a delta should be small.
We also see the manifest size drop by ~50 MB. It's worth noting that
the manifest actually *increased* in size by ~25 MB in the old
strategy and decreased ~25 MB from its source in the new strategy.
Again, my explanation for this is that the DAG ordering in the
changegroup is resulting in better grouping of revisions in the
receiver, which results in more compact delta chains and higher
storage efficiency.
Unbundle time also dropped. I suspect this is due to the revlog having
to work less to compute deltas since the incoming deltas are more
optimal. i.e. the receiver spends less time resolving fulltext
revisions as incoming deltas bounce around between DAG branches and
delta chains.
We also see bundle generation time increase. This is not desirable.
However, the regression is only significant on the original repository:
if we generate a bundle from the repository created from the new,
always reordered bundles, we're close to baseline (if not at it with
expected noise):
$ hg bundle -t none-v2 -a
before (original): time: real 355.680 secs (user 256.790+0.000 sys 16.820+0.000)
after (original): time: real 382.950 secs (user 281.700+0.000 sys 17.690+0.000)
after (new repo): time: real 362.280 secs (user 260.300+0.000 sys 17.700+0.000)
This regression is a bit worrying because it will impact serving
canonical repositories (that don't have optimal internal storage
unless they are reordered - possibly as part of running
`hg debugupgraderepo`). However, this regression will only be
noticed by very large changegroups. And I'm guessing/hoping that
any repository that large is using clonebundles to mitigate server
load.
Again, sending DAG order isn't the optimal send order for servers:
sending in storage-delta order is. But in order to enable
storage-optimal send order, we'll need a storage API that handles
sorting. Future commits will introduce such an API.
Differential Revision: https://phab.mercurial-scm.org/D4721
author | Gregory Szorc <gregory.szorc@gmail.com> |
---|---|
date | Mon, 24 Sep 2018 09:41:42 -0700 |
parents | 5adc5fe41a7d |
children | 31b7e8e7132e |
comparison
equal
deleted
inserted
replaced
39860:d9b3cc3d5d07 | 39861:db5501d93bcf |
---|---|
34 util, | 34 util, |
35 ) | 35 ) |
36 | 36 |
37 from .utils import ( | 37 from .utils import ( |
38 interfaceutil, | 38 interfaceutil, |
39 stringutil, | |
40 ) | 39 ) |
41 | 40 |
42 _CHANGEGROUPV1_DELTA_HEADER = struct.Struct("20s20s20s20s") | 41 _CHANGEGROUPV1_DELTA_HEADER = struct.Struct("20s20s20s20s") |
43 _CHANGEGROUPV2_DELTA_HEADER = struct.Struct("20s20s20s20s20s") | 42 _CHANGEGROUPV2_DELTA_HEADER = struct.Struct("20s20s20s20s20s") |
44 _CHANGEGROUPV3_DELTA_HEADER = struct.Struct(">20s20s20s20s20sH") | 43 _CHANGEGROUPV3_DELTA_HEADER = struct.Struct(">20s20s20s20s20sH") |
535 yield meta | 534 yield meta |
536 if prefix: | 535 if prefix: |
537 yield prefix | 536 yield prefix |
538 yield data | 537 yield data |
539 | 538 |
540 def _sortnodesnormal(store, nodes, reorder): | 539 def _sortnodesnormal(store, nodes): |
541 """Sort nodes for changegroup generation and turn into revnums.""" | 540 """Sort nodes for changegroup generation and turn into revnums.""" |
542 # for generaldelta revlogs, we linearize the revs; this will both be | 541 # for generaldelta revlogs, we linearize the revs; this will both be |
543 # much quicker and generate a much smaller bundle | 542 # much quicker and generate a much smaller bundle |
544 if (store._generaldelta and reorder is None) or reorder: | 543 if store._generaldelta: |
545 revs = set(store.rev(n) for n in nodes) | 544 revs = set(store.rev(n) for n in nodes) |
546 return dagop.linearize(revs, store.parentrevs) | 545 return dagop.linearize(revs, store.parentrevs) |
547 else: | 546 else: |
548 return sorted([store.rev(n) for n in nodes]) | 547 return sorted([store.rev(n) for n in nodes]) |
549 | 548 |
654 basenode=nullid, | 653 basenode=nullid, |
655 ellipsis=True, | 654 ellipsis=True, |
656 ) | 655 ) |
657 | 656 |
658 def deltagroup(repo, store, nodes, ischangelog, lookup, forcedeltaparentprev, | 657 def deltagroup(repo, store, nodes, ischangelog, lookup, forcedeltaparentprev, |
659 allowreorder, | |
660 topic=None, | 658 topic=None, |
661 ellipses=False, clrevtolocalrev=None, fullclnodes=None, | 659 ellipses=False, clrevtolocalrev=None, fullclnodes=None, |
662 precomputedellipsis=None): | 660 precomputedellipsis=None): |
663 """Calculate deltas for a set of revisions. | 661 """Calculate deltas for a set of revisions. |
664 | 662 |
689 # store producing the deltas. | 687 # store producing the deltas. |
690 revs = sorted(cl.rev(n) for n in nodes) | 688 revs = sorted(cl.rev(n) for n in nodes) |
691 elif ellipses: | 689 elif ellipses: |
692 revs = _sortnodesellipsis(store, nodes, cl, lookup) | 690 revs = _sortnodesellipsis(store, nodes, cl, lookup) |
693 else: | 691 else: |
694 revs = _sortnodesnormal(store, nodes, allowreorder) | 692 revs = _sortnodesnormal(store, nodes) |
695 | 693 |
696 # In the first pass, collect info about the deltas we'll be | 694 # In the first pass, collect info about the deltas we'll be |
697 # generating. | 695 # generating. |
698 requests = [] | 696 requests = [] |
699 | 697 |
754 | 752 |
755 if progress: | 753 if progress: |
756 progress.complete() | 754 progress.complete() |
757 | 755 |
758 class cgpacker(object): | 756 class cgpacker(object): |
759 def __init__(self, repo, filematcher, version, allowreorder, | 757 def __init__(self, repo, filematcher, version, |
760 builddeltaheader, manifestsend, | 758 builddeltaheader, manifestsend, |
761 forcedeltaparentprev=False, | 759 forcedeltaparentprev=False, |
762 bundlecaps=None, ellipses=False, | 760 bundlecaps=None, ellipses=False, |
763 shallow=False, ellipsisroots=None, fullnodes=None): | 761 shallow=False, ellipsisroots=None, fullnodes=None): |
764 """Given a source repo, construct a bundler. | 762 """Given a source repo, construct a bundler. |
765 | 763 |
766 filematcher is a matcher that matches on files to include in the | 764 filematcher is a matcher that matches on files to include in the |
767 changegroup. Used to facilitate sparse changegroups. | 765 changegroup. Used to facilitate sparse changegroups. |
768 | |
769 allowreorder controls whether reordering of revisions is allowed. | |
770 This value is used when ``bundle.reorder`` is ``auto`` or isn't | |
771 set. | |
772 | 766 |
773 forcedeltaparentprev indicates whether delta parents must be against | 767 forcedeltaparentprev indicates whether delta parents must be against |
774 the previous revision in a delta group. This should only be used for | 768 the previous revision in a delta group. This should only be used for |
775 compatibility with changegroup version 1. | 769 compatibility with changegroup version 1. |
776 | 770 |
811 self._fullclnodes = fullnodes | 805 self._fullclnodes = fullnodes |
812 | 806 |
813 # Maps ellipsis revs to their roots at the changelog level. | 807 # Maps ellipsis revs to their roots at the changelog level. |
814 self._precomputedellipsis = ellipsisroots | 808 self._precomputedellipsis = ellipsisroots |
815 | 809 |
816 # experimental config: bundle.reorder | |
817 reorder = repo.ui.config('bundle', 'reorder') | |
818 if reorder == 'auto': | |
819 self._reorder = allowreorder | |
820 else: | |
821 self._reorder = stringutil.parsebool(reorder) | |
822 | |
823 self._repo = repo | 810 self._repo = repo |
824 | 811 |
825 if self._repo.ui.verbose and not self._repo.ui.debugflag: | 812 if self._repo.ui.verbose and not self._repo.ui.debugflag: |
826 self._verbosenote = self._repo.ui.note | 813 self._verbosenote = self._repo.ui.note |
827 else: | 814 else: |
860 # We need to make sure that the linkrev in the changegroup refers to | 847 # We need to make sure that the linkrev in the changegroup refers to |
861 # the first changeset that introduced the manifest or file revision. | 848 # the first changeset that introduced the manifest or file revision. |
862 # The fastpath is usually safer than the slowpath, because the filelogs | 849 # The fastpath is usually safer than the slowpath, because the filelogs |
863 # are walked in revlog order. | 850 # are walked in revlog order. |
864 # | 851 # |
865 # When taking the slowpath with reorder=None and the manifest revlog | 852 # When taking the slowpath when the manifest revlog uses generaldelta, |
866 # uses generaldelta, the manifest may be walked in the "wrong" order. | 853 # the manifest may be walked in the "wrong" order. Without 'clrevorder', |
867 # Without 'clrevorder', we would get an incorrect linkrev (see fix in | 854 # we would get an incorrect linkrev (see fix in cc0ff93d0c0c). |
868 # cc0ff93d0c0c). | |
869 # | 855 # |
870 # When taking the fastpath, we are only vulnerable to reordering | 856 # When taking the fastpath, we are only vulnerable to reordering |
871 # of the changelog itself. The changelog never uses generaldelta, so | 857 # of the changelog itself. The changelog never uses generaldelta and is |
872 # it is only reordered when reorder=True. To handle this case, we | 858 # never reordered. To handle this case, we simply take the slowpath, |
873 # simply take the slowpath, which already has the 'clrevorder' logic. | 859 # which already has the 'clrevorder' logic. This was also fixed in |
874 # This was also fixed in cc0ff93d0c0c. | 860 # cc0ff93d0c0c. |
875 fastpathlinkrev = fastpathlinkrev and not self._reorder | 861 |
876 # Treemanifests don't work correctly with fastpathlinkrev | 862 # Treemanifests don't work correctly with fastpathlinkrev |
877 # either, because we don't discover which directory nodes to | 863 # either, because we don't discover which directory nodes to |
878 # send along with files. This could probably be fixed. | 864 # send along with files. This could probably be fixed. |
879 fastpathlinkrev = fastpathlinkrev and ( | 865 fastpathlinkrev = fastpathlinkrev and ( |
880 'treemanifest' not in repo.requirements) | 866 'treemanifest' not in repo.requirements) |
1001 } | 987 } |
1002 | 988 |
1003 gen = deltagroup( | 989 gen = deltagroup( |
1004 self._repo, cl, nodes, True, lookupcl, | 990 self._repo, cl, nodes, True, lookupcl, |
1005 self._forcedeltaparentprev, | 991 self._forcedeltaparentprev, |
1006 # Reorder settings are currently ignored for changelog. | |
1007 True, | |
1008 ellipses=self._ellipses, | 992 ellipses=self._ellipses, |
1009 topic=_('changesets'), | 993 topic=_('changesets'), |
1010 clrevtolocalrev={}, | 994 clrevtolocalrev={}, |
1011 fullclnodes=self._fullclnodes, | 995 fullclnodes=self._fullclnodes, |
1012 precomputedellipsis=self._precomputedellipsis) | 996 precomputedellipsis=self._precomputedellipsis) |
1085 | 1069 |
1086 lookupfn = makelookupmflinknode(tree, nodes) | 1070 lookupfn = makelookupmflinknode(tree, nodes) |
1087 | 1071 |
1088 deltas = deltagroup( | 1072 deltas = deltagroup( |
1089 self._repo, store, prunednodes, False, lookupfn, | 1073 self._repo, store, prunednodes, False, lookupfn, |
1090 self._forcedeltaparentprev, self._reorder, | 1074 self._forcedeltaparentprev, |
1091 ellipses=self._ellipses, | 1075 ellipses=self._ellipses, |
1092 topic=_('manifests'), | 1076 topic=_('manifests'), |
1093 clrevtolocalrev=clrevtolocalrev, | 1077 clrevtolocalrev=clrevtolocalrev, |
1094 fullclnodes=self._fullclnodes, | 1078 fullclnodes=self._fullclnodes, |
1095 precomputedellipsis=self._precomputedellipsis) | 1079 precomputedellipsis=self._precomputedellipsis) |
1182 | 1166 |
1183 progress.update(i + 1, item=fname) | 1167 progress.update(i + 1, item=fname) |
1184 | 1168 |
1185 deltas = deltagroup( | 1169 deltas = deltagroup( |
1186 self._repo, filerevlog, filenodes, False, lookupfilelog, | 1170 self._repo, filerevlog, filenodes, False, lookupfilelog, |
1187 self._forcedeltaparentprev, self._reorder, | 1171 self._forcedeltaparentprev, |
1188 ellipses=self._ellipses, | 1172 ellipses=self._ellipses, |
1189 clrevtolocalrev=clrevtolocalrev, | 1173 clrevtolocalrev=clrevtolocalrev, |
1190 fullclnodes=self._fullclnodes, | 1174 fullclnodes=self._fullclnodes, |
1191 precomputedellipsis=self._precomputedellipsis) | 1175 precomputedellipsis=self._precomputedellipsis) |
1192 | 1176 |
1198 shallow=False, ellipsisroots=None, fullnodes=None): | 1182 shallow=False, ellipsisroots=None, fullnodes=None): |
1199 builddeltaheader = lambda d: _CHANGEGROUPV1_DELTA_HEADER.pack( | 1183 builddeltaheader = lambda d: _CHANGEGROUPV1_DELTA_HEADER.pack( |
1200 d.node, d.p1node, d.p2node, d.linknode) | 1184 d.node, d.p1node, d.p2node, d.linknode) |
1201 | 1185 |
1202 return cgpacker(repo, filematcher, b'01', | 1186 return cgpacker(repo, filematcher, b'01', |
1203 allowreorder=None, | |
1204 builddeltaheader=builddeltaheader, | 1187 builddeltaheader=builddeltaheader, |
1205 manifestsend=b'', | 1188 manifestsend=b'', |
1206 forcedeltaparentprev=True, | 1189 forcedeltaparentprev=True, |
1207 bundlecaps=bundlecaps, | 1190 bundlecaps=bundlecaps, |
1208 ellipses=ellipses, | 1191 ellipses=ellipses, |
1213 def _makecg2packer(repo, filematcher, bundlecaps, ellipses=False, | 1196 def _makecg2packer(repo, filematcher, bundlecaps, ellipses=False, |
1214 shallow=False, ellipsisroots=None, fullnodes=None): | 1197 shallow=False, ellipsisroots=None, fullnodes=None): |
1215 builddeltaheader = lambda d: _CHANGEGROUPV2_DELTA_HEADER.pack( | 1198 builddeltaheader = lambda d: _CHANGEGROUPV2_DELTA_HEADER.pack( |
1216 d.node, d.p1node, d.p2node, d.basenode, d.linknode) | 1199 d.node, d.p1node, d.p2node, d.basenode, d.linknode) |
1217 | 1200 |
1218 # Since generaldelta is directly supported by cg2, reordering | |
1219 # generally doesn't help, so we disable it by default (treating | |
1220 # bundle.reorder=auto just like bundle.reorder=False). | |
1221 return cgpacker(repo, filematcher, b'02', | 1201 return cgpacker(repo, filematcher, b'02', |
1222 allowreorder=False, | |
1223 builddeltaheader=builddeltaheader, | 1202 builddeltaheader=builddeltaheader, |
1224 manifestsend=b'', | 1203 manifestsend=b'', |
1225 bundlecaps=bundlecaps, | 1204 bundlecaps=bundlecaps, |
1226 ellipses=ellipses, | 1205 ellipses=ellipses, |
1227 shallow=shallow, | 1206 shallow=shallow, |
1232 shallow=False, ellipsisroots=None, fullnodes=None): | 1211 shallow=False, ellipsisroots=None, fullnodes=None): |
1233 builddeltaheader = lambda d: _CHANGEGROUPV3_DELTA_HEADER.pack( | 1212 builddeltaheader = lambda d: _CHANGEGROUPV3_DELTA_HEADER.pack( |
1234 d.node, d.p1node, d.p2node, d.basenode, d.linknode, d.flags) | 1213 d.node, d.p1node, d.p2node, d.basenode, d.linknode, d.flags) |
1235 | 1214 |
1236 return cgpacker(repo, filematcher, b'03', | 1215 return cgpacker(repo, filematcher, b'03', |
1237 allowreorder=False, | |
1238 builddeltaheader=builddeltaheader, | 1216 builddeltaheader=builddeltaheader, |
1239 manifestsend=closechunk(), | 1217 manifestsend=closechunk(), |
1240 bundlecaps=bundlecaps, | 1218 bundlecaps=bundlecaps, |
1241 ellipses=ellipses, | 1219 ellipses=ellipses, |
1242 shallow=shallow, | 1220 shallow=shallow, |