annotate mercurial/setdiscovery.py @ 23814:6a5877a73141

setdiscovery: drop the 'always' argument to '_updatesample' This argument exists because of the complex code flow in '_takequicksample'. It first gets the list of heads and then calls '_updatesample' on an empty initial sample and a size limit matching the differences between the number of heads and the target sample size. Finally the heads and the sample from '_updatesample' were added. To ensure this addition result had the exact target length, the code had to ensure no elements from the heads were added to the '_updatesample' content and therefore was passing this "always included set of heads". Instead we can just update the initial heads sample directly and use the final target size as target size for the update. This removes the need for this 'always' parameter to the '_updatesample' function The test are affected because different set building order results in different random sampling.
author Pierre-Yves David <pierre-yves.david@fb.com>
date Wed, 07 Jan 2015 10:32:17 -0800
parents 932f814bf016
children 31e75a362d44
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
1 # setdiscovery.py - improved discovery of common nodeset for mercurial
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
2 #
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
3 # Copyright 2010 Benoit Boissinot <bboissin@gmail.com>
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
4 # and Peter Arrenbrecht <peter@arrenbrecht.ch>
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
5 #
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
6 # This software may be used and distributed according to the terms of the
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
7 # GNU General Public License version 2 or any later version.
20656
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
8 """
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
9 Algorithm works in the following way. You have two repository: local and
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
10 remote. They both contains a DAG of changelists.
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
11
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
12 The goal of the discovery protocol is to find one set of node *common*,
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
13 the set of nodes shared by local and remote.
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
14
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
15 One of the issue with the original protocol was latency, it could
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
16 potentially require lots of roundtrips to discover that the local repo was a
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
17 subset of remote (which is a very common case, you usually have few changes
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
18 compared to upstream, while upstream probably had lots of development).
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
19
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
20 The new protocol only requires one interface for the remote repo: `known()`,
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
21 which given a set of changelists tells you if they are present in the DAG.
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
22
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
23 The algorithm then works as follow:
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
24
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
25 - We will be using three sets, `common`, `missing`, `unknown`. Originally
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
26 all nodes are in `unknown`.
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
27 - Take a sample from `unknown`, call `remote.known(sample)`
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
28 - For each node that remote knows, move it and all its ancestors to `common`
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
29 - For each node that remote doesn't know, move it and all its descendants
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
30 to `missing`
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
31 - Iterate until `unknown` is empty
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
32
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
33 There are a couple optimizations, first is instead of starting with a random
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
34 sample of missing, start by sending all heads, in the case where the local
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
35 repo is a subset, you computed the answer in one round trip.
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
36
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
37 Then you can do something similar to the bisecting strategy used when
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
38 finding faulty changesets. Instead of random samples, you can try picking
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
39 nodes that will maximize the number of nodes that will be
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
40 classified with it (since all ancestors or descendants will be marked as well).
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
41 """
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
42
23343
f8a2647fe020 setdiscovery: avoid a full changelog graph traversal
Siddharth Agarwal <sid0@fb.com>
parents: 23192
diff changeset
43 from node import nullid, nullrev
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
44 from i18n import _
20034
1e5b38a919dd cleanup: move stdlib imports to their own import statement
Augie Fackler <raf@durin42.com>
parents: 17426
diff changeset
45 import random
1e5b38a919dd cleanup: move stdlib imports to their own import statement
Augie Fackler <raf@durin42.com>
parents: 17426
diff changeset
46 import util, dagutil
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
47
23814
6a5877a73141 setdiscovery: drop the 'always' argument to '_updatesample'
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23813
diff changeset
48 def _updatesample(dag, nodes, sample, quicksamplesize=0):
23809
9ca2eb881b53 setdiscovery: document the '_updatesample' function
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23808
diff changeset
49 """update an existing sample to match the expected size
9ca2eb881b53 setdiscovery: document the '_updatesample' function
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23808
diff changeset
50
9ca2eb881b53 setdiscovery: document the '_updatesample' function
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23808
diff changeset
51 The sample is updated with nodes exponentially distant from each head of the
9ca2eb881b53 setdiscovery: document the '_updatesample' function
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23808
diff changeset
52 <nodes> set. (H~1, H~2, H~4, H~8, etc).
9ca2eb881b53 setdiscovery: document the '_updatesample' function
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23808
diff changeset
53
9ca2eb881b53 setdiscovery: document the '_updatesample' function
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23808
diff changeset
54 If a target size is specified, the sampling will stop once this size is
9ca2eb881b53 setdiscovery: document the '_updatesample' function
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23808
diff changeset
55 reached. Otherwise sampling will happen until roots of the <nodes> set are
9ca2eb881b53 setdiscovery: document the '_updatesample' function
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23808
diff changeset
56 reached.
9ca2eb881b53 setdiscovery: document the '_updatesample' function
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23808
diff changeset
57
9ca2eb881b53 setdiscovery: document the '_updatesample' function
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23808
diff changeset
58 :dag: a dag object from dagutil
9ca2eb881b53 setdiscovery: document the '_updatesample' function
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23808
diff changeset
59 :nodes: set of nodes we want to discover (if None, assume the whole dag)
9ca2eb881b53 setdiscovery: document the '_updatesample' function
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23808
diff changeset
60 :sample: a sample to update
9ca2eb881b53 setdiscovery: document the '_updatesample' function
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23808
diff changeset
61 :quicksamplesize: optional target size of the sample"""
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
62 # if nodes is empty we scan the entire graph
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
63 if nodes:
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
64 heads = dag.headsetofconnecteds(nodes)
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
65 else:
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
66 heads = dag.heads()
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
67 dist = {}
16834
cafd8a8fb713 util: subclass deque for Python 2.4 backwards compatibility
Bryan O'Sullivan <bryano@fb.com>
parents: 16683
diff changeset
68 visit = util.deque(heads)
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
69 seen = set()
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
70 factor = 1
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
71 while visit:
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
72 curr = visit.popleft()
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
73 if curr in seen:
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
74 continue
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
75 d = dist.setdefault(curr, 1)
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
76 if d > factor:
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
77 factor *= 2
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
78 if d == factor:
23814
6a5877a73141 setdiscovery: drop the 'always' argument to '_updatesample'
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23813
diff changeset
79 sample.add(curr)
6a5877a73141 setdiscovery: drop the 'always' argument to '_updatesample'
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23813
diff changeset
80 if quicksamplesize and (len(sample) >= quicksamplesize):
6a5877a73141 setdiscovery: drop the 'always' argument to '_updatesample'
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23813
diff changeset
81 return
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
82 seen.add(curr)
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
83 for p in dag.parents(curr):
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
84 if not nodes or p in nodes:
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
85 dist.setdefault(p, d + 1)
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
86 visit.append(p)
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
87
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
88 def _setupsample(dag, nodes, size):
15063
c20688b7c061 setdiscovery: fix hang when #heads>200 (issue2971)
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 14981
diff changeset
89 always = dag.headsetofconnecteds(nodes)
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
90 desiredlen = size - len(always)
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
91 if desiredlen <= 0:
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
92 # This could be bad if there are very many heads, all unknown to the
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
93 # server. We're counting on long request support here.
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
94 return always, None, desiredlen
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
95 return always, set(), desiredlen
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
96
23806
d6cbbe3baef0 setdiscovery: drop unused 'initial' argument for '_takequicksample'
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23747
diff changeset
97 def _takequicksample(dag, nodes, size):
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
98 always, sample, desiredlen = _setupsample(dag, nodes, size)
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
99 if sample is None:
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
100 return always
23814
6a5877a73141 setdiscovery: drop the 'always' argument to '_updatesample'
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23813
diff changeset
101 sample = always
6a5877a73141 setdiscovery: drop the 'always' argument to '_updatesample'
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23813
diff changeset
102 _updatesample(dag, None, sample, quicksamplesize=size)
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
103 return sample
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
104
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
105 def _takefullsample(dag, nodes, size):
23814
6a5877a73141 setdiscovery: drop the 'always' argument to '_updatesample'
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23813
diff changeset
106 sample = dag.headsetofconnecteds(nodes)
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
107 # update from heads
23814
6a5877a73141 setdiscovery: drop the 'always' argument to '_updatesample'
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23813
diff changeset
108 _updatesample(dag, nodes, sample)
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
109 # update from roots
23814
6a5877a73141 setdiscovery: drop the 'always' argument to '_updatesample'
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23813
diff changeset
110 _updatesample(dag.inverse(), nodes, sample)
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
111 assert sample
23810
b681d3a2bf04 setdiscovery: randomly pick between heads and sample when taking full sample
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23809
diff changeset
112 sample = _limitsample(sample, size)
b681d3a2bf04 setdiscovery: randomly pick between heads and sample when taking full sample
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23809
diff changeset
113 if len(sample) < size:
b681d3a2bf04 setdiscovery: randomly pick between heads and sample when taking full sample
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23809
diff changeset
114 more = size - len(sample)
b681d3a2bf04 setdiscovery: randomly pick between heads and sample when taking full sample
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23809
diff changeset
115 sample.update(random.sample(list(nodes - sample), more))
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
116 return sample
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
117
23083
ee45f5c2ffcc setdiscovery: extract sample limitation in a `_limitsample` function
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 20656
diff changeset
118 def _limitsample(sample, desiredlen):
ee45f5c2ffcc setdiscovery: extract sample limitation in a `_limitsample` function
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 20656
diff changeset
119 """return a random subset of sample of at most desiredlen item"""
ee45f5c2ffcc setdiscovery: extract sample limitation in a `_limitsample` function
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 20656
diff changeset
120 if len(sample) > desiredlen:
ee45f5c2ffcc setdiscovery: extract sample limitation in a `_limitsample` function
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 20656
diff changeset
121 sample = set(random.sample(sample, desiredlen))
ee45f5c2ffcc setdiscovery: extract sample limitation in a `_limitsample` function
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 20656
diff changeset
122 return sample
ee45f5c2ffcc setdiscovery: extract sample limitation in a `_limitsample` function
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 20656
diff changeset
123
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
124 def findcommonheads(ui, local, remote,
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
125 initialsamplesize=100,
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
126 fullsamplesize=200,
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
127 abortwhenunrelated=True):
14206
2bf60f158ecb setdiscovery: limit lines to 80 characters
Steven Brown <StevenGBrown@gmail.com>
parents: 14164
diff changeset
128 '''Return a tuple (common, anyincoming, remoteheads) used to identify
2bf60f158ecb setdiscovery: limit lines to 80 characters
Steven Brown <StevenGBrown@gmail.com>
parents: 14164
diff changeset
129 missing nodes from or in remote.
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
130 '''
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
131 roundtrips = 0
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
132 cl = local.changelog
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
133 dag = dagutil.revlogdag(cl)
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
134
14624
f03c82d1f50a setdiscovery: batch heads and known(ownheads)
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 14206
diff changeset
135 # early exit if we know all the specified remote heads already
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
136 ui.debug("query 1; heads\n")
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
137 roundtrips += 1
14624
f03c82d1f50a setdiscovery: batch heads and known(ownheads)
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 14206
diff changeset
138 ownheads = dag.heads()
23084
3ef893520a85 setdiscovery: limit the size of the initial sample (issue4411)
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23083
diff changeset
139 sample = _limitsample(ownheads, initialsamplesize)
23192
73cfaa348650 discovery: indices between sample and yesno must match (issue4438)
Mads Kiilerich <madski@unity3d.com>
parents: 23191
diff changeset
140 # indices between sample and externalized version must match
73cfaa348650 discovery: indices between sample and yesno must match (issue4438)
Mads Kiilerich <madski@unity3d.com>
parents: 23191
diff changeset
141 sample = list(sample)
14624
f03c82d1f50a setdiscovery: batch heads and known(ownheads)
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 14206
diff changeset
142 if remote.local():
f03c82d1f50a setdiscovery: batch heads and known(ownheads)
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 14206
diff changeset
143 # stopgap until we have a proper localpeer that supports batch()
17204
4feb55e6931f localpeer: return only visible heads and branchmap
Pierre-Yves David <pierre-yves.david@ens-lyon.org>
parents: 17191
diff changeset
144 srvheadhashes = remote.heads()
14624
f03c82d1f50a setdiscovery: batch heads and known(ownheads)
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 14206
diff changeset
145 yesno = remote.known(dag.externalizeall(sample))
f03c82d1f50a setdiscovery: batch heads and known(ownheads)
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 14206
diff changeset
146 elif remote.capable('batch'):
f03c82d1f50a setdiscovery: batch heads and known(ownheads)
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 14206
diff changeset
147 batch = remote.batch()
f03c82d1f50a setdiscovery: batch heads and known(ownheads)
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 14206
diff changeset
148 srvheadhashesref = batch.heads()
f03c82d1f50a setdiscovery: batch heads and known(ownheads)
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 14206
diff changeset
149 yesnoref = batch.known(dag.externalizeall(sample))
f03c82d1f50a setdiscovery: batch heads and known(ownheads)
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 14206
diff changeset
150 batch.submit()
f03c82d1f50a setdiscovery: batch heads and known(ownheads)
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 14206
diff changeset
151 srvheadhashes = srvheadhashesref.value
f03c82d1f50a setdiscovery: batch heads and known(ownheads)
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 14206
diff changeset
152 yesno = yesnoref.value
f03c82d1f50a setdiscovery: batch heads and known(ownheads)
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 14206
diff changeset
153 else:
17424
e7cfe3587ea4 fix trivial spelling errors
Mads Kiilerich <mads@kiilerich.com>
parents: 17204
diff changeset
154 # compatibility with pre-batch, but post-known remotes during 1.9
e7cfe3587ea4 fix trivial spelling errors
Mads Kiilerich <mads@kiilerich.com>
parents: 17204
diff changeset
155 # development
14624
f03c82d1f50a setdiscovery: batch heads and known(ownheads)
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 14206
diff changeset
156 srvheadhashes = remote.heads()
f03c82d1f50a setdiscovery: batch heads and known(ownheads)
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 14206
diff changeset
157 sample = []
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
158
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
159 if cl.tip() == nullid:
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
160 if srvheadhashes != [nullid]:
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
161 return [nullid], True, srvheadhashes
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
162 return [nullid], False, []
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
163
14206
2bf60f158ecb setdiscovery: limit lines to 80 characters
Steven Brown <StevenGBrown@gmail.com>
parents: 14164
diff changeset
164 # start actual discovery (we note this before the next "if" for
2bf60f158ecb setdiscovery: limit lines to 80 characters
Steven Brown <StevenGBrown@gmail.com>
parents: 14164
diff changeset
165 # compatibility reasons)
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
166 ui.status(_("searching for changes\n"))
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
167
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
168 srvheads = dag.internalizeall(srvheadhashes, filterunknown=True)
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
169 if len(srvheads) == len(srvheadhashes):
14833
308e1b5acc87 discovery: quiet note about heads
Matt Mackall <mpm@selenic.com>
parents: 14624
diff changeset
170 ui.debug("all remote heads known locally\n")
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
171 return (srvheadhashes, False, srvheadhashes,)
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
172
23191
86c35b7ae300 discovery: limit 'all local heads known remotely' to real 'all' (issue4438)
Mads Kiilerich <madski@unity3d.com>
parents: 23130
diff changeset
173 if sample and len(ownheads) <= initialsamplesize and util.all(yesno):
15497
9bea3aed6ee1 add missing localization markup
Mads Kiilerich <mads@kiilerich.com>
parents: 15063
diff changeset
174 ui.note(_("all local heads known remotely\n"))
14624
f03c82d1f50a setdiscovery: batch heads and known(ownheads)
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 14206
diff changeset
175 ownheadhashes = dag.externalizeall(ownheads)
f03c82d1f50a setdiscovery: batch heads and known(ownheads)
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 14206
diff changeset
176 return (ownheadhashes, True, srvheadhashes,)
f03c82d1f50a setdiscovery: batch heads and known(ownheads)
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 14206
diff changeset
177
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
178 # full blown discovery
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
179
16683
525fdb738975 cleanup: eradicate long lines
Brodie Rao <brodie@sf.io>
parents: 15713
diff changeset
180 # own nodes I know we both know
23343
f8a2647fe020 setdiscovery: avoid a full changelog graph traversal
Siddharth Agarwal <sid0@fb.com>
parents: 23192
diff changeset
181 # treat remote heads (and maybe own heads) as a first implicit sample
f8a2647fe020 setdiscovery: avoid a full changelog graph traversal
Siddharth Agarwal <sid0@fb.com>
parents: 23192
diff changeset
182 # response
f8a2647fe020 setdiscovery: avoid a full changelog graph traversal
Siddharth Agarwal <sid0@fb.com>
parents: 23192
diff changeset
183 common = cl.incrementalmissingrevs(srvheads)
f8a2647fe020 setdiscovery: avoid a full changelog graph traversal
Siddharth Agarwal <sid0@fb.com>
parents: 23192
diff changeset
184 commoninsample = set(n for i, n in enumerate(sample) if yesno[i])
f8a2647fe020 setdiscovery: avoid a full changelog graph traversal
Siddharth Agarwal <sid0@fb.com>
parents: 23192
diff changeset
185 common.addbases(commoninsample)
23746
4ef2f2fa8b8b setdiscovery: drop shadowed 'undecided' assignment
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23343
diff changeset
186 # own nodes where I don't know if remote knows them
23343
f8a2647fe020 setdiscovery: avoid a full changelog graph traversal
Siddharth Agarwal <sid0@fb.com>
parents: 23192
diff changeset
187 undecided = set(common.missingancestors(ownheads))
16683
525fdb738975 cleanup: eradicate long lines
Brodie Rao <brodie@sf.io>
parents: 15713
diff changeset
188 # own nodes I know remote lacks
525fdb738975 cleanup: eradicate long lines
Brodie Rao <brodie@sf.io>
parents: 15713
diff changeset
189 missing = set()
525fdb738975 cleanup: eradicate long lines
Brodie Rao <brodie@sf.io>
parents: 15713
diff changeset
190
14624
f03c82d1f50a setdiscovery: batch heads and known(ownheads)
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 14206
diff changeset
191 full = False
f03c82d1f50a setdiscovery: batch heads and known(ownheads)
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 14206
diff changeset
192 while undecided:
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
193
14624
f03c82d1f50a setdiscovery: batch heads and known(ownheads)
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 14206
diff changeset
194 if sample:
f03c82d1f50a setdiscovery: batch heads and known(ownheads)
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 14206
diff changeset
195 missinginsample = [n for i, n in enumerate(sample) if not yesno[i]]
f03c82d1f50a setdiscovery: batch heads and known(ownheads)
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 14206
diff changeset
196 missing.update(dag.descendantset(missinginsample, missing))
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
197
14624
f03c82d1f50a setdiscovery: batch heads and known(ownheads)
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 14206
diff changeset
198 undecided.difference_update(missing)
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
199
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
200 if not undecided:
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
201 break
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
202
23747
f82173a90c2c setdiscovery: factorize similar sampling code
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23746
diff changeset
203 if full or common.hasbases():
f82173a90c2c setdiscovery: factorize similar sampling code
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23746
diff changeset
204 if full:
f82173a90c2c setdiscovery: factorize similar sampling code
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23746
diff changeset
205 ui.note(_("sampling from both directions\n"))
f82173a90c2c setdiscovery: factorize similar sampling code
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23746
diff changeset
206 else:
f82173a90c2c setdiscovery: factorize similar sampling code
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23746
diff changeset
207 ui.debug("taking initial sample\n")
23807
e97e363a7000 setdiscovery: delay sample building calls to gather them in a single place
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23806
diff changeset
208 samplefunc = _takefullsample
23130
ced632394371 setdiscovery: limit the size of all sample (issue4411)
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23084
diff changeset
209 targetsize = fullsamplesize
14624
f03c82d1f50a setdiscovery: batch heads and known(ownheads)
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 14206
diff changeset
210 else:
f03c82d1f50a setdiscovery: batch heads and known(ownheads)
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 14206
diff changeset
211 # use even cheaper initial sample
f03c82d1f50a setdiscovery: batch heads and known(ownheads)
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 14206
diff changeset
212 ui.debug("taking quick initial sample\n")
23807
e97e363a7000 setdiscovery: delay sample building calls to gather them in a single place
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23806
diff changeset
213 samplefunc = _takequicksample
23130
ced632394371 setdiscovery: limit the size of all sample (issue4411)
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23084
diff changeset
214 targetsize = initialsamplesize
23808
07d0f59e0ba7 setdiscovery: avoid calling any sample building if the undecided set is small
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23807
diff changeset
215 if len(undecided) < targetsize:
07d0f59e0ba7 setdiscovery: avoid calling any sample building if the undecided set is small
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23807
diff changeset
216 sample = list(undecided)
07d0f59e0ba7 setdiscovery: avoid calling any sample building if the undecided set is small
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23807
diff changeset
217 else:
07d0f59e0ba7 setdiscovery: avoid calling any sample building if the undecided set is small
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23807
diff changeset
218 sample = samplefunc(dag, undecided, targetsize)
07d0f59e0ba7 setdiscovery: avoid calling any sample building if the undecided set is small
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23807
diff changeset
219 sample = _limitsample(sample, targetsize)
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
220
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
221 roundtrips += 1
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
222 ui.progress(_('searching'), roundtrips, unit=_('queries'))
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
223 ui.debug("query %i; still undecided: %i, sample size is: %i\n"
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
224 % (roundtrips, len(undecided), len(sample)))
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
225 # indices between sample and externalized version must match
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
226 sample = list(sample)
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
227 yesno = remote.known(dag.externalizeall(sample))
14624
f03c82d1f50a setdiscovery: batch heads and known(ownheads)
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 14206
diff changeset
228 full = True
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
229
23343
f8a2647fe020 setdiscovery: avoid a full changelog graph traversal
Siddharth Agarwal <sid0@fb.com>
parents: 23192
diff changeset
230 if sample:
f8a2647fe020 setdiscovery: avoid a full changelog graph traversal
Siddharth Agarwal <sid0@fb.com>
parents: 23192
diff changeset
231 commoninsample = set(n for i, n in enumerate(sample) if yesno[i])
f8a2647fe020 setdiscovery: avoid a full changelog graph traversal
Siddharth Agarwal <sid0@fb.com>
parents: 23192
diff changeset
232 common.addbases(commoninsample)
f8a2647fe020 setdiscovery: avoid a full changelog graph traversal
Siddharth Agarwal <sid0@fb.com>
parents: 23192
diff changeset
233 common.removeancestorsfrom(undecided)
f8a2647fe020 setdiscovery: avoid a full changelog graph traversal
Siddharth Agarwal <sid0@fb.com>
parents: 23192
diff changeset
234
f8a2647fe020 setdiscovery: avoid a full changelog graph traversal
Siddharth Agarwal <sid0@fb.com>
parents: 23192
diff changeset
235 # heads(common) == heads(common.bases) since common represents common.bases
f8a2647fe020 setdiscovery: avoid a full changelog graph traversal
Siddharth Agarwal <sid0@fb.com>
parents: 23192
diff changeset
236 # and all its ancestors
f8a2647fe020 setdiscovery: avoid a full changelog graph traversal
Siddharth Agarwal <sid0@fb.com>
parents: 23192
diff changeset
237 result = dag.headsetofconnecteds(common.bases)
f8a2647fe020 setdiscovery: avoid a full changelog graph traversal
Siddharth Agarwal <sid0@fb.com>
parents: 23192
diff changeset
238 # common.bases can include nullrev, but our contract requires us to not
f8a2647fe020 setdiscovery: avoid a full changelog graph traversal
Siddharth Agarwal <sid0@fb.com>
parents: 23192
diff changeset
239 # return any heads in that case, so discard that
f8a2647fe020 setdiscovery: avoid a full changelog graph traversal
Siddharth Agarwal <sid0@fb.com>
parents: 23192
diff changeset
240 result.discard(nullrev)
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
241 ui.progress(_('searching'), None)
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
242 ui.debug("%d total queries\n" % roundtrips)
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
243
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
244 if not result and srvheadhashes != [nullid]:
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
245 if abortwhenunrelated:
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
246 raise util.Abort(_("repository is unrelated"))
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
247 else:
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
248 ui.warn(_("warning: repository is unrelated\n"))
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
249 return (set([nullid]), True, srvheadhashes,)
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
250
14981
192e02680d09 setdiscovery: return anyincoming=False when remote's only head is nullid
Andrew Pritchard <andrewp@fogcreek.com>
parents: 14833
diff changeset
251 anyincoming = (srvheadhashes != [nullid])
192e02680d09 setdiscovery: return anyincoming=False when remote's only head is nullid
Andrew Pritchard <andrewp@fogcreek.com>
parents: 14833
diff changeset
252 return dag.externalizeall(result), anyincoming, srvheadhashes