2
Some quick notes about get_ancestry results when tested with bzr's ancestry
4
4401 Canonical.com Patch Queue Manager 2009-06-03 [merge]
5
revision-id:pqm@pqm.ubuntu.com-20090603080435-2gbqwzvbx31zr7ok
6
(andrew) Fix tracebacks when receiving error responses to
7
BzrDirFormat.initialize* RPCs.
9
This is in a fully packed repository, so there is only a single btree index,
10
and thus there should be only a small handful of genuinely missing keys.
12
First call to get_ancestry reads the root node, +1 internal node, +1 leaf node
13
It is able to expand from 1 key into 74 already know parent keys, and 75
16
Second call expands to 2 internal nodes (all of them), 14 leaf nodes, and 398
17
known parents, and 82 search tips. Of those 82 tips, 51 of them are present on
18
nodes that we had already read, we just didn't find them because we don't
19
search outside the current page for parent keys.
24
36 actually present on an unread LeafNode
25
794 parent_map (found parents)
48
103 really missing # went down?
50
103 leaf nodes (out of 221 total)
53
455 search keys (down?)
88
# I wonder if at this point a major limitation is the parents that are read but
89
# are not on the same page. You could have flip-flop cases, where you have:
90
# pqm@ => john@ => pqm@ => john@ => pqm@ => john@
91
# And each one of those is going to require another re-entrance into
94
gen search_keys parent_map parents_found
95
not loaded num_leaf_nodes_cached
101
5 246 108 2034 52 791
102
6 431 119 3272 79 1238
103
7 529 103 5472 103 2200
104
8 455 75 7759 119 2287
105
9 430 157 10004 130 2245
106
10 407 157 11788 148 1784
107
11 400 127 13502 165 1714
108
12 389 145 15500 179 1998
109
13 289 72 17586 194 2086
110
14 210 46 19185 203 1599
111
15 140 2 20474 210 1289
112
16 63 5 21331 211 857
113
17 53 9 21761 212 430
114
18 30 3 22159 214 398
115
19 23 4 22700 216 541
116
20 15 1 23055 217 355
117
21 11 1 23266 218 211
149
# Note the *really* long tail where after we have 20k keys we've pretty much
150
# read all the pages, and at this point, we only have a small handful
151
# (sometimes only 1) parent to search for.
152
# Then again, if you look at the number of parents found count, it is going
153
# up by easily 20-100 at each step, so it is still fairly efficient.
155
# Also, we can compare this to the performance of just looping over
156
# 'get_parent_map()' calls
157
gen search_keys next_keys
1270
Note that it takes 1100 trips to the index layer to generate the 25k revision
1271
history, versus only 50 when we fast-loop on a given index node.
1273
Note also that from 300=>1100 we are iterating over a single revision. This is
1274
because bzr has a *very* long single-parent ancestry because of Martin's
1275
initial work. However a project like OOo is going to have *much* more of this.
1277
We do get to 20k revisions in 107 round trips, but it only takes 15 round trips
1278
to do the same with the get_ancestry version.
1280
get_parent_map peaks at a width of 360 revisions per lookup
1281
get_ancestry peaks at 529 search_keys, but peaks at finding 2287 keys in a
1282
single pass, and only has 1 pass where it failed to find more than 1 parent,
1283
and only 4 passes where it found fewer than 10.
1285
So how to handle the multiple-index scenario. The issue is that ancestry can
1286
jump back and forth between indexes. Though if you consider it closely, it is
1287
actually quite unlikely to do so, because that would imply knowing about a
1288
revision in a previous pack, before that revision was transferred. Possible,
1289
yes (especially with ghosts), likely: no.
1291
However, certainly parents that are missing in one index are likely to be found
1292
in another index, and we don't really know the actual ordering between the
1295
# We want to keep looping until keys_to_search converges to 0, or we
1296
# determine that all indexes are missing the keys that we are currently
1298
# I think we could do something like take the set intersection of the keys
1299
# that each index says are missing, and then after a given pass, we would
1300
# stop searching all of the keys that all indexes are missing.
1302
My thought is to have a loop like:
1304
keys_to_search = set(input_keys)
1305
missing_keys = set()
1306
while keys_to_search:
1307
all_index_missing = None
1308
for index in self._indices:
1309
this_search_keys = keys_to_search
1310
# keys this *this* index cannot resolve
1311
index_missing_keys = set()
1312
while this_search_keys:
1313
this_missing_keys, this_search_keys =
1314
index.get_ancestry(this_search_keys, ...)
1315
index_missing_keys.update(this_missing_keys)
1316
keys_to_search = index_missing_keys
1317
if all_index_missing is None:
1318
all_index_missing = set(index_missing_keys)
1320
all_index_missing = all_index_missing.intersection(index_missing_keys)
1321
missing_keys.update(all_index_missing)
1322
keys_to_search = keys_to_search.difference(all_index_missing)