bzr branch
http://gegoxaren.bato24.eu/bzr/brz/remove-bazaar
|
3735.2.8
by Robert Collins
New chk_map module for use in tree based inventory storage. |
1 |
# Copyright (C) 2008 Canonical Ltd
|
2 |
#
|
|
3 |
# This program is free software; you can redistribute it and/or modify
|
|
4 |
# it under the terms of the GNU General Public License as published by
|
|
5 |
# the Free Software Foundation; either version 2 of the License, or
|
|
6 |
# (at your option) any later version.
|
|
7 |
#
|
|
8 |
# This program is distributed in the hope that it will be useful,
|
|
9 |
# but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
10 |
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
|
11 |
# GNU General Public License for more details.
|
|
12 |
#
|
|
13 |
# You should have received a copy of the GNU General Public License
|
|
14 |
# along with this program; if not, write to the Free Software
|
|
15 |
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
|
|
16 |
||
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
17 |
"""Persistent maps from tuple_of_strings->string using CHK stores.
|
18 |
||
19 |
Overview and current status:
|
|
20 |
||
21 |
The CHKMap class implements a dict from tuple_of_strings->string by using a trie
|
|
22 |
with internal nodes of 8-bit fan out; The key tuples are mapped to strings by
|
|
23 |
joining them by \x00, and \x00 padding shorter keys out to the length of the
|
|
24 |
longest key. Leaf nodes are packed as densely as possible, and internal nodes
|
|
|
3735.19.1
by Ian Clatworthy
CHKMap cleanups |
25 |
are all an additional 8-bits wide leading to a sparse upper tree.
|
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
26 |
|
27 |
Updates to a CHKMap are done preferentially via the apply_delta method, to
|
|
28 |
allow optimisation of the update operation; but individual map/unmap calls are
|
|
29 |
possible and supported. All changes via map/unmap are buffered in memory until
|
|
30 |
the _save method is called to force serialisation of the tree. apply_delta
|
|
31 |
performs a _save implicitly.
|
|
32 |
||
33 |
TODO:
|
|
34 |
-----
|
|
35 |
||
36 |
Densely packed upper nodes.
|
|
37 |
||
38 |
"""
|
|
|
3735.2.8
by Robert Collins
New chk_map module for use in tree based inventory storage. |
39 |
|
|
3735.2.31
by Robert Collins
CHKMap.iter_changes |
40 |
import heapq |
|
3735.2.62
by Robert Collins
Create a rudimentary CHK page cache. |
41 |
|
|
3735.9.18
by John Arbash Meinel
Make the versionedfile import lazy. |
42 |
from bzrlib import lazy_import |
43 |
lazy_import.lazy_import(globals(), """ |
|
|
3735.16.3
by John Arbash Meinel
Add functions for _search_key_16 and _search_key_255 and some basic tests for them. |
44 |
import zlib
|
45 |
import struct
|
|
46 |
||
|
3735.9.18
by John Arbash Meinel
Make the versionedfile import lazy. |
47 |
from bzrlib import versionedfile
|
48 |
""") |
|
|
3735.16.7
by John Arbash Meinel
Start parameterizing CHKInventory and CHKSerializer so that we can |
49 |
from bzrlib import ( |
|
3735.2.98
by John Arbash Meinel
Merge bzr.dev 4032. Resolve the new streaming fetch. |
50 |
errors, |
|
3735.16.7
by John Arbash Meinel
Start parameterizing CHKInventory and CHKSerializer so that we can |
51 |
lru_cache, |
|
3735.17.1
by John Arbash Meinel
Change the serialized form for leaf nodes. |
52 |
osutils, |
|
3735.16.7
by John Arbash Meinel
Start parameterizing CHKInventory and CHKSerializer so that we can |
53 |
registry, |
|
3735.2.112
by John Arbash Meinel
Merge Ian's try/except helper for aiding in debugging strange failures. |
54 |
trace, |
|
3735.16.7
by John Arbash Meinel
Start parameterizing CHKInventory and CHKSerializer so that we can |
55 |
)
|
|
3735.2.62
by Robert Collins
Create a rudimentary CHK page cache. |
56 |
|
|
3735.19.1
by Ian Clatworthy
CHKMap cleanups |
57 |
# approx 4MB
|
|
3735.14.5
by John Arbash Meinel
Change _check_remap to only page in a batch of children at a time. |
58 |
# If each line is 50 bytes, and you have 255 internal pages, with 255-way fan
|
59 |
# out, it takes 3.1MB to cache the layer.
|
|
60 |
_PAGE_CACHE_SIZE = 4*1024*1024 |
|
|
3735.14.2
by John Arbash Meinel
Finish using the page cache as part of _check_remap, add debugging functions |
61 |
# We are caching bytes so len(value) is perfectly accurate
|
62 |
_page_cache = lru_cache.LRUSizeCache(_PAGE_CACHE_SIZE) |
|
|
3735.2.8
by Robert Collins
New chk_map module for use in tree based inventory storage. |
63 |
|
|
3735.2.123
by Ian Clatworthy
only check for remap if changes are interesting in size |
64 |
# If a ChildNode falls below this many bytes, we check for a remap
|
65 |
_INTERESTING_NEW_SIZE = 50 |
|
66 |
# If a ChildNode shrinks by more than this amount, we check for a remap
|
|
67 |
_INTERESTING_SHRINKAGE_LIMIT = 20 |
|
68 |
# If we delete more than this many nodes applying a delta, we check for a remap
|
|
69 |
_INTERESTING_DELETES_LIMIT = 5 |
|
70 |
||
|
3735.2.8
by Robert Collins
New chk_map module for use in tree based inventory storage. |
71 |
|
|
3735.16.6
by John Arbash Meinel
Include a _search_key_plain function. |
72 |
def _search_key_plain(key): |
73 |
"""Map the key tuple into a search string that just uses the key bytes.""" |
|
74 |
return '\x00'.join(key) |
|
75 |
||
76 |
||
|
3735.2.87
by Vincent Ladeuil
Same player shoots again, zlib.crc32, we'll get you. |
77 |
def _crc32(bit): |
|
3735.2.83
by Vincent Ladeuil
Better fix with explanation for zlib.crc32. |
78 |
# Depending on python version and platform, zlib.crc32 will return either a
|
79 |
# signed (<= 2.5 >= 3.0) or an unsigned (2.5, 2.6).
|
|
80 |
# http://docs.python.org/library/zlib.html recommends using a mask to force
|
|
81 |
# an unsigned value to ensure the same numeric value (unsigned) is obtained
|
|
82 |
# across all python versions and platforms.
|
|
|
3735.2.84
by John Arbash Meinel
Comment about using using 0xFFFFFFFF as part of _search_key_255 |
83 |
# Note: However, on 32-bit platforms this causes an upcast to PyLong, which
|
84 |
# are generally slower than PyInts. However, if performance becomes
|
|
85 |
# critical, we should probably write the whole thing as an extension
|
|
86 |
# anyway.
|
|
87 |
# Though we really don't need that 32nd bit of accuracy. (even 2**24
|
|
88 |
# is probably enough node fan out for realistic trees.)
|
|
|
3735.2.87
by Vincent Ladeuil
Same player shoots again, zlib.crc32, we'll get you. |
89 |
return zlib.crc32(bit)&0xFFFFFFFF |
90 |
||
91 |
||
92 |
def _search_key_16(key): |
|
93 |
"""Map the key tuple into a search key string which has 16-way fan out.""" |
|
94 |
return '\x00'.join(['%08X' % _crc32(bit) for bit in key]) |
|
95 |
||
96 |
||
97 |
def _search_key_255(key): |
|
98 |
"""Map the key tuple into a search key string which has 255-way fan out. |
|
99 |
||
100 |
We use 255-way because '\n' is used as a delimiter, and causes problems
|
|
101 |
while parsing.
|
|
102 |
"""
|
|
103 |
bytes = '\x00'.join([struct.pack('>L', _crc32(bit)) for bit in key]) |
|
|
3735.16.3
by John Arbash Meinel
Add functions for _search_key_16 and _search_key_255 and some basic tests for them. |
104 |
return bytes.replace('\n', '_') |
105 |
||
106 |
||
|
3735.16.7
by John Arbash Meinel
Start parameterizing CHKInventory and CHKSerializer so that we can |
107 |
search_key_registry = registry.Registry() |
108 |
search_key_registry.register('plain', _search_key_plain) |
|
109 |
search_key_registry.register('hash-16-way', _search_key_16) |
|
110 |
search_key_registry.register('hash-255-way', _search_key_255) |
|
111 |
||
112 |
||
|
3735.2.8
by Robert Collins
New chk_map module for use in tree based inventory storage. |
113 |
class CHKMap(object): |
114 |
"""A persistent map from string to string backed by a CHK store.""" |
|
115 |
||
|
3735.16.1
by John Arbash Meinel
(broken) Start tracking down more code that needs to pass around the 'search_key_func' |
116 |
def __init__(self, store, root_key, search_key_func=None): |
|
3735.2.8
by Robert Collins
New chk_map module for use in tree based inventory storage. |
117 |
"""Create a CHKMap object. |
118 |
||
119 |
:param store: The store the CHKMap is stored in.
|
|
120 |
:param root_key: The root key of the map. None to create an empty
|
|
121 |
CHKMap.
|
|
|
3735.16.1
by John Arbash Meinel
(broken) Start tracking down more code that needs to pass around the 'search_key_func' |
122 |
:param search_key_func: A function mapping a key => bytes. These bytes
|
123 |
are then used by the internal nodes to split up leaf nodes into
|
|
124 |
multiple pages.
|
|
|
3735.2.8
by Robert Collins
New chk_map module for use in tree based inventory storage. |
125 |
"""
|
126 |
self._store = store |
|
|
3735.16.1
by John Arbash Meinel
(broken) Start tracking down more code that needs to pass around the 'search_key_func' |
127 |
if search_key_func is None: |
|
3735.16.6
by John Arbash Meinel
Include a _search_key_plain function. |
128 |
search_key_func = _search_key_plain |
|
3735.16.1
by John Arbash Meinel
(broken) Start tracking down more code that needs to pass around the 'search_key_func' |
129 |
self._search_key_func = search_key_func |
|
3735.2.8
by Robert Collins
New chk_map module for use in tree based inventory storage. |
130 |
if root_key is None: |
|
3735.16.1
by John Arbash Meinel
(broken) Start tracking down more code that needs to pass around the 'search_key_func' |
131 |
self._root_node = LeafNode(search_key_func=search_key_func) |
|
3735.2.8
by Robert Collins
New chk_map module for use in tree based inventory storage. |
132 |
else: |
|
3735.2.41
by Robert Collins
Make the parent_id_basename index be updated during CHKInventory.apply_delta. |
133 |
self._root_node = self._node_key(root_key) |
|
3735.2.8
by Robert Collins
New chk_map module for use in tree based inventory storage. |
134 |
|
135 |
def apply_delta(self, delta): |
|
136 |
"""Apply a delta to the map. |
|
137 |
||
138 |
:param delta: An iterable of old_key, new_key, new_value tuples.
|
|
139 |
If new_key is not None, then new_key->new_value is inserted
|
|
140 |
into the map; if old_key is not None, then the old mapping
|
|
141 |
of old_key is removed.
|
|
142 |
"""
|
|
|
3735.2.123
by Ian Clatworthy
only check for remap if changes are interesting in size |
143 |
delete_count = 0 |
|
3735.2.8
by Robert Collins
New chk_map module for use in tree based inventory storage. |
144 |
for old, new, value in delta: |
145 |
if old is not None and old != new: |
|
|
3735.2.122
by Ian Clatworthy
don't check_remap on every unmap call in CHKMap.apply_delta() |
146 |
self.unmap(old, check_remap=False) |
|
3735.2.123
by Ian Clatworthy
only check for remap if changes are interesting in size |
147 |
delete_count += 1 |
|
3735.2.8
by Robert Collins
New chk_map module for use in tree based inventory storage. |
148 |
for old, new, value in delta: |
149 |
if new is not None: |
|
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
150 |
self.map(new, value) |
|
3735.2.123
by Ian Clatworthy
only check for remap if changes are interesting in size |
151 |
if delete_count > _INTERESTING_DELETES_LIMIT: |
152 |
trace.mutter("checking remap as %d deletions", delete_count) |
|
|
3735.2.122
by Ian Clatworthy
don't check_remap on every unmap call in CHKMap.apply_delta() |
153 |
self._check_remap() |
|
3735.2.8
by Robert Collins
New chk_map module for use in tree based inventory storage. |
154 |
return self._save() |
155 |
||
156 |
def _ensure_root(self): |
|
157 |
"""Ensure that the root node is an object not a key.""" |
|
158 |
if type(self._root_node) == tuple: |
|
159 |
# Demand-load the root
|
|
|
3735.2.31
by Robert Collins
CHKMap.iter_changes |
160 |
self._root_node = self._get_node(self._root_node) |
161 |
||
162 |
def _get_node(self, node): |
|
163 |
"""Get a node. |
|
164 |
||
|
3735.19.1
by Ian Clatworthy
CHKMap cleanups |
165 |
Note that this does not update the _items dict in objects containing a
|
|
3735.2.31
by Robert Collins
CHKMap.iter_changes |
166 |
reference to this node. As such it does not prevent subsequent IO being
|
167 |
performed.
|
|
|
3735.11.1
by John Arbash Meinel
Clean up some trailing whitespace. |
168 |
|
|
3735.2.31
by Robert Collins
CHKMap.iter_changes |
169 |
:param node: A tuple key or node object.
|
170 |
:return: A node object.
|
|
171 |
"""
|
|
172 |
if type(node) == tuple: |
|
173 |
bytes = self._read_bytes(node) |
|
|
3735.16.2
by John Arbash Meinel
Start passing around the search_key_func in more places. |
174 |
return _deserialise(bytes, node, |
175 |
search_key_func=self._search_key_func) |
|
|
3735.2.31
by Robert Collins
CHKMap.iter_changes |
176 |
else: |
177 |
return node |
|
|
3735.2.8
by Robert Collins
New chk_map module for use in tree based inventory storage. |
178 |
|
179 |
def _read_bytes(self, key): |
|
|
3735.2.124
by Ian Clatworthy
use the page cache in CHKMap._read_bytes() |
180 |
try: |
181 |
return _page_cache[key] |
|
182 |
except KeyError: |
|
183 |
stream = self._store.get_record_stream([key], 'unordered', True) |
|
|
3735.23.1
by John Arbash Meinel
If you are going to read from the page cache, |
184 |
bytes = stream.next().get_bytes_as('fulltext') |
185 |
_page_cache[key] = bytes |
|
186 |
return bytes |
|
|
3735.2.8
by Robert Collins
New chk_map module for use in tree based inventory storage. |
187 |
|
|
3735.15.16
by John Arbash Meinel
Properly fix up the dump_tree tests, we now suppress the keys by default. |
188 |
def _dump_tree(self, include_keys=False): |
|
3735.9.2
by John Arbash Meinel
Add a _dump_tree helper that assists in debugging what is going on. |
189 |
"""Return the tree in a string representation.""" |
190 |
self._ensure_root() |
|
|
3735.15.15
by John Arbash Meinel
Change child_child to use _dump_tree, |
191 |
res = self._dump_tree_node(self._root_node, prefix='', indent='', |
192 |
include_keys=include_keys) |
|
|
3735.11.9
by John Arbash Meinel
Switch _dump_tree to returning trailing '\n' for nicer results |
193 |
res.append('') # Give a trailing '\n' |
|
3735.9.4
by John Arbash Meinel
Some small cleanups, and fix _dump_tree to handle in-progress nodes. |
194 |
return '\n'.join(res) |
|
3735.9.2
by John Arbash Meinel
Add a _dump_tree helper that assists in debugging what is going on. |
195 |
|
|
3735.15.15
by John Arbash Meinel
Change child_child to use _dump_tree, |
196 |
def _dump_tree_node(self, node, prefix, indent, include_keys=True): |
|
3735.9.2
by John Arbash Meinel
Add a _dump_tree helper that assists in debugging what is going on. |
197 |
"""For this node and all children, generate a string representation.""" |
198 |
result = [] |
|
|
3735.15.15
by John Arbash Meinel
Change child_child to use _dump_tree, |
199 |
if not include_keys: |
200 |
key_str = '' |
|
201 |
else: |
|
202 |
node_key = node.key() |
|
203 |
if node_key is not None: |
|
204 |
key_str = ' %s' % (node_key[0],) |
|
205 |
else: |
|
206 |
key_str = ' None' |
|
207 |
result.append('%s%r %s%s' % (indent, prefix, node.__class__.__name__, |
|
208 |
key_str)) |
|
|
3735.9.2
by John Arbash Meinel
Add a _dump_tree helper that assists in debugging what is going on. |
209 |
if isinstance(node, InternalNode): |
210 |
# Trigger all child nodes to get loaded
|
|
211 |
list(node._iter_nodes(self._store)) |
|
|
3735.9.4
by John Arbash Meinel
Some small cleanups, and fix _dump_tree to handle in-progress nodes. |
212 |
for prefix, sub in sorted(node._items.iteritems()): |
|
3735.15.15
by John Arbash Meinel
Change child_child to use _dump_tree, |
213 |
result.extend(self._dump_tree_node(sub, prefix, indent + ' ', |
214 |
include_keys=include_keys)) |
|
|
3735.9.2
by John Arbash Meinel
Add a _dump_tree helper that assists in debugging what is going on. |
215 |
else: |
|
3735.9.4
by John Arbash Meinel
Some small cleanups, and fix _dump_tree to handle in-progress nodes. |
216 |
for key, value in sorted(node._items.iteritems()): |
|
3735.2.109
by Vincent Ladeuil
Fixed as per John's comments |
217 |
# Don't use prefix nor indent here to line up when used in
|
218 |
# tests in conjunction with assertEqualDiff
|
|
|
3735.9.4
by John Arbash Meinel
Some small cleanups, and fix _dump_tree to handle in-progress nodes. |
219 |
result.append(' %r %r' % (key, value)) |
|
3735.9.2
by John Arbash Meinel
Add a _dump_tree helper that assists in debugging what is going on. |
220 |
return result |
221 |
||
|
3735.2.8
by Robert Collins
New chk_map module for use in tree based inventory storage. |
222 |
@classmethod
|
|
3735.19.1
by Ian Clatworthy
CHKMap cleanups |
223 |
def from_dict(klass, store, initial_value, maximum_size=0, key_width=1, |
224 |
search_key_func=None): |
|
|
3735.2.8
by Robert Collins
New chk_map module for use in tree based inventory storage. |
225 |
"""Create a CHKMap in store with initial_value as the content. |
|
3735.11.1
by John Arbash Meinel
Clean up some trailing whitespace. |
226 |
|
|
3735.2.8
by Robert Collins
New chk_map module for use in tree based inventory storage. |
227 |
:param store: The store to record initial_value in, a VersionedFiles
|
228 |
object with 1-tuple keys supporting CHK key generation.
|
|
229 |
:param initial_value: A dict to store in store. Its keys and values
|
|
230 |
must be bytestrings.
|
|
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
231 |
:param maximum_size: The maximum_size rule to apply to nodes. This
|
232 |
determines the size at which no new data is added to a single node.
|
|
|
3735.2.43
by Robert Collins
Teach CHKMap how to iter items in 2-tuple keyspaces. |
233 |
:param key_width: The number of elements in each key_tuple being stored
|
234 |
in this map.
|
|
|
3735.19.1
by Ian Clatworthy
CHKMap cleanups |
235 |
:param search_key_func: A function mapping a key => bytes. These bytes
|
236 |
are then used by the internal nodes to split up leaf nodes into
|
|
237 |
multiple pages.
|
|
238 |
:return: The root chk of the resulting CHKMap.
|
|
|
3735.2.8
by Robert Collins
New chk_map module for use in tree based inventory storage. |
239 |
"""
|
|
3735.19.1
by Ian Clatworthy
CHKMap cleanups |
240 |
result = CHKMap(store, None, search_key_func=search_key_func) |
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
241 |
result._root_node.set_maximum_size(maximum_size) |
|
3735.2.43
by Robert Collins
Teach CHKMap how to iter items in 2-tuple keyspaces. |
242 |
result._root_node._key_width = key_width |
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
243 |
delta = [] |
|
3735.2.8
by Robert Collins
New chk_map module for use in tree based inventory storage. |
244 |
for key, value in initial_value.items(): |
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
245 |
delta.append((None, key, value)) |
|
3735.19.1
by Ian Clatworthy
CHKMap cleanups |
246 |
return result.apply_delta(delta) |
|
3735.2.8
by Robert Collins
New chk_map module for use in tree based inventory storage. |
247 |
|
|
3735.2.30
by Robert Collins
Start iter_changes between CHKMap instances. |
248 |
def iter_changes(self, basis): |
249 |
"""Iterate over the changes between basis and self. |
|
250 |
||
251 |
:return: An iterator of tuples: (key, old_value, new_value). Old_value
|
|
252 |
is None for keys only in self; new_value is None for keys only in
|
|
253 |
basis.
|
|
254 |
"""
|
|
|
3735.2.31
by Robert Collins
CHKMap.iter_changes |
255 |
# Overview:
|
256 |
# Read both trees in lexographic, highest-first order.
|
|
257 |
# Any identical nodes we skip
|
|
258 |
# Any unique prefixes we output immediately.
|
|
259 |
# values in a leaf node are treated as single-value nodes in the tree
|
|
260 |
# which allows them to be not-special-cased. We know to output them
|
|
261 |
# because their value is a string, not a key(tuple) or node.
|
|
262 |
#
|
|
263 |
# corner cases to beware of when considering this function:
|
|
264 |
# *) common references are at different heights.
|
|
265 |
# consider two trees:
|
|
266 |
# {'a': LeafNode={'aaa':'foo', 'aab':'bar'}, 'b': LeafNode={'b'}}
|
|
267 |
# {'a': InternalNode={'aa':LeafNode={'aaa':'foo', 'aab':'bar'}, 'ab':LeafNode={'ab':'bar'}}
|
|
268 |
# 'b': LeafNode={'b'}}
|
|
269 |
# the node with aaa/aab will only be encountered in the second tree
|
|
270 |
# after reading the 'a' subtree, but it is encountered in the first
|
|
271 |
# tree immediately. Variations on this may have read internal nodes like this.
|
|
272 |
# we want to cut the entire pending subtree when we realise we have a common node.
|
|
|
3735.11.1
by John Arbash Meinel
Clean up some trailing whitespace. |
273 |
# For this we use a list of keys - the path to a node - and check the entire path is
|
|
3735.2.31
by Robert Collins
CHKMap.iter_changes |
274 |
# clean as we process each item.
|
275 |
if self._node_key(self._root_node) == self._node_key(basis._root_node): |
|
276 |
return
|
|
277 |
self._ensure_root() |
|
278 |
basis._ensure_root() |
|
279 |
excluded_keys = set() |
|
280 |
self_node = self._root_node |
|
281 |
basis_node = basis._root_node |
|
282 |
# A heap, each element is prefix, node(tuple/NodeObject/string),
|
|
283 |
# key_path (a list of tuples, tail-sharing down the tree.)
|
|
284 |
self_pending = [] |
|
285 |
basis_pending = [] |
|
|
3735.25.3
by John Arbash Meinel
Pre-filter when the nodes are identical. |
286 |
def process_node(node, path, a_map, pending): |
|
3735.2.31
by Robert Collins
CHKMap.iter_changes |
287 |
# take a node and expand it
|
288 |
node = a_map._get_node(node) |
|
289 |
if type(node) == LeafNode: |
|
290 |
path = (node._key, path) |
|
291 |
for key, value in node._items.items(): |
|
|
3735.25.2
by John Arbash Meinel
Change the data that is put on the queue. |
292 |
# For a LeafNode, the key is a serialized_key, rather than
|
293 |
# a search_key, but the heap is using search_keys
|
|
294 |
search_key = node._search_key_func(key) |
|
295 |
heapq.heappush(pending, (search_key, key, value, path)) |
|
|
3735.2.31
by Robert Collins
CHKMap.iter_changes |
296 |
else: |
297 |
# type(node) == InternalNode
|
|
298 |
path = (node._key, path) |
|
299 |
for prefix, child in node._items.items(): |
|
|
3735.25.2
by John Arbash Meinel
Change the data that is put on the queue. |
300 |
heapq.heappush(pending, (prefix, None, child, path)) |
|
3735.25.3
by John Arbash Meinel
Pre-filter when the nodes are identical. |
301 |
def process_common_internal_nodes(self_node, basis_node): |
302 |
self_items = set(self_node._items.items()) |
|
303 |
basis_items = set(basis_node._items.items()) |
|
304 |
path = (self_node._key, None) |
|
305 |
for prefix, child in self_items - basis_items: |
|
306 |
heapq.heappush(self_pending, (prefix, None, child, path)) |
|
307 |
path = (basis_node._key, None) |
|
308 |
for prefix, child in basis_items - self_items: |
|
309 |
heapq.heappush(basis_pending, (prefix, None, child, path)) |
|
310 |
def process_common_leaf_nodes(self_node, basis_node): |
|
311 |
self_items = set(self_node._items.items()) |
|
312 |
basis_items = set(basis_node._items.items()) |
|
313 |
path = (self_node._key, None) |
|
314 |
for key, value in self_items - basis_items: |
|
315 |
prefix = self._search_key_func(key) |
|
316 |
heapq.heappush(self_pending, (prefix, key, value, path)) |
|
317 |
path = (basis_node._key, None) |
|
318 |
for key, value in basis_items - self_items: |
|
319 |
prefix = basis._search_key_func(key) |
|
320 |
heapq.heappush(basis_pending, (prefix, key, value, path)) |
|
321 |
def process_common_prefix_nodes(self_node, self_path, |
|
322 |
basis_node, basis_path): |
|
323 |
# Would it be more efficient if we could request both at the same
|
|
324 |
# time?
|
|
325 |
self_node = self._get_node(self_node) |
|
326 |
basis_node = basis._get_node(basis_node) |
|
327 |
if (type(self_node) == InternalNode |
|
328 |
and type(basis_node) == InternalNode): |
|
329 |
# Matching internal nodes
|
|
330 |
process_common_internal_nodes(self_node, basis_node) |
|
331 |
elif (type(self_node) == LeafNode |
|
332 |
and type(basis_node) == LeafNode): |
|
333 |
process_common_leaf_nodes(self_node, basis_node) |
|
334 |
else: |
|
335 |
process_node(self_node, self_path, self, self_pending) |
|
336 |
process_node(basis_node, basis_path, basis, basis_pending) |
|
337 |
process_common_prefix_nodes(self_node, None, basis_node, None) |
|
|
3735.2.31
by Robert Collins
CHKMap.iter_changes |
338 |
self_seen = set() |
339 |
basis_seen = set() |
|
340 |
excluded_keys = set() |
|
341 |
def check_excluded(key_path): |
|
342 |
# Note that this is N^2, it depends on us trimming trees
|
|
343 |
# aggressively to not become slow.
|
|
344 |
# A better implementation would probably have a reverse map
|
|
|
3735.11.1
by John Arbash Meinel
Clean up some trailing whitespace. |
345 |
# back to the children of a node, and jump straight to it when
|
|
3735.2.31
by Robert Collins
CHKMap.iter_changes |
346 |
# a common node is detected, the proceed to remove the already
|
347 |
# pending children. bzrlib.graph has a searcher module with a
|
|
348 |
# similar problem.
|
|
349 |
while key_path is not None: |
|
350 |
key, key_path = key_path |
|
351 |
if key in excluded_keys: |
|
352 |
return True |
|
353 |
return False |
|
354 |
||
|
3735.2.32
by Robert Collins
Activate test for common node skipping. - 50 times performance improvement. |
355 |
loop_counter = 0 |
|
3735.2.31
by Robert Collins
CHKMap.iter_changes |
356 |
while self_pending or basis_pending: |
|
3735.2.32
by Robert Collins
Activate test for common node skipping. - 50 times performance improvement. |
357 |
loop_counter += 1 |
|
3735.2.31
by Robert Collins
CHKMap.iter_changes |
358 |
if not self_pending: |
359 |
# self is exhausted: output remainder of basis
|
|
|
3735.25.2
by John Arbash Meinel
Change the data that is put on the queue. |
360 |
for prefix, key, node, path in basis_pending: |
|
3735.2.31
by Robert Collins
CHKMap.iter_changes |
361 |
if check_excluded(path): |
362 |
continue
|
|
363 |
node = basis._get_node(node) |
|
|
3735.25.2
by John Arbash Meinel
Change the data that is put on the queue. |
364 |
if key is not None: |
|
3735.2.31
by Robert Collins
CHKMap.iter_changes |
365 |
# a value
|
|
3735.25.2
by John Arbash Meinel
Change the data that is put on the queue. |
366 |
yield (key, node, None) |
|
3735.2.31
by Robert Collins
CHKMap.iter_changes |
367 |
else: |
368 |
# subtree - fastpath the entire thing.
|
|
369 |
for key, value in node.iteritems(basis._store): |
|
370 |
yield (key, value, None) |
|
371 |
return
|
|
372 |
elif not basis_pending: |
|
373 |
# basis is exhausted: output remainder of self.
|
|
|
3735.25.2
by John Arbash Meinel
Change the data that is put on the queue. |
374 |
for prefix, key, node, path in self_pending: |
|
3735.2.31
by Robert Collins
CHKMap.iter_changes |
375 |
if check_excluded(path): |
376 |
continue
|
|
377 |
node = self._get_node(node) |
|
|
3735.25.2
by John Arbash Meinel
Change the data that is put on the queue. |
378 |
if key is not None: |
|
3735.2.31
by Robert Collins
CHKMap.iter_changes |
379 |
# a value
|
|
3735.25.2
by John Arbash Meinel
Change the data that is put on the queue. |
380 |
yield (key, None, node) |
|
3735.2.31
by Robert Collins
CHKMap.iter_changes |
381 |
else: |
382 |
# subtree - fastpath the entire thing.
|
|
383 |
for key, value in node.iteritems(self._store): |
|
384 |
yield (key, None, value) |
|
385 |
return
|
|
386 |
else: |
|
387 |
# XXX: future optimisation - yield the smaller items
|
|
388 |
# immediately rather than pushing everything on/off the
|
|
389 |
# heaps. Applies to both internal nodes and leafnodes.
|
|
390 |
if self_pending[0][0] < basis_pending[0][0]: |
|
391 |
# expand self
|
|
|
3735.25.2
by John Arbash Meinel
Change the data that is put on the queue. |
392 |
prefix, key, node, path = heapq.heappop(self_pending) |
|
3735.2.31
by Robert Collins
CHKMap.iter_changes |
393 |
if check_excluded(path): |
394 |
continue
|
|
|
3735.25.2
by John Arbash Meinel
Change the data that is put on the queue. |
395 |
if key is not None: |
|
3735.2.31
by Robert Collins
CHKMap.iter_changes |
396 |
# a value
|
|
3735.25.2
by John Arbash Meinel
Change the data that is put on the queue. |
397 |
yield (key, None, node) |
|
3735.2.31
by Robert Collins
CHKMap.iter_changes |
398 |
else: |
|
3735.25.3
by John Arbash Meinel
Pre-filter when the nodes are identical. |
399 |
process_node(node, path, self, self_pending) |
|
3735.2.31
by Robert Collins
CHKMap.iter_changes |
400 |
continue
|
401 |
elif self_pending[0][0] > basis_pending[0][0]: |
|
402 |
# expand basis
|
|
|
3735.25.2
by John Arbash Meinel
Change the data that is put on the queue. |
403 |
prefix, key, node, path = heapq.heappop(basis_pending) |
|
3735.2.31
by Robert Collins
CHKMap.iter_changes |
404 |
if check_excluded(path): |
405 |
continue
|
|
|
3735.25.2
by John Arbash Meinel
Change the data that is put on the queue. |
406 |
if key is not None: |
|
3735.2.31
by Robert Collins
CHKMap.iter_changes |
407 |
# a value
|
|
3735.25.2
by John Arbash Meinel
Change the data that is put on the queue. |
408 |
yield (key, node, None) |
|
3735.2.31
by Robert Collins
CHKMap.iter_changes |
409 |
else: |
|
3735.25.3
by John Arbash Meinel
Pre-filter when the nodes are identical. |
410 |
process_node(node, path, basis, basis_pending) |
|
3735.2.31
by Robert Collins
CHKMap.iter_changes |
411 |
continue
|
412 |
else: |
|
413 |
# common prefix: possibly expand both
|
|
|
3735.25.2
by John Arbash Meinel
Change the data that is put on the queue. |
414 |
if self_pending[0][1] is None: |
|
3735.2.31
by Robert Collins
CHKMap.iter_changes |
415 |
# process next self
|
416 |
read_self = True |
|
417 |
else: |
|
418 |
read_self = False |
|
|
3735.25.2
by John Arbash Meinel
Change the data that is put on the queue. |
419 |
if basis_pending[0][1] is None: |
|
3735.2.31
by Robert Collins
CHKMap.iter_changes |
420 |
# process next basis
|
421 |
read_basis = True |
|
422 |
else: |
|
423 |
read_basis = False |
|
424 |
if not read_self and not read_basis: |
|
425 |
# compare a common value
|
|
426 |
self_details = heapq.heappop(self_pending) |
|
427 |
basis_details = heapq.heappop(basis_pending) |
|
|
3735.25.2
by John Arbash Meinel
Change the data that is put on the queue. |
428 |
if self_details[2] != basis_details[2]: |
429 |
yield (self_details[1], |
|
430 |
basis_details[2], self_details[2]) |
|
|
3735.2.31
by Robert Collins
CHKMap.iter_changes |
431 |
continue
|
|
3735.25.3
by John Arbash Meinel
Pre-filter when the nodes are identical. |
432 |
# At least one side wasn't a simple value
|
|
3735.25.2
by John Arbash Meinel
Change the data that is put on the queue. |
433 |
if (self._node_key(self_pending[0][2]) == |
434 |
self._node_key(basis_pending[0][2])): |
|
|
3735.2.32
by Robert Collins
Activate test for common node skipping. - 50 times performance improvement. |
435 |
# Identical pointers, skip (and don't bother adding to
|
436 |
# excluded, it won't turn up again.
|
|
437 |
heapq.heappop(self_pending) |
|
438 |
heapq.heappop(basis_pending) |
|
439 |
continue
|
|
440 |
# Now we need to expand this node before we can continue
|
|
|
3735.25.3
by John Arbash Meinel
Pre-filter when the nodes are identical. |
441 |
if read_self and read_basis: |
442 |
# Both sides start with the same prefix, so process
|
|
443 |
# them in parallel
|
|
444 |
self_prefix, _, self_node, self_path = heapq.heappop( |
|
445 |
self_pending) |
|
446 |
basis_prefix, _, basis_node, basis_path = heapq.heappop( |
|
447 |
basis_pending) |
|
448 |
assert self_prefix == basis_prefix |
|
449 |
process_common_prefix_nodes( |
|
450 |
self_node, self_path, |
|
451 |
basis_node, basis_path) |
|
452 |
continue
|
|
|
3735.2.31
by Robert Collins
CHKMap.iter_changes |
453 |
if read_self: |
|
3735.25.2
by John Arbash Meinel
Change the data that is put on the queue. |
454 |
prefix, key, node, path = heapq.heappop(self_pending) |
|
3735.2.31
by Robert Collins
CHKMap.iter_changes |
455 |
if check_excluded(path): |
456 |
continue
|
|
|
3735.25.3
by John Arbash Meinel
Pre-filter when the nodes are identical. |
457 |
process_node(node, path, self, self_pending) |
|
3735.2.31
by Robert Collins
CHKMap.iter_changes |
458 |
if read_basis: |
|
3735.25.2
by John Arbash Meinel
Change the data that is put on the queue. |
459 |
prefix, key, node, path = heapq.heappop(basis_pending) |
|
3735.2.31
by Robert Collins
CHKMap.iter_changes |
460 |
if check_excluded(path): |
461 |
continue
|
|
|
3735.25.3
by John Arbash Meinel
Pre-filter when the nodes are identical. |
462 |
process_node(node, path, basis, basis_pending) |
|
3735.2.32
by Robert Collins
Activate test for common node skipping. - 50 times performance improvement. |
463 |
# print loop_counter
|
|
3735.2.30
by Robert Collins
Start iter_changes between CHKMap instances. |
464 |
|
|
3735.2.9
by Robert Collins
Get a working chk_map using inventory implementation bootstrapped. |
465 |
def iteritems(self, key_filter=None): |
|
3735.2.8
by Robert Collins
New chk_map module for use in tree based inventory storage. |
466 |
"""Iterate over the entire CHKMap's contents.""" |
467 |
self._ensure_root() |
|
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
468 |
return self._root_node.iteritems(self._store, key_filter=key_filter) |
|
3735.2.8
by Robert Collins
New chk_map module for use in tree based inventory storage. |
469 |
|
|
3735.2.12
by Robert Collins
Implement commit-via-deltas for split inventory repositories. |
470 |
def key(self): |
471 |
"""Return the key for this map.""" |
|
472 |
if isinstance(self._root_node, tuple): |
|
473 |
return self._root_node |
|
474 |
else: |
|
475 |
return self._root_node._key |
|
476 |
||
|
3735.2.17
by Robert Collins
Cache node length to avoid full iteration on __len__ calls. |
477 |
def __len__(self): |
478 |
self._ensure_root() |
|
479 |
return len(self._root_node) |
|
480 |
||
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
481 |
def map(self, key, value): |
482 |
"""Map a key tuple to value.""" |
|
483 |
# Need a root object.
|
|
484 |
self._ensure_root() |
|
485 |
prefix, node_details = self._root_node.map(self._store, key, value) |
|
486 |
if len(node_details) == 1: |
|
487 |
self._root_node = node_details[0][1] |
|
488 |
else: |
|
|
3735.16.1
by John Arbash Meinel
(broken) Start tracking down more code that needs to pass around the 'search_key_func' |
489 |
self._root_node = InternalNode(prefix, |
490 |
search_key_func=self._search_key_func) |
|
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
491 |
self._root_node.set_maximum_size(node_details[0][1].maximum_size) |
492 |
self._root_node._key_width = node_details[0][1]._key_width |
|
493 |
for split, node in node_details: |
|
494 |
self._root_node.add_node(split, node) |
|
495 |
||
|
3735.2.31
by Robert Collins
CHKMap.iter_changes |
496 |
def _node_key(self, node): |
|
3735.19.1
by Ian Clatworthy
CHKMap cleanups |
497 |
"""Get the key for a node whether it's a tuple or node.""" |
|
3735.2.31
by Robert Collins
CHKMap.iter_changes |
498 |
if type(node) == tuple: |
499 |
return node |
|
500 |
else: |
|
501 |
return node._key |
|
|
3735.2.8
by Robert Collins
New chk_map module for use in tree based inventory storage. |
502 |
|
|
3735.2.122
by Ian Clatworthy
don't check_remap on every unmap call in CHKMap.apply_delta() |
503 |
def unmap(self, key, check_remap=True): |
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
504 |
"""remove key from the map.""" |
505 |
self._ensure_root() |
|
|
3735.2.122
by Ian Clatworthy
don't check_remap on every unmap call in CHKMap.apply_delta() |
506 |
if isinstance(self._root_node, InternalNode): |
507 |
unmapped = self._root_node.unmap(self._store, key, |
|
508 |
check_remap=check_remap) |
|
509 |
else: |
|
510 |
unmapped = self._root_node.unmap(self._store, key) |
|
|
3735.11.3
by John Arbash Meinel
At the end of unmap() see if children can be packed into a single Leaf. |
511 |
self._root_node = unmapped |
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
512 |
|
|
3735.2.122
by Ian Clatworthy
don't check_remap on every unmap call in CHKMap.apply_delta() |
513 |
def _check_remap(self): |
514 |
"""Check if nodes can be collapsed.""" |
|
515 |
self._ensure_root() |
|
516 |
if isinstance(self._root_node, InternalNode): |
|
517 |
self._root_node._check_remap(self._store) |
|
518 |
||
|
3735.2.8
by Robert Collins
New chk_map module for use in tree based inventory storage. |
519 |
def _save(self): |
520 |
"""Save the map completely. |
|
521 |
||
522 |
:return: The key of the root node.
|
|
523 |
"""
|
|
524 |
if type(self._root_node) == tuple: |
|
525 |
# Already saved.
|
|
526 |
return self._root_node |
|
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
527 |
keys = list(self._root_node.serialise(self._store)) |
528 |
return keys[-1] |
|
|
3735.2.8
by Robert Collins
New chk_map module for use in tree based inventory storage. |
529 |
|
530 |
||
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
531 |
class Node(object): |
|
3735.15.6
by John Arbash Meinel
Add tests that LeafNodes track the common prefix for both their lookup keys |
532 |
"""Base class defining the protocol for CHK Map nodes. |
533 |
||
534 |
:ivar _raw_size: The total size of the serialized key:value data, before
|
|
535 |
adding the header bytes, and without prefix compression.
|
|
536 |
"""
|
|
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
537 |
|
538 |
def __init__(self, key_width=1): |
|
539 |
"""Create a node. |
|
540 |
||
541 |
:param key_width: The width of keys for this node.
|
|
542 |
"""
|
|
543 |
self._key = None |
|
544 |
# Current number of elements
|
|
545 |
self._len = 0 |
|
546 |
self._maximum_size = 0 |
|
|
3735.19.1
by Ian Clatworthy
CHKMap cleanups |
547 |
self._key_width = key_width |
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
548 |
# current size in bytes
|
|
3735.15.6
by John Arbash Meinel
Add tests that LeafNodes track the common prefix for both their lookup keys |
549 |
self._raw_size = 0 |
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
550 |
# The pointers/values this node has - meaning defined by child classes.
|
551 |
self._items = {} |
|
|
3735.15.13
by John Arbash Meinel
Change the term 'lookup' to the term 'search', as it is closer to what Robert envisioned. |
552 |
# The common search prefix
|
553 |
self._search_prefix = None |
|
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
554 |
|
|
3735.9.4
by John Arbash Meinel
Some small cleanups, and fix _dump_tree to handle in-progress nodes. |
555 |
def __repr__(self): |
|
3735.2.111
by John Arbash Meinel
Merge Ian's updates to chk_map and chk_inventory.create_by_apply_delta. |
556 |
items_str = str(sorted(self._items)) |
|
3735.9.4
by John Arbash Meinel
Some small cleanups, and fix _dump_tree to handle in-progress nodes. |
557 |
if len(items_str) > 20: |
558 |
items_str = items_str[16] + '...]' |
|
|
3735.15.3
by John Arbash Meinel
repr update |
559 |
return '%s(key:%s len:%s size:%s max:%s prefix:%s items:%s)' % ( |
|
3735.15.6
by John Arbash Meinel
Add tests that LeafNodes track the common prefix for both their lookup keys |
560 |
self.__class__.__name__, self._key, self._len, self._raw_size, |
|
3735.15.13
by John Arbash Meinel
Change the term 'lookup' to the term 'search', as it is closer to what Robert envisioned. |
561 |
self._maximum_size, self._search_prefix, items_str) |
|
3735.9.4
by John Arbash Meinel
Some small cleanups, and fix _dump_tree to handle in-progress nodes. |
562 |
|
|
3735.2.38
by Robert Collins
Sufficient fixes to allow bzr-search to index a dev3 format repository. |
563 |
def key(self): |
564 |
return self._key |
|
565 |
||
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
566 |
def __len__(self): |
567 |
return self._len |
|
568 |
||
569 |
@property
|
|
570 |
def maximum_size(self): |
|
571 |
"""What is the upper limit for adding references to a node.""" |
|
572 |
return self._maximum_size |
|
573 |
||
574 |
def set_maximum_size(self, new_size): |
|
575 |
"""Set the size threshold for nodes. |
|
576 |
||
577 |
:param new_size: The size at which no data is added to a node. 0 for
|
|
578 |
unlimited.
|
|
579 |
"""
|
|
580 |
self._maximum_size = new_size |
|
581 |
||
|
3735.15.5
by John Arbash Meinel
Change the nomenclature. |
582 |
@classmethod
|
583 |
def common_prefix(cls, prefix, key): |
|
584 |
"""Given 2 strings, return the longest prefix common to both. |
|
585 |
||
586 |
:param prefix: This has been the common prefix for other keys, so it is
|
|
587 |
more likely to be the common prefix in this case as well.
|
|
588 |
:param key: Another string to compare to
|
|
589 |
"""
|
|
590 |
if key.startswith(prefix): |
|
591 |
return prefix |
|
|
3735.15.1
by John Arbash Meinel
Change InternalNode to always cache its serialized_prefix. |
592 |
# Is there a better way to do this?
|
|
3735.15.5
by John Arbash Meinel
Change the nomenclature. |
593 |
for pos, (left, right) in enumerate(zip(prefix, key)): |
|
3735.15.1
by John Arbash Meinel
Change InternalNode to always cache its serialized_prefix. |
594 |
if left != right: |
|
3735.2.89
by Vincent Ladeuil
Fix the bogus previous fix. |
595 |
pos -= 1 |
|
3735.15.1
by John Arbash Meinel
Change InternalNode to always cache its serialized_prefix. |
596 |
break
|
|
3735.2.89
by Vincent Ladeuil
Fix the bogus previous fix. |
597 |
common = prefix[:pos+1] |
|
3735.15.1
by John Arbash Meinel
Change InternalNode to always cache its serialized_prefix. |
598 |
return common |
599 |
||
|
3735.15.5
by John Arbash Meinel
Change the nomenclature. |
600 |
@classmethod
|
601 |
def common_prefix_for_keys(cls, keys): |
|
602 |
"""Given a list of keys, find their common prefix. |
|
603 |
||
604 |
:param keys: An iterable of strings.
|
|
605 |
:return: The longest common prefix of all keys.
|
|
606 |
"""
|
|
607 |
common_prefix = None |
|
608 |
for key in keys: |
|
609 |
if common_prefix is None: |
|
610 |
common_prefix = key |
|
611 |
continue
|
|
612 |
common_prefix = cls.common_prefix(common_prefix, key) |
|
613 |
if not common_prefix: |
|
614 |
# if common_prefix is the empty string, then we know it won't
|
|
615 |
# change further
|
|
616 |
return '' |
|
617 |
return common_prefix |
|
618 |
||
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
619 |
|
|
3735.2.131
by John Arbash Meinel
Only compute LeafNode._search_prefix when we will use it. |
620 |
# Singleton indicating we have not computed _search_prefix yet
|
621 |
_unknown = object() |
|
622 |
||
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
623 |
class LeafNode(Node): |
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
624 |
"""A node containing actual key:value pairs. |
|
3735.11.1
by John Arbash Meinel
Clean up some trailing whitespace. |
625 |
|
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
626 |
:ivar _items: A dict of key->value items. The key is in tuple form.
|
|
3735.15.4
by John Arbash Meinel
Clean up some little bits. |
627 |
:ivar _size: The number of bytes that would be used by serializing all of
|
628 |
the key/value pairs.
|
|
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
629 |
"""
|
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
630 |
|
|
3735.16.1
by John Arbash Meinel
(broken) Start tracking down more code that needs to pass around the 'search_key_func' |
631 |
def __init__(self, search_key_func=None): |
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
632 |
Node.__init__(self) |
|
3735.15.4
by John Arbash Meinel
Clean up some little bits. |
633 |
# All of the keys in this leaf node share this common prefix
|
|
3735.15.5
by John Arbash Meinel
Change the nomenclature. |
634 |
self._common_serialised_prefix = None |
635 |
self._serialise_key = '\x00'.join |
|
|
3735.16.1
by John Arbash Meinel
(broken) Start tracking down more code that needs to pass around the 'search_key_func' |
636 |
if search_key_func is None: |
|
3735.16.6
by John Arbash Meinel
Include a _search_key_plain function. |
637 |
self._search_key_func = _search_key_plain |
|
3735.16.1
by John Arbash Meinel
(broken) Start tracking down more code that needs to pass around the 'search_key_func' |
638 |
else: |
639 |
self._search_key_func = search_key_func |
|
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
640 |
|
|
3735.20.7
by Ian Clatworthy
include keywidth in repr for LeafNode |
641 |
def __repr__(self): |
642 |
items_str = sorted(self._items) |
|
643 |
if len(items_str) > 20: |
|
644 |
items_str = items_str[16] + '...]' |
|
645 |
return \
|
|
646 |
'%s(key:%s len:%s size:%s max:%s prefix:%s keywidth:%s items:%s)' \ |
|
647 |
% (self.__class__.__name__, self._key, self._len, self._raw_size, |
|
648 |
self._maximum_size, self._search_prefix, self._key_width, items_str) |
|
649 |
||
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
650 |
def _current_size(self): |
|
3735.15.4
by John Arbash Meinel
Clean up some little bits. |
651 |
"""Answer the current serialised size of this node. |
652 |
||
|
3735.15.6
by John Arbash Meinel
Add tests that LeafNodes track the common prefix for both their lookup keys |
653 |
This differs from self._raw_size in that it includes the bytes used for
|
654 |
the header.
|
|
|
3735.15.4
by John Arbash Meinel
Clean up some little bits. |
655 |
"""
|
|
3735.15.9
by John Arbash Meinel
(broken) Initial prototype of leaf pages which pull out their common prefix. |
656 |
if self._common_serialised_prefix is None: |
657 |
bytes_for_items = 0 |
|
|
3735.17.1
by John Arbash Meinel
Change the serialized form for leaf nodes. |
658 |
prefix_len = 0 |
|
3735.15.9
by John Arbash Meinel
(broken) Initial prototype of leaf pages which pull out their common prefix. |
659 |
else: |
660 |
# We will store a single string with the common prefix
|
|
661 |
# And then that common prefix will not be stored in any of the
|
|
662 |
# entry lines
|
|
663 |
prefix_len = len(self._common_serialised_prefix) |
|
|
3735.17.1
by John Arbash Meinel
Change the serialized form for leaf nodes. |
664 |
bytes_for_items = (self._raw_size - (prefix_len * self._len)) |
665 |
return (9 # 'chkleaf:\n' |
|
666 |
+ len(str(self._maximum_size)) + 1 |
|
667 |
+ len(str(self._key_width)) + 1 |
|
668 |
+ len(str(self._len)) + 1 |
|
669 |
+ prefix_len + 1 |
|
|
3735.15.9
by John Arbash Meinel
(broken) Initial prototype of leaf pages which pull out their common prefix. |
670 |
+ bytes_for_items) |
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
671 |
|
672 |
@classmethod
|
|
|
3735.16.2
by John Arbash Meinel
Start passing around the search_key_func in more places. |
673 |
def deserialise(klass, bytes, key, search_key_func=None): |
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
674 |
"""Deserialise bytes, with key key, into a LeafNode. |
675 |
||
676 |
:param bytes: The bytes of the node.
|
|
677 |
:param key: The key that the serialised node has.
|
|
678 |
"""
|
|
|
3735.16.2
by John Arbash Meinel
Start passing around the search_key_func in more places. |
679 |
result = LeafNode(search_key_func=search_key_func) |
|
3735.2.72
by John Arbash Meinel
Change deserialise to properly handle when there is a '\r' in the key. |
680 |
# Splitlines can split on '\r' so don't use it, split('\n') adds an
|
681 |
# extra '' if the bytes ends in a final newline.
|
|
682 |
lines = bytes.split('\n') |
|
|
3735.2.99
by John Arbash Meinel
Merge bzr.dev 4034. Whitespace cleanup |
683 |
trailing = lines.pop() |
684 |
if trailing != '': |
|
685 |
raise AssertionError('We did not have a final newline for %s' |
|
686 |
% (key,)) |
|
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
687 |
items = {} |
688 |
if lines[0] != 'chkleaf:': |
|
689 |
raise ValueError("not a serialised leaf node: %r" % bytes) |
|
690 |
maximum_size = int(lines[1]) |
|
691 |
width = int(lines[2]) |
|
692 |
length = int(lines[3]) |
|
|
3735.15.9
by John Arbash Meinel
(broken) Initial prototype of leaf pages which pull out their common prefix. |
693 |
prefix = lines[4] |
|
3735.17.1
by John Arbash Meinel
Change the serialized form for leaf nodes. |
694 |
pos = 5 |
695 |
while pos < len(lines): |
|
|
3735.2.99
by John Arbash Meinel
Merge bzr.dev 4034. Whitespace cleanup |
696 |
line = prefix + lines[pos] |
697 |
elements = line.split('\x00') |
|
|
3735.17.1
by John Arbash Meinel
Change the serialized form for leaf nodes. |
698 |
pos += 1 |
|
3735.2.99
by John Arbash Meinel
Merge bzr.dev 4034. Whitespace cleanup |
699 |
if len(elements) != width + 1: |
|
3735.20.6
by Ian Clatworthy
more helpful deserialize assertion msg |
700 |
raise AssertionError( |
701 |
'Incorrect number of elements (%d vs %d) for: %r' |
|
702 |
% (len(elements), width + 1, line)) |
|
|
3735.17.1
by John Arbash Meinel
Change the serialized form for leaf nodes. |
703 |
num_value_lines = int(elements[-1]) |
704 |
value_lines = lines[pos:pos+num_value_lines] |
|
705 |
pos += num_value_lines |
|
706 |
value = '\n'.join(value_lines) |
|
707 |
items[tuple(elements[:-1])] = value |
|
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
708 |
if len(items) != length: |
|
3735.14.3
by John Arbash Meinel
Properly remove keys that are found in the page cache. And add some debugging. |
709 |
raise AssertionError("item count (%d) mismatch for key %s," |
710 |
" bytes %r" % (length, key, bytes)) |
|
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
711 |
result._items = items |
712 |
result._len = length |
|
713 |
result._maximum_size = maximum_size |
|
714 |
result._key = key |
|
715 |
result._key_width = width |
|
|
3735.15.9
by John Arbash Meinel
(broken) Initial prototype of leaf pages which pull out their common prefix. |
716 |
result._raw_size = (sum(map(len, lines[5:])) # the length of the suffix |
|
3735.17.1
by John Arbash Meinel
Change the serialized form for leaf nodes. |
717 |
+ (length)*(len(prefix)) |
718 |
+ (len(lines)-5)) |
|
|
3735.23.2
by John Arbash Meinel
Avoid computing a known prefix on each deserialise. |
719 |
if not items: |
|
3735.2.131
by John Arbash Meinel
Only compute LeafNode._search_prefix when we will use it. |
720 |
result._search_prefix = None |
|
3735.23.2
by John Arbash Meinel
Avoid computing a known prefix on each deserialise. |
721 |
result._common_serialised_prefix = None |
722 |
else: |
|
|
3735.2.131
by John Arbash Meinel
Only compute LeafNode._search_prefix when we will use it. |
723 |
result._search_prefix = _unknown |
|
3735.23.2
by John Arbash Meinel
Avoid computing a known prefix on each deserialise. |
724 |
result._common_serialised_prefix = prefix |
|
3735.15.9
by John Arbash Meinel
(broken) Initial prototype of leaf pages which pull out their common prefix. |
725 |
if len(bytes) != result._current_size(): |
|
3735.2.99
by John Arbash Meinel
Merge bzr.dev 4034. Whitespace cleanup |
726 |
raise AssertionError('_current_size computed incorrectly') |
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
727 |
return result |
728 |
||
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
729 |
def iteritems(self, store, key_filter=None): |
|
3735.2.43
by Robert Collins
Teach CHKMap how to iter items in 2-tuple keyspaces. |
730 |
"""Iterate over items in the node. |
731 |
||
732 |
:param key_filter: A filter to apply to the node. It should be a
|
|
733 |
list/set/dict or similar repeatedly iterable container.
|
|
734 |
"""
|
|
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
735 |
if key_filter is not None: |
|
3735.2.43
by Robert Collins
Teach CHKMap how to iter items in 2-tuple keyspaces. |
736 |
# Adjust the filter - short elements go to a prefix filter. Would this
|
737 |
# be cleaner explicitly? That would be no harder for InternalNode..
|
|
738 |
# XXX: perhaps defaultdict? Profiling<rinse and repeat>
|
|
739 |
filters = {} |
|
740 |
for key in key_filter: |
|
741 |
length_filter = filters.setdefault(len(key), set()) |
|
742 |
length_filter.add(key) |
|
743 |
filters = filters.items() |
|
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
744 |
for item in self._items.iteritems(): |
|
3735.2.43
by Robert Collins
Teach CHKMap how to iter items in 2-tuple keyspaces. |
745 |
for length, length_filter in filters: |
746 |
if item[0][:length] in length_filter: |
|
747 |
yield item |
|
748 |
break
|
|
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
749 |
else: |
750 |
for item in self._items.iteritems(): |
|
751 |
yield item |
|
752 |
||
|
3735.9.4
by John Arbash Meinel
Some small cleanups, and fix _dump_tree to handle in-progress nodes. |
753 |
def _key_value_len(self, key, value): |
754 |
# TODO: Should probably be done without actually joining the key, but
|
|
755 |
# then that can be done via the C extension
|
|
|
3735.17.1
by John Arbash Meinel
Change the serialized form for leaf nodes. |
756 |
return (len(self._serialise_key(key)) + 1 |
757 |
+ len(str(value.count('\n'))) + 1 |
|
758 |
+ len(value) + 1) |
|
|
3735.9.4
by John Arbash Meinel
Some small cleanups, and fix _dump_tree to handle in-progress nodes. |
759 |
|
|
3735.16.1
by John Arbash Meinel
(broken) Start tracking down more code that needs to pass around the 'search_key_func' |
760 |
def _search_key(self, key): |
761 |
return self._search_key_func(key) |
|
762 |
||
|
3735.11.13
by John Arbash Meinel
Refactor the LeafNode.map() code so we can do _check_remap more cheaply. |
763 |
def _map_no_split(self, key, value): |
764 |
"""Map a key to a value. |
|
765 |
||
766 |
This assumes either the key does not already exist, or you have already
|
|
767 |
removed its size and length from self.
|
|
768 |
||
769 |
:return: True if adding this node should cause us to split.
|
|
770 |
"""
|
|
771 |
self._items[key] = value |
|
|
3735.15.6
by John Arbash Meinel
Add tests that LeafNodes track the common prefix for both their lookup keys |
772 |
self._raw_size += self._key_value_len(key, value) |
|
3735.11.13
by John Arbash Meinel
Refactor the LeafNode.map() code so we can do _check_remap more cheaply. |
773 |
self._len += 1 |
|
3735.15.5
by John Arbash Meinel
Change the nomenclature. |
774 |
serialised_key = self._serialise_key(key) |
775 |
if self._common_serialised_prefix is None: |
|
776 |
self._common_serialised_prefix = serialised_key |
|
777 |
else: |
|
778 |
self._common_serialised_prefix = self.common_prefix( |
|
779 |
self._common_serialised_prefix, serialised_key) |
|
|
3735.15.13
by John Arbash Meinel
Change the term 'lookup' to the term 'search', as it is closer to what Robert envisioned. |
780 |
search_key = self._search_key(key) |
|
3735.2.131
by John Arbash Meinel
Only compute LeafNode._search_prefix when we will use it. |
781 |
if self._search_prefix is _unknown: |
782 |
self._compute_search_prefix() |
|
|
3735.15.13
by John Arbash Meinel
Change the term 'lookup' to the term 'search', as it is closer to what Robert envisioned. |
783 |
if self._search_prefix is None: |
784 |
self._search_prefix = search_key |
|
|
3735.15.5
by John Arbash Meinel
Change the nomenclature. |
785 |
else: |
|
3735.15.13
by John Arbash Meinel
Change the term 'lookup' to the term 'search', as it is closer to what Robert envisioned. |
786 |
self._search_prefix = self.common_prefix( |
787 |
self._search_prefix, search_key) |
|
|
3735.11.13
by John Arbash Meinel
Refactor the LeafNode.map() code so we can do _check_remap more cheaply. |
788 |
if (self._len > 1 |
789 |
and self._maximum_size |
|
|
3735.16.10
by John Arbash Meinel
Don't track state for an infrequent edge case. |
790 |
and self._current_size() > self._maximum_size): |
791 |
# Check to see if all of the search_keys for this node are
|
|
792 |
# identical. We allow the node to grow under that circumstance
|
|
793 |
# (we could track this as common state, but it is infrequent)
|
|
794 |
if (search_key != self._search_prefix |
|
795 |
or not self._are_search_keys_identical()): |
|
796 |
return True |
|
|
3735.11.13
by John Arbash Meinel
Refactor the LeafNode.map() code so we can do _check_remap more cheaply. |
797 |
return False |
798 |
||
|
3735.15.1
by John Arbash Meinel
Change InternalNode to always cache its serialized_prefix. |
799 |
def _split(self, store): |
800 |
"""We have overflowed. |
|
801 |
||
802 |
Split this node into multiple LeafNodes, return it up the stack so that
|
|
803 |
the next layer creates a new InternalNode and references the new nodes.
|
|
804 |
||
|
3735.15.5
by John Arbash Meinel
Change the nomenclature. |
805 |
:return: (common_serialised_prefix, [(node_serialised_prefix, node)])
|
|
3735.15.1
by John Arbash Meinel
Change InternalNode to always cache its serialized_prefix. |
806 |
"""
|
|
3735.2.131
by John Arbash Meinel
Only compute LeafNode._search_prefix when we will use it. |
807 |
assert self._search_prefix is not _unknown |
|
3735.15.13
by John Arbash Meinel
Change the term 'lookup' to the term 'search', as it is closer to what Robert envisioned. |
808 |
common_prefix = self._search_prefix |
|
3735.15.1
by John Arbash Meinel
Change InternalNode to always cache its serialized_prefix. |
809 |
split_at = len(common_prefix) + 1 |
810 |
result = {} |
|
811 |
for key, value in self._items.iteritems(): |
|
|
3735.15.13
by John Arbash Meinel
Change the term 'lookup' to the term 'search', as it is closer to what Robert envisioned. |
812 |
search_key = self._search_key(key) |
813 |
prefix = search_key[:split_at] |
|
|
3735.15.1
by John Arbash Meinel
Change InternalNode to always cache its serialized_prefix. |
814 |
# TODO: Generally only 1 key can be exactly the right length,
|
815 |
# which means we can only have 1 key in the node pointed
|
|
816 |
# at by the 'prefix\0' key. We might want to consider
|
|
817 |
# folding it into the containing InternalNode rather than
|
|
818 |
# having a fixed length-1 node.
|
|
819 |
# Note this is probably not true for hash keys, as they
|
|
820 |
# may get a '\00' node anywhere, but won't have keys of
|
|
821 |
# different lengths.
|
|
822 |
if len(prefix) < split_at: |
|
823 |
prefix += '\x00'*(split_at - len(prefix)) |
|
824 |
if prefix not in result: |
|
|
3735.16.1
by John Arbash Meinel
(broken) Start tracking down more code that needs to pass around the 'search_key_func' |
825 |
node = LeafNode(search_key_func=self._search_key_func) |
|
3735.15.1
by John Arbash Meinel
Change InternalNode to always cache its serialized_prefix. |
826 |
node.set_maximum_size(self._maximum_size) |
827 |
node._key_width = self._key_width |
|
828 |
result[prefix] = node |
|
829 |
else: |
|
830 |
node = result[prefix] |
|
831 |
node.map(store, key, value) |
|
832 |
return common_prefix, result.items() |
|
833 |
||
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
834 |
def map(self, store, key, value): |
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
835 |
"""Map key to value.""" |
836 |
if key in self._items: |
|
|
3735.15.6
by John Arbash Meinel
Add tests that LeafNodes track the common prefix for both their lookup keys |
837 |
self._raw_size -= self._key_value_len(key, self._items[key]) |
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
838 |
self._len -= 1 |
839 |
self._key = None |
|
|
3735.11.13
by John Arbash Meinel
Refactor the LeafNode.map() code so we can do _check_remap more cheaply. |
840 |
if self._map_no_split(key, value): |
|
3735.15.1
by John Arbash Meinel
Change InternalNode to always cache its serialized_prefix. |
841 |
return self._split(store) |
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
842 |
else: |
|
3735.2.131
by John Arbash Meinel
Only compute LeafNode._search_prefix when we will use it. |
843 |
assert self._search_prefix is not _unknown |
|
3735.15.13
by John Arbash Meinel
Change the term 'lookup' to the term 'search', as it is closer to what Robert envisioned. |
844 |
return self._search_prefix, [("", self)] |
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
845 |
|
846 |
def serialise(self, store): |
|
|
3735.20.7
by Ian Clatworthy
include keywidth in repr for LeafNode |
847 |
"""Serialise the LeafNode to store. |
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
848 |
|
849 |
:param store: A VersionedFiles honouring the CHK extensions.
|
|
850 |
:return: An iterable of the keys inserted by this operation.
|
|
851 |
"""
|
|
852 |
lines = ["chkleaf:\n"] |
|
853 |
lines.append("%d\n" % self._maximum_size) |
|
854 |
lines.append("%d\n" % self._key_width) |
|
855 |
lines.append("%d\n" % self._len) |
|
|
3735.15.9
by John Arbash Meinel
(broken) Initial prototype of leaf pages which pull out their common prefix. |
856 |
if self._common_serialised_prefix is None: |
857 |
lines.append('\n') |
|
|
3735.2.99
by John Arbash Meinel
Merge bzr.dev 4034. Whitespace cleanup |
858 |
if len(self._items) != 0: |
859 |
raise AssertionError('If _common_serialised_prefix is None' |
|
860 |
' we should have no items') |
|
|
3735.15.9
by John Arbash Meinel
(broken) Initial prototype of leaf pages which pull out their common prefix. |
861 |
else: |
862 |
lines.append('%s\n' % (self._common_serialised_prefix,)) |
|
863 |
prefix_len = len(self._common_serialised_prefix) |
|
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
864 |
for key, value in sorted(self._items.items()): |
|
3735.20.7
by Ian Clatworthy
include keywidth in repr for LeafNode |
865 |
# Always add a final newline
|
|
3735.17.1
by John Arbash Meinel
Change the serialized form for leaf nodes. |
866 |
value_lines = osutils.chunks_to_lines([value + '\n']) |
867 |
serialized = "%s\x00%s\n" % (self._serialise_key(key), |
|
868 |
len(value_lines)) |
|
|
3735.2.99
by John Arbash Meinel
Merge bzr.dev 4034. Whitespace cleanup |
869 |
if not serialized.startswith(self._common_serialised_prefix): |
870 |
raise AssertionError('We thought the common prefix was %r' |
|
871 |
' but entry %r does not have it in common' |
|
872 |
% (self._common_serialised_prefix, serialized)) |
|
|
3735.15.9
by John Arbash Meinel
(broken) Initial prototype of leaf pages which pull out their common prefix. |
873 |
lines.append(serialized[prefix_len:]) |
|
3735.17.1
by John Arbash Meinel
Change the serialized form for leaf nodes. |
874 |
lines.extend(value_lines) |
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
875 |
sha1, _, _ = store.add_lines((None,), (), lines) |
876 |
self._key = ("sha1:" + sha1,) |
|
|
3735.15.8
by John Arbash Meinel
Add asserts so that when serializing and deserializing |
877 |
bytes = ''.join(lines) |
|
3735.15.9
by John Arbash Meinel
(broken) Initial prototype of leaf pages which pull out their common prefix. |
878 |
if len(bytes) != self._current_size(): |
|
3735.2.99
by John Arbash Meinel
Merge bzr.dev 4034. Whitespace cleanup |
879 |
raise AssertionError('Invalid _current_size') |
|
3735.15.8
by John Arbash Meinel
Add asserts so that when serializing and deserializing |
880 |
_page_cache.add(self._key, bytes) |
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
881 |
return [self._key] |
882 |
||
|
3735.2.26
by Robert Collins
CHKInventory migrated to new CHKMap code. |
883 |
def refs(self): |
884 |
"""Return the references to other CHK's held by this node.""" |
|
885 |
return [] |
|
886 |
||
|
3735.15.13
by John Arbash Meinel
Change the term 'lookup' to the term 'search', as it is closer to what Robert envisioned. |
887 |
def _compute_search_prefix(self): |
888 |
"""Determine the common search prefix for all keys in this node. |
|
|
3735.15.5
by John Arbash Meinel
Change the nomenclature. |
889 |
|
|
3735.15.13
by John Arbash Meinel
Change the term 'lookup' to the term 'search', as it is closer to what Robert envisioned. |
890 |
:return: A bytestring of the longest search key prefix that is
|
|
3735.15.5
by John Arbash Meinel
Change the nomenclature. |
891 |
unique within this node.
|
892 |
"""
|
|
|
3735.2.131
by John Arbash Meinel
Only compute LeafNode._search_prefix when we will use it. |
893 |
search_keys = [self._search_key_func(key) for key in self._items] |
|
3735.15.13
by John Arbash Meinel
Change the term 'lookup' to the term 'search', as it is closer to what Robert envisioned. |
894 |
self._search_prefix = self.common_prefix_for_keys(search_keys) |
895 |
return self._search_prefix |
|
|
3735.15.5
by John Arbash Meinel
Change the nomenclature. |
896 |
|
|
3735.16.10
by John Arbash Meinel
Don't track state for an infrequent edge case. |
897 |
def _are_search_keys_identical(self): |
898 |
"""Check to see if the search keys for all entries are the same. |
|
899 |
||
900 |
When using a hash as the search_key it is possible for non-identical
|
|
901 |
keys to collide. If that happens enough, we may try overflow a
|
|
902 |
LeafNode, but as all are collisions, we must not split.
|
|
903 |
"""
|
|
904 |
common_search_key = None |
|
905 |
for key in self._items: |
|
906 |
search_key = self._search_key(key) |
|
907 |
if common_search_key is None: |
|
908 |
common_search_key = search_key |
|
909 |
elif search_key != common_search_key: |
|
910 |
return False |
|
911 |
return True |
|
912 |
||
|
3735.15.5
by John Arbash Meinel
Change the nomenclature. |
913 |
def _compute_serialised_prefix(self): |
914 |
"""Determine the common prefix for serialised keys in this node. |
|
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
915 |
|
916 |
:return: A bytestring of the longest serialised key prefix that is
|
|
917 |
unique within this node.
|
|
918 |
"""
|
|
|
3735.15.5
by John Arbash Meinel
Change the nomenclature. |
919 |
serialised_keys = [self._serialise_key(key) for key in self._items] |
920 |
self._common_serialised_prefix = self.common_prefix_for_keys( |
|
921 |
serialised_keys) |
|
|
3735.19.1
by Ian Clatworthy
CHKMap cleanups |
922 |
return self._common_serialised_prefix |
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
923 |
|
924 |
def unmap(self, store, key): |
|
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
925 |
"""Unmap key from the node.""" |
|
3735.2.112
by John Arbash Meinel
Merge Ian's try/except helper for aiding in debugging strange failures. |
926 |
try: |
927 |
self._raw_size -= self._key_value_len(key, self._items[key]) |
|
928 |
except KeyError: |
|
929 |
trace.mutter("key %s not found in %r", key, self._items) |
|
930 |
raise
|
|
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
931 |
self._len -= 1 |
932 |
del self._items[key] |
|
933 |
self._key = None |
|
|
3735.15.2
by John Arbash Meinel
Change LeafNode to also cache its unique serialized prefix. |
934 |
# Recompute from scratch
|
|
3735.15.13
by John Arbash Meinel
Change the term 'lookup' to the term 'search', as it is closer to what Robert envisioned. |
935 |
self._compute_search_prefix() |
|
3735.15.5
by John Arbash Meinel
Change the nomenclature. |
936 |
self._compute_serialised_prefix() |
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
937 |
return self |
938 |
||
939 |
||
940 |
class InternalNode(Node): |
|
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
941 |
"""A node that contains references to other nodes. |
|
3735.11.1
by John Arbash Meinel
Clean up some trailing whitespace. |
942 |
|
|
3735.15.13
by John Arbash Meinel
Change the term 'lookup' to the term 'search', as it is closer to what Robert envisioned. |
943 |
An InternalNode is responsible for mapping search key prefixes to child
|
|
3735.15.5
by John Arbash Meinel
Change the nomenclature. |
944 |
nodes.
|
|
3735.15.4
by John Arbash Meinel
Clean up some little bits. |
945 |
|
|
3735.15.5
by John Arbash Meinel
Change the nomenclature. |
946 |
:ivar _items: serialised_key => node dictionary. node may be a tuple,
|
|
3735.15.4
by John Arbash Meinel
Clean up some little bits. |
947 |
LeafNode or InternalNode.
|
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
948 |
"""
|
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
949 |
|
|
3735.16.1
by John Arbash Meinel
(broken) Start tracking down more code that needs to pass around the 'search_key_func' |
950 |
def __init__(self, prefix='', search_key_func=None): |
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
951 |
Node.__init__(self) |
952 |
# The size of an internalnode with default values and no children.
|
|
953 |
# How many octets key prefixes within this node are.
|
|
954 |
self._node_width = 0 |
|
|
3735.15.13
by John Arbash Meinel
Change the term 'lookup' to the term 'search', as it is closer to what Robert envisioned. |
955 |
self._search_prefix = prefix |
|
3735.16.1
by John Arbash Meinel
(broken) Start tracking down more code that needs to pass around the 'search_key_func' |
956 |
if search_key_func is None: |
|
3735.16.6
by John Arbash Meinel
Include a _search_key_plain function. |
957 |
self._search_key_func = _search_key_plain |
|
3735.16.1
by John Arbash Meinel
(broken) Start tracking down more code that needs to pass around the 'search_key_func' |
958 |
else: |
959 |
self._search_key_func = search_key_func |
|
|
3735.9.5
by John Arbash Meinel
Don't allow an InternalNode to add a key that doesn't fit. |
960 |
|
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
961 |
def add_node(self, prefix, node): |
962 |
"""Add a child node with prefix prefix, and node node. |
|
963 |
||
|
3735.15.13
by John Arbash Meinel
Change the term 'lookup' to the term 'search', as it is closer to what Robert envisioned. |
964 |
:param prefix: The search key prefix for node.
|
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
965 |
:param node: The node being added.
|
966 |
"""
|
|
|
3735.2.126
by Ian Clatworthy
replace asserts in chk_map.py with AssertionErrors |
967 |
if self._search_prefix is None: |
968 |
raise AssertionError("_search_prefix should not be None") |
|
969 |
if not prefix.startswith(self._search_prefix): |
|
970 |
raise AssertionError("prefixes mismatch: %s must start with %s" |
|
971 |
% (prefix,self._search_prefix)) |
|
972 |
if len(prefix) != len(self._search_prefix) + 1: |
|
973 |
raise AssertionError("prefix wrong length: len(%s) is not %d" % |
|
974 |
(prefix, len(self._search_prefix) + 1)) |
|
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
975 |
self._len += len(node) |
976 |
if not len(self._items): |
|
977 |
self._node_width = len(prefix) |
|
|
3735.2.126
by Ian Clatworthy
replace asserts in chk_map.py with AssertionErrors |
978 |
if self._node_width != len(self._search_prefix) + 1: |
979 |
raise AssertionError("node width mismatch: %d is not %d" % |
|
980 |
(self._node_width, len(self._search_prefix) + 1)) |
|
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
981 |
self._items[prefix] = node |
982 |
self._key = None |
|
983 |
||
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
984 |
def _current_size(self): |
985 |
"""Answer the current serialised size of this node.""" |
|
|
3735.15.6
by John Arbash Meinel
Add tests that LeafNodes track the common prefix for both their lookup keys |
986 |
return (self._raw_size + len(str(self._len)) + len(str(self._key_width)) + |
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
987 |
len(str(self._maximum_size))) |
988 |
||
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
989 |
@classmethod
|
|
3735.16.2
by John Arbash Meinel
Start passing around the search_key_func in more places. |
990 |
def deserialise(klass, bytes, key, search_key_func=None): |
|
3735.2.25
by Robert Collins
CHKInventory core tests passing. |
991 |
"""Deserialise bytes to an InternalNode, with key key. |
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
992 |
|
993 |
:param bytes: The bytes of the node.
|
|
994 |
:param key: The key that the serialised node has.
|
|
995 |
:return: An InternalNode instance.
|
|
996 |
"""
|
|
|
3735.16.2
by John Arbash Meinel
Start passing around the search_key_func in more places. |
997 |
result = InternalNode(search_key_func=search_key_func) |
|
3735.2.72
by John Arbash Meinel
Change deserialise to properly handle when there is a '\r' in the key. |
998 |
# Splitlines can split on '\r' so don't use it, remove the extra ''
|
999 |
# from the result of split('\n') because we should have a trailing
|
|
1000 |
# newline
|
|
1001 |
lines = bytes.split('\n') |
|
|
3735.2.126
by Ian Clatworthy
replace asserts in chk_map.py with AssertionErrors |
1002 |
if lines[-1] != '': |
1003 |
raise AssertionError("last line must be ''") |
|
|
3735.2.72
by John Arbash Meinel
Change deserialise to properly handle when there is a '\r' in the key. |
1004 |
lines.pop(-1) |
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
1005 |
items = {} |
1006 |
if lines[0] != 'chknode:': |
|
1007 |
raise ValueError("not a serialised internal node: %r" % bytes) |
|
1008 |
maximum_size = int(lines[1]) |
|
1009 |
width = int(lines[2]) |
|
1010 |
length = int(lines[3]) |
|
|
3735.15.11
by John Arbash Meinel
Change the InternalNodes to also pull out the common prefix. |
1011 |
common_prefix = lines[4] |
1012 |
for line in lines[5:]: |
|
1013 |
line = common_prefix + line |
|
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
1014 |
prefix, flat_key = line.rsplit('\x00', 1) |
1015 |
items[prefix] = (flat_key,) |
|
1016 |
result._items = items |
|
1017 |
result._len = length |
|
1018 |
result._maximum_size = maximum_size |
|
1019 |
result._key = key |
|
1020 |
result._key_width = width |
|
|
3735.15.6
by John Arbash Meinel
Add tests that LeafNodes track the common prefix for both their lookup keys |
1021 |
# XXX: InternalNodes don't really care about their size, and this will
|
1022 |
# change if we add prefix compression
|
|
1023 |
result._raw_size = None # len(bytes) |
|
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
1024 |
result._node_width = len(prefix) |
|
3735.23.2
by John Arbash Meinel
Avoid computing a known prefix on each deserialise. |
1025 |
assert len(items) > 0 |
1026 |
result._search_prefix = common_prefix |
|
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
1027 |
return result |
1028 |
||
1029 |
def iteritems(self, store, key_filter=None): |
|
|
3735.2.115
by John Arbash Meinel
Fix the root cause of why _iter_nodes was not handling key_filter. |
1030 |
for node in self._iter_nodes(store, key_filter=key_filter): |
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
1031 |
for item in node.iteritems(store, key_filter=key_filter): |
1032 |
yield item |
|
1033 |
||
|
3735.14.7
by John Arbash Meinel
Change _iter_nodes into a generator. |
1034 |
def _iter_nodes(self, store, key_filter=None, batch_size=None): |
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
1035 |
"""Iterate over node objects which match key_filter. |
1036 |
||
1037 |
:param store: A store to use for accessing content.
|
|
1038 |
:param key_filter: A key filter to filter nodes. Only nodes that might
|
|
1039 |
contain a key in key_filter will be returned.
|
|
|
3735.14.7
by John Arbash Meinel
Change _iter_nodes into a generator. |
1040 |
:param batch_size: If not None, then we will return the nodes that had
|
1041 |
to be read using get_record_stream in batches, rather than reading
|
|
1042 |
them all at once.
|
|
1043 |
:return: An iterable of nodes. This function does not have to be fully
|
|
1044 |
consumed. (There will be no pending I/O when items are being returned.)
|
|
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
1045 |
"""
|
|
3735.2.31
by Robert Collins
CHKMap.iter_changes |
1046 |
keys = {} |
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
1047 |
if key_filter is None: |
|
3735.2.31
by Robert Collins
CHKMap.iter_changes |
1048 |
for prefix, node in self._items.iteritems(): |
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
1049 |
if type(node) == tuple: |
|
3735.2.31
by Robert Collins
CHKMap.iter_changes |
1050 |
keys[node] = prefix |
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
1051 |
else: |
|
3735.14.7
by John Arbash Meinel
Change _iter_nodes into a generator. |
1052 |
yield node |
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
1053 |
else: |
|
3735.2.43
by Robert Collins
Teach CHKMap how to iter items in 2-tuple keyspaces. |
1054 |
# XXX defaultdict ?
|
1055 |
length_filters = {} |
|
1056 |
for key in key_filter: |
|
|
3735.15.13
by John Arbash Meinel
Change the term 'lookup' to the term 'search', as it is closer to what Robert envisioned. |
1057 |
search_key = self._search_prefix_filter(key) |
1058 |
length_filter = length_filters.setdefault( |
|
1059 |
len(search_key), set()) |
|
1060 |
length_filter.add(search_key) |
|
|
3735.2.43
by Robert Collins
Teach CHKMap how to iter items in 2-tuple keyspaces. |
1061 |
length_filters = length_filters.items() |
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
1062 |
for prefix, node in self._items.iteritems(): |
|
3735.2.43
by Robert Collins
Teach CHKMap how to iter items in 2-tuple keyspaces. |
1063 |
for length, length_filter in length_filters: |
1064 |
if prefix[:length] in length_filter: |
|
1065 |
if type(node) == tuple: |
|
1066 |
keys[node] = prefix |
|
1067 |
else: |
|
|
3735.14.7
by John Arbash Meinel
Change _iter_nodes into a generator. |
1068 |
yield node |
|
3735.2.43
by Robert Collins
Teach CHKMap how to iter items in 2-tuple keyspaces. |
1069 |
break
|
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
1070 |
if keys: |
|
3735.2.62
by Robert Collins
Create a rudimentary CHK page cache. |
1071 |
# Look in the page cache for some more bytes
|
1072 |
found_keys = set() |
|
1073 |
for key in keys: |
|
1074 |
try: |
|
1075 |
bytes = _page_cache[key] |
|
1076 |
except KeyError: |
|
1077 |
continue
|
|
1078 |
else: |
|
|
3735.16.2
by John Arbash Meinel
Start passing around the search_key_func in more places. |
1079 |
node = _deserialise(bytes, key, |
1080 |
search_key_func=self._search_key_func) |
|
|
3735.2.62
by Robert Collins
Create a rudimentary CHK page cache. |
1081 |
self._items[keys[key]] = node |
1082 |
found_keys.add(key) |
|
|
3735.14.7
by John Arbash Meinel
Change _iter_nodes into a generator. |
1083 |
yield node |
|
3735.2.62
by Robert Collins
Create a rudimentary CHK page cache. |
1084 |
for key in found_keys: |
1085 |
del keys[key] |
|
1086 |
if keys: |
|
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
1087 |
# demand load some pages.
|
|
3735.14.7
by John Arbash Meinel
Change _iter_nodes into a generator. |
1088 |
if batch_size is None: |
1089 |
# Read all the keys in
|
|
1090 |
batch_size = len(keys) |
|
1091 |
key_order = list(keys) |
|
1092 |
for batch_start in range(0, len(key_order), batch_size): |
|
1093 |
batch = key_order[batch_start:batch_start + batch_size] |
|
1094 |
# We have to fully consume the stream so there is no pending
|
|
1095 |
# I/O, so we buffer the nodes for now.
|
|
1096 |
stream = store.get_record_stream(batch, 'unordered', True) |
|
1097 |
nodes = [] |
|
1098 |
for record in stream: |
|
1099 |
bytes = record.get_bytes_as('fulltext') |
|
|
3735.16.2
by John Arbash Meinel
Start passing around the search_key_func in more places. |
1100 |
node = _deserialise(bytes, record.key, |
1101 |
search_key_func=self._search_key_func) |
|
|
3735.14.7
by John Arbash Meinel
Change _iter_nodes into a generator. |
1102 |
nodes.append(node) |
1103 |
self._items[keys[record.key]] = node |
|
1104 |
_page_cache.add(record.key, bytes) |
|
1105 |
for node in nodes: |
|
1106 |
yield node |
|
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
1107 |
|
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
1108 |
def map(self, store, key, value): |
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
1109 |
"""Map key to value.""" |
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
1110 |
if not len(self._items): |
|
3735.2.122
by Ian Clatworthy
don't check_remap on every unmap call in CHKMap.apply_delta() |
1111 |
raise AssertionError("can't map in an empty InternalNode.") |
|
3735.15.13
by John Arbash Meinel
Change the term 'lookup' to the term 'search', as it is closer to what Robert envisioned. |
1112 |
search_key = self._search_key(key) |
|
3735.2.126
by Ian Clatworthy
replace asserts in chk_map.py with AssertionErrors |
1113 |
if self._node_width != len(self._search_prefix) + 1: |
1114 |
raise AssertionError("node width mismatch: %d is not %d" % |
|
1115 |
(self._node_width, len(self._search_prefix) + 1)) |
|
|
3735.15.13
by John Arbash Meinel
Change the term 'lookup' to the term 'search', as it is closer to what Robert envisioned. |
1116 |
if not search_key.startswith(self._search_prefix): |
|
3735.9.11
by John Arbash Meinel
Handle when an InternalNode decides it needs to split. |
1117 |
# This key doesn't fit in this index, so we need to split at the
|
|
3735.15.1
by John Arbash Meinel
Change InternalNode to always cache its serialized_prefix. |
1118 |
# point where it would fit, insert self into that internal node,
|
1119 |
# and then map this key into that node.
|
|
|
3735.15.13
by John Arbash Meinel
Change the term 'lookup' to the term 'search', as it is closer to what Robert envisioned. |
1120 |
new_prefix = self.common_prefix(self._search_prefix, |
1121 |
search_key) |
|
|
3735.16.2
by John Arbash Meinel
Start passing around the search_key_func in more places. |
1122 |
new_parent = InternalNode(new_prefix, |
1123 |
search_key_func=self._search_key_func) |
|
|
3735.9.5
by John Arbash Meinel
Don't allow an InternalNode to add a key that doesn't fit. |
1124 |
new_parent.set_maximum_size(self._maximum_size) |
1125 |
new_parent._key_width = self._key_width |
|
|
3735.15.13
by John Arbash Meinel
Change the term 'lookup' to the term 'search', as it is closer to what Robert envisioned. |
1126 |
new_parent.add_node(self._search_prefix[:len(new_prefix)+1], |
|
3735.15.1
by John Arbash Meinel
Change InternalNode to always cache its serialized_prefix. |
1127 |
self) |
|
3735.9.5
by John Arbash Meinel
Don't allow an InternalNode to add a key that doesn't fit. |
1128 |
return new_parent.map(store, key, value) |
|
3735.14.7
by John Arbash Meinel
Change _iter_nodes into a generator. |
1129 |
children = list(self._iter_nodes(store, key_filter=[key])) |
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
1130 |
if children: |
1131 |
child = children[0] |
|
1132 |
else: |
|
1133 |
# new child needed:
|
|
|
3735.15.13
by John Arbash Meinel
Change the term 'lookup' to the term 'search', as it is closer to what Robert envisioned. |
1134 |
child = self._new_child(search_key, LeafNode) |
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
1135 |
old_len = len(child) |
|
3735.11.11
by John Arbash Meinel
Add logic to map() so that it can also collapse when necessary. |
1136 |
if isinstance(child, LeafNode): |
1137 |
old_size = child._current_size() |
|
1138 |
else: |
|
1139 |
old_size = None |
|
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
1140 |
prefix, node_details = child.map(store, key, value) |
1141 |
if len(node_details) == 1: |
|
|
3735.9.11
by John Arbash Meinel
Handle when an InternalNode decides it needs to split. |
1142 |
# child may have shrunk, or might be a new node
|
1143 |
child = node_details[0][1] |
|
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
1144 |
self._len = self._len - old_len + len(child) |
|
3735.15.13
by John Arbash Meinel
Change the term 'lookup' to the term 'search', as it is closer to what Robert envisioned. |
1145 |
self._items[search_key] = child |
|
3735.2.29
by Robert Collins
Untested code is broken code. |
1146 |
self._key = None |
|
3735.11.11
by John Arbash Meinel
Add logic to map() so that it can also collapse when necessary. |
1147 |
new_node = self |
|
3735.2.123
by Ian Clatworthy
only check for remap if changes are interesting in size |
1148 |
if isinstance(child, LeafNode): |
1149 |
if old_size is None: |
|
1150 |
# The old node was an InternalNode which means it has now
|
|
1151 |
# collapsed, so we need to check if it will chain to a
|
|
1152 |
# collapse at this level.
|
|
1153 |
trace.mutter("checking remap as InternalNode -> LeafNode") |
|
1154 |
new_node = self._check_remap(store) |
|
1155 |
else: |
|
1156 |
# If the LeafNode has shrunk in size, we may want to run
|
|
1157 |
# a remap check. Checking for a remap is expensive though
|
|
1158 |
# and the frequency of a successful remap is very low.
|
|
1159 |
# Shrinkage by small amounts is common, so we only do the
|
|
1160 |
# remap check if the new_size is low or the shrinkage
|
|
1161 |
# amount is over a configurable limit.
|
|
1162 |
new_size = child._current_size() |
|
1163 |
shrinkage = old_size - new_size |
|
1164 |
if (shrinkage > 0 and new_size < _INTERESTING_NEW_SIZE |
|
1165 |
or shrinkage > _INTERESTING_SHRINKAGE_LIMIT): |
|
1166 |
trace.mutter( |
|
1167 |
"checking remap as size shrunk by %d to be %d", |
|
1168 |
shrinkage, new_size) |
|
1169 |
new_node = self._check_remap(store) |
|
|
3735.2.126
by Ian Clatworthy
replace asserts in chk_map.py with AssertionErrors |
1170 |
if new_node._search_prefix is None: |
1171 |
raise AssertionError("_search_prefix should not be None") |
|
|
3735.15.13
by John Arbash Meinel
Change the term 'lookup' to the term 'search', as it is closer to what Robert envisioned. |
1172 |
return new_node._search_prefix, [('', new_node)] |
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
1173 |
# child has overflown - create a new intermediate node.
|
1174 |
# XXX: This is where we might want to try and expand our depth
|
|
1175 |
# to refer to more bytes of every child (which would give us
|
|
1176 |
# multiple pointers to child nodes, but less intermediate nodes)
|
|
|
3735.15.13
by John Arbash Meinel
Change the term 'lookup' to the term 'search', as it is closer to what Robert envisioned. |
1177 |
child = self._new_child(search_key, InternalNode) |
1178 |
child._search_prefix = prefix |
|
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
1179 |
for split, node in node_details: |
1180 |
child.add_node(split, node) |
|
1181 |
self._len = self._len - old_len + len(child) |
|
|
3735.2.29
by Robert Collins
Untested code is broken code. |
1182 |
self._key = None |
|
3735.15.13
by John Arbash Meinel
Change the term 'lookup' to the term 'search', as it is closer to what Robert envisioned. |
1183 |
return self._search_prefix, [("", self)] |
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
1184 |
|
|
3735.15.13
by John Arbash Meinel
Change the term 'lookup' to the term 'search', as it is closer to what Robert envisioned. |
1185 |
def _new_child(self, search_key, klass): |
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
1186 |
"""Create a new child node of type klass.""" |
1187 |
child = klass() |
|
1188 |
child.set_maximum_size(self._maximum_size) |
|
1189 |
child._key_width = self._key_width |
|
|
3735.16.2
by John Arbash Meinel
Start passing around the search_key_func in more places. |
1190 |
child._search_key_func = self._search_key_func |
|
3735.15.13
by John Arbash Meinel
Change the term 'lookup' to the term 'search', as it is closer to what Robert envisioned. |
1191 |
self._items[search_key] = child |
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
1192 |
return child |
1193 |
||
1194 |
def serialise(self, store): |
|
1195 |
"""Serialise the node to store. |
|
1196 |
||
1197 |
:param store: A VersionedFiles honouring the CHK extensions.
|
|
1198 |
:return: An iterable of the keys inserted by this operation.
|
|
1199 |
"""
|
|
1200 |
for node in self._items.itervalues(): |
|
1201 |
if type(node) == tuple: |
|
1202 |
# Never deserialised.
|
|
1203 |
continue
|
|
1204 |
if node._key is not None: |
|
1205 |
# Never altered
|
|
1206 |
continue
|
|
1207 |
for key in node.serialise(store): |
|
1208 |
yield key |
|
1209 |
lines = ["chknode:\n"] |
|
1210 |
lines.append("%d\n" % self._maximum_size) |
|
1211 |
lines.append("%d\n" % self._key_width) |
|
1212 |
lines.append("%d\n" % self._len) |
|
|
3735.2.126
by Ian Clatworthy
replace asserts in chk_map.py with AssertionErrors |
1213 |
if self._search_prefix is None: |
1214 |
raise AssertionError("_search_prefix should not be None") |
|
|
3735.15.13
by John Arbash Meinel
Change the term 'lookup' to the term 'search', as it is closer to what Robert envisioned. |
1215 |
lines.append('%s\n' % (self._search_prefix,)) |
1216 |
prefix_len = len(self._search_prefix) |
|
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
1217 |
for prefix, node in sorted(self._items.items()): |
1218 |
if type(node) == tuple: |
|
1219 |
key = node[0] |
|
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
1220 |
else: |
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
1221 |
key = node._key[0] |
|
3735.15.11
by John Arbash Meinel
Change the InternalNodes to also pull out the common prefix. |
1222 |
serialised = "%s\x00%s\n" % (prefix, key) |
|
3735.2.126
by Ian Clatworthy
replace asserts in chk_map.py with AssertionErrors |
1223 |
if not serialised.startswith(self._search_prefix): |
1224 |
raise AssertionError("prefixes mismatch: %s must start with %s" |
|
1225 |
% (serialised, self._search_prefix)) |
|
|
3735.15.11
by John Arbash Meinel
Change the InternalNodes to also pull out the common prefix. |
1226 |
lines.append(serialised[prefix_len:]) |
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
1227 |
sha1, _, _ = store.add_lines((None,), (), lines) |
1228 |
self._key = ("sha1:" + sha1,) |
|
|
3735.2.63
by Robert Collins
Divert writes into the CHK page cache as well. |
1229 |
_page_cache.add(self._key, ''.join(lines)) |
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
1230 |
yield self._key |
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
1231 |
|
|
3735.15.13
by John Arbash Meinel
Change the term 'lookup' to the term 'search', as it is closer to what Robert envisioned. |
1232 |
def _search_key(self, key): |
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
1233 |
"""Return the serialised key for key in this node.""" |
|
3735.15.13
by John Arbash Meinel
Change the term 'lookup' to the term 'search', as it is closer to what Robert envisioned. |
1234 |
# search keys are fixed width. All will be self._node_width wide, so we
|
|
3735.15.5
by John Arbash Meinel
Change the nomenclature. |
1235 |
# pad as necessary.
|
|
3735.16.1
by John Arbash Meinel
(broken) Start tracking down more code that needs to pass around the 'search_key_func' |
1236 |
return (self._search_key_func(key) + '\x00'*self._node_width)[:self._node_width] |
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
1237 |
|
|
3735.15.13
by John Arbash Meinel
Change the term 'lookup' to the term 'search', as it is closer to what Robert envisioned. |
1238 |
def _search_prefix_filter(self, key): |
|
3735.2.43
by Robert Collins
Teach CHKMap how to iter items in 2-tuple keyspaces. |
1239 |
"""Serialise key for use as a prefix filter in iteritems.""" |
|
3735.2.115
by John Arbash Meinel
Fix the root cause of why _iter_nodes was not handling key_filter. |
1240 |
return self._search_key_func(key)[:self._node_width] |
|
3735.2.43
by Robert Collins
Teach CHKMap how to iter items in 2-tuple keyspaces. |
1241 |
|
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
1242 |
def _split(self, offset): |
1243 |
"""Split this node into smaller nodes starting at offset. |
|
1244 |
||
1245 |
:param offset: The offset to start the new child nodes at.
|
|
1246 |
:return: An iterable of (prefix, node) tuples. prefix is a byte
|
|
1247 |
prefix for reaching node.
|
|
1248 |
"""
|
|
1249 |
if offset >= self._node_width: |
|
1250 |
for node in self._items.values(): |
|
1251 |
for result in node._split(offset): |
|
1252 |
yield result |
|
1253 |
return
|
|
1254 |
for key, node in self._items.items(): |
|
1255 |
pass
|
|
1256 |
||
|
3735.2.26
by Robert Collins
CHKInventory migrated to new CHKMap code. |
1257 |
def refs(self): |
1258 |
"""Return the references to other CHK's held by this node.""" |
|
1259 |
if self._key is None: |
|
1260 |
raise AssertionError("unserialised nodes have no refs.") |
|
1261 |
refs = [] |
|
1262 |
for value in self._items.itervalues(): |
|
1263 |
if type(value) == tuple: |
|
1264 |
refs.append(value) |
|
1265 |
else: |
|
1266 |
refs.append(value.key()) |
|
1267 |
return refs |
|
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
1268 |
|
|
3735.15.13
by John Arbash Meinel
Change the term 'lookup' to the term 'search', as it is closer to what Robert envisioned. |
1269 |
def _compute_search_prefix(self, extra_key=None): |
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
1270 |
"""Return the unique key prefix for this node. |
1271 |
||
|
3735.15.13
by John Arbash Meinel
Change the term 'lookup' to the term 'search', as it is closer to what Robert envisioned. |
1272 |
:return: A bytestring of the longest search key prefix that is
|
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
1273 |
unique within this node.
|
1274 |
"""
|
|
|
3735.15.13
by John Arbash Meinel
Change the term 'lookup' to the term 'search', as it is closer to what Robert envisioned. |
1275 |
self._search_prefix = self.common_prefix_for_keys(self._items) |
1276 |
return self._search_prefix |
|
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
1277 |
|
|
3735.2.122
by Ian Clatworthy
don't check_remap on every unmap call in CHKMap.apply_delta() |
1278 |
def unmap(self, store, key, check_remap=True): |
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
1279 |
"""Remove key from this node and it's children.""" |
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
1280 |
if not len(self._items): |
|
3735.2.126
by Ian Clatworthy
replace asserts in chk_map.py with AssertionErrors |
1281 |
raise AssertionError("can't unmap in an empty InternalNode.") |
|
3735.14.7
by John Arbash Meinel
Change _iter_nodes into a generator. |
1282 |
children = list(self._iter_nodes(store, key_filter=[key])) |
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
1283 |
if children: |
1284 |
child = children[0] |
|
1285 |
else: |
|
1286 |
raise KeyError(key) |
|
1287 |
self._len -= 1 |
|
1288 |
unmapped = child.unmap(store, key) |
|
|
3735.11.3
by John Arbash Meinel
At the end of unmap() see if children can be packed into a single Leaf. |
1289 |
self._key = None |
|
3735.15.13
by John Arbash Meinel
Change the term 'lookup' to the term 'search', as it is closer to what Robert envisioned. |
1290 |
search_key = self._search_key(key) |
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
1291 |
if len(unmapped) == 0: |
1292 |
# All child nodes are gone, remove the child:
|
|
|
3735.15.13
by John Arbash Meinel
Change the term 'lookup' to the term 'search', as it is closer to what Robert envisioned. |
1293 |
del self._items[search_key] |
|
3735.11.3
by John Arbash Meinel
At the end of unmap() see if children can be packed into a single Leaf. |
1294 |
unmapped = None |
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
1295 |
else: |
1296 |
# Stash the returned node
|
|
|
3735.15.13
by John Arbash Meinel
Change the term 'lookup' to the term 'search', as it is closer to what Robert envisioned. |
1297 |
self._items[search_key] = unmapped |
|
3735.2.23
by Robert Collins
Test unmapping with one child left but multiple keys. |
1298 |
if len(self._items) == 1: |
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
1299 |
# this node is no longer needed:
|
1300 |
return self._items.values()[0] |
|
|
3735.11.10
by John Arbash Meinel
Change how _check_remap works so it doesn't have to load all keys. |
1301 |
if isinstance(unmapped, InternalNode): |
1302 |
return self |
|
|
3735.2.122
by Ian Clatworthy
don't check_remap on every unmap call in CHKMap.apply_delta() |
1303 |
if check_remap: |
1304 |
return self._check_remap(store) |
|
1305 |
else: |
|
1306 |
return self |
|
|
3735.11.10
by John Arbash Meinel
Change how _check_remap works so it doesn't have to load all keys. |
1307 |
|
1308 |
def _check_remap(self, store): |
|
|
3735.11.11
by John Arbash Meinel
Add logic to map() so that it can also collapse when necessary. |
1309 |
"""Check if all keys contained by children fit in a single LeafNode. |
1310 |
||
1311 |
:param store: A store to use for reading more nodes
|
|
1312 |
:return: Either self, or a new LeafNode which should replace self.
|
|
1313 |
"""
|
|
|
3735.11.10
by John Arbash Meinel
Change how _check_remap works so it doesn't have to load all keys. |
1314 |
# Logic for how we determine when we need to rebuild
|
|
3735.11.3
by John Arbash Meinel
At the end of unmap() see if children can be packed into a single Leaf. |
1315 |
# 1) Implicitly unmap() is removing a key which means that the child
|
1316 |
# nodes are going to be shrinking by some extent.
|
|
1317 |
# 2) If all children are LeafNodes, it is possible that they could be
|
|
1318 |
# combined into a single LeafNode, which can then completely replace
|
|
1319 |
# this internal node with a single LeafNode
|
|
1320 |
# 3) If *one* child is an InternalNode, we assume it has already done
|
|
1321 |
# all the work to determine that its children cannot collapse, and
|
|
1322 |
# we can then assume that those nodes *plus* the current nodes don't
|
|
1323 |
# have a chance of collapsing either.
|
|
1324 |
# So a very cheap check is to just say if 'unmapped' is an
|
|
1325 |
# InternalNode, we don't have to check further.
|
|
|
3735.11.10
by John Arbash Meinel
Change how _check_remap works so it doesn't have to load all keys. |
1326 |
|
|
3735.11.3
by John Arbash Meinel
At the end of unmap() see if children can be packed into a single Leaf. |
1327 |
# TODO: Another alternative is to check the total size of all known
|
1328 |
# LeafNodes. If there is some formula we can use to determine the
|
|
1329 |
# final size without actually having to read in any more
|
|
1330 |
# children, it would be nice to have. However, we have to be
|
|
1331 |
# careful with stuff like nodes that pull out the common prefix
|
|
1332 |
# of each key, as adding a new key can change the common prefix
|
|
1333 |
# and cause size changes greater than the length of one key.
|
|
1334 |
# So for now, we just add everything to a new Leaf until it
|
|
1335 |
# splits, as we know that will give the right answer
|
|
|
3735.16.2
by John Arbash Meinel
Start passing around the search_key_func in more places. |
1336 |
new_leaf = LeafNode(search_key_func=self._search_key_func) |
|
3735.11.3
by John Arbash Meinel
At the end of unmap() see if children can be packed into a single Leaf. |
1337 |
new_leaf.set_maximum_size(self._maximum_size) |
1338 |
new_leaf._key_width = self._key_width |
|
|
3735.14.7
by John Arbash Meinel
Change _iter_nodes into a generator. |
1339 |
# A batch_size of 16 was chosen because:
|
1340 |
# a) In testing, a 4k page held 14 times. So if we have more than 16
|
|
1341 |
# leaf nodes we are unlikely to hold them in a single new leaf
|
|
1342 |
# node. This still allows for 1 round trip
|
|
1343 |
# b) With 16-way fan out, we can still do a single round trip
|
|
1344 |
# c) With 255-way fan out, we don't want to read all 255 and destroy
|
|
1345 |
# the page cache, just to determine that we really don't need it.
|
|
1346 |
for node in self._iter_nodes(store, batch_size=16): |
|
1347 |
if isinstance(node, InternalNode): |
|
1348 |
# Without looking at any leaf nodes, we are sure
|
|
1349 |
return self |
|
1350 |
for key, value in node._items.iteritems(): |
|
1351 |
if new_leaf._map_no_split(key, value): |
|
|
3735.11.10
by John Arbash Meinel
Change how _check_remap works so it doesn't have to load all keys. |
1352 |
return self |
|
3735.2.123
by Ian Clatworthy
only check for remap if changes are interesting in size |
1353 |
trace.mutter("remap generated a new LeafNode") |
|
3735.11.3
by John Arbash Meinel
At the end of unmap() see if children can be packed into a single Leaf. |
1354 |
return new_leaf |
|
3735.2.18
by Robert Collins
Partial multi-layer chk dictionary trees. |
1355 |
|
1356 |
||
|
3735.16.2
by John Arbash Meinel
Start passing around the search_key_func in more places. |
1357 |
def _deserialise(bytes, key, search_key_func): |
|
3735.2.16
by Robert Collins
Untested extensions to support repodetails |
1358 |
"""Helper for repositorydetails - convert bytes to a node.""" |
|
3735.2.24
by Robert Collins
test_chk_map tests all passing. |
1359 |
if bytes.startswith("chkleaf:\n"): |
|
3735.16.2
by John Arbash Meinel
Start passing around the search_key_func in more places. |
1360 |
return LeafNode.deserialise(bytes, key, search_key_func=search_key_func) |
|
3735.2.21
by Robert Collins
BROKEN: multi level CHKMap tries, unfinished, subsystem in flux. |
1361 |
elif bytes.startswith("chknode:\n"): |
|
3735.16.2
by John Arbash Meinel
Start passing around the search_key_func in more places. |
1362 |
return InternalNode.deserialise(bytes, key, |
1363 |
search_key_func=search_key_func) |
|
|
3735.2.16
by Robert Collins
Untested extensions to support repodetails |
1364 |
else: |
1365 |
raise AssertionError("Unknown node type.") |
|
|
3735.9.1
by John Arbash Meinel
Start working on an iter_interesting_nodes, which can find nodes to transmit |
1366 |
|
1367 |
||
|
3735.2.67
by John Arbash Meinel
Merge bzr.dev 3903 which brings in 'chunked' encoding. |
1368 |
def _find_children_info(store, interesting_keys, uninteresting_keys, pb): |
|
3735.9.7
by John Arbash Meinel
Cleanup pass. |
1369 |
"""Read the associated records, and determine what is interesting.""" |
1370 |
uninteresting_keys = set(uninteresting_keys) |
|
1371 |
chks_to_read = uninteresting_keys.union(interesting_keys) |
|
1372 |
next_uninteresting = set() |
|
1373 |
next_interesting = set() |
|
1374 |
uninteresting_items = set() |
|
1375 |
interesting_items = set() |
|
1376 |
interesting_records = [] |
|
|
3735.9.14
by John Arbash Meinel
Start using the iter_interesting_nodes. |
1377 |
# records_read = set()
|
|
3735.9.7
by John Arbash Meinel
Cleanup pass. |
1378 |
for record in store.get_record_stream(chks_to_read, 'unordered', True): |
|
3735.9.14
by John Arbash Meinel
Start using the iter_interesting_nodes. |
1379 |
# records_read.add(record.key())
|
|
3735.9.17
by John Arbash Meinel
Pass around a progress bar and switch to using an adapter. |
1380 |
if pb is not None: |
1381 |
pb.tick() |
|
|
3735.2.67
by John Arbash Meinel
Merge bzr.dev 3903 which brings in 'chunked' encoding. |
1382 |
bytes = record.get_bytes_as('fulltext') |
|
3735.16.2
by John Arbash Meinel
Start passing around the search_key_func in more places. |
1383 |
# We don't care about search_key_func for this code, because we only
|
1384 |
# care about external references.
|
|
1385 |
node = _deserialise(bytes, record.key, search_key_func=None) |
|
|
3735.9.7
by John Arbash Meinel
Cleanup pass. |
1386 |
if record.key in uninteresting_keys: |
1387 |
if isinstance(node, InternalNode): |
|
|
3735.9.14
by John Arbash Meinel
Start using the iter_interesting_nodes. |
1388 |
next_uninteresting.update(node.refs()) |
|
3735.9.7
by John Arbash Meinel
Cleanup pass. |
1389 |
else: |
|
3735.9.14
by John Arbash Meinel
Start using the iter_interesting_nodes. |
1390 |
# We know we are at a LeafNode, so we can pass None for the
|
1391 |
# store
|
|
1392 |
uninteresting_items.update(node.iteritems(None)) |
|
|
3735.9.7
by John Arbash Meinel
Cleanup pass. |
1393 |
else: |
1394 |
interesting_records.append(record) |
|
1395 |
if isinstance(node, InternalNode): |
|
|
3735.9.14
by John Arbash Meinel
Start using the iter_interesting_nodes. |
1396 |
next_interesting.update(node.refs()) |
|
3735.9.7
by John Arbash Meinel
Cleanup pass. |
1397 |
else: |
|
3735.9.14
by John Arbash Meinel
Start using the iter_interesting_nodes. |
1398 |
interesting_items.update(node.iteritems(None)) |
1399 |
# TODO: Filter out records that have already been read, as node splitting
|
|
1400 |
# can cause us to reference the same nodes via shorter and longer
|
|
1401 |
# paths
|
|
|
3735.9.7
by John Arbash Meinel
Cleanup pass. |
1402 |
return (next_uninteresting, uninteresting_items, |
1403 |
next_interesting, interesting_records, interesting_items) |
|
1404 |
||
1405 |
||
|
3735.9.19
by John Arbash Meinel
Refactor iter_interesting a little bit. |
1406 |
def _find_all_uninteresting(store, interesting_root_keys, |
1407 |
uninteresting_root_keys, adapter, pb): |
|
1408 |
"""Determine the full set of uninteresting keys.""" |
|
1409 |
# What about duplicates between interesting_root_keys and
|
|
1410 |
# uninteresting_root_keys?
|
|
1411 |
if not uninteresting_root_keys: |
|
1412 |
# Shortcut case. We know there is nothing uninteresting to filter out
|
|
1413 |
# So we just let the rest of the algorithm do the work
|
|
1414 |
# We know there is nothing uninteresting, and we didn't have to read
|
|
1415 |
# any interesting records yet.
|
|
1416 |
return (set(), set(), set(interesting_root_keys), [], set()) |
|
|
3735.9.7
by John Arbash Meinel
Cleanup pass. |
1417 |
all_uninteresting_chks = set(uninteresting_root_keys) |
1418 |
all_uninteresting_items = set() |
|
1419 |
||
1420 |
# First step, find the direct children of both the interesting and
|
|
1421 |
# uninteresting set
|
|
1422 |
(uninteresting_keys, uninteresting_items, |
|
1423 |
interesting_keys, interesting_records, |
|
1424 |
interesting_items) = _find_children_info(store, interesting_root_keys, |
|
|
3735.9.17
by John Arbash Meinel
Pass around a progress bar and switch to using an adapter. |
1425 |
uninteresting_root_keys, |
|
3735.2.67
by John Arbash Meinel
Merge bzr.dev 3903 which brings in 'chunked' encoding. |
1426 |
pb=pb) |
|
3735.9.7
by John Arbash Meinel
Cleanup pass. |
1427 |
all_uninteresting_chks.update(uninteresting_keys) |
1428 |
all_uninteresting_items.update(uninteresting_items) |
|
1429 |
del uninteresting_items |
|
1430 |
# Note: Exact matches between interesting and uninteresting do not need
|
|
1431 |
# to be search further. Non-exact matches need to be searched in case
|
|
1432 |
# there is a future exact-match
|
|
1433 |
uninteresting_keys.difference_update(interesting_keys) |
|
1434 |
||
|
3735.9.6
by John Arbash Meinel
Add a first pass to the interesting search. |
1435 |
# Second, find the full set of uninteresting bits reachable by the
|
|
3735.9.1
by John Arbash Meinel
Start working on an iter_interesting_nodes, which can find nodes to transmit |
1436 |
# uninteresting roots
|
|
3735.9.7
by John Arbash Meinel
Cleanup pass. |
1437 |
chks_to_read = uninteresting_keys |
|
3735.9.1
by John Arbash Meinel
Start working on an iter_interesting_nodes, which can find nodes to transmit |
1438 |
while chks_to_read: |
1439 |
next_chks = set() |
|
|
3735.9.17
by John Arbash Meinel
Pass around a progress bar and switch to using an adapter. |
1440 |
for record in store.get_record_stream(chks_to_read, 'unordered', False): |
|
3735.9.1
by John Arbash Meinel
Start working on an iter_interesting_nodes, which can find nodes to transmit |
1441 |
# TODO: Handle 'absent'
|
|
3735.9.17
by John Arbash Meinel
Pass around a progress bar and switch to using an adapter. |
1442 |
if pb is not None: |
1443 |
pb.tick() |
|
|
3735.2.98
by John Arbash Meinel
Merge bzr.dev 4032. Resolve the new streaming fetch. |
1444 |
try: |
|
3735.2.67
by John Arbash Meinel
Merge bzr.dev 3903 which brings in 'chunked' encoding. |
1445 |
bytes = record.get_bytes_as('fulltext') |
|
3735.2.98
by John Arbash Meinel
Merge bzr.dev 4032. Resolve the new streaming fetch. |
1446 |
except errors.UnavailableRepresentation: |
1447 |
bytes = adapter.get_bytes(record) |
|
|
3735.16.2
by John Arbash Meinel
Start passing around the search_key_func in more places. |
1448 |
# We don't care about search_key_func for this code, because we
|
1449 |
# only care about external references.
|
|
1450 |
node = _deserialise(bytes, record.key, search_key_func=None) |
|
|
3735.9.1
by John Arbash Meinel
Start working on an iter_interesting_nodes, which can find nodes to transmit |
1451 |
if isinstance(node, InternalNode): |
1452 |
# uninteresting_prefix_chks.update(node._items.iteritems())
|
|
1453 |
chks = node._items.values() |
|
1454 |
# TODO: We remove the entries that are already in
|
|
1455 |
# uninteresting_chks ?
|
|
1456 |
next_chks.update(chks) |
|
|
3735.9.7
by John Arbash Meinel
Cleanup pass. |
1457 |
all_uninteresting_chks.update(chks) |
|
3735.9.1
by John Arbash Meinel
Start working on an iter_interesting_nodes, which can find nodes to transmit |
1458 |
else: |
|
3735.9.7
by John Arbash Meinel
Cleanup pass. |
1459 |
all_uninteresting_items.update(node._items.iteritems()) |
|
3735.9.1
by John Arbash Meinel
Start working on an iter_interesting_nodes, which can find nodes to transmit |
1460 |
chks_to_read = next_chks |
|
3735.9.19
by John Arbash Meinel
Refactor iter_interesting a little bit. |
1461 |
return (all_uninteresting_chks, all_uninteresting_items, |
1462 |
interesting_keys, interesting_records, interesting_items) |
|
1463 |
||
1464 |
||
1465 |
def iter_interesting_nodes(store, interesting_root_keys, |
|
1466 |
uninteresting_root_keys, pb=None): |
|
1467 |
"""Given root keys, find interesting nodes. |
|
1468 |
||
1469 |
Evaluate nodes referenced by interesting_root_keys. Ones that are also
|
|
1470 |
referenced from uninteresting_root_keys are not considered interesting.
|
|
1471 |
||
1472 |
:param interesting_root_keys: keys which should be part of the
|
|
1473 |
"interesting" nodes (which will be yielded)
|
|
1474 |
:param uninteresting_root_keys: keys which should be filtered out of the
|
|
1475 |
result set.
|
|
1476 |
:return: Yield
|
|
1477 |
(interesting records, interesting chk's, interesting key:values)
|
|
1478 |
"""
|
|
1479 |
# TODO: consider that it may be more memory efficient to use the 20-byte
|
|
1480 |
# sha1 string, rather than tuples of hexidecimal sha1 strings.
|
|
|
3735.2.68
by John Arbash Meinel
Add a TODO about avoiding all of the get_record_stream calls. |
1481 |
# TODO: Try to factor out a lot of the get_record_stream() calls into a
|
1482 |
# helper function similar to _read_bytes. This function should be
|
|
1483 |
# able to use nodes from the _page_cache as well as actually
|
|
1484 |
# requesting bytes from the store.
|
|
|
3735.9.19
by John Arbash Meinel
Refactor iter_interesting a little bit. |
1485 |
|
1486 |
# A way to adapt from the compressed texts back into fulltexts
|
|
1487 |
# In a way, this seems like a layering inversion to have CHKMap know the
|
|
1488 |
# details of versionedfile
|
|
1489 |
adapter_class = versionedfile.adapter_registry.get( |
|
1490 |
('knit-ft-gz', 'fulltext')) |
|
1491 |
adapter = adapter_class(store) |
|
1492 |
||
1493 |
(all_uninteresting_chks, all_uninteresting_items, interesting_keys, |
|
1494 |
interesting_records, interesting_items) = _find_all_uninteresting(store, |
|
1495 |
interesting_root_keys, uninteresting_root_keys, adapter, pb) |
|
|
3735.9.7
by John Arbash Meinel
Cleanup pass. |
1496 |
|
1497 |
# Now that we know everything uninteresting, we can yield information from
|
|
1498 |
# our first request
|
|
1499 |
interesting_items.difference_update(all_uninteresting_items) |
|
1500 |
records = dict((record.key, record) for record in interesting_records |
|
1501 |
if record.key not in all_uninteresting_chks) |
|
|
3735.9.19
by John Arbash Meinel
Refactor iter_interesting a little bit. |
1502 |
if records or interesting_items: |
1503 |
yield records, interesting_items |
|
|
3735.9.7
by John Arbash Meinel
Cleanup pass. |
1504 |
interesting_keys.difference_update(all_uninteresting_chks) |
1505 |
||
1506 |
chks_to_read = interesting_keys |
|
|
3735.18.1
by John Arbash Meinel
Change the fetch logic to properly use the child_pb for child ops. |
1507 |
counter = 0 |
|
3735.9.1
by John Arbash Meinel
Start working on an iter_interesting_nodes, which can find nodes to transmit |
1508 |
while chks_to_read: |
1509 |
next_chks = set() |
|
|
3735.9.17
by John Arbash Meinel
Pass around a progress bar and switch to using an adapter. |
1510 |
for record in store.get_record_stream(chks_to_read, 'unordered', False): |
|
3735.18.1
by John Arbash Meinel
Change the fetch logic to properly use the child_pb for child ops. |
1511 |
counter += 1 |
|
3735.9.17
by John Arbash Meinel
Pass around a progress bar and switch to using an adapter. |
1512 |
if pb is not None: |
|
3735.18.1
by John Arbash Meinel
Change the fetch logic to properly use the child_pb for child ops. |
1513 |
pb.update('find chk pages', counter) |
|
3735.9.7
by John Arbash Meinel
Cleanup pass. |
1514 |
# TODO: Handle 'absent'?
|
|
3735.2.98
by John Arbash Meinel
Merge bzr.dev 4032. Resolve the new streaming fetch. |
1515 |
try: |
|
3735.2.67
by John Arbash Meinel
Merge bzr.dev 3903 which brings in 'chunked' encoding. |
1516 |
bytes = record.get_bytes_as('fulltext') |
|
3735.2.98
by John Arbash Meinel
Merge bzr.dev 4032. Resolve the new streaming fetch. |
1517 |
except errors.UnavailableRepresentation: |
1518 |
bytes = adapter.get_bytes(record) |
|
|
3735.16.2
by John Arbash Meinel
Start passing around the search_key_func in more places. |
1519 |
# We don't care about search_key_func for this code, because we
|
1520 |
# only care about external references.
|
|
1521 |
node = _deserialise(bytes, record.key, search_key_func=None) |
|
|
3735.9.1
by John Arbash Meinel
Start working on an iter_interesting_nodes, which can find nodes to transmit |
1522 |
if isinstance(node, InternalNode): |
|
3735.9.15
by John Arbash Meinel
Found a bug in iter_interesting_nodes and its test suite. |
1523 |
chks = set(node.refs()) |
1524 |
chks.difference_update(all_uninteresting_chks) |
|
1525 |
# Is set() and .difference_update better than:
|
|
1526 |
# chks = [chk for chk in node.refs()
|
|
1527 |
# if chk not in all_uninteresting_chks]
|
|
|
3735.9.1
by John Arbash Meinel
Start working on an iter_interesting_nodes, which can find nodes to transmit |
1528 |
next_chks.update(chks) |
1529 |
# These are now uninteresting everywhere else
|
|
|
3735.9.7
by John Arbash Meinel
Cleanup pass. |
1530 |
all_uninteresting_chks.update(chks) |
|
3735.9.19
by John Arbash Meinel
Refactor iter_interesting a little bit. |
1531 |
interesting_items = [] |
|
3735.9.1
by John Arbash Meinel
Start working on an iter_interesting_nodes, which can find nodes to transmit |
1532 |
else: |
|
3735.9.19
by John Arbash Meinel
Refactor iter_interesting a little bit. |
1533 |
interesting_items = [item for item in node._items.iteritems() |
1534 |
if item not in all_uninteresting_items] |
|
|
3735.9.15
by John Arbash Meinel
Found a bug in iter_interesting_nodes and its test suite. |
1535 |
# TODO: Do we need to filter out items that we have already
|
1536 |
# seen on other pages? We don't really want to buffer the
|
|
1537 |
# whole thing, but it does mean that callers need to
|
|
1538 |
# understand they may get duplicate values.
|
|
|
3735.9.7
by John Arbash Meinel
Cleanup pass. |
1539 |
# all_uninteresting_items.update(interesting_items)
|
|
3735.9.19
by John Arbash Meinel
Refactor iter_interesting a little bit. |
1540 |
yield {record.key: record}, interesting_items |
|
3735.9.1
by John Arbash Meinel
Start working on an iter_interesting_nodes, which can find nodes to transmit |
1541 |
chks_to_read = next_chks |