bzr branch
http://gegoxaren.bato24.eu/bzr/brz/remove-bazaar
2484.1.1
by John Arbash Meinel
Add an initial function to read knit indexes in pyrex. |
1 |
# Copyright (C) 2005, 2006, 2007 Canonical Ltd
|
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
2 |
#
|
3 |
# This program is free software; you can redistribute it and/or modify
|
|
4 |
# it under the terms of the GNU General Public License as published by
|
|
5 |
# the Free Software Foundation; either version 2 of the License, or
|
|
6 |
# (at your option) any later version.
|
|
7 |
#
|
|
8 |
# This program is distributed in the hope that it will be useful,
|
|
9 |
# but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
10 |
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
|
11 |
# GNU General Public License for more details.
|
|
12 |
#
|
|
13 |
# You should have received a copy of the GNU General Public License
|
|
14 |
# along with this program; if not, write to the Free Software
|
|
15 |
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
|
|
16 |
||
17 |
"""Knit versionedfile implementation.
|
|
18 |
||
19 |
A knit is a versioned file implementation that supports efficient append only
|
|
20 |
updates.
|
|
1563.2.6
by Robert Collins
Start check tests for knits (pending), and remove dead code. |
21 |
|
22 |
Knit file layout:
|
|
23 |
lifeless: the data file is made up of "delta records". each delta record has a delta header
|
|
24 |
that contains; (1) a version id, (2) the size of the delta (in lines), and (3) the digest of
|
|
25 |
the -expanded data- (ie, the delta applied to the parent). the delta also ends with a
|
|
26 |
end-marker; simply "end VERSION"
|
|
27 |
||
28 |
delta can be line or full contents.
|
|
29 |
... the 8's there are the index number of the annotation.
|
|
30 |
version robertc@robertcollins.net-20051003014215-ee2990904cc4c7ad 7 c7d23b2a5bd6ca00e8e266cec0ec228158ee9f9e
|
|
31 |
59,59,3
|
|
32 |
8
|
|
33 |
8 if ie.executable:
|
|
34 |
8 e.set('executable', 'yes')
|
|
35 |
130,130,2
|
|
36 |
8 if elt.get('executable') == 'yes':
|
|
37 |
8 ie.executable = True
|
|
38 |
end robertc@robertcollins.net-20051003014215-ee2990904cc4c7ad
|
|
39 |
||
40 |
||
41 |
what's in an index:
|
|
42 |
09:33 < jrydberg> lifeless: each index is made up of a tuple of; version id, options, position, size, parents
|
|
43 |
09:33 < jrydberg> lifeless: the parents are currently dictionary compressed
|
|
44 |
09:33 < jrydberg> lifeless: (meaning it currently does not support ghosts)
|
|
45 |
09:33 < lifeless> right
|
|
46 |
09:33 < jrydberg> lifeless: the position and size is the range in the data file
|
|
47 |
||
48 |
||
49 |
so the index sequence is the dictionary compressed sequence number used
|
|
50 |
in the deltas to provide line annotation
|
|
51 |
||
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
52 |
"""
|
53 |
||
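A minimal sketch of reading one delta record laid out as in the module docstring above: a `version ID SIZE DIGEST` header, SIZE body lines, and an `end ID` marker. The helper name `split_delta_record` and the sample revision id and digest are hypothetical, and the sketch assumes the record has already been decompressed into a list of text lines. It is not the parser this module uses.

```python
def split_delta_record(record_lines):
    """Split one delta record into (version_id, digest, body_lines).

    Hypothetical helper, assuming the uncompressed text layout shown in
    the module docstring.
    """
    header = record_lines[0].split()
    if len(header) != 4 or header[0] != 'version':
        raise ValueError('bad record header: %r' % (record_lines[0],))
    version_id, size, digest = header[1], int(header[2]), header[3]
    # Everything between the header and the end marker is the body.
    body = record_lines[1:-1]
    if len(body) != size:
        raise ValueError('expected %d body lines, found %d'
                         % (size, len(body)))
    if record_lines[-1].rstrip('\n') != 'end ' + version_id:
        raise ValueError('missing end marker for %s' % version_id)
    return version_id, digest, body


# Made-up record: one hunk header plus two annotated lines (size 3).
record = [
    'version rev-2 3 0123abcd\n',
    '1,1,2\n',
    '8 first new line\n',
    '8 second new line\n',
    'end rev-2\n',
]
version_id, digest, body = split_delta_record(record)
```

The size field counts hunk headers as well as content lines, which is why the docstring example declares 7 lines for two hunks carrying 3 and 2 lines.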
1563.2.6
by Robert Collins
Start check tests for knits (pending), and remove dead code. |
54 |
# TODOS:
|
55 |
# 10:16 < lifeless> make partial index writes safe
|
|
56 |
# 10:16 < lifeless> implement 'knit.check()' like weave.check()
|
|
57 |
# 10:17 < lifeless> record known ghosts so we can detect when they are filled in rather than the current 'reweave
|
|
58 |
# always' approach.
|
|
1563.2.11
by Robert Collins
Consolidate reweave and join as we have no separate usage, make reweave tests apply to all versionedfile implementations and deprecate the old reweave apis. |
59 |
# move sha1 out of the content so that join is faster at verifying parents
|
60 |
# record content length ?
|
|
1563.2.6
by Robert Collins
Start check tests for knits (pending), and remove dead code. |
61 |
|
62 |
||
1594.2.24
by Robert Collins
Make use of the transaction finalisation warning support to implement in-knit caching. |
63 |
from copy import copy |
1563.2.11
by Robert Collins
Consolidate reweave and join as we have no separate usage, make reweave tests apply to all versionedfile implementations and deprecate the old reweave apis. |
64 |
from cStringIO import StringIO |
1596.2.28
by Robert Collins
more knit profile based tuning. |
65 |
from itertools import izip, chain |
1756.2.17
by Aaron Bentley
Fixes suggested by John Meinel |
66 |
import operator |
1563.2.6
by Robert Collins
Start check tests for knits (pending), and remove dead code. |
67 |
import os |
1628.1.2
by Robert Collins
More knit micro-optimisations. |
68 |
import sys |
1756.2.29
by Aaron Bentley
Remove basis knit support |
69 |
import warnings |
2762.3.1
by Robert Collins
* The compression used within the bzr repository has changed from zlib |
70 |
from zlib import Z_DEFAULT_COMPRESSION |
1594.2.19
by Robert Collins
More coalescing tweaks, and knit feedback. |
71 |
|
1594.2.17
by Robert Collins
Better readv coalescing, now with test, and progress during knit index reading. |
72 |
import bzrlib |
2592.3.66
by Robert Collins
Allow adaption of KnitData to pack files. |
73 |
from bzrlib.lazy_import import lazy_import |
74 |
lazy_import(globals(), """ |
|
75 |
from bzrlib import (
|
|
2770.1.1
by Aaron Bentley
Initial implmentation of plain knit annotation |
76 |
annotate,
|
3224.1.10
by John Arbash Meinel
Introduce the heads_provider for reannotate. |
77 |
graph as _mod_graph,
|
2998.2.2
by John Arbash Meinel
implement a faster path for copying from packs back to knits. |
78 |
lru_cache,
|
2592.3.66
by Robert Collins
Allow adaption of KnitData to pack files. |
79 |
pack,
|
2745.1.2
by Robert Collins
Ensure mutter_callsite is not directly called on a lazy_load object, to make the stacklevel parameter work correctly. |
80 |
trace,
|
2592.3.66
by Robert Collins
Allow adaption of KnitData to pack files. |
81 |
)
|
82 |
""") |
|
1911.2.3
by John Arbash Meinel
Moving everything into a new location so that we can cache more than just revision ids |
83 |
from bzrlib import ( |
84 |
cache_utf8, |
|
2745.1.1
by Robert Collins
Add a number of -Devil checkpoints. |
85 |
debug, |
2520.4.140
by Aaron Bentley
Use matching blocks from mpdiff for knit delta creation |
86 |
diff, |
1911.2.3
by John Arbash Meinel
Moving everything into a new location so that we can cache more than just revision ids |
87 |
errors, |
2249.5.12
by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8 |
88 |
osutils, |
2104.4.2
by John Arbash Meinel
Small cleanup and NEWS entry about fixing bug #65714 |
89 |
patiencediff, |
2039.1.1
by Aaron Bentley
Clean up progress properly when interrupted during fetch (#54000) |
90 |
progress, |
1551.15.46
by Aaron Bentley
Move plan merge to tree |
91 |
merge, |
2196.2.1
by John Arbash Meinel
Merge Dmitry's optimizations and minimize the actual diff. |
92 |
ui, |
2158.3.1
by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations |
93 |
)
|
94 |
from bzrlib.errors import ( |
|
95 |
FileExists, |
|
96 |
NoSuchFile, |
|
97 |
KnitError, |
|
98 |
InvalidRevisionId, |
|
99 |
KnitCorrupt, |
|
100 |
KnitHeaderError, |
|
101 |
RevisionNotPresent, |
|
102 |
RevisionAlreadyPresent, |
|
103 |
)
|
|
3287.6.1
by Robert Collins
* ``VersionedFile.get_graph`` is deprecated, with no replacement method. |
104 |
from bzrlib.graph import Graph |
2158.3.1
by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations |
105 |
from bzrlib.osutils import ( |
106 |
contains_whitespace, |
|
107 |
contains_linebreaks, |
|
2850.1.1
by Robert Collins
* ``KnitVersionedFile.add*`` will no longer cache added records even when |
108 |
sha_string, |
2158.3.1
by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations |
109 |
sha_strings, |
110 |
)
|
|
3287.6.5
by Robert Collins
Deprecate VersionedFile.has_ghost. |
111 |
from bzrlib.symbol_versioning import ( |
112 |
DEPRECATED_PARAMETER, |
|
113 |
deprecated_method, |
|
114 |
deprecated_passed, |
|
115 |
one_four, |
|
116 |
)
|
|
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
117 |
from bzrlib.tsort import topo_sort |
3287.6.1
by Robert Collins
* ``VersionedFile.get_graph`` is deprecated, with no replacement method. |
118 |
from bzrlib.tuned_gzip import GzipFile, bytes_to_gzip |
2094.3.5
by John Arbash Meinel
Fix imports to ensure modules are loaded before they are used |
119 |
import bzrlib.ui |
3287.6.1
by Robert Collins
* ``VersionedFile.get_graph`` is deprecated, with no replacement method. |
120 |
from bzrlib.versionedfile import VersionedFile, InterVersionedFile |
1684.3.3
by Robert Collins
Add a special cased weaves to knit converter. |
121 |
import bzrlib.weave |
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
122 |
|
123 |
||
124 |
# TODO: Split out code specific to this format into an associated object.
|
|
125 |
||
126 |
# TODO: Can we put in some kind of value to check that the index and data
|
|
127 |
# files belong together?
|
|
128 |
||
1759.2.1
by Jelmer Vernooij
Fix some types (found using aspell). |
129 |
# TODO: accommodate binaries, perhaps by storing a byte count
|
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
130 |
|
131 |
# TODO: function to check whole file
|
|
132 |
||
133 |
# TODO: atomically append data, then measure backwards from the cursor
|
|
134 |
# position after writing to work out where it was located. we may need to
|
|
135 |
# bypass python file buffering.
|
|
136 |
||
137 |
DATA_SUFFIX = '.knit' |
|
138 |
INDEX_SUFFIX = '.kndx' |
|
139 |
||
140 |
||
141 |
class KnitContent(object): |
|
142 |
"""Content of a knit version to which deltas can be applied.""" |
|
143 |
||
3224.1.15
by John Arbash Meinel
Finish removing method and noeol from general knowledge, |
144 |
def __init__(self): |
145 |
self._should_strip_eol = False |
|
146 |
||
2921.2.1
by Robert Collins
* Knit text reconstruction now avoids making copies of the lines list for |
147 |
def apply_delta(self, delta, new_version_id): |
2921.2.2
by Robert Collins
Review feedback. |
148 |
"""Apply delta to this object to become new_version_id.""" |
2921.2.1
by Robert Collins
* Knit text reconstruction now avoids making copies of the lines list for |
149 |
raise NotImplementedError(self.apply_delta) |
150 |
||
3224.1.15
by John Arbash Meinel
Finish removing method and noeol from general knowledge, |
151 |
def cleanup_eol(self, copy_on_mutate=True): |
152 |
if self._should_strip_eol: |
|
153 |
if copy_on_mutate: |
|
154 |
self._lines = self._lines[:] |
|
155 |
self.strip_last_line_newline() |
|
156 |
||
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
157 |
def line_delta_iter(self, new_lines): |
1596.2.32
by Robert Collins
Reduce re-extraction of texts during weave to knit joins by providing a memoisation facility. |
158 |
"""Generate line-based delta from this content to new_lines.""" |
2151.1.1
by John Arbash Meinel
(Dmitry Vasiliev) Tune KnitContent and add tests |
159 |
new_texts = new_lines.text() |
160 |
old_texts = self.text() |
|
2781.1.1
by Martin Pool
merge cpatiencediff from Lukas |
161 |
s = patiencediff.PatienceSequenceMatcher(None, old_texts, new_texts) |
2151.1.1
by John Arbash Meinel
(Dmitry Vasiliev) Tune KnitContent and add tests |
162 |
for tag, i1, i2, j1, j2 in s.get_opcodes(): |
163 |
if tag == 'equal': |
|
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
164 |
continue
|
2151.1.1
by John Arbash Meinel
(Dmitry Vasiliev) Tune KnitContent and add tests |
165 |
# ofrom, oto, length, data
|
166 |
yield i1, i2, j2 - j1, new_lines._lines[j1:j2] |
|
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
167 |
|
168 |
def line_delta(self, new_lines): |
|
169 |
return list(self.line_delta_iter(new_lines)) |
|
170 |
||
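The hunk shape produced by `line_delta_iter` can be illustrated with the standard library. This sketch swaps in `difflib.SequenceMatcher` for `patiencediff.PatienceSequenceMatcher` (an assumption made so the example runs without bzrlib; the two matchers can pick different matches on real inputs) and works on plain lists of lines rather than content objects. The function name `line_delta` here is a standalone illustration, not the method above.

```python
from difflib import SequenceMatcher


def line_delta(old_lines, new_lines):
    """Return hunks of (old_start, old_end, new_count, new_lines)."""
    matcher = SequenceMatcher(None, old_lines, new_lines)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == 'equal':
            # Unchanged regions are implied by the gaps between hunks.
            continue
        delta.append((i1, i2, j2 - j1, new_lines[j1:j2]))
    return delta


old = ['a\n', 'b\n', 'c\n']
new = ['a\n', 'x\n', 'y\n', 'c\n']
delta = line_delta(old, new)
```

Each hunk says "replace old lines `old_start:old_end` with these `new_count` lines"; insertions have `old_start == old_end` and deletions have `new_count == 0`.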
2520.4.41
by Aaron Bentley
Accelerate mpdiff generation |
171 |
@staticmethod
|
2520.4.48
by Aaron Bentley
Support getting blocks from knit deltas with no final EOL |
172 |
def get_line_delta_blocks(knit_delta, source, target): |
2520.4.41
by Aaron Bentley
Accelerate mpdiff generation |
173 |
"""Extract SequenceMatcher.get_matching_blocks() from a knit delta""" |
2520.4.48
by Aaron Bentley
Support getting blocks from knit deltas with no final EOL |
174 |
target_len = len(target) |
2520.4.41
by Aaron Bentley
Accelerate mpdiff generation |
175 |
s_pos = 0 |
176 |
t_pos = 0 |
|
177 |
for s_begin, s_end, t_len, new_text in knit_delta: |
|
2520.4.47
by Aaron Bentley
Fix get_line_delta_blocks with eol |
178 |
true_n = s_begin - s_pos |
179 |
n = true_n |
|
2520.4.41
by Aaron Bentley
Accelerate mpdiff generation |
180 |
if n > 0: |
2520.4.48
by Aaron Bentley
Support getting blocks from knit deltas with no final EOL |
181 |
# knit deltas do not provide reliable info about whether the
|
182 |
# last line of a file matches, due to eol handling.
|
|
183 |
if source[s_pos + n -1] != target[t_pos + n -1]: |
|
2520.4.47
by Aaron Bentley
Fix get_line_delta_blocks with eol |
184 |
n-=1 |
185 |
if n > 0: |
|
186 |
yield s_pos, t_pos, n |
|
187 |
t_pos += t_len + true_n |
|
2520.4.41
by Aaron Bentley
Accelerate mpdiff generation |
188 |
s_pos = s_end |
2520.4.48
by Aaron Bentley
Support getting blocks from knit deltas with no final EOL |
189 |
n = target_len - t_pos |
190 |
if n > 0: |
|
191 |
if source[s_pos + n -1] != target[t_pos + n -1]: |
|
192 |
n-=1 |
|
193 |
if n > 0: |
|
194 |
yield s_pos, t_pos, n |
|
2520.4.41
by Aaron Bentley
Accelerate mpdiff generation |
195 |
yield s_pos + (target_len - t_pos), target_len, 0 |
196 |
||
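A simplified sketch of the idea behind `get_line_delta_blocks`: the gaps between delta hunks are exactly the matching blocks. This version deliberately omits the final-EOL check performed above, and `delta_to_matching_blocks` with its `source_len` parameter is a hypothetical name and signature chosen for illustration.

```python
def delta_to_matching_blocks(delta, source_len):
    """Yield (source_pos, target_pos, length) blocks from a line delta.

    Simplified illustration: assumes a valid delta and ignores the
    missing-final-newline case handled by the real method.
    """
    s_pos = 0
    t_pos = 0
    for s_begin, s_end, t_len, _new_lines in delta:
        n = s_begin - s_pos
        if n > 0:
            # Lines before this hunk are unchanged in both texts.
            yield s_pos, t_pos, n
        t_pos += n + t_len
        s_pos = s_end
    tail = source_len - s_pos
    if tail > 0:
        yield s_pos, t_pos, tail
    # Zero-length sentinel, as SequenceMatcher.get_matching_blocks() emits.
    yield source_len, t_pos + tail, 0


blocks = list(delta_to_matching_blocks([(1, 2, 2, ['x\n', 'y\n'])], 3))
```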
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
197 |
|
2794.1.2
by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts. |
198 |
class AnnotatedKnitContent(KnitContent): |
199 |
"""Annotated content.""" |
|
200 |
||
201 |
def __init__(self, lines): |
|
3224.1.15
by John Arbash Meinel
Finish removing method and noeol from general knowledge, |
202 |
KnitContent.__init__(self) |
2794.1.2
by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts. |
203 |
self._lines = lines |
204 |
||
3316.2.13
by Robert Collins
* ``VersionedFile.annotate_iter`` is deprecated. While in principal this |
205 |
def annotate(self): |
206 |
"""Return a list of (origin, text) for each content line.""" |
|
207 |
return list(self._lines) |
|
2794.1.2
by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts. |
208 |
|
2921.2.1
by Robert Collins
* Knit text reconstruction now avoids making copies of the lines list for |
209 |
def apply_delta(self, delta, new_version_id): |
2921.2.2
by Robert Collins
Review feedback. |
210 |
"""Apply delta to this object to become new_version_id.""" |
2921.2.1
by Robert Collins
* Knit text reconstruction now avoids making copies of the lines list for |
211 |
offset = 0 |
212 |
lines = self._lines |
|
213 |
for start, end, count, delta_lines in delta: |
|
214 |
lines[offset+start:offset+end] = delta_lines |
|
215 |
offset = offset + (start - end) + count |
|
216 |
||
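The offset bookkeeping in `apply_delta` can be seen in isolation with plain lists. In this sketch (the free function `apply_line_delta` is a hypothetical stand-in for the method), hunk positions refer to the original text, so `offset` records how far earlier hunks have shifted the lines that follow.

```python
def apply_line_delta(lines, delta):
    """Apply hunks of (start, end, count, new_lines) to lines in place."""
    offset = 0
    for start, end, count, delta_lines in delta:
        lines[offset + start:offset + end] = delta_lines
        # A hunk replaces (end - start) old lines with count new ones.
        offset += count - (end - start)
    return lines


base = ['a\n', 'b\n', 'c\n', 'd\n']
# Replace 'b' with two lines, then delete 'd' (original index 3).
delta = [(1, 2, 2, ['x\n', 'y\n']), (3, 4, 0, [])]
result = apply_line_delta(list(base), delta)
```

Passing a copy (`list(base)`) mirrors the `copy_base_content=True` path; mutating the base directly avoids the copy when the caller no longer needs it.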
2794.1.2
by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts. |
217 |
def strip_last_line_newline(self): |
218 |
line = self._lines[-1][1].rstrip('\n') |
|
219 |
self._lines[-1] = (self._lines[-1][0], line) |
|
3224.1.15
by John Arbash Meinel
Finish removing method and noeol from general knowledge, |
220 |
self._should_strip_eol = False |
2794.1.2
by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts. |
221 |
|
222 |
def text(self): |
|
2911.1.1
by Martin Pool
Better messages when problems are detected inside a knit |
223 |
try: |
3224.1.22
by John Arbash Meinel
Cleanup the extra debugging info, and some >80 char lines. |
224 |
lines = [text for origin, text in self._lines] |
2911.1.1
by Martin Pool
Better messages when problems are detected inside a knit |
225 |
except ValueError, e: |
226 |
# most commonly (only?) caused by the internal form of the knit
|
|
227 |
# missing annotation information because of a bug - see thread
|
|
228 |
# around 20071015
|
|
229 |
raise KnitCorrupt(self, |
|
230 |
"line in annotated knit missing annotation information: %s" |
|
231 |
% (e,)) |
|
2794.1.2
by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts. |
232 |
|
3224.1.15
by John Arbash Meinel
Finish removing method and noeol from general knowledge, |
233 |
if self._should_strip_eol: |
234 |
# lines holds plain text strings here; the annotations were dropped above. |
|
235 |
lines[-1] = lines[-1].rstrip('\n') |
|
236 |
return lines |
|
237 |
||
2794.1.2
by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts. |
238 |
def copy(self): |
239 |
return AnnotatedKnitContent(self._lines[:]) |
|
240 |
||
241 |
||
242 |
class PlainKnitContent(KnitContent): |
|
2794.1.3
by Robert Collins
Review feedback. |
243 |
"""Unannotated content. |
244 |
|
|
245 |
When annotate[_iter] is called on this content, the same version is reported
|
|
246 |
for all lines. Generally, annotate[_iter] is not useful on PlainKnitContent
|
|
247 |
objects.
|
|
248 |
"""
|
|
2794.1.2
by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts. |
249 |
|
250 |
def __init__(self, lines, version_id): |
|
3224.1.15
by John Arbash Meinel
Finish removing method and noeol from general knowledge, |
251 |
KnitContent.__init__(self) |
2794.1.2
by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts. |
252 |
self._lines = lines |
253 |
self._version_id = version_id |
|
254 |
||
3316.2.13
by Robert Collins
* ``VersionedFile.annotate_iter`` is deprecated. While in principal this |
255 |
def annotate(self): |
256 |
"""Return a list of (origin, text) for each content line.""" |
|
257 |
return [(self._version_id, line) for line in self._lines] |
|
2794.1.2
by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts. |
258 |
|
2921.2.1
by Robert Collins
* Knit text reconstruction now avoids making copies of the lines list for |
259 |
def apply_delta(self, delta, new_version_id): |
2921.2.2
by Robert Collins
Review feedback. |
260 |
"""Apply delta to this object to become new_version_id.""" |
2921.2.1
by Robert Collins
* Knit text reconstruction now avoids making copies of the lines list for |
261 |
offset = 0 |
262 |
lines = self._lines |
|
263 |
for start, end, count, delta_lines in delta: |
|
264 |
lines[offset+start:offset+end] = delta_lines |
|
265 |
offset = offset + (start - end) + count |
|
266 |
self._version_id = new_version_id |
|
267 |
||
2794.1.2
by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts. |
268 |
def copy(self): |
269 |
return PlainKnitContent(self._lines[:], self._version_id) |
|
270 |
||
271 |
def strip_last_line_newline(self): |
|
272 |
self._lines[-1] = self._lines[-1].rstrip('\n') |
|
3224.1.15
by John Arbash Meinel
Finish removing method and noeol from general knowledge, |
273 |
self._should_strip_eol = False |
2794.1.2
by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts. |
274 |
|
275 |
def text(self): |
|
3224.1.15
by John Arbash Meinel
Finish removing method and noeol from general knowledge, |
276 |
lines = self._lines |
277 |
if self._should_strip_eol: |
|
278 |
lines = lines[:] |
|
279 |
lines[-1] = lines[-1].rstrip('\n') |
|
280 |
return lines |
|
281 |
||
282 |
||
283 |
class _KnitFactory(object): |
|
284 |
"""Base class for common Factory functions.""" |
|
285 |
||
286 |
def parse_record(self, version_id, record, record_details, |
|
287 |
base_content, copy_base_content=True): |
|
288 |
"""Parse a record into a full content object. |
|
289 |
||
290 |
:param version_id: The official version id for this content
|
|
291 |
:param record: The data returned by read_records_iter()
|
|
292 |
:param record_details: Details about the record returned by
|
|
293 |
get_build_details
|
|
294 |
:param base_content: If get_build_details returns a compression_parent,
|
|
295 |
you must pass a base_content here, else use None
|
|
296 |
:param copy_base_content: When building from the base_content, decide
|
|
297 |
whether to copy it and return a new object or to modify it in
|
|
298 |
place.
|
|
299 |
:return: (content, delta) A Content object and possibly a line-delta,
|
|
300 |
delta may be None
|
|
301 |
"""
|
|
302 |
method, noeol = record_details |
|
303 |
if method == 'line-delta': |
|
304 |
assert base_content is not None |
|
305 |
if copy_base_content: |
|
306 |
content = base_content.copy() |
|
307 |
else: |
|
308 |
content = base_content |
|
309 |
delta = self.parse_line_delta(record, version_id) |
|
310 |
content.apply_delta(delta, version_id) |
|
311 |
else: |
|
312 |
content = self.parse_fulltext(record, version_id) |
|
313 |
delta = None |
|
314 |
content._should_strip_eol = noeol |
|
315 |
return (content, delta) |
|
316 |
||
317 |
||
318 |
class KnitAnnotateFactory(_KnitFactory): |
|
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
319 |
"""Factory for creating annotated Content objects.""" |
320 |
||
321 |
annotated = True |
|
322 |
||
2794.1.2
by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts. |
323 |
def make(self, lines, version_id): |
324 |
num_lines = len(lines) |
|
325 |
return AnnotatedKnitContent(zip([version_id] * num_lines, lines)) |
|
326 |
||
2249.5.12
by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8 |
327 |
def parse_fulltext(self, content, version_id): |
1596.2.7
by Robert Collins
Remove the requirement for reannotation in knit joins. |
328 |
"""Convert fulltext to internal representation |
329 |
||
330 |
fulltext content is of the format
|
|
331 |
revid(utf8) plaintext\n
|
|
332 |
internal representation is of the format:
|
|
333 |
(revid, plaintext)
|
|
334 |
"""
|
|
2249.5.12
by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8 |
335 |
# TODO: jam 20070209 The tests expect this to be returned as tuples,
|
336 |
# but the code itself doesn't really depend on that.
|
|
337 |
# Figure out a way to not require the overhead of turning the
|
|
338 |
# list back into tuples.
|
|
339 |
lines = [tuple(line.split(' ', 1)) for line in content] |
|
2794.1.2
by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts. |
340 |
return AnnotatedKnitContent(lines) |
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
341 |
|
342 |
def parse_line_delta_iter(self, lines, version_id): |
|
2163.1.2
by John Arbash Meinel
Don't modify the list during parse_line_delta |
343 |
return iter(self.parse_line_delta(lines, version_id)) |
1628.1.2
by Robert Collins
More knit micro-optimisations. |
344 |
|
2851.4.2
by Ian Clatworthy
use factory methods in annotated-to-plain conversion instead of duplicating format knowledge |
345 |
def parse_line_delta(self, lines, version_id, plain=False): |
1596.2.7
by Robert Collins
Remove the requirement for reannotation in knit joins. |
346 |
"""Convert a line based delta into internal representation. |
347 |
||
348 |
line delta is in the form of:
|
|
349 |
intstart intend intcount
|
|
350 |
1..count lines:
|
|
351 |
revid(utf8) newline\n
|
|
1759.2.1
by Jelmer Vernooij
Fix some types (found using aspell). |
352 |
internal representation is
|
1596.2.7
by Robert Collins
Remove the requirement for reannotation in knit joins. |
353 |
(start, end, count, [1..count tuples (revid, newline)])
|
2851.4.2
by Ian Clatworthy
use factory methods in annotated-to-plain conversion instead of duplicating format knowledge |
354 |
|
355 |
:param plain: If True, the lines are returned as a plain
|
|
2911.1.1
by Martin Pool
Better messages when problems are detected inside a knit |
356 |
list without annotations, not as a list of (origin, content) tuples, i.e.
|
2851.4.2
by Ian Clatworthy
use factory methods in annotated-to-plain conversion instead of duplicating format knowledge |
357 |
(start, end, count, [1..count newline])
|
1596.2.7
by Robert Collins
Remove the requirement for reannotation in knit joins. |
358 |
"""
|
1628.1.2
by Robert Collins
More knit micro-optimisations. |
359 |
result = [] |
360 |
lines = iter(lines) |
|
361 |
next = lines.next |
|
2249.5.1
by John Arbash Meinel
Leave revision-ids in utf-8 when reading. |
362 |
|
2249.5.15
by John Arbash Meinel
remove get_cached_utf8 checks which were slowing things down. |
363 |
cache = {} |
364 |
def cache_and_return(line): |
|
365 |
origin, text = line.split(' ', 1) |
|
366 |
return cache.setdefault(origin, origin), text |
|
367 |
||
1628.1.2
by Robert Collins
More knit micro-optimisations. |
368 |
# walk through the lines parsing.
|
2851.4.2
by Ian Clatworthy
use factory methods in annotated-to-plain conversion instead of duplicating format knowledge |
369 |
# Note that the plain test is explicitly pulled out of the
|
370 |
# loop to minimise any performance impact
|
|
371 |
if plain: |
|
372 |
for header in lines: |
|
373 |
start, end, count = [int(n) for n in header.split(',')] |
|
374 |
contents = [next().split(' ', 1)[1] for i in xrange(count)] |
|
375 |
result.append((start, end, count, contents)) |
|
376 |
else: |
|
377 |
for header in lines: |
|
378 |
start, end, count = [int(n) for n in header.split(',')] |
|
379 |
contents = [tuple(next().split(' ', 1)) for i in xrange(count)] |
|
380 |
result.append((start, end, count, contents)) |
|
1628.1.2
by Robert Collins
More knit micro-optimisations. |
381 |
return result |
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
382 |
|
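A standalone sketch of the parsing described in the `parse_line_delta` docstring: each hunk is a `start,end,count` header followed by `count` lines of `revid text`. It is written in Python 3 syntax (`next(it)`, `range`) rather than this module's Python 2 idioms, and the function name and sample revision ids are hypothetical.

```python
def parse_annotated_line_delta(lines, plain=False):
    """Parse a serialized annotated line delta into a list of hunks."""
    result = []
    it = iter(lines)
    for header in it:
        # int() tolerates the trailing newline on the last field.
        start, end, count = [int(n) for n in header.split(',')]
        contents = []
        for _ in range(count):
            # Split only on the first space; the text may contain spaces.
            origin, text = next(it).split(' ', 1)
            contents.append(text if plain else (origin, text))
        result.append((start, end, count, contents))
    return result


serialized = ['1,2,2\n', 'rev-2 x\n', 'rev-2 y y\n', '3,4,0\n']
annotated = parse_annotated_line_delta(serialized)
unannotated = parse_annotated_line_delta(serialized, plain=True)
```

Unlike the method above, this sketch tests `plain` inside the loop for brevity; the original hoists the test out of the loop to keep the hot path cheap.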
2163.2.2
by John Arbash Meinel
Don't deal with annotations when we don't care about them. Saves another 300+ms |
383 |
def get_fulltext_content(self, lines): |
384 |
"""Extract just the content lines from a fulltext.""" |
|
385 |
return (line.split(' ', 1)[1] for line in lines) |
|
386 |
||
387 |
def get_linedelta_content(self, lines): |
|
388 |
"""Extract just the content from a line delta. |
|
389 |
||
390 |
This doesn't return all of the extra information stored in a delta.
|
|
391 |
Only the actual content lines.
|
|
392 |
"""
|
|
393 |
lines = iter(lines) |
|
394 |
next = lines.next |
|
395 |
for header in lines: |
|
396 |
header = header.split(',') |
|
397 |
count = int(header[2]) |
|
398 |
for i in xrange(count): |
|
399 |
origin, text = next().split(' ', 1) |
|
400 |
yield text |
|
401 |
||
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
402 |
def lower_fulltext(self, content): |
1596.2.7
by Robert Collins
Remove the requirement for reannotation in knit joins. |
403 |
"""Convert a fulltext content record into a serializable form. |
404 |
||
405 |
See parse_fulltext, which this inverts.
|
|
406 |
"""
|
|
2249.5.12
by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8 |
407 |
# TODO: jam 20070209 We only do the caching thing to make sure that
|
408 |
# the origin is a valid utf-8 line, eventually we could remove it
|
|
2249.5.15
by John Arbash Meinel
remove get_cached_utf8 checks which were slowing things down. |
409 |
return ['%s %s' % (o, t) for o, t in content._lines] |
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
410 |
|
411 |
def lower_line_delta(self, delta): |
|
1596.2.7
by Robert Collins
Remove the requirement for reannotation in knit joins. |
412 |
"""Convert a delta into a serializable form. |
413 |
||
1628.1.2
by Robert Collins
More knit micro-optimisations. |
414 |
See parse_line_delta which this inverts.
|
1596.2.7
by Robert Collins
Remove the requirement for reannotation in knit joins. |
415 |
"""
|
2249.5.12
by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8 |
416 |
# TODO: jam 20070209 We only do the caching thing to make sure that
|
417 |
# the origin is a valid utf-8 line, eventually we could remove it
|
|
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
418 |
out = [] |
419 |
for start, end, c, lines in delta: |
|
420 |
out.append('%d,%d,%d\n' % (start, end, c)) |
|
2249.5.15
by John Arbash Meinel
remove get_cached_utf8 checks which were slowing things down. |
421 |
out.extend(origin + ' ' + text |
1911.2.1
by John Arbash Meinel
Cache encode/decode operations, saves memory and time. Especially when committing a new kernel tree with 7.7M new lines to annotate |
422 |
for origin, text in lines) |
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
423 |
return out |
424 |
||
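To show the serialized form that `lower_line_delta` produces, here is a self-contained sketch operating on the same hunk tuples. The free function `lower_annotated_line_delta` and the sample revision id are hypothetical. The output is the header-plus-annotated-lines layout that `parse_line_delta` reads back.

```python
def lower_annotated_line_delta(delta):
    """Serialize hunks of (start, end, count, [(origin, text), ...])."""
    out = []
    for start, end, count, lines in delta:
        out.append('%d,%d,%d\n' % (start, end, count))
        # Each content line already ends with its own newline.
        out.extend(origin + ' ' + text for origin, text in lines)
    return out


delta = [
    (1, 2, 2, [('rev-2', 'x\n'), ('rev-2', 'y\n')]),
    (3, 4, 0, []),
]
serialized = lower_annotated_line_delta(delta)
```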
3316.2.13
by Robert Collins
* ``VersionedFile.annotate_iter`` is deprecated. While in principal this |
425 |
def annotate(self, knit, version_id): |
2770.1.1
by Aaron Bentley
Initial implmentation of plain knit annotation |
426 |
content = knit._get_content(version_id) |
3316.2.13
by Robert Collins
* ``VersionedFile.annotate_iter`` is deprecated. While in principal this |
427 |
return content.annotate() |
2770.1.1
by Aaron Bentley
Initial implmentation of plain knit annotation |
428 |
|
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
429 |
|
3224.1.15
by John Arbash Meinel
Finish removing method and noeol from general knowledge, |
430 |
class KnitPlainFactory(_KnitFactory): |
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
431 |
"""Factory for creating plain Content objects.""" |
432 |
||
433 |
annotated = False |
|
434 |
||
2794.1.2
by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts. |
435 |
def make(self, lines, version_id): |
436 |
return PlainKnitContent(lines, version_id) |
|
437 |
||
2249.5.12
by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8 |
438 |
def parse_fulltext(self, content, version_id): |
1596.2.7
by Robert Collins
Remove the requirement for reannotation in knit joins. |
439 |
"""This parses an unannotated fulltext. |
440 |
||
441 |
Note that this is not a noop - the internal representation
|
|
442 |
has (versionid, line) - it's just a constant versionid.
|
|
443 |
"""
|
|
2249.5.12
by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8 |
444 |
return self.make(content, version_id) |
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
445 |
|
2249.5.12
by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8 |
446 |
def parse_line_delta_iter(self, lines, version_id): |
2163.1.2
by John Arbash Meinel
Don't modify the list during parse_line_delta |
447 |
cur = 0 |
448 |
num_lines = len(lines) |
|
449 |
while cur < num_lines: |
|
450 |
header = lines[cur] |
|
451 |
cur += 1 |
|
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
452 |
start, end, c = [int(n) for n in header.split(',')] |
2794.1.2
by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts. |
453 |
yield start, end, c, lines[cur:cur+c] |
2163.1.2
by John Arbash Meinel
Don't modify the list during parse_line_delta |
454 |
cur += c |
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
455 |
|
2249.5.12
by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8 |
456 |
def parse_line_delta(self, lines, version_id): |
457 |
return list(self.parse_line_delta_iter(lines, version_id)) |
|
2158.3.1
by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations |
458 |
|
2163.2.2
by John Arbash Meinel
Don't deal with annotations when we don't care about them. Saves another 300+ms |
459 |
def get_fulltext_content(self, lines): |
460 |
"""Extract just the content lines from a fulltext.""" |
|
461 |
return iter(lines) |
|
462 |
||
463 |
def get_linedelta_content(self, lines): |
|
464 |
"""Extract just the content from a line delta. |
|
465 |
||
466 |
This doesn't return all of the extra information stored in a delta.
|
|
467 |
Only the actual content lines.
|
|
468 |
"""
|
|
469 |
lines = iter(lines) |
|
470 |
next = lines.next |
|
471 |
for header in lines: |
|
472 |
header = header.split(',') |
|
473 |
count = int(header[2]) |
|
474 |
for i in xrange(count): |
|
475 |
yield next() |
|
476 |
||
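The content-only extraction in get_linedelta_content can be illustrated with a small standalone sketch. It is written for Python 3 (so `next(it)` and `range` replace `lines.next` and `xrange`), and the name `linedelta_content` is a hypothetical stand-in, not part of bzrlib.

```python
def linedelta_content(lines):
    """Yield only the text lines of a plain line delta (hypothetical helper).

    Each hunk is a "start,end,count" header followed by `count` text lines.
    """
    it = iter(lines)
    for header in it:
        # The third header field says how many content lines follow.
        count = int(header.split(',')[2])
        for _ in range(count):
            # Assumes a well-formed delta: a truncated hunk would exhaust
            # the iterator here and raise an error inside the generator.
            yield next(it)


example_delta = ['1,2,1\n', 'B\n', '3,3,2\n', 'x\n', 'y\n']
content_only = list(linedelta_content(example_delta))
```

The headers are consumed by the outer loop and the text lines by the inner one, so both loops share a single iterator over the input.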
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
477 |
def lower_fulltext(self, content): |
478 |
return content.text() |
|
479 |
||
480 |
def lower_line_delta(self, delta): |
|
481 |
out = [] |
|
482 |
for start, end, c, lines in delta: |
|
483 |
out.append('%d,%d,%d\n' % (start, end, c)) |
|
2794.1.2
by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts. |
484 |
out.extend(lines) |
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
485 |
return out |
486 |
||
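As a rough illustration of the plain line-delta layout handled by parse_line_delta_iter and lower_line_delta, the following standalone Python 3 sketch parses a flat list of lines into hunks and serialises them back. The free functions here are simplified stand-ins for the factory methods (no `self`, no `version_id`), and the sample delta is made up for the example.

```python
def parse_line_delta(lines):
    """Parse a flat line delta into (start, end, count, new_lines) hunks."""
    hunks = []
    cur = 0
    while cur < len(lines):
        # Each hunk begins with a "start,end,count" header line.
        start, end, count = [int(n) for n in lines[cur].split(',')]
        cur += 1
        # The next `count` lines are the replacement text for the hunk.
        hunks.append((start, end, count, lines[cur:cur + count]))
        cur += count
    return hunks


def lower_line_delta(hunks):
    """Serialise hunks back into the flat list-of-lines form."""
    out = []
    for start, end, count, new_lines in hunks:
        out.append('%d,%d,%d\n' % (start, end, count))
        out.extend(new_lines)
    return out


sample_delta = ['1,2,1\n', 'B\n', '3,3,2\n', 'x\n', 'y\n']
sample_hunks = parse_line_delta(sample_delta)
round_tripped = lower_line_delta(sample_hunks)
```

Parsing and then lowering a well-formed delta returns the original list, which is a convenient property to check when experimenting with the format.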
3316.2.13
by Robert Collins
* ``VersionedFile.annotate_iter`` is deprecated. While in principal this |
487 |
def annotate(self, knit, version_id): |
3224.1.7
by John Arbash Meinel
_StreamIndex also needs to return the proper values for get_build_details. |
488 |
annotator = _KnitAnnotator(knit) |
3316.2.13
by Robert Collins
* ``VersionedFile.annotate_iter`` is deprecated. While in principal this |
489 |
return annotator.annotate(version_id) |
2770.1.1
by Aaron Bentley
Initial implmentation of plain knit annotation |
490 |
|
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
491 |
|
492 |
def make_empty_knit(transport, relpath): |
|
493 |
"""Construct an empty knit at the specified location.""" |
|
3316.2.3
by Robert Collins
Remove manual notification of transaction finishing on versioned files. |
494 |
return make_file_knit(relpath, transport, access_mode='w', factory=KnitPlainFactory()) |
495 |
||
496 |
||
497 |
def make_file_knit(name, transport, file_mode=None, access_mode='w', |
|
498 |
factory=None, delta=True, create=False, create_parent_dir=False, |
|
499 |
delay_create=False, dir_mode=None, get_scope=None): |
|
500 |
"""Factory to create a KnitVersionedFile for a .knit/.kndx file pair.""" |
|
501 |
if factory is None: |
|
502 |
factory = KnitAnnotateFactory() |
|
503 |
else: |
|
504 |
factory = KnitPlainFactory() |
|
505 |
if get_scope is None: |
|
506 |
get_scope = lambda:None |
|
507 |
index = _KnitIndex(transport, name + INDEX_SUFFIX, |
|
508 |
access_mode, create=create, file_mode=file_mode, |
|
509 |
create_parent_dir=create_parent_dir, delay_create=delay_create, |
|
510 |
dir_mode=dir_mode, get_scope=get_scope) |
|
511 |
access = _KnitAccess(transport, name + DATA_SUFFIX, file_mode, |
|
512 |
dir_mode, ((create and not len(index)) and delay_create), |
|
513 |
create_parent_dir) |
|
514 |
return KnitVersionedFile(name, transport, factory=factory, |
|
515 |
create=create, delay_create=delay_create, index=index, |
|
516 |
access_method=access) |
|
517 |
||
518 |
||
519 |
def get_suffixes(): |
|
520 |
"""Return the suffixes used by file based knits.""" |
|
521 |
return [DATA_SUFFIX, INDEX_SUFFIX] |
|
522 |
make_file_knit.get_suffixes = get_suffixes |
|
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
523 |
|
524 |
||
525 |
class KnitVersionedFile(VersionedFile): |
|
526 |
"""Weave-like structure with faster random access. |
|
527 |
||
528 |
A knit stores a number of texts and a summary of the relationships
|
|
529 |
between them. Texts are identified by a string version-id. Texts
|
|
530 |
are normally stored and retrieved as a series of lines, but can
|
|
531 |
also be passed as single strings.
|
|
532 |
||
533 |
Lines are stored with the trailing newline (if any) included, to
|
|
534 |
avoid special cases for files with no final newline. Lines are
|
|
535 |
composed of 8-bit characters, not unicode. The combination of
|
|
536 |
these approaches should mean any 'binary' file can be safely
|
|
537 |
stored and retrieved.
|
|
538 |
"""
|
|
539 |
||
3316.2.3
by Robert Collins
Remove manual notification of transaction finishing on versioned files. |
540 |
def __init__(self, relpath, transport, file_mode=None, |
2592.3.135
by Robert Collins
Do not create many transient knit objects, saving 4% on commit. |
541 |
factory=None, delta=True, create=False, create_parent_dir=False, |
542 |
delay_create=False, dir_mode=None, index=None, access_method=None): |
|
1563.2.25
by Robert Collins
Merge in upstream. |
543 |
"""Construct a knit at location specified by relpath. |
544 |
|
|
545 |
:param create: If not True, only open an existing knit.
|
|
1946.2.1
by John Arbash Meinel
2 changes to knits. Delay creating the .knit or .kndx file until we have actually tried to write data. Because of this, we must allow the Knit to create the prefix directories |
546 |
:param create_parent_dir: If True, create the parent directory if
|
547 |
creating the file fails. (This is used for stores with
|
|
548 |
hash-prefixes that may not exist yet)
|
|
549 |
:param delay_create: The calling code is aware that the knit won't
|
|
550 |
actually be created until the first data is stored.
|
|
2592.3.1
by Robert Collins
Allow giving KnitVersionedFile an index object to use rather than implicitly creating one. |
551 |
:param index: An index to use for the knit.
|
1563.2.25
by Robert Collins
Merge in upstream. |
552 |
"""
|
3316.2.3
by Robert Collins
Remove manual notification of transaction finishing on versioned files. |
553 |
super(KnitVersionedFile, self).__init__() |
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
554 |
self.transport = transport |
555 |
self.filename = relpath |
|
1563.2.16
by Robert Collins
Change WeaveStore into VersionedFileStore and make its versoined file class parameterisable. |
556 |
self.factory = factory or KnitAnnotateFactory() |
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
557 |
self.delta = delta |
558 |
||
2147.1.1
by John Arbash Meinel
Factor the common knit delta selection into a helper func, and allow the fulltext to be chosen based on cumulative delta size |
559 |
self._max_delta_chain = 200 |
560 |
||
3316.2.3
by Robert Collins
Remove manual notification of transaction finishing on versioned files. |
561 |
if None in (access_method, index): |
3316.2.15
by Robert Collins
Final review feedback. |
562 |
raise ValueError("No default access_method or index any more") |
3316.2.3
by Robert Collins
Remove manual notification of transaction finishing on versioned files. |
563 |
self._index = index |
564 |
_access = access_method |
|
2592.3.66
by Robert Collins
Allow adaption of KnitData to pack files. |
565 |
if create and not len(self) and not delay_create: |
566 |
_access.create() |
|
567 |
self._data = _KnitData(_access) |
|
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
568 |
|
1704.2.10
by Martin Pool
Add KnitVersionedFile.__repr__ method |
569 |
def __repr__(self): |
2592.3.159
by Robert Collins
Provide a transport for KnitVersionedFile's __repr__ in pack repositories. |
570 |
return '%s(%s)' % (self.__class__.__name__, |
1704.2.10
by Martin Pool
Add KnitVersionedFile.__repr__ method |
571 |
self.transport.abspath(self.filename)) |
572 |
||
2147.1.1
by John Arbash Meinel
Factor the common knit delta selection into a helper func, and allow the fulltext to be chosen based on cumulative delta size |
573 |
def _check_should_delta(self, first_parents): |
574 |
"""Iterate back through the parent listing, looking for a fulltext. |
|
575 |
||
576 |
This is used when we want to decide whether to add a delta or a new
|
|
577 |
fulltext. It searches back through at most _max_delta_chain parents. When it finds a
|
|
578 |
fulltext parent, it sees if the total size of the deltas leading up to
|
|
579 |
it is large enough to indicate that we want a new full text anyway.
|
|
580 |
||
581 |
Return True if we should create a new delta, False if we should use a
|
|
582 |
full text.
|
|
583 |
"""
|
|
584 |
delta_size = 0 |
|
585 |
fulltext_size = None |
|
586 |
delta_parents = first_parents |
|
2147.1.2
by John Arbash Meinel
Simplify the knit max-chain detection code. |
587 |
for count in xrange(self._max_delta_chain): |
2147.1.1
by John Arbash Meinel
Factor the common knit delta selection into a helper func, and allow the fulltext to be chosen based on cumulative delta size |
588 |
parent = delta_parents[0] |
589 |
method = self._index.get_method(parent) |
|
2592.3.71
by Robert Collins
Basic version of knit-based repository operating, many tests failing. |
590 |
index, pos, size = self._index.get_position(parent) |
2147.1.1
by John Arbash Meinel
Factor the common knit delta selection into a helper func, and allow the fulltext to be chosen based on cumulative delta size |
591 |
if method == 'fulltext': |
592 |
fulltext_size = size |
|
593 |
break
|
|
594 |
delta_size += size |
|
3287.5.6
by Robert Collins
Remove _KnitIndex.get_parents. |
595 |
delta_parents = self._index.get_parent_map([parent])[parent] |
2147.1.2
by John Arbash Meinel
Simplify the knit max-chain detection code. |
596 |
else: |
597 |
# We couldn't find a fulltext, so we must create a new one
|
|
2147.1.1
by John Arbash Meinel
Factor the common knit delta selection into a helper func, and allow the fulltext to be chosen based on cumulative delta size |
598 |
return False |
2147.1.2
by John Arbash Meinel
Simplify the knit max-chain detection code. |
599 |
|
600 |
return fulltext_size > delta_size |
|
2147.1.1
by John Arbash Meinel
Factor the common knit delta selection into a helper func, and allow the fulltext to be chosen based on cumulative delta size |
601 |
|
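The decision made by _check_should_delta can be sketched without a real knit index. In this Python 3 illustration the index is a plain dict mapping a version id to `(method, size, first_parent)`; that layout and the name `should_delta` are assumptions made for the example, not the `_KnitIndex` API.

```python
def should_delta(first_parent, index, max_delta_chain=200):
    """Return True to store a delta, False to store a new fulltext.

    `index` is a hypothetical dict: version -> (method, size, first_parent).
    """
    delta_size = 0
    parent = first_parent
    for _ in range(max_delta_chain):
        method, size, next_parent = index[parent]
        if method == 'fulltext':
            # Delta only while the chain is still smaller than the fulltext.
            return size > delta_size
        delta_size += size
        parent = next_parent
    # No fulltext within the chain limit: force a new fulltext.
    return False


example_index = {
    'a': ('fulltext', 100, None),
    'b': ('line-delta', 30, 'a'),
    'c': ('line-delta', 40, 'b'),
    'd': ('line-delta', 50, 'c'),
}
```

With this data, a child of 'c' would be stored as a delta (70 bytes of deltas against a 100-byte fulltext), while a child of 'd' would get a fresh fulltext because the accumulated deltas (120 bytes) already exceed it.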
3316.2.3
by Robert Collins
Remove manual notification of transaction finishing on versioned files. |
602 |
def _check_write_ok(self): |
603 |
return self._index._check_write_ok() |
|
604 |
||
1692.2.1
by Robert Collins
Fix knit based push to only perform 2 appends to the target, rather that 2*new-versions. |
605 |
def _add_raw_records(self, records, data): |
606 |
"""Add all the records 'records' with data pre-joined in 'data'. |
|
607 |
||
608 |
:param records: A list of tuples (version_id, options, parents, size).
|
|
609 |
:param data: The data for the records. When it is written, the records
|
|
610 |
are adjusted to have pos pointing into data by the sum of
|
|
1759.2.1
by Jelmer Vernooij
Fix some types (found using aspell). |
611 |
the preceding records' sizes.
|
1692.2.1
by Robert Collins
Fix knit based push to only perform 2 appends to the target, rather that 2*new-versions. |
612 |
"""
|
613 |
# write all the data
|
|
2592.3.66
by Robert Collins
Allow adaption of KnitData to pack files. |
614 |
raw_record_sizes = [record[3] for record in records] |
615 |
positions = self._data.add_raw_records(raw_record_sizes, data) |
|
1863.1.1
by John Arbash Meinel
Allow Versioned files to do caching if explicitly asked, and implement for Knit |
616 |
offset = 0 |
1692.2.1
by Robert Collins
Fix knit based push to only perform 2 appends to the target, rather that 2*new-versions. |
617 |
index_entries = [] |
2592.3.68
by Robert Collins
Make knit add_versions calls take access memo tuples rather than just pos and size. |
618 |
for (version_id, options, parents, size), access_memo in zip( |
2592.3.66
by Robert Collins
Allow adaption of KnitData to pack files. |
619 |
records, positions): |
2592.3.68
by Robert Collins
Make knit add_versions calls take access memo tuples rather than just pos and size. |
620 |
index_entries.append((version_id, options, access_memo, parents)) |
1863.1.1
by John Arbash Meinel
Allow Versioned files to do caching if explicitly asked, and implement for Knit |
621 |
offset += size |
1692.2.1
by Robert Collins
Fix knit based push to only perform 2 appends to the target, rather that 2*new-versions. |
622 |
self._index.add_versions(index_entries) |
623 |
||
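The docstring of _add_raw_records says each record's position is the sum of the preceding records' sizes. A minimal Python 3 sketch of that bookkeeping follows. The function name, the `base` parameter (standing for where the joined data landed in the file) and the `(offset, size)` tuples are all hypothetical, and the real access memos returned by `_data.add_raw_records` may carry more information.

```python
def record_positions(sizes, base=0):
    """Compute (offset, size) for records written back to back.

    `base` is the hypothetical offset at which the joined data starts.
    """
    positions = []
    offset = base
    for size in sizes:
        positions.append((offset, size))
        # The next record starts right after this one.
        offset += size
    return positions


example_positions = record_positions([10, 5, 7], base=100)
```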
1563.2.15
by Robert Collins
remove the weavestore assumptions about the number and nature of files it manages. |
624 |
def copy_to(self, name, transport): |
625 |
"""See VersionedFile.copy_to().""" |
|
626 |
# copy the current index to a temp index to avoid racing with local
|
|
627 |
# writes
|
|
1955.3.30
by John Arbash Meinel
fix small bug |
628 |
transport.put_file_non_atomic(name + INDEX_SUFFIX + '.tmp', |
1955.3.24
by John Arbash Meinel
Update Knit to use the new non_atomic_foo functions |
629 |
self.transport.get(self._index._filename)) |
1563.2.15
by Robert Collins
remove the weavestore assumptions about the number and nature of files it manages. |
630 |
# copy the data file
|
1711.7.25
by John Arbash Meinel
try/finally to close files, _KnitData was keeping a handle to a file it never used again, and using transport.rename() when it wanted transport.move() |
631 |
f = self._data._open_file() |
632 |
try: |
|
1955.3.8
by John Arbash Meinel
avoid some deprecation warnings in other parts of the code |
633 |
transport.put_file(name + DATA_SUFFIX, f) |
1711.7.25
by John Arbash Meinel
try/finally to close files, _KnitData was keeping a handle to a file it never used again, and using transport.rename() when it wanted transport.move() |
634 |
finally: |
635 |
f.close() |
|
636 |
# move the copied index into place
|
|
637 |
transport.move(name + INDEX_SUFFIX + '.tmp', name + INDEX_SUFFIX) |
|
1563.2.15
by Robert Collins
remove the weavestore assumptions about the number and nature of files it manages. |
638 |
|
2535.3.3
by Andrew Bennetts
Add Knit.get_data_stream. |
639 |
def get_data_stream(self, required_versions): |
640 |
"""Get a data stream for the specified versions. |
|
641 |
||
642 |
Versions may be returned in any order, not necessarily the order
|
|
3023.2.2
by Martin Pool
Fix KnitVersionedFile.get_data_stream to not assume .versions() is sorted. (lp:165106) |
643 |
specified. They are returned in a partial order by compression
|
644 |
parent, so that the deltas can be applied as the data stream is
|
|
645 |
inserted; however note that compression parents will not be sent
|
|
646 |
unless they were specifically requested, as the client may already
|
|
647 |
have them.
|
|
2535.3.3
by Andrew Bennetts
Add Knit.get_data_stream. |
648 |
|
2670.3.7
by Andrew Bennetts
Tweak docstring as requested in review. |
649 |
:param required_versions: The exact set of versions to be extracted.
|
650 |
Unlike some other knit methods, this is not used to generate a
|
|
651 |
transitive closure, rather it is used precisely as given.
|
|
2535.3.3
by Andrew Bennetts
Add Knit.get_data_stream. |
652 |
|
653 |
:returns: format_signature, list of (version, options, length, parents),
|
|
654 |
reader_callable.
|
|
655 |
"""
|
|
3023.2.2
by Martin Pool
Fix KnitVersionedFile.get_data_stream to not assume .versions() is sorted. (lp:165106) |
656 |
required_version_set = frozenset(required_versions) |
657 |
version_index = {} |
|
658 |
# list of revisions that can just be sent without waiting for their
|
|
659 |
# compression parent
|
|
660 |
ready_to_send = [] |
|
661 |
# map from revision to the children based on it
|
|
662 |
deferred = {} |
|
663 |
# first, read all relevant index data, enough to sort into the right
|
|
664 |
# order to return
|
|
2535.3.3
by Andrew Bennetts
Add Knit.get_data_stream. |
665 |
for version_id in required_versions: |
666 |
options = self._index.get_options(version_id) |
|
667 |
parents = self._index.get_parents_with_ghosts(version_id) |
|
2535.3.36
by Andrew Bennetts
Merge bzr.dev |
668 |
index_memo = self._index.get_position(version_id) |
3023.2.2
by Martin Pool
Fix KnitVersionedFile.get_data_stream to not assume .versions() is sorted. (lp:165106) |
669 |
version_index[version_id] = (index_memo, options, parents) |
3034.3.1
by Martin Pool
Post-review cleanups from Robert for KnitVersionedFile.get_data_stream |
670 |
if ('line-delta' in options |
671 |
and parents[0] in required_version_set): |
|
3023.2.2
by Martin Pool
Fix KnitVersionedFile.get_data_stream to not assume .versions() is sorted. (lp:165106) |
672 |
# must wait until the parent has been sent
|
673 |
deferred.setdefault(parents[0], []). \ |
|
674 |
append(version_id) |
|
675 |
else: |
|
676 |
# either a fulltext, or a delta whose parent the client did
|
|
677 |
# not ask for and presumably already has
|
|
678 |
ready_to_send.append(version_id) |
|
679 |
# build a list of results to return, plus instructions for data to
|
|
680 |
# read from the file
|
|
681 |
copy_queue_records = [] |
|
3015.2.19
by Robert Collins
Don't include the pack container length in the lengths given by get_data_stream. |
682 |
temp_version_list = [] |
3023.2.2
by Martin Pool
Fix KnitVersionedFile.get_data_stream to not assume .versions() is sorted. (lp:165106) |
683 |
while ready_to_send: |
684 |
# XXX: pushing and popping lists may be a bit inefficient
|
|
3023.2.3
by Martin Pool
Update tests for new ordering of results from get_data_stream - the order is not defined by the interface, but is stable |
685 |
version_id = ready_to_send.pop(0) |
3023.2.2
by Martin Pool
Fix KnitVersionedFile.get_data_stream to not assume .versions() is sorted. (lp:165106) |
686 |
(index_memo, options, parents) = version_index[version_id] |
2535.3.36
by Andrew Bennetts
Merge bzr.dev |
687 |
copy_queue_records.append((version_id, index_memo)) |
688 |
none, data_pos, data_size = index_memo |
|
3015.2.19
by Robert Collins
Don't include the pack container length in the lengths given by get_data_stream. |
689 |
temp_version_list.append((version_id, options, data_size, |
2535.3.3
by Andrew Bennetts
Add Knit.get_data_stream. |
690 |
parents)) |
3023.2.2
by Martin Pool
Fix KnitVersionedFile.get_data_stream to not assume .versions() is sorted. (lp:165106) |
691 |
if version_id in deferred: |
692 |
# now we can send all the children of this revision - we could
|
|
3023.2.3
by Martin Pool
Update tests for new ordering of results from get_data_stream - the order is not defined by the interface, but is stable |
693 |
# put them in anywhere, but we hope that sending them soon
|
694 |
# after the fulltext will give good locality in the receiver
|
|
3023.2.2
by Martin Pool
Fix KnitVersionedFile.get_data_stream to not assume .versions() is sorted. (lp:165106) |
695 |
ready_to_send[:0] = deferred.pop(version_id) |
696 |
assert len(deferred) == 0, \ |
|
697 |
"Still have compressed child versions waiting to be sent"
|
|
3015.2.19
by Robert Collins
Don't include the pack container length in the lengths given by get_data_stream. |
698 |
# XXX: The stream format is such that we cannot stream it - we have to
|
699 |
# know the length of all the data a priori.
|
|
2535.3.3
by Andrew Bennetts
Add Knit.get_data_stream. |
700 |
raw_datum = [] |
3015.2.19
by Robert Collins
Don't include the pack container length in the lengths given by get_data_stream. |
701 |
result_version_list = [] |
2535.3.3
by Andrew Bennetts
Add Knit.get_data_stream. |
702 |
for (version_id, raw_data), \ |
703 |
(version_id2, options, _, parents) in \ |
|
704 |
izip(self._data.read_records_iter_raw(copy_queue_records), |
|
3015.2.19
by Robert Collins
Don't include the pack container length in the lengths given by get_data_stream. |
705 |
temp_version_list): |
3023.2.2
by Martin Pool
Fix KnitVersionedFile.get_data_stream to not assume .versions() is sorted. (lp:165106) |
706 |
assert version_id == version_id2, \ |
707 |
'logic error, inconsistent results'
|
|
2535.3.3
by Andrew Bennetts
Add Knit.get_data_stream. |
708 |
raw_datum.append(raw_data) |
3015.2.19
by Robert Collins
Don't include the pack container length in the lengths given by get_data_stream. |
709 |
result_version_list.append( |
710 |
(version_id, options, len(raw_data), parents)) |
|
711 |
# provide a callback to get data incrementally.
|
|
2535.3.3
by Andrew Bennetts
Add Knit.get_data_stream. |
712 |
pseudo_file = StringIO(''.join(raw_datum)) |
713 |
def read(length): |
|
714 |
if length is None: |
|
715 |
return pseudo_file.read() |
|
716 |
else: |
|
717 |
return pseudo_file.read(length) |
|
718 |
return (self.get_format_signature(), result_version_list, read) |
|
719 |
||
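The ordering logic in get_data_stream (ready_to_send plus deferred children) can be tried in isolation. This Python 3 sketch assumes a hypothetical dict index mapping a version id to `(method, parents)` and returns only the send order. It raises `ValueError` where the method above uses an assertion.

```python
def order_for_stream(required_versions, index):
    """Order versions so each delta follows its requested compression parent.

    `index` is a hypothetical dict: version -> (method, parents).
    """
    required = frozenset(required_versions)
    ready_to_send = []
    deferred = {}  # compression parent -> children waiting on it
    for version in required_versions:
        method, parents = index[version]
        if method == 'line-delta' and parents[0] in required:
            # Must wait until the parent has been sent.
            deferred.setdefault(parents[0], []).append(version)
        else:
            # A fulltext, or a delta whose parent was not requested.
            ready_to_send.append(version)
    ordered = []
    while ready_to_send:
        version = ready_to_send.pop(0)
        ordered.append(version)
        if version in deferred:
            # Send the children right after their parent.
            ready_to_send[:0] = deferred.pop(version)
    if deferred:
        raise ValueError('compressed children still waiting to be sent')
    return ordered


stream_index = {
    'base': ('fulltext', ()),
    'd1': ('line-delta', ('base',)),
    'd2': ('line-delta', ('d1',)),
    'other': ('line-delta', ('absent',)),
}
stream_order = order_for_stream(['d2', 'other', 'd1', 'base'], stream_index)
```

Note that 'other' is sent immediately even though it is a delta: its compression parent was not requested, so the receiver is presumed to have it already.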
2520.4.47
by Aaron Bentley
Fix get_line_delta_blocks with eol |
720 |
def _extract_blocks(self, version_id, source, target): |
2520.4.41
by Aaron Bentley
Accelerate mpdiff generation |
721 |
if self._index.get_method(version_id) != 'line-delta': |
722 |
return None |
|
723 |
parent, sha1, noeol, delta = self.get_delta(version_id) |
|
2520.4.47
by Aaron Bentley
Fix get_line_delta_blocks with eol |
724 |
return KnitContent.get_line_delta_blocks(delta, source, target) |
2520.4.41
by Aaron Bentley
Accelerate mpdiff generation |
725 |
|
1596.2.36
by Robert Collins
add a get_delta api to versioned_file. |
726 |
def get_delta(self, version_id): |
727 |
"""Get a delta for constructing version from some other version.""" |
|
2229.2.3
by Aaron Bentley
change reserved_id to is_reserved_id, add check_not_reserved for DRY |
728 |
self.check_not_reserved_id(version_id) |
3287.5.2
by Robert Collins
Deprecate VersionedFile.get_parents, breaking pulling from a ghost containing knit or pack repository to weaves, which improves correctness and allows simplification of core code. |
729 |
parents = self.get_parent_map([version_id])[version_id] |
1596.2.36
by Robert Collins
add a get_delta api to versioned_file. |
730 |
if len(parents): |
731 |
parent = parents[0] |
|
732 |
else: |
|
733 |
parent = None |
|
2592.3.71
by Robert Collins
Basic version of knit-based repository operating, many tests failing. |
734 |
index_memo = self._index.get_position(version_id) |
735 |
data, sha1 = self._data.read_records(((version_id, index_memo),))[version_id] |
|
1596.2.37
by Robert Collins
Switch to delta based content copying in the generic versioned file copier. |
736 |
noeol = 'no-eol' in self._index.get_options(version_id) |
1596.2.36
by Robert Collins
add a get_delta api to versioned_file. |
737 |
if 'fulltext' == self._index.get_method(version_id): |
2249.5.12
by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8 |
738 |
new_content = self.factory.parse_fulltext(data, version_id) |
1596.2.36
by Robert Collins
add a get_delta api to versioned_file. |
739 |
if parent is not None: |
740 |
reference_content = self._get_content(parent) |
|
741 |
old_texts = reference_content.text() |
|
742 |
else: |
|
743 |
old_texts = [] |
|
744 |
new_texts = new_content.text() |
|
2781.1.1
by Martin Pool
merge cpatiencediff from Lukas |
745 |
delta_seq = patiencediff.PatienceSequenceMatcher(None, old_texts, |
746 |
new_texts) |
|
1596.2.37
by Robert Collins
Switch to delta based content copying in the generic versioned file copier. |
747 |
return parent, sha1, noeol, self._make_line_delta(delta_seq, new_content) |
1596.2.36
by Robert Collins
add a get_delta api to versioned_file. |
748 |
else: |
2249.5.12
by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8 |
749 |
delta = self.factory.parse_line_delta(data, version_id) |
1596.2.37
by Robert Collins
Switch to delta based content copying in the generic versioned file copier. |
750 |
return parent, sha1, noeol, delta |
2535.3.1
by Andrew Bennetts
Add get_format_signature to VersionedFile |
751 |
|
752 |
def get_format_signature(self): |
|
753 |
"""See VersionedFile.get_format_signature().""" |
|
754 |
if self.factory.annotated: |
|
755 |
annotated_part = "annotated" |
|
756 |
else: |
|
757 |
annotated_part = "plain" |
|
2535.3.17
by Andrew Bennetts
[broken] Closer to a working Repository.fetch_revisions smart request. |
758 |
return "knit-%s" % (annotated_part,) |
1596.2.36
by Robert Collins
add a get_delta api to versioned_file. |
759 |
|
3287.6.7
by Robert Collins
* ``VersionedFile.get_graph_with_ghosts`` is deprecated, with no |
760 |
@deprecated_method(one_four) |
1594.2.8
by Robert Collins
add ghost aware apis to knits. |
761 |
def get_graph_with_ghosts(self): |
762 |
"""See VersionedFile.get_graph_with_ghosts().""" |
|
3287.6.7
by Robert Collins
* ``VersionedFile.get_graph_with_ghosts`` is deprecated, with no |
763 |
return self.get_parent_map(self.versions()) |
1594.2.8
by Robert Collins
add ghost aware apis to knits. |
764 |
|
2520.4.88
by Aaron Bentley
Retrieve all sha1s at once (ftw) |
765 |
def get_sha1s(self, version_ids): |
3316.2.9
by Robert Collins
* ``VersionedFile.get_sha1`` is deprecated, please use |
766 |
"""See VersionedFile.get_sha1s().""" |
2520.4.88
by Aaron Bentley
Retrieve all sha1s at once (ftw) |
767 |
record_map = self._get_record_map(version_ids) |
768 |
# record entry 2 is the 'digest'.
|
|
769 |
return [record_map[v][2] for v in version_ids] |
|
1666.1.6
by Robert Collins
Make knit the default format. |
770 |
|
3287.6.5
by Robert Collins
Deprecate VersionedFile.has_ghost. |
771 |
@deprecated_method(one_four) |
1594.2.8
by Robert Collins
add ghost aware apis to knits. |
772 |
def has_ghost(self, version_id): |
773 |
"""True if there is a ghost reference in the file to version_id.""" |
|
774 |
# maybe we have it
|
|
775 |
if self.has_version(version_id): |
|
776 |
return False |
|
1759.2.2
by Jelmer Vernooij
Revert some of my spelling fixes and fix some typos after review by Aaron. |
777 |
# optimisable if needed by memoising the _ghosts set.
|
3287.6.5
by Robert Collins
Deprecate VersionedFile.has_ghost. |
778 |
items = self.get_parent_map(self.versions()) |
3287.6.6
by Robert Collins
Unbreak has_ghosts. |
779 |
for parents in items.itervalues(): |
1594.2.8
by Robert Collins
add ghost aware apis to knits. |
780 |
for parent in parents: |
3287.6.5
by Robert Collins
Deprecate VersionedFile.has_ghost. |
781 |
if parent == version_id and parent not in items: |
782 |
return True |
|
1594.2.8
by Robert Collins
add ghost aware apis to knits. |
783 |
return False |
784 |
||
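The ghost check in has_ghost looks for a parent that is referenced but has no entry of its own in the parent map. A small Python 3 sketch of the same idea, collecting every ghost at once, might look like this. The function `find_ghosts` and the sample map are hypothetical and only use a plain dict in place of `get_parent_map`.

```python
def find_ghosts(parent_map):
    """Return parents that are referenced but absent from `parent_map`.

    `parent_map` is a plain dict: version -> tuple of parent versions.
    """
    ghosts = set()
    for parents in parent_map.values():
        for parent in parents:
            if parent not in parent_map:
                ghosts.add(parent)
    return ghosts


example_parent_map = {'a': (), 'b': ('a', 'ghost-1'), 'c': ('b',)}
example_ghosts = find_ghosts(example_parent_map)
```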
2535.3.30
by Andrew Bennetts
Delete obsolete comments and other cosmetic changes. |
785 |
def insert_data_stream(self, (format, data_list, reader_callable)): |
2535.3.4
by Andrew Bennetts
Simple implementation of Knit.insert_data_stream. |
786 |
"""Insert knit records from a data stream into this knit. |
787 |
||
2535.3.5
by Andrew Bennetts
Batch writes as much as possible in insert_data_stream. |
788 |
If a version in the stream is already present in this knit, it will not
|
789 |
be inserted a second time. It will be checked for consistency with the
|
|
790 |
stored version however, and may cause a KnitCorrupt error to be raised
|
|
791 |
if the data in the stream disagrees with the already stored data.
|
|
2535.3.4
by Andrew Bennetts
Simple implementation of Knit.insert_data_stream. |
792 |
|
793 |
:seealso: get_data_stream
|
|
794 |
"""
|
|
795 |
if format != self.get_format_signature(): |
|
3172.2.1
by Andrew Bennetts
Enable use of smart revision streaming between repos with compatible models, not just between identical format repos. |
796 |
if 'knit' in debug.debug_flags: |
797 |
trace.mutter( |
|
798 |
'incompatible format signature inserting to %r', self) |
|
3052.2.2
by Robert Collins
* Operations pulling data from a smart server where the underlying |
799 |
source = self._knit_from_datastream( |
800 |
(format, data_list, reader_callable)) |
|
801 |
self.join(source) |
|
802 |
return
|
|
2535.3.17
by Andrew Bennetts
[broken] Closer to a working Repository.fetch_revisions smart request. |
803 |
|
804 |
for version_id, options, length, parents in data_list: |
|
805 |
if self.has_version(version_id): |
|
806 |
# First check: the list of parents.
|
|
807 |
my_parents = self.get_parents_with_ghosts(version_id) |
|
3184.5.1
by Lukáš Lalinský
Fix handling of some error cases in insert_data_stream |
808 |
if tuple(my_parents) != tuple(parents): |
2535.3.17
by Andrew Bennetts
[broken] Closer to a working Repository.fetch_revisions smart request. |
809 |
# XXX: KnitCorrupt is not quite the right exception here.
|
810 |
raise KnitCorrupt( |
|
811 |
self.filename, |
|
812 |
'parents list %r from data stream does not match ' |
|
813 |
'already recorded parents %r for %s' |
|
814 |
% (parents, my_parents, version_id)) |
|
815 |
||
816 |
# Also check the SHA-1 of the fulltext this content will
|
|
817 |
# produce.
|
|
818 |
raw_data = reader_callable(length) |
|
3316.2.9
by Robert Collins
* ``VersionedFile.get_sha1`` is deprecated, please use |
819 |
my_fulltext_sha1 = self.get_sha1s([version_id])[0] |
2535.3.17
by Andrew Bennetts
[broken] Closer to a working Repository.fetch_revisions smart request. |
820 |
df, rec = self._data._parse_record_header(version_id, raw_data) |
821 |
stream_fulltext_sha1 = rec[3] |
|
822 |
if my_fulltext_sha1 != stream_fulltext_sha1: |
|
823 |
# Actually, we don't know if it's this knit that's corrupt,
|
|
824 |
# or the data stream we're trying to insert.
|
|
825 |
raise KnitCorrupt( |
|
826 |
self.filename, 'sha-1 does not match %s' % version_id) |
|
827 |
else: |
|
2535.3.57
by Andrew Bennetts
Perform some sanity checking of data streams rather than blindly inserting them into our repository. |
828 |
if 'line-delta' in options: |
2535.3.61
by Andrew Bennetts
Clarify sanity checking in insert_data_stream. |
829 |
# Make sure that this knit record is actually useful: a
|
830 |
# line-delta is no use unless we have its parent.
|
|
831 |
# Fetching from a broken repository with this problem
|
|
832 |
# shouldn't break the target repository.
|
|
3040.2.1
by Martin Pool
Give a better message when failing to pull because the source needs to be reconciled |
833 |
#
|
834 |
# See https://bugs.launchpad.net/bzr/+bug/164443
|
|
2535.3.61
by Andrew Bennetts
Clarify sanity checking in insert_data_stream. |
835 |
if not self._index.has_version(parents[0]): |
836 |
raise KnitCorrupt( |
|
837 |
self.filename, |
|
3040.2.1
by Martin Pool
Give a better message when failing to pull because the source needs to be reconciled |
838 |
'line-delta from stream '
|
839 |
'for version %s ' |
|
840 |
'references '
|
|
841 |
'missing parent %s\n' |
|
3040.2.2
by Martin Pool
Clearer reconcile recommendation message (thanks Matt Nordhoff) |
842 |
'Try running "bzr check" '
|
843 |
'on the source repository, and "bzr reconcile" '
|
|
3040.2.1
by Martin Pool
Give a better message when failing to pull because the source needs to be reconciled |
844 |
'if necessary.' % |
845 |
(version_id, parents[0])) |
|
2535.3.17
by Andrew Bennetts
[broken] Closer to a working Repository.fetch_revisions smart request. |
846 |
self._add_raw_records( |
847 |
[(version_id, options, parents, length)], |
|
848 |
reader_callable(length)) |
|
849 |
||
3052.2.2
by Robert Collins
* Operations pulling data from a smart server where the underlying |
850 |
def _knit_from_datastream(self, (format, data_list, reader_callable)): |
851 |
"""Create a knit object from a data stream. |
|
852 |
||
853 |
This method exists to allow conversion of data streams that do not
|
|
854 |
match the signature of this knit. Generally it will be slower and use
|
|
855 |
more memory to use this method to insert data, but it will work.
|
|
856 |
||
857 |
:seealso: get_data_stream for details on datastreams.
|
|
858 |
:return: A knit versioned file which can be used to join the datastream
|
|
859 |
into self.
|
|
860 |
"""
|
|
861 |
if format == "knit-plain": |
|
862 |
factory = KnitPlainFactory() |
|
863 |
elif format == "knit-annotated": |
|
864 |
factory = KnitAnnotateFactory() |
|
865 |
else: |
|
866 |
raise errors.KnitDataStreamUnknown(format) |
|
3224.1.8
by John Arbash Meinel
Add noeol to the return signature of get_build_details. |
867 |
index = _StreamIndex(data_list, self._index) |
3052.2.3
by Robert Collins
Handle insert_data_stream of an unannotated stream into an annotated knit. |
868 |
access = _StreamAccess(reader_callable, index, self, factory) |
3052.2.2
by Robert Collins
* Operations pulling data from a smart server where the underlying |
869 |
return KnitVersionedFile(self.filename, self.transport, |
870 |
factory=factory, index=index, access_method=access) |
|
871 |
||
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
872 |
def versions(self): |
873 |
"""See VersionedFile.versions.""" |
|
2745.1.1
by Robert Collins
Add a number of -Devil checkpoints. |
874 |
if 'evil' in debug.debug_flags: |
2745.1.2
by Robert Collins
Ensure mutter_callsite is not directly called on a lazy_load object, to make the stacklevel parameter work correctly. |
875 |
trace.mutter_callsite(2, "versions scales with size of history") |
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
876 |
return self._index.get_versions() |
877 |
||
878 |
def has_version(self, version_id): |
|
879 |
"""See VersionedFile.has_version.""" |
|
2745.1.1
by Robert Collins
Add a number of -Devil checkpoints. |
880 |
if 'evil' in debug.debug_flags: |
2745.1.2
by Robert Collins
Ensure mutter_callsite is not directly called on a lazy_load object, to make the stacklevel parameter work correctly. |
881 |
trace.mutter_callsite(2, "has_version is a LBYL scenario") |
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
882 |
return self._index.has_version(version_id) |
883 |
||
884 |
__contains__ = has_version |
|
885 |
||
1596.2.34
by Robert Collins
Optimise knit add to only diff once per parent, not once per parent + once for the delta generation. |
886 |
def _merge_annotations(self, content, parents, parent_texts={}, |
2520.4.140
by Aaron Bentley
Use matching blocks from mpdiff for knit delta creation |
887 |
delta=None, annotated=None, |
888 |
left_matching_blocks=None): |
|
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
889 |
"""Merge annotations for content. This is done by comparing |
1596.2.27
by Robert Collins
Note potential improvements in knit adds. |
890 |
the annotations based on changes to the text.
|
891 |
"""
|
|
2520.4.146
by Aaron Bentley
Avoid get_matching_blocks for un-annotated text |
892 |
if left_matching_blocks is not None: |
893 |
delta_seq = diff._PrematchedMatcher(left_matching_blocks) |
|
894 |
else: |
|
895 |
delta_seq = None |
|
1596.2.34
by Robert Collins
Optimise knit add to only diff once per parent, not once per parent + once for the delta generation. |
896 |
if annotated: |
1596.2.36
by Robert Collins
add a get_delta api to versioned_file. |
897 |
for parent_id in parents: |
1596.2.34
by Robert Collins
Optimise knit add to only diff once per parent, not once per parent + once for the delta generation. |
898 |
merge_content = self._get_content(parent_id, parent_texts) |
2520.4.146
by Aaron Bentley
Avoid get_matching_blocks for un-annotated text |
899 |
if (parent_id == parents[0] and delta_seq is not None): |
900 |
seq = delta_seq |
|
2520.4.140
by Aaron Bentley
Use matching blocks from mpdiff for knit delta creation |
901 |
else: |
902 |
seq = patiencediff.PatienceSequenceMatcher( |
|
903 |
None, merge_content.text(), content.text()) |
|
1596.2.34
by Robert Collins
Optimise knit add to only diff once per parent, not once per parent + once for the delta generation. |
904 |
for i, j, n in seq.get_matching_blocks(): |
905 |
if n == 0: |
|
906 |
continue
|
|
2520.4.146
by Aaron Bentley
Avoid get_matching_blocks for un-annotated text |
907 |
# this appears to copy (origin, text) pairs across to the
|
908 |
# new content for any line that matches the last-checked
|
|
909 |
# parent.
|
|
1596.2.34
by Robert Collins
Optimise knit add to only diff once per parent, not once per parent + once for the delta generation. |
910 |
content._lines[j:j+n] = merge_content._lines[i:i+n] |
1596.2.36
by Robert Collins
add a get_delta api to versioned_file. |
911 |
if delta: |
2520.4.146
by Aaron Bentley
Avoid get_matching_blocks for un-annotated text |
912 |
if delta_seq is None: |
1596.2.36
by Robert Collins
add a get_delta api to versioned_file. |
913 |
reference_content = self._get_content(parents[0], parent_texts) |
914 |
new_texts = content.text() |
|
915 |
old_texts = reference_content.text() |
|
2104.4.2
by John Arbash Meinel
Small cleanup and NEWS entry about fixing bug #65714 |
916 |
delta_seq = patiencediff.PatienceSequenceMatcher( |
2100.2.1
by wang
Replace python's difflib by patiencediff because the worst case |
917 |
None, old_texts, new_texts) |
1596.2.36
by Robert Collins
add a get_delta api to versioned_file. |
918 |
return self._make_line_delta(delta_seq, content) |
919 |
||
920 |
def _make_line_delta(self, delta_seq, new_content): |
|
921 |
"""Generate a line delta from delta_seq and new_content.""" |
|
922 |
diff_hunks = [] |
|
923 |
for op in delta_seq.get_opcodes(): |
|
924 |
if op[0] == 'equal': |
|
925 |
continue
|
|
926 |
diff_hunks.append((op[1], op[2], op[4]-op[3], new_content._lines[op[3]:op[4]])) |
|
1596.2.34
by Robert Collins
Optimise knit add to only diff once per parent, not once per parent + once for the delta generation. |
927 |
return diff_hunks |
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
928 |
|
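A minimal sketch of the hunk format that `_make_line_delta` produces, written as standalone Python 3 rather than the Python 2 of this file. It assumes the standard library's `difflib.SequenceMatcher` in place of `patiencediff.PatienceSequenceMatcher` and plain lists of lines in place of knit content objects. The names `make_line_delta` and `apply_line_delta` are hypothetical and are not part of bzrlib.

```python
from difflib import SequenceMatcher


def make_line_delta(old_lines, new_lines):
    """Return (start, end, count, lines) hunks turning old_lines into new_lines."""
    matcher = SequenceMatcher(None, old_lines, new_lines)
    hunks = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == 'equal':
            # Unchanged regions are not stored in the delta.
            continue
        # Replace old[i1:i2] with the j2 - j1 lines taken from new[j1:j2].
        hunks.append((i1, i2, j2 - j1, new_lines[j1:j2]))
    return hunks


def apply_line_delta(old_lines, hunks):
    """Rebuild the new text by applying hunks to a copy of old_lines."""
    result = list(old_lines)
    offset = 0  # how far earlier hunks have shifted later positions
    for start, end, count, lines in hunks:
        result[start + offset:end + offset] = lines
        offset += count - (end - start)
    return result
```

Applying the hunks to the old text should give back the new text, which is the property a knit relies on when it stores a line-delta instead of a fulltext.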
1756.3.17
by Aaron Bentley
Combine get_components_positions with get_components_versions |
929 |
def _get_components_positions(self, version_ids): |
1756.3.19
by Aaron Bentley
Documentation and cleanups |
930 |
"""Produce a map of position data for the components of versions. |
931 |
||
1756.3.22
by Aaron Bentley
Tweaks from review |
932 |
This data is intended to be used for retrieving the knit records.
|
1756.3.19
by Aaron Bentley
Documentation and cleanups |
933 |
|
3224.1.15
by John Arbash Meinel
Finish removing method and noeol from general knowledge, |
934 |
A dict of version_id to (record_details, index_memo, next) is
|
1756.3.19
by Aaron Bentley
Documentation and cleanups |
935 |
returned.
|
936 |
record_details is opaque information describing how the record should be parsed.
|
|
3224.1.8
by John Arbash Meinel
Add noeol to the return signature of get_build_details. |
937 |
index_memo is the handle to pass to the data access to actually get the
|
938 |
data
|
|
1756.3.19
by Aaron Bentley
Documentation and cleanups |
939 |
next is the build-parent of the version, or None for fulltexts.
|
3224.1.8
by John Arbash Meinel
Add noeol to the return signature of get_build_details. |
940 |
parents are read from the index but are not part of the returned tuple.
|
1756.3.19
by Aaron Bentley
Documentation and cleanups |
941 |
"""
|
1756.3.9
by Aaron Bentley
More optimization refactoring |
942 |
component_data = {} |
3218.1.1
by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries. |
943 |
pending_components = version_ids |
944 |
while pending_components: |
|
945 |
build_details = self._index.get_build_details(pending_components) |
|
3224.1.29
by John Arbash Meinel
Properly handle annotating when ghosts are present. |
946 |
current_components = set(pending_components) |
3218.1.1
by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries. |
947 |
pending_components = set() |
3224.1.29
by John Arbash Meinel
Properly handle annotating when ghosts are present. |
948 |
for version_id, details in build_details.iteritems(): |
949 |
(index_memo, compression_parent, parents, |
|
950 |
record_details) = details |
|
3224.1.15
by John Arbash Meinel
Finish removing method and noeol from general knowledge, |
951 |
method = record_details[0] |
3218.1.1
by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries. |
952 |
if compression_parent is not None: |
953 |
pending_components.add(compression_parent) |
|
3224.1.15
by John Arbash Meinel
Finish removing method and noeol from general knowledge, |
954 |
component_data[version_id] = (record_details, index_memo, |
3224.1.13
by John Arbash Meinel
Revert the _get_component_positions api |
955 |
compression_parent) |
3224.1.29
by John Arbash Meinel
Properly handle annotating when ghosts are present. |
956 |
missing = current_components.difference(build_details) |
957 |
if missing: |
|
958 |
raise errors.RevisionNotPresent(missing.pop(), self.filename) |
|
1756.3.10
by Aaron Bentley
Optimize selection and retrieval of records |
959 |
return component_data |
1756.3.18
by Aaron Bentley
More cleanup |
960 |
|
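The batching loop in `_get_components_positions` can be sketched on its own, in Python 3 and under simplifying assumptions. Here the index is modelled as a callable that maps a set of keys to `{key: (index_memo, compression_parent)}`, which is a reduced stand-in for `get_build_details`. The name `components_positions` and the use of `KeyError` instead of `RevisionNotPresent` are hypothetical. Unlike the method above, this sketch also skips parents that were already resolved.

```python
def components_positions(build_details_for, version_ids):
    """Collect position data for version_ids and all their compression parents.

    build_details_for(keys) must return {key: (index_memo, compression_parent)}
    for every key it knows about, omitting unknown keys.
    """
    component_data = {}
    pending = set(version_ids)
    while pending:
        # One batched index query per "generation" of the delta chains.
        details = build_details_for(pending)
        missing = pending.difference(details)
        if missing:
            raise KeyError(sorted(missing)[0])
        next_pending = set()
        for key, (index_memo, parent) in details.items():
            component_data[key] = (index_memo, parent)
            if parent is not None and parent not in component_data:
                next_pending.add(parent)
        pending = next_pending
    return component_data
```

Each pass asks the index for a whole set of keys at once, so the number of queries grows with the length of the longest delta chain rather than with the number of components.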
1596.2.32
by Robert Collins
Reduce re-extraction of texts during weave to knit joins by providing a memoisation facility. |
961 |
def _get_content(self, version_id, parent_texts={}): |
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
962 |
"""Returns a content object that makes up the specified |
963 |
version."""
|
|
1596.2.32
by Robert Collins
Reduce re-extraction of texts during weave to knit joins by providing a memoisation facility. |
964 |
cached_version = parent_texts.get(version_id, None) |
965 |
if cached_version is not None: |
|
2794.1.2
by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts. |
966 |
if not self.has_version(version_id): |
967 |
raise RevisionNotPresent(version_id, self.filename) |
|
1596.2.32
by Robert Collins
Reduce re-extraction of texts during weave to knit joins by providing a memoisation facility. |
968 |
return cached_version |
969 |
||
1756.3.22
by Aaron Bentley
Tweaks from review |
970 |
text_map, contents_map = self._get_content_maps([version_id]) |
971 |
return contents_map[version_id] |
|
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
972 |
|
973 |
def _check_versions_present(self, version_ids): |
|
974 |
"""Check that all specified versions are present.""" |
|
2158.3.1
by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations |
975 |
self._index.check_versions_present(version_ids) |
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
976 |
|
2794.1.1
by Robert Collins
Allow knits to be instructed not to add a text based on a sha, for commit. |
977 |
def _add_lines_with_ghosts(self, version_id, parents, lines, parent_texts, |
3287.5.2
by Robert Collins
Deprecate VersionedFile.get_parents, breaking pulling from a ghost containing knit or pack repository to weaves, which improves correctness and allows simplification of core code. |
978 |
nostore_sha, random_id, check_content, left_matching_blocks): |
1594.2.8
by Robert Collins
add ghost aware apis to knits. |
979 |
"""See VersionedFile.add_lines_with_ghosts().""" |
2805.6.7
by Robert Collins
Review feedback. |
980 |
self._check_add(version_id, lines, random_id, check_content) |
2805.6.2
by Robert Collins
General cleanup of KnitVersionedFile._add. |
981 |
return self._add(version_id, lines, parents, self.delta, |
3287.5.2
by Robert Collins
Deprecate VersionedFile.get_parents, breaking pulling from a ghost containing knit or pack repository to weaves, which improves correctness and allows simplification of core code. |
982 |
parent_texts, left_matching_blocks, nostore_sha, random_id) |
1594.2.8
by Robert Collins
add ghost aware apis to knits. |
983 |
|
2520.4.140
by Aaron Bentley
Use matching blocks from mpdiff for knit delta creation |
984 |
def _add_lines(self, version_id, parents, lines, parent_texts, |
2805.6.7
by Robert Collins
Review feedback. |
985 |
left_matching_blocks, nostore_sha, random_id, check_content): |
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
986 |
"""See VersionedFile.add_lines.""" |
2805.6.7
by Robert Collins
Review feedback. |
987 |
self._check_add(version_id, lines, random_id, check_content) |
1594.2.8
by Robert Collins
add ghost aware apis to knits. |
988 |
self._check_versions_present(parents) |
2520.4.140
by Aaron Bentley
Use matching blocks from mpdiff for knit delta creation |
989 |
return self._add(version_id, lines[:], parents, self.delta, |
2841.2.1
by Robert Collins
* Commit no longer checks for new text keys during insertion when the |
990 |
parent_texts, left_matching_blocks, nostore_sha, random_id) |
1594.2.8
by Robert Collins
add ghost aware apis to knits. |
991 |
|
2805.6.7
by Robert Collins
Review feedback. |
992 |
def _check_add(self, version_id, lines, random_id, check_content): |
1594.2.8
by Robert Collins
add ghost aware apis to knits. |
993 |
"""check that version_id and lines are safe to add.""" |
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
994 |
if contains_whitespace(version_id): |
1668.5.1
by Olaf Conradi
Fix bug in knits when raising InvalidRevisionId without the required |
995 |
raise InvalidRevisionId(version_id, self.filename) |
2229.2.3
by Aaron Bentley
change reserved_id to is_reserved_id, add check_not_reserved for DRY |
996 |
self.check_not_reserved_id(version_id) |
2805.6.4
by Robert Collins
Don't check for existing versions when adding texts with random revision ids. |
997 |
# Technically this could be avoided if we are happy to allow duplicate
|
998 |
# id insertion when other things than bzr core insert texts, but it
|
|
999 |
# seems useful for folk using the knit api directly to have some safety
|
|
1000 |
# blanket that we can disable.
|
|
1001 |
if not random_id and self.has_version(version_id): |
|
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
1002 |
raise RevisionAlreadyPresent(version_id, self.filename) |
2805.6.7
by Robert Collins
Review feedback. |
1003 |
if check_content: |
1004 |
self._check_lines_not_unicode(lines) |
|
1005 |
self._check_lines_are_lines(lines) |
|
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
1006 |
|
2520.4.140
by Aaron Bentley
Use matching blocks from mpdiff for knit delta creation |
1007 |
def _add(self, version_id, lines, parents, delta, parent_texts, |
2841.2.1
by Robert Collins
* Commit no longer checks for new text keys during insertion when the |
1008 |
left_matching_blocks, nostore_sha, random_id): |
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
1009 |
"""Add a set of lines on top of version specified by parents. |
1010 |
||
1011 |
If delta is true, compress the text as a line-delta against
|
|
1012 |
the first parent.
|
|
1594.2.8
by Robert Collins
add ghost aware apis to knits. |
1013 |
|
1014 |
Any versions not present will be converted into ghosts.
|
|
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
1015 |
"""
|
2850.1.1
by Robert Collins
* ``KnitVersionedFile.add*`` will no longer cache added records even when |
1016 |
# first thing, if the content is something we don't need to store, find
|
1017 |
# that out.
|
|
1018 |
line_bytes = ''.join(lines) |
|
1019 |
digest = sha_string(line_bytes) |
|
1020 |
if nostore_sha == digest: |
|
1021 |
raise errors.ExistingContent |
|
1596.2.28
by Robert Collins
more knit profile based tuning. |
1022 |
|
1596.2.10
by Robert Collins
Reviewer feedback on knit branches. |
1023 |
present_parents = [] |
1596.2.32
by Robert Collins
Reduce re-extraction of texts during weave to knit joins by providing a memoisation facility. |
1024 |
if parent_texts is None: |
1025 |
parent_texts = {} |
|
1594.2.8
by Robert Collins
add ghost aware apis to knits. |
1026 |
for parent in parents: |
2805.6.2
by Robert Collins
General cleanup of KnitVersionedFile._add. |
1027 |
if self.has_version(parent): |
1596.2.10
by Robert Collins
Reviewer feedback on knit branches. |
1028 |
present_parents.append(parent) |
1594.2.8
by Robert Collins
add ghost aware apis to knits. |
1029 |
|
2805.6.2
by Robert Collins
General cleanup of KnitVersionedFile._add. |
1030 |
# can only compress against the left most present parent.
|
1031 |
if (delta and |
|
1032 |
(len(present_parents) == 0 or |
|
1033 |
present_parents[0] != parents[0])): |
|
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
1034 |
delta = False |
1035 |
||
2850.1.1
by Robert Collins
* ``KnitVersionedFile.add*`` will no longer cache added records even when |
1036 |
text_length = len(line_bytes) |
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
1037 |
options = [] |
1038 |
if lines: |
|
1039 |
if lines[-1][-1] != '\n': |
|
2805.6.2
by Robert Collins
General cleanup of KnitVersionedFile._add. |
1040 |
# copy the contents of lines.
|
1041 |
lines = lines[:] |
|
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
1042 |
options.append('no-eol') |
1043 |
lines[-1] = lines[-1] + '\n' |
|
2888.1.1
by Robert Collins
(robertc) Use prejoined content for knit storage when performing a full-text store of unannotated content. (Robert Collins) |
1044 |
line_bytes += '\n' |
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
1045 |
|
2805.6.2
by Robert Collins
General cleanup of KnitVersionedFile._add. |
1046 |
if delta: |
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
1047 |
# To speed the extract of texts the delta chain is limited
|
1048 |
# to a fixed number of deltas. This should minimize both
|
|
1049 |
# I/O and the time spent applying deltas.
|
|
2147.1.1
by John Arbash Meinel
Factor the common knit delta selection into a helper func, and allow the fulltext to be chosen based on cumulative delta size |
1050 |
delta = self._check_should_delta(present_parents) |
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
1051 |
|
2249.5.15
by John Arbash Meinel
remove get_cached_utf8 checks which were slowing things down. |
1052 |
assert isinstance(version_id, str) |
2794.1.2
by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts. |
1053 |
content = self.factory.make(lines, version_id) |
1596.2.34
by Robert Collins
Optimise knit add to only diff once per parent, not once per parent + once for the delta generation. |
1054 |
if delta or (self.factory.annotated and len(present_parents) > 0): |
2805.6.2
by Robert Collins
General cleanup of KnitVersionedFile._add. |
1055 |
# Merge annotations from parent texts if needed.
|
2794.1.2
by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts. |
1056 |
delta_hunks = self._merge_annotations(content, present_parents, |
2520.4.140
by Aaron Bentley
Use matching blocks from mpdiff for knit delta creation |
1057 |
parent_texts, delta, self.factory.annotated, |
1058 |
left_matching_blocks) |
|
1596.2.32
by Robert Collins
Reduce re-extraction of texts during weave to knit joins by providing a memoisation facility. |
1059 |
|
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
1060 |
if delta: |
1061 |
options.append('line-delta') |
|
1062 |
store_lines = self.factory.lower_line_delta(delta_hunks) |
|
2850.1.1
by Robert Collins
* ``KnitVersionedFile.add*`` will no longer cache added records even when |
1063 |
size, bytes = self._data._record_to_data(version_id, digest, |
1064 |
store_lines) |
|
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
1065 |
else: |
1066 |
options.append('fulltext') |
|
2888.1.3
by Robert Collins
Review feedback. |
1067 |
# isinstance is slower and we have no hierarchy.
|
2888.1.1
by Robert Collins
(robertc) Use prejoined content for knit storage when performing a full-text store of unannotated content. (Robert Collins) |
1068 |
if self.factory.__class__ == KnitPlainFactory: |
2888.1.3
by Robert Collins
Review feedback. |
1069 |
# Use the already joined bytes saving iteration time in
|
1070 |
# _record_to_data.
|
|
2888.1.1
by Robert Collins
(robertc) Use prejoined content for knit storage when performing a full-text store of unannotated content. (Robert Collins) |
1071 |
size, bytes = self._data._record_to_data(version_id, digest, |
1072 |
lines, [line_bytes]) |
|
1073 |
else: |
|
1074 |
# get mixed annotation + content and feed it into the
|
|
1075 |
# serialiser.
|
|
1076 |
store_lines = self.factory.lower_fulltext(content) |
|
1077 |
size, bytes = self._data._record_to_data(version_id, digest, |
|
1078 |
store_lines) |
|
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
1079 |
|
2850.1.1
by Robert Collins
* ``KnitVersionedFile.add*`` will no longer cache added records even when |
1080 |
access_memo = self._data.add_raw_records([size], bytes)[0] |
2841.2.1
by Robert Collins
* Commit no longer checks for new text keys during insertion when the |
1081 |
self._index.add_versions( |
2850.1.1
by Robert Collins
* ``KnitVersionedFile.add*`` will no longer cache added records even when |
1082 |
((version_id, options, access_memo, parents),), |
1083 |
random_id=random_id) |
|
2794.1.2
by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts. |
1084 |
return digest, text_length, content |
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
1085 |
|
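The `no-eol` handling inside `_add` can be illustrated in isolation. This is a hedged Python 3 sketch with a hypothetical helper name, `normalise_final_eol`. It shows only the idea that a text without a trailing newline is stored with one added and flagged through an option, and it does not reproduce the rest of `_add`.

```python
def normalise_final_eol(lines):
    """Return (lines, options) with a final newline guaranteed on the last line.

    The input list is never modified; a copy is made only when a change is
    needed, mirroring the copy-before-mutate step in the method above.
    """
    options = []
    if lines and not lines[-1].endswith('\n'):
        lines = lines[:]          # copy so the caller's list is untouched
        options.append('no-eol')  # remember that the newline is synthetic
        lines[-1] = lines[-1] + '\n'
    return lines, options
```

On extraction, the `no-eol` flag is what tells the reader to strip the synthetic newline again.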
1563.2.19
by Robert Collins
stub out a check for knits. |
1086 |
def check(self, progress_bar=None): |
1087 |
"""See VersionedFile.check().""" |
|
1088 |
||
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
1089 |
def get_lines(self, version_id): |
1090 |
"""See VersionedFile.get_lines().""" |
|
1756.2.8
by Aaron Bentley
Implement get_line_list, cleanups |
1091 |
return self.get_line_list([version_id])[0] |
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
1092 |
|
1756.3.12
by Aaron Bentley
Stuff all text-building data in record_map |
1093 |
def _get_record_map(self, version_ids): |
1756.3.19
by Aaron Bentley
Documentation and cleanups |
1094 |
"""Produce a dictionary of knit records. |
1095 |
|
|
3224.1.17
by John Arbash Meinel
Clean up some variable ordering to make more sense. |
1096 |
:return: {version_id:(record, record_details, digest, next)}
|
1097 |
record
|
|
1098 |
data returned from read_records
|
|
1099 |
record_details
|
|
1100 |
opaque information to pass to parse_record
|
|
1101 |
digest
|
|
1102 |
SHA1 digest of the full text after all steps are done
|
|
1103 |
next
|
|
1104 |
build-parent of the version, i.e. the leftmost ancestor.
|
|
1105 |
Will be None if the record is not a delta.
|
|
1756.3.19
by Aaron Bentley
Documentation and cleanups |
1106 |
"""
|
1756.3.12
by Aaron Bentley
Stuff all text-building data in record_map |
1107 |
position_map = self._get_components_positions(version_ids) |
3224.1.15
by John Arbash Meinel
Finish removing method and noeol from general knowledge, |
1108 |
# c = component_id, r = record_details, i_m = index_memo, n = next
|
1109 |
records = [(c, i_m) for c, (r, i_m, n) |
|
3224.1.8
by John Arbash Meinel
Add noeol to the return signature of get_build_details. |
1110 |
in position_map.iteritems()] |
1756.3.12
by Aaron Bentley
Stuff all text-building data in record_map |
1111 |
record_map = {} |
3224.1.17
by John Arbash Meinel
Clean up some variable ordering to make more sense. |
1112 |
for component_id, record, digest in \ |
1863.1.9
by John Arbash Meinel
Switching to have 'read_records_iter' return in random order. |
1113 |
self._data.read_records_iter(records): |
3224.1.15
by John Arbash Meinel
Finish removing method and noeol from general knowledge, |
1114 |
(record_details, index_memo, next) = position_map[component_id] |
3224.1.17
by John Arbash Meinel
Clean up some variable ordering to make more sense. |
1115 |
record_map[component_id] = record, record_details, digest, next |
3224.1.13
by John Arbash Meinel
Revert the _get_component_positions api |
1116 |
|
1756.3.10
by Aaron Bentley
Optimize selection and retrieval of records |
1117 |
return record_map |
1756.2.5
by Aaron Bentley
Reduced read_records calls to 1 |
1118 |
|
1756.2.7
by Aaron Bentley
Implement get_text in terms of get_texts |
1119 |
def get_text(self, version_id): |
1120 |
"""See VersionedFile.get_text""" |
|
1121 |
return self.get_texts([version_id])[0] |
|
1122 |
||
1756.2.1
by Aaron Bentley
Implement get_texts |
1123 |
def get_texts(self, version_ids): |
1756.2.8
by Aaron Bentley
Implement get_line_list, cleanups |
1124 |
return [''.join(l) for l in self.get_line_list(version_ids)] |
1125 |
||
1126 |
def get_line_list(self, version_ids): |
|
1756.2.1
by Aaron Bentley
Implement get_texts |
1127 |
"""Return the texts of listed versions as a list of strings.""" |
2229.2.1
by Aaron Bentley
Reject reserved ids in versiondfile, tree, branch and repository |
1128 |
for version_id in version_ids: |
2229.2.3
by Aaron Bentley
change reserved_id to is_reserved_id, add check_not_reserved for DRY |
1129 |
self.check_not_reserved_id(version_id) |
1756.3.13
by Aaron Bentley
Refactor get_line_list into _get_content |
1130 |
text_map, content_map = self._get_content_maps(version_ids) |
1131 |
return [text_map[v] for v in version_ids] |
|
1132 |
||
2520.4.90
by Aaron Bentley
Handle \r terminated lines in Weaves properly |
1133 |
_get_lf_split_line_list = get_line_list |
2520.4.3
by Aaron Bentley
Implement plain strategy for extracting and installing multiparent diffs |
1134 |
|
1756.3.13
by Aaron Bentley
Refactor get_line_list into _get_content |
1135 |
def _get_content_maps(self, version_ids): |
1756.3.19
by Aaron Bentley
Documentation and cleanups |
1136 |
"""Produce maps of text and KnitContents |
1137 |
|
|
1138 |
:return: (text_map, content_map) where text_map contains the texts for
|
|
1139 |
the requested versions and content_map contains the KnitContents.
|
|
1756.3.22
by Aaron Bentley
Tweaks from review |
1140 |
Both dicts take version_ids as their keys.
|
1756.3.19
by Aaron Bentley
Documentation and cleanups |
1141 |
"""
|
2921.2.1
by Robert Collins
* Knit text reconstruction now avoids making copies of the lines list for |
1142 |
# FUTURE: This function could be improved for the 'extract many' case
|
1143 |
# by tracking each component and only doing the copy when the number of
|
|
1144 |
# children that need to apply deltas to it is > 1 or it is part of the
|
|
1145 |
# final output.
|
|
1146 |
version_ids = list(version_ids) |
|
1147 |
multiple_versions = len(version_ids) != 1 |
|
1756.3.12
by Aaron Bentley
Stuff all text-building data in record_map |
1148 |
record_map = self._get_record_map(version_ids) |
1756.2.5
by Aaron Bentley
Reduced read_records calls to 1 |
1149 |
|
1756.2.8
by Aaron Bentley
Implement get_line_list, cleanups |
1150 |
text_map = {} |
1756.3.7
by Aaron Bentley
Avoid re-parsing texts version components |
1151 |
content_map = {} |
1756.3.14
by Aaron Bentley
Handle the intermediate and final representations of no-final-eol texts |
1152 |
final_content = {} |
1756.3.10
by Aaron Bentley
Optimize selection and retrieval of records |
1153 |
for version_id in version_ids: |
1154 |
components = [] |
|
1155 |
cursor = version_id |
|
1156 |
while cursor is not None: |
|
3224.1.17
by John Arbash Meinel
Clean up some variable ordering to make more sense. |
1157 |
record, record_details, digest, next = record_map[cursor] |
1158 |
components.append((cursor, record, record_details, digest)) |
|
1756.3.10
by Aaron Bentley
Optimize selection and retrieval of records |
1159 |
if cursor in content_map: |
1160 |
break
|
|
1161 |
cursor = next |
|
1162 |
||
1756.2.1
by Aaron Bentley
Implement get_texts |
1163 |
content = None |
3224.1.17
by John Arbash Meinel
Clean up some variable ordering to make more sense. |
1164 |
for (component_id, record, record_details, |
3224.1.15
by John Arbash Meinel
Finish removing method and noeol from general knowledge, |
1165 |
digest) in reversed(components): |
1756.3.7
by Aaron Bentley
Avoid re-parsing texts version components |
1166 |
if component_id in content_map: |
1167 |
content = content_map[component_id] |
|
1756.3.8
by Aaron Bentley
Avoid unused calls, use generators, sets instead of lists |
1168 |
else: |
3224.1.15
by John Arbash Meinel
Finish removing method and noeol from general knowledge, |
1169 |
content, delta = self.factory.parse_record(version_id, |
3224.1.17
by John Arbash Meinel
Clean up some variable ordering to make more sense. |
1170 |
record, record_details, content, |
3224.1.15
by John Arbash Meinel
Finish removing method and noeol from general knowledge, |
1171 |
copy_base_content=multiple_versions) |
2921.2.1
by Robert Collins
* Knit text reconstruction now avoids making copies of the lines list for |
1172 |
if multiple_versions: |
1173 |
content_map[component_id] = content |
|
1756.2.1
by Aaron Bentley
Implement get_texts |
1174 |
|
3224.1.15
by John Arbash Meinel
Finish removing method and noeol from general knowledge, |
1175 |
content.cleanup_eol(copy_on_mutate=multiple_versions) |
1756.3.14
by Aaron Bentley
Handle the intermediate and final representations of no-final-eol texts |
1176 |
final_content[version_id] = content |
1756.2.1
by Aaron Bentley
Implement get_texts |
1177 |
|
1178 |
# digest here is the digest from the last applied component.
|
|
1756.3.6
by Aaron Bentley
More multi-text extraction |
1179 |
text = content.text() |
2911.1.1
by Martin Pool
Better messages when problems are detected inside a knit |
1180 |
actual_sha = sha_strings(text) |
1181 |
if actual_sha != digest: |
|
2794.1.2
by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts. |
1182 |
raise KnitCorrupt(self.filename, |
2911.1.1
by Martin Pool
Better messages when problems are detected inside a knit |
1183 |
'\n sha-1 %s' |
1184 |
'\n of reconstructed text does not match' |
|
1185 |
'\n expected %s' |
|
1186 |
'\n for version %s' % |
|
1187 |
(actual_sha, digest, version_id)) |
|
2794.1.2
by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts. |
1188 |
text_map[version_id] = text |
1189 |
return text_map, final_content |
|
1756.2.1
by Aaron Bentley
Implement get_texts |
1190 |
|
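The digest check that closes the text reconstruction above can be sketched in isolation. This is a minimal illustration, not bzrlib's code: `sha1_of_lines`, `check_reconstructed_text` and `ReconstructionMismatch` are hypothetical names standing in for `sha_strings` and `KnitCorrupt`, and the lines are assumed to be byte strings.

```python
import hashlib


class ReconstructionMismatch(Exception):
    """Hypothetical stand-in for bzrlib's KnitCorrupt."""


def sha1_of_lines(lines):
    """Return the hex SHA-1 of a sequence of byte strings."""
    digest = hashlib.sha1()
    for line in lines:
        digest.update(line)
    return digest.hexdigest()


def check_reconstructed_text(version_id, lines, expected_sha):
    """Return lines unchanged if their SHA-1 matches, otherwise raise."""
    actual_sha = sha1_of_lines(lines)
    if actual_sha != expected_sha:
        raise ReconstructionMismatch(
            'sha-1 %s of reconstructed text does not match expected %s '
            'for version %s' % (actual_sha, expected_sha, version_id))
    return lines
```

The expected digest is the one recorded for the last applied component, so a mismatch suggests that some delta in the chain was damaged or applied to the wrong base.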
2039.1.1
by Aaron Bentley
Clean up progress properly when interrupted during fetch (#54000) |
1191 |
def iter_lines_added_or_present_in_versions(self, version_ids=None, |
1192 |
pb=None): |
|
1594.2.6
by Robert Collins
Introduce a api specifically for looking at lines in some versions of the inventory, for fileid_involved. |
1193 |
"""See VersionedFile.iter_lines_added_or_present_in_versions().""" |
1194 |
if version_ids is None: |
|
1195 |
version_ids = self.versions() |
|
2039.1.1
by Aaron Bentley
Clean up progress properly when interrupted during fetch (#54000) |
1196 |
if pb is None: |
1197 |
pb = progress.DummyProgress() |
|
1759.2.2
by Jelmer Vernooij
Revert some of my spelling fixes and fix some typos after review by Aaron. |
1198 |
# we don't care about inclusions, the caller cares.
|
1594.2.6
by Robert Collins
Introduce a api specifically for looking at lines in some versions of the inventory, for fileid_involved. |
1199 |
# but we need to set up a list of records to visit.
|
1200 |
# we need version_id, position, length
|
|
1201 |
version_id_records = [] |
|
2163.1.1
by John Arbash Meinel
Use a set to make iter_lines_added_or_present *much* faster |
1202 |
requested_versions = set(version_ids) |
1594.3.1
by Robert Collins
Merge transaction finalisation and ensure iter_lines_added_or_present in knits does a old-to-new read in the knit. |
1203 |
# filter for available versions
|
2698.2.4
by Robert Collins
Remove full history scan during iter_lines_added_or_present in KnitVersionedFile. |
1204 |
for version_id in requested_versions: |
1594.2.6
by Robert Collins
Introduce a api specifically for looking at lines in some versions of the inventory, for fileid_involved. |
1205 |
if not self.has_version(version_id): |
1206 |
raise RevisionNotPresent(version_id, self.filename) |
|
1594.3.1
by Robert Collins
Merge transaction finalisation and ensure iter_lines_added_or_present in knits does a old-to-new read in the knit. |
1207 |
# get an in-component-order queue:
|
1208 |
for version_id in self.versions(): |
|
1209 |
if version_id in requested_versions: |
|
2592.3.71
by Robert Collins
Basic version of knit-based repository operating, many tests failing. |
1210 |
index_memo = self._index.get_position(version_id) |
1211 |
version_id_records.append((version_id, index_memo)) |
|
1594.3.1
by Robert Collins
Merge transaction finalisation and ensure iter_lines_added_or_present in knits does a old-to-new read in the knit. |
1212 |
|
1594.2.17
by Robert Collins
Better readv coalescing, now with test, and progress during knit index reading. |
1213 |
total = len(version_id_records) |
2147.1.3
by John Arbash Meinel
In knit.py we were re-using a variable in 2 loops, causing bogus progress messages to be generated. |
1214 |
for version_idx, (version_id, data, sha_value) in \ |
1215 |
enumerate(self._data.read_records_iter(version_id_records)): |
|
1216 |
pb.update('Walking content.', version_idx, total) |
|
2039.1.1
by Aaron Bentley
Clean up progress properly when interrupted during fetch (#54000) |
1217 |
method = self._index.get_method(version_id) |
2163.1.7
by John Arbash Meinel
Switch the line iterator as suggested by Aaron Bentley |
1218 |
|
2039.1.1
by Aaron Bentley
Clean up progress properly when interrupted during fetch (#54000) |
1219 |
assert method in ('fulltext', 'line-delta') |
1220 |
if method == 'fulltext': |
|
2163.1.7
by John Arbash Meinel
Switch the line iterator as suggested by Aaron Bentley |
1221 |
line_iterator = self.factory.get_fulltext_content(data) |
2039.1.1
by Aaron Bentley
Clean up progress properly when interrupted during fetch (#54000) |
1222 |
else: |
2163.1.7
by John Arbash Meinel
Switch the line iterator as suggested by Aaron Bentley |
1223 |
line_iterator = self.factory.get_linedelta_content(data) |
2975.3.1
by Robert Collins
Change (without backwards compatibility) the |
1224 |
# XXX: It might be more efficient to yield (version_id,
|
1225 |
# line_iterator) in the future. However for now, this is a simpler
|
|
1226 |
# change to integrate into the rest of the codebase. RBC 20071110
|
|
2163.1.7
by John Arbash Meinel
Switch the line iterator as suggested by Aaron Bentley |
1227 |
for line in line_iterator: |
2975.3.1
by Robert Collins
Change (without backwards compatibility) the |
1228 |
yield line, version_id |
2163.1.7
by John Arbash Meinel
Switch the line iterator as suggested by Aaron Bentley |
1229 |
|
2039.1.1
by Aaron Bentley
Clean up progress properly when interrupted during fetch (#54000) |
1230 |
pb.update('Walking content.', total, total) |
1594.2.6
by Robert Collins
Introduce a api specifically for looking at lines in some versions of the inventory, for fileid_involved. |
1231 |
|
1563.2.18
by Robert Collins
get knit repositories really using knits for text storage. |
1232 |
def num_versions(self): |
1233 |
"""See VersionedFile.num_versions().""" |
|
1234 |
return self._index.num_versions() |
|
1235 |
||
1236 |
__len__ = num_versions |
|
1237 |
||
3316.2.13
by Robert Collins
* ``VersionedFile.annotate_iter`` is deprecated. While in principal this |
1238 |
def annotate(self, version_id): |
1239 |
"""See VersionedFile.annotate.""" |
|
1240 |
return self.factory.annotate(self, version_id) |
|
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
1241 |
|
3287.5.1
by Robert Collins
Add VersionedFile.get_parent_map. |
1242 |
def get_parent_map(self, version_ids): |
1243 |
"""See VersionedFile.get_parent_map.""" |
|
3287.5.5
by Robert Collins
Refactor internals of knit implementations to implement get_parents_with_ghosts in terms of get_parent_map. |
1244 |
return self._index.get_parent_map(version_ids) |
1594.2.8
by Robert Collins
add ghost aware apis to knits. |
1245 |
|
2530.1.1
by Aaron Bentley
Make topological sorting optional for get_ancestry |
1246 |
def get_ancestry(self, versions, topo_sorted=True): |
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
1247 |
"""See VersionedFile.get_ancestry.""" |
1248 |
if isinstance(versions, basestring): |
|
1249 |
versions = [versions] |
|
1250 |
if not versions: |
|
1251 |
return [] |
|
2530.1.1
by Aaron Bentley
Make topological sorting optional for get_ancestry |
1252 |
return self._index.get_ancestry(versions, topo_sorted) |
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
1253 |
|
1594.2.8
by Robert Collins
add ghost aware apis to knits. |
1254 |
def get_ancestry_with_ghosts(self, versions): |
1255 |
"""See VersionedFile.get_ancestry_with_ghosts.""" |
|
1256 |
if isinstance(versions, basestring): |
|
1257 |
versions = [versions] |
|
1258 |
if not versions: |
|
1259 |
return [] |
|
1260 |
return self._index.get_ancestry_with_ghosts(versions) |
|
1261 |
||
1664.2.3
by Aaron Bentley
Add failing test case |
1262 |
def plan_merge(self, ver_a, ver_b): |
1664.2.11
by Aaron Bentley
Clarifications from merge review |
1263 |
"""See VersionedFile.plan_merge.""" |
2490.2.33
by Aaron Bentley
Disable topological sorting of get_ancestry where sensible |
1264 |
ancestors_b = set(self.get_ancestry(ver_b, topo_sorted=False)) |
1265 |
ancestors_a = set(self.get_ancestry(ver_a, topo_sorted=False)) |
|
1664.2.4
by Aaron Bentley
Identify unchanged lines correctly |
1266 |
annotated_a = self.annotate(ver_a) |
1267 |
annotated_b = self.annotate(ver_b) |
|
1551.15.46
by Aaron Bentley
Move plan merge to tree |
1268 |
return merge._plan_annotate_merge(annotated_a, annotated_b, |
1269 |
ancestors_a, ancestors_b) |
|
1664.2.4
by Aaron Bentley
Identify unchanged lines correctly |
1270 |
|
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
1271 |
|
1272 |
class _KnitComponentFile(object): |
|
1273 |
"""One of the files used to implement a knit database""" |
|
1274 |
||
1946.2.1
by John Arbash Meinel
2 changes to knits. Delay creating the .knit or .kndx file until we have actually tried to write data. Because of this, we must allow the Knit to create the prefix directories |
1275 |
def __init__(self, transport, filename, mode, file_mode=None, |
1946.2.12
by John Arbash Meinel
Add ability to pass a directory mode to non_atomic_put |
1276 |
create_parent_dir=False, dir_mode=None): |
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
1277 |
self._transport = transport |
1278 |
self._filename = filename |
|
1279 |
self._mode = mode |
|
1946.2.3
by John Arbash Meinel
Pass around the file mode correctly |
1280 |
self._file_mode = file_mode |
1946.2.12
by John Arbash Meinel
Add ability to pass a directory mode to non_atomic_put |
1281 |
self._dir_mode = dir_mode |
1946.2.1
by John Arbash Meinel
2 changes to knits. Delay creating the .knit or .kndx file until we have actually tried to write data. Because of this, we must allow the Knit to create the prefix directories |
1282 |
self._create_parent_dir = create_parent_dir |
1283 |
self._need_to_create = False |
|
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
1284 |
|
2196.2.5
by John Arbash Meinel
Add an exception class when the knit index storage method is unknown, and properly test for it |
1285 |
def _full_path(self): |
1286 |
"""Return the full path to this file.""" |
|
1287 |
return self._transport.base + self._filename |
|
1288 |
||
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
1289 |
def check_header(self, fp): |
1641.1.2
by Robert Collins
Change knit index files to be robust in the presence of partial writes. |
1290 |
line = fp.readline() |
2171.1.1
by John Arbash Meinel
Knit index files should ignore empty indexes rather than consider them corrupt. |
1291 |
if line == '': |
1292 |
# An empty file can actually be treated as though the file doesn't
|
|
1293 |
# exist yet.
|
|
2196.2.5
by John Arbash Meinel
Add an exception class when the knit index storage method is unknown, and properly test for it |
1294 |
raise errors.NoSuchFile(self._full_path()) |
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
1295 |
if line != self.HEADER: |
2171.1.1
by John Arbash Meinel
Knit index files should ignore empty indexes rather than consider them corrupt. |
1296 |
raise KnitHeaderError(badline=line, |
1297 |
filename=self._transport.abspath(self._filename)) |
|
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
1298 |
|
1299 |
def __repr__(self): |
|
1300 |
return '%s(%s)' % (self.__class__.__name__, self._filename) |
|
1301 |
||
1302 |
||
1303 |
class _KnitIndex(_KnitComponentFile): |
|
1304 |
"""Manages knit index file. |
|
1305 |
||
1306 |
The index is already kept in memory and read on startup, to enable
|
|
1307 |
fast lookups of revision information. The cursor of the index
|
|
1308 |
file is always pointing to the end, making it easy to append
|
|
1309 |
entries.
|
|
1310 |
||
1311 |
_cache is a cache for fast mapping from version id to an Index
|
|
1312 |
object.
|
|
1313 |
||
1314 |
_history is a cache for fast mapping from indexes to version ids.
|
|
1315 |
||
1316 |
The index data format is dictionary compressed when it comes to
|
|
1317 |
parent references; an index entry may only have parents with a
|
|
1318 |
lower index number. As a result, the index is topologically sorted.
|
|
1563.2.11
by Robert Collins
Consolidate reweave and join as we have no separate usage, make reweave tests apply to all versionedfile implementations and deprecate the old reweave apis. |
1319 |
|
1320 |
Duplicate entries may be written to the index for a single version id;
|
|
1321 |
if this is done then the latter one completely replaces the former:
|
|
1322 |
this allows updates to correct version and parent information.
|
|
1323 |
Note that the two entries may share the delta, and that successive
|
|
1324 |
annotations and references MUST point to the first entry.
|
|
1641.1.2
by Robert Collins
Change knit index files to be robust in the presence of partial writes. |
1325 |
|
1326 |
The index file on disc contains a header, followed by one line per knit
|
|
1327 |
record. The same revision can be present in an index file more than once.
|
|
1759.2.1
by Jelmer Vernooij
Fix some types (found using aspell). |
1328 |
The first occurrence gets assigned a sequence number starting from 0.
|
1641.1.2
by Robert Collins
Change knit index files to be robust in the presence of partial writes. |
1329 |
|
1330 |
The format of a single line is
|
|
1331 |
REVISION_ID FLAGS BYTE_OFFSET LENGTH( PARENT_ID|PARENT_SEQUENCE_ID)* :\n
|
|
1332 |
REVISION_ID is a utf8-encoded revision id
|
|
1333 |
FLAGS is a comma separated list of flags about the record. Values include
|
|
1334 |
no-eol, line-delta, fulltext.
|
|
1335 |
BYTE_OFFSET is the ascii representation of the byte offset in the data file
|
|
1336 |
that the compressed data starts at.
|
|
1337 |
LENGTH is the ascii representation of the length of the compressed data in the data file.
|
|
1338 |
PARENT_ID is a utf-8 revision id prefixed by a '.' that is a parent of
|
|
1339 |
REVISION_ID.
|
|
1340 |
PARENT_SEQUENCE_ID is the ascii representation of the sequence number of a
|
|
1341 |
revision id already in the knit that is a parent of REVISION_ID.
|
|
1342 |
The ' :' marker is the end of record marker.
|
|
1343 |
|
|
1344 |
partial writes:
|
|
2158.3.1
by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations |
1345 |
When a write to the index file is interrupted, it will result in a line
|
1346 |
that does not end in ' :'. If the ' :' is not present at the end of a line,
|
|
1347 |
or at the end of the file, then the record that is missing it will be
|
|
1348 |
ignored by the parser.
|
|
1641.1.2
by Robert Collins
Change knit index files to be robust in the presence of partial writes. |
1349 |
|
1759.2.1
by Jelmer Vernooij
Fix some types (found using aspell). |
1350 |
When writing new records to the index file, the data is preceded by '\n'
|
1641.1.2
by Robert Collins
Change knit index files to be robust in the presence of partial writes. |
1351 |
to ensure that records always start on new lines even if the last write was
|
1352 |
interrupted. As a result it's normal for the last line in the index to be
|
|
1353 |
missing a trailing newline. One can be added with no harmful effects.
|
|
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
1354 |
"""
|
1355 |
||
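The line format described in the docstring above can be sketched as a small parser. This is an illustrative sketch only, not bzrlib's `_load_data`; the function name `parse_index_line` and the sample revision ids are hypothetical, and the sketch assumes fields are separated by single spaces and contain none themselves.

```python
def parse_index_line(line, history):
    """Parse one knit index line.

    history is the list of revision ids in first-occurrence order; a
    PARENT_SEQUENCE_ID is resolved by indexing into it. Returns None for
    a record without the trailing ':' marker (an interrupted write).
    """
    fields = line.split()
    # A complete record has at least REVISION_ID FLAGS BYTE_OFFSET LENGTH ':'.
    if len(fields) < 5 or fields[-1] != ':':
        return None
    revision_id, flags, offset, length = fields[:4]
    parents = []
    for ref in fields[4:-1]:
        if ref.startswith('.'):
            # PARENT_ID: a full revision id prefixed by '.'.
            parents.append(ref[1:])
        else:
            # PARENT_SEQUENCE_ID: refers to a revision already in the index.
            parents.append(history[int(ref)])
    return (revision_id, flags.split(','), int(offset), int(length),
            tuple(parents))


history = ['rev-a', 'rev-b']
record = parse_index_line('rev-c line-delta,no-eol 120 45 1 .ghost-x :',
                          history)
```

Here `1` resolves to `rev-b` through the sequence numbers, while `.ghost-x` names a parent by its full id.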
1666.1.6
by Robert Collins
Make knit the default format. |
1356 |
HEADER = "# bzr knit index 8\n" |
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
1357 |
|
1596.2.18
by Robert Collins
More microopimisations on index reading, now down to 16000 records/seconds. |
1358 |
# speed of knit parsing went from 280 ms to 280 ms with slots addition.
|
1359 |
# __slots__ = ['_cache', '_history', '_transport', '_filename']
|
|
1360 |
||
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
1361 |
def _cache_version(self, version_id, options, pos, size, parents): |
1596.2.18
by Robert Collins
More microopimisations on index reading, now down to 16000 records/seconds. |
1362 |
"""Cache a version record in the history array and index cache. |
2158.3.1
by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations |
1363 |
|
1364 |
This is inlined into _load_data for performance. KEEP IN SYNC.
|
|
1596.2.18
by Robert Collins
More microopimisations on index reading, now down to 16000 records/seconds. |
1365 |
(It saves 60ms, 25% of the __init__ overhead on local 4000 record
|
1366 |
indexes).
|
|
1367 |
"""
|
|
1596.2.14
by Robert Collins
Make knit parsing non quadratic? |
1368 |
# only want the _history index to reference the 1st index entry
|
1369 |
# for version_id
|
|
1596.2.18
by Robert Collins
More microopimisations on index reading, now down to 16000 records/seconds. |
1370 |
if version_id not in self._cache: |
1628.1.1
by Robert Collins
Cache the index number of versions in the knit index's self._cache so that |
1371 |
index = len(self._history) |
1596.2.14
by Robert Collins
Make knit parsing non quadratic? |
1372 |
self._history.append(version_id) |
1628.1.1
by Robert Collins
Cache the index number of versions in the knit index's self._cache so that |
1373 |
else: |
1374 |
index = self._cache[version_id][5] |
|
2158.3.1
by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations |
1375 |
self._cache[version_id] = (version_id, |
1628.1.1
by Robert Collins
Cache the index number of versions in the knit index's self._cache so that |
1376 |
options, |
1377 |
pos, |
|
1378 |
size, |
|
1379 |
parents, |
|
1380 |
index) |
|
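The effect of `_cache_version` on duplicate entries can be seen in a standalone sketch. It mirrors the logic above using plain arguments instead of instance attributes; the free function `cache_version` and the sample ids are hypothetical.

```python
def cache_version(cache, history, version_id, options, pos, size, parents):
    """Record an index entry, keeping the first-seen sequence number."""
    if version_id not in cache:
        # First occurrence: assign the next sequence number.
        index = len(history)
        history.append(version_id)
    else:
        # Duplicate: reuse the original sequence number (tuple slot 5).
        index = cache[version_id][5]
    cache[version_id] = (version_id, options, pos, size, parents, index)


cache, history = {}, []
cache_version(cache, history, 'rev-a', ['fulltext'], 0, 10, ())
cache_version(cache, history, 'rev-b', ['line-delta'], 10, 5, ('rev-a',))
# A later duplicate replaces the stored values but not the sequence number.
cache_version(cache, history, 'rev-a', ['fulltext'], 15, 12, ())
```

After the third call, `history` still has two entries and `rev-a` keeps sequence number 0, so parent references written earlier remain valid.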
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
1381 |
|
3316.2.3
by Robert Collins
Remove manual notification of transaction finishing on versioned files. |
1382 |
def _check_write_ok(self): |
3316.2.5
by Robert Collins
Review feedback. |
1383 |
if self._get_scope() != self._scope: |
3316.2.3
by Robert Collins
Remove manual notification of transaction finishing on versioned files. |
1384 |
raise errors.OutSideTransaction() |
1385 |
if self._mode != 'w': |
|
1386 |
raise errors.ReadOnlyObjectDirtiedError(self) |
|
1387 |
||
1946.2.1
by John Arbash Meinel
2 changes to knits. Delay creating the .knit or .kndx file until we have actually tried to write data. Because of this, we must allow the Knit to create the prefix directories |
1388 |
def __init__(self, transport, filename, mode, create=False, file_mode=None, |
3316.2.3
by Robert Collins
Remove manual notification of transaction finishing on versioned files. |
1389 |
create_parent_dir=False, delay_create=False, dir_mode=None, |
1390 |
get_scope=None): |
|
1946.2.12
by John Arbash Meinel
Add ability to pass a directory mode to non_atomic_put |
1391 |
_KnitComponentFile.__init__(self, transport, filename, mode, |
1392 |
file_mode=file_mode, |
|
1393 |
create_parent_dir=create_parent_dir, |
|
1394 |
dir_mode=dir_mode) |
|
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
1395 |
self._cache = {} |
1563.2.11
by Robert Collins
Consolidate reweave and join as we have no separate usage, make reweave tests apply to all versionedfile implementations and deprecate the old reweave apis. |
1396 |
# position in _history is the 'official' index for a revision
|
1397 |
# but the values may have come from a newer entry.
|
|
1759.2.1
by Jelmer Vernooij
Fix some types (found using aspell). |
1398 |
# so - wc -l of a knit index is != the number of unique names
|
1773.4.1
by Martin Pool
Add pyflakes makefile target; fix many warnings |
1399 |
# in the knit.
|
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
1400 |
self._history = [] |
1401 |
try: |
|
2247.2.1
by John Arbash Meinel
Don't create pb for simple knit reading. |
1402 |
fp = self._transport.get(self._filename) |
1594.2.17
by Robert Collins
Better readv coalescing, now with test, and progress during knit index reading. |
1403 |
try: |
2247.2.1
by John Arbash Meinel
Don't create pb for simple knit reading. |
1404 |
# _load_data may raise NoSuchFile if the target knit is
|
1405 |
# completely empty.
|
|
2484.1.1
by John Arbash Meinel
Add an initial function to read knit indexes in pyrex. |
1406 |
_load_data(self, fp) |
2247.2.1
by John Arbash Meinel
Don't create pb for simple knit reading. |
1407 |
finally: |
1408 |
fp.close() |
|
1409 |
except NoSuchFile: |
|
1410 |
if mode != 'w' or not create: |
|
1411 |
raise
|
|
1412 |
elif delay_create: |
|
1413 |
self._need_to_create = True |
|
1414 |
else: |
|
1415 |
self._transport.put_bytes_non_atomic( |
|
1416 |
self._filename, self.HEADER, mode=self._file_mode) |
|
3316.2.5
by Robert Collins
Review feedback. |
1417 |
self._scope = get_scope() |
1418 |
self._get_scope = get_scope |
|
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
1419 |
|
2530.1.1
by Aaron Bentley
Make topological sorting optional for get_ancestry |
1420 |
def get_ancestry(self, versions, topo_sorted=True): |
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
1421 |
"""See VersionedFile.get_ancestry.""" |
1563.2.35
by Robert Collins
cleanup deprecation warnings and finish conversion so the inventory is knit based too. |
1422 |
# get a graph of all the mentioned versions:
|
1423 |
graph = {} |
|
1424 |
pending = set(versions) |
|
2158.3.1
by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations |
1425 |
cache = self._cache |
1426 |
while pending: |
|
1563.2.35
by Robert Collins
cleanup deprecation warnings and finish conversion so the inventory is knit based too. |
1427 |
version = pending.pop() |
1594.2.8
by Robert Collins
add ghost aware apis to knits. |
1428 |
# trim ghosts
|
2158.3.1
by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations |
1429 |
try: |
1430 |
parents = [p for p in cache[version][4] if p in cache] |
|
1431 |
except KeyError: |
|
1432 |
raise RevisionNotPresent(version, self._filename) |
|
1433 |
# if not completed and not a ghost
|
|
1434 |
pending.update([p for p in parents if p not in graph]) |
|
1563.2.35
by Robert Collins
cleanup deprecation warnings and finish conversion so the inventory is knit based too. |
1435 |
graph[version] = parents |
2530.1.1
by Aaron Bentley
Make topological sorting optional for get_ancestry |
1436 |
if not topo_sorted: |
1437 |
return graph.keys() |
|
1563.2.35
by Robert Collins
cleanup deprecation warnings and finish conversion so the inventory is knit based too. |
1438 |
return topo_sort(graph.items()) |
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
1439 |
|
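The ghost-trimming walk in `get_ancestry` can be sketched over a plain mapping of version id to parent ids. This is a simplified illustration: `ancestry_graph` is a hypothetical name, a `KeyError` stands in for `RevisionNotPresent`, and the final topological sort is left out.

```python
def ancestry_graph(parent_map, versions):
    """Return {version: present parents} for versions and their ancestors.

    Parents that are absent from parent_map (ghosts) are dropped.
    """
    graph = {}
    pending = set(versions)
    while pending:
        version = pending.pop()
        if version not in parent_map:
            # Only a requested version can be missing here, because
            # ghost parents are filtered out before being queued.
            raise KeyError(version)
        parents = [p for p in parent_map[version] if p in parent_map]
        pending.update(p for p in parents if p not in graph)
        graph[version] = parents
    return graph


parent_map = {'a': (), 'b': ('a', 'ghost'), 'c': ('b',)}
graph = ancestry_graph(parent_map, ['c'])
```

With `topo_sorted=False` the method returns just the keys of this graph; otherwise the graph items are handed to `topo_sort`.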
1594.2.8
by Robert Collins
add ghost aware apis to knits. |
1440 |
def get_ancestry_with_ghosts(self, versions): |
1441 |
"""See VersionedFile.get_ancestry_with_ghosts.""" |
|
1442 |
# get a graph of all the mentioned versions:
|
|
2158.3.1
by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations |
1443 |
self.check_versions_present(versions) |
1444 |
cache = self._cache |
|
1594.2.8
by Robert Collins
add ghost aware apis to knits. |
1445 |
graph = {} |
1446 |
pending = set(versions) |
|
2158.3.1
by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations |
1447 |
while pending: |
1594.2.8
by Robert Collins
add ghost aware apis to knits. |
1448 |
version = pending.pop() |
1449 |
try: |
|
2158.3.1
by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations |
1450 |
parents = cache[version][4] |
1594.2.8
by Robert Collins
add ghost aware apis to knits. |
1451 |
except KeyError: |
1452 |
# ghost, fake it
|
|
1453 |
graph[version] = [] |
|
1454 |
else: |
|
2158.3.1
by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations |
1455 |
# if not completed
|
1456 |
pending.update([p for p in parents if p not in graph]) |
|
1594.2.8
by Robert Collins
add ghost aware apis to knits. |
1457 |
graph[version] = parents |
1458 |
return topo_sort(graph.items()) |
|
1459 |
||
3218.1.1
by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries. |
1460 |
def get_build_details(self, version_ids): |
1461 |
"""Get the method, index_memo and compression parent for version_ids. |
|
1462 |
||
3224.1.29
by John Arbash Meinel
Properly handle annotating when ghosts are present. |
1463 |
Ghosts are omitted from the result.
|
1464 |
||
3218.1.1
by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries. |
1465 |
:param version_ids: An iterable of version_ids.
|
3224.1.14
by John Arbash Meinel
Switch to making content_details opaque, step 1 |
1466 |
:return: A dict of version_id:(index_memo, compression_parent,
|
3224.1.15
by John Arbash Meinel
Finish removing method and noeol from general knowledge, |
1467 |
parents, record_details).
|
3224.1.14
by John Arbash Meinel
Switch to making content_details opaque, step 1 |
1468 |
index_memo
|
1469 |
opaque structure to pass to read_records to extract the raw
|
|
1470 |
data
|
|
1471 |
compression_parent
|
|
1472 |
Content that this record is built upon, may be None
|
|
1473 |
parents
|
|
1474 |
Logical parents of this node
|
|
3224.1.15
by John Arbash Meinel
Finish removing method and noeol from general knowledge, |
1475 |
record_details
|
3224.1.14
by John Arbash Meinel
Switch to making content_details opaque, step 1 |
1476 |
extra information about the content which needs to be passed to
|
3224.1.15
by John Arbash Meinel
Finish removing method and noeol from general knowledge, |
1477 |
Factory.parse_record
|
3218.1.1
by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries. |
1478 |
"""
|
1479 |
result = {} |
|
1480 |
for version_id in version_ids: |
|
3224.1.29
by John Arbash Meinel
Properly handle annotating when ghosts are present. |
1481 |
if version_id not in self._cache: |
1482 |
# ghosts are omitted
|
|
1483 |
continue
|
|
3218.1.1
by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries. |
1484 |
method = self.get_method(version_id) |
3224.1.6
by John Arbash Meinel
Refactor the annotation logic into a helper class. |
1485 |
parents = self.get_parents_with_ghosts(version_id) |
3218.1.1
by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries. |
1486 |
if method == 'fulltext': |
1487 |
compression_parent = None |
|
1488 |
else: |
|
3224.1.6
by John Arbash Meinel
Refactor the annotation logic into a helper class. |
1489 |
compression_parent = parents[0] |
3224.1.8
by John Arbash Meinel
Add noeol to the return signature of get_build_details. |
1490 |
noeol = 'no-eol' in self.get_options(version_id) |
3218.1.1
by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries. |
1491 |
index_memo = self.get_position(version_id) |
3224.1.14
by John Arbash Meinel
Switch to making content_details opaque, step 1 |
1492 |
result[version_id] = (index_memo, compression_parent, |
1493 |
parents, (method, noeol)) |
|
3218.1.1
by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries. |
1494 |
return result |
1495 |
||
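As a rough sketch of what `get_build_details` returns, the following works on a cache of `(version_id, options, pos, size, parents, index)` tuples like the one `_cache_version` builds. The function name `build_details` is hypothetical, and the sketch assumes every non-fulltext record is a line-delta (the real `get_method` raises for unknown methods).

```python
def build_details(cache, version_ids):
    """Map version_id to (index_memo, compression_parent, parents, details)."""
    result = {}
    for version_id in version_ids:
        if version_id not in cache:
            # Ghosts are omitted from the result.
            continue
        _, options, pos, size, parents, _ = cache[version_id]
        if 'fulltext' in options:
            method, compression_parent = 'fulltext', None
        else:
            # A line-delta is built on top of its first parent.
            method, compression_parent = 'line-delta', parents[0]
        noeol = 'no-eol' in options
        index_memo = (None, pos, size)
        result[version_id] = (index_memo, compression_parent, parents,
                              (method, noeol))
    return result


cache = {
    'rev-a': ('rev-a', ['fulltext'], 0, 10, (), 0),
    'rev-b': ('rev-b', ['line-delta', 'no-eol'], 10, 5, ('rev-a',), 1),
}
details = build_details(cache, ['rev-a', 'rev-b', 'ghost'])
```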
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
1496 |
def num_versions(self): |
1497 |
return len(self._history) |
|
1498 |
||
1499 |
__len__ = num_versions |
|
1500 |
||
1501 |
def get_versions(self): |
|
2592.3.6
by Robert Collins
Implement KnitGraphIndex.get_versions. |
1502 |
"""Get all the versions in the file. not topologically sorted.""" |
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
1503 |
return self._history |
1504 |
||
1594.2.8
by Robert Collins
add ghost aware apis to knits. |
1505 |
def _version_list_to_index(self, versions): |
1506 |
result_list = [] |
|
2158.3.1
by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations |
1507 |
cache = self._cache |
1594.2.8
by Robert Collins
add ghost aware apis to knits. |
1508 |
for version in versions: |
2158.3.1
by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations |
1509 |
if version in cache: |
1628.1.1
by Robert Collins
Cache the index number of versions in the knit index's self._cache so that |
1510 |
# -- inlined lookup() --
|
2158.3.1
by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations |
1511 |
result_list.append(str(cache[version][5])) |
1628.1.1
by Robert Collins
Cache the index number of versions in the knit index's self._cache so that |
1512 |
# -- end lookup () --
|
1594.2.8
by Robert Collins
add ghost aware apis to knits. |
1513 |
else: |
2249.5.15
by John Arbash Meinel
remove get_cached_utf8 checks which were slowing things down. |
1514 |
result_list.append('.' + version) |
1594.2.8
by Robert Collins
add ghost aware apis to knits. |
1515 |
return ' '.join(result_list) |
1516 |
||
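The dictionary compression of parent references performed by `_version_list_to_index` can be sketched with a plain mapping from version id to sequence number. The function and argument names here are hypothetical.

```python
def version_list_to_index(sequence_numbers, parents):
    """Encode parents as sequence numbers where known, else as '.id'."""
    refs = []
    for parent in parents:
        if parent in sequence_numbers:
            # Already in the index: write its short sequence number.
            refs.append(str(sequence_numbers[parent]))
        else:
            # Not in the index (a ghost): write the full id with a '.' prefix.
            refs.append('.' + parent)
    return ' '.join(refs)


encoded = version_list_to_index({'rev-a': 0, 'rev-b': 1},
                                ['rev-b', 'ghost-x'])
```

Here `encoded` is `'1 .ghost-x'`, which is the parent portion of an index line.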
2592.3.71
by Robert Collins
Basic version of knit-based repository operating, many tests failing. |
1517 |
def add_version(self, version_id, options, index_memo, parents): |
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
1518 |
"""Add a version record to the index.""" |
2592.3.71
by Robert Collins
Basic version of knit-based repository operating, many tests failing. |
1519 |
self.add_versions(((version_id, options, index_memo, parents),)) |
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
1520 |
|
2841.2.1
by Robert Collins
* Commit no longer checks for new text keys during insertion when the |
1521 |
def add_versions(self, versions, random_id=False): |
1692.2.1
by Robert Collins
Fix knit based push to only perform 2 appends to the target, rather that 2*new-versions. |
1522 |
"""Add multiple versions to the index. |
1523 |
|
|
1524 |
:param versions: a list of tuples:
|
|
1525 |
(version_id, options, pos, size, parents).
|
|
2841.2.1
by Robert Collins
* Commit no longer checks for new text keys during insertion when the |
1526 |
:param random_id: If True the ids being added were randomly generated
|
1527 |
and no check for existence will be performed.
|
|
1692.2.1
by Robert Collins
Fix knit based push to only perform 2 appends to the target, rather that 2*new-versions. |
1528 |
"""
|
1529 |
lines = [] |
|
2102.2.1
by John Arbash Meinel
Fix bug #64789 _KnitIndex.add_versions() should dict compress new revisions |
1530 |
orig_history = self._history[:] |
1531 |
orig_cache = self._cache.copy() |
|
1532 |
||
1533 |
try: |
|
2592.3.71
by Robert Collins
Basic version of knit-based repository operating, many tests failing. |
1534 |
for version_id, options, (index, pos, size), parents in versions: |
2249.5.15
by John Arbash Meinel
remove get_cached_utf8 checks which were slowing things down. |
1535 |
line = "\n%s %s %s %s %s :" % (version_id, |
2102.2.1
by John Arbash Meinel
Fix bug #64789 _KnitIndex.add_versions() should dict compress new revisions |
1536 |
','.join(options), |
1537 |
pos, |
|
1538 |
size, |
|
1539 |
self._version_list_to_index(parents)) |
|
1540 |
assert isinstance(line, str), \ |
|
1541 |
'content must be utf-8 encoded: %r' % (line,) |
|
1542 |
lines.append(line) |
|
3287.5.5
by Robert Collins
Refactor internals of knit implementations to implement get_parents_with_ghosts in terms of get_parent_map. |
1543 |
self._cache_version(version_id, options, pos, size, tuple(parents)) |
2102.2.1
by John Arbash Meinel
Fix bug #64789 _KnitIndex.add_versions() should dict compress new revisions |
1544 |
if not self._need_to_create: |
1545 |
self._transport.append_bytes(self._filename, ''.join(lines)) |
|
1546 |
else: |
|
1547 |
sio = StringIO() |
|
1548 |
sio.write(self.HEADER) |
|
1549 |
sio.writelines(lines) |
|
1550 |
sio.seek(0) |
|
1551 |
self._transport.put_file_non_atomic(self._filename, sio, |
|
1552 |
create_parent_dir=self._create_parent_dir, |
|
1553 |
mode=self._file_mode, |
|
1554 |
dir_mode=self._dir_mode) |
|
1555 |
self._need_to_create = False |
|
1556 |
except: |
|
1557 |
# If any problems happen, restore the original values and re-raise
|
|
1558 |
self._history = orig_history |
|
1559 |
self._cache = orig_cache |
|
1560 |
raise
|
|
1561 |
||
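Because each record is written with a leading newline and a trailing ' :' marker, an interrupted write cannot corrupt the records that follow it. A minimal sketch of that property, using the hypothetical helpers `format_index_line` and `complete_records` and made-up revision ids:

```python
HEADER = "# bzr knit index 8\n"


def format_index_line(version_id, options, pos, size, parent_refs):
    """Format one record the way add_versions does, newline first."""
    return "\n%s %s %s %s %s :" % (version_id, ','.join(options),
                                   pos, size, parent_refs)


def complete_records(data):
    """Return the lines after the header that end with the ' :' marker."""
    return [line for line in data.split('\n')[1:] if line.endswith(' :')]


data = HEADER
data += format_index_line('rev-a', ['fulltext'], 0, 10, '')
data += "\nrev-b fulltext 10"  # simulated interrupted write
data += format_index_line('rev-c', ['line-delta'], 10, 5, '0')
records = complete_records(data)
```

The truncated `rev-b` line is skipped, and `rev-c` still starts on its own line because its leading newline terminates the partial record.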
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
1562 |
def has_version(self, version_id): |
1563 |
"""True if the version is in the index.""" |
|
2158.3.1
by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations |
1564 |
return version_id in self._cache |
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
1565 |
|
1566 |
def get_position(self, version_id): |
|
2670.2.2
by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use |
1567 |
"""Return details needed to access the version. |
1568 |
|
|
1569 |
.kndx indices do not support split-out data, so return None for the
|
|
1570 |
index field.
|
|
1571 |
||
1572 |
:return: a tuple (None, data position, size) to hand to the access
|
|
1573 |
logic to get the record.
|
|
1574 |
"""
|
|
2158.3.1
by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations |
1575 |
entry = self._cache[version_id] |
2592.3.71
by Robert Collins
Basic version of knit-based repository operating, many tests failing. |
1576 |
return None, entry[2], entry[3] |
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
1577 |
|
1578 |
def get_method(self, version_id): |
|
1579 |
"""Return compression method of specified version.""" |
|
2592.3.97
by Robert Collins
Merge more bzr.dev, addressing some bugs. [still broken] |
1580 |
try: |
1581 |
options = self._cache[version_id][1] |
|
1582 |
except KeyError: |
|
1583 |
raise RevisionNotPresent(version_id, self._filename) |
|
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
1584 |
if 'fulltext' in options: |
1585 |
return 'fulltext' |
|
1586 |
else: |
|
2196.2.5
by John Arbash Meinel
Add an exception class when the knit index storage method is unknown, and properly test for it |
1587 |
if 'line-delta' not in options: |
1588 |
raise errors.KnitIndexUnknownMethod(self._full_path(), options) |
|
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
1589 |
return 'line-delta' |
1590 |
||
1591 |
def get_options(self, version_id): |
|
3052.2.5
by Andrew Bennetts
Address the rest of the review comments from John and myself. |
1592 |
"""Return a list representing options. |
2592.3.14
by Robert Collins
Implement KnitGraphIndex.get_options. |
1593 |
|
3052.2.2
by Robert Collins
* Operations pulling data from a smart server where the underlying |
1594 |
e.g. ['foo', 'bar']
|
2592.3.14
by Robert Collins
Implement KnitGraphIndex.get_options. |
1595 |
"""
|
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
1596 |
return self._cache[version_id][1] |
1597 |
||
3287.5.5
by Robert Collins
Refactor internals of knit implementations to implement get_parents_with_ghosts in terms of get_parent_map. |
1598 |
def get_parent_map(self, version_ids): |
1599 |
"""Called by KnitVersionedFile.get_parent_map, which passes the request through.""" |
|
1600 |
result = {} |
|
1601 |
for version_id in version_ids: |
|
1602 |
try: |
|
1603 |
result[version_id] = tuple(self._cache[version_id][4]) |
|
1604 |
except KeyError: |
|
1605 |
pass
|
|
1606 |
return result |
|
1607 |
||
1594.2.8
by Robert Collins
add ghost aware apis to knits. |
1608 |
def get_parents_with_ghosts(self, version_id): |
1759.2.1
by Jelmer Vernooij
Fix some types (found using aspell). |
1609 |
"""Return parents of specified version with ghosts.""" |
3287.5.5
by Robert Collins
Refactor internals of knit implementations to implement get_parents_with_ghosts in terms of get_parent_map. |
1610 |
try: |
1611 |
return self.get_parent_map([version_id])[version_id] |
|
1612 |
except KeyError: |
|
1613 |
raise RevisionNotPresent(version_id, self) |
|
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
1614 |
|
1615 |
def check_versions_present(self, version_ids): |
|
1616 |
"""Check that all specified versions are present.""" |
|
2158.3.1
by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations |
1617 |
cache = self._cache |
1618 |
for version_id in version_ids: |
|
1619 |
if version_id not in cache: |
|
1620 |
raise RevisionNotPresent(version_id, self._filename) |
|
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
1621 |
|
1622 |
||
2592.3.2
by Robert Collins
Implement a get_graph for a new KnitGraphIndex that will implement a KnitIndex on top of the GraphIndex API. |
1623 |
class KnitGraphIndex(object): |
1624 |
"""A knit index that builds on GraphIndex.""" |
|
1625 |
||
2592.3.34
by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs. |
1626 |
def __init__(self, graph_index, deltas=False, parents=True, add_callback=None): |
2592.3.2
by Robert Collins
Implement a get_graph for a new KnitGraphIndex that will implement a KnitIndex on top of the GraphIndex API. |
1627 |
"""Construct a KnitGraphIndex on a graph_index. |
1628 |
||
1629 |
:param graph_index: An implementation of bzrlib.index.GraphIndex.
|
|
2592.3.13
by Robert Collins
Implement KnitGraphIndex.get_method. |
1630 |
:param deltas: Allow delta-compressed records.
|
2592.3.19
by Robert Collins
Change KnitGraphIndex from returning data to performing a callback on insertions. |
1631 |
:param add_callback: If not None, allow additions to the index and call
|
1632 |
this callback with a list of added GraphIndex nodes:
|
|
2592.3.33
by Robert Collins
Change the order of index refs and values to make the no-graph knit index easier. |
1633 |
[(node, value, node_refs), ...]
|
2592.3.34
by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs. |
1634 |
:param parents: If True, record the knit's parents; otherwise do not
|
1635 |
record parents.
|
|
2592.3.2
by Robert Collins
Implement a get_graph for a new KnitGraphIndex that will implement a KnitIndex on top of the GraphIndex API. |
1636 |
"""
|
1637 |
self._graph_index = graph_index |
|
2592.3.13
by Robert Collins
Implement KnitGraphIndex.get_method. |
1638 |
self._deltas = deltas |
2592.3.19
by Robert Collins
Change KnitGraphIndex from returning data to performing a callback on insertions. |
1639 |
self._add_callback = add_callback |
2592.3.34
by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs. |
1640 |
self._parents = parents |
1641 |
if deltas and not parents: |
|
1642 |
raise KnitCorrupt(self, "Cannot do delta compression without " |
|
1643 |
"parent tracking.") |
|
2592.3.2
by Robert Collins
Implement a get_graph for a new KnitGraphIndex that will implement a KnitIndex on top of the GraphIndex API. |
1644 |
|
3316.2.3
by Robert Collins
Remove manual notification of transaction finishing on versioned files. |
1645 |
def _check_write_ok(self): |
1646 |
pass
|
|
1647 |
||
2624.2.5
by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings. |
1648 |
def _get_entries(self, keys, check_present=False): |
1649 |
"""Get the entries for keys. |
|
1650 |
|
|
1651 |
:param keys: An iterable of index keys (1-tuples).
|
|
1652 |
"""
|
|
1653 |
keys = set(keys) |
|
2592.3.43
by Robert Collins
A knit iter_parents API. |
1654 |
found_keys = set() |
2592.3.34
by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs. |
1655 |
if self._parents: |
2624.2.5
by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings. |
1656 |
for node in self._graph_index.iter_entries(keys): |
2592.3.34
by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs. |
1657 |
yield node |
2624.2.14
by Robert Collins
Add source index to the index iteration API to allow mapping back to the origin of retrieved data. |
1658 |
found_keys.add(node[1]) |
2592.3.34
by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs. |
1659 |
else: |
1660 |
# adapt parentless index to the rest of the code.
|
|
2624.2.5
by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings. |
1661 |
for node in self._graph_index.iter_entries(keys): |
2624.2.14
by Robert Collins
Add source index to the index iteration API to allow mapping back to the origin of retrieved data. |
1662 |
yield node[0], node[1], node[2], () |
1663 |
found_keys.add(node[1]) |
|
2592.3.43
by Robert Collins
A knit iter_parents API. |
1664 |
if check_present: |
2624.2.5
by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings. |
1665 |
missing_keys = keys.difference(found_keys) |
2592.3.43
by Robert Collins
A knit iter_parents API. |
1666 |
if missing_keys: |
1667 |
raise RevisionNotPresent(missing_keys.pop(), self) |
|
2592.3.17
by Robert Collins
Add add_version(s) to KnitGraphIndex, completing the required api for KnitVersionedFile. |
1668 |
|
1669 |
def _present_keys(self, version_ids): |
|
1670 |
return set([ |
|
2624.2.14
by Robert Collins
Add source index to the index iteration API to allow mapping back to the origin of retrieved data. |
1671 |
node[1] for node in self._get_entries(version_ids)]) |
2592.3.17
by Robert Collins
Add add_version(s) to KnitGraphIndex, completing the required api for KnitVersionedFile. |
1672 |
|
2592.3.34
by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs. |
1673 |
def _parentless_ancestry(self, versions): |
1674 |
"""Honour the get_ancestry API for parentless knit indices.""" |
|
2624.2.5
by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings. |
1675 |
wanted_keys = self._version_ids_to_keys(versions) |
1676 |
present_keys = self._present_keys(wanted_keys) |
|
1677 |
missing = set(wanted_keys).difference(present_keys) |
|
2592.3.34
by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs. |
1678 |
if missing: |
1679 |
raise RevisionNotPresent(missing.pop(), self) |
|
2624.2.5
by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings. |
1680 |
return list(self._keys_to_version_ids(present_keys)) |
2592.3.34
by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs. |
1681 |
|
2592.3.4
by Robert Collins
Implement get_ancestry/get_ancestry_with_ghosts for KnitGraphIndex. |
1682 |
def get_ancestry(self, versions, topo_sorted=True): |
1683 |
"""See VersionedFile.get_ancestry.""" |
|
2592.3.34
by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs. |
1684 |
if not self._parents: |
1685 |
return self._parentless_ancestry(versions) |
|
2592.3.4
by Robert Collins
Implement get_ancestry/get_ancestry_with_ghosts for KnitGraphIndex. |
1686 |
# XXX: This will do len(history) index calls - perhaps
|
1687 |
# it should be altered to be an index core feature?
|
|
1688 |
# get a graph of all the mentioned versions:
|
|
1689 |
graph = {} |
|
2592.3.30
by Robert Collins
Make GraphKnitIndex get_ancestry the same as regular knits. |
1690 |
ghosts = set() |
2624.2.5
by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings. |
1691 |
versions = self._version_ids_to_keys(versions) |
2592.3.4
by Robert Collins
Implement get_ancestry/get_ancestry_with_ghosts for KnitGraphIndex. |
1692 |
pending = set(versions) |
1693 |
while pending: |
|
1694 |
# get all pending nodes
|
|
1695 |
this_iteration = pending |
|
2592.3.17
by Robert Collins
Add add_version(s) to KnitGraphIndex, completing the required api for KnitVersionedFile. |
1696 |
new_nodes = self._get_entries(this_iteration) |
2592.3.53
by Robert Collins
Remove usage of difference_update in knit.py. |
1697 |
found = set() |
2592.3.4
by Robert Collins
Implement get_ancestry/get_ancestry_with_ghosts for KnitGraphIndex. |
1698 |
pending = set() |
2624.2.14
by Robert Collins
Add source index to the index iteration API to allow mapping back to the origin of retrieved data. |
1699 |
for (index, key, value, node_refs) in new_nodes: |
2592.3.30
by Robert Collins
Make GraphKnitIndex get_ancestry the same as regular knits. |
1700 |
# don't ask for ghosts - otherwise
|
1701 |
# we can end up looping with pending
|
|
1702 |
# being entirely ghosted.
|
|
1703 |
graph[key] = [parent for parent in node_refs[0] |
|
1704 |
if parent not in ghosts] |
|
2592.3.53
by Robert Collins
Remove usage of difference_update in knit.py. |
1705 |
# queue parents
|
1706 |
for parent in graph[key]: |
|
1707 |
# don't examine known nodes again
|
|
1708 |
if parent in graph: |
|
1709 |
continue
|
|
1710 |
pending.add(parent) |
|
1711 |
found.add(key) |
|
1712 |
ghosts.update(this_iteration.difference(found)) |
|
2592.3.30
by Robert Collins
Make GraphKnitIndex get_ancestry the same as regular knits. |
1713 |
if versions.difference(graph): |
1714 |
raise RevisionNotPresent(versions.difference(graph).pop(), self) |
|
2624.2.5
by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings. |
1715 |
if topo_sorted: |
1716 |
result_keys = topo_sort(graph.items()) |
|
1717 |
else: |
|
1718 |
result_keys = graph.iterkeys() |
|
1719 |
return [key[0] for key in result_keys] |
|
2592.3.4
by Robert Collins
Implement get_ancestry/get_ancestry_with_ghosts for KnitGraphIndex. |
1720 |
|
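The ancestry walk in get_ancestry above can be sketched without bzrlib. This is a simplified illustration under stated assumptions: a plain dict stands in for the GraphIndex (keys missing from the dict play the role of ghosts), `KeyError` stands in for `RevisionNotPresent`, and the final `topo_sort` step is left out. The name `ancestry_graph` is hypothetical.

```python
def ancestry_graph(parent_map, wanted):
    """Return {key: [parents]} for everything reachable from wanted.

    parent_map: dict mapping key -> tuple of parent keys.
    Keys that are referenced but absent from parent_map are ghosts.
    """
    graph = {}
    ghosts = set()
    pending = set(wanted)
    while pending:
        this_iteration = pending
        pending = set()
        found = set()
        for key in this_iteration:
            if key not in parent_map:
                continue  # not present: becomes a ghost below
            # skip parents already known to be ghosts
            graph[key] = [p for p in parent_map[key] if p not in ghosts]
            for parent in graph[key]:
                if parent not in graph:
                    pending.add(parent)  # queue unseen parents
            found.add(key)
        # anything asked for in this round but not found is a ghost
        ghosts.update(this_iteration - found)
    missing = set(wanted) - set(graph)
    if missing:
        raise KeyError(missing.pop())
    return graph


parents = {"c": ("b", "ghost"), "b": ("a",), "a": ()}
graph = ancestry_graph(parents, ["c"])
```

Recording ghosts per round is what stops the loop from re-queuing the same absent keys indefinitely; a directly requested key that turns out to be absent is still reported as an error.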
1721 |
def get_ancestry_with_ghosts(self, versions): |
|
1722 |
"""See VersionedFile.get_ancestry_with_ghosts.""" |
|
2592.3.34
by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs. |
1723 |
if not self._parents: |
1724 |
return self._parentless_ancestry(versions) |
|
2592.3.4
by Robert Collins
Implement get_ancestry/get_ancestry_with_ghosts for KnitGraphIndex. |
1725 |
# XXX: This will do len(history) index calls - perhaps
|
1726 |
# it should be altered to be an index core feature?
|
|
1727 |
# get a graph of all the mentioned versions:
|
|
1728 |
graph = {} |
|
2624.2.5
by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings. |
1729 |
versions = self._version_ids_to_keys(versions) |
2592.3.4
by Robert Collins
Implement get_ancestry/get_ancestry_with_ghosts for KnitGraphIndex. |
1730 |
pending = set(versions) |
1731 |
while pending: |
|
1732 |
# get all pending nodes
|
|
1733 |
this_iteration = pending |
|
2592.3.17
by Robert Collins
Add add_version(s) to KnitGraphIndex, completing the required api for KnitVersionedFile. |
1734 |
new_nodes = self._get_entries(this_iteration) |
2592.3.4
by Robert Collins
Implement get_ancestry/get_ancestry_with_ghosts for KnitGraphIndex. |
1735 |
pending = set() |
2624.2.14
by Robert Collins
Add source index to the index iteration API to allow mapping back to the origin of retrieved data. |
1736 |
for (index, key, value, node_refs) in new_nodes: |
2592.3.4
by Robert Collins
Implement get_ancestry/get_ancestry_with_ghosts for KnitGraphIndex. |
1737 |
graph[key] = node_refs[0] |
1738 |
# queue parents
|
|
2592.3.53
by Robert Collins
Remove usage of difference_update in knit.py. |
1739 |
for parent in graph[key]: |
1740 |
# don't examine known nodes again
|
|
1741 |
if parent in graph: |
|
1742 |
continue
|
|
1743 |
pending.add(parent) |
|
2592.3.4
by Robert Collins
Implement get_ancestry/get_ancestry_with_ghosts for KnitGraphIndex. |
1744 |
missing_versions = this_iteration.difference(graph) |
2592.3.34
by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs. |
1745 |
missing_needed = versions.intersection(missing_versions) |
1746 |
if missing_needed: |
|
1747 |
raise RevisionNotPresent(missing_needed.pop(), self) |
|
2592.3.4
by Robert Collins
Implement get_ancestry/get_ancestry_with_ghosts for KnitGraphIndex. |
1748 |
for missing_version in missing_versions: |
1749 |
# add a key, no parents
|
|
1750 |
graph[missing_version] = [] |
|
2592.3.53
by Robert Collins
Remove usage of difference_update in knit.py. |
1751 |
pending.discard(missing_version) # don't look for it |
2624.2.5
by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings. |
1752 |
result_keys = topo_sort(graph.items()) |
1753 |
return [key[0] for key in result_keys] |
|
2592.3.4
by Robert Collins
Implement get_ancestry/get_ancestry_with_ghosts for KnitGraphIndex. |
1754 |
|
3218.1.1
by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries. |
1755 |
def get_build_details(self, version_ids): |
1756 |
"""Get the method, index_memo and compression parent for version_ids. |
|
1757 |
||
3224.1.29
by John Arbash Meinel
Properly handle annotating when ghosts are present. |
1758 |
Ghosts are omitted from the result.
|
1759 |
||
3218.1.1
by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries. |
1760 |
:param version_ids: An iterable of version_ids.
|
3224.1.18
by John Arbash Meinel
Cleanup documentation |
1761 |
:return: A dict of version_id:(index_memo, compression_parent,
|
1762 |
parents, record_details).
|
|
1763 |
index_memo
|
|
1764 |
opaque structure to pass to read_records to extract the raw
|
|
1765 |
data
|
|
1766 |
compression_parent
|
|
1767 |
Content that this record is built upon, may be None
|
|
1768 |
parents
|
|
1769 |
Logical parents of this node
|
|
1770 |
record_details
|
|
1771 |
extra information about the content which needs to be passed to
|
|
1772 |
Factory.parse_record
|
|
3218.1.1
by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries. |
1773 |
"""
|
1774 |
result = {} |
|
1775 |
entries = self._get_entries(self._version_ids_to_keys(version_ids), True) |
|
1776 |
for entry in entries: |
|
1777 |
version_id = self._keys_to_version_ids((entry[1],))[0] |
|
3224.1.27
by John Arbash Meinel
Handle when the knit index doesn't track parents. |
1778 |
if not self._parents: |
1779 |
parents = () |
|
1780 |
else: |
|
1781 |
parents = self._keys_to_version_ids(entry[3][0]) |
|
3218.1.1
by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries. |
1782 |
if not self._deltas: |
1783 |
compression_parent = None |
|
1784 |
else: |
|
1785 |
compression_parent_key = self._compression_parent(entry) |
|
1786 |
if compression_parent_key: |
|
1787 |
compression_parent = self._keys_to_version_ids( |
|
1788 |
(compression_parent_key,))[0] |
|
1789 |
else: |
|
1790 |
compression_parent = None |
|
3224.1.8
by John Arbash Meinel
Add noeol to the return signature of get_build_details. |
1791 |
noeol = (entry[2][0] == 'N') |
3218.1.1
by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries. |
1792 |
if compression_parent: |
1793 |
method = 'line-delta' |
|
1794 |
else: |
|
1795 |
method = 'fulltext' |
|
3224.1.14
by John Arbash Meinel
Switch to making content_details opaque, step 1 |
1796 |
result[version_id] = (self._node_to_position(entry), |
1797 |
compression_parent, parents, |
|
1798 |
(method, noeol)) |
|
3218.1.1
by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries. |
1799 |
return result |
1800 |
||
1801 |
def _compression_parent(self, an_entry): |
|
1802 |
# return the key that an_entry is compressed against, or None
|
|
1803 |
# Grab the second parent list (as deltas implies parents currently)
|
|
1804 |
compression_parents = an_entry[3][1] |
|
1805 |
if not compression_parents: |
|
1806 |
return None |
|
1807 |
assert len(compression_parents) == 1 |
|
1808 |
return compression_parents[0] |
|
1809 |
||
1810 |
def _get_method(self, node): |
|
1811 |
if not self._deltas: |
|
1812 |
return 'fulltext' |
|
1813 |
if self._compression_parent(node): |
|
1814 |
return 'line-delta' |
|
1815 |
else: |
|
1816 |
return 'fulltext' |
|
1817 |
||
2592.3.5
by Robert Collins
Implement KnitGraphIndex.num_versions. |
1818 |
def num_versions(self): |
1819 |
return len(list(self._graph_index.iter_all_entries())) |
|
2592.3.2
by Robert Collins
Implement a get_graph for a new KnitGraphIndex that will implement a KnitIndex on top of the GraphIndex API. |
1820 |
|
2592.3.6
by Robert Collins
Implement KnitGraphIndex.get_versions. |
1821 |
__len__ = num_versions |
1822 |
||
1823 |
def get_versions(self): |
|
1824 |
"""Get all the versions in the file, not topologically sorted.""" |
|
2624.2.14
by Robert Collins
Add source index to the index iteration API to allow mapping back to the origin of retrieved data. |
1825 |
return [node[1][0] for node in self._graph_index.iter_all_entries()] |
2592.3.6
by Robert Collins
Implement KnitGraphIndex.get_versions. |
1826 |
|
2592.3.9
by Robert Collins
Implement KnitGraphIndex.has_version. |
1827 |
def has_version(self, version_id): |
1828 |
"""True if the version is in the index.""" |
|
2624.2.5
by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings. |
1829 |
return len(self._present_keys(self._version_ids_to_keys([version_id]))) == 1 |
1830 |
||
1831 |
def _keys_to_version_ids(self, keys): |
|
1832 |
return tuple(key[0] for key in keys) |
|
2592.3.6
by Robert Collins
Implement KnitGraphIndex.get_versions. |
1833 |
|
2592.3.10
by Robert Collins
Implement KnitGraphIndex.get_position. |
1834 |
def get_position(self, version_id): |
2592.3.71
by Robert Collins
Basic version of knit-based repository operating, many tests failing. |
1835 |
"""Return details needed to access the version. |
1836 |
|
|
1837 |
:return: a tuple (index, data position, size) to hand to the access
|
|
1838 |
logic to get the record.
|
|
1839 |
"""
|
|
1840 |
node = self._get_node(version_id) |
|
3218.1.1
by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries. |
1841 |
return self._node_to_position(node) |
1842 |
||
1843 |
def _node_to_position(self, node): |
|
1844 |
"""Convert an index value to position details.""" |
|
2592.3.71
by Robert Collins
Basic version of knit-based repository operating, many tests failing. |
1845 |
bits = node[2][1:].split(' ') |
1846 |
return node[0], int(bits[0]), int(bits[1]) |
|
2592.3.10
by Robert Collins
Implement KnitGraphIndex.get_position. |
1847 |
|
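The node value parsed by _node_to_position is built in add_versions as a one-character flag ('N' for no-eol, a space otherwise) followed by "pos size". A small round-trip sketch of that encoding follows; `encode_value` and `decode_value` are hypothetical helper names, not part of bzrlib.

```python
def encode_value(pos, size, no_eol):
    """Build the index value string: flag character, then 'pos size'."""
    flag = "N" if no_eol else " "
    return flag + "%d %d" % (pos, size)


def decode_value(value):
    """Recover (pos, size, no_eol) from an index value string."""
    no_eol = value[0] == "N"
    bits = value[1:].split(" ")  # drop the flag, then split the two integers
    return int(bits[0]), int(bits[1]), no_eol


encoded = encode_value(120, 45, True)
decoded = decode_value(encoded)
```

Because the flag always occupies exactly one character, slicing with `[1:]` is enough to reach the two integers regardless of which flag was written.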
2592.3.11
by Robert Collins
Implement KnitGraphIndex.get_method. |
1848 |
def get_method(self, version_id): |
1849 |
"""Return compression method of specified version.""" |
|
3218.1.1
by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries. |
1850 |
return self._get_method(self._get_node(version_id)) |
2592.3.11
by Robert Collins
Implement KnitGraphIndex.get_method. |
1851 |
|
1852 |
def _get_node(self, version_id): |
|
2592.3.97
by Robert Collins
Merge more bzr.dev, addressing some bugs. [still broken] |
1853 |
try: |
1854 |
return list(self._get_entries(self._version_ids_to_keys([version_id])))[0] |
|
1855 |
except IndexError: |
|
1856 |
raise RevisionNotPresent(version_id, self) |
|
2592.3.11
by Robert Collins
Implement KnitGraphIndex.get_method. |
1857 |
|
2592.3.14
by Robert Collins
Implement KnitGraphIndex.get_options. |
1858 |
def get_options(self, version_id): |
3052.2.2
by Robert Collins
* Operations pulling data from a smart server where the underlying |
1859 |
"""Return a list representing options. |
2592.3.14
by Robert Collins
Implement KnitGraphIndex.get_options. |
1860 |
|
3052.2.2
by Robert Collins
* Operations pulling data from a smart server where the underlying |
1861 |
e.g. ['foo', 'bar']
|
2592.3.14
by Robert Collins
Implement KnitGraphIndex.get_options. |
1862 |
"""
|
1863 |
node = self._get_node(version_id) |
|
3218.1.1
by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries. |
1864 |
options = [self._get_method(node)] |
2624.2.14
by Robert Collins
Add source index to the index iteration API to allow mapping back to the origin of retrieved data. |
1865 |
if node[2][0] == 'N': |
2592.3.14
by Robert Collins
Implement KnitGraphIndex.get_options. |
1866 |
options.append('no-eol') |
2658.2.1
by Robert Collins
Fix mismatch between KnitGraphIndex and KnitIndex in get_options. |
1867 |
return options |
2592.3.11
by Robert Collins
Implement KnitGraphIndex.get_method. |
1868 |
|
3287.5.5
by Robert Collins
Refactor internals of knit implementations to implement get_parents_with_ghosts in terms of get_parent_map. |
1869 |
def get_parent_map(self, version_ids): |
1870 |
"""Passed through to by KnitVersionedFile.get_parent_map.""" |
|
1871 |
nodes = self._get_entries(self._version_ids_to_keys(version_ids)) |
|
1872 |
result = {} |
|
1873 |
if self._parents: |
|
1874 |
for node in nodes: |
|
1875 |
result[node[1][0]] = self._keys_to_version_ids(node[3][0]) |
|
1876 |
else: |
|
1877 |
for node in nodes: |
|
1878 |
result[node[1][0]] = () |
|
1879 |
return result |
|
1880 |
||
2592.3.15
by Robert Collins
Implement KnitGraphIndex.get_parents/get_parents_with_ghosts. |
1881 |
def get_parents_with_ghosts(self, version_id): |
1882 |
"""Return parents of specified version with ghosts.""" |
|
3287.5.5
by Robert Collins
Refactor internals of knit implementations to implement get_parents_with_ghosts in terms of get_parent_map. |
1883 |
try: |
1884 |
return self.get_parent_map([version_id])[version_id] |
|
1885 |
except KeyError: |
|
1886 |
raise RevisionNotPresent(version_id, self) |
|
2592.3.15
by Robert Collins
Implement KnitGraphIndex.get_parents/get_parents_with_ghosts. |
1887 |
|
2592.3.16
by Robert Collins
Implement KnitGraphIndex.check_versions_present. |
1888 |
def check_versions_present(self, version_ids): |
1889 |
"""Check that all specified versions are present.""" |
|
2624.2.5
by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings. |
1890 |
keys = self._version_ids_to_keys(version_ids) |
1891 |
present = self._present_keys(keys) |
|
1892 |
missing = keys.difference(present) |
|
2592.3.16
by Robert Collins
Implement KnitGraphIndex.check_versions_present. |
1893 |
if missing: |
2592.3.19
by Robert Collins
Change KnitGraphIndex from returning data to performing a callback on insertions. |
1894 |
raise RevisionNotPresent(missing.pop(), self) |
2592.3.16
by Robert Collins
Implement KnitGraphIndex.check_versions_present. |
1895 |
|
2592.3.71
by Robert Collins
Basic version of knit-based repository operating, many tests failing. |
1896 |
def add_version(self, version_id, options, access_memo, parents): |
2592.3.17
by Robert Collins
Add add_version(s) to KnitGraphIndex, completing the required api for KnitVersionedFile. |
1897 |
"""Add a version record to the index.""" |
2592.3.71
by Robert Collins
Basic version of knit-based repository operating, many tests failing. |
1898 |
return self.add_versions(((version_id, options, access_memo, parents),)) |
2592.3.17
by Robert Collins
Add add_version(s) to KnitGraphIndex, completing the required api for KnitVersionedFile. |
1899 |
|
2841.2.1
by Robert Collins
* Commit no longer checks for new text keys during insertion when the |
1900 |
def add_versions(self, versions, random_id=False): |
2592.3.17
by Robert Collins
Add add_version(s) to KnitGraphIndex, completing the required api for KnitVersionedFile. |
1901 |
"""Add multiple versions to the index. |
1902 |
|
|
1903 |
This function does not insert data into the immutable GraphIndex
|
|
1904 |
backing the KnitGraphIndex. Instead, it prepares data for insertion by
|
|
2592.3.19
by Robert Collins
Change KnitGraphIndex from returning data to performing a callback on insertions. |
1905 |
the caller, checks that it is safe to insert, and then calls
|
1906 |
self._add_callback with the prepared GraphIndex nodes.
|
|
2592.3.17
by Robert Collins
Add add_version(s) to KnitGraphIndex, completing the required api for KnitVersionedFile. |
1907 |
|
1908 |
:param versions: a list of tuples:
|
|
1909 |
(version_id, options, access_memo, parents), where access_memo is (index, pos, size).
|
|
2841.2.1
by Robert Collins
* Commit no longer checks for new text keys during insertion when the |
1910 |
:param random_id: If True the ids being added were randomly generated
|
1911 |
and no check for existence will be performed.
|
|
2592.3.17
by Robert Collins
Add add_version(s) to KnitGraphIndex, completing the required api for KnitVersionedFile. |
1912 |
"""
|
2592.3.19
by Robert Collins
Change KnitGraphIndex from returning data to performing a callback on insertions. |
1913 |
if not self._add_callback: |
1914 |
raise errors.ReadOnlyError(self) |
|
2592.3.17
by Robert Collins
Add add_version(s) to KnitGraphIndex, completing the required api for KnitVersionedFile. |
1915 |
# we hope there are no repositories with inconsistent parentage
|
1916 |
# anymore.
|
|
1917 |
# check for dups
|
|
1918 |
||
1919 |
keys = {} |
|
2592.3.71
by Robert Collins
Basic version of knit-based repository operating, many tests failing. |
1920 |
for (version_id, options, access_memo, parents) in versions: |
2670.2.2
by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use |
1921 |
index, pos, size = access_memo |
2624.2.5
by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings. |
1922 |
key = (version_id, ) |
1923 |
parents = tuple((parent, ) for parent in parents) |
|
2592.3.17
by Robert Collins
Add add_version(s) to KnitGraphIndex, completing the required api for KnitVersionedFile. |
1924 |
if 'no-eol' in options: |
1925 |
value = 'N' |
|
1926 |
else: |
|
1927 |
value = ' ' |
|
1928 |
value += "%d %d" % (pos, size) |
|
2592.3.34
by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs. |
1929 |
if not self._deltas: |
2592.3.17
by Robert Collins
Add add_version(s) to KnitGraphIndex, completing the required api for KnitVersionedFile. |
1930 |
if 'line-delta' in options: |
1931 |
raise KnitCorrupt(self, "attempt to add line-delta in non-delta knit") |
|
2592.3.34
by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs. |
1932 |
if self._parents: |
1933 |
if self._deltas: |
|
1934 |
if 'line-delta' in options: |
|
2624.2.5
by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings. |
1935 |
node_refs = (parents, (parents[0],)) |
2592.3.34
by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs. |
1936 |
else: |
2624.2.5
by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings. |
1937 |
node_refs = (parents, ()) |
2592.3.34
by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs. |
1938 |
else: |
2624.2.5
by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings. |
1939 |
node_refs = (parents, ) |
2592.3.34
by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs. |
1940 |
else: |
1941 |
if parents: |
|
1942 |
raise KnitCorrupt(self, "attempt to add node with parents " |
|
1943 |
"in parentless index.") |
|
1944 |
node_refs = () |
|
2624.2.5
by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings. |
1945 |
keys[key] = (value, node_refs) |
2841.2.1
by Robert Collins
* Commit no longer checks for new text keys during insertion when the |
1946 |
if not random_id: |
1947 |
present_nodes = self._get_entries(keys) |
|
1948 |
for (index, key, value, node_refs) in present_nodes: |
|
1949 |
if (value, node_refs) != keys[key]: |
|
1950 |
raise KnitCorrupt(self, "inconsistent details in add_versions" |
|
1951 |
": %s %s" % ((value, node_refs), keys[key])) |
|
1952 |
del keys[key] |
|
2592.3.17
by Robert Collins
Add add_version(s) to KnitGraphIndex, completing the required api for KnitVersionedFile. |
1953 |
result = [] |
2592.3.34
by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs. |
1954 |
if self._parents: |
1955 |
for key, (value, node_refs) in keys.iteritems(): |
|
1956 |
result.append((key, value, node_refs)) |
|
1957 |
else: |
|
1958 |
for key, (value, node_refs) in keys.iteritems(): |
|
1959 |
result.append((key, value)) |
|
2592.3.19
by Robert Collins
Change KnitGraphIndex from returning data to performing a callback on insertions. |
1960 |
self._add_callback(result) |
2592.3.17
by Robert Collins
Add add_version(s) to KnitGraphIndex, completing the required api for KnitVersionedFile. |
1961 |
|
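The shape of node_refs chosen in add_versions depends on the two constructor flags. The sketch below reproduces only that branching; `build_node_refs` is a hypothetical name, and `ValueError` stands in for `KnitCorrupt`.

```python
def build_node_refs(parents, options, track_parents=True, deltas=False):
    """Return the reference lists stored alongside a node.

    parents: iterable of version ids; each becomes a 1-tuple key.
    """
    parent_keys = tuple((parent,) for parent in parents)
    if not deltas and "line-delta" in options:
        raise ValueError("attempt to add line-delta in non-delta knit")
    if not track_parents:
        if parent_keys:
            raise ValueError("attempt to add node with parents "
                             "in parentless index.")
        return ()
    if deltas:
        if "line-delta" in options:
            # second list names the compression parent: the first parent
            return (parent_keys, (parent_keys[0],))
        return (parent_keys, ())
    return (parent_keys,)


delta_refs = build_node_refs(["a", "b"], ["line-delta"], deltas=True)
plain_refs = build_node_refs(["a"], ["fulltext"])
no_refs = build_node_refs([], ["fulltext"], track_parents=False)
```

With deltas enabled there are always two reference lists, so a reader such as _compression_parent can index the second list without first checking how the node was written.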
2624.2.5
by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings. |
1962 |
def _version_ids_to_keys(self, version_ids): |
1963 |
return set((version_id, ) for version_id in version_ids) |
|
2592.3.66
by Robert Collins
Allow adaption of KnitData to pack files. |
1964 |
|
1965 |
||
1966 |
class _KnitAccess(object): |
|
1967 |
"""Access to knit records in a .knit file.""" |
|
1968 |
||
1969 |
def __init__(self, transport, filename, _file_mode, _dir_mode, |
|
1970 |
_need_to_create, _create_parent_dir): |
|
1971 |
"""Create a _KnitAccess for accessing and inserting data. |
|
1972 |
||
1973 |
:param transport: The transport the .knit is located on.
|
|
1974 |
:param filename: The filename of the .knit.
|
|
1975 |
"""
|
|
1976 |
self._transport = transport |
|
1977 |
self._filename = filename |
|
1978 |
self._file_mode = _file_mode |
|
1979 |
self._dir_mode = _dir_mode |
|
1980 |
self._need_to_create = _need_to_create |
|
1981 |
self._create_parent_dir = _create_parent_dir |
|
1982 |
||
1983 |
def add_raw_records(self, sizes, raw_data): |
|
1984 |
"""Add raw knit bytes to a storage area. |
|
1985 |
||
1986 |
The data is spooled to wherever the access method is storing data.
|
|
1987 |
||
1988 |
:param sizes: An iterable containing the size of each raw data segment.
|
|
1989 |
:param raw_data: A bytestring containing the data.
|
|
2670.2.2
by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use |
1990 |
:return: A list of memos to retrieve the record later. Each memo is a
|
1991 |
tuple - (index, pos, length), where the index field is always None
|
|
1992 |
for the .knit access method.
|
|
2592.3.66
by Robert Collins
Allow adaption of KnitData to pack files. |
1993 |
"""
|
1994 |
assert type(raw_data) == str, \ |
|
1995 |
'data must be plain bytes was %s' % type(raw_data) |
|
1996 |
if not self._need_to_create: |
|
1997 |
base = self._transport.append_bytes(self._filename, raw_data) |
|
1998 |
else: |
|
1999 |
self._transport.put_bytes_non_atomic(self._filename, raw_data, |
|
2000 |
create_parent_dir=self._create_parent_dir, |
|
2001 |
mode=self._file_mode, |
|
2002 |
dir_mode=self._dir_mode) |
|
2003 |
self._need_to_create = False |
|
2004 |
base = 0 |
|
2005 |
result = [] |
|
2006 |
for size in sizes: |
|
2592.3.71
by Robert Collins
Basic version of knit-based repository operating, many tests failing. |
2007 |
result.append((None, base, size)) |
2592.3.66
by Robert Collins
Allow adaption of KnitData to pack files. |
2008 |
base += size |
2009 |
return result |
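The memo layout returned by the .knit access method can be illustrated in isolation. The following is a minimal standalone sketch (Python 3, not part of bzrlib); the function name `memos_for_appended_records` is hypothetical, and `base` stands in for the offset reported by the transport's append.

```python
def memos_for_appended_records(base, sizes):
    """Return (index, pos, length) memos for records stored back to back.

    ``base`` is the offset at which the first record starts. The index
    field is None, as described for the single-file .knit access method.
    """
    result = []
    for size in sizes:
        result.append((None, base, size))
        # The next record starts immediately after this one.
        base += size
    return result


memos = memos_for_appended_records(100, [10, 25, 5])
```

Each memo carries enough information to read its record back with a single (pos, length) request.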
|
2010 |
||
2011 |
def create(self): |
|
2012 |
"""IFF this data access has its own storage area, initialise it. |
|
2013 |
||
2014 |
:return: None.
|
|
2015 |
"""
|
|
2016 |
self._transport.put_bytes_non_atomic(self._filename, '', |
|
2017 |
mode=self._file_mode) |
|
2018 |
||
2019 |
def open_file(self): |
|
2020 |
"""IFF this data access can be represented as a single file, open it. |
|
2021 |
||
2022 |
For knits that are not mapped to a single file on disk this will
|
|
2023 |
always return None.
|
|
2024 |
||
2025 |
:return: None or a file handle.
|
|
2026 |
"""
|
|
2027 |
try: |
|
2028 |
return self._transport.get(self._filename) |
|
2029 |
except NoSuchFile: |
|
2030 |
pass
|
|
2031 |
return None |
|
2032 |
||
2033 |
def get_raw_records(self, memos_for_retrieval): |
|
2034 |
"""Get the raw bytes for records. |
|
2035 |
||
2670.2.2
by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use |
2036 |
:param memos_for_retrieval: An iterable containing the (index, pos,
|
2037 |
length) memo for retrieving the bytes. The .knit method ignores
|
|
2038 |
the index as there is always only a single file.
|
|
2592.3.66
by Robert Collins
Allow adaption of KnitData to pack files. |
2039 |
:return: An iterator over the bytes of the records.
|
2040 |
"""
|
|
2592.3.71
by Robert Collins
Basic version of knit-based repository operating, many tests failing. |
2041 |
read_vector = [(pos, size) for (index, pos, size) in memos_for_retrieval] |
2042 |
for pos, data in self._transport.readv(self._filename, read_vector): |
|
2592.3.66
by Robert Collins
Allow adaption of KnitData to pack files. |
2043 |
yield data |
2044 |
||
2045 |
||
2046 |
class _PackAccess(object): |
|
2047 |
"""Access to knit records via a collection of packs.""" |
|
2048 |
||
2049 |
def __init__(self, index_to_packs, writer=None): |
|
2050 |
"""Create a _PackAccess object. |
|
2051 |
||
2052 |
:param index_to_packs: A dict mapping index objects to the transport
|
|
2053 |
and file names for obtaining data.
|
|
2054 |
:param writer: A tuple (pack.ContainerWriter, write_index) which
|
|
2670.2.3
by Robert Collins
Review feedback. |
2055 |
contains the pack to write, and the index that reads from it will
|
2056 |
be associated with.
|
|
2592.3.66
by Robert Collins
Allow adaption of KnitData to pack files. |
2057 |
"""
|
2058 |
if writer: |
|
2059 |
self.container_writer = writer[0] |
|
2060 |
self.write_index = writer[1] |
|
2061 |
else: |
|
2062 |
self.container_writer = None |
|
2063 |
self.write_index = None |
|
2064 |
self.indices = index_to_packs |
|
2065 |
||
2066 |
def add_raw_records(self, sizes, raw_data): |
|
2067 |
"""Add raw knit bytes to a storage area. |
|
2068 |
||
2670.2.3
by Robert Collins
Review feedback. |
2069 |
The data is spooled to the container writer in one bytes-record per
|
2592.3.66
by Robert Collins
Allow adaption of KnitData to pack files. |
2070 |
raw data item.
|
2071 |
||
2072 |
:param sizes: An iterable containing the size of each raw data segment.
|
|
2073 |
:param raw_data: A bytestring containing the data.
|
|
2670.2.2
by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use |
2074 |
:return: A list of memos to retrieve the record later. Each memo is a
|
2075 |
tuple - (index, pos, length), where the index field is the
|
|
2076 |
write_index object supplied to the PackAccess object.
|
|
2592.3.66
by Robert Collins
Allow adaption of KnitData to pack files. |
2077 |
"""
|
2078 |
assert type(raw_data) == str, \ |
|
2079 |
'data must be plain bytes was %s' % type(raw_data) |
|
2080 |
result = [] |
|
2081 |
offset = 0 |
|
2082 |
for size in sizes: |
|
2083 |
p_offset, p_length = self.container_writer.add_bytes_record( |
|
2084 |
raw_data[offset:offset+size], []) |
|
2085 |
offset += size |
|
2086 |
result.append((self.write_index, p_offset, p_length)) |
|
2087 |
return result |
|
2088 |
||
2089 |
def create(self): |
|
2090 |
"""Pack based knits do not get individually created.""" |
|
2091 |
||
2092 |
def get_raw_records(self, memos_for_retrieval): |
|
2093 |
"""Get the raw bytes for records. |
|
2094 |
||
2670.2.2
by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use |
2095 |
:param memos_for_retrieval: An iterable containing the (index, pos,
|
2096 |
length) memo for retrieving the bytes. The Pack access method
|
|
2097 |
looks up the pack to use for a given record in its index_to_packs
|
|
2098 |
map.
|
|
2592.3.66
by Robert Collins
Allow adaption of KnitData to pack files. |
2099 |
:return: An iterator over the bytes of the records.
|
2100 |
"""
|
|
2101 |
# first pass, group into same-index requests
|
|
2102 |
request_lists = [] |
|
2103 |
current_index = None |
|
2104 |
for (index, offset, length) in memos_for_retrieval: |
|
2105 |
if current_index == index: |
|
2106 |
current_list.append((offset, length)) |
|
2107 |
else: |
|
2108 |
if current_index is not None: |
|
2109 |
request_lists.append((current_index, current_list)) |
|
2110 |
current_index = index |
|
2111 |
current_list = [(offset, length)] |
|
2112 |
# handle the last entry
|
|
2113 |
if current_index is not None: |
|
2114 |
request_lists.append((current_index, current_list)) |
|
2115 |
for index, offsets in request_lists: |
|
2116 |
transport, path = self.indices[index] |
|
2117 |
reader = pack.make_readv_reader(transport, path, offsets) |
|
2118 |
for names, read_func in reader.iter_records(): |
|
2119 |
yield read_func(None) |
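The first pass of the method above groups consecutive memos that share an index, so each pack is read with one vectored request per run. A minimal standalone sketch of that grouping (Python 3, not bzrlib code) follows; the function name `group_consecutive_memos` is hypothetical, and plain strings stand in for index objects. Unlike the listing, this variant also tolerates an index of None.

```python
def group_consecutive_memos(memos):
    """Group consecutive (index, offset, length) memos by index.

    Returns a list of (index, [(offset, length), ...]) pairs. Runs are
    kept in request order, so an index may appear more than once.
    """
    groups = []
    for index, offset, length in memos:
        if groups and groups[-1][0] == index:
            # Same index as the previous memo: extend the current run.
            groups[-1][1].append((offset, length))
        else:
            # A different index starts a new run.
            groups.append((index, [(offset, length)]))
    return groups


groups = group_consecutive_memos(
    [("a", 0, 10), ("a", 10, 5), ("b", 0, 7), ("a", 15, 3)])
```

Keeping runs in request order means the records are yielded in the same order the caller asked for them.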
|
2120 |
||
2121 |
def open_file(self): |
|
2122 |
"""Pack based knits have no single file.""" |
|
2123 |
return None |
|
2124 |
||
2592.3.70
by Robert Collins
Allow setting a writer after creating a knit._PackAccess object. |
2125 |
def set_writer(self, writer, index, (transport, packname)): |
2126 |
"""Set a writer to use for adding data.""" |
|
2592.3.208
by Robert Collins
Start refactoring the knit-pack thunking to be clearer. |
2127 |
if index is not None: |
2128 |
self.indices[index] = (transport, packname) |
|
2592.3.70
by Robert Collins
Allow setting a writer after creating a knit._PackAccess object. |
2129 |
self.container_writer = writer |
2130 |
self.write_index = index |
|
2131 |
||
2592.3.66
by Robert Collins
Allow adaption of KnitData to pack files. |
2132 |
|
3052.2.2
by Robert Collins
* Operations pulling data from a smart server where the underlying |
2133 |
class _StreamAccess(object): |
3052.2.3
by Robert Collins
Handle insert_data_stream of an unannotated stream into an annotated knit. |
2134 |
"""A Knit Access object that provides data from a datastream. |
2135 |
|
|
2136 |
It also provides a fallback to present as unannotated data, annotated data
|
|
2137 |
from a *backing* access object.
|
|
2138 |
||
2139 |
This is triggered by an index_memo which is pointing to a different index
|
|
2140 |
than this was constructed with, and is used to allow extracting full
|
|
2141 |
unannotated texts for insertion into annotated knits.
|
|
2142 |
"""
|
|
2143 |
||
2144 |
def __init__(self, reader_callable, stream_index, backing_knit, |
|
2145 |
orig_factory): |
|
3052.2.2
by Robert Collins
* Operations pulling data from a smart server where the underlying |
2146 |
"""Create a _StreamAccess object. |
2147 |
||
2148 |
:param reader_callable: The reader_callable from the datastream.
|
|
2149 |
This is called to buffer all the data immediately, for
|
|
2150 |
random access.
|
|
3052.2.3
by Robert Collins
Handle insert_data_stream of an unannotated stream into an annotated knit. |
2151 |
:param stream_index: The index of the data stream this provides access to,
|
2152 |
which will be present in native index_memo's.
|
|
2153 |
:param backing_knit: The knit object that will provide access to
|
|
2154 |
annotated texts which are not available in the stream, so as to
|
|
2155 |
create unannotated texts.
|
|
2156 |
:param orig_factory: The original content factory used to generate the
|
|
2157 |
stream. This is used for checking whether the thunk code for
|
|
2158 |
supporting _copy_texts will generate the correct form of data.
|
|
3052.2.2
by Robert Collins
* Operations pulling data from a smart server where the underlying |
2159 |
"""
|
2160 |
self.data = reader_callable(None) |
|
3052.2.3
by Robert Collins
Handle insert_data_stream of an unannotated stream into an annotated knit. |
2161 |
self.stream_index = stream_index |
2162 |
self.backing_knit = backing_knit |
|
2163 |
self.orig_factory = orig_factory |
|
3052.2.2
by Robert Collins
* Operations pulling data from a smart server where the underlying |
2164 |
|
2165 |
def get_raw_records(self, memos_for_retrieval): |
|
2166 |
"""Get the raw bytes for records. |
|
2167 |
||
3287.12.1
by Martin Pool
#2008418: (with spiv) Avoid interpreting fulltexts as line deltas when pulling knits. |
2168 |
:param memos_for_retrieval: An iterable of memos from the
|
2169 |
_StreamIndex object identifying bytes to read; for these classes
|
|
2170 |
they are (from_backing_knit, index, start, end) and can point to
|
|
2171 |
either the backing knit or streamed data.
|
|
2172 |
:return: An iterator yielding a byte string for each record in
|
|
2173 |
memos_for_retrieval.
|
|
3052.2.2
by Robert Collins
* Operations pulling data from a smart server where the underlying |
2174 |
"""
|
2175 |
# use a generator for memory friendliness
|
|
3287.12.1
by Martin Pool
#2008418: (with spiv) Avoid interpreting fulltexts as line deltas when pulling knits. |
2176 |
for from_backing_knit, version_id, start, end in memos_for_retrieval: |
2177 |
if not from_backing_knit: |
|
2178 |
assert version_id is self.stream_index |
|
3052.2.3
by Robert Collins
Handle insert_data_stream of an unannotated stream into an annotated knit. |
2179 |
yield self.data[start:end] |
2180 |
continue
|
|
2181 |
# we have been asked to thunk. This thunking only occurs when
|
|
2182 |
# we are obtaining plain texts from an annotated backing knit
|
|
2183 |
# so that _copy_texts will work.
|
|
2184 |
# We could improve performance here by scanning for where we need
|
|
2185 |
# to do this and using get_line_list, then interleaving the output
|
|
2186 |
# as desired. However, for now, this is sufficient.
|
|
3052.2.5
by Andrew Bennetts
Address the rest of the review comments from John and myself. |
2187 |
if self.orig_factory.__class__ != KnitPlainFactory: |
2188 |
raise errors.KnitCorrupt( |
|
3287.12.1
by Martin Pool
#2008418: (with spiv) Avoid interpreting fulltexts as line deltas when pulling knits. |
2189 |
self, 'Bad thunk request %r cannot be backed by %r' % |
2190 |
(version_id, self.orig_factory)) |
|
3052.2.3
by Robert Collins
Handle insert_data_stream of an unannotated stream into an annotated knit. |
2191 |
lines = self.backing_knit.get_lines(version_id) |
2192 |
line_bytes = ''.join(lines) |
|
2193 |
digest = sha_string(line_bytes) |
|
3287.12.1
by Martin Pool
#2008418: (with spiv) Avoid interpreting fulltexts as line deltas when pulling knits. |
2194 |
# the packed form of the fulltext always has a trailing newline,
|
2195 |
# even if the actual text does not, unless the file is empty. the
|
|
2196 |
# record options including the noeol flag are passed through by
|
|
2197 |
# _StreamIndex, so this is safe.
|
|
3052.2.3
by Robert Collins
Handle insert_data_stream of an unannotated stream into an annotated knit. |
2198 |
if lines: |
2199 |
if lines[-1][-1] != '\n': |
|
2200 |
lines[-1] = lines[-1] + '\n' |
|
2201 |
line_bytes += '\n' |
|
2202 |
# We want plain data, because we expect to thunk only to allow text
|
|
2203 |
# extraction.
|
|
2204 |
size, bytes = self.backing_knit._data._record_to_data(version_id, |
|
2205 |
digest, lines, line_bytes) |
|
2206 |
yield bytes |
|
3052.2.2
by Robert Collins
* Operations pulling data from a smart server where the underlying |
2207 |
|
2208 |
||
2209 |
class _StreamIndex(object): |
|
2210 |
"""A Knit Index object that uses the data map from a datastream.""" |
|
2211 |
||
3224.1.8
by John Arbash Meinel
Add noeol to the return signature of get_build_details. |
2212 |
def __init__(self, data_list, backing_index): |
3052.2.2
by Robert Collins
* Operations pulling data from a smart server where the underlying |
2213 |
"""Create a _StreamIndex object. |
2214 |
||
2215 |
:param data_list: The data_list from the datastream.
|
|
3224.1.8
by John Arbash Meinel
Add noeol to the return signature of get_build_details. |
2216 |
:param backing_index: The index which will supply values for nodes
|
2217 |
referenced outside of this stream.
|
|
3052.2.2
by Robert Collins
* Operations pulling data from a smart server where the underlying |
2218 |
"""
|
2219 |
self.data_list = data_list |
|
3224.1.8
by John Arbash Meinel
Add noeol to the return signature of get_build_details. |
2220 |
self.backing_index = backing_index |
3052.2.2
by Robert Collins
* Operations pulling data from a smart server where the underlying |
2221 |
self._by_version = {} |
2222 |
pos = 0 |
|
2223 |
for key, options, length, parents in data_list: |
|
2224 |
self._by_version[key] = options, (pos, pos + length), parents |
|
2225 |
pos += length |
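The constructor above turns the stream's data_list into slice coordinates over one buffered byte string. A minimal standalone sketch of that bookkeeping (Python 3, not bzrlib code) is shown here; the function name `build_by_version` and the sample version ids are hypothetical.

```python
def build_by_version(data_list):
    """Map each key to (options, (start, end), parents).

    Records are assumed to be laid out back to back in the stream, so
    each record starts where the previous one ended.
    """
    by_version = {}
    pos = 0
    for key, options, length, parents in data_list:
        by_version[key] = (options, (pos, pos + length), parents)
        pos += length
    return by_version


data_list = [
    ("rev-1", ("fulltext",), 40, ()),
    ("rev-2", ("line-delta",), 15, ("rev-1",)),
]
by_version = build_by_version(data_list)

# A fake stream buffer: 40 bytes for rev-1 followed by 15 bytes for rev-2.
stream = b"x" * 40 + b"y" * 15
start, end = by_version["rev-2"][1]
rev2_bytes = stream[start:end]
```

The (start, end) pair is what a matching access object would use to slice the buffered data.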
|
2226 |
||
2227 |
def get_ancestry(self, versions, topo_sorted): |
|
2228 |
"""Get an ancestry list for versions.""" |
|
2229 |
if topo_sorted: |
|
2230 |
# Not needed for basic joins
|
|
2231 |
raise NotImplementedError(self.get_ancestry) |
|
2232 |
# get a graph of all the mentioned versions:
|
|
2233 |
# Little ugly - basically copied from KnitIndex, but don't want to
|
|
2234 |
# accidentally incorporate too much of that index's code.
|
|
3052.2.4
by Andrew Bennetts
Some tweaks suggested by John's review. |
2235 |
ancestry = set() |
3052.2.2
by Robert Collins
* Operations pulling data from a smart server where the underlying |
2236 |
pending = set(versions) |
2237 |
cache = self._by_version |
|
2238 |
while pending: |
|
2239 |
version = pending.pop() |
|
2240 |
# trim ghosts
|
|
2241 |
try: |
|
2242 |
parents = [p for p in cache[version][2] if p in cache] |
|
2243 |
except KeyError: |
|
2244 |
raise RevisionNotPresent(version, self) |
|
2245 |
# if not completed and not a ghost
|
|
3052.2.4
by Andrew Bennetts
Some tweaks suggested by John's review. |
2246 |
pending.update([p for p in parents if p not in ancestry]) |
2247 |
ancestry.add(version) |
|
2248 |
return list(ancestry) |
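The unsorted ancestry walk above can be sketched on a plain dict. This is a minimal standalone illustration (Python 3, not bzrlib code): `unsorted_ancestry` and `parent_map` are hypothetical names, and KeyError stands in for RevisionNotPresent.

```python
def unsorted_ancestry(versions, parent_map):
    """Return the set of versions reachable from ``versions``.

    Parents that are absent from ``parent_map`` (ghosts) are trimmed.
    Asking for a version that is itself absent raises KeyError.
    """
    ancestry = set()
    pending = set(versions)
    while pending:
        version = pending.pop()
        if version not in parent_map:
            raise KeyError(version)
        # Trim ghosts: only follow parents that are actually present.
        parents = [p for p in parent_map[version] if p in parent_map]
        pending.update(p for p in parents if p not in ancestry)
        ancestry.add(version)
    return ancestry


parent_map = {"a": (), "b": ("a", "ghost"), "c": ("b",)}
ancestry = unsorted_ancestry(["c"], parent_map)
```

The result is a set because no topological order is promised, which matches the listing raising NotImplementedError when topo_sorted is requested.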
|
3052.2.2
by Robert Collins
* Operations pulling data from a smart server where the underlying |
2249 |
|
3218.1.1
by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries. |
2250 |
def get_build_details(self, version_ids): |
2251 |
"""Get the method, index_memo and compression parent for version_ids. |
|
2252 |
||
3224.1.29
by John Arbash Meinel
Properly handle annotating when ghosts are present. |
2253 |
Ghosts are omitted from the result.
|
2254 |
||
3218.1.1
by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries. |
2255 |
:param version_ids: An iterable of version_ids.
|
3224.1.18
by John Arbash Meinel
Cleanup documentation |
2256 |
:return: A dict of version_id:(index_memo, compression_parent,
|
2257 |
parents, record_details).
|
|
2258 |
index_memo
|
|
3287.12.1
by Martin Pool
#2008418: (with spiv) Avoid interpreting fulltexts as line deltas when pulling knits. |
2259 |
opaque memo that can be passed to _StreamAccess.get_raw_records
|
2260 |
to extract the raw data; for these classes it is
|
|
2261 |
(from_backing_knit, index, start, end)
|
|
3224.1.18
by John Arbash Meinel
Cleanup documentation |
2262 |
compression_parent
|
2263 |
Content that this record is built upon, may be None
|
|
2264 |
parents
|
|
2265 |
Logical parents of this node
|
|
2266 |
record_details
|
|
2267 |
extra information about the content which needs to be passed to
|
|
2268 |
Factory.parse_record
|
|
3218.1.1
by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries. |
2269 |
"""
|
2270 |
result = {} |
|
2271 |
for version_id in version_ids: |
|
3224.1.29
by John Arbash Meinel
Properly handle annotating when ghosts are present. |
2272 |
try: |
2273 |
method = self.get_method(version_id) |
|
2274 |
except errors.RevisionNotPresent: |
|
2275 |
# ghosts are omitted
|
|
2276 |
continue
|
|
3224.1.7
by John Arbash Meinel
_StreamIndex also needs to return the proper values for get_build_details. |
2277 |
parent_ids = self.get_parents_with_ghosts(version_id) |
3224.1.8
by John Arbash Meinel
Add noeol to the return signature of get_build_details. |
2278 |
noeol = ('no-eol' in self.get_options(version_id)) |
3287.12.1
by Martin Pool
#2008418: (with spiv) Avoid interpreting fulltexts as line deltas when pulling knits. |
2279 |
index_memo = self.get_position(version_id) |
2280 |
from_backing_knit = index_memo[0] |
|
2281 |
if from_backing_knit: |
|
2282 |
# texts retrieved from the backing knit are always full texts
|
|
2283 |
method = 'fulltext' |
|
3218.1.1
by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries. |
2284 |
if method == 'fulltext': |
2285 |
compression_parent = None |
|
2286 |
else: |
|
3224.1.7
by John Arbash Meinel
_StreamIndex also needs to return the proper values for get_build_details. |
2287 |
compression_parent = parent_ids[0] |
3224.1.14
by John Arbash Meinel
Switch to making content_details opaque, step 1 |
2288 |
result[version_id] = (index_memo, compression_parent, |
2289 |
parent_ids, (method, noeol)) |
|
3218.1.1
by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries. |
2290 |
return result |
2291 |
||
3052.2.3
by Robert Collins
Handle insert_data_stream of an unannotated stream into an annotated knit. |
2292 |
def get_method(self, version_id): |
2293 |
"""Return compression method of specified version.""" |
|
3287.12.1
by Martin Pool
#2008418: (with spiv) Avoid interpreting fulltexts as line deltas when pulling knits. |
2294 |
options = self.get_options(version_id) |
3052.2.3
by Robert Collins
Handle insert_data_stream of an unannotated stream into an annotated knit. |
2295 |
if 'fulltext' in options: |
2296 |
return 'fulltext' |
|
2297 |
elif 'line-delta' in options: |
|
2298 |
return 'line-delta' |
|
2299 |
else: |
|
2300 |
raise errors.KnitIndexUnknownMethod(self, options) |
|
2301 |
||
3052.2.2
by Robert Collins
* Operations pulling data from a smart server where the underlying |
2302 |
def get_options(self, version_id): |
3052.2.5
by Andrew Bennetts
Address the rest of the review comments from John and myself. |
2303 |
"""Return a list representing options. |
3052.2.2
by Robert Collins
* Operations pulling data from a smart server where the underlying |
2304 |
|
2305 |
e.g. ['foo', 'bar']
|
|
2306 |
"""
|
|
3224.1.8
by John Arbash Meinel
Add noeol to the return signature of get_build_details. |
2307 |
try: |
2308 |
return self._by_version[version_id][0] |
|
2309 |
except KeyError: |
|
3287.7.6
by Andrew Bennetts
Tweaks suggested by Robert's review. |
2310 |
options = list(self.backing_index.get_options(version_id)) |
2311 |
if 'fulltext' in options: |
|
3287.12.1
by Martin Pool
#2008418: (with spiv) Avoid interpreting fulltexts as line deltas when pulling knits. |
2312 |
pass
|
3287.7.6
by Andrew Bennetts
Tweaks suggested by Robert's review. |
2313 |
elif 'line-delta' in options: |
3287.12.1
by Martin Pool
#2008418: (with spiv) Avoid interpreting fulltexts as line deltas when pulling knits. |
2314 |
# Texts from the backing knit are always returned from the stream
|
2315 |
# as full texts
|
|
3287.7.6
by Andrew Bennetts
Tweaks suggested by Robert's review. |
2316 |
options.remove('line-delta') |
2317 |
options.append('fulltext') |
|
3287.12.1
by Martin Pool
#2008418: (with spiv) Avoid interpreting fulltexts as line deltas when pulling knits. |
2318 |
else: |
2319 |
raise errors.KnitIndexUnknownMethod(self, options) |
|
3287.7.6
by Andrew Bennetts
Tweaks suggested by Robert's review. |
2320 |
return tuple(options) |
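The fallback branch above rewrites options from the backing index so that every record served from the backing knit is reported as a full text. A minimal standalone sketch of that rewrite (Python 3, not bzrlib code) follows; `options_as_fulltext` is a hypothetical name and ValueError stands in for KnitIndexUnknownMethod.

```python
def options_as_fulltext(options):
    """Return ``options`` with any 'line-delta' replaced by 'fulltext'."""
    options = list(options)
    if "fulltext" in options:
        pass  # already a full text, nothing to rewrite
    elif "line-delta" in options:
        options.remove("line-delta")
        options.append("fulltext")
    else:
        raise ValueError("unknown compression method in %r" % (options,))
    return tuple(options)


rewritten = options_as_fulltext(("line-delta", "no-eol"))
```

Other flags, such as no-eol, pass through unchanged, so the reader still knows whether to strip the trailing newline.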
3052.2.2
by Robert Collins
* Operations pulling data from a smart server where the underlying |
2321 |
|
3287.5.5
by Robert Collins
Refactor internals of knit implementations to implement get_parents_with_ghosts in terms of get_parent_map. |
2322 |
def get_parent_map(self, version_ids): |
2323 |
"""KnitVersionedFile.get_parent_map passes through to this method.""" |
|
2324 |
result = {} |
|
2325 |
pending_ids = set() |
|
2326 |
for version_id in version_ids: |
|
2327 |
try: |
|
2328 |
result[version_id] = self._by_version[version_id][2] |
|
2329 |
except KeyError: |
|
2330 |
pending_ids.add(version_id) |
|
2331 |
result.update(self.backing_index.get_parent_map(pending_ids)) |
|
2332 |
return result |
|
2333 |
||
3052.2.2
by Robert Collins
* Operations pulling data from a smart server where the underlying |
2334 |
def get_parents_with_ghosts(self, version_id): |
2335 |
"""Return parents of specified version with ghosts.""" |
|
3224.1.7
by John Arbash Meinel
_StreamIndex also needs to return the proper values for get_build_details. |
2336 |
try: |
3287.5.5
by Robert Collins
Refactor internals of knit implementations to implement get_parents_with_ghosts in terms of get_parent_map. |
2337 |
return self.get_parent_map([version_id])[version_id] |
3224.1.7
by John Arbash Meinel
_StreamIndex also needs to return the proper values for get_build_details. |
2338 |
except KeyError: |
3287.5.5
by Robert Collins
Refactor internals of knit implementations to implement get_parents_with_ghosts in terms of get_parent_map. |
2339 |
raise RevisionNotPresent(version_id, self) |
3052.2.2
by Robert Collins
* Operations pulling data from a smart server where the underlying |
2340 |
|
2341 |
def get_position(self, version_id): |
|
2342 |
"""Return details needed to access the version. |
|
2343 |
|
|
2344 |
_StreamAccess has the data as a big array, so we return slice
|
|
3052.2.5
by Andrew Bennetts
Address the rest of the review comments from John and myself. |
2345 |
coordinates into that (as index_memo's are opaque outside the
|
2346 |
index and matching access class).
|
|
3052.2.2
by Robert Collins
* Operations pulling data from a smart server where the underlying |
2347 |
|
3287.12.1
by Martin Pool
#2008418: (with spiv) Avoid interpreting fulltexts as line deltas when pulling knits. |
2348 |
:return: a tuple (from_backing_knit, index, start, end) that can
|
2349 |
be passed e.g. to get_raw_records.
|
|
2350 |
If from_backing_knit is False, index will be self, otherwise it
|
|
2351 |
will be a version id.
|
|
3052.2.2
by Robert Collins
* Operations pulling data from a smart server where the underlying |
2352 |
"""
|
3052.2.3
by Robert Collins
Handle insert_data_stream of an unannotated stream into an annotated knit. |
2353 |
try: |
2354 |
start, end = self._by_version[version_id][1] |
|
3052.2.5
by Andrew Bennetts
Address the rest of the review comments from John and myself. |
2355 |
return False, self, start, end |
3052.2.3
by Robert Collins
Handle insert_data_stream of an unannotated stream into an annotated knit. |
2356 |
except KeyError: |
2357 |
# Signal to the access object to handle this from the backing knit.
|
|
3052.2.5
by Andrew Bennetts
Address the rest of the review comments from John and myself. |
2358 |
return (True, version_id, None, None) |
3052.2.2
by Robert Collins
* Operations pulling data from a smart server where the underlying |
2359 |
|
2360 |
def get_versions(self): |
|
2361 |
"""Get all the versions in the stream.""" |
|
2362 |
return self._by_version.keys() |
|
2363 |
||
2364 |
||
2592.3.66
by Robert Collins
Allow adaption of KnitData to pack files. |
2365 |
class _KnitData(object): |
2670.2.2
by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use |
2366 |
"""Manage extraction of data from a KnitAccess, caching and decompressing. |
2367 |
|
|
2368 |
The KnitData class provides the logic for parsing and using knit records,
|
|
2369 |
making use of an access method for the low level read and write operations.
|
|
2370 |
"""
|
|
2592.3.66
by Robert Collins
Allow adaption of KnitData to pack files. |
2371 |
|
2372 |
def __init__(self, access): |
|
2373 |
"""Create a KnitData object. |
|
2374 |
||
2375 |
:param access: The access method to use. Access methods such as
|
|
2376 |
_KnitAccess manage the insertion of raw records and the subsequent
|
|
2377 |
retrieval of the same.
|
|
2378 |
"""
|
|
2379 |
self._access = access |
|
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
2380 |
self._checked = False |
2381 |
||
2382 |
def _open_file(self): |
|
2592.3.66
by Robert Collins
Allow adaption of KnitData to pack files. |
2383 |
return self._access.open_file() |
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
2384 |
|
2888.1.1
by Robert Collins
(robertc) Use prejoined content for knit storage when performing a full-text store of unannotated content. (Robert Collins) |
2385 |
def _record_to_data(self, version_id, digest, lines, dense_lines=None): |
1596.2.8
by Robert Collins
Join knits with the original gzipped data avoiding recompression. |
2386 |
"""Convert version_id, digest, lines into a raw data block. |
2387 |
|
|
2888.1.2
by Robert Collins
Cleanup the dense_lines parameter docstring to be more useful. |
2388 |
:param dense_lines: The bytes of lines but in a denser form. For
|
2389 |
instance, if lines is a list of 1000 bytestrings each ending in \n,
|
|
2390 |
dense_lines may be a list with one line in it, containing all the
|
|
2391 |
1000 lines and their \n's. Using dense_lines if it is already
|
|
2392 |
known is a win because the string join to create bytes in this
|
|
2393 |
function spends less time resizing the final string.
|
|
1596.2.8
by Robert Collins
Join knits with the original gzipped data avoiding recompression. |
2394 |
:return: (len, bytes) where bytes is the gzip-compressed raw record.
|
2395 |
"""
|
|
2888.1.1
by Robert Collins
(robertc) Use prejoined content for knit storage when performing a full-text store of unannotated content. (Robert Collins) |
2396 |
# Note: using a string copy here increases memory pressure with e.g.
|
2397 |
# ISOs, but it is about 3 seconds faster on a 1.2GHz Intel machine
|
|
2398 |
# when doing the initial commit of a mozilla tree. RBC 20070921
|
|
2399 |
bytes = ''.join(chain( |
|
2249.5.15
by John Arbash Meinel
remove get_cached_utf8 checks which were slowing things down. |
2400 |
["version %s %d %s\n" % (version_id, |
1596.2.28
by Robert Collins
more knit profile based tuning. |
2401 |
len(lines), |
2402 |
digest)], |
|
2888.1.1
by Robert Collins
(robertc) Use prejoined content for knit storage when performing a full-text store of unannotated content. (Robert Collins) |
2403 |
dense_lines or lines, |
2404 |
["end %s\n" % version_id])) |
|
2817.3.1
by Robert Collins
* New helper ``bzrlib.tuned_gzip.bytes_to_gzip`` which takes a byte string |
2405 |
assert bytes.__class__ == str |
2406 |
compressed_bytes = bytes_to_gzip(bytes) |
|
2407 |
return len(compressed_bytes), compressed_bytes |
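The record framing built above (a "version" header line, the content lines, then an "end" line, all gzip-compressed) can be reproduced with the standard library. This is a minimal standalone sketch in Python 3, not bzrlib code: `record_to_data` here is a hypothetical free function, and `gzip.compress` is swapped in for the `bytes_to_gzip` helper used in the listing.

```python
import gzip


def record_to_data(version_id, digest, lines):
    """Frame ``lines`` as a knit-style record and gzip it.

    All arguments are bytes (``lines`` is a list of bytes lines).
    Returns (compressed_length, compressed_bytes).
    """
    text = b"".join(
        [b"version %s %d %s\n" % (version_id, len(lines), digest)]
        + list(lines)
        + [b"end %s\n" % version_id])
    compressed = gzip.compress(text)
    return len(compressed), compressed


size, data = record_to_data(b"rev-1", b"abc123", [b"hello\n", b"world\n"])
plain = gzip.decompress(data)
```

The returned length is the size of the compressed bytes, which is the value an access method needs when building its (index, pos, length) memo.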
|
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
2408 |
|
2592.3.66
by Robert Collins
Allow adaption of KnitData to pack files. |
2409 |
def add_raw_records(self, sizes, raw_data): |
1692.4.1
by Robert Collins
Multiple merges: |
2410 |
"""Append a prepared record to the data file. |
2329.1.2
by John Arbash Meinel
Remove some spurious whitespace changes. |
2411 |
|
2592.3.66
by Robert Collins
Allow adaption of KnitData to pack files. |
2412 |
:param sizes: An iterable containing the size of each raw data segment.
|
2413 |
:param raw_data: A bytestring containing the data.
|
|
2414 |
:return: a list of index data for the way the data was stored.
|
|
2415 |
See the access method add_raw_records documentation for more
|
|
2416 |
details.
|
|
1692.4.1
by Robert Collins
Multiple merges: |
2417 |
"""
|
2592.3.66
by Robert Collins
Allow adaption of KnitData to pack files. |
2418 |
return self._access.add_raw_records(sizes, raw_data) |
2329.1.2
by John Arbash Meinel
Remove some spurious whitespace changes. |
2419 |
|
1596.2.8
by Robert Collins
Join knits with the original gzipped data avoiding recompression. |
2420 |
def _parse_record_header(self, version_id, raw_data): |
2421 |
"""Parse a record header for consistency. |
|
2422 |
||
2423 |
:return: the header and the decompressor stream.
|
|
2424 |
as (stream, header_record)
|
|
2425 |
"""
|
|
2426 |
df = GzipFile(mode='rb', fileobj=StringIO(raw_data)) |
|
2329.1.1
by John Arbash Meinel
Update _KnitData parser to raise more helpful errors when it detects corruption. |
2427 |
try: |
2428 |
rec = self._check_header(version_id, df.readline()) |
|
2358.3.4
by Martin Pool
Fix mangled knit.py changes |
2429 |
except Exception, e: |
2592.3.66
by Robert Collins
Allow adaption of KnitData to pack files. |
2430 |
raise KnitCorrupt(self._access, |
2329.1.1
by John Arbash Meinel
Update _KnitData parser to raise more helpful errors when it detects corruption. |
2431 |
"While reading {%s} got %s(%s)" |
2432 |
% (version_id, e.__class__.__name__, str(e))) |
|
2358.3.4
by Martin Pool
Fix mangled knit.py changes |
2433 |
return df, rec |
2163.2.4
by John Arbash Meinel
Split _KnitData._parse_header up, so that we have 1 readlines() call, rather than readline+readlines() |
2434 |
|
2358.3.4
by Martin Pool
Fix mangled knit.py changes |
2435 |
def _check_header(self, version_id, line): |
2436 |
rec = line.split() |
|
2437 |
if len(rec) != 4: |
|
2438 |
raise KnitCorrupt(self._access, |
|
2163.2.4
by John Arbash Meinel
Split _KnitData._parse_header up, so that we have 1 readlines() call, rather than readline+readlines() |
2439 |
'unexpected number of elements in record header') |
2249.5.12
by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8 |
2440 |
if rec[1] != version_id: |
2592.3.66
by Robert Collins
Allow adaption of KnitData to pack files. |
2441 |
raise KnitCorrupt(self._access, |
2163.2.4
by John Arbash Meinel
Split _KnitData._parse_header up, so that we have 1 readlines() call, rather than readline+readlines() |
2442 |
'unexpected version, wanted %r, got %r' |
2443 |
% (version_id, rec[1])) |
|
2444 |
return rec |
|
1596.2.8
by Robert Collins
Join knits with the original gzipped data avoiding recompression. |
2445 |
|
2446 |
def _parse_record(self, version_id, data): |
|
1628.1.2
by Robert Collins
More knit micro-optimisations. |
2447 |
# profiling notes:
|
2448 |
# 4168 calls in 2880 217 internal
|
|
2449 |
# 4168 calls to _parse_record_header in 2121
|
|
2450 |
# 4168 calls to readlines in 330
|
|
2163.2.4
by John Arbash Meinel
Split _KnitData._parse_header up, so that we have 1 readlines() call, rather than readline+readlines() |
2451 |
df = GzipFile(mode='rb', fileobj=StringIO(data)) |
2452 |
||
2329.1.1
by John Arbash Meinel
Update _KnitData parser to raise more helpful errors when it detects corruption. |
2453 |
try: |
2454 |
record_contents = df.readlines() |
|
2358.3.4
by Martin Pool
Fix mangled knit.py changes |
2455 |
except Exception, e: |
2592.3.66
by Robert Collins
Allow adaption of KnitData to pack files. |
2456 |
raise KnitCorrupt(self._access, |
2329.1.1
by John Arbash Meinel
Update _KnitData parser to raise more helpful errors when it detects corruption. |
2457 |
"While reading {%s} got %s(%s)" |
2458 |
% (version_id, e.__class__.__name__, str(e))) |
|
2163.2.4
by John Arbash Meinel
Split _KnitData._parse_header up, so that we have 1 readlines() call, rather than readline+readlines() |
2459 |
header = record_contents.pop(0) |
2460 |
rec = self._check_header(version_id, header) |
|
2461 |
||
2462 |
last_line = record_contents.pop() |
|
2329.1.1
by John Arbash Meinel
Update _KnitData parser to raise more helpful errors when it detects corruption. |
2463 |
if len(record_contents) != int(rec[2]): |
2592.3.66
by Robert Collins
Allow adaption of KnitData to pack files. |
2464 |
raise KnitCorrupt(self._access, |
2329.1.1
by John Arbash Meinel
Update _KnitData parser to raise more helpful errors when it detects corruption. |
2465 |
'incorrect number of lines %s != %s' |
2466 |
' for version {%s}' |
|
2467 |
% (len(record_contents), int(rec[2]), |
|
2468 |
version_id)) |
|
2163.2.4
by John Arbash Meinel
Split _KnitData._parse_header up, so that we have 1 readlines() call, rather than readline+readlines() |
2469 |
if last_line != 'end %s\n' % rec[1]: |
2592.3.66
by Robert Collins
Allow adaption of KnitData to pack files. |
2470 |
raise KnitCorrupt(self._access, |
2163.2.4
by John Arbash Meinel
Split _KnitData._parse_header up, so that we have 1 readlines() call, rather than readline+readlines() |
2471 |
'unexpected version end line %r, wanted %r' |
2472 |
% (last_line, version_id)) |
|
1596.2.8
by Robert Collins
Join knits with the original gzipped data avoiding recompression. |
2473 |
df.close() |
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
2474 |
return record_contents, rec[3] |
2475 |
||
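The record layout that `_check_header` and `_parse_record` validate can be sketched as a small round trip. This is a minimal illustration, not bzrlib code: it is written for Python 3 (the listing above is Python 2), `make_record` and `parse_record` are hypothetical helper names, and `KnitCorrupt` here is a stand-in exception that takes only a message.

```python
import gzip
import io


class KnitCorrupt(Exception):
    """Hypothetical stand-in for bzrlib's KnitCorrupt error."""


def make_record(version_id, digest, lines):
    """Gzip a record: header line, content lines, then an end marker."""
    header = "version %s %d %s\n" % (version_id, len(lines), digest)
    text = header + "".join(lines) + "end %s\n" % version_id
    buf = io.BytesIO()
    with gzip.GzipFile(mode="wb", fileobj=buf) as f:
        f.write(text.encode("utf-8"))
    return buf.getvalue()


def parse_record(version_id, raw):
    """Decompress a record and apply the same checks as _parse_record."""
    with gzip.GzipFile(mode="rb", fileobj=io.BytesIO(raw)) as f:
        contents = f.read().decode("utf-8").splitlines(True)
    # Header: "version <id> <line count> <digest>", four fields in total.
    rec = contents.pop(0).split()
    if len(rec) != 4:
        raise KnitCorrupt("unexpected number of elements in record header")
    if rec[1] != version_id:
        raise KnitCorrupt("unexpected version, wanted %r, got %r"
                          % (version_id, rec[1]))
    last_line = contents.pop()
    if len(contents) != int(rec[2]):
        raise KnitCorrupt("incorrect number of lines %s != %s"
                          % (len(contents), int(rec[2])))
    if last_line != "end %s\n" % rec[1]:
        raise KnitCorrupt("unexpected version end line %r" % last_line)
    return contents, rec[3]
```

Reading the whole payload once and popping the header and end marker off the list mirrors the single `readlines()` call in `_parse_record`.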
1596.2.8
by Robert Collins
Join knits with the original gzipped data avoiding recompression. |
2476 |
def read_records_iter_raw(self, records): |
2477 |
"""Read text records from data file and yield raw data. |
|
2478 |
||
2479 |
This unpacks enough of the text record to validate the id is
|
|
2480 |
as expected, but that's all.
|
|
2481 |
"""
|
|
2482 |
# set up an iterator over the external records:
|
|
2483 |
# uses readv so nice and fast we hope.
|
|
1756.3.23
by Aaron Bentley
Remove knit caches |
2484 |
if len(records): |
1596.2.8
by Robert Collins
Join knits with the original gzipped data avoiding recompression. |
2485 |
# grab the disk data needed.
|
3316.2.11
by Robert Collins
* ``VersionedFile.clear_cache`` and ``enable_cache`` are deprecated. |
2486 |
needed_offsets = [index_memo for version_id, index_memo |
2487 |
in records] |
|
2592.3.66
by Robert Collins
Allow adaption of KnitData to pack files. |
2488 |
raw_records = self._access.get_raw_records(needed_offsets) |
1596.2.8
by Robert Collins
Join knits with the original gzipped data avoiding recompression. |
2489 |
|
2592.3.71
by Robert Collins
Basic version of knit-based repository operating, many tests failing. |
2490 |
for version_id, index_memo in records: |
3316.2.11
by Robert Collins
* ``VersionedFile.clear_cache`` and ``enable_cache`` are deprecated. |
2491 |
data = raw_records.next() |
2492 |
# validate the header
|
|
2493 |
df, rec = self._parse_record_header(version_id, data) |
|
2494 |
df.close() |
|
1756.3.23
by Aaron Bentley
Remove knit caches |
2495 |
yield version_id, data |
1596.2.8
by Robert Collins
Join knits with the original gzipped data avoiding recompression. |
2496 |
|
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
2497 |
def read_records_iter(self, records): |
2498 |
"""Read text records from data file and yield result. |
|
2499 |
||
1863.1.5
by John Arbash Meinel
Add a read_records_iter_unsorted, which can return records in any order. |
2500 |
The results are returned in whatever order is fastest to read,
|
2501 |
not in the order requested. Also, multiple requests for the same
|
|
2502 |
record will only yield 1 response.
|
|
2503 |
:param records: A list of (version_id, index_memo) entries
|
|
2504 |
:return: Yields (version_id, contents, digest) in the order
|
|
2505 |
read, not the order requested
|
|
2506 |
"""
|
|
2507 |
if not records: |
|
2508 |
return
|
|
2509 |
||
3316.2.11
by Robert Collins
* ``VersionedFile.clear_cache`` and ``enable_cache`` are deprecated. |
2510 |
needed_records = sorted(set(records), key=operator.itemgetter(1)) |
1863.1.5
by John Arbash Meinel
Add a read_records_iter_unsorted, which can return records in any order. |
2511 |
if not needed_records: |
2512 |
return
|
|
2513 |
||
2514 |
# The transport optimizes the fetching as well
|
|
2515 |
# (i.e., reads contiguous ranges.)
|
|
2592.3.66
by Robert Collins
Allow adaption of KnitData to pack files. |
2516 |
raw_data = self._access.get_raw_records( |
2592.3.71
by Robert Collins
Basic version of knit-based repository operating, many tests failing. |
2517 |
[index_memo for version_id, index_memo in needed_records]) |
1863.1.5
by John Arbash Meinel
Add a read_records_iter_unsorted, which can return records in any order. |
2518 |
|
2592.3.71
by Robert Collins
Basic version of knit-based repository operating, many tests failing. |
2519 |
for (version_id, index_memo), data in \ |
2592.3.66
by Robert Collins
Allow adaption of KnitData to pack files. |
2520 |
izip(iter(needed_records), raw_data): |
1863.1.5
by John Arbash Meinel
Add a read_records_iter_unsorted, which can return records in any order. |
2521 |
content, digest = self._parse_record(version_id, data) |
1756.3.23
by Aaron Bentley
Remove knit caches |
2522 |
yield version_id, content, digest |
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
2523 |
|
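The deduplicate-then-sort step in `read_records_iter` can be sketched in isolation. This is an illustrative Python 3 version under stated assumptions: index memos are taken to be `(offset, length)` tuples so they sort by file position, and `fetch` is a hypothetical callable standing in for `self._access.get_raw_records`.

```python
import operator


def plan_reads(records):
    """Deduplicate requests and order them by index memo."""
    return sorted(set(records), key=operator.itemgetter(1))


def read_in_disk_order(records, fetch):
    """Yield (version_id, data) in read order, not request order.

    ``fetch`` is a hypothetical callable that takes a list of index memos
    and returns an iterable of raw data in the same order.
    """
    needed = plan_reads(records)
    if not needed:
        return
    raw = fetch([memo for _, memo in needed])
    for (version_id, _), data in zip(needed, raw):
        yield version_id, data
```

Because duplicates collapse in the `set`, a caller that asks for the same record twice gets it back once, which is why `read_records` can store the results in a dictionary keyed by version id.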
2524 |
def read_records(self, records): |
|
2525 |
"""Read records into a dictionary.""" |
|
2526 |
components = {} |
|
1863.1.5
by John Arbash Meinel
Add a read_records_iter_unsorted, which can return records in any order. |
2527 |
for record_id, content, digest in \ |
1863.1.9
by John Arbash Meinel
Switching to have 'read_records_iter' return in random order. |
2528 |
self.read_records_iter(records): |
1563.2.4
by Robert Collins
First cut at including the knit implementation of versioned_file. |
2529 |
components[record_id] = (content, digest) |
2530 |
return components |
|
2531 |
||
1563.2.13
by Robert Collins
InterVersionedFile implemented. |
2532 |
|
2533 |
class InterKnit(InterVersionedFile): |
|
2534 |
"""Optimised code paths for knit to knit operations.""" |
|
2535 |
||
3316.2.3
by Robert Collins
Remove manual notification of transaction finishing on versioned files. |
2536 |
_matching_file_from_factory = staticmethod(make_file_knit) |
2537 |
_matching_file_to_factory = staticmethod(make_file_knit) |
|
1563.2.13
by Robert Collins
InterVersionedFile implemented. |
2538 |
|
2539 |
@staticmethod
|
|
2540 |
def is_compatible(source, target): |
|
2541 |
"""Be compatible with knits.""" |
|
2542 |
try: |
|
2543 |
return (isinstance(source, KnitVersionedFile) and |
|
2544 |
isinstance(target, KnitVersionedFile)) |
|
2545 |
except AttributeError: |
|
2546 |
return False |
|
2547 |
||
2998.2.3
by John Arbash Meinel
Respond to Aaron's requests |
2548 |
def _copy_texts(self, pb, msg, version_ids, ignore_missing=False): |
2998.2.2
by John Arbash Meinel
implement a faster path for copying from packs back to knits. |
2549 |
"""Copy texts to the target by extracting and adding them one by one. |
2550 |
||
2551 |
see join() for the parameter definitions.
|
|
2552 |
"""
|
|
2553 |
version_ids = self._get_source_version_ids(version_ids, ignore_missing) |
|
3287.6.1
by Robert Collins
* ``VersionedFile.get_graph`` is deprecated, with no replacement method. |
2554 |
# --- the below is factorable out with VersionedFile.join, but wait for
|
2555 |
# VersionedFiles, it may all be simpler then.
|
|
2556 |
graph = Graph(self.source) |
|
2557 |
search = graph._make_breadth_first_searcher(version_ids) |
|
2558 |
transitive_ids = set() |
|
2559 |
map(transitive_ids.update, list(search)) |
|
2560 |
parent_map = self.source.get_parent_map(transitive_ids) |
|
2561 |
order = topo_sort(parent_map.items()) |
|
2998.2.2
by John Arbash Meinel
implement a faster path for copying from packs back to knits. |
2562 |
|
2563 |
def size_of_content(content): |
|
2564 |
return sum(len(line) for line in content.text()) |
|
2565 |
# Cache at most 10MB of parent texts
|
|
2566 |
parent_cache = lru_cache.LRUSizeCache(max_size=10*1024*1024, |
|
2567 |
compute_size=size_of_content) |
|
2568 |
# TODO: jam 20071116 It would be nice to have a streaming interface to
|
|
2569 |
# get multiple texts from a source. The source could be smarter
|
|
2570 |
# about how it handled intermediate stages.
|
|
2998.2.3
by John Arbash Meinel
Respond to Aaron's requests |
2571 |
# get_line_list() or make_mpdiffs() seem like a possibility, but
|
2572 |
# at the moment they extract all full texts into memory, which
|
|
2573 |
# causes us to store more than our 3x fulltext goal.
|
|
2574 |
# Repository.iter_files_bytes() may be another possibility
|
|
2998.2.2
by John Arbash Meinel
implement a faster path for copying from packs back to knits. |
2575 |
to_process = [version for version in order |
2576 |
if version not in self.target] |
|
2577 |
total = len(to_process) |
|
2578 |
pb = ui.ui_factory.nested_progress_bar() |
|
2579 |
try: |
|
2580 |
for index, version in enumerate(to_process): |
|
2581 |
pb.update('Converting versioned data', index, total) |
|
2582 |
sha1, num_bytes, parent_text = self.target.add_lines(version, |
|
3052.2.3
by Robert Collins
Handle insert_data_stream of an unannotated stream into an annotated knit. |
2583 |
self.source.get_parents_with_ghosts(version), |
2998.2.2
by John Arbash Meinel
implement a faster path for copying from packs back to knits. |
2584 |
self.source.get_lines(version), |
2585 |
parent_texts=parent_cache) |
|
2586 |
parent_cache[version] = parent_text |
|
2587 |
finally: |
|
2588 |
pb.finished() |
|
2589 |
return total |
|
2590 |
||
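The ordering step in `_copy_texts` (collect the transitive ancestry, sort it so parents come first, then skip what the target already holds) can be sketched with plain dictionaries. This is not the bzrlib `Graph` or `topo_sort` implementation: the functions below are hypothetical Python 3 stand-ins, `parent_map` is assumed to map each version id to a tuple of parent ids, and the recursive visit assumes an acyclic history of modest depth.

```python
def transitive_ancestors(parent_map, version_ids):
    """Collect version_ids plus every ancestor present in parent_map."""
    seen = set()
    pending = list(version_ids)
    while pending:
        version = pending.pop()
        if version in seen or version not in parent_map:
            continue  # already visited, or a ghost with no record
        seen.add(version)
        pending.extend(parent_map[version])
    return seen


def topo_order(parent_map, versions):
    """Return versions ordered so that parents precede children."""
    order = []
    done = set()

    def visit(version):
        if version in done or version not in versions:
            return
        done.add(version)
        for parent in parent_map[version]:
            visit(parent)
        order.append(version)

    for version in sorted(versions):
        visit(version)
    return order


def versions_to_copy(parent_map, version_ids, target_versions):
    """Order the needed ancestry and drop what the target already has."""
    needed = transitive_ancestors(parent_map, version_ids)
    return [v for v in topo_order(parent_map, needed)
            if v not in target_versions]
```

Adding texts in this order means every parent is present in the target (or in the parent cache) before a child that depends on it is added.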
1563.2.31
by Robert Collins
Convert Knit repositories to use knits. |
2591 |
def join(self, pb=None, msg=None, version_ids=None, ignore_missing=False): |
1563.2.13
by Robert Collins
InterVersionedFile implemented. |
2592 |
"""See InterVersionedFile.join.""" |
2593 |
assert isinstance(self.source, KnitVersionedFile) |
|
2594 |
assert isinstance(self.target, KnitVersionedFile) |
|
2595 |
||
2851.4.3
by Ian Clatworthy
fix up plain-to-annotated knit conversion |
2596 |
# If the source and target are mismatched w.r.t. annotations vs
|
2597 |
# plain, the data needs to be converted accordingly
|
|
2598 |
if self.source.factory.annotated == self.target.factory.annotated: |
|
2599 |
converter = None |
|
2600 |
elif self.source.factory.annotated: |
|
2601 |
converter = self._anno_to_plain_converter |
|
2602 |
else: |
|
2998.2.3
by John Arbash Meinel
Respond to Aaron's requests |
2603 |
# We're converting from a plain to an annotated knit. Copy them
|
2604 |
# across by full texts.
|
|
2605 |
return self._copy_texts(pb, msg, version_ids, ignore_missing) |
|
2851.4.3
by Ian Clatworthy
fix up plain-to-annotated knit conversion |
2606 |
|
1684.3.2
by Robert Collins
Factor out version_ids-to-join selection in InterVersionedfile. |
2607 |
version_ids = self._get_source_version_ids(version_ids, ignore_missing) |
1563.2.13
by Robert Collins
InterVersionedFile implemented. |
2608 |
if not version_ids: |
2609 |
return 0 |
|
2610 |
||
2158.3.1
by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations |
2611 |
pb = ui.ui_factory.nested_progress_bar() |
1594.2.24
by Robert Collins
Make use of the transaction finalisation warning support to implement in-knit caching. |
2612 |
try: |
2613 |
version_ids = list(version_ids) |
|
2614 |
if None in version_ids: |
|
2615 |
version_ids.remove(None) |
|
2616 |
||
3052.2.2
by Robert Collins
* Operations pulling data from a smart server where the underlying |
2617 |
self.source_ancestry = set(self.source.get_ancestry(version_ids, |
2618 |
topo_sorted=False)) |
|
1594.2.24
by Robert Collins
Make use of the transaction finalisation warning support to implement in-knit caching. |
2619 |
this_versions = set(self.target._index.get_versions()) |
2825.4.1
by Robert Collins
* ``pull``, ``merge`` and ``push`` will no longer silently correct some |
2620 |
# XXX: For efficiency we should not look at the whole index,
|
2621 |
# we only need to consider the referenced revisions - they
|
|
2622 |
# must all be present, or the method must be full-text.
|
|
2623 |
# TODO, RBC 20070919
|
|
1594.2.24
by Robert Collins
Make use of the transaction finalisation warning support to implement in-knit caching. |
2624 |
needed_versions = self.source_ancestry - this_versions |
2625 |
||
2825.4.1
by Robert Collins
* ``pull``, ``merge`` and ``push`` will no longer silently correct some |
2626 |
if not needed_versions: |
1594.2.24
by Robert Collins
Make use of the transaction finalisation warning support to implement in-knit caching. |
2627 |
return 0 |
3287.6.1
by Robert Collins
* ``VersionedFile.get_graph`` is deprecated, with no replacement method. |
2628 |
full_list = topo_sort( |
2629 |
self.source.get_parent_map(self.source.versions())) |
|
1910.2.65
by Aaron Bentley
Remove the check-parent patch |
2630 |
|
2631 |
version_list = [i for i in full_list if (not self.target.has_version(i) |
|
2632 |
and i in needed_versions)] |
|
1594.2.24
by Robert Collins
Make use of the transaction finalisation warning support to implement in-knit caching. |
2633 |
|
1596.2.8
by Robert Collins
Join knits with the original gzipped data avoiding recompression. |
2634 |
# plan the join:
|
2635 |
copy_queue = [] |
|
2636 |
copy_queue_records = [] |
|
2637 |
copy_set = set() |
|
1594.2.24
by Robert Collins
Make use of the transaction finalisation warning support to implement in-knit caching. |
2638 |
for version_id in version_list: |
2639 |
options = self.source._index.get_options(version_id) |
|
2640 |
parents = self.source._index.get_parents_with_ghosts(version_id) |
|
1596.2.8
by Robert Collins
Join knits with the original gzipped data avoiding recompression. |
2641 |
# check that it will be a consistent copy:
|
1594.2.24
by Robert Collins
Make use of the transaction finalisation warning support to implement in-knit caching. |
2642 |
for parent in parents: |
1596.2.8
by Robert Collins
Join knits with the original gzipped data avoiding recompression. |
2643 |
# if source has the parent, we must :
|
2644 |
# * already have it or
|
|
2645 |
# * have it scheduled already
|
|
1759.2.2
by Jelmer Vernooij
Revert some of my spelling fixes and fix some typos after review by Aaron. |
2646 |
# otherwise we don't care
|
1596.2.8
by Robert Collins
Join knits with the original gzipped data avoiding recompression. |
2647 |
assert (self.target.has_version(parent) or |
2648 |
parent in copy_set or |
|
2649 |
not self.source.has_version(parent)) |
|
2592.3.71
by Robert Collins
Basic version of knit-based repository operating, many tests failing. |
2650 |
index_memo = self.source._index.get_position(version_id) |
2651 |
copy_queue_records.append((version_id, index_memo)) |
|
1596.2.8
by Robert Collins
Join knits with the original gzipped data avoiding recompression. |
2652 |
copy_queue.append((version_id, options, parents)) |
2653 |
copy_set.add(version_id) |
|
2654 |
||
2655 |
# data suck the join:
|
|
2656 |
count = 0 |
|
2657 |
total = len(version_list) |
|
1692.2.1
by Robert Collins
Fix knit based push to only perform 2 appends to the target, rather that 2*new-versions. |
2658 |
raw_datum = [] |
2659 |
raw_records = [] |
|
1596.2.8
by Robert Collins
Join knits with the original gzipped data avoiding recompression. |
2660 |
for (version_id, raw_data), \ |
2661 |
(version_id2, options, parents) in \ |
|
2662 |
izip(self.source._data.read_records_iter_raw(copy_queue_records), |
|
2663 |
copy_queue): |
|
2664 |
assert version_id == version_id2, 'logic error, inconsistent results' |
|
1594.2.24
by Robert Collins
Make use of the transaction finalisation warning support to implement in-knit caching. |
2665 |
count = count + 1 |
1596.2.8
by Robert Collins
Join knits with the original gzipped data avoiding recompression. |
2666 |
pb.update("Joining knit", count, total) |
2851.4.2
by Ian Clatworthy
use factory methods in annotated-to-plain conversion instead of duplicating format knowledge |
2667 |
if converter: |
2668 |
size, raw_data = converter(raw_data, version_id, options, |
|
2669 |
parents) |
|
2851.4.1
by Ian Clatworthy
Support joining plain knits to annotated knits and vice versa |
2670 |
else: |
2671 |
size = len(raw_data) |
|
2672 |
raw_records.append((version_id, options, parents, size)) |
|
1692.2.1
by Robert Collins
Fix knit based push to only perform 2 appends to the target, rather that 2*new-versions. |
2673 |
raw_datum.append(raw_data) |
2674 |
self.target._add_raw_records(raw_records, ''.join(raw_datum)) |
|
1594.2.24
by Robert Collins
Make use of the transaction finalisation warning support to implement in-knit caching. |
2675 |
return count |
2676 |
finally: |
|
2677 |
pb.finished() |
|
1563.2.13
by Robert Collins
InterVersionedFile implemented. |
2678 |
|
2851.4.2
by Ian Clatworthy
use factory methods in annotated-to-plain conversion instead of duplicating format knowledge |
2679 |
def _anno_to_plain_converter(self, raw_data, version_id, options, |
2680 |
parents): |
|
2681 |
"""Convert annotated content to plain content.""" |
|
2682 |
data, digest = self.source._data._parse_record(version_id, raw_data) |
|
2683 |
if 'fulltext' in options: |
|
2684 |
content = self.source.factory.parse_fulltext(data, version_id) |
|
2685 |
lines = self.target.factory.lower_fulltext(content) |
|
2686 |
else: |
|
2687 |
delta = self.source.factory.parse_line_delta(data, version_id, |
|
2688 |
plain=True) |
|
2689 |
lines = self.target.factory.lower_line_delta(delta) |
|
2690 |
return self.target._data._record_to_data(version_id, digest, lines) |
|
2691 |
||
1563.2.13
by Robert Collins
InterVersionedFile implemented. |
2692 |
|
2693 |
InterVersionedFile.register_optimiser(InterKnit) |
|
1596.2.24
by Robert Collins
Gzipfile was slightly slower than ideal. |
2694 |
|
2695 |
||
1684.3.3
by Robert Collins
Add a special cased weaves to knit converter. |
2696 |
class WeaveToKnit(InterVersionedFile): |
2697 |
"""Optimised code paths for weave to knit operations.""" |
|
2698 |
||
2699 |
_matching_file_from_factory = bzrlib.weave.WeaveFile |
|
3316.2.3
by Robert Collins
Remove manual notification of transaction finishing on versioned files. |
2700 |
_matching_file_to_factory = staticmethod(make_file_knit) |
1684.3.3
by Robert Collins
Add a special cased weaves to knit converter. |
2701 |
|
2702 |
@staticmethod
|
|
2703 |
def is_compatible(source, target): |
|
2704 |
"""Be compatible with weaves to knits.""" |
|
2705 |
try: |
|
2706 |
return (isinstance(source, bzrlib.weave.Weave) and |
|
2707 |
isinstance(target, KnitVersionedFile)) |
|
2708 |
except AttributeError: |
|
2709 |
return False |
|
2710 |
||
2711 |
def join(self, pb=None, msg=None, version_ids=None, ignore_missing=False): |
|
2712 |
"""See InterVersionedFile.join.""" |
|
2713 |
assert isinstance(self.source, bzrlib.weave.Weave) |
|
2714 |
assert isinstance(self.target, KnitVersionedFile) |
|
2715 |
||
2716 |
version_ids = self._get_source_version_ids(version_ids, ignore_missing) |
|
2717 |
||
2718 |
if not version_ids: |
|
2719 |
return 0 |
|
2720 |
||
2158.3.1
by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations |
2721 |
pb = ui.ui_factory.nested_progress_bar() |
1684.3.3
by Robert Collins
Add a special cased weaves to knit converter. |
2722 |
try: |
2723 |
version_ids = list(version_ids) |
|
2724 |
||
2725 |
self.source_ancestry = set(self.source.get_ancestry(version_ids)) |
|
2726 |
this_versions = set(self.target._index.get_versions()) |
|
2727 |
needed_versions = self.source_ancestry - this_versions |
|
2728 |
||
2825.4.1
by Robert Collins
* ``pull``, ``merge`` and ``push`` will no longer silently correct some |
2729 |
if not needed_versions: |
1684.3.3
by Robert Collins
Add a special cased weaves to knit converter. |
2730 |
return 0 |
3287.6.1
by Robert Collins
* ``VersionedFile.get_graph`` is deprecated, with no replacement method. |
2731 |
full_list = topo_sort( |
2732 |
self.source.get_parent_map(self.source.versions())) |
|
1684.3.3
by Robert Collins
Add a special cased weaves to knit converter. |
2733 |
|
2734 |
version_list = [i for i in full_list if (not self.target.has_version(i) |
|
2735 |
and i in needed_versions)] |
|
2736 |
||
2737 |
# do the join:
|
|
2738 |
count = 0 |
|
2739 |
total = len(version_list) |
|
3287.5.2
by Robert Collins
Deprecate VersionedFile.get_parents, breaking pulling from a ghost containing knit or pack repository to weaves, which improves correctness and allows simplification of core code. |
2740 |
parent_map = self.source.get_parent_map(version_list) |
1684.3.3
by Robert Collins
Add a special cased weaves to knit converter. |
2741 |
for version_id in version_list: |
2742 |
pb.update("Converting to knit", count, total) |
|
3287.5.2
by Robert Collins
Deprecate VersionedFile.get_parents, breaking pulling from a ghost containing knit or pack repository to weaves, which improves correctness and allows simplification of core code. |
2743 |
parents = parent_map[version_id] |
1684.3.3
by Robert Collins
Add a special cased weaves to knit converter. |
2744 |
# check that it will be a consistent copy:
|
2745 |
for parent in parents: |
|
2746 |
# if source has the parent, we must already have it
|
|
2747 |
assert (self.target.has_version(parent)) |
|
2748 |
self.target.add_lines( |
|
2749 |
version_id, parents, self.source.get_lines(version_id)) |
|
2750 |
count = count + 1 |
|
2751 |
return count |
|
2752 |
finally: |
|
2753 |
pb.finished() |
|
2754 |
||
2755 |
||
2756 |
InterVersionedFile.register_optimiser(WeaveToKnit) |
|
2757 |
||
2758 |
||
2781.1.1
by Martin Pool
merge cpatiencediff from Lukas |
2759 |
# Deprecated, use PatienceSequenceMatcher instead
|
2760 |
KnitSequenceMatcher = patiencediff.PatienceSequenceMatcher |
|
2484.1.1
by John Arbash Meinel
Add an initial function to read knit indexes in pyrex. |
2761 |
|
2762 |
||
2770.1.2
by Aaron Bentley
Convert to knit-only annotation |
2763 |
def annotate_knit(knit, revision_id): |
2764 |
"""Annotate a knit with no cached annotations. |
|
2765 |
||
2766 |
This implementation is for knits with no cached annotations.
|
|
2767 |
It will work for knits with cached annotations, but this is not
|
|
2768 |
recommended.
|
|
2769 |
"""
|
|
3224.1.7
by John Arbash Meinel
_StreamIndex also needs to return the proper values for get_build_details. |
2770 |
annotator = _KnitAnnotator(knit) |
3224.1.25
by John Arbash Meinel
Quick change to the _KnitAnnotator api to use .annotate() instead of get_annotated_lines() |
2771 |
return iter(annotator.annotate(revision_id)) |
3224.1.7
by John Arbash Meinel
_StreamIndex also needs to return the proper values for get_build_details. |
2772 |
|
2773 |
||
2774 |
class _KnitAnnotator(object): |
|
3224.1.5
by John Arbash Meinel
Start using a helper class for doing the knit-pack annotations. |
2775 |
"""Build up the annotations for a text.""" |
2776 |
||
2777 |
def __init__(self, knit): |
|
2778 |
self._knit = knit |
|
2779 |
||
3224.1.6
by John Arbash Meinel
Refactor the annotation logic into a helper class. |
2780 |
# Content objects, differs from fulltexts because of how final newlines
|
2781 |
# are treated by knits. the content objects here will always have a
|
|
2782 |
# final newline
|
|
2783 |
self._fulltext_contents = {} |
|
2784 |
||
2785 |
# Annotated lines of specific revisions
|
|
2786 |
self._annotated_lines = {} |
|
2787 |
||
2788 |
# Track the raw data for nodes that we could not process yet.
|
|
2789 |
# This maps the revision_id of the base to a list of children that will
|
|
2790 |
# be annotated from it.
|
|
2791 |
self._pending_children = {} |
|
2792 |
||
3224.1.29
by John Arbash Meinel
Properly handle annotating when ghosts are present. |
2793 |
# Nodes which cannot be extracted
|
2794 |
self._ghosts = set() |
|
2795 |
||
3224.1.19
by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed. |
2796 |
# Track how many children this node has, so we know if we need to keep
|
2797 |
# it
|
|
2798 |
self._annotate_children = {} |
|
2799 |
self._compression_children = {} |
|
2800 |
||
3224.1.6
by John Arbash Meinel
Refactor the annotation logic into a helper class. |
2801 |
self._all_build_details = {} |
3224.1.10
by John Arbash Meinel
Introduce the heads_provider for reannotate. |
2802 |
# The children => parent revision_id graph
|
3224.1.6
by John Arbash Meinel
Refactor the annotation logic into a helper class. |
2803 |
self._revision_id_graph = {} |
2804 |
||
3224.1.10
by John Arbash Meinel
Introduce the heads_provider for reannotate. |
2805 |
self._heads_provider = None |
2806 |
||
3224.1.19
by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed. |
2807 |
self._nodes_to_keep_annotations = set() |
3224.1.22
by John Arbash Meinel
Cleanup the extra debugging info, and some >80 char lines. |
2808 |
self._generations_until_keep = 100 |
2809 |
||
2810 |
def set_generations_until_keep(self, value): |
|
2811 |
"""Set the number of generations before caching a node. |
|
2812 |
||
2813 |
Setting this to -1 will cache every merge node; setting this higher
|
|
2814 |
will cache fewer nodes.
|
|
2815 |
"""
|
|
2816 |
self._generations_until_keep = value |
|
3224.1.19
by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed. |
2817 |
|
3224.1.15
by John Arbash Meinel
Finish removing method and noeol from general knowledge, |
2818 |
def _add_fulltext_content(self, revision_id, content_obj): |
3224.1.6
by John Arbash Meinel
Refactor the annotation logic into a helper class. |
2819 |
self._fulltext_contents[revision_id] = content_obj |
3224.1.19
by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed. |
2820 |
# TODO: jam 20080305 It might be good to check the sha1digest here
|
3224.1.22
by John Arbash Meinel
Cleanup the extra debugging info, and some >80 char lines. |
2821 |
return content_obj.text() |
3224.1.6
by John Arbash Meinel
Refactor the annotation logic into a helper class. |
2822 |
|
2823 |
def _check_parents(self, child, nodes_to_annotate): |
|
2824 |
"""Check if all parents have been processed. |
|
2825 |
||
2826 |
:param child: A tuple of (rev_id, parents, raw_content)
|
|
2827 |
:param nodes_to_annotate: If child is ready, add it to
|
|
2828 |
nodes_to_annotate, otherwise put it back in self._pending_children
|
|
2829 |
"""
|
|
2830 |
for parent_id in child[1]: |
|
3224.1.29
by John Arbash Meinel
Properly handle annotating when ghosts are present. |
2831 |
if (parent_id not in self._annotated_lines): |
3224.1.6
by John Arbash Meinel
Refactor the annotation logic into a helper class. |
2832 |
# This parent is present, but another parent is missing
|
2833 |
self._pending_children.setdefault(parent_id, |
|
2834 |
[]).append(child) |
|
2835 |
break
|
|
2836 |
else: |
|
2837 |
# This one is ready to be processed
|
|
2838 |
nodes_to_annotate.append(child) |
|
2839 |
||
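The `for`/`else` scheduling in `_check_parents` can be sketched outside the class. This is a simplified Python 3 illustration: `annotated` is a plain set standing in for the keys of `self._annotated_lines`, and `parent_finished` is a hypothetical helper covering only the re-check that `_add_annotation` performs after a revision is annotated.

```python
def check_parents(child, annotated, pending_children, ready):
    """Queue child if every parent is annotated, otherwise park it.

    child is a (rev_id, parent_ids, raw_content) tuple.
    """
    for parent_id in child[1]:
        if parent_id not in annotated:
            # Park the child under the first missing parent only.
            pending_children.setdefault(parent_id, []).append(child)
            break
    else:
        # The loop finished without a break: all parents are annotated.
        ready.append(child)


def parent_finished(rev_id, annotated, pending_children):
    """Mark rev_id as annotated and return children that are now ready."""
    annotated.add(rev_id)
    ready = []
    for child in pending_children.pop(rev_id, []):
        check_parents(child, annotated, pending_children, ready)
    return ready
```

A child waiting on several parents is parked under one missing parent at a time, so it moves from list to list until the last parent arrives.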
2840 |
def _add_annotation(self, revision_id, fulltext, parent_ids, |
|
2841 |
left_matching_blocks=None): |
|
2842 |
"""Add an annotation entry. |
|
2843 |
||
2844 |
All parents should already have been annotated.
|
|
2845 |
:return: A list of children that now have their parents satisfied.
|
|
2846 |
"""
|
|
2847 |
a = self._annotated_lines |
|
2848 |
annotated_parent_lines = [a[p] for p in parent_ids] |
|
2849 |
annotated_lines = list(annotate.reannotate(annotated_parent_lines, |
|
3224.1.10
by John Arbash Meinel
Introduce the heads_provider for reannotate. |
2850 |
fulltext, revision_id, left_matching_blocks, |
2851 |
heads_provider=self._get_heads_provider())) |
|
3224.1.6
by John Arbash Meinel
Refactor the annotation logic into a helper class. |
2852 |
self._annotated_lines[revision_id] = annotated_lines |
3224.1.19
by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed. |
2853 |
for p in parent_ids: |
2854 |
ann_children = self._annotate_children[p] |
|
2855 |
ann_children.remove(revision_id) |
|
2856 |
if (not ann_children |
|
2857 |
and p not in self._nodes_to_keep_annotations): |
|
2858 |
del self._annotated_lines[p] |
|
2859 |
del self._all_build_details[p] |
|
2860 |
if p in self._fulltext_contents: |
|
2861 |
del self._fulltext_contents[p] |
|
3224.1.6
by John Arbash Meinel
Refactor the annotation logic into a helper class. |
2862 |
# Now that we've added this one, see if there are any pending
|
2863 |
# deltas to be done, certainly this parent is finished
|
|
2864 |
nodes_to_annotate = [] |
|
2865 |
for child in self._pending_children.pop(revision_id, []): |
|
2866 |
self._check_parents(child, nodes_to_annotate) |
|
2867 |
return nodes_to_annotate |
|
2868 |
||
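The cleanup loop in `_add_annotation` amounts to reference counting: a parent's annotations are dropped once its last child has consumed them, unless the parent was marked to be kept. A minimal Python 3 sketch, with `release_parents` as a hypothetical name and plain dictionaries standing in for the annotator's attributes:

```python
def release_parents(rev_id, parent_ids, annotate_children, annotated_lines,
                    keep):
    """Drop each parent's annotations once its last child has used them."""
    for parent in parent_ids:
        children = annotate_children[parent]
        children.remove(rev_id)
        if not children and parent not in keep:
            del annotated_lines[parent]
```

The `keep` set plays the role of `_nodes_to_keep_annotations`, which lets selected merge nodes stay cached even after all of their children are done.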
2869 |
def _get_build_graph(self, revision_id): |
|
2870 |
"""Get the graphs for building texts and annotations. |
|
2871 |
||
2872 |
The data you need for creating a full text may be different than the
|
|
2873 |
data you need to annotate that text. (At a minimum, you need both
|
|
2874 |
parents to create an annotation, but only need 1 parent to generate the
|
|
2875 |
fulltext.)
|
|
2876 |
||
2877 |
:return: A list of (revision_id, index_memo) records, suitable for
|
|
3224.1.19
by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed. |
2878 |
passing to read_records_iter to start reading in the raw data from
|
3224.1.6
by John Arbash Meinel
Refactor the annotation logic into a helper class. |
2879 |
the pack file.
|
2880 |
"""
|
|
3224.1.10
by John Arbash Meinel
Introduce the heads_provider for reannotate. |
2881 |
if revision_id in self._annotated_lines: |
2882 |
# Nothing to do
|
|
2883 |
return [] |
|
3224.1.6
by John Arbash Meinel
Refactor the annotation logic into a helper class. |
2884 |
pending = set([revision_id]) |
2885 |
records = [] |
|
3224.1.19
by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed. |
2886 |
generation = 0 |
3224.1.22
by John Arbash Meinel
Cleanup the extra debugging info, and some >80 char lines. |
2887 |
kept_generation = 0 |
3224.1.6
by John Arbash Meinel
Refactor the annotation logic into a helper class. |
2888 |
while pending: |
2889 |
# get all pending nodes
|
|
3224.1.19
by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed. |
2890 |
generation += 1 |
3224.1.6
by John Arbash Meinel
Refactor the annotation logic into a helper class. |
2891 |
this_iteration = pending |
2892 |
build_details = self._knit._index.get_build_details(this_iteration) |
|
2893 |
self._all_build_details.update(build_details) |
|
2894 |
# new_nodes = self._knit._index._get_entries(this_iteration)
|
|
2895 |
pending = set() |
|
2896 |
for rev_id, details in build_details.iteritems(): |
|
3224.1.14
by John Arbash Meinel
Switch to making content_details opaque, step 1 |
2897 |
(index_memo, compression_parent, parents, |
3224.1.15
by John Arbash Meinel
Finish removing method and noeol from general knowledge, |
2898 |
record_details) = details |
3224.1.6
by John Arbash Meinel
Refactor the annotation logic into a helper class. |
2899 |
self._revision_id_graph[rev_id] = parents |
2900 |
records.append((rev_id, index_memo)) |
|
3224.1.19
by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed. |
2901 |
# Do we actually need to check _annotated_lines?
|
3224.1.6
by John Arbash Meinel
Refactor the annotation logic into a helper class. |
2902 |
pending.update(p for p in parents |
2903 |
if p not in self._all_build_details) |
|
3224.1.19
by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed. |
2904 |
if compression_parent: |
2905 |
self._compression_children.setdefault(compression_parent, |
|
2906 |
[]).append(rev_id) |
|
2907 |
if parents: |
|
2908 |
for parent in parents: |
|
3224.1.22
by John Arbash Meinel
Cleanup the extra debugging info, and some >80 char lines. |
2909 |
self._annotate_children.setdefault(parent, |
2910 |
[]).append(rev_id) |
|
2911 |
num_gens = generation - kept_generation |
|
2912 |
if ((num_gens >= self._generations_until_keep) |
|
3224.1.19
by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed. |
2913 |
and len(parents) > 1): |
3224.1.22
by John Arbash Meinel
Cleanup the extra debugging info, and some >80 char lines. |
2914 |
kept_generation = generation |
3224.1.19
by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed. |
2915 |
self._nodes_to_keep_annotations.add(rev_id) |
3224.1.6
by John Arbash Meinel
Refactor the annotation logic into a helper class. |
2916 |
|
2917 |
missing_versions = this_iteration.difference(build_details.keys()) |
|
3224.1.29
by John Arbash Meinel
Properly handle annotating when ghosts are present. |
2918 |
self._ghosts.update(missing_versions) |
3224.1.6
by John Arbash Meinel
Refactor the annotation logic into a helper class. |
2919 |
for missing_version in missing_versions: |
2920 |
# add a key, no parents
|
|
3224.1.29
by John Arbash Meinel
Properly handle annotating when ghosts are present. |
2921 |
self._revision_id_graph[missing_version] = () |
3224.1.6
by John Arbash Meinel
Refactor the annotation logic into a helper class. |
2922 |
pending.discard(missing_version) # don't look for it |
3224.1.29
by John Arbash Meinel
Properly handle annotating when ghosts are present. |
2923 |
# XXX: This should probably be a real exception, as it is a data
|
2924 |
# inconsistency
|
|
2925 |
assert not self._ghosts.intersection(self._compression_children), \ |
|
2926 |
"We cannot have nodes which have a compression parent of a ghost."
|
|
2927 |
# Clean out anything that depends on a ghost so that we don't wait for
|
|
2928 |
# the ghost to show up
|
|
2929 |
for node in self._ghosts: |
|
2930 |
if node in self._annotate_children: |
|
2931 |
# We won't be building this node
|
|
2932 |
del self._annotate_children[node] |
|
3224.1.6
by John Arbash Meinel
Refactor the annotation logic into a helper class. |
2933 |
# Generally we will want to read the records in reverse order, because
|
2934 |
# we find the parent nodes after the children
|
|
2935 |
records.reverse() |
|
2936 |
return records |
|
2937 |
||
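The generation-by-generation walk in `_get_build_graph` can be illustrated in isolation. The following is only a minimal sketch, assuming a plain dict stands in for the knit index; `walk_build_graph` and `index` are hypothetical names and not part of bzrlib, and the sketch leaves out the compression-parent and kept-annotation bookkeeping.

```python
def walk_build_graph(index, start):
    """Walk ancestry one generation at a time.

    `index` is a hypothetical stand-in for the knit index: it maps a
    revision id to a tuple of parent ids.  Ids that are referenced but
    absent from `index` are recorded as ghosts.
    """
    graph = {}       # revision id -> parents (ghosts get an empty tuple)
    ghosts = set()
    order = []       # revisions in the order they were discovered
    seen = set()
    pending = {start}
    while pending:
        this_iteration = pending
        # Look up the whole generation at once, as the index lookup does.
        found = {rev: index[rev] for rev in this_iteration if rev in index}
        seen.update(found)
        pending = set()
        for rev in sorted(found):
            parents = found[rev]
            graph[rev] = parents
            order.append(rev)
            pending.update(p for p in parents
                           if p not in seen and p not in ghosts)
        missing = this_iteration.difference(found)
        ghosts.update(missing)
        for rev in missing:
            graph[rev] = ()          # add a key, no parents
            pending.discard(rev)     # don't look for it again
    # Parents are discovered after their children, so reverse the order.
    order.reverse()
    return order, graph, ghosts
```

Called as `walk_build_graph({'c': ('b', 'ghost'), 'b': ('a',), 'a': ()}, 'c')`, the sketch visits `c`, then `b`, then `a`, and reports `ghost` as a ghost with no parents.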
2938 |
def _annotate_records(self, records): |
|
2939 |
"""Build the annotations for the listed records.""" |
|
2940 |
# We iterate in the order read, rather than a strict order requested.
|
|
3224.1.22
by John Arbash Meinel
Cleanup the extra debugging info, and some >80 char lines. |
2941 |
# However, process what we can, and put off to the side things that
|
2942 |
# still need parents, cleaning them up when those parents are
|
|
2943 |
# processed.
|
|
3224.1.15
by John Arbash Meinel
Finish removing method and noeol from general knowledge, |
2944 |
for (rev_id, record, |
3224.1.6
by John Arbash Meinel
Refactor the annotation logic into a helper class. |
2945 |
digest) in self._knit._data.read_records_iter(records): |
2946 |
if rev_id in self._annotated_lines: |
|
2947 |
continue
|
|
2948 |
parent_ids = self._revision_id_graph[rev_id] |
|
3224.1.29
by John Arbash Meinel
Properly handle annotating when ghosts are present. |
2949 |
parent_ids = [p for p in parent_ids if p not in self._ghosts] |
3224.1.6
by John Arbash Meinel
Refactor the annotation logic into a helper class. |
2950 |
details = self._all_build_details[rev_id] |
3224.1.14
by John Arbash Meinel
Switch to making content_details opaque, step 1 |
2951 |
(index_memo, compression_parent, parents, |
3224.1.15
by John Arbash Meinel
Finish removing method and noeol from general knowledge, |
2952 |
record_details) = details |
3224.1.6
by John Arbash Meinel
Refactor the annotation logic into a helper class. |
2953 |
nodes_to_annotate = [] |
2954 |
# TODO: Remove the punning between compression parents and
|
|
2955 |
# parent_ids; we should be able to do this without assuming
|
|
2956 |
# the build order
|
|
2957 |
if len(parent_ids) == 0: |
|
2958 |
# There are no parents for this node, so just add it
|
|
2959 |
# TODO: This probably needs to be decoupled
|
|
3224.1.14
by John Arbash Meinel
Switch to making content_details opaque, step 1 |
2960 |
assert compression_parent is None |
3224.1.15
by John Arbash Meinel
Finish removing method and noeol from general knowledge, |
2961 |
fulltext_content, delta = self._knit.factory.parse_record( |
2962 |
rev_id, record, record_details, None) |
|
2963 |
fulltext = self._add_fulltext_content(rev_id, fulltext_content) |
|
3224.1.6
by John Arbash Meinel
Refactor the annotation logic into a helper class. |
2964 |
nodes_to_annotate.extend(self._add_annotation(rev_id, fulltext, |
2965 |
parent_ids, left_matching_blocks=None)) |
|
2966 |
else: |
|
3224.1.15
by John Arbash Meinel
Finish removing method and noeol from general knowledge, |
2967 |
child = (rev_id, parent_ids, record) |
3224.1.6
by John Arbash Meinel
Refactor the annotation logic into a helper class. |
2968 |
# Check if all the parents are present
|
2969 |
self._check_parents(child, nodes_to_annotate) |
|
2970 |
while nodes_to_annotate: |
|
2971 |
# Should we use a queue here instead of a stack?
|
|
3224.1.15
by John Arbash Meinel
Finish removing method and noeol from general knowledge, |
2972 |
(rev_id, parent_ids, record) = nodes_to_annotate.pop() |
3224.1.14
by John Arbash Meinel
Switch to making content_details opaque, step 1 |
2973 |
(index_memo, compression_parent, parents, |
3224.1.15
by John Arbash Meinel
Finish removing method and noeol from general knowledge, |
2974 |
record_details) = self._all_build_details[rev_id] |
3224.1.14
by John Arbash Meinel
Switch to making content_details opaque, step 1 |
2975 |
if compression_parent is not None: |
3224.1.19
by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed. |
2976 |
comp_children = self._compression_children[compression_parent] |
2977 |
assert rev_id in comp_children |
|
2978 |
# If there is only 1 child, it is safe to reuse this
|
|
2979 |
# content
|
|
2980 |
reuse_content = (len(comp_children) == 1 |
|
2981 |
and compression_parent not in |
|
2982 |
self._nodes_to_keep_annotations) |
|
2983 |
if reuse_content: |
|
2984 |
# Remove it from the cache since it will be changing
|
|
2985 |
parent_fulltext_content = self._fulltext_contents.pop(compression_parent) |
|
2986 |
# Make sure to copy the fulltext since it might be
|
|
2987 |
# modified
|
|
3224.1.22
by John Arbash Meinel
Cleanup the extra debugging info, and some >80 char lines. |
2988 |
parent_fulltext = list(parent_fulltext_content.text()) |
3224.1.19
by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed. |
2989 |
else: |
2990 |
parent_fulltext_content = self._fulltext_contents[compression_parent] |
|
3224.1.22
by John Arbash Meinel
Cleanup the extra debugging info, and some >80 char lines. |
2991 |
parent_fulltext = parent_fulltext_content.text() |
3224.1.19
by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed. |
2992 |
comp_children.remove(rev_id) |
3224.1.15
by John Arbash Meinel
Finish removing method and noeol from general knowledge, |
2993 |
fulltext_content, delta = self._knit.factory.parse_record( |
3224.1.22
by John Arbash Meinel
Cleanup the extra debugging info, and some >80 char lines. |
2994 |
rev_id, record, record_details, |
2995 |
parent_fulltext_content, |
|
3224.1.19
by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed. |
2996 |
copy_base_content=(not reuse_content)) |
3224.1.22
by John Arbash Meinel
Cleanup the extra debugging info, and some >80 char lines. |
2997 |
fulltext = self._add_fulltext_content(rev_id, |
2998 |
fulltext_content) |
|
3224.1.6
by John Arbash Meinel
Refactor the annotation logic into a helper class. |
2999 |
blocks = KnitContent.get_line_delta_blocks(delta, |
3000 |
parent_fulltext, fulltext) |
|
3001 |
else: |
|
3002 |
fulltext_content = self._knit.factory.parse_fulltext( |
|
3224.1.15
by John Arbash Meinel
Finish removing method and noeol from general knowledge, |
3003 |
record, rev_id) |
3224.1.6
by John Arbash Meinel
Refactor the annotation logic into a helper class. |
3004 |
fulltext = self._add_fulltext_content(rev_id, |
3224.1.15
by John Arbash Meinel
Finish removing method and noeol from general knowledge, |
3005 |
fulltext_content) |
3224.1.6
by John Arbash Meinel
Refactor the annotation logic into a helper class. |
3006 |
blocks = None |
3007 |
nodes_to_annotate.extend( |
|
3008 |
self._add_annotation(rev_id, fulltext, parent_ids, |
|
3009 |
left_matching_blocks=blocks)) |
|
3010 |
||
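The deferral scheme used by `_annotate_records` (handle each record as it is read, park it if a parent is not finished, and release it once that parent completes) can be sketched on its own. This is an illustrative sketch under simplifying assumptions: `process_in_read_order` and `parents_of` are hypothetical names, and the "work" done per node is reduced to recording the order in which nodes become ready.

```python
def process_in_read_order(records, parents_of):
    """Process nodes in arrival order, deferring those that lack parents.

    `records` is an iterable of node ids in the order they are read.
    `parents_of` maps each node id to a tuple of parent ids.
    Returns the order in which nodes were actually processed.
    """
    processed = []
    finished = set()
    pending_children = {}   # parent id -> children waiting on that parent

    def check_parents(node, ready):
        # Park the node under its first unfinished parent, if any.
        for parent in parents_of[node]:
            if parent not in finished:
                pending_children.setdefault(parent, []).append(node)
                return
        ready.append(node)

    for node in records:
        if node in finished:
            continue
        ready = []
        check_parents(node, ready)
        while ready:
            current = ready.pop()
            processed.append(current)
            finished.add(current)
            # Finishing this node may unblock children parked under it.
            for child in pending_children.pop(current, []):
                check_parents(child, ready)
    return processed
```

With `parents_of = {'a': (), 'b': ('a',), 'c': ('a', 'b')}` and records arriving as `['c', 'b', 'a']`, both `c` and `b` are parked until `a` is read, after which they are released in dependency order.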
3224.1.10
by John Arbash Meinel
Introduce the heads_provider for reannotate. |
3011 |
def _get_heads_provider(self): |
3012 |
"""Create a heads provider for resolving ancestry issues.""" |
|
3013 |
if self._heads_provider is not None: |
|
3014 |
return self._heads_provider |
|
3015 |
parent_provider = _mod_graph.DictParentsProvider( |
|
3016 |
self._revision_id_graph) |
|
3017 |
graph_obj = _mod_graph.Graph(parent_provider) |
|
3224.1.20
by John Arbash Meinel
Reduce the number of cache misses by caching known heads answers |
3018 |
head_cache = _mod_graph.FrozenHeadsCache(graph_obj) |
3224.1.10
by John Arbash Meinel
Introduce the heads_provider for reannotate. |
3019 |
self._heads_provider = head_cache |
3020 |
return head_cache |
|
3021 |
||
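For readers unfamiliar with the idea of a cached heads provider, here is a small self-contained sketch. It is not the `_mod_graph.Graph` or `FrozenHeadsCache` implementation; `CachingHeads` is a hypothetical class that assumes the whole parent map fits in a dict and uses a deliberately naive ancestry search.

```python
class CachingHeads(object):
    """Answer heads() queries over a dict-based graph, caching results."""

    def __init__(self, parent_map):
        self._parent_map = parent_map   # key -> tuple of parent keys
        self._cache = {}                # frozenset(keys) -> frozenset(heads)

    def _ancestors(self, key):
        """Return every strict ancestor of key."""
        seen = set()
        todo = list(self._parent_map.get(key, ()))
        while todo:
            node = todo.pop()
            if node in seen:
                continue
            seen.add(node)
            todo.extend(self._parent_map.get(node, ()))
        return seen

    def heads(self, keys):
        """Return the keys that are not ancestors of any other key."""
        keys = frozenset(keys)
        try:
            return self._cache[keys]
        except KeyError:
            pass
        result = set(keys)
        for key in keys:
            result.difference_update(self._ancestors(key))
        result = frozenset(result)
        self._cache[keys] = result
        return result
```

Keying the cache on a frozenset means the same question asked with the keys in a different order is answered from the cache rather than recomputed.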
3224.1.25
by John Arbash Meinel
Quick change to the _KnitAnnotator api to use .annotate() instead of get_annotated_lines() |
3022 |
def annotate(self, revision_id): |
3224.1.5
by John Arbash Meinel
Start using a helper class for doing the knit-pack annotations. |
3023 |
"""Return the annotated fulltext at the given revision. |
3024 |
||
3025 |
:param revision_id: The revision id for this file
|
|
3026 |
"""
|
|
3224.1.6
by John Arbash Meinel
Refactor the annotation logic into a helper class. |
3027 |
records = self._get_build_graph(revision_id) |
3224.1.29
by John Arbash Meinel
Properly handle annotating when ghosts are present. |
3028 |
if revision_id in self._ghosts: |
3029 |
raise errors.RevisionNotPresent(revision_id, self._knit) |
|
3224.1.6
by John Arbash Meinel
Refactor the annotation logic into a helper class. |
3030 |
self._annotate_records(records) |
3031 |
return self._annotated_lines[revision_id] |
|
3224.1.5
by John Arbash Meinel
Start using a helper class for doing the knit-pack annotations. |
3032 |
|
3033 |
||
2484.1.1
by John Arbash Meinel
Add an initial function to read knit indexes in pyrex. |
3034 |
try: |
2484.1.12
by John Arbash Meinel
Switch the layout to use a matching _knit_load_data_py.py and _knit_load_data_c.pyx |
3035 |
from bzrlib._knit_load_data_c import _load_data_c as _load_data |
2484.1.1
by John Arbash Meinel
Add an initial function to read knit indexes in pyrex. |
3036 |
except ImportError: |
2484.1.12
by John Arbash Meinel
Switch the layout to use a matching _knit_load_data_py.py and _knit_load_data_c.pyx |
3037 |
from bzrlib._knit_load_data_py import _load_data_py as _load_data |