To get this branch, use:
bzr branch http://gegoxaren.bato24.eu/bzr/brz/remove-bazaar
2484.1.1 by John Arbash Meinel
Add an initial function to read knit indexes in pyrex.
# Copyright (C) 2005, 2006, 2007 Canonical Ltd
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA

"""Knit versionedfile implementation.

A knit is a versioned file implementation that supports efficient append-only
updates.
1563.2.6 by Robert Collins
Start check tests for knits (pending), and remove dead code.

Knit file layout:
lifeless: the data file is made up of "delta records".  Each delta record has a delta header
that contains: (1) a version id, (2) the size of the delta (in lines), and (3) the digest of
the -expanded data- (i.e. the delta applied to the parent).  The delta also ends with an
end-marker, simply "end VERSION".

A delta can contain either a line delta or full contents.
In the example below, the 8s are the index number of the annotation.
version robertc@robertcollins.net-20051003014215-ee2990904cc4c7ad 7 c7d23b2a5bd6ca00e8e266cec0ec228158ee9f9e
59,59,3
8
8         if ie.executable:
8             e.set('executable', 'yes')
130,130,2
8         if elt.get('executable') == 'yes':
8             ie.executable = True
end robertc@robertcollins.net-20051003014215-ee2990904cc4c7ad


What's in an index:
09:33 < jrydberg> lifeless: each index is made up of a tuple of; version id, options, position, size, parents
09:33 < jrydberg> lifeless: the parents are currently dictionary compressed
09:33 < jrydberg> lifeless: (meaning it currently does not support ghosts)
09:33 < lifeless> right
09:33 < jrydberg> lifeless: the position and size is the range in the data file


So the index sequence is the dictionary-compressed sequence number used
in the deltas to provide line annotation.

1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
"""

1563.2.6 by Robert Collins
Start check tests for knits (pending), and remove dead code.
# TODOS:
# 10:16 < lifeless> make partial index writes safe
# 10:16 < lifeless> implement 'knit.check()' like weave.check()
# 10:17 < lifeless> record known ghosts so we can detect when they are filled in rather than the current 'reweave 
#                    always' approach.
1563.2.11 by Robert Collins
Consolidate reweave and join as we have no separate usage, make reweave tests apply to all versionedfile implementations and deprecate the old reweave apis.
# move sha1 out of the content so that join is faster at verifying parents
# record content length ?
1563.2.6 by Robert Collins
Start check tests for knits (pending), and remove dead code.


1563.2.11 by Robert Collins
Consolidate reweave and join as we have no separate usage, make reweave tests apply to all versionedfile implementations and deprecate the old reweave apis.
from cStringIO import StringIO
1596.2.28 by Robert Collins
more knit profile based tuning.
from itertools import izip, chain
1756.2.17 by Aaron Bentley
Fixes suggested by John Meinel
import operator
1563.2.6 by Robert Collins
Start check tests for knits (pending), and remove dead code.
import os
1628.1.2 by Robert Collins
More knit micro-optimisations.
import sys
1756.2.29 by Aaron Bentley
Remove basis knit support
import warnings
2762.3.1 by Robert Collins
* The compression used within the bzr repository has changed from zlib
from zlib import Z_DEFAULT_COMPRESSION
1594.2.19 by Robert Collins
More coalescing tweaks, and knit feedback.

1594.2.17 by Robert Collins
Better readv coalescing, now with test, and progress during knit index reading.
import bzrlib
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
from bzrlib.lazy_import import lazy_import
lazy_import(globals(), """
from bzrlib import (
2770.1.1 by Aaron Bentley
Initial implmentation of plain knit annotation
    annotate,
3224.1.10 by John Arbash Meinel
Introduce the heads_provider for reannotate.
    graph as _mod_graph,
2998.2.2 by John Arbash Meinel
implement a faster path for copying from packs back to knits.
    lru_cache,
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
    pack,
2745.1.2 by Robert Collins
Ensure mutter_callsite is not directly called on a lazy_load object, to make the stacklevel parameter work correctly.
    trace,
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
    )
""")
1911.2.3 by John Arbash Meinel
Moving everything into a new location so that we can cache more than just revision ids
from bzrlib import (
    cache_utf8,
2745.1.1 by Robert Collins
Add a number of -Devil checkpoints.
    debug,
2520.4.140 by Aaron Bentley
Use matching blocks from mpdiff for knit delta creation
    diff,
1911.2.3 by John Arbash Meinel
Moving everything into a new location so that we can cache more than just revision ids
    errors,
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
    osutils,
2104.4.2 by John Arbash Meinel
Small cleanup and NEWS entry about fixing bug #65714
    patiencediff,
2039.1.1 by Aaron Bentley
Clean up progress properly when interrupted during fetch (#54000)
    progress,
1551.15.46 by Aaron Bentley
Move plan merge to tree
    merge,
2196.2.1 by John Arbash Meinel
Merge Dmitry's optimizations and minimize the actual diff.
    ui,
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
    )
from bzrlib.errors import (
    FileExists,
    NoSuchFile,
    KnitError,
    InvalidRevisionId,
    KnitCorrupt,
    KnitHeaderError,
    RevisionNotPresent,
    RevisionAlreadyPresent,
    )
3287.6.1 by Robert Collins
* ``VersionedFile.get_graph`` is deprecated, with no replacement method.
from bzrlib.graph import Graph
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
from bzrlib.osutils import (
    contains_whitespace,
    contains_linebreaks,
2850.1.1 by Robert Collins
* ``KnitVersionedFile.add*`` will no longer cache added records even when
    sha_string,
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
    sha_strings,
3350.3.8 by Robert Collins
Basic stream insertion, no fast path yet for knit to knit.
    split_lines,
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
    )
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
from bzrlib.tsort import topo_sort
3287.6.1 by Robert Collins
* ``VersionedFile.get_graph`` is deprecated, with no replacement method.
from bzrlib.tuned_gzip import GzipFile, bytes_to_gzip
2094.3.5 by John Arbash Meinel
Fix imports to ensure modules are loaded before they are used
import bzrlib.ui
3350.3.3 by Robert Collins
Functional get_record_stream interface tests covering full interface.
from bzrlib.versionedfile import (
3350.3.12 by Robert Collins
Generate streams with absent records.
    AbsentContentFactory,
3350.3.8 by Robert Collins
Basic stream insertion, no fast path yet for knit to knit.
    adapter_registry,
3350.3.3 by Robert Collins
Functional get_record_stream interface tests covering full interface.
    ContentFactory,
    InterVersionedFile,
    VersionedFile,
    )
1684.3.3 by Robert Collins
Add a special cased weaves to knit converter.
import bzrlib.weave
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.


# TODO: Split out code specific to this format into an associated object.

# TODO: Can we put in some kind of value to check that the index and data
# files belong together?

1759.2.1 by Jelmer Vernooij
Fix some types (found using aspell).
# TODO: accommodate binaries, perhaps by storing a byte count
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.

# TODO: function to check whole file

# TODO: atomically append data, then measure backwards from the cursor
# position after writing to work out where it was located.  we may need to
# bypass python file buffering.

DATA_SUFFIX = '.knit'
INDEX_SUFFIX = '.kndx'
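The index notes in the module docstring say that each index entry is a tuple of version id, options, position, size and parents, and that the parents are dictionary-compressed into sequence numbers. The sketch below illustrates only that idea. `compress_parents` and `expand_parents` are hypothetical helpers, the option names are illustrative, and the code does not reproduce the actual `.kndx` line syntax. A parent that has not appeared earlier in the index cannot be encoded, which is consistent with the remark that this scheme does not support ghosts.

```python
def compress_parents(entries):
    """Replace each parent id by the sequence number of an earlier entry.

    Hypothetical helper.  entries is a list of
    (version_id, options, position, size, parents) tuples in index order.
    Raises KeyError for a parent not seen yet (a ghost or out-of-order entry).
    """
    seq = {}
    compressed = []
    for number, (version_id, options, pos, size, parents) in enumerate(entries):
        refs = []
        for parent in parents:
            if parent not in seq:
                raise KeyError('ghost or out-of-order parent: %s' % (parent,))
            refs.append(seq[parent])
        compressed.append((version_id, options, pos, size, refs))
        seq[version_id] = number
    return compressed


def expand_parents(compressed):
    """Inverse of compress_parents: map sequence numbers back to version ids."""
    ids = [entry[0] for entry in compressed]
    return [(version_id, options, pos, size, [ids[r] for r in refs])
            for version_id, options, pos, size, refs in compressed]


# Position and size give each record's byte range in the data file.
index = [
    ('rev-a', ['fulltext'], 0, 120, []),
    ('rev-b', ['line-delta'], 120, 40, ['rev-a']),
    ('rev-c', ['line-delta'], 160, 35, ['rev-a', 'rev-b']),
]
packed = compress_parents(index)
# Parents are now sequence numbers: [], [0] and [0, 1].
```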


3350.3.4 by Robert Collins
Finish adapters for annotated knits to unannotated knits and full texts.
class KnitAdapter(object):
    """Base class for knit record adaption."""

3350.3.7 by Robert Collins
Create a registry of versioned file record adapters.
    def __init__(self, basis_vf):
        """Create an adapter which accesses full texts from basis_vf.

        :param basis_vf: A versioned file to access basis texts of deltas from.
            May be None for adapters that do not need to access basis texts.
        """
3350.3.4 by Robert Collins
Finish adapters for annotated knits to unannotated knits and full texts.
        self._data = _KnitData(None)
        self._annotate_factory = KnitAnnotateFactory()
        self._plain_factory = KnitPlainFactory()
3350.3.7 by Robert Collins
Create a registry of versioned file record adapters.
        self._basis_vf = basis_vf
3350.3.4 by Robert Collins
Finish adapters for annotated knits to unannotated knits and full texts.


class FTAnnotatedToUnannotated(KnitAdapter):
    """An adapter from FT annotated knits to unannotated ones."""

    def get_bytes(self, factory, annotated_compressed_bytes):
        rec, contents = \
            self._data._parse_record_unchecked(annotated_compressed_bytes)
        content = self._annotate_factory.parse_fulltext(contents, rec[1])
        size, bytes = self._data._record_to_data(rec[1], rec[3], content.text())
        return bytes


class DeltaAnnotatedToUnannotated(KnitAdapter):
    """An adapter for deltas from annotated to unannotated."""

    def get_bytes(self, factory, annotated_compressed_bytes):
        rec, contents = \
            self._data._parse_record_unchecked(annotated_compressed_bytes)
        delta = self._annotate_factory.parse_line_delta(contents, rec[1],
            plain=True)
        contents = self._plain_factory.lower_line_delta(delta)
        size, bytes = self._data._record_to_data(rec[1], rec[3], contents)
        return bytes
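The adapters above all follow the same shape. Each one decompresses and parses the stored record, transforms its content, then either re-serializes it or joins it into a fulltext. The sketch below shows that round trip for an annotated fulltext record, using only the standard `gzip` module. `record_to_bytes`, `bytes_to_record` and `fulltext_annotated_to_plain` are hypothetical helpers. They reuse the `version`/`end` envelope from the module docstring, but they are not byte-compatible with `_KnitData`, which uses bzrlib's own `tuned_gzip`.

```python
import gzip
import io


def record_to_bytes(version_id, digest, lines):
    """Wrap lines in a 'version ... / end ...' envelope and gzip them."""
    text = 'version %s %d %s\n' % (version_id, len(lines), digest)
    text += ''.join(lines)
    text += 'end %s\n' % (version_id,)
    buf = io.BytesIO()
    with gzip.GzipFile(mode='wb', fileobj=buf) as gz:
        gz.write(text.encode('utf-8'))
    return buf.getvalue()


def bytes_to_record(data):
    """Inverse of record_to_bytes: return (version_id, digest, lines)."""
    with gzip.GzipFile(mode='rb', fileobj=io.BytesIO(data)) as gz:
        lines = gz.read().decode('utf-8').splitlines(True)
    marker, version_id, count, digest = lines[0].split()
    count = int(count)
    if marker != 'version' or lines[1 + count].split() != ['end', version_id]:
        raise ValueError('malformed record for %s' % (version_id,))
    return version_id, digest, lines[1:1 + count]


def fulltext_annotated_to_plain(data):
    """Adapter sketch: drop the '<origin> ' prefix from every fulltext line."""
    version_id, digest, lines = bytes_to_record(data)
    plain = [line.split(' ', 1)[1] for line in lines]
    return record_to_bytes(version_id, digest, plain)


raw = record_to_bytes('rev-2', 'abc123', ['rev-1 hello\n', 'rev-2 world\n'])
plain = fulltext_annotated_to_plain(raw)
```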


class FTAnnotatedToFullText(KnitAdapter):
    """An adapter from FT annotated knits to full texts."""

    def get_bytes(self, factory, annotated_compressed_bytes):
        rec, contents = \
            self._data._parse_record_unchecked(annotated_compressed_bytes)
        content, delta = self._annotate_factory.parse_record(factory.key[0],
            contents, factory._build_details, None)
        return ''.join(content.text())


class DeltaAnnotatedToFullText(KnitAdapter):
    """An adapter for deltas from annotated knits to full texts."""

    def get_bytes(self, factory, annotated_compressed_bytes):
        rec, contents = \
            self._data._parse_record_unchecked(annotated_compressed_bytes)
        delta = self._annotate_factory.parse_line_delta(contents, rec[1],
            plain=True)
        compression_parent = factory.parents[0][0]
        basis_lines = self._basis_vf.get_lines(compression_parent)
        # Manually apply the delta because we have one annotated content and
        # one plain.
        basis_content = PlainKnitContent(basis_lines, compression_parent)
        basis_content.apply_delta(delta, rec[1])
        basis_content._should_strip_eol = factory._build_details[1]
        return ''.join(basis_content.text())


3350.3.5 by Robert Collins
Create adapters from plain compressed knit content.
class FTPlainToFullText(KnitAdapter):
    """An adapter from FT plain knits to full texts."""

    def get_bytes(self, factory, compressed_bytes):
        rec, contents = \
            self._data._parse_record_unchecked(compressed_bytes)
        content, delta = self._plain_factory.parse_record(factory.key[0],
            contents, factory._build_details, None)
        return ''.join(content.text())


class DeltaPlainToFullText(KnitAdapter):
    """An adapter for deltas from plain knits to full texts."""

    def get_bytes(self, factory, compressed_bytes):
        rec, contents = \
            self._data._parse_record_unchecked(compressed_bytes)
        delta = self._plain_factory.parse_line_delta(contents, rec[1])
        compression_parent = factory.parents[0][0]
        basis_lines = self._basis_vf.get_lines(compression_parent)
        basis_content = PlainKnitContent(basis_lines, compression_parent)
        # Both the basis and the delta are plain, so parse_record can apply
        # the delta to basis_content directly.
        content, _ = self._plain_factory.parse_record(rec[1], contents,
            factory._build_details, basis_content)
        return ''.join(content.text())


3350.3.3 by Robert Collins
Functional get_record_stream interface tests covering full interface.
class KnitContentFactory(ContentFactory):
    """Content factory for streaming from knits.

    :seealso: ContentFactory
    """

    def __init__(self, version, parents, build_details, sha1, raw_record,
        annotated, knit=None):
        """Create a KnitContentFactory for version.

        :param version: The version.
        :param parents: The parents.
        :param build_details: The build details as returned from
            get_build_details.
        :param sha1: The sha1 expected from the full text of this object.
        :param raw_record: The bytes of the knit data from disk.
        :param annotated: True if the raw data is annotated.
        :param knit: An optional knit used to rebuild the fulltext when
            get_bytes_as('fulltext') is requested.
        """
        ContentFactory.__init__(self)
        self.sha1 = sha1
        self.key = (version,)
        self.parents = tuple((parent,) for parent in parents)
        if build_details[0] == 'line-delta':
            kind = 'delta'
        else:
            kind = 'ft'
        if annotated:
            annotated_kind = 'annotated-'
        else:
            annotated_kind = ''
        self.storage_kind = 'knit-%s%s-gz' % (annotated_kind, kind)
        self._raw_record = raw_record
        self._build_details = build_details
        self._knit = knit

    def get_bytes_as(self, storage_kind):
        if storage_kind == self.storage_kind:
            return self._raw_record
        if storage_kind == 'fulltext' and self._knit is not None:
            return self._knit.get_text(self.key[0])
        else:
            raise errors.UnavailableRepresentation(self.key, storage_kind,
                self.storage_kind)
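`KnitContentFactory` derives `storage_kind` from two facts: whether the build method is `line-delta`, and whether the record is annotated. `get_bytes_as` returns the raw bytes only for that exact kind, and rebuilds a fulltext only when a knit is attached. The sketch below reproduces that naming rule and decision outside the class. `knit_storage_kind` and `representation_for` are hypothetical names, and `LookupError` stands in for `errors.UnavailableRepresentation`.

```python
def knit_storage_kind(build_method, annotated):
    """Mirror the storage_kind naming used by KnitContentFactory.__init__."""
    kind = 'delta' if build_method == 'line-delta' else 'ft'
    prefix = 'annotated-' if annotated else ''
    return 'knit-%s%s-gz' % (prefix, kind)


def representation_for(requested, native_kind, have_knit):
    """Decide how get_bytes_as would satisfy a request (sketch).

    Returns 'raw' when the native bytes match the request, 'rebuild' when a
    fulltext can be reconstructed from an attached knit, and raises otherwise.
    """
    if requested == native_kind:
        return 'raw'
    if requested == 'fulltext' and have_knit:
        return 'rebuild'
    raise LookupError('%s not available from %s' % (requested, native_kind))
```

For example, `knit_storage_kind('line-delta', True)` names an annotated delta record `'knit-annotated-delta-gz'`.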


1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
class KnitContent(object):
    """Content of a knit version to which deltas can be applied."""

3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
    def __init__(self):
        self._should_strip_eol = False

2921.2.1 by Robert Collins
* Knit text reconstruction now avoids making copies of the lines list for
    def apply_delta(self, delta, new_version_id):
2921.2.2 by Robert Collins
Review feedback.
        """Apply delta to this object to become new_version_id."""
2921.2.1 by Robert Collins
* Knit text reconstruction now avoids making copies of the lines list for
        raise NotImplementedError(self.apply_delta)

3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
    def cleanup_eol(self, copy_on_mutate=True):
        if self._should_strip_eol:
            if copy_on_mutate:
                self._lines = self._lines[:]
            self.strip_last_line_newline()

1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
    def line_delta_iter(self, new_lines):
1596.2.32 by Robert Collins
Reduce re-extraction of texts during weave to knit joins by providing a memoisation facility.
        """Generate line-based delta from this content to new_lines."""
2151.1.1 by John Arbash Meinel
(Dmitry Vasiliev) Tune KnitContent and add tests
        new_texts = new_lines.text()
        old_texts = self.text()
2781.1.1 by Martin Pool
merge cpatiencediff from Lukas
        s = patiencediff.PatienceSequenceMatcher(None, old_texts, new_texts)
2151.1.1 by John Arbash Meinel
(Dmitry Vasiliev) Tune KnitContent and add tests
        for tag, i1, i2, j1, j2 in s.get_opcodes():
            if tag == 'equal':
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
                continue
2151.1.1 by John Arbash Meinel
(Dmitry Vasiliev) Tune KnitContent and add tests
            # ofrom, oto, length, data
            yield i1, i2, j2 - j1, new_lines._lines[j1:j2]
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.

    def line_delta(self, new_lines):
        return list(self.line_delta_iter(new_lines))

2520.4.41 by Aaron Bentley
Accelerate mpdiff generation
    @staticmethod
2520.4.48 by Aaron Bentley
Support getting blocks from knit deltas with no final EOL
    def get_line_delta_blocks(knit_delta, source, target):
2520.4.41 by Aaron Bentley
Accelerate mpdiff generation
        """Extract SequenceMatcher.get_matching_blocks() from a knit delta."""
2520.4.48 by Aaron Bentley
Support getting blocks from knit deltas with no final EOL
        target_len = len(target)
2520.4.41 by Aaron Bentley
Accelerate mpdiff generation
        s_pos = 0
        t_pos = 0
        for s_begin, s_end, t_len, new_text in knit_delta:
2520.4.47 by Aaron Bentley
Fix get_line_delta_blocks with eol
            true_n = s_begin - s_pos
            n = true_n
2520.4.41 by Aaron Bentley
Accelerate mpdiff generation
            if n > 0:
2520.4.48 by Aaron Bentley
Support getting blocks from knit deltas with no final EOL
                # knit deltas do not provide reliable info about whether the
                # last line of a file matches, due to eol handling.
                if source[s_pos + n - 1] != target[t_pos + n - 1]:
2520.4.47 by Aaron Bentley
Fix get_line_delta_blocks with eol
                    n -= 1
                if n > 0:
                    yield s_pos, t_pos, n
            t_pos += t_len + true_n
2520.4.41 by Aaron Bentley
Accelerate mpdiff generation
            s_pos = s_end
2520.4.48 by Aaron Bentley
Support getting blocks from knit deltas with no final EOL
        n = target_len - t_pos
        if n > 0:
            if source[s_pos + n - 1] != target[t_pos + n - 1]:
                n -= 1
            if n > 0:
                yield s_pos, t_pos, n
2520.4.41 by Aaron Bentley
Accelerate mpdiff generation
        yield s_pos + (target_len - t_pos), target_len, 0
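`line_delta_iter` turns sequence-matcher opcodes into knit hunks of the form `(start, end, count, lines)`. The `apply_delta` methods replay those hunks while tracking how earlier hunks shift later positions. The sketch below reproduces both halves on plain lists of lines, with `difflib.SequenceMatcher` swapped in for bzrlib's `patiencediff`. The two matchers may choose different, equally valid hunks. `line_delta` and `apply_line_delta` are hypothetical standalone functions.

```python
import difflib


def line_delta(old_lines, new_lines):
    """Return knit-style hunks (start, end, count, lines) turning old into new."""
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines)
    return [(i1, i2, j2 - j1, new_lines[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != 'equal']


def apply_line_delta(lines, delta):
    """Apply hunks in order, in place, and return the list.

    Hunk positions refer to the original text, so each replacement shifts
    later positions by (start - end) + count.
    """
    offset = 0
    for start, end, count, delta_lines in delta:
        lines[offset + start:offset + end] = delta_lines
        offset += (start - end) + count
    return lines


old = ['a\n', 'b\n', 'c\n', 'd\n']
new = ['a\n', 'x\n', 'c\n', 'd\n', 'e\n']
delta = line_delta(old, new)
# delta == [(1, 2, 1, ['x\n']), (4, 4, 1, ['e\n'])]
rebuilt = apply_line_delta(list(old), delta)
```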

1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.

2794.1.2 by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts.
class AnnotatedKnitContent(KnitContent):
    """Annotated content."""

    def __init__(self, lines):
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
        KnitContent.__init__(self)
2794.1.2 by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts.
        self._lines = lines

3316.2.13 by Robert Collins
* ``VersionedFile.annotate_iter`` is deprecated. While in principal this
    def annotate(self):
        """Return a list of (origin, text) for each content line."""
        return list(self._lines)
2794.1.2 by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts.

2921.2.1 by Robert Collins
* Knit text reconstruction now avoids making copies of the lines list for
    def apply_delta(self, delta, new_version_id):
2921.2.2 by Robert Collins
Review feedback.
        """Apply delta to this object to become new_version_id."""
2921.2.1 by Robert Collins
* Knit text reconstruction now avoids making copies of the lines list for
        offset = 0
        lines = self._lines
        for start, end, count, delta_lines in delta:
            lines[offset+start:offset+end] = delta_lines
            offset = offset + (start - end) + count
2794.1.2 by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts.

    def strip_last_line_newline(self):
        line = self._lines[-1][1].rstrip('\n')
        self._lines[-1] = (self._lines[-1][0], line)
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
        self._should_strip_eol = False
2794.1.2 by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts.

    def text(self):
2911.1.1 by Martin Pool
Better messages when problems are detected inside a knit
        try:
3224.1.22 by John Arbash Meinel
Cleanup the extra debugging info, and some >80 char lines.
            lines = [text for origin, text in self._lines]
2911.1.1 by Martin Pool
Better messages when problems are detected inside a knit
        except ValueError, e:
            # most commonly (only?) caused by the internal form of the knit
            # missing annotation information because of a bug - see thread
            # around 20071015
            raise KnitCorrupt(self,
                "line in annotated knit missing annotation information: %s"
                % (e,))
2794.1.2 by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts.

3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
        if self._should_strip_eol:
3350.3.4 by Robert Collins
Finish adapters for annotated knits to unannotated knits and full texts.
            lines[-1] = lines[-1].rstrip('\n')
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
        return lines

2794.1.2 by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts.
    def copy(self):
        return AnnotatedKnitContent(self._lines[:])
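`AnnotatedKnitContent` stores `(origin, text)` pairs. Replaying a delta therefore keeps the origin of every untouched line and attributes replaced or inserted lines to the delta's version, which is what makes `annotate()` a simple copy. The sketch below shows that effect on plain tuples. `annotated_apply` and `annotated_text` are hypothetical helpers, and unlike the bzrlib method, `annotated_apply` works on a copy instead of mutating its input.

```python
def annotated_apply(pairs, delta):
    """Return a new list of (origin, text) pairs with delta applied."""
    lines = list(pairs)  # copy: the caller's base content stays unchanged
    offset = 0
    for start, end, count, delta_lines in delta:
        lines[offset + start:offset + end] = delta_lines
        offset += (start - end) + count
    return lines


def annotated_text(pairs, strip_eol=False):
    """Drop the origins, optionally removing the final newline like text()."""
    lines = [text for origin, text in pairs]
    if strip_eol and lines:
        lines[-1] = lines[-1].rstrip('\n')
    return lines


base = [('rev-1', 'a\n'), ('rev-1', 'b\n'), ('rev-1', 'c\n')]
delta = [(1, 2, 1, [('rev-2', 'B\n')]), (3, 3, 1, [('rev-2', 'd\n')])]
result = annotated_apply(base, delta)
# Untouched lines keep 'rev-1'; the replaced and appended lines carry 'rev-2'.
```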


class PlainKnitContent(KnitContent):
2794.1.3 by Robert Collins
Review feedback.
    """Unannotated content.

    When annotate[_iter] is called on this content, the same version is reported
    for all lines. Generally, annotate[_iter] is not useful on PlainKnitContent
    objects.
    """
2794.1.2 by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts.

    def __init__(self, lines, version_id):
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
        KnitContent.__init__(self)
2794.1.2 by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts.
        self._lines = lines
        self._version_id = version_id

3316.2.13 by Robert Collins
* ``VersionedFile.annotate_iter`` is deprecated. While in principal this
    def annotate(self):
        """Return a list of (origin, text) for each content line."""
        return [(self._version_id, line) for line in self._lines]
2794.1.2 by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts.

2921.2.1 by Robert Collins
* Knit text reconstruction now avoids making copies of the lines list for
    def apply_delta(self, delta, new_version_id):
2921.2.2 by Robert Collins
Review feedback.
        """Apply delta to this object to become new_version_id."""
2921.2.1 by Robert Collins
* Knit text reconstruction now avoids making copies of the lines list for
        offset = 0
        lines = self._lines
        for start, end, count, delta_lines in delta:
            lines[offset+start:offset+end] = delta_lines
            offset = offset + (start - end) + count
        self._version_id = new_version_id

2794.1.2 by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts.
    def copy(self):
        return PlainKnitContent(self._lines[:], self._version_id)

    def strip_last_line_newline(self):
        self._lines[-1] = self._lines[-1].rstrip('\n')
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
        self._should_strip_eol = False
2794.1.2 by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts.

    def text(self):
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
        lines = self._lines
        if self._should_strip_eol:
            lines = lines[:]
            lines[-1] = lines[-1].rstrip('\n')
        return lines
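Two behaviours of `PlainKnitContent` are easy to miss. First, `text()` copies the line list before stripping the final newline, so the stored lines that later deltas are applied to keep their newline. Second, `annotate()` can only attribute every line to the single version the content was last moved to. The sketch below checks both on `PlainContentSketch`, a hypothetical stand-in class that copies the relevant method bodies rather than importing bzrlib.

```python
class PlainContentSketch(object):
    """Hypothetical stand-in for PlainKnitContent (no bzrlib import)."""

    def __init__(self, lines, version_id):
        self._lines = lines
        self._version_id = version_id
        self._should_strip_eol = False

    def annotate(self):
        # Every line is attributed to the one version this content holds.
        return [(self._version_id, line) for line in self._lines]

    def apply_delta(self, delta, new_version_id):
        offset = 0
        for start, end, count, delta_lines in delta:
            self._lines[offset + start:offset + end] = delta_lines
            offset += (start - end) + count
        self._version_id = new_version_id

    def text(self):
        lines = self._lines
        if self._should_strip_eol:
            # Copy first so the stored lines keep their trailing newline.
            lines = lines[:]
            lines[-1] = lines[-1].rstrip('\n')
        return lines


content = PlainContentSketch(['x\n', 'y\n'], 'rev-1')
content.apply_delta([(1, 2, 1, ['z\n'])], 'rev-2')
content._should_strip_eol = True
```

After the delta, `annotate()` reports `'x\n'` as coming from `rev-2`, even though it was introduced by `rev-1`. This is why the docstring calls annotation on plain content generally not useful.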


class _KnitFactory(object):
    """Base class for common Factory functions."""

    def parse_record(self, version_id, record, record_details,
                     base_content, copy_base_content=True):
        """Parse a record into a full content object.

        :param version_id: The official version id for this content
        :param record: The data returned by read_records_iter()
        :param record_details: Details about the record returned by
            get_build_details
        :param base_content: If get_build_details returns a compression_parent,
            you must pass a base_content here, otherwise pass None
        :param copy_base_content: When building from the base_content, decide
            whether to copy it and return a new object, or to modify it in
            place.
        :return: (content, delta) A Content object and possibly a line-delta,
            delta may be None
        """
        method, noeol = record_details
        if method == 'line-delta':
            if copy_base_content:
                content = base_content.copy()
            else:
                content = base_content
            delta = self.parse_line_delta(record, version_id)
            content.apply_delta(delta, version_id)
        else:
            content = self.parse_fulltext(record, version_id)
            delta = None
        content._should_strip_eol = noeol
        return (content, delta)
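`parse_record` either rebuilds content from a base plus a line delta or parses a fulltext. `copy_base_content` decides whether the base object is copied or mutated in place. The sketch below models that decision on plain lists. `build_lines` is a hypothetical function that takes an already-parsed delta rather than raw record lines, and it returns the noeol flag instead of setting `_should_strip_eol`. It treats any method other than `'line-delta'` as a fulltext, as the `else` branch above does.

```python
def build_lines(method, noeol, record, base_lines=None, copy_base=True):
    """Return (lines, delta, strip_eol) in the spirit of parse_record.

    For method 'line-delta', record is a parsed delta and base_lines is the
    parent text; any other method treats record as the full list of lines.
    """
    if method == 'line-delta':
        # Copying protects the caller's base; not copying avoids the cost.
        lines = list(base_lines) if copy_base else base_lines
        offset = 0
        for start, end, count, delta_lines in record:
            lines[offset + start:offset + end] = delta_lines
            offset += (start - end) + count
        return lines, record, noeol
    return list(record), None, noeol


base = ['a\n', 'b\n']
delta = [(1, 2, 1, ['c\n'])]
copied, _, _ = build_lines('line-delta', False, delta, base)
# copied == ['a\n', 'c\n'] while base is still ['a\n', 'b\n']
shared, _, _ = build_lines('line-delta', False, delta, base, copy_base=False)
# shared is base, and base itself is now ['a\n', 'c\n']
```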
455
456
457
class KnitAnnotateFactory(_KnitFactory):
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
458
    """Factory for creating annotated Content objects."""
459
460
    annotated = True
461
2794.1.2 by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts.
462
    def make(self, lines, version_id):
463
        num_lines = len(lines)
464
        return AnnotatedKnitContent(zip([version_id] * num_lines, lines))
465
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
466
    def parse_fulltext(self, content, version_id):
1596.2.7 by Robert Collins
Remove the requirement for reannotation in knit joins.
467
        """Convert fulltext to internal representation
468
469
        fulltext content is of the format
470
        revid(utf8) plaintext\n
471
        internal representation is of the format:
472
        (revid, plaintext)
473
        """
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
474
        # TODO: jam 20070209 The tests expect this to be returned as tuples,
475
        #       but the code itself doesn't really depend on that.
476
        #       Figure out a way to not require the overhead of turning the
477
        #       list back into tuples.
478
        lines = [tuple(line.split(' ', 1)) for line in content]
2794.1.2 by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts.
479
        return AnnotatedKnitContent(lines)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
480
481
    def parse_line_delta_iter(self, lines):
2163.1.2 by John Arbash Meinel
Don't modify the list during parse_line_delta
482
        return iter(self.parse_line_delta(lines))
1628.1.2 by Robert Collins
More knit micro-optimisations.
483
2851.4.2 by Ian Clatworthy
use factory methods in annotated-to-plain conversion instead of duplicating format knowledge
484
    def parse_line_delta(self, lines, version_id, plain=False):
        """Convert a line-based delta into the internal representation.

        A line delta is a sequence of hunks. Each hunk is a header line of
        the form:
        start,end,count
        followed by count lines of the form:
        revid(utf8) text\n
        The internal representation is a list of
        (start, end, count, [1..count tuples (revid, text)])

        :param plain: If True, the lines are returned as a plain list
            without annotations, rather than as a list of (origin, content)
            tuples, i.e. (start, end, count, [1..count text])
        """
        result = []
        lines = iter(lines)
        next = lines.next

        cache = {}
        def cache_and_return(line):
            origin, text = line.split(' ', 1)
            return cache.setdefault(origin, origin), text

        # walk through the lines parsing.
        # Note that the plain test is explicitly pulled out of the
        # loop to minimise any performance impact
        if plain:
            for header in lines:
                start, end, count = [int(n) for n in header.split(',')]
                contents = [next().split(' ', 1)[1] for i in xrange(count)]
                result.append((start, end, count, contents))
        else:
            for header in lines:
                start, end, count = [int(n) for n in header.split(',')]
                contents = [tuple(next().split(' ', 1)) for i in xrange(count)]
                result.append((start, end, count, contents))
        return result

    def get_fulltext_content(self, lines):
        """Extract just the content lines from a fulltext."""
        return (line.split(' ', 1)[1] for line in lines)

    def get_linedelta_content(self, lines):
        """Extract just the content from a line delta.

        This doesn't return all of the extra information stored in a delta,
        only the actual content lines.
        """
        lines = iter(lines)
        next = lines.next
        for header in lines:
            header = header.split(',')
            count = int(header[2])
            for i in xrange(count):
                origin, text = next().split(' ', 1)
                yield text

    def lower_fulltext(self, content):
        """Convert a fulltext content record into a serializable form.

        See parse_fulltext, which this inverts.
        """
        # TODO: jam 20070209 We only do the caching thing to make sure that
        #       the origin is a valid utf-8 line, eventually we could remove it
        return ['%s %s' % (o, t) for o, t in content._lines]

    def lower_line_delta(self, delta):
        """Convert a delta into a serializable form.

        See parse_line_delta, which this inverts.
        """
        # TODO: jam 20070209 We only do the caching thing to make sure that
        #       the origin is a valid utf-8 line, eventually we could remove it
        out = []
        for start, end, c, lines in delta:
            out.append('%d,%d,%d\n' % (start, end, c))
            out.extend(origin + ' ' + text
                       for origin, text in lines)
        return out

    def annotate(self, knit, version_id):
        content = knit._get_content(version_id)
        return content.annotate()
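

# Illustrative sketch, not part of bzrlib: the two hypothetical helpers below
# show the annotated line-delta format that KnitAnnotateFactory.parse_line_delta
# reads and lower_line_delta writes.  They assume, as that code does, that
# every hunk is a "start,end,count\n" header followed by exactly count lines
# of the form "origin text", where origin is the revision id before the
# first space.  They are standalone so they can be tried without a knit.
def _sketch_parse_annotated_delta(lines):
    """Parse serialised hunks into [(start, end, count, [(origin, text)])]."""
    result = []
    it = iter(lines)
    for header in it:
        # int() tolerates the trailing newline left on the last field.
        start, end, count = [int(n) for n in header.split(',')]
        contents = [tuple(next(it).split(' ', 1)) for _ in range(count)]
        result.append((start, end, count, contents))
    return result


def _sketch_lower_annotated_delta(delta):
    """Serialise parsed hunks again; the inverse of the parser above."""
    out = []
    for start, end, count, contents in delta:
        out.append('%d,%d,%d\n' % (start, end, count))
        out.extend(origin + ' ' + text for origin, text in contents)
    return out

# Round trip, e.g.: ['1,2,1\n', 'rev-2 new\n'] parses to
# [(1, 2, 1, [('rev-2', 'new\n')])] and lowers back to the same list.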


class KnitPlainFactory(_KnitFactory):
    """Factory for creating plain Content objects."""

    annotated = False

    def make(self, lines, version_id):
        return PlainKnitContent(lines, version_id)

    def parse_fulltext(self, content, version_id):
        """This parses an unannotated fulltext.

        Note that this is not a no-op: the internal representation still
        associates the lines with a version id, which is simply the constant
        version_id passed in.
        """
        return self.make(content, version_id)

    def parse_line_delta_iter(self, lines, version_id):
        cur = 0
        num_lines = len(lines)
        while cur < num_lines:
            header = lines[cur]
            cur += 1
            start, end, c = [int(n) for n in header.split(',')]
            yield start, end, c, lines[cur:cur+c]
            cur += c

    def parse_line_delta(self, lines, version_id):
        return list(self.parse_line_delta_iter(lines, version_id))

    def get_fulltext_content(self, lines):
        """Extract just the content lines from a fulltext."""
        return iter(lines)

    def get_linedelta_content(self, lines):
        """Extract just the content from a line delta.

        This doesn't return all of the extra information stored in a delta,
        only the actual content lines.
        """
        lines = iter(lines)
        next = lines.next
        for header in lines:
            header = header.split(',')
            count = int(header[2])
            for i in xrange(count):
                yield next()

    def lower_fulltext(self, content):
        return content.text()

    def lower_line_delta(self, delta):
        out = []
        for start, end, c, lines in delta:
            out.append('%d,%d,%d\n' % (start, end, c))
            out.extend(lines)
        return out

    def annotate(self, knit, version_id):
        annotator = _KnitAnnotator(knit)
        return annotator.annotate(version_id)
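

# Illustrative sketch, not part of bzrlib: hypothetical helpers for the
# plain line-delta format written by KnitPlainFactory.lower_line_delta.  The
# hunk headers match the annotated format, but content lines carry no origin
# prefix.  The parser walks the list by index, as parse_line_delta_iter does.
# The apply step ASSUMES start and end are half-open line offsets into the
# parent text; this file does not state that, so treat it as an assumption.
def _sketch_parse_plain_delta(lines):
    """Return [(start, end, count, new_lines)] for a serialised plain delta."""
    result = []
    cur = 0
    while cur < len(lines):
        start, end, count = [int(n) for n in lines[cur].split(',')]
        cur += 1
        result.append((start, end, count, lines[cur:cur + count]))
        cur += count
    return result


def _sketch_apply_plain_delta(parent_lines, delta):
    """Rebuild a text by replacing parent_lines[start:end] for each hunk."""
    lines = list(parent_lines)
    offset = 0
    for start, end, count, new_lines in delta:
        lines[offset + start:offset + end] = new_lines
        # Offsets refer to the parent, so track how earlier hunks shifted
        # the lines that follow them.
        offset += count - (end - start)
    return lines

# e.g. applying ['0,1,2\n', 'x\n', 'y\n', '2,3,0\n'] to ['a\n', 'b\n', 'c\n']
# replaces 'a\n' with two lines and deletes 'c\n': ['x\n', 'y\n', 'b\n'].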


def make_empty_knit(transport, relpath):
    """Construct an empty knit at the specified location."""
    return make_file_knit(relpath, transport, access_mode='w',
        factory=KnitPlainFactory(), create=True)


def make_file_knit(name, transport, file_mode=None, access_mode='w',
    factory=None, delta=True, create=False, create_parent_dir=False,
    delay_create=False, dir_mode=None, get_scope=None):
    """Factory to create a KnitVersionedFile for a .knit/.kndx file pair."""
    if factory is None:
        factory = KnitAnnotateFactory()
    if get_scope is None:
        get_scope = lambda: None
    index = _KnitIndex(transport, name + INDEX_SUFFIX,
        access_mode, create=create, file_mode=file_mode,
        create_parent_dir=create_parent_dir, delay_create=delay_create,
        dir_mode=dir_mode, get_scope=get_scope)
    access = _KnitAccess(transport, name + DATA_SUFFIX, file_mode,
        dir_mode, ((create and not len(index)) and delay_create),
        create_parent_dir)
    return KnitVersionedFile(name, transport, factory=factory,
        delta=delta, create=create, delay_create=delay_create, index=index,
        access_method=access)


def get_suffixes():
    """Return the suffixes used by file based knits."""
    return [DATA_SUFFIX, INDEX_SUFFIX]
make_file_knit.get_suffixes = get_suffixes


class KnitVersionedFile(VersionedFile):
    """Weave-like structure with faster random access.

    A knit stores a number of texts and a summary of the relationships
    between them.  Texts are identified by a string version-id.  Texts
    are normally stored and retrieved as a series of lines, but can
    also be passed as single strings.

    Lines are stored with the trailing newline (if any) included, to
    avoid special cases for files with no final newline.  Lines are
    composed of 8-bit characters, not unicode.  The combination of
    these approaches should mean any 'binary' file can be safely
    stored and retrieved.
    """

    def __init__(self, relpath, transport, file_mode=None,
        factory=None, delta=True, create=False, create_parent_dir=False,
        delay_create=False, dir_mode=None, index=None, access_method=None):
        """Construct a knit at location specified by relpath.

        :param create: If not True, only open an existing knit.
        :param create_parent_dir: If True, create the parent directory if
            creating the file fails. (This is used for stores with
            hash-prefixes that may not exist yet.)
        :param delay_create: The calling code is aware that the knit won't
            actually be created until the first data is stored.
        :param index: An index to use for the knit. Required.
        :param access_method: The object used to read and write the raw
            knit data (e.g. a _KnitAccess). Required.
        """
        super(KnitVersionedFile, self).__init__()
        self.transport = transport
        self.filename = relpath
        self.factory = factory or KnitAnnotateFactory()
        self.delta = delta

        self._max_delta_chain = 200

        if None in (access_method, index):
            raise ValueError("No default access_method or index any more")
        self._index = index
        _access = access_method
        if create and not len(self) and not delay_create:
            _access.create()
        self._data = _KnitData(_access)

    def __repr__(self):
        return '%s(%s)' % (self.__class__.__name__,
                           self.transport.abspath(self.filename))

    def _check_should_delta(self, first_parents):
        """Iterate back through the parent listing, looking for a fulltext.

        This is used when we want to decide whether to add a delta or a new
        fulltext. It follows at most _max_delta_chain compression parents.
        When it finds a fulltext parent, it checks whether the total size of
        the deltas leading up to it is large enough to indicate that we want
        a new fulltext anyway.

        Return True if we should create a new delta (the fulltext is larger
        than the deltas stacked on top of it), False if we should store a
        new fulltext (the deltas are at least as large as the fulltext, or no
        fulltext was found within _max_delta_chain steps).
        """
        delta_size = 0
        fulltext_size = None
        delta_parents = first_parents
        for count in xrange(self._max_delta_chain):
            parent = delta_parents[0]
            method = self._index.get_method(parent)
            index, pos, size = self._index.get_position(parent)
            if method == 'fulltext':
                fulltext_size = size
                break
            delta_size += size
            delta_parents = self._index.get_parent_map([parent])[parent]
        else:
            # We couldn't find a fulltext, so we must create a new one
            return False

        return fulltext_size > delta_size

    def _check_write_ok(self):
        return self._index._check_write_ok()

    def _add_raw_records(self, records, data):
        """Add all the records 'records' with data pre-joined in 'data'.

        :param records: A list of tuples (version_id, options, parents, size).
        :param data: The data for the records. When it is written, the records
                     are adjusted to have pos pointing into data by the sum of
                     the preceding records' sizes.
        """
        # write all the data
        raw_record_sizes = [record[3] for record in records]
        positions = self._data.add_raw_records(raw_record_sizes, data)
        index_entries = []
        for (version_id, options, parents, _), access_memo in zip(
            records, positions):
            index_entries.append((version_id, options, access_memo, parents))
        self._index.add_versions(index_entries)

    def copy_to(self, name, transport):
        """See VersionedFile.copy_to()."""
        # copy the current index to a temp index to avoid racing with local
        # writes
        transport.put_file_non_atomic(name + INDEX_SUFFIX + '.tmp',
                self.transport.get(self._index._filename))
        # copy the data file
        f = self._data._open_file()
        try:
            transport.put_file(name + DATA_SUFFIX, f)
        finally:
            f.close()
        # move the copied index into place
        transport.move(name + INDEX_SUFFIX + '.tmp', name + INDEX_SUFFIX)

    def get_data_stream(self, required_versions):
        """Get a data stream for the specified versions.

        Versions may be returned in any order, not necessarily the order
        specified.  They are returned in a partial order by compression
        parent, so that the deltas can be applied as the data stream is
        inserted; however note that compression parents will not be sent
        unless they were specifically requested, as the client may already
        have them.

        :param required_versions: The exact set of versions to be extracted.
            Unlike some other knit methods, this is not used to generate a
            transitive closure, rather it is used precisely as given.

        :returns: format_signature, list of (version, options, length, parents),
            reader_callable.
        """
        required_version_set = frozenset(required_versions)
        version_index = {}
        # list of revisions that can just be sent without waiting for their
        # compression parent
        ready_to_send = []
        # map from revision to the children based on it
        deferred = {}
        # first, read all relevant index data, enough to sort into the right
        # order to return
        for version_id in required_versions:
            options = self._index.get_options(version_id)
            parents = self._index.get_parents_with_ghosts(version_id)
            index_memo = self._index.get_position(version_id)
            version_index[version_id] = (index_memo, options, parents)
            if ('line-delta' in options
                and parents[0] in required_version_set):
                # must wait until the parent has been sent
                deferred.setdefault(parents[0], []).append(version_id)
            else:
                # either a fulltext, or a delta whose parent the client did
                # not ask for and presumably already has
                ready_to_send.append(version_id)
        # build a list of results to return, plus instructions for data to
        # read from the file
        copy_queue_records = []
        temp_version_list = []
        while ready_to_send:
            # XXX: pushing and popping lists may be a bit inefficient
            version_id = ready_to_send.pop(0)
            (index_memo, options, parents) = version_index[version_id]
            copy_queue_records.append((version_id, index_memo))
            _, data_pos, data_size = index_memo
            temp_version_list.append((version_id, options, data_size,
                parents))
            if version_id in deferred:
                # now we can send all the children of this revision - we could
                # put them in anywhere, but we hope that sending them soon
                # after the fulltext will give good locality in the receiver
                ready_to_send[:0] = deferred.pop(version_id)
        if deferred:
            raise AssertionError("Still have compressed child versions waiting to be sent")
        # XXX: The stream format is such that we cannot stream it - we have to
        # know the length of all the data a priori.
        raw_datum = []
        result_version_list = []
        for (version_id, raw_data, _), \
            (version_id2, options, _, parents) in \
            izip(self._data.read_records_iter_raw(copy_queue_records),
                 temp_version_list):
            if version_id != version_id2:
                raise AssertionError('logic error, inconsistent results')
            raw_datum.append(raw_data)
            result_version_list.append(
                (version_id, options, len(raw_data), parents))
        # provide a callback to get data incrementally.
        pseudo_file = StringIO(''.join(raw_datum))
        def read(length):
            if length is None:
                return pseudo_file.read()
            else:
                return pseudo_file.read(length)
        return (self.get_format_signature(), result_version_list, read)

    def get_record_stream(self, versions, ordering, include_delta_closure):
856
        """Get a stream of records for versions.
857
858
        :param versions: The versions to include. Each version is a tuple
859
            (version,).
860
        :param ordering: Either 'unordered' or 'topological'. A topologically
861
            sorted stream has compression parents strictly before their
862
            children.
863
        :param include_delta_closure: If True then the closure across any
864
            compression parents will be included (in the opaque data).
865
        :return: An iterator of ContentFactory objects, each of which is only
            valid until the iterator is advanced.
        """
        if include_delta_closure:
            # Nb: what we should do is plan the data to stream to allow
            # reconstruction of all the texts without excessive buffering,
            # including re-sending common bases as needed. This makes the most
            # sense when we start serialising these streams though, so for now
            # we just fall back to individual text construction behind the
            # abstraction barrier.
            knit = self
        else:
            knit = None
        # We end up doing multiple index lookups here for parent details and
        # disk layout details - we need a unified API?
        parent_map = self.get_parent_map(versions)
        absent_versions = set(versions) - set(parent_map)
        if ordering == 'topological':
            present_versions = topo_sort(parent_map)
        else:
            # List comprehension to keep the requested order (as that seems
            # marginally useful, at least until we start doing IO optimising
            # here).
            present_versions = [version for version in versions if version in
                parent_map]
        position_map = self._get_components_positions(present_versions)
        records = [(version, position_map[version][1]) for version in
            present_versions]
        record_map = {}
        for version in absent_versions:
            yield AbsentContentFactory((version,))
        for version, raw_data, sha1 in \
                self._data.read_records_iter_raw(records):
            (record_details, index_memo, _) = position_map[version]
            yield KnitContentFactory(version, parent_map[version],
                record_details, sha1, raw_data, self.factory.annotated, knit)
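When `ordering == 'topological'`, the present versions are ordered with `topo_sort(parent_map)` so that every parent is yielded before its children. The following is a minimal sketch of one way to compute such an ordering. It assumes `parent_map` maps each version id to a tuple of parent ids. `topo_sort_sketch` is a hypothetical name, and the sketch uses Kahn's algorithm; bzrlib's own `topo_sort` may be implemented differently. Parents that are not keys of the map (ghosts) are ignored.

```python
from collections import deque


def topo_sort_sketch(parent_map):
    """Return the keys of parent_map ordered so parents precede children.

    Hypothetical helper (Kahn's algorithm). Parents that are not keys of
    parent_map, such as ghosts, are ignored.
    """
    # Count in-map parents per node and record the reverse (child) edges.
    pending = {}
    children = dict((node, []) for node in parent_map)
    for node, parents in parent_map.items():
        present = [p for p in parents if p in parent_map]
        pending[node] = len(present)
        for p in present:
            children[p].append(node)
    # Start from nodes with no in-map parents; sorted for determinism.
    ready = deque(sorted(n for n, count in pending.items() if count == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in children[node]:
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)
    if len(order) != len(parent_map):
        raise ValueError("cycle detected in parent_map")
    return order
```

A version whose only parents are ghosts is treated as a root, which matches the way `parent_map` can only order versions it actually knows about.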

    def _extract_blocks(self, version_id, source, target):
        if self._index.get_method(version_id) != 'line-delta':
            return None
        parent, sha1, noeol, delta = self.get_delta(version_id)
        return KnitContent.get_line_delta_blocks(delta, source, target)

    def get_delta(self, version_id):
        """Get a delta for constructing version_id from its first parent.

        :return: A (parent, sha1, noeol, delta) tuple. parent is the version
            the delta applies to, or None if version_id has no parents.
        """
        self.check_not_reserved_id(version_id)
        parents = self.get_parent_map([version_id])[version_id]
        if len(parents):
            parent = parents[0]
        else:
            parent = None
        index_memo = self._index.get_position(version_id)
        data, sha1 = self._data.read_records(((version_id, index_memo),))[version_id]
        noeol = 'no-eol' in self._index.get_options(version_id)
        if 'fulltext' == self._index.get_method(version_id):
            new_content = self.factory.parse_fulltext(data, version_id)
            if parent is not None:
                reference_content = self._get_content(parent)
                old_texts = reference_content.text()
            else:
                old_texts = []
            new_texts = new_content.text()
            delta_seq = patiencediff.PatienceSequenceMatcher(None, old_texts,
                                                             new_texts)
            return parent, sha1, noeol, self._make_line_delta(delta_seq, new_content)
        else:
            delta = self.factory.parse_line_delta(data, version_id)
            return parent, sha1, noeol, delta

    def get_format_signature(self):
        """See VersionedFile.get_format_signature()."""
        if self.factory.annotated:
            annotated_part = "annotated"
        else:
            annotated_part = "plain"
        return "knit-%s" % (annotated_part,)

    def get_sha1s(self, version_ids):
        """See VersionedFile.get_sha1s()."""
        record_map = self._get_record_map(version_ids)
        # record entry 2 is the 'digest'.
        return [record_map[v][2] for v in version_ids]

    def insert_data_stream(self, (format, data_list, reader_callable)):
        """Insert knit records from a data stream into this knit.

        If a version in the stream is already present in this knit, it will not
        be inserted a second time.  It will be checked for consistency with the
        stored version however, and may cause a KnitCorrupt error to be raised
        if the data in the stream disagrees with the already stored data.

        :seealso: get_data_stream
        """
        if format != self.get_format_signature():
            if 'knit' in debug.debug_flags:
                trace.mutter(
                    'incompatible format signature inserting to %r', self)
            source = self._knit_from_datastream(
                (format, data_list, reader_callable))
            stream = source.get_record_stream(source.versions(), 'unordered', False)
            self.insert_record_stream(stream)
            return

        for version_id, options, length, parents in data_list:
            if self.has_version(version_id):
                # First check: the list of parents.
                my_parents = self.get_parents_with_ghosts(version_id)
                if tuple(my_parents) != tuple(parents):
                    # XXX: KnitCorrupt is not quite the right exception here.
                    raise KnitCorrupt(
                        self.filename,
                        'parents list %r from data stream does not match '
                        'already recorded parents %r for %s'
                        % (parents, my_parents, version_id))

                # Also check the SHA-1 of the fulltext this content will
                # produce.
                raw_data = reader_callable(length)
                my_fulltext_sha1 = self.get_sha1s([version_id])[0]
                df, rec = self._data._parse_record_header(version_id, raw_data)
                stream_fulltext_sha1 = rec[3]
                if my_fulltext_sha1 != stream_fulltext_sha1:
                    # Actually, we don't know if it's this knit that's corrupt,
                    # or the data stream we're trying to insert.
                    raise KnitCorrupt(
                        self.filename, 'sha-1 does not match %s' % version_id)
            else:
                if 'line-delta' in options:
                    # Make sure that this knit record is actually useful: a
                    # line-delta is no use unless we have its parent.
                    # Fetching from a broken repository with this problem
                    # shouldn't break the target repository.
                    #
                    # See https://bugs.launchpad.net/bzr/+bug/164443
                    if not self._index.has_version(parents[0]):
                        raise KnitCorrupt(
                            self.filename,
                            'line-delta from stream '
                            'for version %s '
                            'references '
                            'missing parent %s\n'
                            'Try running "bzr check" '
                            'on the source repository, and "bzr reconcile" '
                            'if necessary.' %
                            (version_id, parents[0]))
                    if not self.delta:
                        # We received a line-delta record for a non-delta knit.
                        # Convert it to a fulltext.
                        gzip_bytes = reader_callable(length)
                        self._convert_line_delta_to_fulltext(
                            gzip_bytes, version_id, parents)
                        continue

                self._add_raw_records(
                    [(version_id, options, parents, length)],
                    reader_callable(length))
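The duplicate handling described in the `insert_data_stream` docstring can be modelled without any knit machinery. An already-present version is not stored again, but it must agree with the recorded parents and SHA-1. The following is a minimal sketch under these assumptions:

* A plain dict `store`, mapping version id to `(parents, sha1)`, stands in for the knit.
* `ValueError` stands in for `KnitCorrupt`.
* `insert_records_sketch` is a hypothetical helper.

The sketch omits the line-delta basis check and the fulltext conversion.

```python
def insert_records_sketch(store, records):
    """Insert (version_id, parents, sha1) records into store.

    Hypothetical helper: store maps version_id -> (parents, sha1).
    Versions already present are not re-inserted, but they must agree
    with the stored parents and SHA-1 (ValueError stands in for
    KnitCorrupt).
    """
    for version_id, parents, sha1 in records:
        if version_id in store:
            my_parents, my_sha1 = store[version_id]
            # First check: the list of parents.
            if tuple(my_parents) != tuple(parents):
                raise ValueError('parents list %r from data stream does not '
                                 'match already recorded parents %r for %s'
                                 % (parents, my_parents, version_id))
            # Then check the digest of the text the record would produce.
            if my_sha1 != sha1:
                raise ValueError('sha-1 does not match %s' % version_id)
            continue
        store[version_id] = (tuple(parents), sha1)
```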

    def _convert_line_delta_to_fulltext(self, gzip_bytes, version_id, parents):
        lines, sha1 = self._data._parse_record(version_id, gzip_bytes)
        delta = self.factory.parse_line_delta(lines, version_id)
        content = self.factory.make(self.get_lines(parents[0]), parents[0])
        content.apply_delta(delta, version_id)
        digest, len, content = self.add_lines(
            version_id, parents, content.text())
        if digest != sha1:
            raise errors.VersionedFileInvalidChecksum(version_id)
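`_convert_line_delta_to_fulltext` rebuilds the text by applying the delta to the parent's lines. It then compares the resulting digest with the one recorded in the stream. The following is a minimal sketch of that check on plain byte lines, under these assumptions:

* Hunks are `(start, end, count, lines)` tuples, as built by `_make_line_delta`.
* The digest is the hex SHA-1 of the concatenated lines.
* `apply_line_delta` and `convert_delta_to_fulltext` are hypothetical helpers.
* `ValueError` stands in for `VersionedFileInvalidChecksum`.

```python
import hashlib


def apply_line_delta(lines, delta):
    """Apply (start, end, count, new_lines) hunks to a copy of lines."""
    result = list(lines)
    offset = 0
    for start, end, count, new_lines in delta:
        # Hunk positions refer to the original text, so shift them by how
        # much earlier hunks grew or shrank the result.
        result[offset + start:offset + end] = new_lines
        offset += (start - end) + count
    return result


def convert_delta_to_fulltext(parent_lines, delta, expected_sha1):
    """Rebuild the fulltext and verify it against the recorded SHA-1."""
    lines = apply_line_delta(parent_lines, delta)
    # Assumed digest: hex SHA-1 of the joined lines.
    digest = hashlib.sha1(b''.join(lines)).hexdigest()
    if digest != expected_sha1:
        raise ValueError('invalid checksum for reconstructed text')
    return lines
```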

    def _knit_from_datastream(self, (format, data_list, reader_callable)):
        """Create a knit object from a data stream.

        This method exists to allow conversion of data streams that do not
        match the signature of this knit. Generally it will be slower and use
        more memory to use this method to insert data, but it will work.

        :seealso: get_data_stream for details on datastreams.
        :return: A knit versioned file which can be used to join the datastream
            into self.
        """
        if format == "knit-plain":
            factory = KnitPlainFactory()
        elif format == "knit-annotated":
            factory = KnitAnnotateFactory()
        else:
            raise errors.KnitDataStreamUnknown(format)
        index = _StreamIndex(data_list, self._index)
        access = _StreamAccess(reader_callable, index, self, factory)
        return KnitVersionedFile(self.filename, self.transport,
            factory=factory, index=index, access_method=access)
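This method understands only the two stream signatures that `get_format_signature` can produce. A signature therefore round-trips to a factory with the matching annotation style. The following is a minimal sketch of that correspondence under these assumptions:

* The stand-in factory classes are hypothetical, not bzrlib's `KnitPlainFactory` and `KnitAnnotateFactory`.
* `ValueError` stands in for `errors.KnitDataStreamUnknown`.

```python
class PlainFactorySketch(object):
    """Hypothetical stand-in for an unannotated knit factory."""
    annotated = False


class AnnotatedFactorySketch(object):
    """Hypothetical stand-in for an annotated knit factory."""
    annotated = True


def format_signature(factory):
    """Mirror get_format_signature: 'knit-annotated' or 'knit-plain'."""
    return "knit-%s" % ("annotated" if factory.annotated else "plain",)


def factory_for_signature(signature):
    """Mirror the dispatch in _knit_from_datastream."""
    if signature == "knit-plain":
        return PlainFactorySketch()
    elif signature == "knit-annotated":
        return AnnotatedFactorySketch()
    raise ValueError("unknown data stream format %r" % (signature,))
```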

    def insert_record_stream(self, stream):
        """Insert a record stream into this versioned file.

        :param stream: A stream of records to insert.
        :return: None
        :seealso: VersionedFile.get_record_stream
        """
        def get_adapter(adapter_key):
            try:
                return adapters[adapter_key]
            except KeyError:
                adapter_factory = adapter_registry.get(adapter_key)
                adapter = adapter_factory(self)
                adapters[adapter_key] = adapter
                return adapter
        if self.factory.annotated:
            # self is annotated, we need annotated knits to use directly.
            annotated = "annotated-"
            convertibles = []
        else:
            # self is not annotated, but we can strip annotations cheaply.
            annotated = ""
            convertibles = set(["knit-annotated-delta-gz",
                "knit-annotated-ft-gz"])
        # The set of types we can cheaply adapt without needing basis texts.
        native_types = set()
        native_types.add("knit-%sdelta-gz" % annotated)
        native_types.add("knit-%sft-gz" % annotated)
        knit_types = native_types.union(convertibles)
        adapters = {}
        # Buffer all index entries that we can't add immediately because their
        # basis parent is missing. We don't buffer all because generating
        # annotations may require access to some of the new records. However we
        # can't generate annotations from new deltas until their basis parent
        # is present anyway, so we get away with not needing an index that
        # includes the new keys.
        # key = basis_parent, value = index entry to add
        buffered_index_entries = {}
        for record in stream:
            # Raise an error when a record is missing.
            if record.storage_kind == 'absent':
                raise RevisionNotPresent([record.key[0]], self)
            # adapt to non-tuple interface
            parents = [parent[0] for parent in record.parents]
            if record.storage_kind in knit_types:
                if record.storage_kind not in native_types:
                    try:
                        adapter_key = (record.storage_kind, "knit-delta-gz")
                        adapter = get_adapter(adapter_key)
                    except KeyError:
                        adapter_key = (record.storage_kind, "knit-ft-gz")
                        adapter = get_adapter(adapter_key)
                    bytes = adapter.get_bytes(
                        record, record.get_bytes_as(record.storage_kind))
                else:
                    bytes = record.get_bytes_as(record.storage_kind)
                options = [record._build_details[0]]
                if record._build_details[1]:
                    options.append('no-eol')
                # Just blat it across.
                # Note: This does end up adding data on duplicate keys. As
                # modern repositories use atomic insertions this should not
                # lead to excessive growth in the event of interrupted fetches.
                # 'knit' repositories may suffer excessive growth, but as a
                # deprecated format this is tolerable. It can be fixed if
                # needed by having the kndx index support raise on a duplicate
                # add with identical parents and options.
                access_memo = self._data.add_raw_records([len(bytes)], bytes)[0]
                index_entry = (record.key[0], options, access_memo, parents)
                buffered = False
                if 'fulltext' not in options:
                    basis_parent = parents[0]
                    if not self.has_version(basis_parent):
                        pending = buffered_index_entries.setdefault(
                            basis_parent, [])
                        pending.append(index_entry)
                        buffered = True
                if not buffered:
                    self._index.add_versions([index_entry])
            elif record.storage_kind == 'fulltext':
                self.add_lines(record.key[0], parents,
                    split_lines(record.get_bytes_as('fulltext')))
            else:
                adapter_key = record.storage_kind, 'fulltext'
                adapter = get_adapter(adapter_key)
                lines = split_lines(adapter.get_bytes(
                    record, record.get_bytes_as(record.storage_kind)))
                try:
                    self.add_lines(record.key[0], parents, lines)
                except errors.RevisionAlreadyPresent:
                    pass
            # Add any records whose basis parent is now available.
            added_keys = [record.key[0]]
            while added_keys:
                key = added_keys.pop(0)
                if key in buffered_index_entries:
                    index_entries = buffered_index_entries[key]
                    self._index.add_versions(index_entries)
                    added_keys.extend(
                        [index_entry[0] for index_entry in index_entries])
                    del buffered_index_entries[key]
        # If there were any deltas which had a missing basis parent, error.
        if buffered_index_entries:
            raise errors.RevisionNotPresent(buffered_index_entries.keys()[0],
                self)
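`insert_record_stream` delays index entries for deltas whose basis parent has not arrived yet. When the parent does arrive, it flushes them, and that flush can cascade. Anything still waiting at the end of the stream is an error. The following is a minimal sketch of that bookkeeping, under these assumptions:

* Records reduce to hypothetical `(key, basis_parent)` pairs.
* A plain set stands in for the index.
* `KeyError` stands in for `RevisionNotPresent`.
* `insert_with_buffering` is not a bzrlib function.

In this sketch a flush is started from a key only after that key has itself been indexed.

```python
def insert_with_buffering(stream, present):
    """Index records whose basis parent may arrive later in the stream.

    Hypothetical helper. stream yields (key, basis_parent) pairs, where
    basis_parent is None for fulltexts; present is the set of keys already
    indexed. Returns the keys in the order they were indexed.
    """
    indexed = []
    buffered = {}  # basis_parent -> keys waiting for it
    for key, basis in stream:
        if basis is not None and basis not in present:
            # The delta is useless until its basis is indexed; park it.
            buffered.setdefault(basis, []).append(key)
            continue
        present.add(key)
        indexed.append(key)
        # Flush records waiting on this key, then records waiting on those.
        added = [key]
        while added:
            waiting = buffered.pop(added.pop(0), [])
            present.update(waiting)
            indexed.extend(waiting)
            added.extend(waiting)
    if buffered:
        # Some delta never saw its basis parent.
        raise KeyError(sorted(buffered)[0])
    return indexed
```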

    def versions(self):
        """See VersionedFile.versions."""
        if 'evil' in debug.debug_flags:
            trace.mutter_callsite(2, "versions scales with size of history")
        return self._index.get_versions()

    def has_version(self, version_id):
        """See VersionedFile.has_version."""
        if 'evil' in debug.debug_flags:
            trace.mutter_callsite(2, "has_version is a LBYL scenario")
        return self._index.has_version(version_id)

    __contains__ = has_version

    def _merge_annotations(self, content, parents, parent_texts={},
                           delta=None, annotated=None,
                           left_matching_blocks=None):
        """Merge annotations for content.  This is done by comparing
        the annotations based on changes to the text.

        If delta is true, a line delta against parents[0] is returned.
        """
        if left_matching_blocks is not None:
            delta_seq = diff._PrematchedMatcher(left_matching_blocks)
        else:
            delta_seq = None
        if annotated:
            for parent_id in parents:
                merge_content = self._get_content(parent_id, parent_texts)
                if (parent_id == parents[0] and delta_seq is not None):
                    seq = delta_seq
                else:
                    seq = patiencediff.PatienceSequenceMatcher(
                        None, merge_content.text(), content.text())
                for i, j, n in seq.get_matching_blocks():
                    if n == 0:
                        continue
                    # this appears to copy (origin, text) pairs across to the
                    # new content for any line that matches the last-checked
                    # parent.
                    content._lines[j:j+n] = merge_content._lines[i:i+n]
        if delta:
            if delta_seq is None:
                reference_content = self._get_content(parents[0], parent_texts)
                new_texts = content.text()
                old_texts = reference_content.text()
                delta_seq = patiencediff.PatienceSequenceMatcher(
                                                 None, old_texts, new_texts)
            return self._make_line_delta(delta_seq, content)
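In the annotated branch of `_merge_annotations`, each parent's `(origin, text)` pairs are copied onto the lines of the new content that match that parent. Later parents overwrite earlier ones where both match. The following is a minimal sketch of that copy on plain lists of pairs, under these assumptions:

* `difflib.SequenceMatcher` from the standard library stands in for `patiencediff.PatienceSequenceMatcher`, so matching blocks can differ on some inputs.
* `merge_annotations_sketch` is a hypothetical helper.

```python
import difflib


def merge_annotations_sketch(content, parent_contents):
    """Copy (origin, text) pairs from parents onto matching lines.

    Hypothetical helper: content and each parent are lists of
    (origin, text) pairs. content is updated in place and returned.
    """
    new_texts = [text for _, text in content]
    for parent in parent_contents:
        old_texts = [text for _, text in parent]
        # difflib stands in for patiencediff here.
        matcher = difflib.SequenceMatcher(None, old_texts, new_texts)
        for i, j, n in matcher.get_matching_blocks():
            if n == 0:
                continue
            # Matching lines inherit the parent's annotation.
            content[j:j + n] = parent[i:i + n]
    return content
```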

    def _make_line_delta(self, delta_seq, new_content):
        """Generate a line delta from delta_seq and new_content.

        Each hunk is a (start, end, count, lines) tuple: the count lines in
        lines replace lines start:end of the old text.
        """
        diff_hunks = []
        for op in delta_seq.get_opcodes():
            if op[0] == 'equal':
                continue
            diff_hunks.append((op[1], op[2], op[4]-op[3], new_content._lines[op[3]:op[4]]))
        return diff_hunks
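`_make_line_delta` keeps every non-`equal` opcode as a hunk that replaces old lines `start:end` with `count` new lines. The following is a minimal sketch of that conversion on plain lines (not annotated `_lines`), together with a matching apply step to show the hunks round-trip. It rests on these assumptions:

* `difflib.SequenceMatcher` stands in for `patiencediff.PatienceSequenceMatcher`.
* Both helper names are hypothetical.

```python
import difflib


def make_line_delta(old_lines, new_lines):
    """Return (start, end, count, lines) hunks turning old into new."""
    hunks = []
    # difflib stands in for patiencediff; opcodes are (tag, i1, i2, j1, j2).
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == 'equal':
            continue
        hunks.append((i1, i2, j2 - j1, new_lines[j1:j2]))
    return hunks


def apply_line_delta(lines, delta):
    """Apply hunks whose positions refer to the original lines."""
    result = list(lines)
    offset = 0
    for start, end, count, new in delta:
        result[offset + start:offset + end] = new
        offset += (start - end) + count
    return result
```

Because hunk positions index the old text, applying them in order needs a running offset. Unchanged texts produce an empty delta.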

    def _get_components_positions(self, version_ids):
        """Produce a map of position data for the components of versions.

        This data is intended to be used for retrieving the knit records.

        A dict of version_id to (record_details, index_memo, next) is
        returned.
        record_details is a (method, noeol) tuple; method is the way the
            referenced data should be applied.
        index_memo is the handle to pass to the data access to actually get the
            data.
        next is the build-parent of the version, or None for fulltexts.
        """
        component_data = {}
        pending_components = version_ids
        while pending_components:
            build_details = self._index.get_build_details(pending_components)
            current_components = set(pending_components)
            pending_components = set()
            for version_id, details in build_details.iteritems():
                (index_memo, compression_parent, parents,
                 record_details) = details
                method = record_details[0]
                if compression_parent is not None:
                    pending_components.add(compression_parent)
                component_data[version_id] = (record_details, index_memo,
                                              compression_parent)
            missing = current_components.difference(build_details)
            if missing:
                raise errors.RevisionNotPresent(missing.pop(), self.filename)
        return component_data
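`_get_components_positions` follows each version's compression-parent chain down to a fulltext. It issues one batched index query per round rather than one per version. The following is a minimal sketch of that loop, under these assumptions:

* A plain dict `build_details`, mapping version id to hypothetical `(index_memo, compression_parent)` pairs, stands in for the index.
* `KeyError` stands in for `RevisionNotPresent`.
* `components_positions_sketch` is a hypothetical name.

The real index also returns parents and record details; the sketch omits them.

```python
def components_positions_sketch(build_details, version_ids):
    """Collect each requested version and every compression parent it needs.

    Hypothetical helper: build_details maps version_id ->
    (index_memo, compression_parent), where compression_parent is None
    for fulltexts.
    """
    component_data = {}
    pending = set(version_ids)
    while pending:
        # One batched lookup per round, like get_build_details above.
        batch = dict((v, build_details[v]) for v in pending
                     if v in build_details)
        missing = pending.difference(batch)
        if missing:
            raise KeyError(missing.pop())
        pending = set()
        for version_id, (index_memo, compression_parent) in batch.items():
            if compression_parent is not None:
                # Walk one step further down the delta chain next round.
                pending.add(compression_parent)
            component_data[version_id] = (index_memo, compression_parent)
    return component_data
```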

    def _get_content(self, version_id, parent_texts={}):
        """Returns a content object that makes up the specified
        version."""
        cached_version = parent_texts.get(version_id, None)
        if cached_version is not None:
            if not self.has_version(version_id):
                raise RevisionNotPresent(version_id, self.filename)
            return cached_version

        text_map, contents_map = self._get_content_maps([version_id])
        return contents_map[version_id]
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1260
1261
    def _check_versions_present(self, version_ids):
1262
        """Check that all specified versions are present."""
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1263
        self._index.check_versions_present(version_ids)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1264
2794.1.1 by Robert Collins
Allow knits to be instructed not to add a text based on a sha, for commit.
1265
    def _add_lines_with_ghosts(self, version_id, parents, lines, parent_texts,
3287.5.2 by Robert Collins
Deprecate VersionedFile.get_parents, breaking pulling from a ghost containing knit or pack repository to weaves, which improves correctness and allows simplification of core code.
1266
        nostore_sha, random_id, check_content, left_matching_blocks):
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1267
        """See VersionedFile.add_lines_with_ghosts()."""
2805.6.7 by Robert Collins
Review feedback.
1268
        self._check_add(version_id, lines, random_id, check_content)
2805.6.2 by Robert Collins
General cleanup of KnitVersionedFile._add.
1269
        return self._add(version_id, lines, parents, self.delta,
3287.5.2 by Robert Collins
Deprecate VersionedFile.get_parents, breaking pulling from a ghost containing knit or pack repository to weaves, which improves correctness and allows simplification of core code.
1270
            parent_texts, left_matching_blocks, nostore_sha, random_id)
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1271
2520.4.140 by Aaron Bentley
Use matching blocks from mpdiff for knit delta creation
1272
    def _add_lines(self, version_id, parents, lines, parent_texts,
2805.6.7 by Robert Collins
Review feedback.
1273
        left_matching_blocks, nostore_sha, random_id, check_content):
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1274
        """See VersionedFile.add_lines."""
2805.6.7 by Robert Collins
Review feedback.
1275
        self._check_add(version_id, lines, random_id, check_content)
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1276
        self._check_versions_present(parents)
2520.4.140 by Aaron Bentley
Use matching blocks from mpdiff for knit delta creation
1277
        return self._add(version_id, lines[:], parents, self.delta,
2841.2.1 by Robert Collins
* Commit no longer checks for new text keys during insertion when the
1278
            parent_texts, left_matching_blocks, nostore_sha, random_id)
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1279
2805.6.7 by Robert Collins
Review feedback.
1280
    def _check_add(self, version_id, lines, random_id, check_content):
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1281
        """Check that version_id and lines are safe to add."""
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1282
        if contains_whitespace(version_id):
1668.5.1 by Olaf Conradi
Fix bug in knits when raising InvalidRevisionId without the required
1283
            raise InvalidRevisionId(version_id, self.filename)
2229.2.3 by Aaron Bentley
change reserved_id to is_reserved_id, add check_not_reserved for DRY
1284
        self.check_not_reserved_id(version_id)
2805.6.4 by Robert Collins
Don't check for existing versions when adding texts with random revision ids.
1285
        # Technically this could be avoided if we are happy to allow duplicate
1286
        # id insertion when other things than bzr core insert texts, but it
1287
        # seems useful for folk using the knit api directly to have some safety
1288
        # blanket that we can disable.
1289
        if not random_id and self.has_version(version_id):
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1290
            raise RevisionAlreadyPresent(version_id, self.filename)
2805.6.7 by Robert Collins
Review feedback.
1291
        if check_content:
1292
            self._check_lines_not_unicode(lines)
1293
            self._check_lines_are_lines(lines)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1294
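The checks in `_check_add` can be illustrated with a small standalone sketch. It is a sketch under stated assumptions, not the bzrlib code. The exception classes and `check_add` are hypothetical stand-ins. The "reserved ids end with ':'" rule is an assumption modelled on bzrlib's reserved-id convention. `existing_ids` replaces `self.has_version`.

```python
# Hypothetical stand-ins for the errors _check_add raises in bzrlib.
class InvalidVersionId(ValueError):
    pass


class VersionAlreadyPresent(ValueError):
    pass


def check_add(version_id, lines, existing_ids, random_id=False,
              check_content=True):
    """Sketch: reject a (version_id, lines) pair that is unsafe to store."""
    # Ids containing whitespace are rejected outright.
    if any(ch.isspace() for ch in version_id):
        raise InvalidVersionId(version_id)
    # Assumption: reserved ids are those ending in ':' (e.g. 'null:').
    if version_id.endswith(':'):
        raise InvalidVersionId(version_id)
    # A caller that guarantees a random id can skip the duplicate lookup.
    if not random_id and version_id in existing_ids:
        raise VersionAlreadyPresent(version_id)
    if check_content:
        for line in lines:
            # Texts are stored as byte strings; decoded text is refused.
            if not isinstance(line, bytes):
                raise TypeError('lines must be bytes: %r' % (line,))
            # A newline may only appear as the final byte of a line.
            if b'\n' in line[:-1]:
                raise ValueError('embedded newline in %r' % (line,))
```

The `random_id` shortcut mirrors the comment in `_check_add`. The duplicate check is a safety blanket that callers inserting freshly generated ids can disable.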
2520.4.140 by Aaron Bentley
Use matching blocks from mpdiff for knit delta creation
1295
    def _add(self, version_id, lines, parents, delta, parent_texts,
2841.2.1 by Robert Collins
* Commit no longer checks for new text keys during insertion when the
1296
        left_matching_blocks, nostore_sha, random_id):
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1297
        """Add a set of lines on top of version specified by parents.
1298
1299
        If delta is true, compress the text as a line-delta against
1300
        the first parent.
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1301
1302
        Any versions not present will be converted into ghosts.
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1303
        """
2850.1.1 by Robert Collins
* ``KnitVersionedFile.add*`` will no longer cache added records even when
1304
        # first thing, if the content is something we don't need to store, find
1305
        # that out.
1306
        line_bytes = ''.join(lines)
1307
        digest = sha_string(line_bytes)
1308
        if nostore_sha == digest:
1309
            raise errors.ExistingContent
1596.2.28 by Robert Collins
more knit profile based tuning.
1310
1596.2.10 by Robert Collins
Reviewer feedback on knit branches.
1311
        present_parents = []
1596.2.32 by Robert Collins
Reduce re-extraction of texts during weave to knit joins by providing a memoisation facility.
1312
        if parent_texts is None:
1313
            parent_texts = {}
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1314
        for parent in parents:
2805.6.2 by Robert Collins
General cleanup of KnitVersionedFile._add.
1315
            if self.has_version(parent):
1596.2.10 by Robert Collins
Reviewer feedback on knit branches.
1316
                present_parents.append(parent)
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1317
2805.6.2 by Robert Collins
General cleanup of KnitVersionedFile._add.
1318
        # can only compress against the leftmost present parent.
1319
        if (delta and
1320
            (len(present_parents) == 0 or
1321
             present_parents[0] != parents[0])):
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1322
            delta = False
1323
2850.1.1 by Robert Collins
* ``KnitVersionedFile.add*`` will no longer cache added records even when
1324
        text_length = len(line_bytes)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1325
        options = []
1326
        if lines:
1327
            if lines[-1][-1] != '\n':
2805.6.2 by Robert Collins
General cleanup of KnitVersionedFile._add.
1328
                # copy the contents of lines.
1329
                lines = lines[:]
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1330
                options.append('no-eol')
1331
                lines[-1] = lines[-1] + '\n'
2888.1.1 by Robert Collins
(robertc) Use prejoined content for knit storage when performing a full-text store of unannotated content. (Robert Collins)
1332
                line_bytes += '\n'
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1333
2805.6.2 by Robert Collins
General cleanup of KnitVersionedFile._add.
1334
        if delta:
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1335
            # To speed the extract of texts the delta chain is limited
1336
            # to a fixed number of deltas.  This should minimize both
1337
            # I/O and the time spent applying deltas.
2147.1.1 by John Arbash Meinel
Factor the common knit delta selection into a helper func, and allow the fulltext to be chosen based on cumulative delta size
1338
            delta = self._check_should_delta(present_parents)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1339
2794.1.2 by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts.
1340
        content = self.factory.make(lines, version_id)
1596.2.34 by Robert Collins
Optimise knit add to only diff once per parent, not once per parent + once for the delta generation.
1341
        if delta or (self.factory.annotated and len(present_parents) > 0):
2805.6.2 by Robert Collins
General cleanup of KnitVersionedFile._add.
1342
            # Merge annotations from parent texts if needed.
2794.1.2 by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts.
1343
            delta_hunks = self._merge_annotations(content, present_parents,
2520.4.140 by Aaron Bentley
Use matching blocks from mpdiff for knit delta creation
1344
                parent_texts, delta, self.factory.annotated,
1345
                left_matching_blocks)
1596.2.32 by Robert Collins
Reduce re-extraction of texts during weave to knit joins by providing a memoisation facility.
1346
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1347
        if delta:
1348
            options.append('line-delta')
1349
            store_lines = self.factory.lower_line_delta(delta_hunks)
2850.1.1 by Robert Collins
* ``KnitVersionedFile.add*`` will no longer cache added records even when
1350
            size, bytes = self._data._record_to_data(version_id, digest,
1351
                store_lines)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1352
        else:
1353
            options.append('fulltext')
2888.1.3 by Robert Collins
Review feedback.
1354
            # isinstance is slower and we have no hierarchy.
2888.1.1 by Robert Collins
(robertc) Use prejoined content for knit storage when performing a full-text store of unannotated content. (Robert Collins)
1355
            if self.factory.__class__ == KnitPlainFactory:
2888.1.3 by Robert Collins
Review feedback.
1356
                # Use the already-joined bytes, saving iteration time in
1357
                # _record_to_data.
2888.1.1 by Robert Collins
(robertc) Use prejoined content for knit storage when performing a full-text store of unannotated content. (Robert Collins)
1358
                size, bytes = self._data._record_to_data(version_id, digest,
1359
                    lines, [line_bytes])
1360
            else:
1361
                # get mixed annotation + content and feed it into the
1362
                # serialiser.
1363
                store_lines = self.factory.lower_fulltext(content)
1364
                size, bytes = self._data._record_to_data(version_id, digest,
1365
                    store_lines)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1366
2850.1.1 by Robert Collins
* ``KnitVersionedFile.add*`` will no longer cache added records even when
1367
        access_memo = self._data.add_raw_records([size], bytes)[0]
2841.2.1 by Robert Collins
* Commit no longer checks for new text keys during insertion when the
1368
        self._index.add_versions(
2850.1.1 by Robert Collins
* ``KnitVersionedFile.add*`` will no longer cache added records even when
1369
            ((version_id, options, access_memo, parents),),
1370
            random_id=random_id)
2794.1.2 by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts.
1371
        return digest, text_length, content
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1372
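The decisions `_add` makes before serialising can be sketched in isolation. The sketch covers the `nostore_sha` short-circuit, restricting deltas to the leftmost parent when that parent is present, and the `no-eol` normalisation. `prepare_text` is a hypothetical helper, and a plain `ValueError` stands in for `errors.ExistingContent`. The sketch omits the `_check_should_delta` chain-length heuristic, which can also turn a delta into a fulltext.

```python
import hashlib


def prepare_text(lines, parents, present, nostore_sha=None, delta=True):
    """Sketch: return (digest, options, stored_lines, use_delta)."""
    line_bytes = b''.join(lines)
    # The digest is taken over the text exactly as given, before any
    # trailing newline is added for storage.
    digest = hashlib.sha1(line_bytes).hexdigest()
    if nostore_sha == digest:
        raise ValueError('content already present')  # ExistingContent stand-in
    present_parents = [p for p in parents if p in present]
    # A line-delta is only possible against the leftmost parent, and only
    # if that parent is actually present (not a ghost).
    if delta and (not present_parents or present_parents[0] != parents[0]):
        delta = False
    options = []
    if lines and not lines[-1].endswith(b'\n'):
        # Store a terminating newline, but record that the text lacked one.
        lines = lines[:-1] + [lines[-1] + b'\n']
        options.append('no-eol')
    options.append('line-delta' if delta else 'fulltext')
    return digest, options, lines, delta
```

For example, when the first parent is a ghost, the text is stored as a fulltext even if a later parent is present.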
1563.2.19 by Robert Collins
stub out a check for knits.
1373
    def check(self, progress_bar=None):
1374
        """See VersionedFile.check()."""
3350.3.16 by Robert Collins
Add test that out of order insertion fails with a clean error/does not fail.
1375
        # This doesn't actually test extraction of everything, but that will
1376
        # impact 'bzr check' substantially, and needs to be integrated with
1377
        # care. However, it does check for the obvious problem of a delta with
1378
        # no basis.
1379
        versions = self.versions()
1380
        parent_map = self.get_parent_map(versions)
1381
        for version in versions:
3350.3.17 by Robert Collins
Prevent corrupt knits being created when a stream is interrupted with basis parents not present.
1382
            if self._index.get_method(version) != 'fulltext':
3350.3.16 by Robert Collins
Add test that out of order insertion fails with a clean error/does not fail.
1383
                compression_parent = parent_map[version][0]
1384
                if compression_parent not in parent_map:
1385
                    raise errors.KnitCorrupt(self,
1386
                        "Missing basis parent %s for %s" % (
1387
                        compression_parent, version))
1563.2.19 by Robert Collins
stub out a check for knits.
1388
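The consistency rule enforced by `check` is that every non-fulltext record must have its compression parent (its leftmost parent) present in the knit. The sketch below expresses that rule over plain dictionaries. `find_missing_bases` is hypothetical, and `methods` stands in for `self._index.get_method`. Unlike `check`, it collects all offending pairs instead of raising on the first.

```python
def find_missing_bases(methods, parent_map):
    """Sketch: list (version, basis) pairs whose delta basis is absent."""
    problems = []
    for version, method in sorted(methods.items()):
        if method != 'fulltext':
            # The basis of a delta is its leftmost parent.
            basis = parent_map[version][0]
            if basis not in parent_map:
                problems.append((version, basis))
    return problems
```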
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1389
    def get_lines(self, version_id):
1390
        """See VersionedFile.get_lines()."""
1756.2.8 by Aaron Bentley
Implement get_line_list, cleanups
1391
        return self.get_line_list([version_id])[0]
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1392
1756.3.12 by Aaron Bentley
Stuff all text-building data in record_map
1393
    def _get_record_map(self, version_ids):
1756.3.19 by Aaron Bentley
Documentation and cleanups
1394
        """Produce a dictionary of knit records.
1395
        
3224.1.17 by John Arbash Meinel
Clean up some variable ordering to make more sense.
1396
        :return: {version_id:(record, record_details, digest, next)}
1397
            record
1398
                data returned from read_records
1399
            record_details
1400
                opaque information to pass to parse_record
1401
            digest
1402
                SHA1 digest of the full text after all steps are done
1403
            next
1404
                build-parent of the version, i.e. the leftmost ancestor.
1405
                Will be None if the record is not a delta.
1756.3.19 by Aaron Bentley
Documentation and cleanups
1406
        """
1756.3.12 by Aaron Bentley
Stuff all text-building data in record_map
1407
        position_map = self._get_components_positions(version_ids)
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
1408
        # c = component_id, r = record_details, i_m = index_memo, n = next
1409
        records = [(c, i_m) for c, (r, i_m, n)
3224.1.8 by John Arbash Meinel
Add noeol to the return signature of get_build_details.
1410
                             in position_map.iteritems()]
1756.3.12 by Aaron Bentley
Stuff all text-building data in record_map
1411
        record_map = {}
3224.1.17 by John Arbash Meinel
Clean up some variable ordering to make more sense.
1412
        for component_id, record, digest in \
1863.1.9 by John Arbash Meinel
Switching to have 'read_records_iter' return in random order.
1413
                self._data.read_records_iter(records):
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
1414
            (record_details, index_memo, next) = position_map[component_id]
3224.1.17 by John Arbash Meinel
Clean up some variable ordering to make more sense.
1415
            record_map[component_id] = record, record_details, digest, next
3224.1.13 by John Arbash Meinel
Revert the _get_component_positions api
1416
1756.3.10 by Aaron Bentley
Optimize selection and retrieval of records
1417
        return record_map
1756.2.5 by Aaron Bentley
Reduced read_records calls to 1
1418
1756.2.7 by Aaron Bentley
Implement get_text in terms of get_texts
1419
    def get_text(self, version_id):
1420
        """See VersionedFile.get_text"""
1421
        return self.get_texts([version_id])[0]
1422
1756.2.1 by Aaron Bentley
Implement get_texts
1423
    def get_texts(self, version_ids):
1756.2.8 by Aaron Bentley
Implement get_line_list, cleanups
1424
        return [''.join(l) for l in self.get_line_list(version_ids)]
1425
1426
    def get_line_list(self, version_ids):
1756.2.1 by Aaron Bentley
Implement get_texts
1427
        """Return the texts of listed versions as a list of strings."""
2229.2.1 by Aaron Bentley
Reject reserved ids in versiondfile, tree, branch and repository
1428
        for version_id in version_ids:
2229.2.3 by Aaron Bentley
change reserved_id to is_reserved_id, add check_not_reserved for DRY
1429
            self.check_not_reserved_id(version_id)
1756.3.13 by Aaron Bentley
Refactor get_line_list into _get_content
1430
        text_map, content_map = self._get_content_maps(version_ids)
1431
        return [text_map[v] for v in version_ids]
1432
2520.4.90 by Aaron Bentley
Handle \r terminated lines in Weaves properly
1433
    _get_lf_split_line_list = get_line_list
2520.4.3 by Aaron Bentley
Implement plain strategy for extracting and installing multiparent diffs
1434
1756.3.13 by Aaron Bentley
Refactor get_line_list into _get_content
1435
    def _get_content_maps(self, version_ids):
1756.3.19 by Aaron Bentley
Documentation and cleanups
1436
        """Produce maps of text and KnitContents
1437
        
1438
        :return: (text_map, content_map) where text_map contains the texts for
1439
        the requested versions and content_map contains the KnitContents.
1756.3.22 by Aaron Bentley
Tweaks from review
1440
        Both dicts take version_ids as their keys.
1756.3.19 by Aaron Bentley
Documentation and cleanups
1441
        """
2921.2.1 by Robert Collins
* Knit text reconstruction now avoids making copies of the lines list for
1442
        # FUTURE: This function could be improved for the 'extract many' case
1443
        # by tracking each component and only doing the copy when the number of
1444
        # children that need to apply deltas to it is > 1 or it is part of the
1445
        # final output.
1446
        version_ids = list(version_ids)
1447
        multiple_versions = len(version_ids) != 1
1756.3.12 by Aaron Bentley
Stuff all text-building data in record_map
1448
        record_map = self._get_record_map(version_ids)
1756.2.5 by Aaron Bentley
Reduced read_records calls to 1
1449
1756.2.8 by Aaron Bentley
Implement get_line_list, cleanups
1450
        text_map = {}
1756.3.7 by Aaron Bentley
Avoid re-parsing texts version components
1451
        content_map = {}
1756.3.14 by Aaron Bentley
Handle the intermediate and final representations of no-final-eol texts
1452
        final_content = {}
1756.3.10 by Aaron Bentley
Optimize selection and retrieval of records
1453
        for version_id in version_ids:
1454
            components = []
1455
            cursor = version_id
1456
            while cursor is not None:
3224.1.17 by John Arbash Meinel
Clean up some variable ordering to make more sense.
1457
                record, record_details, digest, next = record_map[cursor]
1458
                components.append((cursor, record, record_details, digest))
1756.3.10 by Aaron Bentley
Optimize selection and retrieval of records
1459
                if cursor in content_map:
1460
                    break
1461
                cursor = next
1462
1756.2.1 by Aaron Bentley
Implement get_texts
1463
            content = None
3224.1.17 by John Arbash Meinel
Clean up some variable ordering to make more sense.
1464
            for (component_id, record, record_details,
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
1465
                 digest) in reversed(components):
1756.3.7 by Aaron Bentley
Avoid re-parsing texts version components
1466
                if component_id in content_map:
1467
                    content = content_map[component_id]
1756.3.8 by Aaron Bentley
Avoid unused calls, use generators, sets instead of lists
1468
                else:
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
1469
                    content, delta = self.factory.parse_record(version_id,
3224.1.17 by John Arbash Meinel
Clean up some variable ordering to make more sense.
1470
                        record, record_details, content,
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
1471
                        copy_base_content=multiple_versions)
2921.2.1 by Robert Collins
* Knit text reconstruction now avoids making copies of the lines list for
1472
                    if multiple_versions:
1473
                        content_map[component_id] = content
1756.2.1 by Aaron Bentley
Implement get_texts
1474
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
1475
            content.cleanup_eol(copy_on_mutate=multiple_versions)
1756.3.14 by Aaron Bentley
Handle the intermediate and final representations of no-final-eol texts
1476
            final_content[version_id] = content
1756.2.1 by Aaron Bentley
Implement get_texts
1477
1478
            # digest here is the digest from the last applied component.
1756.3.6 by Aaron Bentley
More multi-text extraction
1479
            text = content.text()
2911.1.1 by Martin Pool
Better messages when problems are detected inside a knit
1480
            actual_sha = sha_strings(text)
1481
            if actual_sha != digest:
2794.1.2 by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts.
1482
                raise KnitCorrupt(self.filename,
2911.1.1 by Martin Pool
Better messages when problems are detected inside a knit
1483
                    '\n  sha-1 %s'
1484
                    '\n  of reconstructed text does not match'
1485
                    '\n  expected %s'
1486
                    '\n  for version %s' %
1487
                    (actual_sha, digest, version_id))
2794.1.2 by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts.
1488
            text_map[version_id] = text
1489
        return text_map, final_content
1756.2.1 by Aaron Bentley
Implement get_texts
1490
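`_get_content_maps` follows each version's `next` pointer back to a fulltext, applies the collected components oldest-first, and then verifies the SHA-1 of the result. The sketch below shows that walk over an in-memory record dictionary. `reconstruct`, `apply_line_delta`, and the `(method, payload, digest, basis)` record shape are hypothetical simplifications of the record map. Each hunk is `(start, end, count, new_lines)` and replaces basis lines `[start:end]`. The running offset accounts for earlier hunks changing the length.

```python
import hashlib


def apply_line_delta(lines, hunks):
    """Apply (start, end, count, new_lines) hunks to a copy of lines."""
    lines = list(lines)
    offset = 0
    for start, end, count, new_lines in hunks:
        # Hunk coordinates refer to the basis text; shift by prior edits.
        lines[offset + start:offset + end] = new_lines
        offset += (start - end) + count
    return lines


def reconstruct(version_id, records):
    """Sketch: rebuild a text by walking its delta chain to a fulltext."""
    chain = []
    cursor = version_id
    while cursor is not None:
        method, payload, digest, basis = records[cursor]
        chain.append((method, payload))
        cursor = basis  # None once a fulltext is reached
    expected = records[version_id][2]
    lines = None
    for method, payload in reversed(chain):  # oldest component first
        if method == 'fulltext':
            lines = list(payload)
        else:
            lines = apply_line_delta(lines, payload)
    actual = hashlib.sha1(b''.join(lines)).hexdigest()
    if actual != expected:
        raise ValueError('sha-1 %s of reconstructed text does not match '
                         'expected %s for version %s'
                         % (actual, expected, version_id))
    return lines
```

The real method also caches intermediate contents when several versions are requested, so shared chain prefixes are parsed only once. This sketch leaves that out.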
2039.1.1 by Aaron Bentley
Clean up progress properly when interrupted during fetch (#54000)
1491
    def iter_lines_added_or_present_in_versions(self, version_ids=None, 
1492
                                                pb=None):
1594.2.6 by Robert Collins
Introduce a api specifically for looking at lines in some versions of the inventory, for fileid_involved.
1493
        """See VersionedFile.iter_lines_added_or_present_in_versions()."""
1494
        if version_ids is None:
1495
            version_ids = self.versions()
2039.1.1 by Aaron Bentley
Clean up progress properly when interrupted during fetch (#54000)
1496
        if pb is None:
1497
            pb = progress.DummyProgress()
1759.2.2 by Jelmer Vernooij
Revert some of my spelling fixes and fix some typos after review by Aaron.
1498
        # we don't care about inclusions, the caller cares.
1594.2.6 by Robert Collins
Introduce a api specifically for looking at lines in some versions of the inventory, for fileid_involved.
1499
        # but we need to set up a list of records to visit.
1500
        # we need version_id, position, length
1501
        version_id_records = []
2163.1.1 by John Arbash Meinel
Use a set to make iter_lines_added_or_present *much* faster
1502
        requested_versions = set(version_ids)
1594.3.1 by Robert Collins
Merge transaction finalisation and ensure iter_lines_added_or_present in knits does a old-to-new read in the knit.
1503
        # filter for available versions
2698.2.4 by Robert Collins
Remove full history scan during iter_lines_added_or_present in KnitVersionedFile.
1504
        for version_id in requested_versions:
1594.2.6 by Robert Collins
Introduce a api specifically for looking at lines in some versions of the inventory, for fileid_involved.
1505
            if not self.has_version(version_id):
1506
                raise RevisionNotPresent(version_id, self.filename)
1594.3.1 by Robert Collins
Merge transaction finalisation and ensure iter_lines_added_or_present in knits does a old-to-new read in the knit.
1507
        # get an in-component-order queue:
1508
        for version_id in self.versions():
1509
            if version_id in requested_versions:
2592.3.71 by Robert Collins
Basic version of knit-based repository operating, many tests failing.
1510
                index_memo = self._index.get_position(version_id)
1511
                version_id_records.append((version_id, index_memo))
1594.3.1 by Robert Collins
Merge transaction finalisation and ensure iter_lines_added_or_present in knits does a old-to-new read in the knit.
1512
1594.2.17 by Robert Collins
Better readv coalescing, now with test, and progress during knit index reading.
1513
        total = len(version_id_records)
2147.1.3 by John Arbash Meinel
In knit.py we were re-using a variable in 2 loops, causing bogus progress messages to be generated.
1514
        for version_idx, (version_id, data, sha_value) in \
1515
            enumerate(self._data.read_records_iter(version_id_records)):
1516
            pb.update('Walking content.', version_idx, total)
2039.1.1 by Aaron Bentley
Clean up progress properly when interrupted during fetch (#54000)
1517
            method = self._index.get_method(version_id)
1518
            if method == 'fulltext':
2163.1.7 by John Arbash Meinel
Switch the line iterator as suggested by Aaron Bentley
1519
                line_iterator = self.factory.get_fulltext_content(data)
3376.2.4 by Martin Pool
Remove every assert statement from bzrlib!
1520
            elif method == 'line-delta':
2163.1.7 by John Arbash Meinel
Switch the line iterator as suggested by Aaron Bentley
1521
                line_iterator = self.factory.get_linedelta_content(data)
3376.2.4 by Martin Pool
Remove every assert statement from bzrlib!
1522
            else:
1523
                raise ValueError('invalid method %r' % (method,))
2975.3.1 by Robert Collins
Change (without backwards compatibility) the
1524
            # XXX: It might be more efficient to yield (version_id,
1525
            # line_iterator) in the future. However for now, this is a simpler
1526
            # change to integrate into the rest of the codebase. RBC 20071110
2163.1.7 by John Arbash Meinel
Switch the line iterator as suggested by Aaron Bentley
1527
            for line in line_iterator:
2975.3.1 by Robert Collins
Change (without backwards compatibility) the
1528
                yield line, version_id
2163.1.7 by John Arbash Meinel
Switch the line iterator as suggested by Aaron Bentley
1529
2039.1.1 by Aaron Bentley
Clean up progress properly when interrupted during fetch (#54000)
1530
        pb.update('Walking content.', total, total)
1594.2.6 by Robert Collins
Introduce a api specifically for looking at lines in some versions of the inventory, for fileid_involved.
1531
        
1563.2.18 by Robert Collins
get knit repositories really using knits for text storage.
1532
    def num_versions(self):
1533
        """See VersionedFile.num_versions()."""
1534
        return self._index.num_versions()
1535
1536
    __len__ = num_versions
1537
3316.2.13 by Robert Collins
* ``VersionedFile.annotate_iter`` is deprecated. While in principal this
1538
    def annotate(self, version_id):
1539
        """See VersionedFile.annotate."""
1540
        return self.factory.annotate(self, version_id)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1541
3287.5.1 by Robert Collins
Add VersionedFile.get_parent_map.
1542
    def get_parent_map(self, version_ids):
1543
        """See VersionedFile.get_parent_map."""
3287.5.5 by Robert Collins
Refactor internals of knit implementations to implement get_parents_with_ghosts in terms of get_parent_map.
1544
        return self._index.get_parent_map(version_ids)
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1545
2530.1.1 by Aaron Bentley
Make topological sorting optional for get_ancestry
1546
    def get_ancestry(self, versions, topo_sorted=True):
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1547
        """See VersionedFile.get_ancestry."""
1548
        if isinstance(versions, basestring):
1549
            versions = [versions]
1550
        if not versions:
1551
            return []
2530.1.1 by Aaron Bentley
Make topological sorting optional for get_ancestry
1552
        return self._index.get_ancestry(versions, topo_sorted)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1553
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1554
    def get_ancestry_with_ghosts(self, versions):
1555
        """See VersionedFile.get_ancestry_with_ghosts."""
1556
        if isinstance(versions, basestring):
1557
            versions = [versions]
1558
        if not versions:
1559
            return []
1560
        return self._index.get_ancestry_with_ghosts(versions)
1561
1664.2.3 by Aaron Bentley
Add failing test case
1562
    def plan_merge(self, ver_a, ver_b):
1664.2.11 by Aaron Bentley
Clarifications from merge review
1563
        """See VersionedFile.plan_merge."""
2490.2.33 by Aaron Bentley
Disable topological sorting of get_ancestry where sensible
1564
        ancestors_b = set(self.get_ancestry(ver_b, topo_sorted=False))
1565
        ancestors_a = set(self.get_ancestry(ver_a, topo_sorted=False))
1664.2.4 by Aaron Bentley
Identify unchanged lines correctly
1566
        annotated_a = self.annotate(ver_a)
1567
        annotated_b = self.annotate(ver_b)
1551.15.46 by Aaron Bentley
Move plan merge to tree
1568
        return merge._plan_annotate_merge(annotated_a, annotated_b,
1569
                                          ancestors_a, ancestors_b)
1664.2.4 by Aaron Bentley
Identify unchanged lines correctly
1570
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1571
1572
class _KnitComponentFile(object):
1573
    """One of the files used to implement a knit database"""
1574
1946.2.1 by John Arbash Meinel
2 changes to knits. Delay creating the .knit or .kndx file until we have actually tried to write data. Because of this, we must allow the Knit to create the prefix directories
1575
    def __init__(self, transport, filename, mode, file_mode=None,
1946.2.12 by John Arbash Meinel
Add ability to pass a directory mode to non_atomic_put
1576
                 create_parent_dir=False, dir_mode=None):
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1577
        self._transport = transport
1578
        self._filename = filename
1579
        self._mode = mode
1946.2.3 by John Arbash Meinel
Pass around the file mode correctly
1580
        self._file_mode = file_mode
1946.2.12 by John Arbash Meinel
Add ability to pass a directory mode to non_atomic_put
1581
        self._dir_mode = dir_mode
1946.2.1 by John Arbash Meinel
2 changes to knits. Delay creating the .knit or .kndx file until we have actually tried to write data. Because of this, we must allow the Knit to create the prefix directories
1582
        self._create_parent_dir = create_parent_dir
1583
        self._need_to_create = False
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1584
2196.2.5 by John Arbash Meinel
Add an exception class when the knit index storage method is unknown, and properly test for it
1585
    def _full_path(self):
1586
        """Return the full path to this file."""
1587
        return self._transport.base + self._filename
1588
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1589
    def check_header(self, fp):
1641.1.2 by Robert Collins
Change knit index files to be robust in the presence of partial writes.
1590
        line = fp.readline()
2171.1.1 by John Arbash Meinel
Knit index files should ignore empty indexes rather than consider them corrupt.
1591
        if line == '':
1592
            # An empty file can actually be treated as though the file doesn't
1593
            # exist yet.
2196.2.5 by John Arbash Meinel
Add an exception class when the knit index storage method is unknown, and properly test for it
1594
            raise errors.NoSuchFile(self._full_path())
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1595
        if line != self.HEADER:
2171.1.1 by John Arbash Meinel
Knit index files should ignore empty indexes rather than consider them corrupt.
1596
            raise KnitHeaderError(badline=line,
1597
                              filename=self._transport.abspath(self._filename))
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1598
1599
    def __repr__(self):
1600
        return '%s(%s)' % (self.__class__.__name__, self._filename)
1601
1602
1603
class _KnitIndex(_KnitComponentFile):
1604
    """Manages a knit index file.
1605
1606
    The index is kept in memory and read on startup to enable
1607
    fast lookups of revision information.  The cursor of the index
1608
    file is always pointing to the end, making it easy to append
1609
    entries.
1610
1611
    _cache is a cache for fast mapping from version id to an index entry
1612
    tuple: (version_id, options, pos, size, parents, index).
1613
1614
    _history is a cache for fast mapping from indexes to version ids.
1615
1616
    The index data format is dictionary compressed when it comes to
1617
    parent references; an index entry may only have parents with a
1618
    lower index number.  As a result, the index is topologically sorted.
1563.2.11 by Robert Collins
Consolidate reweave and join as we have no separate usage, make reweave tests apply to all versionedfile implementations and deprecate the old reweave apis.
1619
1620
    Duplicate entries may be written to the index for a single version id
1621
    if this is done then the latter one completely replaces the former:
1622
    this allows updates to correct version and parent information. 
1623
    Note that the two entries may share the delta, and that successive
1624
    annotations and references MUST point to the first entry.
1641.1.2 by Robert Collins
Change knit index files to be robust in the presence of partial writes.
1625
1626
    The index file on disc contains a header, followed by one line per knit
1627
    record. The same revision can be present in an index file more than once.
1759.2.1 by Jelmer Vernooij
Fix some types (found using aspell).
1628
    The first occurrence gets assigned a sequence number starting from 0. 
1641.1.2 by Robert Collins
Change knit index files to be robust in the presence of partial writes.
1629
    
1630
    The format of a single line is
1631
    REVISION_ID FLAGS BYTE_OFFSET LENGTH( PARENT_ID|PARENT_SEQUENCE_ID)* :\n
1632
    REVISION_ID is a utf8-encoded revision id
1633
    FLAGS is a comma-separated list of flags about the record. Values include
1634
        no-eol, line-delta, fulltext.
1635
    BYTE_OFFSET is the ascii representation of the byte offset in the data file
1636
        at which the compressed data starts.
1637
    LENGTH is the ascii representation of the record's length in the data file.
1638
    PARENT_ID is a utf-8 revision id prefixed by a '.' that is a parent of
1639
        REVISION_ID.
1640
    PARENT_SEQUENCE_ID is the ascii representation of the sequence number of a
1641
        revision id already in the knit that is a parent of REVISION_ID.
1642
    The ' :' marker is the end of record marker.
1643
    
1644
    partial writes:
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1645
    When a write to the index file is interrupted, it will result in a line
1646
    that does not end in ' :'. If the ' :' is not present at the end of a line,
1647
    or at the end of the file, then the record that is missing it will be
1648
    ignored by the parser.
1641.1.2 by Robert Collins
Change knit index files to be robust in the presence of partial writes.
1649
1759.2.1 by Jelmer Vernooij
Fix some types (found using aspell).
1650
    When writing new records to the index file, the data is preceded by '\n'
1641.1.2 by Robert Collins
Change knit index files to be robust in the presence of partial writes.
1651
    to ensure that records always start on new lines even if the last write was
1652
    interrupted. As a result it's normal for the last line in the index to be
1653
    missing a trailing newline. One can be added with no harmful effects.
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1654
    """
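The line format described in this docstring can be made concrete with a minimal parser sketch. `parse_kndx` is a hypothetical name and not part of bzrlib; the real reader is `_load_data`. The sketch assumes the whole file is already in memory as a string and skips the header instead of validating it. It mirrors the `_cache` tuple layout built by `_cache_version`:

- A line that does not end in ' :' is dropped.
- A '.'-prefixed parent is taken literally.
- A bare number is resolved through `_history`.
- A repeated version id replaces the earlier values but keeps its first sequence number.

```python
def parse_kndx(text):
    """Parse .kndx text into (history, cache); a sketch, not bzrlib code.

    history maps sequence number -> version id (first occurrence only);
    cache maps version id -> (version_id, options, pos, size, parents, seq).
    """
    history = []
    cache = {}
    for line in text.split('\n'):
        fields = line.split()
        # The header has no ' :' marker, and neither do records cut short
        # by an interrupted write, so both are skipped here.
        if len(fields) < 5 or fields[-1] != ':':
            continue
        version_id, flags, pos, size = fields[:4]
        parents = []
        for ref in fields[4:-1]:
            if ref.startswith('.'):
                parents.append(ref[1:])            # explicit parent id
            else:
                parents.append(history[int(ref)])  # dictionary-compressed
        if version_id in cache:
            seq = cache[version_id][5]  # later entries keep the first number
        else:
            seq = len(history)
            history.append(version_id)
        cache[version_id] = (version_id, flags.split(','), int(pos),
                             int(size), tuple(parents), seq)
    return history, cache
```

In this model, `rev-a` appearing twice yields one `_history` slot, and the values come from its last complete line.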
1655
1666.1.6 by Robert Collins
Make knit the default format.
1656
    HEADER = "# bzr knit index 8\n"
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1657
1596.2.18 by Robert Collins
More microopimisations on index reading, now down to 16000 records/seconds.
1658
    # adding __slots__ did not speed up knit parsing: it stayed at 280 ms.
1659
    # __slots__ = ['_cache', '_history', '_transport', '_filename']
1660
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1661
    def _cache_version(self, version_id, options, pos, size, parents):
1596.2.18 by Robert Collins
More microopimisations on index reading, now down to 16000 records/seconds.
1662
        """Cache a version record in the history array and index cache.
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1663
1664
        This is inlined into _load_data for performance. KEEP IN SYNC.
1596.2.18 by Robert Collins
More microopimisations on index reading, now down to 16000 records/seconds.
1665
        (It saves 60ms, 25% of the __init__ overhead on local 4000 record
1666
         indexes).
1667
        """
1596.2.14 by Robert Collins
Make knit parsing non quadratic?
1668
        # only want the _history index to reference the 1st index entry
1669
        # for version_id
1596.2.18 by Robert Collins
More microopimisations on index reading, now down to 16000 records/seconds.
1670
        if version_id not in self._cache:
1628.1.1 by Robert Collins
Cache the index number of versions in the knit index's self._cache so that
1671
            index = len(self._history)
1596.2.14 by Robert Collins
Make knit parsing non quadratic?
1672
            self._history.append(version_id)
1628.1.1 by Robert Collins
Cache the index number of versions in the knit index's self._cache so that
1673
        else:
1674
            index = self._cache[version_id][5]
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1675
        self._cache[version_id] = (version_id,
1628.1.1 by Robert Collins
Cache the index number of versions in the knit index's self._cache so that
1676
                                   options,
1677
                                   pos,
1678
                                   size,
1679
                                   parents,
1680
                                   index)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1681
3316.2.3 by Robert Collins
Remove manual notification of transaction finishing on versioned files.
1682
    def _check_write_ok(self):
3316.2.5 by Robert Collins
Review feedback.
1683
        if self._get_scope() != self._scope:
3316.2.3 by Robert Collins
Remove manual notification of transaction finishing on versioned files.
1684
            raise errors.OutSideTransaction()
1685
        if self._mode != 'w':
1686
            raise errors.ReadOnlyObjectDirtiedError(self)
1687
1946.2.1 by John Arbash Meinel
2 changes to knits. Delay creating the .knit or .kndx file until we have actually tried to write data. Because of this, we must allow the Knit to create the prefix directories
1688
    def __init__(self, transport, filename, mode, create=False, file_mode=None,
3316.2.3 by Robert Collins
Remove manual notification of transaction finishing on versioned files.
1689
        create_parent_dir=False, delay_create=False, dir_mode=None,
1690
        get_scope=None):
1946.2.12 by John Arbash Meinel
Add ability to pass a directory mode to non_atomic_put
1691
        _KnitComponentFile.__init__(self, transport, filename, mode,
1692
                                    file_mode=file_mode,
1693
                                    create_parent_dir=create_parent_dir,
1694
                                    dir_mode=dir_mode)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1695
        self._cache = {}
1563.2.11 by Robert Collins
Consolidate reweave and join as we have no separate usage, make reweave tests apply to all versionedfile implementations and deprecate the old reweave apis.
1696
        # position in _history is the 'official' index for a revision
1697
        # but the values may have come from a newer entry.
1759.2.1 by Jelmer Vernooij
Fix some types (found using aspell).
1698
        # so the line count (wc -l) of a knit index can differ from the number
1699
        # of unique names in the knit.
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1700
        self._history = []
1701
        try:
2247.2.1 by John Arbash Meinel
Don't create pb for simple knit reading.
1702
            fp = self._transport.get(self._filename)
1594.2.17 by Robert Collins
Better readv coalescing, now with test, and progress during knit index reading.
1703
            try:
2247.2.1 by John Arbash Meinel
Don't create pb for simple knit reading.
1704
                # _load_data may raise NoSuchFile if the target knit is
1705
                # completely empty.
2484.1.1 by John Arbash Meinel
Add an initial function to read knit indexes in pyrex.
1706
                _load_data(self, fp)
2247.2.1 by John Arbash Meinel
Don't create pb for simple knit reading.
1707
            finally:
1708
                fp.close()
1709
        except NoSuchFile:
1710
            if mode != 'w' or not create:
1711
                raise
1712
            elif delay_create:
1713
                self._need_to_create = True
1714
            else:
1715
                self._transport.put_bytes_non_atomic(
1716
                    self._filename, self.HEADER, mode=self._file_mode)
3316.2.5 by Robert Collins
Review feedback.
1717
        self._scope = get_scope()
1718
        self._get_scope = get_scope
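A small model can summarise how the constructor handles a missing or empty index file. It assumes an in-memory dict of file name to contents in place of a transport, and it defines its own `NoSuchFile` and `KnitHeaderError` stand-ins. `check_header` follows `_KnitComponentFile.check_header`. `open_index` and its return strings are hypothetical labels for the three outcomes of `__init__`.

```python
import io

HEADER = "# bzr knit index 8\n"


class NoSuchFile(Exception):
    """Hypothetical stand-in for bzrlib's errors.NoSuchFile."""


class KnitHeaderError(Exception):
    """Hypothetical stand-in for knit's KnitHeaderError."""


def check_header(fp):
    line = fp.readline()
    if line == '':
        # An empty index is treated exactly like a missing one.
        raise NoSuchFile()
    if line != HEADER:
        raise KnitHeaderError(line)


def open_index(files, name, mode, create=False, delay_create=False):
    """Return 'loaded', 'delayed' or 'created'; files maps name -> text."""
    try:
        if name not in files:
            raise NoSuchFile()
        check_header(io.StringIO(files[name]))
        return 'loaded'
    except NoSuchFile:
        if mode != 'w' or not create:
            raise
        elif delay_create:
            return 'delayed'  # header is written with the first records
        files[name] = HEADER
        return 'created'
```

Note that a corrupt header is not swallowed. Only the missing or empty case is eligible for creation.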
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1719
2530.1.1 by Aaron Bentley
Make topological sorting optional for get_ancestry
1720
    def get_ancestry(self, versions, topo_sorted=True):
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1721
        """See VersionedFile.get_ancestry."""
1563.2.35 by Robert Collins
cleanup deprecation warnings and finish conversion so the inventory is knit based too.
1722
        # get a graph of all the mentioned versions:
1723
        graph = {}
1724
        pending = set(versions)
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1725
        cache = self._cache
1726
        while pending:
1563.2.35 by Robert Collins
cleanup deprecation warnings and finish conversion so the inventory is knit based too.
1727
            version = pending.pop()
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1728
            # trim ghosts
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1729
            try:
1730
                parents = [p for p in cache[version][4] if p in cache]
1731
            except KeyError:
1732
                raise RevisionNotPresent(version, self._filename)
1733
            # if not completed and not a ghost
1734
            pending.update([p for p in parents if p not in graph])
1563.2.35 by Robert Collins
cleanup deprecation warnings and finish conversion so the inventory is knit based too.
1735
            graph[version] = parents
2530.1.1 by Aaron Bentley
Make topological sorting optional for get_ancestry
1736
        if not topo_sorted:
1737
            return graph.keys()
1563.2.35 by Robert Collins
cleanup deprecation warnings and finish conversion so the inventory is knit based too.
1738
        return topo_sort(graph.items())
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1739
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1740
    def get_ancestry_with_ghosts(self, versions):
1741
        """See VersionedFile.get_ancestry_with_ghosts."""
1742
        # get a graph of all the mentioned versions:
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1743
        self.check_versions_present(versions)
1744
        cache = self._cache
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1745
        graph = {}
1746
        pending = set(versions)
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1747
        while pending:
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1748
            version = pending.pop()
1749
            try:
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1750
                parents = cache[version][4]
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1751
            except KeyError:
1752
                # ghost, fake it
1753
                graph[version] = []
1754
            else:
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1755
                # if not completed
1756
                pending.update([p for p in parents if p not in graph])
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1757
                graph[version] = parents
1758
        return topo_sort(graph.items())
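The two ancestry walks differ only in how they treat ghosts, which are parents that are referenced but not present in the index. The sketch below is a simplified, self-contained model of both. It assumes a plain dict of version id to parent tuple in place of `_cache`, and raises `KeyError` instead of `RevisionNotPresent`. `ancestry` and `_topo_sort` are illustrative names. `_topo_sort` is a small depth-first substitute for `bzrlib.tsort.topo_sort` that assumes an acyclic graph.

```python
def _topo_sort(graph):
    """Return graph's nodes with every parent before its children.

    Assumes the graph is acyclic and every parent is itself a key of graph.
    """
    order = []
    seen = set()

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for parent in graph[node]:
            visit(parent)
        order.append(node)

    for node in sorted(graph):
        visit(node)
    return order


def ancestry(parent_map, versions, with_ghosts=False):
    """Ancestry of versions (inclusive), topologically sorted.

    parent_map maps version id -> tuple of parent ids; a ghost is a parent
    id that is not itself a key of parent_map.
    """
    for version in versions:
        if version not in parent_map:
            raise KeyError(version)  # stands in for RevisionNotPresent
    graph = {}
    pending = set(versions)
    while pending:
        version = pending.pop()
        if version not in parent_map:
            graph[version] = []  # ghost: only reachable when with_ghosts
            continue
        parents = list(parent_map[version])
        if not with_ghosts:
            parents = [p for p in parents if p in parent_map]  # trim ghosts
        pending.update(p for p in parents if p not in graph)
        graph[version] = parents
    return _topo_sort(graph)
```

With `with_ghosts=False` a ghost simply disappears from the result, as in `get_ancestry`. With `with_ghosts=True` it appears as a parentless node, as in `get_ancestry_with_ghosts`.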
1759
3218.1.1 by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries.
1760
    def get_build_details(self, version_ids):
1761
        """Get the method, index_memo and compression parent for version_ids.
1762
3224.1.29 by John Arbash Meinel
Properly handle annotating when ghosts are present.
1763
        Ghosts are omitted from the result.
1764
3218.1.1 by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries.
1765
        :param version_ids: An iterable of version_ids.
3224.1.14 by John Arbash Meinel
Switch to making content_details opaque, step 1
1766
        :return: A dict of version_id:(index_memo, compression_parent,
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
1767
                                       parents, record_details).
3224.1.14 by John Arbash Meinel
Switch to making content_details opaque, step 1
1768
            index_memo
1769
                opaque structure to pass to read_records to extract the raw
1770
                data
1771
            compression_parent
1772
                Content that this record is built upon; may be None
1773
            parents
1774
                Logical parents of this node
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
1775
            record_details
3224.1.14 by John Arbash Meinel
Switch to making content_details opaque, step 1
1776
                extra information about the content which needs to be passed to
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
1777
                Factory.parse_record
3218.1.1 by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries.
1778
        """
1779
        result = {}
1780
        for version_id in version_ids:
3224.1.29 by John Arbash Meinel
Properly handle annotating when ghosts are present.
1781
            if version_id not in self._cache:
1782
                # ghosts are omitted
1783
                continue
3218.1.1 by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries.
1784
            method = self.get_method(version_id)
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
1785
            parents = self.get_parents_with_ghosts(version_id)
3218.1.1 by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries.
1786
            if method == 'fulltext':
1787
                compression_parent = None
1788
            else:
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
1789
                compression_parent = parents[0]
3224.1.8 by John Arbash Meinel
Add noeol to the return signature of get_build_details.
1790
            noeol = 'no-eol' in self.get_options(version_id)
3218.1.1 by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries.
1791
            index_memo = self.get_position(version_id)
3224.1.14 by John Arbash Meinel
Switch to making content_details opaque, step 1
1792
            result[version_id] = (index_memo, compression_parent,
1793
                                  parents, (method, noeol))
3218.1.1 by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries.
1794
        return result
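`get_build_details` tells the caller where each record lives and which record it is delta-compressed against. The simplified model below assumes `_cache`-style tuples and omits the `KnitIndexUnknownMethod` check that `get_method` performs. It shows how following `compression_parent` back to a fulltext yields the chain of records needed to rebuild a text. `build_details` and `delta_chain` are hypothetical helpers, not bzrlib API.

```python
def build_details(cache, version_ids):
    """Simplified model of get_build_details over _cache-style tuples."""
    result = {}
    for version_id in version_ids:
        if version_id not in cache:
            continue  # ghosts are omitted
        _, options, pos, size, parents, _ = cache[version_id]
        if 'fulltext' in options:
            method, compression_parent = 'fulltext', None
        else:
            # As in the method above, a line-delta is built on parents[0].
            method, compression_parent = 'line-delta', parents[0]
        noeol = 'no-eol' in options
        result[version_id] = ((None, pos, size), compression_parent,
                              parents, (method, noeol))
    return result


def delta_chain(cache, version_id):
    """Version ids to read, newest first, ending at the nearest fulltext."""
    chain = []
    while version_id is not None:
        chain.append(version_id)
        # Raises KeyError if a compression parent is missing from the index.
        version_id = build_details(cache, [version_id])[version_id][1]
    return chain
```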
1795
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1796
    def num_versions(self):
1797
        return len(self._history)
1798
1799
    __len__ = num_versions
1800
1801
    def get_versions(self):
2592.3.6 by Robert Collins
Implement KnitGraphIndex.get_versions.
1802
        """Get all the versions in the file, not topologically sorted."""
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1803
        return self._history
1804
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1805
    def _version_list_to_index(self, versions):
1806
        result_list = []
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1807
        cache = self._cache
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1808
        for version in versions:
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1809
            if version in cache:
1628.1.1 by Robert Collins
Cache the index number of versions in the knit index's self._cache so that
1810
                # -- inlined lookup() --
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1811
                result_list.append(str(cache[version][5]))
1628.1.1 by Robert Collins
Cache the index number of versions in the knit index's self._cache so that
1812
                # -- end lookup () --
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1813
            else:
2249.5.15 by John Arbash Meinel
remove get_cached_utf8 checks which were slowing things down.
1814
                result_list.append('.' + version)
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1815
        return ' '.join(result_list)
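Parent references are dictionary compressed on write. A parent already in the index is written as its sequence number. Anything else, such as a ghost, is written as its id prefixed with '.'. The sketch below covers that encoding and the record line that `add_versions` assembles. It assumes `_cache`-style tuples with the sequence number at position 5; `encode_parents` and `index_line` are hypothetical names.

```python
def encode_parents(cache, parents):
    """Dictionary-compress parent ids as _version_list_to_index does."""
    refs = []
    for parent in parents:
        if parent in cache:
            refs.append(str(cache[parent][5]))  # sequence number in this index
        else:
            refs.append('.' + parent)  # absent (e.g. a ghost): write the id
    return ' '.join(refs)


def index_line(cache, version_id, options, pos, size, parents):
    """One record line in the layout add_versions writes."""
    # The leading newline keeps each record on its own line even if the
    # previous record was cut short by an interrupted write.
    return "\n%s %s %s %s %s :" % (version_id, ','.join(options), pos, size,
                                   encode_parents(cache, parents))
```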
1816
2592.3.71 by Robert Collins
Basic version of knit-based repository operating, many tests failing.
1817
    def add_version(self, version_id, options, index_memo, parents):
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1818
        """Add a version record to the index."""
2592.3.71 by Robert Collins
Basic version of knit-based repository operating, many tests failing.
1819
        self.add_versions(((version_id, options, index_memo, parents),))
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1820
2841.2.1 by Robert Collins
* Commit no longer checks for new text keys during insertion when the
1821
    def add_versions(self, versions, random_id=False):
1692.2.1 by Robert Collins
Fix knit based push to only perform 2 appends to the target, rather that 2*new-versions.
1822
        """Add multiple versions to the index.
1823
        
1824
        :param versions: a list of tuples:
1825
                         (version_id, options, (index, pos, size), parents).
2841.2.1 by Robert Collins
* Commit no longer checks for new text keys during insertion when the
1826
        :param random_id: If True, the ids being added were randomly generated
1827
            and no check for existence will be performed.
1692.2.1 by Robert Collins
Fix knit based push to only perform 2 appends to the target, rather that 2*new-versions.
1828
        """
1829
        lines = []
2102.2.1 by John Arbash Meinel
Fix bug #64789 _KnitIndex.add_versions() should dict compress new revisions
1830
        orig_history = self._history[:]
1831
        orig_cache = self._cache.copy()
1832
1833
        try:
2592.3.71 by Robert Collins
Basic version of knit-based repository operating, many tests failing.
1834
            for version_id, options, (index, pos, size), parents in versions:
2249.5.15 by John Arbash Meinel
remove get_cached_utf8 checks which were slowing things down.
1835
                line = "\n%s %s %s %s %s :" % (version_id,
2102.2.1 by John Arbash Meinel
Fix bug #64789 _KnitIndex.add_versions() should dict compress new revisions
1836
                                               ','.join(options),
1837
                                               pos,
1838
                                               size,
1839
                                               self._version_list_to_index(parents))
1840
                lines.append(line)
3287.5.5 by Robert Collins
Refactor internals of knit implementations to implement get_parents_with_ghosts in terms of get_parent_map.
1841
                self._cache_version(version_id, options, pos, size, tuple(parents))
2102.2.1 by John Arbash Meinel
Fix bug #64789 _KnitIndex.add_versions() should dict compress new revisions
1842
            if not self._need_to_create:
1843
                self._transport.append_bytes(self._filename, ''.join(lines))
1844
            else:
1845
                sio = StringIO()
1846
                sio.write(self.HEADER)
1847
                sio.writelines(lines)
1848
                sio.seek(0)
1849
                self._transport.put_file_non_atomic(self._filename, sio,
1850
                                    create_parent_dir=self._create_parent_dir,
1851
                                    mode=self._file_mode,
1852
                                    dir_mode=self._dir_mode)
1853
                self._need_to_create = False
1854
        except:
1855
            # If any problems happen, restore the original values and re-raise
1856
            self._history = orig_history
1857
            self._cache = orig_cache
1858
            raise
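`add_versions` updates the in-memory `_history` and `_cache` before writing, then restores snapshots if anything fails. As a result, memory never lists records that did not reach disk. The class below is a hypothetical, stripped-down model of that pattern. Persistence is an injected `write` callable, and parents are always written as explicit '.'-prefixed ids rather than being dictionary compressed.

```python
class TinyIndex(object):
    """Hypothetical in-memory model of _KnitIndex.add_versions."""

    def __init__(self, write):
        self._history = []
        self._cache = {}
        self._write = write  # callable that persists one batch of bytes

    def add_versions(self, versions):
        orig_history = self._history[:]
        orig_cache = self._cache.copy()
        lines = []
        try:
            for version_id, options, (_, pos, size), parents in versions:
                # Simplification: parents are never dictionary compressed.
                refs = ' '.join('.' + p for p in parents)
                lines.append("\n%s %s %s %s %s :" % (
                    version_id, ','.join(options), pos, size, refs))
                if version_id in self._cache:
                    seq = self._cache[version_id][5]
                else:
                    seq = len(self._history)
                    self._history.append(version_id)
                self._cache[version_id] = (version_id, options, pos, size,
                                           tuple(parents), seq)
            self._write(''.join(lines))  # one write for the whole batch
        except BaseException:
            # Like the original's bare except: undo the in-memory changes
            # so they match what is on disk, then re-raise.
            self._history = orig_history
            self._cache = orig_cache
            raise
```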
1859
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1860
    def has_version(self, version_id):
1861
        """True if the version is in the index."""
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1862
        return version_id in self._cache
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1863
1864
    def get_position(self, version_id):
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
1865
        """Return details needed to access the version.
1866
        
1867
        .kndx indices do not support split-out data, so return None for the 
1868
        index field.
1869
1870
        :return: a tuple (None, data position, size) to hand to the access
1871
            logic to get the record.
1872
        """
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1873
        entry = self._cache[version_id]
2592.3.71 by Robert Collins
Basic version of knit-based repository operating, many tests failing.
1874
        return None, entry[2], entry[3]
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1875
1876
    def get_method(self, version_id):
1877
        """Return compression method of specified version."""
2592.3.97 by Robert Collins
Merge more bzr.dev, addressing some bugs. [still broken]
1878
        try:
1879
            options = self._cache[version_id][1]
1880
        except KeyError:
1881
            raise RevisionNotPresent(version_id, self._filename)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1882
        if 'fulltext' in options:
1883
            return 'fulltext'
1884
        else:
2196.2.5 by John Arbash Meinel
Add an exception class when the knit index storage method is unknown, and properly test for it
1885
            if 'line-delta' not in options:
1886
                raise errors.KnitIndexUnknownMethod(self._full_path(), options)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1887
            return 'line-delta'
1888
1889
    def get_options(self, version_id):
3052.2.5 by Andrew Bennetts
Address the rest of the review comments from John and myself.
1890
        """Return a list representing options.
2592.3.14 by Robert Collins
Implement KnitGraphIndex.get_options.
1891
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
1892
        e.g. ['foo', 'bar']
2592.3.14 by Robert Collins
Implement KnitGraphIndex.get_options.
1893
        """
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1894
        return self._cache[version_id][1]
1895
3287.5.5 by Robert Collins
Refactor internals of knit implementations to implement get_parents_with_ghosts in terms of get_parent_map.
1896
    def get_parent_map(self, version_ids):
1897
        """KnitVersionedFile.get_parent_map delegates to this method."""
1898
        result = {}
1899
        for version_id in version_ids:
1900
            try:
1901
                result[version_id] = tuple(self._cache[version_id][4])
1902
            except KeyError:
1903
                pass
1904
        return result
1905
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1906
    def get_parents_with_ghosts(self, version_id):
1759.2.1 by Jelmer Vernooij
Fix some types (found using aspell).
1907
        """Return parents of specified version with ghosts."""
3287.5.5 by Robert Collins
Refactor internals of knit implementations to implement get_parents_with_ghosts in terms of get_parent_map.
1908
        try:
1909
            return self.get_parent_map([version_id])[version_id]
1910
        except KeyError:
1911
            raise RevisionNotPresent(version_id, self)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1912
1913
    def check_versions_present(self, version_ids):
1914
        """Check that all specified versions are present."""
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1915
        cache = self._cache
1916
        for version_id in version_ids:
1917
            if version_id not in cache:
1918
                raise RevisionNotPresent(version_id, self._filename)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1919
1920
2592.3.2 by Robert Collins
Implement a get_graph for a new KnitGraphIndex that will implement a KnitIndex on top of the GraphIndex API.
1921
class KnitGraphIndex(object):
1922
    """A knit index that builds on GraphIndex."""
1923
2592.3.34 by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs.
1924
    def __init__(self, graph_index, deltas=False, parents=True, add_callback=None):
2592.3.2 by Robert Collins
Implement a get_graph for a new KnitGraphIndex that will implement a KnitIndex on top of the GraphIndex API.
1925
        """Construct a KnitGraphIndex on a graph_index.
1926
1927
        :param graph_index: An implementation of bzrlib.index.GraphIndex.
2592.3.13 by Robert Collins
Implement KnitGraphIndex.get_method.
1928
        :param deltas: Allow delta-compressed records.
2592.3.19 by Robert Collins
Change KnitGraphIndex from returning data to performing a callback on insertions.
1929
        :param add_callback: If not None, allow additions to the index and call
1930
            this callback with a list of added GraphIndex nodes:
2592.3.33 by Robert Collins
Change the order of index refs and values to make the no-graph knit index easier.
1931
            [(node, value, node_refs), ...]
2592.3.34 by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs.
1932
        :param parents: If True, record knit parents; if False, do not record
1933
            parents.
2592.3.2 by Robert Collins
Implement a get_graph for a new KnitGraphIndex that will implement a KnitIndex on top of the GraphIndex API.
1934
        """
1935
        self._graph_index = graph_index
2592.3.13 by Robert Collins
Implement KnitGraphIndex.get_method.
1936
        self._deltas = deltas
2592.3.19 by Robert Collins
Change KnitGraphIndex from returning data to performing a callback on insertions.
1937
        self._add_callback = add_callback
2592.3.34 by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs.
1938
        self._parents = parents
1939
        if deltas and not parents:
1940
            raise KnitCorrupt(self, "Cannot do delta compression without "
1941
                "parent tracking.")
2592.3.2 by Robert Collins
Implement a get_graph for a new KnitGraphIndex that will implement a KnitIndex on top of the GraphIndex API.
1942
3316.2.3 by Robert Collins
Remove manual notification of transaction finishing on versioned files.
1943
    def _check_write_ok(self):
1944
        pass
1945
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
1946
    def _get_entries(self, keys, check_present=False):
1947
        """Get the entries for keys.
1948
        
1949
        :param keys: An iterable of index keys (1-tuples).
1950
        """
1951
        keys = set(keys)
2592.3.43 by Robert Collins
A knit iter_parents API.
1952
        found_keys = set()
2592.3.34 by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs.
1953
        if self._parents:
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
1954
            for node in self._graph_index.iter_entries(keys):
2592.3.34 by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs.
1955
                yield node
2624.2.14 by Robert Collins
Add source index to the index iteration API to allow mapping back to the origin of retrieved data.
1956
                found_keys.add(node[1])
2592.3.34 by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs.
1957
        else:
1958
            # adapt parentless index to the rest of the code.
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
1959
            for node in self._graph_index.iter_entries(keys):
2624.2.14 by Robert Collins
Add source index to the index iteration API to allow mapping back to the origin of retrieved data.
1960
                yield node[0], node[1], node[2], ()
1961
                found_keys.add(node[1])
2592.3.43 by Robert Collins
A knit iter_parents API.
1962
        if check_present:
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
1963
            missing_keys = keys.difference(found_keys)
2592.3.43 by Robert Collins
A knit iter_parents API.
1964
            if missing_keys:
1965
                raise RevisionNotPresent(missing_keys.pop(), self)
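The adaptation in _get_entries can be illustrated with a small standalone sketch. FakeGraphIndex is a hypothetical in-memory stand-in for the GraphIndex API (not the bzrlib class); without reference lists it yields 3-tuples, which get_entries pads to the (index, key, value, node_refs) shape the rest of the code expects. Missing keys raise KeyError here in place of RevisionNotPresent and, as with the original generator, the presence check only runs once the iterator is exhausted.

```python
class FakeGraphIndex(object):
    """Hypothetical in-memory stand-in for a GraphIndex (illustration only).

    nodes maps 1-tuple keys to (value, node_refs).
    """

    def __init__(self, nodes, with_refs=True):
        self._nodes = nodes
        self._with_refs = with_refs

    def iter_entries(self, keys):
        for key in keys:
            if key not in self._nodes:
                continue
            value, node_refs = self._nodes[key]
            if self._with_refs:
                yield (self, key, value, node_refs)
            else:
                # An index without reference lists yields 3-tuples.
                yield (self, key, value)


def get_entries(graph_index, keys, parents=True, check_present=False):
    """Yield (index, key, value, node_refs) for each present key."""
    keys = set(keys)
    found_keys = set()
    for node in graph_index.iter_entries(keys):
        if not parents:
            # Pad parentless entries so callers always see four fields.
            node = (node[0], node[1], node[2], ())
        found_keys.add(node[1])
        yield node
    if check_present:
        missing_keys = keys.difference(found_keys)
        if missing_keys:
            raise KeyError(missing_keys.pop())
```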
2592.3.17 by Robert Collins
Add add_version(s) to KnitGraphIndex, completing the required api for KnitVersionedFile.
1966
1967
    def _present_keys(self, version_ids):
1968
        return set([
2624.2.14 by Robert Collins
Add source index to the index iteration API to allow mapping back to the origin of retrieved data.
1969
            node[1] for node in self._get_entries(version_ids)])
2592.3.17 by Robert Collins
Add add_version(s) to KnitGraphIndex, completing the required api for KnitVersionedFile.
1970
2592.3.34 by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs.
1971
    def _parentless_ancestry(self, versions):
1972
        """Honour the get_ancestry API for parentless knit indices."""
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
1973
        wanted_keys = self._version_ids_to_keys(versions)
1974
        present_keys = self._present_keys(wanted_keys)
1975
        missing = set(wanted_keys).difference(present_keys)
2592.3.34 by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs.
1976
        if missing:
1977
            raise RevisionNotPresent(missing.pop(), self)
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
1978
        return list(self._keys_to_version_ids(present_keys))
2592.3.34 by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs.
1979
2592.3.4 by Robert Collins
Implement get_ancestry/get_ancestry_with_ghosts for KnitGraphIndex.
1980
    def get_ancestry(self, versions, topo_sorted=True):
1981
        """See VersionedFile.get_ancestry."""
2592.3.34 by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs.
1982
        if not self._parents:
1983
            return self._parentless_ancestry(versions)
2592.3.4 by Robert Collins
Implement get_ancestry/get_ancestry_with_ghosts for KnitGraphIndex.
1984
        # XXX: This will do len(history) index calls - perhaps
1985
        # it should be altered to be an index core feature?
1986
        # get a graph of all the mentioned versions:
1987
        graph = {}
2592.3.30 by Robert Collins
Make GraphKnitIndex get_ancestry the same as regular knits.
1988
        ghosts = set()
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
1989
        versions = self._version_ids_to_keys(versions)
2592.3.4 by Robert Collins
Implement get_ancestry/get_ancestry_with_ghosts for KnitGraphIndex.
1990
        pending = set(versions)
1991
        while pending:
1992
            # get all pending nodes
1993
            this_iteration = pending
2592.3.17 by Robert Collins
Add add_version(s) to KnitGraphIndex, completing the required api for KnitVersionedFile.
1994
            new_nodes = self._get_entries(this_iteration)
2592.3.53 by Robert Collins
Remove usage of difference_update in knit.py.
1995
            found = set()
2592.3.4 by Robert Collins
Implement get_ancestry/get_ancestry_with_ghosts for KnitGraphIndex.
1996
            pending = set()
2624.2.14 by Robert Collins
Add source index to the index iteration API to allow mapping back to the origin of retrieved data.
1997
            for (index, key, value, node_refs) in new_nodes:
2592.3.30 by Robert Collins
Make GraphKnitIndex get_ancestry the same as regular knits.
1998
                # don't ask for ghosts - otherwise
1999
                # we can end up looping with pending
2000
                # being entirely ghosted.
2001
                graph[key] = [parent for parent in node_refs[0]
2002
                    if parent not in ghosts]
2592.3.53 by Robert Collins
Remove usage of difference_update in knit.py.
2003
                # queue parents
2004
                for parent in graph[key]:
2005
                    # don't examine known nodes again
2006
                    if parent in graph:
2007
                        continue
2008
                    pending.add(parent)
2009
                found.add(key)
2010
            ghosts.update(this_iteration.difference(found))
2592.3.30 by Robert Collins
Make GraphKnitIndex get_ancestry the same as regular knits.
2011
        if versions.difference(graph):
2012
            raise RevisionNotPresent(versions.difference(graph).pop(), self)
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
2013
        if topo_sorted:
2014
            result_keys = topo_sort(graph.items())
2015
        else:
2016
            result_keys = graph.iterkeys()
2017
        return [key[0] for key in result_keys]
2592.3.4 by Robert Collins
Implement get_ancestry/get_ancestry_with_ghosts for KnitGraphIndex.
2018
2019
    def get_ancestry_with_ghosts(self, versions):
2020
        """See VersionedFile.get_ancestry_with_ghosts."""
2592.3.34 by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs.
2021
        if not self._parents:
2022
            return self._parentless_ancestry(versions)
2592.3.4 by Robert Collins
Implement get_ancestry/get_ancestry_with_ghosts for KnitGraphIndex.
2023
        # XXX: This will do len(history) index calls - perhaps
2024
        # it should be altered to be an index core feature?
2025
        # get a graph of all the mentioned versions:
2026
        graph = {}
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
2027
        versions = self._version_ids_to_keys(versions)
2592.3.4 by Robert Collins
Implement get_ancestry/get_ancestry_with_ghosts for KnitGraphIndex.
2028
        pending = set(versions)
2029
        while pending:
2030
            # get all pending nodes
2031
            this_iteration = pending
2592.3.17 by Robert Collins
Add add_version(s) to KnitGraphIndex, completing the required api for KnitVersionedFile.
2032
            new_nodes = self._get_entries(this_iteration)
2592.3.4 by Robert Collins
Implement get_ancestry/get_ancestry_with_ghosts for KnitGraphIndex.
2033
            pending = set()
2624.2.14 by Robert Collins
Add source index to the index iteration API to allow mapping back to the origin of retrieved data.
2034
            for (index, key, value, node_refs) in new_nodes:
2592.3.4 by Robert Collins
Implement get_ancestry/get_ancestry_with_ghosts for KnitGraphIndex.
2035
                graph[key] = node_refs[0]
2036
                # queue parents 
2592.3.53 by Robert Collins
Remove usage of difference_update in knit.py.
2037
                for parent in graph[key]:
2038
                    # don't examine known nodes again
2039
                    if parent in graph:
2040
                        continue
2041
                    pending.add(parent)
2592.3.4 by Robert Collins
Implement get_ancestry/get_ancestry_with_ghosts for KnitGraphIndex.
2042
            missing_versions = this_iteration.difference(graph)
2592.3.34 by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs.
2043
            missing_needed = versions.intersection(missing_versions)
2044
            if missing_needed:
2045
                raise RevisionNotPresent(missing_needed.pop(), self)
2592.3.4 by Robert Collins
Implement get_ancestry/get_ancestry_with_ghosts for KnitGraphIndex.
2046
            for missing_version in missing_versions:
2047
                # add a key, no parents
2048
                graph[missing_version] = []
2592.3.53 by Robert Collins
Remove usage of difference_update in knit.py.
2049
                pending.discard(missing_version) # don't look for it
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
2050
        result_keys = topo_sort(graph.items())
2051
        return [key[0] for key in result_keys]
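The two ancestry walks above differ only in how ghosts are treated: get_ancestry drops them, while get_ancestry_with_ghosts keeps them as parentless nodes and only fails if a requested version is itself missing. A standalone sketch of both, assuming a plain dict of version_id to parent ids stands in for the batched _get_entries lookups (a hypothetical simplification). The small recursive topo_sort here is written for the example and is not bzrlib's; it skips parents that are not in the graph and is only suitable for shallow graphs. KeyError stands in for RevisionNotPresent.

```python
def topo_sort(graph):
    """Order the keys of graph (key -> parent list) parents-first.

    Parents that are not keys of graph are skipped. Recursive, so only
    suitable for small example graphs.
    """
    result = []
    visited = set()

    def visit(node):
        if node in visited:
            return
        visited.add(node)
        for parent in graph[node]:
            if parent in graph:
                visit(parent)
        result.append(node)

    for node in sorted(graph):
        visit(node)
    return result


def get_ancestry(parent_map, versions):
    """Ancestry of versions with ghosts dropped."""
    graph = {}
    ghosts = set()
    versions = set(versions)
    pending = set(versions)
    while pending:
        this_iteration = pending
        pending = set()
        found = set()
        for key in this_iteration:
            if key not in parent_map:
                continue
            # Skip parents already known to be ghosts so they are not
            # queued for lookup again.
            graph[key] = [p for p in parent_map[key] if p not in ghosts]
            for parent in graph[key]:
                if parent not in graph:
                    pending.add(parent)
            found.add(key)
        ghosts.update(this_iteration.difference(found))
    missing = versions.difference(graph)
    if missing:
        raise KeyError(missing.pop())
    return topo_sort(graph)


def get_ancestry_with_ghosts(parent_map, versions):
    """Ancestry of versions with ghosts kept as parentless nodes."""
    graph = {}
    versions = set(versions)
    pending = set(versions)
    while pending:
        this_iteration = pending
        pending = set()
        for key in this_iteration:
            if key in parent_map:
                graph[key] = list(parent_map[key])
                for parent in graph[key]:
                    if parent not in graph:
                        pending.add(parent)
        missing_versions = this_iteration.difference(graph)
        # A ghost is acceptable unless it was one of the requested versions.
        missing_needed = versions.intersection(missing_versions)
        if missing_needed:
            raise KeyError(missing_needed.pop())
        for missing_version in missing_versions:
            graph[missing_version] = []
            pending.discard(missing_version)
    return topo_sort(graph)
```

For example, with 'c' having parents 'b' and an absent 'ghost', get_ancestry returns a, b and c, while get_ancestry_with_ghosts also includes 'ghost', ordered before 'c'.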
2592.3.4 by Robert Collins
Implement get_ancestry/get_ancestry_with_ghosts for KnitGraphIndex.
2052
3218.1.1 by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries.
2053
    def get_build_details(self, version_ids):
2054
        """Get the index_memo, compression parent, parents and method for version_ids.
2055
3224.1.29 by John Arbash Meinel
Properly handle annotating when ghosts are present.
2056
        Ghosts are omitted from the result.
2057
3218.1.1 by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries.
2058
        :param version_ids: An iterable of version_ids.
3224.1.18 by John Arbash Meinel
Cleanup documentation
2059
        :return: A dict of version_id:(index_memo, compression_parent,
2060
                                       parents, record_details).
2061
            index_memo
2062
                opaque structure to pass to read_records to extract the raw
2063
                data
2064
            compression_parent
2065
                Content that this record is built upon, may be None
2066
            parents
2067
                Logical parents of this node
2068
            record_details
2069
                extra information about the content which needs to be passed to
2070
                Factory.parse_record
3218.1.1 by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries.
2071
        """
2072
        result = {}
2073
        entries = self._get_entries(self._version_ids_to_keys(version_ids), True)
2074
        for entry in entries:
2075
            version_id = self._keys_to_version_ids((entry[1],))[0]
3224.1.27 by John Arbash Meinel
Handle when the knit index doesn't track parents.
2076
            if not self._parents:
2077
                parents = ()
2078
            else:
2079
                parents = self._keys_to_version_ids(entry[3][0])
3218.1.1 by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries.
2080
            if not self._deltas:
2081
                compression_parent = None
2082
            else:
2083
                compression_parent_key = self._compression_parent(entry)
2084
                if compression_parent_key:
2085
                    compression_parent = self._keys_to_version_ids(
2086
                    (compression_parent_key,))[0]
2087
                else:
2088
                    compression_parent = None
3224.1.8 by John Arbash Meinel
Add noeol to the return signature of get_build_details.
2089
            noeol = (entry[2][0] == 'N')
3218.1.1 by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries.
2090
            if compression_parent:
2091
                method = 'line-delta'
2092
            else:
2093
                method = 'fulltext'
3224.1.14 by John Arbash Meinel
Switch to making content_details opaque, step 1
2094
            result[version_id] = (self._node_to_position(entry),
2095
                                  compression_parent, parents,
2096
                                  (method, noeol))
3218.1.1 by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries.
2097
        return result
2098
2099
    def _compression_parent(self, an_entry):
2100
        # return the key that an_entry is compressed against, or None
2101
        # Grab the second parent list (deltas currently imply parents)
2102
        compression_parents = an_entry[3][1]
2103
        if not compression_parents:
2104
            return None
2105
        return compression_parents[0]
2106
2107
    def _get_method(self, node):
2108
        if not self._deltas:
2109
            return 'fulltext'
2110
        if self._compression_parent(node):
2111
            return 'line-delta'
2112
        else:
2113
            return 'fulltext'
2114
2592.3.5 by Robert Collins
Implement KnitGraphIndex.num_versions.
2115
    def num_versions(self):
2116
        return len(list(self._graph_index.iter_all_entries()))
2592.3.2 by Robert Collins
Implement a get_graph for a new KnitGraphIndex that will implement a KnitIndex on top of the GraphIndex API.
2117
2592.3.6 by Robert Collins
Implement KnitGraphIndex.get_versions.
2118
    __len__ = num_versions
2119
2120
    def get_versions(self):
2121
        """Get all the versions in the file. Not topologically sorted."""
2624.2.14 by Robert Collins
Add source index to the index iteration API to allow mapping back to the origin of retrieved data.
2122
        return [node[1][0] for node in self._graph_index.iter_all_entries()]
2592.3.6 by Robert Collins
Implement KnitGraphIndex.get_versions.
2123
    
2592.3.9 by Robert Collins
Implement KnitGraphIndex.has_version.
2124
    def has_version(self, version_id):
2125
        """True if the version is in the index."""
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
2126
        return len(self._present_keys(self._version_ids_to_keys([version_id]))) == 1
2127
2128
    def _keys_to_version_ids(self, keys):
2129
        return tuple(key[0] for key in keys)
2592.3.6 by Robert Collins
Implement KnitGraphIndex.get_versions.
2130
2592.3.10 by Robert Collins
Implement KnitGraphIndex.get_position.
2131
    def get_position(self, version_id):
2592.3.71 by Robert Collins
Basic version of knit-based repository operating, many tests failing.
2132
        """Return details needed to access the version.
2133
        
2134
        :return: a tuple (index, data position, size) to hand to the access
2135
            logic to get the record.
2136
        """
2137
        node = self._get_node(version_id)
3218.1.1 by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries.
2138
        return self._node_to_position(node)
2139
2140
    def _node_to_position(self, node):
2141
        """Convert an index value to position details."""
2592.3.71 by Robert Collins
Basic version of knit-based repository operating, many tests failing.
2142
        bits = node[2][1:].split(' ')
2143
        return node[0], int(bits[0]), int(bits[1])
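The index value packs two things into one string: a one-character flag ('N' for no-eol, otherwise a space) followed by "pos size", as written by add_versions. A minimal standalone sketch of the decoding done by _node_to_position, _compression_parent and get_build_details, assuming nodes are plain (index, key, value, node_refs) tuples as yielded by _get_entries; the function names are illustrative, not part of the module.

```python
def node_to_position(node):
    """(index, pos, size) from a node whose value is e.g. 'N120 45'."""
    # The first character is the no-eol flag; the rest is "pos size".
    bits = node[2][1:].split(' ')
    return node[0], int(bits[0]), int(bits[1])


def build_details(node, deltas=True, parents_tracked=True):
    """Return (index_memo, compression_parent, parents, (method, noeol))."""
    index, key, value, node_refs = node
    if parents_tracked:
        parents = tuple(parent[0] for parent in node_refs[0])
    else:
        parents = ()
    compression_parent = None
    if deltas and node_refs[1]:
        # The second reference list holds the compression parent, if any.
        compression_parent = node_refs[1][0][0]
    noeol = value[0] == 'N'
    method = 'line-delta' if compression_parent else 'fulltext'
    return node_to_position(node), compression_parent, parents, (method, noeol)
```

Because deltas imply parent tracking, the second reference list is only consulted when deltas is true, so a parentless node with empty node_refs never reaches node_refs[1].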
2592.3.10 by Robert Collins
Implement KnitGraphIndex.get_position.
2144
2592.3.11 by Robert Collins
Implement KnitGraphIndex.get_method.
2145
    def get_method(self, version_id):
2146
        """Return compression method of specified version."""
3218.1.1 by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries.
2147
        return self._get_method(self._get_node(version_id))
2592.3.11 by Robert Collins
Implement KnitGraphIndex.get_method.
2148
2149
    def _get_node(self, version_id):
2592.3.97 by Robert Collins
Merge more bzr.dev, addressing some bugs. [still broken]
2150
        try:
2151
            return list(self._get_entries(self._version_ids_to_keys([version_id])))[0]
2152
        except IndexError:
2153
            raise RevisionNotPresent(version_id, self)
2592.3.11 by Robert Collins
Implement KnitGraphIndex.get_method.
2154
2592.3.14 by Robert Collins
Implement KnitGraphIndex.get_options.
2155
    def get_options(self, version_id):
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
2156
        """Return a list representing options.
2592.3.14 by Robert Collins
Implement KnitGraphIndex.get_options.
2157
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
2158
        e.g. ['foo', 'bar']
2592.3.14 by Robert Collins
Implement KnitGraphIndex.get_options.
2159
        """
2160
        node = self._get_node(version_id)
3218.1.1 by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries.
2161
        options = [self._get_method(node)]
2624.2.14 by Robert Collins
Add source index to the index iteration API to allow mapping back to the origin of retrieved data.
2162
        if node[2][0] == 'N':
2592.3.14 by Robert Collins
Implement KnitGraphIndex.get_options.
2163
            options.append('no-eol')
2658.2.1 by Robert Collins
Fix mismatch between KnitGraphIndex and KnitIndex in get_options.
2164
        return options
2592.3.11 by Robert Collins
Implement KnitGraphIndex.get_method.
2165
3287.5.5 by Robert Collins
Refactor internals of knit implementations to implement get_parents_with_ghosts in terms of get_parent_map.
2166
    def get_parent_map(self, version_ids):
2167
        """Passed through to by KnitVersionedFile.get_parent_map."""
2168
        nodes = self._get_entries(self._version_ids_to_keys(version_ids))
2169
        result = {}
2170
        if self._parents:
2171
            for node in nodes:
2172
                result[node[1][0]] = self._keys_to_version_ids(node[3][0])
2173
        else:
2174
            for node in nodes:
2175
                result[node[1][0]] = ()
2176
        return result
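get_parent_map converts version ids into the 1-tuple keys the GraphIndex uses (and parent keys back again) and silently omits versions that are not present; a single-version caller such as get_parents_with_ghosts can then turn the resulting KeyError into RevisionNotPresent. A minimal sketch, assuming a plain dict mapping keys to node_refs stands in for the index (a hypothetical simplification):

```python
def version_ids_to_keys(version_ids):
    return set((version_id,) for version_id in version_ids)


def keys_to_version_ids(keys):
    return tuple(key[0] for key in keys)


def get_parent_map(entries, version_ids, parents_tracked=True):
    """entries: dict of key -> node_refs (hypothetical index stand-in)."""
    result = {}
    for key in version_ids_to_keys(version_ids):
        if key not in entries:
            # Absent versions are left out of the result, not reported.
            continue
        if parents_tracked:
            # The first reference list holds the logical parents.
            result[key[0]] = keys_to_version_ids(entries[key][0])
        else:
            result[key[0]] = ()
    return result
```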
2177
2592.3.15 by Robert Collins
Implement KnitGraphIndex.get_parents/get_parents_with_ghosts.
2178
    def get_parents_with_ghosts(self, version_id):
2179
        """Return parents of specified version with ghosts."""
3287.5.5 by Robert Collins
Refactor internals of knit implementations to implement get_parents_with_ghosts in terms of get_parent_map.
2180
        try:
2181
            return self.get_parent_map([version_id])[version_id]
2182
        except KeyError:
2183
            raise RevisionNotPresent(version_id, self)
2592.3.15 by Robert Collins
Implement KnitGraphIndex.get_parents/get_parents_with_ghosts.
2184
2592.3.16 by Robert Collins
Implement KnitGraphIndex.check_versions_present.
2185
    def check_versions_present(self, version_ids):
2186
        """Check that all specified versions are present."""
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
2187
        keys = self._version_ids_to_keys(version_ids)
2188
        present = self._present_keys(keys)
2189
        missing = keys.difference(present)
2592.3.16 by Robert Collins
Implement KnitGraphIndex.check_versions_present.
2190
        if missing:
2592.3.19 by Robert Collins
Change KnitGraphIndex from returning data to performing a callback on insertions.
2191
            raise RevisionNotPresent(missing.pop(), self)
2592.3.16 by Robert Collins
Implement KnitGraphIndex.check_versions_present.
2192
2592.3.71 by Robert Collins
Basic version of knit-based repository operating, many tests failing.
2193
    def add_version(self, version_id, options, access_memo, parents):
2592.3.17 by Robert Collins
Add add_version(s) to KnitGraphIndex, completing the required api for KnitVersionedFile.
2194
        """Add a version record to the index."""
2592.3.71 by Robert Collins
Basic version of knit-based repository operating, many tests failing.
2195
        return self.add_versions(((version_id, options, access_memo, parents),))
2592.3.17 by Robert Collins
Add add_version(s) to KnitGraphIndex, completing the required api for KnitVersionedFile.
2196
2841.2.1 by Robert Collins
* Commit no longer checks for new text keys during insertion when the
2197
    def add_versions(self, versions, random_id=False):
2592.3.17 by Robert Collins
Add add_version(s) to KnitGraphIndex, completing the required api for KnitVersionedFile.
2198
        """Add multiple versions to the index.
2199
        
2200
        This function does not insert data into the immutable GraphIndex
2201
        backing the KnitGraphIndex; instead it prepares data for insertion by
2592.3.19 by Robert Collins
Change KnitGraphIndex from returning data to performing a callback on insertions.
2202
        the caller, checks that it is safe to insert, and then calls
2203
        self._add_callback with the prepared GraphIndex nodes.
2592.3.17 by Robert Collins
Add add_version(s) to KnitGraphIndex, completing the required api for KnitVersionedFile.
2204
2205
        :param versions: a list of (version_id, options, access_memo, parents)
2206
                         tuples, where access_memo is (index, pos, size).
2841.2.1 by Robert Collins
* Commit no longer checks for new text keys during insertion when the
2207
        :param random_id: If True the ids being added were randomly generated
2208
            and no check for existence will be performed.
2592.3.17 by Robert Collins
Add add_version(s) to KnitGraphIndex, completing the required api for KnitVersionedFile.
2209
        """
2592.3.19 by Robert Collins
Change KnitGraphIndex from returning data to performing a callback on insertions.
2210
        if not self._add_callback:
2211
            raise errors.ReadOnlyError(self)
2592.3.17 by Robert Collins
Add add_version(s) to KnitGraphIndex, completing the required api for KnitVersionedFile.
2212
        # we hope there are no repositories with inconsistent parentage
2213
        # anymore.
2214
        # check for dups
2215
2216
        keys = {}
2592.3.71 by Robert Collins
Basic version of knit-based repository operating, many tests failing.
2217
        for (version_id, options, access_memo, parents) in versions:
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
2218
            index, pos, size = access_memo
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
2219
            key = (version_id, )
2220
            parents = tuple((parent, ) for parent in parents)
2592.3.17 by Robert Collins
Add add_version(s) to KnitGraphIndex, completing the required api for KnitVersionedFile.
2221
            if 'no-eol' in options:
2222
                value = 'N'
2223
            else:
2224
                value = ' '
2225
            value += "%d %d" % (pos, size)
2592.3.34 by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs.
2226
            if not self._deltas:
2592.3.17 by Robert Collins
Add add_version(s) to KnitGraphIndex, completing the required api for KnitVersionedFile.
2227
                if 'line-delta' in options:
2228
                    raise KnitCorrupt(self, "attempt to add line-delta in non-delta knit")
2592.3.34 by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs.
2229
            if self._parents:
2230
                if self._deltas:
2231
                    if 'line-delta' in options:
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
2232
                        node_refs = (parents, (parents[0],))
2592.3.34 by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs.
2233
                    else:
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
2234
                        node_refs = (parents, ())
2592.3.34 by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs.
2235
                else:
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
2236
                    node_refs = (parents, )
2592.3.34 by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs.
2237
            else:
2238
                if parents:
2239
                    raise KnitCorrupt(self, "attempt to add node with parents "
2240
                        "in parentless index.")
2241
                node_refs = ()
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
2242
            keys[key] = (value, node_refs)
2841.2.1 by Robert Collins
* Commit no longer checks for new text keys during insertion when the
2243
        if not random_id:
2244
            present_nodes = self._get_entries(keys)
2245
            for (index, key, value, node_refs) in present_nodes:
2246
                if (value, node_refs) != keys[key]:
2247
                    raise KnitCorrupt(self, "inconsistent details in add_versions"
2248
                        ": %s %s" % ((value, node_refs), keys[key]))
2249
                del keys[key]
2592.3.17 by Robert Collins
Add add_version(s) to KnitGraphIndex, completing the required api for KnitVersionedFile.
2250
        result = []
2592.3.34 by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs.
2251
        if self._parents:
2252
            for key, (value, node_refs) in keys.iteritems():
2253
                result.append((key, value, node_refs))
2254
        else:
2255
            for key, (value, node_refs) in keys.iteritems():
2256
                result.append((key, value))
2592.3.19 by Robert Collins
Change KnitGraphIndex from returning data to performing a callback on insertions.
2257
        self._add_callback(result)
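add_versions serialises each version into a GraphIndex node: the key is a 1-tuple, the value is the no-eol flag plus "pos size", and the reference lists depend on whether the index tracks parents and deltas. A standalone sketch of just that encoding, assuming versions use the (version_id, options, access_memo, parents) shape unpacked in the loop above. It omits the duplicate check against existing entries and the callback, raises ValueError where the module raises KnitCorrupt, and sorts the result only to make it deterministic.

```python
def encode_versions(versions, deltas=True, parents_tracked=True):
    """Build GraphIndex nodes from (version_id, options, access_memo, parents)."""
    keys = {}
    for version_id, options, access_memo, parents in versions:
        index, pos, size = access_memo
        key = (version_id,)
        parent_keys = tuple((parent,) for parent in parents)
        # One flag character, then "pos size".
        value = ('N' if 'no-eol' in options else ' ') + '%d %d' % (pos, size)
        if not deltas and 'line-delta' in options:
            raise ValueError('line-delta in a non-delta index')
        if parents_tracked:
            if deltas:
                if 'line-delta' in options:
                    # Deltas are compressed against the first parent.
                    node_refs = (parent_keys, (parent_keys[0],))
                else:
                    node_refs = (parent_keys, ())
            else:
                node_refs = (parent_keys,)
        else:
            if parent_keys:
                raise ValueError('parents given to a parentless index')
            node_refs = ()
        keys[key] = (value, node_refs)
    if parents_tracked:
        return sorted((k, v, r) for k, (v, r) in keys.items())
    # A parentless index stores no reference lists at all.
    return sorted((k, v) for k, (v, r) in keys.items())
```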
2592.3.17 by Robert Collins
Add add_version(s) to KnitGraphIndex, completing the required api for KnitVersionedFile.
2258
        
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
2259
    def _version_ids_to_keys(self, version_ids):
2260
        return set((version_id, ) for version_id in version_ids)
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
2261
2262
2263
class _KnitAccess(object):
2264
    """Access to knit records in a .knit file."""
2265
2266
    def __init__(self, transport, filename, _file_mode, _dir_mode,
2267
        _need_to_create, _create_parent_dir):
2268
        """Create a _KnitAccess for accessing and inserting data.
2269
2270
        :param transport: The transport the .knit is located on.
2271
        :param filename: The filename of the .knit.
2272
        """
2273
        self._transport = transport
2274
        self._filename = filename
2275
        self._file_mode = _file_mode
2276
        self._dir_mode = _dir_mode
2277
        self._need_to_create = _need_to_create
2278
        self._create_parent_dir = _create_parent_dir
2279
3468.2.1 by Martin Pool
Add _KnitAccess repr and remove dead import
2280
    def __repr__(self):
2281
        try:
2282
            return "%s(%r)" % (self.__class__.__name__,
2283
                self._transport.abspath(self._filename))
2284
        except:
2285
            return "_KnitAccess(**unprintable**)"
2286
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
2287
    def add_raw_records(self, sizes, raw_data):
2288
        """Add raw knit bytes to a storage area.
2289
2290
        The data is spooled to wherever the access method is storing data.
2291
2292
        :param sizes: An iterable containing the size of each raw data segment.
2293
        :param raw_data: A bytestring containing the data.
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
2294
        :return: A list of memos to retrieve the record later. Each memo is a
2295
            tuple - (index, pos, length), where the index field is always None
2296
            for the .knit access method.
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
2297
        """
2298
        if not self._need_to_create:
2299
            base = self._transport.append_bytes(self._filename, raw_data)
2300
        else:
2301
            self._transport.put_bytes_non_atomic(self._filename, raw_data,
2302
                                   create_parent_dir=self._create_parent_dir,
2303
                                   mode=self._file_mode,
2304
                                   dir_mode=self._dir_mode)
2305
            self._need_to_create = False
2306
            base = 0
2307
        result = []
2308
        for size in sizes:
2592.3.71 by Robert Collins
Basic version of knit-based repository operating, many tests failing.
2309
            result.append((None, base, size))
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
2310
            base += size
2311
        return result
2312
2313
    def create(self):
2314
        """IFF this data access has its own storage area, initialise it.
2315
2316
        :return: None.
2317
        """
2318
        self._transport.put_bytes_non_atomic(self._filename, '',
2319
                                             mode=self._file_mode)
2320
2321
    def open_file(self):
2322
        """IFF this data access can be represented as a single file, open it.
2323
2324
        For knits that are not mapped to a single file on disk this will
2325
        always return None.
2326
2327
        :return: None or a file handle.
2328
        """
2329
        try:
2330
            return self._transport.get(self._filename)
2331
        except NoSuchFile:
2332
            pass
2333
        return None
2334
2335
    def get_raw_records(self, memos_for_retrieval):
2336
        """Get the raw bytes for records.
2337
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
2338
        :param memos_for_retrieval: An iterable containing the (index, pos, 
2339
            length) memo for retrieving the bytes. The .knit method ignores
2340
            the index as there is always only a single file.
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
2341
        :return: An iterator over the bytes of the records.
2342
        """
2592.3.71 by Robert Collins
Basic version of knit-based repository operating, many tests failing.
2343
        read_vector = [(pos, size) for (index, pos, size) in memos_for_retrieval]
2344
        for pos, data in self._transport.readv(self._filename, read_vector):
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
2345
            yield data
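For a single .knit file, add_raw_records uses the offset returned by append_bytes (or 0 for a freshly created file) as the base, and each memo is (None, pos, size); get_raw_records then reads those (pos, size) pairs back with readv. A sketch of the same bookkeeping against a hypothetical in-memory file instead of a transport, so the round trip can be checked without bzrlib:

```python
class InMemoryKnitFile(object):
    """Hypothetical stand-in for a transport holding one .knit file."""

    def __init__(self):
        self._data = b''

    def add_raw_records(self, sizes, raw_data):
        # As with append_bytes, start from the length before the append.
        base = len(self._data)
        self._data += raw_data
        result = []
        for size in sizes:
            # The index field is always None: there is only one file.
            result.append((None, base, size))
            base += size
        return result

    def get_raw_records(self, memos_for_retrieval):
        # Equivalent of a readv over (pos, size) pairs.
        for _, pos, size in memos_for_retrieval:
            yield self._data[pos:pos + size]
```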
2346
2347
2348
class _PackAccess(object):
2349
    """Access to knit records via a collection of packs."""
2350
2351
    def __init__(self, index_to_packs, writer=None):
2352
        """Create a _PackAccess object.
2353
2354
        :param index_to_packs: A dict mapping index objects to the transport
2355
            and file names for obtaining data.
2356
        :param writer: A tuple (pack.ContainerWriter, write_index) which
2670.2.3 by Robert Collins
Review feedback.
2357
            contains the pack to write to, and the index that records written
2358
            to it will be associated with.
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
2359
        """
2360
        if writer:
2361
            self.container_writer = writer[0]
2362
            self.write_index = writer[1]
2363
        else:
2364
            self.container_writer = None
2365
            self.write_index = None
2366
        self.indices = index_to_packs
2367
2368
    def add_raw_records(self, sizes, raw_data):
2369
        """Add raw knit bytes to a storage area.
2370
2670.2.3 by Robert Collins
Review feedback.
2371
        The data is spooled to the container writer in one bytes-record per
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
2372
        raw data item.
2373
2374
        :param sizes: An iterable containing the size of each raw data segment.
2375
        :param raw_data: A bytestring containing the data.
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
2376
        :return: A list of memos to retrieve the record later. Each memo is a
2377
            tuple - (index, pos, length), where the index field is the 
2378
            write_index object supplied to the PackAccess object.
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
2379
        """
2380
        result = []
2381
        offset = 0
2382
        for size in sizes:
2383
            p_offset, p_length = self.container_writer.add_bytes_record(
2384
                raw_data[offset:offset+size], [])
2385
            offset += size
2386
            result.append((self.write_index, p_offset, p_length))
2387
        return result
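The slicing loop in add_raw_records can be tried in isolation. This is a minimal sketch, assuming a hypothetical `FakeContainerWriter` whose `add_bytes_record(bytes, names)` returns `(offset, length)` the way the loop expects from `pack.ContainerWriter`. It does not model the real pack container framing.

```python
class FakeContainerWriter(object):
    """Hypothetical stand-in: appends each record and reports (offset, length)."""

    def __init__(self):
        self.buffer = b""

    def add_bytes_record(self, data, names):
        # The real writer adds container framing; this sketch stores raw bytes only.
        offset = len(self.buffer)
        self.buffer += data
        return offset, len(data)


def add_raw_records(writer, write_index, sizes, raw_data):
    """Split raw_data by sizes and return one (index, pos, length) memo per segment."""
    result = []
    offset = 0
    for size in sizes:
        p_offset, p_length = writer.add_bytes_record(
            raw_data[offset:offset + size], [])
        offset += size
        result.append((write_index, p_offset, p_length))
    return result


writer = FakeContainerWriter()
memos = add_raw_records(writer, "idx", [3, 5], b"abcdefgh")
```

The memo records the writer's offset, not the offset within raw_data. The two only coincide here because the toy writer adds no framing.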
2388
2389
    def create(self):
2390
        """Pack based knits do not get individually created."""
2391
2392
    def get_raw_records(self, memos_for_retrieval):
2393
        """Get the raw bytes for a set of records.
2394
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
2395
        :param memos_for_retrieval: An iterable containing the (index, pos, 
2396
            length) memo for retrieving the bytes. The Pack access method
2397
            looks up the pack to use for a given record in its index_to_packs
2398
            map.
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
2399
        :return: An iterator over the bytes of the records.
2400
        """
2401
        # first pass, group into same-index requests
2402
        request_lists = []
2403
        current_index = None
2404
        for (index, offset, length) in memos_for_retrieval:
2405
            if current_index == index:
2406
                current_list.append((offset, length))
2407
            else:
2408
                if current_index is not None:
2409
                    request_lists.append((current_index, current_list))
2410
                current_index = index
2411
                current_list = [(offset, length)]
2412
        # handle the last entry
2413
        if current_index is not None:
2414
            request_lists.append((current_index, current_list))
2415
        for index, offsets in request_lists:
2416
            transport, path = self.indices[index]
2417
            reader = pack.make_readv_reader(transport, path, offsets)
2418
            for names, read_func in reader.iter_records():
2419
                yield read_func(None)
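The first pass of get_raw_records groups consecutive memos that share an index, so that each pack is read with one request. The sketch below covers only that grouping step and omits the reading. `group_by_index` is a hypothetical name.

```python
def group_by_index(memos):
    """Group consecutive (index, offset, length) memos into per-index requests.

    Only adjacent memos are merged, so an index that reappears later starts a
    new request; this preserves the caller's record order.
    """
    request_lists = []
    current_index = None
    current_list = []
    for index, offset, length in memos:
        if current_list and index == current_index:
            current_list.append((offset, length))
        else:
            if current_list:
                request_lists.append((current_index, current_list))
            current_index = index
            current_list = [(offset, length)]
    # handle the last run
    if current_list:
        request_lists.append((current_index, current_list))
    return request_lists


requests = group_by_index([("a", 0, 3), ("a", 3, 5), ("b", 0, 2), ("a", 8, 1)])
```

Non-adjacent runs for the same index stay separate. The generator must yield records in the same order as memos_for_retrieval.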
2420
2421
    def open_file(self):
2422
        """Pack based knits have no single file."""
2423
        return None
2424
2592.3.70 by Robert Collins
Allow setting a writer after creating a knit._PackAccess object.
2425
    def set_writer(self, writer, index, (transport, packname)):
2426
        """Set a writer to use for adding data."""
2592.3.208 by Robert Collins
Start refactoring the knit-pack thunking to be clearer.
2427
        if index is not None:
2428
            self.indices[index] = (transport, packname)
2592.3.70 by Robert Collins
Allow setting a writer after creating a knit._PackAccess object.
2429
        self.container_writer = writer
2430
        self.write_index = index
2431
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
2432
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
2433
class _StreamAccess(object):
3052.2.3 by Robert Collins
Handle insert_data_stream of an unannotated stream into an annotated knit.
2434
    """A Knit Access object that provides data from a datastream.
2435
    
2436
    It also provides a fallback that presents annotated data from a *backing*
2437
    access object as unannotated data.
2438
2439
    This is triggered by an index_memo that points to a different index
2440
    than the one this object was constructed with, and is used to allow extracting full
2441
    unannotated texts for insertion into annotated knits.
2442
    """
2443
2444
    def __init__(self, reader_callable, stream_index, backing_knit,
2445
        orig_factory):
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
2446
        """Create a _StreamAccess object.
2447
2448
        :param reader_callable: The reader_callable from the datastream.
2449
            This is called to buffer all the data immediately, for 
2450
            random access.
3052.2.3 by Robert Collins
Handle insert_data_stream of an unannotated stream into an annotated knit.
2451
        :param stream_index: The index of the data stream this object provides
2452
            access to; it appears in native index_memos.
2453
        :param backing_knit: The knit object that will provide access to 
2454
            annotated texts which are not available in the stream, so as to
2455
            create unannotated texts.
2456
        :param orig_factory: The original content factory used to generate the
2457
            stream. This is used for checking whether the thunk code for
2458
            supporting _copy_texts will generate the correct form of data.
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
2459
        """
2460
        self.data = reader_callable(None)
3052.2.3 by Robert Collins
Handle insert_data_stream of an unannotated stream into an annotated knit.
2461
        self.stream_index = stream_index
2462
        self.backing_knit = backing_knit
2463
        self.orig_factory = orig_factory
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
2464
2465
    def get_raw_records(self, memos_for_retrieval):
2466
        """Get the raw bytes for a set of records.
2467
3287.12.1 by Martin Pool
#2008418: (with spiv) Avoid interpreting fulltexts as line deltas when pulling knits.
2468
        :param memos_for_retrieval: An iterable of memos from the
2469
            _StreamIndex object identifying bytes to read; for these classes
2470
            they are (from_backing_knit, index, start, end) and can point to
2471
            either the backing knit or streamed data.
2472
        :return: An iterator yielding a byte string for each record in 
2473
            memos_for_retrieval.
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
2474
        """
2475
        # use a generator for memory friendliness
3287.12.1 by Martin Pool
#2008418: (with spiv) Avoid interpreting fulltexts as line deltas when pulling knits.
2476
        for from_backing_knit, version_id, start, end in memos_for_retrieval:
2477
            if not from_backing_knit:
3376.2.4 by Martin Pool
Remove every assert statement from bzrlib!
2478
                if version_id is not self.stream_index:
2479
                    raise AssertionError()
3052.2.3 by Robert Collins
Handle insert_data_stream of an unannotated stream into an annotated knit.
2480
                yield self.data[start:end]
2481
                continue
2482
            # we have been asked to thunk. This thunking only occurs when
2483
            # we are obtaining plain texts from an annotated backing knit
2484
            # so that _copy_texts will work.
2485
            # We could improve performance here by scanning for where we need
2486
            # to do this and using get_line_list, then interleaving the output
2487
            # as desired. However, for now, this is sufficient.
3052.2.5 by Andrew Bennetts
Address the rest of the review comments from John and myself.
2488
            if self.orig_factory.__class__ != KnitPlainFactory:
2489
                raise errors.KnitCorrupt(
3287.12.1 by Martin Pool
#2008418: (with spiv) Avoid interpreting fulltexts as line deltas when pulling knits.
2490
                    self, 'Bad thunk request %r cannot be backed by %r' %
2491
                        (version_id, self.orig_factory))
3052.2.3 by Robert Collins
Handle insert_data_stream of an unannotated stream into an annotated knit.
2492
            lines = self.backing_knit.get_lines(version_id)
2493
            line_bytes = ''.join(lines)
2494
            digest = sha_string(line_bytes)
3287.12.1 by Martin Pool
#2008418: (with spiv) Avoid interpreting fulltexts as line deltas when pulling knits.
2495
            # the packed form of the fulltext always has a trailing newline,
2496
            # even if the actual text does not, unless the file is empty.  The
2497
            # record options including the noeol flag are passed through by
2498
            # _StreamIndex, so this is safe.
3052.2.3 by Robert Collins
Handle insert_data_stream of an unannotated stream into an annotated knit.
2499
            if lines:
2500
                if lines[-1][-1] != '\n':
2501
                    lines[-1] = lines[-1] + '\n'
2502
                    line_bytes += '\n'
2503
            # We want plain data, because we expect to thunk only to allow text
2504
            # extraction.
2505
            size, bytes = self.backing_knit._data._record_to_data(version_id,
2506
                digest, lines, line_bytes)
2507
            yield bytes
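The trailing-newline adjustment in the thunk path can be isolated. This is a sketch only: `normalise_fulltext` is a hypothetical helper name. It returns the lines and joined text that would then be passed to `_record_to_data`. Unlike the original, it copies `lines` instead of mutating it. It also uses `endswith`, so an empty last line does not raise.

```python
def normalise_fulltext(lines):
    """Return (lines, line_bytes) where the last line ends in a newline.

    Mirrors the thunk logic: the packed fulltext always ends with '\n' unless
    the text is empty; the noeol flag travels separately in the record options.
    """
    lines = list(lines)  # copy so the caller's list is not mutated
    line_bytes = "".join(lines)
    if lines and not lines[-1].endswith("\n"):
        lines[-1] = lines[-1] + "\n"
        line_bytes += "\n"
    return lines, line_bytes
```

In the original, the sha1 digest is computed before this adjustment, so it still describes the real text without the added newline.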
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
2508
2509
2510
class _StreamIndex(object):
2511
    """A Knit Index object that uses the data map from a datastream."""
2512
3224.1.8 by John Arbash Meinel
Add noeol to the return signature of get_build_details.
2513
    def __init__(self, data_list, backing_index):
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
2514
        """Create a _StreamIndex object.
2515
2516
        :param data_list: The data_list from the datastream.
3224.1.8 by John Arbash Meinel
Add noeol to the return signature of get_build_details.
2517
        :param backing_index: The index which will supply values for nodes
2518
            referenced outside of this stream.
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
2519
        """
2520
        self.data_list = data_list
3224.1.8 by John Arbash Meinel
Add noeol to the return signature of get_build_details.
2521
        self.backing_index = backing_index
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
2522
        self._by_version = {}
2523
        pos = 0
2524
        for key, options, length, parents in data_list:
2525
            self._by_version[key] = options, (pos, pos + length), parents
2526
            pos += length
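Each stream record occupies a contiguous slice of the buffered data, so the constructor only needs a running total of record lengths. Below is a standalone sketch of that mapping. The version ids and lengths are made up, and `build_offset_map` is a hypothetical name.

```python
def build_offset_map(data_list):
    """Map each key to (options, (start, end), parents) using cumulative lengths."""
    by_version = {}
    pos = 0
    for key, options, length, parents in data_list:
        by_version[key] = options, (pos, pos + length), parents
        pos += length
    return by_version


data_list = [
    ("rev-1", ["fulltext"], 10, []),
    ("rev-2", ["line-delta"], 4, ["rev-1"]),
]
offsets = build_offset_map(data_list)
```

These (start, end) pairs are what get_position returns, and _StreamAccess.get_raw_records uses them as `self.data[start:end]`.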
2527
2528
    def get_ancestry(self, versions, topo_sorted):
2529
        """Get an ancestry list for versions."""
2530
        if topo_sorted:
2531
            # Not needed for basic joins
2532
            raise NotImplementedError(self.get_ancestry)
2533
        # get a graph of all the mentioned versions:
2534
        # A little ugly: basically copied from KnitIndex, but we don't want to
2535
        # accidentally incorporate too much of that index's code.
3052.2.4 by Andrew Bennetts
Some tweaks suggested by John's review.
2536
        ancestry = set()
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
2537
        pending = set(versions)
2538
        cache = self._by_version
2539
        while pending:
2540
            version = pending.pop()
2541
            # trim ghosts
2542
            try:
2543
                parents = [p for p in cache[version][2] if p in cache]
2544
            except KeyError:
2545
                raise RevisionNotPresent(version, self)
2546
            # if not completed and not a ghost
3052.2.4 by Andrew Bennetts
Some tweaks suggested by John's review.
2547
            pending.update([p for p in parents if p not in ancestry])
2548
            ancestry.add(version)
2549
        return list(ancestry)
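The non-topological ancestry walk is a worklist traversal that drops parents absent from the stream (ghosts). The sketch below runs over a hypothetical plain parent dict instead of the `(options, (start, end), parents)` tuples. It raises KeyError where the original raises RevisionNotPresent, and it returns a set rather than a list.

```python
def get_ancestry(parent_map, versions):
    """Return the set of versions reachable from `versions`, skipping ghosts.

    parent_map maps version -> list of parent versions; parents that are not
    keys of parent_map are treated as ghosts and trimmed.
    """
    ancestry = set()
    pending = set(versions)
    while pending:
        version = pending.pop()
        # Raises KeyError for unknown versions (RevisionNotPresent in bzrlib).
        parents = [p for p in parent_map[version] if p in parent_map]
        pending.update(p for p in parents if p not in ancestry)
        ancestry.add(version)
    return ancestry


pm = {"a": [], "b": ["a"], "c": ["b", "ghost"]}
```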
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
2550
3218.1.1 by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries.
2551
    def get_build_details(self, version_ids):
2552
        """Get the method, index_memo and compression parent for version_ids.
2553
3224.1.29 by John Arbash Meinel
Properly handle annotating when ghosts are present.
2554
        Ghosts are omitted from the result.
2555
3218.1.1 by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries.
2556
        :param version_ids: An iterable of version_ids.
3224.1.18 by John Arbash Meinel
Cleanup documentation
2557
        :return: A dict of version_id:(index_memo, compression_parent,
2558
                                       parents, record_details).
2559
            index_memo
3287.12.1 by Martin Pool
#2008418: (with spiv) Avoid interpreting fulltexts as line deltas when pulling knits.
2560
                opaque memo that can be passed to _StreamAccess.get_raw_records
2561
                to extract the raw data; for these classes it is
2562
                (from_backing_knit, index, start, end) 
3224.1.18 by John Arbash Meinel
Cleanup documentation
2563
            compression_parent
2564
                Content that this record is built upon; may be None
2565
            parents
2566
                Logical parents of this node
2567
            record_details
2568
                extra information about the content which needs to be passed to
2569
                Factory.parse_record
3218.1.1 by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries.
2570
        """
2571
        result = {}
2572
        for version_id in version_ids:
3224.1.29 by John Arbash Meinel
Properly handle annotating when ghosts are present.
2573
            try:
2574
                method = self.get_method(version_id)
2575
            except errors.RevisionNotPresent:
2576
                # ghosts are omitted
2577
                continue
3224.1.7 by John Arbash Meinel
_StreamIndex also needs to return the proper values for get_build_details.
2578
            parent_ids = self.get_parents_with_ghosts(version_id)
3224.1.8 by John Arbash Meinel
Add noeol to the return signature of get_build_details.
2579
            noeol = ('no-eol' in self.get_options(version_id))
3287.12.1 by Martin Pool
#2008418: (with spiv) Avoid interpreting fulltexts as line deltas when pulling knits.
2580
            index_memo = self.get_position(version_id)
2581
            from_backing_knit = index_memo[0]
2582
            if from_backing_knit:
2583
                # texts retrieved from the backing knit are always full texts
2584
                method = 'fulltext'
3218.1.1 by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries.
2585
            if method == 'fulltext':
2586
                compression_parent = None
2587
            else:
3224.1.7 by John Arbash Meinel
_StreamIndex also needs to return the proper values for get_build_details.
2588
                compression_parent = parent_ids[0]
3224.1.14 by John Arbash Meinel
Switch to making content_details opaque, step 1
2589
            result[version_id] = (index_memo, compression_parent,
2590
                                  parent_ids, (method, noeol))
3218.1.1 by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries.
2591
        return result
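The compression-parent rule in get_build_details comes down to three points:

- The method is read from the options.
- Records served from the backing knit are then forced to "fulltext".
- Only line-deltas name a compression parent, which is their first logical parent.

This sketch takes simplified inputs. `build_detail` is a hypothetical name, and ValueError stands in for KnitIndexUnknownMethod.

```python
def build_detail(index_memo, options, parent_ids):
    """Return (index_memo, compression_parent, parents, (method, noeol))."""
    if "fulltext" in options:
        method = "fulltext"
    elif "line-delta" in options:
        method = "line-delta"
    else:
        raise ValueError("unknown method in %r" % (options,))
    if index_memo[0]:
        # texts retrieved from the backing knit are always full texts
        method = "fulltext"
    noeol = "no-eol" in options
    compression_parent = None if method == "fulltext" else parent_ids[0]
    return index_memo, compression_parent, parent_ids, (method, noeol)


delta = build_detail((False, "stream", 0, 10), ["line-delta", "no-eol"], ["p1", "p2"])
backed = build_detail((True, "v", None, None), ["line-delta"], ["p1"])
```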
2592
3052.2.3 by Robert Collins
Handle insert_data_stream of an unannotated stream into an annotated knit.
2593
    def get_method(self, version_id):
2594
        """Return compression method of specified version."""
3287.12.1 by Martin Pool
#2008418: (with spiv) Avoid interpreting fulltexts as line deltas when pulling knits.
2595
        options = self.get_options(version_id)
3052.2.3 by Robert Collins
Handle insert_data_stream of an unannotated stream into an annotated knit.
2596
        if 'fulltext' in options:
2597
            return 'fulltext'
2598
        elif 'line-delta' in options:
2599
            return 'line-delta'
2600
        else:
2601
            raise errors.KnitIndexUnknownMethod(self, options)
2602
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
2603
    def get_options(self, version_id):
3052.2.5 by Andrew Bennetts
Address the rest of the review comments from John and myself.
2604
        """Return a list representing options.
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
2605
2606
        e.g. ['foo', 'bar']
2607
        """
3224.1.8 by John Arbash Meinel
Add noeol to the return signature of get_build_details.
2608
        try:
2609
            return self._by_version[version_id][0]
2610
        except KeyError:
3287.7.6 by Andrew Bennetts
Tweaks suggested by Robert's review.
2611
            options = list(self.backing_index.get_options(version_id))
2612
            if 'fulltext' in options:
3287.12.1 by Martin Pool
#2008418: (with spiv) Avoid interpreting fulltexts as line deltas when pulling knits.
2613
                pass
3287.7.6 by Andrew Bennetts
Tweaks suggested by Robert's review.
2614
            elif 'line-delta' in options:
3287.12.1 by Martin Pool
#2008418: (with spiv) Avoid interpreting fulltexts as line deltas when pulling knits.
2615
                # Texts from the backing knit are always returned from the stream
2616
                # as full texts
3287.7.6 by Andrew Bennetts
Tweaks suggested by Robert's review.
2617
                options.remove('line-delta')
2618
                options.append('fulltext')
3287.12.1 by Martin Pool
#2008418: (with spiv) Avoid interpreting fulltexts as line deltas when pulling knits.
2619
            else:
2620
                raise errors.KnitIndexUnknownMethod(self, options)
3287.7.6 by Andrew Bennetts
Tweaks suggested by Robert's review.
2621
            return tuple(options)
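When a version is not in the stream, get_options takes its options from the backing index. It rewrites them so that a line-delta is reported as a fulltext, which matches what the thunk path in _StreamAccess produces. Below is a sketch of that rewrite as a pure function. `rewrite_backing_options` is a hypothetical name, and ValueError stands in for KnitIndexUnknownMethod.

```python
def rewrite_backing_options(options):
    """Return options as a tuple, turning 'line-delta' into 'fulltext'."""
    options = list(options)
    if "fulltext" in options:
        pass
    elif "line-delta" in options:
        # The stream serves backing-knit texts as full texts.
        options.remove("line-delta")
        options.append("fulltext")
    else:
        raise ValueError("unknown method in %r" % (options,))
    return tuple(options)
```

Other flags such as "no-eol" pass through unchanged. Only the method flag is swapped.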
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
2622
3287.5.5 by Robert Collins
Refactor internals of knit implementations to implement get_parents_with_ghosts in terms of get_parent_map.
2623
    def get_parent_map(self, version_ids):
2624
        """Delegated to by KnitVersionedFile.get_parent_map."""
2625
        result = {}
2626
        pending_ids = set()
2627
        for version_id in version_ids:
2628
            try:
2629
                result[version_id] = self._by_version[version_id][2]
2630
            except KeyError:
2631
                pending_ids.add(version_id)
2632
        result.update(self.backing_index.get_parent_map(pending_ids))
2633
        return result
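get_parent_map answers from the stream first. It then sends only the misses to the backing index, in one batched call. The sketch below uses plain dicts for both sources, with a hypothetical `StubIndex` as the backing index. The stub omits unknown ids rather than raising, which is the behaviour the KeyError handling in get_parents_with_ghosts appears to rely on.

```python
class StubIndex(object):
    """Hypothetical backing index: returns parents only for ids it knows."""

    def __init__(self, parents):
        self.parents = parents
        self.calls = []

    def get_parent_map(self, version_ids):
        version_ids = set(version_ids)
        self.calls.append(version_ids)
        return {v: self.parents[v] for v in version_ids if v in self.parents}


def get_parent_map(stream_parents, backing_index, version_ids):
    result = {}
    pending_ids = set()
    for version_id in version_ids:
        if version_id in stream_parents:
            result[version_id] = stream_parents[version_id]
        else:
            pending_ids.add(version_id)
    # One batched query for everything the stream could not answer.
    result.update(backing_index.get_parent_map(pending_ids))
    return result


backing = StubIndex({"old": ("base",)})
pm = get_parent_map({"new": ("old",)}, backing, ["new", "old", "absent"])
```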
2634
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
2635
    def get_parents_with_ghosts(self, version_id):
2636
        """Return parents of specified version with ghosts."""
3224.1.7 by John Arbash Meinel
_StreamIndex also needs to return the proper values for get_build_details.
2637
        try:
3287.5.5 by Robert Collins
Refactor internals of knit implementations to implement get_parents_with_ghosts in terms of get_parent_map.
2638
            return self.get_parent_map([version_id])[version_id]
3224.1.7 by John Arbash Meinel
_StreamIndex also needs to return the proper values for get_build_details.
2639
        except KeyError:
3287.5.5 by Robert Collins
Refactor internals of knit implementations to implement get_parents_with_ghosts in terms of get_parent_map.
2640
            raise RevisionNotPresent(version_id, self)
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
2641
2642
    def get_position(self, version_id):
2643
        """Return details needed to access the version.
2644
        
2645
        _StreamAccess has the data as a big array, so we return slice
3052.2.5 by Andrew Bennetts
Address the rest of the review comments from John and myself.
2646
        coordinates into that (as index_memos are opaque outside the
2647
        index and matching access class).
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
2648
3287.12.1 by Martin Pool
#2008418: (with spiv) Avoid interpreting fulltexts as line deltas when pulling knits.
2649
        :return: a tuple (from_backing_knit, index, start, end) that can 
2650
            be passed e.g. to get_raw_records.  
2651
            If from_backing_knit is False, index will be self, otherwise it
2652
            will be a version id.
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
2653
        """
3052.2.3 by Robert Collins
Handle insert_data_stream of an unannotated stream into an annotated knit.
2654
        try:
2655
            start, end = self._by_version[version_id][1]
3052.2.5 by Andrew Bennetts
Address the rest of the review comments from John and myself.
2656
            return False, self, start, end
3052.2.3 by Robert Collins
Handle insert_data_stream of an unannotated stream into an annotated knit.
2657
        except KeyError:
2658
            # Signal to the access object to handle this from the backing knit.
3052.2.5 by Andrew Bennetts
Address the rest of the review comments from John and myself.
2659
            return (True, version_id, None, None)
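The memo from get_position is what routes a read in _StreamAccess.get_raw_records. Records held in the stream become a slice of the buffered data, and everything else is flagged for the backing knit. The sketch below shows that round trip in simplified form. `read_raw` is a hypothetical name, and `fetch_backing` is a hypothetical callable standing in for the thunk path.

```python
def get_position(by_version, stream_index, version_id):
    """Return (from_backing_knit, index, start, end) as _StreamIndex does."""
    try:
        start, end = by_version[version_id][1]
        return False, stream_index, start, end
    except KeyError:
        # Signal that the record must come from the backing knit.
        return True, version_id, None, None


def read_raw(data, stream_index, memos, fetch_backing):
    """Yield one byte string per memo, slicing stream data or deferring."""
    for from_backing_knit, index, start, end in memos:
        if not from_backing_knit:
            if index is not stream_index:
                raise AssertionError("memo from a different stream")
            yield data[start:end]
        else:
            yield fetch_backing(index)


stream = object()
by_version = {"v1": (["fulltext"], (0, 3), []), "v2": (["fulltext"], (3, 7), [])}
memos = [get_position(by_version, stream, v) for v in ["v2", "old", "v1"]]
out = list(read_raw(b"abcdefg", stream, memos, lambda v: ("backing:" + v).encode()))
```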
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
2660
2661
    def get_versions(self):
2662
        """Get all the versions in the stream."""
2663
        return self._by_version.keys()
2664
2665
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
2666
class _KnitData(object):
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
2667
    """Manage extraction of data from a KnitAccess, caching and decompressing.
2668
    
2669
    The KnitData class provides the logic for parsing and using knit records,
2670
    making use of an access method for the low level read and write operations.
2671
    """
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
2672
2673
    def __init__(self, access):
2674
        """Create a KnitData object.
2675
2676
        :param access: The access method to use. Access methods such as
2677
            _KnitAccess manage the insertion of raw records and the subsequent
2678
            retrieval of the same.
2679
        """
2680
        self._access = access
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
2681
        self._checked = False
2682
2683
    def _open_file(self):
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
2684
        return self._access.open_file()
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
2685
2888.1.1 by Robert Collins
(robertc) Use prejoined content for knit storage when performing a full-text store of unannotated content. (Robert Collins)
2686
    def _record_to_data(self, version_id, digest, lines, dense_lines=None):
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
2687
        """Convert version_id, digest, lines into a raw data block.
2688
        
2888.1.2 by Robert Collins
Cleanup the dense_lines parameter docstring to be more useful.
2689
        :param dense_lines: The bytes of lines but in a denser form. For
2690
            instance, if lines is a list of 1000 bytestrings each ending in \n,
2691
            dense_lines may be a list with one line in it, containing all the
2692
            1000 lines and their \n's. Using dense_lines if it is already
2693
            known is a win because the string join to create bytes in this
2694
            function spends less time resizing the final string.
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
2695
        :return: (len, bytes) where bytes is the gzip-compressed raw data.
2696
        """
2888.1.1 by Robert Collins
(robertc) Use prejoined content for knit storage when performing a full-text store of unannotated content. (Robert Collins)
2697
        # Note: using a string copy here increases memory pressure with e.g.
2698
        # ISO's, but it is about 3 seconds faster on a 1.2GHz Intel machine
2699
        # when doing the initial commit of a mozilla tree. RBC 20070921
2700
        bytes = ''.join(chain(
2249.5.15 by John Arbash Meinel
remove get_cached_utf8 checks which were slowing things down.
2701
            ["version %s %d %s\n" % (version_id,
1596.2.28 by Robert Collins
more knit profile based tuning.
2702
                                     len(lines),
2703
                                     digest)],
2888.1.1 by Robert Collins
(robertc) Use prejoined content for knit storage when performing a full-text store of unannotated content. (Robert Collins)
2704
            dense_lines or lines,
2705
            ["end %s\n" % version_id]))
2817.3.1 by Robert Collins
* New helper ``bzrlib.tuned_gzip.bytes_to_gzip`` which takes a byte string
2706
        compressed_bytes = bytes_to_gzip(bytes)
2707
        return len(compressed_bytes), compressed_bytes
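The record written here has three parts, and the whole record is gzip-compressed:

- a header line, `version <id> <line count> <sha1>`;
- the content lines;
- a closing `end <id>` line.

Below is a self-contained Python 3 sketch. It uses the standard `gzip` module in place of bzrlib's `tuned_gzip.bytes_to_gzip`, so the compressed bytes will differ, but they decompress to the same text. As in the original, the line count comes from `lines` even when `dense_lines` is supplied.

```python
import gzip
import hashlib


def record_to_data(version_id, digest, lines, dense_lines=None):
    """Build and gzip a knit record; return (compressed_length, compressed_bytes)."""
    text = b"".join(
        [b"version %s %d %s\n" % (version_id, len(lines), digest)]
        + list(dense_lines or lines)
        + [b"end %s\n" % version_id])
    compressed = gzip.compress(text)
    return len(compressed), compressed


lines = [b"hello\n", b"world\n"]
digest = hashlib.sha1(b"".join(lines)).hexdigest().encode("ascii")
size, data = record_to_data(b"rev-1", digest, lines)
```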
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
2708
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
2709
    def add_raw_records(self, sizes, raw_data):
1692.4.1 by Robert Collins
Multiple merges:
2710
        """Append a prepared record to the data file.
2329.1.2 by John Arbash Meinel
Remove some spurious whitespace changes.
2711
        
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
2712
        :param sizes: An iterable containing the size of each raw data segment.
2713
        :param raw_data: A bytestring containing the data.
2714
        :return: a list of index data for the way the data was stored.
2715
            See the access method add_raw_records documentation for more
2716
            details.
1692.4.1 by Robert Collins
Multiple merges:
2717
        """
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
2718
        return self._access.add_raw_records(sizes, raw_data)
2329.1.2 by John Arbash Meinel
Remove some spurious whitespace changes.
2719
        
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
2720
    def _parse_record_header(self, version_id, raw_data):
2721
        """Parse a record header for consistency.
2722
2723
        :return: the decompressor stream and the parsed header record,
2724
                 as (stream, header_record)
2725
        """
2726
        df = GzipFile(mode='rb', fileobj=StringIO(raw_data))
2329.1.1 by John Arbash Meinel
Update _KnitData parser to raise more helpful errors when it detects corruption.
2727
        try:
2728
            rec = self._check_header(version_id, df.readline())
2358.3.4 by Martin Pool
Fix mangled knit.py changes
2729
        except Exception, e:
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
2730
            raise KnitCorrupt(self._access,
2329.1.1 by John Arbash Meinel
Update _KnitData parser to raise more helpful errors when it detects corruption.
2731
                              "While reading {%s} got %s(%s)"
2732
                              % (version_id, e.__class__.__name__, str(e)))
2358.3.4 by Martin Pool
Fix mangled knit.py changes
2733
        return df, rec
2163.2.4 by John Arbash Meinel
Split _KnitData._parse_header up, so that we have 1 readlines() call, rather than readline+readlines()
2734
3350.3.4 by Robert Collins
Finish adapters for annotated knits to unannotated knits and full texts.
2735
    def _split_header(self, line):
2358.3.4 by Martin Pool
Fix mangled knit.py changes
2736
        rec = line.split()
2737
        if len(rec) != 4:
2738
            raise KnitCorrupt(self._access,
2163.2.4 by John Arbash Meinel
Split _KnitData._parse_header up, so that we have 1 readlines() call, rather than readline+readlines()
2739
                              'unexpected number of elements in record header')
3350.3.4 by Robert Collins
Finish adapters for annotated knits to unannotated knits and full texts.
2740
        return rec
2741
2742
    def _check_header_version(self, rec, version_id):
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
2743
        if rec[1] != version_id:
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
2744
            raise KnitCorrupt(self._access,
2163.2.4 by John Arbash Meinel
Split _KnitData._parse_header up, so that we have 1 readlines() call, rather than readline+readlines()
2745
                              'unexpected version, wanted %r, got %r'
2746
                              % (version_id, rec[1]))
3350.3.4 by Robert Collins
Finish adapters for annotated knits to unannotated knits and full texts.
2747
2748
    def _check_header(self, version_id, line):
2749
        rec = self._split_header(line)
2750
        self._check_header_version(rec, version_id)
2163.2.4 by John Arbash Meinel
Split _KnitData._parse_header up, so that we have 1 readlines() call, rather than readline+readlines()
2751
        return rec
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
2752
3350.3.4 by Robert Collins
Finish adapters for annotated knits to unannotated knits and full texts.
2753
    def _parse_record_unchecked(self, data):
1628.1.2 by Robert Collins
More knit micro-optimisations.
2754
        # profiling notes:
2755
        # 4168 calls in 2880 217 internal
2756
        # 4168 calls to _parse_record_header in 2121
2757
        # 4168 calls to readlines in 330
2163.2.4 by John Arbash Meinel
Split _KnitData._parse_header up, so that we have 1 readlines() call, rather than readline+readlines()
2758
        df = GzipFile(mode='rb', fileobj=StringIO(data))
2329.1.1 by John Arbash Meinel
Update _KnitData parser to raise more helpful errors when it detects corruption.
2759
        try:
2760
            record_contents = df.readlines()
2358.3.4 by Martin Pool
Fix mangled knit.py changes
2761
        except Exception, e:
3350.3.4 by Robert Collins
Finish adapters for annotated knits to unannotated knits and full texts.
2762
            raise KnitCorrupt(self._access, "Corrupt compressed record %r, got %s(%s)" %
2763
                (data, e.__class__.__name__, str(e)))
2163.2.4 by John Arbash Meinel
Split _KnitData._parse_header up, so that we have 1 readlines() call, rather than readline+readlines()
2764
        header = record_contents.pop(0)
3350.3.4 by Robert Collins
Finish adapters for annotated knits to unannotated knits and full texts.
2765
        rec = self._split_header(header)
2163.2.4 by John Arbash Meinel
Split _KnitData._parse_header up, so that we have 1 readlines() call, rather than readline+readlines()
2766
        last_line = record_contents.pop()
2329.1.1 by John Arbash Meinel
Update _KnitData parser to raise more helpful errors when it detects corruption.
2767
        if len(record_contents) != int(rec[2]):
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
2768
            raise KnitCorrupt(self._access,
2329.1.1 by John Arbash Meinel
Update _KnitData parser to raise more helpful errors when it detects corruption.
2769
                              'incorrect number of lines %s != %s'
2770
                              ' for version {%s}'
2771
                              % (len(record_contents), int(rec[2]),
3350.3.4 by Robert Collins
Finish adapters for annotated knits to unannotated knits and full texts.
2772
                                 rec[1]))
2163.2.4 by John Arbash Meinel
Split _KnitData._parse_header up, so that we have 1 readlines() call, rather than readline+readlines()
2773
        if last_line != 'end %s\n' % rec[1]:
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
2774
            raise KnitCorrupt(self._access,
2163.2.4 by John Arbash Meinel
Split _KnitData._parse_header up, so that we have 1 readlines() call, rather than readline+readlines()
2775
                              'unexpected version end line %r, wanted %r' 
3350.3.4 by Robert Collins
Finish adapters for annotated knits to unannotated knits and full texts.
2776
                              % (last_line, rec[1]))
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
2777
        df.close()
3350.3.4 by Robert Collins
Finish adapters for annotated knits to unannotated knits and full texts.
2778
        return rec, record_contents
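Parsing reverses that layout and checks it:

- The header must have four fields.
- The number of content lines must match the count in the header.
- The last line must be `end <id>`.

_parse_record then also checks the version id. The sketch below follows these methods in Python 3, with ValueError standing in for KnitCorrupt. It reads through `gzip.GzipFile(...).readlines()` as the original does, so lines are split only on `\n`.

```python
import gzip
import io


def parse_record_unchecked(data):
    """Return (header_fields, content_lines) from a gzipped knit record."""
    with gzip.GzipFile(mode="rb", fileobj=io.BytesIO(data)) as df:
        record_contents = df.readlines()
    header = record_contents.pop(0)
    rec = header.split()
    if len(rec) != 4:
        raise ValueError("unexpected number of elements in record header")
    last_line = record_contents.pop()
    if len(record_contents) != int(rec[2]):
        raise ValueError("incorrect number of lines %d != %d for version {%r}"
                         % (len(record_contents), int(rec[2]), rec[1]))
    if last_line != b"end " + rec[1] + b"\n":
        raise ValueError("unexpected version end line %r" % (last_line,))
    return rec, record_contents


def parse_record(version_id, data):
    """Return (content_lines, digest) after checking the version id."""
    rec, contents = parse_record_unchecked(data)
    if rec[1] != version_id:
        raise ValueError("unexpected version, wanted %r, got %r"
                         % (version_id, rec[1]))
    return contents, rec[3]
```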
2779
2780
    def _parse_record(self, version_id, data):
2781
        rec, record_contents = self._parse_record_unchecked(data)
2782
        self._check_header_version(rec, version_id)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
2783
        return record_contents, rec[3]
2784
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
2785
    def read_records_iter_raw(self, records):
2786
        """Read text records from data file and yield raw data.
2787
2788
        This unpacks enough of the text record to validate that the id is
2789
        as expected, but that's all.
3350.3.3 by Robert Collins
Functional get_record_stream interface tests covering full interface.
2790
2791
        Each item the iterator yields is (version_id, bytes,
2792
        sha1_of_full_text).
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
2793
        """
2794
        # set up an iterator over the external records;
2795
        # this uses readv, so it should be fast.
1756.3.23 by Aaron Bentley
Remove knit caches
2796
        if len(records):
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
2797
            # grab the disk data needed.
3316.2.11 by Robert Collins
* ``VersionedFile.clear_cache`` and ``enable_cache`` are deprecated.
2798
            needed_offsets = [index_memo for version_id, index_memo
2799
                                           in records]
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
2800
            raw_records = self._access.get_raw_records(needed_offsets)
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
2801
2592.3.71 by Robert Collins
Basic version of knit-based repository operating, many tests failing.
2802
        for version_id, index_memo in records:
3316.2.11 by Robert Collins
* ``VersionedFile.clear_cache`` and ``enable_cache`` are deprecated.
2803
            data = raw_records.next()
2804
            # validate the header
2805
            df, rec = self._parse_record_header(version_id, data)
2806
            df.close()
3350.3.3 by Robert Collins
Functional get_record_stream interface tests covering full interface.
2807
            yield version_id, data, rec[3]
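read_records_iter_raw() validates only each record's header and then hands back the untouched bytes. It pairs each requested (version_id, index_memo) with the next item from a single batched read. The minimal sketch below shows that pairing. It assumes `get_raw_records` returns raw records in request order. Both callables are hypothetical parameters, not bzrlib APIs, and real records would be gzipped.

```python
def read_records_iter_raw(records, get_raw_records, parse_header):
    """Yield (version_id, raw_bytes, sha1) for each record, in request order.

    records: list of (version_id, index_memo) pairs.
    get_raw_records: hypothetical callable taking a list of index_memos and
        returning an iterator of raw byte strings in the same order.
    parse_header: hypothetical callable returning the header fields
        ['version', version_id, line_count, sha1] for one raw record.
    """
    if not records:
        return
    raw_records = get_raw_records([memo for _, memo in records])
    for version_id, _ in records:
        data = next(raw_records)
        rec = parse_header(data)
        # Only the header is checked; the body is passed through untouched.
        if rec[1] != version_id:
            raise ValueError('expected version %r, got %r'
                             % (version_id, rec[1]))
        yield version_id, data, rec[3]
```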
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
2808
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
2809
    def read_records_iter(self, records):
2810
        """Read text records from data file and yield result.
2811
1863.1.5 by John Arbash Meinel
Add a read_records_iter_unsorted, which can return records in any order.
2812
        The results are returned in whatever order is fastest to read,
2813
        not in the order requested. Also, multiple requests for the same
2814
        record will only yield one response.
2815
        :param records: A list of (version_id, index_memo) entries
2816
        :return: Yields (version_id, contents, digest) in the order
2817
                 read, not the order requested
2818
        """
2819
        if not records:
2820
            return
2821
3316.2.11 by Robert Collins
* ``VersionedFile.clear_cache`` and ``enable_cache`` are deprecated.
2822
        needed_records = sorted(set(records), key=operator.itemgetter(1))
1863.1.5 by John Arbash Meinel
Add a read_records_iter_unsorted, which can return records in any order.
2823
        if not needed_records:
2824
            return
2825
2826
        # The transport optimizes the fetching as well
2827
        # (i.e., it reads contiguous ranges).
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
2828
        raw_data = self._access.get_raw_records(
2592.3.71 by Robert Collins
Basic version of knit-based repository operating, many tests failing.
2829
            [index_memo for version_id, index_memo in needed_records])
1863.1.5 by John Arbash Meinel
Add a read_records_iter_unsorted, which can return records in any order.
2830
2592.3.71 by Robert Collins
Basic version of knit-based repository operating, many tests failing.
2831
        for (version_id, index_memo), data in \
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
2832
                izip(iter(needed_records), raw_data):
1863.1.5 by John Arbash Meinel
Add a read_records_iter_unsorted, which can return records in any order.
2833
            content, digest = self._parse_record(version_id, data)
1756.3.23 by Aaron Bentley
Remove knit caches
2834
            yield version_id, content, digest
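read_records_iter() deduplicates the request and sorts it by index_memo, so reads ascend through the file. The sketch below assumes an index_memo is a (start, length) tuple, which is a simplification. `coalesce()` is a hypothetical illustration of the contiguous-range merging the comment mentions, not the transport's actual readv code.

```python
import operator


def plan_reads(records):
    """Return unique (version_id, index_memo) pairs in file order.

    Duplicate requests collapse to one entry, and sorting on the index_memo
    (assumed here to be a (start, length) tuple) makes reads ascend
    through the file.
    """
    return sorted(set(records), key=operator.itemgetter(1))


def coalesce(memos):
    """Merge touching (start, length) ranges into larger contiguous reads."""
    merged = []
    for start, length in memos:
        if merged and merged[-1][0] + merged[-1][1] == start:
            prev_start, prev_length = merged[-1]
            merged[-1] = (prev_start, prev_length + length)
        else:
            merged.append((start, length))
    return merged
```

Requesting `c` twice plus `a` and `b` yields three reads in offset order. Because the three ranges touch, they merge into one read of 50 bytes starting at offset 0.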
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
2835
2836
    def read_records(self, records):
2837
        """Read records into a dictionary."""
2838
        components = {}
1863.1.5 by John Arbash Meinel
Add a read_records_iter_unsorted, which can return records in any order.
2839
        for record_id, content, digest in \
1863.1.9 by John Arbash Meinel
Switching to have 'read_records_iter' return in random order.
2840
                self.read_records_iter(records):
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
2841
            components[record_id] = (content, digest)
2842
        return components
2843
1563.2.13 by Robert Collins
InterVersionedFile implemented.
2844
2845
class InterKnit(InterVersionedFile):
2846
    """Optimised code paths for knit to knit operations."""
2847
    
3316.2.3 by Robert Collins
Remove manual notification of transaction finishing on versioned files.
2848
    _matching_file_from_factory = staticmethod(make_file_knit)
2849
    _matching_file_to_factory = staticmethod(make_file_knit)
1563.2.13 by Robert Collins
InterVersionedFile implemented.
2850
    
2851
    @staticmethod
2852
    def is_compatible(source, target):
2853
        """Return True if both source and target are knits."""
2854
        try:
2855
            return (isinstance(source, KnitVersionedFile) and
2856
                    isinstance(target, KnitVersionedFile))
2857
        except AttributeError:
2858
            return False
2859
2998.2.3 by John Arbash Meinel
Respond to Aaron's requests
2860
    def _copy_texts(self, pb, msg, version_ids, ignore_missing=False):
2998.2.2 by John Arbash Meinel
implement a faster path for copying from packs back to knits.
2861
        """Copy texts to the target by extracting and adding them one by one.
2862
2863
        See join() for the parameter definitions.
2864
        """
2865
        version_ids = self._get_source_version_ids(version_ids, ignore_missing)
3287.6.1 by Robert Collins
* ``VersionedFile.get_graph`` is deprecated, with no replacement method.
2866
        # --- the code below could be shared with VersionedFile.join, but wait
2867
        # for VersionedFiles; it may all be simpler then.
2868
        graph = Graph(self.source)
2869
        search = graph._make_breadth_first_searcher(version_ids)
2870
        transitive_ids = set()
2871
        map(transitive_ids.update, list(search))
2872
        parent_map = self.source.get_parent_map(transitive_ids)
2873
        order = topo_sort(parent_map.items())
2998.2.2 by John Arbash Meinel
implement a faster path for copying from packs back to knits.
2874
2875
        def size_of_content(content):
2876
            return sum(len(line) for line in content.text())
2877
        # Cache at most 10MB of parent texts
2878
        parent_cache = lru_cache.LRUSizeCache(max_size=10*1024*1024,
2879
                                              compute_size=size_of_content)
2880
        # TODO: jam 20071116 It would be nice to have a streaming interface to
2881
        #       get multiple texts from a source. The source could be smarter
2882
        #       about how it handled intermediate stages.
2998.2.3 by John Arbash Meinel
Respond to Aaron's requests
2883
        #       get_line_list() or make_mpdiffs() seem like a possibility, but
2884
        #       at the moment they extract all full texts into memory, which
2885
        #       causes us to store more than our 3x fulltext goal.
2886
        #       Repository.iter_files_bytes() may be another possibility
2998.2.2 by John Arbash Meinel
implement a faster path for copying from packs back to knits.
2887
        to_process = [version for version in order
2888
                               if version not in self.target]
2889
        total = len(to_process)
2890
        pb = ui.ui_factory.nested_progress_bar()
2891
        try:
2892
            for index, version in enumerate(to_process):
2893
                pb.update('Converting versioned data', index, total)
2894
                sha1, num_bytes, parent_text = self.target.add_lines(version,
3052.2.3 by Robert Collins
Handle insert_data_stream of an unannotated stream into an annotated knit.
2895
                    self.source.get_parents_with_ghosts(version),
2998.2.2 by John Arbash Meinel
implement a faster path for copying from packs back to knits.
2896
                    self.source.get_lines(version),
2897
                    parent_texts=parent_cache)
2898
                parent_cache[version] = parent_text
2899
        finally:
2900
            pb.finished()
2901
        return total
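_copy_texts() bounds its parent-text cache by total content size rather than by entry count. Below is a minimal size-bounded LRU cache in the same spirit. It is a sketch built on `collections.OrderedDict`, not the `bzrlib.lru_cache.LRUSizeCache` implementation. Refusing values larger than the whole budget is an assumption of this sketch.

```python
from collections import OrderedDict


class LRUSizeCache(object):
    """Size-bounded LRU cache (a sketch, not bzrlib.lru_cache.LRUSizeCache)."""

    def __init__(self, max_size, compute_size=len):
        self._max_size = max_size
        self._compute_size = compute_size
        self._data = OrderedDict()  # least recently used entry first
        self._sizes = {}
        self._total = 0

    def __setitem__(self, key, value):
        if key in self._data:
            # Replacing an entry: forget its old size first.
            self._total -= self._sizes.pop(key)
            del self._data[key]
        size = self._compute_size(value)
        if size > self._max_size:
            return  # larger than the whole budget: do not cache it
        self._data[key] = value
        self._sizes[key] = size
        self._total += size
        # Evict least recently used entries until we fit again.
        while self._total > self._max_size:
            old_key, _ = self._data.popitem(last=False)
            self._total -= self._sizes.pop(old_key)

    def __getitem__(self, key):
        value = self._data[key]
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def __contains__(self, key):
        return key in self._data
```

Measuring size with a callback mirrors `size_of_content` above. The cache then limits memory held by parent texts, however many versions pass through it.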
2902
1563.2.31 by Robert Collins
Convert Knit repositories to use knits.
2903
    def join(self, pb=None, msg=None, version_ids=None, ignore_missing=False):
1563.2.13 by Robert Collins
InterVersionedFile implemented.
2904
        """See InterVersionedFile.join."""
2851.4.3 by Ian Clatworthy
fix up plain-to-annotated knit conversion
2905
        # If the source and target are mismatched w.r.t. annotations vs
2906
        # plain, the data needs to be converted accordingly
2907
        if self.source.factory.annotated == self.target.factory.annotated:
2908
            converter = None
2909
        elif self.source.factory.annotated:
2910
            converter = self._anno_to_plain_converter
2911
        else:
2998.2.3 by John Arbash Meinel
Respond to Aaron's requests
2912
            # We're converting from a plain to an annotated knit. Copy them
2913
            # across by full texts.
2914
            return self._copy_texts(pb, msg, version_ids, ignore_missing)
2851.4.3 by Ian Clatworthy
fix up plain-to-annotated knit conversion
2915
1684.3.2 by Robert Collins
Factor out version_ids-to-join selection in InterVersionedfile.
2916
        version_ids = self._get_source_version_ids(version_ids, ignore_missing)
1563.2.13 by Robert Collins
InterVersionedFile implemented.
2917
        if not version_ids:
2918
            return 0
2919
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
2920
        pb = ui.ui_factory.nested_progress_bar()
1594.2.24 by Robert Collins
Make use of the transaction finalisation warning support to implement in-knit caching.
2921
        try:
2922
            version_ids = list(version_ids)
2923
            if None in version_ids:
2924
                version_ids.remove(None)
2925
    
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
2926
            self.source_ancestry = set(self.source.get_ancestry(version_ids,
2927
                topo_sorted=False))
1594.2.24 by Robert Collins
Make use of the transaction finalisation warning support to implement in-knit caching.
2928
            this_versions = set(self.target._index.get_versions())
2825.4.1 by Robert Collins
* ``pull``, ``merge`` and ``push`` will no longer silently correct some
2929
            # XXX: For efficiency we should not look at the whole index,
2930
            #      we only need to consider the referenced revisions - they
2931
            #      must all be present, or the method must be full-text.
2932
            #      TODO, RBC 20070919
1594.2.24 by Robert Collins
Make use of the transaction finalisation warning support to implement in-knit caching.
2933
            needed_versions = self.source_ancestry - this_versions
2934
    
2825.4.1 by Robert Collins
* ``pull``, ``merge`` and ``push`` will no longer silently correct some
2935
            if not needed_versions:
1594.2.24 by Robert Collins
Make use of the transaction finalisation warning support to implement in-knit caching.
2936
                return 0
3287.6.1 by Robert Collins
* ``VersionedFile.get_graph`` is deprecated, with no replacement method.
2937
            full_list = topo_sort(
2938
                self.source.get_parent_map(self.source.versions()))
1910.2.65 by Aaron Bentley
Remove the check-parent patch
2939
    
2940
            version_list = [i for i in full_list if (not self.target.has_version(i)
2941
                            and i in needed_versions)]
1594.2.24 by Robert Collins
Make use of the transaction finalisation warning support to implement in-knit caching.
2942
    
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
2943
            # plan the join:
2944
            copy_queue = []
2945
            copy_queue_records = []
2946
            copy_set = set()
1594.2.24 by Robert Collins
Make use of the transaction finalisation warning support to implement in-knit caching.
2947
            for version_id in version_list:
2948
                options = self.source._index.get_options(version_id)
2949
                parents = self.source._index.get_parents_with_ghosts(version_id)
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
2950
                # check that it will be a consistent copy:
1594.2.24 by Robert Collins
Make use of the transaction finalisation warning support to implement in-knit caching.
2951
                for parent in parents:
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
2952
                    # if the source has the parent, we must either:
2953
                    # * already have it or
2954
                    # * have it scheduled already
1759.2.2 by Jelmer Vernooij
Revert some of my spelling fixes and fix some typos after review by Aaron.
2955
                    # otherwise we don't care
3376.2.4 by Martin Pool
Remove every assert statement from bzrlib!
2956
                    if not (self.target.has_version(parent) or
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
2957
                            parent in copy_set or
3376.2.4 by Martin Pool
Remove every assert statement from bzrlib!
2958
                            not self.source.has_version(parent)):
2959
                        raise AssertionError("problem joining parent %r "
2960
                            "from %r to %r"
2961
                            % (parent, self.source, self.target))
2592.3.71 by Robert Collins
Basic version of knit-based repository operating, many tests failing.
2962
                index_memo = self.source._index.get_position(version_id)
2963
                copy_queue_records.append((version_id, index_memo))
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
2964
                copy_queue.append((version_id, options, parents))
2965
                copy_set.add(version_id)
2966
2967
            # read the raw data for the join:
2968
            count = 0
2969
            total = len(version_list)
1692.2.1 by Robert Collins
Fix knit based push to only perform 2 appends to the target, rather that 2*new-versions.
2970
            raw_datum = []
2971
            raw_records = []
3350.3.3 by Robert Collins
Functional get_record_stream interface tests covering full interface.
2972
            for (version_id, raw_data, _), \
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
2973
                (version_id2, options, parents) in \
2974
                izip(self.source._data.read_records_iter_raw(copy_queue_records),
2975
                     copy_queue):
3376.2.4 by Martin Pool
Remove every assert statement from bzrlib!
2976
                if not (version_id == version_id2):
2977
                    raise AssertionError('logic error, inconsistent results')
1594.2.24 by Robert Collins
Make use of the transaction finalisation warning support to implement in-knit caching.
2978
                count = count + 1
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
2979
                pb.update("Joining knit", count, total)
2851.4.2 by Ian Clatworthy
use factory methods in annotated-to-plain conversion instead of duplicating format knowledge
2980
                if converter:
2981
                    size, raw_data = converter(raw_data, version_id, options,
2982
                        parents)
2851.4.1 by Ian Clatworthy
Support joining plain knits to annotated knits and vice versa
2983
                else:
2984
                    size = len(raw_data)
2985
                raw_records.append((version_id, options, parents, size))
1692.2.1 by Robert Collins
Fix knit based push to only perform 2 appends to the target, rather that 2*new-versions.
2986
                raw_datum.append(raw_data)
2987
            self.target._add_raw_records(raw_records, ''.join(raw_datum))
1594.2.24 by Robert Collins
Make use of the transaction finalisation warning support to implement in-knit caching.
2988
            return count
2989
        finally:
2990
            pb.finished()
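join() walks the source in topological order and copies only the versions that are both needed and missing from the target. The sketch below uses Kahn's algorithm for the ordering. It assumes `parent_map` maps each revision to a tuple of parents, and it ignores parents absent from the map (ghosts). `topo_sort` and `versions_to_copy` are illustrative names; the tie-breaking may differ from bzrlib's own `topo_sort`.

```python
def topo_sort(parent_map):
    """Return revisions ordered so each parent precedes its children.

    Kahn's algorithm: parent_map maps revision_id -> tuple of parent ids;
    parents missing from the map (ghosts) are ignored.
    """
    children = {rev: [] for rev in parent_map}
    unmet = {}
    for rev, parents in parent_map.items():
        present = [p for p in parents if p in parent_map]
        unmet[rev] = len(present)
        for p in present:
            children[p].append(rev)
    ready = sorted(rev for rev, count in unmet.items() if count == 0)
    order = []
    while ready:
        rev = ready.pop()
        order.append(rev)
        for child in children[rev]:
            unmet[child] -= 1
            if unmet[child] == 0:
                ready.append(child)
    if len(order) != len(parent_map):
        raise ValueError('parent_map contains a cycle')
    return order


def versions_to_copy(parent_map, needed, in_target):
    """Topologically ordered versions that are needed but not yet present."""
    return [v for v in topo_sort(parent_map)
            if v in needed and v not in in_target]
```

Copying in this order guarantees that every present parent is added before its children. That is the invariant the AssertionError check in join() protects.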
1563.2.13 by Robert Collins
InterVersionedFile implemented.
2991
2851.4.2 by Ian Clatworthy
use factory methods in annotated-to-plain conversion instead of duplicating format knowledge
2992
    def _anno_to_plain_converter(self, raw_data, version_id, options,
2993
                                 parents):
2994
        """Convert annotated content to plain content."""
2995
        data, digest = self.source._data._parse_record(version_id, raw_data)
2996
        if 'fulltext' in options:
2997
            content = self.source.factory.parse_fulltext(data, version_id)
2998
            lines = self.target.factory.lower_fulltext(content)
2999
        else:
3000
            delta = self.source.factory.parse_line_delta(data, version_id,
3001
                plain=True)
3002
            lines = self.target.factory.lower_line_delta(delta)
3003
        return self.target._data._record_to_data(version_id, digest, lines)
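When an annotated knit is joined into a plain one, each content line loses its origin prefix while line-delta headers are kept. The sketch makes two assumptions. First, annotated lines have the form `origin text`. Second, each delta hunk starts with a `start,end,count` header followed by `count` lines, as in the layout notes at the top of this file. Both helper names are hypothetical; this is not the factory's `lower_fulltext` or `lower_line_delta` code.

```python
def annotated_to_plain(annotated_lines):
    """Strip the origin prefix from each 'origin text' line of a fulltext."""
    return [line.split(' ', 1)[1] for line in annotated_lines]


def annotated_delta_to_plain(delta_lines):
    """Strip origins from a line delta, keeping its 'start,end,count' headers."""
    plain = []
    i = 0
    while i < len(delta_lines):
        header = delta_lines[i]
        count = int(header.split(',')[2])  # int() tolerates the trailing '\n'
        plain.append(header)
        plain.extend(annotated_to_plain(delta_lines[i + 1:i + 1 + count]))
        i += 1 + count
    return plain
```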
3004
1563.2.13 by Robert Collins
InterVersionedFile implemented.
3005
3006
InterVersionedFile.register_optimiser(InterKnit)
1596.2.24 by Robert Collins
Gzipfile was slightly slower than ideal.
3007
3008
1684.3.3 by Robert Collins
Add a special cased weaves to knit converter.
3009
class WeaveToKnit(InterVersionedFile):
3010
    """Optimised code paths for weave to knit operations."""
3011
    
3012
    _matching_file_from_factory = bzrlib.weave.WeaveFile
3316.2.3 by Robert Collins
Remove manual notification of transaction finishing on versioned files.
3013
    _matching_file_to_factory = staticmethod(make_file_knit)
1684.3.3 by Robert Collins
Add a special cased weaves to knit converter.
3014
    
3015
    @staticmethod
3016
    def is_compatible(source, target):
3017
        """Return True when copying from a weave to a knit."""
3018
        try:
3019
            return (isinstance(source, bzrlib.weave.Weave) and
3020
                    isinstance(target, KnitVersionedFile))
3021
        except AttributeError:
3022
            return False
3023
3024
    def join(self, pb=None, msg=None, version_ids=None, ignore_missing=False):
3025
        """See InterVersionedFile.join."""
3026
        version_ids = self._get_source_version_ids(version_ids, ignore_missing)
3027
3028
        if not version_ids:
3029
            return 0
3030
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
3031
        pb = ui.ui_factory.nested_progress_bar()
1684.3.3 by Robert Collins
Add a special cased weaves to knit converter.
3032
        try:
3033
            version_ids = list(version_ids)
3034
    
3035
            self.source_ancestry = set(self.source.get_ancestry(version_ids))
3036
            this_versions = set(self.target._index.get_versions())
3037
            needed_versions = self.source_ancestry - this_versions
3038
    
2825.4.1 by Robert Collins
* ``pull``, ``merge`` and ``push`` will no longer silently correct some
3039
            if not needed_versions:
1684.3.3 by Robert Collins
Add a special cased weaves to knit converter.
3040
                return 0
3287.6.1 by Robert Collins
* ``VersionedFile.get_graph`` is deprecated, with no replacement method.
3041
            full_list = topo_sort(
3042
                self.source.get_parent_map(self.source.versions()))
1684.3.3 by Robert Collins
Add a special cased weaves to knit converter.
3043
    
3044
            version_list = [i for i in full_list if (not self.target.has_version(i)
3045
                            and i in needed_versions)]
3046
    
3047
            # do the join:
3048
            count = 0
3049
            total = len(version_list)
3287.5.2 by Robert Collins
Deprecate VersionedFile.get_parents, breaking pulling from a ghost containing knit or pack repository to weaves, which improves correctness and allows simplification of core code.
3050
            parent_map = self.source.get_parent_map(version_list)
1684.3.3 by Robert Collins
Add a special cased weaves to knit converter.
3051
            for version_id in version_list:
3052
                pb.update("Converting to knit", count, total)
3287.5.2 by Robert Collins
Deprecate VersionedFile.get_parents, breaking pulling from a ghost containing knit or pack repository to weaves, which improves correctness and allows simplification of core code.
3053
                parents = parent_map[version_id]
1684.3.3 by Robert Collins
Add a special cased weaves to knit converter.
3054
                # check that it will be a consistent copy:
3055
                for parent in parents:
3056
                    # the target must already have every parent
3376.2.4 by Martin Pool
Remove every assert statement from bzrlib!
3057
                    if not self.target.has_version(parent):
3058
                        raise AssertionError("%r does not have parent %r"
3059
                            % (self.target, parent))
1684.3.3 by Robert Collins
Add a special cased weaves to knit converter.
3060
                self.target.add_lines(
3061
                    version_id, parents, self.source.get_lines(version_id))
3062
                count = count + 1
3063
            return count
3064
        finally:
3065
            pb.finished()
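WeaveToKnit.join() adds versions one at a time and refuses to add a version whose parents the target does not already hold. The sketch below models that rule, with a plain dict standing in for the target knit. `copy_versions` and its arguments are hypothetical.

```python
def copy_versions(version_list, parent_map, get_lines, target):
    """Copy versions into target in order; every parent must already be there.

    target is a dict mapping version_id -> (parents, lines), a hypothetical
    stand-in for a knit. Returns the number of versions copied.
    """
    count = 0
    for version_id in version_list:
        parents = parent_map[version_id]
        # Consistency check: the target must already hold every parent.
        for parent in parents:
            if parent not in target:
                raise AssertionError('%r does not have parent %r'
                                     % (sorted(target), parent))
        target[version_id] = (parents, get_lines(version_id))
        count += 1
    return count
```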
3066
3067
3068
InterVersionedFile.register_optimiser(WeaveToKnit)
3069
3070
2781.1.1 by Martin Pool
merge cpatiencediff from Lukas
3071
# Deprecated, use PatienceSequenceMatcher instead
3072
KnitSequenceMatcher = patiencediff.PatienceSequenceMatcher
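KnitSequenceMatcher is kept only as an alias for `patiencediff.PatienceSequenceMatcher`, which we understand to follow the `difflib.SequenceMatcher` interface. The sketch uses the standard-library difflib matcher as a stand-in to show the shape of `get_matching_blocks()` output. That output is a list of (i, j, n) triples ending with a (len(a), len(b), 0) sentinel. Patience diff may pick different blocks than difflib on real inputs.

```python
import difflib


def matching_blocks(old_lines, new_lines):
    """Return (i, j, n) triples with old_lines[i:i+n] == new_lines[j:j+n]."""
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines)
    return [tuple(block) for block in matcher.get_matching_blocks()]
```

For example, comparing `['a\n', 'b\n', 'c\n']` with `['a\n', 'x\n', 'c\n']` gives `[(0, 0, 1), (2, 2, 1), (3, 3, 0)]`.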
2484.1.1 by John Arbash Meinel
Add an initial function to read knit indexes in pyrex.
3073
3074
2770.1.2 by Aaron Bentley
Convert to knit-only annotation
3075
def annotate_knit(knit, revision_id):
3076
    """Annotate a knit with no cached annotations.
3077
3078
    This implementation is for knits with no cached annotations.
3079
    It will work for knits with cached annotations, but this is not
3080
    recommended.
3081
    """
3224.1.7 by John Arbash Meinel
_StreamIndex also needs to return the proper values for get_build_details.
3082
    annotator = _KnitAnnotator(knit)
3224.1.25 by John Arbash Meinel
Quick change to the _KnitAnnotator api to use .annotate() instead of get_annotated_lines()
3083
    return iter(annotator.annotate(revision_id))
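annotate_knit() delegates to _KnitAnnotator, whose core step (annotate.reannotate) carries line origins forward from parents. Below is a much-simplified single-parent version. It assumes annotations are (origin, line) pairs and uses difflib in place of the patience matcher. It does not handle multiple parents or the heads_provider.

```python
import difflib


def reannotate_one_parent(parent_annotated, new_lines, revision_id):
    """Annotate new_lines against a single, already annotated parent.

    parent_annotated is a list of (origin, line) pairs. Lines matched in the
    parent keep the parent's origin; all other lines get revision_id.
    """
    parent_lines = [line for _, line in parent_annotated]
    matcher = difflib.SequenceMatcher(None, parent_lines, new_lines)
    annotated = [(revision_id, line) for line in new_lines]
    for i, j, n in matcher.get_matching_blocks():
        for k in range(n):
            annotated[j + k] = parent_annotated[i + k]
    return annotated
```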
3224.1.7 by John Arbash Meinel
_StreamIndex also needs to return the proper values for get_build_details.
3084
3085
3086
class _KnitAnnotator(object):
3224.1.5 by John Arbash Meinel
Start using a helper class for doing the knit-pack annotations.
3087
    """Build up the annotations for a text."""
3088
3089
    def __init__(self, knit):
3090
        self._knit = knit
3091
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
3092
        # Content objects; these differ from fulltexts because of how knits
3093
        # treat final newlines. The content objects here will always have a
3094
        # final newline.
3095
        self._fulltext_contents = {}
3096
3097
        # Annotated lines of specific revisions
3098
        self._annotated_lines = {}
3099
3100
        # Track the raw data for nodes that we could not process yet.
3101
        # This maps the revision_id of the base to a list of children that will
3102
        # be annotated from it.
3103
        self._pending_children = {}
3104
3224.1.29 by John Arbash Meinel
Properly handle annotating when ghosts are present.
3105
        # Nodes which cannot be extracted
3106
        self._ghosts = set()
3107
3224.1.19 by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed.
3108
        # Track which children still need this node, so we know whether we
3109
        # need to keep it
3110
        self._annotate_children = {}
3111
        self._compression_children = {}
3112
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
3113
        self._all_build_details = {}
3224.1.10 by John Arbash Meinel
Introduce the heads_provider for reannotate.
3114
        # The children => parent revision_id graph
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
3115
        self._revision_id_graph = {}
3116
3224.1.10 by John Arbash Meinel
Introduce the heads_provider for reannotate.
3117
        self._heads_provider = None
3118
3224.1.19 by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed.
3119
        self._nodes_to_keep_annotations = set()
3224.1.22 by John Arbash Meinel
Cleanup the extra debugging info, and some >80 char lines.
3120
        self._generations_until_keep = 100
3121
3122
    def set_generations_until_keep(self, value):
3123
        """Set the number of generations before caching a node.
3124
3125
        Setting this to -1 will cache every merge node, setting this higher
3126
        will cache fewer nodes.
3127
        """
3128
        self._generations_until_keep = value
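The keep rule can be simulated outside the class. The sketch assumes the breadth-first generations are already known and mirrors the test in _get_build_graph: a merge node is kept once at least `generations_until_keep` generations have passed since the last kept node. `nodes_to_keep` is a hypothetical helper.

```python
def nodes_to_keep(generations, generations_until_keep):
    """Pick merge nodes whose annotations should stay cached.

    generations is a list with one entry per breadth-first generation; each
    entry is a list of (rev_id, parents) pairs.
    """
    keep = set()
    kept_generation = 0
    for generation, nodes in enumerate(generations, start=1):
        for rev_id, parents in nodes:
            # Only merge nodes (more than one parent) are candidates.
            if (len(parents) > 1
                    and generation - kept_generation >= generations_until_keep):
                kept_generation = generation
                keep.add(rev_id)
    return keep
```

With -1 every merge node is kept, as the docstring says. Larger values keep progressively fewer nodes.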
3224.1.19 by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed.
3129
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
3130
    def _add_fulltext_content(self, revision_id, content_obj):
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
3131
        self._fulltext_contents[revision_id] = content_obj
3224.1.19 by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed.
3132
        # TODO: jam 20080305 It might be good to check the sha1digest here
3224.1.22 by John Arbash Meinel
Cleanup the extra debugging info, and some >80 char lines.
3133
        return content_obj.text()
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
3134
3135
    def _check_parents(self, child, nodes_to_annotate):
3136
        """Check if all parents have been processed.
3137
3138
        :param child: A tuple of (rev_id, parents, raw_content)
3139
        :param nodes_to_annotate: If child is ready, add it to
3140
            nodes_to_annotate, otherwise put it back in self._pending_children
3141
        """
3142
        for parent_id in child[1]:
3224.1.29 by John Arbash Meinel
Properly handle annotating when ghosts are present.
3143
            if (parent_id not in self._annotated_lines):
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
3144
                # This parent has not been annotated yet; wait for it
3145
                self._pending_children.setdefault(parent_id,
3146
                                                  []).append(child)
3147
                break
3148
        else:
3149
            # This one is ready to be processed
3150
            nodes_to_annotate.append(child)
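_check_parents() relies on Python's for/else. A child is parked under the first parent that has not been annotated yet, and it is queued only when the loop finishes without a `break`. Below is a standalone sketch with the same shape, in which plain dicts and lists replace the annotator's attributes (hypothetical names).

```python
def check_parents(child, annotated, pending_children, ready):
    """Queue child if all its parents are annotated, otherwise park it.

    child is a (rev_id, parent_ids, raw_content) tuple; it is parked under
    the first parent that is not yet in annotated.
    """
    for parent_id in child[1]:
        if parent_id not in annotated:
            pending_children.setdefault(parent_id, []).append(child)
            break
    else:
        # No break: every parent is annotated, so the child is ready.
        ready.append(child)
```

When the blocking parent is annotated, popping its pending list and calling `check_parents` again either queues the child or parks it under the next missing parent.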
3151
3152
    def _add_annotation(self, revision_id, fulltext, parent_ids,
3153
                        left_matching_blocks=None):
3154
        """Add an annotation entry.
3155
3156
        All parents should already have been annotated.
3157
        :return: A list of children that now have their parents satisfied.
3158
        """
3159
        a = self._annotated_lines
3160
        annotated_parent_lines = [a[p] for p in parent_ids]
3161
        annotated_lines = list(annotate.reannotate(annotated_parent_lines,
3224.1.10 by John Arbash Meinel
Introduce the heads_provider for reannotate.
3162
            fulltext, revision_id, left_matching_blocks,
3163
            heads_provider=self._get_heads_provider()))
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
3164
        self._annotated_lines[revision_id] = annotated_lines
3224.1.19 by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed.
3165
        for p in parent_ids:
3166
            ann_children = self._annotate_children[p]
3167
            ann_children.remove(revision_id)
3168
            if (not ann_children
3169
                and p not in self._nodes_to_keep_annotations):
3170
                del self._annotated_lines[p]
3171
                del self._all_build_details[p]
3172
                if p in self._fulltext_contents:
3173
                    del self._fulltext_contents[p]
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
3174
        # Now that we've added this one, see if there are any pending
3175
        # deltas to be done; this parent is certainly finished.
3176
        nodes_to_annotate = []
3177
        for child in self._pending_children.pop(revision_id, []):
3178
            self._check_parents(child, nodes_to_annotate)
3179
        return nodes_to_annotate
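After annotating a revision, _add_annotation() drops a parent's cached data once its last child has been processed, unless that parent was chosen to be kept. The sketch below shows the release step in isolation. `release_parents` and its arguments are hypothetical.

```python
def release_parents(revision_id, parent_ids, annotate_children, cache, keep):
    """Forget parents' cached data once their last child is annotated.

    annotate_children maps parent_id -> list of children still to process;
    cache maps revision_id -> cached data; keep holds ids never evicted.
    """
    for p in parent_ids:
        children = annotate_children[p]
        children.remove(revision_id)
        if not children and p not in keep:
            del cache[p]
```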
3180
3181
    def _get_build_graph(self, revision_id):
3182
        """Get the graphs for building texts and annotations.
3183
3184
        The data you need for creating a full text may be different from the
3185
        data you need to annotate that text. (At a minimum, you need all
3186
        parents to create an annotation, but only one parent to generate the
3187
        fulltext.)
3188
3189
        :return: A list of (revision_id, index_memo) records, suitable for
3224.1.19 by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed.
3190
            passing to read_records_iter to start reading in the raw data from
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
3191
            the pack file.
3192
        """
3224.1.10 by John Arbash Meinel
Introduce the heads_provider for reannotate.
3193
        if revision_id in self._annotated_lines:
3194
            # Nothing to do
3195
            return []
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
3196
        pending = set([revision_id])
3197
        records = []
3224.1.19 by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed.
3198
        generation = 0
3224.1.22 by John Arbash Meinel
Cleanup the extra debugging info, and some >80 char lines.
3199
        kept_generation = 0
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
3200
        while pending:
3201
            # get all pending nodes
3224.1.19 by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed.
3202
            generation += 1
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
3203
            this_iteration = pending
3204
            build_details = self._knit._index.get_build_details(this_iteration)
3205
            self._all_build_details.update(build_details)
3206
            # new_nodes = self._knit._index._get_entries(this_iteration)
3207
            pending = set()
3208
            for rev_id, details in build_details.iteritems():
3224.1.14 by John Arbash Meinel
Switch to making content_details opaque, step 1
                (index_memo, compression_parent, parents,
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
                 record_details) = details
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
                self._revision_id_graph[rev_id] = parents
                records.append((rev_id, index_memo))
3224.1.19 by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed.
                # Do we actually need to check _annotated_lines?
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
                pending.update(p for p in parents
                                 if p not in self._all_build_details)
3224.1.19 by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed.
                if compression_parent:
                    self._compression_children.setdefault(compression_parent,
                        []).append(rev_id)
                if parents:
                    for parent in parents:
3224.1.22 by John Arbash Meinel
Cleanup the extra debugging info, and some >80 char lines.
                        self._annotate_children.setdefault(parent,
                            []).append(rev_id)
                    num_gens = generation - kept_generation
                    if ((num_gens >= self._generations_until_keep)
3224.1.19 by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed.
                        and len(parents) > 1):
3224.1.22 by John Arbash Meinel
Cleanup the extra debugging info, and some >80 char lines.
                        kept_generation = generation
3224.1.19 by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed.
                        self._nodes_to_keep_annotations.add(rev_id)
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.

            missing_versions = this_iteration.difference(build_details.keys())
3224.1.29 by John Arbash Meinel
Properly handle annotating when ghosts are present.
            self._ghosts.update(missing_versions)
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
            for missing_version in missing_versions:
                # add a key, no parents
3224.1.29 by John Arbash Meinel
Properly handle annotating when ghosts are present.
                self._revision_id_graph[missing_version] = ()
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
                pending.discard(missing_version) # don't look for it
3376.2.4 by Martin Pool
Remove every assert statement from bzrlib!
        if self._ghosts.intersection(self._compression_children):
            raise KnitCorrupt(
                "We cannot have nodes which have a ghost compression parent:\n"
                "ghosts: %r\n"
                "compression children: %r"
                % (self._ghosts, self._compression_children))
3224.1.29 by John Arbash Meinel
Properly handle annotating when ghosts are present.
        # Clean out anything that depends on a ghost so that we don't wait for
        # the ghost to show up
        for node in self._ghosts:
            if node in self._annotate_children:
                # We won't be building this node
                del self._annotate_children[node]
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
        # Generally we will want to read the records in reverse order, because
        # we find the parent nodes after the children
        records.reverse()
        return records
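`_get_build_graph` walks the ancestry of `revision_id` one generation at a time: it records every revision the index knows about, treats revisions the index cannot return as parentless ghosts, and refuses to continue if a ghost is some revision's compression parent. The following is a minimal sketch of that walk, assuming a plain dict stands in for `self._knit._index.get_build_details()`. The function name `get_build_graph`, the 3-tuple layout (the opaque `record_details` is dropped), and `ValueError` in place of `KnitCorrupt` are illustrative assumptions; the generation-based keep logic is omitted.

```python
from typing import Dict, List, Optional, Set, Tuple

# Assumed layout for this sketch: rev_id -> (index_memo, compression_parent,
# parents). Revisions that are not keys of the dict are treated as ghosts.
Details = Tuple[str, Optional[str], Tuple[str, ...]]


def get_build_graph(index: Dict[str, Details], revision_id: str):
    """Return (records, graph, ghosts) for revision_id and its ancestry."""
    pending: Set[str] = {revision_id}
    all_details: Dict[str, Details] = {}
    graph: Dict[str, Tuple[str, ...]] = {}
    ghosts: Set[str] = set()
    compression_children: Dict[str, List[str]] = {}
    records: List[Tuple[str, str]] = []
    while pending:
        this_iteration = pending
        found = {rev: index[rev] for rev in this_iteration if rev in index}
        all_details.update(found)
        pending = set()
        for rev_id, (index_memo, comp_parent, parents) in found.items():
            graph[rev_id] = parents
            records.append((rev_id, index_memo))
            pending.update(p for p in parents if p not in all_details)
            if comp_parent:
                compression_children.setdefault(comp_parent, []).append(rev_id)
        # Requested but absent from the index: record as a parentless ghost.
        for missing in this_iteration.difference(found):
            ghosts.add(missing)
            graph[missing] = ()
            pending.discard(missing)
    if ghosts.intersection(compression_children):
        raise ValueError("a delta cannot use a ghost as its compression parent")
    # Parents are discovered after their children, so reverse for reading.
    records.reverse()
    return records, graph, ghosts
```

Because children are found before their parents, reversing `records` puts ancestors first, matching the read order the comment in `_get_build_graph` asks for.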

    def _annotate_records(self, records):
        """Build the annotations for the listed records."""
        # We iterate in the order read, rather than a strict order requested.
3224.1.22 by John Arbash Meinel
Cleanup the extra debugging info, and some >80 char lines.
        # However, process what we can, and put off to the side things that
        # still need parents, cleaning them up when those parents are
        # processed.
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
        for (rev_id, record,
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
             digest) in self._knit._data.read_records_iter(records):
            if rev_id in self._annotated_lines:
                continue
            parent_ids = self._revision_id_graph[rev_id]
3224.1.29 by John Arbash Meinel
Properly handle annotating when ghosts are present.
            parent_ids = [p for p in parent_ids if p not in self._ghosts]
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
            details = self._all_build_details[rev_id]
3224.1.14 by John Arbash Meinel
Switch to making content_details opaque, step 1
            (index_memo, compression_parent, parents,
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
             record_details) = details
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
            nodes_to_annotate = []
            # TODO: Remove the punning between compression parents and
            #       parent_ids; we should be able to do this without assuming
            #       the build order.
            if len(parent_ids) == 0:
                # There are no parents for this node, so just add it
                # TODO: This probably needs to be decoupled
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
                fulltext_content, delta = self._knit.factory.parse_record(
                    rev_id, record, record_details, None)
                fulltext = self._add_fulltext_content(rev_id, fulltext_content)
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
                nodes_to_annotate.extend(self._add_annotation(rev_id, fulltext,
                    parent_ids, left_matching_blocks=None))
            else:
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
                child = (rev_id, parent_ids, record)
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
                # Check if all the parents are present
                self._check_parents(child, nodes_to_annotate)
            while nodes_to_annotate:
                # Should we use a queue here instead of a stack?
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
                (rev_id, parent_ids, record) = nodes_to_annotate.pop()
3224.1.14 by John Arbash Meinel
Switch to making content_details opaque, step 1
                (index_memo, compression_parent, parents,
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
                 record_details) = self._all_build_details[rev_id]
3224.1.14 by John Arbash Meinel
Switch to making content_details opaque, step 1
                if compression_parent is not None:
3224.1.19 by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed.
                    comp_children = self._compression_children[compression_parent]
3376.2.4 by Martin Pool
Remove every assert statement from bzrlib!
                    if rev_id not in comp_children:
                        raise AssertionError("%r not in compression children %r"
                            % (rev_id, comp_children))
3224.1.19 by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed.
                    # If there is only 1 child, it is safe to reuse this
                    # content
                    reuse_content = (len(comp_children) == 1
                        and compression_parent not in
                            self._nodes_to_keep_annotations)
                    if reuse_content:
                        # Remove it from the cache since it will be changing
                        parent_fulltext_content = self._fulltext_contents.pop(compression_parent)
                        # Make sure to copy the fulltext since it might be
                        # modified
3224.1.22 by John Arbash Meinel
Cleanup the extra debugging info, and some >80 char lines.
                        parent_fulltext = list(parent_fulltext_content.text())
3224.1.19 by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed.
                    else:
                        parent_fulltext_content = self._fulltext_contents[compression_parent]
3224.1.22 by John Arbash Meinel
Cleanup the extra debugging info, and some >80 char lines.
                        parent_fulltext = parent_fulltext_content.text()
3224.1.19 by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed.
                    comp_children.remove(rev_id)
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
                    fulltext_content, delta = self._knit.factory.parse_record(
3224.1.22 by John Arbash Meinel
Cleanup the extra debugging info, and some >80 char lines.
                        rev_id, record, record_details,
                        parent_fulltext_content,
3224.1.19 by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed.
                        copy_base_content=(not reuse_content))
3224.1.22 by John Arbash Meinel
Cleanup the extra debugging info, and some >80 char lines.
                    fulltext = self._add_fulltext_content(rev_id,
                                                          fulltext_content)
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
                    blocks = KnitContent.get_line_delta_blocks(delta,
                            parent_fulltext, fulltext)
                else:
                    fulltext_content = self._knit.factory.parse_fulltext(
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
                        record, rev_id)
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
                    fulltext = self._add_fulltext_content(rev_id,
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
                        fulltext_content)
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
                    blocks = None
                nodes_to_annotate.extend(
                    self._add_annotation(rev_id, fulltext, parent_ids,
                                     left_matching_blocks=blocks))
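`_annotate_records` consumes records in whatever order `read_records_iter` yields them: a revision whose parents are already annotated is handled immediately, and the rest are parked (through `_check_parents` and `_annotate_children`, whose bodies are not part of this listing) until their last parent is done. The sketch below shows only that scheduling pattern over bare revision ids, assuming each id arrives exactly once. The function `process_in_dependency_order` and its per-revision counter of unprocessed parents are hypothetical and not necessarily how `_check_parents` works.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def process_in_dependency_order(
        arrivals: Iterable[str],
        parents: Dict[str, Tuple[str, ...]]) -> List[str]:
    """Return revisions in the order they could be annotated.

    A revision is processed only after all of its parents; one that
    arrives early is parked until its last unprocessed parent is done.
    """
    done: List[str] = []
    done_set = set()
    unmet: Dict[str, int] = {}       # parked revision -> parents still pending
    waiting = defaultdict(list)      # parent -> revisions parked on it
    for rev in arrivals:
        missing = [p for p in parents[rev] if p not in done_set]
        if missing:
            unmet[rev] = len(missing)
            for p in missing:
                waiting[p].append(rev)
            continue
        stack = [rev]                # plays the role of nodes_to_annotate
        while stack:
            current = stack.pop()
            done.append(current)
            done_set.add(current)
            # Release any parked children whose last missing parent this was.
            for child in waiting.pop(current, []):
                unmet[child] -= 1
                if unmet[child] == 0:
                    del unmet[child]
                    stack.append(child)
    return done
```

Ghost parents must be filtered out before scheduling, as `_annotate_records` does with `self._ghosts`; otherwise a revision waiting on a ghost would never be released.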

3224.1.10 by John Arbash Meinel
Introduce the heads_provider for reannotate.
    def _get_heads_provider(self):
        """Create a heads provider for resolving ancestry issues."""
        if self._heads_provider is not None:
            return self._heads_provider
        parent_provider = _mod_graph.DictParentsProvider(
            self._revision_id_graph)
        graph_obj = _mod_graph.Graph(parent_provider)
3224.1.20 by John Arbash Meinel
Reduce the number of cache misses by caching known heads answers
        head_cache = _mod_graph.FrozenHeadsCache(graph_obj)
3224.1.10 by John Arbash Meinel
Introduce the heads_provider for reannotate.
        self._heads_provider = head_cache
        return head_cache
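`_get_heads_provider` wraps a `Graph` built over `_revision_id_graph` in `FrozenHeadsCache`, so repeated `heads()` questions about the same set of revisions are answered from a cache. The class below is a hypothetical, self-contained stand-in that illustrates the idea: a key is a head unless it is an ancestor of another key in the set, and answers are memoized by `frozenset`. Its naive ancestry walk is an assumption for illustration and does not reflect how bzrlib's `Graph.heads` searches.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple


class DictHeadsCache:
    """Hypothetical heads() over a {revision: parents} dict, memoized."""

    def __init__(self, parent_map: Dict[str, Tuple[str, ...]]) -> None:
        self._parent_map = parent_map
        self._cache: Dict[FrozenSet[str], FrozenSet[str]] = {}

    def _ancestors(self, rev: str) -> Set[str]:
        """All strict ancestors of rev; unknown revisions have no parents."""
        seen: Set[str] = set()
        stack = list(self._parent_map.get(rev, ()))
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(self._parent_map.get(node, ()))
        return seen

    def heads(self, keys: Iterable[str]) -> FrozenSet[str]:
        """Return the keys that are not ancestors of any other key."""
        frozen = frozenset(keys)
        if frozen not in self._cache:
            dominated: Set[str] = set()
            for key in frozen:
                dominated |= self._ancestors(key) & frozen
            self._cache[frozen] = frozen - dominated
        return self._cache[frozen]
```

Keying the cache on a `frozenset` makes the answer independent of the order in which the caller lists the revisions, so `heads(["c", "b"])` and `heads(["b", "c"])` share one cache entry.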

3224.1.25 by John Arbash Meinel
Quick change to the _KnitAnnotator api to use .annotate() instead of get_annotated_lines()
    def annotate(self, revision_id):
3224.1.5 by John Arbash Meinel
Start using a helper class for doing the knit-pack annotations.
        """Return the annotated fulltext at the given revision.

        :param revision_id: The revision id for this file
        """
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
        records = self._get_build_graph(revision_id)
3224.1.29 by John Arbash Meinel
Properly handle annotating when ghosts are present.
        if revision_id in self._ghosts:
            raise errors.RevisionNotPresent(revision_id, self._knit)
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
        self._annotate_records(records)
        return self._annotated_lines[revision_id]
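`annotate()` returns the lines of `revision_id`, each attributed to the revision that introduced it. Roughly, a line that matches a parent keeps the parent's attribution and any other line is attributed to the current revision; in the knit code this happens in `_add_annotation` (not shown in this listing), fed with `left_matching_blocks` derived from the stored delta. A minimal single-parent sketch follows, assuming annotated lines are `(origin, text)` pairs. It swaps in `difflib.SequenceMatcher` to find the matching blocks and does not handle multiple parents or the heads resolution done with `_get_heads_provider`.

```python
import difflib
from typing import List, Sequence, Tuple

Annotated = List[Tuple[str, str]]    # assumed shape: (origin revision, line)


def annotate_from_parent(rev_id: str, lines: Sequence[str],
                         parent_annotated: Annotated) -> Annotated:
    """Attribute unchanged lines to the parent's origin, the rest to rev_id."""
    parent_lines = [text for _, text in parent_annotated]
    result: Annotated = [(rev_id, line) for line in lines]
    matcher = difflib.SequenceMatcher(None, parent_lines, lines,
                                      autojunk=False)
    for parent_start, child_start, size in matcher.get_matching_blocks():
        # Lines inside a matching block are inherited from the parent.
        for offset in range(size):
            result[child_start + offset] = parent_annotated[parent_start + offset]
    return result
```

`autojunk=False` keeps difflib from treating frequently repeated lines (blank lines, closing brackets) as junk in longer files, which would otherwise misattribute them to the child.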
3224.1.5 by John Arbash Meinel
Start using a helper class for doing the knit-pack annotations.


2484.1.1 by John Arbash Meinel
Add an initial function to read knit indexes in pyrex.
try:
2484.1.12 by John Arbash Meinel
Switch the layout to use a matching _knit_load_data_py.py and _knit_load_data_c.pyx
    from bzrlib._knit_load_data_c import _load_data_c as _load_data
2484.1.1 by John Arbash Meinel
Add an initial function to read knit indexes in pyrex.
except ImportError:
2484.1.12 by John Arbash Meinel
Switch the layout to use a matching _knit_load_data_py.py and _knit_load_data_c.pyx
    from bzrlib._knit_load_data_py import _load_data_py as _load_data
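The `try`/`except ImportError` above prefers the compiled `_knit_load_data_c` implementation and falls back to the pure-Python `_knit_load_data_py` one when the extension is not built. The same pattern generalizes to an ordered list of candidates; the helper below is a hypothetical sketch using `importlib.import_module`, not part of bzrlib.

```python
import importlib
from typing import Any, Sequence, Tuple


def load_first(candidates: Sequence[Tuple[str, str]]) -> Any:
    """Return the named attribute from the first importable module.

    candidates is an ordered list of (module_name, attribute_name) pairs,
    fastest implementation first.
    """
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # not built or not installed; try the next candidate
        return getattr(module, attr)
    raise ImportError("none of %r could be imported" % (list(candidates),))


# Hypothetical usage mirroring the knit loader above:
# _load_data = load_first([
#     ("bzrlib._knit_load_data_c", "_load_data_c"),
#     ("bzrlib._knit_load_data_py", "_load_data_py"),
# ])
```

One caveat applies to both forms: catching `ImportError` also hides an extension that exists but fails to import (for example, one built for a different Python), silently selecting the slower fallback.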