/brz/remove-bazaar

To get this branch, use:
bzr branch http://gegoxaren.bato24.eu/bzr/brz/remove-bazaar
2484.1.1 by John Arbash Meinel
Add an initial function to read knit indexes in pyrex.
1
# Copyright (C) 2005, 2006, 2007 Canonical Ltd
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
2
#
3
# This program is free software; you can redistribute it and/or modify
4
# it under the terms of the GNU General Public License as published by
5
# the Free Software Foundation; either version 2 of the License, or
6
# (at your option) any later version.
7
#
8
# This program is distributed in the hope that it will be useful,
9
# but WITHOUT ANY WARRANTY; without even the implied warranty of
10
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
11
# GNU General Public License for more details.
12
#
13
# You should have received a copy of the GNU General Public License
14
# along with this program; if not, write to the Free Software
15
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
16
17
"""Knit versionedfile implementation.
18
19
A knit is a versioned file implementation that supports efficient append only
20
updates.
1563.2.6 by Robert Collins
Start check tests for knits (pending), and remove dead code.
21
22
Knit file layout:
23
lifeless: the data file is made up of "delta records".  each delta record has a delta header 
24
that contains; (1) a version id, (2) the size of the delta (in lines), and (3)  the digest of 
25
the -expanded data- (ie, the delta applied to the parent).  the delta also ends with a 
26
end-marker; simply "end VERSION"
27
28
delta can be line or full contents.a
29
... the 8's there are the index number of the annotation.
30
version robertc@robertcollins.net-20051003014215-ee2990904cc4c7ad 7 c7d23b2a5bd6ca00e8e266cec0ec228158ee9f9e
31
59,59,3
32
8
33
8         if ie.executable:
34
8             e.set('executable', 'yes')
35
130,130,2
36
8         if elt.get('executable') == 'yes':
37
8             ie.executable = True
38
end robertc@robertcollins.net-20051003014215-ee2990904cc4c7ad 
39
40
41
whats in an index:
42
09:33 < jrydberg> lifeless: each index is made up of a tuple of; version id, options, position, size, parents
43
09:33 < jrydberg> lifeless: the parents are currently dictionary compressed
44
09:33 < jrydberg> lifeless: (meaning it currently does not support ghosts)
45
09:33 < lifeless> right
46
09:33 < jrydberg> lifeless: the position and size is the range in the data file
47
48
49
so the index sequence is the dictionary compressed sequence number used
50
in the deltas to provide line annotation
51
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
52
"""
53
1563.2.6 by Robert Collins
Start check tests for knits (pending), and remove dead code.
54
# TODOS:
55
# 10:16 < lifeless> make partial index writes safe
56
# 10:16 < lifeless> implement 'knit.check()' like weave.check()
57
# 10:17 < lifeless> record known ghosts so we can detect when they are filled in rather than the current 'reweave 
58
#                    always' approach.
1563.2.11 by Robert Collins
Consolidate reweave and join as we have no separate usage, make reweave tests apply to all versionedfile implementations and deprecate the old reweave apis.
59
# move sha1 out of the content so that join is faster at verifying parents
60
# record content length ?
1563.2.6 by Robert Collins
Start check tests for knits (pending), and remove dead code.
61
                  
62
1594.2.24 by Robert Collins
Make use of the transaction finalisation warning support to implement in-knit caching.
63
from copy import copy
1563.2.11 by Robert Collins
Consolidate reweave and join as we have no separate usage, make reweave tests apply to all versionedfile implementations and deprecate the old reweave apis.
64
from cStringIO import StringIO
1596.2.28 by Robert Collins
more knit profile based tuning.
65
from itertools import izip, chain
1756.2.17 by Aaron Bentley
Fixes suggested by John Meinel
66
import operator
1563.2.6 by Robert Collins
Start check tests for knits (pending), and remove dead code.
67
import os
1628.1.2 by Robert Collins
More knit micro-optimisations.
68
import sys
1756.2.29 by Aaron Bentley
Remove basis knit support
69
import warnings
2762.3.1 by Robert Collins
* The compression used within the bzr repository has changed from zlib
70
from zlib import Z_DEFAULT_COMPRESSION
1594.2.19 by Robert Collins
More coalescing tweaks, and knit feedback.
71
1594.2.17 by Robert Collins
Better readv coalescing, now with test, and progress during knit index reading.
72
import bzrlib
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
73
from bzrlib.lazy_import import lazy_import
74
lazy_import(globals(), """
75
from bzrlib import (
2770.1.1 by Aaron Bentley
Initial implmentation of plain knit annotation
76
    annotate,
3224.1.10 by John Arbash Meinel
Introduce the heads_provider for reannotate.
77
    graph as _mod_graph,
2998.2.2 by John Arbash Meinel
implement a faster path for copying from packs back to knits.
78
    lru_cache,
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
79
    pack,
2745.1.2 by Robert Collins
Ensure mutter_callsite is not directly called on a lazy_load object, to make the stacklevel parameter work correctly.
80
    trace,
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
81
    )
82
""")
1911.2.3 by John Arbash Meinel
Moving everything into a new location so that we can cache more than just revision ids
83
from bzrlib import (
84
    cache_utf8,
2745.1.1 by Robert Collins
Add a number of -Devil checkpoints.
85
    debug,
2520.4.140 by Aaron Bentley
Use matching blocks from mpdiff for knit delta creation
86
    diff,
1911.2.3 by John Arbash Meinel
Moving everything into a new location so that we can cache more than just revision ids
87
    errors,
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
88
    osutils,
2104.4.2 by John Arbash Meinel
Small cleanup and NEWS entry about fixing bug #65714
89
    patiencediff,
2039.1.1 by Aaron Bentley
Clean up progress properly when interrupted during fetch (#54000)
90
    progress,
1551.15.46 by Aaron Bentley
Move plan merge to tree
91
    merge,
2196.2.1 by John Arbash Meinel
Merge Dmitry's optimizations and minimize the actual diff.
92
    ui,
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
93
    )
94
from bzrlib.errors import (
95
    FileExists,
96
    NoSuchFile,
97
    KnitError,
98
    InvalidRevisionId,
99
    KnitCorrupt,
100
    KnitHeaderError,
101
    RevisionNotPresent,
102
    RevisionAlreadyPresent,
103
    )
3287.6.1 by Robert Collins
* ``VersionedFile.get_graph`` is deprecated, with no replacement method.
104
from bzrlib.graph import Graph
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
105
from bzrlib.osutils import (
106
    contains_whitespace,
107
    contains_linebreaks,
2850.1.1 by Robert Collins
* ``KnitVersionedFile.add*`` will no longer cache added records even when
108
    sha_string,
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
109
    sha_strings,
3350.3.8 by Robert Collins
Basic stream insertion, no fast path yet for knit to knit.
110
    split_lines,
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
111
    )
3287.6.5 by Robert Collins
Deprecate VersionedFile.has_ghost.
112
from bzrlib.symbol_versioning import (
113
    DEPRECATED_PARAMETER,
114
    deprecated_method,
115
    deprecated_passed,
116
    one_four,
117
    )
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
118
from bzrlib.tsort import topo_sort
3287.6.1 by Robert Collins
* ``VersionedFile.get_graph`` is deprecated, with no replacement method.
119
from bzrlib.tuned_gzip import GzipFile, bytes_to_gzip
2094.3.5 by John Arbash Meinel
Fix imports to ensure modules are loaded before they are used
120
import bzrlib.ui
3350.3.3 by Robert Collins
Functional get_record_stream interface tests covering full interface.
121
from bzrlib.versionedfile import (
3350.3.12 by Robert Collins
Generate streams with absent records.
122
    AbsentContentFactory,
3350.3.8 by Robert Collins
Basic stream insertion, no fast path yet for knit to knit.
123
    adapter_registry,
3350.3.3 by Robert Collins
Functional get_record_stream interface tests covering full interface.
124
    ContentFactory,
125
    InterVersionedFile,
126
    VersionedFile,
127
    )
1684.3.3 by Robert Collins
Add a special cased weaves to knit converter.
128
import bzrlib.weave
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
129
130
131
# TODO: Split out code specific to this format into an associated object.
132
133
# TODO: Can we put in some kind of value to check that the index and data
134
# files belong together?
135
1759.2.1 by Jelmer Vernooij
Fix some types (found using aspell).
136
# TODO: accommodate binaries, perhaps by storing a byte count
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
137
138
# TODO: function to check whole file
139
140
# TODO: atomically append data, then measure backwards from the cursor
141
# position after writing to work out where it was located.  we may need to
142
# bypass python file buffering.
143
144
DATA_SUFFIX = '.knit'
145
INDEX_SUFFIX = '.kndx'
146
147
3350.3.4 by Robert Collins
Finish adapters for annotated knits to unannotated knits and full texts.
148
class KnitAdapter(object):
149
    """Base class for knit record adaption."""
150
3350.3.7 by Robert Collins
Create a registry of versioned file record adapters.
151
    def __init__(self, basis_vf):
152
        """Create an adapter which accesses full texts from basis_vf.
153
        
154
        :param basis_vf: A versioned file to access basis texts of deltas from.
155
            May be None for adapters that do not need to access basis texts.
156
        """
3350.3.4 by Robert Collins
Finish adapters for annotated knits to unannotated knits and full texts.
157
        self._data = _KnitData(None)
158
        self._annotate_factory = KnitAnnotateFactory()
159
        self._plain_factory = KnitPlainFactory()
3350.3.7 by Robert Collins
Create a registry of versioned file record adapters.
160
        self._basis_vf = basis_vf
3350.3.4 by Robert Collins
Finish adapters for annotated knits to unannotated knits and full texts.
161
162
163
class FTAnnotatedToUnannotated(KnitAdapter):
164
    """An adapter from FT annotated knits to unannotated ones."""
165
166
    def get_bytes(self, factory, annotated_compressed_bytes):
167
        rec, contents = \
168
            self._data._parse_record_unchecked(annotated_compressed_bytes)
169
        content = self._annotate_factory.parse_fulltext(contents, rec[1])
170
        size, bytes = self._data._record_to_data(rec[1], rec[3], content.text())
171
        return bytes
172
173
174
class DeltaAnnotatedToUnannotated(KnitAdapter):
175
    """An adapter for deltas from annotated to unannotated."""
176
177
    def get_bytes(self, factory, annotated_compressed_bytes):
178
        rec, contents = \
179
            self._data._parse_record_unchecked(annotated_compressed_bytes)
180
        delta = self._annotate_factory.parse_line_delta(contents, rec[1],
181
            plain=True)
182
        contents = self._plain_factory.lower_line_delta(delta)
183
        size, bytes = self._data._record_to_data(rec[1], rec[3], contents)
184
        return bytes
185
186
187
class FTAnnotatedToFullText(KnitAdapter):
188
    """An adapter from FT annotated knits to unannotated ones."""
189
190
    def get_bytes(self, factory, annotated_compressed_bytes):
191
        rec, contents = \
192
            self._data._parse_record_unchecked(annotated_compressed_bytes)
193
        content, delta = self._annotate_factory.parse_record(factory.key[0],
194
            contents, factory._build_details, None)
195
        return ''.join(content.text())
196
197
198
class DeltaAnnotatedToFullText(KnitAdapter):
199
    """An adapter for deltas from annotated to unannotated."""
200
201
    def get_bytes(self, factory, annotated_compressed_bytes):
202
        rec, contents = \
203
            self._data._parse_record_unchecked(annotated_compressed_bytes)
204
        delta = self._annotate_factory.parse_line_delta(contents, rec[1],
205
            plain=True)
206
        compression_parent = factory.parents[0][0]
207
        basis_lines = self._basis_vf.get_lines(compression_parent)
208
        # Manually apply the delta because we have one annotated content and
209
        # one plain.
210
        basis_content = PlainKnitContent(basis_lines, compression_parent)
211
        basis_content.apply_delta(delta, rec[1])
212
        basis_content._should_strip_eol = factory._build_details[1]
213
        return ''.join(basis_content.text())
214
215
3350.3.5 by Robert Collins
Create adapters from plain compressed knit content.
216
class FTPlainToFullText(KnitAdapter):
217
    """An adapter from FT plain knits to unannotated ones."""
218
219
    def get_bytes(self, factory, compressed_bytes):
220
        rec, contents = \
221
            self._data._parse_record_unchecked(compressed_bytes)
222
        content, delta = self._plain_factory.parse_record(factory.key[0],
223
            contents, factory._build_details, None)
224
        return ''.join(content.text())
225
226
227
class DeltaPlainToFullText(KnitAdapter):
228
    """An adapter for deltas from annotated to unannotated."""
229
230
    def get_bytes(self, factory, compressed_bytes):
231
        rec, contents = \
232
            self._data._parse_record_unchecked(compressed_bytes)
233
        delta = self._plain_factory.parse_line_delta(contents, rec[1])
234
        compression_parent = factory.parents[0][0]
235
        basis_lines = self._basis_vf.get_lines(compression_parent)
236
        basis_content = PlainKnitContent(basis_lines, compression_parent)
237
        # Manually apply the delta because we have one annotated content and
238
        # one plain.
239
        content, _ = self._plain_factory.parse_record(rec[1], contents,
240
            factory._build_details, basis_content)
241
        return ''.join(content.text())
242
243
3350.3.3 by Robert Collins
Functional get_record_stream interface tests covering full interface.
244
class KnitContentFactory(ContentFactory):
245
    """Content factory for streaming from knits.
246
    
247
    :seealso ContentFactory:
248
    """
249
250
    def __init__(self, version, parents, build_details, sha1, raw_record,
251
        annotated, knit=None):
252
        """Create a KnitContentFactory for version.
253
        
254
        :param version: The version.
255
        :param parents: The parents.
256
        :param build_details: The build details as returned from
257
            get_build_details.
258
        :param sha1: The sha1 expected from the full text of this object.
259
        :param raw_record: The bytes of the knit data from disk.
260
        :param annotated: True if the raw data is annotated.
261
        """
262
        ContentFactory.__init__(self)
263
        self.sha1 = sha1
264
        self.key = (version,)
265
        self.parents = tuple((parent,) for parent in parents)
266
        if build_details[0] == 'line-delta':
267
            kind = 'delta'
268
        else:
269
            kind = 'ft'
270
        if annotated:
271
            annotated_kind = 'annotated-'
272
        else:
273
            annotated_kind = ''
274
        self.storage_kind = 'knit-%s%s-gz' % (annotated_kind, kind)
275
        self._raw_record = raw_record
276
        self._build_details = build_details
277
        self._knit = knit
278
279
    def get_bytes_as(self, storage_kind):
280
        if storage_kind == self.storage_kind:
281
            return self._raw_record
282
        if storage_kind == 'fulltext' and self._knit is not None:
283
            return self._knit.get_text(self.key[0])
284
        else:
285
            raise errors.UnavailableRepresentation(self.key, storage_kind,
286
                self.storage_kind)
287
288
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
289
class KnitContent(object):
290
    """Content of a knit version to which deltas can be applied."""
291
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
292
    def __init__(self):
293
        self._should_strip_eol = False
294
2921.2.1 by Robert Collins
* Knit text reconstruction now avoids making copies of the lines list for
295
    def apply_delta(self, delta, new_version_id):
2921.2.2 by Robert Collins
Review feedback.
296
        """Apply delta to this object to become new_version_id."""
2921.2.1 by Robert Collins
* Knit text reconstruction now avoids making copies of the lines list for
297
        raise NotImplementedError(self.apply_delta)
298
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
299
    def cleanup_eol(self, copy_on_mutate=True):
300
        if self._should_strip_eol:
301
            if copy_on_mutate:
302
                self._lines = self._lines[:]
303
            self.strip_last_line_newline()
304
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
305
    def line_delta_iter(self, new_lines):
1596.2.32 by Robert Collins
Reduce re-extraction of texts during weave to knit joins by providing a memoisation facility.
306
        """Generate line-based delta from this content to new_lines."""
2151.1.1 by John Arbash Meinel
(Dmitry Vasiliev) Tune KnitContent and add tests
307
        new_texts = new_lines.text()
308
        old_texts = self.text()
2781.1.1 by Martin Pool
merge cpatiencediff from Lukas
309
        s = patiencediff.PatienceSequenceMatcher(None, old_texts, new_texts)
2151.1.1 by John Arbash Meinel
(Dmitry Vasiliev) Tune KnitContent and add tests
310
        for tag, i1, i2, j1, j2 in s.get_opcodes():
311
            if tag == 'equal':
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
312
                continue
2151.1.1 by John Arbash Meinel
(Dmitry Vasiliev) Tune KnitContent and add tests
313
            # ofrom, oto, length, data
314
            yield i1, i2, j2 - j1, new_lines._lines[j1:j2]
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
315
316
    def line_delta(self, new_lines):
317
        return list(self.line_delta_iter(new_lines))
318
2520.4.41 by Aaron Bentley
Accelerate mpdiff generation
319
    @staticmethod
2520.4.48 by Aaron Bentley
Support getting blocks from knit deltas with no final EOL
320
    def get_line_delta_blocks(knit_delta, source, target):
2520.4.41 by Aaron Bentley
Accelerate mpdiff generation
321
        """Extract SequenceMatcher.get_matching_blocks() from a knit delta"""
2520.4.48 by Aaron Bentley
Support getting blocks from knit deltas with no final EOL
322
        target_len = len(target)
2520.4.41 by Aaron Bentley
Accelerate mpdiff generation
323
        s_pos = 0
324
        t_pos = 0
325
        for s_begin, s_end, t_len, new_text in knit_delta:
2520.4.47 by Aaron Bentley
Fix get_line_delta_blocks with eol
326
            true_n = s_begin - s_pos
327
            n = true_n
2520.4.41 by Aaron Bentley
Accelerate mpdiff generation
328
            if n > 0:
2520.4.48 by Aaron Bentley
Support getting blocks from knit deltas with no final EOL
329
                # knit deltas do not provide reliable info about whether the
330
                # last line of a file matches, due to eol handling.
331
                if source[s_pos + n -1] != target[t_pos + n -1]:
2520.4.47 by Aaron Bentley
Fix get_line_delta_blocks with eol
332
                    n-=1
333
                if n > 0:
334
                    yield s_pos, t_pos, n
335
            t_pos += t_len + true_n
2520.4.41 by Aaron Bentley
Accelerate mpdiff generation
336
            s_pos = s_end
2520.4.48 by Aaron Bentley
Support getting blocks from knit deltas with no final EOL
337
        n = target_len - t_pos
338
        if n > 0:
339
            if source[s_pos + n -1] != target[t_pos + n -1]:
340
                n-=1
341
            if n > 0:
342
                yield s_pos, t_pos, n
2520.4.41 by Aaron Bentley
Accelerate mpdiff generation
343
        yield s_pos + (target_len - t_pos), target_len, 0
344
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
345
2794.1.2 by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts.
346
class AnnotatedKnitContent(KnitContent):
347
    """Annotated content."""
348
349
    def __init__(self, lines):
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
350
        KnitContent.__init__(self)
2794.1.2 by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts.
351
        self._lines = lines
352
3316.2.13 by Robert Collins
* ``VersionedFile.annotate_iter`` is deprecated. While in principal this
353
    def annotate(self):
354
        """Return a list of (origin, text) for each content line."""
355
        return list(self._lines)
2794.1.2 by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts.
356
2921.2.1 by Robert Collins
* Knit text reconstruction now avoids making copies of the lines list for
357
    def apply_delta(self, delta, new_version_id):
2921.2.2 by Robert Collins
Review feedback.
358
        """Apply delta to this object to become new_version_id."""
2921.2.1 by Robert Collins
* Knit text reconstruction now avoids making copies of the lines list for
359
        offset = 0
360
        lines = self._lines
361
        for start, end, count, delta_lines in delta:
362
            lines[offset+start:offset+end] = delta_lines
363
            offset = offset + (start - end) + count
364
2794.1.2 by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts.
365
    def strip_last_line_newline(self):
366
        line = self._lines[-1][1].rstrip('\n')
367
        self._lines[-1] = (self._lines[-1][0], line)
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
368
        self._should_strip_eol = False
2794.1.2 by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts.
369
370
    def text(self):
2911.1.1 by Martin Pool
Better messages when problems are detected inside a knit
371
        try:
3224.1.22 by John Arbash Meinel
Cleanup the extra debugging info, and some >80 char lines.
372
            lines = [text for origin, text in self._lines]
2911.1.1 by Martin Pool
Better messages when problems are detected inside a knit
373
        except ValueError, e:
374
            # most commonly (only?) caused by the internal form of the knit
375
            # missing annotation information because of a bug - see thread
376
            # around 20071015
377
            raise KnitCorrupt(self,
378
                "line in annotated knit missing annotation information: %s"
379
                % (e,))
2794.1.2 by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts.
380
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
381
        if self._should_strip_eol:
3350.3.4 by Robert Collins
Finish adapters for annotated knits to unannotated knits and full texts.
382
            lines[-1] = lines[-1].rstrip('\n')
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
383
        return lines
384
2794.1.2 by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts.
385
    def copy(self):
386
        return AnnotatedKnitContent(self._lines[:])
387
388
389
class PlainKnitContent(KnitContent):
2794.1.3 by Robert Collins
Review feedback.
390
    """Unannotated content.
391
    
392
    When annotate[_iter] is called on this content, the same version is reported
393
    for all lines. Generally, annotate[_iter] is not useful on PlainKnitContent
394
    objects.
395
    """
2794.1.2 by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts.
396
397
    def __init__(self, lines, version_id):
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
398
        KnitContent.__init__(self)
2794.1.2 by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts.
399
        self._lines = lines
400
        self._version_id = version_id
401
3316.2.13 by Robert Collins
* ``VersionedFile.annotate_iter`` is deprecated. While in principal this
402
    def annotate(self):
403
        """Return a list of (origin, text) for each content line."""
404
        return [(self._version_id, line) for line in self._lines]
2794.1.2 by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts.
405
2921.2.1 by Robert Collins
* Knit text reconstruction now avoids making copies of the lines list for
406
    def apply_delta(self, delta, new_version_id):
2921.2.2 by Robert Collins
Review feedback.
407
        """Apply delta to this object to become new_version_id."""
2921.2.1 by Robert Collins
* Knit text reconstruction now avoids making copies of the lines list for
408
        offset = 0
409
        lines = self._lines
410
        for start, end, count, delta_lines in delta:
411
            lines[offset+start:offset+end] = delta_lines
412
            offset = offset + (start - end) + count
413
        self._version_id = new_version_id
414
2794.1.2 by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts.
415
    def copy(self):
416
        return PlainKnitContent(self._lines[:], self._version_id)
417
418
    def strip_last_line_newline(self):
419
        self._lines[-1] = self._lines[-1].rstrip('\n')
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
420
        self._should_strip_eol = False
2794.1.2 by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts.
421
422
    def text(self):
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
423
        lines = self._lines
424
        if self._should_strip_eol:
425
            lines = lines[:]
426
            lines[-1] = lines[-1].rstrip('\n')
427
        return lines
428
429
430
class _KnitFactory(object):
431
    """Base class for common Factory functions."""
432
433
    def parse_record(self, version_id, record, record_details,
434
                     base_content, copy_base_content=True):
435
        """Parse a record into a full content object.
436
437
        :param version_id: The official version id for this content
438
        :param record: The data returned by read_records_iter()
439
        :param record_details: Details about the record returned by
440
            get_build_details
441
        :param base_content: If get_build_details returns a compression_parent,
442
            you must return a base_content here, else use None
443
        :param copy_base_content: When building from the base_content, decide
444
            you can either copy it and return a new object, or modify it in
445
            place.
446
        :return: (content, delta) A Content object and possibly a line-delta,
447
            delta may be None
448
        """
449
        method, noeol = record_details
450
        if method == 'line-delta':
451
            assert base_content is not None
452
            if copy_base_content:
453
                content = base_content.copy()
454
            else:
455
                content = base_content
456
            delta = self.parse_line_delta(record, version_id)
457
            content.apply_delta(delta, version_id)
458
        else:
459
            content = self.parse_fulltext(record, version_id)
460
            delta = None
461
        content._should_strip_eol = noeol
462
        return (content, delta)
463
464
465
class KnitAnnotateFactory(_KnitFactory):
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
466
    """Factory for creating annotated Content objects."""
467
468
    annotated = True
469
2794.1.2 by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts.
470
    def make(self, lines, version_id):
471
        num_lines = len(lines)
472
        return AnnotatedKnitContent(zip([version_id] * num_lines, lines))
473
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
474
    def parse_fulltext(self, content, version_id):
1596.2.7 by Robert Collins
Remove the requirement for reannotation in knit joins.
475
        """Convert fulltext to internal representation
476
477
        fulltext content is of the format
478
        revid(utf8) plaintext\n
479
        internal representation is of the format:
480
        (revid, plaintext)
481
        """
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
482
        # TODO: jam 20070209 The tests expect this to be returned as tuples,
483
        #       but the code itself doesn't really depend on that.
484
        #       Figure out a way to not require the overhead of turning the
485
        #       list back into tuples.
486
        lines = [tuple(line.split(' ', 1)) for line in content]
2794.1.2 by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts.
487
        return AnnotatedKnitContent(lines)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
488
489
    def parse_line_delta_iter(self, lines):
2163.1.2 by John Arbash Meinel
Don't modify the list during parse_line_delta
490
        return iter(self.parse_line_delta(lines))
1628.1.2 by Robert Collins
More knit micro-optimisations.
491
2851.4.2 by Ian Clatworthy
use factory methods in annotated-to-plain conversion instead of duplicating format knowledge
492
    def parse_line_delta(self, lines, version_id, plain=False):
1596.2.7 by Robert Collins
Remove the requirement for reannotation in knit joins.
493
        """Convert a line based delta into internal representation.
494
495
        line delta is in the form of:
496
        intstart intend intcount
497
        1..count lines:
498
        revid(utf8) newline\n
1759.2.1 by Jelmer Vernooij
Fix some types (found using aspell).
499
        internal representation is
1596.2.7 by Robert Collins
Remove the requirement for reannotation in knit joins.
500
        (start, end, count, [1..count tuples (revid, newline)])
2851.4.2 by Ian Clatworthy
use factory methods in annotated-to-plain conversion instead of duplicating format knowledge
501
502
        :param plain: If True, the lines are returned as a plain
2911.1.1 by Martin Pool
Better messages when problems are detected inside a knit
503
            list without annotations, not as a list of (origin, content) tuples, i.e.
2851.4.2 by Ian Clatworthy
use factory methods in annotated-to-plain conversion instead of duplicating format knowledge
504
            (start, end, count, [1..count newline])
1596.2.7 by Robert Collins
Remove the requirement for reannotation in knit joins.
505
        """
1628.1.2 by Robert Collins
More knit micro-optimisations.
506
        result = []
507
        lines = iter(lines)
508
        next = lines.next
2249.5.1 by John Arbash Meinel
Leave revision-ids in utf-8 when reading.
509
2249.5.15 by John Arbash Meinel
remove get_cached_utf8 checks which were slowing things down.
510
        cache = {}
511
        def cache_and_return(line):
512
            origin, text = line.split(' ', 1)
513
            return cache.setdefault(origin, origin), text
514
1628.1.2 by Robert Collins
More knit micro-optimisations.
515
        # walk through the lines parsing.
2851.4.2 by Ian Clatworthy
use factory methods in annotated-to-plain conversion instead of duplicating format knowledge
516
        # Note that the plain test is explicitly pulled out of the
517
        # loop to minimise any performance impact
518
        if plain:
519
            for header in lines:
520
                start, end, count = [int(n) for n in header.split(',')]
521
                contents = [next().split(' ', 1)[1] for i in xrange(count)]
522
                result.append((start, end, count, contents))
523
        else:
524
            for header in lines:
525
                start, end, count = [int(n) for n in header.split(',')]
526
                contents = [tuple(next().split(' ', 1)) for i in xrange(count)]
527
                result.append((start, end, count, contents))
1628.1.2 by Robert Collins
More knit micro-optimisations.
528
        return result
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
529
2163.2.2 by John Arbash Meinel
Don't deal with annotations when we don't care about them. Saves another 300+ms
530
    def get_fulltext_content(self, lines):
531
        """Extract just the content lines from a fulltext."""
532
        return (line.split(' ', 1)[1] for line in lines)
533
534
    def get_linedelta_content(self, lines):
535
        """Extract just the content from a line delta.
536
537
        This doesn't return all of the extra information stored in a delta.
538
        Only the actual content lines.
539
        """
540
        lines = iter(lines)
541
        next = lines.next
542
        for header in lines:
543
            header = header.split(',')
544
            count = int(header[2])
545
            for i in xrange(count):
546
                origin, text = next().split(' ', 1)
547
                yield text
548
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
549
    def lower_fulltext(self, content):
1596.2.7 by Robert Collins
Remove the requirement for reannotation in knit joins.
550
        """convert a fulltext content record into a serializable form.
551
552
        see parse_fulltext which this inverts.
553
        """
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
554
        # TODO: jam 20070209 We only do the caching thing to make sure that
555
        #       the origin is a valid utf-8 line, eventually we could remove it
2249.5.15 by John Arbash Meinel
remove get_cached_utf8 checks which were slowing things down.
556
        return ['%s %s' % (o, t) for o, t in content._lines]
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
557
558
    def lower_line_delta(self, delta):
1596.2.7 by Robert Collins
Remove the requirement for reannotation in knit joins.
559
        """convert a delta into a serializable form.
560
1628.1.2 by Robert Collins
More knit micro-optimisations.
561
        See parse_line_delta which this inverts.
1596.2.7 by Robert Collins
Remove the requirement for reannotation in knit joins.
562
        """
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
563
        # TODO: jam 20070209 We only do the caching thing to make sure that
564
        #       the origin is a valid utf-8 line, eventually we could remove it
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
565
        out = []
566
        for start, end, c, lines in delta:
567
            out.append('%d,%d,%d\n' % (start, end, c))
2249.5.15 by John Arbash Meinel
remove get_cached_utf8 checks which were slowing things down.
568
            out.extend(origin + ' ' + text
1911.2.1 by John Arbash Meinel
Cache encode/decode operations, saves memory and time. Especially when committing a new kernel tree with 7.7M new lines to annotate
569
                       for origin, text in lines)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
570
        return out
571
3316.2.13 by Robert Collins
* ``VersionedFile.annotate_iter`` is deprecated. While in principal this
572
    def annotate(self, knit, version_id):
2770.1.1 by Aaron Bentley
Initial implmentation of plain knit annotation
573
        content = knit._get_content(version_id)
3316.2.13 by Robert Collins
* ``VersionedFile.annotate_iter`` is deprecated. While in principal this
574
        return content.annotate()
2770.1.1 by Aaron Bentley
Initial implmentation of plain knit annotation
575
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
576
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
577
class KnitPlainFactory(_KnitFactory):
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
578
    """Factory for creating plain Content objects."""
579
580
    annotated = False
581
2794.1.2 by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts.
582
    def make(self, lines, version_id):
583
        return PlainKnitContent(lines, version_id)
584
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
585
    def parse_fulltext(self, content, version_id):
1596.2.7 by Robert Collins
Remove the requirement for reannotation in knit joins.
586
        """This parses an unannotated fulltext.
587
588
        Note that this is not a noop - the internal representation
589
        has (versionid, line) - its just a constant versionid.
590
        """
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
591
        return self.make(content, version_id)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
592
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
593
    def parse_line_delta_iter(self, lines, version_id):
2163.1.2 by John Arbash Meinel
Don't modify the list during parse_line_delta
594
        cur = 0
595
        num_lines = len(lines)
596
        while cur < num_lines:
597
            header = lines[cur]
598
            cur += 1
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
599
            start, end, c = [int(n) for n in header.split(',')]
2794.1.2 by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts.
600
            yield start, end, c, lines[cur:cur+c]
2163.1.2 by John Arbash Meinel
Don't modify the list during parse_line_delta
601
            cur += c
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
602
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
603
    def parse_line_delta(self, lines, version_id):
604
        return list(self.parse_line_delta_iter(lines, version_id))
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
605
2163.2.2 by John Arbash Meinel
Don't deal with annotations when we don't care about them. Saves another 300+ms
606
    def get_fulltext_content(self, lines):
607
        """Extract just the content lines from a fulltext."""
608
        return iter(lines)
609
610
    def get_linedelta_content(self, lines):
611
        """Extract just the content from a line delta.
612
613
        This doesn't return all of the extra information stored in a delta.
614
        Only the actual content lines.
615
        """
616
        lines = iter(lines)
617
        next = lines.next
618
        for header in lines:
619
            header = header.split(',')
620
            count = int(header[2])
621
            for i in xrange(count):
622
                yield next()
623
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
624
    def lower_fulltext(self, content):
625
        return content.text()
626
627
    def lower_line_delta(self, delta):
628
        out = []
629
        for start, end, c, lines in delta:
630
            out.append('%d,%d,%d\n' % (start, end, c))
2794.1.2 by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts.
631
            out.extend(lines)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
632
        return out
633
3316.2.13 by Robert Collins
* ``VersionedFile.annotate_iter`` is deprecated. While in principal this
634
    def annotate(self, knit, version_id):
3224.1.7 by John Arbash Meinel
_StreamIndex also needs to return the proper values for get_build_details.
635
        annotator = _KnitAnnotator(knit)
3316.2.13 by Robert Collins
* ``VersionedFile.annotate_iter`` is deprecated. While in principal this
636
        return annotator.annotate(version_id)
2770.1.1 by Aaron Bentley
Initial implmentation of plain knit annotation
637
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
638
639
def make_empty_knit(transport, relpath):
640
    """Construct a empty knit at the specified location."""
3316.2.3 by Robert Collins
Remove manual notification of transaction finishing on versioned files.
641
    k = make_file_knit(transport, relpath, 'w', KnitPlainFactory)
642
643
644
def make_file_knit(name, transport, file_mode=None, access_mode='w',
645
    factory=None, delta=True, create=False, create_parent_dir=False,
646
    delay_create=False, dir_mode=None, get_scope=None):
647
    """Factory to create a KnitVersionedFile for a .knit/.kndx file pair."""
648
    if factory is None:
649
        factory = KnitAnnotateFactory()
650
    if get_scope is None:
651
        get_scope = lambda:None
652
    index = _KnitIndex(transport, name + INDEX_SUFFIX,
653
        access_mode, create=create, file_mode=file_mode,
654
        create_parent_dir=create_parent_dir, delay_create=delay_create,
655
        dir_mode=dir_mode, get_scope=get_scope)
656
    access = _KnitAccess(transport, name + DATA_SUFFIX, file_mode,
657
        dir_mode, ((create and not len(index)) and delay_create),
658
        create_parent_dir)
659
    return KnitVersionedFile(name, transport, factory=factory,
660
        create=create, delay_create=delay_create, index=index,
661
        access_method=access)
662
663
664
def get_suffixes():
665
    """Return the suffixes used by file based knits."""
666
    return [DATA_SUFFIX, INDEX_SUFFIX]
667
make_file_knit.get_suffixes = get_suffixes
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
668
669
670
class KnitVersionedFile(VersionedFile):
671
    """Weave-like structure with faster random access.
672
673
    A knit stores a number of texts and a summary of the relationships
674
    between them.  Texts are identified by a string version-id.  Texts
675
    are normally stored and retrieved as a series of lines, but can
676
    also be passed as single strings.
677
678
    Lines are stored with the trailing newline (if any) included, to
679
    avoid special cases for files with no final newline.  Lines are
680
    composed of 8-bit characters, not unicode.  The combination of
681
    these approaches should mean any 'binary' file can be safely
682
    stored and retrieved.
683
    """
684
3316.2.3 by Robert Collins
Remove manual notification of transaction finishing on versioned files.
685
    def __init__(self, relpath, transport, file_mode=None,
2592.3.135 by Robert Collins
Do not create many transient knit objects, saving 4% on commit.
686
        factory=None, delta=True, create=False, create_parent_dir=False,
687
        delay_create=False, dir_mode=None, index=None, access_method=None):
1563.2.25 by Robert Collins
Merge in upstream.
688
        """Construct a knit at location specified by relpath.
689
        
690
        :param create: If not True, only open an existing knit.
1946.2.1 by John Arbash Meinel
2 changes to knits. Delay creating the .knit or .kndx file until we have actually tried to write data. Because of this, we must allow the Knit to create the prefix directories
691
        :param create_parent_dir: If True, create the parent directory if 
692
            creating the file fails. (This is used for stores with 
693
            hash-prefixes that may not exist yet)
694
        :param delay_create: The calling code is aware that the knit won't 
695
            actually be created until the first data is stored.
2592.3.1 by Robert Collins
Allow giving KnitVersionedFile an index object to use rather than implicitly creating one.
696
        :param index: An index to use for the knit.
1563.2.25 by Robert Collins
Merge in upstream.
697
        """
3316.2.3 by Robert Collins
Remove manual notification of transaction finishing on versioned files.
698
        super(KnitVersionedFile, self).__init__()
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
699
        self.transport = transport
700
        self.filename = relpath
1563.2.16 by Robert Collins
Change WeaveStore into VersionedFileStore and make its versoined file class parameterisable.
701
        self.factory = factory or KnitAnnotateFactory()
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
702
        self.delta = delta
703
2147.1.1 by John Arbash Meinel
Factor the common knit delta selection into a helper func, and allow the fulltext to be chosen based on cumulative delta size
704
        self._max_delta_chain = 200
705
3316.2.3 by Robert Collins
Remove manual notification of transaction finishing on versioned files.
706
        if None in (access_method, index):
3316.2.15 by Robert Collins
Final review feedback.
707
            raise ValueError("No default access_method or index any more")
3316.2.3 by Robert Collins
Remove manual notification of transaction finishing on versioned files.
708
        self._index = index
709
        _access = access_method
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
710
        if create and not len(self) and not delay_create:
711
            _access.create()
712
        self._data = _KnitData(_access)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
713
1704.2.10 by Martin Pool
Add KnitVersionedFile.__repr__ method
714
    def __repr__(self):
2592.3.159 by Robert Collins
Provide a transport for KnitVersionedFile's __repr__ in pack repositories.
715
        return '%s(%s)' % (self.__class__.__name__,
1704.2.10 by Martin Pool
Add KnitVersionedFile.__repr__ method
716
                           self.transport.abspath(self.filename))
717
    
2147.1.1 by John Arbash Meinel
Factor the common knit delta selection into a helper func, and allow the fulltext to be chosen based on cumulative delta size
718
    def _check_should_delta(self, first_parents):
719
        """Iterate back through the parent listing, looking for a fulltext.
720
721
        This is used when we want to decide whether to add a delta or a new
722
        fulltext. It searches for _max_delta_chain parents. When it finds a
723
        fulltext parent, it sees if the total size of the deltas leading up to
724
        it is large enough to indicate that we want a new full text anyway.
725
726
        Return True if we should create a new delta, False if we should use a
727
        full text.
728
        """
729
        delta_size = 0
730
        fulltext_size = None
731
        delta_parents = first_parents
2147.1.2 by John Arbash Meinel
Simplify the knit max-chain detection code.
732
        for count in xrange(self._max_delta_chain):
2147.1.1 by John Arbash Meinel
Factor the common knit delta selection into a helper func, and allow the fulltext to be chosen based on cumulative delta size
733
            parent = delta_parents[0]
734
            method = self._index.get_method(parent)
2592.3.71 by Robert Collins
Basic version of knit-based repository operating, many tests failing.
735
            index, pos, size = self._index.get_position(parent)
2147.1.1 by John Arbash Meinel
Factor the common knit delta selection into a helper func, and allow the fulltext to be chosen based on cumulative delta size
736
            if method == 'fulltext':
737
                fulltext_size = size
738
                break
739
            delta_size += size
3287.5.6 by Robert Collins
Remove _KnitIndex.get_parents.
740
            delta_parents = self._index.get_parent_map([parent])[parent]
2147.1.2 by John Arbash Meinel
Simplify the knit max-chain detection code.
741
        else:
742
            # We couldn't find a fulltext, so we must create a new one
2147.1.1 by John Arbash Meinel
Factor the common knit delta selection into a helper func, and allow the fulltext to be chosen based on cumulative delta size
743
            return False
2147.1.2 by John Arbash Meinel
Simplify the knit max-chain detection code.
744
745
        return fulltext_size > delta_size
2147.1.1 by John Arbash Meinel
Factor the common knit delta selection into a helper func, and allow the fulltext to be chosen based on cumulative delta size
746
3316.2.3 by Robert Collins
Remove manual notification of transaction finishing on versioned files.
747
    def _check_write_ok(self):
748
        return self._index._check_write_ok()
749
1692.2.1 by Robert Collins
Fix knit based push to only perform 2 appends to the target, rather that 2*new-versions.
750
    def _add_raw_records(self, records, data):
751
        """Add all the records 'records' with data pre-joined in 'data'.
752
753
        :param records: A list of tuples(version_id, options, parents, size).
754
        :param data: The data for the records. When it is written, the records
755
                     are adjusted to have pos pointing into data by the sum of
1759.2.1 by Jelmer Vernooij
Fix some types (found using aspell).
756
                     the preceding records sizes.
1692.2.1 by Robert Collins
Fix knit based push to only perform 2 appends to the target, rather that 2*new-versions.
757
        """
758
        # write all the data
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
759
        raw_record_sizes = [record[3] for record in records]
760
        positions = self._data.add_raw_records(raw_record_sizes, data)
1863.1.1 by John Arbash Meinel
Allow Versioned files to do caching if explicitly asked, and implement for Knit
761
        offset = 0
1692.2.1 by Robert Collins
Fix knit based push to only perform 2 appends to the target, rather that 2*new-versions.
762
        index_entries = []
2592.3.68 by Robert Collins
Make knit add_versions calls take access memo tuples rather than just pos and size.
763
        for (version_id, options, parents, size), access_memo in zip(
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
764
            records, positions):
2592.3.68 by Robert Collins
Make knit add_versions calls take access memo tuples rather than just pos and size.
765
            index_entries.append((version_id, options, access_memo, parents))
1863.1.1 by John Arbash Meinel
Allow Versioned files to do caching if explicitly asked, and implement for Knit
766
            offset += size
1692.2.1 by Robert Collins
Fix knit based push to only perform 2 appends to the target, rather that 2*new-versions.
767
        self._index.add_versions(index_entries)
768
1563.2.15 by Robert Collins
remove the weavestore assumptions about the number and nature of files it manages.
769
    def copy_to(self, name, transport):
770
        """See VersionedFile.copy_to()."""
771
        # copy the current index to a temp index to avoid racing with local
772
        # writes
1955.3.30 by John Arbash Meinel
fix small bug
773
        transport.put_file_non_atomic(name + INDEX_SUFFIX + '.tmp',
1955.3.24 by John Arbash Meinel
Update Knit to use the new non_atomic_foo functions
774
                self.transport.get(self._index._filename))
1563.2.15 by Robert Collins
remove the weavestore assumptions about the number and nature of files it manages.
775
        # copy the data file
1711.7.25 by John Arbash Meinel
try/finally to close files, _KnitData was keeping a handle to a file it never used again, and using transport.rename() when it wanted transport.move()
776
        f = self._data._open_file()
777
        try:
1955.3.8 by John Arbash Meinel
avoid some deprecation warnings in other parts of the code
778
            transport.put_file(name + DATA_SUFFIX, f)
1711.7.25 by John Arbash Meinel
try/finally to close files, _KnitData was keeping a handle to a file it never used again, and using transport.rename() when it wanted transport.move()
779
        finally:
780
            f.close()
781
        # move the copied index into place
782
        transport.move(name + INDEX_SUFFIX + '.tmp', name + INDEX_SUFFIX)
1563.2.15 by Robert Collins
remove the weavestore assumptions about the number and nature of files it manages.
783
2535.3.3 by Andrew Bennetts
Add Knit.get_data_stream.
784
    def get_data_stream(self, required_versions):
785
        """Get a data stream for the specified versions.
786
787
        Versions may be returned in any order, not necessarily the order
3023.2.2 by Martin Pool
Fix KnitVersionedFile.get_data_stream to not assume .versions() is sorted. (lp:165106)
788
        specified.  They are returned in a partial order by compression
789
        parent, so that the deltas can be applied as the data stream is
790
        inserted; however note that compression parents will not be sent
791
        unless they were specifically requested, as the client may already
792
        have them.
2535.3.3 by Andrew Bennetts
Add Knit.get_data_stream.
793
2670.3.7 by Andrew Bennetts
Tweak docstring as requested in review.
794
        :param required_versions: The exact set of versions to be extracted.
795
            Unlike some other knit methods, this is not used to generate a
796
            transitive closure, rather it is used precisely as given.
2535.3.3 by Andrew Bennetts
Add Knit.get_data_stream.
797
        
798
        :returns: format_signature, list of (version, options, length, parents),
799
            reader_callable.
800
        """
3023.2.2 by Martin Pool
Fix KnitVersionedFile.get_data_stream to not assume .versions() is sorted. (lp:165106)
801
        required_version_set = frozenset(required_versions)
802
        version_index = {}
803
        # list of revisions that can just be sent without waiting for their
804
        # compression parent
805
        ready_to_send = []
806
        # map from revision to the children based on it
807
        deferred = {}
808
        # first, read all relevant index data, enough to sort into the right
809
        # order to return
2535.3.3 by Andrew Bennetts
Add Knit.get_data_stream.
810
        for version_id in required_versions:
811
            options = self._index.get_options(version_id)
812
            parents = self._index.get_parents_with_ghosts(version_id)
2535.3.36 by Andrew Bennetts
Merge bzr.dev
813
            index_memo = self._index.get_position(version_id)
3023.2.2 by Martin Pool
Fix KnitVersionedFile.get_data_stream to not assume .versions() is sorted. (lp:165106)
814
            version_index[version_id] = (index_memo, options, parents)
3034.3.1 by Martin Pool
Post-review cleanups from Robert for KnitVersionedFile.get_data_stream
815
            if ('line-delta' in options
816
                and parents[0] in required_version_set):
3023.2.2 by Martin Pool
Fix KnitVersionedFile.get_data_stream to not assume .versions() is sorted. (lp:165106)
817
                # must wait until the parent has been sent
818
                deferred.setdefault(parents[0], []). \
819
                    append(version_id)
820
            else:
821
                # either a fulltext, or a delta whose parent the client did
822
                # not ask for and presumably already has
823
                ready_to_send.append(version_id)
824
        # build a list of results to return, plus instructions for data to
825
        # read from the file
826
        copy_queue_records = []
3015.2.19 by Robert Collins
Don't include the pack container length in the lengths given by get_data_stream.
827
        temp_version_list = []
3023.2.2 by Martin Pool
Fix KnitVersionedFile.get_data_stream to not assume .versions() is sorted. (lp:165106)
828
        while ready_to_send:
829
            # XXX: pushing and popping lists may be a bit inefficient
3023.2.3 by Martin Pool
Update tests for new ordering of results from get_data_stream - the order is not defined by the interface, but is stable
830
            version_id = ready_to_send.pop(0)
3023.2.2 by Martin Pool
Fix KnitVersionedFile.get_data_stream to not assume .versions() is sorted. (lp:165106)
831
            (index_memo, options, parents) = version_index[version_id]
2535.3.36 by Andrew Bennetts
Merge bzr.dev
832
            copy_queue_records.append((version_id, index_memo))
833
            none, data_pos, data_size = index_memo
3015.2.19 by Robert Collins
Don't include the pack container length in the lengths given by get_data_stream.
834
            temp_version_list.append((version_id, options, data_size,
2535.3.3 by Andrew Bennetts
Add Knit.get_data_stream.
835
                parents))
3023.2.2 by Martin Pool
Fix KnitVersionedFile.get_data_stream to not assume .versions() is sorted. (lp:165106)
836
            if version_id in deferred:
837
                # now we can send all the children of this revision - we could
3023.2.3 by Martin Pool
Update tests for new ordering of results from get_data_stream - the order is not defined by the interface, but is stable
838
                # put them in anywhere, but we hope that sending them soon
839
                # after the fulltext will give good locality in the receiver
3023.2.2 by Martin Pool
Fix KnitVersionedFile.get_data_stream to not assume .versions() is sorted. (lp:165106)
840
                ready_to_send[:0] = deferred.pop(version_id)
841
        assert len(deferred) == 0, \
842
            "Still have compressed child versions waiting to be sent"
3015.2.19 by Robert Collins
Don't include the pack container length in the lengths given by get_data_stream.
843
        # XXX: The stream format is such that we cannot stream it - we have to
844
        # know the length of all the data a-priori.
2535.3.3 by Andrew Bennetts
Add Knit.get_data_stream.
845
        raw_datum = []
3015.2.19 by Robert Collins
Don't include the pack container length in the lengths given by get_data_stream.
846
        result_version_list = []
3350.3.3 by Robert Collins
Functional get_record_stream interface tests covering full interface.
847
        for (version_id, raw_data, _), \
2535.3.3 by Andrew Bennetts
Add Knit.get_data_stream.
848
            (version_id2, options, _, parents) in \
849
            izip(self._data.read_records_iter_raw(copy_queue_records),
3015.2.19 by Robert Collins
Don't include the pack container length in the lengths given by get_data_stream.
850
                 temp_version_list):
3023.2.2 by Martin Pool
Fix KnitVersionedFile.get_data_stream to not assume .versions() is sorted. (lp:165106)
851
            assert version_id == version_id2, \
852
                'logic error, inconsistent results'
2535.3.3 by Andrew Bennetts
Add Knit.get_data_stream.
853
            raw_datum.append(raw_data)
3015.2.19 by Robert Collins
Don't include the pack container length in the lengths given by get_data_stream.
854
            result_version_list.append(
855
                (version_id, options, len(raw_data), parents))
856
        # provide a callback to get data incrementally.
2535.3.3 by Andrew Bennetts
Add Knit.get_data_stream.
857
        pseudo_file = StringIO(''.join(raw_datum))
858
        def read(length):
859
            if length is None:
860
                return pseudo_file.read()
861
            else:
862
                return pseudo_file.read(length)
863
        return (self.get_format_signature(), result_version_list, read)
864
3350.3.3 by Robert Collins
Functional get_record_stream interface tests covering full interface.
865
    def get_record_stream(self, versions, ordering, include_delta_closure):
866
        """Get a stream of records for versions.
867
868
        :param versions: The versions to include. Each version is a tuple
869
            (version,).
870
        :param ordering: Either 'unordered' or 'topological'. A topologically
871
            sorted stream has compression parents strictly before their
872
            children.
873
        :param include_delta_closure: If True then the closure across any
874
            compression parents will be included (in the opaque data).
875
        :return: An iterator of ContentFactory objects, each of which is only
876
            valid until the iterator is advanced.
877
        """
878
        if include_delta_closure:
879
            # Nb: what we should do is plan the data to stream to allow
880
            # reconstruction of all the texts without excessive buffering,
881
            # including re-sending common bases as needed. This makes the most
882
            # sense when we start serialising these streams though, so for now
883
            # we just fallback to individual text construction behind the
884
            # abstraction barrier.
885
            knit = self
886
        else:
887
            knit = None
888
        # Double index lookups here : need a unified api ?
889
        parent_map = self.get_parent_map(versions)
3350.3.12 by Robert Collins
Generate streams with absent records.
890
        absent_versions = set(versions) - set(parent_map)
3350.3.3 by Robert Collins
Functional get_record_stream interface tests covering full interface.
891
        if ordering == 'topological':
3350.3.12 by Robert Collins
Generate streams with absent records.
892
            present_versions = topo_sort(parent_map)
893
        else:
894
            # List comprehension to keep the requested order (as that seems
895
            # marginally useful, at least until we start doing IO optimising
896
            # here.
897
            present_versions = [version for version in versions if version in
898
                parent_map]
899
        position_map = self._get_components_positions(present_versions)
3350.3.3 by Robert Collins
Functional get_record_stream interface tests covering full interface.
900
        # c = component_id, r = record_details, i_m = index_memo, n = next
3350.3.12 by Robert Collins
Generate streams with absent records.
901
        records = [(version, position_map[version][1]) for version in
902
            present_versions]
3350.3.3 by Robert Collins
Functional get_record_stream interface tests covering full interface.
903
        record_map = {}
3350.3.12 by Robert Collins
Generate streams with absent records.
904
        for version in absent_versions:
905
            yield AbsentContentFactory((version,))
3350.3.3 by Robert Collins
Functional get_record_stream interface tests covering full interface.
906
        for version, raw_data, sha1 in \
907
                self._data.read_records_iter_raw(records):
908
            (record_details, index_memo, _) = position_map[version]
909
            yield KnitContentFactory(version, parent_map[version],
910
                record_details, sha1, raw_data, self.factory.annotated, knit)
911
2520.4.47 by Aaron Bentley
Fix get_line_delta_blocks with eol
912
    def _extract_blocks(self, version_id, source, target):
2520.4.41 by Aaron Bentley
Accelerate mpdiff generation
913
        if self._index.get_method(version_id) != 'line-delta':
914
            return None
915
        parent, sha1, noeol, delta = self.get_delta(version_id)
2520.4.47 by Aaron Bentley
Fix get_line_delta_blocks with eol
916
        return KnitContent.get_line_delta_blocks(delta, source, target)
2520.4.41 by Aaron Bentley
Accelerate mpdiff generation
917
1596.2.36 by Robert Collins
add a get_delta api to versioned_file.
918
    def get_delta(self, version_id):
919
        """Get a delta for constructing version from some other version."""
2229.2.3 by Aaron Bentley
change reserved_id to is_reserved_id, add check_not_reserved for DRY
920
        self.check_not_reserved_id(version_id)
3287.5.2 by Robert Collins
Deprecate VersionedFile.get_parents, breaking pulling from a ghost containing knit or pack repository to weaves, which improves correctness and allows simplification of core code.
921
        parents = self.get_parent_map([version_id])[version_id]
1596.2.36 by Robert Collins
add a get_delta api to versioned_file.
922
        if len(parents):
923
            parent = parents[0]
924
        else:
925
            parent = None
2592.3.71 by Robert Collins
Basic version of knit-based repository operating, many tests failing.
926
        index_memo = self._index.get_position(version_id)
927
        data, sha1 = self._data.read_records(((version_id, index_memo),))[version_id]
1596.2.37 by Robert Collins
Switch to delta based content copying in the generic versioned file copier.
928
        noeol = 'no-eol' in self._index.get_options(version_id)
1596.2.36 by Robert Collins
add a get_delta api to versioned_file.
929
        if 'fulltext' == self._index.get_method(version_id):
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
930
            new_content = self.factory.parse_fulltext(data, version_id)
1596.2.36 by Robert Collins
add a get_delta api to versioned_file.
931
            if parent is not None:
932
                reference_content = self._get_content(parent)
933
                old_texts = reference_content.text()
934
            else:
935
                old_texts = []
936
            new_texts = new_content.text()
2781.1.1 by Martin Pool
merge cpatiencediff from Lukas
937
            delta_seq = patiencediff.PatienceSequenceMatcher(None, old_texts,
938
                                                             new_texts)
1596.2.37 by Robert Collins
Switch to delta based content copying in the generic versioned file copier.
939
            return parent, sha1, noeol, self._make_line_delta(delta_seq, new_content)
1596.2.36 by Robert Collins
add a get_delta api to versioned_file.
940
        else:
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
941
            delta = self.factory.parse_line_delta(data, version_id)
1596.2.37 by Robert Collins
Switch to delta based content copying in the generic versioned file copier.
942
            return parent, sha1, noeol, delta
2535.3.1 by Andrew Bennetts
Add get_format_signature to VersionedFile
943
944
    def get_format_signature(self):
945
        """See VersionedFile.get_format_signature()."""
946
        if self.factory.annotated:
947
            annotated_part = "annotated"
948
        else:
949
            annotated_part = "plain"
2535.3.17 by Andrew Bennetts
[broken] Closer to a working Repository.fetch_revisions smart request.
950
        return "knit-%s" % (annotated_part,)
1596.2.36 by Robert Collins
add a get_delta api to versioned_file.
951
        
3287.6.7 by Robert Collins
* ``VersionedFile.get_graph_with_ghosts`` is deprecated, with no
952
    @deprecated_method(one_four)
1594.2.8 by Robert Collins
add ghost aware apis to knits.
953
    def get_graph_with_ghosts(self):
954
        """See VersionedFile.get_graph_with_ghosts()."""
3287.6.7 by Robert Collins
* ``VersionedFile.get_graph_with_ghosts`` is deprecated, with no
955
        return self.get_parent_map(self.versions())
1594.2.8 by Robert Collins
add ghost aware apis to knits.
956
2520.4.88 by Aaron Bentley
Retrieve all sha1s at once (ftw)
957
    def get_sha1s(self, version_ids):
3316.2.9 by Robert Collins
* ``VersionedFile.get_sha1`` is deprecated, please use
958
        """See VersionedFile.get_sha1s()."""
2520.4.88 by Aaron Bentley
Retrieve all sha1s at once (ftw)
959
        record_map = self._get_record_map(version_ids)
960
        # record entry 2 is the 'digest'.
961
        return [record_map[v][2] for v in version_ids]
1666.1.6 by Robert Collins
Make knit the default format.
962
3287.6.5 by Robert Collins
Deprecate VersionedFile.has_ghost.
963
    @deprecated_method(one_four)
1594.2.8 by Robert Collins
add ghost aware apis to knits.
964
    def has_ghost(self, version_id):
965
        """True if there is a ghost reference in the file to version_id."""
966
        # maybe we have it
967
        if self.has_version(version_id):
968
            return False
1759.2.2 by Jelmer Vernooij
Revert some of my spelling fixes and fix some typos after review by Aaron.
969
        # optimisable if needed by memoising the _ghosts set.
3287.6.5 by Robert Collins
Deprecate VersionedFile.has_ghost.
970
        items = self.get_parent_map(self.versions())
3287.6.6 by Robert Collins
Unbreak has_ghosts.
971
        for parents in items.itervalues():
1594.2.8 by Robert Collins
add ghost aware apis to knits.
972
            for parent in parents:
3287.6.5 by Robert Collins
Deprecate VersionedFile.has_ghost.
973
                if parent == version_id and parent not in items:
974
                    return True
1594.2.8 by Robert Collins
add ghost aware apis to knits.
975
        return False
976
2535.3.30 by Andrew Bennetts
Delete obsolete comments and other cosmetic changes.
977
    def insert_data_stream(self, (format, data_list, reader_callable)):
2535.3.4 by Andrew Bennetts
Simple implementation of Knit.insert_data_stream.
978
        """Insert knit records from a data stream into this knit.
979
2535.3.5 by Andrew Bennetts
Batch writes as much as possible in insert_data_stream.
980
        If a version in the stream is already present in this knit, it will not
981
        be inserted a second time.  It will be checked for consistency with the
982
        stored version however, and may cause a KnitCorrupt error to be raised
983
        if the data in the stream disagrees with the already stored data.
2535.3.4 by Andrew Bennetts
Simple implementation of Knit.insert_data_stream.
984
        
985
        :seealso: get_data_stream
986
        """
987
        if format != self.get_format_signature():
3172.2.1 by Andrew Bennetts
Enable use of smart revision streaming between repos with compatible models, not just between identical format repos.
988
            if 'knit' in debug.debug_flags:
989
                trace.mutter(
990
                    'incompatible format signature inserting to %r', self)
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
991
            source = self._knit_from_datastream(
992
                (format, data_list, reader_callable))
3350.3.10 by Robert Collins
Eliminate use of join in knit.insert_data_stream.
993
            stream = source.get_record_stream(source.versions(), 'unordered', False)
994
            self.insert_record_stream(stream)
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
995
            return
2535.3.17 by Andrew Bennetts
[broken] Closer to a working Repository.fetch_revisions smart request.
996
997
        for version_id, options, length, parents in data_list:
998
            if self.has_version(version_id):
999
                # First check: the list of parents.
1000
                my_parents = self.get_parents_with_ghosts(version_id)
3184.5.1 by Lukáš Lalinský
Fix handling of some error cases in insert_data_stream
1001
                if tuple(my_parents) != tuple(parents):
2535.3.17 by Andrew Bennetts
[broken] Closer to a working Repository.fetch_revisions smart request.
1002
                    # XXX: KnitCorrupt is not quite the right exception here.
1003
                    raise KnitCorrupt(
1004
                        self.filename,
1005
                        'parents list %r from data stream does not match '
1006
                        'already recorded parents %r for %s'
1007
                        % (parents, my_parents, version_id))
1008
1009
                # Also check the SHA-1 of the fulltext this content will
1010
                # produce.
1011
                raw_data = reader_callable(length)
3316.2.9 by Robert Collins
* ``VersionedFile.get_sha1`` is deprecated, please use
1012
                my_fulltext_sha1 = self.get_sha1s([version_id])[0]
2535.3.17 by Andrew Bennetts
[broken] Closer to a working Repository.fetch_revisions smart request.
1013
                df, rec = self._data._parse_record_header(version_id, raw_data)
1014
                stream_fulltext_sha1 = rec[3]
1015
                if my_fulltext_sha1 != stream_fulltext_sha1:
1016
                    # Actually, we don't know if it's this knit that's corrupt,
1017
                    # or the data stream we're trying to insert.
1018
                    raise KnitCorrupt(
1019
                        self.filename, 'sha-1 does not match %s' % version_id)
1020
            else:
2535.3.57 by Andrew Bennetts
Perform some sanity checking of data streams rather than blindly inserting them into our repository.
1021
                if 'line-delta' in options:
2535.3.61 by Andrew Bennetts
Clarify sanity checking in insert_data_stream.
1022
                    # Make sure that this knit record is actually useful: a
1023
                    # line-delta is no use unless we have its parent.
1024
                    # Fetching from a broken repository with this problem
1025
                    # shouldn't break the target repository.
3040.2.1 by Martin Pool
Give a better message when failing to pull because the source needs to be reconciled
1026
                    #
1027
                    # See https://bugs.launchpad.net/bzr/+bug/164443
2535.3.61 by Andrew Bennetts
Clarify sanity checking in insert_data_stream.
1028
                    if not self._index.has_version(parents[0]):
1029
                        raise KnitCorrupt(
1030
                            self.filename,
3040.2.1 by Martin Pool
Give a better message when failing to pull because the source needs to be reconciled
1031
                            'line-delta from stream '
1032
                            'for version %s '
1033
                            'references '
1034
                            'missing parent %s\n'
3040.2.2 by Martin Pool
Clearer reconcile recommendation message (thanks Matt Nordhoff)
1035
                            'Try running "bzr check" '
1036
                            'on the source repository, and "bzr reconcile" '
3040.2.1 by Martin Pool
Give a better message when failing to pull because the source needs to be reconciled
1037
                            'if necessary.' %
1038
                            (version_id, parents[0]))
3360.2.9 by Martin Pool
merge #205156
1039
                    if not self.delta:
1040
                        # We received a line-delta record for a non-delta knit.
1041
                        # Convert it to a fulltext.
1042
                        gzip_bytes = reader_callable(length)
1043
                        lines, sha1 = self._data._parse_record(
1044
                            version_id, gzip_bytes)
1045
                        delta = self.factory.parse_line_delta(lines,
1046
                                version_id)
1047
                        content = self.factory.make(
1048
                            self.get_lines(parents[0]), parents[0])
1049
                        content.apply_delta(delta, version_id)
1050
                        digest, len, content = self.add_lines(
1051
                            version_id, parents, content.text())
1052
                        if digest != sha1:
1053
                            raise errors.VersionedFileInvalidChecksum(version)
1054
                        continue
1055
2535.3.17 by Andrew Bennetts
[broken] Closer to a working Repository.fetch_revisions smart request.
1056
                self._add_raw_records(
1057
                    [(version_id, options, parents, length)],
1058
                    reader_callable(length))
1059
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
1060
    def _knit_from_datastream(self, (format, data_list, reader_callable)):
1061
        """Create a knit object from a data stream.
1062
1063
        This method exists to allow conversion of data streams that do not
1064
        match the signature of this knit. Generally it will be slower and use
1065
        more memory to use this method to insert data, but it will work.
1066
1067
        :seealso: get_data_stream for details on datastreams.
1068
        :return: A knit versioned file which can be used to join the datastream
1069
            into self.
1070
        """
1071
        if format == "knit-plain":
1072
            factory = KnitPlainFactory()
1073
        elif format == "knit-annotated":
1074
            factory = KnitAnnotateFactory()
1075
        else:
1076
            raise errors.KnitDataStreamUnknown(format)
3224.1.8 by John Arbash Meinel
Add noeol to the return signature of get_build_details.
1077
        index = _StreamIndex(data_list, self._index)
3052.2.3 by Robert Collins
Handle insert_data_stream of an unannotated stream into an annotated knit.
1078
        access = _StreamAccess(reader_callable, index, self, factory)
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
1079
        return KnitVersionedFile(self.filename, self.transport,
1080
            factory=factory, index=index, access_method=access)
1081
3350.3.8 by Robert Collins
Basic stream insertion, no fast path yet for knit to knit.
1082
    def insert_record_stream(self, stream):
1083
        """Insert a record stream into this versioned file.
1084
1085
        :param stream: A stream of records to insert. 
1086
        :return: None
1087
        :seealso VersionedFile.get_record_stream:
1088
        """
3350.3.9 by Robert Collins
Avoid full text reconstruction when transferring knit to knit via record streams.
1089
        def get_adapter(adapter_key):
1090
            try:
1091
                return adapters[adapter_key]
1092
            except KeyError:
1093
                adapter_factory = adapter_registry.get(adapter_key)
1094
                adapter = adapter_factory(self)
1095
                adapters[adapter_key] = adapter
1096
                return adapter
1097
        if self.factory.annotated:
3350.3.11 by Robert Collins
Test inserting a stream that overlaps the current content of a knit does not error.
1098
            # self is annotated, we need annotated knits to use directly.
3350.3.9 by Robert Collins
Avoid full text reconstruction when transferring knit to knit via record streams.
1099
            annotated = "annotated-"
3350.3.11 by Robert Collins
Test inserting a stream that overlaps the current content of a knit does not error.
1100
            convertibles = []
3350.3.9 by Robert Collins
Avoid full text reconstruction when transferring knit to knit via record streams.
1101
        else:
3350.3.11 by Robert Collins
Test inserting a stream that overlaps the current content of a knit does not error.
1102
            # self is not annotated, but we can strip annotations cheaply.
3350.3.9 by Robert Collins
Avoid full text reconstruction when transferring knit to knit via record streams.
1103
            annotated = ""
3350.3.11 by Robert Collins
Test inserting a stream that overlaps the current content of a knit does not error.
1104
            convertibles = set(["knit-annotated-delta-gz",
1105
                "knit-annotated-ft-gz"])
3350.3.9 by Robert Collins
Avoid full text reconstruction when transferring knit to knit via record streams.
1106
        native_types = set()
1107
        native_types.add("knit-%sdelta-gz" % annotated)
1108
        native_types.add("knit-%sft-gz" % annotated)
3350.3.11 by Robert Collins
Test inserting a stream that overlaps the current content of a knit does not error.
1109
        knit_types = native_types.union(convertibles)
3350.3.8 by Robert Collins
Basic stream insertion, no fast path yet for knit to knit.
1110
        adapters = {}
1111
        for record in stream:
1112
            # adapt to non-tuple interface
1113
            parents = [parent[0] for parent in record.parents]
3350.3.9 by Robert Collins
Avoid full text reconstruction when transferring knit to knit via record streams.
1114
            if record.storage_kind in knit_types:
1115
                if record.storage_kind not in native_types:
1116
                    try:
1117
                        adapter_key = (record.storage_kind, "knit-delta-gz")
1118
                        adapter = get_adapter(adapter_key)
1119
                    except KeyError:
1120
                        adapter_key = (record.storage_kind, "knit-ft-gz")
1121
                        adapter = get_adapter(adapter_key)
1122
                    bytes = adapter.get_bytes(
1123
                        record, record.get_bytes_as(record.storage_kind))
1124
                else:
1125
                    bytes = record.get_bytes_as(record.storage_kind)
1126
                options = [record._build_details[0]]
1127
                if record._build_details[1]:
1128
                    options.append('no-eol')
3350.3.11 by Robert Collins
Test inserting a stream that overlaps the current content of a knit does not error.
1129
                # Just blat it across.
1130
                # Note: This does end up adding data on duplicate keys. As
1131
                # modern repositories use atomic insertions this should not
1132
                # lead to excessive growth in the event of interrupted fetches.
1133
                # 'knit' repositories may suffer excessive growth, but as a
1134
                # deprecated format this is tolerable. It can be fixed if
1135
                # needed by in the kndx index support raising on a duplicate
1136
                # add with identical parents and options.
3350.3.9 by Robert Collins
Avoid full text reconstruction when transferring knit to knit via record streams.
1137
                self._add_raw_records(
1138
                    [(record.key[0], options, parents, len(bytes))],
1139
                    bytes)
1140
            elif record.storage_kind == 'fulltext':
3350.3.8 by Robert Collins
Basic stream insertion, no fast path yet for knit to knit.
1141
                self.add_lines(record.key[0], parents,
1142
                    split_lines(record.get_bytes_as('fulltext')))
1143
            else:
1144
                adapter_key = record.storage_kind, 'fulltext'
3350.3.9 by Robert Collins
Avoid full text reconstruction when transferring knit to knit via record streams.
1145
                adapter = get_adapter(adapter_key)
3350.3.8 by Robert Collins
Basic stream insertion, no fast path yet for knit to knit.
1146
                lines = split_lines(adapter.get_bytes(
1147
                    record, record.get_bytes_as(record.storage_kind)))
3350.3.11 by Robert Collins
Test inserting a stream that overlaps the current content of a knit does not error.
1148
                try:
1149
                    self.add_lines(record.key[0], parents, lines)
1150
                except errors.RevisionAlreadyPresent:
1151
                    pass
3350.3.8 by Robert Collins
Basic stream insertion, no fast path yet for knit to knit.
1152
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1153
    def versions(self):
1154
        """See VersionedFile.versions."""
2745.1.1 by Robert Collins
Add a number of -Devil checkpoints.
1155
        if 'evil' in debug.debug_flags:
2745.1.2 by Robert Collins
Ensure mutter_callsite is not directly called on a lazy_load object, to make the stacklevel parameter work correctly.
1156
            trace.mutter_callsite(2, "versions scales with size of history")
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1157
        return self._index.get_versions()
1158
1159
    def has_version(self, version_id):
1160
        """See VersionedFile.has_version."""
2745.1.1 by Robert Collins
Add a number of -Devil checkpoints.
1161
        if 'evil' in debug.debug_flags:
2745.1.2 by Robert Collins
Ensure mutter_callsite is not directly called on a lazy_load object, to make the stacklevel parameter work correctly.
1162
            trace.mutter_callsite(2, "has_version is a LBYL scenario")
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1163
        return self._index.has_version(version_id)
1164
1165
    __contains__ = has_version
1166
1596.2.34 by Robert Collins
Optimise knit add to only diff once per parent, not once per parent + once for the delta generation.
1167
    def _merge_annotations(self, content, parents, parent_texts={},
2520.4.140 by Aaron Bentley
Use matching blocks from mpdiff for knit delta creation
1168
                           delta=None, annotated=None,
1169
                           left_matching_blocks=None):
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1170
        """Merge annotations for content.  This is done by comparing
1596.2.27 by Robert Collins
Note potential improvements in knit adds.
1171
        the annotations based on changed to the text.
1172
        """
2520.4.146 by Aaron Bentley
Avoid get_matching_blocks for un-annotated text
1173
        if left_matching_blocks is not None:
1174
            delta_seq = diff._PrematchedMatcher(left_matching_blocks)
1175
        else:
1176
            delta_seq = None
1596.2.34 by Robert Collins
Optimise knit add to only diff once per parent, not once per parent + once for the delta generation.
1177
        if annotated:
1596.2.36 by Robert Collins
add a get_delta api to versioned_file.
1178
            for parent_id in parents:
1596.2.34 by Robert Collins
Optimise knit add to only diff once per parent, not once per parent + once for the delta generation.
1179
                merge_content = self._get_content(parent_id, parent_texts)
2520.4.146 by Aaron Bentley
Avoid get_matching_blocks for un-annotated text
1180
                if (parent_id == parents[0] and delta_seq is not None):
1181
                    seq = delta_seq
2520.4.140 by Aaron Bentley
Use matching blocks from mpdiff for knit delta creation
1182
                else:
1183
                    seq = patiencediff.PatienceSequenceMatcher(
1184
                        None, merge_content.text(), content.text())
1596.2.34 by Robert Collins
Optimise knit add to only diff once per parent, not once per parent + once for the delta generation.
1185
                for i, j, n in seq.get_matching_blocks():
1186
                    if n == 0:
1187
                        continue
2520.4.146 by Aaron Bentley
Avoid get_matching_blocks for un-annotated text
1188
                    # this appears to copy (origin, text) pairs across to the
1189
                    # new content for any line that matches the last-checked
1190
                    # parent.
1596.2.34 by Robert Collins
Optimise knit add to only diff once per parent, not once per parent + once for the delta generation.
1191
                    content._lines[j:j+n] = merge_content._lines[i:i+n]
1596.2.36 by Robert Collins
add a get_delta api to versioned_file.
1192
        if delta:
2520.4.146 by Aaron Bentley
Avoid get_matching_blocks for un-annotated text
1193
            if delta_seq is None:
1596.2.36 by Robert Collins
add a get_delta api to versioned_file.
1194
                reference_content = self._get_content(parents[0], parent_texts)
1195
                new_texts = content.text()
1196
                old_texts = reference_content.text()
2104.4.2 by John Arbash Meinel
Small cleanup and NEWS entry about fixing bug #65714
1197
                delta_seq = patiencediff.PatienceSequenceMatcher(
2100.2.1 by wang
Replace python's difflib by patiencediff because the worst case
1198
                                                 None, old_texts, new_texts)
1596.2.36 by Robert Collins
add a get_delta api to versioned_file.
1199
            return self._make_line_delta(delta_seq, content)
1200
1201
    def _make_line_delta(self, delta_seq, new_content):
1202
        """Generate a line delta from delta_seq and new_content."""
1203
        diff_hunks = []
1204
        for op in delta_seq.get_opcodes():
1205
            if op[0] == 'equal':
1206
                continue
1207
            diff_hunks.append((op[1], op[2], op[4]-op[3], new_content._lines[op[3]:op[4]]))
1596.2.34 by Robert Collins
Optimise knit add to only diff once per parent, not once per parent + once for the delta generation.
1208
        return diff_hunks
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1209
1756.3.17 by Aaron Bentley
Combine get_components_positions with get_components_versions
1210
    def _get_components_positions(self, version_ids):
1756.3.19 by Aaron Bentley
Documentation and cleanups
1211
        """Produce a map of position data for the components of versions.
1212
1756.3.22 by Aaron Bentley
Tweaks from review
1213
        This data is intended to be used for retrieving the knit records.
1756.3.19 by Aaron Bentley
Documentation and cleanups
1214
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
1215
        A dict of version_id to (record_details, index_memo, next, parents) is
1756.3.19 by Aaron Bentley
Documentation and cleanups
1216
        returned.
1217
        method is the way referenced data should be applied.
3224.1.8 by John Arbash Meinel
Add noeol to the return signature of get_build_details.
1218
        index_memo is the handle to pass to the data access to actually get the
1219
            data
1756.3.19 by Aaron Bentley
Documentation and cleanups
1220
        next is the build-parent of the version, or None for fulltexts.
3224.1.8 by John Arbash Meinel
Add noeol to the return signature of get_build_details.
1221
        parents is the version_ids of the parents of this version
1756.3.19 by Aaron Bentley
Documentation and cleanups
1222
        """
1756.3.9 by Aaron Bentley
More optimization refactoring
1223
        component_data = {}
3218.1.1 by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries.
1224
        pending_components = version_ids
1225
        while pending_components:
1226
            build_details = self._index.get_build_details(pending_components)
3224.1.29 by John Arbash Meinel
Properly handle annotating when ghosts are present.
1227
            current_components = set(pending_components)
3218.1.1 by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries.
1228
            pending_components = set()
3224.1.29 by John Arbash Meinel
Properly handle annotating when ghosts are present.
1229
            for version_id, details in build_details.iteritems():
1230
                (index_memo, compression_parent, parents,
1231
                 record_details) = details
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
1232
                method = record_details[0]
3218.1.1 by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries.
1233
                if compression_parent is not None:
1234
                    pending_components.add(compression_parent)
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
1235
                component_data[version_id] = (record_details, index_memo,
3224.1.13 by John Arbash Meinel
Revert the _get_component_positions api
1236
                                              compression_parent)
3224.1.29 by John Arbash Meinel
Properly handle annotating when ghosts are present.
1237
            missing = current_components.difference(build_details)
1238
            if missing:
1239
                raise errors.RevisionNotPresent(missing.pop(), self.filename)
1756.3.10 by Aaron Bentley
Optimize selection and retrieval of records
1240
        return component_data
1756.3.18 by Aaron Bentley
More cleanup
1241
       
1596.2.32 by Robert Collins
Reduce re-extraction of texts during weave to knit joins by providing a memoisation facility.
1242
    def _get_content(self, version_id, parent_texts={}):
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1243
        """Returns a content object that makes up the specified
1244
        version."""
1596.2.32 by Robert Collins
Reduce re-extraction of texts during weave to knit joins by providing a memoisation facility.
1245
        cached_version = parent_texts.get(version_id, None)
1246
        if cached_version is not None:
2794.1.2 by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts.
1247
            if not self.has_version(version_id):
1248
                raise RevisionNotPresent(version_id, self.filename)
1596.2.32 by Robert Collins
Reduce re-extraction of texts during weave to knit joins by providing a memoisation facility.
1249
            return cached_version
1250
1756.3.22 by Aaron Bentley
Tweaks from review
1251
        text_map, contents_map = self._get_content_maps([version_id])
1252
        return contents_map[version_id]
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1253
1254
    def _check_versions_present(self, version_ids):
1255
        """Check that all specified versions are present."""
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1256
        self._index.check_versions_present(version_ids)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1257
2794.1.1 by Robert Collins
Allow knits to be instructed not to add a text based on a sha, for commit.
1258
    def _add_lines_with_ghosts(self, version_id, parents, lines, parent_texts,
3287.5.2 by Robert Collins
Deprecate VersionedFile.get_parents, breaking pulling from a ghost containing knit or pack repository to weaves, which improves correctness and allows simplification of core code.
1259
        nostore_sha, random_id, check_content, left_matching_blocks):
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1260
        """See VersionedFile.add_lines_with_ghosts()."""
2805.6.7 by Robert Collins
Review feedback.
1261
        self._check_add(version_id, lines, random_id, check_content)
2805.6.2 by Robert Collins
General cleanup of KnitVersionedFile._add.
1262
        return self._add(version_id, lines, parents, self.delta,
3287.5.2 by Robert Collins
Deprecate VersionedFile.get_parents, breaking pulling from a ghost containing knit or pack repository to weaves, which improves correctness and allows simplification of core code.
1263
            parent_texts, left_matching_blocks, nostore_sha, random_id)
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1264
2520.4.140 by Aaron Bentley
Use matching blocks from mpdiff for knit delta creation
1265
    def _add_lines(self, version_id, parents, lines, parent_texts,
2805.6.7 by Robert Collins
Review feedback.
1266
        left_matching_blocks, nostore_sha, random_id, check_content):
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1267
        """See VersionedFile.add_lines."""
2805.6.7 by Robert Collins
Review feedback.
1268
        self._check_add(version_id, lines, random_id, check_content)
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1269
        self._check_versions_present(parents)
2520.4.140 by Aaron Bentley
Use matching blocks from mpdiff for knit delta creation
1270
        return self._add(version_id, lines[:], parents, self.delta,
2841.2.1 by Robert Collins
* Commit no longer checks for new text keys during insertion when the
1271
            parent_texts, left_matching_blocks, nostore_sha, random_id)
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1272
2805.6.7 by Robert Collins
Review feedback.
1273
    def _check_add(self, version_id, lines, random_id, check_content):
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1274
        """check that version_id and lines are safe to add."""
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1275
        if contains_whitespace(version_id):
1668.5.1 by Olaf Conradi
Fix bug in knits when raising InvalidRevisionId without the required
1276
            raise InvalidRevisionId(version_id, self.filename)
2229.2.3 by Aaron Bentley
change reserved_id to is_reserved_id, add check_not_reserved for DRY
1277
        self.check_not_reserved_id(version_id)
2805.6.4 by Robert Collins
Don't check for existing versions when adding texts with random revision ids.
1278
        # Technically this could be avoided if we are happy to allow duplicate
1279
        # id insertion when other things than bzr core insert texts, but it
1280
        # seems useful for folk using the knit api directly to have some safety
1281
        # blanket that we can disable.
1282
        if not random_id and self.has_version(version_id):
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1283
            raise RevisionAlreadyPresent(version_id, self.filename)
2805.6.7 by Robert Collins
Review feedback.
1284
        if check_content:
1285
            self._check_lines_not_unicode(lines)
1286
            self._check_lines_are_lines(lines)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1287
2520.4.140 by Aaron Bentley
Use matching blocks from mpdiff for knit delta creation
1288
    def _add(self, version_id, lines, parents, delta, parent_texts,
2841.2.1 by Robert Collins
* Commit no longer checks for new text keys during insertion when the
1289
        left_matching_blocks, nostore_sha, random_id):
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1290
        """Add a set of lines on top of version specified by parents.
1291
1292
        If delta is true, compress the text as a line-delta against
1293
        the first parent.
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1294
1295
        Any versions not present will be converted into ghosts.
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1296
        """
2850.1.1 by Robert Collins
* ``KnitVersionedFile.add*`` will no longer cache added records even when
1297
        # first thing, if the content is something we don't need to store, find
1298
        # that out.
1299
        line_bytes = ''.join(lines)
1300
        digest = sha_string(line_bytes)
1301
        if nostore_sha == digest:
1302
            raise errors.ExistingContent
1596.2.28 by Robert Collins
more knit profile based tuning.
1303
1596.2.10 by Robert Collins
Reviewer feedback on knit branches.
1304
        present_parents = []
1596.2.32 by Robert Collins
Reduce re-extraction of texts during weave to knit joins by providing a memoisation facility.
1305
        if parent_texts is None:
1306
            parent_texts = {}
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1307
        for parent in parents:
2805.6.2 by Robert Collins
General cleanup of KnitVersionedFile._add.
1308
            if self.has_version(parent):
1596.2.10 by Robert Collins
Reviewer feedback on knit branches.
1309
                present_parents.append(parent)
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1310
2805.6.2 by Robert Collins
General cleanup of KnitVersionedFile._add.
1311
        # can only compress against the left most present parent.
1312
        if (delta and
1313
            (len(present_parents) == 0 or
1314
             present_parents[0] != parents[0])):
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1315
            delta = False
1316
2850.1.1 by Robert Collins
* ``KnitVersionedFile.add*`` will no longer cache added records even when
1317
        text_length = len(line_bytes)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1318
        options = []
1319
        if lines:
1320
            if lines[-1][-1] != '\n':
2805.6.2 by Robert Collins
General cleanup of KnitVersionedFile._add.
1321
                # copy the contents of lines.
1322
                lines = lines[:]
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1323
                options.append('no-eol')
1324
                lines[-1] = lines[-1] + '\n'
2888.1.1 by Robert Collins
(robertc) Use prejoined content for knit storage when performing a full-text store of unannotated content. (Robert Collins)
1325
                line_bytes += '\n'
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1326
2805.6.2 by Robert Collins
General cleanup of KnitVersionedFile._add.
1327
        if delta:
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1328
            # To speed the extract of texts the delta chain is limited
1329
            # to a fixed number of deltas.  This should minimize both
1330
            # I/O and the time spend applying deltas.
2147.1.1 by John Arbash Meinel
Factor the common knit delta selection into a helper func, and allow the fulltext to be chosen based on cumulative delta size
1331
            delta = self._check_should_delta(present_parents)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1332
2249.5.15 by John Arbash Meinel
remove get_cached_utf8 checks which were slowing things down.
1333
        assert isinstance(version_id, str)
2794.1.2 by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts.
1334
        content = self.factory.make(lines, version_id)
1596.2.34 by Robert Collins
Optimise knit add to only diff once per parent, not once per parent + once for the delta generation.
1335
        if delta or (self.factory.annotated and len(present_parents) > 0):
2805.6.2 by Robert Collins
General cleanup of KnitVersionedFile._add.
1336
            # Merge annotations from parent texts if needed.
2794.1.2 by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts.
1337
            delta_hunks = self._merge_annotations(content, present_parents,
2520.4.140 by Aaron Bentley
Use matching blocks from mpdiff for knit delta creation
1338
                parent_texts, delta, self.factory.annotated,
1339
                left_matching_blocks)
1596.2.32 by Robert Collins
Reduce re-extraction of texts during weave to knit joins by providing a memoisation facility.
1340
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1341
        if delta:
1342
            options.append('line-delta')
1343
            store_lines = self.factory.lower_line_delta(delta_hunks)
2850.1.1 by Robert Collins
* ``KnitVersionedFile.add*`` will no longer cache added records even when
1344
            size, bytes = self._data._record_to_data(version_id, digest,
1345
                store_lines)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1346
        else:
1347
            options.append('fulltext')
2888.1.3 by Robert Collins
Review feedback.
1348
            # isinstance is slower and we have no hierarchy.
2888.1.1 by Robert Collins
(robertc) Use prejoined content for knit storage when performing a full-text store of unannotated content. (Robert Collins)
1349
            if self.factory.__class__ == KnitPlainFactory:
2888.1.3 by Robert Collins
Review feedback.
1350
                # Use the already joined bytes saving iteration time in
1351
                # _record_to_data.
2888.1.1 by Robert Collins
(robertc) Use prejoined content for knit storage when performing a full-text store of unannotated content. (Robert Collins)
1352
                size, bytes = self._data._record_to_data(version_id, digest,
1353
                    lines, [line_bytes])
1354
            else:
1355
                # get mixed annotation + content and feed it into the
1356
                # serialiser.
1357
                store_lines = self.factory.lower_fulltext(content)
1358
                size, bytes = self._data._record_to_data(version_id, digest,
1359
                    store_lines)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1360
2850.1.1 by Robert Collins
* ``KnitVersionedFile.add*`` will no longer cache added records even when
1361
        access_memo = self._data.add_raw_records([size], bytes)[0]
2841.2.1 by Robert Collins
* Commit no longer checks for new text keys during insertion when the
1362
        self._index.add_versions(
2850.1.1 by Robert Collins
* ``KnitVersionedFile.add*`` will no longer cache added records even when
1363
            ((version_id, options, access_memo, parents),),
1364
            random_id=random_id)
2794.1.2 by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts.
1365
        return digest, text_length, content
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1366
1563.2.19 by Robert Collins
stub out a check for knits.
1367
    def check(self, progress_bar=None):
1368
        """See VersionedFile.check()."""
1369
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1370
    def get_lines(self, version_id):
1371
        """See VersionedFile.get_lines()."""
1756.2.8 by Aaron Bentley
Implement get_line_list, cleanups
1372
        return self.get_line_list([version_id])[0]
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1373
1756.3.12 by Aaron Bentley
Stuff all text-building data in record_map
1374
    def _get_record_map(self, version_ids):
1756.3.19 by Aaron Bentley
Documentation and cleanups
1375
        """Produce a dictionary of knit records.
1376
        
3224.1.17 by John Arbash Meinel
Clean up some variable ordering to make more sense.
1377
        :return: {version_id:(record, record_details, digest, next)}
1378
            record
1379
                data returned from read_records
1380
            record_details
1381
                opaque information to pass to parse_record
1382
            digest
1383
                SHA1 digest of the full text after all steps are done
1384
            next
1385
                build-parent of the version, i.e. the leftmost ancestor.
1386
                Will be None if the record is not a delta.
1756.3.19 by Aaron Bentley
Documentation and cleanups
1387
        """
1756.3.12 by Aaron Bentley
Stuff all text-building data in record_map
1388
        position_map = self._get_components_positions(version_ids)
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
1389
        # c = component_id, r = record_details, i_m = index_memo, n = next
1390
        records = [(c, i_m) for c, (r, i_m, n)
3224.1.8 by John Arbash Meinel
Add noeol to the return signature of get_build_details.
1391
                             in position_map.iteritems()]
1756.3.12 by Aaron Bentley
Stuff all text-building data in record_map
1392
        record_map = {}
3224.1.17 by John Arbash Meinel
Clean up some variable ordering to make more sense.
1393
        for component_id, record, digest in \
1863.1.9 by John Arbash Meinel
Switching to have 'read_records_iter' return in random order.
1394
                self._data.read_records_iter(records):
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
1395
            (record_details, index_memo, next) = position_map[component_id]
3224.1.17 by John Arbash Meinel
Clean up some variable ordering to make more sense.
1396
            record_map[component_id] = record, record_details, digest, next
3224.1.13 by John Arbash Meinel
Revert the _get_component_positions api
1397
1756.3.10 by Aaron Bentley
Optimize selection and retrieval of records
1398
        return record_map
1756.2.5 by Aaron Bentley
Reduced read_records calls to 1
1399
1756.2.7 by Aaron Bentley
Implement get_text in terms of get_texts
1400
    def get_text(self, version_id):
1401
        """See VersionedFile.get_text"""
1402
        return self.get_texts([version_id])[0]
1403
1756.2.1 by Aaron Bentley
Implement get_texts
1404
    def get_texts(self, version_ids):
1756.2.8 by Aaron Bentley
Implement get_line_list, cleanups
1405
        return [''.join(l) for l in self.get_line_list(version_ids)]
1406
1407
    def get_line_list(self, version_ids):
1756.2.1 by Aaron Bentley
Implement get_texts
1408
        """Return the texts of listed versions as a list of strings."""
2229.2.1 by Aaron Bentley
Reject reserved ids in versiondfile, tree, branch and repository
1409
        for version_id in version_ids:
2229.2.3 by Aaron Bentley
change reserved_id to is_reserved_id, add check_not_reserved for DRY
1410
            self.check_not_reserved_id(version_id)
1756.3.13 by Aaron Bentley
Refactor get_line_list into _get_content
1411
        text_map, content_map = self._get_content_maps(version_ids)
1412
        return [text_map[v] for v in version_ids]
1413
2520.4.90 by Aaron Bentley
Handle \r terminated lines in Weaves properly
1414
    _get_lf_split_line_list = get_line_list
2520.4.3 by Aaron Bentley
Implement plain strategy for extracting and installing multiparent diffs
1415
1756.3.13 by Aaron Bentley
Refactor get_line_list into _get_content
1416
    def _get_content_maps(self, version_ids):
1756.3.19 by Aaron Bentley
Documentation and cleanups
1417
        """Produce maps of text and KnitContents
1418
        
1419
        :return: (text_map, content_map) where text_map contains the texts for
1420
        the requested versions and content_map contains the KnitContents.
1756.3.22 by Aaron Bentley
Tweaks from review
1421
        Both dicts take version_ids as their keys.
1756.3.19 by Aaron Bentley
Documentation and cleanups
1422
        """
2921.2.1 by Robert Collins
* Knit text reconstruction now avoids making copies of the lines list for
1423
        # FUTURE: This function could be improved for the 'extract many' case
1424
        # by tracking each component and only doing the copy when the number of
1425
        # children than need to apply delta's to it is > 1 or it is part of the
1426
        # final output.
1427
        version_ids = list(version_ids)
1428
        multiple_versions = len(version_ids) != 1
1756.3.12 by Aaron Bentley
Stuff all text-building data in record_map
1429
        record_map = self._get_record_map(version_ids)
1756.2.5 by Aaron Bentley
Reduced read_records calls to 1
1430
1756.2.8 by Aaron Bentley
Implement get_line_list, cleanups
1431
        text_map = {}
1756.3.7 by Aaron Bentley
Avoid re-parsing texts version components
1432
        content_map = {}
1756.3.14 by Aaron Bentley
Handle the intermediate and final representations of no-final-eol texts
1433
        final_content = {}
1756.3.10 by Aaron Bentley
Optimize selection and retrieval of records
1434
        for version_id in version_ids:
1435
            components = []
1436
            cursor = version_id
1437
            while cursor is not None:
3224.1.17 by John Arbash Meinel
Clean up some variable ordering to make more sense.
1438
                record, record_details, digest, next = record_map[cursor]
1439
                components.append((cursor, record, record_details, digest))
1756.3.10 by Aaron Bentley
Optimize selection and retrieval of records
1440
                if cursor in content_map:
1441
                    break
1442
                cursor = next
1443
1756.2.1 by Aaron Bentley
Implement get_texts
1444
            content = None
3224.1.17 by John Arbash Meinel
Clean up some variable ordering to make more sense.
1445
            for (component_id, record, record_details,
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
1446
                 digest) in reversed(components):
1756.3.7 by Aaron Bentley
Avoid re-parsing texts version components
1447
                if component_id in content_map:
1448
                    content = content_map[component_id]
1756.3.8 by Aaron Bentley
Avoid unused calls, use generators, sets instead of lists
1449
                else:
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
1450
                    content, delta = self.factory.parse_record(version_id,
3224.1.17 by John Arbash Meinel
Clean up some variable ordering to make more sense.
1451
                        record, record_details, content,
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
1452
                        copy_base_content=multiple_versions)
2921.2.1 by Robert Collins
* Knit text reconstruction now avoids making copies of the lines list for
1453
                    if multiple_versions:
1454
                        content_map[component_id] = content
1756.2.1 by Aaron Bentley
Implement get_texts
1455
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
1456
            content.cleanup_eol(copy_on_mutate=multiple_versions)
1756.3.14 by Aaron Bentley
Handle the intermediate and final representations of no-final-eol texts
1457
            final_content[version_id] = content
1756.2.1 by Aaron Bentley
Implement get_texts
1458
1459
            # digest here is the digest from the last applied component.
1756.3.6 by Aaron Bentley
More multi-text extraction
1460
            text = content.text()
2911.1.1 by Martin Pool
Better messages when problems are detected inside a knit
1461
            actual_sha = sha_strings(text)
1462
            if actual_sha != digest:
2794.1.2 by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts.
1463
                raise KnitCorrupt(self.filename,
2911.1.1 by Martin Pool
Better messages when problems are detected inside a knit
1464
                    '\n  sha-1 %s'
1465
                    '\n  of reconstructed text does not match'
1466
                    '\n  expected %s'
1467
                    '\n  for version %s' %
1468
                    (actual_sha, digest, version_id))
2794.1.2 by Robert Collins
Nuke versioned file add/get delta support, allowing easy simplification of unannotated Content, reducing memory copies and friction during commit on unannotated texts.
1469
            text_map[version_id] = text
1470
        return text_map, final_content
1756.2.1 by Aaron Bentley
Implement get_texts
1471
2039.1.1 by Aaron Bentley
Clean up progress properly when interrupted during fetch (#54000)
1472
    def iter_lines_added_or_present_in_versions(self, version_ids=None, 
1473
                                                pb=None):
1594.2.6 by Robert Collins
Introduce a api specifically for looking at lines in some versions of the inventory, for fileid_involved.
1474
        """See VersionedFile.iter_lines_added_or_present_in_versions()."""
1475
        if version_ids is None:
1476
            version_ids = self.versions()
2039.1.1 by Aaron Bentley
Clean up progress properly when interrupted during fetch (#54000)
1477
        if pb is None:
1478
            pb = progress.DummyProgress()
1759.2.2 by Jelmer Vernooij
Revert some of my spelling fixes and fix some typos after review by Aaron.
1479
        # we don't care about inclusions, the caller cares.
1594.2.6 by Robert Collins
Introduce a api specifically for looking at lines in some versions of the inventory, for fileid_involved.
1480
        # but we need to setup a list of records to visit.
1481
        # we need version_id, position, length
1482
        version_id_records = []
2163.1.1 by John Arbash Meinel
Use a set to make iter_lines_added_or_present *much* faster
1483
        requested_versions = set(version_ids)
1594.3.1 by Robert Collins
Merge transaction finalisation and ensure iter_lines_added_or_present in knits does a old-to-new read in the knit.
1484
        # filter for available versions
2698.2.4 by Robert Collins
Remove full history scan during iter_lines_added_or_present in KnitVersionedFile.
1485
        for version_id in requested_versions:
1594.2.6 by Robert Collins
Introduce a api specifically for looking at lines in some versions of the inventory, for fileid_involved.
1486
            if not self.has_version(version_id):
1487
                raise RevisionNotPresent(version_id, self.filename)
1594.3.1 by Robert Collins
Merge transaction finalisation and ensure iter_lines_added_or_present in knits does a old-to-new read in the knit.
1488
        # get a in-component-order queue:
1489
        for version_id in self.versions():
1490
            if version_id in requested_versions:
2592.3.71 by Robert Collins
Basic version of knit-based repository operating, many tests failing.
1491
                index_memo = self._index.get_position(version_id)
1492
                version_id_records.append((version_id, index_memo))
1594.3.1 by Robert Collins
Merge transaction finalisation and ensure iter_lines_added_or_present in knits does a old-to-new read in the knit.
1493
1594.2.17 by Robert Collins
Better readv coalescing, now with test, and progress during knit index reading.
1494
        total = len(version_id_records)
2147.1.3 by John Arbash Meinel
In knit.py we were re-using a variable in 2 loops, causing bogus progress messages to be generated.
1495
        for version_idx, (version_id, data, sha_value) in \
1496
            enumerate(self._data.read_records_iter(version_id_records)):
1497
            pb.update('Walking content.', version_idx, total)
2039.1.1 by Aaron Bentley
Clean up progress properly when interrupted during fetch (#54000)
1498
            method = self._index.get_method(version_id)
2163.1.7 by John Arbash Meinel
Switch the line iterator as suggested by Aaron Bentley
1499
2039.1.1 by Aaron Bentley
Clean up progress properly when interrupted during fetch (#54000)
1500
            assert method in ('fulltext', 'line-delta')
1501
            if method == 'fulltext':
2163.1.7 by John Arbash Meinel
Switch the line iterator as suggested by Aaron Bentley
1502
                line_iterator = self.factory.get_fulltext_content(data)
2039.1.1 by Aaron Bentley
Clean up progress properly when interrupted during fetch (#54000)
1503
            else:
2163.1.7 by John Arbash Meinel
Switch the line iterator as suggested by Aaron Bentley
1504
                line_iterator = self.factory.get_linedelta_content(data)
2975.3.1 by Robert Collins
Change (without backwards compatibility) the
1505
            # XXX: It might be more efficient to yield (version_id,
1506
            # line_iterator) in the future. However for now, this is a simpler
1507
            # change to integrate into the rest of the codebase. RBC 20071110
2163.1.7 by John Arbash Meinel
Switch the line iterator as suggested by Aaron Bentley
1508
            for line in line_iterator:
2975.3.1 by Robert Collins
Change (without backwards compatibility) the
1509
                yield line, version_id
2163.1.7 by John Arbash Meinel
Switch the line iterator as suggested by Aaron Bentley
1510
2039.1.1 by Aaron Bentley
Clean up progress properly when interrupted during fetch (#54000)
1511
        pb.update('Walking content.', total, total)
1594.2.6 by Robert Collins
Introduce a api specifically for looking at lines in some versions of the inventory, for fileid_involved.
1512
        
1563.2.18 by Robert Collins
get knit repositories really using knits for text storage.
1513
    def num_versions(self):
1514
        """See VersionedFile.num_versions()."""
1515
        return self._index.num_versions()
1516
1517
    __len__ = num_versions
1518
3316.2.13 by Robert Collins
* ``VersionedFile.annotate_iter`` is deprecated. While in principal this
1519
    def annotate(self, version_id):
1520
        """See VersionedFile.annotate."""
1521
        return self.factory.annotate(self, version_id)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1522
3287.5.1 by Robert Collins
Add VersionedFile.get_parent_map.
1523
    def get_parent_map(self, version_ids):
1524
        """See VersionedFile.get_parent_map."""
3287.5.5 by Robert Collins
Refactor internals of knit implementations to implement get_parents_with_ghosts in terms of get_parent_map.
1525
        return self._index.get_parent_map(version_ids)
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1526
2530.1.1 by Aaron Bentley
Make topological sorting optional for get_ancestry
1527
    def get_ancestry(self, versions, topo_sorted=True):
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1528
        """See VersionedFile.get_ancestry."""
1529
        if isinstance(versions, basestring):
1530
            versions = [versions]
1531
        if not versions:
1532
            return []
2530.1.1 by Aaron Bentley
Make topological sorting optional for get_ancestry
1533
        return self._index.get_ancestry(versions, topo_sorted)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1534
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1535
    def get_ancestry_with_ghosts(self, versions):
1536
        """See VersionedFile.get_ancestry_with_ghosts."""
1537
        if isinstance(versions, basestring):
1538
            versions = [versions]
1539
        if not versions:
1540
            return []
1541
        return self._index.get_ancestry_with_ghosts(versions)
1542
1664.2.3 by Aaron Bentley
Add failing test case
1543
    def plan_merge(self, ver_a, ver_b):
1664.2.11 by Aaron Bentley
Clarifications from merge review
1544
        """See VersionedFile.plan_merge."""
2490.2.33 by Aaron Bentley
Disable topological sorting of get_ancestry where sensible
1545
        ancestors_b = set(self.get_ancestry(ver_b, topo_sorted=False))
1546
        ancestors_a = set(self.get_ancestry(ver_a, topo_sorted=False))
1664.2.4 by Aaron Bentley
Identify unchanged lines correctly
1547
        annotated_a = self.annotate(ver_a)
1548
        annotated_b = self.annotate(ver_b)
1551.15.46 by Aaron Bentley
Move plan merge to tree
1549
        return merge._plan_annotate_merge(annotated_a, annotated_b,
1550
                                          ancestors_a, ancestors_b)
1664.2.4 by Aaron Bentley
Identify unchanged lines correctly
1551
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1552
1553
class _KnitComponentFile(object):
1554
    """One of the files used to implement a knit database"""
1555
1946.2.1 by John Arbash Meinel
2 changes to knits. Delay creating the .knit or .kndx file until we have actually tried to write data. Because of this, we must allow the Knit to create the prefix directories
1556
    def __init__(self, transport, filename, mode, file_mode=None,
1946.2.12 by John Arbash Meinel
Add ability to pass a directory mode to non_atomic_put
1557
                 create_parent_dir=False, dir_mode=None):
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1558
        self._transport = transport
1559
        self._filename = filename
1560
        self._mode = mode
1946.2.3 by John Arbash Meinel
Pass around the file mode correctly
1561
        self._file_mode = file_mode
1946.2.12 by John Arbash Meinel
Add ability to pass a directory mode to non_atomic_put
1562
        self._dir_mode = dir_mode
1946.2.1 by John Arbash Meinel
2 changes to knits. Delay creating the .knit or .kndx file until we have actually tried to write data. Because of this, we must allow the Knit to create the prefix directories
1563
        self._create_parent_dir = create_parent_dir
1564
        self._need_to_create = False
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1565
2196.2.5 by John Arbash Meinel
Add an exception class when the knit index storage method is unknown, and properly test for it
1566
    def _full_path(self):
1567
        """Return the full path to this file."""
1568
        return self._transport.base + self._filename
1569
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1570
    def check_header(self, fp):
1641.1.2 by Robert Collins
Change knit index files to be robust in the presence of partial writes.
1571
        line = fp.readline()
2171.1.1 by John Arbash Meinel
Knit index files should ignore empty indexes rather than consider them corrupt.
1572
        if line == '':
1573
            # An empty file can actually be treated as though the file doesn't
1574
            # exist yet.
2196.2.5 by John Arbash Meinel
Add an exception class when the knit index storage method is unknown, and properly test for it
1575
            raise errors.NoSuchFile(self._full_path())
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1576
        if line != self.HEADER:
2171.1.1 by John Arbash Meinel
Knit index files should ignore empty indexes rather than consider them corrupt.
1577
            raise KnitHeaderError(badline=line,
1578
                              filename=self._transport.abspath(self._filename))
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1579
1580
    def __repr__(self):
1581
        return '%s(%s)' % (self.__class__.__name__, self._filename)
1582
1583
1584
class _KnitIndex(_KnitComponentFile):
1585
    """Manages knit index file.
1586
1587
    The index is already kept in memory and read on startup, to enable
1588
    fast lookups of revision information.  The cursor of the index
1589
    file is always pointing to the end, making it easy to append
1590
    entries.
1591
1592
    _cache is a cache for fast mapping from version id to a Index
1593
    object.
1594
1595
    _history is a cache for fast mapping from indexes to version ids.
1596
1597
    The index data format is dictionary compressed when it comes to
1598
    parent references; a index entry may only have parents that with a
1599
    lover index number.  As a result, the index is topological sorted.
1563.2.11 by Robert Collins
Consolidate reweave and join as we have no separate usage, make reweave tests apply to all versionedfile implementations and deprecate the old reweave apis.
1600
1601
    Duplicate entries may be written to the index for a single version id
1602
    if this is done then the latter one completely replaces the former:
1603
    this allows updates to correct version and parent information. 
1604
    Note that the two entries may share the delta, and that successive
1605
    annotations and references MUST point to the first entry.
1641.1.2 by Robert Collins
Change knit index files to be robust in the presence of partial writes.
1606
1607
    The index file on disc contains a header, followed by one line per knit
1608
    record. The same revision can be present in an index file more than once.
1759.2.1 by Jelmer Vernooij
Fix some types (found using aspell).
1609
    The first occurrence gets assigned a sequence number starting from 0. 
1641.1.2 by Robert Collins
Change knit index files to be robust in the presence of partial writes.
1610
    
1611
    The format of a single line is
1612
    REVISION_ID FLAGS BYTE_OFFSET LENGTH( PARENT_ID|PARENT_SEQUENCE_ID)* :\n
1613
    REVISION_ID is a utf8-encoded revision id
1614
    FLAGS is a comma separated list of flags about the record. Values include 
1615
        no-eol, line-delta, fulltext.
1616
    BYTE_OFFSET is the ascii representation of the byte offset in the data file
1617
        that the the compressed data starts at.
1618
    LENGTH is the ascii representation of the length of the data file.
1619
    PARENT_ID a utf-8 revision id prefixed by a '.' that is a parent of
1620
        REVISION_ID.
1621
    PARENT_SEQUENCE_ID the ascii representation of the sequence number of a
1622
        revision id already in the knit that is a parent of REVISION_ID.
1623
    The ' :' marker is the end of record marker.
1624
    
1625
    partial writes:
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1626
    when a write is interrupted to the index file, it will result in a line
1627
    that does not end in ' :'. If the ' :' is not present at the end of a line,
1628
    or at the end of the file, then the record that is missing it will be
1629
    ignored by the parser.
1641.1.2 by Robert Collins
Change knit index files to be robust in the presence of partial writes.
1630
1759.2.1 by Jelmer Vernooij
Fix some types (found using aspell).
1631
    When writing new records to the index file, the data is preceded by '\n'
1641.1.2 by Robert Collins
Change knit index files to be robust in the presence of partial writes.
1632
    to ensure that records always start on new lines even if the last write was
1633
    interrupted. As a result its normal for the last line in the index to be
1634
    missing a trailing newline. One can be added with no harmful effects.
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1635
    """
1636
1666.1.6 by Robert Collins
Make knit the default format.
1637
    HEADER = "# bzr knit index 8\n"
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1638
1596.2.18 by Robert Collins
More microopimisations on index reading, now down to 16000 records/seconds.
1639
    # speed of knit parsing went from 280 ms to 280 ms with slots addition.
1640
    # __slots__ = ['_cache', '_history', '_transport', '_filename']
1641
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1642
    def _cache_version(self, version_id, options, pos, size, parents):
1596.2.18 by Robert Collins
More microopimisations on index reading, now down to 16000 records/seconds.
1643
        """Cache a version record in the history array and index cache.
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1644
1645
        This is inlined into _load_data for performance. KEEP IN SYNC.
1596.2.18 by Robert Collins
More microopimisations on index reading, now down to 16000 records/seconds.
1646
        (It saves 60ms, 25% of the __init__ overhead on local 4000 record
1647
         indexes).
1648
        """
1596.2.14 by Robert Collins
Make knit parsing non quadratic?
1649
        # only want the _history index to reference the 1st index entry
1650
        # for version_id
1596.2.18 by Robert Collins
More microopimisations on index reading, now down to 16000 records/seconds.
1651
        if version_id not in self._cache:
1628.1.1 by Robert Collins
Cache the index number of versions in the knit index's self._cache so that
1652
            index = len(self._history)
1596.2.14 by Robert Collins
Make knit parsing non quadratic?
1653
            self._history.append(version_id)
1628.1.1 by Robert Collins
Cache the index number of versions in the knit index's self._cache so that
1654
        else:
1655
            index = self._cache[version_id][5]
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1656
        self._cache[version_id] = (version_id,
1628.1.1 by Robert Collins
Cache the index number of versions in the knit index's self._cache so that
1657
                                   options,
1658
                                   pos,
1659
                                   size,
1660
                                   parents,
1661
                                   index)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1662
3316.2.3 by Robert Collins
Remove manual notification of transaction finishing on versioned files.
1663
    def _check_write_ok(self):
3316.2.5 by Robert Collins
Review feedback.
1664
        if self._get_scope() != self._scope:
3316.2.3 by Robert Collins
Remove manual notification of transaction finishing on versioned files.
1665
            raise errors.OutSideTransaction()
1666
        if self._mode != 'w':
1667
            raise errors.ReadOnlyObjectDirtiedError(self)
1668
1946.2.1 by John Arbash Meinel
2 changes to knits. Delay creating the .knit or .kndx file until we have actually tried to write data. Because of this, we must allow the Knit to create the prefix directories
1669
    def __init__(self, transport, filename, mode, create=False, file_mode=None,
3316.2.3 by Robert Collins
Remove manual notification of transaction finishing on versioned files.
1670
        create_parent_dir=False, delay_create=False, dir_mode=None,
1671
        get_scope=None):
1946.2.12 by John Arbash Meinel
Add ability to pass a directory mode to non_atomic_put
1672
        _KnitComponentFile.__init__(self, transport, filename, mode,
1673
                                    file_mode=file_mode,
1674
                                    create_parent_dir=create_parent_dir,
1675
                                    dir_mode=dir_mode)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1676
        self._cache = {}
1563.2.11 by Robert Collins
Consolidate reweave and join as we have no separate usage, make reweave tests apply to all versionedfile implementations and deprecate the old reweave apis.
1677
        # position in _history is the 'official' index for a revision
1678
        # but the values may have come from a newer entry.
1759.2.1 by Jelmer Vernooij
Fix some types (found using aspell).
1679
        # so - wc -l of a knit index is != the number of unique names
1773.4.1 by Martin Pool
Add pyflakes makefile target; fix many warnings
1680
        # in the knit.
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1681
        self._history = []
1682
        try:
2247.2.1 by John Arbash Meinel
Don't create pb for simple knit reading.
1683
            fp = self._transport.get(self._filename)
1594.2.17 by Robert Collins
Better readv coalescing, now with test, and progress during knit index reading.
1684
            try:
2247.2.1 by John Arbash Meinel
Don't create pb for simple knit reading.
1685
                # _load_data may raise NoSuchFile if the target knit is
1686
                # completely empty.
2484.1.1 by John Arbash Meinel
Add an initial function to read knit indexes in pyrex.
1687
                _load_data(self, fp)
2247.2.1 by John Arbash Meinel
Don't create pb for simple knit reading.
1688
            finally:
1689
                fp.close()
1690
        except NoSuchFile:
1691
            if mode != 'w' or not create:
1692
                raise
1693
            elif delay_create:
1694
                self._need_to_create = True
1695
            else:
1696
                self._transport.put_bytes_non_atomic(
1697
                    self._filename, self.HEADER, mode=self._file_mode)
3316.2.5 by Robert Collins
Review feedback.
1698
        self._scope = get_scope()
1699
        self._get_scope = get_scope
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1700
2530.1.1 by Aaron Bentley
Make topological sorting optional for get_ancestry
1701
    def get_ancestry(self, versions, topo_sorted=True):
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1702
        """See VersionedFile.get_ancestry."""
1563.2.35 by Robert Collins
cleanup deprecation warnings and finish conversion so the inventory is knit based too.
1703
        # get a graph of all the mentioned versions:
1704
        graph = {}
1705
        pending = set(versions)
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1706
        cache = self._cache
1707
        while pending:
1563.2.35 by Robert Collins
cleanup deprecation warnings and finish conversion so the inventory is knit based too.
1708
            version = pending.pop()
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1709
            # trim ghosts
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1710
            try:
1711
                parents = [p for p in cache[version][4] if p in cache]
1712
            except KeyError:
1713
                raise RevisionNotPresent(version, self._filename)
1714
            # if not completed and not a ghost
1715
            pending.update([p for p in parents if p not in graph])
1563.2.35 by Robert Collins
cleanup deprecation warnings and finish conversion so the inventory is knit based too.
1716
            graph[version] = parents
2530.1.1 by Aaron Bentley
Make topological sorting optional for get_ancestry
1717
        if not topo_sorted:
1718
            return graph.keys()
1563.2.35 by Robert Collins
cleanup deprecation warnings and finish conversion so the inventory is knit based too.
1719
        return topo_sort(graph.items())
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1720
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1721
    def get_ancestry_with_ghosts(self, versions):
1722
        """See VersionedFile.get_ancestry_with_ghosts."""
1723
        # get a graph of all the mentioned versions:
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1724
        self.check_versions_present(versions)
1725
        cache = self._cache
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1726
        graph = {}
1727
        pending = set(versions)
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1728
        while pending:
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1729
            version = pending.pop()
1730
            try:
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1731
                parents = cache[version][4]
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1732
            except KeyError:
1733
                # ghost, fake it
1734
                graph[version] = []
1735
            else:
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1736
                # if not completed
1737
                pending.update([p for p in parents if p not in graph])
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1738
                graph[version] = parents
1739
        return topo_sort(graph.items())
1740
3218.1.1 by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries.
1741
    def get_build_details(self, version_ids):
1742
        """Get the method, index_memo and compression parent for version_ids.
1743
3224.1.29 by John Arbash Meinel
Properly handle annotating when ghosts are present.
1744
        Ghosts are omitted from the result.
1745
3218.1.1 by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries.
1746
        :param version_ids: An iterable of version_ids.
3224.1.14 by John Arbash Meinel
Switch to making content_details opaque, step 1
1747
        :return: A dict of version_id:(index_memo, compression_parent,
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
1748
                                       parents, record_details).
3224.1.14 by John Arbash Meinel
Switch to making content_details opaque, step 1
1749
            index_memo
1750
                opaque structure to pass to read_records to extract the raw
1751
                data
1752
            compression_parent
1753
                Content that this record is built upon, may be None
1754
            parents
1755
                Logical parents of this node
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
1756
            record_details
3224.1.14 by John Arbash Meinel
Switch to making content_details opaque, step 1
1757
                extra information about the content which needs to be passed to
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
1758
                Factory.parse_record
3218.1.1 by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries.
1759
        """
1760
        result = {}
1761
        for version_id in version_ids:
3224.1.29 by John Arbash Meinel
Properly handle annotating when ghosts are present.
1762
            if version_id not in self._cache:
1763
                # ghosts are omitted
1764
                continue
3218.1.1 by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries.
1765
            method = self.get_method(version_id)
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
1766
            parents = self.get_parents_with_ghosts(version_id)
3218.1.1 by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries.
1767
            if method == 'fulltext':
1768
                compression_parent = None
1769
            else:
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
1770
                compression_parent = parents[0]
3224.1.8 by John Arbash Meinel
Add noeol to the return signature of get_build_details.
1771
            noeol = 'no-eol' in self.get_options(version_id)
3218.1.1 by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries.
1772
            index_memo = self.get_position(version_id)
3224.1.14 by John Arbash Meinel
Switch to making content_details opaque, step 1
1773
            result[version_id] = (index_memo, compression_parent,
1774
                                  parents, (method, noeol))
3218.1.1 by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries.
1775
        return result
1776
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1777
    def num_versions(self):
1778
        return len(self._history)
1779
1780
    __len__ = num_versions
1781
1782
    def get_versions(self):
2592.3.6 by Robert Collins
Implement KnitGraphIndex.get_versions.
1783
        """Get all the versions in the file. not topologically sorted."""
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1784
        return self._history
1785
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1786
    def _version_list_to_index(self, versions):
1787
        result_list = []
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1788
        cache = self._cache
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1789
        for version in versions:
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1790
            if version in cache:
1628.1.1 by Robert Collins
Cache the index number of versions in the knit index's self._cache so that
1791
                # -- inlined lookup() --
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1792
                result_list.append(str(cache[version][5]))
1628.1.1 by Robert Collins
Cache the index number of versions in the knit index's self._cache so that
1793
                # -- end lookup () --
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1794
            else:
2249.5.15 by John Arbash Meinel
remove get_cached_utf8 checks which were slowing things down.
1795
                result_list.append('.' + version)
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1796
        return ' '.join(result_list)
1797
2592.3.71 by Robert Collins
Basic version of knit-based repository operating, many tests failing.
1798
    def add_version(self, version_id, options, index_memo, parents):
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1799
        """Add a version record to the index."""
2592.3.71 by Robert Collins
Basic version of knit-based repository operating, many tests failing.
1800
        self.add_versions(((version_id, options, index_memo, parents),))
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1801
2841.2.1 by Robert Collins
* Commit no longer checks for new text keys during insertion when the
1802
    def add_versions(self, versions, random_id=False):
1692.2.1 by Robert Collins
Fix knit based push to only perform 2 appends to the target, rather that 2*new-versions.
1803
        """Add multiple versions to the index.
1804
        
1805
        :param versions: a list of tuples:
1806
                         (version_id, options, pos, size, parents).
2841.2.1 by Robert Collins
* Commit no longer checks for new text keys during insertion when the
1807
        :param random_id: If True the ids being added were randomly generated
1808
            and no check for existence will be performed.
1692.2.1 by Robert Collins
Fix knit based push to only perform 2 appends to the target, rather that 2*new-versions.
1809
        """
1810
        lines = []
2102.2.1 by John Arbash Meinel
Fix bug #64789 _KnitIndex.add_versions() should dict compress new revisions
1811
        orig_history = self._history[:]
1812
        orig_cache = self._cache.copy()
1813
1814
        try:
2592.3.71 by Robert Collins
Basic version of knit-based repository operating, many tests failing.
1815
            for version_id, options, (index, pos, size), parents in versions:
2249.5.15 by John Arbash Meinel
remove get_cached_utf8 checks which were slowing things down.
1816
                line = "\n%s %s %s %s %s :" % (version_id,
2102.2.1 by John Arbash Meinel
Fix bug #64789 _KnitIndex.add_versions() should dict compress new revisions
1817
                                               ','.join(options),
1818
                                               pos,
1819
                                               size,
1820
                                               self._version_list_to_index(parents))
1821
                assert isinstance(line, str), \
1822
                    'content must be utf-8 encoded: %r' % (line,)
1823
                lines.append(line)
3287.5.5 by Robert Collins
Refactor internals of knit implementations to implement get_parents_with_ghosts in terms of get_parent_map.
1824
                self._cache_version(version_id, options, pos, size, tuple(parents))
2102.2.1 by John Arbash Meinel
Fix bug #64789 _KnitIndex.add_versions() should dict compress new revisions
1825
            if not self._need_to_create:
1826
                self._transport.append_bytes(self._filename, ''.join(lines))
1827
            else:
1828
                sio = StringIO()
1829
                sio.write(self.HEADER)
1830
                sio.writelines(lines)
1831
                sio.seek(0)
1832
                self._transport.put_file_non_atomic(self._filename, sio,
1833
                                    create_parent_dir=self._create_parent_dir,
1834
                                    mode=self._file_mode,
1835
                                    dir_mode=self._dir_mode)
1836
                self._need_to_create = False
1837
        except:
1838
            # If any problems happen, restore the original values and re-raise
1839
            self._history = orig_history
1840
            self._cache = orig_cache
1841
            raise
1842
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1843
    def has_version(self, version_id):
1844
        """True if the version is in the index."""
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1845
        return version_id in self._cache
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1846
1847
    def get_position(self, version_id):
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
1848
        """Return details needed to access the version.
1849
        
1850
        .kndx indices do not support split-out data, so return None for the 
1851
        index field.
1852
1853
        :return: a tuple (None, data position, size) to hand to the access
1854
            logic to get the record.
1855
        """
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1856
        entry = self._cache[version_id]
2592.3.71 by Robert Collins
Basic version of knit-based repository operating, many tests failing.
1857
        return None, entry[2], entry[3]
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1858
1859
    def get_method(self, version_id):
1860
        """Return compression method of specified version."""
2592.3.97 by Robert Collins
Merge more bzr.dev, addressing some bugs. [still broken]
1861
        try:
1862
            options = self._cache[version_id][1]
1863
        except KeyError:
1864
            raise RevisionNotPresent(version_id, self._filename)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1865
        if 'fulltext' in options:
1866
            return 'fulltext'
1867
        else:
2196.2.5 by John Arbash Meinel
Add an exception class when the knit index storage method is unknown, and properly test for it
1868
            if 'line-delta' not in options:
1869
                raise errors.KnitIndexUnknownMethod(self._full_path(), options)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1870
            return 'line-delta'
1871
1872
    def get_options(self, version_id):
3052.2.5 by Andrew Bennetts
Address the rest of the review comments from John and myself.
1873
        """Return a list representing options.
2592.3.14 by Robert Collins
Implement KnitGraphIndex.get_options.
1874
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
1875
        e.g. ['foo', 'bar']
2592.3.14 by Robert Collins
Implement KnitGraphIndex.get_options.
1876
        """
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1877
        return self._cache[version_id][1]
1878
3287.5.5 by Robert Collins
Refactor internals of knit implementations to implement get_parents_with_ghosts in terms of get_parent_map.
1879
    def get_parent_map(self, version_ids):
1880
        """Passed through to by KnitVersionedFile.get_parent_map."""
1881
        result = {}
1882
        for version_id in version_ids:
1883
            try:
1884
                result[version_id] = tuple(self._cache[version_id][4])
1885
            except KeyError:
1886
                pass
1887
        return result
1888
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1889
    def get_parents_with_ghosts(self, version_id):
1759.2.1 by Jelmer Vernooij
Fix some types (found using aspell).
1890
        """Return parents of specified version with ghosts."""
3287.5.5 by Robert Collins
Refactor internals of knit implementations to implement get_parents_with_ghosts in terms of get_parent_map.
1891
        try:
1892
            return self.get_parent_map([version_id])[version_id]
1893
        except KeyError:
1894
            raise RevisionNotPresent(version_id, self)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1895
1896
    def check_versions_present(self, version_ids):
1897
        """Check that all specified versions are present."""
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1898
        cache = self._cache
1899
        for version_id in version_ids:
1900
            if version_id not in cache:
1901
                raise RevisionNotPresent(version_id, self._filename)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1902
1903
2592.3.2 by Robert Collins
Implement a get_graph for a new KnitGraphIndex that will implement a KnitIndex on top of the GraphIndex API.
1904
class KnitGraphIndex(object):
1905
    """A knit index that builds on GraphIndex."""
1906
2592.3.34 by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs.
1907
    def __init__(self, graph_index, deltas=False, parents=True, add_callback=None):
2592.3.2 by Robert Collins
Implement a get_graph for a new KnitGraphIndex that will implement a KnitIndex on top of the GraphIndex API.
1908
        """Construct a KnitGraphIndex on a graph_index.
1909
1910
        :param graph_index: An implementation of bzrlib.index.GraphIndex.
2592.3.13 by Robert Collins
Implement KnitGraphIndex.get_method.
1911
        :param deltas: Allow delta-compressed records.
2592.3.19 by Robert Collins
Change KnitGraphIndex from returning data to performing a callback on insertions.
1912
        :param add_callback: If not None, allow additions to the index and call
1913
            this callback with a list of added GraphIndex nodes:
2592.3.33 by Robert Collins
Change the order of index refs and values to make the no-graph knit index easier.
1914
            [(node, value, node_refs), ...]
2592.3.34 by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs.
1915
        :param parents: If True, record knits parents, if not do not record 
1916
            parents.
2592.3.2 by Robert Collins
Implement a get_graph for a new KnitGraphIndex that will implement a KnitIndex on top of the GraphIndex API.
1917
        """
1918
        self._graph_index = graph_index
2592.3.13 by Robert Collins
Implement KnitGraphIndex.get_method.
1919
        self._deltas = deltas
2592.3.19 by Robert Collins
Change KnitGraphIndex from returning data to performing a callback on insertions.
1920
        self._add_callback = add_callback
2592.3.34 by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs.
1921
        self._parents = parents
1922
        if deltas and not parents:
1923
            raise KnitCorrupt(self, "Cannot do delta compression without "
1924
                "parent tracking.")
2592.3.2 by Robert Collins
Implement a get_graph for a new KnitGraphIndex that will implement a KnitIndex on top of the GraphIndex API.
1925
3316.2.3 by Robert Collins
Remove manual notification of transaction finishing on versioned files.
1926
    def _check_write_ok(self):
1927
        pass
1928
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
1929
    def _get_entries(self, keys, check_present=False):
1930
        """Get the entries for keys.
1931
        
1932
        :param keys: An iterable of index keys, - 1-tuples.
1933
        """
1934
        keys = set(keys)
2592.3.43 by Robert Collins
A knit iter_parents API.
1935
        found_keys = set()
2592.3.34 by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs.
1936
        if self._parents:
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
1937
            for node in self._graph_index.iter_entries(keys):
2592.3.34 by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs.
1938
                yield node
2624.2.14 by Robert Collins
Add source index to the index iteration API to allow mapping back to the origin of retrieved data.
1939
                found_keys.add(node[1])
2592.3.34 by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs.
1940
        else:
1941
            # adapt parentless index to the rest of the code.
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
1942
            for node in self._graph_index.iter_entries(keys):
2624.2.14 by Robert Collins
Add source index to the index iteration API to allow mapping back to the origin of retrieved data.
1943
                yield node[0], node[1], node[2], ()
1944
                found_keys.add(node[1])
2592.3.43 by Robert Collins
A knit iter_parents API.
1945
        if check_present:
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
1946
            missing_keys = keys.difference(found_keys)
2592.3.43 by Robert Collins
A knit iter_parents API.
1947
            if missing_keys:
1948
                raise RevisionNotPresent(missing_keys.pop(), self)
2592.3.17 by Robert Collins
Add add_version(s) to KnitGraphIndex, completing the required api for KnitVersionedFile.
1949
1950
    def _present_keys(self, version_ids):
1951
        return set([
2624.2.14 by Robert Collins
Add source index to the index iteration API to allow mapping back to the origin of retrieved data.
1952
            node[1] for node in self._get_entries(version_ids)])
2592.3.17 by Robert Collins
Add add_version(s) to KnitGraphIndex, completing the required api for KnitVersionedFile.
1953
2592.3.34 by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs.
1954
    def _parentless_ancestry(self, versions):
1955
        """Honour the get_ancestry API for parentless knit indices."""
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
1956
        wanted_keys = self._version_ids_to_keys(versions)
1957
        present_keys = self._present_keys(wanted_keys)
1958
        missing = set(wanted_keys).difference(present_keys)
2592.3.34 by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs.
1959
        if missing:
1960
            raise RevisionNotPresent(missing.pop(), self)
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
1961
        return list(self._keys_to_version_ids(present_keys))
2592.3.34 by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs.
1962
2592.3.4 by Robert Collins
Implement get_ancestry/get_ancestry_with_ghosts for KnitGraphIndex.
1963
    def get_ancestry(self, versions, topo_sorted=True):
1964
        """See VersionedFile.get_ancestry."""
2592.3.34 by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs.
1965
        if not self._parents:
1966
            return self._parentless_ancestry(versions)
2592.3.4 by Robert Collins
Implement get_ancestry/get_ancestry_with_ghosts for KnitGraphIndex.
1967
        # XXX: This will do len(history) index calls - perhaps
1968
        # it should be altered to be a index core feature?
1969
        # get a graph of all the mentioned versions:
1970
        graph = {}
2592.3.30 by Robert Collins
Make GraphKnitIndex get_ancestry the same as regular knits.
1971
        ghosts = set()
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
1972
        versions = self._version_ids_to_keys(versions)
2592.3.4 by Robert Collins
Implement get_ancestry/get_ancestry_with_ghosts for KnitGraphIndex.
1973
        pending = set(versions)
1974
        while pending:
1975
            # get all pending nodes
1976
            this_iteration = pending
2592.3.17 by Robert Collins
Add add_version(s) to KnitGraphIndex, completing the required api for KnitVersionedFile.
1977
            new_nodes = self._get_entries(this_iteration)
2592.3.53 by Robert Collins
Remove usage of difference_update in knit.py.
1978
            found = set()
2592.3.4 by Robert Collins
Implement get_ancestry/get_ancestry_with_ghosts for KnitGraphIndex.
1979
            pending = set()
2624.2.14 by Robert Collins
Add source index to the index iteration API to allow mapping back to the origin of retrieved data.
1980
            for (index, key, value, node_refs) in new_nodes:
2592.3.30 by Robert Collins
Make GraphKnitIndex get_ancestry the same as regular knits.
1981
                # dont ask for ghosties - otherwise
1982
                # we we can end up looping with pending
1983
                # being entirely ghosted.
1984
                graph[key] = [parent for parent in node_refs[0]
1985
                    if parent not in ghosts]
2592.3.53 by Robert Collins
Remove usage of difference_update in knit.py.
1986
                # queue parents
1987
                for parent in graph[key]:
1988
                    # dont examine known nodes again
1989
                    if parent in graph:
1990
                        continue
1991
                    pending.add(parent)
1992
                found.add(key)
1993
            ghosts.update(this_iteration.difference(found))
2592.3.30 by Robert Collins
Make GraphKnitIndex get_ancestry the same as regular knits.
1994
        if versions.difference(graph):
1995
            raise RevisionNotPresent(versions.difference(graph).pop(), self)
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
1996
        if topo_sorted:
1997
            result_keys = topo_sort(graph.items())
1998
        else:
1999
            result_keys = graph.iterkeys()
2000
        return [key[0] for key in result_keys]
2592.3.4 by Robert Collins
Implement get_ancestry/get_ancestry_with_ghosts for KnitGraphIndex.
2001
2002
    def get_ancestry_with_ghosts(self, versions):
2003
        """See VersionedFile.get_ancestry."""
2592.3.34 by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs.
2004
        if not self._parents:
2005
            return self._parentless_ancestry(versions)
2592.3.4 by Robert Collins
Implement get_ancestry/get_ancestry_with_ghosts for KnitGraphIndex.
2006
        # XXX: This will do len(history) index calls - perhaps
2007
        # it should be altered to be a index core feature?
2008
        # get a graph of all the mentioned versions:
2009
        graph = {}
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
2010
        versions = self._version_ids_to_keys(versions)
2592.3.4 by Robert Collins
Implement get_ancestry/get_ancestry_with_ghosts for KnitGraphIndex.
2011
        pending = set(versions)
2012
        while pending:
2013
            # get all pending nodes
2014
            this_iteration = pending
2592.3.17 by Robert Collins
Add add_version(s) to KnitGraphIndex, completing the required api for KnitVersionedFile.
2015
            new_nodes = self._get_entries(this_iteration)
2592.3.4 by Robert Collins
Implement get_ancestry/get_ancestry_with_ghosts for KnitGraphIndex.
2016
            pending = set()
2624.2.14 by Robert Collins
Add source index to the index iteration API to allow mapping back to the origin of retrieved data.
2017
            for (index, key, value, node_refs) in new_nodes:
2592.3.4 by Robert Collins
Implement get_ancestry/get_ancestry_with_ghosts for KnitGraphIndex.
2018
                graph[key] = node_refs[0]
2019
                # queue parents 
2592.3.53 by Robert Collins
Remove usage of difference_update in knit.py.
2020
                for parent in graph[key]:
2021
                    # dont examine known nodes again
2022
                    if parent in graph:
2023
                        continue
2024
                    pending.add(parent)
2592.3.4 by Robert Collins
Implement get_ancestry/get_ancestry_with_ghosts for KnitGraphIndex.
2025
            missing_versions = this_iteration.difference(graph)
2592.3.34 by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs.
2026
            missing_needed = versions.intersection(missing_versions)
2027
            if missing_needed:
2028
                raise RevisionNotPresent(missing_needed.pop(), self)
2592.3.4 by Robert Collins
Implement get_ancestry/get_ancestry_with_ghosts for KnitGraphIndex.
2029
            for missing_version in missing_versions:
2030
                # add a key, no parents
2031
                graph[missing_version] = []
2592.3.53 by Robert Collins
Remove usage of difference_update in knit.py.
2032
                pending.discard(missing_version) # don't look for it
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
2033
        result_keys = topo_sort(graph.items())
2034
        return [key[0] for key in result_keys]
2592.3.4 by Robert Collins
Implement get_ancestry/get_ancestry_with_ghosts for KnitGraphIndex.
2035
3218.1.1 by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries.
2036
    def get_build_details(self, version_ids):
2037
        """Get the method, index_memo and compression parent for version_ids.
2038
3224.1.29 by John Arbash Meinel
Properly handle annotating when ghosts are present.
2039
        Ghosts are omitted from the result.
2040
3218.1.1 by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries.
2041
        :param version_ids: An iterable of version_ids.
3224.1.18 by John Arbash Meinel
Cleanup documentation
2042
        :return: A dict of version_id:(index_memo, compression_parent,
2043
                                       parents, record_details).
2044
            index_memo
2045
                opaque structure to pass to read_records to extract the raw
2046
                data
2047
            compression_parent
2048
                Content that this record is built upon, may be None
2049
            parents
2050
                Logical parents of this node
2051
            record_details
2052
                extra information about the content which needs to be passed to
2053
                Factory.parse_record
3218.1.1 by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries.
2054
        """
2055
        result = {}
2056
        entries = self._get_entries(self._version_ids_to_keys(version_ids), True)
2057
        for entry in entries:
2058
            version_id = self._keys_to_version_ids((entry[1],))[0]
3224.1.27 by John Arbash Meinel
Handle when the knit index doesn't track parents.
2059
            if not self._parents:
2060
                parents = ()
2061
            else:
2062
                parents = self._keys_to_version_ids(entry[3][0])
3218.1.1 by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries.
2063
            if not self._deltas:
2064
                compression_parent = None
2065
            else:
2066
                compression_parent_key = self._compression_parent(entry)
2067
                if compression_parent_key:
2068
                    compression_parent = self._keys_to_version_ids(
2069
                    (compression_parent_key,))[0]
2070
                else:
2071
                    compression_parent = None
3224.1.8 by John Arbash Meinel
Add noeol to the return signature of get_build_details.
2072
            noeol = (entry[2][0] == 'N')
3218.1.1 by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries.
2073
            if compression_parent:
2074
                method = 'line-delta'
2075
            else:
2076
                method = 'fulltext'
3224.1.14 by John Arbash Meinel
Switch to making content_details opaque, step 1
2077
            result[version_id] = (self._node_to_position(entry),
2078
                                  compression_parent, parents,
2079
                                  (method, noeol))
3218.1.1 by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries.
2080
        return result
2081
2082
    def _compression_parent(self, an_entry):
2083
        # return the key that an_entry is compressed against, or None
2084
        # Grab the second parent list (as deltas implies parents currently)
2085
        compression_parents = an_entry[3][1]
2086
        if not compression_parents:
2087
            return None
2088
        assert len(compression_parents) == 1
2089
        return compression_parents[0]
2090
2091
    def _get_method(self, node):
2092
        if not self._deltas:
2093
            return 'fulltext'
2094
        if self._compression_parent(node):
2095
            return 'line-delta'
2096
        else:
2097
            return 'fulltext'
2098
2592.3.5 by Robert Collins
Implement KnitGraphIndex.num_versions.
2099
    def num_versions(self):
2100
        return len(list(self._graph_index.iter_all_entries()))
2592.3.2 by Robert Collins
Implement a get_graph for a new KnitGraphIndex that will implement a KnitIndex on top of the GraphIndex API.
2101
2592.3.6 by Robert Collins
Implement KnitGraphIndex.get_versions.
2102
    __len__ = num_versions
2103
2104
    def get_versions(self):
2105
        """Get all the versions in the file. not topologically sorted."""
2624.2.14 by Robert Collins
Add source index to the index iteration API to allow mapping back to the origin of retrieved data.
2106
        return [node[1][0] for node in self._graph_index.iter_all_entries()]
2592.3.6 by Robert Collins
Implement KnitGraphIndex.get_versions.
2107
    
2592.3.9 by Robert Collins
Implement KnitGraphIndex.has_version.
2108
    def has_version(self, version_id):
2109
        """True if the version is in the index."""
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
2110
        return len(self._present_keys(self._version_ids_to_keys([version_id]))) == 1
2111
2112
    def _keys_to_version_ids(self, keys):
2113
        return tuple(key[0] for key in keys)
2592.3.6 by Robert Collins
Implement KnitGraphIndex.get_versions.
2114
2592.3.10 by Robert Collins
Implement KnitGraphIndex.get_position.
2115
    def get_position(self, version_id):
2592.3.71 by Robert Collins
Basic version of knit-based repository operating, many tests failing.
2116
        """Return details needed to access the version.
2117
        
2118
        :return: a tuple (index, data position, size) to hand to the access
2119
            logic to get the record.
2120
        """
2121
        node = self._get_node(version_id)
3218.1.1 by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries.
2122
        return self._node_to_position(node)
2123
2124
    def _node_to_position(self, node):
2125
        """Convert an index value to position details."""
2592.3.71 by Robert Collins
Basic version of knit-based repository operating, many tests failing.
2126
        bits = node[2][1:].split(' ')
2127
        return node[0], int(bits[0]), int(bits[1])
2592.3.10 by Robert Collins
Implement KnitGraphIndex.get_position.
2128
2592.3.11 by Robert Collins
Implement KnitGraphIndex.get_method.
2129
    def get_method(self, version_id):
2130
        """Return compression method of specified version."""
3218.1.1 by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries.
2131
        return self._get_method(self._get_node(version_id))
2592.3.11 by Robert Collins
Implement KnitGraphIndex.get_method.
2132
2133
    def _get_node(self, version_id):
2592.3.97 by Robert Collins
Merge more bzr.dev, addressing some bugs. [still broken]
2134
        try:
2135
            return list(self._get_entries(self._version_ids_to_keys([version_id])))[0]
2136
        except IndexError:
2137
            raise RevisionNotPresent(version_id, self)
2592.3.11 by Robert Collins
Implement KnitGraphIndex.get_method.
2138
2592.3.14 by Robert Collins
Implement KnitGraphIndex.get_options.
2139
    def get_options(self, version_id):
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
2140
        """Return a list representing options.
2592.3.14 by Robert Collins
Implement KnitGraphIndex.get_options.
2141
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
2142
        e.g. ['foo', 'bar']
2592.3.14 by Robert Collins
Implement KnitGraphIndex.get_options.
2143
        """
2144
        node = self._get_node(version_id)
3218.1.1 by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries.
2145
        options = [self._get_method(node)]
2624.2.14 by Robert Collins
Add source index to the index iteration API to allow mapping back to the origin of retrieved data.
2146
        if node[2][0] == 'N':
2592.3.14 by Robert Collins
Implement KnitGraphIndex.get_options.
2147
            options.append('no-eol')
2658.2.1 by Robert Collins
Fix mismatch between KnitGraphIndex and KnitIndex in get_options.
2148
        return options
2592.3.11 by Robert Collins
Implement KnitGraphIndex.get_method.
2149
3287.5.5 by Robert Collins
Refactor internals of knit implementations to implement get_parents_with_ghosts in terms of get_parent_map.
2150
    def get_parent_map(self, version_ids):
2151
        """Passed through to by KnitVersionedFile.get_parent_map."""
2152
        nodes = self._get_entries(self._version_ids_to_keys(version_ids))
2153
        result = {}
2154
        if self._parents:
2155
            for node in nodes:
2156
                result[node[1][0]] = self._keys_to_version_ids(node[3][0])
2157
        else:
2158
            for node in nodes:
2159
                result[node[1][0]] = ()
2160
        return result
2161
2592.3.15 by Robert Collins
Implement KnitGraphIndex.get_parents/get_parents_with_ghosts.
2162
    def get_parents_with_ghosts(self, version_id):
2163
        """Return parents of specified version with ghosts."""
3287.5.5 by Robert Collins
Refactor internals of knit implementations to implement get_parents_with_ghosts in terms of get_parent_map.
2164
        try:
2165
            return self.get_parent_map([version_id])[version_id]
2166
        except KeyError:
2167
            raise RevisionNotPresent(version_id, self)
2592.3.15 by Robert Collins
Implement KnitGraphIndex.get_parents/get_parents_with_ghosts.
2168
2592.3.16 by Robert Collins
Implement KnitGraphIndex.check_versions_present.
2169
    def check_versions_present(self, version_ids):
2170
        """Check that all specified versions are present."""
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
2171
        keys = self._version_ids_to_keys(version_ids)
2172
        present = self._present_keys(keys)
2173
        missing = keys.difference(present)
2592.3.16 by Robert Collins
Implement KnitGraphIndex.check_versions_present.
2174
        if missing:
2592.3.19 by Robert Collins
Change KnitGraphIndex from returning data to performing a callback on insertions.
2175
            raise RevisionNotPresent(missing.pop(), self)
2592.3.16 by Robert Collins
Implement KnitGraphIndex.check_versions_present.
2176
2592.3.71 by Robert Collins
Basic version of knit-based repository operating, many tests failing.
2177
    def add_version(self, version_id, options, access_memo, parents):
2592.3.17 by Robert Collins
Add add_version(s) to KnitGraphIndex, completing the required api for KnitVersionedFile.
2178
        """Add a version record to the index."""
2592.3.71 by Robert Collins
Basic version of knit-based repository operating, many tests failing.
2179
        return self.add_versions(((version_id, options, access_memo, parents),))
2592.3.17 by Robert Collins
Add add_version(s) to KnitGraphIndex, completing the required api for KnitVersionedFile.
2180
2841.2.1 by Robert Collins
* Commit no longer checks for new text keys during insertion when the
2181
    def add_versions(self, versions, random_id=False):
2592.3.17 by Robert Collins
Add add_version(s) to KnitGraphIndex, completing the required api for KnitVersionedFile.
2182
        """Add multiple versions to the index.
2183
        
2184
        This function does not insert data into the Immutable GraphIndex
2185
        backing the KnitGraphIndex, instead it prepares data for insertion by
2592.3.19 by Robert Collins
Change KnitGraphIndex from returning data to performing a callback on insertions.
2186
        the caller and checks that it is safe to insert then calls
2187
        self._add_callback with the prepared GraphIndex nodes.
2592.3.17 by Robert Collins
Add add_version(s) to KnitGraphIndex, completing the required api for KnitVersionedFile.
2188
2189
        :param versions: a list of tuples:
2190
                         (version_id, options, pos, size, parents).
2841.2.1 by Robert Collins
* Commit no longer checks for new text keys during insertion when the
2191
        :param random_id: If True the ids being added were randomly generated
2192
            and no check for existence will be performed.
2592.3.17 by Robert Collins
Add add_version(s) to KnitGraphIndex, completing the required api for KnitVersionedFile.
2193
        """
2592.3.19 by Robert Collins
Change KnitGraphIndex from returning data to performing a callback on insertions.
2194
        if not self._add_callback:
2195
            raise errors.ReadOnlyError(self)
2592.3.17 by Robert Collins
Add add_version(s) to KnitGraphIndex, completing the required api for KnitVersionedFile.
2196
        # we hope there are no repositories with inconsistent parentage
2197
        # anymore.
2198
        # check for dups
2199
2200
        keys = {}
2592.3.71 by Robert Collins
Basic version of knit-based repository operating, many tests failing.
2201
        for (version_id, options, access_memo, parents) in versions:
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
2202
            index, pos, size = access_memo
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
2203
            key = (version_id, )
2204
            parents = tuple((parent, ) for parent in parents)
2592.3.17 by Robert Collins
Add add_version(s) to KnitGraphIndex, completing the required api for KnitVersionedFile.
2205
            if 'no-eol' in options:
2206
                value = 'N'
2207
            else:
2208
                value = ' '
2209
            value += "%d %d" % (pos, size)
2592.3.34 by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs.
2210
            if not self._deltas:
2592.3.17 by Robert Collins
Add add_version(s) to KnitGraphIndex, completing the required api for KnitVersionedFile.
2211
                if 'line-delta' in options:
2212
                    raise KnitCorrupt(self, "attempt to add line-delta in non-delta knit")
2592.3.34 by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs.
2213
            if self._parents:
2214
                if self._deltas:
2215
                    if 'line-delta' in options:
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
2216
                        node_refs = (parents, (parents[0],))
2592.3.34 by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs.
2217
                    else:
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
2218
                        node_refs = (parents, ())
2592.3.34 by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs.
2219
                else:
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
2220
                    node_refs = (parents, )
2592.3.34 by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs.
2221
            else:
2222
                if parents:
2223
                    raise KnitCorrupt(self, "attempt to add node with parents "
2224
                        "in parentless index.")
2225
                node_refs = ()
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
2226
            keys[key] = (value, node_refs)
2841.2.1 by Robert Collins
* Commit no longer checks for new text keys during insertion when the
2227
        if not random_id:
2228
            present_nodes = self._get_entries(keys)
2229
            for (index, key, value, node_refs) in present_nodes:
2230
                if (value, node_refs) != keys[key]:
2231
                    raise KnitCorrupt(self, "inconsistent details in add_versions"
2232
                        ": %s %s" % ((value, node_refs), keys[key]))
2233
                del keys[key]
2592.3.17 by Robert Collins
Add add_version(s) to KnitGraphIndex, completing the required api for KnitVersionedFile.
2234
        result = []
2592.3.34 by Robert Collins
Rough unfactored support for parentless KnitGraphIndexs.
2235
        if self._parents:
2236
            for key, (value, node_refs) in keys.iteritems():
2237
                result.append((key, value, node_refs))
2238
        else:
2239
            for key, (value, node_refs) in keys.iteritems():
2240
                result.append((key, value))
2592.3.19 by Robert Collins
Change KnitGraphIndex from returning data to performing a callback on insertions.
2241
        self._add_callback(result)
2592.3.17 by Robert Collins
Add add_version(s) to KnitGraphIndex, completing the required api for KnitVersionedFile.
2242
        
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
2243
    def _version_ids_to_keys(self, version_ids):
2244
        return set((version_id, ) for version_id in version_ids)
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
2245
2246
2247
class _KnitAccess(object):
2248
    """Access to knit records in a .knit file."""
2249
2250
    def __init__(self, transport, filename, _file_mode, _dir_mode,
2251
        _need_to_create, _create_parent_dir):
2252
        """Create a _KnitAccess for accessing and inserting data.
2253
2254
        :param transport: The transport the .knit is located on.
2255
        :param filename: The filename of the .knit.
2256
        """
2257
        self._transport = transport
2258
        self._filename = filename
2259
        self._file_mode = _file_mode
2260
        self._dir_mode = _dir_mode
2261
        self._need_to_create = _need_to_create
2262
        self._create_parent_dir = _create_parent_dir
2263
2264
    def add_raw_records(self, sizes, raw_data):
2265
        """Add raw knit bytes to a storage area.
2266
2267
        The data is spooled to whereever the access method is storing data.
2268
2269
        :param sizes: An iterable containing the size of each raw data segment.
2270
        :param raw_data: A bytestring containing the data.
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
2271
        :return: A list of memos to retrieve the record later. Each memo is a
2272
            tuple - (index, pos, length), where the index field is always None
2273
            for the .knit access method.
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
2274
        """
2275
        assert type(raw_data) == str, \
2276
            'data must be plain bytes was %s' % type(raw_data)
2277
        if not self._need_to_create:
2278
            base = self._transport.append_bytes(self._filename, raw_data)
2279
        else:
2280
            self._transport.put_bytes_non_atomic(self._filename, raw_data,
2281
                                   create_parent_dir=self._create_parent_dir,
2282
                                   mode=self._file_mode,
2283
                                   dir_mode=self._dir_mode)
2284
            self._need_to_create = False
2285
            base = 0
2286
        result = []
2287
        for size in sizes:
2592.3.71 by Robert Collins
Basic version of knit-based repository operating, many tests failing.
2288
            result.append((None, base, size))
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
2289
            base += size
2290
        return result
2291
2292
    def create(self):
2293
        """IFF this data access has its own storage area, initialise it.
2294
2295
        :return: None.
2296
        """
2297
        self._transport.put_bytes_non_atomic(self._filename, '',
2298
                                             mode=self._file_mode)
2299
2300
    def open_file(self):
2301
        """IFF this data access can be represented as a single file, open it.
2302
2303
        For knits that are not mapped to a single file on disk this will
2304
        always return None.
2305
2306
        :return: None or a file handle.
2307
        """
2308
        try:
2309
            return self._transport.get(self._filename)
2310
        except NoSuchFile:
2311
            pass
2312
        return None
2313
2314
    def get_raw_records(self, memos_for_retrieval):
2315
        """Get the raw bytes for a records.
2316
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
2317
        :param memos_for_retrieval: An iterable containing the (index, pos, 
2318
            length) memo for retrieving the bytes. The .knit method ignores
2319
            the index as there is always only a single file.
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
2320
        :return: An iterator over the bytes of the records.
2321
        """
2592.3.71 by Robert Collins
Basic version of knit-based repository operating, many tests failing.
2322
        read_vector = [(pos, size) for (index, pos, size) in memos_for_retrieval]
2323
        for pos, data in self._transport.readv(self._filename, read_vector):
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
2324
            yield data
2325
2326
2327
class _PackAccess(object):
2328
    """Access to knit records via a collection of packs."""
2329
2330
    def __init__(self, index_to_packs, writer=None):
2331
        """Create a _PackAccess object.
2332
2333
        :param index_to_packs: A dict mapping index objects to the transport
2334
            and file names for obtaining data.
2335
        :param writer: A tuple (pack.ContainerWriter, write_index) which
2670.2.3 by Robert Collins
Review feedback.
2336
            contains the pack to write, and the index that reads from it will
2337
            be associated with.
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
2338
        """
2339
        if writer:
2340
            self.container_writer = writer[0]
2341
            self.write_index = writer[1]
2342
        else:
2343
            self.container_writer = None
2344
            self.write_index = None
2345
        self.indices = index_to_packs
2346
2347
    def add_raw_records(self, sizes, raw_data):
2348
        """Add raw knit bytes to a storage area.
2349
2670.2.3 by Robert Collins
Review feedback.
2350
        The data is spooled to the container writer in one bytes-record per
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
2351
        raw data item.
2352
2353
        :param sizes: An iterable containing the size of each raw data segment.
2354
        :param raw_data: A bytestring containing the data.
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
2355
        :return: A list of memos to retrieve the record later. Each memo is a
2356
            tuple - (index, pos, length), where the index field is the 
2357
            write_index object supplied to the PackAccess object.
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
2358
        """
2359
        assert type(raw_data) == str, \
2360
            'data must be plain bytes was %s' % type(raw_data)
2361
        result = []
2362
        offset = 0
2363
        for size in sizes:
2364
            p_offset, p_length = self.container_writer.add_bytes_record(
2365
                raw_data[offset:offset+size], [])
2366
            offset += size
2367
            result.append((self.write_index, p_offset, p_length))
2368
        return result
2369
2370
    def create(self):
2371
        """Pack based knits do not get individually created."""
2372
2373
    def get_raw_records(self, memos_for_retrieval):
2374
        """Get the raw bytes for a records.
2375
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
2376
        :param memos_for_retrieval: An iterable containing the (index, pos, 
2377
            length) memo for retrieving the bytes. The Pack access method
2378
            looks up the pack to use for a given record in its index_to_pack
2379
            map.
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
2380
        :return: An iterator over the bytes of the records.
2381
        """
2382
        # first pass, group into same-index requests
2383
        request_lists = []
2384
        current_index = None
2385
        for (index, offset, length) in memos_for_retrieval:
2386
            if current_index == index:
2387
                current_list.append((offset, length))
2388
            else:
2389
                if current_index is not None:
2390
                    request_lists.append((current_index, current_list))
2391
                current_index = index
2392
                current_list = [(offset, length)]
2393
        # handle the last entry
2394
        if current_index is not None:
2395
            request_lists.append((current_index, current_list))
2396
        for index, offsets in request_lists:
2397
            transport, path = self.indices[index]
2398
            reader = pack.make_readv_reader(transport, path, offsets)
2399
            for names, read_func in reader.iter_records():
2400
                yield read_func(None)
2401
2402
    def open_file(self):
2403
        """Pack based knits have no single file."""
2404
        return None
2405
2592.3.70 by Robert Collins
Allow setting a writer after creating a knit._PackAccess object.
2406
    def set_writer(self, writer, index, (transport, packname)):
2407
        """Set a writer to use for adding data."""
2592.3.208 by Robert Collins
Start refactoring the knit-pack thunking to be clearer.
2408
        if index is not None:
2409
            self.indices[index] = (transport, packname)
2592.3.70 by Robert Collins
Allow setting a writer after creating a knit._PackAccess object.
2410
        self.container_writer = writer
2411
        self.write_index = index
2412
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
2413
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
2414
class _StreamAccess(object):
3052.2.3 by Robert Collins
Handle insert_data_stream of an unannotated stream into an annotated knit.
2415
    """A Knit Access object that provides data from a datastream.
2416
    
2417
    It also provides a fallback to present as unannotated data, annotated data
2418
    from a *backing* access object.
2419
2420
    This is triggered by a index_memo which is pointing to a different index
2421
    than this was constructed with, and is used to allow extracting full
2422
    unannotated texts for insertion into annotated knits.
2423
    """
2424
2425
    def __init__(self, reader_callable, stream_index, backing_knit,
2426
        orig_factory):
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
2427
        """Create a _StreamAccess object.
2428
2429
        :param reader_callable: The reader_callable from the datastream.
2430
            This is called to buffer all the data immediately, for 
2431
            random access.
3052.2.3 by Robert Collins
Handle insert_data_stream of an unannotated stream into an annotated knit.
2432
        :param stream_index: The index the data stream this provides access to
2433
            which will be present in native index_memo's.
2434
        :param backing_knit: The knit object that will provide access to 
2435
            annotated texts which are not available in the stream, so as to
2436
            create unannotated texts.
2437
        :param orig_factory: The original content factory used to generate the
2438
            stream. This is used for checking whether the thunk code for
2439
            supporting _copy_texts will generate the correct form of data.
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
2440
        """
2441
        self.data = reader_callable(None)
3052.2.3 by Robert Collins
Handle insert_data_stream of an unannotated stream into an annotated knit.
2442
        self.stream_index = stream_index
2443
        self.backing_knit = backing_knit
2444
        self.orig_factory = orig_factory
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
2445
2446
    def get_raw_records(self, memos_for_retrieval):
2447
        """Get the raw bytes for a records.
2448
3287.12.1 by Martin Pool
#2008418: (with spiv) Avoid interpreting fulltexts as line deltas when pulling knits.
2449
        :param memos_for_retrieval: An iterable of memos from the
2450
            _StreamIndex object identifying bytes to read; for these classes
2451
            they are (from_backing_knit, index, start, end) and can point to
2452
            either the backing knit or streamed data.
2453
        :return: An iterator yielding a byte string for each record in 
2454
            memos_for_retrieval.
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
2455
        """
2456
        # use a generator for memory friendliness
3287.12.1 by Martin Pool
#2008418: (with spiv) Avoid interpreting fulltexts as line deltas when pulling knits.
2457
        for from_backing_knit, version_id, start, end in memos_for_retrieval:
2458
            if not from_backing_knit:
2459
                assert version_id is self.stream_index
3052.2.3 by Robert Collins
Handle insert_data_stream of an unannotated stream into an annotated knit.
2460
                yield self.data[start:end]
2461
                continue
2462
            # we have been asked to thunk. This thunking only occurs when
2463
            # we are obtaining plain texts from an annotated backing knit
2464
            # so that _copy_texts will work.
2465
            # We could improve performance here by scanning for where we need
2466
            # to do this and using get_line_list, then interleaving the output
2467
            # as desired. However, for now, this is sufficient.
3052.2.5 by Andrew Bennetts
Address the rest of the review comments from John and myself.
2468
            if self.orig_factory.__class__ != KnitPlainFactory:
2469
                raise errors.KnitCorrupt(
3287.12.1 by Martin Pool
#2008418: (with spiv) Avoid interpreting fulltexts as line deltas when pulling knits.
2470
                    self, 'Bad thunk request %r cannot be backed by %r' %
2471
                        (version_id, self.orig_factory))
3052.2.3 by Robert Collins
Handle insert_data_stream of an unannotated stream into an annotated knit.
2472
            lines = self.backing_knit.get_lines(version_id)
2473
            line_bytes = ''.join(lines)
2474
            digest = sha_string(line_bytes)
3287.12.1 by Martin Pool
#2008418: (with spiv) Avoid interpreting fulltexts as line deltas when pulling knits.
2475
            # the packed form of the fulltext always has a trailing newline,
2476
            # even if the actual text does not, unless the file is empty.  the
2477
            # record options including the noeol flag are passed through by
2478
            # _StreamIndex, so this is safe.
3052.2.3 by Robert Collins
Handle insert_data_stream of an unannotated stream into an annotated knit.
2479
            if lines:
2480
                if lines[-1][-1] != '\n':
2481
                    lines[-1] = lines[-1] + '\n'
2482
                    line_bytes += '\n'
2483
            # We want plain data, because we expect to thunk only to allow text
2484
            # extraction.
2485
            size, bytes = self.backing_knit._data._record_to_data(version_id,
2486
                digest, lines, line_bytes)
2487
            yield bytes
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
2488
2489
2490
class _StreamIndex(object):
2491
    """A Knit Index object that uses the data map from a datastream."""
2492
3224.1.8 by John Arbash Meinel
Add noeol to the return signature of get_build_details.
2493
    def __init__(self, data_list, backing_index):
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
2494
        """Create a _StreamIndex object.
2495
2496
        :param data_list: The data_list from the datastream.
3224.1.8 by John Arbash Meinel
Add noeol to the return signature of get_build_details.
2497
        :param backing_index: The index which will supply values for nodes
2498
            referenced outside of this stream.
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
2499
        """
2500
        self.data_list = data_list
3224.1.8 by John Arbash Meinel
Add noeol to the return signature of get_build_details.
2501
        self.backing_index = backing_index
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
2502
        self._by_version = {}
2503
        pos = 0
2504
        for key, options, length, parents in data_list:
2505
            self._by_version[key] = options, (pos, pos + length), parents
2506
            pos += length
2507
2508
    def get_ancestry(self, versions, topo_sorted):
2509
        """Get an ancestry list for versions."""
2510
        if topo_sorted:
2511
            # Not needed for basic joins
2512
            raise NotImplementedError(self.get_ancestry)
2513
        # get a graph of all the mentioned versions:
2514
        # Little ugly - basically copied from KnitIndex, but don't want to
2515
        # accidentally incorporate too much of that index's code.
3052.2.4 by Andrew Bennetts
Some tweaks suggested by John's review.
2516
        ancestry = set()
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
2517
        pending = set(versions)
2518
        cache = self._by_version
2519
        while pending:
2520
            version = pending.pop()
2521
            # trim ghosts
2522
            try:
2523
                parents = [p for p in cache[version][2] if p in cache]
2524
            except KeyError:
2525
                raise RevisionNotPresent(version, self)
2526
            # if not completed and not a ghost
3052.2.4 by Andrew Bennetts
Some tweaks suggested by John's review.
2527
            pending.update([p for p in parents if p not in ancestry])
2528
            ancestry.add(version)
2529
        return list(ancestry)
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
2530
3218.1.1 by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries.
2531
    def get_build_details(self, version_ids):
2532
        """Get the method, index_memo and compression parent for version_ids.
2533
3224.1.29 by John Arbash Meinel
Properly handle annotating when ghosts are present.
2534
        Ghosts are omitted from the result.
2535
3218.1.1 by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries.
2536
        :param version_ids: An iterable of version_ids.
3224.1.18 by John Arbash Meinel
Cleanup documentation
2537
        :return: A dict of version_id:(index_memo, compression_parent,
2538
                                       parents, record_details).
2539
            index_memo
3287.12.1 by Martin Pool
#2008418: (with spiv) Avoid interpreting fulltexts as line deltas when pulling knits.
2540
                opaque memo that can be passed to _StreamAccess.read_records
2541
                to extract the raw data; for these classes it is
2542
                (from_backing_knit, index, start, end) 
3224.1.18 by John Arbash Meinel
Cleanup documentation
2543
            compression_parent
2544
                Content that this record is built upon, may be None
2545
            parents
2546
                Logical parents of this node
2547
            record_details
2548
                extra information about the content which needs to be passed to
2549
                Factory.parse_record
3218.1.1 by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries.
2550
        """
2551
        result = {}
2552
        for version_id in version_ids:
3224.1.29 by John Arbash Meinel
Properly handle annotating when ghosts are present.
2553
            try:
2554
                method = self.get_method(version_id)
2555
            except errors.RevisionNotPresent:
2556
                # ghosts are omitted
2557
                continue
3224.1.7 by John Arbash Meinel
_StreamIndex also needs to return the proper values for get_build_details.
2558
            parent_ids = self.get_parents_with_ghosts(version_id)
3224.1.8 by John Arbash Meinel
Add noeol to the return signature of get_build_details.
2559
            noeol = ('no-eol' in self.get_options(version_id))
3287.12.1 by Martin Pool
#2008418: (with spiv) Avoid interpreting fulltexts as line deltas when pulling knits.
2560
            index_memo = self.get_position(version_id)
2561
            from_backing_knit = index_memo[0]
2562
            if from_backing_knit:
2563
                # texts retrieved from the backing knit are always full texts
2564
                method = 'fulltext'
3218.1.1 by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries.
2565
            if method == 'fulltext':
2566
                compression_parent = None
2567
            else:
3224.1.7 by John Arbash Meinel
_StreamIndex also needs to return the proper values for get_build_details.
2568
                compression_parent = parent_ids[0]
3224.1.14 by John Arbash Meinel
Switch to making content_details opaque, step 1
2569
            result[version_id] = (index_memo, compression_parent,
2570
                                  parent_ids, (method, noeol))
3218.1.1 by Robert Collins
Reduce index query pressure for text construction by batching the individual queries into single batch queries.
2571
        return result
2572
3052.2.3 by Robert Collins
Handle insert_data_stream of an unannotated stream into an annotated knit.
2573
    def get_method(self, version_id):
2574
        """Return compression method of specified version."""
3287.12.1 by Martin Pool
#2008418: (with spiv) Avoid interpreting fulltexts as line deltas when pulling knits.
2575
        options = self.get_options(version_id)
3052.2.3 by Robert Collins
Handle insert_data_stream of an unannotated stream into an annotated knit.
2576
        if 'fulltext' in options:
2577
            return 'fulltext'
2578
        elif 'line-delta' in options:
2579
            return 'line-delta'
2580
        else:
2581
            raise errors.KnitIndexUnknownMethod(self, options)
2582
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
2583
    def get_options(self, version_id):
3052.2.5 by Andrew Bennetts
Address the rest of the review comments from John and myself.
2584
        """Return a list representing options.
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
2585
2586
        e.g. ['foo', 'bar']
2587
        """
3224.1.8 by John Arbash Meinel
Add noeol to the return signature of get_build_details.
2588
        try:
2589
            return self._by_version[version_id][0]
2590
        except KeyError:
3287.7.6 by Andrew Bennetts
Tweaks suggested by Robert's review.
2591
            options = list(self.backing_index.get_options(version_id))
2592
            if 'fulltext' in options:
3287.12.1 by Martin Pool
#2008418: (with spiv) Avoid interpreting fulltexts as line deltas when pulling knits.
2593
                pass
3287.7.6 by Andrew Bennetts
Tweaks suggested by Robert's review.
2594
            elif 'line-delta' in options:
3287.12.1 by Martin Pool
#2008418: (with spiv) Avoid interpreting fulltexts as line deltas when pulling knits.
2595
                # Texts from the backing knit are always returned from the stream
2596
                # as full texts
3287.7.6 by Andrew Bennetts
Tweaks suggested by Robert's review.
2597
                options.remove('line-delta')
2598
                options.append('fulltext')
3287.12.1 by Martin Pool
#2008418: (with spiv) Avoid interpreting fulltexts as line deltas when pulling knits.
2599
            else:
2600
                raise errors.KnitIndexUnknownMethod(self, options)
3287.7.6 by Andrew Bennetts
Tweaks suggested by Robert's review.
2601
            return tuple(options)
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
2602
3287.5.5 by Robert Collins
Refactor internals of knit implementations to implement get_parents_with_ghosts in terms of get_parent_map.
2603
    def get_parent_map(self, version_ids):
2604
        """Passed through to by KnitVersionedFile.get_parent_map."""
2605
        result = {}
2606
        pending_ids = set()
2607
        for version_id in version_ids:
2608
            try:
2609
                result[version_id] = self._by_version[version_id][2]
2610
            except KeyError:
2611
                pending_ids.add(version_id)
2612
        result.update(self.backing_index.get_parent_map(pending_ids))
2613
        return result
2614
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
2615
    def get_parents_with_ghosts(self, version_id):
2616
        """Return parents of specified version with ghosts."""
3224.1.7 by John Arbash Meinel
_StreamIndex also needs to return the proper values for get_build_details.
2617
        try:
3287.5.5 by Robert Collins
Refactor internals of knit implementations to implement get_parents_with_ghosts in terms of get_parent_map.
2618
            return self.get_parent_map([version_id])[version_id]
3224.1.7 by John Arbash Meinel
_StreamIndex also needs to return the proper values for get_build_details.
2619
        except KeyError:
3287.5.5 by Robert Collins
Refactor internals of knit implementations to implement get_parents_with_ghosts in terms of get_parent_map.
2620
            raise RevisionNotPresent(version_id, self)
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
2621
2622
    def get_position(self, version_id):
2623
        """Return details needed to access the version.
2624
        
2625
        _StreamAccess has the data as a big array, so we return slice
3052.2.5 by Andrew Bennetts
Address the rest of the review comments from John and myself.
2626
        coordinates into that (as index_memo's are opaque outside the
2627
        index and matching access class).
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
2628
3287.12.1 by Martin Pool
#2008418: (with spiv) Avoid interpreting fulltexts as line deltas when pulling knits.
2629
        :return: a tuple (from_backing_knit, index, start, end) that can 
2630
            be passed e.g. to get_raw_records.  
2631
            If from_backing_knit is False, index will be self, otherwise it
2632
            will be a version id.
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
2633
        """
3052.2.3 by Robert Collins
Handle insert_data_stream of an unannotated stream into an annotated knit.
2634
        try:
2635
            start, end = self._by_version[version_id][1]
3052.2.5 by Andrew Bennetts
Address the rest of the review comments from John and myself.
2636
            return False, self, start, end
3052.2.3 by Robert Collins
Handle insert_data_stream of an unannotated stream into an annotated knit.
2637
        except KeyError:
2638
            # Signal to the access object to handle this from the backing knit.
3052.2.5 by Andrew Bennetts
Address the rest of the review comments from John and myself.
2639
            return (True, version_id, None, None)
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
2640
2641
    def get_versions(self):
2642
        """Get all the versions in the stream."""
2643
        return self._by_version.keys()
2644
2645
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
2646
class _KnitData(object):
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
2647
    """Manage extraction of data from a KnitAccess, caching and decompressing.
2648
    
2649
    The KnitData class provides the logic for parsing and using knit records,
2650
    making use of an access method for the low level read and write operations.
2651
    """
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
2652
2653
    def __init__(self, access):
2654
        """Create a KnitData object.
2655
2656
        :param access: The access method to use. Access methods such as
2657
            _KnitAccess manage the insertion of raw records and the subsequent
2658
            retrieval of the same.
2659
        """
2660
        self._access = access
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
2661
        self._checked = False
2662
2663
    def _open_file(self):
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
2664
        return self._access.open_file()
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
2665
2888.1.1 by Robert Collins
(robertc) Use prejoined content for knit storage when performing a full-text store of unannotated content. (Robert Collins)
2666
    def _record_to_data(self, version_id, digest, lines, dense_lines=None):
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
2667
        """Convert version_id, digest, lines into a raw data block.
2668
        
2888.1.2 by Robert Collins
Cleanup the dense_lines parameter docstring to be more useful.
2669
        :param dense_lines: The bytes of lines but in a denser form. For
2670
            instance, if lines is a list of 1000 bytestrings each ending in \n,
2671
            dense_lines may be a list with one line in it, containing all the
2672
            1000's lines and their \n's. Using dense_lines if it is already
2673
            known is a win because the string join to create bytes in this
2674
            function spends less time resizing the final string.
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
2675
        :return: (len, a StringIO instance with the raw data ready to read.)
2676
        """
2888.1.1 by Robert Collins
(robertc) Use prejoined content for knit storage when performing a full-text store of unannotated content. (Robert Collins)
2677
        # Note: using a string copy here increases memory pressure with e.g.
2678
        # ISO's, but it is about 3 seconds faster on a 1.2Ghz intel machine
2679
        # when doing the initial commit of a mozilla tree. RBC 20070921
2680
        bytes = ''.join(chain(
2249.5.15 by John Arbash Meinel
remove get_cached_utf8 checks which were slowing things down.
2681
            ["version %s %d %s\n" % (version_id,
1596.2.28 by Robert Collins
more knit profile based tuning.
2682
                                     len(lines),
2683
                                     digest)],
2888.1.1 by Robert Collins
(robertc) Use prejoined content for knit storage when performing a full-text store of unannotated content. (Robert Collins)
2684
            dense_lines or lines,
2685
            ["end %s\n" % version_id]))
2817.3.1 by Robert Collins
* New helper ``bzrlib.tuned_gzip.bytes_to_gzip`` which takes a byte string
2686
        assert bytes.__class__ == str
2687
        compressed_bytes = bytes_to_gzip(bytes)
2688
        return len(compressed_bytes), compressed_bytes
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
2689
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
2690
    def add_raw_records(self, sizes, raw_data):
1692.4.1 by Robert Collins
Multiple merges:
2691
        """Append a prepared record to the data file.
2329.1.2 by John Arbash Meinel
Remove some spurious whitespace changes.
2692
        
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
2693
        :param sizes: An iterable containing the size of each raw data segment.
2694
        :param raw_data: A bytestring containing the data.
2695
        :return: a list of index data for the way the data was stored.
2696
            See the access method add_raw_records documentation for more
2697
            details.
1692.4.1 by Robert Collins
Multiple merges:
2698
        """
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
2699
        return self._access.add_raw_records(sizes, raw_data)
2329.1.2 by John Arbash Meinel
Remove some spurious whitespace changes.
2700
        
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
2701
    def _parse_record_header(self, version_id, raw_data):
2702
        """Parse a record header for consistency.
2703
2704
        :return: the header and the decompressor stream.
2705
                 as (stream, header_record)
2706
        """
2707
        df = GzipFile(mode='rb', fileobj=StringIO(raw_data))
2329.1.1 by John Arbash Meinel
Update _KnitData parser to raise more helpful errors when it detects corruption.
2708
        try:
2709
            rec = self._check_header(version_id, df.readline())
2358.3.4 by Martin Pool
Fix mangled knit.py changes
2710
        except Exception, e:
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
2711
            raise KnitCorrupt(self._access,
2329.1.1 by John Arbash Meinel
Update _KnitData parser to raise more helpful errors when it detects corruption.
2712
                              "While reading {%s} got %s(%s)"
2713
                              % (version_id, e.__class__.__name__, str(e)))
2358.3.4 by Martin Pool
Fix mangled knit.py changes
2714
        return df, rec
2163.2.4 by John Arbash Meinel
Split _KnitData._parse_header up, so that we have 1 readlines() call, rather than readline+readlines()
2715
3350.3.4 by Robert Collins
Finish adapters for annotated knits to unannotated knits and full texts.
2716
    def _split_header(self, line):
2358.3.4 by Martin Pool
Fix mangled knit.py changes
2717
        rec = line.split()
2718
        if len(rec) != 4:
2719
            raise KnitCorrupt(self._access,
2163.2.4 by John Arbash Meinel
Split _KnitData._parse_header up, so that we have 1 readlines() call, rather than readline+readlines()
2720
                              'unexpected number of elements in record header')
3350.3.4 by Robert Collins
Finish adapters for annotated knits to unannotated knits and full texts.
2721
        return rec
2722
2723
    def _check_header_version(self, rec, version_id):
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
2724
        if rec[1] != version_id:
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
2725
            raise KnitCorrupt(self._access,
2163.2.4 by John Arbash Meinel
Split _KnitData._parse_header up, so that we have 1 readlines() call, rather than readline+readlines()
2726
                              'unexpected version, wanted %r, got %r'
2727
                              % (version_id, rec[1]))
3350.3.4 by Robert Collins
Finish adapters for annotated knits to unannotated knits and full texts.
2728
2729
    def _check_header(self, version_id, line):
2730
        rec = self._split_header(line)
2731
        self._check_header_version(rec, version_id)
2163.2.4 by John Arbash Meinel
Split _KnitData._parse_header up, so that we have 1 readlines() call, rather than readline+readlines()
2732
        return rec
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
2733
3350.3.4 by Robert Collins
Finish adapters for annotated knits to unannotated knits and full texts.
2734
    def _parse_record_unchecked(self, data):
1628.1.2 by Robert Collins
More knit micro-optimisations.
2735
        # profiling notes:
2736
        # 4168 calls in 2880 217 internal
2737
        # 4168 calls to _parse_record_header in 2121
2738
        # 4168 calls to readlines in 330
2163.2.4 by John Arbash Meinel
Split _KnitData._parse_header up, so that we have 1 readlines() call, rather than readline+readlines()
2739
        df = GzipFile(mode='rb', fileobj=StringIO(data))
2329.1.1 by John Arbash Meinel
Update _KnitData parser to raise more helpful errors when it detects corruption.
2740
        try:
2741
            record_contents = df.readlines()
2358.3.4 by Martin Pool
Fix mangled knit.py changes
2742
        except Exception, e:
3350.3.4 by Robert Collins
Finish adapters for annotated knits to unannotated knits and full texts.
2743
            raise KnitCorrupt(self._access, "Corrupt compressed record %r, got %s(%s)" %
2744
                (data, e.__class__.__name__, str(e)))
2163.2.4 by John Arbash Meinel
Split _KnitData._parse_header up, so that we have 1 readlines() call, rather than readline+readlines()
2745
        header = record_contents.pop(0)
3350.3.4 by Robert Collins
Finish adapters for annotated knits to unannotated knits and full texts.
2746
        rec = self._split_header(header)
2163.2.4 by John Arbash Meinel
Split _KnitData._parse_header up, so that we have 1 readlines() call, rather than readline+readlines()
2747
        last_line = record_contents.pop()
2329.1.1 by John Arbash Meinel
Update _KnitData parser to raise more helpful errors when it detects corruption.
2748
        if len(record_contents) != int(rec[2]):
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
2749
            raise KnitCorrupt(self._access,
2329.1.1 by John Arbash Meinel
Update _KnitData parser to raise more helpful errors when it detects corruption.
2750
                              'incorrect number of lines %s != %s'
2751
                              ' for version {%s}'
2752
                              % (len(record_contents), int(rec[2]),
3350.3.4 by Robert Collins
Finish adapters for annotated knits to unannotated knits and full texts.
2753
                                 rec[1]))
2163.2.4 by John Arbash Meinel
Split _KnitData._parse_header up, so that we have 1 readlines() call, rather than readline+readlines()
2754
        if last_line != 'end %s\n' % rec[1]:
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
2755
            raise KnitCorrupt(self._access,
2163.2.4 by John Arbash Meinel
Split _KnitData._parse_header up, so that we have 1 readlines() call, rather than readline+readlines()
2756
                              'unexpected version end line %r, wanted %r' 
3350.3.4 by Robert Collins
Finish adapters for annotated knits to unannotated knits and full texts.
2757
                              % (last_line, rec[1]))
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
2758
        df.close()
3350.3.4 by Robert Collins
Finish adapters for annotated knits to unannotated knits and full texts.
2759
        return rec, record_contents
2760
2761
    def _parse_record(self, version_id, data):
2762
        rec, record_contents = self._parse_record_unchecked(data)
2763
        self._check_header_version(rec, version_id)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
2764
        return record_contents, rec[3]
2765
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
2766
    def read_records_iter_raw(self, records):
2767
        """Read text records from data file and yield raw data.
2768
2769
        This unpacks enough of the text record to validate the id is
2770
        as expected but thats all.
3350.3.3 by Robert Collins
Functional get_record_stream interface tests covering full interface.
2771
2772
        Each item the iterator yields is (version_id, bytes,
2773
        sha1_of_full_text).
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
2774
        """
2775
        # setup an iterator of the external records:
2776
        # uses readv so nice and fast we hope.
1756.3.23 by Aaron Bentley
Remove knit caches
2777
        if len(records):
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
2778
            # grab the disk data needed.
3316.2.11 by Robert Collins
* ``VersionedFile.clear_cache`` and ``enable_cache`` are deprecated.
2779
            needed_offsets = [index_memo for version_id, index_memo
2780
                                           in records]
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
2781
            raw_records = self._access.get_raw_records(needed_offsets)
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
2782
2592.3.71 by Robert Collins
Basic version of knit-based repository operating, many tests failing.
2783
        for version_id, index_memo in records:
3316.2.11 by Robert Collins
* ``VersionedFile.clear_cache`` and ``enable_cache`` are deprecated.
2784
            data = raw_records.next()
2785
            # validate the header
2786
            df, rec = self._parse_record_header(version_id, data)
2787
            df.close()
3350.3.3 by Robert Collins
Functional get_record_stream interface tests covering full interface.
2788
            yield version_id, data, rec[3]
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
2789
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
2790
    def read_records_iter(self, records):
2791
        """Read text records from data file and yield result.
2792
1863.1.5 by John Arbash Meinel
Add a read_records_iter_unsorted, which can return records in any order.
2793
        The result will be returned in whatever is the fastest to read.
2794
        Not by the order requested. Also, multiple requests for the same
2795
        record will only yield 1 response.
2796
        :param records: A list of (version_id, pos, len) entries
2797
        :return: Yields (version_id, contents, digest) in the order
2798
                 read, not the order requested
2799
        """
2800
        if not records:
2801
            return
2802
3316.2.11 by Robert Collins
* ``VersionedFile.clear_cache`` and ``enable_cache`` are deprecated.
2803
        needed_records = sorted(set(records), key=operator.itemgetter(1))
1863.1.5 by John Arbash Meinel
Add a read_records_iter_unsorted, which can return records in any order.
2804
        if not needed_records:
2805
            return
2806
2807
        # The transport optimizes the fetching as well 
2808
        # (ie, reads continuous ranges.)
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
2809
        raw_data = self._access.get_raw_records(
2592.3.71 by Robert Collins
Basic version of knit-based repository operating, many tests failing.
2810
            [index_memo for version_id, index_memo in needed_records])
1863.1.5 by John Arbash Meinel
Add a read_records_iter_unsorted, which can return records in any order.
2811
2592.3.71 by Robert Collins
Basic version of knit-based repository operating, many tests failing.
2812
        for (version_id, index_memo), data in \
2592.3.66 by Robert Collins
Allow adaption of KnitData to pack files.
2813
                izip(iter(needed_records), raw_data):
1863.1.5 by John Arbash Meinel
Add a read_records_iter_unsorted, which can return records in any order.
2814
            content, digest = self._parse_record(version_id, data)
1756.3.23 by Aaron Bentley
Remove knit caches
2815
            yield version_id, content, digest
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
2816
2817
    def read_records(self, records):
2818
        """Read records into a dictionary."""
2819
        components = {}
1863.1.5 by John Arbash Meinel
Add a read_records_iter_unsorted, which can return records in any order.
2820
        for record_id, content, digest in \
1863.1.9 by John Arbash Meinel
Switching to have 'read_records_iter' return in random order.
2821
                self.read_records_iter(records):
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
2822
            components[record_id] = (content, digest)
2823
        return components
2824
1563.2.13 by Robert Collins
InterVersionedFile implemented.
2825
2826
class InterKnit(InterVersionedFile):
2827
    """Optimised code paths for knit to knit operations."""
2828
    
3316.2.3 by Robert Collins
Remove manual notification of transaction finishing on versioned files.
2829
    _matching_file_from_factory = staticmethod(make_file_knit)
2830
    _matching_file_to_factory = staticmethod(make_file_knit)
1563.2.13 by Robert Collins
InterVersionedFile implemented.
2831
    
2832
    @staticmethod
2833
    def is_compatible(source, target):
2834
        """Be compatible with knits.  """
2835
        try:
2836
            return (isinstance(source, KnitVersionedFile) and
2837
                    isinstance(target, KnitVersionedFile))
2838
        except AttributeError:
2839
            return False
2840
2998.2.3 by John Arbash Meinel
Respond to Aaron's requests
2841
    def _copy_texts(self, pb, msg, version_ids, ignore_missing=False):
2998.2.2 by John Arbash Meinel
implement a faster path for copying from packs back to knits.
2842
        """Copy texts to the target by extracting and adding them one by one.
2843
2844
        see join() for the parameter definitions.
2845
        """
2846
        version_ids = self._get_source_version_ids(version_ids, ignore_missing)
3287.6.1 by Robert Collins
* ``VersionedFile.get_graph`` is deprecated, with no replacement method.
2847
        # --- the below is factorable out with VersionedFile.join, but wait for
2848
        # VersionedFiles, it may all be simpler then.
2849
        graph = Graph(self.source)
2850
        search = graph._make_breadth_first_searcher(version_ids)
2851
        transitive_ids = set()
2852
        map(transitive_ids.update, list(search))
2853
        parent_map = self.source.get_parent_map(transitive_ids)
2854
        order = topo_sort(parent_map.items())
2998.2.2 by John Arbash Meinel
implement a faster path for copying from packs back to knits.
2855
2856
        def size_of_content(content):
2857
            return sum(len(line) for line in content.text())
2858
        # Cache at most 10MB of parent texts
2859
        parent_cache = lru_cache.LRUSizeCache(max_size=10*1024*1024,
2860
                                              compute_size=size_of_content)
2861
        # TODO: jam 20071116 It would be nice to have a streaming interface to
2862
        #       get multiple texts from a source. The source could be smarter
2863
        #       about how it handled intermediate stages.
2998.2.3 by John Arbash Meinel
Respond to Aaron's requests
2864
        #       get_line_list() or make_mpdiffs() seem like a possibility, but
2865
        #       at the moment they extract all full texts into memory, which
2866
        #       causes us to store more than our 3x fulltext goal.
2867
        #       Repository.iter_files_bytes() may be another possibility
2998.2.2 by John Arbash Meinel
implement a faster path for copying from packs back to knits.
2868
        to_process = [version for version in order
2869
                               if version not in self.target]
2870
        total = len(to_process)
2871
        pb = ui.ui_factory.nested_progress_bar()
2872
        try:
2873
            for index, version in enumerate(to_process):
2874
                pb.update('Converting versioned data', index, total)
2875
                sha1, num_bytes, parent_text = self.target.add_lines(version,
3052.2.3 by Robert Collins
Handle insert_data_stream of an unannotated stream into an annotated knit.
2876
                    self.source.get_parents_with_ghosts(version),
2998.2.2 by John Arbash Meinel
implement a faster path for copying from packs back to knits.
2877
                    self.source.get_lines(version),
2878
                    parent_texts=parent_cache)
2879
                parent_cache[version] = parent_text
2880
        finally:
2881
            pb.finished()
2882
        return total
2883
1563.2.31 by Robert Collins
Convert Knit repositories to use knits.
2884
    def join(self, pb=None, msg=None, version_ids=None, ignore_missing=False):
1563.2.13 by Robert Collins
InterVersionedFile implemented.
2885
        """See InterVersionedFile.join."""
2886
        assert isinstance(self.source, KnitVersionedFile)
2887
        assert isinstance(self.target, KnitVersionedFile)
2888
2851.4.3 by Ian Clatworthy
fix up plain-to-annotated knit conversion
2889
        # If the source and target are mismatched w.r.t. annotations vs
2890
        # plain, the data needs to be converted accordingly
2891
        if self.source.factory.annotated == self.target.factory.annotated:
2892
            converter = None
2893
        elif self.source.factory.annotated:
2894
            converter = self._anno_to_plain_converter
2895
        else:
2998.2.3 by John Arbash Meinel
Respond to Aaron's requests
2896
            # We're converting from a plain to an annotated knit. Copy them
2897
            # across by full texts.
2898
            return self._copy_texts(pb, msg, version_ids, ignore_missing)
2851.4.3 by Ian Clatworthy
fix up plain-to-annotated knit conversion
2899
1684.3.2 by Robert Collins
Factor out version_ids-to-join selection in InterVersionedfile.
2900
        version_ids = self._get_source_version_ids(version_ids, ignore_missing)
1563.2.13 by Robert Collins
InterVersionedFile implemented.
2901
        if not version_ids:
2902
            return 0
2903
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
2904
        pb = ui.ui_factory.nested_progress_bar()
1594.2.24 by Robert Collins
Make use of the transaction finalisation warning support to implement in-knit caching.
2905
        try:
2906
            version_ids = list(version_ids)
2907
            if None in version_ids:
2908
                version_ids.remove(None)
2909
    
3052.2.2 by Robert Collins
* Operations pulling data from a smart server where the underlying
2910
            self.source_ancestry = set(self.source.get_ancestry(version_ids,
2911
                topo_sorted=False))
1594.2.24 by Robert Collins
Make use of the transaction finalisation warning support to implement in-knit caching.
2912
            this_versions = set(self.target._index.get_versions())
2825.4.1 by Robert Collins
* ``pull``, ``merge`` and ``push`` will no longer silently correct some
2913
            # XXX: For efficiency we should not look at the whole index,
2914
            #      we only need to consider the referenced revisions - they
2915
            #      must all be present, or the method must be full-text.
2916
            #      TODO, RBC 20070919
1594.2.24 by Robert Collins
Make use of the transaction finalisation warning support to implement in-knit caching.
2917
            needed_versions = self.source_ancestry - this_versions
2918
    
2825.4.1 by Robert Collins
* ``pull``, ``merge`` and ``push`` will no longer silently correct some
2919
            if not needed_versions:
1594.2.24 by Robert Collins
Make use of the transaction finalisation warning support to implement in-knit caching.
2920
                return 0
3287.6.1 by Robert Collins
* ``VersionedFile.get_graph`` is deprecated, with no replacement method.
2921
            full_list = topo_sort(
2922
                self.source.get_parent_map(self.source.versions()))
1910.2.65 by Aaron Bentley
Remove the check-parent patch
2923
    
2924
            version_list = [i for i in full_list if (not self.target.has_version(i)
2925
                            and i in needed_versions)]
1594.2.24 by Robert Collins
Make use of the transaction finalisation warning support to implement in-knit caching.
2926
    
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
2927
            # plan the join:
2928
            copy_queue = []
2929
            copy_queue_records = []
2930
            copy_set = set()
1594.2.24 by Robert Collins
Make use of the transaction finalisation warning support to implement in-knit caching.
2931
            for version_id in version_list:
2932
                options = self.source._index.get_options(version_id)
2933
                parents = self.source._index.get_parents_with_ghosts(version_id)
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
2934
                # check that its will be a consistent copy:
1594.2.24 by Robert Collins
Make use of the transaction finalisation warning support to implement in-knit caching.
2935
                for parent in parents:
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
2936
                    # if source has the parent, we must :
2937
                    # * already have it or
2938
                    # * have it scheduled already
1759.2.2 by Jelmer Vernooij
Revert some of my spelling fixes and fix some typos after review by Aaron.
2939
                    # otherwise we don't care
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
2940
                    assert (self.target.has_version(parent) or
2941
                            parent in copy_set or
2942
                            not self.source.has_version(parent))
2592.3.71 by Robert Collins
Basic version of knit-based repository operating, many tests failing.
2943
                index_memo = self.source._index.get_position(version_id)
2944
                copy_queue_records.append((version_id, index_memo))
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
2945
                copy_queue.append((version_id, options, parents))
2946
                copy_set.add(version_id)
2947
2948
            # data suck the join:
2949
            count = 0
2950
            total = len(version_list)
1692.2.1 by Robert Collins
Fix knit based push to only perform 2 appends to the target, rather that 2*new-versions.
2951
            raw_datum = []
2952
            raw_records = []
3350.3.3 by Robert Collins
Functional get_record_stream interface tests covering full interface.
2953
            for (version_id, raw_data, _), \
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
2954
                (version_id2, options, parents) in \
2955
                izip(self.source._data.read_records_iter_raw(copy_queue_records),
2956
                     copy_queue):
2957
                assert version_id == version_id2, 'logic error, inconsistent results'
1594.2.24 by Robert Collins
Make use of the transaction finalisation warning support to implement in-knit caching.
2958
                count = count + 1
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
2959
                pb.update("Joining knit", count, total)
2851.4.2 by Ian Clatworthy
use factory methods in annotated-to-plain conversion instead of duplicating format knowledge
2960
                if converter:
2961
                    size, raw_data = converter(raw_data, version_id, options,
2962
                        parents)
2851.4.1 by Ian Clatworthy
Support joining plain knits to annotated knits and vice versa
2963
                else:
2964
                    size = len(raw_data)
2965
                raw_records.append((version_id, options, parents, size))
1692.2.1 by Robert Collins
Fix knit based push to only perform 2 appends to the target, rather that 2*new-versions.
2966
                raw_datum.append(raw_data)
2967
            self.target._add_raw_records(raw_records, ''.join(raw_datum))
1594.2.24 by Robert Collins
Make use of the transaction finalisation warning support to implement in-knit caching.
2968
            return count
2969
        finally:
2970
            pb.finished()
1563.2.13 by Robert Collins
InterVersionedFile implemented.
2971
2851.4.2 by Ian Clatworthy
use factory methods in annotated-to-plain conversion instead of duplicating format knowledge
2972
    def _anno_to_plain_converter(self, raw_data, version_id, options,
2973
                                 parents):
2974
        """Convert annotated content to plain content."""
2975
        data, digest = self.source._data._parse_record(version_id, raw_data)
2976
        if 'fulltext' in options:
2977
            content = self.source.factory.parse_fulltext(data, version_id)
2978
            lines = self.target.factory.lower_fulltext(content)
2979
        else:
2980
            delta = self.source.factory.parse_line_delta(data, version_id,
2981
                plain=True)
2982
            lines = self.target.factory.lower_line_delta(delta)
2983
        return self.target._data._record_to_data(version_id, digest, lines)
2984
1563.2.13 by Robert Collins
InterVersionedFile implemented.
2985
2986
InterVersionedFile.register_optimiser(InterKnit)
1596.2.24 by Robert Collins
Gzipfile was slightly slower than ideal.
2987
2988
1684.3.3 by Robert Collins
Add a special cased weaves to knit converter.
2989
class WeaveToKnit(InterVersionedFile):
2990
    """Optimised code paths for weave to knit operations."""
2991
    
2992
    _matching_file_from_factory = bzrlib.weave.WeaveFile
3316.2.3 by Robert Collins
Remove manual notification of transaction finishing on versioned files.
2993
    _matching_file_to_factory = staticmethod(make_file_knit)
1684.3.3 by Robert Collins
Add a special cased weaves to knit converter.
2994
    
2995
    @staticmethod
2996
    def is_compatible(source, target):
2997
        """Be compatible with weaves to knits."""
2998
        try:
2999
            return (isinstance(source, bzrlib.weave.Weave) and
3000
                    isinstance(target, KnitVersionedFile))
3001
        except AttributeError:
3002
            return False
3003
3004
    def join(self, pb=None, msg=None, version_ids=None, ignore_missing=False):
3005
        """See InterVersionedFile.join."""
3006
        assert isinstance(self.source, bzrlib.weave.Weave)
3007
        assert isinstance(self.target, KnitVersionedFile)
3008
3009
        version_ids = self._get_source_version_ids(version_ids, ignore_missing)
3010
3011
        if not version_ids:
3012
            return 0
3013
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
3014
        pb = ui.ui_factory.nested_progress_bar()
1684.3.3 by Robert Collins
Add a special cased weaves to knit converter.
3015
        try:
3016
            version_ids = list(version_ids)
3017
    
3018
            self.source_ancestry = set(self.source.get_ancestry(version_ids))
3019
            this_versions = set(self.target._index.get_versions())
3020
            needed_versions = self.source_ancestry - this_versions
3021
    
2825.4.1 by Robert Collins
* ``pull``, ``merge`` and ``push`` will no longer silently correct some
3022
            if not needed_versions:
1684.3.3 by Robert Collins
Add a special cased weaves to knit converter.
3023
                return 0
3287.6.1 by Robert Collins
* ``VersionedFile.get_graph`` is deprecated, with no replacement method.
3024
            full_list = topo_sort(
3025
                self.source.get_parent_map(self.source.versions()))
1684.3.3 by Robert Collins
Add a special cased weaves to knit converter.
3026
    
3027
            version_list = [i for i in full_list if (not self.target.has_version(i)
3028
                            and i in needed_versions)]
3029
    
3030
            # do the join:
3031
            count = 0
3032
            total = len(version_list)
3287.5.2 by Robert Collins
Deprecate VersionedFile.get_parents, breaking pulling from a ghost containing knit or pack repository to weaves, which improves correctness and allows simplification of core code.
3033
            parent_map = self.source.get_parent_map(version_list)
1684.3.3 by Robert Collins
Add a special cased weaves to knit converter.
3034
            for version_id in version_list:
3035
                pb.update("Converting to knit", count, total)
3287.5.2 by Robert Collins
Deprecate VersionedFile.get_parents, breaking pulling from a ghost containing knit or pack repository to weaves, which improves correctness and allows simplification of core code.
3036
                parents = parent_map[version_id]
1684.3.3 by Robert Collins
Add a special cased weaves to knit converter.
3037
                # check that its will be a consistent copy:
3038
                for parent in parents:
3039
                    # if source has the parent, we must already have it
3040
                    assert (self.target.has_version(parent))
3041
                self.target.add_lines(
3042
                    version_id, parents, self.source.get_lines(version_id))
3043
                count = count + 1
3044
            return count
3045
        finally:
3046
            pb.finished()
3047
3048
3049
InterVersionedFile.register_optimiser(WeaveToKnit)
3050
3051
2781.1.1 by Martin Pool
merge cpatiencediff from Lukas
3052
# Deprecated, use PatienceSequenceMatcher instead
3053
KnitSequenceMatcher = patiencediff.PatienceSequenceMatcher
2484.1.1 by John Arbash Meinel
Add an initial function to read knit indexes in pyrex.
3054
3055
2770.1.2 by Aaron Bentley
Convert to knit-only annotation
3056
def annotate_knit(knit, revision_id):
3057
    """Annotate a knit with no cached annotations.
3058
3059
    This implementation is for knits with no cached annotations.
3060
    It will work for knits with cached annotations, but this is not
3061
    recommended.
3062
    """
3224.1.7 by John Arbash Meinel
_StreamIndex also needs to return the proper values for get_build_details.
3063
    annotator = _KnitAnnotator(knit)
3224.1.25 by John Arbash Meinel
Quick change to the _KnitAnnotator api to use .annotate() instead of get_annotated_lines()
3064
    return iter(annotator.annotate(revision_id))
3224.1.7 by John Arbash Meinel
_StreamIndex also needs to return the proper values for get_build_details.
3065
3066
3067
class _KnitAnnotator(object):
3224.1.5 by John Arbash Meinel
Start using a helper class for doing the knit-pack annotations.
3068
    """Build up the annotations for a text."""
3069
3070
    def __init__(self, knit):
3071
        self._knit = knit
3072
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
3073
        # Content objects, differs from fulltexts because of how final newlines
3074
        # are treated by knits. the content objects here will always have a
3075
        # final newline
3076
        self._fulltext_contents = {}
3077
3078
        # Annotated lines of specific revisions
3079
        self._annotated_lines = {}
3080
3081
        # Track the raw data for nodes that we could not process yet.
3082
        # This maps the revision_id of the base to a list of children that will
3083
        # annotated from it.
3084
        self._pending_children = {}
3085
3224.1.29 by John Arbash Meinel
Properly handle annotating when ghosts are present.
3086
        # Nodes which cannot be extracted
3087
        self._ghosts = set()
3088
3224.1.19 by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed.
3089
        # Track how many children this node has, so we know if we need to keep
3090
        # it
3091
        self._annotate_children = {}
3092
        self._compression_children = {}
3093
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
3094
        self._all_build_details = {}
3224.1.10 by John Arbash Meinel
Introduce the heads_provider for reannotate.
3095
        # The children => parent revision_id graph
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
3096
        self._revision_id_graph = {}
3097
3224.1.10 by John Arbash Meinel
Introduce the heads_provider for reannotate.
3098
        self._heads_provider = None
3099
3224.1.19 by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed.
3100
        self._nodes_to_keep_annotations = set()
3224.1.22 by John Arbash Meinel
Cleanup the extra debugging info, and some >80 char lines.
3101
        self._generations_until_keep = 100
3102
3103
    def set_generations_until_keep(self, value):
3104
        """Set the number of generations before caching a node.
3105
3106
        Setting this to -1 will cache every merge node, setting this higher
3107
        will cache fewer nodes.
3108
        """
3109
        self._generations_until_keep = value
3224.1.19 by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed.
3110
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
3111
    def _add_fulltext_content(self, revision_id, content_obj):
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
3112
        self._fulltext_contents[revision_id] = content_obj
3224.1.19 by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed.
3113
        # TODO: jam 20080305 It might be good to check the sha1digest here
3224.1.22 by John Arbash Meinel
Cleanup the extra debugging info, and some >80 char lines.
3114
        return content_obj.text()
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
3115
3116
    def _check_parents(self, child, nodes_to_annotate):
3117
        """Check if all parents have been processed.
3118
3119
        :param child: A tuple of (rev_id, parents, raw_content)
3120
        :param nodes_to_annotate: If child is ready, add it to
3121
            nodes_to_annotate, otherwise put it back in self._pending_children
3122
        """
3123
        for parent_id in child[1]:
3224.1.29 by John Arbash Meinel
Properly handle annotating when ghosts are present.
3124
            if (parent_id not in self._annotated_lines):
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
3125
                # This parent is present, but another parent is missing
3126
                self._pending_children.setdefault(parent_id,
3127
                                                  []).append(child)
3128
                break
3129
        else:
3130
            # This one is ready to be processed
3131
            nodes_to_annotate.append(child)
3132
3133
    def _add_annotation(self, revision_id, fulltext, parent_ids,
3134
                        left_matching_blocks=None):
3135
        """Add an annotation entry.
3136
3137
        All parents should already have been annotated.
3138
        :return: A list of children that now have their parents satisfied.
3139
        """
3140
        a = self._annotated_lines
3141
        annotated_parent_lines = [a[p] for p in parent_ids]
3142
        annotated_lines = list(annotate.reannotate(annotated_parent_lines,
3224.1.10 by John Arbash Meinel
Introduce the heads_provider for reannotate.
3143
            fulltext, revision_id, left_matching_blocks,
3144
            heads_provider=self._get_heads_provider()))
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
3145
        self._annotated_lines[revision_id] = annotated_lines
3224.1.19 by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed.
3146
        for p in parent_ids:
3147
            ann_children = self._annotate_children[p]
3148
            ann_children.remove(revision_id)
3149
            if (not ann_children
3150
                and p not in self._nodes_to_keep_annotations):
3151
                del self._annotated_lines[p]
3152
                del self._all_build_details[p]
3153
                if p in self._fulltext_contents:
3154
                    del self._fulltext_contents[p]
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
3155
        # Now that we've added this one, see if there are any pending
3156
        # deltas to be done, certainly this parent is finished
3157
        nodes_to_annotate = []
3158
        for child in self._pending_children.pop(revision_id, []):
3159
            self._check_parents(child, nodes_to_annotate)
3160
        return nodes_to_annotate
3161
3162
    def _get_build_graph(self, revision_id):
3163
        """Get the graphs for building texts and annotations.
3164
3165
        The data you need for creating a full text may be different than the
3166
        data you need to annotate that text. (At a minimum, you need both
3167
        parents to create an annotation, but only need 1 parent to generate the
3168
        fulltext.)
3169
3170
        :return: A list of (revision_id, index_memo) records, suitable for
3224.1.19 by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed.
3171
            passing to read_records_iter to start reading in the raw data fro/
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
3172
            the pack file.
3173
        """
3224.1.10 by John Arbash Meinel
Introduce the heads_provider for reannotate.
3174
        if revision_id in self._annotated_lines:
3175
            # Nothing to do
3176
            return []
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
3177
        pending = set([revision_id])
3178
        records = []
3224.1.19 by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed.
3179
        generation = 0
3224.1.22 by John Arbash Meinel
Cleanup the extra debugging info, and some >80 char lines.
3180
        kept_generation = 0
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
3181
        while pending:
3182
            # get all pending nodes
3224.1.19 by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed.
3183
            generation += 1
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
3184
            this_iteration = pending
3185
            build_details = self._knit._index.get_build_details(this_iteration)
3186
            self._all_build_details.update(build_details)
3187
            # new_nodes = self._knit._index._get_entries(this_iteration)
3188
            pending = set()
3189
            for rev_id, details in build_details.iteritems():
3224.1.14 by John Arbash Meinel
Switch to making content_details opaque, step 1
3190
                (index_memo, compression_parent, parents,
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
3191
                 record_details) = details
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
3192
                self._revision_id_graph[rev_id] = parents
3193
                records.append((rev_id, index_memo))
3224.1.19 by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed.
3194
                # Do we actually need to check _annotated_lines?
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
3195
                pending.update(p for p in parents
3196
                                 if p not in self._all_build_details)
3224.1.19 by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed.
3197
                if compression_parent:
3198
                    self._compression_children.setdefault(compression_parent,
3199
                        []).append(rev_id)
3200
                if parents:
3201
                    for parent in parents:
3224.1.22 by John Arbash Meinel
Cleanup the extra debugging info, and some >80 char lines.
3202
                        self._annotate_children.setdefault(parent,
3203
                            []).append(rev_id)
3204
                    num_gens = generation - kept_generation
3205
                    if ((num_gens >= self._generations_until_keep)
3224.1.19 by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed.
3206
                        and len(parents) > 1):
3224.1.22 by John Arbash Meinel
Cleanup the extra debugging info, and some >80 char lines.
3207
                        kept_generation = generation
3224.1.19 by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed.
3208
                        self._nodes_to_keep_annotations.add(rev_id)
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
3209
3210
            missing_versions = this_iteration.difference(build_details.keys())
3224.1.29 by John Arbash Meinel
Properly handle annotating when ghosts are present.
3211
            self._ghosts.update(missing_versions)
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
3212
            for missing_version in missing_versions:
3213
                # add a key, no parents
3224.1.29 by John Arbash Meinel
Properly handle annotating when ghosts are present.
3214
                self._revision_id_graph[missing_version] = ()
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
3215
                pending.discard(missing_version) # don't look for it
3224.1.29 by John Arbash Meinel
Properly handle annotating when ghosts are present.
3216
        # XXX: This should probably be a real exception, as it is a data
3217
        #      inconsistency
3218
        assert not self._ghosts.intersection(self._compression_children), \
3219
            "We cannot have nodes which have a compression parent of a ghost."
3220
        # Cleanout anything that depends on a ghost so that we don't wait for
3221
        # the ghost to show up
3222
        for node in self._ghosts:
3223
            if node in self._annotate_children:
3224
                # We won't be building this node
3225
                del self._annotate_children[node]
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
3226
        # Generally we will want to read the records in reverse order, because
3227
        # we find the parent nodes after the children
3228
        records.reverse()
3229
        return records
3230
3231
    def _annotate_records(self, records):
3232
        """Build the annotations for the listed records."""
3233
        # We iterate in the order read, rather than a strict order requested
3224.1.22 by John Arbash Meinel
Cleanup the extra debugging info, and some >80 char lines.
3234
        # However, process what we can, and put off to the side things that
3235
        # still need parents, cleaning them up when those parents are
3236
        # processed.
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
3237
        for (rev_id, record,
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
3238
             digest) in self._knit._data.read_records_iter(records):
3239
            if rev_id in self._annotated_lines:
3240
                continue
3241
            parent_ids = self._revision_id_graph[rev_id]
3224.1.29 by John Arbash Meinel
Properly handle annotating when ghosts are present.
3242
            parent_ids = [p for p in parent_ids if p not in self._ghosts]
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
3243
            details = self._all_build_details[rev_id]
3224.1.14 by John Arbash Meinel
Switch to making content_details opaque, step 1
3244
            (index_memo, compression_parent, parents,
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
3245
             record_details) = details
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
3246
            nodes_to_annotate = []
3247
            # TODO: Remove the punning between compression parents, and
3248
            #       parent_ids, we should be able to do this without assuming
3249
            #       the build order
3250
            if len(parent_ids) == 0:
3251
                # There are no parents for this node, so just add it
3252
                # TODO: This probably needs to be decoupled
3224.1.14 by John Arbash Meinel
Switch to making content_details opaque, step 1
3253
                assert compression_parent is None
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
3254
                fulltext_content, delta = self._knit.factory.parse_record(
3255
                    rev_id, record, record_details, None)
3256
                fulltext = self._add_fulltext_content(rev_id, fulltext_content)
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
3257
                nodes_to_annotate.extend(self._add_annotation(rev_id, fulltext,
3258
                    parent_ids, left_matching_blocks=None))
3259
            else:
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
3260
                child = (rev_id, parent_ids, record)
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
3261
                # Check if all the parents are present
3262
                self._check_parents(child, nodes_to_annotate)
3263
            while nodes_to_annotate:
3264
                # Should we use a queue here instead of a stack?
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
3265
                (rev_id, parent_ids, record) = nodes_to_annotate.pop()
3224.1.14 by John Arbash Meinel
Switch to making content_details opaque, step 1
3266
                (index_memo, compression_parent, parents,
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
3267
                 record_details) = self._all_build_details[rev_id]
3224.1.14 by John Arbash Meinel
Switch to making content_details opaque, step 1
3268
                if compression_parent is not None:
3224.1.19 by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed.
3269
                    comp_children = self._compression_children[compression_parent]
3270
                    assert rev_id in comp_children
3271
                    # If there is only 1 child, it is safe to reuse this
3272
                    # content
3273
                    reuse_content = (len(comp_children) == 1
3274
                        and compression_parent not in
3275
                            self._nodes_to_keep_annotations)
3276
                    if reuse_content:
3277
                        # Remove it from the cache since it will be changing
3278
                        parent_fulltext_content = self._fulltext_contents.pop(compression_parent)
3279
                        # Make sure to copy the fulltext since it might be
3280
                        # modified
3224.1.22 by John Arbash Meinel
Cleanup the extra debugging info, and some >80 char lines.
3281
                        parent_fulltext = list(parent_fulltext_content.text())
3224.1.19 by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed.
3282
                    else:
3283
                        parent_fulltext_content = self._fulltext_contents[compression_parent]
3224.1.22 by John Arbash Meinel
Cleanup the extra debugging info, and some >80 char lines.
3284
                        parent_fulltext = parent_fulltext_content.text()
3224.1.19 by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed.
3285
                    comp_children.remove(rev_id)
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
3286
                    fulltext_content, delta = self._knit.factory.parse_record(
3224.1.22 by John Arbash Meinel
Cleanup the extra debugging info, and some >80 char lines.
3287
                        rev_id, record, record_details,
3288
                        parent_fulltext_content,
3224.1.19 by John Arbash Meinel
Work on removing nodes from the working set once they aren't needed.
3289
                        copy_base_content=(not reuse_content))
3224.1.22 by John Arbash Meinel
Cleanup the extra debugging info, and some >80 char lines.
3290
                    fulltext = self._add_fulltext_content(rev_id,
3291
                                                          fulltext_content)
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
3292
                    blocks = KnitContent.get_line_delta_blocks(delta,
3293
                            parent_fulltext, fulltext)
3294
                else:
3295
                    fulltext_content = self._knit.factory.parse_fulltext(
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
3296
                        record, rev_id)
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
3297
                    fulltext = self._add_fulltext_content(rev_id,
3224.1.15 by John Arbash Meinel
Finish removing method and noeol from general knowledge,
3298
                        fulltext_content)
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
3299
                    blocks = None
3300
                nodes_to_annotate.extend(
3301
                    self._add_annotation(rev_id, fulltext, parent_ids,
3302
                                     left_matching_blocks=blocks))
3303
3224.1.10 by John Arbash Meinel
Introduce the heads_provider for reannotate.
3304
    def _get_heads_provider(self):
3305
        """Create a heads provider for resolving ancestry issues."""
3306
        if self._heads_provider is not None:
3307
            return self._heads_provider
3308
        parent_provider = _mod_graph.DictParentsProvider(
3309
            self._revision_id_graph)
3310
        graph_obj = _mod_graph.Graph(parent_provider)
3224.1.20 by John Arbash Meinel
Reduce the number of cache misses by caching known heads answers
3311
        head_cache = _mod_graph.FrozenHeadsCache(graph_obj)
3224.1.10 by John Arbash Meinel
Introduce the heads_provider for reannotate.
3312
        self._heads_provider = head_cache
3313
        return head_cache
3314
3224.1.25 by John Arbash Meinel
Quick change to the _KnitAnnotator api to use .annotate() instead of get_annotated_lines()
3315
    def annotate(self, revision_id):
3224.1.5 by John Arbash Meinel
Start using a helper class for doing the knit-pack annotations.
3316
        """Return the annotated fulltext at the given revision.
3317
3318
        :param revision_id: The revision id for this file
3319
        """
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
3320
        records = self._get_build_graph(revision_id)
3224.1.29 by John Arbash Meinel
Properly handle annotating when ghosts are present.
3321
        if revision_id in self._ghosts:
3322
            raise errors.RevisionNotPresent(revision_id, self._knit)
3224.1.6 by John Arbash Meinel
Refactor the annotation logic into a helper class.
3323
        self._annotate_records(records)
3324
        return self._annotated_lines[revision_id]
3224.1.5 by John Arbash Meinel
Start using a helper class for doing the knit-pack annotations.
3325
3326
2484.1.1 by John Arbash Meinel
Add an initial function to read knit indexes in pyrex.
3327
try:
2484.1.12 by John Arbash Meinel
Switch the layout to use a matching _knit_load_data_py.py and _knit_load_data_c.pyx
3328
    from bzrlib._knit_load_data_c import _load_data_c as _load_data
2484.1.1 by John Arbash Meinel
Add an initial function to read knit indexes in pyrex.
3329
except ImportError:
2484.1.12 by John Arbash Meinel
Switch the layout to use a matching _knit_load_data_py.py and _knit_load_data_c.pyx
3330
    from bzrlib._knit_load_data_py import _load_data_py as _load_data