/brz/remove-bazaar

To get this branch, use:
bzr branch http://gegoxaren.bato24.eu/bzr/brz/remove-bazaar
2484.1.1 by John Arbash Meinel
Add an initial function to read knit indexes in pyrex.
1
# Copyright (C) 2005, 2006, 2007 Canonical Ltd
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
2
#
3
# This program is free software; you can redistribute it and/or modify
4
# it under the terms of the GNU General Public License as published by
5
# the Free Software Foundation; either version 2 of the License, or
6
# (at your option) any later version.
7
#
8
# This program is distributed in the hope that it will be useful,
9
# but WITHOUT ANY WARRANTY; without even the implied warranty of
10
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
11
# GNU General Public License for more details.
12
#
13
# You should have received a copy of the GNU General Public License
14
# along with this program; if not, write to the Free Software
15
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
16
17
"""Knit versionedfile implementation.
18
19
A knit is a versioned file implementation that supports efficient append only
20
updates.
1563.2.6 by Robert Collins
Start check tests for knits (pending), and remove dead code.
21
22
Knit file layout:
23
lifeless: the data file is made up of "delta records".  each delta record has a delta header 
24
that contains; (1) a version id, (2) the size of the delta (in lines), and (3)  the digest of 
25
the -expanded data- (ie, the delta applied to the parent).  the delta also ends with a 
26
end-marker; simply "end VERSION"
27
28
delta can be line or full contents.a
29
... the 8's there are the index number of the annotation.
30
version robertc@robertcollins.net-20051003014215-ee2990904cc4c7ad 7 c7d23b2a5bd6ca00e8e266cec0ec228158ee9f9e
31
59,59,3
32
8
33
8         if ie.executable:
34
8             e.set('executable', 'yes')
35
130,130,2
36
8         if elt.get('executable') == 'yes':
37
8             ie.executable = True
38
end robertc@robertcollins.net-20051003014215-ee2990904cc4c7ad 
39
40
41
whats in an index:
42
09:33 < jrydberg> lifeless: each index is made up of a tuple of; version id, options, position, size, parents
43
09:33 < jrydberg> lifeless: the parents are currently dictionary compressed
44
09:33 < jrydberg> lifeless: (meaning it currently does not support ghosts)
45
09:33 < lifeless> right
46
09:33 < jrydberg> lifeless: the position and size is the range in the data file
47
48
49
so the index sequence is the dictionary compressed sequence number used
50
in the deltas to provide line annotation
51
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
52
"""
53
1563.2.6 by Robert Collins
Start check tests for knits (pending), and remove dead code.
54
# TODOS:
55
# 10:16 < lifeless> make partial index writes safe
56
# 10:16 < lifeless> implement 'knit.check()' like weave.check()
57
# 10:17 < lifeless> record known ghosts so we can detect when they are filled in rather than the current 'reweave 
58
#                    always' approach.
1563.2.11 by Robert Collins
Consolidate reweave and join as we have no separate usage, make reweave tests apply to all versionedfile implementations and deprecate the old reweave apis.
59
# move sha1 out of the content so that join is faster at verifying parents
60
# record content length ?
1563.2.6 by Robert Collins
Start check tests for knits (pending), and remove dead code.
61
                  
62
1594.2.24 by Robert Collins
Make use of the transaction finalisation warning support to implement in-knit caching.
63
from copy import copy
1563.2.11 by Robert Collins
Consolidate reweave and join as we have no separate usage, make reweave tests apply to all versionedfile implementations and deprecate the old reweave apis.
64
from cStringIO import StringIO
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
65
import difflib
1596.2.28 by Robert Collins
more knit profile based tuning.
66
from itertools import izip, chain
1756.2.17 by Aaron Bentley
Fixes suggested by John Meinel
67
import operator
1563.2.6 by Robert Collins
Start check tests for knits (pending), and remove dead code.
68
import os
1628.1.2 by Robert Collins
More knit micro-optimisations.
69
import sys
1756.2.29 by Aaron Bentley
Remove basis knit support
70
import warnings
2762.3.1 by Robert Collins
* The compression used within the bzr repository has changed from zlib
71
from zlib import Z_DEFAULT_COMPRESSION
1594.2.19 by Robert Collins
More coalescing tweaks, and knit feedback.
72
1594.2.17 by Robert Collins
Better readv coalescing, now with test, and progress during knit index reading.
73
import bzrlib
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
74
from bzrlib.lazy_import import lazy_import
75
lazy_import(globals(), """
76
from bzrlib import (
77
    pack,
2745.1.2 by Robert Collins
Ensure mutter_callsite is not directly called on a lazy_load object, to make the stacklevel parameter work correctly.
78
    trace,
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
79
    )
80
""")
1911.2.3 by John Arbash Meinel
Moving everything into a new location so that we can cache more than just revision ids
81
from bzrlib import (
82
    cache_utf8,
2745.1.1 by Robert Collins
Add a number of -Devil checkpoints.
83
    debug,
2520.4.140 by Aaron Bentley
Use matching blocks from mpdiff for knit delta creation
84
    diff,
1911.2.3 by John Arbash Meinel
Moving everything into a new location so that we can cache more than just revision ids
85
    errors,
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
86
    osutils,
2104.4.2 by John Arbash Meinel
Small cleanup and NEWS entry about fixing bug #65714
87
    patiencediff,
2039.1.1 by Aaron Bentley
Clean up progress properly when interrupted during fetch (#54000)
88
    progress,
1551.15.46 by Aaron Bentley
Move plan merge to tree
89
    merge,
2196.2.1 by John Arbash Meinel
Merge Dmitry's optimizations and minimize the actual diff.
90
    ui,
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
91
    )
92
from bzrlib.errors import (
93
    FileExists,
94
    NoSuchFile,
95
    KnitError,
96
    InvalidRevisionId,
97
    KnitCorrupt,
98
    KnitHeaderError,
99
    RevisionNotPresent,
100
    RevisionAlreadyPresent,
101
    )
1773.4.1 by Martin Pool
Add pyflakes makefile target; fix many warnings
102
from bzrlib.tuned_gzip import GzipFile
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
103
from bzrlib.osutils import (
104
    contains_whitespace,
105
    contains_linebreaks,
106
    sha_strings,
107
    )
1756.2.29 by Aaron Bentley
Remove basis knit support
108
from bzrlib.symbol_versioning import DEPRECATED_PARAMETER, deprecated_passed
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
109
from bzrlib.tsort import topo_sort
2094.3.5 by John Arbash Meinel
Fix imports to ensure modules are loaded before they are used
110
import bzrlib.ui
1684.3.3 by Robert Collins
Add a special cased weaves to knit converter.
111
import bzrlib.weave
1911.2.1 by John Arbash Meinel
Cache encode/decode operations, saves memory and time. Especially when committing a new kernel tree with 7.7M new lines to annotate
112
from bzrlib.versionedfile import VersionedFile, InterVersionedFile
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
113
114
115
# TODO: Split out code specific to this format into an associated object.
116
117
# TODO: Can we put in some kind of value to check that the index and data
118
# files belong together?
119
1759.2.1 by Jelmer Vernooij
Fix some types (found using aspell).
120
# TODO: accommodate binaries, perhaps by storing a byte count
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
121
122
# TODO: function to check whole file
123
124
# TODO: atomically append data, then measure backwards from the cursor
125
# position after writing to work out where it was located.  we may need to
126
# bypass python file buffering.
127
128
DATA_SUFFIX = '.knit'
129
INDEX_SUFFIX = '.kndx'
130
131
132
class KnitContent(object):
133
    """Content of a knit version to which deltas can be applied."""
134
135
    def __init__(self, lines):
136
        self._lines = lines
137
138
    def annotate_iter(self):
139
        """Yield tuples of (origin, text) for each content line."""
2151.1.1 by John Arbash Meinel
(Dmitry Vasiliev) Tune KnitContent and add tests
140
        return iter(self._lines)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
141
142
    def annotate(self):
143
        """Return a list of (origin, text) tuples."""
144
        return list(self.annotate_iter())
145
146
    def line_delta_iter(self, new_lines):
1596.2.32 by Robert Collins
Reduce re-extraction of texts during weave to knit joins by providing a memoisation facility.
147
        """Generate line-based delta from this content to new_lines."""
2151.1.1 by John Arbash Meinel
(Dmitry Vasiliev) Tune KnitContent and add tests
148
        new_texts = new_lines.text()
149
        old_texts = self.text()
1711.2.11 by John Arbash Meinel
Rename patiencediff.SequenceMatcher => PatienceSequenceMatcher and knit.SequenceMatcher => KnitSequenceMatcher
150
        s = KnitSequenceMatcher(None, old_texts, new_texts)
2151.1.1 by John Arbash Meinel
(Dmitry Vasiliev) Tune KnitContent and add tests
151
        for tag, i1, i2, j1, j2 in s.get_opcodes():
152
            if tag == 'equal':
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
153
                continue
2151.1.1 by John Arbash Meinel
(Dmitry Vasiliev) Tune KnitContent and add tests
154
            # ofrom, oto, length, data
155
            yield i1, i2, j2 - j1, new_lines._lines[j1:j2]
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
156
157
    def line_delta(self, new_lines):
158
        return list(self.line_delta_iter(new_lines))
159
160
    def text(self):
161
        return [text for origin, text in self._lines]
162
1756.3.7 by Aaron Bentley
Avoid re-parsing texts version components
163
    def copy(self):
164
        return KnitContent(self._lines[:])
165
2520.4.41 by Aaron Bentley
Accelerate mpdiff generation
166
    @staticmethod
2520.4.48 by Aaron Bentley
Support getting blocks from knit deltas with no final EOL
167
    def get_line_delta_blocks(knit_delta, source, target):
2520.4.41 by Aaron Bentley
Accelerate mpdiff generation
168
        """Extract SequenceMatcher.get_matching_blocks() from a knit delta"""
2520.4.48 by Aaron Bentley
Support getting blocks from knit deltas with no final EOL
169
        target_len = len(target)
2520.4.41 by Aaron Bentley
Accelerate mpdiff generation
170
        s_pos = 0
171
        t_pos = 0
172
        for s_begin, s_end, t_len, new_text in knit_delta:
2520.4.47 by Aaron Bentley
Fix get_line_delta_blocks with eol
173
            true_n = s_begin - s_pos
174
            n = true_n
2520.4.41 by Aaron Bentley
Accelerate mpdiff generation
175
            if n > 0:
2520.4.48 by Aaron Bentley
Support getting blocks from knit deltas with no final EOL
176
                # knit deltas do not provide reliable info about whether the
177
                # last line of a file matches, due to eol handling.
178
                if source[s_pos + n -1] != target[t_pos + n -1]:
2520.4.47 by Aaron Bentley
Fix get_line_delta_blocks with eol
179
                    n-=1
180
                if n > 0:
181
                    yield s_pos, t_pos, n
182
            t_pos += t_len + true_n
2520.4.41 by Aaron Bentley
Accelerate mpdiff generation
183
            s_pos = s_end
2520.4.48 by Aaron Bentley
Support getting blocks from knit deltas with no final EOL
184
        n = target_len - t_pos
185
        if n > 0:
186
            if source[s_pos + n -1] != target[t_pos + n -1]:
187
                n-=1
188
            if n > 0:
189
                yield s_pos, t_pos, n
2520.4.41 by Aaron Bentley
Accelerate mpdiff generation
190
        yield s_pos + (target_len - t_pos), target_len, 0
191
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
192
193
class _KnitFactory(object):
194
    """Base factory for creating content objects."""
195
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
196
    def make(self, lines, version_id):
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
197
        num_lines = len(lines)
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
198
        return KnitContent(zip([version_id] * num_lines, lines))
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
199
200
201
class KnitAnnotateFactory(_KnitFactory):
202
    """Factory for creating annotated Content objects."""
203
204
    annotated = True
205
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
206
    def parse_fulltext(self, content, version_id):
1596.2.7 by Robert Collins
Remove the requirement for reannotation in knit joins.
207
        """Convert fulltext to internal representation
208
209
        fulltext content is of the format
210
        revid(utf8) plaintext\n
211
        internal representation is of the format:
212
        (revid, plaintext)
213
        """
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
214
        # TODO: jam 20070209 The tests expect this to be returned as tuples,
215
        #       but the code itself doesn't really depend on that.
216
        #       Figure out a way to not require the overhead of turning the
217
        #       list back into tuples.
218
        lines = [tuple(line.split(' ', 1)) for line in content]
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
219
        return KnitContent(lines)
220
221
    def parse_line_delta_iter(self, lines):
2163.1.2 by John Arbash Meinel
Don't modify the list during parse_line_delta
222
        return iter(self.parse_line_delta(lines))
1628.1.2 by Robert Collins
More knit micro-optimisations.
223
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
224
    def parse_line_delta(self, lines, version_id):
1596.2.7 by Robert Collins
Remove the requirement for reannotation in knit joins.
225
        """Convert a line based delta into internal representation.
226
227
        line delta is in the form of:
228
        intstart intend intcount
229
        1..count lines:
230
        revid(utf8) newline\n
1759.2.1 by Jelmer Vernooij
Fix some types (found using aspell).
231
        internal representation is
1596.2.7 by Robert Collins
Remove the requirement for reannotation in knit joins.
232
        (start, end, count, [1..count tuples (revid, newline)])
233
        """
1628.1.2 by Robert Collins
More knit micro-optimisations.
234
        result = []
235
        lines = iter(lines)
236
        next = lines.next
2249.5.1 by John Arbash Meinel
Leave revision-ids in utf-8 when reading.
237
2249.5.15 by John Arbash Meinel
remove get_cached_utf8 checks which were slowing things down.
238
        cache = {}
239
        def cache_and_return(line):
240
            origin, text = line.split(' ', 1)
241
            return cache.setdefault(origin, origin), text
242
1628.1.2 by Robert Collins
More knit micro-optimisations.
243
        # walk through the lines parsing.
244
        for header in lines:
245
            start, end, count = [int(n) for n in header.split(',')]
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
246
            contents = [tuple(next().split(' ', 1)) for i in xrange(count)]
1628.1.2 by Robert Collins
More knit micro-optimisations.
247
            result.append((start, end, count, contents))
248
        return result
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
249
2163.2.2 by John Arbash Meinel
Don't deal with annotations when we don't care about them. Saves another 300+ms
250
    def get_fulltext_content(self, lines):
251
        """Extract just the content lines from a fulltext."""
252
        return (line.split(' ', 1)[1] for line in lines)
253
254
    def get_linedelta_content(self, lines):
255
        """Extract just the content from a line delta.
256
257
        This doesn't return all of the extra information stored in a delta.
258
        Only the actual content lines.
259
        """
260
        lines = iter(lines)
261
        next = lines.next
262
        for header in lines:
263
            header = header.split(',')
264
            count = int(header[2])
265
            for i in xrange(count):
266
                origin, text = next().split(' ', 1)
267
                yield text
268
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
269
    def lower_fulltext(self, content):
1596.2.7 by Robert Collins
Remove the requirement for reannotation in knit joins.
270
        """convert a fulltext content record into a serializable form.
271
272
        see parse_fulltext which this inverts.
273
        """
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
274
        # TODO: jam 20070209 We only do the caching thing to make sure that
275
        #       the origin is a valid utf-8 line, eventually we could remove it
2249.5.15 by John Arbash Meinel
remove get_cached_utf8 checks which were slowing things down.
276
        return ['%s %s' % (o, t) for o, t in content._lines]
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
277
278
    def lower_line_delta(self, delta):
1596.2.7 by Robert Collins
Remove the requirement for reannotation in knit joins.
279
        """convert a delta into a serializable form.
280
1628.1.2 by Robert Collins
More knit micro-optimisations.
281
        See parse_line_delta which this inverts.
1596.2.7 by Robert Collins
Remove the requirement for reannotation in knit joins.
282
        """
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
283
        # TODO: jam 20070209 We only do the caching thing to make sure that
284
        #       the origin is a valid utf-8 line, eventually we could remove it
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
285
        out = []
286
        for start, end, c, lines in delta:
287
            out.append('%d,%d,%d\n' % (start, end, c))
2249.5.15 by John Arbash Meinel
remove get_cached_utf8 checks which were slowing things down.
288
            out.extend(origin + ' ' + text
1911.2.1 by John Arbash Meinel
Cache encode/decode operations, saves memory and time. Especially when committing a new kernel tree with 7.7M new lines to annotate
289
                       for origin, text in lines)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
290
        return out
291
292
293
class KnitPlainFactory(_KnitFactory):
294
    """Factory for creating plain Content objects."""
295
296
    annotated = False
297
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
298
    def parse_fulltext(self, content, version_id):
1596.2.7 by Robert Collins
Remove the requirement for reannotation in knit joins.
299
        """This parses an unannotated fulltext.
300
301
        Note that this is not a noop - the internal representation
302
        has (versionid, line) - its just a constant versionid.
303
        """
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
304
        return self.make(content, version_id)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
305
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
306
    def parse_line_delta_iter(self, lines, version_id):
2163.1.2 by John Arbash Meinel
Don't modify the list during parse_line_delta
307
        cur = 0
308
        num_lines = len(lines)
309
        while cur < num_lines:
310
            header = lines[cur]
311
            cur += 1
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
312
            start, end, c = [int(n) for n in header.split(',')]
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
313
            yield start, end, c, zip([version_id] * c, lines[cur:cur+c])
2163.1.2 by John Arbash Meinel
Don't modify the list during parse_line_delta
314
            cur += c
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
315
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
316
    def parse_line_delta(self, lines, version_id):
317
        return list(self.parse_line_delta_iter(lines, version_id))
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
318
2163.2.2 by John Arbash Meinel
Don't deal with annotations when we don't care about them. Saves another 300+ms
319
    def get_fulltext_content(self, lines):
320
        """Extract just the content lines from a fulltext."""
321
        return iter(lines)
322
323
    def get_linedelta_content(self, lines):
324
        """Extract just the content from a line delta.
325
326
        This doesn't return all of the extra information stored in a delta.
327
        Only the actual content lines.
328
        """
329
        lines = iter(lines)
330
        next = lines.next
331
        for header in lines:
332
            header = header.split(',')
333
            count = int(header[2])
334
            for i in xrange(count):
335
                yield next()
336
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
337
    def lower_fulltext(self, content):
338
        return content.text()
339
340
    def lower_line_delta(self, delta):
341
        out = []
342
        for start, end, c, lines in delta:
343
            out.append('%d,%d,%d\n' % (start, end, c))
344
            out.extend([text for origin, text in lines])
345
        return out
346
347
348
def make_empty_knit(transport, relpath):
349
    """Construct a empty knit at the specified location."""
1563.2.5 by Robert Collins
Remove unused transaction references from knit.py and the versionedfile interface.
350
    k = KnitVersionedFile(transport, relpath, 'w', KnitPlainFactory)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
351
352
353
class KnitVersionedFile(VersionedFile):
354
    """Weave-like structure with faster random access.
355
356
    A knit stores a number of texts and a summary of the relationships
357
    between them.  Texts are identified by a string version-id.  Texts
358
    are normally stored and retrieved as a series of lines, but can
359
    also be passed as single strings.
360
361
    Lines are stored with the trailing newline (if any) included, to
362
    avoid special cases for files with no final newline.  Lines are
363
    composed of 8-bit characters, not unicode.  The combination of
364
    these approaches should mean any 'binary' file can be safely
365
    stored and retrieved.
366
    """
367
1946.2.12 by John Arbash Meinel
Add ability to pass a directory mode to non_atomic_put
368
    def __init__(self, relpath, transport, file_mode=None, access_mode=None,
1756.2.29 by Aaron Bentley
Remove basis knit support
369
                 factory=None, basis_knit=DEPRECATED_PARAMETER, delta=True,
1946.2.12 by John Arbash Meinel
Add ability to pass a directory mode to non_atomic_put
370
                 create=False, create_parent_dir=False, delay_create=False,
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
371
                 dir_mode=None, index=None, access_method=None):
1563.2.25 by Robert Collins
Merge in upstream.
372
        """Construct a knit at location specified by relpath.
373
        
374
        :param create: If not True, only open an existing knit.
1946.2.1 by John Arbash Meinel
2 changes to knits. Delay creating the .knit or .kndx file until we have actually tried to write data. Because of this, we must allow the Knit to create the prefix directories
375
        :param create_parent_dir: If True, create the parent directory if 
376
            creating the file fails. (This is used for stores with 
377
            hash-prefixes that may not exist yet)
378
        :param delay_create: The calling code is aware that the knit won't 
379
            actually be created until the first data is stored.
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
380
        :param index: An index to use for the knit.
1563.2.25 by Robert Collins
Merge in upstream.
381
        """
1756.2.29 by Aaron Bentley
Remove basis knit support
382
        if deprecated_passed(basis_knit):
383
            warnings.warn("KnitVersionedFile.__(): The basis_knit parameter is"
384
                 " deprecated as of bzr 0.9.",
385
                 DeprecationWarning, stacklevel=2)
1563.2.16 by Robert Collins
Change WeaveStore into VersionedFileStore and make its versoined file class parameterisable.
386
        if access_mode is None:
387
            access_mode = 'w'
1594.2.23 by Robert Collins
Test versioned file storage handling of clean/dirty status for accessed versioned files.
388
        super(KnitVersionedFile, self).__init__(access_mode)
1563.2.16 by Robert Collins
Change WeaveStore into VersionedFileStore and make its versoined file class parameterisable.
389
        assert access_mode in ('r', 'w'), "invalid mode specified %r" % access_mode
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
390
        self.transport = transport
391
        self.filename = relpath
1563.2.16 by Robert Collins
Change WeaveStore into VersionedFileStore and make its versoined file class parameterisable.
392
        self.factory = factory or KnitAnnotateFactory()
393
        self.writable = (access_mode == 'w')
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
394
        self.delta = delta
395
2147.1.1 by John Arbash Meinel
Factor the common knit delta selection into a helper func, and allow the fulltext to be chosen based on cumulative delta size
396
        self._max_delta_chain = 200
397
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
398
        if index is None:
399
            self._index = _KnitIndex(transport, relpath + INDEX_SUFFIX,
400
                access_mode, create=create, file_mode=file_mode,
401
                create_parent_dir=create_parent_dir, delay_create=delay_create,
402
                dir_mode=dir_mode)
403
        else:
404
            self._index = index
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
405
        if access_method is None:
406
            _access = _KnitAccess(transport, relpath + DATA_SUFFIX, file_mode, dir_mode,
407
                ((create and not len(self)) and delay_create), create_parent_dir)
408
        else:
409
            _access = access_method
410
        if create and not len(self) and not delay_create:
411
            _access.create()
412
        self._data = _KnitData(_access)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
413
1704.2.10 by Martin Pool
Add KnitVersionedFile.__repr__ method
414
    def __repr__(self):
415
        return '%s(%s)' % (self.__class__.__name__, 
416
                           self.transport.abspath(self.filename))
417
    
2147.1.1 by John Arbash Meinel
Factor the common knit delta selection into a helper func, and allow the fulltext to be chosen based on cumulative delta size
418
    def _check_should_delta(self, first_parents):
419
        """Iterate back through the parent listing, looking for a fulltext.
420
421
        This is used when we want to decide whether to add a delta or a new
422
        fulltext. It searches for _max_delta_chain parents. When it finds a
423
        fulltext parent, it sees if the total size of the deltas leading up to
424
        it is large enough to indicate that we want a new full text anyway.
425
426
        Return True if we should create a new delta, False if we should use a
427
        full text.
428
        """
429
        delta_size = 0
430
        fulltext_size = None
431
        delta_parents = first_parents
2147.1.2 by John Arbash Meinel
Simplify the knit max-chain detection code.
432
        for count in xrange(self._max_delta_chain):
2147.1.1 by John Arbash Meinel
Factor the common knit delta selection into a helper func, and allow the fulltext to be chosen based on cumulative delta size
433
            parent = delta_parents[0]
434
            method = self._index.get_method(parent)
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
435
            index, pos, size = self._index.get_position(parent)
2147.1.1 by John Arbash Meinel
Factor the common knit delta selection into a helper func, and allow the fulltext to be chosen based on cumulative delta size
436
            if method == 'fulltext':
437
                fulltext_size = size
438
                break
439
            delta_size += size
440
            delta_parents = self._index.get_parents(parent)
2147.1.2 by John Arbash Meinel
Simplify the knit max-chain detection code.
441
        else:
442
            # We couldn't find a fulltext, so we must create a new one
2147.1.1 by John Arbash Meinel
Factor the common knit delta selection into a helper func, and allow the fulltext to be chosen based on cumulative delta size
443
            return False
2147.1.2 by John Arbash Meinel
Simplify the knit max-chain detection code.
444
445
        return fulltext_size > delta_size
2147.1.1 by John Arbash Meinel
Factor the common knit delta selection into a helper func, and allow the fulltext to be chosen based on cumulative delta size
446
1596.2.37 by Robert Collins
Switch to delta based content copying in the generic versioned file copier.
447
    def _add_delta(self, version_id, parents, delta_parent, sha1, noeol, delta):
448
        """See VersionedFile._add_delta()."""
449
        self._check_add(version_id, []) # should we check the lines ?
450
        self._check_versions_present(parents)
451
        present_parents = []
452
        ghosts = []
453
        parent_texts = {}
454
        for parent in parents:
455
            if not self.has_version(parent):
456
                ghosts.append(parent)
457
            else:
458
                present_parents.append(parent)
459
460
        if delta_parent is None:
461
            # reconstitute as full text.
462
            assert len(delta) == 1 or len(delta) == 0
463
            if len(delta):
464
                assert delta[0][0] == 0
1596.2.38 by Robert Collins
rollback from using deltas to using fulltexts - deltas need more work to be ready.
465
                assert delta[0][1] == 0, delta[0][1]
1596.2.37 by Robert Collins
Switch to delta based content copying in the generic versioned file copier.
466
            return super(KnitVersionedFile, self)._add_delta(version_id,
467
                                                             parents,
468
                                                             delta_parent,
469
                                                             sha1,
470
                                                             noeol,
471
                                                             delta)
472
473
        digest = sha1
474
475
        options = []
476
        if noeol:
477
            options.append('no-eol')
478
479
        if delta_parent is not None:
480
            # determine the current delta chain length.
481
            # To speed the extract of texts the delta chain is limited
482
            # to a fixed number of deltas.  This should minimize both
483
            # I/O and the time spend applying deltas.
2147.1.1 by John Arbash Meinel
Factor the common knit delta selection into a helper func, and allow the fulltext to be chosen based on cumulative delta size
484
            # The window was changed to a maximum of 200 deltas, but also added
485
            # was a check that the total compressed size of the deltas is
486
            # smaller than the compressed size of the fulltext.
487
            if not self._check_should_delta([delta_parent]):
488
                # We don't want a delta here, just do a normal insertion.
1596.2.37 by Robert Collins
Switch to delta based content copying in the generic versioned file copier.
489
                return super(KnitVersionedFile, self)._add_delta(version_id,
490
                                                                 parents,
491
                                                                 delta_parent,
492
                                                                 sha1,
493
                                                                 noeol,
494
                                                                 delta)
495
496
        options.append('line-delta')
497
        store_lines = self.factory.lower_line_delta(delta)
498
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
499
        access_memo = self._data.add_record(version_id, digest, store_lines)
500
        self._index.add_version(version_id, options, access_memo, parents)
1596.2.37 by Robert Collins
Switch to delta based content copying in the generic versioned file copier.
501
1692.2.1 by Robert Collins
Fix knit based push to only perform 2 appends to the target, rather that 2*new-versions.
502
    def _add_raw_records(self, records, data):
503
        """Add all the records 'records' with data pre-joined in 'data'.
504
505
        :param records: A list of tuples(version_id, options, parents, size).
506
        :param data: The data for the records. When it is written, the records
507
                     are adjusted to have pos pointing into data by the sum of
1759.2.1 by Jelmer Vernooij
Fix some types (found using aspell).
508
                     the preceding records sizes.
1692.2.1 by Robert Collins
Fix knit based push to only perform 2 appends to the target, rather that 2*new-versions.
509
        """
510
        # write all the data
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
511
        raw_record_sizes = [record[3] for record in records]
512
        positions = self._data.add_raw_records(raw_record_sizes, data)
1863.1.1 by John Arbash Meinel
Allow Versioned files to do caching if explicitly asked, and implement for Knit
513
        offset = 0
1692.2.1 by Robert Collins
Fix knit based push to only perform 2 appends to the target, rather that 2*new-versions.
514
        index_entries = []
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
515
        for (version_id, options, parents, size), access_memo in zip(
516
            records, positions):
517
            index_entries.append((version_id, options, access_memo, parents))
1863.1.1 by John Arbash Meinel
Allow Versioned files to do caching if explicitly asked, and implement for Knit
518
            if self._data._do_cache:
519
                self._data._cache[version_id] = data[offset:offset+size]
520
            offset += size
1692.2.1 by Robert Collins
Fix knit based push to only perform 2 appends to the target, rather that 2*new-versions.
521
        self._index.add_versions(index_entries)
522
1863.1.1 by John Arbash Meinel
Allow Versioned files to do caching if explicitly asked, and implement for Knit
523
    def enable_cache(self):
524
        """Start caching data for this knit"""
525
        self._data.enable_cache()
526
1594.2.24 by Robert Collins
Make use of the transaction finalisation warning support to implement in-knit caching.
527
    def clear_cache(self):
528
        """Clear the data cache only."""
529
        self._data.clear_cache()
530
1563.2.15 by Robert Collins
remove the weavestore assumptions about the number and nature of files it manages.
531
    def copy_to(self, name, transport):
532
        """See VersionedFile.copy_to()."""
533
        # copy the current index to a temp index to avoid racing with local
534
        # writes
1955.3.30 by John Arbash Meinel
fix small bug
535
        transport.put_file_non_atomic(name + INDEX_SUFFIX + '.tmp',
1955.3.24 by John Arbash Meinel
Update Knit to use the new non_atomic_foo functions
536
                self.transport.get(self._index._filename))
1563.2.15 by Robert Collins
remove the weavestore assumptions about the number and nature of files it manages.
537
        # copy the data file
1711.7.25 by John Arbash Meinel
try/finally to close files, _KnitData was keeping a handle to a file it never used again, and using transport.rename() when it wanted transport.move()
538
        f = self._data._open_file()
539
        try:
1955.3.8 by John Arbash Meinel
avoid some deprecation warnings in other parts of the code
540
            transport.put_file(name + DATA_SUFFIX, f)
1711.7.25 by John Arbash Meinel
try/finally to close files, _KnitData was keeping a handle to a file it never used again, and using transport.rename() when it wanted transport.move()
541
        finally:
542
            f.close()
543
        # move the copied index into place
544
        transport.move(name + INDEX_SUFFIX + '.tmp', name + INDEX_SUFFIX)
1563.2.15 by Robert Collins
remove the weavestore assumptions about the number and nature of files it manages.
545
1563.2.13 by Robert Collins
InterVersionedFile implemented.
546
    def create_empty(self, name, transport, mode=None):
1955.3.8 by John Arbash Meinel
avoid some deprecation warnings in other parts of the code
547
        return KnitVersionedFile(name, transport, factory=self.factory,
548
                                 delta=self.delta, create=True)
1563.2.15 by Robert Collins
remove the weavestore assumptions about the number and nature of files it manages.
549
    
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
550
    def _fix_parents(self, version_id, new_parents):
1594.2.7 by Robert Collins
Add versionedfile.fix_parents api for correcting data post hoc.
551
        """Fix the parents list for version.
552
        
553
        This is done by appending a new version to the index
554
        with identical data except for the parents list.
555
        the parents list must be a superset of the current
556
        list.
557
        """
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
558
        current_values = self._index._cache[version_id]
1594.2.7 by Robert Collins
Add versionedfile.fix_parents api for correcting data post hoc.
559
        assert set(current_values[4]).difference(set(new_parents)) == set()
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
560
        self._index.add_version(version_id,
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
561
                                current_values[1],
562
                                (None, current_values[2], current_values[3]),
1594.2.7 by Robert Collins
Add versionedfile.fix_parents api for correcting data post hoc.
563
                                new_parents)
564
2520.4.47 by Aaron Bentley
Fix get_line_delta_blocks with eol
565
    def _extract_blocks(self, version_id, source, target):
2520.4.41 by Aaron Bentley
Accelerate mpdiff generation
566
        if self._index.get_method(version_id) != 'line-delta':
567
            return None
568
        parent, sha1, noeol, delta = self.get_delta(version_id)
2520.4.47 by Aaron Bentley
Fix get_line_delta_blocks with eol
569
        return KnitContent.get_line_delta_blocks(delta, source, target)
2520.4.41 by Aaron Bentley
Accelerate mpdiff generation
570
1596.2.36 by Robert Collins
add a get_delta api to versioned_file.
571
    def get_delta(self, version_id):
572
        """Get a delta for constructing version from some other version."""
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
573
        version_id = osutils.safe_revision_id(version_id)
2229.2.3 by Aaron Bentley
change reserved_id to is_reserved_id, add check_not_reserved for DRY
574
        self.check_not_reserved_id(version_id)
1596.2.36 by Robert Collins
add a get_delta api to versioned_file.
575
        if not self.has_version(version_id):
576
            raise RevisionNotPresent(version_id, self.filename)
577
        
578
        parents = self.get_parents(version_id)
579
        if len(parents):
580
            parent = parents[0]
581
        else:
582
            parent = None
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
583
        index_memo = self._index.get_position(version_id)
584
        data, sha1 = self._data.read_records(((version_id, index_memo),))[version_id]
1596.2.37 by Robert Collins
Switch to delta based content copying in the generic versioned file copier.
585
        noeol = 'no-eol' in self._index.get_options(version_id)
1596.2.36 by Robert Collins
add a get_delta api to versioned_file.
586
        if 'fulltext' == self._index.get_method(version_id):
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
587
            new_content = self.factory.parse_fulltext(data, version_id)
1596.2.36 by Robert Collins
add a get_delta api to versioned_file.
588
            if parent is not None:
589
                reference_content = self._get_content(parent)
590
                old_texts = reference_content.text()
591
            else:
592
                old_texts = []
593
            new_texts = new_content.text()
1711.2.11 by John Arbash Meinel
Rename patiencediff.SequenceMatcher => PatienceSequenceMatcher and knit.SequenceMatcher => KnitSequenceMatcher
594
            delta_seq = KnitSequenceMatcher(None, old_texts, new_texts)
1596.2.37 by Robert Collins
Switch to delta based content copying in the generic versioned file copier.
595
            return parent, sha1, noeol, self._make_line_delta(delta_seq, new_content)
1596.2.36 by Robert Collins
add a get_delta api to versioned_file.
596
        else:
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
597
            delta = self.factory.parse_line_delta(data, version_id)
1596.2.37 by Robert Collins
Switch to delta based content copying in the generic versioned file copier.
598
            return parent, sha1, noeol, delta
1596.2.36 by Robert Collins
add a get_delta api to versioned_file.
599
        
1594.2.8 by Robert Collins
add ghost aware apis to knits.
600
    def get_graph_with_ghosts(self):
601
        """See VersionedFile.get_graph_with_ghosts()."""
602
        graph_items = self._index.get_graph()
603
        return dict(graph_items)
604
1666.1.6 by Robert Collins
Make knit the default format.
605
    def get_sha1(self, version_id):
2520.4.89 by Aaron Bentley
Add get_sha1s to weaves
606
        return self.get_sha1s([version_id])[0]
2520.4.88 by Aaron Bentley
Retrieve all sha1s at once (ftw)
607
608
    def get_sha1s(self, version_ids):
1666.1.6 by Robert Collins
Make knit the default format.
609
        """See VersionedFile.get_sha1()."""
2520.4.88 by Aaron Bentley
Retrieve all sha1s at once (ftw)
610
        version_ids = [osutils.safe_revision_id(v) for v in version_ids]
611
        record_map = self._get_record_map(version_ids)
612
        # record entry 2 is the 'digest'.
613
        return [record_map[v][2] for v in version_ids]
1666.1.6 by Robert Collins
Make knit the default format.
614
1563.2.15 by Robert Collins
remove the weavestore assumptions about the number and nature of files it manages.
615
    @staticmethod
616
    def get_suffixes():
617
        """See VersionedFile.get_suffixes()."""
618
        return [DATA_SUFFIX, INDEX_SUFFIX]
1563.2.13 by Robert Collins
InterVersionedFile implemented.
619
1594.2.8 by Robert Collins
add ghost aware apis to knits.
620
    def has_ghost(self, version_id):
621
        """True if there is a ghost reference in the file to version_id."""
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
622
        version_id = osutils.safe_revision_id(version_id)
1594.2.8 by Robert Collins
add ghost aware apis to knits.
623
        # maybe we have it
624
        if self.has_version(version_id):
625
            return False
1759.2.2 by Jelmer Vernooij
Revert some of my spelling fixes and fix some typos after review by Aaron.
626
        # optimisable if needed by memoising the _ghosts set.
1594.2.8 by Robert Collins
add ghost aware apis to knits.
627
        items = self._index.get_graph()
628
        for node, parents in items:
629
            for parent in parents:
630
                if parent not in self._index._cache:
631
                    if parent == version_id:
632
                        return True
633
        return False
634
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
635
    def versions(self):
636
        """See VersionedFile.versions."""
2745.1.1 by Robert Collins
Add a number of -Devil checkpoints.
637
        if 'evil' in debug.debug_flags:
2745.1.2 by Robert Collins
Ensure mutter_callsite is not directly called on a lazy_load object, to make the stacklevel parameter work correctly.
638
            trace.mutter_callsite(2, "versions scales with size of history")
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
639
        return self._index.get_versions()
640
641
    def has_version(self, version_id):
642
        """See VersionedFile.has_version."""
2745.1.1 by Robert Collins
Add a number of -Devil checkpoints.
643
        if 'evil' in debug.debug_flags:
2745.1.2 by Robert Collins
Ensure mutter_callsite is not directly called on a lazy_load object, to make the stacklevel parameter work correctly.
644
            trace.mutter_callsite(2, "has_version is a LBYL scenario")
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
645
        version_id = osutils.safe_revision_id(version_id)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
646
        return self._index.has_version(version_id)
647
648
    __contains__ = has_version
649
1596.2.34 by Robert Collins
Optimise knit add to only diff once per parent, not once per parent + once for the delta generation.
650
    def _merge_annotations(self, content, parents, parent_texts={},
2520.4.140 by Aaron Bentley
Use matching blocks from mpdiff for knit delta creation
651
                           delta=None, annotated=None,
652
                           left_matching_blocks=None):
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
653
        """Merge annotations for content.  This is done by comparing
1596.2.27 by Robert Collins
Note potential improvements in knit adds.
654
        the annotations based on changed to the text.
655
        """
2520.4.146 by Aaron Bentley
Avoid get_matching_blocks for un-annotated text
656
        if left_matching_blocks is not None:
657
            delta_seq = diff._PrematchedMatcher(left_matching_blocks)
658
        else:
659
            delta_seq = None
1596.2.34 by Robert Collins
Optimise knit add to only diff once per parent, not once per parent + once for the delta generation.
660
        if annotated:
1596.2.36 by Robert Collins
add a get_delta api to versioned_file.
661
            for parent_id in parents:
1596.2.34 by Robert Collins
Optimise knit add to only diff once per parent, not once per parent + once for the delta generation.
662
                merge_content = self._get_content(parent_id, parent_texts)
2520.4.146 by Aaron Bentley
Avoid get_matching_blocks for un-annotated text
663
                if (parent_id == parents[0] and delta_seq is not None):
664
                    seq = delta_seq
2520.4.140 by Aaron Bentley
Use matching blocks from mpdiff for knit delta creation
665
                else:
666
                    seq = patiencediff.PatienceSequenceMatcher(
667
                        None, merge_content.text(), content.text())
1596.2.34 by Robert Collins
Optimise knit add to only diff once per parent, not once per parent + once for the delta generation.
668
                for i, j, n in seq.get_matching_blocks():
669
                    if n == 0:
670
                        continue
2520.4.146 by Aaron Bentley
Avoid get_matching_blocks for un-annotated text
671
                    # this appears to copy (origin, text) pairs across to the
672
                    # new content for any line that matches the last-checked
673
                    # parent.
1596.2.34 by Robert Collins
Optimise knit add to only diff once per parent, not once per parent + once for the delta generation.
674
                    content._lines[j:j+n] = merge_content._lines[i:i+n]
1596.2.36 by Robert Collins
add a get_delta api to versioned_file.
675
        if delta:
2520.4.146 by Aaron Bentley
Avoid get_matching_blocks for un-annotated text
676
            if delta_seq is None:
1596.2.36 by Robert Collins
add a get_delta api to versioned_file.
677
                reference_content = self._get_content(parents[0], parent_texts)
678
                new_texts = content.text()
679
                old_texts = reference_content.text()
2104.4.2 by John Arbash Meinel
Small cleanup and NEWS entry about fixing bug #65714
680
                delta_seq = patiencediff.PatienceSequenceMatcher(
2100.2.1 by wang
Replace python's difflib by patiencediff because the worst case
681
                                                 None, old_texts, new_texts)
1596.2.36 by Robert Collins
add a get_delta api to versioned_file.
682
            return self._make_line_delta(delta_seq, content)
683
684
    def _make_line_delta(self, delta_seq, new_content):
685
        """Generate a line delta from delta_seq and new_content."""
686
        diff_hunks = []
687
        for op in delta_seq.get_opcodes():
688
            if op[0] == 'equal':
689
                continue
690
            diff_hunks.append((op[1], op[2], op[4]-op[3], new_content._lines[op[3]:op[4]]))
1596.2.34 by Robert Collins
Optimise knit add to only diff once per parent, not once per parent + once for the delta generation.
691
        return diff_hunks
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
692
1756.3.17 by Aaron Bentley
Combine get_components_positions with get_components_versions
693
    def _get_components_positions(self, version_ids):
1756.3.19 by Aaron Bentley
Documentation and cleanups
694
        """Produce a map of position data for the components of versions.
695
1756.3.22 by Aaron Bentley
Tweaks from review
696
        This data is intended to be used for retrieving the knit records.
1756.3.19 by Aaron Bentley
Documentation and cleanups
697
698
        A dict of version_id to (method, data_pos, data_size, next) is
699
        returned.
700
        method is the way referenced data should be applied.
701
        data_pos is the position of the data in the knit.
702
        data_size is the size of the data in the knit.
703
        next is the build-parent of the version, or None for fulltexts.
704
        """
1756.3.9 by Aaron Bentley
More optimization refactoring
705
        component_data = {}
706
        for version_id in version_ids:
707
            cursor = version_id
708
1756.3.10 by Aaron Bentley
Optimize selection and retrieval of records
709
            while cursor is not None and cursor not in component_data:
1756.2.29 by Aaron Bentley
Remove basis knit support
710
                method = self._index.get_method(cursor)
1756.3.10 by Aaron Bentley
Optimize selection and retrieval of records
711
                if method == 'fulltext':
712
                    next = None
713
                else:
1756.2.29 by Aaron Bentley
Remove basis knit support
714
                    next = self.get_parents(cursor)[0]
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
715
                index_memo = self._index.get_position(cursor)
716
                component_data[cursor] = (method, index_memo, next)
1756.3.10 by Aaron Bentley
Optimize selection and retrieval of records
717
                cursor = next
718
        return component_data
1756.3.18 by Aaron Bentley
More cleanup
719
       
1596.2.32 by Robert Collins
Reduce re-extraction of texts during weave to knit joins by providing a memoisation facility.
720
    def _get_content(self, version_id, parent_texts={}):
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
721
        """Returns a content object that makes up the specified
722
        version."""
723
        if not self.has_version(version_id):
724
            raise RevisionNotPresent(version_id, self.filename)
725
1596.2.32 by Robert Collins
Reduce re-extraction of texts during weave to knit joins by providing a memoisation facility.
726
        cached_version = parent_texts.get(version_id, None)
727
        if cached_version is not None:
728
            return cached_version
729
1756.3.22 by Aaron Bentley
Tweaks from review
730
        text_map, contents_map = self._get_content_maps([version_id])
731
        return contents_map[version_id]
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
732
733
    def _check_versions_present(self, version_ids):
734
        """Check that all specified versions are present."""
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
735
        self._index.check_versions_present(version_ids)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
736
1596.2.32 by Robert Collins
Reduce re-extraction of texts during weave to knit joins by providing a memoisation facility.
737
    def _add_lines_with_ghosts(self, version_id, parents, lines, parent_texts):
1594.2.8 by Robert Collins
add ghost aware apis to knits.
738
        """See VersionedFile.add_lines_with_ghosts()."""
739
        self._check_add(version_id, lines)
1596.2.32 by Robert Collins
Reduce re-extraction of texts during weave to knit joins by providing a memoisation facility.
740
        return self._add(version_id, lines[:], parents, self.delta, parent_texts)
1594.2.8 by Robert Collins
add ghost aware apis to knits.
741
2520.4.140 by Aaron Bentley
Use matching blocks from mpdiff for knit delta creation
742
    def _add_lines(self, version_id, parents, lines, parent_texts,
743
                   left_matching_blocks=None):
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
744
        """See VersionedFile.add_lines."""
1594.2.8 by Robert Collins
add ghost aware apis to knits.
745
        self._check_add(version_id, lines)
746
        self._check_versions_present(parents)
2520.4.140 by Aaron Bentley
Use matching blocks from mpdiff for knit delta creation
747
        return self._add(version_id, lines[:], parents, self.delta,
748
                         parent_texts, left_matching_blocks)
1594.2.8 by Robert Collins
add ghost aware apis to knits.
749
750
    def _check_add(self, version_id, lines):
751
        """check that version_id and lines are safe to add."""
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
752
        assert self.writable, "knit is not opened for write"
753
        ### FIXME escape. RBC 20060228
754
        if contains_whitespace(version_id):
1668.5.1 by Olaf Conradi
Fix bug in knits when raising InvalidRevisionId without the required
755
            raise InvalidRevisionId(version_id, self.filename)
2229.2.3 by Aaron Bentley
change reserved_id to is_reserved_id, add check_not_reserved for DRY
756
        self.check_not_reserved_id(version_id)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
757
        if self.has_version(version_id):
758
            raise RevisionAlreadyPresent(version_id, self.filename)
1666.1.6 by Robert Collins
Make knit the default format.
759
        self._check_lines_not_unicode(lines)
760
        self._check_lines_are_lines(lines)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
761
2520.4.140 by Aaron Bentley
Use matching blocks from mpdiff for knit delta creation
762
    def _add(self, version_id, lines, parents, delta, parent_texts,
763
             left_matching_blocks=None):
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
764
        """Add a set of lines on top of version specified by parents.
765
766
        If delta is true, compress the text as a line-delta against
767
        the first parent.
1594.2.8 by Robert Collins
add ghost aware apis to knits.
768
769
        Any versions not present will be converted into ghosts.
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
770
        """
1596.2.28 by Robert Collins
more knit profile based tuning.
771
        #  461    0   6546.0390     43.9100   bzrlib.knit:489(_add)
772
        # +400    0    889.4890    418.9790   +bzrlib.knit:192(lower_fulltext)
773
        # +461    0   1364.8070    108.8030   +bzrlib.knit:996(add_record)
774
        # +461    0    193.3940     41.5720   +bzrlib.knit:898(add_version)
775
        # +461    0    134.0590     18.3810   +bzrlib.osutils:361(sha_strings)
776
        # +461    0     36.3420     15.4540   +bzrlib.knit:146(make)
777
        # +1383   0      8.0370      8.0370   +<len>
778
        # +61     0     13.5770      7.9190   +bzrlib.knit:199(lower_line_delta)
779
        # +61     0    963.3470      7.8740   +bzrlib.knit:427(_get_content)
780
        # +61     0    973.9950      5.2950   +bzrlib.knit:136(line_delta)
781
        # +61     0   1918.1800      5.2640   +bzrlib.knit:359(_merge_annotations)
782
1596.2.10 by Robert Collins
Reviewer feedback on knit branches.
783
        present_parents = []
1594.2.8 by Robert Collins
add ghost aware apis to knits.
784
        ghosts = []
1596.2.32 by Robert Collins
Reduce re-extraction of texts during weave to knit joins by providing a memoisation facility.
785
        if parent_texts is None:
786
            parent_texts = {}
1594.2.8 by Robert Collins
add ghost aware apis to knits.
787
        for parent in parents:
788
            if not self.has_version(parent):
789
                ghosts.append(parent)
1594.2.9 by Robert Collins
Teach Knit repositories how to handle ghosts without corrupting at all.
790
            else:
1596.2.10 by Robert Collins
Reviewer feedback on knit branches.
791
                present_parents.append(parent)
1594.2.8 by Robert Collins
add ghost aware apis to knits.
792
1596.2.10 by Robert Collins
Reviewer feedback on knit branches.
793
        if delta and not len(present_parents):
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
794
            delta = False
795
796
        digest = sha_strings(lines)
797
        options = []
798
        if lines:
799
            if lines[-1][-1] != '\n':
800
                options.append('no-eol')
801
                lines[-1] = lines[-1] + '\n'
802
1596.2.10 by Robert Collins
Reviewer feedback on knit branches.
803
        if len(present_parents) and delta:
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
804
            # To speed the extract of texts the delta chain is limited
805
            # to a fixed number of deltas.  This should minimize both
806
            # I/O and the time spend applying deltas.
2147.1.1 by John Arbash Meinel
Factor the common knit delta selection into a helper func, and allow the fulltext to be chosen based on cumulative delta size
807
            delta = self._check_should_delta(present_parents)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
808
2249.5.15 by John Arbash Meinel
remove get_cached_utf8 checks which were slowing things down.
809
        assert isinstance(version_id, str)
1596.2.32 by Robert Collins
Reduce re-extraction of texts during weave to knit joins by providing a memoisation facility.
810
        lines = self.factory.make(lines, version_id)
1596.2.34 by Robert Collins
Optimise knit add to only diff once per parent, not once per parent + once for the delta generation.
811
        if delta or (self.factory.annotated and len(present_parents) > 0):
1596.2.32 by Robert Collins
Reduce re-extraction of texts during weave to knit joins by providing a memoisation facility.
812
            # Merge annotations from parent texts if so is needed.
2520.4.140 by Aaron Bentley
Use matching blocks from mpdiff for knit delta creation
813
            delta_hunks = self._merge_annotations(lines, present_parents,
814
                parent_texts, delta, self.factory.annotated,
815
                left_matching_blocks)
1596.2.32 by Robert Collins
Reduce re-extraction of texts during weave to knit joins by providing a memoisation facility.
816
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
817
        if delta:
818
            options.append('line-delta')
819
            store_lines = self.factory.lower_line_delta(delta_hunks)
820
        else:
821
            options.append('fulltext')
822
            store_lines = self.factory.lower_fulltext(lines)
823
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
824
        access_memo = self._data.add_record(version_id, digest, store_lines)
825
        self._index.add_version(version_id, options, access_memo, parents)
1596.2.32 by Robert Collins
Reduce re-extraction of texts during weave to knit joins by providing a memoisation facility.
826
        return lines
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
827
1563.2.19 by Robert Collins
stub out a check for knits.
828
    def check(self, progress_bar=None):
829
        """See VersionedFile.check()."""
830
1594.2.24 by Robert Collins
Make use of the transaction finalisation warning support to implement in-knit caching.
831
    def _clone_text(self, new_version_id, old_version_id, parents):
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
832
        """See VersionedFile.clone_text()."""
1756.2.8 by Aaron Bentley
Implement get_line_list, cleanups
833
        # FIXME RBC 20060228 make fast by only inserting an index with null 
834
        # delta.
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
835
        self.add_lines(new_version_id, parents, self.get_lines(old_version_id))
836
837
    def get_lines(self, version_id):
838
        """See VersionedFile.get_lines()."""
1756.2.8 by Aaron Bentley
Implement get_line_list, cleanups
839
        return self.get_line_list([version_id])[0]
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
840
1756.3.12 by Aaron Bentley
Stuff all text-building data in record_map
841
    def _get_record_map(self, version_ids):
1756.3.19 by Aaron Bentley
Documentation and cleanups
842
        """Produce a dictionary of knit records.
843
        
844
        The keys are version_ids, the values are tuples of (method, content,
845
        digest, next).
846
        method is the way the content should be applied.  
847
        content is a KnitContent object.
848
        digest is the SHA1 digest of this version id after all steps are done
849
        next is the build-parent of the version, i.e. the leftmost ancestor.
850
        If the method is fulltext, next will be None.
851
        """
1756.3.12 by Aaron Bentley
Stuff all text-building data in record_map
852
        position_map = self._get_components_positions(version_ids)
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
853
        # c = component_id, m = method, i_m = index_memo, n = next
854
        records = [(c, i_m) for c, (m, i_m, n) in position_map.iteritems()]
1756.3.12 by Aaron Bentley
Stuff all text-building data in record_map
855
        record_map = {}
1863.1.5 by John Arbash Meinel
Add a read_records_iter_unsorted, which can return records in any order.
856
        for component_id, content, digest in \
1863.1.9 by John Arbash Meinel
Switching to have 'read_records_iter' return in random order.
857
                self._data.read_records_iter(records):
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
858
            method, index_memo, next = position_map[component_id]
1756.3.12 by Aaron Bentley
Stuff all text-building data in record_map
859
            record_map[component_id] = method, content, digest, next
860
                          
1756.3.10 by Aaron Bentley
Optimize selection and retrieval of records
861
        return record_map
1756.2.5 by Aaron Bentley
Reduced read_records calls to 1
862
1756.2.7 by Aaron Bentley
Implement get_text in terms of get_texts
863
    def get_text(self, version_id):
864
        """See VersionedFile.get_text"""
865
        return self.get_texts([version_id])[0]
866
1756.2.1 by Aaron Bentley
Implement get_texts
867
    def get_texts(self, version_ids):
1756.2.8 by Aaron Bentley
Implement get_line_list, cleanups
868
        return [''.join(l) for l in self.get_line_list(version_ids)]
869
870
    def get_line_list(self, version_ids):
1756.2.1 by Aaron Bentley
Implement get_texts
871
        """Return the texts of listed versions as a list of strings."""
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
872
        version_ids = [osutils.safe_revision_id(v) for v in version_ids]
2229.2.1 by Aaron Bentley
Reject reserved ids in versiondfile, tree, branch and repository
873
        for version_id in version_ids:
2229.2.3 by Aaron Bentley
change reserved_id to is_reserved_id, add check_not_reserved for DRY
874
            self.check_not_reserved_id(version_id)
1756.3.13 by Aaron Bentley
Refactor get_line_list into _get_content
875
        text_map, content_map = self._get_content_maps(version_ids)
876
        return [text_map[v] for v in version_ids]
877
2520.4.90 by Aaron Bentley
Handle \r terminated lines in Weaves properly
878
    _get_lf_split_line_list = get_line_list
2520.4.3 by Aaron Bentley
Implement plain strategy for extracting and installing multiparent diffs
879
1756.3.13 by Aaron Bentley
Refactor get_line_list into _get_content
880
    def _get_content_maps(self, version_ids):
1756.3.19 by Aaron Bentley
Documentation and cleanups
881
        """Produce maps of text and KnitContents
882
        
883
        :return: (text_map, content_map) where text_map contains the texts for
884
        the requested versions and content_map contains the KnitContents.
1756.3.22 by Aaron Bentley
Tweaks from review
885
        Both dicts take version_ids as their keys.
1756.3.19 by Aaron Bentley
Documentation and cleanups
886
        """
1756.3.10 by Aaron Bentley
Optimize selection and retrieval of records
887
        for version_id in version_ids:
1756.2.1 by Aaron Bentley
Implement get_texts
888
            if not self.has_version(version_id):
889
                raise RevisionNotPresent(version_id, self.filename)
1756.3.12 by Aaron Bentley
Stuff all text-building data in record_map
890
        record_map = self._get_record_map(version_ids)
1756.2.5 by Aaron Bentley
Reduced read_records calls to 1
891
1756.2.8 by Aaron Bentley
Implement get_line_list, cleanups
892
        text_map = {}
1756.3.7 by Aaron Bentley
Avoid re-parsing texts version components
893
        content_map = {}
1756.3.14 by Aaron Bentley
Handle the intermediate and final representations of no-final-eol texts
894
        final_content = {}
1756.3.10 by Aaron Bentley
Optimize selection and retrieval of records
895
        for version_id in version_ids:
896
            components = []
897
            cursor = version_id
898
            while cursor is not None:
1756.3.12 by Aaron Bentley
Stuff all text-building data in record_map
899
                method, data, digest, next = record_map[cursor]
1756.3.10 by Aaron Bentley
Optimize selection and retrieval of records
900
                components.append((cursor, method, data, digest))
901
                if cursor in content_map:
902
                    break
903
                cursor = next
904
1756.2.1 by Aaron Bentley
Implement get_texts
905
            content = None
1756.2.7 by Aaron Bentley
Implement get_text in terms of get_texts
906
            for component_id, method, data, digest in reversed(components):
1756.3.7 by Aaron Bentley
Avoid re-parsing texts version components
907
                if component_id in content_map:
908
                    content = content_map[component_id]
1756.3.8 by Aaron Bentley
Avoid unused calls, use generators, sets instead of lists
909
                else:
910
                    if method == 'fulltext':
911
                        assert content is None
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
912
                        content = self.factory.parse_fulltext(data, version_id)
1756.3.8 by Aaron Bentley
Avoid unused calls, use generators, sets instead of lists
913
                    elif method == 'line-delta':
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
914
                        delta = self.factory.parse_line_delta(data, version_id)
1756.3.8 by Aaron Bentley
Avoid unused calls, use generators, sets instead of lists
915
                        content = content.copy()
916
                        content._lines = self._apply_delta(content._lines, 
917
                                                           delta)
1756.3.14 by Aaron Bentley
Handle the intermediate and final representations of no-final-eol texts
918
                    content_map[component_id] = content
1756.2.1 by Aaron Bentley
Implement get_texts
919
920
            if 'no-eol' in self._index.get_options(version_id):
1756.3.14 by Aaron Bentley
Handle the intermediate and final representations of no-final-eol texts
921
                content = content.copy()
1756.2.1 by Aaron Bentley
Implement get_texts
922
                line = content._lines[-1][1].rstrip('\n')
923
                content._lines[-1] = (content._lines[-1][0], line)
1756.3.14 by Aaron Bentley
Handle the intermediate and final representations of no-final-eol texts
924
            final_content[version_id] = content
1756.2.1 by Aaron Bentley
Implement get_texts
925
926
            # digest here is the digest from the last applied component.
1756.3.6 by Aaron Bentley
More multi-text extraction
927
            text = content.text()
928
            if sha_strings(text) != digest:
1756.2.8 by Aaron Bentley
Implement get_line_list, cleanups
929
                raise KnitCorrupt(self.filename, 
930
                                  'sha-1 does not match %s' % version_id)
1756.2.1 by Aaron Bentley
Implement get_texts
931
1756.3.6 by Aaron Bentley
More multi-text extraction
932
            text_map[version_id] = text 
1756.3.14 by Aaron Bentley
Handle the intermediate and final representations of no-final-eol texts
933
        return text_map, final_content 
1756.2.1 by Aaron Bentley
Implement get_texts
934
2039.1.1 by Aaron Bentley
Clean up progress properly when interrupted during fetch (#54000)
935
    def iter_lines_added_or_present_in_versions(self, version_ids=None, 
936
                                                pb=None):
1594.2.6 by Robert Collins
Introduce a api specifically for looking at lines in some versions of the inventory, for fileid_involved.
937
        """See VersionedFile.iter_lines_added_or_present_in_versions()."""
938
        if version_ids is None:
939
            version_ids = self.versions()
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
940
        else:
941
            version_ids = [osutils.safe_revision_id(v) for v in version_ids]
2039.1.1 by Aaron Bentley
Clean up progress properly when interrupted during fetch (#54000)
942
        if pb is None:
943
            pb = progress.DummyProgress()
1759.2.2 by Jelmer Vernooij
Revert some of my spelling fixes and fix some typos after review by Aaron.
944
        # we don't care about inclusions, the caller cares.
1594.2.6 by Robert Collins
Introduce a api specifically for looking at lines in some versions of the inventory, for fileid_involved.
945
        # but we need to setup a list of records to visit.
946
        # we need version_id, position, length
947
        version_id_records = []
2163.1.1 by John Arbash Meinel
Use a set to make iter_lines_added_or_present *much* faster
948
        requested_versions = set(version_ids)
1594.3.1 by Robert Collins
Merge transaction finalisation and ensure iter_lines_added_or_present in knits does a old-to-new read in the knit.
949
        # filter for available versions
950
        for version_id in requested_versions:
1594.2.6 by Robert Collins
Introduce a api specifically for looking at lines in some versions of the inventory, for fileid_involved.
951
            if not self.has_version(version_id):
952
                raise RevisionNotPresent(version_id, self.filename)
1594.3.1 by Robert Collins
Merge transaction finalisation and ensure iter_lines_added_or_present in knits does a old-to-new read in the knit.
953
        # get a in-component-order queue:
954
        for version_id in self.versions():
955
            if version_id in requested_versions:
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
956
                index_memo = self._index.get_position(version_id)
957
                version_id_records.append((version_id, index_memo))
1594.3.1 by Robert Collins
Merge transaction finalisation and ensure iter_lines_added_or_present in knits does a old-to-new read in the knit.
958
1594.2.17 by Robert Collins
Better readv coalescing, now with test, and progress during knit index reading.
959
        total = len(version_id_records)
2147.1.3 by John Arbash Meinel
In knit.py we were re-using a variable in 2 loops, causing bogus progress messages to be generated.
960
        for version_idx, (version_id, data, sha_value) in \
961
            enumerate(self._data.read_records_iter(version_id_records)):
962
            pb.update('Walking content.', version_idx, total)
2039.1.1 by Aaron Bentley
Clean up progress properly when interrupted during fetch (#54000)
963
            method = self._index.get_method(version_id)
2163.1.7 by John Arbash Meinel
Switch the line iterator as suggested by Aaron Bentley
964
2039.1.1 by Aaron Bentley
Clean up progress properly when interrupted during fetch (#54000)
965
            assert method in ('fulltext', 'line-delta')
966
            if method == 'fulltext':
2163.1.7 by John Arbash Meinel
Switch the line iterator as suggested by Aaron Bentley
967
                line_iterator = self.factory.get_fulltext_content(data)
2039.1.1 by Aaron Bentley
Clean up progress properly when interrupted during fetch (#54000)
968
            else:
2163.1.7 by John Arbash Meinel
Switch the line iterator as suggested by Aaron Bentley
969
                line_iterator = self.factory.get_linedelta_content(data)
970
            for line in line_iterator:
971
                yield line
972
2039.1.1 by Aaron Bentley
Clean up progress properly when interrupted during fetch (#54000)
973
        pb.update('Walking content.', total, total)
1594.2.6 by Robert Collins
Introduce a api specifically for looking at lines in some versions of the inventory, for fileid_involved.
974
        
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
975
    def iter_parents(self, version_ids):
976
        """Iterate through the parents for many version ids.
977
978
        :param version_ids: An iterable yielding version_ids.
979
        :return: An iterator that yields (version_id, parents). Requested 
980
            version_ids not present in the versioned file are simply skipped.
981
            The order is undefined, allowing for different optimisations in
982
            the underlying implementation.
983
        """
984
        version_ids = [osutils.safe_revision_id(version_id) for
985
            version_id in version_ids]
986
        return self._index.iter_parents(version_ids)
987
1563.2.18 by Robert Collins
get knit repositories really using knits for text storage.
988
    def num_versions(self):
989
        """See VersionedFile.num_versions()."""
990
        return self._index.num_versions()
991
992
    __len__ = num_versions
993
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
994
    def annotate_iter(self, version_id):
995
        """See VersionedFile.annotate_iter."""
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
996
        version_id = osutils.safe_revision_id(version_id)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
997
        content = self._get_content(version_id)
998
        for origin, text in content.annotate_iter():
1596.2.7 by Robert Collins
Remove the requirement for reannotation in knit joins.
999
            yield origin, text
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1000
1001
    def get_parents(self, version_id):
1002
        """See VersionedFile.get_parents."""
1628.1.2 by Robert Collins
More knit micro-optimisations.
1003
        # perf notes:
1004
        # optimism counts!
1005
        # 52554 calls in 1264 872 internal down from 3674
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
1006
        version_id = osutils.safe_revision_id(version_id)
1628.1.2 by Robert Collins
More knit micro-optimisations.
1007
        try:
1008
            return self._index.get_parents(version_id)
1009
        except KeyError:
1010
            raise RevisionNotPresent(version_id, self.filename)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1011
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1012
    def get_parents_with_ghosts(self, version_id):
1013
        """See VersionedFile.get_parents."""
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
1014
        version_id = osutils.safe_revision_id(version_id)
1628.1.2 by Robert Collins
More knit micro-optimisations.
1015
        try:
1016
            return self._index.get_parents_with_ghosts(version_id)
1017
        except KeyError:
1018
            raise RevisionNotPresent(version_id, self.filename)
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1019
2530.1.1 by Aaron Bentley
Make topological sorting optional for get_ancestry
1020
    def get_ancestry(self, versions, topo_sorted=True):
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1021
        """See VersionedFile.get_ancestry."""
1022
        if isinstance(versions, basestring):
1023
            versions = [versions]
1024
        if not versions:
1025
            return []
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
1026
        versions = [osutils.safe_revision_id(v) for v in versions]
2530.1.1 by Aaron Bentley
Make topological sorting optional for get_ancestry
1027
        return self._index.get_ancestry(versions, topo_sorted)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1028
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1029
    def get_ancestry_with_ghosts(self, versions):
1030
        """See VersionedFile.get_ancestry_with_ghosts."""
1031
        if isinstance(versions, basestring):
1032
            versions = [versions]
1033
        if not versions:
1034
            return []
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
1035
        versions = [osutils.safe_revision_id(v) for v in versions]
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1036
        return self._index.get_ancestry_with_ghosts(versions)
1037
1664.2.3 by Aaron Bentley
Add failing test case
1038
    def plan_merge(self, ver_a, ver_b):
1664.2.11 by Aaron Bentley
Clarifications from merge review
1039
        """See VersionedFile.plan_merge."""
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
1040
        ver_a = osutils.safe_revision_id(ver_a)
1041
        ver_b = osutils.safe_revision_id(ver_b)
2490.2.33 by Aaron Bentley
Disable topological sorting of get_ancestry where sensible
1042
        ancestors_b = set(self.get_ancestry(ver_b, topo_sorted=False))
1664.2.6 by Aaron Bentley
Got plan-merge passing tests
1043
        
2490.2.33 by Aaron Bentley
Disable topological sorting of get_ancestry where sensible
1044
        ancestors_a = set(self.get_ancestry(ver_a, topo_sorted=False))
1664.2.4 by Aaron Bentley
Identify unchanged lines correctly
1045
        annotated_a = self.annotate(ver_a)
1046
        annotated_b = self.annotate(ver_b)
1551.15.46 by Aaron Bentley
Move plan merge to tree
1047
        return merge._plan_annotate_merge(annotated_a, annotated_b,
1048
                                          ancestors_a, ancestors_b)
1664.2.4 by Aaron Bentley
Identify unchanged lines correctly
1049
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1050
1051
class _KnitComponentFile(object):
1052
    """One of the files used to implement a knit database"""
1053
1946.2.1 by John Arbash Meinel
2 changes to knits. Delay creating the .knit or .kndx file until we have actually tried to write data. Because of this, we must allow the Knit to create the prefix directories
1054
    def __init__(self, transport, filename, mode, file_mode=None,
1946.2.12 by John Arbash Meinel
Add ability to pass a directory mode to non_atomic_put
1055
                 create_parent_dir=False, dir_mode=None):
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1056
        self._transport = transport
1057
        self._filename = filename
1058
        self._mode = mode
1946.2.3 by John Arbash Meinel
Pass around the file mode correctly
1059
        self._file_mode = file_mode
1946.2.12 by John Arbash Meinel
Add ability to pass a directory mode to non_atomic_put
1060
        self._dir_mode = dir_mode
1946.2.1 by John Arbash Meinel
2 changes to knits. Delay creating the .knit or .kndx file until we have actually tried to write data. Because of this, we must allow the Knit to create the prefix directories
1061
        self._create_parent_dir = create_parent_dir
1062
        self._need_to_create = False
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1063
2196.2.5 by John Arbash Meinel
Add an exception class when the knit index storage method is unknown, and properly test for it
1064
    def _full_path(self):
1065
        """Return the full path to this file."""
1066
        return self._transport.base + self._filename
1067
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1068
    def check_header(self, fp):
1641.1.2 by Robert Collins
Change knit index files to be robust in the presence of partial writes.
1069
        line = fp.readline()
2171.1.1 by John Arbash Meinel
Knit index files should ignore empty indexes rather than consider them corrupt.
1070
        if line == '':
1071
            # An empty file can actually be treated as though the file doesn't
1072
            # exist yet.
2196.2.5 by John Arbash Meinel
Add an exception class when the knit index storage method is unknown, and properly test for it
1073
            raise errors.NoSuchFile(self._full_path())
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1074
        if line != self.HEADER:
2171.1.1 by John Arbash Meinel
Knit index files should ignore empty indexes rather than consider them corrupt.
1075
            raise KnitHeaderError(badline=line,
1076
                              filename=self._transport.abspath(self._filename))
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1077
1078
    def __repr__(self):
1079
        return '%s(%s)' % (self.__class__.__name__, self._filename)
1080
1081
1082
class _KnitIndex(_KnitComponentFile):
1083
    """Manages knit index file.
1084
1085
    The index is already kept in memory and read on startup, to enable
1086
    fast lookups of revision information.  The cursor of the index
1087
    file is always pointing to the end, making it easy to append
1088
    entries.
1089
1090
    _cache is a cache for fast mapping from version id to a Index
1091
    object.
1092
1093
    _history is a cache for fast mapping from indexes to version ids.
1094
1095
    The index data format is dictionary compressed when it comes to
1096
    parent references; a index entry may only have parents that with a
1097
    lover index number.  As a result, the index is topological sorted.
1563.2.11 by Robert Collins
Consolidate reweave and join as we have no separate usage, make reweave tests apply to all versionedfile implementations and deprecate the old reweave apis.
1098
1099
    Duplicate entries may be written to the index for a single version id
1100
    if this is done then the latter one completely replaces the former:
1101
    this allows updates to correct version and parent information. 
1102
    Note that the two entries may share the delta, and that successive
1103
    annotations and references MUST point to the first entry.
1641.1.2 by Robert Collins
Change knit index files to be robust in the presence of partial writes.
1104
1105
    The index file on disc contains a header, followed by one line per knit
1106
    record. The same revision can be present in an index file more than once.
1759.2.1 by Jelmer Vernooij
Fix some types (found using aspell).
1107
    The first occurrence gets assigned a sequence number starting from 0. 
1641.1.2 by Robert Collins
Change knit index files to be robust in the presence of partial writes.
1108
    
1109
    The format of a single line is
1110
    REVISION_ID FLAGS BYTE_OFFSET LENGTH( PARENT_ID|PARENT_SEQUENCE_ID)* :\n
1111
    REVISION_ID is a utf8-encoded revision id
1112
    FLAGS is a comma separated list of flags about the record. Values include 
1113
        no-eol, line-delta, fulltext.
1114
    BYTE_OFFSET is the ascii representation of the byte offset in the data file
1115
        that the the compressed data starts at.
1116
    LENGTH is the ascii representation of the length of the data file.
1117
    PARENT_ID a utf-8 revision id prefixed by a '.' that is a parent of
1118
        REVISION_ID.
1119
    PARENT_SEQUENCE_ID the ascii representation of the sequence number of a
1120
        revision id already in the knit that is a parent of REVISION_ID.
1121
    The ' :' marker is the end of record marker.
1122
    
1123
    partial writes:
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1124
    when a write is interrupted to the index file, it will result in a line
1125
    that does not end in ' :'. If the ' :' is not present at the end of a line,
1126
    or at the end of the file, then the record that is missing it will be
1127
    ignored by the parser.
1641.1.2 by Robert Collins
Change knit index files to be robust in the presence of partial writes.
1128
1759.2.1 by Jelmer Vernooij
Fix some types (found using aspell).
1129
    When writing new records to the index file, the data is preceded by '\n'
1641.1.2 by Robert Collins
Change knit index files to be robust in the presence of partial writes.
1130
    to ensure that records always start on new lines even if the last write was
1131
    interrupted. As a result its normal for the last line in the index to be
1132
    missing a trailing newline. One can be added with no harmful effects.
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1133
    """
1134
1666.1.6 by Robert Collins
Make knit the default format.
1135
    HEADER = "# bzr knit index 8\n"
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1136
1596.2.18 by Robert Collins
More microopimisations on index reading, now down to 16000 records/seconds.
1137
    # speed of knit parsing went from 280 ms to 280 ms with slots addition.
1138
    # __slots__ = ['_cache', '_history', '_transport', '_filename']
1139
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1140
    def _cache_version(self, version_id, options, pos, size, parents):
1596.2.18 by Robert Collins
More microopimisations on index reading, now down to 16000 records/seconds.
1141
        """Cache a version record in the history array and index cache.
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1142
1143
        This is inlined into _load_data for performance. KEEP IN SYNC.
1596.2.18 by Robert Collins
More microopimisations on index reading, now down to 16000 records/seconds.
1144
        (It saves 60ms, 25% of the __init__ overhead on local 4000 record
1145
         indexes).
1146
        """
1596.2.14 by Robert Collins
Make knit parsing non quadratic?
1147
        # only want the _history index to reference the 1st index entry
1148
        # for version_id
1596.2.18 by Robert Collins
More microopimisations on index reading, now down to 16000 records/seconds.
1149
        if version_id not in self._cache:
1628.1.1 by Robert Collins
Cache the index number of versions in the knit index's self._cache so that
1150
            index = len(self._history)
1596.2.14 by Robert Collins
Make knit parsing non quadratic?
1151
            self._history.append(version_id)
1628.1.1 by Robert Collins
Cache the index number of versions in the knit index's self._cache so that
1152
        else:
1153
            index = self._cache[version_id][5]
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1154
        self._cache[version_id] = (version_id,
1628.1.1 by Robert Collins
Cache the index number of versions in the knit index's self._cache so that
1155
                                   options,
1156
                                   pos,
1157
                                   size,
1158
                                   parents,
1159
                                   index)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1160
1946.2.1 by John Arbash Meinel
2 changes to knits. Delay creating the .knit or .kndx file until we have actually tried to write data. Because of this, we must allow the Knit to create the prefix directories
1161
    def __init__(self, transport, filename, mode, create=False, file_mode=None,
1946.2.12 by John Arbash Meinel
Add ability to pass a directory mode to non_atomic_put
1162
                 create_parent_dir=False, delay_create=False, dir_mode=None):
1163
        _KnitComponentFile.__init__(self, transport, filename, mode,
1164
                                    file_mode=file_mode,
1165
                                    create_parent_dir=create_parent_dir,
1166
                                    dir_mode=dir_mode)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1167
        self._cache = {}
1563.2.11 by Robert Collins
Consolidate reweave and join as we have no separate usage, make reweave tests apply to all versionedfile implementations and deprecate the old reweave apis.
1168
        # position in _history is the 'official' index for a revision
1169
        # but the values may have come from a newer entry.
1759.2.1 by Jelmer Vernooij
Fix some types (found using aspell).
1170
        # so - wc -l of a knit index is != the number of unique names
1773.4.1 by Martin Pool
Add pyflakes makefile target; fix many warnings
1171
        # in the knit.
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1172
        self._history = []
1173
        try:
2247.2.1 by John Arbash Meinel
Don't create pb for simple knit reading.
1174
            fp = self._transport.get(self._filename)
1594.2.17 by Robert Collins
Better readv coalescing, now with test, and progress during knit index reading.
1175
            try:
2247.2.1 by John Arbash Meinel
Don't create pb for simple knit reading.
1176
                # _load_data may raise NoSuchFile if the target knit is
1177
                # completely empty.
2484.1.1 by John Arbash Meinel
Add an initial function to read knit indexes in pyrex.
1178
                _load_data(self, fp)
2247.2.1 by John Arbash Meinel
Don't create pb for simple knit reading.
1179
            finally:
1180
                fp.close()
1181
        except NoSuchFile:
1182
            if mode != 'w' or not create:
1183
                raise
1184
            elif delay_create:
1185
                self._need_to_create = True
1186
            else:
1187
                self._transport.put_bytes_non_atomic(
1188
                    self._filename, self.HEADER, mode=self._file_mode)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1189
1190
    def get_graph(self):
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
1191
        """Return a list of the node:parents lists from this knit index."""
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1192
        return [(vid, idx[4]) for vid, idx in self._cache.iteritems()]
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1193
2530.1.1 by Aaron Bentley
Make topological sorting optional for get_ancestry
1194
    def get_ancestry(self, versions, topo_sorted=True):
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1195
        """See VersionedFile.get_ancestry."""
1563.2.35 by Robert Collins
cleanup deprecation warnings and finish conversion so the inventory is knit based too.
1196
        # get a graph of all the mentioned versions:
1197
        graph = {}
1198
        pending = set(versions)
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1199
        cache = self._cache
1200
        while pending:
1563.2.35 by Robert Collins
cleanup deprecation warnings and finish conversion so the inventory is knit based too.
1201
            version = pending.pop()
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1202
            # trim ghosts
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1203
            try:
1204
                parents = [p for p in cache[version][4] if p in cache]
1205
            except KeyError:
1206
                raise RevisionNotPresent(version, self._filename)
1207
            # if not completed and not a ghost
1208
            pending.update([p for p in parents if p not in graph])
1563.2.35 by Robert Collins
cleanup deprecation warnings and finish conversion so the inventory is knit based too.
1209
            graph[version] = parents
2530.1.1 by Aaron Bentley
Make topological sorting optional for get_ancestry
1210
        if not topo_sorted:
1211
            return graph.keys()
1563.2.35 by Robert Collins
cleanup deprecation warnings and finish conversion so the inventory is knit based too.
1212
        return topo_sort(graph.items())
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1213
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1214
    def get_ancestry_with_ghosts(self, versions):
1215
        """See VersionedFile.get_ancestry_with_ghosts."""
1216
        # get a graph of all the mentioned versions:
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1217
        self.check_versions_present(versions)
1218
        cache = self._cache
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1219
        graph = {}
1220
        pending = set(versions)
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1221
        while pending:
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1222
            version = pending.pop()
1223
            try:
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1224
                parents = cache[version][4]
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1225
            except KeyError:
1226
                # ghost, fake it
1227
                graph[version] = []
1228
            else:
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1229
                # if not completed
1230
                pending.update([p for p in parents if p not in graph])
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1231
                graph[version] = parents
1232
        return topo_sort(graph.items())
1233
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
1234
    def iter_parents(self, version_ids):
1235
        """Iterate through the parents for many version ids.
1236
1237
        :param version_ids: An iterable yielding version_ids.
1238
        :return: An iterator that yields (version_id, parents). Requested 
1239
            version_ids not present in the versioned file are simply skipped.
1240
            The order is undefined, allowing for different optimisations in
1241
            the underlying implementation.
1242
        """
1243
        for version_id in version_ids:
1244
            try:
1245
                yield version_id, tuple(self.get_parents(version_id))
1246
            except KeyError:
1247
                pass
1248
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1249
    def num_versions(self):
1250
        return len(self._history)
1251
1252
    __len__ = num_versions
1253
1254
    def get_versions(self):
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
1255
        """Get all the versions in the file. not topologically sorted."""
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1256
        return self._history
1257
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1258
    def _version_list_to_index(self, versions):
1259
        result_list = []
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1260
        cache = self._cache
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1261
        for version in versions:
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1262
            if version in cache:
1628.1.1 by Robert Collins
Cache the index number of versions in the knit index's self._cache so that
1263
                # -- inlined lookup() --
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1264
                result_list.append(str(cache[version][5]))
1628.1.1 by Robert Collins
Cache the index number of versions in the knit index's self._cache so that
1265
                # -- end lookup () --
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1266
            else:
2249.5.15 by John Arbash Meinel
remove get_cached_utf8 checks which were slowing things down.
1267
                result_list.append('.' + version)
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1268
        return ' '.join(result_list)
1269
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
1270
    def add_version(self, version_id, options, index_memo, parents):
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1271
        """Add a version record to the index."""
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
1272
        self.add_versions(((version_id, options, index_memo, parents),))
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1273
1692.2.1 by Robert Collins
Fix knit based push to only perform 2 appends to the target, rather that 2*new-versions.
1274
    def add_versions(self, versions):
1275
        """Add multiple versions to the index.
1276
        
1277
        :param versions: a list of tuples:
1278
                         (version_id, options, pos, size, parents).
1279
        """
1280
        lines = []
2102.2.1 by John Arbash Meinel
Fix bug #64789 _KnitIndex.add_versions() should dict compress new revisions
1281
        orig_history = self._history[:]
1282
        orig_cache = self._cache.copy()
1283
1284
        try:
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
1285
            for version_id, options, (index, pos, size), parents in versions:
2249.5.15 by John Arbash Meinel
remove get_cached_utf8 checks which were slowing things down.
1286
                line = "\n%s %s %s %s %s :" % (version_id,
2102.2.1 by John Arbash Meinel
Fix bug #64789 _KnitIndex.add_versions() should dict compress new revisions
1287
                                               ','.join(options),
1288
                                               pos,
1289
                                               size,
1290
                                               self._version_list_to_index(parents))
1291
                assert isinstance(line, str), \
1292
                    'content must be utf-8 encoded: %r' % (line,)
1293
                lines.append(line)
1294
                self._cache_version(version_id, options, pos, size, parents)
1295
            if not self._need_to_create:
1296
                self._transport.append_bytes(self._filename, ''.join(lines))
1297
            else:
1298
                sio = StringIO()
1299
                sio.write(self.HEADER)
1300
                sio.writelines(lines)
1301
                sio.seek(0)
1302
                self._transport.put_file_non_atomic(self._filename, sio,
1303
                                    create_parent_dir=self._create_parent_dir,
1304
                                    mode=self._file_mode,
1305
                                    dir_mode=self._dir_mode)
1306
                self._need_to_create = False
1307
        except:
1308
            # If any problems happen, restore the original values and re-raise
1309
            self._history = orig_history
1310
            self._cache = orig_cache
1311
            raise
1312
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1313
    def has_version(self, version_id):
1314
        """True if the version is in the index."""
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1315
        return version_id in self._cache
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1316
1317
    def get_position(self, version_id):
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
1318
        """Return details needed to access the version.
1319
        
1320
        .kndx indices do not support split-out data, so return None for the 
1321
        index field.
1322
1323
        :return: a tuple (None, data position, size) to hand to the access
1324
            logic to get the record.
1325
        """
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1326
        entry = self._cache[version_id]
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
1327
        return None, entry[2], entry[3]
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1328
1329
    def get_method(self, version_id):
1330
        """Return compression method of specified version."""
1331
        options = self._cache[version_id][1]
1332
        if 'fulltext' in options:
1333
            return 'fulltext'
1334
        else:
2196.2.5 by John Arbash Meinel
Add an exception class when the knit index storage method is unknown, and properly test for it
1335
            if 'line-delta' not in options:
1336
                raise errors.KnitIndexUnknownMethod(self._full_path(), options)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1337
            return 'line-delta'
1338
1339
    def get_options(self, version_id):
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
1340
        """Return a string represention options.
1341
1342
        e.g. foo,bar
1343
        """
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1344
        return self._cache[version_id][1]
1345
1346
    def get_parents(self, version_id):
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1347
        """Return parents of specified version ignoring ghosts."""
1348
        return [parent for parent in self._cache[version_id][4] 
1349
                if parent in self._cache]
1350
1351
    def get_parents_with_ghosts(self, version_id):
1759.2.1 by Jelmer Vernooij
Fix some types (found using aspell).
1352
        """Return parents of specified version with ghosts."""
1594.2.8 by Robert Collins
add ghost aware apis to knits.
1353
        return self._cache[version_id][4] 
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1354
1355
    def check_versions_present(self, version_ids):
1356
        """Check that all specified versions are present."""
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
1357
        cache = self._cache
1358
        for version_id in version_ids:
1359
            if version_id not in cache:
1360
                raise RevisionNotPresent(version_id, self._filename)
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1361
1362
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
1363
class KnitGraphIndex(object):
1364
    """A knit index that builds on GraphIndex."""
1365
1366
    def __init__(self, graph_index, deltas=False, parents=True, add_callback=None):
1367
        """Construct a KnitGraphIndex on a graph_index.
1368
1369
        :param graph_index: An implementation of bzrlib.index.GraphIndex.
1370
        :param deltas: Allow delta-compressed records.
1371
        :param add_callback: If not None, allow additions to the index and call
1372
            this callback with a list of added GraphIndex nodes:
1373
            [(node, value, node_refs), ...]
1374
        :param parents: If True, record knits parents, if not do not record 
1375
            parents.
1376
        """
1377
        self._graph_index = graph_index
1378
        self._deltas = deltas
1379
        self._add_callback = add_callback
1380
        self._parents = parents
1381
        if deltas and not parents:
1382
            raise KnitCorrupt(self, "Cannot do delta compression without "
1383
                "parent tracking.")
1384
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
1385
    def _get_entries(self, keys, check_present=False):
1386
        """Get the entries for keys.
1387
        
1388
        :param keys: An iterable of index keys, - 1-tuples.
1389
        """
1390
        keys = set(keys)
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
1391
        found_keys = set()
1392
        if self._parents:
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
1393
            for node in self._graph_index.iter_entries(keys):
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
1394
                yield node
2624.2.14 by Robert Collins
Add source index to the index iteration API to allow mapping back to the origin of retrieved data.
1395
                found_keys.add(node[1])
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
1396
        else:
1397
            # adapt parentless index to the rest of the code.
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
1398
            for node in self._graph_index.iter_entries(keys):
2624.2.14 by Robert Collins
Add source index to the index iteration API to allow mapping back to the origin of retrieved data.
1399
                yield node[0], node[1], node[2], ()
1400
                found_keys.add(node[1])
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
1401
        if check_present:
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
1402
            missing_keys = keys.difference(found_keys)
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
1403
            if missing_keys:
1404
                raise RevisionNotPresent(missing_keys.pop(), self)
1405
1406
    def _present_keys(self, version_ids):
1407
        return set([
2624.2.14 by Robert Collins
Add source index to the index iteration API to allow mapping back to the origin of retrieved data.
1408
            node[1] for node in self._get_entries(version_ids)])
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
1409
1410
    def _parentless_ancestry(self, versions):
1411
        """Honour the get_ancestry API for parentless knit indices."""
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
1412
        wanted_keys = self._version_ids_to_keys(versions)
1413
        present_keys = self._present_keys(wanted_keys)
1414
        missing = set(wanted_keys).difference(present_keys)
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
1415
        if missing:
1416
            raise RevisionNotPresent(missing.pop(), self)
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
1417
        return list(self._keys_to_version_ids(present_keys))
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
1418
1419
    def get_ancestry(self, versions, topo_sorted=True):
1420
        """See VersionedFile.get_ancestry."""
1421
        if not self._parents:
1422
            return self._parentless_ancestry(versions)
1423
        # XXX: This will do len(history) index calls - perhaps
1424
        # it should be altered to be a index core feature?
1425
        # get a graph of all the mentioned versions:
1426
        graph = {}
1427
        ghosts = set()
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
1428
        versions = self._version_ids_to_keys(versions)
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
1429
        pending = set(versions)
1430
        while pending:
1431
            # get all pending nodes
1432
            this_iteration = pending
1433
            new_nodes = self._get_entries(this_iteration)
2624.2.7 by Robert Collins
Merge back the removal of difference_update from my repository branch.
1434
            found = set()
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
1435
            pending = set()
2624.2.14 by Robert Collins
Add source index to the index iteration API to allow mapping back to the origin of retrieved data.
1436
            for (index, key, value, node_refs) in new_nodes:
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
1437
                # dont ask for ghosties - otherwise
1438
                # we we can end up looping with pending
1439
                # being entirely ghosted.
1440
                graph[key] = [parent for parent in node_refs[0]
1441
                    if parent not in ghosts]
2624.2.7 by Robert Collins
Merge back the removal of difference_update from my repository branch.
1442
                # queue parents
1443
                for parent in graph[key]:
1444
                    # dont examine known nodes again
1445
                    if parent in graph:
1446
                        continue
1447
                    pending.add(parent)
1448
                found.add(key)
1449
            ghosts.update(this_iteration.difference(found))
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
1450
        if versions.difference(graph):
1451
            raise RevisionNotPresent(versions.difference(graph).pop(), self)
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
1452
        if topo_sorted:
1453
            result_keys = topo_sort(graph.items())
1454
        else:
1455
            result_keys = graph.iterkeys()
1456
        return [key[0] for key in result_keys]
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
1457
1458
    def get_ancestry_with_ghosts(self, versions):
1459
        """See VersionedFile.get_ancestry."""
1460
        if not self._parents:
1461
            return self._parentless_ancestry(versions)
1462
        # XXX: This will do len(history) index calls - perhaps
1463
        # it should be altered to be a index core feature?
1464
        # get a graph of all the mentioned versions:
1465
        graph = {}
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
1466
        versions = self._version_ids_to_keys(versions)
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
1467
        pending = set(versions)
1468
        while pending:
1469
            # get all pending nodes
1470
            this_iteration = pending
1471
            new_nodes = self._get_entries(this_iteration)
1472
            pending = set()
2624.2.14 by Robert Collins
Add source index to the index iteration API to allow mapping back to the origin of retrieved data.
1473
            for (index, key, value, node_refs) in new_nodes:
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
1474
                graph[key] = node_refs[0]
1475
                # queue parents 
2624.2.7 by Robert Collins
Merge back the removal of difference_update from my repository branch.
1476
                for parent in graph[key]:
1477
                    # dont examine known nodes again
1478
                    if parent in graph:
1479
                        continue
1480
                    pending.add(parent)
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
1481
            missing_versions = this_iteration.difference(graph)
1482
            missing_needed = versions.intersection(missing_versions)
1483
            if missing_needed:
1484
                raise RevisionNotPresent(missing_needed.pop(), self)
1485
            for missing_version in missing_versions:
1486
                # add a key, no parents
1487
                graph[missing_version] = []
2624.2.7 by Robert Collins
Merge back the removal of difference_update from my repository branch.
1488
                pending.discard(missing_version) # don't look for it
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
1489
        result_keys = topo_sort(graph.items())
1490
        return [key[0] for key in result_keys]
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
1491
1492
    def get_graph(self):
1493
        """Return a list of the node:parents lists from this knit index."""
1494
        if not self._parents:
1495
            return [(key, ()) for key in self.get_versions()]
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
1496
        result = []
2624.2.14 by Robert Collins
Add source index to the index iteration API to allow mapping back to the origin of retrieved data.
1497
        for index, key, value, refs in self._graph_index.iter_all_entries():
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
1498
            result.append((key[0], tuple([ref[0] for ref in refs[0]])))
1499
        return result
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
1500
1501
    def iter_parents(self, version_ids):
1502
        """Iterate through the parents for many version ids.
1503
1504
        :param version_ids: An iterable yielding version_ids.
1505
        :return: An iterator that yields (version_id, parents). Requested 
1506
            version_ids not present in the versioned file are simply skipped.
1507
            The order is undefined, allowing for different optimisations in
1508
            the underlying implementation.
1509
        """
1510
        if self._parents:
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
1511
            all_nodes = set(self._get_entries(self._version_ids_to_keys(version_ids)))
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
1512
            all_parents = set()
1513
            present_parents = set()
1514
            for node in all_nodes:
2624.2.14 by Robert Collins
Add source index to the index iteration API to allow mapping back to the origin of retrieved data.
1515
                all_parents.update(node[3][0])
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
1516
                # any node we are querying must be present
2624.2.14 by Robert Collins
Add source index to the index iteration API to allow mapping back to the origin of retrieved data.
1517
                present_parents.add(node[1])
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
1518
            unknown_parents = all_parents.difference(present_parents)
1519
            present_parents.update(self._present_keys(unknown_parents))
1520
            for node in all_nodes:
1521
                parents = []
2624.2.14 by Robert Collins
Add source index to the index iteration API to allow mapping back to the origin of retrieved data.
1522
                for parent in node[3][0]:
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
1523
                    if parent in present_parents:
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
1524
                        parents.append(parent[0])
2624.2.14 by Robert Collins
Add source index to the index iteration API to allow mapping back to the origin of retrieved data.
1525
                yield node[1][0], tuple(parents)
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
1526
        else:
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
1527
            for node in self._get_entries(self._version_ids_to_keys(version_ids)):
2624.2.14 by Robert Collins
Add source index to the index iteration API to allow mapping back to the origin of retrieved data.
1528
                yield node[1][0], ()
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
1529
1530
    def num_versions(self):
1531
        return len(list(self._graph_index.iter_all_entries()))
1532
1533
    __len__ = num_versions
1534
1535
    def get_versions(self):
1536
        """Get all the versions in the file. not topologically sorted."""
2624.2.14 by Robert Collins
Add source index to the index iteration API to allow mapping back to the origin of retrieved data.
1537
        return [node[1][0] for node in self._graph_index.iter_all_entries()]
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
1538
    
1539
    def has_version(self, version_id):
1540
        """True if the version is in the index."""
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
1541
        return len(self._present_keys(self._version_ids_to_keys([version_id]))) == 1
1542
1543
    def _keys_to_version_ids(self, keys):
1544
        return tuple(key[0] for key in keys)
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
1545
1546
    def get_position(self, version_id):
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
1547
        """Return details needed to access the version.
1548
        
1549
        :return: a tuple (index, data position, size) to hand to the access
1550
            logic to get the record.
1551
        """
1552
        node = self._get_node(version_id)
1553
        bits = node[2][1:].split(' ')
1554
        return node[0], int(bits[0]), int(bits[1])
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
1555
1556
    def get_method(self, version_id):
1557
        """Return compression method of specified version."""
1558
        if not self._deltas:
1559
            return 'fulltext'
2624.2.14 by Robert Collins
Add source index to the index iteration API to allow mapping back to the origin of retrieved data.
1560
        return self._parent_compression(self._get_node(version_id)[3][1])
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
1561
1562
    def _parent_compression(self, reference_list):
1563
        # use the second reference list to decide if this is delta'd or not.
1564
        if len(reference_list):
1565
            return 'line-delta'
1566
        else:
1567
            return 'fulltext'
1568
1569
    def _get_node(self, version_id):
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
1570
        return list(self._get_entries(self._version_ids_to_keys([version_id])))[0]
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
1571
1572
    def get_options(self, version_id):
1573
        """Return a string represention options.
1574
1575
        e.g. foo,bar
1576
        """
1577
        node = self._get_node(version_id)
1578
        if not self._deltas:
1579
            options = ['fulltext']
1580
        else:
2624.2.14 by Robert Collins
Add source index to the index iteration API to allow mapping back to the origin of retrieved data.
1581
            options = [self._parent_compression(node[3][1])]
1582
        if node[2][0] == 'N':
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
1583
            options.append('no-eol')
2658.2.1 by Robert Collins
Fix mismatch between KnitGraphIndex and KnitIndex in get_options.
1584
        return options
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
1585
1586
    def get_parents(self, version_id):
1587
        """Return parents of specified version ignoring ghosts."""
1588
        parents = list(self.iter_parents([version_id]))
1589
        if not parents:
1590
            # missing key
1591
            raise errors.RevisionNotPresent(version_id, self)
1592
        return parents[0][1]
1593
1594
    def get_parents_with_ghosts(self, version_id):
1595
        """Return parents of specified version with ghosts."""
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
1596
        nodes = list(self._get_entries(self._version_ids_to_keys([version_id]),
1597
            check_present=True))
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
1598
        if not self._parents:
1599
            return ()
2624.2.14 by Robert Collins
Add source index to the index iteration API to allow mapping back to the origin of retrieved data.
1600
        return self._keys_to_version_ids(nodes[0][3][0])
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
1601
1602
    def check_versions_present(self, version_ids):
1603
        """Check that all specified versions are present."""
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
1604
        keys = self._version_ids_to_keys(version_ids)
1605
        present = self._present_keys(keys)
1606
        missing = keys.difference(present)
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
1607
        if missing:
1608
            raise RevisionNotPresent(missing.pop(), self)
1609
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
1610
    def add_version(self, version_id, options, access_memo, parents):
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
1611
        """Add a version record to the index."""
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
1612
        return self.add_versions(((version_id, options, access_memo, parents),))
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
1613
1614
    def add_versions(self, versions):
1615
        """Add multiple versions to the index.
1616
        
1617
        This function does not insert data into the Immutable GraphIndex
1618
        backing the KnitGraphIndex, instead it prepares data for insertion by
1619
        the caller and checks that it is safe to insert then calls
1620
        self._add_callback with the prepared GraphIndex nodes.
1621
1622
        :param versions: a list of tuples:
1623
                         (version_id, options, pos, size, parents).
1624
        """
1625
        if not self._add_callback:
1626
            raise errors.ReadOnlyError(self)
1627
        # we hope there are no repositories with inconsistent parentage
1628
        # anymore.
1629
        # check for dups
1630
1631
        keys = {}
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
1632
        for (version_id, options, access_memo, parents) in versions:
1633
            index, pos, size = access_memo
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
1634
            key = (version_id, )
1635
            parents = tuple((parent, ) for parent in parents)
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
1636
            if 'no-eol' in options:
1637
                value = 'N'
1638
            else:
1639
                value = ' '
1640
            value += "%d %d" % (pos, size)
1641
            if not self._deltas:
1642
                if 'line-delta' in options:
1643
                    raise KnitCorrupt(self, "attempt to add line-delta in non-delta knit")
1644
            if self._parents:
1645
                if self._deltas:
1646
                    if 'line-delta' in options:
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
1647
                        node_refs = (parents, (parents[0],))
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
1648
                    else:
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
1649
                        node_refs = (parents, ())
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
1650
                else:
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
1651
                    node_refs = (parents, )
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
1652
            else:
1653
                if parents:
1654
                    raise KnitCorrupt(self, "attempt to add node with parents "
1655
                        "in parentless index.")
1656
                node_refs = ()
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
1657
            keys[key] = (value, node_refs)
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
1658
        present_nodes = self._get_entries(keys)
2624.2.14 by Robert Collins
Add source index to the index iteration API to allow mapping back to the origin of retrieved data.
1659
        for (index, key, value, node_refs) in present_nodes:
2625.8.1 by Robert Collins
LIBRARY API BREAKS:
1660
            if (value, node_refs) != keys[key]:
1661
                raise KnitCorrupt(self, "inconsistent details in add_versions"
1662
                    ": %s %s" % ((value, node_refs), keys[key]))
1663
            del keys[key]
1664
        result = []
1665
        if self._parents:
1666
            for key, (value, node_refs) in keys.iteritems():
1667
                result.append((key, value, node_refs))
1668
        else:
1669
            for key, (value, node_refs) in keys.iteritems():
1670
                result.append((key, value))
1671
        self._add_callback(result)
1672
        
2624.2.5 by Robert Collins
Change bzrlib.index.Index keys to be 1-tuples, not strings.
1673
    def _version_ids_to_keys(self, version_ids):
1674
        return set((version_id, ) for version_id in version_ids)
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
1675
1676
1677
class _KnitAccess(object):
1678
    """Access to knit records in a .knit file."""
1679
1680
    def __init__(self, transport, filename, _file_mode, _dir_mode,
1681
        _need_to_create, _create_parent_dir):
1682
        """Create a _KnitAccess for accessing and inserting data.
1683
1684
        :param transport: The transport the .knit is located on.
1685
        :param filename: The filename of the .knit.
1686
        """
1687
        self._transport = transport
1688
        self._filename = filename
1689
        self._file_mode = _file_mode
1690
        self._dir_mode = _dir_mode
1691
        self._need_to_create = _need_to_create
1692
        self._create_parent_dir = _create_parent_dir
1693
1694
    def add_raw_records(self, sizes, raw_data):
1695
        """Add raw knit bytes to a storage area.
1696
1697
        The data is spooled to whereever the access method is storing data.
1698
1699
        :param sizes: An iterable containing the size of each raw data segment.
1700
        :param raw_data: A bytestring containing the data.
1701
        :return: A list of memos to retrieve the record later. Each memo is a
1702
            tuple - (index, pos, length), where the index field is always None
1703
            for the .knit access method.
1704
        """
1705
        assert type(raw_data) == str, \
1706
            'data must be plain bytes was %s' % type(raw_data)
1707
        if not self._need_to_create:
1708
            base = self._transport.append_bytes(self._filename, raw_data)
1709
        else:
1710
            self._transport.put_bytes_non_atomic(self._filename, raw_data,
1711
                                   create_parent_dir=self._create_parent_dir,
1712
                                   mode=self._file_mode,
1713
                                   dir_mode=self._dir_mode)
1714
            self._need_to_create = False
1715
            base = 0
1716
        result = []
1717
        for size in sizes:
1718
            result.append((None, base, size))
1719
            base += size
1720
        return result
1721
1722
    def create(self):
1723
        """IFF this data access has its own storage area, initialise it.
1724
1725
        :return: None.
1726
        """
1727
        self._transport.put_bytes_non_atomic(self._filename, '',
1728
                                             mode=self._file_mode)
1729
1730
    def open_file(self):
1731
        """IFF this data access can be represented as a single file, open it.
1732
1733
        For knits that are not mapped to a single file on disk this will
1734
        always return None.
1735
1736
        :return: None or a file handle.
1737
        """
1738
        try:
1739
            return self._transport.get(self._filename)
1740
        except NoSuchFile:
1741
            pass
1742
        return None
1743
1744
    def get_raw_records(self, memos_for_retrieval):
1745
        """Get the raw bytes for a records.
1746
1747
        :param memos_for_retrieval: An iterable containing the (index, pos, 
1748
            length) memo for retrieving the bytes. The .knit method ignores
1749
            the index as there is always only a single file.
1750
        :return: An iterator over the bytes of the records.
1751
        """
1752
        read_vector = [(pos, size) for (index, pos, size) in memos_for_retrieval]
1753
        for pos, data in self._transport.readv(self._filename, read_vector):
1754
            yield data
1755
1756
1757
class _PackAccess(object):
1758
    """Access to knit records via a collection of packs."""
1759
1760
    def __init__(self, index_to_packs, writer=None):
1761
        """Create a _PackAccess object.
1762
1763
        :param index_to_packs: A dict mapping index objects to the transport
1764
            and file names for obtaining data.
1765
        :param writer: A tuple (pack.ContainerWriter, write_index) which
2670.2.3 by Robert Collins
Review feedback.
1766
            contains the pack to write, and the index that reads from it will
1767
            be associated with.
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
1768
        """
1769
        if writer:
1770
            self.container_writer = writer[0]
1771
            self.write_index = writer[1]
1772
        else:
1773
            self.container_writer = None
1774
            self.write_index = None
1775
        self.indices = index_to_packs
1776
1777
    def add_raw_records(self, sizes, raw_data):
1778
        """Add raw knit bytes to a storage area.
1779
2670.2.3 by Robert Collins
Review feedback.
1780
        The data is spooled to the container writer in one bytes-record per
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
1781
        raw data item.
1782
1783
        :param sizes: An iterable containing the size of each raw data segment.
1784
        :param raw_data: A bytestring containing the data.
1785
        :return: A list of memos to retrieve the record later. Each memo is a
1786
            tuple - (index, pos, length), where the index field is the 
1787
            write_index object supplied to the PackAccess object.
1788
        """
1789
        assert type(raw_data) == str, \
1790
            'data must be plain bytes was %s' % type(raw_data)
1791
        result = []
1792
        offset = 0
1793
        for size in sizes:
1794
            p_offset, p_length = self.container_writer.add_bytes_record(
1795
                raw_data[offset:offset+size], [])
1796
            offset += size
1797
            result.append((self.write_index, p_offset, p_length))
1798
        return result
1799
1800
    def create(self):
1801
        """Pack based knits do not get individually created."""
1802
1803
    def get_raw_records(self, memos_for_retrieval):
1804
        """Get the raw bytes for a records.
1805
1806
        :param memos_for_retrieval: An iterable containing the (index, pos, 
1807
            length) memo for retrieving the bytes. The Pack access method
1808
            looks up the pack to use for a given record in its index_to_pack
1809
            map.
1810
        :return: An iterator over the bytes of the records.
1811
        """
1812
        # first pass, group into same-index requests
1813
        request_lists = []
1814
        current_index = None
1815
        for (index, offset, length) in memos_for_retrieval:
1816
            if current_index == index:
1817
                current_list.append((offset, length))
1818
            else:
1819
                if current_index is not None:
1820
                    request_lists.append((current_index, current_list))
1821
                current_index = index
1822
                current_list = [(offset, length)]
1823
        # handle the last entry
1824
        if current_index is not None:
1825
            request_lists.append((current_index, current_list))
1826
        for index, offsets in request_lists:
1827
            transport, path = self.indices[index]
1828
            reader = pack.make_readv_reader(transport, path, offsets)
1829
            for names, read_func in reader.iter_records():
1830
                yield read_func(None)
1831
1832
    def open_file(self):
1833
        """Pack based knits have no single file."""
1834
        return None
1835
1836
    def set_writer(self, writer, index, (transport, packname)):
1837
        """Set a writer to use for adding data."""
1838
        self.indices[index] = (transport, packname)
1839
        self.container_writer = writer
1840
        self.write_index = index
1841
1842
1843
class _KnitData(object):
1844
    """Manage extraction of data from a KnitAccess, caching and decompressing.
1845
    
1846
    The KnitData class provides the logic for parsing and using knit records,
1847
    making use of an access method for the low level read and write operations.
1848
    """
1849
1850
    def __init__(self, access):
1851
        """Create a KnitData object.
1852
1853
        :param access: The access method to use. Access methods such as
1854
            _KnitAccess manage the insertion of raw records and the subsequent
1855
            retrieval of the same.
1856
        """
1857
        self._access = access
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1858
        self._checked = False
1863.1.8 by John Arbash Meinel
Removing disk-backed-cache
1859
        # TODO: jam 20060713 conceptually, this could spill to disk
1860
        #       if the cached size gets larger than a certain amount
1861
        #       but it complicates the model a bit, so for now just use
1862
        #       a simple dictionary
1863.1.1 by John Arbash Meinel
Allow Versioned files to do caching if explicitly asked, and implement for Knit
1863
        self._cache = {}
1864
        self._do_cache = False
1594.2.24 by Robert Collins
Make use of the transaction finalisation warning support to implement in-knit caching.
1865
1863.1.1 by John Arbash Meinel
Allow Versioned files to do caching if explicitly asked, and implement for Knit
1866
    def enable_cache(self):
1867
        """Enable caching of reads."""
1863.1.8 by John Arbash Meinel
Removing disk-backed-cache
1868
        self._do_cache = True
1863.1.1 by John Arbash Meinel
Allow Versioned files to do caching if explicitly asked, and implement for Knit
1869
1594.2.24 by Robert Collins
Make use of the transaction finalisation warning support to implement in-knit caching.
1870
    def clear_cache(self):
1871
        """Clear the record cache."""
1863.1.1 by John Arbash Meinel
Allow Versioned files to do caching if explicitly asked, and implement for Knit
1872
        self._do_cache = False
1873
        self._cache = {}
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1874
1875
    def _open_file(self):
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
1876
        return self._access.open_file()
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1877
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
1878
    def _record_to_data(self, version_id, digest, lines):
1879
        """Convert version_id, digest, lines into a raw data block.
1880
        
1881
        :return: (len, a StringIO instance with the raw data ready to read.)
1882
        """
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1883
        sio = StringIO()
2762.3.1 by Robert Collins
* The compression used within the bzr repository has changed from zlib
1884
        data_file = GzipFile(None, mode='wb', fileobj=sio,
1885
            compresslevel=Z_DEFAULT_COMPRESSION)
1908.4.10 by John Arbash Meinel
Small cleanups
1886
2249.5.15 by John Arbash Meinel
remove get_cached_utf8 checks which were slowing things down.
1887
        assert isinstance(version_id, str)
1908.4.5 by John Arbash Meinel
Some small tweaks to knit and tuned_gzip to shave off another couple seconds
1888
        data_file.writelines(chain(
2249.5.15 by John Arbash Meinel
remove get_cached_utf8 checks which were slowing things down.
1889
            ["version %s %d %s\n" % (version_id,
1596.2.28 by Robert Collins
more knit profile based tuning.
1890
                                     len(lines),
1891
                                     digest)],
1892
            lines,
2249.5.15 by John Arbash Meinel
remove get_cached_utf8 checks which were slowing things down.
1893
            ["end %s\n" % version_id]))
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1894
        data_file.close()
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
1895
        length= sio.tell()
1596.2.28 by Robert Collins
more knit profile based tuning.
1896
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
1897
        sio.seek(0)
1898
        return length, sio
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1899
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
1900
    def add_raw_records(self, sizes, raw_data):
1692.4.1 by Robert Collins
Multiple merges:
1901
        """Append a prepared record to the data file.
2329.1.2 by John Arbash Meinel
Remove some spurious whitespace changes.
1902
        
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
1903
        :param sizes: An iterable containing the size of each raw data segment.
1904
        :param raw_data: A bytestring containing the data.
1905
        :return: a list of index data for the way the data was stored.
1906
            See the access method add_raw_records documentation for more
1907
            details.
1692.4.1 by Robert Collins
Multiple merges:
1908
        """
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
1909
        return self._access.add_raw_records(sizes, raw_data)
2329.1.2 by John Arbash Meinel
Remove some spurious whitespace changes.
1910
        
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
1911
    def add_record(self, version_id, digest, lines):
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
1912
        """Write new text record to disk. 
1913
        
1914
        Returns index data for retrieving it later, as per add_raw_records.
1915
        """
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
1916
        size, sio = self._record_to_data(version_id, digest, lines)
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
1917
        result = self.add_raw_records([size], sio.getvalue())
1863.1.1 by John Arbash Meinel
Allow Versioned files to do caching if explicitly asked, and implement for Knit
1918
        if self._do_cache:
1919
            self._cache[version_id] = sio.getvalue()
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
1920
        return result[0]
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
1921
1922
    def _parse_record_header(self, version_id, raw_data):
1923
        """Parse a record header for consistency.
1924
1925
        :return: the header and the decompressor stream.
1926
                 as (stream, header_record)
1927
        """
1928
        df = GzipFile(mode='rb', fileobj=StringIO(raw_data))
2329.1.1 by John Arbash Meinel
Update _KnitData parser to raise more helpful errors when it detects corruption.
1929
        try:
1930
            rec = self._check_header(version_id, df.readline())
2358.3.4 by Martin Pool
Fix mangled knit.py changes
1931
        except Exception, e:
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
1932
            raise KnitCorrupt(self._access,
2329.1.1 by John Arbash Meinel
Update _KnitData parser to raise more helpful errors when it detects corruption.
1933
                              "While reading {%s} got %s(%s)"
1934
                              % (version_id, e.__class__.__name__, str(e)))
2358.3.4 by Martin Pool
Fix mangled knit.py changes
1935
        return df, rec
2163.2.4 by John Arbash Meinel
Split _KnitData._parse_header up, so that we have 1 readlines() call, rather than readline+readlines()
1936
2358.3.4 by Martin Pool
Fix mangled knit.py changes
1937
    def _check_header(self, version_id, line):
1938
        rec = line.split()
1939
        if len(rec) != 4:
1940
            raise KnitCorrupt(self._access,
2163.2.4 by John Arbash Meinel
Split _KnitData._parse_header up, so that we have 1 readlines() call, rather than readline+readlines()
1941
                              'unexpected number of elements in record header')
2249.5.12 by John Arbash Meinel
Change the APIs for VersionedFile, Store, and some of Repository into utf-8
1942
        if rec[1] != version_id:
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
1943
            raise KnitCorrupt(self._access,
2163.2.4 by John Arbash Meinel
Split _KnitData._parse_header up, so that we have 1 readlines() call, rather than readline+readlines()
1944
                              'unexpected version, wanted %r, got %r'
1945
                              % (version_id, rec[1]))
1946
        return rec
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
1947
1948
    def _parse_record(self, version_id, data):
1628.1.2 by Robert Collins
More knit micro-optimisations.
1949
        # profiling notes:
1950
        # 4168 calls in 2880 217 internal
1951
        # 4168 calls to _parse_record_header in 2121
1952
        # 4168 calls to readlines in 330
2163.2.4 by John Arbash Meinel
Split _KnitData._parse_header up, so that we have 1 readlines() call, rather than readline+readlines()
1953
        df = GzipFile(mode='rb', fileobj=StringIO(data))
1954
2329.1.1 by John Arbash Meinel
Update _KnitData parser to raise more helpful errors when it detects corruption.
1955
        try:
1956
            record_contents = df.readlines()
2358.3.4 by Martin Pool
Fix mangled knit.py changes
1957
        except Exception, e:
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
1958
            raise KnitCorrupt(self._access,
2329.1.1 by John Arbash Meinel
Update _KnitData parser to raise more helpful errors when it detects corruption.
1959
                              "While reading {%s} got %s(%s)"
1960
                              % (version_id, e.__class__.__name__, str(e)))
2163.2.4 by John Arbash Meinel
Split _KnitData._parse_header up, so that we have 1 readlines() call, rather than readline+readlines()
1961
        header = record_contents.pop(0)
1962
        rec = self._check_header(version_id, header)
1963
1964
        last_line = record_contents.pop()
2329.1.1 by John Arbash Meinel
Update _KnitData parser to raise more helpful errors when it detects corruption.
1965
        if len(record_contents) != int(rec[2]):
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
1966
            raise KnitCorrupt(self._access,
2329.1.1 by John Arbash Meinel
Update _KnitData parser to raise more helpful errors when it detects corruption.
1967
                              'incorrect number of lines %s != %s'
1968
                              ' for version {%s}'
1969
                              % (len(record_contents), int(rec[2]),
1970
                                 version_id))
2163.2.4 by John Arbash Meinel
Split _KnitData._parse_header up, so that we have 1 readlines() call, rather than readline+readlines()
1971
        if last_line != 'end %s\n' % rec[1]:
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
1972
            raise KnitCorrupt(self._access,
2163.2.4 by John Arbash Meinel
Split _KnitData._parse_header up, so that we have 1 readlines() call, rather than readline+readlines()
1973
                              'unexpected version end line %r, wanted %r' 
1974
                              % (last_line, version_id))
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
1975
        df.close()
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
1976
        return record_contents, rec[3]
1977
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
1978
    def read_records_iter_raw(self, records):
1979
        """Read text records from data file and yield raw data.
1980
1981
        This unpacks enough of the text record to validate the id is
1982
        as expected but thats all.
1983
        """
1984
        # setup an iterator of the external records:
1985
        # uses readv so nice and fast we hope.
1756.3.23 by Aaron Bentley
Remove knit caches
1986
        if len(records):
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
1987
            # grab the disk data needed.
1863.1.1 by John Arbash Meinel
Allow Versioned files to do caching if explicitly asked, and implement for Knit
1988
            if self._cache:
1989
                # Don't check _cache if it is empty
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
1990
                needed_offsets = [index_memo for version_id, index_memo
1863.1.1 by John Arbash Meinel
Allow Versioned files to do caching if explicitly asked, and implement for Knit
1991
                                              in records
1992
                                              if version_id not in self._cache]
1993
            else:
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
1994
                needed_offsets = [index_memo for version_id, index_memo
1863.1.1 by John Arbash Meinel
Allow Versioned files to do caching if explicitly asked, and implement for Knit
1995
                                               in records]
1996
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
1997
            raw_records = self._access.get_raw_records(needed_offsets)
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
1998
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
1999
        for version_id, index_memo in records:
1863.1.1 by John Arbash Meinel
Allow Versioned files to do caching if explicitly asked, and implement for Knit
2000
            if version_id in self._cache:
2001
                # This data has already been validated
2002
                data = self._cache[version_id]
2003
            else:
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
2004
                data = raw_records.next()
1863.1.1 by John Arbash Meinel
Allow Versioned files to do caching if explicitly asked, and implement for Knit
2005
                if self._do_cache:
2006
                    self._cache[version_id] = data
2007
2008
                # validate the header
2009
                df, rec = self._parse_record_header(version_id, data)
2010
                df.close()
1756.3.23 by Aaron Bentley
Remove knit caches
2011
            yield version_id, data
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
2012
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
2013
    def read_records_iter(self, records):
2014
        """Read text records from data file and yield result.
2015
1863.1.5 by John Arbash Meinel
Add a read_records_iter_unsorted, which can return records in any order.
2016
        The result will be returned in whatever is the fastest to read.
2017
        Not by the order requested. Also, multiple requests for the same
2018
        record will only yield 1 response.
2019
        :param records: A list of (version_id, pos, len) entries
2020
        :return: Yields (version_id, contents, digest) in the order
2021
                 read, not the order requested
2022
        """
2023
        if not records:
2024
            return
2025
1863.1.1 by John Arbash Meinel
Allow Versioned files to do caching if explicitly asked, and implement for Knit
2026
        if self._cache:
1863.1.5 by John Arbash Meinel
Add a read_records_iter_unsorted, which can return records in any order.
2027
            # Skip records we have alread seen
2028
            yielded_records = set()
1863.1.1 by John Arbash Meinel
Allow Versioned files to do caching if explicitly asked, and implement for Knit
2029
            needed_records = set()
2030
            for record in records:
2031
                if record[0] in self._cache:
1863.1.5 by John Arbash Meinel
Add a read_records_iter_unsorted, which can return records in any order.
2032
                    if record[0] in yielded_records:
2033
                        continue
2034
                    yielded_records.add(record[0])
2035
                    data = self._cache[record[0]]
2036
                    content, digest = self._parse_record(record[0], data)
2037
                    yield (record[0], content, digest)
1863.1.1 by John Arbash Meinel
Allow Versioned files to do caching if explicitly asked, and implement for Knit
2038
                else:
2039
                    needed_records.add(record)
2040
            needed_records = sorted(needed_records, key=operator.itemgetter(1))
2041
        else:
2042
            needed_records = sorted(set(records), key=operator.itemgetter(1))
1756.3.23 by Aaron Bentley
Remove knit caches
2043
1863.1.5 by John Arbash Meinel
Add a read_records_iter_unsorted, which can return records in any order.
2044
        if not needed_records:
2045
            return
2046
2047
        # The transport optimizes the fetching as well 
2048
        # (ie, reads continuous ranges.)
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
2049
        raw_data = self._access.get_raw_records(
2050
            [index_memo for version_id, index_memo in needed_records])
1863.1.5 by John Arbash Meinel
Add a read_records_iter_unsorted, which can return records in any order.
2051
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
2052
        for (version_id, index_memo), data in \
2053
                izip(iter(needed_records), raw_data):
1863.1.5 by John Arbash Meinel
Add a read_records_iter_unsorted, which can return records in any order.
2054
            content, digest = self._parse_record(version_id, data)
1863.1.1 by John Arbash Meinel
Allow Versioned files to do caching if explicitly asked, and implement for Knit
2055
            if self._do_cache:
1863.1.5 by John Arbash Meinel
Add a read_records_iter_unsorted, which can return records in any order.
2056
                self._cache[version_id] = data
1756.3.23 by Aaron Bentley
Remove knit caches
2057
            yield version_id, content, digest
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
2058
2059
    def read_records(self, records):
2060
        """Read records into a dictionary."""
2061
        components = {}
1863.1.5 by John Arbash Meinel
Add a read_records_iter_unsorted, which can return records in any order.
2062
        for record_id, content, digest in \
1863.1.9 by John Arbash Meinel
Switching to have 'read_records_iter' return in random order.
2063
                self.read_records_iter(records):
1563.2.4 by Robert Collins
First cut at including the knit implementation of versioned_file.
2064
            components[record_id] = (content, digest)
2065
        return components
2066
1563.2.13 by Robert Collins
InterVersionedFile implemented.
2067
2068
class InterKnit(InterVersionedFile):
2069
    """Optimised code paths for knit to knit operations."""
2070
    
1684.3.3 by Robert Collins
Add a special cased weaves to knit converter.
2071
    _matching_file_from_factory = KnitVersionedFile
2072
    _matching_file_to_factory = KnitVersionedFile
1563.2.13 by Robert Collins
InterVersionedFile implemented.
2073
    
2074
    @staticmethod
2075
    def is_compatible(source, target):
2076
        """Be compatible with knits.  """
2077
        try:
2078
            return (isinstance(source, KnitVersionedFile) and
2079
                    isinstance(target, KnitVersionedFile))
2080
        except AttributeError:
2081
            return False
2082
1563.2.31 by Robert Collins
Convert Knit repositories to use knits.
2083
    def join(self, pb=None, msg=None, version_ids=None, ignore_missing=False):
1563.2.13 by Robert Collins
InterVersionedFile implemented.
2084
        """See InterVersionedFile.join."""
2085
        assert isinstance(self.source, KnitVersionedFile)
2086
        assert isinstance(self.target, KnitVersionedFile)
2087
1684.3.2 by Robert Collins
Factor out version_ids-to-join selection in InterVersionedfile.
2088
        version_ids = self._get_source_version_ids(version_ids, ignore_missing)
1563.2.31 by Robert Collins
Convert Knit repositories to use knits.
2089
1563.2.13 by Robert Collins
InterVersionedFile implemented.
2090
        if not version_ids:
2091
            return 0
2092
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
2093
        pb = ui.ui_factory.nested_progress_bar()
1594.2.24 by Robert Collins
Make use of the transaction finalisation warning support to implement in-knit caching.
2094
        try:
2095
            version_ids = list(version_ids)
2096
            if None in version_ids:
2097
                version_ids.remove(None)
2098
    
2099
            self.source_ancestry = set(self.source.get_ancestry(version_ids))
2100
            this_versions = set(self.target._index.get_versions())
2101
            needed_versions = self.source_ancestry - this_versions
2102
            cross_check_versions = self.source_ancestry.intersection(this_versions)
2103
            mismatched_versions = set()
2104
            for version in cross_check_versions:
2105
                # scan to include needed parents.
2106
                n1 = set(self.target.get_parents_with_ghosts(version))
2107
                n2 = set(self.source.get_parents_with_ghosts(version))
2108
                if n1 != n2:
2109
                    # FIXME TEST this check for cycles being introduced works
2110
                    # the logic is we have a cycle if in our graph we are an
2111
                    # ancestor of any of the n2 revisions.
2112
                    for parent in n2:
2113
                        if parent in n1:
2114
                            # safe
2115
                            continue
2116
                        else:
2117
                            parent_ancestors = self.source.get_ancestry(parent)
2118
                            if version in parent_ancestors:
2119
                                raise errors.GraphCycleError([parent, version])
2120
                    # ensure this parent will be available later.
2121
                    new_parents = n2.difference(n1)
2122
                    needed_versions.update(new_parents.difference(this_versions))
2123
                    mismatched_versions.add(version)
2124
    
1684.3.3 by Robert Collins
Add a special cased weaves to knit converter.
2125
            if not needed_versions and not mismatched_versions:
1594.2.24 by Robert Collins
Make use of the transaction finalisation warning support to implement in-knit caching.
2126
                return 0
1910.2.65 by Aaron Bentley
Remove the check-parent patch
2127
            full_list = topo_sort(self.source.get_graph())
2128
    
2129
            version_list = [i for i in full_list if (not self.target.has_version(i)
2130
                            and i in needed_versions)]
1594.2.24 by Robert Collins
Make use of the transaction finalisation warning support to implement in-knit caching.
2131
    
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
2132
            # plan the join:
2133
            copy_queue = []
2134
            copy_queue_records = []
2135
            copy_set = set()
1594.2.24 by Robert Collins
Make use of the transaction finalisation warning support to implement in-knit caching.
2136
            for version_id in version_list:
2137
                options = self.source._index.get_options(version_id)
2138
                parents = self.source._index.get_parents_with_ghosts(version_id)
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
2139
                # check that its will be a consistent copy:
1594.2.24 by Robert Collins
Make use of the transaction finalisation warning support to implement in-knit caching.
2140
                for parent in parents:
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
2141
                    # if source has the parent, we must :
2142
                    # * already have it or
2143
                    # * have it scheduled already
1759.2.2 by Jelmer Vernooij
Revert some of my spelling fixes and fix some typos after review by Aaron.
2144
                    # otherwise we don't care
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
2145
                    assert (self.target.has_version(parent) or
2146
                            parent in copy_set or
2147
                            not self.source.has_version(parent))
2670.2.2 by Robert Collins
* In ``bzrlib.knit`` the internal interface has been altered to use
2148
                index_memo = self.source._index.get_position(version_id)
2149
                copy_queue_records.append((version_id, index_memo))
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
2150
                copy_queue.append((version_id, options, parents))
2151
                copy_set.add(version_id)
2152
2153
            # data suck the join:
2154
            count = 0
2155
            total = len(version_list)
1692.2.1 by Robert Collins
Fix knit based push to only perform 2 appends to the target, rather that 2*new-versions.
2156
            raw_datum = []
2157
            raw_records = []
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
2158
            for (version_id, raw_data), \
2159
                (version_id2, options, parents) in \
2160
                izip(self.source._data.read_records_iter_raw(copy_queue_records),
2161
                     copy_queue):
2162
                assert version_id == version_id2, 'logic error, inconsistent results'
1594.2.24 by Robert Collins
Make use of the transaction finalisation warning support to implement in-knit caching.
2163
                count = count + 1
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
2164
                pb.update("Joining knit", count, total)
1692.2.1 by Robert Collins
Fix knit based push to only perform 2 appends to the target, rather that 2*new-versions.
2165
                raw_records.append((version_id, options, parents, len(raw_data)))
2166
                raw_datum.append(raw_data)
2167
            self.target._add_raw_records(raw_records, ''.join(raw_datum))
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
2168
1594.2.24 by Robert Collins
Make use of the transaction finalisation warning support to implement in-knit caching.
2169
            for version in mismatched_versions:
1596.2.8 by Robert Collins
Join knits with the original gzipped data avoiding recompression.
2170
                # FIXME RBC 20060309 is this needed?
1594.2.24 by Robert Collins
Make use of the transaction finalisation warning support to implement in-knit caching.
2171
                n1 = set(self.target.get_parents_with_ghosts(version))
2172
                n2 = set(self.source.get_parents_with_ghosts(version))
2173
                # write a combined record to our history preserving the current 
2174
                # parents as first in the list
2175
                new_parents = self.target.get_parents_with_ghosts(version) + list(n2.difference(n1))
2176
                self.target.fix_parents(version, new_parents)
2177
            return count
2178
        finally:
2179
            pb.finished()
1563.2.13 by Robert Collins
InterVersionedFile implemented.
2180
2181
2182
InterVersionedFile.register_optimiser(InterKnit)
1596.2.24 by Robert Collins
Gzipfile was slightly slower than ideal.
2183
2184
1684.3.3 by Robert Collins
Add a special cased weaves to knit converter.
2185
class WeaveToKnit(InterVersionedFile):
2186
    """Optimised code paths for weave to knit operations."""
2187
    
2188
    _matching_file_from_factory = bzrlib.weave.WeaveFile
2189
    _matching_file_to_factory = KnitVersionedFile
2190
    
2191
    @staticmethod
2192
    def is_compatible(source, target):
2193
        """Be compatible with weaves to knits."""
2194
        try:
2195
            return (isinstance(source, bzrlib.weave.Weave) and
2196
                    isinstance(target, KnitVersionedFile))
2197
        except AttributeError:
2198
            return False
2199
2200
    def join(self, pb=None, msg=None, version_ids=None, ignore_missing=False):
2201
        """See InterVersionedFile.join."""
2202
        assert isinstance(self.source, bzrlib.weave.Weave)
2203
        assert isinstance(self.target, KnitVersionedFile)
2204
2205
        version_ids = self._get_source_version_ids(version_ids, ignore_missing)
2206
2207
        if not version_ids:
2208
            return 0
2209
2158.3.1 by Dmitry Vasiliev
KnitIndex tests/fixes/optimizations
2210
        pb = ui.ui_factory.nested_progress_bar()
1684.3.3 by Robert Collins
Add a special cased weaves to knit converter.
2211
        try:
2212
            version_ids = list(version_ids)
2213
    
2214
            self.source_ancestry = set(self.source.get_ancestry(version_ids))
2215
            this_versions = set(self.target._index.get_versions())
2216
            needed_versions = self.source_ancestry - this_versions
2217
            cross_check_versions = self.source_ancestry.intersection(this_versions)
2218
            mismatched_versions = set()
2219
            for version in cross_check_versions:
2220
                # scan to include needed parents.
2221
                n1 = set(self.target.get_parents_with_ghosts(version))
2222
                n2 = set(self.source.get_parents(version))
2223
                # if all of n2's parents are in n1, then its fine.
2224
                if n2.difference(n1):
2225
                    # FIXME TEST this check for cycles being introduced works
2226
                    # the logic is we have a cycle if in our graph we are an
2227
                    # ancestor of any of the n2 revisions.
2228
                    for parent in n2:
2229
                        if parent in n1:
2230
                            # safe
2231
                            continue
2232
                        else:
2233
                            parent_ancestors = self.source.get_ancestry(parent)
2234
                            if version in parent_ancestors:
2235
                                raise errors.GraphCycleError([parent, version])
2236
                    # ensure this parent will be available later.
2237
                    new_parents = n2.difference(n1)
2238
                    needed_versions.update(new_parents.difference(this_versions))
2239
                    mismatched_versions.add(version)
2240
    
2241
            if not needed_versions and not mismatched_versions:
2242
                return 0
2243
            full_list = topo_sort(self.source.get_graph())
2244
    
2245
            version_list = [i for i in full_list if (not self.target.has_version(i)
2246
                            and i in needed_versions)]
2247
    
2248
            # do the join:
2249
            count = 0
2250
            total = len(version_list)
2251
            for version_id in version_list:
2252
                pb.update("Converting to knit", count, total)
2253
                parents = self.source.get_parents(version_id)
2254
                # check that its will be a consistent copy:
2255
                for parent in parents:
2256
                    # if source has the parent, we must already have it
2257
                    assert (self.target.has_version(parent))
2258
                self.target.add_lines(
2259
                    version_id, parents, self.source.get_lines(version_id))
2260
                count = count + 1
2261
2262
            for version in mismatched_versions:
2263
                # FIXME RBC 20060309 is this needed?
2264
                n1 = set(self.target.get_parents_with_ghosts(version))
2265
                n2 = set(self.source.get_parents(version))
2266
                # write a combined record to our history preserving the current 
2267
                # parents as first in the list
2268
                new_parents = self.target.get_parents_with_ghosts(version) + list(n2.difference(n1))
2269
                self.target.fix_parents(version, new_parents)
2270
            return count
2271
        finally:
2272
            pb.finished()
2273
2274
2275
InterVersionedFile.register_optimiser(WeaveToKnit)
2276
2277
1711.2.11 by John Arbash Meinel
Rename patiencediff.SequenceMatcher => PatienceSequenceMatcher and knit.SequenceMatcher => KnitSequenceMatcher
2278
class KnitSequenceMatcher(difflib.SequenceMatcher):
1596.2.35 by Robert Collins
Subclass SequenceMatcher to get a slightly faster (in our case) find_longest_match routine.
2279
    """Knit tuned sequence matcher.
2280
2281
    This is based on profiling of difflib which indicated some improvements
2282
    for our usage pattern.
2283
    """
2284
2285
    def find_longest_match(self, alo, ahi, blo, bhi):
2286
        """Find longest matching block in a[alo:ahi] and b[blo:bhi].
2287
2288
        If isjunk is not defined:
2289
2290
        Return (i,j,k) such that a[i:i+k] is equal to b[j:j+k], where
2291
            alo <= i <= i+k <= ahi
2292
            blo <= j <= j+k <= bhi
2293
        and for all (i',j',k') meeting those conditions,
2294
            k >= k'
2295
            i <= i'
2296
            and if i == i', j <= j'
2297
2298
        In other words, of all maximal matching blocks, return one that
2299
        starts earliest in a, and of all those maximal matching blocks that
2300
        start earliest in a, return the one that starts earliest in b.
2301
2302
        >>> s = SequenceMatcher(None, " abcd", "abcd abcd")
2303
        >>> s.find_longest_match(0, 5, 0, 9)
2304
        (0, 4, 5)
2305
2306
        If isjunk is defined, first the longest matching block is
2307
        determined as above, but with the additional restriction that no
2308
        junk element appears in the block.  Then that block is extended as
2309
        far as possible by matching (only) junk elements on both sides.  So
2310
        the resulting block never matches on junk except as identical junk
2311
        happens to be adjacent to an "interesting" match.
2312
2313
        Here's the same example as before, but considering blanks to be
2314
        junk.  That prevents " abcd" from matching the " abcd" at the tail
2315
        end of the second sequence directly.  Instead only the "abcd" can
2316
        match, and matches the leftmost "abcd" in the second sequence:
2317
2318
        >>> s = SequenceMatcher(lambda x: x==" ", " abcd", "abcd abcd")
2319
        >>> s.find_longest_match(0, 5, 0, 9)
2320
        (1, 0, 4)
2321
2322
        If no blocks match, return (alo, blo, 0).
2323
2324
        >>> s = SequenceMatcher(None, "ab", "c")
2325
        >>> s.find_longest_match(0, 2, 0, 1)
2326
        (0, 0, 0)
2327
        """
2328
2329
        # CAUTION:  stripping common prefix or suffix would be incorrect.
2330
        # E.g.,
2331
        #    ab
2332
        #    acab
2333
        # Longest matching block is "ab", but if common prefix is
2334
        # stripped, it's "a" (tied with "b").  UNIX(tm) diff does so
2335
        # strip, so ends up claiming that ab is changed to acab by
2336
        # inserting "ca" in the middle.  That's minimal but unintuitive:
2337
        # "it's obvious" that someone inserted "ac" at the front.
2338
        # Windiff ends up at the same place as diff, but by pairing up
2339
        # the unique 'b's and then matching the first two 'a's.
2340
2341
        a, b, b2j, isbjunk = self.a, self.b, self.b2j, self.isbjunk
2342
        besti, bestj, bestsize = alo, blo, 0
2343
        # find longest junk-free match
2344
        # during an iteration of the loop, j2len[j] = length of longest
2345
        # junk-free match ending with a[i-1] and b[j]
2346
        j2len = {}
2347
        # nothing = []
2348
        b2jget = b2j.get
2349
        for i in xrange(alo, ahi):
2350
            # look at all instances of a[i] in b; note that because
2351
            # b2j has no junk keys, the loop is skipped if a[i] is junk
2352
            j2lenget = j2len.get
2353
            newj2len = {}
2354
            
1759.2.1 by Jelmer Vernooij
Fix some types (found using aspell).
2355
            # changing b2j.get(a[i], nothing) to a try:KeyError pair produced the
1596.2.35 by Robert Collins
Subclass SequenceMatcher to get a slightly faster (in our case) find_longest_match routine.
2356
            # following improvement
2357
            #     704  0   4650.5320   2620.7410   bzrlib.knit:1336(find_longest_match)
2358
            # +326674  0   1655.1210   1655.1210   +<method 'get' of 'dict' objects>
2359
            #  +76519  0    374.6700    374.6700   +<method 'has_key' of 'dict' objects>
2360
            # to 
2361
            #     704  0   3733.2820   2209.6520   bzrlib.knit:1336(find_longest_match)
2362
            #  +211400 0   1147.3520   1147.3520   +<method 'get' of 'dict' objects>
2363
            #  +76519  0    376.2780    376.2780   +<method 'has_key' of 'dict' objects>
2364
2365
            try:
2366
                js = b2j[a[i]]
2367
            except KeyError:
2368
                pass
2369
            else:
2370
                for j in js:
2371
                    # a[i] matches b[j]
2372
                    if j >= blo:
2373
                        if j >= bhi:
2374
                            break
2375
                        k = newj2len[j] = 1 + j2lenget(-1 + j, 0)
2376
                        if k > bestsize:
2377
                            besti, bestj, bestsize = 1 + i-k, 1 + j-k, k
2378
            j2len = newj2len
2379
2380
        # Extend the best by non-junk elements on each end.  In particular,
2381
        # "popular" non-junk elements aren't in b2j, which greatly speeds
2382
        # the inner loop above, but also means "the best" match so far
2383
        # doesn't contain any junk *or* popular non-junk elements.
2384
        while besti > alo and bestj > blo and \
2385
              not isbjunk(b[bestj-1]) and \
2386
              a[besti-1] == b[bestj-1]:
2387
            besti, bestj, bestsize = besti-1, bestj-1, bestsize+1
2388
        while besti+bestsize < ahi and bestj+bestsize < bhi and \
2389
              not isbjunk(b[bestj+bestsize]) and \
2390
              a[besti+bestsize] == b[bestj+bestsize]:
2391
            bestsize += 1
2392
2393
        # Now that we have a wholly interesting match (albeit possibly
2394
        # empty!), we may as well suck up the matching junk on each
2395
        # side of it too.  Can't think of a good reason not to, and it
2396
        # saves post-processing the (possibly considerable) expense of
2397
        # figuring out what to do with it.  In the case of an empty
2398
        # interesting match, this is clearly the right thing to do,
2399
        # because no other kind of match is possible in the regions.
2400
        while besti > alo and bestj > blo and \
2401
              isbjunk(b[bestj-1]) and \
2402
              a[besti-1] == b[bestj-1]:
2403
            besti, bestj, bestsize = besti-1, bestj-1, bestsize+1
2404
        while besti+bestsize < ahi and bestj+bestsize < bhi and \
2405
              isbjunk(b[bestj+bestsize]) and \
2406
              a[besti+bestsize] == b[bestj+bestsize]:
2407
            bestsize = bestsize + 1
2408
2409
        return besti, bestj, bestsize
2484.1.1 by John Arbash Meinel
Add an initial function to read knit indexes in pyrex.
2410
2411
2412
try:
2484.1.12 by John Arbash Meinel
Switch the layout to use a matching _knit_load_data_py.py and _knit_load_data_c.pyx
2413
    from bzrlib._knit_load_data_c import _load_data_c as _load_data
2484.1.1 by John Arbash Meinel
Add an initial function to read knit indexes in pyrex.
2414
except ImportError:
2484.1.12 by John Arbash Meinel
Switch the layout to use a matching _knit_load_data_py.py and _knit_load_data_c.pyx
2415
    from bzrlib._knit_load_data_py import _load_data_py as _load_data