/brz/remove-bazaar

To get this branch, use:
bzr branch http://gegoxaren.bato24.eu/bzr/brz/remove-bazaar
0.1.1 by Martin Pool
Check in old existing knit code.
1
#! /usr/bin/python
2
3
# Copyright (C) 2005 Canonical Ltd
4
0.1.33 by Martin Pool
add gpl text
5
# This program is free software; you can redistribute it and/or modify
6
# it under the terms of the GNU General Public License as published by
7
# the Free Software Foundation; either version 2 of the License, or
8
# (at your option) any later version.
9
10
# This program is distributed in the hope that it will be useful,
11
# but WITHOUT ANY WARRANTY; without even the implied warranty of
12
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
13
# GNU General Public License for more details.
14
15
# You should have received a copy of the GNU General Public License
16
# along with this program; if not, write to the Free Software
17
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
0.1.1 by Martin Pool
Check in old existing knit code.
18
19
# Author: Martin Pool <mbp@canonical.com>
20
21
0.1.38 by Martin Pool
Rename knit to weave. (I don't think there's an existing module called weave.)
22
"""Weave - storage of related text file versions"""
0.1.2 by Martin Pool
Import testsweet module adapted from bzr.
23
928 by Martin Pool
- go back to using plain builtin set()
24
# before intset (r923) 2000 versions in 41.5s
25
# with intset (r926) 2000 versions in 93s !!!
26
# better to just use plain sets.
27
931 by Martin Pool
- experiment with making Weave._extract() return a list, not a generator - slightly faster
28
# making _extract build and return a list, rather than being a generator
29
# takes 37.94s
30
938 by Martin Pool
- various optimizations to weave add code
31
# with python -O, r923 does 2000 versions in 36.87s
32
33
# with optimizations to avoid mutating lists - 35.75!  I guess copying
34
# all the elements every time costs more than the small manipulations.
35
# a surprisingly small change.
36
37
# r931, which avoids using a generator for extract, does 36.98s
38
39
# with memoized inclusions, takes 41.49s; not very good
40
41
# with slots, takes 37.35s; without takes 39.16, a bit surprising
42
43
# with the delta calculation mixed in with the add method, rather than
44
# separated, takes 36.78s
45
46
# with delta folded in and mutation of the list, 36.13s
47
48
# with all this and simplification of add code, 33s 
49
50
0.1.61 by Martin Pool
doc
51
# TODO: Perhaps have copy method for Weave instances?
0.1.2 by Martin Pool
Import testsweet module adapted from bzr.
52
0.1.58 by Martin Pool
doc
53
# XXX: If we do weaves this way, will a merge still behave the same
54
# way if it's done in a different order?  That's a pretty desirable
55
# property.
56
0.1.62 by Martin Pool
Lame command-line client for reading and writing weaves.
57
# TODO: Nothing here so far assumes the lines are really \n newlines,
58
# rather than being split up in some other way.  We could accomodate
59
# binaries, perhaps by naively splitting on \n or perhaps using
60
# something like a rolling checksum.
61
62
# TODO: Track version names as well as indexes. 
63
0.1.85 by Martin Pool
doc
64
# TODO: End marker for each version so we can stop reading?
0.1.69 by Martin Pool
Simple text-based format for storing weaves, cleaner than
65
66
# TODO: Check that no insertion occurs inside a deletion that was
67
# active in the version of the insertion.
68
912 by Martin Pool
- update todos for weave
69
# TODO: In addition to the SHA-1 check, perhaps have some code that
70
# checks structural constraints of the weave: ie that insertions are
71
# properly nested, that there is no text outside of an insertion, that
72
# insertions or deletions are not repeated, etc.
0.1.85 by Martin Pool
doc
73
918 by Martin Pool
- start doing new weave-merge algorithm
74
# TODO: Make the info command just show info, not extract everything:
75
# it can be much faster.
76
77
# TODO: Perhaps use long integers as sets instead of set objects; may
78
# be faster.
79
80
# TODO: Parallel-extract that passes back each line along with a
81
# description of which revisions include it.  Nice for checking all
82
# shas in parallel.
83
84
0.1.85 by Martin Pool
doc
85
924 by Martin Pool
- Add IntSet class
86
0.1.47 by Martin Pool
New WeaveError and WeaveFormatError rather than assertions.
87
class WeaveError(Exception):
88
    """Exception in processing weave"""
89
90
91
class WeaveFormatError(WeaveError):
92
    """Weave invariant violated"""
93
    
94
0.1.38 by Martin Pool
Rename knit to weave. (I don't think there's an existing module called weave.)
95
class Weave(object):
96
    """weave - versioned text file storage.
0.1.2 by Martin Pool
Import testsweet module adapted from bzr.
97
    
0.1.72 by Martin Pool
Go back to weave lines normally having newlines at the end.
98
    A Weave manages versions of line-based text files, keeping track
99
    of the originating version for each line.
100
101
    To clients the "lines" of the file are represented as a list of strings.
102
    These strings  will typically have terminal newline characters, but
103
    this is not required.  In particular files commonly do not have a newline
104
    at the end of the file.
0.1.2 by Martin Pool
Import testsweet module adapted from bzr.
105
0.1.4 by Martin Pool
Start indexing knits by both integer and version string.
106
    Texts can be identified in either of two ways:
107
108
    * a nonnegative index number.
109
110
    * a version-id string.
111
0.1.38 by Martin Pool
Rename knit to weave. (I don't think there's an existing module called weave.)
112
    Typically the index number will be valid only inside this weave and
0.1.4 by Martin Pool
Start indexing knits by both integer and version string.
113
    the version-id is used to reference it in the larger world.
0.1.2 by Martin Pool
Import testsweet module adapted from bzr.
114
0.1.39 by Martin Pool
Change to a more realistic weave structure which can represent insertions and
115
    The weave is represented as a list mixing edit instructions and
116
    literal text.  Each entry in _l can be either a string (or
117
    unicode), or a tuple.  If a string, it means that the given line
118
    should be output in the currently active revisions.
119
120
    If a tuple, it gives a processing instruction saying in which
121
    revisions the enclosed lines are active.  The tuple has the form
122
    (instruction, version).
123
124
    The instruction can be '{' or '}' for an insertion block, and '['
125
    and ']' for a deletion block respectively.  The version is the
0.1.45 by Martin Pool
doc
126
    integer version index.  There is no replace operator, only deletes
127
    and inserts.
0.1.39 by Martin Pool
Change to a more realistic weave structure which can represent insertions and
128
0.1.41 by Martin Pool
Doc
129
    Constraints/notes:
0.1.39 by Martin Pool
Change to a more realistic weave structure which can represent insertions and
130
131
    * A later version can delete lines that were introduced by any
132
      number of ancestor versions; this implies that deletion
133
      instructions can span insertion blocks without regard to the
134
      insertion block's nesting.
135
0.1.41 by Martin Pool
Doc
136
    * Similarly, deletions need not be properly nested with regard to
137
      each other, because they might have been generated by
138
      independent revisions.
139
0.1.45 by Martin Pool
doc
140
    * Insertions are always made by inserting a new bracketed block
141
      into a single point in the previous weave.  This implies they
142
      can nest but not overlap, and the nesting must always have later
143
      insertions on the inside.
144
0.1.41 by Martin Pool
Doc
145
    * It doesn't seem very useful to have an active insertion
146
      inside an inactive insertion, but it might happen.
0.1.45 by Martin Pool
doc
147
      
0.1.41 by Martin Pool
Doc
148
    * Therefore, all instructions are always"considered"; that
149
      is passed onto and off the stack.  An outer inactive block
150
      doesn't disable an inner block.
151
152
    * Lines are enabled if the most recent enclosing insertion is
153
      active and none of the enclosing deletions are active.
0.1.39 by Martin Pool
Change to a more realistic weave structure which can represent insertions and
154
0.1.49 by Martin Pool
Add another constraint: revisions should not delete text that they
155
    * There is no point having a deletion directly inside its own
156
      insertion; you might as well just not write it.  And there
157
      should be no way to get an earlier version deleting a later
158
      version.
159
0.1.2 by Martin Pool
Import testsweet module adapted from bzr.
160
    _l
892 by Martin Pool
- weave stores only direct parents, and calculates and memoizes expansion as needed
161
        Text of the weave.
0.1.4 by Martin Pool
Start indexing knits by both integer and version string.
162
163
    _v
892 by Martin Pool
- weave stores only direct parents, and calculates and memoizes expansion as needed
164
        List of parents, indexed by version number.
165
        It is only necessary to store the minimal set of parents for
166
        each version; the parent's parents are implied.
0.1.13 by Martin Pool
Knit structure now allows for versions to include the lines present in other
167
0.1.89 by Martin Pool
Store SHA1 in weave file for later verification
168
    _sha1s
169
        List of hex SHA-1 of each version, or None if not recorded.
0.1.2 by Martin Pool
Import testsweet module adapted from bzr.
170
    """
938 by Martin Pool
- various optimizations to weave add code
171
172
    ## __slots__ = ['_l', '_v', '_sha1s']
173
    
0.1.4 by Martin Pool
Start indexing knits by both integer and version string.
174
    def __init__(self):
175
        self._l = []
176
        self._v = []
0.1.89 by Martin Pool
Store SHA1 in weave file for later verification
177
        self._sha1s = []
0.1.60 by Martin Pool
Weave eq and ne methods
178
179
180
    def __eq__(self, other):
181
        if not isinstance(other, Weave):
182
            return False
183
        return self._v == other._v \
184
               and self._l == other._l
185
    
186
187
    def __ne__(self, other):
188
        return not self.__eq__(other)
189
0.1.2 by Martin Pool
Import testsweet module adapted from bzr.
190
        
0.1.26 by Martin Pool
Refactor parameters to add command
191
    def add(self, parents, text):
0.1.4 by Martin Pool
Start indexing knits by both integer and version string.
192
        """Add a single text on top of the weave.
0.1.36 by Martin Pool
doc
193
  
0.1.26 by Martin Pool
Refactor parameters to add command
194
        Returns the index number of the newly added version.
195
196
        parents
892 by Martin Pool
- weave stores only direct parents, and calculates and memoizes expansion as needed
197
            List or set of direct parent version numbers.
198
            
0.1.26 by Martin Pool
Refactor parameters to add command
199
        text
200
            Sequence of lines to be added in the new version."""
938 by Martin Pool
- various optimizations to weave add code
201
202
        self._check_versions(parents)
0.1.82 by Martin Pool
Small weave optimizations
203
        ## self._check_lines(text)
938 by Martin Pool
- various optimizations to weave add code
204
        new_version = len(self._v)
0.1.5 by Martin Pool
Add test for storing two text versions.
205
0.1.89 by Martin Pool
Store SHA1 in weave file for later verification
206
        import sha
207
        s = sha.new()
938 by Martin Pool
- various optimizations to weave add code
208
        map(s.update, text)
0.1.89 by Martin Pool
Store SHA1 in weave file for later verification
209
        sha1 = s.hexdigest()
210
        del s
211
938 by Martin Pool
- various optimizations to weave add code
212
        # if we abort after here the weave will be corrupt
939 by Martin Pool
- remove trivial function Weave._addversion
213
        self._v.append(frozenset(parents))
0.1.89 by Martin Pool
Store SHA1 in weave file for later verification
214
        self._sha1s.append(sha1)
938 by Martin Pool
- various optimizations to weave add code
215
216
            
217
        if not parents:
218
            # special case; adding with no parents revision; can do
219
            # this more quickly by just appending unconditionally.
220
            # even more specially, if we're adding an empty text we
221
            # need do nothing at all.
222
            if text:
223
                self._l.append(('{', new_version))
224
                self._l.extend(text)
225
                self._l.append(('}', new_version))
226
        
227
            return new_version
228
229
        if len(parents) == 1 and sha1 == self._sha1s[parents[0]]:
230
            # special case: same as the single parent
231
            return new_version
232
            
233
234
        ancestors = self.inclusions(parents)
235
236
        l = self._l
237
238
        # basis a list of (origin, lineno, line)
239
        basis_lineno = []
240
        basis_lines = []
241
        for origin, lineno, line in self._extract(ancestors):
242
            basis_lineno.append(lineno)
243
            basis_lines.append(line)
244
245
        # another small special case: a merge, producing the same text as auto-merge
246
        if text == basis_lines:
247
            return new_version            
248
249
        # add a sentinal, because we can also match against the final line
250
        basis_lineno.append(len(self._l))
251
252
        # XXX: which line of the weave should we really consider
253
        # matches the end of the file?  the current code says it's the
254
        # last line of the weave?
255
256
        #print 'basis_lines:', basis_lines
257
        #print 'new_lines:  ', lines
258
259
        from difflib import SequenceMatcher
260
        s = SequenceMatcher(None, basis_lines, text)
261
262
        # offset gives the number of lines that have been inserted
263
        # into the weave up to the current point; if the original edit instruction
264
        # says to change line A then we actually change (A+offset)
265
        offset = 0
266
267
        for tag, i1, i2, j1, j2 in s.get_opcodes():
268
            # i1,i2 are given in offsets within basis_lines; we need to map them
269
            # back to offsets within the entire weave
270
            #print 'raw match', tag, i1, i2, j1, j2
271
            if tag == 'equal':
272
                continue
273
274
            i1 = basis_lineno[i1]
275
            i2 = basis_lineno[i2]
276
277
            assert 0 <= i1 <= i2 <= len(old_l)
278
            assert 0 <= j1 <= j2 <= len(text)
279
280
            #print tag, i1, i2, j1, j2
281
282
            # the deletion and insertion are handled separately.
283
            # first delete the region.
284
            if i1 != i2:
285
                self._l.insert(i1+offset, ('[', new_version))
286
                self._l.insert(i2+offset+1, (']', new_version))
287
                offset += 2
288
289
            if j1 != j2:
290
                # there may have been a deletion spanning up to
291
                # i2; we want to insert after this region to make sure
292
                # we don't destroy ourselves
293
                i = i2 + offset
294
                self._l[i:i] = ([('{', new_version)] 
295
                                + text[j1:j2] 
296
                                + [('}', new_version)])
297
                offset += 2 + (j2 - j1)
298
299
        return new_version
0.1.2 by Martin Pool
Import testsweet module adapted from bzr.
300
0.1.27 by Martin Pool
Check that version numbers passed in are reasonable
301
0.1.78 by Martin Pool
Rename Weave.get_included to inclusions and getiter to get_iter
302
    def inclusions(self, versions):
893 by Martin Pool
- Refactor weave calculation of inclusions
303
        """Return set of all ancestors of given version(s)."""
928 by Martin Pool
- go back to using plain builtin set()
304
        i = set(versions)
893 by Martin Pool
- Refactor weave calculation of inclusions
305
        v = max(versions)
892 by Martin Pool
- weave stores only direct parents, and calculates and memoizes expansion as needed
306
        try:
893 by Martin Pool
- Refactor weave calculation of inclusions
307
            while v >= 0:
308
                if v in i:
309
                    # include all its parents
310
                    i.update(self._v[v])
311
                v -= 1
312
            return i
892 by Martin Pool
- weave stores only direct parents, and calculates and memoizes expansion as needed
313
        except IndexError:
314
            raise ValueError("version %d not present in weave" % v)
0.1.77 by Martin Pool
New Weave.get_included() does transitive expansion
315
316
890 by Martin Pool
- weave info should show minimal expression of parents
317
    def minimal_parents(self, version):
318
        """Find the minimal set of parents for the version."""
319
        included = self._v[version]
320
        if not included:
321
            return []
322
        
323
        li = list(included)
893 by Martin Pool
- Refactor weave calculation of inclusions
324
        li.sort(reverse=True)
890 by Martin Pool
- weave info should show minimal expression of parents
325
326
        mininc = []
928 by Martin Pool
- go back to using plain builtin set()
327
        gotit = set()
890 by Martin Pool
- weave info should show minimal expression of parents
328
329
        for pv in li:
330
            if pv not in gotit:
331
                mininc.append(pv)
893 by Martin Pool
- Refactor weave calculation of inclusions
332
                gotit.update(self.inclusions(pv))
890 by Martin Pool
- weave info should show minimal expression of parents
333
334
        assert mininc[0] >= 0
335
        assert mininc[-1] < version
336
        return mininc
337
338
0.1.75 by Martin Pool
Remove VerInfo class; just store sets directly in the list of
339
0.1.39 by Martin Pool
Change to a more realistic weave structure which can represent insertions and
340
    def _check_lines(self, text):
341
        if not isinstance(text, list):
342
            raise ValueError("text should be a list, not %s" % type(text))
343
344
        for l in text:
345
            if not isinstance(l, basestring):
869 by Martin Pool
- more weave.py command line options
346
                raise ValueError("text line should be a string or unicode, not %s"
347
                                 % type(l))
0.1.39 by Martin Pool
Change to a more realistic weave structure which can represent insertions and
348
        
349
350
0.1.27 by Martin Pool
Check that version numbers passed in are reasonable
351
    def _check_versions(self, indexes):
352
        """Check everything in the sequence of indexes is valid"""
353
        for i in indexes:
354
            try:
355
                self._v[i]
356
            except IndexError:
357
                raise IndexError("invalid version number %r" % i)
358
0.1.2 by Martin Pool
Import testsweet module adapted from bzr.
359
    
0.1.7 by Martin Pool
Add trivial annotate text
360
    def annotate(self, index):
361
        return list(self.annotate_iter(index))
362
363
0.1.78 by Martin Pool
Rename Weave.get_included to inclusions and getiter to get_iter
364
    def annotate_iter(self, version):
0.1.7 by Martin Pool
Add trivial annotate text
365
        """Yield list of (index-id, line) pairs for the specified version.
366
367
        The index indicates when the line originated in the weave."""
893 by Martin Pool
- Refactor weave calculation of inclusions
368
        for origin, lineno, text in self._extract([version]):
0.1.39 by Martin Pool
Change to a more realistic weave structure which can represent insertions and
369
            yield origin, text
0.1.22 by Martin Pool
Calculate delta for new versions relative to a set of parent versions.
370
371
918 by Martin Pool
- start doing new weave-merge algorithm
372
    def _walk(self):
373
        """Walk the weave.
374
375
        Yields sequence of
376
        (lineno, insert, deletes, text)
377
        for each literal line.
378
        """
379
        
380
        istack = []
928 by Martin Pool
- go back to using plain builtin set()
381
        dset = set()
918 by Martin Pool
- start doing new weave-merge algorithm
382
383
        lineno = 0         # line of weave, 0-based
384
385
        for l in self._l:
386
            if isinstance(l, tuple):
387
                c, v = l
388
                isactive = None
389
                if c == '{':
390
                    istack.append(v)
391
                elif c == '}':
392
                    oldv = istack.pop()
393
                elif c == '[':
926 by Martin Pool
- update more weave code to use intsets
394
                    assert v not in dset
395
                    dset.add(v)
918 by Martin Pool
- start doing new weave-merge algorithm
396
                elif c == ']':
926 by Martin Pool
- update more weave code to use intsets
397
                    dset.remove(v)
918 by Martin Pool
- start doing new weave-merge algorithm
398
                else:
399
                    raise WeaveFormatError('unexpected instruction %r'
400
                                           % v)
401
            else:
402
                assert isinstance(l, basestring)
403
                assert istack
404
                yield lineno, istack[-1], dset, l
405
            lineno += 1
406
407
408
893 by Martin Pool
- Refactor weave calculation of inclusions
409
    def _extract(self, versions):
0.1.20 by Martin Pool
Factor out Knit.extract() method
410
        """Yield annotation of lines in included set.
411
0.1.39 by Martin Pool
Change to a more realistic weave structure which can represent insertions and
412
        Yields a sequence of tuples (origin, lineno, text), where
413
        origin is the origin version, lineno the index in the weave,
414
        and text the text of the line.
415
0.1.20 by Martin Pool
Factor out Knit.extract() method
416
        The set typically but not necessarily corresponds to a version.
417
        """
893 by Martin Pool
- Refactor weave calculation of inclusions
418
        included = self.inclusions(versions)
881 by Martin Pool
- faster weave extraction
419
420
        istack = []
928 by Martin Pool
- go back to using plain builtin set()
421
        dset = set()
0.1.48 by Martin Pool
Basic parsing of delete instructions.
422
423
        lineno = 0         # line of weave, 0-based
891 by Martin Pool
- fix up refactoring of weave
424
894 by Martin Pool
- small optimization for weave extract
425
        isactive = None
0.1.85 by Martin Pool
doc
426
931 by Martin Pool
- experiment with making Weave._extract() return a list, not a generator - slightly faster
427
        result = []
428
0.1.63 by Martin Pool
Abbreviate WeaveFormatError in some code
429
        WFE = WeaveFormatError
0.1.95 by Martin Pool
- preliminary merge conflict detection
430
0.1.39 by Martin Pool
Change to a more realistic weave structure which can represent insertions and
431
        for l in self._l:
432
            if isinstance(l, tuple):
433
                c, v = l
894 by Martin Pool
- small optimization for weave extract
434
                isactive = None
891 by Martin Pool
- fix up refactoring of weave
435
                if c == '{':
436
                    assert v not in istack
437
                    istack.append(v)
438
                elif c == '}':
439
                    oldv = istack.pop()
440
                    assert oldv == v
441
                elif c == '[':
442
                    if v in included:
881 by Martin Pool
- faster weave extraction
443
                        assert v not in dset
0.1.48 by Martin Pool
Basic parsing of delete instructions.
444
                        dset.add(v)
891 by Martin Pool
- fix up refactoring of weave
445
                else:
446
                    assert c == ']'
447
                    if v in included:
881 by Martin Pool
- faster weave extraction
448
                        assert v in dset
0.1.48 by Martin Pool
Basic parsing of delete instructions.
449
                        dset.remove(v)
0.1.39 by Martin Pool
Change to a more realistic weave structure which can represent insertions and
450
            else:
451
                assert isinstance(l, basestring)
894 by Martin Pool
- small optimization for weave extract
452
                if isactive is None:
453
                    isactive = (not dset) and istack and (istack[-1] in included)
0.1.39 by Martin Pool
Change to a more realistic weave structure which can represent insertions and
454
                if isactive:
931 by Martin Pool
- experiment with making Weave._extract() return a list, not a generator - slightly faster
455
                    result.append((istack[-1], lineno, l))
0.1.39 by Martin Pool
Change to a more realistic weave structure which can represent insertions and
456
            lineno += 1
0.1.7 by Martin Pool
Add trivial annotate text
457
0.1.46 by Martin Pool
More constraints on structure of weave, and checks that they work
458
        if istack:
0.1.63 by Martin Pool
Abbreviate WeaveFormatError in some code
459
            raise WFE("unclosed insertion blocks at end of weave",
0.1.47 by Martin Pool
New WeaveError and WeaveFormatError rather than assertions.
460
                                   istack)
0.1.48 by Martin Pool
Basic parsing of delete instructions.
461
        if dset:
0.1.63 by Martin Pool
Abbreviate WeaveFormatError in some code
462
            raise WFE("unclosed deletion blocks at end of weave",
0.1.48 by Martin Pool
Basic parsing of delete instructions.
463
                                   dset)
0.1.40 by Martin Pool
Add test for extracting from weave with nested insertions
464
931 by Martin Pool
- experiment with making Weave._extract() return a list, not a generator - slightly faster
465
        return result
466
    
467
0.1.7 by Martin Pool
Add trivial annotate text
468
0.1.78 by Martin Pool
Rename Weave.get_included to inclusions and getiter to get_iter
469
    def get_iter(self, version):
0.1.5 by Martin Pool
Add test for storing two text versions.
470
        """Yield lines for the specified version."""
893 by Martin Pool
- Refactor weave calculation of inclusions
471
        for origin, lineno, line in self._extract([version]):
0.1.8 by Martin Pool
Unify get/annotate code
472
            yield line
0.1.5 by Martin Pool
Add test for storing two text versions.
473
474
0.1.4 by Martin Pool
Start indexing knits by both integer and version string.
475
    def get(self, index):
0.1.78 by Martin Pool
Rename Weave.get_included to inclusions and getiter to get_iter
476
        return list(self.get_iter(index))
0.1.1 by Martin Pool
Check in old existing knit code.
477
478
0.1.95 by Martin Pool
- preliminary merge conflict detection
479
    def mash_iter(self, included):
0.1.65 by Martin Pool
Add Weave.merge_iter to get automerged lines
480
        """Return composed version of multiple included versions."""
893 by Martin Pool
- Refactor weave calculation of inclusions
481
        for origin, lineno, text in self._extract(included):
0.1.65 by Martin Pool
Add Weave.merge_iter to get automerged lines
482
            yield text
483
484
0.1.11 by Martin Pool
Add Knit.dump method
485
    def dump(self, to_file):
486
        from pprint import pprint
0.1.38 by Martin Pool
Rename knit to weave. (I don't think there's an existing module called weave.)
487
        print >>to_file, "Weave._l = ",
0.1.11 by Martin Pool
Add Knit.dump method
488
        pprint(self._l, to_file)
0.1.38 by Martin Pool
Rename knit to weave. (I don't think there's an existing module called weave.)
489
        print >>to_file, "Weave._v = ",
0.1.18 by Martin Pool
Better Knit.dump method
490
        pprint(self._v, to_file)
0.1.11 by Martin Pool
Add Knit.dump method
491
492
0.1.91 by Martin Pool
Update Weave.check
493
494
    def numversions(self):
495
        l = len(self._v)
496
        assert l == len(self._sha1s)
497
        return l
498
499
894 by Martin Pool
- small optimization for weave extract
500
    def check(self, progress_bar=None):
0.1.91 by Martin Pool
Update Weave.check
501
        # check no circular inclusions
502
        for version in range(self.numversions()):
503
            inclusions = list(self._v[version])
504
            if inclusions:
505
                inclusions.sort()
506
                if inclusions[-1] >= version:
0.1.47 by Martin Pool
New WeaveError and WeaveFormatError rather than assertions.
507
                    raise WeaveFormatError("invalid included version %d for index %d"
0.1.91 by Martin Pool
Update Weave.check
508
                                           % (inclusions[-1], version))
509
510
        # try extracting all versions; this is a bit slow and parallel
511
        # extraction could be used
512
        import sha
894 by Martin Pool
- small optimization for weave extract
513
        nv = self.numversions()
514
        for version in range(nv):
515
            if progress_bar:
516
                progress_bar.update('checking text', version, nv)
0.1.91 by Martin Pool
Update Weave.check
517
            s = sha.new()
518
            for l in self.get_iter(version):
519
                s.update(l)
520
            hd = s.hexdigest()
521
            expected = self._sha1s[version]
522
            if hd != expected:
523
                raise WeaveError("mismatched sha1 for version %d; "
524
                                 "got %s, expected %s"
525
                                 % (version, hd, expected))
0.1.18 by Martin Pool
Better Knit.dump method
526
881 by Martin Pool
- faster weave extraction
527
        # TODO: check insertions are properly nested, that there are
528
        # no lines outside of insertion blocks, that deletions are
529
        # properly paired, etc.
530
0.1.13 by Martin Pool
Knit structure now allows for versions to include the lines present in other
531
532
0.1.95 by Martin Pool
- preliminary merge conflict detection
533
    def merge(self, merge_versions):
534
        """Automerge and mark conflicts between versions.
535
536
        This returns a sequence, each entry describing alternatives
537
        for a chunk of the file.  Each of the alternatives is given as
538
        a list of lines.
539
540
        If there is a chunk of the file where there's no diagreement,
541
        only one alternative is given.
542
        """
543
544
        # approach: find the included versions common to all the
545
        # merged versions
546
        raise NotImplementedError()
547
548
549
0.1.21 by Martin Pool
Start computing a delta to insert a new revision
550
    def _delta(self, included, lines):
551
        """Return changes from basis to new revision.
552
553
        The old text for comparison is the union of included revisions.
554
555
        This is used in inserting a new text.
0.1.22 by Martin Pool
Calculate delta for new versions relative to a set of parent versions.
556
0.1.55 by Martin Pool
doc
557
        Delta is returned as a sequence of
558
        (weave1, weave2, newlines).
559
560
        This indicates that weave1:weave2 of the old weave should be
0.1.22 by Martin Pool
Calculate delta for new versions relative to a set of parent versions.
561
        replaced by the sequence of lines in newlines.  Note that
562
        these line numbers are positions in the total weave and don't
563
        correspond to the lines in any extracted version, or even the
564
        extracted union of included versions.
565
566
        If line1=line2, this is a pure insert; if newlines=[] this is a
567
        pure delete.  (Similar to difflib.)
0.1.21 by Martin Pool
Start computing a delta to insert a new revision
568
        """
569
0.1.1 by Martin Pool
Check in old existing knit code.
570
918 by Martin Pool
- start doing new weave-merge algorithm
571
            
572
    def plan_merge(self, ver_a, ver_b):
573
        """Return pseudo-annotation indicating how the two versions merge.
574
575
        This is computed between versions a and b and their common
576
        base.
577
578
        Weave lines present in none of them are skipped entirely.
579
        """
926 by Martin Pool
- update more weave code to use intsets
580
        inc_a = self.inclusions([ver_a])
581
        inc_b = self.inclusions([ver_b])
918 by Martin Pool
- start doing new weave-merge algorithm
582
        inc_c = inc_a & inc_b
583
584
        for lineno, insert, deleteset, line in self._walk():
585
            if deleteset & inc_c:
586
                # killed in parent; can't be in either a or b
587
                # not relevant to our work
588
                yield 'killed-base', line
926 by Martin Pool
- update more weave code to use intsets
589
            elif insert in inc_c:
918 by Martin Pool
- start doing new weave-merge algorithm
590
                # was inserted in base
591
                killed_a = bool(deleteset & inc_a)
592
                killed_b = bool(deleteset & inc_b)
593
                if killed_a and killed_b:
594
                    yield 'killed-both', line
595
                elif killed_a:
596
                    yield 'killed-a', line
597
                elif killed_b:
598
                    yield 'killed-b', line
599
                else:
600
                    yield 'unchanged', line
926 by Martin Pool
- update more weave code to use intsets
601
            elif insert in inc_a:
918 by Martin Pool
- start doing new weave-merge algorithm
602
                if deleteset & inc_a:
603
                    yield 'ghost-a', line
604
                else:
605
                    # new in A; not in B
606
                    yield 'new-a', line
926 by Martin Pool
- update more weave code to use intsets
607
            elif insert in inc_b:
918 by Martin Pool
- start doing new weave-merge algorithm
608
                if deleteset & inc_b:
609
                    yield 'ghost-b', line
610
                else:
611
                    yield 'new-b', line
612
            else:
613
                # not in either revision
614
                yield 'irrelevant', line
615
919 by Martin Pool
- more development of weave-merge
616
        yield 'unchanged', ''           # terminator
617
618
619
620
    def weave_merge(self, plan):
621
        lines_a = []
622
        lines_b = []
623
        ch_a = ch_b = False
624
625
        for state, line in plan:
626
            if state == 'unchanged' or state == 'killed-both':
627
                # resync and flush queued conflicts changes if any
628
                if not lines_a and not lines_b:
629
                    pass
630
                elif ch_a and not ch_b:
631
                    # one-sided change:                    
632
                    for l in lines_a: yield l
633
                elif ch_b and not ch_a:
634
                    for l in lines_b: yield l
635
                elif lines_a == lines_b:
636
                    for l in lines_a: yield l
637
                else:
638
                    yield '<<<<\n'
639
                    for l in lines_a: yield l
640
                    yield '====\n'
641
                    for l in lines_b: yield l
642
                    yield '>>>>\n'
643
644
                del lines_a[:]
645
                del lines_b[:]
646
                ch_a = ch_b = False
647
                
648
            if state == 'unchanged':
649
                if line:
650
                    yield line
651
            elif state == 'killed-a':
652
                ch_a = True
653
                lines_b.append(line)
654
            elif state == 'killed-b':
655
                ch_b = True
656
                lines_a.append(line)
657
            elif state == 'new-a':
658
                ch_a = True
659
                lines_a.append(line)
660
            elif state == 'new-b':
661
                ch_b = True
662
                lines_b.append(line)
663
            else:
920 by Martin Pool
- add more test cases for weave_merge
664
                assert state in ('irrelevant', 'ghost-a', 'ghost-b', 'killed-base',
665
                                 'killed-both'), \
919 by Martin Pool
- more development of weave-merge
666
                       state
667
668
                
669
670
918 by Martin Pool
- start doing new weave-merge algorithm
671
672
0.1.62 by Martin Pool
Lame command-line client for reading and writing weaves.
673
0.1.88 by Martin Pool
Add weave info command.
674
def weave_info(filename, out):
675
    """Show some text information about the weave."""
676
    from weavefile import read_weave
677
    wf = file(filename, 'rb')
678
    w = read_weave(wf)
679
    # FIXME: doesn't work on pipes
680
    weave_size = wf.tell()
681
    print >>out, "weave file size %d bytes" % weave_size
682
    print >>out, "weave contains %d versions" % len(w._v)
683
684
    total = 0
870 by Martin Pool
- better weave info display
685
    print '%6s %6s %8s %40s %20s' % ('ver', 'lines', 'bytes', 'sha1', 'parents')
686
    for i in (6, 6, 8, 40, 20):
687
        print '-' * i,
688
    print
0.1.88 by Martin Pool
Add weave info command.
689
    for i in range(len(w._v)):
690
        text = w.get(i)
691
        lines = len(text)
692
        bytes = sum((len(a) for a in text))
0.1.91 by Martin Pool
Update Weave.check
693
        sha1 = w._sha1s[i]
870 by Martin Pool
- better weave info display
694
        print '%6d %6d %8d %40s' % (i, lines, bytes, sha1),
892 by Martin Pool
- weave stores only direct parents, and calculates and memoizes expansion as needed
695
        for pv in w._v[i]:
696
            print pv,
697
        print
0.1.88 by Martin Pool
Add weave info command.
698
        total += bytes
699
700
    print >>out, "versions total %d bytes" % total
701
    print >>out, "compression ratio %.3f" % (float(total)/float(weave_size))
869 by Martin Pool
- more weave.py command line options
702
703
704
def usage():
871 by Martin Pool
- add command for merge-based weave
705
    print """bzr weave tool
706
707
Experimental tool for weave algorithm.
708
869 by Martin Pool
- more weave.py command line options
709
usage:
710
    weave init WEAVEFILE
711
        Create an empty weave file
712
    weave get WEAVEFILE VERSION
713
        Write out specified version.
714
    weave check WEAVEFILE
715
        Check consistency of all versions.
716
    weave info WEAVEFILE
717
        Display table of contents.
718
    weave add WEAVEFILE [BASE...] < NEWTEXT
719
        Add NEWTEXT, with specified parent versions.
720
    weave annotate WEAVEFILE VERSION
721
        Display origin of each line.
722
    weave mash WEAVEFILE VERSION...
723
        Display composite of all selected versions.
724
    weave merge WEAVEFILE VERSION1 VERSION2 > OUT
725
        Auto-merge two versions and display conflicts.
871 by Martin Pool
- add command for merge-based weave
726
727
example:
728
729
    % weave init foo.weave
730
    % vi foo.txt
731
    % weave add foo.weave < foo.txt
732
    added version 0
733
734
    (create updated version)
735
    % vi foo.txt
736
    % weave get foo.weave 0 | diff -u - foo.txt
737
    % weave add foo.weave 0 < foo.txt
738
    added version 1
739
740
    % weave get foo.weave 0 > foo.txt       (create forked version)
741
    % vi foo.txt
742
    % weave add foo.weave 0 < foo.txt
743
    added version 2
744
745
    % weave merge foo.weave 1 2 > foo.txt   (merge them)
746
    % vi foo.txt                            (resolve conflicts)
747
    % weave add foo.weave 1 2 < foo.txt     (commit merged version)     
748
    
869 by Martin Pool
- more weave.py command line options
749
"""
0.1.88 by Martin Pool
Add weave info command.
750
    
751
0.1.62 by Martin Pool
Lame command-line client for reading and writing weaves.
752
753
def main(argv):
754
    import sys
755
    import os
869 by Martin Pool
- more weave.py command line options
756
    from weavefile import write_weave, read_weave
894 by Martin Pool
- small optimization for weave extract
757
    from bzrlib.progress import ProgressBar
758
759
    #import psyco
760
    #psyco.full()
761
0.1.62 by Martin Pool
Lame command-line client for reading and writing weaves.
762
    cmd = argv[1]
869 by Martin Pool
- more weave.py command line options
763
764
    def readit():
765
        return read_weave(file(argv[2], 'rb'))
766
    
767
    if cmd == 'help':
768
        usage()
769
    elif cmd == 'add':
770
        w = readit()
0.1.62 by Martin Pool
Lame command-line client for reading and writing weaves.
771
        # at the moment, based on everything in the file
869 by Martin Pool
- more weave.py command line options
772
        parents = map(int, argv[3:])
0.1.72 by Martin Pool
Go back to weave lines normally having newlines at the end.
773
        lines = sys.stdin.readlines()
0.1.69 by Martin Pool
Simple text-based format for storing weaves, cleaner than
774
        ver = w.add(parents, lines)
869 by Martin Pool
- more weave.py command line options
775
        write_weave(w, file(argv[2], 'wb'))
776
        print 'added version %d' % ver
0.1.62 by Martin Pool
Lame command-line client for reading and writing weaves.
777
    elif cmd == 'init':
778
        fn = argv[2]
779
        if os.path.exists(fn):
780
            raise IOError("file exists")
781
        w = Weave()
869 by Martin Pool
- more weave.py command line options
782
        write_weave(w, file(fn, 'wb'))
783
    elif cmd == 'get': # get one version
784
        w = readit()
0.1.94 by Martin Pool
Fix get_iter call
785
        sys.stdout.writelines(w.get_iter(int(argv[3])))
869 by Martin Pool
- more weave.py command line options
786
        
787
    elif cmd == 'mash': # get composite
788
        w = readit()
789
        sys.stdout.writelines(w.mash_iter(map(int, argv[3:])))
790
0.1.62 by Martin Pool
Lame command-line client for reading and writing weaves.
791
    elif cmd == 'annotate':
869 by Martin Pool
- more weave.py command line options
792
        w = readit()
0.1.72 by Martin Pool
Go back to weave lines normally having newlines at the end.
793
        # newline is added to all lines regardless; too hard to get
794
        # reasonable formatting otherwise
0.1.62 by Martin Pool
Lame command-line client for reading and writing weaves.
795
        lasto = None
796
        for origin, text in w.annotate(int(argv[3])):
0.1.72 by Martin Pool
Go back to weave lines normally having newlines at the end.
797
            text = text.rstrip('\r\n')
0.1.62 by Martin Pool
Lame command-line client for reading and writing weaves.
798
            if origin == lasto:
799
                print '      | %s' % (text)
800
            else:
801
                print '%5d | %s' % (origin, text)
802
                lasto = origin
871 by Martin Pool
- add command for merge-based weave
803
                
0.1.88 by Martin Pool
Add weave info command.
804
    elif cmd == 'info':
805
        weave_info(argv[2], sys.stdout)
871 by Martin Pool
- add command for merge-based weave
806
        
0.1.91 by Martin Pool
Update Weave.check
807
    elif cmd == 'check':
869 by Martin Pool
- more weave.py command line options
808
        w = readit()
894 by Martin Pool
- small optimization for weave extract
809
        pb = ProgressBar()
810
        w.check(pb)
811
        pb.clear()
938 by Martin Pool
- various optimizations to weave add code
812
        print '%d versions ok' % w.numversions()
871 by Martin Pool
- add command for merge-based weave
813
892 by Martin Pool
- weave stores only direct parents, and calculates and memoizes expansion as needed
814
    elif cmd == 'inclusions':
815
        w = readit()
816
        print ' '.join(map(str, w.inclusions([int(argv[3])])))
817
818
    elif cmd == 'parents':
819
        w = readit()
820
        print ' '.join(map(str, w._v[int(argv[3])]))
821
918 by Martin Pool
- start doing new weave-merge algorithm
822
    elif cmd == 'plan-merge':
823
        w = readit()
824
        for state, line in w.plan_merge(int(argv[3]), int(argv[4])):
919 by Martin Pool
- more development of weave-merge
825
            if line:
826
                print '%14s | %s' % (state, line),
918 by Martin Pool
- start doing new weave-merge algorithm
827
871 by Martin Pool
- add command for merge-based weave
828
    elif cmd == 'merge':
919 by Martin Pool
- more development of weave-merge
829
        w = readit()
830
        p = w.plan_merge(int(argv[3]), int(argv[4]))
831
        sys.stdout.writelines(w.weave_merge(p))
832
            
833
    elif cmd == 'mash-merge':
871 by Martin Pool
- add command for merge-based weave
834
        if len(argv) != 5:
835
            usage()
836
            return 1
837
838
        w = readit()
839
        v1, v2 = map(int, argv[3:5])
840
841
        basis = w.inclusions([v1]).intersection(w.inclusions([v2]))
842
843
        base_lines = list(w.mash_iter(basis))
844
        a_lines = list(w.get(v1))
845
        b_lines = list(w.get(v2))
846
847
        from bzrlib.merge3 import Merge3
848
        m3 = Merge3(base_lines, a_lines, b_lines)
849
850
        name_a = 'version %d' % v1
851
        name_b = 'version %d' % v2
852
        sys.stdout.writelines(m3.merge_lines(name_a=name_a, name_b=name_b))
0.1.62 by Martin Pool
Lame command-line client for reading and writing weaves.
853
    else:
854
        raise ValueError('unknown command %r' % cmd)
855
    
856
857
if __name__ == '__main__':
858
    import sys
859
    sys.exit(main(sys.argv))