bzr branch
http://gegoxaren.bato24.eu/bzr/brz/remove-bazaar
| 
198
by mbp at sourcefrog
 - experimental compressed Revfile support  | 
1  | 
#! /usr/bin/env python
 | 
2  | 
||
| 
200
by mbp at sourcefrog
 revfile: fix up __getitem__ to allow simple iteration  | 
3  | 
# (C) 2005 Canonical Ltd
 | 
| 
198
by mbp at sourcefrog
 - experimental compressed Revfile support  | 
4  | 
|
| 
200
by mbp at sourcefrog
 revfile: fix up __getitem__ to allow simple iteration  | 
5  | 
# based on an idea by Matt Mackall
 | 
| 
198
by mbp at sourcefrog
 - experimental compressed Revfile support  | 
6  | 
# modified to squish into bzr by Martin Pool
 | 
7  | 
||
8  | 
# This program is free software; you can redistribute it and/or modify
 | 
|
9  | 
# it under the terms of the GNU General Public License as published by
 | 
|
10  | 
# the Free Software Foundation; either version 2 of the License, or
 | 
|
11  | 
# (at your option) any later version.
 | 
|
12  | 
||
13  | 
# This program is distributed in the hope that it will be useful,
 | 
|
14  | 
# but WITHOUT ANY WARRANTY; without even the implied warranty of
 | 
|
15  | 
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 | 
|
16  | 
# GNU General Public License for more details.
 | 
|
17  | 
||
18  | 
# You should have received a copy of the GNU General Public License
 | 
|
19  | 
# along with this program; if not, write to the Free Software
 | 
|
20  | 
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
 | 
|
21  | 
||
22  | 
||
23  | 
"""Packed file revision storage.
 | 
|
24  | 
||
25  | 
A Revfile holds the text history of a particular source file, such
 | 
|
26  | 
as Makefile.  It can represent a tree of text versions for that
 | 
|
27  | 
file, allowing for microbranches within a single repository.
 | 
|
28  | 
||
29  | 
This is stored on disk as two files: an index file, and a data file.
 | 
|
30  | 
The index file is short and always read completely into memory; the
 | 
|
31  | 
data file is much longer and only the relevant bits of it,
 | 
|
32  | 
identified by the index file, need to be read.
 | 
|
33  | 
||
34  | 
Each text version is identified by the SHA-1 of the full text of
 | 
|
35  | 
that version.  It also has a sequence number within the file.
 | 
|
36  | 
||
37  | 
The index file has a short header and then a sequence of fixed-length
 | 
|
38  | 
records:
 | 
|
39  | 
||
40  | 
* byte[20]    SHA-1 of text (as binary, not hex)
 | 
|
41  | 
* uint32      sequence number this is based on, or -1 for full text
 | 
|
42  | 
* uint32      flags: 1=zlib compressed
 | 
|
43  | 
* uint32      offset in text file of start
 | 
|
44  | 
* uint32      length of compressed delta in text file
 | 
|
45  | 
* uint32[3]   reserved
 | 
|
46  | 
||
47  | 
total 48 bytes.
 | 
|
48  | 
||
| 
199
by mbp at sourcefrog
 - use -1 for no_base in revfile  | 
49  | 
The header is also 48 bytes for tidyness and easy calculation.
 | 
| 
198
by mbp at sourcefrog
 - experimental compressed Revfile support  | 
50  | 
|
51  | 
Both the index and the text are only ever appended to; a consequence
 | 
|
52  | 
is that sequence numbers are stable references.  But not every
 | 
|
53  | 
repository in the world will assign the same sequence numbers,
 | 
|
54  | 
therefore the SHA-1 is the only universally unique reference.
 | 
|
| 
974.1.26
by aaron.bentley at utoronto
 merged mbp@sourcefrog.net-20050817233101-0939da1cf91f2472  | 
55  | 
|
56  | 
This is meant to scale to hold 100,000 revisions of a single file, by
 | 
|
57  | 
which time the index file will be ~4.8MB and a bit big to read
 | 
|
58  | 
sequentially.
 | 
|
59  | 
||
60  | 
Some of the reserved fields could be used to implement a (semi?)
 | 
|
61  | 
balanced tree indexed by SHA1 so we can much more efficiently find the
 | 
|
62  | 
index associated with a particular hash.  For 100,000 revs we would be
 | 
|
63  | 
able to find it in about 17 random reads, which is not too bad.
 | 
|
64  | 
||
65  | 
This performs pretty well except when trying to calculate deltas of
 | 
|
66  | 
really large files.  For that the main thing would be to plug in
 | 
|
67  | 
something faster than difflib, which is after all pure Python.
 | 
|
68  | 
Another approach is to just store the gzipped full text of big files,
 | 
|
69  | 
though perhaps that's too perverse?
 | 
|
70  | 
||
| 
231
by mbp at sourcefrog
 revfile doc  | 
71  | 
The iter method here will generally read through the whole index file
 | 
72  | 
in one go.  With readahead in the kernel and python/libc (typically
 | 
|
73  | 
128kB) this means that there should be no seeks and often only one
 | 
|
74  | 
read() call to get everything into memory.
 | 
|
| 
198
by mbp at sourcefrog
 - experimental compressed Revfile support  | 
75  | 
"""
 | 
76  | 
||
77  | 
||
| 
201
by mbp at sourcefrog
 Revfile: - get full text from a record- fix creation of files if they don't exist- protect against half-assed storage  | 
78  | 
# TODO: Something like pread() would make this slightly simpler and
 | 
79  | 
# perhaps more efficient.
 | 
|
80  | 
||
| 
219
by mbp at sourcefrog
 todo  | 
81  | 
# TODO: Could also try to mmap things...  Might be faster for the
 | 
82  | 
# index in particular?
 | 
|
83  | 
||
84  | 
# TODO: Some kind of faster lookup of SHAs?  The bad thing is that probably means
 | 
|
85  | 
# rewriting existing records, which is not so nice.
 | 
|
| 
201
by mbp at sourcefrog
 Revfile: - get full text from a record- fix creation of files if they don't exist- protect against half-assed storage  | 
86  | 
|
| 
224
by mbp at sourcefrog
 doc  | 
87  | 
# TODO: Something to check that regions identified in the index file
 | 
88  | 
# completely butt up and do not overlap.  Strictly it's not a problem
 | 
|
89  | 
# if there are gaps and that can happen if we're interrupted while
 | 
|
90  | 
# writing to the datafile.  Overlapping would be very bad though.
 | 
|
91  | 
||
| 
974.1.26
by aaron.bentley at utoronto
 merged mbp@sourcefrog.net-20050817233101-0939da1cf91f2472  | 
92  | 
# TODO: Shouldn't need to lock if we always write in append mode and
 | 
93  | 
# then ftell after writing to see where it went.  In any case we
 | 
|
94  | 
# assume the whole branch is protected by a lock.
 | 
|
| 
198
by mbp at sourcefrog
 - experimental compressed Revfile support  | 
95  | 
|
96  | 
import sys, zlib, struct, mdiff, stat, os, sha  | 
|
| 
204
by mbp at sourcefrog
 Revfile:- new find-sha command and implementation- new _check_index helper  | 
97  | 
from binascii import hexlify, unhexlify  | 
| 
198
by mbp at sourcefrog
 - experimental compressed Revfile support  | 
98  | 
|
99  | 
_RECORDSIZE = 48  | 
|
100  | 
||
101  | 
_HEADER = "bzr revfile v1\n"  | 
|
102  | 
_HEADER = _HEADER + ('\xff' * (_RECORDSIZE - len(_HEADER)))  | 
|
| 
204
by mbp at sourcefrog
 Revfile:- new find-sha command and implementation- new _check_index helper  | 
103  | 
_NO_RECORD = 0xFFFFFFFFL  | 
| 
198
by mbp at sourcefrog
 - experimental compressed Revfile support  | 
104  | 
|
| 
201
by mbp at sourcefrog
 Revfile: - get full text from a record- fix creation of files if they don't exist- protect against half-assed storage  | 
105  | 
# fields in the index record
 | 
106  | 
I_SHA = 0  | 
|
107  | 
I_BASE = 1  | 
|
108  | 
I_FLAGS = 2  | 
|
109  | 
I_OFFSET = 3  | 
|
110  | 
I_LEN = 4  | 
|
111  | 
||
| 
207
by mbp at sourcefrog
 Revfile: compress data going into datafile if that would be worthwhile  | 
112  | 
FL_GZIP = 1  | 
113  | 
||
| 
220
by mbp at sourcefrog
 limit the number of chained patches  | 
114  | 
# maximum number of patches in a row before recording a whole text.
 | 
| 
974.1.26
by aaron.bentley at utoronto
 merged mbp@sourcefrog.net-20050817233101-0939da1cf91f2472  | 
115  | 
CHAIN_LIMIT = 10  | 
| 
220
by mbp at sourcefrog
 limit the number of chained patches  | 
116  | 
|
| 
205
by mbp at sourcefrog
 Revfile:- store and retrieve deltas!mdiff:- work on bytes not lines  | 
117  | 
|
| 
198
by mbp at sourcefrog
 - experimental compressed Revfile support  | 
118  | 
class RevfileError(Exception):  | 
119  | 
    pass
 | 
|
120  | 
||
| 
220
by mbp at sourcefrog
 limit the number of chained patches  | 
121  | 
class LimitHitException(Exception):  | 
122  | 
    pass
 | 
|
| 
201
by mbp at sourcefrog
 Revfile: - get full text from a record- fix creation of files if they don't exist- protect against half-assed storage  | 
123  | 
|
| 
558
by Martin Pool
 - All top-level classes inherit from object  | 
124  | 
class Revfile(object):  | 
| 
229
by mbp at sourcefrog
 Allow opening revision file read-only  | 
125  | 
def __init__(self, basename, mode):  | 
| 
202
by mbp at sourcefrog
 Revfile:  | 
126  | 
        # TODO: Lock file  while open
 | 
127  | 
||
128  | 
        # TODO: advise of random access
 | 
|
129  | 
||
| 
198
by mbp at sourcefrog
 - experimental compressed Revfile support  | 
130  | 
self.basename = basename  | 
| 
229
by mbp at sourcefrog
 Allow opening revision file read-only  | 
131  | 
|
132  | 
if mode not in ['r', 'w']:  | 
|
133  | 
raise RevfileError("invalid open mode %r" % mode)  | 
|
134  | 
self.mode = mode  | 
|
| 
201
by mbp at sourcefrog
 Revfile: - get full text from a record- fix creation of files if they don't exist- protect against half-assed storage  | 
135  | 
|
136  | 
idxname = basename + '.irev'  | 
|
137  | 
dataname = basename + '.drev'  | 
|
138  | 
||
139  | 
idx_exists = os.path.exists(idxname)  | 
|
140  | 
data_exists = os.path.exists(dataname)  | 
|
141  | 
||
142  | 
if idx_exists != data_exists:  | 
|
143  | 
raise RevfileError("half-assed revfile")  | 
|
144  | 
||
145  | 
if not idx_exists:  | 
|
| 
229
by mbp at sourcefrog
 Allow opening revision file read-only  | 
146  | 
if mode == 'r':  | 
147  | 
raise RevfileError("Revfile %r does not exist" % basename)  | 
|
148  | 
||
| 
201
by mbp at sourcefrog
 Revfile: - get full text from a record- fix creation of files if they don't exist- protect against half-assed storage  | 
149  | 
self.idxfile = open(idxname, 'w+b')  | 
150  | 
self.datafile = open(dataname, 'w+b')  | 
|
151  | 
||
| 
198
by mbp at sourcefrog
 - experimental compressed Revfile support  | 
152  | 
self.idxfile.write(_HEADER)  | 
153  | 
self.idxfile.flush()  | 
|
154  | 
else:  | 
|
| 
229
by mbp at sourcefrog
 Allow opening revision file read-only  | 
155  | 
if mode == 'r':  | 
156  | 
diskmode = 'rb'  | 
|
157  | 
else:  | 
|
158  | 
diskmode = 'r+b'  | 
|
159  | 
||
160  | 
self.idxfile = open(idxname, diskmode)  | 
|
161  | 
self.datafile = open(dataname, diskmode)  | 
|
| 
201
by mbp at sourcefrog
 Revfile: - get full text from a record- fix creation of files if they don't exist- protect against half-assed storage  | 
162  | 
|
| 
198
by mbp at sourcefrog
 - experimental compressed Revfile support  | 
163  | 
h = self.idxfile.read(_RECORDSIZE)  | 
164  | 
if h != _HEADER:  | 
|
165  | 
raise RevfileError("bad header %r in index of %r"  | 
|
166  | 
% (h, self.basename))  | 
|
| 
201
by mbp at sourcefrog
 Revfile: - get full text from a record- fix creation of files if they don't exist- protect against half-assed storage  | 
167  | 
|
168  | 
||
| 
205
by mbp at sourcefrog
 Revfile:- store and retrieve deltas!mdiff:- work on bytes not lines  | 
169  | 
def _check_index(self, idx):  | 
170  | 
if idx < 0 or idx > len(self):  | 
|
171  | 
raise RevfileError("invalid index %r" % idx)  | 
|
172  | 
||
| 
229
by mbp at sourcefrog
 Allow opening revision file read-only  | 
173  | 
def _check_write(self):  | 
174  | 
if self.mode != 'w':  | 
|
175  | 
raise RevfileError("%r is open readonly" % self.basename)  | 
|
176  | 
||
| 
205
by mbp at sourcefrog
 Revfile:- store and retrieve deltas!mdiff:- work on bytes not lines  | 
177  | 
|
178  | 
def find_sha(self, s):  | 
|
179  | 
assert isinstance(s, str)  | 
|
180  | 
assert len(s) == 20  | 
|
181  | 
||
182  | 
for idx, idxrec in enumerate(self):  | 
|
183  | 
if idxrec[I_SHA] == s:  | 
|
184  | 
return idx  | 
|
185  | 
else:  | 
|
| 
217
by mbp at sourcefrog
 Revfile: make compression optional, in case people are storing files they know won't compress  | 
186  | 
return _NO_RECORD  | 
187  | 
||
188  | 
||
189  | 
||
190  | 
def _add_compressed(self, text_sha, data, base, compress):  | 
|
191  | 
        # well, maybe compress
 | 
|
192  | 
flags = 0  | 
|
193  | 
if compress:  | 
|
194  | 
data_len = len(data)  | 
|
195  | 
if data_len > 50:  | 
|
196  | 
                # don't do compression if it's too small; it's unlikely to win
 | 
|
197  | 
                # enough to be worthwhile
 | 
|
198  | 
compr_data = zlib.compress(data)  | 
|
199  | 
compr_len = len(compr_data)  | 
|
200  | 
if compr_len < data_len:  | 
|
201  | 
data = compr_data  | 
|
202  | 
flags = FL_GZIP  | 
|
203  | 
                    ##print '- compressed %d -> %d, %.1f%%' \
 | 
|
204  | 
                    ##      % (data_len, compr_len, float(compr_len)/float(data_len) * 100.0)
 | 
|
205  | 
return self._add_raw(text_sha, data, base, flags)  | 
|
206  | 
||
207  | 
||
208  | 
||
209  | 
def _add_raw(self, text_sha, data, base, flags):  | 
|
| 
207
by mbp at sourcefrog
 Revfile: compress data going into datafile if that would be worthwhile  | 
210  | 
"""Add pre-processed data, can be either full text or delta.  | 
211  | 
||
212  | 
        This does the compression if that makes sense."""
 | 
|
| 
203
by mbp at sourcefrog
 revfile:  | 
213  | 
idx = len(self)  | 
| 
198
by mbp at sourcefrog
 - experimental compressed Revfile support  | 
214  | 
self.datafile.seek(0, 2) # to end  | 
215  | 
self.idxfile.seek(0, 2)  | 
|
| 
202
by mbp at sourcefrog
 Revfile:  | 
216  | 
assert self.idxfile.tell() == _RECORDSIZE * (idx + 1)  | 
| 
198
by mbp at sourcefrog
 - experimental compressed Revfile support  | 
217  | 
data_offset = self.datafile.tell()  | 
218  | 
||
| 
254
by Martin Pool
 - Doc cleanups from Magnus Therning  | 
219  | 
assert isinstance(data, str) # not unicode or anything weird  | 
| 
198
by mbp at sourcefrog
 - experimental compressed Revfile support  | 
220  | 
|
| 
205
by mbp at sourcefrog
 Revfile:- store and retrieve deltas!mdiff:- work on bytes not lines  | 
221  | 
self.datafile.write(data)  | 
| 
198
by mbp at sourcefrog
 - experimental compressed Revfile support  | 
222  | 
self.datafile.flush()  | 
223  | 
||
| 
205
by mbp at sourcefrog
 Revfile:- store and retrieve deltas!mdiff:- work on bytes not lines  | 
224  | 
assert isinstance(text_sha, str)  | 
225  | 
entry = text_sha  | 
|
226  | 
entry += struct.pack(">IIII12x", base, flags, data_offset, len(data))  | 
|
| 
198
by mbp at sourcefrog
 - experimental compressed Revfile support  | 
227  | 
assert len(entry) == _RECORDSIZE  | 
228  | 
||
229  | 
self.idxfile.write(entry)  | 
|
230  | 
self.idxfile.flush()  | 
|
231  | 
||
232  | 
return idx  | 
|
| 
204
by mbp at sourcefrog
 Revfile:- new find-sha command and implementation- new _check_index helper  | 
233  | 
|
| 
205
by mbp at sourcefrog
 Revfile:- store and retrieve deltas!mdiff:- work on bytes not lines  | 
234  | 
|
235  | 
||
| 
220
by mbp at sourcefrog
 limit the number of chained patches  | 
236  | 
def _add_full_text(self, text, text_sha, compress):  | 
| 
205
by mbp at sourcefrog
 Revfile:- store and retrieve deltas!mdiff:- work on bytes not lines  | 
237  | 
"""Add a full text to the file.  | 
238  | 
||
239  | 
        This is not compressed against any reference version.
 | 
|
240  | 
||
241  | 
        Returns the index for that text."""
 | 
|
| 
217
by mbp at sourcefrog
 Revfile: make compression optional, in case people are storing files they know won't compress  | 
242  | 
return self._add_compressed(text_sha, text, _NO_RECORD, compress)  | 
243  | 
||
244  | 
||
| 
974.1.26
by aaron.bentley at utoronto
 merged mbp@sourcefrog.net-20050817233101-0939da1cf91f2472  | 
245  | 
    # NOT USED
 | 
246  | 
def _choose_base(self, seed, base):  | 
|
247  | 
while seed & 3 == 3:  | 
|
248  | 
if base == _NO_RECORD:  | 
|
249  | 
return _NO_RECORD  | 
|
250  | 
idxrec = self[base]  | 
|
251  | 
if idxrec[I_BASE] == _NO_RECORD:  | 
|
252  | 
return base  | 
|
253  | 
||
254  | 
base = idxrec[I_BASE]  | 
|
255  | 
seed >>= 2  | 
|
256  | 
||
257  | 
return base # relative to this full text  | 
|
258  | 
||
259  | 
||
260  | 
||
| 
217
by mbp at sourcefrog
 Revfile: make compression optional, in case people are storing files they know won't compress  | 
261  | 
def _add_delta(self, text, text_sha, base, compress):  | 
| 
204
by mbp at sourcefrog
 Revfile:- new find-sha command and implementation- new _check_index helper  | 
262  | 
"""Add a text stored relative to a previous text."""  | 
263  | 
self._check_index(base)  | 
|
| 
974.1.26
by aaron.bentley at utoronto
 merged mbp@sourcefrog.net-20050817233101-0939da1cf91f2472  | 
264  | 
|
| 
220
by mbp at sourcefrog
 limit the number of chained patches  | 
265  | 
try:  | 
| 
974.1.26
by aaron.bentley at utoronto
 merged mbp@sourcefrog.net-20050817233101-0939da1cf91f2472  | 
266  | 
base_text = self.get(base, CHAIN_LIMIT)  | 
| 
220
by mbp at sourcefrog
 limit the number of chained patches  | 
267  | 
except LimitHitException:  | 
268  | 
return self._add_full_text(text, text_sha, compress)  | 
|
269  | 
||
| 
205
by mbp at sourcefrog
 Revfile:- store and retrieve deltas!mdiff:- work on bytes not lines  | 
270  | 
data = mdiff.bdiff(base_text, text)  | 
| 
974.1.26
by aaron.bentley at utoronto
 merged mbp@sourcefrog.net-20050817233101-0939da1cf91f2472  | 
271  | 
|
272  | 
||
273  | 
if True: # paranoid early check for bad diff  | 
|
274  | 
result = mdiff.bpatch(base_text, data)  | 
|
275  | 
assert result == text  | 
|
276  | 
||
| 
213
by mbp at sourcefrog
 Revfile: don't store deltas if they'd be larger than just storing the whole text  | 
277  | 
|
278  | 
        # If the delta is larger than the text, we might as well just
 | 
|
279  | 
        # store the text.  (OK, the delta might be more compressible,
 | 
|
280  | 
        # but the overhead of applying it probably still makes it
 | 
|
| 
214
by mbp at sourcefrog
 doc  | 
281  | 
        # bad, and I don't want to compress both of them to find out.)
 | 
| 
213
by mbp at sourcefrog
 Revfile: don't store deltas if they'd be larger than just storing the whole text  | 
282  | 
if len(data) >= len(text):  | 
| 
217
by mbp at sourcefrog
 Revfile: make compression optional, in case people are storing files they know won't compress  | 
283  | 
return self._add_full_text(text, text_sha, compress)  | 
| 
213
by mbp at sourcefrog
 Revfile: don't store deltas if they'd be larger than just storing the whole text  | 
284  | 
else:  | 
| 
217
by mbp at sourcefrog
 Revfile: make compression optional, in case people are storing files they know won't compress  | 
285  | 
return self._add_compressed(text_sha, data, base, compress)  | 
286  | 
||
287  | 
||
| 
974.1.26
by aaron.bentley at utoronto
 merged mbp@sourcefrog.net-20050817233101-0939da1cf91f2472  | 
288  | 
def add(self, text, base=None, compress=True):  | 
| 
215
by mbp at sourcefrog
 Doc  | 
289  | 
"""Add a new text to the revfile.  | 
290  | 
||
291  | 
        If the text is already present them its existing id is
 | 
|
292  | 
        returned and the file is not changed.
 | 
|
293  | 
||
| 
217
by mbp at sourcefrog
 Revfile: make compression optional, in case people are storing files they know won't compress  | 
294  | 
        If compress is true then gzip compression will be used if it
 | 
295  | 
        reduces the size.
 | 
|
296  | 
||
| 
215
by mbp at sourcefrog
 Doc  | 
297  | 
        If a base index is specified, that text *may* be used for
 | 
298  | 
        delta compression of the new text.  Delta compression will
 | 
|
299  | 
        only be used if it would be a size win and if the existing
 | 
|
300  | 
        base is not at too long of a delta chain already.
 | 
|
301  | 
        """
 | 
|
| 
974.1.26
by aaron.bentley at utoronto
 merged mbp@sourcefrog.net-20050817233101-0939da1cf91f2472  | 
302  | 
if base == None:  | 
303  | 
base = _NO_RECORD  | 
|
304  | 
||
| 
229
by mbp at sourcefrog
 Allow opening revision file read-only  | 
305  | 
self._check_write()  | 
306  | 
||
| 
206
by mbp at sourcefrog
 new Revfile.add() dwim  | 
307  | 
text_sha = sha.new(text).digest()  | 
| 
204
by mbp at sourcefrog
 Revfile:- new find-sha command and implementation- new _check_index helper  | 
308  | 
|
| 
206
by mbp at sourcefrog
 new Revfile.add() dwim  | 
309  | 
idx = self.find_sha(text_sha)  | 
310  | 
if idx != _NO_RECORD:  | 
|
| 
215
by mbp at sourcefrog
 Doc  | 
311  | 
            # TODO: Optional paranoid mode where we read out that record and make sure
 | 
312  | 
            # it's the same, in case someone ever breaks SHA-1.
 | 
|
| 
206
by mbp at sourcefrog
 new Revfile.add() dwim  | 
313  | 
return idx # already present  | 
| 
204
by mbp at sourcefrog
 Revfile:- new find-sha command and implementation- new _check_index helper  | 
314  | 
|
| 
974.1.26
by aaron.bentley at utoronto
 merged mbp@sourcefrog.net-20050817233101-0939da1cf91f2472  | 
315  | 
        # base = self._choose_base(ord(text_sha[0]), base)
 | 
316  | 
||
| 
206
by mbp at sourcefrog
 new Revfile.add() dwim  | 
317  | 
if base == _NO_RECORD:  | 
| 
217
by mbp at sourcefrog
 Revfile: make compression optional, in case people are storing files they know won't compress  | 
318  | 
return self._add_full_text(text, text_sha, compress)  | 
| 
206
by mbp at sourcefrog
 new Revfile.add() dwim  | 
319  | 
else:  | 
| 
217
by mbp at sourcefrog
 Revfile: make compression optional, in case people are storing files they know won't compress  | 
320  | 
return self._add_delta(text, text_sha, base, compress)  | 
| 
206
by mbp at sourcefrog
 new Revfile.add() dwim  | 
321  | 
|
322  | 
||
323  | 
||
| 
220
by mbp at sourcefrog
 limit the number of chained patches  | 
324  | 
def get(self, idx, recursion_limit=None):  | 
325  | 
"""Retrieve text of a previous revision.  | 
|
326  | 
||
327  | 
        If recursion_limit is an integer then walk back at most that
 | 
|
328  | 
        many revisions and then raise LimitHitException, indicating
 | 
|
329  | 
        that we ought to record a new file text instead of another
 | 
|
330  | 
        delta.  Don't use this when trying to get out an existing
 | 
|
331  | 
        revision."""
 | 
|
332  | 
||
| 
201
by mbp at sourcefrog
 Revfile: - get full text from a record- fix creation of files if they don't exist- protect against half-assed storage  | 
333  | 
idxrec = self[idx]  | 
| 
205
by mbp at sourcefrog
 Revfile:- store and retrieve deltas!mdiff:- work on bytes not lines  | 
334  | 
base = idxrec[I_BASE]  | 
335  | 
if base == _NO_RECORD:  | 
|
336  | 
text = self._get_full_text(idx, idxrec)  | 
|
337  | 
else:  | 
|
| 
220
by mbp at sourcefrog
 limit the number of chained patches  | 
338  | 
text = self._get_patched(idx, idxrec, recursion_limit)  | 
| 
205
by mbp at sourcefrog
 Revfile:- store and retrieve deltas!mdiff:- work on bytes not lines  | 
339  | 
|
340  | 
if sha.new(text).digest() != idxrec[I_SHA]:  | 
|
| 
974.1.26
by aaron.bentley at utoronto
 merged mbp@sourcefrog.net-20050817233101-0939da1cf91f2472  | 
341  | 
raise RevfileError("corrupt SHA-1 digest on record %d in %s"  | 
342  | 
% (idx, self.basename))  | 
|
| 
205
by mbp at sourcefrog
 Revfile:- store and retrieve deltas!mdiff:- work on bytes not lines  | 
343  | 
|
344  | 
return text  | 
|
345  | 
||
346  | 
||
347  | 
||
348  | 
def _get_raw(self, idx, idxrec):  | 
|
| 
209
by mbp at sourcefrog
 Revfile: handle decompression  | 
349  | 
flags = idxrec[I_FLAGS]  | 
350  | 
if flags & ~FL_GZIP:  | 
|
351  | 
raise RevfileError("unsupported index flags %#x on index %d"  | 
|
352  | 
% (flags, idx))  | 
|
353  | 
||
| 
201
by mbp at sourcefrog
 Revfile: - get full text from a record- fix creation of files if they don't exist- protect against half-assed storage  | 
354  | 
l = idxrec[I_LEN]  | 
355  | 
if l == 0:  | 
|
356  | 
return ''  | 
|
357  | 
||
358  | 
self.datafile.seek(idxrec[I_OFFSET])  | 
|
359  | 
||
| 
205
by mbp at sourcefrog
 Revfile:- store and retrieve deltas!mdiff:- work on bytes not lines  | 
360  | 
data = self.datafile.read(l)  | 
361  | 
if len(data) != l:  | 
|
| 
201
by mbp at sourcefrog
 Revfile: - get full text from a record- fix creation of files if they don't exist- protect against half-assed storage  | 
362  | 
raise RevfileError("short read %d of %d "  | 
363  | 
"getting text for record %d in %r"  | 
|
| 
205
by mbp at sourcefrog
 Revfile:- store and retrieve deltas!mdiff:- work on bytes not lines  | 
364  | 
% (len(data), l, idx, self.basename))  | 
| 
204
by mbp at sourcefrog
 Revfile:- new find-sha command and implementation- new _check_index helper  | 
365  | 
|
| 
209
by mbp at sourcefrog
 Revfile: handle decompression  | 
366  | 
if flags & FL_GZIP:  | 
367  | 
data = zlib.decompress(data)  | 
|
368  | 
||
| 
205
by mbp at sourcefrog
 Revfile:- store and retrieve deltas!mdiff:- work on bytes not lines  | 
369  | 
return data  | 
| 
201
by mbp at sourcefrog
 Revfile: - get full text from a record- fix creation of files if they don't exist- protect against half-assed storage  | 
370  | 
|
| 
205
by mbp at sourcefrog
 Revfile:- store and retrieve deltas!mdiff:- work on bytes not lines  | 
371  | 
|
372  | 
def _get_full_text(self, idx, idxrec):  | 
|
373  | 
assert idxrec[I_BASE] == _NO_RECORD  | 
|
374  | 
||
375  | 
text = self._get_raw(idx, idxrec)  | 
|
376  | 
||
377  | 
return text  | 
|
378  | 
||
379  | 
||
| 
220
by mbp at sourcefrog
 limit the number of chained patches  | 
380  | 
def _get_patched(self, idx, idxrec, recursion_limit):  | 
| 
205
by mbp at sourcefrog
 Revfile:- store and retrieve deltas!mdiff:- work on bytes not lines  | 
381  | 
base = idxrec[I_BASE]  | 
382  | 
assert base >= 0  | 
|
383  | 
assert base < idx # no loops!  | 
|
384  | 
||
| 
220
by mbp at sourcefrog
 limit the number of chained patches  | 
385  | 
if recursion_limit == None:  | 
386  | 
sub_limit = None  | 
|
387  | 
else:  | 
|
388  | 
sub_limit = recursion_limit - 1  | 
|
389  | 
if sub_limit < 0:  | 
|
390  | 
raise LimitHitException()  | 
|
391  | 
||
392  | 
base_text = self.get(base, sub_limit)  | 
|
| 
205
by mbp at sourcefrog
 Revfile:- store and retrieve deltas!mdiff:- work on bytes not lines  | 
393  | 
patch = self._get_raw(idx, idxrec)  | 
394  | 
||
395  | 
text = mdiff.bpatch(base_text, patch)  | 
|
396  | 
||
397  | 
return text  | 
|
398  | 
||
| 
201
by mbp at sourcefrog
 Revfile: - get full text from a record- fix creation of files if they don't exist- protect against half-assed storage  | 
399  | 
|
400  | 
||
| 
198
by mbp at sourcefrog
 - experimental compressed Revfile support  | 
401  | 
def __len__(self):  | 
| 
203
by mbp at sourcefrog
 revfile:  | 
402  | 
"""Return number of revisions."""  | 
403  | 
l = os.fstat(self.idxfile.fileno())[stat.ST_SIZE]  | 
|
404  | 
if l % _RECORDSIZE:  | 
|
405  | 
raise RevfileError("bad length %d on index of %r" % (l, self.basename))  | 
|
406  | 
if l < _RECORDSIZE:  | 
|
407  | 
raise RevfileError("no header present in index of %r" % (self.basename))  | 
|
408  | 
return int(l / _RECORDSIZE) - 1  | 
|
| 
198
by mbp at sourcefrog
 - experimental compressed Revfile support  | 
409  | 
|
| 
200
by mbp at sourcefrog
 revfile: fix up __getitem__ to allow simple iteration  | 
410  | 
|
| 
198
by mbp at sourcefrog
 - experimental compressed Revfile support  | 
411  | 
def __getitem__(self, idx):  | 
| 
201
by mbp at sourcefrog
 Revfile: - get full text from a record- fix creation of files if they don't exist- protect against half-assed storage  | 
412  | 
"""Index by sequence id returns the index field"""  | 
| 
204
by mbp at sourcefrog
 Revfile:- new find-sha command and implementation- new _check_index helper  | 
413  | 
        ## TODO: Can avoid seek if we just moved there...
 | 
| 
201
by mbp at sourcefrog
 Revfile: - get full text from a record- fix creation of files if they don't exist- protect against half-assed storage  | 
414  | 
self._seek_index(idx)  | 
| 
230
by mbp at sourcefrog
 Revfile: better __iter__ method that reads the whole index file in one go!  | 
415  | 
idxrec = self._read_next_index()  | 
416  | 
if idxrec == None:  | 
|
| 
974.1.26
by aaron.bentley at utoronto
 merged mbp@sourcefrog.net-20050817233101-0939da1cf91f2472  | 
417  | 
raise IndexError("no index %d" % idx)  | 
| 
230
by mbp at sourcefrog
 Revfile: better __iter__ method that reads the whole index file in one go!  | 
418  | 
else:  | 
419  | 
return idxrec  | 
|
| 
201
by mbp at sourcefrog
 Revfile: - get full text from a record- fix creation of files if they don't exist- protect against half-assed storage  | 
420  | 
|
421  | 
||
422  | 
def _seek_index(self, idx):  | 
|
| 
205
by mbp at sourcefrog
 Revfile:- store and retrieve deltas!mdiff:- work on bytes not lines  | 
423  | 
if idx < 0:  | 
424  | 
raise RevfileError("invalid index %r" % idx)  | 
|
| 
198
by mbp at sourcefrog
 - experimental compressed Revfile support  | 
425  | 
self.idxfile.seek((idx + 1) * _RECORDSIZE)  | 
| 
230
by mbp at sourcefrog
 Revfile: better __iter__ method that reads the whole index file in one go!  | 
426  | 
|
427  | 
||
428  | 
||
429  | 
def __iter__(self):  | 
|
430  | 
"""Read back all index records.  | 
|
431  | 
||
432  | 
        Do not seek the index file while this is underway!"""
 | 
|
| 
974.1.26
by aaron.bentley at utoronto
 merged mbp@sourcefrog.net-20050817233101-0939da1cf91f2472  | 
433  | 
        ## sys.stderr.write(" ** iter called ** \n")
 | 
| 
230
by mbp at sourcefrog
 Revfile: better __iter__ method that reads the whole index file in one go!  | 
434  | 
self._seek_index(0)  | 
435  | 
while True:  | 
|
436  | 
idxrec = self._read_next_index()  | 
|
437  | 
if not idxrec:  | 
|
438  | 
                break
 | 
|
439  | 
yield idxrec  | 
|
| 
201
by mbp at sourcefrog
 Revfile: - get full text from a record- fix creation of files if they don't exist- protect against half-assed storage  | 
440  | 
|
441  | 
||
442  | 
def _read_next_index(self):  | 
|
| 
198
by mbp at sourcefrog
 - experimental compressed Revfile support  | 
443  | 
rec = self.idxfile.read(_RECORDSIZE)  | 
| 
200
by mbp at sourcefrog
 revfile: fix up __getitem__ to allow simple iteration  | 
444  | 
if not rec:  | 
| 
230
by mbp at sourcefrog
 Revfile: better __iter__ method that reads the whole index file in one go!  | 
445  | 
return None  | 
| 
200
by mbp at sourcefrog
 revfile: fix up __getitem__ to allow simple iteration  | 
446  | 
elif len(rec) != _RECORDSIZE:  | 
| 
198
by mbp at sourcefrog
 - experimental compressed Revfile support  | 
447  | 
raise RevfileError("short read of %d bytes getting index %d from %r"  | 
448  | 
% (len(rec), idx, self.basename))  | 
|
| 
201
by mbp at sourcefrog
 Revfile: - get full text from a record- fix creation of files if they don't exist- protect against half-assed storage  | 
449  | 
|
| 
199
by mbp at sourcefrog
 - use -1 for no_base in revfile  | 
450  | 
return struct.unpack(">20sIIII12x", rec)  | 
| 
198
by mbp at sourcefrog
 - experimental compressed Revfile support  | 
451  | 
|
452  | 
||
| 
199
by mbp at sourcefrog
 - use -1 for no_base in revfile  | 
453  | 
def dump(self, f=sys.stdout):  | 
454  | 
f.write('%-8s %-40s %-8s %-8s %-8s %-8s\n'  | 
|
455  | 
% tuple('idx sha1 base flags offset len'.split()))  | 
|
456  | 
f.write('-------- ---------------------------------------- ')  | 
|
457  | 
f.write('-------- -------- -------- --------\n')  | 
|
458  | 
||
| 
200
by mbp at sourcefrog
 revfile: fix up __getitem__ to allow simple iteration  | 
459  | 
for i, rec in enumerate(self):  | 
| 
199
by mbp at sourcefrog
 - use -1 for no_base in revfile  | 
460  | 
f.write("#%-7d %40s " % (i, hexlify(rec[0])))  | 
| 
204
by mbp at sourcefrog
 Revfile:- new find-sha command and implementation- new _check_index helper  | 
461  | 
if rec[1] == _NO_RECORD:  | 
| 
199
by mbp at sourcefrog
 - use -1 for no_base in revfile  | 
462  | 
f.write("(none) ")  | 
463  | 
else:  | 
|
464  | 
f.write("#%-7d " % rec[1])  | 
|
465  | 
||
466  | 
f.write("%8x %8d %8d\n" % (rec[2], rec[3], rec[4]))  | 
|
| 
222
by mbp at sourcefrog
 refactor total_text_size  | 
467  | 
|
| 
223
by mbp at sourcefrog
 doc  | 
468  | 
|
| 
222
by mbp at sourcefrog
 refactor total_text_size  | 
469  | 
def total_text_size(self):  | 
| 
223
by mbp at sourcefrog
 doc  | 
470  | 
"""Return the sum of sizes of all file texts.  | 
471  | 
||
472  | 
        This is how much space they would occupy if they were stored without
 | 
|
473  | 
        delta and gzip compression.
 | 
|
474  | 
||
475  | 
        As a side effect this completely validates the Revfile, checking that all
 | 
|
476  | 
        texts can be reproduced with the correct SHA-1."""
 | 
|
| 
222
by mbp at sourcefrog
 refactor total_text_size  | 
477  | 
t = 0L  | 
478  | 
for idx in range(len(self)):  | 
|
479  | 
t += len(self.get(idx))  | 
|
480  | 
return t  | 
|
| 
974.1.26
by aaron.bentley at utoronto
 merged mbp@sourcefrog.net-20050817233101-0939da1cf91f2472  | 
481  | 
|
482  | 
||
483  | 
def check(self, pb=None):  | 
|
484  | 
"""Extract every version and check its hash."""  | 
|
485  | 
total = len(self)  | 
|
486  | 
for i in range(total):  | 
|
487  | 
if pb:  | 
|
488  | 
pb.update("check revision", i, total)  | 
|
489  | 
            # the get method implicitly checks the SHA-1
 | 
|
490  | 
self.get(i)  | 
|
491  | 
if pb:  | 
|
492  | 
pb.clear()  | 
|
| 
198
by mbp at sourcefrog
 - experimental compressed Revfile support  | 
493  | 
|
494  | 
||
495  | 
||
496  | 
def main(argv):  | 
|
| 
203
by mbp at sourcefrog
 revfile:  | 
497  | 
try:  | 
498  | 
cmd = argv[1]  | 
|
| 
974.1.26
by aaron.bentley at utoronto
 merged mbp@sourcefrog.net-20050817233101-0939da1cf91f2472  | 
499  | 
filename = argv[2]  | 
| 
203
by mbp at sourcefrog
 revfile:  | 
500  | 
except IndexError:  | 
| 
974.1.26
by aaron.bentley at utoronto
 merged mbp@sourcefrog.net-20050817233101-0939da1cf91f2472  | 
501  | 
sys.stderr.write("usage: revfile dump REVFILE\n"  | 
502  | 
" revfile add REVFILE < INPUT\n"  | 
|
503  | 
" revfile add-delta REVFILE BASE < INPUT\n"  | 
|
504  | 
" revfile add-series REVFILE BASE FILE...\n"  | 
|
505  | 
" revfile get REVFILE IDX\n"  | 
|
506  | 
" revfile find-sha REVFILE HEX\n"  | 
|
507  | 
" revfile total-text-size REVFILE\n"  | 
|
508  | 
" revfile last REVFILE\n")  | 
|
| 
203
by mbp at sourcefrog
 revfile:  | 
509  | 
return 1  | 
| 
218
by mbp at sourcefrog
 todo  | 
510  | 
|
| 
974.1.26
by aaron.bentley at utoronto
 merged mbp@sourcefrog.net-20050817233101-0939da1cf91f2472  | 
511  | 
if filename.endswith('.drev') or filename.endswith('.irev'):  | 
512  | 
filename = filename[:-5]  | 
|
513  | 
||
| 
229
by mbp at sourcefrog
 Allow opening revision file read-only  | 
514  | 
def rw():  | 
| 
974.1.26
by aaron.bentley at utoronto
 merged mbp@sourcefrog.net-20050817233101-0939da1cf91f2472  | 
515  | 
return Revfile(filename, 'w')  | 
| 
229
by mbp at sourcefrog
 Allow opening revision file read-only  | 
516  | 
|
517  | 
def ro():  | 
|
| 
974.1.26
by aaron.bentley at utoronto
 merged mbp@sourcefrog.net-20050817233101-0939da1cf91f2472  | 
518  | 
return Revfile(filename, 'r')  | 
| 
229
by mbp at sourcefrog
 Allow opening revision file read-only  | 
519  | 
|
| 
203
by mbp at sourcefrog
 revfile:  | 
520  | 
if cmd == 'add':  | 
| 
229
by mbp at sourcefrog
 Allow opening revision file read-only  | 
521  | 
print rw().add(sys.stdin.read())  | 
| 
205
by mbp at sourcefrog
 Revfile:- store and retrieve deltas!mdiff:- work on bytes not lines  | 
522  | 
elif cmd == 'add-delta':  | 
| 
974.1.26
by aaron.bentley at utoronto
 merged mbp@sourcefrog.net-20050817233101-0939da1cf91f2472  | 
523  | 
print rw().add(sys.stdin.read(), int(argv[3]))  | 
524  | 
elif cmd == 'add-series':  | 
|
525  | 
r = rw()  | 
|
526  | 
rev = int(argv[3])  | 
|
527  | 
for fn in argv[4:]:  | 
|
528  | 
print rev  | 
|
529  | 
rev = r.add(file(fn).read(), rev)  | 
|
| 
203
by mbp at sourcefrog
 revfile:  | 
530  | 
elif cmd == 'dump':  | 
| 
229
by mbp at sourcefrog
 Allow opening revision file read-only  | 
531  | 
ro().dump()  | 
| 
203
by mbp at sourcefrog
 revfile:  | 
532  | 
elif cmd == 'get':  | 
| 
202
by mbp at sourcefrog
 Revfile:  | 
533  | 
try:  | 
| 
974.1.26
by aaron.bentley at utoronto
 merged mbp@sourcefrog.net-20050817233101-0939da1cf91f2472  | 
534  | 
idx = int(argv[3])  | 
| 
202
by mbp at sourcefrog
 Revfile:  | 
535  | 
except IndexError:  | 
| 
974.1.26
by aaron.bentley at utoronto
 merged mbp@sourcefrog.net-20050817233101-0939da1cf91f2472  | 
536  | 
sys.stderr.write("usage: revfile get FILE IDX\n")  | 
| 
203
by mbp at sourcefrog
 revfile:  | 
537  | 
return 1  | 
538  | 
||
| 
974.1.26
by aaron.bentley at utoronto
 merged mbp@sourcefrog.net-20050817233101-0939da1cf91f2472  | 
539  | 
r = ro()  | 
540  | 
||
| 
203
by mbp at sourcefrog
 revfile:  | 
541  | 
if idx < 0 or idx >= len(r):  | 
542  | 
sys.stderr.write("invalid index %r\n" % idx)  | 
|
543  | 
return 1  | 
|
544  | 
||
| 
974.1.26
by aaron.bentley at utoronto
 merged mbp@sourcefrog.net-20050817233101-0939da1cf91f2472  | 
545  | 
sys.stdout.write(r.get(idx))  | 
| 
204
by mbp at sourcefrog
 Revfile:- new find-sha command and implementation- new _check_index helper  | 
546  | 
elif cmd == 'find-sha':  | 
547  | 
try:  | 
|
| 
974.1.26
by aaron.bentley at utoronto
 merged mbp@sourcefrog.net-20050817233101-0939da1cf91f2472  | 
548  | 
s = unhexlify(argv[3])  | 
| 
204
by mbp at sourcefrog
 Revfile:- new find-sha command and implementation- new _check_index helper  | 
549  | 
except IndexError:  | 
| 
974.1.26
by aaron.bentley at utoronto
 merged mbp@sourcefrog.net-20050817233101-0939da1cf91f2472  | 
550  | 
sys.stderr.write("usage: revfile find-sha FILE HEX\n")  | 
| 
204
by mbp at sourcefrog
 Revfile:- new find-sha command and implementation- new _check_index helper  | 
551  | 
return 1  | 
552  | 
||
| 
229
by mbp at sourcefrog
 Allow opening revision file read-only  | 
553  | 
idx = ro().find_sha(s)  | 
| 
204
by mbp at sourcefrog
 Revfile:- new find-sha command and implementation- new _check_index helper  | 
554  | 
if idx == _NO_RECORD:  | 
555  | 
sys.stderr.write("no such record\n")  | 
|
556  | 
return 1  | 
|
557  | 
else:  | 
|
558  | 
print idx  | 
|
| 
221
by mbp at sourcefrog
 Revfile: new command total-text-size  | 
559  | 
elif cmd == 'total-text-size':  | 
| 
229
by mbp at sourcefrog
 Allow opening revision file read-only  | 
560  | 
print ro().total_text_size()  | 
| 
226
by mbp at sourcefrog
 revf: new command 'last'  | 
561  | 
elif cmd == 'last':  | 
| 
229
by mbp at sourcefrog
 Allow opening revision file read-only  | 
562  | 
print len(ro())-1  | 
| 
974.1.26
by aaron.bentley at utoronto
 merged mbp@sourcefrog.net-20050817233101-0939da1cf91f2472  | 
563  | 
elif cmd == 'check':  | 
564  | 
import bzrlib.progress  | 
|
565  | 
pb = bzrlib.progress.ProgressBar()  | 
|
566  | 
ro().check(pb)  | 
|
| 
198
by mbp at sourcefrog
 - experimental compressed Revfile support  | 
567  | 
else:  | 
| 
203
by mbp at sourcefrog
 revfile:  | 
568  | 
sys.stderr.write("unknown command %r\n" % cmd)  | 
569  | 
return 1  | 
|
| 
198
by mbp at sourcefrog
 - experimental compressed Revfile support  | 
570  | 
|
571  | 
||
572  | 
if __name__ == '__main__':  | 
|
573  | 
import sys  | 
|
| 
203
by mbp at sourcefrog
 revfile:  | 
574  | 
sys.exit(main(sys.argv) or 0)  |