bzr branch
http://gegoxaren.bato24.eu/bzr/brz/remove-bazaar

#! /usr/bin/env python

# (C) 2005 Canonical Ltd

# based on an idea by Matt Mackall
# modified to squish into bzr by Martin Pool

# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.

# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.

# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA


"""Packed file revision storage.

A Revfile holds the text history of a particular source file, such
as Makefile.  It can represent a tree of text versions for that
file, allowing for microbranches within a single repository.

This is stored on disk as two files: an index file, and a data file.
The index file is short and always read completely into memory; the
data file is much longer and only the relevant bits of it,
identified by the index file, need to be read.
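
The following is a Python 3 sketch of that access pattern, not code from this module. Given the offset and length from an index record, it seeks to the offset, reads exactly that many bytes, and inflates them with zlib if the compressed flag is set. The helper read_stored_bytes and the in-memory io.BytesIO data file are assumptions made for the example. In this module the equivalent step is done by Revfile._get_raw.

```python
import io
import zlib

FL_ZLIB = 1  # flag bit meaning "stored bytes are zlib-compressed"


def read_stored_bytes(datafile, offset, length, flags):
    # Hypothetical helper: fetch one stored chunk without touching the rest.
    if length == 0:
        return b""
    datafile.seek(offset)
    data = datafile.read(length)
    if len(data) != length:
        raise IOError("short read: got %d of %d bytes" % (len(data), length))
    if flags & FL_ZLIB:
        data = zlib.decompress(data)
    return data


# A fake data file holding two chunks: one stored plain, one compressed.
plain = b"first text version"
packed = zlib.compress(b"second text version " * 10)
datafile = io.BytesIO(plain + packed)

first = read_stored_bytes(datafile, 0, len(plain), 0)
second = read_stored_bytes(datafile, len(plain), len(packed), FL_ZLIB)
```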

Each text version is identified by the SHA-1 of the full text of
that version.  It also has a sequence number within the file.
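
Here is a Python 3 sketch of the two forms of that identifier. It uses hashlib in place of the Python 2 sha module that this file imports. The 20-byte binary digest is what the index stores. The 40-character hex form is what the dump and find-sha commands print and accept.

```python
import hashlib
from binascii import hexlify, unhexlify

text = b"all: build"

digest = hashlib.sha1(text).digest()       # 20 raw bytes, as stored in the index
hex_id = hexlify(digest).decode("ascii")   # 40 hex digits, as shown by dump

# Both forms carry the same information and convert back and forth.
round_trip = unhexlify(hex_id)
```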

The index file has a short header and then a sequence of fixed-length
records:

* byte[20]    SHA-1 of text (as binary, not hex)
* uint32      sequence number this is based on, or 0xFFFFFFFF (-1) for full text
* uint32      flags: 1=zlib compressed
* uint32      offset of start in data file
* uint32      length of stored data (delta or full text, after any
              compression) in data file
* uint32[3]   reserved

for a total of 48 bytes.

The header is also 48 bytes, for tidiness and easy calculation: the
line "bzr revfile v1" and a newline, padded with 0xFF bytes.
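
The record layout above corresponds to the struct format string ">20sIIII12x" used by _add_raw and _read_next_index. That format is big-endian: the 20-byte digest, then four unsigned 32-bit fields, then 12 pad bytes for the reserved words. Below is a Python 3 sketch of packing and unpacking one record and of building the header. The names pack_record, unpack_record, NO_RECORD and HEADER are local to this example.

```python
import hashlib
import struct

RECORD_SIZE = 48
RECORD_FORMAT = ">20sIIII12x"  # sha, base, flags, offset, length, 12 reserved bytes
NO_RECORD = 0xFFFFFFFF          # the "-1" base, as an unsigned 32-bit field

# Header: the magic line padded with 0xFF bytes to one record's width.
MAGIC = b"bzr revfile v1" + bytes([0x0A])   # 0x0A is the trailing newline
HEADER = MAGIC + bytes([0xFF]) * (RECORD_SIZE - len(MAGIC))


def pack_record(sha, base, flags, offset, length):
    # One fixed-size index record; the reserved words are zero-filled.
    return struct.pack(RECORD_FORMAT, sha, base, flags, offset, length)


def unpack_record(rec):
    # Inverse of pack_record: returns (sha, base, flags, offset, length).
    return struct.unpack(RECORD_FORMAT, rec)


sha_a = hashlib.sha1(b"hello").digest()
sha_b = hashlib.sha1(b"hello, world").digest()
full_text = pack_record(sha_a, NO_RECORD, 1, 0, 25)  # zlib-compressed full text
delta = pack_record(sha_b, 0, 0, 25, 12)             # plain delta against record 0
```

Because every record, including the header, has the same width, record n starts at byte (n + 1) * 48. _seek_index relies on this.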

Both the index and the data file are only ever appended to; a consequence
is that sequence numbers are stable references.  However, different
repositories may assign different sequence numbers to the same text,
so the SHA-1 is the only universally unique reference.
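
The following sketch shows why the SHA-1, not the sequence number, is the portable reference. Two hypothetical repositories add the same texts in a different order, so they assign different sequence numbers, but a lookup by digest finds the right text in each. The linear scan mirrors Revfile.find_sha, except that it returns None where find_sha returns _NO_RECORD. Modelling an index as a plain list of digests is a simplification for the example.

```python
import hashlib


def sha1(text):
    return hashlib.sha1(text).digest()


def find_sha(index, digest):
    # Return the sequence number of the record with this digest, or None.
    # A linear scan, like Revfile.find_sha; here an index is just a list of digests.
    for seq, stored in enumerate(index):
        if stored == digest:
            return seq
    return None


texts = [b"version one", b"version two", b"version three"]

# Appending only, each repository's sequence numbers never change...
repo_a = [sha1(t) for t in texts]
# ...but another repository may have received the same texts in another order.
repo_b = [sha1(t) for t in (texts[2], texts[0], texts[1])]

wanted = sha1(b"version two")
seq_a = find_sha(repo_a, wanted)     # position in repository A
seq_b = find_sha(repo_b, wanted)     # a different position in repository B
missing = find_sha(repo_a, sha1(b"never added"))
```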

The __iter__ method here will generally read through the whole index file
in one go.  With readahead in the kernel and python/libc (typically
128kB) this means that there should be no seeks and often only one
read() call to get everything into memory.
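
Here is a Python 3 sketch of the same idea made explicit, with io.BytesIO standing in for the index file. A single read() pulls in the whole index. The records are then sliced out of memory at 48-byte strides after skipping the header, so no further seeks or reads are needed. The helper name iter_records is hypothetical. The __iter__ method below instead reads one record at a time and relies on buffering for a similar effect.

```python
import io
import struct

RECORD_SIZE = 48
RECORD_FORMAT = ">20sIIII12x"


def iter_records(idxfile):
    # Read the entire index with one call, then walk it in memory.
    idxfile.seek(0)
    blob = idxfile.read()
    if len(blob) < RECORD_SIZE or len(blob) % RECORD_SIZE:
        raise ValueError("bad index length %d" % len(blob))
    # Skip the header; each following 48-byte slice is one index record.
    for start in range(RECORD_SIZE, len(blob), RECORD_SIZE):
        yield struct.unpack(RECORD_FORMAT, blob[start:start + RECORD_SIZE])


# A fake index: a placeholder header, then three full-text records.
header = bytes(RECORD_SIZE)
records = [
    struct.pack(RECORD_FORMAT, bytes([n]) * 20, 0xFFFFFFFF, 0, n * 10, 10)
    for n in range(3)
]
index = io.BytesIO(header + b"".join(records))

entries = list(iter_records(index))
```

Reading everything at once trades memory for fewer system calls. That suits an index which, as noted above, is short.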
"""


# TODO: Something like pread() would make this slightly simpler and
# perhaps more efficient.

# TODO: Could also try to mmap things...  Might be faster for the
# index in particular?

# TODO: Some kind of faster lookup of SHAs?  The bad thing is that probably means
# rewriting existing records, which is not so nice.

# TODO: Something to check that regions identified in the index file
# completely butt up and do not overlap.  Strictly it's not a problem
# if there are gaps and that can happen if we're interrupted while
# writing to the datafile.  Overlapping would be very bad though.



import sys, zlib, struct, mdiff, stat, os, sha
from binascii import hexlify, unhexlify

factor = 10

_RECORDSIZE = 48

_HEADER = "bzr revfile v1\n"
_HEADER = _HEADER + ('\xff' * (_RECORDSIZE - len(_HEADER)))
_NO_RECORD = 0xFFFFFFFFL

# fields in the index record
I_SHA = 0
I_BASE = 1
I_FLAGS = 2
I_OFFSET = 3
I_LEN = 4

FL_GZIP = 1

# maximum number of patches in a row before recording a whole text.
CHAIN_LIMIT = 50


class RevfileError(Exception):
    pass

class LimitHitException(Exception):
    pass

class Revfile:
    def __init__(self, basename, mode):
        # TODO: Lock file while open

        # TODO: advise of random access

        self.basename = basename

        if mode not in ['r', 'w']:
            raise RevfileError("invalid open mode %r" % mode)
        self.mode = mode

        idxname = basename + '.irev'
        dataname = basename + '.drev'

        idx_exists = os.path.exists(idxname)
        data_exists = os.path.exists(dataname)

        if idx_exists != data_exists:
            raise RevfileError("half-assed revfile")

        if not idx_exists:
            if mode == 'r':
                raise RevfileError("Revfile %r does not exist" % basename)

            self.idxfile = open(idxname, 'w+b')
            self.datafile = open(dataname, 'w+b')

            print 'init empty file'
            self.idxfile.write(_HEADER)
            self.idxfile.flush()
        else:
            if mode == 'r':
                diskmode = 'rb'
            else:
                diskmode = 'r+b'

            self.idxfile = open(idxname, diskmode)
            self.datafile = open(dataname, diskmode)

            h = self.idxfile.read(_RECORDSIZE)
            if h != _HEADER:
                raise RevfileError("bad header %r in index of %r"
                                   % (h, self.basename))


    def _check_index(self, idx):
        # valid sequence numbers run from 0 to len(self) - 1
        if idx < 0 or idx >= len(self):
            raise RevfileError("invalid index %r" % idx)

    def _check_write(self):
        if self.mode != 'w':
            raise RevfileError("%r is open readonly" % self.basename)


    def find_sha(self, s):
        assert isinstance(s, str)
        assert len(s) == 20

        for idx, idxrec in enumerate(self):
            if idxrec[I_SHA] == s:
                return idx
        else:
            return _NO_RECORD



    def _add_compressed(self, text_sha, data, base, compress):
        # well, maybe compress
        flags = 0
        if compress:
            data_len = len(data)
            if data_len > 50:
                # don't do compression if it's too small; it's unlikely to win
                # enough to be worthwhile
                compr_data = zlib.compress(data)
                compr_len = len(compr_data)
                if compr_len < data_len:
                    data = compr_data
                    flags = FL_GZIP
                    ##print '- compressed %d -> %d, %.1f%%' \
                    ##      % (data_len, compr_len, float(compr_len)/float(data_len) * 100.0)
        return self._add_raw(text_sha, data, base, flags)



    def _add_raw(self, text_sha, data, base, flags):
        """Add pre-processed data, which can be either a full text or a delta.

        The data is written as given; any compression has already been
        applied by the caller (see _add_compressed)."""
        idx = len(self)
        self.datafile.seek(0, 2) # to end
        self.idxfile.seek(0, 2)
        assert self.idxfile.tell() == _RECORDSIZE * (idx + 1)
        data_offset = self.datafile.tell()

        assert isinstance(data, str) # not unicode or anything weird

        self.datafile.write(data)
        self.datafile.flush()

        assert isinstance(text_sha, str)
        entry = text_sha
        entry += struct.pack(">IIII12x", base, flags, data_offset, len(data))
        assert len(entry) == _RECORDSIZE

        self.idxfile.write(entry)
        self.idxfile.flush()

        return idx



    def _add_full_text(self, text, text_sha, compress):
        """Add a full text to the file.

        This is not delta-compressed against any reference version,
        although it may still be zlib-compressed.

        Returns the index for that text."""
        return self._add_compressed(text_sha, text, _NO_RECORD, compress)


    def _add_delta(self, text, text_sha, base, compress):
        """Add a text stored relative to a previous text."""
        self._check_index(base)

        try:
            base_text = self.get(base, recursion_limit=CHAIN_LIMIT)
        except LimitHitException:
            return self._add_full_text(text, text_sha, compress)

        data = mdiff.bdiff(base_text, text)

        # If the delta is larger than the text, we might as well just
        # store the text.  (OK, the delta might be more compressible,
        # but the overhead of applying it probably still makes it
        # bad, and I don't want to compress both of them to find out.)
        if len(data) >= len(text):
            return self._add_full_text(text, text_sha, compress)
        else:
            return self._add_compressed(text_sha, data, base, compress)


    def add(self, text, base=_NO_RECORD, compress=True):
        """Add a new text to the revfile.

        If the text is already present then its existing id is
        returned and the file is not changed.

        If compress is true then zlib compression will be used if it
        reduces the size.

        If a base index is specified, that text *may* be used for
        delta compression of the new text.  Delta compression will
        only be used if it would be a size win and if the base is
        not already at the end of too long a delta chain.
        """
        self._check_write()

        text_sha = sha.new(text).digest()

        idx = self.find_sha(text_sha)
        if idx != _NO_RECORD:
            # TODO: Optional paranoid mode where we read out that record and make sure
            # it's the same, in case someone ever breaks SHA-1.
            return idx # already present

        if base == _NO_RECORD:
            return self._add_full_text(text, text_sha, compress)
        else:
            return self._add_delta(text, text_sha, base, compress)



    def get(self, idx, recursion_limit=None):
        """Retrieve text of a previous revision.

        If recursion_limit is an integer then walk back at most that
        many revisions and then raise LimitHitException, indicating
        that we ought to record a new file text instead of another
        delta.  Leave recursion_limit as None when you just want to
        read back an existing revision."""

        idxrec = self[idx]
        base = idxrec[I_BASE]
        if base == _NO_RECORD:
            text = self._get_full_text(idx, idxrec)
        else:
            text = self._get_patched(idx, idxrec, recursion_limit)

        if sha.new(text).digest() != idxrec[I_SHA]:
            raise RevfileError("corrupt SHA-1 digest on record %d"
                               % idx)

        return text



    def _get_raw(self, idx, idxrec):
        flags = idxrec[I_FLAGS]
        if flags & ~FL_GZIP:
            raise RevfileError("unsupported index flags %#x on index %d"
                               % (flags, idx))

        l = idxrec[I_LEN]
        if l == 0:
            return ''

        self.datafile.seek(idxrec[I_OFFSET])

        data = self.datafile.read(l)
        if len(data) != l:
            raise RevfileError("short read %d of %d "
                               "getting text for record %d in %r"
                               % (len(data), l, idx, self.basename))

        if flags & FL_GZIP:
            data = zlib.decompress(data)

        return data


    def _get_full_text(self, idx, idxrec):
        assert idxrec[I_BASE] == _NO_RECORD

        text = self._get_raw(idx, idxrec)

        return text


    def _get_patched(self, idx, idxrec, recursion_limit):
        base = idxrec[I_BASE]
        assert base >= 0
        assert base < idx # no loops!

        if recursion_limit == None:
            sub_limit = None
        else:
            sub_limit = recursion_limit - 1
            if sub_limit < 0:
                raise LimitHitException()

        base_text = self.get(base, sub_limit)
        patch = self._get_raw(idx, idxrec)

        text = mdiff.bpatch(base_text, patch)

        return text



    def __len__(self):
        """Return number of revisions."""
        l = os.fstat(self.idxfile.fileno())[stat.ST_SIZE]
        if l % _RECORDSIZE:
            raise RevfileError("bad length %d on index of %r" % (l, self.basename))
        if l < _RECORDSIZE:
            raise RevfileError("no header present in index of %r" % (self.basename))
        return int(l / _RECORDSIZE) - 1


    def __getitem__(self, idx):
        """Indexing by sequence number returns that revision's index record."""
        ## TODO: Can avoid seek if we just moved there...
        self._seek_index(idx)
        idxrec = self._read_next_index()
        if idxrec == None:
            raise IndexError()
        else:
            return idxrec


    def _seek_index(self, idx):
        if idx < 0:
            raise RevfileError("invalid index %r" % idx)
        self.idxfile.seek((idx + 1) * _RECORDSIZE)



    def __iter__(self):
        """Read back all index records.

        Do not seek the index file while this is underway!"""
        sys.stderr.write(" ** iter called ** \n")
        self._seek_index(0)
        while True:
            idxrec = self._read_next_index()
            if not idxrec:
                break
            yield idxrec


    def _read_next_index(self):
        rec = self.idxfile.read(_RECORDSIZE)
        if not rec:
            return None
        elif len(rec) != _RECORDSIZE:
            # work out which record we were reading, for the error message
            idx = (self.idxfile.tell() - len(rec)) // _RECORDSIZE - 1
            raise RevfileError("short read of %d bytes getting index %d from %r"
                               % (len(rec), idx, self.basename))

        return struct.unpack(">20sIIII12x", rec)


    def dump(self, f=sys.stdout):
        f.write('%-8s %-40s %-8s %-8s %-8s %-8s\n'
                % tuple('idx sha1 base flags offset len'.split()))
        f.write('-------- ---------------------------------------- ')
        f.write('-------- -------- -------- --------\n')

        for i, rec in enumerate(self):
            f.write("#%-7d %40s " % (i, hexlify(rec[0])))
            if rec[1] == _NO_RECORD:
                f.write("(none) ")
            else:
                f.write("#%-7d " % rec[1])

            f.write("%8x %8d %8d\n" % (rec[2], rec[3], rec[4]))


    def total_text_size(self):
        """Return the sum of sizes of all file texts.

        This is how much space they would occupy if they were stored without
        delta or zlib compression.

        As a side effect this completely validates the Revfile, checking that all
        texts can be reproduced with the correct SHA-1."""
        t = 0L
        for idx in range(len(self)):
            t += len(self.get(idx))
        return t



def main(argv):
    try:
        cmd = argv[1]
    except IndexError:
        sys.stderr.write("usage: revfile dump\n"
                         "       revfile add\n"
                         "       revfile add-delta BASE\n"
                         "       revfile get IDX\n"
                         "       revfile find-sha HEX\n"
                         "       revfile total-text-size\n"
                         "       revfile last\n")
        return 1

    def rw():
        return Revfile('testrev', 'w')

    def ro():
        return Revfile('testrev', 'r')

    if cmd == 'add':
        print rw().add(sys.stdin.read())
    elif cmd == 'add-delta':
        print rw().add(sys.stdin.read(), int(argv[2]))
    elif cmd == 'dump':
        ro().dump()
    elif cmd == 'get':
        try:
            idx = int(argv[2])
        except (IndexError, ValueError):
            sys.stderr.write("usage: revfile get IDX\n")
            return 1

        r = ro()
        if idx < 0 or idx >= len(r):
            sys.stderr.write("invalid index %r\n" % idx)
            return 1

        sys.stdout.write(r.get(idx))
    elif cmd == 'find-sha':
        try:
            s = unhexlify(argv[2])
        except IndexError:
            sys.stderr.write("usage: revfile find-sha HEX\n")
            return 1

        idx = ro().find_sha(s)
        if idx == _NO_RECORD:
            sys.stderr.write("no such record\n")
            return 1
        else:
            print idx
    elif cmd == 'total-text-size':
        print ro().total_text_size()
    elif cmd == 'last':
        print len(ro())-1
    else:
        sys.stderr.write("unknown command %r\n" % cmd)
        return 1


if __name__ == '__main__':
    import sys
    sys.exit(main(sys.argv) or 0)