# Copyright (C) 2007 Canonical Ltd
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA

"""Container format for Bazaar data.

"Containers" and "records" are described in doc/developers/container-format.txt.
"""

from cStringIO import StringIO
import re

from bzrlib import errors


FORMAT_ONE = "Bazaar pack format 1 (introduced in 0.18)"


_whitespace_re = re.compile('[\t\n\x0b\x0c\r ]')


def _check_name(name):
    """Do some basic checking of 'name'.

    At the moment, this only checks that the name contains no whitespace
    characters (space, tab, newline, carriage return, vertical tab or form
    feed).

    :raises InvalidRecordError: if name is not valid.
    :seealso: _check_name_encoding
    """
    if _whitespace_re.search(name) is not None:
        raise errors.InvalidRecordError("%r is not a valid name." % (name,))


def _check_name_encoding(name):
    """Check that 'name' is valid UTF-8.

    This is separate from _check_name because UTF-8 decoding is relatively
    expensive, and we usually want to avoid it.

    :raises InvalidRecordError: if name is not valid UTF-8.
    """
    try:
        name.decode('utf-8')
    except UnicodeDecodeError, e:
        raise errors.InvalidRecordError(str(e))


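# Illustrative sketch, not part of the pack API: a hypothetical helper that
# combines the two checks above into one boolean test.  It assumes names are
# bytestrings and re-declares the whitespace pattern locally so that it is
# self-contained; the real functions raise InvalidRecordError instead.
def _sketch_is_valid_name(name):
    """Return True if 'name' has no whitespace and is valid UTF-8."""
    import re
    if re.search(b'[\t\n\x0b\x0c\r ]', name) is not None:
        return False  # _check_name would raise InvalidRecordError here
    try:
        name.decode('utf-8')
    except UnicodeDecodeError:
        return False  # _check_name_encoding would raise here
    return True

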
class ContainerWriter(object):
    """A class for writing containers.

    :attribute records_written: The number of user records added to the
        container. This does not count the prelude or suffix of the container
        introduced by the begin() and end() methods.
    """

    def __init__(self, write_func):
        """Constructor.

        :param write_func: a callable that will be called when this
            ContainerWriter needs to write some bytes.
        """
        self._write_func = write_func
        self.current_offset = 0
        self.records_written = 0

    def begin(self):
        """Begin writing a container."""
        self.write_func(FORMAT_ONE + "\n")

    def write_func(self, bytes):
        """Write 'bytes' and advance current_offset by their length."""
        self._write_func(bytes)
        self.current_offset += len(bytes)

    def end(self):
        """Finish writing a container."""
        self.write_func("E")

    def add_bytes_record(self, bytes, names):
        """Add a Bytes record with the given names.

        :param bytes: The bytes to insert.
        :param names: The names to give the inserted bytes. Each name is
            a tuple of bytestrings. The bytestrings may not contain
            whitespace.
        :return: An offset, length tuple. The offset is the offset
            of the record within the container, and the length is the
            length of data that will need to be read to reconstitute the
            record. This offset and length can only be used with the pack
            interface: they may be offset by headers or other such details,
            and thus are only suitable for use by a ContainerReader.
        """
        current_offset = self.current_offset
        # Kind marker
        self.write_func("B")
        # Length of the content, as a decimal string.
        self.write_func(str(len(bytes)) + "\n")
        # Names: one line per name, with the elements of each name tuple
        # separated by NUL bytes.
        for name_tuple in names:
            # Make sure we're writing valid names.  Note that we will leave a
            # half-written record if a name is bad!
            for name in name_tuple:
                _check_name(name)
            self.write_func('\x00'.join(name_tuple) + "\n")
        # End of headers
        self.write_func("\n")
        # Finally, the contents.
        self.write_func(bytes)
        self.records_written += 1
        # Return a memo of where we wrote data to allow random access.
        return current_offset, self.current_offset - current_offset


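# Illustrative sketch, not part of the pack API: the byte layout that
# begin(), add_bytes_record() and end() produce, written into an in-memory
# buffer by a hypothetical helper.  'records' is assumed to be a list of
# (data, names) pairs; name validation (_check_name) is omitted for brevity.
def _sketch_write_container(records):
    """Return (container_bytes, memos), one (offset, length) memo per record."""
    import io
    out = io.BytesIO()
    out.write(b"Bazaar pack format 1 (introduced in 0.18)\n")  # begin()
    memos = []
    for data, names in records:
        start = out.tell()
        out.write(b"B")                                     # kind marker
        out.write(str(len(data)).encode("ascii") + b"\n")   # content length
        for name_tuple in names:                            # one line per name
            out.write(b"\x00".join(name_tuple) + b"\n")
        out.write(b"\n")                                    # end of headers
        out.write(data)                                     # the contents
        memos.append((start, out.tell() - start))
    out.write(b"E")                                         # end()
    return out.getvalue(), memos

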
class ReadVFile(object):
    """Adapt a readv result iterator to a file-like protocol.

    Each call to read or readline is satisfied from a single readv hunk;
    the next hunk is fetched only once the current one is exhausted.
    """

    def __init__(self, readv_result):
        self.readv_result = readv_result
        # the most recent readv result block
        self._string = None
        self._string_length = 0

    def _next(self):
        """Advance to the next readv hunk if the current one is exhausted."""
        if (self._string is None or
            self._string.tell() == self._string_length):
            # readv yields (offset, data) pairs; only the data is needed.
            offset, data = self.readv_result.next()
            self._string_length = len(data)
            self._string = StringIO(data)

    def read(self, length):
        self._next()
        result = self._string.read(length)
        if len(result) < length:
            raise errors.BzrError('request for too much data from a readv hunk.')
        return result

    def readline(self):
        """Read a line from the current readv hunk.

        Note that readline will not cross readv segments.
        """
        self._next()
        result = self._string.readline()
        if (self._string.tell() == self._string_length and
            not result.endswith('\n')):
            raise errors.BzrError('short readline in the readvfile hunk.')
        return result


def make_readv_reader(transport, filename, requested_records):
    """Create a ContainerReader that will read selected records only.

    :param transport: The transport the pack file is located on.
    :param filename: The filename of the pack file.
    :param requested_records: The record offset, length tuples as returned
        by add_bytes_record for the desired records.
    :return: A ContainerReader over just the format line and those records.
    """
    # Always read the format line (FORMAT_ONE plus its newline) first, so
    # that the reader can check the container format before any records.
    readv_blocks = [(0, len(FORMAT_ONE) + 1)]
    readv_blocks.extend(requested_records)
    result = ContainerReader(ReadVFile(
        transport.readv(filename, readv_blocks)))
    return result


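# Illustrative sketch, not part of the pack API: which byte ranges
# make_readv_reader asks for.  Instead of calling a real transport, this
# hypothetical helper slices the ranges out of an in-memory container and
# returns (offset, data) pairs in the order ReadVFile consumes them: the
# format line first, then each requested (offset, length) memo.
def _sketch_readv_hunks(container_bytes, memos):
    format_line = b"Bazaar pack format 1 (introduced in 0.18)\n"
    blocks = [(0, len(format_line))] + list(memos)
    return [(offset, container_bytes[offset:offset + length])
            for offset, length in blocks]


# Records that are not named in 'memos' are never read at all.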
class BaseReader(object):

    def __init__(self, source_file):
        """Constructor.

        :param source_file: a file-like object with `read` and `readline`
            methods.
        """
        self._source = source_file

    def reader_func(self, length=None):
        """Read up to 'length' bytes from the source file."""
        return self._source.read(length)

    def _read_line(self):
        """Read one line and return it without its trailing newline.

        :raises UnexpectedEndOfContainerError: if the source ends before a
            newline is read.
        """
        line = self._source.readline()
        if not line.endswith('\n'):
            raise errors.UnexpectedEndOfContainerError()
        return line.rstrip('\n')


class ContainerReader(BaseReader):
    """A class for reading Bazaar's container format."""

    def iter_records(self):
        """Iterate over the container, yielding each record as it is read.

        Each yielded record will be a 2-tuple of (names, callable), where names
        is a ``list`` of name tuples and callable is a function that takes one
        argument, ``max_length``, and returns the record's bytes.

        You **must not** call the callable after advancing the iterator to the
        next record.  That is, this code is invalid::

            record_iter = container.iter_records()
            names1, callable1 = record_iter.next()
            names2, callable2 = record_iter.next()
            bytes1 = callable1(None)

        This will give incorrect results and invalidate the state of the
        ContainerReader.

        :raises ContainerError: if any sort of container corruption is
            detected, e.g. UnknownContainerFormatError if the format of the
            container is unrecognised.
        :seealso: BytesRecordReader.read
        """
        self._read_format()
        return self._iter_records()

    def iter_record_objects(self):
        """Iterate over the container, yielding each record as it is read.

        Each yielded record will be an object with ``read`` and ``validate``
        methods.  As with iter_records, it is not safe to use a record object
        after advancing the iterator to yield the next record.

        :raises ContainerError: if any sort of container corruption is
            detected, e.g. UnknownContainerFormatError if the format of the
            container is unrecognised.
        :seealso: iter_records
        """
        self._read_format()
        return self._iter_record_objects()

    def _iter_records(self):
        for record in self._iter_record_objects():
            yield record.read()

    def _iter_record_objects(self):
        """Yield a BytesRecordReader for each record, up to the end marker."""
        while True:
            record_kind = self.reader_func(1)
            if record_kind == 'B':
                # Bytes record.
                reader = BytesRecordReader(self._source)
                yield reader
            elif record_kind == 'E':
                # End marker.  There are no more records.
                return
            elif record_kind == '':
                # End of stream encountered, but no End Marker record seen, so
                # this container is incomplete.
                raise errors.UnexpectedEndOfContainerError()
            else:
                # Unknown record type.
                raise errors.UnknownRecordTypeError(record_kind)

    def _read_format(self):
        """Read the format line and check that it is FORMAT_ONE."""
        format = self._read_line()
        if format != FORMAT_ONE:
            raise errors.UnknownContainerFormatError(format)

    def validate(self):
        """Validate this container and its records.

        Validating consumes the data stream just like iter_records and
        iter_record_objects, so you cannot call it after
        iter_records/iter_record_objects.

        :raises ContainerError: if something is invalid.
        """
        all_names = set()
        for record_names, read_bytes in self.iter_records():
            # Read all of the content to check that it is present.
            read_bytes(None)
            for name_tuple in record_names:
                for name in name_tuple:
                    _check_name_encoding(name)
                # Check that the name is unique.  Note that Python will refuse
                # to decode non-shortest forms of UTF-8 encoding, so there is no
                # risk that the same unicode string has been encoded two
                # different ways.
                if name_tuple in all_names:
                    raise errors.DuplicateRecordNameError(name_tuple)
                all_names.add(name_tuple)
        # There must be no data after the end marker.
        excess_bytes = self.reader_func(1)
        if excess_bytes != '':
            raise errors.ContainerHasExcessDataError(excess_bytes)


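# Illustrative sketch, not part of the pack API: a compact, self-contained
# parser for the same layout, showing how ContainerReader and
# BytesRecordReader walk a container.  The helper name is hypothetical; it
# raises ValueError where the real readers raise specific ContainerError
# subclasses, and it reads each record fully into memory, which the real
# readers deliberately avoid.
def _sketch_parse_container(data):
    """Return a list of (names, content) pairs parsed from 'data'."""
    import io
    source = io.BytesIO(data)

    def read_line():
        line = source.readline()
        if not line.endswith(b"\n"):
            raise ValueError("unexpected end of container")
        return line[:-1]

    if read_line() != b"Bazaar pack format 1 (introduced in 0.18)":
        raise ValueError("unknown container format")
    records = []
    while True:
        kind = source.read(1)
        if kind == b"E":                  # end marker: no more records
            return records
        if kind != b"B":                  # b"" means truncated; else unknown
            raise ValueError("bad record kind %r" % (kind,))
        length = int(read_line())         # content length
        names = []
        while True:                       # name lines until an empty line
            name_line = read_line()
            if name_line == b"":
                break
            names.append(tuple(name_line.split(b"\x00")))
        content = source.read(length)
        if len(content) != length:
            raise ValueError("unexpected end of container")
        records.append((names, content))

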
class BytesRecordReader(BaseReader):

    def read(self):
        """Read this record.

        You can either validate or read a record; you cannot do both.

        :returns: A tuple of (names, callable).  The callable can be called
            repeatedly to obtain the bytes for the record, with a max_length
            argument: each call returns at most max_length of the remaining
            bytes.  If max_length is None, it returns all the remaining
            bytes.  Because records can be arbitrarily large, using None is
            not recommended unless you have reason to believe the content
            will fit in memory.
        """
        # Read the content length.
        length_line = self._read_line()
        try:
            length = int(length_line)
        except ValueError:
            raise errors.InvalidRecordError(
                "%r is not a valid length." % (length_line,))

        # Read the list of names: one NUL-separated name tuple per line, until
        # an empty line ends the headers.
        names = []
        while True:
            name_line = self._read_line()
            if name_line == '':
                break
            name_tuple = tuple(name_line.split('\x00'))
            for name in name_tuple:
                _check_name(name)
            names.append(name_tuple)

        self._remaining_length = length
        return names, self._content_reader

    def _content_reader(self, max_length):
        """Return up to max_length of the remaining content bytes.

        If max_length is None, return all of the remaining content.

        :raises UnexpectedEndOfContainerError: if the source ends early.
        """
        if max_length is None:
            length_to_read = self._remaining_length
        else:
            length_to_read = min(max_length, self._remaining_length)
        self._remaining_length -= length_to_read
        bytes = self.reader_func(length_to_read)
        if len(bytes) != length_to_read:
            raise errors.UnexpectedEndOfContainerError()
        return bytes

    def validate(self):
        """Validate this record.

        You can either validate or read a record; you cannot do both.

        :raises ContainerError: if this record is invalid.
        """
        names, read_bytes = self.read()
        for name_tuple in names:
            for name in name_tuple:
                _check_name_encoding(name)
        # Read all of the content to check that it is present.
        read_bytes(None)
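

# Illustrative sketch, not part of the pack API: consuming a record's content
# in bounded chunks, as BytesRecordReader.read recommends over passing None.
# 'read_content' stands for the callable that read() returns; the helper name
# and the default chunk size are hypothetical.  It relies on the callable
# returning an empty string once the record's content is exhausted.
def _sketch_iter_content(read_content, chunk_size=65536):
    """Yield successive chunks of at most chunk_size bytes."""
    while True:
        chunk = read_content(chunk_size)
        if not chunk:  # nothing left in this record
            return
        yield chunk


# Possible usage with a real ContainerReader:
#     for names, read_content in container.iter_records():
#         for chunk in _sketch_iter_content(read_content):
#             process(chunk)  # 'process' is hypothetical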