# Copyright (C) 2008, 2009 Canonical Ltd
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA

"""Core compression logic for compressing streams of related files."""

from itertools import izip
from cStringIO import StringIO
import time
import zlib
try:
    import pylzma
except ImportError:
    pylzma = None

from bzrlib import (
    annotate,
    debug,
    diff,
    errors,
    graph as _mod_graph,
    knit,
    osutils,
    pack,
    patiencediff,
    trace,
    )
from bzrlib.graph import Graph
from bzrlib.btree_index import BTreeBuilder
from bzrlib.lru_cache import LRUSizeCache
from bzrlib.tsort import topo_sort
from bzrlib.versionedfile import (
    adapter_registry,
    AbsentContentFactory,
    ChunkedContentFactory,
    FulltextContentFactory,
    VersionedFiles,
    )

_USE_LZMA = False and (pylzma is not None)

# osutils.sha_string('')
_null_sha1 = 'da39a3ee5e6b4b0d3255bfef95601890afd80709'


def sort_gc_optimal(parent_map):
    """Sort and group the keys in parent_map into groupcompress order.

    groupcompress is defined (currently) as reverse-topological order, grouped
    by the key prefix.

    :return: A sorted list of keys
    """
    # groupcompress ordering is approximately reverse topological,
    # properly grouped by file-id.
    per_prefix_map = {}
    for item in parent_map.iteritems():
        key = item[0]
        if isinstance(key, str) or len(key) == 1:
            prefix = ''
        else:
            prefix = key[0]
        try:
            per_prefix_map[prefix].append(item)
        except KeyError:
            per_prefix_map[prefix] = [item]

    present_keys = []
    for prefix in sorted(per_prefix_map):
        present_keys.extend(reversed(topo_sort(per_prefix_map[prefix])))
    return present_keys
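
# Illustrative example (hypothetical keys): given
#
#     parent_map = {('f1', 'rev1'): (),
#                   ('f1', 'rev2'): (('f1', 'rev1'),),
#                   ('f2', 'rev1'): ()}
#
# sort_gc_optimal(parent_map) should return
#     [('f1', 'rev2'), ('f1', 'rev1'), ('f2', 'rev1')]
# i.e. keys grouped by the 'f1'/'f2' prefix, newest-first within each group.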


# The max zlib window size is 32kB, so if we set 'max_size' output of the
# decompressor to the requested bytes + 32kB, then we should guarantee
# num_bytes coming out.
_ZLIB_DECOMP_WINDOW = 32*1024
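
# Sketch of the bounded decompression this constant supports (stdlib zlib
# API only; 'z_data' and 'num_bytes' are hypothetical names):
#
#     decomp = zlib.decompressobj()
#     data = decomp.decompress(z_data, num_bytes + _ZLIB_DECOMP_WINDOW)
#     # anything not yet decompressed stays in decomp.unconsumed_tail
#
# The extra 32kB of slack is what lets a single decompress() call yield at
# least num_bytes, per the comment above.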


class GroupCompressBlock(object):
    """An object which maintains the internal structure of the compressed data.

    This tracks the meta info (start of text, length, type, etc.)
    """

    # Group Compress Block v1 Zlib
    GCB_HEADER = 'gcb1z\n'
    # Group Compress Block v1 Lzma
    GCB_LZ_HEADER = 'gcb1l\n'
    GCB_KNOWN_HEADERS = (GCB_HEADER, GCB_LZ_HEADER)

    def __init__(self):
        # map by key? or just order in file?
        self._compressor_name = None
        self._z_content = None
        self._z_content_decompressor = None
        self._z_content_length = None
        self._content_length = None
        self._content = None
        self._content_chunks = None

    def __len__(self):
        # This is the maximum number of bytes this object will reference if
        # everything is decompressed. However, if we decompress less than
        # everything... (this would cause some problems for LRUSizeCache)
        return self._content_length + self._z_content_length

    def _ensure_content(self, num_bytes=None):
        """Make sure that content has been expanded enough.

        :param num_bytes: Ensure that we have extracted at least num_bytes of
            content. If None, consume everything
        """
        # TODO: If we re-use the same content block at different times during
        #       get_record_stream(), it is possible that the first pass will
        #       get inserted, triggering an extract/_ensure_content() which
        #       will get rid of _z_content. And then the next use of the block
        #       will try to access _z_content (to send it over the wire), and
        #       fail because it is already extracted. Consider never releasing
        #       _z_content because of this.
        if num_bytes is None:
            num_bytes = self._content_length
        elif (self._content_length is not None
              and num_bytes > self._content_length):
            raise AssertionError(
                'requested num_bytes (%d) > content length (%d)'
                % (num_bytes, self._content_length))
        # Expand the content if required
        if self._content is None:
            if self._content_chunks is not None:
                self._content = ''.join(self._content_chunks)
                self._content_chunks = None
        if self._content is None:
            if self._z_content is None:
                raise AssertionError('No content to decompress')
            if self._z_content == '':
                self._content = ''
            elif self._compressor_name == 'lzma':
                # We don't do partial lzma decomp yet
                self._content = pylzma.decompress(self._z_content)
            elif self._compressor_name == 'zlib':
                # Start a zlib decompressor
                if num_bytes is None:
                    self._content = zlib.decompress(self._z_content)
                else:
                    self._z_content_decompressor = zlib.decompressobj()
                    # Seed the decompressor with the uncompressed bytes, so
                    # that the rest of the code is simplified
                    self._content = self._z_content_decompressor.decompress(
                        self._z_content, num_bytes + _ZLIB_DECOMP_WINDOW)
            else:
                raise AssertionError('Unknown compressor: %r'
                                     % self._compressor_name)
        # Any bytes remaining to be decompressed will be in the decompressor's
        # 'unconsumed_tail'

        # Do we have enough bytes already?
        if num_bytes is not None and len(self._content) >= num_bytes:
            return
        if num_bytes is None and self._z_content_decompressor is None:
            # We must have already decompressed everything
            return
        # If we got this far, and don't have a decompressor, something is wrong
        if self._z_content_decompressor is None:
            raise AssertionError(
                'No decompressor to decompress %d bytes' % num_bytes)
        remaining_decomp = self._z_content_decompressor.unconsumed_tail
        if num_bytes is None:
            if remaining_decomp:
                # We don't know how much is left, but we'll decompress it all
                self._content += self._z_content_decompressor.decompress(
                    remaining_decomp)
                # Note: There's what I consider a bug in zlib.decompressobj
                #       If you pass back in the entire unconsumed_tail, only
                #       this time you don't pass a max-size, it doesn't
                #       change the unconsumed_tail back to None/''.
                #       However, we know we are done with the whole stream
                self._z_content_decompressor = None
            # XXX: Why is this the only place in this routine we set this?
            self._content_length = len(self._content)
        else:
            if not remaining_decomp:
                raise AssertionError('Nothing left to decompress')
            needed_bytes = num_bytes - len(self._content)
            # We always set max_size to 32kB over the minimum needed, so that
            # zlib will give us as much as we really want.
            # TODO: If this isn't good enough, we could make a loop here,
            #       that keeps expanding the request until we get enough
            self._content += self._z_content_decompressor.decompress(
                remaining_decomp, needed_bytes + _ZLIB_DECOMP_WINDOW)
            if len(self._content) < num_bytes:
                raise AssertionError('%d bytes wanted, only %d available'
                                     % (num_bytes, len(self._content)))
            if not self._z_content_decompressor.unconsumed_tail:
                # The stream is finished
                self._z_content_decompressor = None

    def _parse_bytes(self, bytes, pos):
        """Read the various lengths from the header.

        This also populates the various 'compressed' buffers.

        :return: The position in bytes just after the last newline
        """
        # At present, we have 2 integers for the compressed and uncompressed
        # content. In base10 (ascii) 14 bytes can represent > 1TB, so to avoid
        # checking too far, cap the search to 14 bytes.
        pos2 = bytes.index('\n', pos, pos + 14)
        self._z_content_length = int(bytes[pos:pos2])
        pos = pos2 + 1
        pos2 = bytes.index('\n', pos, pos + 14)
        self._content_length = int(bytes[pos:pos2])
        pos = pos2 + 1
        if len(bytes) != (pos + self._z_content_length):
            # XXX: Define some GCCorrupt error ?
            raise AssertionError('Invalid bytes: (%d) != %d + %d' %
                                 (len(bytes), pos, self._z_content_length))
        self._z_content = bytes[pos:]

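    # Illustrative serialized block, as consumed by from_bytes()/_parse_bytes()
    # (lengths are hypothetical):
    #
    #     'gcb1z\n'     6-byte magic: block v1, zlib compression
    #     '1234\n'      _z_content_length, in decimal ascii
    #     '5678\n'      _content_length, in decimal ascii
    #     <1234 bytes>  the compressed content itself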
    @classmethod
    def from_bytes(cls, bytes):
        out = cls()
        if bytes[:6] not in cls.GCB_KNOWN_HEADERS:
            raise ValueError('bytes did not start with any of %r'
                             % (cls.GCB_KNOWN_HEADERS,))
        # XXX: why not testing the whole header ?
        if bytes[4] == 'z':
            out._compressor_name = 'zlib'
        elif bytes[4] == 'l':
            out._compressor_name = 'lzma'
        else:
            raise ValueError('unknown compressor: %r' % (bytes,))
        out._parse_bytes(bytes, 6)
        return out

    def extract(self, key, start, end, sha1=None):
        """Extract the text for a specific key.

        :param key: The label used for this content
        :param sha1: TODO (should we validate only when sha1 is supplied?)
        :return: The bytes for the content
        """
        if start == end == 0:
            return ''
        self._ensure_content(end)
        # The bytes are 'f' or 'd' for the type, then a variable-length
        # base128 integer for the content size, then the actual content
        # We know that the variable-length integer won't be longer than 5
        # bytes (it takes 5 bytes to encode 2^32)
        c = self._content[start]
        if c == 'f':
            type = 'fulltext'
        else:
            if c != 'd':
                raise ValueError('Unknown content control code: %s'
                                 % (c,))
            type = 'delta'
        content_len, len_len = decode_base128_int(
            self._content[start + 1:start + 6])
        content_start = start + 1 + len_len
        if end != content_start + content_len:
            raise ValueError('end != len according to field header'
                             ' %s != %s' % (end, content_start + content_len))
        if c == 'f':
            bytes = self._content[content_start:end]
        elif c == 'd':
            bytes = apply_delta_to_source(self._content, content_start, end)
        return bytes

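    # Illustrative record layout inside the uncompressed content (sizes are
    # hypothetical; decode_base128_int/apply_delta_to_source are the helpers
    # used above):
    #
    #     'f' <base128 content-length> <fulltext bytes>
    #     'd' <base128 content-length> <delta instruction bytes>
    #
    # extract() reads the kind byte at content[start], decodes the length,
    # and either slices out the fulltext or applies the delta against the
    # earlier bytes of this same block.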
    def set_chunked_content(self, content_chunks, length):
        """Set the content of this block to the given chunks."""
        # If we have lots of short lines, it may be more efficient to join
        # the content ahead of time. If the content is <10MiB, we don't really
        # care about the extra memory consumption, so we can just pack it and
        # be done. However, timing showed 18s => 17.9s for repacking 1k revs of
        # mysql, which is below the noise margin
        self._content_length = length
        self._content_chunks = content_chunks
        self._content = None
        self._z_content = None

    def set_content(self, content):
        """Set the content of this block."""
        self._content_length = len(content)
        self._content = content
        self._z_content = None

    def _create_z_content_using_lzma(self):
        if self._content_chunks is not None:
            self._content = ''.join(self._content_chunks)
            self._content_chunks = None
        if self._content is None:
            raise AssertionError('Nothing to compress')
        self._z_content = pylzma.compress(self._content)
        self._z_content_length = len(self._z_content)

    def _create_z_content_from_chunks(self):
        compressor = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION)
        compressed_chunks = map(compressor.compress, self._content_chunks)
        compressed_chunks.append(compressor.flush())
        self._z_content = ''.join(compressed_chunks)
        self._z_content_length = len(self._z_content)

    def _create_z_content(self):
        if self._z_content is not None:
            return
        if _USE_LZMA:
            self._create_z_content_using_lzma()
            return
        if self._content_chunks is not None:
            self._create_z_content_from_chunks()
            return
        self._z_content = zlib.compress(self._content)
        self._z_content_length = len(self._z_content)

    def to_bytes(self):
        """Encode the information into a byte stream."""
        self._create_z_content()
        if _USE_LZMA:
            header = self.GCB_LZ_HEADER
        else:
            header = self.GCB_HEADER
        chunks = [header,
                  '%d\n%d\n' % (self._z_content_length, self._content_length),
                  self._z_content,
                 ]
        return ''.join(chunks)

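    # Illustrative round-trip using the methods above (the content is a
    # hypothetical 6-byte 'fulltext' record: kind byte, base128 length 4,
    # then 'abcd'):
    #
    #     block = GroupCompressBlock()
    #     block.set_content('f\x04abcd')
    #     data = block.to_bytes()           # 'gcb1z\n' + lengths + zlib data
    #     copy = GroupCompressBlock.from_bytes(data)
    #     copy.extract(('key',), 0, 6)      # => 'abcd'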
    def _dump(self, include_text=False):
        """Take this block, and spit out a human-readable structure.

        :param include_text: Inserts also include text bits; choose whether
            you want this displayed in the dump or not.
        :return: A dump of the given block. The layout is something like:
            [('f', length), ('d', delta_length, text_length, [delta_info])]
            delta_info := [('i', num_bytes, text), ('c', offset, num_bytes),
            ...]
        """
        self._ensure_content()
        result = []
        pos = 0
        while pos < self._content_length:
            kind = self._content[pos]
            pos += 1
            if kind not in ('f', 'd'):
                raise ValueError('invalid kind character: %r' % (kind,))
            content_len, len_len = decode_base128_int(
                self._content[pos:pos + 5])
            pos += len_len
            if content_len + pos > self._content_length:
                raise ValueError('invalid content_len %d for record @ pos %d'
                                 % (content_len, pos - len_len - 1))
            if kind == 'f': # Fulltext
                if include_text:
                    text = self._content[pos:pos+content_len]
                    result.append(('f', content_len, text))
                else:
                    result.append(('f', content_len))
            elif kind == 'd': # Delta
                delta_content = self._content[pos:pos+content_len]
                delta_info = []
                # The first entry in a delta is the decompressed length
                decomp_len, delta_pos = decode_base128_int(delta_content)
                result.append(('d', content_len, decomp_len, delta_info))
                measured_len = 0
                while delta_pos < content_len:
                    c = ord(delta_content[delta_pos])
                    delta_pos += 1
                    if c & 0x80: # Copy
                        (offset, length,
                         delta_pos) = decode_copy_instruction(delta_content, c,
                                                              delta_pos)
                        if include_text:
                            text = self._content[offset:offset+length]
                            delta_info.append(('c', offset, length, text))
                        else:
                            delta_info.append(('c', offset, length))
                        measured_len += length
                    else: # Insert
                        if include_text:
                            txt = delta_content[delta_pos:delta_pos+c]
                        else:
                            txt = ''
                        delta_info.append(('i', c, txt))
                        measured_len += c
                        delta_pos += c
                if delta_pos != content_len:
                    raise ValueError('Delta consumed a bad number of bytes:'
                                     ' %d != %d' % (delta_pos, content_len))
                if measured_len != decomp_len:
                    raise ValueError('Delta claimed fulltext was %d bytes, but'
                                     ' extraction resulted in %d bytes'
                                     % (decomp_len, measured_len))
            pos += content_len
        return result


class _LazyGroupCompressFactory(object):
    """Yield content from a GroupCompressBlock on demand."""

    def __init__(self, key, parents, manager, start, end, first):
        """Create a _LazyGroupCompressFactory

        :param key: The key of just this record
        :param parents: The parents of this key (possibly None)
        :param manager: The _LazyGroupContentManager that holds this record's
            GroupCompressBlock
        :param start: Offset of the first byte for this record in the
            uncompressed content
        :param end: Offset of the byte just after the end of this record
            (ie, bytes = content[start:end])
        :param first: Is this the first Factory for the given block?
        """
        self.key = key
        self.parents = parents
        self.sha1 = None
        # Note: This attribute coupled with Manager._factories creates a
        #       reference cycle. Perhaps we would rather use a weakref(), or
        #       find an appropriate time to release the ref. After the first
        #       get_bytes_as call? After Manager.get_record_stream() returns
        #       the object?
        self._manager = manager
        self._bytes = None
        self.storage_kind = 'groupcompress-block'
        if not first:
            self.storage_kind = 'groupcompress-block-ref'
        self._first = first
        self._start = start
        self._end = end

    def __repr__(self):
        return '%s(%s, first=%s)' % (self.__class__.__name__,
                                     self.key, self._first)

    def get_bytes_as(self, storage_kind):
        if storage_kind == self.storage_kind:
            if self._first:
                # wire bytes, something...
                return self._manager._wire_bytes()
            else:
                return ''
        if storage_kind in ('fulltext', 'chunked'):
            if self._bytes is None:
                # Grab and cache the raw bytes for this entry
                # and break the ref-cycle with _manager since we don't need it
                # anymore
                self._manager._prepare_for_extract()
                block = self._manager._block
                self._bytes = block.extract(self.key, self._start, self._end)
                # There are code paths that first extract as fulltext, and then
                # extract as storage_kind (smart fetch). So we don't break the
                # refcycle here, but instead in manager.get_record_stream()
                # self._manager = None
            if storage_kind == 'fulltext':
                return self._bytes
            else:
                return [self._bytes]
        raise errors.UnavailableRepresentation(self.key, storage_kind,
                                               self.storage_kind)

class _LazyGroupContentManager(object):
    """This manages a group of _LazyGroupCompressFactory objects."""

    def __init__(self, block):
        self._block = block
        # We need to preserve the ordering
        self._factories = []
        self._last_byte = 0

    def add_factory(self, key, parents, start, end):
        if not self._factories:
            first = True
        else:
            first = False
        # Note that this creates a reference cycle....
        factory = _LazyGroupCompressFactory(key, parents, self,
                                            start, end, first=first)
        # max() works here, but as a function call, doing a compare seems to be
        # significantly faster, timeit says 250ms for max() and 100ms for the
        # comparison
        if end > self._last_byte:
            self._last_byte = end
        self._factories.append(factory)

    def get_record_stream(self):
        """Get a record for all keys added so far."""
        for factory in self._factories:
            yield factory
            # Break the ref-cycle
            factory._bytes = None
            factory._manager = None
        # TODO: Consider setting self._factories = None after the above loop,
        #       as it will break the reference cycle

    def _trim_block(self, last_byte):
        """Create a new GroupCompressBlock, with just some of the content."""
        # None of the factories need to be adjusted, because the content is
        # located in an identical place. Just that some of the unreferenced
        # trailing bytes are stripped
        trace.mutter('stripping trailing bytes from groupcompress block'
                     ' %d => %d', self._block._content_length, last_byte)
        new_block = GroupCompressBlock()
        self._block._ensure_content(last_byte)
        new_block.set_content(self._block._content[:last_byte])
        self._block = new_block

    def _rebuild_block(self):
        """Create a new GroupCompressBlock with only the referenced texts."""
        compressor = GroupCompressor()
        tstart = time.time()
        old_length = self._block._content_length
        end_point = 0
        for factory in self._factories:
            bytes = factory.get_bytes_as('fulltext')
            (found_sha1, start_point, end_point,
             type) = compressor.compress(factory.key, bytes, factory.sha1)
            # Now update this factory with the new offsets, etc
            factory.sha1 = found_sha1
            factory._start = start_point
            factory._end = end_point
        self._last_byte = end_point
        new_block = compressor.flush()
        # TODO: Should we check that new_block really *is* smaller than the old
        #       block? It seems hard to come up with a method that it would
        #       expand, since we do full compression again. Perhaps based on a
        #       request that ends up poorly ordered?
        delta = time.time() - tstart
        self._block = new_block
        trace.mutter('creating new compressed block on-the-fly in %.3fs'
                     ' %d bytes => %d bytes', delta, old_length,
                     self._block._content_length)

    def _prepare_for_extract(self):
        """A _LazyGroupCompressFactory is about to extract to fulltext."""
        # We expect that if one child is going to fulltext, all will be. This
        # helps prevent all of them from extracting a small amount at a time.
        # Which in itself isn't terribly expensive, but resizing 2MB 32kB at a
        # time (self._block._content) is a little expensive.
        self._block._ensure_content(self._last_byte)

    def _check_rebuild_block(self):
        """Check to see if our block should be repacked."""
        total_bytes_used = 0
        last_byte_used = 0
        for factory in self._factories:
            total_bytes_used += factory._end - factory._start
            last_byte_used = max(last_byte_used, factory._end)
        # If we are using most of the bytes from the block, we have nothing
        # else to check (currently more than 1/2)
        if total_bytes_used * 2 >= self._block._content_length:
            return
        # Can we just strip off the trailing bytes? If we are going to be
        # transmitting more than 50% of the front of the content, go ahead
        if total_bytes_used * 2 > last_byte_used:
            self._trim_block(last_byte_used)
            return

        # We are using a small amount of the data, and it isn't just packed
        # nicely at the front, so rebuild the content.
        # Note: This would be *nicer* as a strip-data-from-group, rather than
        #       building it up again from scratch
        #       It might be reasonable to consider the fulltext sizes for
        #       different bits when deciding this, too. As you may have a small
        #       fulltext, and a trivial delta, and you are just trading around
        #       for another fulltext. If we do a simple 'prune' you may end up
        #       expanding many deltas into fulltexts, as well.
        #       If we build a cheap enough 'strip', then we could try a strip,
        #       if that expands the content, we then rebuild.
        self._rebuild_block()

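    # Worked example of the heuristic above (hypothetical numbers): for a
    # 100kB block whose factories reference 30kB in total:
    #   - 30kB * 2 = 60kB < 100kB, so the "mostly used" early return is
    #     skipped;
    #   - if the last referenced byte is at 50kB, 60kB > 50kB and we only
    #     trim the trailing half of the block;
    #   - if the last referenced byte is at 90kB, the data is scattered, so
    #     we recompress just the referenced texts via _rebuild_block().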
    def _wire_bytes(self):
        """Return a byte stream suitable for transmitting over the wire."""
        self._check_rebuild_block()
        # The outer block starts with:
        #   'groupcompress-block\n'
        #   <length of compressed key info>\n
        #   <length of uncompressed info>\n
        #   <length of gc block>\n
        #   <header bytes>
        #   <gc-block>
        lines = ['groupcompress-block\n']
        # The minimal info we need is the key, the start offset, and the
        # parents. The length and type are encoded in the record itself.
        # However, passing in the other bits makes it easier.  The list of
        # keys, and the start offset, the length
        # 1 line key
        # 1 line with parents, '' for ()
        # 1 line for start offset
        # 1 line for end byte
        header_lines = []
        for factory in self._factories:
            key_bytes = '\x00'.join(factory.key)
            parents = factory.parents
            if parents is None:
                parent_bytes = 'None:'
            else:
                parent_bytes = '\t'.join('\x00'.join(key) for key in parents)
            record_header = '%s\n%s\n%d\n%d\n' % (
                key_bytes, parent_bytes, factory._start, factory._end)
            header_lines.append(record_header)
            # TODO: Can we break the refcycle at this point and set
            #       factory._manager = None?
        header_bytes = ''.join(header_lines)
        del header_lines
        header_bytes_len = len(header_bytes)
        z_header_bytes = zlib.compress(header_bytes)
        del header_bytes
        z_header_bytes_len = len(z_header_bytes)
        block_bytes = self._block.to_bytes()
        lines.append('%d\n%d\n%d\n' % (z_header_bytes_len, header_bytes_len,
                                       len(block_bytes)))
        lines.append(z_header_bytes)
        lines.append(block_bytes)
        del z_header_bytes, block_bytes
        return ''.join(lines)

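    # Illustrative <header bytes> entry (hypothetical values), one 4-line
    # group per factory, before zlib compression:
    #
    #     'file-id\x00rev-a\n'       key elements joined by NUL
    #     'file-id\x00rev-base\n'    parent keys joined by TAB, or 'None:'
    #     '0\n'                      start offset in the uncompressed block
    #     '42\n'                     end offset in the uncompressed block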
    @classmethod
    def from_bytes(cls, bytes):
        # TODO: This does extra string copying, probably better to do it a
        #       different way
        (storage_kind, z_header_len, header_len,
         block_len, rest) = bytes.split('\n', 4)
        del bytes
        if storage_kind != 'groupcompress-block':
            raise ValueError('Unknown storage kind: %s' % (storage_kind,))
        z_header_len = int(z_header_len)
        if len(rest) < z_header_len:
            raise ValueError('Compressed header len shorter than all bytes')
        z_header = rest[:z_header_len]
        header_len = int(header_len)
        header = zlib.decompress(z_header)
        if len(header) != header_len:
            raise ValueError('invalid length for decompressed bytes')
        del z_header
        block_len = int(block_len)
        if len(rest) != z_header_len + block_len:
            raise ValueError('Invalid length for block')
        block_bytes = rest[z_header_len:]
        del rest
        # So now we have a valid GCB, we just need to parse the factories that
        # were sent to us
        header_lines = header.split('\n')
        del header
        last = header_lines.pop()
        if last != '':
            raise ValueError('header lines did not end with a trailing'
                             ' newline')
        if len(header_lines) % 4 != 0:
            raise ValueError('The header was not an even multiple of 4 lines')
        block = GroupCompressBlock.from_bytes(block_bytes)
        del block_bytes
        result = cls(block)
        for start in xrange(0, len(header_lines), 4):
            # intern()?
            key = tuple(header_lines[start].split('\x00'))
            parents_line = header_lines[start+1]
            if parents_line == 'None:':
                parents = None
            else:
                parents = tuple([tuple(segment.split('\x00'))
                                 for segment in parents_line.split('\t')
                                 if segment])
            start_offset = int(header_lines[start+2])
            end_offset = int(header_lines[start+3])
            result.add_factory(key, parents, start_offset, end_offset)
        return result


def network_block_to_records(storage_kind, bytes, line_end):
    if storage_kind != 'groupcompress-block':
        raise ValueError('Unknown storage kind: %s' % (storage_kind,))
    manager = _LazyGroupContentManager.from_bytes(bytes)
    return manager.get_record_stream()


| 
4241.6.6
by Robert Collins, John Arbash Meinel, Ian Clathworthy, Vincent Ladeuil
 Groupcompress from brisbane-core.  | 
687  | 
class _CommonGroupCompressor(object):  | 
688  | 
||
689  | 
def __init__(self):  | 
|
690  | 
"""Create a GroupCompressor."""  | 
|
| 
3735.40.17
by John Arbash Meinel
 Change the attribute from 'lines' to 'chunks' to make it more  | 
691  | 
self.chunks = []  | 
| 
3735.40.4
by John Arbash Meinel
 Factor out tests that rely on the exact bytecode.  | 
692  | 
self._last = None  | 
| 
4241.6.6
by Robert Collins, John Arbash Meinel, Ian Clathworthy, Vincent Ladeuil
 Groupcompress from brisbane-core.  | 
693  | 
self.endpoint = 0  | 
694  | 
self.input_bytes = 0  | 
|
695  | 
self.labels_deltas = {}  | 
|
| 
3735.40.17
by John Arbash Meinel
 Change the attribute from 'lines' to 'chunks' to make it more  | 
696  | 
self._delta_index = None # Set by the children  | 
| 
3735.40.4
by John Arbash Meinel
 Factor out tests that rely on the exact bytecode.  | 
697  | 
self._block = GroupCompressBlock()  | 
698  | 
||
| 
4241.6.6
by Robert Collins, John Arbash Meinel, Ian Clathworthy, Vincent Ladeuil
 Groupcompress from brisbane-core.  | 
699  | 
def compress(self, key, bytes, expected_sha, nostore_sha=None, soft=False):  | 
700  | 
"""Compress lines with label key.  | 
|
701  | 
||
702  | 
        :param key: A key tuple. It is stored in the output
 | 
|
703  | 
            for identification of the text during decompression. If the last
 | 
|
704  | 
            element is 'None' it is replaced with the sha1 of the text -
 | 
|
705  | 
            e.g. sha1:xxxxxxx.
 | 
|
706  | 
        :param bytes: The bytes to be compressed
 | 
|
707  | 
        :param expected_sha: If non-None, the sha the lines are believed to
 | 
|
708  | 
            have. If supplied it is trusted and used as-is; otherwise the sha1
 | 
|
709  | 
            is calculated from the supplied bytes.
 | 
|
710  | 
        :param nostore_sha: If the computed sha1 sum matches, we will raise
 | 
|
711  | 
            ExistingContent rather than adding the text.
 | 
|
712  | 
        :param soft: Do a 'soft' compression. This means that we require larger
 | 
|
713  | 
            ranges to match to be considered for a copy command.
 | 
|
714  | 
||
715  | 
        :return: The sha1 of lines, the start and end offsets in the delta, and
 | 
|
716  | 
            the type ('fulltext' or 'delta').
 | 
|
717  | 
||
718  | 
        :seealso VersionedFiles.add_lines:
 | 
|
719  | 
        """
 | 
|
720  | 
if not bytes: # empty, like a dir entry, etc  | 
|
721  | 
if nostore_sha == _null_sha1:  | 
|
722  | 
raise errors.ExistingContent()  | 
|
723  | 
return _null_sha1, 0, 0, 'fulltext'  | 
|
724  | 
        # we assume someone knew what they were doing when they passed it in
 | 
|
725  | 
if expected_sha is not None:  | 
|
726  | 
sha1 = expected_sha  | 
|
727  | 
else:  | 
|
728  | 
sha1 = osutils.sha_string(bytes)  | 
|
729  | 
if nostore_sha is not None:  | 
|
730  | 
if sha1 == nostore_sha:  | 
|
731  | 
raise errors.ExistingContent()  | 
|
732  | 
if key[-1] is None:  | 
|
733  | 
key = key[:-1] + ('sha1:' + sha1,)  | 
|
734  | 
||
735  | 
start, end, type = self._compress(key, bytes, len(bytes) / 2, soft)  | 
|
736  | 
return sha1, start, end, type  | 
|
737  | 
||
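A usage sketch of the contract documented above, with hypothetical keys and texts; the first call stores a fulltext, and later calls usually come back as deltas once there is a basis in the group:

    compressor = GroupCompressor()
    sha_a, start_a, end_a, kind_a = compressor.compress(
        ('file-id', 'rev-1'), 'common line\n', None)
    sha_b, start_b, end_b, kind_b = compressor.compress(
        ('file-id', 'rev-2'), 'common line\nplus one more\n', None)
    # kind_a == 'fulltext'; kind_b is typically 'delta'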
738  | 
def _compress(self, key, bytes, max_delta_size, soft=False):  | 
|
739  | 
"""Compress lines with label key.  | 
|
740  | 
||
741  | 
        :param key: A key tuple. It is stored in the output for identification
 | 
|
742  | 
            of the text during decompression.
 | 
|
743  | 
||
744  | 
        :param bytes: The bytes to be compressed
 | 
|
745  | 
||
746  | 
        :param max_delta_size: The size above which we issue a fulltext instead
 | 
|
747  | 
            of a delta.
 | 
|
748  | 
||
749  | 
        :param soft: Do a 'soft' compression. This means that we require larger
 | 
|
750  | 
            ranges to match to be considered for a copy command.
 | 
|
751  | 
||
752  | 
        :return: The start and end offsets in the delta, and
 | 
|
753  | 
            the type ('fulltext' or 'delta').
 | 
|
754  | 
        """
 | 
|
755  | 
raise NotImplementedError(self._compress)  | 
|
756  | 
||
| 
3735.40.4
by John Arbash Meinel
 Factor out tests that rely on the exact bytecode.  | 
757  | 
def extract(self, key):  | 
758  | 
"""Extract a key previously added to the compressor.  | 
|
759  | 
||
760  | 
        :param key: The key to extract.
 | 
|
761  | 
        :return: The reconstructed bytes and their sha1.
 | 
|
762  | 
        """
 | 
|
| 
3735.40.18
by John Arbash Meinel
 Get rid of the entries dict in GroupCompressBlock.  | 
763  | 
(start_byte, start_chunk, end_byte, end_chunk) = self.labels_deltas[key]  | 
764  | 
delta_chunks = self.chunks[start_chunk:end_chunk]  | 
|
| 
3735.40.4
by John Arbash Meinel
 Factor out tests that rely on the exact bytecode.  | 
765  | 
stored_bytes = ''.join(delta_chunks)  | 
| 
3735.40.18
by John Arbash Meinel
 Get rid of the entries dict in GroupCompressBlock.  | 
766  | 
if stored_bytes[0] == 'f':  | 
| 
3735.40.4
by John Arbash Meinel
 Factor out tests that rely on the exact bytecode.  | 
767  | 
fulltext_len, offset = decode_base128_int(stored_bytes[1:10])  | 
| 
3735.40.18
by John Arbash Meinel
 Get rid of the entries dict in GroupCompressBlock.  | 
768  | 
data_len = fulltext_len + 1 + offset  | 
769  | 
if data_len != len(stored_bytes):  | 
|
| 
3735.40.4
by John Arbash Meinel
 Factor out tests that rely on the exact bytecode.  | 
770  | 
raise ValueError('Index claimed fulltext len, but stored bytes'  | 
771  | 
' claim %s != %s'  | 
|
| 
3735.40.18
by John Arbash Meinel
 Get rid of the entries dict in GroupCompressBlock.  | 
772  | 
% (len(stored_bytes), data_len))  | 
| 
3735.40.4
by John Arbash Meinel
 Factor out tests that rely on the exact bytecode.  | 
773  | 
bytes = stored_bytes[offset + 1:]  | 
774  | 
else:  | 
|
775  | 
            # XXX: This is inefficient at best
 | 
|
| 
3735.40.18
by John Arbash Meinel
 Get rid of the entries dict in GroupCompressBlock.  | 
776  | 
source = ''.join(self.chunks[:start_chunk])  | 
| 
3735.40.4
by John Arbash Meinel
 Factor out tests that rely on the exact bytecode.  | 
777  | 
if stored_bytes[0] != 'd':  | 
| 
3735.40.18
by John Arbash Meinel
 Get rid of the entries dict in GroupCompressBlock.  | 
778  | 
raise ValueError('Unknown content kind, bytes claim %s'  | 
| 
3735.40.4
by John Arbash Meinel
 Factor out tests that rely on the exact bytecode.  | 
779  | 
% (stored_bytes[0],))  | 
780  | 
delta_len, offset = decode_base128_int(stored_bytes[1:10])  | 
|
| 
3735.40.18
by John Arbash Meinel
 Get rid of the entries dict in GroupCompressBlock.  | 
781  | 
data_len = delta_len + 1 + offset  | 
782  | 
if data_len != len(stored_bytes):  | 
|
| 
3735.40.4
by John Arbash Meinel
 Factor out tests that rely on the exact bytecode.  | 
783  | 
raise ValueError('Index claimed delta len, but stored bytes'  | 
784  | 
' claim %s != %s'  | 
|
| 
3735.40.18
by John Arbash Meinel
 Get rid of the entries dict in GroupCompressBlock.  | 
785  | 
% (len(stored_bytes), data_len))  | 
| 
3735.40.4
by John Arbash Meinel
 Factor out tests that rely on the exact bytecode.  | 
786  | 
bytes = apply_delta(source, stored_bytes[offset + 1:])  | 
787  | 
bytes_sha1 = osutils.sha_string(bytes)  | 
|
| 
3735.40.18
by John Arbash Meinel
 Get rid of the entries dict in GroupCompressBlock.  | 
788  | 
return bytes, bytes_sha1  | 
| 
3735.40.4
by John Arbash Meinel
 Factor out tests that rely on the exact bytecode.  | 
789  | 
|
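A sketch of the stored-record layout that extract() decodes: one kind byte ('f' for fulltext, 'd' for delta), a base128-encoded content length, then the content itself (this assumes the module's encode_base128_int/decode_base128_int helpers):

    stored = 'f' + encode_base128_int(11) + 'hello world'
    length, offset = decode_base128_int(stored[1:10])
    # stored[0] == 'f', length == 11, offset == 1 for such a short text
    text = stored[1 + offset:]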
| 
3735.40.17
by John Arbash Meinel
 Change the attribute from 'lines' to 'chunks' to make it more  | 
790  | 
def flush(self):  | 
791  | 
"""Finish this group, creating a formatted stream.  | 
|
792  | 
||
793  | 
        After calling this, the compressor should no longer be used.
 | 
|
794  | 
        """
 | 
|
| 
4398.6.2
by John Arbash Meinel
 Add a TODO, marking the code that causes us to peak at 2x memory consumption  | 
795  | 
        # TODO: this causes us to 'bloat' to 2x the size of content in the
 | 
796  | 
        #       group. This has an impact for 'commit' of large objects.
 | 
|
797  | 
        #       One possibility is to use self._content_chunks, and be lazy and
 | 
|
798  | 
        #       only fill out self._content as a full string when we actually
 | 
|
799  | 
        #       need it. That would at least drop the peak memory consumption
 | 
|
800  | 
        #       for 'commit' down to ~1x the size of the largest file, at a
 | 
|
801  | 
        #       cost of increased complexity within this code. 2x is still <<
 | 
|
802  | 
        #       3x the size of the largest file, so we are doing ok.
 | 
|
| 
4469.1.2
by John Arbash Meinel
 The only caller already knows the content length, so make the api such that  | 
803  | 
self._block.set_chunked_content(self.chunks, self.endpoint)  | 
| 
3735.40.17
by John Arbash Meinel
 Change the attribute from 'lines' to 'chunks' to make it more  | 
804  | 
self.chunks = None  | 
805  | 
self._delta_index = None  | 
|
806  | 
return self._block  | 
|
807  | 
||
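A sketch of the group lifecycle around flush(), with hypothetical keys: compress one or more texts, flush to obtain the GroupCompressBlock, then serialise it for storage (to_bytes() is how _insert_record_stream's flush helper uses the block further down):

    compressor = GroupCompressor()
    compressor.compress(('k1',), 'text one\n', None)
    compressor.compress(('k2',), 'text two\n', None)
    block = compressor.flush()      # the compressor must not be reused
    wire_bytes = block.to_bytes()   # ready for add_raw_records()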
| 
3735.40.4
by John Arbash Meinel
 Factor out tests that rely on the exact bytecode.  | 
808  | 
def pop_last(self):  | 
809  | 
"""Call this if you want to 'revoke' the last compression.  | 
|
810  | 
||
811  | 
        After this, the data structures will be rolled back, but you cannot do
 | 
|
812  | 
        more compression.
 | 
|
813  | 
        """
 | 
|
814  | 
self._delta_index = None  | 
|
| 
3735.40.17
by John Arbash Meinel
 Change the attribute from 'lines' to 'chunks' to make it more  | 
815  | 
del self.chunks[self._last[0]:]  | 
| 
3735.40.4
by John Arbash Meinel
 Factor out tests that rely on the exact bytecode.  | 
816  | 
self.endpoint = self._last[1]  | 
817  | 
self._last = None  | 
|
| 
4241.6.6
by Robert Collins, John Arbash Meinel, Ian Clathworthy, Vincent Ladeuil
 Groupcompress from brisbane-core.  | 
818  | 
|
819  | 
def ratio(self):  | 
|
820  | 
"""Return the overall compression ratio."""  | 
|
821  | 
return float(self.input_bytes) / float(self.endpoint)  | 
|
822  | 
||
823  | 
||
824  | 
class PythonGroupCompressor(_CommonGroupCompressor):  | 
|
825  | 
||
| 
3735.40.2
by John Arbash Meinel
 Add a groupcompress.encode_copy_instruction function.  | 
826  | 
def __init__(self):  | 
| 
4241.6.6
by Robert Collins, John Arbash Meinel, Ian Clathworthy, Vincent Ladeuil
 Groupcompress from brisbane-core.  | 
827  | 
"""Create a GroupCompressor.  | 
828  | 
||
829  | 
        Used only if the pyrex version is not available.
 | 
|
830  | 
        """
 | 
|
831  | 
super(PythonGroupCompressor, self).__init__()  | 
|
| 
3735.40.17
by John Arbash Meinel
 Change the attribute from 'lines' to 'chunks' to make it more  | 
832  | 
self._delta_index = LinesDeltaIndex([])  | 
833  | 
        # The actual content is managed by LinesDeltaIndex
 | 
|
834  | 
self.chunks = self._delta_index.lines  | 
|
| 
4241.6.6
by Robert Collins, John Arbash Meinel, Ian Clathworthy, Vincent Ladeuil
 Groupcompress from brisbane-core.  | 
835  | 
|
836  | 
def _compress(self, key, bytes, max_delta_size, soft=False):  | 
|
837  | 
"""see _CommonGroupCompressor._compress"""  | 
|
838  | 
input_len = len(bytes)  | 
|
| 
3735.40.2
by John Arbash Meinel
 Add a groupcompress.encode_copy_instruction function.  | 
839  | 
new_lines = osutils.split_lines(bytes)  | 
| 
4241.6.6
by Robert Collins, John Arbash Meinel, Ian Clathworthy, Vincent Ladeuil
 Groupcompress from brisbane-core.  | 
840  | 
out_lines, index_lines = self._delta_index.make_delta(  | 
841  | 
new_lines, bytes_length=input_len, soft=soft)  | 
|
| 
3735.40.4
by John Arbash Meinel
 Factor out tests that rely on the exact bytecode.  | 
842  | 
delta_length = sum(map(len, out_lines))  | 
| 
4241.6.6
by Robert Collins, John Arbash Meinel, Ian Clathworthy, Vincent Ladeuil
 Groupcompress from brisbane-core.  | 
843  | 
if delta_length > max_delta_size:  | 
| 
3735.40.4
by John Arbash Meinel
 Factor out tests that rely on the exact bytecode.  | 
844  | 
            # The delta exceeds max_delta_size, so insert a fulltext instead
 | 
845  | 
type = 'fulltext'  | 
|
| 
4241.6.6
by Robert Collins, John Arbash Meinel, Ian Clathworthy, Vincent Ladeuil
 Groupcompress from brisbane-core.  | 
846  | 
out_lines = ['f', encode_base128_int(input_len)]  | 
| 
3735.40.4
by John Arbash Meinel
 Factor out tests that rely on the exact bytecode.  | 
847  | 
out_lines.extend(new_lines)  | 
848  | 
index_lines = [False, False]  | 
|
849  | 
index_lines.extend([True] * len(new_lines))  | 
|
850  | 
else:  | 
|
851  | 
            # this is a worthy delta, output it
 | 
|
852  | 
type = 'delta'  | 
|
853  | 
out_lines[0] = 'd'  | 
|
854  | 
            # Update the delta_length to include those two encoded integers
 | 
|
855  | 
out_lines[1] = encode_base128_int(delta_length)  | 
|
| 
4241.6.6
by Robert Collins, John Arbash Meinel, Ian Clathworthy, Vincent Ladeuil
 Groupcompress from brisbane-core.  | 
856  | 
        # Before insertion
 | 
857  | 
start = self.endpoint  | 
|
858  | 
chunk_start = len(self.chunks)  | 
|
| 
4241.17.2
by John Arbash Meinel
 PythonGroupCompressor needs to support pop_last() properly.  | 
859  | 
self._last = (chunk_start, self.endpoint)  | 
| 
3735.40.17
by John Arbash Meinel
 Change the attribute from 'lines' to 'chunks' to make it more  | 
860  | 
self._delta_index.extend_lines(out_lines, index_lines)  | 
861  | 
self.endpoint = self._delta_index.endpoint  | 
|
| 
4241.6.6
by Robert Collins, John Arbash Meinel, Ian Clathworthy, Vincent Ladeuil
 Groupcompress from brisbane-core.  | 
862  | 
self.input_bytes += input_len  | 
863  | 
chunk_end = len(self.chunks)  | 
|
| 
3735.40.18
by John Arbash Meinel
 Get rid of the entries dict in GroupCompressBlock.  | 
864  | 
self.labels_deltas[key] = (start, chunk_start,  | 
865  | 
self.endpoint, chunk_end)  | 
|
| 
4241.6.6
by Robert Collins, John Arbash Meinel, Ian Clathworthy, Vincent Ladeuil
 Groupcompress from brisbane-core.  | 
866  | 
return start, self.endpoint, type  | 
867  | 
||
868  | 
||
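The 'f'/'d' mini-headers above store lengths as a 7-bit little-endian varint; a pure-Python sketch of the encoding (the module's encode_base128_int is assumed to behave like this):

    def encode_base128(val):
        # 7 data bits per byte, least significant group first; the high
        # bit marks a continuation byte
        out = []
        while val >= 0x80:
            out.append(chr((val & 0x7F) | 0x80))
            val >>= 7
        out.append(chr(val))
        return ''.join(out)

    # encode_base128(300) == '\xac\x02'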
869  | 
class PyrexGroupCompressor(_CommonGroupCompressor):  | 
|
| 
0.17.3
by Robert Collins
 new encoder, allows non monotonically increasing sequence matches for moar compression.  | 
870  | 
"""Produce a serialised group of compressed texts.  | 
| 
0.23.6
by John Arbash Meinel
 Start stripping out the actual GroupCompressor  | 
871  | 
|
| 
0.17.3
by Robert Collins
 new encoder, allows non monotonically increasing sequence matches for moar compression.  | 
872  | 
    It contains code very similar to SequenceMatcher because it performs a similar
 | 
873  | 
    task. However, some key differences apply:
 | 
|
874  | 
     - there is no junk, we want a minimal edit not a human readable diff.
 | 
|
875  | 
     - we don't filter very common lines (because we don't know where a good
 | 
|
876  | 
       range will start, and after the first text we want to be emitting minimal
 | 
|
877  | 
       edits only).
 | 
|
878  | 
     - we chain the left side, not the right side
 | 
|
879  | 
     - we incrementally update the adjacency matrix as new lines are provided.
 | 
|
880  | 
     - we look for matches in all of the left side, so the routine which does
 | 
|
881  | 
       the analogous task of find_longest_match does not need to filter on the
 | 
|
882  | 
       left side.
 | 
|
883  | 
    """
 | 
|
| 
0.17.2
by Robert Collins
 Core proof of concept working.  | 
884  | 
|
| 
3735.32.19
by John Arbash Meinel
 Get rid of the 'delta' flag to GroupCompressor. It didn't do anything anyway.  | 
885  | 
def __init__(self):  | 
| 
3735.40.4
by John Arbash Meinel
 Factor out tests that rely on the exact bytecode.  | 
886  | 
super(PyrexGroupCompressor, self).__init__()  | 
| 
4241.6.6
by Robert Collins, John Arbash Meinel, Ian Clathworthy, Vincent Ladeuil
 Groupcompress from brisbane-core.  | 
887  | 
self._delta_index = DeltaIndex()  | 
| 
0.23.6
by John Arbash Meinel
 Start stripping out the actual GroupCompressor  | 
888  | 
|
| 
4241.6.6
by Robert Collins, John Arbash Meinel, Ian Clathworthy, Vincent Ladeuil
 Groupcompress from brisbane-core.  | 
889  | 
def _compress(self, key, bytes, max_delta_size, soft=False):  | 
890  | 
"""see _CommonGroupCompressor._compress"""  | 
|
| 
0.23.52
by John Arbash Meinel
 Use the max_delta flag.  | 
891  | 
input_len = len(bytes)  | 
| 
0.23.12
by John Arbash Meinel
 Add a 'len:' field to the data.  | 
892  | 
        # By having action/label/sha1/len, we can parse the group if the index
 | 
893  | 
        # was ever destroyed, we have the key in 'label', we know the final
 | 
|
894  | 
        # bytes are valid from sha1, and we know where to find the end of this
 | 
|
895  | 
        # record because of 'len'. (the delta record itself will store the
 | 
|
896  | 
        # total length for the expanded record)
 | 
|
| 
0.23.13
by John Arbash Meinel
 Factor out the ability to have/not have labels.  | 
897  | 
        # 'len: %d\n' costs approximately 1% increase in total data
 | 
898  | 
        # Having the labels at all costs us 9-10% increase, 38% increase for
 | 
|
899  | 
        # inventory pages, and 5.8% increase for text pages
 | 
|
| 
0.25.6
by John Arbash Meinel
 (tests broken) implement the basic ability to have a separate header  | 
900  | 
        # new_chunks = ['label:%s\nsha1:%s\n' % (label, sha1)]
 | 
| 
0.23.33
by John Arbash Meinel
 Fix a bug when handling multiple large-range copies.  | 
901  | 
if self._delta_index._source_offset != self.endpoint:  | 
902  | 
raise AssertionError('_source_offset != endpoint'  | 
|
903  | 
                ' somehow the DeltaIndex got out of sync with'
 | 
|
904  | 
' the output lines')  | 
|
| 
0.23.52
by John Arbash Meinel
 Use the max_delta flag.  | 
905  | 
delta = self._delta_index.make_delta(bytes, max_delta_size)  | 
906  | 
if delta is None:  | 
|
| 
0.25.10
by John Arbash Meinel
 Play around with detecting compression breaks.  | 
907  | 
type = 'fulltext'  | 
| 
0.17.36
by John Arbash Meinel
 Adding a mini-len to the delta/fulltext bytes  | 
908  | 
enc_length = encode_base128_int(len(bytes))  | 
909  | 
len_mini_header = 1 + len(enc_length)  | 
|
910  | 
self._delta_index.add_source(bytes, len_mini_header)  | 
|
911  | 
new_chunks = ['f', enc_length, bytes]  | 
|
| 
0.23.9
by John Arbash Meinel
 We now basically have full support for using diff-delta as the compressor.  | 
912  | 
else:  | 
| 
0.25.10
by John Arbash Meinel
 Play around with detecting compression breaks.  | 
913  | 
type = 'delta'  | 
| 
0.17.36
by John Arbash Meinel
 Adding a mini-len to the delta/fulltext bytes  | 
914  | 
enc_length = encode_base128_int(len(delta))  | 
915  | 
len_mini_header = 1 + len(enc_length)  | 
|
916  | 
new_chunks = ['d', enc_length, delta]  | 
|
| 
3735.38.5
by John Arbash Meinel
 A bit of testing showed that _FAST=True was actually *slower*.  | 
917  | 
self._delta_index.add_delta_source(delta, len_mini_header)  | 
| 
3735.40.18
by John Arbash Meinel
 Get rid of the entries dict in GroupCompressBlock.  | 
918  | 
        # Before insertion
 | 
919  | 
start = self.endpoint  | 
|
920  | 
chunk_start = len(self.chunks)  | 
|
921  | 
        # Now output these bytes
 | 
|
| 
3735.40.17
by John Arbash Meinel
 Change the attribute from 'lines' to 'chunks' to make it more  | 
922  | 
self._output_chunks(new_chunks)  | 
| 
0.23.6
by John Arbash Meinel
 Start stripping out the actual GroupCompressor  | 
923  | 
self.input_bytes += input_len  | 
| 
3735.40.18
by John Arbash Meinel
 Get rid of the entries dict in GroupCompressBlock.  | 
924  | 
chunk_end = len(self.chunks)  | 
925  | 
self.labels_deltas[key] = (start, chunk_start,  | 
|
926  | 
self.endpoint, chunk_end)  | 
|
| 
0.23.29
by John Arbash Meinel
 Forgot to add the delta bytes to the index objects.  | 
927  | 
if self._delta_index._source_offset != self.endpoint:  | 
928  | 
raise AssertionError('the delta index is out of sync'  | 
|
929  | 
' with the output lines %s != %s'  | 
|
930  | 
% (self._delta_index._source_offset, self.endpoint))  | 
|
| 
4241.6.6
by Robert Collins, John Arbash Meinel, Ian Clathworthy, Vincent Ladeuil
 Groupcompress from brisbane-core.  | 
931  | 
return start, self.endpoint, type  | 
| 
0.17.2
by Robert Collins
 Core proof of concept working.  | 
932  | 
|
| 
3735.40.17
by John Arbash Meinel
 Change the attribute from 'lines' to 'chunks' to make it more  | 
933  | 
def _output_chunks(self, new_chunks):  | 
| 
0.23.9
by John Arbash Meinel
 We now basically have full support for using diff-delta as the compressor.  | 
934  | 
"""Output some chunks.  | 
935  | 
||
936  | 
        :param new_chunks: The chunks to output.
 | 
|
937  | 
        """
 | 
|
| 
3735.40.17
by John Arbash Meinel
 Change the attribute from 'lines' to 'chunks' to make it more  | 
938  | 
self._last = (len(self.chunks), self.endpoint)  | 
| 
0.17.12
by Robert Collins
 Encode copy ranges as bytes not lines, halves decode overhead.  | 
939  | 
endpoint = self.endpoint  | 
| 
3735.40.17
by John Arbash Meinel
 Change the attribute from 'lines' to 'chunks' to make it more  | 
940  | 
self.chunks.extend(new_chunks)  | 
| 
0.23.9
by John Arbash Meinel
 We now basically have full support for using diff-delta as the compressor.  | 
941  | 
endpoint += sum(map(len, new_chunks))  | 
| 
0.17.12
by Robert Collins
 Encode copy ranges as bytes not lines, halves decode overhead.  | 
942  | 
self.endpoint = endpoint  | 
| 
0.17.3
by Robert Collins
 new encoder, allows non monotonically increasing sequence matches for moar compression.  | 
943  | 
|
| 
0.17.11
by Robert Collins
 Add extraction of just-compressed texts to support converting from knits.  | 
944  | 
|
| 
4465.2.4
by Aaron Bentley
 Switch between warn and raise depending on inconsistent_fatal.  | 
945  | 
def make_pack_factory(graph, delta, keylength, inconsistency_fatal=True):  | 
| 
0.17.1
by Robert Collins
 Starting point. Interface tests hooked up and failing.  | 
946  | 
"""Create a factory for creating a pack based groupcompress.  | 
947  | 
||
948  | 
    This is only functional enough to run interface tests; it doesn't try to
 | 
|
949  | 
    provide a full pack environment.
 | 
|
| 
3735.31.2
by John Arbash Meinel
 Cleanup trailing whitespace, get test_source to pass by removing asserts.  | 
950  | 
|
| 
0.17.1
by Robert Collins
 Starting point. Interface tests hooked up and failing.  | 
951  | 
    :param graph: Store a graph.
 | 
952  | 
    :param delta: Delta compress contents.
 | 
|
953  | 
    :param keylength: How long keys should be.
 | 
|
954  | 
    """
 | 
|
955  | 
def factory(transport):  | 
|
| 
3735.32.2
by John Arbash Meinel
 The 'delta' flag has no effect on the content (all GC is delta'd),  | 
956  | 
parents = graph  | 
| 
0.17.1
by Robert Collins
 Starting point. Interface tests hooked up and failing.  | 
957  | 
ref_length = 0  | 
958  | 
if graph:  | 
|
| 
0.20.29
by Ian Clatworthy
 groupcompress.py code cleanups  | 
959  | 
ref_length = 1  | 
| 
0.17.7
by Robert Collins
 Update for current index2 changes.  | 
960  | 
graph_index = BTreeBuilder(reference_lists=ref_length,  | 
| 
0.17.1
by Robert Collins
 Starting point. Interface tests hooked up and failing.  | 
961  | 
key_elements=keylength)  | 
962  | 
stream = transport.open_write_stream('newpack')  | 
|
963  | 
writer = pack.ContainerWriter(stream.write)  | 
|
964  | 
writer.begin()  | 
|
965  | 
index = _GCGraphIndex(graph_index, lambda:True, parents=parents,  | 
|
| 
4465.2.4
by Aaron Bentley
 Switch between warn and raise depending on inconsistent_fatal.  | 
966  | 
add_callback=graph_index.add_nodes,  | 
967  | 
inconsistency_fatal=inconsistency_fatal)  | 
|
| 
4343.3.21
by John Arbash Meinel
 Implement get_missing_parents in terms of _KeyRefs.  | 
968  | 
access = knit._DirectPackAccess({})  | 
| 
0.17.1
by Robert Collins
 Starting point. Interface tests hooked up and failing.  | 
969  | 
access.set_writer(writer, graph_index, (transport, 'newpack'))  | 
| 
0.17.2
by Robert Collins
 Core proof of concept working.  | 
970  | 
result = GroupCompressVersionedFiles(index, access, delta)  | 
| 
0.17.1
by Robert Collins
 Starting point. Interface tests hooked up and failing.  | 
971  | 
result.stream = stream  | 
972  | 
result.writer = writer  | 
|
973  | 
return result  | 
|
974  | 
return factory  | 
|
975  | 
||
976  | 
||
977  | 
def cleanup_pack_group(versioned_files):  | 
|
| 
0.17.23
by Robert Collins
 Only decompress as much of the zlib data as is needed to read the text recipe.  | 
978  | 
versioned_files.writer.end()  | 
| 
0.17.1
by Robert Collins
 Starting point. Interface tests hooked up and failing.  | 
979  | 
versioned_files.stream.close()  | 
980  | 
||
981  | 
||
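A usage sketch of the factory and its cleanup helper, assuming bzrlib's in-memory transport:

    from bzrlib import transport as _mod_transport

    t = _mod_transport.get_transport('memory:///')
    factory = make_pack_factory(graph=True, delta=False, keylength=1)
    vf = factory(t)          # a GroupCompressVersionedFiles over 'newpack'
    # ... add texts, read record streams ...
    cleanup_pack_group(vf)   # ends the pack writer and closes the stream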
982  | 
class GroupCompressVersionedFiles(VersionedFiles):  | 
|
983  | 
"""A group-compress based VersionedFiles implementation."""  | 
|
984  | 
||
| 
0.17.2
by Robert Collins
 Core proof of concept working.  | 
985  | 
def __init__(self, index, access, delta=True):  | 
| 
0.17.1
by Robert Collins
 Starting point. Interface tests hooked up and failing.  | 
986  | 
"""Create a GroupCompressVersionedFiles object.  | 
987  | 
||
988  | 
        :param index: The index object storing access and graph data.
 | 
|
989  | 
        :param access: The access object storing raw data.
 | 
|
| 
0.17.2
by Robert Collins
 Core proof of concept working.  | 
990  | 
        :param delta: Whether to delta compress or just entropy compress.
 | 
991  | 
        """
 | 
|
992  | 
self._index = index  | 
|
993  | 
self._access = access  | 
|
994  | 
self._delta = delta  | 
|
| 
0.17.11
by Robert Collins
 Add extraction of just-compressed texts to support converting from knits.  | 
995  | 
self._unadded_refs = {}  | 
| 
0.17.24
by Robert Collins
 Add a group cache to decompression, 5 times faster than knit at decompression when accessing everything in a group.  | 
996  | 
self._group_cache = LRUSizeCache(max_size=50*1024*1024)  | 
| 
3735.31.7
by John Arbash Meinel
 Start bringing in stacking support for Groupcompress repos.  | 
997  | 
self._fallback_vfs = []  | 
| 
0.17.2
by Robert Collins
 Core proof of concept working.  | 
998  | 
|
999  | 
def add_lines(self, key, parents, lines, parent_texts=None,  | 
|
1000  | 
left_matching_blocks=None, nostore_sha=None, random_id=False,  | 
|
1001  | 
check_content=True):  | 
|
1002  | 
"""Add a text to the store.  | 
|
1003  | 
||
1004  | 
        :param key: The key tuple of the text to add.
 | 
|
1005  | 
        :param parents: The parents key tuples of the text to add.
 | 
|
1006  | 
        :param lines: A list of lines. Each line must be a bytestring. And all
 | 
|
1007  | 
            of them except the last must be terminated with \n and contain no
 | 
|
1008  | 
            other \n's. The last line may either contain no \n's or a single
 | 
|
1009  | 
            terminating \n. If the lines list does not meet this constraint the add
 | 
|
1010  | 
            routine may error or may succeed - but you will be unable to read
 | 
|
1011  | 
            the data back accurately. (Checking the lines have been split
 | 
|
1012  | 
            correctly is expensive and extremely unlikely to catch bugs so it
 | 
|
1013  | 
            is not done at runtime unless check_content is True.)
 | 
|
| 
3735.31.2
by John Arbash Meinel
 Cleanup trailing whitespace, get test_source to pass by removing asserts.  | 
1014  | 
        :param parent_texts: An optional dictionary containing the opaque
 | 
| 
0.17.2
by Robert Collins
 Core proof of concept working.  | 
1015  | 
            representations of some or all of the parents of version_id to
 | 
1016  | 
            allow delta optimisations.  VERY IMPORTANT: the texts must be those
 | 
|
1017  | 
            returned by add_lines or data corruption can be caused.
 | 
|
1018  | 
        :param left_matching_blocks: a hint about which areas are common
 | 
|
1019  | 
            between the text and its left-hand-parent.  The format is
 | 
|
1020  | 
            the SequenceMatcher.get_matching_blocks format.
 | 
|
1021  | 
        :param nostore_sha: Raise ExistingContent and do not add the lines to
 | 
|
1022  | 
            the versioned file if the digest of the lines matches this.
 | 
|
1023  | 
        :param random_id: If True a random id has been selected rather than
 | 
|
1024  | 
            an id determined by some deterministic process such as a converter
 | 
|
1025  | 
            from a foreign VCS. When True the backend may choose not to check
 | 
|
1026  | 
            for uniqueness of the resulting key within the versioned file, so
 | 
|
1027  | 
            this should only be done when the result is expected to be unique
 | 
|
1028  | 
            anyway.
 | 
|
1029  | 
        :param check_content: If True, the lines supplied are verified to be
 | 
|
1030  | 
            bytestrings that are correctly formed lines.
 | 
|
1031  | 
        :return: The text sha1, the number of bytes in the text, and an opaque
 | 
|
1032  | 
                 representation of the inserted version which can be provided
 | 
|
1033  | 
                 back to future add_lines calls in the parent_texts dictionary.
 | 
|
1034  | 
        """
 | 
|
1035  | 
self._index._check_write_ok()  | 
|
1036  | 
self._check_add(key, lines, random_id, check_content)  | 
|
1037  | 
if parents is None:  | 
|
1038  | 
            # The caller might pass None if there is no graph data, but kndx
 | 
|
1039  | 
            # indexes can't directly store that, so we give them
 | 
|
1040  | 
            # an empty tuple instead.
 | 
|
1041  | 
parents = ()  | 
|
1042  | 
        # double handling for now. Make it work until then.
 | 
|
| 
0.20.5
by John Arbash Meinel
 Finish the Fulltext => Chunked conversions so that we work in the more-efficient Chunks.  | 
1043  | 
length = sum(map(len, lines))  | 
1044  | 
record = ChunkedContentFactory(key, parents, None, lines)  | 
|
| 
3735.31.12
by John Arbash Meinel
 Push nostore_sha down through the stack.  | 
1045  | 
sha1 = list(self._insert_record_stream([record], random_id=random_id,  | 
1046  | 
nostore_sha=nostore_sha))[0]  | 
|
| 
0.20.5
by John Arbash Meinel
 Finish the Fulltext => Chunked conversions so that we work in the more-efficient Chunks.  | 
1047  | 
return sha1, length, None  | 
| 
0.17.2
by Robert Collins
 Core proof of concept working.  | 
1048  | 
|
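A usage sketch of the constraints documented above, with hypothetical keys ('vf' being a GroupCompressVersionedFiles, e.g. from the factory earlier):

    sha1, num_bytes, _ = vf.add_lines(
        ('rev-1',), (), ['line one\n', 'line two\n'])
    record = vf.get_record_stream([('rev-1',)], 'unordered', True).next()
    text = record.get_bytes_as('fulltext')   # 'line one\nline two\n'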
| 
4398.8.6
by John Arbash Meinel
 Switch the api from VF.add_text to VF._add_text and trim some extra 'features'.  | 
1049  | 
def _add_text(self, key, parents, text, nostore_sha=None, random_id=False):  | 
| 
4398.9.1
by Matt Nordhoff
 Update _add_text docstrings that still referred to add_text.  | 
1050  | 
"""See VersionedFiles._add_text()."""  | 
| 
4398.8.4
by John Arbash Meinel
 Implement add_text for GroupCompressVersionedFiles  | 
1051  | 
self._index._check_write_ok()  | 
1052  | 
self._check_add(key, None, random_id, check_content=False)  | 
|
1053  | 
if text.__class__ is not str:  | 
|
1054  | 
raise errors.BzrBadParameterUnicode("text")  | 
|
1055  | 
if parents is None:  | 
|
1056  | 
            # The caller might pass None if there is no graph data, but kndx
 | 
|
1057  | 
            # indexes can't directly store that, so we give them
 | 
|
1058  | 
            # an empty tuple instead.
 | 
|
1059  | 
parents = ()  | 
|
1060  | 
        # double handling for now. Make it work until then.
 | 
|
1061  | 
length = len(text)  | 
|
1062  | 
record = FulltextContentFactory(key, parents, None, text)  | 
|
1063  | 
sha1 = list(self._insert_record_stream([record], random_id=random_id,  | 
|
1064  | 
nostore_sha=nostore_sha))[0]  | 
|
1065  | 
return sha1, length, None  | 
|
1066  | 
||
| 
3735.31.7
by John Arbash Meinel
 Start bringing in stacking support for Groupcompress repos.  | 
1067  | 
def add_fallback_versioned_files(self, a_versioned_files):  | 
1068  | 
"""Add a source of texts for texts not present in this knit.  | 
|
1069  | 
||
1070  | 
        :param a_versioned_files: A VersionedFiles object.
 | 
|
1071  | 
        """
 | 
|
1072  | 
self._fallback_vfs.append(a_versioned_files)  | 
|
1073  | 
||
| 
0.17.4
by Robert Collins
 Annotate.  | 
1074  | 
def annotate(self, key):  | 
1075  | 
"""See VersionedFiles.annotate."""  | 
|
1076  | 
graph = Graph(self)  | 
|
| 
0.17.5
by Robert Collins
 nograph tests completely passing.  | 
1077  | 
parent_map = self.get_parent_map([key])  | 
1078  | 
if not parent_map:  | 
|
1079  | 
raise errors.RevisionNotPresent(key, self)  | 
|
1080  | 
if parent_map[key] is not None:  | 
|
| 
4371.3.18
by John Arbash Meinel
 Change VF.annotate to use the new KnownGraph code.  | 
1081  | 
parent_map = dict((k, v) for k, v in graph.iter_ancestry([key])  | 
1082  | 
if v is not None)  | 
|
1083  | 
keys = parent_map.keys()  | 
|
| 
0.17.5
by Robert Collins
 nograph tests completely passing.  | 
1084  | 
else:  | 
1085  | 
keys = [key]  | 
|
1086  | 
parent_map = {key:()}  | 
|
| 
4371.3.30
by John Arbash Meinel
 Clean up the annotate code while using the new functionality.  | 
1087  | 
        # We used Graph(self) to load the parent_map, but now that we have it,
 | 
1088  | 
        # we can just query the parent map directly, so create a KnownGraph
 | 
|
1089  | 
heads_provider = _mod_graph.KnownGraph(parent_map)  | 
|
| 
0.17.4
by Robert Collins
 Annotate.  | 
1090  | 
parent_cache = {}  | 
1091  | 
reannotate = annotate.reannotate  | 
|
1092  | 
for record in self.get_record_stream(keys, 'topological', True):  | 
|
1093  | 
key = record.key  | 
|
| 
4371.2.1
by Vincent Ladeuil
 Start fixing annotate for gc.  | 
1094  | 
lines = osutils.chunks_to_lines(record.get_bytes_as('chunked'))  | 
| 
0.17.4
by Robert Collins
 Annotate.  | 
1095  | 
parent_lines = [parent_cache[parent] for parent in parent_map[key]]  | 
1096  | 
parent_cache[key] = list(  | 
|
| 
4371.3.30
by John Arbash Meinel
 Clean up the annotate code while using the new functionality.  | 
1097  | 
reannotate(parent_lines, lines, key, None, heads_provider))  | 
| 
0.17.4
by Robert Collins
 Annotate.  | 
1098  | 
return parent_cache[key]  | 
1099  | 
||
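A usage sketch with a hypothetical key; the result is a list of (origin_key, line) pairs, one per line of the requested text:

    for origin, line in vf.annotate(('file-id', 'rev-2')):
        print '%s: %s' % (origin[-1], line),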
| 
0.17.5
by Robert Collins
 nograph tests completely passing.  | 
1100  | 
def check(self, progress_bar=None):  | 
1101  | 
"""See VersionedFiles.check()."""  | 
|
1102  | 
keys = self.keys()  | 
|
1103  | 
for record in self.get_record_stream(keys, 'unordered', True):  | 
|
1104  | 
record.get_bytes_as('fulltext')  | 
|
1105  | 
||
| 
0.17.2
by Robert Collins
 Core proof of concept working.  | 
1106  | 
def _check_add(self, key, lines, random_id, check_content):  | 
1107  | 
"""check that version_id and lines are safe to add."""  | 
|
1108  | 
version_id = key[-1]  | 
|
| 
0.17.26
by Robert Collins
 Working better --gc-plain-chk.  | 
1109  | 
if version_id is not None:  | 
| 
4241.6.6
by Robert Collins, John Arbash Meinel, Ian Clathworthy, Vincent Ladeuil
 Groupcompress from brisbane-core.  | 
1110  | 
if osutils.contains_whitespace(version_id):  | 
| 
3735.31.1
by John Arbash Meinel
 Bring the groupcompress plugin into the brisbane-core branch.  | 
1111  | 
raise errors.InvalidRevisionId(version_id, self)  | 
| 
0.17.2
by Robert Collins
 Core proof of concept working.  | 
1112  | 
self.check_not_reserved_id(version_id)  | 
1113  | 
        # TODO: If random_id==False and the key is already present, we should
 | 
|
1114  | 
        # probably check that the existing content is identical to what is
 | 
|
1115  | 
        # being inserted, and otherwise raise an exception.  This would make
 | 
|
1116  | 
        # the bundle code simpler.
 | 
|
1117  | 
if check_content:  | 
|
1118  | 
self._check_lines_not_unicode(lines)  | 
|
1119  | 
self._check_lines_are_lines(lines)  | 
|
1120  | 
||
| 
0.17.5
by Robert Collins
 nograph tests completely passing.  | 
1121  | 
def get_parent_map(self, keys):  | 
| 
3735.31.7
by John Arbash Meinel
 Start bringing in stacking support for Groupcompress repos.  | 
1122  | 
"""Get a map of the graph parents of keys.  | 
| 
0.17.5
by Robert Collins
 nograph tests completely passing.  | 
1123  | 
|
1124  | 
        :param keys: The keys to look up parents for.
 | 
|
1125  | 
        :return: A mapping from keys to parents. Absent keys are absent from
 | 
|
1126  | 
            the mapping.
 | 
|
1127  | 
        """
 | 
|
| 
3735.31.7
by John Arbash Meinel
 Start bringing in stacking support for Groupcompress repos.  | 
1128  | 
return self._get_parent_map_with_sources(keys)[0]  | 
1129  | 
||
1130  | 
def _get_parent_map_with_sources(self, keys):  | 
|
1131  | 
"""Get a map of the parents of keys.  | 
|
1132  | 
||
1133  | 
        :param keys: The keys to look up parents for.
 | 
|
1134  | 
        :return: A tuple. The first element is a mapping from keys to parents.
 | 
|
1135  | 
            Absent keys are absent from the mapping. The second element is a
 | 
|
1136  | 
            list with the locations each key was found in. The first element
 | 
|
1137  | 
            is the in-this-knit parents, the second the first fallback source,
 | 
|
1138  | 
            and so on.
 | 
|
1139  | 
        """
 | 
|
| 
0.17.5
by Robert Collins
 nograph tests completely passing.  | 
1140  | 
result = {}  | 
| 
3735.31.7
by John Arbash Meinel
 Start bringing in stacking support for Groupcompress repos.  | 
1141  | 
sources = [self._index] + self._fallback_vfs  | 
| 
0.17.5
by Robert Collins
 nograph tests completely passing.  | 
1142  | 
source_results = []  | 
1143  | 
missing = set(keys)  | 
|
1144  | 
for source in sources:  | 
|
1145  | 
if not missing:  | 
|
1146  | 
                break
 | 
|
1147  | 
new_result = source.get_parent_map(missing)  | 
|
1148  | 
source_results.append(new_result)  | 
|
1149  | 
result.update(new_result)  | 
|
1150  | 
missing.difference_update(set(new_result))  | 
|
| 
3735.31.7
by John Arbash Meinel
 Start bringing in stacking support for Groupcompress repos.  | 
1151  | 
return result, source_results  | 
| 
0.17.5
by Robert Collins
 nograph tests completely passing.  | 
1152  | 
|
| 
0.25.6
by John Arbash Meinel
 (tests broken) implement the basic ability to have a separate header  | 
1153  | 
def _get_block(self, index_memo):  | 
| 
0.20.14
by John Arbash Meinel
 Factor out _get_group_and_delta_lines.  | 
1154  | 
read_memo = index_memo[0:3]  | 
1155  | 
        # get the group:
 | 
|
1156  | 
try:  | 
|
| 
0.25.6
by John Arbash Meinel
 (tests broken) implement the basic ability to have a separate header  | 
1157  | 
block = self._group_cache[read_memo]  | 
| 
0.20.14
by John Arbash Meinel
 Factor out _get_group_and_delta_lines.  | 
1158  | 
except KeyError:  | 
1159  | 
            # read the group
 | 
|
1160  | 
zdata = self._access.get_raw_records([read_memo]).next()  | 
|
1161  | 
            # decompress - whole thing - this is not a bug, as it
 | 
|
1162  | 
            # permits caching. We might want to store the partially
 | 
|
1163  | 
            # decompressed group and decompress object, so that recent
 | 
|
1164  | 
            # texts are not penalised by big groups.
 | 
|
| 
0.25.6
by John Arbash Meinel
 (tests broken) implement the basic ability to have a separate header  | 
1165  | 
block = GroupCompressBlock.from_bytes(zdata)  | 
1166  | 
self._group_cache[read_memo] = block  | 
|
| 
0.20.14
by John Arbash Meinel
 Factor out _get_group_and_delta_lines.  | 
1167  | 
        # cheapo debugging:
 | 
1168  | 
        # print len(zdata), len(plain)
 | 
|
1169  | 
        # parse - requires split_lines, better to have byte offsets
 | 
|
1170  | 
        # here (but not by much - we only split the region for the
 | 
|
1171  | 
        # recipe, and we often want to end up with lines anyway).
 | 
|
| 
0.25.6
by John Arbash Meinel
 (tests broken) implement the basic ability to have a separate header  | 
1172  | 
return block  | 
| 
0.20.14
by John Arbash Meinel
 Factor out _get_group_and_delta_lines.  | 
1173  | 
|
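A sketch of the memo layout this cache keys on (hypothetical numbers): the first three index_memo elements identify the group to read, while elements 3 and 4 locate one record inside the decompressed group, as used by _get_remaining_record_stream below:

    # graph_index here is a hypothetical index object
    index_memo = (graph_index, 12288, 4096, 0, 900)
    read_memo = index_memo[0:3]    # cache key: (index, group start, length)
    start, end = index_memo[3:5]   # one record's slice within the group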
| 
0.20.18
by John Arbash Meinel
 Implement new handling of get_bytes_as(), and get_missing_compression_parent_keys()  | 
1174  | 
def get_missing_compression_parent_keys(self):  | 
1175  | 
"""Return the keys of missing compression parents.  | 
|
1176  | 
||
1177  | 
        Missing compression parents occur when a record stream was missing
 | 
|
1178  | 
        basis texts, or an index was scanned that had missing basis texts.
 | 
|
1179  | 
        """
 | 
|
1180  | 
        # GroupCompress cannot currently reference texts that are not in the
 | 
|
1181  | 
        # group, so this is valid for now
 | 
|
1182  | 
return frozenset()  | 
|
1183  | 
||
| 
0.17.5
by Robert Collins
 nograph tests completely passing.  | 
1184  | 
def get_record_stream(self, keys, ordering, include_delta_closure):  | 
1185  | 
"""Get a stream of records for keys.  | 
|
1186  | 
||
1187  | 
        :param keys: The keys to include.
 | 
|
1188  | 
        :param ordering: Either 'unordered' or 'topological'. A topologically
 | 
|
1189  | 
            sorted stream has compression parents strictly before their
 | 
|
1190  | 
            children.
 | 
|
1191  | 
        :param include_delta_closure: If True then the closure across any
 | 
|
1192  | 
            compression parents will be included (in the opaque data).
 | 
|
1193  | 
        :return: An iterator of ContentFactory objects, each of which is only
 | 
|
1194  | 
            valid until the iterator is advanced.
 | 
|
1195  | 
        """
 | 
|
1196  | 
        # keys might be a generator
 | 
|
| 
0.22.6
by John Arbash Meinel
 Clustering chk pages properly makes a big difference.  | 
1197  | 
orig_keys = list(keys)  | 
| 
3735.31.18
by John Arbash Meinel
 Implement stacking support across all ordering implementations.  | 
1198  | 
keys = set(keys)  | 
| 
0.17.5
by Robert Collins
 nograph tests completely passing.  | 
1199  | 
if not keys:  | 
1200  | 
            return
 | 
|
| 
0.20.23
by John Arbash Meinel
 Add a progress indicator for chk pages.  | 
1201  | 
if (not self._index.has_graph  | 
| 
3735.31.14
by John Arbash Meinel
 Change the gc-optimal to 'groupcompress'  | 
1202  | 
and ordering in ('topological', 'groupcompress')):  | 
| 
0.17.5
by Robert Collins
 nograph tests completely passing.  | 
1203  | 
            # Cannot topologically order when no graph has been stored.
 | 
| 
3735.31.18
by John Arbash Meinel
 Implement stacking support across all ordering implementations.  | 
1204  | 
            # but we allow 'as-requested' or 'unordered'
 | 
| 
0.17.5
by Robert Collins
 nograph tests completely passing.  | 
1205  | 
ordering = 'unordered'  | 
| 
3735.31.18
by John Arbash Meinel
 Implement stacking support across all ordering implementations.  | 
1206  | 
|
1207  | 
remaining_keys = keys  | 
|
1208  | 
while True:  | 
|
1209  | 
try:  | 
|
1210  | 
keys = set(remaining_keys)  | 
|
1211  | 
for content_factory in self._get_remaining_record_stream(keys,  | 
|
1212  | 
orig_keys, ordering, include_delta_closure):  | 
|
1213  | 
remaining_keys.discard(content_factory.key)  | 
|
1214  | 
yield content_factory  | 
|
1215  | 
                return
 | 
|
1216  | 
except errors.RetryWithNewPacks, e:  | 
|
1217  | 
self._access.reload_or_raise(e)  | 
|
1218  | 
||
1219  | 
def _find_from_fallback(self, missing):  | 
|
1220  | 
"""Find whatever keys you can from the fallbacks.  | 
|
1221  | 
||
1222  | 
        :param missing: A set of missing keys. This set will be mutated as keys
 | 
|
1223  | 
            are found from a fallback_vfs
 | 
|
1224  | 
        :return: (parent_map, key_to_source_map, source_results)
 | 
|
1225  | 
            parent_map  the overall key => parent_keys
 | 
|
1226  | 
            key_to_source_map   a dict from {key: source}
 | 
|
1227  | 
            source_results      a list of (source, keys) tuples
 | 
|
1228  | 
        """
 | 
|
1229  | 
parent_map = {}  | 
|
1230  | 
key_to_source_map = {}  | 
|
1231  | 
source_results = []  | 
|
1232  | 
for source in self._fallback_vfs:  | 
|
1233  | 
if not missing:  | 
|
1234  | 
                break
 | 
|
1235  | 
source_parents = source.get_parent_map(missing)  | 
|
1236  | 
parent_map.update(source_parents)  | 
|
1237  | 
source_parents = list(source_parents)  | 
|
1238  | 
source_results.append((source, source_parents))  | 
|
1239  | 
key_to_source_map.update((key, source) for key in source_parents)  | 
|
1240  | 
missing.difference_update(source_parents)  | 
|
1241  | 
return parent_map, key_to_source_map, source_results  | 
|
1242  | 
||
1243  | 
def _get_ordered_source_keys(self, ordering, parent_map, key_to_source_map):  | 
|
1244  | 
"""Get the (source, [keys]) list.  | 
|
1245  | 
||
1246  | 
        The returned objects should be in the order defined by 'ordering',
 | 
|
1247  | 
        which can weave between different sources.
 | 
|
1248  | 
        :param ordering: Must be one of 'topological' or 'groupcompress'
 | 
|
1249  | 
        :return: List of [(source, [keys])] tuples, such that all keys are in
 | 
|
1250  | 
            the defined order, regardless of source.
 | 
|
1251  | 
        """
 | 
|
1252  | 
if ordering == 'topological':  | 
|
1253  | 
present_keys = topo_sort(parent_map)  | 
|
1254  | 
else:  | 
|
1255  | 
            # ordering == 'groupcompress'
 | 
|
1256  | 
            # XXX: This only optimizes for the target ordering. We may need
 | 
|
1257  | 
            #      to balance that with the time it takes to extract
 | 
|
1258  | 
            #      ordering, by somehow grouping based on
 | 
|
1259  | 
            #      locations[key][0:3]
 | 
|
1260  | 
present_keys = sort_gc_optimal(parent_map)  | 
|
1261  | 
        # Now group by source:
 | 
|
1262  | 
source_keys = []  | 
|
1263  | 
current_source = None  | 
|
1264  | 
for key in present_keys:  | 
|
1265  | 
source = key_to_source_map.get(key, self)  | 
|
1266  | 
if source is not current_source:  | 
|
1267  | 
source_keys.append((source, []))  | 
|
| 
3735.32.12
by John Arbash Meinel
 Add groupcompress-block[-ref] as valid stream types.  | 
1268  | 
current_source = source  | 
| 
3735.31.18
by John Arbash Meinel
 Implement stacking support across all ordering implementations.  | 
1269  | 
source_keys[-1][1].append(key)  | 
1270  | 
return source_keys  | 
|
1271  | 
||
1272  | 
def _get_as_requested_source_keys(self, orig_keys, locations, unadded_keys,  | 
|
1273  | 
key_to_source_map):  | 
|
1274  | 
source_keys = []  | 
|
1275  | 
current_source = None  | 
|
1276  | 
for key in orig_keys:  | 
|
1277  | 
if key in locations or key in unadded_keys:  | 
|
1278  | 
source = self  | 
|
1279  | 
elif key in key_to_source_map:  | 
|
1280  | 
source = key_to_source_map[key]  | 
|
1281  | 
else: # absent  | 
|
1282  | 
                continue
 | 
|
1283  | 
if source is not current_source:  | 
|
1284  | 
source_keys.append((source, []))  | 
|
| 
3735.32.12
by John Arbash Meinel
 Add groupcompress-block[-ref] as valid stream types.  | 
1285  | 
current_source = source  | 
| 
3735.31.18
by John Arbash Meinel
 Implement stacking support across all ordering implementations.  | 
1286  | 
source_keys[-1][1].append(key)  | 
1287  | 
return source_keys  | 
|
1288  | 
||
1289  | 
def _get_io_ordered_source_keys(self, locations, unadded_keys,  | 
|
1290  | 
source_result):  | 
|
1291  | 
def get_group(key):  | 
|
1292  | 
            # This is the group the bytes are stored in, followed by the
 | 
|
1293  | 
            # location in the group
 | 
|
1294  | 
return locations[key][0]  | 
|
1295  | 
present_keys = sorted(locations.iterkeys(), key=get_group)  | 
|
1296  | 
        # We don't have an ordering for keys in the in-memory object, but
 | 
|
1297  | 
        # let's process the in-memory ones first.
 | 
|
1298  | 
present_keys = list(unadded_keys) + present_keys  | 
|
1299  | 
        # Now grab all of the ones from other sources
 | 
|
1300  | 
source_keys = [(self, present_keys)]  | 
|
1301  | 
source_keys.extend(source_result)  | 
|
1302  | 
return source_keys  | 
|
1303  | 
||
1304  | 
def _get_remaining_record_stream(self, keys, orig_keys, ordering,  | 
|
1305  | 
include_delta_closure):  | 
|
1306  | 
"""Get a stream of records for keys.  | 
|
1307  | 
||
1308  | 
        :param keys: The keys to include.
 | 
|
1309  | 
        :param ordering: one of 'unordered', 'topological', 'groupcompress' or
 | 
|
1310  | 
            'as-requested'
 | 
|
1311  | 
        :param include_delta_closure: If True then the closure across any
 | 
|
1312  | 
            compression parents will be included (in the opaque data).
 | 
|
1313  | 
        :return: An iterator of ContentFactory objects, each of which is only
 | 
|
1314  | 
            valid until the iterator is advanced.
 | 
|
1315  | 
        """
 | 
|
| 
0.17.5
by Robert Collins
 nograph tests completely passing.  | 
1316  | 
        # Cheap: iterate
 | 
1317  | 
locations = self._index.get_build_details(keys)  | 
|
| 
3735.31.18
by John Arbash Meinel
 Implement stacking support across all ordering implementations.  | 
1318  | 
unadded_keys = set(self._unadded_refs).intersection(keys)  | 
1319  | 
missing = keys.difference(locations)  | 
|
1320  | 
missing.difference_update(unadded_keys)  | 
|
1321  | 
(fallback_parent_map, key_to_source_map,  | 
|
1322  | 
source_result) = self._find_from_fallback(missing)  | 
|
1323  | 
if ordering in ('topological', 'groupcompress'):  | 
|
| 
0.17.5
by Robert Collins
 nograph tests completely passing.  | 
1324  | 
            # would be better to not globally sort initially but instead
 | 
1325  | 
            # start with one key, recurse to its oldest parent, then grab
 | 
|
1326  | 
            # everything in the same group, etc.
 | 
|
1327  | 
parent_map = dict((key, details[2]) for key, details in  | 
|
1328  | 
locations.iteritems())  | 
|
| 
3735.31.18
by John Arbash Meinel
 Implement stacking support across all ordering implementations.  | 
1329  | 
for key in unadded_keys:  | 
1330  | 
parent_map[key] = self._unadded_refs[key]  | 
|
1331  | 
parent_map.update(fallback_parent_map)  | 
|
1332  | 
source_keys = self._get_ordered_source_keys(ordering, parent_map,  | 
|
1333  | 
key_to_source_map)  | 
|
| 
0.22.6
by John Arbash Meinel
 Clustering chk pages properly makes a big difference.  | 
1334  | 
elif ordering == 'as-requested':  | 
| 
3735.31.18
by John Arbash Meinel
 Implement stacking support across all ordering implementations.  | 
1335  | 
source_keys = self._get_as_requested_source_keys(orig_keys,  | 
1336  | 
locations, unadded_keys, key_to_source_map)  | 
|
| 
0.17.5
by Robert Collins
 nograph tests completely passing.  | 
1337  | 
else:  | 
| 
0.20.10
by John Arbash Meinel
 Change the extraction ordering for 'unordered'.  | 
1338  | 
            # We want to yield the keys in a semi-optimal (read-wise) ordering.
 | 
1339  | 
            # Otherwise we thrash the _group_cache and destroy performance
 | 
|
| 
3735.31.18
by John Arbash Meinel
 Implement stacking support across all ordering implementations.  | 
1340  | 
source_keys = self._get_io_ordered_source_keys(locations,  | 
1341  | 
unadded_keys, source_result)  | 
|
1342  | 
for key in missing:  | 
|
| 
0.17.5
by Robert Collins
 nograph tests completely passing.  | 
1343  | 
yield AbsentContentFactory(key)  | 
| 
3735.32.14
by John Arbash Meinel
 Move the tests over to testing the LazyGroupContentManager object.  | 
1344  | 
manager = None  | 
| 
3735.34.3
by John Arbash Meinel
 Cleanup, in preparation for merging to brisbane-core.  | 
1345  | 
last_read_memo = None  | 
| 
3735.32.15
by John Arbash Meinel
 Change the GroupCompressBlock code to allow not recording 'end'.  | 
1346  | 
        # TODO: This works fairly well at batching up existing groups into a
 | 
1347  | 
        #       streamable format, and possibly allowing for taking one big
 | 
|
1348  | 
        #       group and splitting it when it isn't fully utilized.
 | 
|
1349  | 
        #       However, it doesn't allow us to find under-utilized groups and
 | 
|
1350  | 
        #       combine them into a bigger group on the fly.
 | 
|
1351  | 
        #       (Consider the issue with how chk_map inserts texts
 | 
|
1352  | 
        #       one-at-a-time.) This could be done at insert_record_stream()
 | 
|
1353  | 
        #       time, but it probably would decrease the number of
 | 
|
1354  | 
        #       bytes-on-the-wire for fetch.
 | 
|
| 
3735.31.18
by John Arbash Meinel
 Implement stacking support across all ordering implementations.  | 
1355  | 
for source, keys in source_keys:  | 
1356  | 
if source is self:  | 
|
1357  | 
for key in keys:  | 
|
1358  | 
if key in self._unadded_refs:  | 
|
| 
3735.32.14
by John Arbash Meinel
 Move the tests over to testing the LazyGroupContentManager object.  | 
1359  | 
if manager is not None:  | 
1360  | 
for factory in manager.get_record_stream():  | 
|
| 
3735.32.12
by John Arbash Meinel
 Add groupcompress-block[-ref] as valid stream types.  | 
1361  | 
yield factory  | 
| 
3735.34.3
by John Arbash Meinel
 Cleanup, in preparation for merging to brisbane-core.  | 
1362  | 
last_read_memo = manager = None  | 
| 
3735.31.18
by John Arbash Meinel
 Implement stacking support across all ordering implementations.  | 
1363  | 
bytes, sha1 = self._compressor.extract(key)  | 
1364  | 
parents = self._unadded_refs[key]  | 
|
| 
3735.32.12
by John Arbash Meinel
 Add groupcompress-block[-ref] as valid stream types.  | 
1365  | 
yield FulltextContentFactory(key, parents, sha1, bytes)  | 
| 
3735.31.18
by John Arbash Meinel
 Implement stacking support across all ordering implementations.  | 
1366  | 
else:  | 
1367  | 
index_memo, _, parents, (method, _) = locations[key]  | 
|
| 
3735.34.1
by John Arbash Meinel
 Some testing to see if we can decrease the peak memory consumption a bit.  | 
1368  | 
read_memo = index_memo[0:3]  | 
| 
3735.34.3
by John Arbash Meinel
 Cleanup, in preparation for merging to brisbane-core.  | 
1369  | 
if last_read_memo != read_memo:  | 
1370  | 
                            # We are starting a new block. If we have a
 | 
|
1371  | 
                            # manager, we have found everything that fits for
 | 
|
1372  | 
                            # now, so yield records
 | 
|
1373  | 
if manager is not None:  | 
|
1374  | 
for factory in manager.get_record_stream():  | 
|
1375  | 
yield factory  | 
|
1376  | 
                            # Now start a new manager
 | 
|
| 
3735.34.1
by John Arbash Meinel
 Some testing to see if we can decrease the peak memory consumption a bit.  | 
1377  | 
block = self._get_block(index_memo)  | 
| 
3735.34.3
by John Arbash Meinel
 Cleanup, in preparation for merging to brisbane-core.  | 
1378  | 
manager = _LazyGroupContentManager(block)  | 
1379  | 
last_read_memo = read_memo  | 
|
| 
3735.32.8
by John Arbash Meinel
 Some tests for the LazyGroupCompressFactory  | 
1380  | 
start, end = index_memo[3:5]  | 
| 
3735.32.14
by John Arbash Meinel
 Move the tests over to testing the LazyGroupContentManager object.  | 
1381  | 
manager.add_factory(key, parents, start, end)  | 
| 
0.17.11
by Robert Collins
 Add extraction of just-compressed texts to support converting from knits.  | 
1382  | 
else:  | 
| 
3735.32.14
by John Arbash Meinel
 Move the tests over to testing the LazyGroupContentManager object.  | 
1383  | 
if manager is not None:  | 
1384  | 
for factory in manager.get_record_stream():  | 
|
| 
3735.32.12
by John Arbash Meinel
 Add groupcompress-block[-ref] as valid stream types.  | 
1385  | 
yield factory  | 
| 
3735.34.3
by John Arbash Meinel
 Cleanup, in preparation for merging to brisbane-core.  | 
1386  | 
last_read_memo = manager = None  | 
| 
3735.31.18
by John Arbash Meinel
 Implement stacking support across all ordering implementations.  | 
1387  | 
for record in source.get_record_stream(keys, ordering,  | 
1388  | 
include_delta_closure):  | 
|
1389  | 
yield record  | 
|
| 
3735.32.14
by John Arbash Meinel
 Move the tests over to testing the LazyGroupContentManager object.  | 
1390  | 
if manager is not None:  | 
1391  | 
for factory in manager.get_record_stream():  | 
|
| 
3735.32.12
by John Arbash Meinel
 Add groupcompress-block[-ref] as valid stream types.  | 
1392  | 
yield factory  | 
| 
0.20.5
by John Arbash Meinel
 Finish the Fulltext => Chunked conversions so that we work in the more-efficient Chunks.  | 
1393  | 
|
| 
0.17.5
by Robert Collins
 nograph tests completely passing.  | 
1394  | 
def get_sha1s(self, keys):  | 
1395  | 
"""See VersionedFiles.get_sha1s()."""  | 
|
1396  | 
result = {}  | 
|
1397  | 
for record in self.get_record_stream(keys, 'unordered', True):  | 
|
1398  | 
if record.sha1 is not None:  | 
|
1399  | 
result[record.key] = record.sha1  | 
|
1400  | 
else:  | 
|
1401  | 
if record.storage_kind != 'absent':  | 
|
| 
3735.40.2
by John Arbash Meinel
 Add a groupcompress.encode_copy_instruction function.  | 
1402  | 
result[record.key] = osutils.sha_string(  | 
1403  | 
record.get_bytes_as('fulltext'))  | 
|
| 
0.17.5
by Robert Collins
 nograph tests completely passing.  | 
1404  | 
return result  | 
1405  | 
||
| 
0.17.2
by Robert Collins
 Core proof of concept working.  | 
1406  | 
def insert_record_stream(self, stream):  | 
1407  | 
"""Insert a record stream into this container.  | 
|
1408  | 
||
| 
3735.31.2
by John Arbash Meinel
 Cleanup trailing whitespace, get test_source to pass by removing asserts.  | 
1409  | 
        :param stream: A stream of records to insert.
 | 
| 
0.17.2
by Robert Collins
 Core proof of concept working.  | 
1410  | 
        :return: None
 | 
1411  | 
        :seealso VersionedFiles.get_record_stream:
 | 
|
1412  | 
        """
 | 
|
| 
4241.6.6
by Robert Collins, John Arbash Meinel, Ian Clathworthy, Vincent Ladeuil
 Groupcompress from brisbane-core.  | 
1413  | 
        # XXX: Setting random_id=True makes
 | 
1414  | 
        # test_insert_record_stream_existing_keys fail for groupcompress and
 | 
|
1415  | 
        # groupcompress-nograph, this needs to be revisited while addressing
 | 
|
1416  | 
        # 'bzr branch' performance issues.
 | 
|
1417  | 
for _ in self._insert_record_stream(stream, random_id=False):  | 
|
| 
0.17.5
by Robert Collins
 nograph tests completely passing.  | 
1418  | 
            pass
 | 
| 
0.17.2
by Robert Collins
 Core proof of concept working.  | 
1419  | 
|
    def _insert_record_stream(self, stream, random_id=False, nostore_sha=None,
                              reuse_blocks=True):
        """Internal core to insert a record stream into this container.

        This helper function has a different interface from
        insert_record_stream, to allow add_lines to be minimal while still
        returning the needed data.

        :param stream: A stream of records to insert.
        :param nostore_sha: If the sha1 of a given text matches nostore_sha,
            raise ExistingContent, rather than committing the new text.
        :param reuse_blocks: If the source is streaming from
            groupcompress-blocks, just insert the blocks as-is, rather than
            expanding the texts and inserting again.
        :return: An iterator over the sha1 of the inserted records.
        :seealso insert_record_stream:
        :seealso add_lines:
        """
        adapters = {}
        def get_adapter(adapter_key):
            try:
                return adapters[adapter_key]
            except KeyError:
                adapter_factory = adapter_registry.get(adapter_key)
                adapter = adapter_factory(self)
                adapters[adapter_key] = adapter
                return adapter
        # This will go up to fulltexts for gc to gc fetching, which isn't
        # ideal.
        self._compressor = GroupCompressor()
        self._unadded_refs = {}
        keys_to_add = []
        def flush():
            bytes = self._compressor.flush().to_bytes()
            index, start, length = self._access.add_raw_records(
                [(None, len(bytes))], bytes)[0]
            nodes = []
            for key, reads, refs in keys_to_add:
                nodes.append((key, "%d %d %s" % (start, length, reads), refs))
            self._index.add_records(nodes, random_id=random_id)
            self._unadded_refs = {}
            del keys_to_add[:]
            self._compressor = GroupCompressor()

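        # Note on the index value written by flush() above: it packs four
        # integers, "<group start> <group length> <text start> <text end>".
        # The first two locate the compressed group on disk (as returned by
        # add_raw_records); the last two (the 'reads' string) locate the
        # text inside the expanded group.  A value such as "0 4096 10 200"
        # (illustrative numbers) is later parsed apart by
        # _GCGraphIndex._node_to_position().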
        last_prefix = None
        max_fulltext_len = 0
        max_fulltext_prefix = None
        insert_manager = None
        block_start = None
        block_length = None
        # XXX: TODO: remove this, it is just for safety checking for now
        inserted_keys = set()
        for record in stream:
            # Raise an error when a record is missing.
            if record.storage_kind == 'absent':
                raise errors.RevisionNotPresent(record.key, self)
            if random_id:
                if record.key in inserted_keys:
                    trace.note('Insert claimed random_id=True,'
                               ' but then inserted %r two times', record.key)
                    continue
                inserted_keys.add(record.key)
            if reuse_blocks:
                # If the reuse_blocks flag is set, check to see if we can just
                # copy a groupcompress block as-is.
                if record.storage_kind == 'groupcompress-block':
                    # Insert the raw block into the target repo
                    insert_manager = record._manager
                    insert_manager._check_rebuild_block()
                    bytes = record._manager._block.to_bytes()
                    _, start, length = self._access.add_raw_records(
                        [(None, len(bytes))], bytes)[0]
                    del bytes
                    block_start = start
                    block_length = length
                if record.storage_kind in ('groupcompress-block',
                                           'groupcompress-block-ref'):
                    if insert_manager is None:
                        raise AssertionError('No insert_manager set')
                    value = "%d %d %d %d" % (block_start, block_length,
                                             record._start, record._end)
                    nodes = [(record.key, value, (record.parents,))]
                    # TODO: Consider buffering up many nodes to be added, not
                    #       sure how much overhead this has, but we're seeing
                    #       ~23s / 120s in add_records calls
                    self._index.add_records(nodes, random_id=random_id)
                    continue
            try:
                bytes = record.get_bytes_as('fulltext')
            except errors.UnavailableRepresentation:
                adapter_key = record.storage_kind, 'fulltext'
                adapter = get_adapter(adapter_key)
                bytes = adapter.get_bytes(record)
            if len(record.key) > 1:
                prefix = record.key[0]
                soft = (prefix == last_prefix)
            else:
                prefix = None
                soft = False
            if max_fulltext_len < len(bytes):
                max_fulltext_len = len(bytes)
                max_fulltext_prefix = prefix
            (found_sha1, start_point, end_point,
             type) = self._compressor.compress(record.key,
                                               bytes, record.sha1, soft=soft,
                                               nostore_sha=nostore_sha)
            # delta_ratio = float(len(bytes)) / (end_point - start_point)
            # Check if we want to continue to include that text
            if (prefix == max_fulltext_prefix
                and end_point < 2 * max_fulltext_len):
                # As long as we are on the same file_id, we will fill at least
                # 2 * max_fulltext_len
                start_new_block = False
            elif end_point > 4*1024*1024:
                start_new_block = True
            elif (prefix is not None and prefix != last_prefix
                  and end_point > 2*1024*1024):
                start_new_block = True
            else:
                start_new_block = False
            last_prefix = prefix
            if start_new_block:
                self._compressor.pop_last()
                flush()
                max_fulltext_len = len(bytes)
                (found_sha1, start_point, end_point,
                 type) = self._compressor.compress(record.key, bytes,
                                                   record.sha1)
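            # Worked example of the block-cut heuristic above (illustrative
            # sizes): with a 3MB largest fulltext and an unchanged prefix,
            # the first branch lets the group grow until end_point reaches
            # 2 * 3MB = 6MB, taking precedence over the generic 4MB cap
            # because the branches are tested in order.  With only small
            # texts a group is cut at 4MB, or already at 2MB as soon as the
            # file-id prefix changes.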
            if record.key[-1] is None:
                key = record.key[:-1] + ('sha1:' + found_sha1,)
            else:
                key = record.key
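            # The branch above completes CHK-style keys whose last element
            # is None from the computed digest, e.g. ('file-id', None)
            # becomes ('file-id', 'sha1:aabb...') (digest shortened here
            # for illustration).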
            self._unadded_refs[key] = record.parents
            yield found_sha1
            keys_to_add.append((key, '%d %d' % (start_point, end_point),
                (record.parents,)))
        if len(keys_to_add):
            flush()
        self._compressor = None

    def iter_lines_added_or_present_in_keys(self, keys, pb=None):
        """Iterate over the lines in the versioned files from keys.

        This may return lines from other keys. Each item the returned
        iterator yields is a tuple of a line and a text version that that
        line is present in (not introduced in).

        Ordering of results is in whatever order is most suitable for the
        underlying storage format.

        If a progress bar is supplied, it may be used to indicate progress.
        The caller is responsible for cleaning up progress bars (because this
        is an iterator).

        NOTES:
         * Lines are normalised by the underlying store: they will all have \n
           terminators.
         * Lines are returned in arbitrary order.

        :return: An iterator over (line, key).
        """
        keys = set(keys)
        total = len(keys)
        # we don't care about inclusions, the caller cares.
        # but we need to setup a list of records to visit.
        # we need key, position, length
        for key_idx, record in enumerate(self.get_record_stream(keys,
            'unordered', True)):
            # XXX: todo - optimise to use less than full texts.
            key = record.key
            if pb is not None:
                pb.update('Walking content', key_idx, total)
            if record.storage_kind == 'absent':
                raise errors.RevisionNotPresent(key, self)
            lines = osutils.split_lines(record.get_bytes_as('fulltext'))
            for line in lines:
                yield line, key
        if pb is not None:
            pb.update('Walking content', total, total)

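    # A sketch of walking every line (the names 'vf', 'keys' and 'process'
    # are hypothetical):
    #   for line, key in vf.iter_lines_added_or_present_in_keys(keys):
    #       process(line, key)
    # Lines arrive in arbitrary order and may be attributed to any key that
    # contains them, as documented above.
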
    def keys(self):
        """See VersionedFiles.keys."""
        if 'evil' in debug.debug_flags:
            trace.mutter_callsite(2, "keys scales with size of history")
        sources = [self._index] + self._fallback_vfs
        result = set()
        for source in sources:
            result.update(source.keys())
        return result


class _GCGraphIndex(object):
    """Mapper from GroupCompressVersionedFiles needs into GraphIndex storage."""

    def __init__(self, graph_index, is_locked, parents=True,
        add_callback=None, track_external_parent_refs=False,
        inconsistency_fatal=True):
        """Construct a _GCGraphIndex on a graph_index.

        :param graph_index: An implementation of bzrlib.index.GraphIndex.
        :param is_locked: A callback, returns True if the index is locked and
            thus usable.
        :param parents: If True, record parents; if not, do not record
            parents.
        :param add_callback: If not None, allow additions to the index and call
            this callback with a list of added GraphIndex nodes:
            [(node, value, node_refs), ...]
        :param track_external_parent_refs: As keys are added, keep track of the
            keys they reference, so that we can query get_missing_parents(),
            etc.
        :param inconsistency_fatal: When asked to add records that are already
            present, and the details are inconsistent with the existing
            record, raise an exception instead of warning (and skipping the
            record).
        """
        self._add_callback = add_callback
        self._graph_index = graph_index
        self._parents = parents
        self.has_graph = parents
        self._is_locked = is_locked
        self._inconsistency_fatal = inconsistency_fatal
        if track_external_parent_refs:
            self._key_dependencies = knit._KeyRefs()
        else:
            self._key_dependencies = None

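    # A construction sketch (the repository wiring and the callback name
    # are assumptions, not part of this module):
    #   gc_index = _GCGraphIndex(combined_graph_index,
    #       is_locked=repo.is_locked,
    #       parents=True,
    #       add_callback=writable_index.add_nodes,
    #       track_external_parent_refs=True)
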
    def add_records(self, records, random_id=False):
        """Add multiple records to the index.

        This function does not insert data into the immutable GraphIndex
        backing this index; instead it prepares data for insertion by the
        caller, checks that it is safe to insert, then calls
        self._add_callback with the prepared GraphIndex nodes.

        :param records: a list of (key, value, refs) tuples.
        :param random_id: If True the ids being added were randomly generated
            and no check for existence will be performed.
        """
        if not self._add_callback:
            raise errors.ReadOnlyError(self)
        # we hope there are no repositories with inconsistent parentage
        # anymore.

        changed = False
        keys = {}
        for (key, value, refs) in records:
            if not self._parents:
                if refs:
                    for ref in refs:
                        if ref:
                            raise errors.KnitCorrupt(self,
                                "attempt to add node with parents "
                                "in parentless index.")
                    refs = ()
                    changed = True
            keys[key] = (value, refs)
        # check for dups
        if not random_id:
            present_nodes = self._get_entries(keys)
            for (index, key, value, node_refs) in present_nodes:
                if node_refs != keys[key][1]:
                    details = '%s %s %s' % (key, (value, node_refs), keys[key])
                    if self._inconsistency_fatal:
                        raise errors.KnitCorrupt(self, "inconsistent details"
                                                 " in add_records: %s" %
                                                 details)
                    else:
                        trace.warning("inconsistent details in skipped"
                                      " record: %s", details)
                del keys[key]
                changed = True
        if changed:
            result = []
            if self._parents:
                for key, (value, node_refs) in keys.iteritems():
                    result.append((key, value, node_refs))
            else:
                for key, (value, node_refs) in keys.iteritems():
                    result.append((key, value))
            records = result
        key_dependencies = self._key_dependencies
        if key_dependencies is not None and self._parents:
            for key, value, refs in records:
                parents = refs[0]
                key_dependencies.add_references(key, parents)
        self._add_callback(records)

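    # Example records argument for add_records(), with illustrative values:
    #   index.add_records([
    #       (('file-id', 'rev-2'), '0 4096 10 200',
    #        ((('file-id', 'rev-1'),),)),
    #       ])
    # i.e. (key, value, refs), where refs holds a single list of parent
    # keys for a parent-recording index.
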
    def _check_read(self):
        """Raise an exception if reads are not permitted."""
        if not self._is_locked():
            raise errors.ObjectNotLocked(self)

    def _check_write_ok(self):
        """Raise an exception if writes are not permitted."""
        if not self._is_locked():
            raise errors.ObjectNotLocked(self)

    def _get_entries(self, keys, check_present=False):
        """Get the entries for keys.

        Note: Callers are responsible for checking that the index is locked
        before calling this method.

        :param keys: An iterable of index key tuples.
        """
        keys = set(keys)
        found_keys = set()
        if self._parents:
            for node in self._graph_index.iter_entries(keys):
                yield node
                found_keys.add(node[1])
        else:
            # adapt parentless index to the rest of the code.
            for node in self._graph_index.iter_entries(keys):
                yield node[0], node[1], node[2], ()
                found_keys.add(node[1])
        if check_present:
            missing_keys = keys.difference(found_keys)
            if missing_keys:
                raise errors.RevisionNotPresent(missing_keys.pop(), self)

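    # Nodes yielded by _get_entries() are GraphIndex entries of the form
    # (index, key, value, node_refs); the parentless branch above pads the
    # missing refs with an empty tuple so callers can index node[3]
    # uniformly.
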
    def get_parent_map(self, keys):
        """Get a map of the parents of keys.

        :param keys: The keys to look up parents for.
        :return: A mapping from keys to parents. Absent keys are absent from
            the mapping.
        """
        self._check_read()
        nodes = self._get_entries(keys)
        result = {}
        if self._parents:
            for node in nodes:
                result[node[1]] = node[3][0]
        else:
            for node in nodes:
                result[node[1]] = None
        return result

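    # For illustration: given a stored key ('file-id', 'rev-2') whose parent
    # is ('file-id', 'rev-1'), get_parent_map() returns
    #   {('file-id', 'rev-2'): (('file-id', 'rev-1'),)}
    # while a parentless index maps every present key to None.
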
    def get_missing_parents(self):
        """Return the keys of missing parents."""
        # Copied from _KnitGraphIndex.get_missing_parents
        # We may have false positives, so filter those out.
        self._key_dependencies.add_keys(
            self.get_parent_map(self._key_dependencies.get_unsatisfied_refs()))
        return frozenset(self._key_dependencies.get_unsatisfied_refs())

    def get_build_details(self, keys):
        """Get the various build details for keys.

        Ghosts are omitted from the result.

        :param keys: An iterable of keys.
        :return: A dict of key:
            (index_memo, compression_parent, parents, record_details).
            index_memo
                opaque structure to pass to read_records to extract the raw
                data
            compression_parent
                Content that this record is built upon, may be None
            parents
                Logical parents of this node
            record_details
                extra information about the content which needs to be passed
                to Factory.parse_record
        """
        self._check_read()
        result = {}
        entries = self._get_entries(keys)
        for entry in entries:
            key = entry[1]
            if not self._parents:
                parents = None
            else:
                parents = entry[3][0]
            method = 'group'
            result[key] = (self._node_to_position(entry),
                           None, parents, (method, None))
        return result

    def keys(self):
        """Get all the keys in the collection.

        The keys are not ordered.
        """
        self._check_read()
        return [node[1] for node in self._graph_index.iter_all_entries()]

    def _node_to_position(self, node):
        """Convert an index value to position details."""
        bits = node[2].split(' ')
        # It would be nice not to read the entire gzip.
        start = int(bits[0])
        stop = int(bits[1])
        basis_end = int(bits[2])
        delta_end = int(bits[3])
        return node[0], start, stop, basis_end, delta_end

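    # For illustration, a node value of '0 4096 10 200' (hypothetical
    # numbers) decodes to the group's disk offset and stored length
    # (0 and 4096) followed by the start and end points of this entry's
    # bytes within the expanded group (10 and 200), matching the four-field
    # value written by _insert_record_stream's flush().
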
    def scan_unvalidated_index(self, graph_index):
        """Inform this _GCGraphIndex that there is an unvalidated index.

        This allows this _GCGraphIndex to keep track of any missing
        compression parents we may want to have filled in to make those
        indices valid.

        :param graph_index: A GraphIndex
        """
        if self._key_dependencies is not None:
            # Add parent refs from graph_index (and discard parent refs that
            # the graph_index has).
            add_refs = self._key_dependencies.add_references
            for node in graph_index.iter_all_entries():
                add_refs(node[1], node[3][0])


from bzrlib._groupcompress_py import (
    apply_delta,
    apply_delta_to_source,
    encode_base128_int,
    decode_base128_int,
    decode_copy_instruction,
    LinesDeltaIndex,
    )
try:
    from bzrlib._groupcompress_pyx import (
        apply_delta,
        apply_delta_to_source,
        DeltaIndex,
        encode_base128_int,
        decode_base128_int,
        )
    GroupCompressor = PyrexGroupCompressor
except ImportError:
    GroupCompressor = PythonGroupCompressor
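# The pure-Python implementations imported above stay bound unless the
# compiled extension import succeeds, so the module remains usable without
# the pyx build.  As a sketch (assuming 'delta_bytes' was produced by the
# matching make_delta of whichever implementation is active):
#   target_bytes = apply_delta(source_bytes, delta_bytes)
# behaves the same either way.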