# Copyright (C) 2005-2010 Canonical Ltd
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA

# TODO: Up-front, stat all files in order and remove those which are deleted or
# out-of-date.  Don't actually re-read them until they're needed.  That ought
# to bring all the inodes into core so that future stats to them are fast, and
# it preserves the nice property that any caller will always get up-to-date
# data except in unavoidable cases.

# TODO: Perhaps return more details on the file to avoid statting it
# again: nonexistent, file type, size, etc

# TODO: Perhaps use a Python pickle instead of a text file; might be faster.


CACHE_HEADER = b"### bzr hashcache v5\n"
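
# The cache file written by write() below starts with CACHE_HEADER, followed
# by one line per cached path.  Sketch of the on-disk line layout (derived
# from write() and read() below; shown only for orientation):
#
#   <path, utf-8>// <sha1 hex> <size> <mtime> <ctime> <ino> <dev> <mode>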

import os
import stat
import time

from . import (
    atomicfile,
    errors,
    filters as _mod_filters,
    osutils,
    trace,
    )


FP_MTIME_COLUMN = 1
FP_CTIME_COLUMN = 2
FP_MODE_COLUMN = 5
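# These FP_*_COLUMN constants index into the fingerprint tuple built by
# HashCache._fingerprint() below: (size, mtime, ctime, ino, dev, mode).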


class HashCache(object):
    """Cache for looking up file SHA-1.

    Files are considered to match the cached value if the fingerprint
    of the file has not changed.  This includes its mtime, ctime,
    device number, inode number, and size.  This should catch
    modifications or replacement of the file by a new one.

    This may not catch modifications that do not change the file's
    size and that occur within the resolution window of the
    timestamps.  To handle this we specifically do not cache files
    which have changed since the start of the present second, since
    they could undetectably change again.

    This scheme may fail if the machine's clock steps backwards.
    Don't do that.

    This does not canonicalize the paths passed in; that should be
    done by the caller.

    _cache
        Indexed by path, points to a two-tuple of the SHA-1 of the file
        and its fingerprint.

    stat_count
        number of times files have been statted

    hit_count
        number of times files have been retrieved from the cache, avoiding a
        re-read

    miss_count
        number of misses (times files have been completely re-read)
    """
    needs_write = False

    def __init__(self, root, cache_file_name, mode=None,
                 content_filter_stack_provider=None):
        """Create a hash cache in base dir, and set the file mode to mode.

        :param content_filter_stack_provider: a function that takes a
            path (relative to the top of the tree) and a file-id as
            parameters and returns a stack of ContentFilters.
            If None, no content filtering is performed.
        """
        if not isinstance(root, str):
            raise ValueError("Base dir for hashcache must be text")
        self.root = root
        self.hit_count = 0
        self.miss_count = 0
        self.stat_count = 0
        self.danger_count = 0
        self.removed_count = 0
        self.update_count = 0
        self._cache = {}
        self._mode = mode
        self._cache_file_name = cache_file_name
        self._filter_provider = content_filter_stack_provider

    def cache_file_name(self):
        return self._cache_file_name

    def clear(self):
        """Discard all cached information.

        This does not reset the counters."""
        if self._cache:
            self.needs_write = True
            self._cache = {}

    def scan(self):
        """Scan all files and remove entries where the cache entry is obsolete.

        Obsolete entries are those where the file has been modified or deleted
        since the entry was inserted.
        """
        # Stat in inode order as optimisation for at least linux.
        def inode_order(path_and_cache):
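            # Each cache item is (path, (sha1, fingerprint)); fingerprint[3]
            # is the inode number (see _fingerprint below).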
            return path_and_cache[1][1][3]
        for path, cache_val in sorted(self._cache.items(), key=inode_order):
            abspath = osutils.pathjoin(self.root, path)
            fp = self._fingerprint(abspath)
            self.stat_count += 1

            if not fp or cache_val[1] != fp:
                # not here or not a regular file anymore
                self.removed_count += 1
                self.needs_write = True
                del self._cache[path]

    def get_sha1(self, path, stat_value=None):
        """Return the sha1 of a file.

        :param stat_value: optional result of os.lstat for the file, used
            to avoid statting it a second time when the caller already has
            the stat result.
        """
        abspath = osutils.pathjoin(self.root, path)
        self.stat_count += 1
        file_fp = self._fingerprint(abspath, stat_value)

        if not file_fp:
            # not a regular file or not existing
            if path in self._cache:
                self.removed_count += 1
                self.needs_write = True
                del self._cache[path]
            return None

        if path in self._cache:
            cache_sha1, cache_fp = self._cache[path]
        else:
            cache_sha1, cache_fp = None, None

        if cache_fp == file_fp:
            self.hit_count += 1
            return cache_sha1

        self.miss_count += 1

        mode = file_fp[FP_MODE_COLUMN]
        if stat.S_ISREG(mode):
            if self._filter_provider is None:
                filters = []
            else:
                filters = self._filter_provider(path=path)
            digest = self._really_sha1_file(abspath, filters)
        elif stat.S_ISLNK(mode):
            target = osutils.readlink(abspath)
            digest = osutils.sha_string(target.encode('UTF-8'))
        else:
            raise errors.BzrError("file %r: unknown file stat mode: %o"
                                  % (abspath, mode))

        # window of 3 seconds to allow for 2s resolution on windows,
        # unsynchronized file servers, etc.
        cutoff = self._cutoff_time()
        if file_fp[FP_MTIME_COLUMN] >= cutoff \
                or file_fp[FP_CTIME_COLUMN] >= cutoff:
            # changed too recently; can't be cached.  we can
            # return the result and it could possibly be cached
            # next time.
            #
            # the point is that we only want to cache when we are sure that any
            # subsequent modifications of the file can be detected.  If a
            # modification neither changes the inode, the device, the size, nor
            # the mode, then we can only distinguish it by time; therefore we
            # need to let sufficient time elapse before we may cache this entry
            # again.  If we didn't do this, then, for example, a very quick 1
            # byte replacement in the file might go undetected.
            ## mutter('%r modified too recently; not caching', path)
            self.danger_count += 1
            if cache_fp:
                self.removed_count += 1
                self.needs_write = True
                del self._cache[path]
        else:
            # mutter('%r added to cache: now=%f, mtime=%d, ctime=%d',
            #        path, time.time(), file_fp[FP_MTIME_COLUMN],
            #        file_fp[FP_CTIME_COLUMN])
            self.update_count += 1
            self.needs_write = True
            self._cache[path] = (digest, file_fp)
        return digest

    def _really_sha1_file(self, abspath, filters):
        """Calculate the SHA1 of a file by reading the full text"""
        return _mod_filters.internal_size_sha_file_byname(abspath, filters)[1]

    def write(self):
        """Write contents of cache to file."""
        with atomicfile.AtomicFile(self.cache_file_name(), 'wb',
                                   new_mode=self._mode) as outf:
            outf.write(CACHE_HEADER)

            for path, c in self._cache.items():
                line_info = [path.encode('utf-8'), b'// ', c[0], b' ']
                line_info.append(b'%d %d %d %d %d %d' % c[1])
                line_info.append(b'\n')
                outf.write(b''.join(line_info))
            self.needs_write = False
            # mutter("write hash cache: %s hits=%d misses=%d stat=%d recent=%d updates=%d",
            #        self.cache_file_name(), self.hit_count, self.miss_count,
            #        self.stat_count, self.danger_count, self.update_count)

    def read(self):
        """Reinstate cache from file.

        Overwrites existing cache.

        If the cache file has the wrong version marker, this just clears
        the cache."""
        self._cache = {}

        fn = self.cache_file_name()
        try:
            inf = open(fn, 'rb', buffering=65000)
        except IOError as e:
            trace.mutter("failed to open %s: %s", fn, str(e))
            # better write it now so it is valid
            self.needs_write = True
            return

        with inf:
            hdr = inf.readline()
            if hdr != CACHE_HEADER:
                trace.mutter('cache header marker not found at top of %s;'
                             ' discarding cache', fn)
                self.needs_write = True
                return

            for l in inf:
                pos = l.index(b'// ')
                path = l[:pos].decode('utf-8')
                if path in self._cache:
                    trace.warning('duplicated path %r in cache' % path)
                    continue

                pos += 3
                fields = l[pos:].split(b' ')
                if len(fields) != 7:
                    trace.warning("bad line in hashcache: %r" % l)
                    continue

                sha1 = fields[0]
                if len(sha1) != 40:
                    trace.warning("bad sha1 in hashcache: %r" % sha1)
                    continue

                fp = tuple(map(int, fields[1:]))

                self._cache[path] = (sha1, fp)

        self.needs_write = False

    def _cutoff_time(self):
        """Return cutoff time.

        Files modified more recently than this time are at risk of being
        undetectably modified and so can't be cached.
        """
        return int(time.time()) - 3

    def _fingerprint(self, abspath, stat_value=None):
        if stat_value is None:
            try:
                stat_value = os.lstat(abspath)
            except OSError:
                # might be missing, etc
                return None
        if stat.S_ISDIR(stat_value.st_mode):
            return None
        # we discard any high precision because it's not reliable; perhaps we
        # could do better on some systems?
        return (stat_value.st_size, int(stat_value.st_mtime),
                int(stat_value.st_ctime), stat_value.st_ino,
                stat_value.st_dev, stat_value.st_mode)