# Copyright (C) 2005, 2006 Canonical Ltd
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA

"""Reconcilers are able to fix some potential data errors in a branch."""


__all__ = [
    'KnitReconciler',
    'PackReconciler',
    'reconcile',
    'Reconciler',
    'RepoReconciler',
    ]


from bzrlib import (
    errors,
    ui,
    repository,
    repofmt,
    )
from bzrlib.trace import mutter, note
from bzrlib.tsort import TopoSorter


def reconcile(dir, other=None):
    """Reconcile the data in dir.

    Currently this is limited to an inventory 'reweave'.

    This is a convenience function for using a Reconciler object.

    Directly using Reconciler is recommended for library users that
    desire fine-grained control or analysis of the found issues.

    :param other: another bzrdir to reconcile against.
    """
    reconciler = Reconciler(dir, other=other)
    reconciler.reconcile()


class Reconciler(object):
    """Reconcilers are used to reconcile existing data."""

    def __init__(self, dir, other=None):
        """Create a Reconciler."""
        self.bzrdir = dir

    def reconcile(self):
        """Perform reconciliation.

        After reconciliation the following attributes document found issues:
        inconsistent_parents: The number of revisions in the repository whose
                              ancestry was being reported incorrectly.
        garbage_inventories: The number of inventory objects without revisions
                             that were garbage collected.
        fixed_branch_history: None if there was no branch, False if the branch
                              history was correct, True if the branch history
                              needed to be re-normalized.
        """
        self.pb = ui.ui_factory.nested_progress_bar()
        try:
            self._reconcile()
        finally:
            self.pb.finished()

    def _reconcile(self):
        """Helper function for performing reconciliation."""
        self._reconcile_branch()
        self._reconcile_repository()

    def _reconcile_branch(self):
        try:
            self.branch = self.bzrdir.open_branch()
        except errors.NotBranchError:
            # Nothing to check here
            self.fixed_branch_history = None
            return
        self.pb.note('Reconciling branch %s',
                     self.branch.base)
        branch_reconciler = self.branch.reconcile(thorough=True)
        self.fixed_branch_history = branch_reconciler.fixed_history

    def _reconcile_repository(self):
        self.repo = self.bzrdir.find_repository()
        self.pb.note('Reconciling repository %s',
                     self.repo.bzrdir.root_transport.base)
        self.pb.update("Reconciling repository", 0, 1)
        repo_reconciler = self.repo.reconcile(thorough=True)
        self.inconsistent_parents = repo_reconciler.inconsistent_parents
        self.garbage_inventories = repo_reconciler.garbage_inventories
        if repo_reconciler.aborted:
            self.pb.note(
                'Reconcile aborted: revision index has inconsistent parents.')
            self.pb.note(
                'Run "bzr check" for more details.')
        else:
            self.pb.note('Reconciliation complete.')
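

# Illustrative sketch only, not part of the original bzrlib module.
# The Reconciler.reconcile docstring above lists three attributes that are
# set after reconciliation. The hypothetical helper below shows one way a
# caller might turn them into readable messages. The helper name and the
# message wording are assumptions made for this example.
def _example_summarize(reconciler):
    """Return a list of human-readable lines describing the found issues."""
    lines = [
        '%d revision(s) with inconsistent parents'
        % reconciler.inconsistent_parents,
        '%d garbage inventory object(s)' % reconciler.garbage_inventories,
    ]
    # fixed_branch_history is tri-state: None, False or True.
    if reconciler.fixed_branch_history is None:
        lines.append('no branch was present')
    elif reconciler.fixed_branch_history:
        lines.append('branch history was re-normalized')
    else:
        lines.append('branch history was already correct')
    return lines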


class BranchReconciler(object):
    """Reconciler that works on a branch."""

    def __init__(self, a_branch, thorough=False):
        self.fixed_history = None
        self.thorough = thorough
        self.branch = a_branch

    def reconcile(self):
        self.branch.lock_write()
        try:
            self.pb = ui.ui_factory.nested_progress_bar()
            try:
                self._reconcile_steps()
            finally:
                self.pb.finished()
        finally:
            self.branch.unlock()

    def _reconcile_steps(self):
        self._reconcile_revision_history()

    def _reconcile_revision_history(self):
        repo = self.branch.repository
        last_revno, last_revision_id = self.branch.last_revision_info()
        real_history = list(repo.iter_reverse_revision_history(
                                last_revision_id))
        real_history.reverse()
        if last_revno != len(real_history):
            self.fixed_history = True
            # Technically for Branch5 formats, it is more efficient to use
            # set_revision_history, as this will regenerate it again.
            # Not really worth a whole BranchReconciler class just for this,
            # though.
            self.pb.note('Fixing last revision info %s => %s',
                         last_revno, len(real_history))
            self.branch.set_last_revision_info(len(real_history),
                                               last_revision_id)
        else:
            self.fixed_history = False
            self.pb.note('revision_history ok.')
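

# Illustrative sketch only, not part of the original bzrlib module.
# _reconcile_revision_history above compares the stored revno with the
# length of the history found by walking back from the last revision.
# The hypothetical helper below mimics that check on a plain dict mapping
# revision id -> list of parent ids. It assumes the walk follows the first
# (leftmost) parent and stops at a revision missing from the dict.
def _example_check_revno(parent_map, last_revno, last_revision_id):
    """Return (needs_fix, real_revno) for the given tip revision."""
    history = []
    rev_id = last_revision_id
    while rev_id in parent_map:
        history.append(rev_id)
        parents = parent_map[rev_id]
        # Follow only the leftmost parent, as a mainline history does.
        rev_id = parents[0] if parents else None
    history.reverse()
    return last_revno != len(history), len(history)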


class RepoReconciler(object):
    """Reconciler that reconciles a repository.

    The goal of repository reconciliation is to make any derived data
    consistent with the core data committed by a user. This can involve
    reindexing, or removing unreferenced data if that can interfere with
    queries in a given repository.

    Currently this consists of an inventory reweave with revision cross-checks.
    """

    def __init__(self, repo, other=None, thorough=False):
        """Construct a RepoReconciler.

        :param thorough: perform a thorough check which may take longer but
                         will correct non-data loss issues such as incorrect
                         cached data.
        """
        self.garbage_inventories = 0
        self.inconsistent_parents = 0
        self.aborted = False
        self.repo = repo
        self.thorough = thorough

    def reconcile(self):
        """Perform reconciliation.

        After reconciliation the following attributes document found issues:
        inconsistent_parents: The number of revisions in the repository whose
                              ancestry was being reported incorrectly.
        garbage_inventories: The number of inventory objects without revisions
                             that were garbage collected.
        """
        self.repo.lock_write()
        try:
            self.pb = ui.ui_factory.nested_progress_bar()
            try:
                self._reconcile_steps()
            finally:
                self.pb.finished()
        finally:
            self.repo.unlock()

    def _reconcile_steps(self):
        """Perform the steps to reconcile this repository."""
        self._reweave_inventory()

    def _reweave_inventory(self):
        """Regenerate the inventory weave for the repository from scratch.

        This is a smart function: it will only do the reweave if doing it
        will correct data issues. The self.thorough flag controls whether
        only data-loss causing issues (!self.thorough) or all issues
        (self.thorough) are treated as requiring the reweave.
        """
        # local because needing to know about WeaveFile is a wart we want to hide
        from bzrlib.weave import WeaveFile, Weave
        transaction = self.repo.get_transaction()
        self.pb.update('Reading inventory data.')
        self.inventory = self.repo.get_inventory_weave()
        # the total set of revisions to process
        self.pending = set(
            self.repo._revision_store.all_revision_ids(transaction))

        # mapping from revision_id to parents
        self._rev_graph = {}
        # errors that we detect
        self.inconsistent_parents = 0
        # we need the revision id of each revision and its available parents list
        self._setup_steps(len(self.pending))
        for rev_id in self.pending:
            # put a revision into the graph.
            self._graph_revision(rev_id)
        self._check_garbage_inventories()
        # if there are no inconsistent_parents and
        # (no garbage inventories or we are not doing a thorough check)
        if (not self.inconsistent_parents and
            (not self.garbage_inventories or not self.thorough)):
            self.pb.note('Inventory ok.')
            return
        self.pb.update('Backing up inventory...', 0, 0)
        self.repo.control_weaves.copy(self.inventory, 'inventory.backup',
                                      self.repo.get_transaction())
        self.pb.note('Backup Inventory created.')
        # asking for '' should never return a non-empty weave
        new_inventory_vf = self.repo.control_weaves.get_empty('inventory.new',
            self.repo.get_transaction())

        # we have topological order of revisions and non ghost parents ready.
        self._setup_steps(len(self._rev_graph))
        for rev_id in TopoSorter(self._rev_graph.items()).iter_topo_order():
            parents = self._rev_graph[rev_id]
            # double check this really is in topological order.
            unavailable = [p for p in parents if p not in new_inventory_vf]
            if unavailable:
                raise AssertionError('unavailable parents: %r'
                    % unavailable)
            # this entry has all the non ghost parents in the inventory
            # file already.
            self._reweave_step('adding inventories')
            if isinstance(new_inventory_vf, WeaveFile):
                # It's really a WeaveFile, but we call straight into the
                # Weave's add method to disable the auto-write-out behaviour.
                # This is done to avoid a revision_count * time-to-write
                # additional overhead on reconcile.
                new_inventory_vf._check_write_ok()
                Weave._add_lines(new_inventory_vf, rev_id, parents,
                    self.inventory.get_lines(rev_id), None, None, None, False, True)
            else:
                new_inventory_vf.add_lines(rev_id, parents,
                    self.inventory.get_lines(rev_id))

        if isinstance(new_inventory_vf, WeaveFile):
            new_inventory_vf._save()
        # if this worked, the set of new_inventory_vf.versions() should equal
        # self.pending
        if not (set(new_inventory_vf.versions()) == self.pending):
            raise AssertionError()
        self.pb.update('Writing weave')
        self.repo.control_weaves.copy(new_inventory_vf, 'inventory',
                                      self.repo.get_transaction())
        self.repo.control_weaves.delete('inventory.new',
                                        self.repo.get_transaction())
        self.inventory = None
        self.pb.note('Inventory regenerated.')

    def _setup_steps(self, new_total):
        """Set up the markers we need to control the progress bar."""
        self.total = new_total
        self.count = 0

    def _graph_revision(self, rev_id):
        """Load a revision into the revision graph."""
        # analyse revision rev_id and record its available parents in the
        # graph.
        self._reweave_step('loading revisions')
        rev = self.repo.get_revision_reconcile(rev_id)
        parents = []
        for parent in rev.parent_ids:
            if self._parent_is_available(parent):
                parents.append(parent)
            else:
                mutter('found ghost %s', parent)
        self._rev_graph[rev_id] = parents
        if self._parents_are_inconsistent(rev_id, parents):
            self.inconsistent_parents += 1
            mutter('Inconsistent inventory parents: id {%s} '
                   'inventory claims %r, '
                   'available parents are %r, '
                   'unavailable parents are %r',
                   rev_id,
                   set(self.inventory.get_parent_map([rev_id])[rev_id]),
                   set(parents),
                   set(rev.parent_ids).difference(set(parents)))

    def _parents_are_inconsistent(self, rev_id, parents):
        """Return True if the parents list of rev_id does not match the weave.

        This detects inconsistencies based on the self.thorough value:
        if thorough is on, the first parent value is checked as well as ghost
        differences.
        Otherwise only the ghost differences are evaluated.
        """
        weave_parents = self.inventory.get_parent_map([rev_id])[rev_id]
        weave_missing_old_ghosts = set(weave_parents) != set(parents)
        first_parent_is_wrong = (
            len(weave_parents) and len(parents) and
            parents[0] != weave_parents[0])
        if self.thorough:
            return weave_missing_old_ghosts or first_parent_is_wrong
        else:
            return weave_missing_old_ghosts

    def _check_garbage_inventories(self):
        """Check for garbage inventories which we cannot trust.

        We can't trust them because their pre-requisite file data may not
        be present - all we know is that their revision was not installed.
        """
        if not self.thorough:
            return
        inventories = set(self.inventory.versions())
        revisions = set(self._rev_graph.keys())
        garbage = inventories.difference(revisions)
        self.garbage_inventories = len(garbage)
        for revision_id in garbage:
            mutter('Garbage inventory {%s} found.', revision_id)

    def _parent_is_available(self, parent):
        """True if parent is a fully available revision.

        A fully available revision has an inventory and a revision object in
        the repository.
        """
        return (parent in self._rev_graph or
                (parent in self.inventory and self.repo.has_revision(parent)))

    def _reweave_step(self, message):
        """Mark a single step of regeneration complete."""
        self.pb.update(message, self.count, self.total)
        self.count += 1
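

# Illustrative sketch only, not part of the original bzrlib module.
# The three hypothetical helpers below restate, on plain lists and dicts,
# the checks RepoReconciler performs above: garbage detection as a set
# difference, the parent consistency test, and a parents-before-children
# ordering. _example_topo_order is a simple stand-in written for this
# example and is not bzrlib.tsort.TopoSorter. It assumes parents absent
# from the graph (ghosts) can be ignored.
def _example_find_garbage(inventory_ids, revision_ids):
    """Return inventory ids that have no matching revision."""
    return set(inventory_ids) - set(revision_ids)


def _example_parents_inconsistent(weave_parents, parents, thorough):
    """Return True if the two parent lists disagree.

    The parent sets are always compared. The first parent is compared
    only when thorough is true.
    """
    ghosts_differ = set(weave_parents) != set(parents)
    first_parent_wrong = bool(
        weave_parents and parents and parents[0] != weave_parents[0])
    if thorough:
        return ghosts_differ or first_parent_wrong
    return ghosts_differ


def _example_topo_order(graph):
    """Return node ids so that every present parent precedes its children."""
    done = set()
    order = []
    remaining = dict(graph)
    while remaining:
        # A node is ready once each parent is emitted or is a ghost.
        ready = sorted(
            node for node, parents in remaining.items()
            if all(p in done or p not in graph for p in parents))
        if not ready:
            raise ValueError('graph contains a cycle')
        for node in ready:
            order.append(node)
            done.add(node)
            del remaining[node]
    return order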


class KnitReconciler(RepoReconciler):
    """Reconciler that reconciles a knit format repository.

    This will detect garbage inventories and remove them in thorough mode.
    """

    def _reconcile_steps(self):
        """Perform the steps to reconcile this repository."""
        if self.thorough:
            try:
                self._load_indexes()
            except errors.BzrCheckError:
                self.aborted = True
                return
            # knits never suffer this
            self._gc_inventory()
            self._fix_text_parents()

    def _load_indexes(self):
        """Load indexes for the reconciliation."""
        self.transaction = self.repo.get_transaction()
        self.pb.update('Reading indexes.', 0, 2)
        self.inventory = self.repo.get_inventory_weave()
        self.pb.update('Reading indexes.', 1, 2)
        self.repo._check_for_inconsistent_revision_parents()
        self.revisions = self.repo._revision_store.get_revision_file(self.transaction)
        self.pb.update('Reading indexes.', 2, 2)

383
    def _gc_inventory(self):
384
        """Remove inventories that are not referenced from the revision store."""
385
        self.pb.update('Checking unused inventories.', 0, 1)
386
        self._check_garbage_inventories()
387
        self.pb.update('Checking unused inventories.', 1, 3)
388
        if not self.garbage_inventories:
389
            self.pb.note('Inventory ok.')
390
            return
391
        self.pb.update('Backing up inventory...', 0, 0)
392
        self.repo.control_weaves.copy(self.inventory, 'inventory.backup', self.transaction)
393
        self.pb.note('Backup Inventory created.')
394
        # asking for '' should never return a non-empty weave
1616.1.1 by Martin Pool
[merge] robertc
395
        new_inventory_vf = self.repo.control_weaves.get_empty('inventory.new',
1594.2.7 by Robert Collins
Add versionedfile.fix_parents api for correcting data post hoc.
396
            self.transaction)
397
398
        # we have topological order of revisions and non ghost parents ready.
1594.2.9 by Robert Collins
Teach Knit repositories how to handle ghosts without corrupting at all.
399
        self._setup_steps(len(self.revisions))
3287.6.1 by Robert Collins
* ``VersionedFile.get_graph`` is deprecated, with no replacement method.
400
        revision_ids = self.revisions.versions()
401
        graph = self.revisions.get_parent_map(revision_ids)
3287.5.2 by Robert Collins
Deprecate VersionedFile.get_parents, breaking pulling from a ghost containing knit or pack repository to weaves, which improves correctness and allows simplification of core code.
402
        for rev_id in TopoSorter(graph.items()).iter_topo_order():
3287.6.1 by Robert Collins
* ``VersionedFile.get_graph`` is deprecated, with no replacement method.
403
            parents = graph[rev_id]
3287.5.2 by Robert Collins
Deprecate VersionedFile.get_parents, breaking pulling from a ghost containing knit or pack repository to weaves, which improves correctness and allows simplification of core code.
404
            # double check this really is in topological order, ignoring existing ghosts.
405
            unavailable = [p for p in parents if p not in new_inventory_vf and
406
                p in self.revisions]
3376.2.4 by Martin Pool
Remove every assert statement from bzrlib!
407
            if unavailable:
408
                raise AssertionError(
409
                    'unavailable parents: %r' % (unavailable,))
1594.2.7 by Robert Collins
Add versionedfile.fix_parents api for correcting data post hoc.
410
            # this entry has all the non ghost parents in the inventory
411
            # file already.
412
            self._reweave_step('adding inventories')
413
            # ugly but needed, weaves are just way tooooo slow else.
3287.5.2 by Robert Collins
Deprecate VersionedFile.get_parents, breaking pulling from a ghost containing knit or pack repository to weaves, which improves correctness and allows simplification of core code.
414
            new_inventory_vf.add_lines_with_ghosts(rev_id, parents,
415
                self.inventory.get_lines(rev_id))
1594.2.7 by Robert Collins
Add versionedfile.fix_parents api for correcting data post hoc.
416
1616.1.1 by Martin Pool
[merge] robertc
417
        # if this worked, the set of new_inventory_vf.versions() should equal
1594.2.7 by Robert Collins
Add versionedfile.fix_parents api for correcting data post hoc.
418
        # the set of self.revisions.versions()
3376.2.4 by Martin Pool
Remove every assert statement from bzrlib!
419
        if set(new_inventory_vf.versions()) != set(self.revisions.versions()):
420
            raise AssertionError()
1594.2.7 by Robert Collins
Add versionedfile.fix_parents api for correcting data post hoc.
421
        self.pb.update('Writing weave')
1616.1.1 by Martin Pool
[merge] robertc
422
        self.repo.control_weaves.copy(new_inventory_vf, 'inventory', self.transaction)
1594.2.7 by Robert Collins
Add versionedfile.fix_parents api for correcting data post hoc.
423
        self.repo.control_weaves.delete('inventory.new', self.transaction)
424
        self.inventory = None
425
        self.pb.note('Inventory regenerated.')
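
The ghost-tolerant ordering check in the loop above can be sketched in isolation. This is a minimal illustration, not bzrlib's implementation: it assumes a plain dict mapping each revision id to its parents, and it swaps in the standard library's `graphlib.TopologicalSorter` for bzrlib's `TopoSorter`. The helper names `topo_order_ignoring_ghosts` and `check_parents_available` are hypothetical.

```python
from graphlib import TopologicalSorter


def topo_order_ignoring_ghosts(parent_map):
    """Return revision ids parents-first, ignoring ghost parents.

    A ghost is a parent that is referenced but has no entry of its own
    in parent_map (an assumption of this sketch).
    """
    present = {
        rev_id: [p for p in parents if p in parent_map]
        for rev_id, parents in parent_map.items()
    }
    return list(TopologicalSorter(present).static_order())


def check_parents_available(order, parent_map):
    """Raise AssertionError if a non-ghost parent has not been added yet."""
    added = set()
    for rev_id in order:
        unavailable = [p for p in parent_map[rev_id]
                       if p in parent_map and p not in added]
        if unavailable:
            raise AssertionError('unavailable parents: %r' % (unavailable,))
        added.add(rev_id)
    return added


parent_map = {'c': ('b', 'ghost'), 'b': ('a',), 'a': ()}
order = topo_order_ignoring_ghosts(parent_map)
added = check_parents_available(order, parent_map)
```

The ghost parent is dropped only for ordering purposes. The stored parent list for `'c'` still names it, which mirrors how the method above passes the full parent list to `add_lines_with_ghosts`.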
426
427
    def _check_garbage_inventories(self):
428
        """Check for garbage inventories which we cannot trust
429
430
        We can't trust them because their pre-requisite file data may not
431
        be present - all we know is that their revision was not installed.
432
        """
433
        inventories = set(self.inventory.versions())
434
        revisions = set(self.revisions.versions())
435
        garbage = inventories.difference(revisions)
436
        self.garbage_inventories = len(garbage)
437
        for revision_id in garbage:
438
            mutter('Garbage inventory {%s} found.', revision_id)
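
The garbage check above is a set difference between inventory ids and revision ids. A minimal standalone sketch, assuming both are plain iterables of ids (the function name `find_garbage_inventories` is hypothetical and not part of bzrlib):

```python
def find_garbage_inventories(inventory_versions, revision_versions):
    """Return inventory ids that have no matching revision."""
    return set(inventory_versions).difference(revision_versions)


garbage = find_garbage_inventories(['r1', 'r2', 'r3'], ['r1', 'r2'])
```

Here `'r3'` has an inventory but no installed revision, so it is the only id reported as garbage.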
2745.6.11 by Aaron Bentley
Fix knit file parents to follow parentage from revision/inventory XML
439
440
    def _fix_text_parents(self):
2745.6.13 by Aaron Bentley
Misc cleanup
441
        """Fix bad versionedfile parent entries.
442
2745.6.16 by Aaron Bentley
Update from review
443
        It is possible for the parents entry in a versionedfile entry to be
2745.6.13 by Aaron Bentley
Misc cleanup
444
        inconsistent with the values in the revision and inventory.
445
446
        This method finds entries with such inconsistencies, corrects their
447
        parent lists, and replaces the versionedfile with a corrected version.
448
        """
2745.6.11 by Aaron Bentley
Fix knit file parents to follow parentage from revision/inventory XML
449
        transaction = self.repo.get_transaction()
2906.1.1 by Andrew Bennetts
Speed up reconcile by not repeatedly fetching the full inventories, by cache heads and parents queries, and by fetching revision trees in batches.
450
        versions = self.revisions.versions()
2927.2.2 by Andrew Bennetts
Only try to check versions that actually exist in the versioned file, and do a little more muttering.
451
        mutter('Prepopulating revision text cache with %d revisions',
452
                len(versions))
3036.1.3 by Robert Collins
Privatise VersionedFileChecker.
453
        vf_checker = self.repo._get_versioned_file_checker()
2988.1.8 by Robert Collins
Change check and reconcile to use the new _generate_text_key_index rather
454
        # List all weaves before altering, to avoid race conditions when we
455
        # delete unused weaves.
456
        weaves = list(enumerate(self.repo.weave_store))
457
        for num, file_id in weaves:
2745.6.12 by Aaron Bentley
Do topological sorting when adding new records to VersionedFile
458
            self.pb.update('Fixing text parents', num,
459
                           len(self.repo.weave_store))
2745.6.11 by Aaron Bentley
Fix knit file parents to follow parentage from revision/inventory XML
460
            vf = self.repo.weave_store.get_weave(file_id, transaction)
2988.1.8 by Robert Collins
Change check and reconcile to use the new _generate_text_key_index rather
461
            versions_with_bad_parents, unused_versions = \
3036.1.2 by Robert Collins
Simplify the check_file_version_parents API some more. This has already changed in this release cycle.
462
                vf_checker.check_file_version_parents(vf, file_id)
2927.2.14 by Andrew Bennetts
Tweaks suggested by review.
463
            if (len(versions_with_bad_parents) == 0 and
2988.1.8 by Robert Collins
Change check and reconcile to use the new _generate_text_key_index rather
464
                len(unused_versions) == 0):
2927.2.14 by Andrew Bennetts
Tweaks suggested by review.
465
                continue
2927.2.3 by Andrew Bennetts
Add fulltexts to avoid bug 155730.
466
            full_text_versions = set()
467
            self._fix_text_parent(file_id, vf, versions_with_bad_parents,
468
                full_text_versions, unused_versions)
2745.6.53 by Andrew Bennetts
Some more changes suggested by review.
469
2927.2.3 by Andrew Bennetts
Add fulltexts to avoid bug 155730.
470
    def _fix_text_parent(self, file_id, vf, versions_with_bad_parents,
471
            full_text_versions, unused_versions):
2745.6.53 by Andrew Bennetts
Some more changes suggested by review.
472
        """Fix bad versionedfile entries in a single versioned file."""
2927.2.2 by Andrew Bennetts
Only try to check versions that actually exist in the versioned file, and do a little more muttering.
473
        mutter('fixing text parent: %r (%d versions)', file_id,
474
                len(versions_with_bad_parents))
2927.2.3 by Andrew Bennetts
Add fulltexts to avoid bug 155730.
475
        mutter('(%d need to be full texts, %d are unused)',
476
                len(full_text_versions), len(unused_versions))
2745.6.53 by Andrew Bennetts
Some more changes suggested by review.
477
        new_vf = self.repo.weave_store.get_empty('temp:%s' % file_id,
478
            self.transaction)
479
        new_parents = {}
480
        for version in vf.versions():
2988.1.8 by Robert Collins
Change check and reconcile to use the new _generate_text_key_index rather
481
            if version in unused_versions:
482
                continue
483
            elif version in versions_with_bad_parents:
2745.6.53 by Andrew Bennetts
Some more changes suggested by review.
484
                parents = versions_with_bad_parents[version][1]
485
            else:
3287.5.2 by Robert Collins
Deprecate VersionedFile.get_parents, breaking pulling from a ghost containing knit or pack repository to weaves, which improves correctness and allows simplification of core code.
486
                parents = vf.get_parent_map([version])[version]
2745.6.53 by Andrew Bennetts
Some more changes suggested by review.
487
            new_parents[version] = parents
2988.1.8 by Robert Collins
Change check and reconcile to use the new _generate_text_key_index rather
488
        if not new_parents:
489
            # No used versions, remove the VF.
490
            self.repo.weave_store.delete(file_id, self.transaction)
491
            return
2592.3.214 by Robert Collins
Merge bzr.dev.
492
        for version in TopoSorter(new_parents.items()).iter_topo_order():
2927.2.3 by Andrew Bennetts
Add fulltexts to avoid bug 155730.
493
            lines = vf.get_lines(version)
494
            parents = new_parents[version]
495
            if parents and (parents[0] in full_text_versions):
2927.2.10 by Andrew Bennetts
More docstrings, elaborate a comment with an XXX, and remove a little bit of cruft.
496
                # Force this record to be a fulltext, not a delta.
2927.2.3 by Andrew Bennetts
Add fulltexts to avoid bug 155730.
497
                new_vf._add(version, lines, parents, False,
498
                    None, None, None, False)
499
            else:
500
                new_vf.add_lines(version, parents, lines)
2745.6.53 by Andrew Bennetts
Some more changes suggested by review.
501
        self.repo.weave_store.copy(new_vf, file_id, self.transaction)
502
        self.repo.weave_store.delete('temp:%s' % file_id, self.transaction)
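
The parent-fixing pass above can be sketched without a repository: drop unused versions, substitute corrected parents, then emit versions parents-first so each can be added to a fresh file. This is an illustration under assumptions, not the bzrlib code: it uses plain dicts, it assumes each `bad_parents` value is a pair whose second item is the corrected parent list (as the `[1]` index above suggests), it uses `graphlib.TopologicalSorter` in place of bzrlib's `TopoSorter`, and it leaves out the forced-fulltext case. The name `plan_rebuild` is hypothetical.

```python
from graphlib import TopologicalSorter


def plan_rebuild(current_parents, bad_parents, unused_versions):
    """Return [(version, parents)] in parents-first order with fixes applied.

    current_parents: dict of version -> parents as currently stored
    bad_parents: dict of version -> (stored_parents, correct_parents)
    unused_versions: set of versions to drop from the rebuilt file
    """
    new_parents = {}
    for version, parents in current_parents.items():
        if version in unused_versions:
            continue
        if version in bad_parents:
            parents = bad_parents[version][1]
        new_parents[version] = tuple(parents)
    # Order only against parents that will exist in the rebuilt file.
    graph = {v: [p for p in ps if p in new_parents]
             for v, ps in new_parents.items()}
    return [(v, new_parents[v])
            for v in TopologicalSorter(graph).static_order()]


current = {'v1': (), 'v2': ('v1',), 'v3': ('v1',), 'junk': ()}
bad = {'v2': (('v1',), ('v3',))}
plan = plan_rebuild(current, bad, {'junk'})
```

An empty plan corresponds to the case above where no used versions remain and the versioned file is deleted instead of rebuilt.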
2745.6.11 by Aaron Bentley
Fix knit file parents to follow parentage from revision/inventory XML
503
2592.3.80 by Robert Collins
Make reconcile work, and pass tests.
504
505
class PackReconciler(RepoReconciler):
506
    """Reconciler that reconciles a pack based repository.
507
508
    Garbage inventories do not affect ancestry queries, and removal is
509
    considerably more expensive as there is no separate versioned file for
510
    them, so they are not cleaned. In short it is currently a no-op.
511
512
    In future this may be a good place to hook in annotation cache checking,
513
    index recreation etc.
514
    """
515
2592.3.239 by Martin Pool
doc
516
    # XXX: The index corruption repair that _fix_text_parents performs is needed for
517
    # packs, but not yet implemented. The basic approach is to:
518
    #  - lock the names list
519
    #  - perform a customised pack() that regenerates data as needed
520
    #  - unlock the names list
521
    # https://bugs.edge.launchpad.net/bzr/+bug/154173
522
2592.3.80 by Robert Collins
Make reconcile work, and pass tests.
523
    def _reconcile_steps(self):
524
        """Perform the steps to reconcile this repository."""
2951.1.2 by Robert Collins
Partial refactoring of pack_repo to create a Packer object for packing.
525
        if not self.thorough:
526
            return
2951.1.3 by Robert Collins
Partial support for native reconcile with packs.
527
        collection = self.repo._pack_collection
528
        collection.ensure_loaded()
529
        collection.lock_names()
2951.1.2 by Robert Collins
Partial refactoring of pack_repo to create a Packer object for packing.
530
        try:
2951.1.3 by Robert Collins
Partial support for native reconcile with packs.
531
            packs = collection.all_packs()
532
            all_revisions = self.repo.all_revision_ids()
533
            total_inventories = len(list(
534
                collection.inventory_index.combined_index.iter_all_entries()))
535
            if len(all_revisions):
536
                self._packer = repofmt.pack_repo.ReconcilePacker(
537
                    collection, packs, ".reconcile", all_revisions)
538
                new_pack = self._packer.pack(pb=self.pb)
539
                if new_pack is not None:
2951.1.10 by Robert Collins
Peer review feedback with Ian.
540
                    self._discard_and_save(packs)
2951.1.3 by Robert Collins
Partial support for native reconcile with packs.
541
            else:
542
                # only make a new pack when there is data to copy.
2951.1.10 by Robert Collins
Peer review feedback with Ian.
543
                self._discard_and_save(packs)
2951.1.3 by Robert Collins
Partial support for native reconcile with packs.
544
            self.garbage_inventories = total_inventories - len(list(
545
                collection.inventory_index.combined_index.iter_all_entries()))
2951.1.2 by Robert Collins
Partial refactoring of pack_repo to create a Packer object for packing.
546
        finally:
2951.1.3 by Robert Collins
Partial support for native reconcile with packs.
547
            collection._unlock_names()
548
2951.1.10 by Robert Collins
Peer review feedback with Ian.
549
    def _discard_and_save(self, packs):
2951.1.3 by Robert Collins
Partial support for native reconcile with packs.
550
        """Discard some packs from the repository.
551
2951.1.10 by Robert Collins
Peer review feedback with Ian.
552
        This removes them from the memory index, saves the in-memory index
553
        which makes the newly reconciled pack visible and hides the packs to be
554
        discarded, and finally renames the packs being discarded into the
2951.1.3 by Robert Collins
Partial support for native reconcile with packs.
555
        obsolete packs directory.
2951.1.10 by Robert Collins
Peer review feedback with Ian.
556
2951.1.3 by Robert Collins
Partial support for native reconcile with packs.
557
        :param packs: The packs to discard.
558
        """
559
        for pack in packs:
560
            self.repo._pack_collection._remove_pack_from_memory(pack)
561
        self.repo._pack_collection._save_pack_names()
562
        self.repo._pack_collection._obsolete_packs(packs)