/brz/remove-bazaar

To get this branch, use:
bzr branch http://gegoxaren.bato24.eu/bzr/brz/remove-bazaar

# Copyright (C) 2005, 2006 Canonical Ltd
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA

"""Reconcilers are able to fix some potential data errors in a branch."""


__all__ = [
    'KnitReconciler',
    'PackReconciler',
    'reconcile',
    'Reconciler',
    'RepoReconciler',
    ]


from bzrlib import (
    errors,
    ui,
    repository,
    repofmt,
    )
from bzrlib.trace import mutter, note
from bzrlib.tsort import TopoSorter


def reconcile(dir, other=None):
    """Reconcile the data in dir.

    Currently this is limited to an inventory 'reweave'.

    This is a convenience function for using a Reconciler object.

    Directly using Reconciler is recommended for library users that
    desire fine-grained control or analysis of the found issues.

    :param other: another bzrdir to reconcile against.
    """
    reconciler = Reconciler(dir, other=other)
    reconciler.reconcile()


class Reconciler(object):
    """Reconcilers are used to reconcile existing data."""

    def __init__(self, dir, other=None):
        """Create a Reconciler."""
        self.bzrdir = dir

    def reconcile(self):
        """Perform reconciliation.

        After reconciliation the following attributes document found issues:
        inconsistent_parents: The number of revisions in the repository whose
                              ancestry was being reported incorrectly.
        garbage_inventories: The number of inventory objects without revisions
                             that were garbage collected.
        fixed_branch_history: None if there was no branch, False if the branch
                              history was correct, True if the branch history
                              needed to be re-normalized.
        """
        self.pb = ui.ui_factory.nested_progress_bar()
        try:
            self._reconcile()
        finally:
            self.pb.finished()

    def _reconcile(self):
        """Helper function for performing reconciliation."""
        self._reconcile_branch()
        self._reconcile_repository()

    def _reconcile_branch(self):
        try:
            self.branch = self.bzrdir.open_branch()
        except errors.NotBranchError:
            # Nothing to check here
            self.fixed_branch_history = None
            return
        self.pb.note('Reconciling branch %s',
                     self.branch.base)
        branch_reconciler = self.branch.reconcile(thorough=True)
        self.fixed_branch_history = branch_reconciler.fixed_history

    def _reconcile_repository(self):
        self.repo = self.bzrdir.find_repository()
        self.pb.note('Reconciling repository %s',
                     self.repo.bzrdir.root_transport.base)
        self.pb.update("Reconciling repository", 0, 1)
        repo_reconciler = self.repo.reconcile(thorough=True)
        self.inconsistent_parents = repo_reconciler.inconsistent_parents
        self.garbage_inventories = repo_reconciler.garbage_inventories
        if repo_reconciler.aborted:
            self.pb.note(
                'Reconcile aborted: revision index has inconsistent parents.')
            self.pb.note(
                'Run "bzr check" for more details.')
        else:
            self.pb.note('Reconciliation complete.')


class BranchReconciler(object):
    """Reconciler that works on a branch."""

    def __init__(self, a_branch, thorough=False):
        self.fixed_history = None
        self.thorough = thorough
        self.branch = a_branch

    def reconcile(self):
        self.branch.lock_write()
        try:
            self.pb = ui.ui_factory.nested_progress_bar()
            try:
                self._reconcile_steps()
            finally:
                self.pb.finished()
        finally:
            self.branch.unlock()

    def _reconcile_steps(self):
        self._reconcile_revision_history()

    def _reconcile_revision_history(self):
        repo = self.branch.repository
        last_revno, last_revision_id = self.branch.last_revision_info()
        real_history = list(repo.iter_reverse_revision_history(
                                last_revision_id))
        real_history.reverse()
        if last_revno != len(real_history):
            self.fixed_history = True
            # Technically for Branch5 formats, it is more efficient to use
            # set_revision_history, as this will regenerate it again.
            # Not really worth a whole BranchReconciler class just for this,
            # though.
            self.pb.note('Fixing last revision info %s => %s',
                         last_revno, len(real_history))
            self.branch.set_last_revision_info(len(real_history),
                                               last_revision_id)
        else:
            self.fixed_history = False
            self.pb.note('revision_history ok.')
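The branch check above rebuilds the branch's history from the tip revision backwards and compares its length with the recorded revno; a mismatch means the last-revision info must be rewritten. A minimal, bzrlib-free sketch of that comparison follows. It assumes the ancestry is a plain dict mapping each revision id to its parent ids, and that the history follows only the first (left-hand) parent. The function names `left_hand_history` and `check_last_revision_info` are hypothetical, not bzrlib API.

```python
from typing import Dict, List, Optional, Tuple


def left_hand_history(parent_map: Dict[str, List[str]], tip: str) -> List[str]:
    """Return the first-parent history from the root up to ``tip`` (hypothetical helper)."""
    history = []
    rev_id: Optional[str] = tip
    # Stop at a root (no parents) or at a revision missing from the map (a ghost).
    while rev_id is not None and rev_id in parent_map:
        history.append(rev_id)
        parents = parent_map[rev_id]
        # Follow only the left-hand (first) parent, as a mainline walk does.
        rev_id = parents[0] if parents else None
    history.reverse()
    return history


def check_last_revision_info(
    parent_map: Dict[str, List[str]], last_revno: int, last_revision_id: str
) -> Tuple[bool, int]:
    """Return (needs_fix, correct_revno) for a branch's recorded tip info."""
    real_revno = len(left_hand_history(parent_map, last_revision_id))
    return last_revno != real_revno, real_revno


if __name__ == "__main__":
    graph = {"a": [], "b": ["a"], "c": ["b", "x"], "x": ["a"]}
    # The branch claims revno 5, but the first-parent chain c -> b -> a has 3 entries.
    print(check_last_revision_info(graph, 5, "c"))  # (True, 3)
```

Only the revno is corrected here, mirroring the original, which keeps `last_revision_id` and passes the recomputed length to `set_last_revision_info`.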


class RepoReconciler(object):
    """Reconciler that reconciles a repository.

    The goal of repository reconciliation is to make any derived data
    consistent with the core data committed by a user. This can involve
    reindexing, or removing unreferenced data if that can interfere with
    queries in a given repository.

    Currently this consists of an inventory reweave with revision cross-checks.
    """

    def __init__(self, repo, other=None, thorough=False):
        """Construct a RepoReconciler.

        :param thorough: perform a thorough check which may take longer but
                         will correct non-data loss issues such as incorrect
                         cached data.
        """
        self.garbage_inventories = 0
        self.inconsistent_parents = 0
        self.aborted = False
        self.repo = repo
        self.thorough = thorough

    def reconcile(self):
        """Perform reconciliation.

        After reconciliation the following attributes document found issues:
        inconsistent_parents: The number of revisions in the repository whose
                              ancestry was being reported incorrectly.
        garbage_inventories: The number of inventory objects without revisions
                             that were garbage collected.
        """
        self.repo.lock_write()
        try:
            self.pb = ui.ui_factory.nested_progress_bar()
            try:
                self._reconcile_steps()
            finally:
                self.pb.finished()
        finally:
            self.repo.unlock()

    def _reconcile_steps(self):
        """Perform the steps to reconcile this repository."""
        self._reweave_inventory()

    def _reweave_inventory(self):
        """Regenerate the inventory weave for the repository from scratch.

        This is a smart function: it will only do the reweave if doing it
        will correct data issues. The self.thorough flag controls whether
        only data-loss causing issues (not self.thorough) or all issues
        (self.thorough) are treated as requiring the reweave.
        """
        # local because needing to know about WeaveFile is a wart we want to hide
        from bzrlib.weave import WeaveFile, Weave
        transaction = self.repo.get_transaction()
        self.pb.update('Reading inventory data.')
        self.inventory = self.repo.get_inventory_weave()
        # the total set of revisions to process
        self.pending = set(self.repo._revision_store.all_revision_ids(transaction))

        # mapping from revision_id to parents
        self._rev_graph = {}
        # errors that we detect
        self.inconsistent_parents = 0
        # we need the revision id of each revision and its available parents list
        self._setup_steps(len(self.pending))
        for rev_id in self.pending:
            # put a revision into the graph.
            self._graph_revision(rev_id)
        self._check_garbage_inventories()
        # if there are no inconsistent_parents and
        # (no garbage inventories or we are not doing a thorough check)
        if (not self.inconsistent_parents and
            (not self.garbage_inventories or not self.thorough)):
            self.pb.note('Inventory ok.')
            return
        self.pb.update('Backing up inventory...', 0, 0)
        self.repo.control_weaves.copy(self.inventory, 'inventory.backup', self.repo.get_transaction())
        self.pb.note('Backup Inventory created.')
        # asking for '' should never return a non-empty weave
        new_inventory_vf = self.repo.control_weaves.get_empty('inventory.new',
            self.repo.get_transaction())

        # we have topological order of revisions and non ghost parents ready.
        self._setup_steps(len(self._rev_graph))
        for rev_id in TopoSorter(self._rev_graph.items()).iter_topo_order():
            parents = self._rev_graph[rev_id]
            # double check this really is in topological order.
            unavailable = [p for p in parents if p not in new_inventory_vf]
            assert len(unavailable) == 0
            # this entry has all the non ghost parents in the inventory
            # file already.
            self._reweave_step('adding inventories')
            if isinstance(new_inventory_vf, WeaveFile):
                # It's really a WeaveFile, but we call straight into the
                # Weave's add method to disable the auto-write-out behaviour.
                # This is done to avoid a revision_count * time-to-write additional overhead on
                # reconcile.
                new_inventory_vf._check_write_ok()
                Weave._add_lines(new_inventory_vf, rev_id, parents,
                    self.inventory.get_lines(rev_id), None, None, None, False, True)
            else:
                new_inventory_vf.add_lines(rev_id, parents, self.inventory.get_lines(rev_id))

        if isinstance(new_inventory_vf, WeaveFile):
            new_inventory_vf._save()
        # if this worked, the set of new_inventory_vf.versions() should equal
        # self.pending
        assert set(new_inventory_vf.versions()) == self.pending
        self.pb.update('Writing weave')
        self.repo.control_weaves.copy(new_inventory_vf, 'inventory', self.repo.get_transaction())
        self.repo.control_weaves.delete('inventory.new', self.repo.get_transaction())
        self.inventory = None
        self.pb.note('Inventory regenerated.')

    def _setup_steps(self, new_total):
        """Setup the markers we need to control the progress bar."""
        self.total = new_total
        self.count = 0

    def _graph_revision(self, rev_id):
        """Load a revision into the revision graph."""
        # pick a random revision
        # analyse revision id rev_id and put it in the stack.
        self._reweave_step('loading revisions')
        rev = self.repo.get_revision_reconcile(rev_id)
        assert rev.revision_id == rev_id
        parents = []
        for parent in rev.parent_ids:
            if self._parent_is_available(parent):
                parents.append(parent)
            else:
                mutter('found ghost %s', parent)
        self._rev_graph[rev_id] = parents
        if self._parents_are_inconsistent(rev_id, parents):
            self.inconsistent_parents += 1
            mutter('Inconsistent inventory parents: id {%s} '
                   'inventory claims %r, '
                   'available parents are %r, '
                   'unavailable parents are %r',
                   rev_id,
                   set(self.inventory.get_parent_map([rev_id])[rev_id]),
                   set(parents),
                   set(rev.parent_ids).difference(set(parents)))

    def _parents_are_inconsistent(self, rev_id, parents):
        """Return True if the parents list of rev_id does not match the weave.

        This detects inconsistencies based on the self.thorough value:
        if thorough is on, the first parent value is checked as well as ghost
        differences.
        Otherwise only the ghost differences are evaluated.
        """
        weave_parents = self.inventory.get_parent_map([rev_id])[rev_id]
        weave_missing_old_ghosts = set(weave_parents) != set(parents)
        first_parent_is_wrong = (
            len(weave_parents) and len(parents) and
            parents[0] != weave_parents[0])
        if self.thorough:
            return weave_missing_old_ghosts or first_parent_is_wrong
        else:
            return weave_missing_old_ghosts

    def _check_garbage_inventories(self):
        """Check for garbage inventories which we cannot trust.

        We can't trust them because their prerequisite file data may not
        be present - all we know is that their revision was not installed.
        """
        if not self.thorough:
            return
        inventories = set(self.inventory.versions())
        revisions = set(self._rev_graph.keys())
        garbage = inventories.difference(revisions)
        self.garbage_inventories = len(garbage)
        for revision_id in garbage:
            mutter('Garbage inventory {%s} found.', revision_id)

    def _parent_is_available(self, parent):
        """True if parent is a fully available revision.

        A fully available revision has an inventory and a revision object in the
        repository.
        """
        return (parent in self._rev_graph or
                (parent in self.inventory and self.repo.has_revision(parent)))

    def _reweave_step(self, message):
        """Mark a single step of regeneration complete."""
        self.pb.update(message, self.count, self.total)
        self.count += 1
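RepoReconciler combines several graph checks: ghost parents are dropped, a revision is flagged when its available parents disagree with the inventory weave (a set difference always counts; a first-parent mismatch counts only in thorough mode), inventories with no matching revision are counted as garbage, and the inventory is rebuilt so that every parent is added before its children. The sketch below restates those checks over plain dicts and sets. It assumes a parent counts as "available" when it appears in a given set, which simplifies `_parent_is_available`. Kahn's algorithm stands in for `bzrlib.tsort.TopoSorter`, and all function names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def available_parents(parent_ids: List[str], known: Set[str]) -> List[str]:
    """Drop ghost parents, keeping the original order of the rest."""
    return [p for p in parent_ids if p in known]


def parents_are_inconsistent(weave_parents: List[str], parents: List[str],
                             thorough: bool) -> bool:
    """Plain-list restatement of the _parents_are_inconsistent rule."""
    sets_differ = set(weave_parents) != set(parents)
    first_parent_wrong = bool(weave_parents and parents
                              and parents[0] != weave_parents[0])
    if thorough:
        return sets_differ or first_parent_wrong
    return sets_differ


def garbage_inventories(inventory_ids: Set[str], revision_ids: Set[str]) -> Set[str]:
    """Inventories that have no matching revision object."""
    return inventory_ids - revision_ids


def topo_order(graph: Dict[str, List[str]]) -> List[str]:
    """Kahn's algorithm: each parent present in ``graph`` precedes its children."""
    children: Dict[str, List[str]] = {rev: [] for rev in graph}
    pending_parents = {rev: 0 for rev in graph}
    for rev, parents in graph.items():
        for p in parents:
            if p in graph:  # parents outside the graph (ghosts) impose no ordering
                children[p].append(rev)
                pending_parents[rev] += 1
    ready = deque(sorted(rev for rev, n in pending_parents.items() if n == 0))
    order = []
    while ready:
        rev = ready.popleft()
        order.append(rev)
        for child in children[rev]:
            pending_parents[child] -= 1
            if pending_parents[child] == 0:
                ready.append(child)
    if len(order) != len(graph):
        raise ValueError("cycle in revision graph")
    return order


if __name__ == "__main__":
    revisions = {"a": [], "b": ["a", "ghost"], "c": ["b"]}
    rev_graph = {r: available_parents(ps, set(revisions)) for r, ps in revisions.items()}
    print(rev_graph)  # {'a': [], 'b': ['a'], 'c': ['b']}
    print(topo_order(rev_graph))  # ['a', 'b', 'c']
    print(garbage_inventories({"a", "b", "c", "old"}, set(revisions)))  # {'old'}
```

Adding revisions in this order is what lets the original assert that no parent is unavailable when each inventory is written to `inventory.new`.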
1594.2.7 by Robert Collins
Add versionedfile.fix_parents api for correcting data post hoc.
351
352
353
class KnitReconciler(RepoReconciler):
354
    """Reconciler that reconciles a knit format repository.
355
2592.3.80 by Robert Collins
Make reconcile work, and pass tests.
356
    This will detect garbage inventories and remove them in thorough mode.
1594.2.7 by Robert Collins
Add versionedfile.fix_parents api for correcting data post hoc.
357
    """
358
359
    def _reconcile_steps(self):
360
        """Perform the steps to reconcile this repository."""
1692.1.1 by Robert Collins
* Repository.reconcile now takes a thorough keyword parameter to allow
361
        if self.thorough:
2819.2.5 by Andrew Bennetts
Make reconcile abort gracefully if the revision index has bad parents.
362
            try:
363
                self._load_indexes()
364
            except errors.BzrCheckError:
365
                self.aborted = True
366
                return
1692.1.1 by Robert Collins
* Repository.reconcile now takes a thorough keyword parameter to allow
367
            # knits never suffer this
368
            self._gc_inventory()
2745.6.13 by Aaron Bentley
Misc cleanup
369
            self._fix_text_parents()
1594.2.7 by Robert Collins
Add versionedfile.fix_parents api for correcting data post hoc.
370
371
    def _load_indexes(self):
372
        """Load indexes for the reconciliation."""
373
        self.transaction = self.repo.get_transaction()
374
        self.pb.update('Reading indexes.', 0, 2)
375
        self.inventory = self.repo.get_inventory_weave()
376
        self.pb.update('Reading indexes.', 1, 2)
2819.2.5 by Andrew Bennetts
Make reconcile abort gracefully if the revision index has bad parents.
377
        self.repo._check_for_inconsistent_revision_parents()
1594.2.7 by Robert Collins
Add versionedfile.fix_parents api for correcting data post hoc.
378
        self.revisions = self.repo._revision_store.get_revision_file(self.transaction)
379
        self.pb.update('Reading indexes.', 2, 2)
380
381
    def _gc_inventory(self):
382
        """Remove inventories that are not referenced from the revision store."""
383
        self.pb.update('Checking unused inventories.', 0, 1)
384
        self._check_garbage_inventories()
385
        self.pb.update('Checking unused inventories.', 1, 3)
386
        if not self.garbage_inventories:
387
            self.pb.note('Inventory ok.')
388
            return
389
        self.pb.update('Backing up inventory...', 0, 0)
390
        self.repo.control_weaves.copy(self.inventory, 'inventory.backup', self.transaction)
391
        self.pb.note('Backup Inventory created.')
392
        # asking for '' should never return a non-empty weave
1616.1.1 by Martin Pool
[merge] robertc
393
        new_inventory_vf = self.repo.control_weaves.get_empty('inventory.new',
1594.2.7 by Robert Collins
Add versionedfile.fix_parents api for correcting data post hoc.
394
            self.transaction)
395
396
        # we have topological order of revisions and non ghost parents ready.
1594.2.9 by Robert Collins
Teach Knit repositories how to handle ghosts without corrupting at all.
397
        self._setup_steps(len(self.revisions))
3287.6.1 by Robert Collins
* ``VersionedFile.get_graph`` is deprecated, with no replacement method.
398
        revision_ids = self.revisions.versions()
399
        graph = self.revisions.get_parent_map(revision_ids)
3287.5.2 by Robert Collins
Deprecate VersionedFile.get_parents, breaking pulling from a ghost containing knit or pack repository to weaves, which improves correctness and allows simplification of core code.
400
        for rev_id in TopoSorter(graph.items()).iter_topo_order():
3287.6.1 by Robert Collins
* ``VersionedFile.get_graph`` is deprecated, with no replacement method.
401
            parents = graph[rev_id]
3287.5.2 by Robert Collins
Deprecate VersionedFile.get_parents, breaking pulling from a ghost containing knit or pack repository to weaves, which improves correctness and allows simplification of core code.
402
            # double check this really is in topological order, ignoring existing ghosts.
403
            unavailable = [p for p in parents if p not in new_inventory_vf and
404
                p in self.revisions]
1594.2.7 by Robert Collins
Add versionedfile.fix_parents api for correcting data post hoc.
405
            assert len(unavailable) == 0
406
            # this entry has all the non ghost parents in the inventory
407
            # file already.
408
            self._reweave_step('adding inventories')
409
            # ugly but needed, weaves are just way tooooo slow else.
3287.5.2 by Robert Collins
Deprecate VersionedFile.get_parents, breaking pulling from a ghost containing knit or pack repository to weaves, which improves correctness and allows simplification of core code.
410
            new_inventory_vf.add_lines_with_ghosts(rev_id, parents,
411
                self.inventory.get_lines(rev_id))
1594.2.7 by Robert Collins
Add versionedfile.fix_parents api for correcting data post hoc.
412
1616.1.1 by Martin Pool
[merge] robertc
413
        # if this worked, the set of new_inventory_vf.names should equal
1594.2.7 by Robert Collins
Add versionedfile.fix_parents api for correcting data post hoc.
        # self.pending
1616.1.1 by Martin Pool
[merge] robertc
        assert set(new_inventory_vf.versions()) == set(self.revisions.versions())
1594.2.7 by Robert Collins
Add versionedfile.fix_parents api for correcting data post hoc.
        self.pb.update('Writing weave')
1616.1.1 by Martin Pool
[merge] robertc
        self.repo.control_weaves.copy(new_inventory_vf, 'inventory', self.transaction)
1594.2.7 by Robert Collins
Add versionedfile.fix_parents api for correcting data post hoc.
        self.repo.control_weaves.delete('inventory.new', self.transaction)
        self.inventory = None
        self.pb.note('Inventory regenerated.')
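The reweave above walks revisions through `TopoSorter` so that each inventory is added only after its parents, while ghosts (parents with no revision) are skipped. A minimal standalone sketch of that ordering using Kahn's algorithm over a plain dict parent map; `topo_order` and `reweave` are hypothetical helpers rather than bzrlib APIs, and a dict stands in for the new inventory versioned file.

```python
from collections import deque


def topo_order(parent_map):
    """Yield keys so that every parent present in parent_map comes first.

    Parents missing from parent_map are treated as ghosts and ignored.
    """
    children = {key: [] for key in parent_map}
    waiting_on = {}
    for key, parents in parent_map.items():
        present = [p for p in parents if p in parent_map]
        waiting_on[key] = len(present)
        for parent in present:
            children[parent].append(key)
    # Start with the roots: keys that have no non-ghost parents.
    ready = deque(key for key, count in waiting_on.items() if count == 0)
    while ready:
        key = ready.popleft()
        yield key
        for child in children[key]:
            waiting_on[child] -= 1
            if waiting_on[child] == 0:
                ready.append(child)


def reweave(parent_map, get_lines):
    """Rebuild a toy inventory store: {revision_id: (parents, lines)}."""
    store = {}
    for rev_id in topo_order(parent_map):
        parents = tuple(parent_map[rev_id])
        # Double-check the ordering: every non-ghost parent is already stored.
        unavailable = [p for p in parents if p in parent_map and p not in store]
        if unavailable:
            raise AssertionError('out of order: %r' % (unavailable,))
        store[rev_id] = (parents, get_lines(rev_id))
    if set(store) != set(parent_map):
        # Keys caught in a cycle never become ready, so they are missing.
        raise ValueError('parent graph contains a cycle')
    return store


if __name__ == '__main__':
    graph = {'rev-a': (), 'rev-b': ('rev-a', 'ghost'), 'rev-c': ('rev-b', 'rev-a')}
    store = reweave(graph, lambda rev_id: ['inventory of %s\n' % rev_id])
    print(list(store))
```

The ghost parent `'ghost'` is kept in the stored parent tuple (as `add_lines_with_ghosts` would keep it) but plays no part in the ordering.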
    def _check_garbage_inventories(self):
        """Check for garbage inventories that we cannot trust.

        We can't trust them because their prerequisite file data may not
        be present; all we know is that their revision was not installed.
        """
        inventories = set(self.inventory.versions())
        revisions = set(self.revisions.versions())
        garbage = inventories.difference(revisions)
        self.garbage_inventories = len(garbage)
        for revision_id in garbage:
            mutter('Garbage inventory {%s} found.', revision_id)
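`_check_garbage_inventories` boils down to a one-directional set difference: an inventory whose revision id has no matching revision is untrusted, while a revision without an inventory is not counted. A minimal sketch over plain iterables of ids; `check_garbage_inventories` is a hypothetical helper, and the standard `logging` module stands in for bzrlib's `mutter`.

```python
import logging

log = logging.getLogger(__name__)


def check_garbage_inventories(inventory_ids, revision_ids):
    """Return the sorted ids of inventories that have no revision.

    Revisions without an inventory are not garbage in this sense, so the
    difference is taken in one direction only.
    """
    garbage = sorted(set(inventory_ids).difference(revision_ids))
    for revision_id in garbage:
        log.debug('Garbage inventory {%s} found.', revision_id)
    return garbage


if __name__ == '__main__':
    print(check_garbage_inventories(['r1', 'r2', 'r3'], ['r1', 'r3']))
```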
2745.6.11 by Aaron Bentley
Fix knit file parents to follow parentage from revision/inventory XML

    def _fix_text_parents(self):
2745.6.13 by Aaron Bentley
Misc cleanup
        """Fix bad versionedfile parent entries.

2745.6.16 by Aaron Bentley
Update from review
        It is possible for the parents entry in a versionedfile entry to be
2745.6.13 by Aaron Bentley
Misc cleanup
        inconsistent with the values in the revision and inventory.

        This method finds entries with such inconsistencies, corrects their
        parent lists, and replaces the versionedfile with a corrected version.
        """
2745.6.11 by Aaron Bentley
Fix knit file parents to follow parentage from revision/inventory XML
        transaction = self.repo.get_transaction()
2906.1.1 by Andrew Bennetts
Speed up reconcile by not repeatedly fetching the full inventories, by cache heads and parents queries, and by fetching revision trees in batches.
        versions = self.revisions.versions()
2927.2.2 by Andrew Bennetts
Only try to check versions that actually exist in the versioned file, and do a little more muttering.
        mutter('Prepopulating revision text cache with %d revisions',
                len(versions))
3036.1.3 by Robert Collins
Privatise VersionedFileChecker.
        vf_checker = self.repo._get_versioned_file_checker()
2988.1.8 by Robert Collins
Change check and reconcile to use the new _generate_text_key_index rather
        # List all weaves before altering, to avoid race conditions when we
        # delete unused weaves.
        weaves = list(enumerate(self.repo.weave_store))
        for num, file_id in weaves:
2745.6.12 by Aaron Bentley
Do topological sorting when adding new records to VersionedFile
            self.pb.update('Fixing text parents', num,
                           len(self.repo.weave_store))
2745.6.11 by Aaron Bentley
Fix knit file parents to follow parentage from revision/inventory XML
            vf = self.repo.weave_store.get_weave(file_id, transaction)
2988.1.8 by Robert Collins
Change check and reconcile to use the new _generate_text_key_index rather
            versions_with_bad_parents, unused_versions = \
3036.1.2 by Robert Collins
Simplify the check_file_version_parents API some more. This has already changed in this release cycle.
                vf_checker.check_file_version_parents(vf, file_id)
2927.2.14 by Andrew Bennetts
Tweaks suggested by review.
            if (len(versions_with_bad_parents) == 0 and
2988.1.8 by Robert Collins
Change check and reconcile to use the new _generate_text_key_index rather
                len(unused_versions) == 0):
2927.2.14 by Andrew Bennetts
Tweaks suggested by review.
                continue
2927.2.3 by Andrew Bennetts
Add fulltexts to avoid bug 155730.
            full_text_versions = set()
            self._fix_text_parent(file_id, vf, versions_with_bad_parents,
                full_text_versions, unused_versions)
2745.6.53 by Andrew Bennetts
Some more changes suggested by review.

2927.2.3 by Andrew Bennetts
Add fulltexts to avoid bug 155730.
    def _fix_text_parent(self, file_id, vf, versions_with_bad_parents,
            full_text_versions, unused_versions):
2745.6.53 by Andrew Bennetts
Some more changes suggested by review.
        """Fix bad versionedfile entries in a single versioned file."""
2927.2.2 by Andrew Bennetts
Only try to check versions that actually exist in the versioned file, and do a little more muttering.
        mutter('fixing text parent: %r (%d versions)', file_id,
                len(versions_with_bad_parents))
2927.2.3 by Andrew Bennetts
Add fulltexts to avoid bug 155730.
        mutter('(%d need to be full texts, %d are unused)',
                len(full_text_versions), len(unused_versions))
2745.6.53 by Andrew Bennetts
Some more changes suggested by review.
        new_vf = self.repo.weave_store.get_empty('temp:%s' % file_id,
            self.transaction)
        new_parents = {}
        for version in vf.versions():
2988.1.8 by Robert Collins
Change check and reconcile to use the new _generate_text_key_index rather
            if version in unused_versions:
                continue
            elif version in versions_with_bad_parents:
2745.6.53 by Andrew Bennetts
Some more changes suggested by review.
                parents = versions_with_bad_parents[version][1]
            else:
3287.5.2 by Robert Collins
Deprecate VersionedFile.get_parents, breaking pulling from a ghost containing knit or pack repository to weaves, which improves correctness and allows simplification of core code.
                parents = vf.get_parent_map([version])[version]
2745.6.53 by Andrew Bennetts
Some more changes suggested by review.
            new_parents[version] = parents
2988.1.8 by Robert Collins
Change check and reconcile to use the new _generate_text_key_index rather
        if not len(new_parents):
            # No used versions remain; remove the VF.
            self.repo.weave_store.delete(file_id, self.transaction)
            return
2592.3.214 by Robert Collins
Merge bzr.dev.
        for version in TopoSorter(new_parents.items()).iter_topo_order():
2927.2.3 by Andrew Bennetts
Add fulltexts to avoid bug 155730.
            lines = vf.get_lines(version)
            parents = new_parents[version]
            if parents and (parents[0] in full_text_versions):
2927.2.10 by Andrew Bennetts
More docstrings, elaborate a comment with an XXX, and remove a little bit of cruft.
                # Force this record to be a fulltext, not a delta.
2927.2.3 by Andrew Bennetts
Add fulltexts to avoid bug 155730.
                new_vf._add(version, lines, parents, False,
                    None, None, None, False)
            else:
                new_vf.add_lines(version, parents, lines)
2745.6.53 by Andrew Bennetts
Some more changes suggested by review.
        self.repo.weave_store.copy(new_vf, file_id, self.transaction)
        self.repo.weave_store.delete('temp:%s' % file_id, self.transaction)
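`_fix_text_parent` drops unused versions, swaps in the corrected parents from `versions_with_bad_parents`, deletes the file when nothing survives, and re-adds the remaining versions in topological order. A minimal sketch of that logic over plain dicts, assuming Python 3.9+ so that `graphlib.TopologicalSorter` can stand in for bzrlib's `TopoSorter`; `rebuild_versioned_file` and its dict-shaped arguments are hypothetical stand-ins for the weave store and versioned file objects, and the forced-fulltext path is deliberately left out.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def rebuild_versioned_file(old_lines, old_parents, bad_parents, unused):
    """Return {version: (parents, lines)} in insertion order, or None.

    old_lines:   version -> text lines stored in the old file.
    old_parents: version -> parents as currently recorded.
    bad_parents: version -> (recorded, correct) pair, shaped like
                 versions_with_bad_parents; index 1 holds the fix.
    unused:      versions that no revision refers to any more.
    None means nothing is left and the file should be deleted.
    """
    new_parents = {}
    for version in old_lines:
        if version in unused:
            continue
        elif version in bad_parents:
            new_parents[version] = tuple(bad_parents[version][1])
        else:
            new_parents[version] = tuple(old_parents[version])
    if not new_parents:
        return None
    # Order only on parents that survive; any others act as ghosts here.
    graph = {version: [p for p in parents if p in new_parents]
             for version, parents in new_parents.items()}
    new_file = {}
    # static_order() yields every version after all of its predecessors.
    for version in TopologicalSorter(graph).static_order():
        new_file[version] = (new_parents[version], old_lines[version])
    return new_file


if __name__ == '__main__':
    rebuilt = rebuild_versioned_file(
        old_lines={'v1': ['a\n'], 'v2': ['a\n', 'b\n'], 'v3': ['x\n']},
        old_parents={'v1': (), 'v2': (), 'v3': ('v1',)},
        bad_parents={'v2': ((), ('v1',))},  # v2 should descend from v1
        unused={'v3'},
    )
    print(rebuilt)
```

Building a fresh container and copying it over the original, rather than editing parents in place, mirrors the `temp:` file plus `copy`/`delete` pattern in the method above.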
2745.6.11 by Aaron Bentley
Fix knit file parents to follow parentage from revision/inventory XML

2592.3.80 by Robert Collins
Make reconcile work, and pass tests.

class PackReconciler(RepoReconciler):
    """Reconciler that reconciles a pack-based repository.

    Garbage inventories do not affect ancestry queries, and removal is
    considerably more expensive as there is no separate versioned file for
    them, so they are not cleaned. In short, a non-thorough reconcile is
    currently a no-op.

    In future this may be a good place to hook in annotation cache checking,
    index recreation etc.
    """

2592.3.239 by Martin Pool
doc
    # XXX: The index correction that _fix_text_parents performs is needed for
    # packs, but not yet implemented. The basic approach is to:
    #  - lock the names list
    #  - perform a customised pack() that regenerates data as needed
    #  - unlock the names list
    # https://bugs.edge.launchpad.net/bzr/+bug/154173

2592.3.80 by Robert Collins
Make reconcile work, and pass tests.
    def _reconcile_steps(self):
        """Perform the steps to reconcile this repository."""
2951.1.2 by Robert Collins
Partial refactoring of pack_repo to create a Packer object for packing.
        if not self.thorough:
            return
2951.1.3 by Robert Collins
Partial support for native reconcile with packs.
        collection = self.repo._pack_collection
        collection.ensure_loaded()
        collection.lock_names()
2951.1.2 by Robert Collins
Partial refactoring of pack_repo to create a Packer object for packing.
        try:
2951.1.3 by Robert Collins
Partial support for native reconcile with packs.
            packs = collection.all_packs()
            all_revisions = self.repo.all_revision_ids()
            total_inventories = len(list(
                collection.inventory_index.combined_index.iter_all_entries()))
            if len(all_revisions):
                self._packer = repofmt.pack_repo.ReconcilePacker(
                    collection, packs, ".reconcile", all_revisions)
                new_pack = self._packer.pack(pb=self.pb)
                if new_pack is not None:
2951.1.10 by Robert Collins
Peer review feedback with Ian.
                    self._discard_and_save(packs)
2951.1.3 by Robert Collins
Partial support for native reconcile with packs.
            else:
                # Only make a new pack when there is data to copy; with no
                # revisions, just discard the existing packs.
2951.1.10 by Robert Collins
Peer review feedback with Ian.
                self._discard_and_save(packs)
2951.1.3 by Robert Collins
Partial support for native reconcile with packs.
            self.garbage_inventories = total_inventories - len(list(
                collection.inventory_index.combined_index.iter_all_entries()))
2951.1.2 by Robert Collins
Partial refactoring of pack_repo to create a Packer object for packing.
        finally:
2951.1.3 by Robert Collins
Partial support for native reconcile with packs.
            collection._unlock_names()
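In `_reconcile_steps` the garbage count is simply the number of inventory index entries before the repack minus the number after, computed while the names list is locked, with the unlock in a `finally` clause. A toy sketch of that bookkeeping; `ToyPackCollection` and `thorough_reconcile` are hypothetical, and modelling the repack as "keep only inventories that have a matching revision" is an assumption about what `ReconcilePacker` retains, not a description of it.

```python
class ToyPackCollection:
    """Hypothetical stand-in for a pack collection's inventory index."""

    def __init__(self, inventory_keys):
        self.inventory_keys = set(inventory_keys)
        self.names_locked = False

    def lock_names(self):
        if self.names_locked:
            raise RuntimeError('pack names are already locked')
        self.names_locked = True

    def unlock_names(self):
        self.names_locked = False


def thorough_reconcile(collection, revision_ids, thorough=True):
    """Repack and return how many garbage inventories were dropped."""
    if not thorough:
        # A non-thorough reconcile does nothing; report zero for simplicity.
        return 0
    collection.lock_names()
    try:
        total_inventories = len(collection.inventory_keys)
        # Model of the repack: keep inventories that have a revision.
        # With no revisions at all, every inventory is discarded.
        collection.inventory_keys &= set(revision_ids)
        return total_inventories - len(collection.inventory_keys)
    finally:
        # Release the names lock even if the repack raises.
        collection.unlock_names()


if __name__ == '__main__':
    packs = ToyPackCollection(['r1', 'r2', 'orphan'])
    print(thorough_reconcile(packs, ['r1', 'r2']), packs.names_locked)
```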
2951.1.10 by Robert Collins
Peer review feedback with Ian.
    def _discard_and_save(self, packs):
2951.1.3 by Robert Collins
Partial support for native reconcile with packs.
        """Discard some packs from the repository.
2951.1.10 by Robert Collins
Peer review feedback with Ian.

        This removes them from the in-memory index, saves that index (which
        makes the newly reconciled pack visible and hides the packs being
        discarded), and finally renames the discarded packs into the
2951.1.3 by Robert Collins
Partial support for native reconcile with packs.
        obsolete packs directory.
2951.1.10 by Robert Collins
Peer review feedback with Ian.

2951.1.3 by Robert Collins
Partial support for native reconcile with packs.
        :param packs: The packs to discard.
        """
        for pack in packs:
            self.repo._pack_collection._remove_pack_from_memory(pack)
        self.repo._pack_collection._save_pack_names()
        self.repo._pack_collection._obsolete_packs(packs)
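The `_discard_and_save` docstring fixes an order of operations: drop the packs from memory, persist the new names list so readers stop seeing them, and only then move the files into the obsolete directory. A hypothetical file-based sketch of that ordering; the plain-text `pack-names` file, the directory names and `discard_and_save` itself are simplifications, not bzr's on-disk format. `os.replace` is used for every rename because, on POSIX, the rename is atomic when both paths are on the same filesystem.

```python
import os


def discard_and_save(repo_dir, live_names, discard):
    """Hide the packs in `discard`, then move their files aside.

    repo_dir is assumed to hold 'packs/', 'obsolete_packs/' and a
    'pack-names' text file listing one live pack per line.
    live_names is the in-memory set of pack names; it is updated in place.
    """
    # 1. Remove the packs from the in-memory view.
    for name in discard:
        live_names.discard(name)
    # 2. Save the names list via a temporary file and an atomic rename,
    #    so readers see either the old list or the new one, never half.
    names_path = os.path.join(repo_dir, 'pack-names')
    tmp_path = names_path + '.tmp'
    with open(tmp_path, 'w') as names_file:
        for name in sorted(live_names):
            names_file.write(name + '\n')
    os.replace(tmp_path, names_path)
    # 3. Only now move the discarded pack files into the obsolete directory.
    for name in discard:
        os.replace(os.path.join(repo_dir, 'packs', name),
                   os.path.join(repo_dir, 'obsolete_packs', name))
```

Saving the names before moving the files means a concurrent reader never finds a listed pack whose file has already gone.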