bzr branch
http://gegoxaren.bato24.eu/bzr/brz/remove-bazaar
| 
2511.1.4
by Ian Clatworthy
 updated NEWS and added commit performance notes to doc/developers  | 
1  | 
Commit Performance Notes  | 
2  | 
========================  | 
|
3  | 
||
| 
2541.2.1
by Robert Collins
 Document contributing to the performance drive somewhat.  | 
4  | 
.. contents:: :local:  | 
| 
2511.1.4
by Ian Clatworthy
 updated NEWS and added commit performance notes to doc/developers  | 
5  | 
|
| 
2513.1.5
by Martin Pool
 More analysis of commit  | 
6  | 
Changes to commit  | 
7  | 
-----------------  | 
|
8  | 
||
9  | 
We want to improve the commit code in two phases.  | 
|
10  | 
||
11  | 
Phase one is to have a better separation from the format-specific logic,  | 
|
12  | 
the user interface, and the general process of committing.  | 
|
13  | 
||
14  | 
Phase two is to have better interfaces by which a good workingtree format  | 
|
15  | 
can efficiently pass data to a good storage format. If we get phase one  | 
|
16  | 
right, it will be relatively easy and non-disruptive to bring this in.  | 
|
17  | 
||
18  | 
||
| 
2511.1.4
by Ian Clatworthy
 updated NEWS and added commit performance notes to doc/developers  | 
19  | 
Commit: The Minimum Work Required  | 
20  | 
---------------------------------  | 
|
21  | 
||
22  | 
Here is a description of the minimum work that commit must do. We  | 
|
23  | 
want to make sure that our design doesn't cost too much more than this  | 
|
24  | 
minimum. I am trying to do this without making too many assumptions  | 
|
25  | 
about the underlying storage, but am assuming that the ui and basic  | 
|
26  | 
architecture (wt, branch, repo) stays about the same.  | 
|
27  | 
||
28  | 
The basic purpose of commit is to:  | 
|
29  | 
||
30  | 
1. create and store a new revision based on the contents of the working tree  | 
|
31  | 
2. make this the new basis revision for the working tree  | 
|
32  | 
||
33  | 
We can do a selected commit of only some files or subtrees.  | 
|
34  | 
||
35  | 
The best performance we could hope for is:  | 
|
36  | 
- stat each versioned selected working file once  | 
|
37  | 
- read from the workingtree and write into the repository any new file texts  | 
|
38  | 
- in general, do work proportional to the size of the shape (eg  | 
|
39  | 
inventory) of the old and new selected trees, and to the total size of  | 
|
40  | 
the modified files  | 
|
41  | 
||
42  | 
In more detail:  | 
|
43  | 
||
44  | 
1.0 - Store new file texts: if a versioned file contains a new text  | 
|
45  | 
there is no avoiding storing it. To determine which ones have changed  | 
|
46  | 
we must go over the workingtree and at least stat each file. If the  | 
|
47  | 
file is modified since it was last hashed, it must be read in.  | 
|
48  | 
Ideally we would read it only once, and either notice that it has not  | 
|
49  | 
changed, or store it at that point.  | 
|
50  | 
||
51  | 
On the other hand we want new code to be able to handle files that are  | 
|
52  | 
larger than will fit in memory. We may then need to read each file up  | 
|
53  | 
to two times: once to determine if there is a new text and calculate  | 
|
54  | 
its hash, and again to store it.  | 
|
55  | 
||
56  | 
1.1 - Store a tree-shape description (ie inventory or similar.) This  | 
|
57  | 
describes the non-file objects, and provides a reference from the  | 
|
58  | 
Revision to the texts within it.  | 
|
59  | 
||
60  | 
1.2 - Generate and store a new revision object.  | 
|
61  | 
||
62  | 
1.3 - Do delta-compression on the stored objects. (git notably does  | 
|
63  | 
not do this at commit time, deferring this entirely until later.)  | 
|
64  | 
This requires finding the appropriate basis for each modified file: in  | 
|
65  | 
the current scheme we get the file id, last-revision from the  | 
|
66  | 
dirstate, look into the knit for that text, extract that text in  | 
|
67  | 
total, generate a delta, then store that into the knit. Most delta  | 
|
| 
2513.1.2
by Martin Pool
 Remove duplicated commit use case documentation  | 
68  | 
operations are O(n**2) to O(n**3) in the size of the modified files.  | 
| 
2511.1.4
by Ian Clatworthy
 updated NEWS and added commit performance notes to doc/developers  | 
69  | 
|
70  | 
1.4 - Cache annotation information for the changes: at the moment this  | 
|
71  | 
is done as part of the delta storage. There are some flaws in that  | 
|
72  | 
approach, such as that it is not updated when ghosts are filled, and  | 
|
73  | 
the annotation can't be re-run with new diff parameters.  | 
|
74  | 
||
75  | 
2.1 - Make the new revision the basis for the tree, and clear the list  | 
|
76  | 
of parents. Strictly this is all that's logically necessary, unless  | 
|
77  | 
the working tree format requires more work.  | 
|
78  | 
||
79  | 
The dirstate format does require more work, because it caches the  | 
|
80  | 
parent tree data for each file within the working tree data. In  | 
|
81  | 
practice this means that every commit rewrites the entire dirstate  | 
|
82  | 
file - we could try to avoid rewriting the whole file but this may be  | 
|
83  | 
difficult because variable-length data (the last-changed revision id)  | 
|
84  | 
is inserted into many rows.  | 
|
85  | 
||
86  | 
The current dirstate design then seems to mean that any commit of a  | 
|
87  | 
single file imposes a cost proportional to the size of the current  | 
|
88  | 
workingtree. Maybe there are other benefits that outweigh this.  | 
|
89  | 
Alternatively if it was fast enough for operations to always look at  | 
|
90  | 
the original storage of the parent trees we could do without the  | 
|
91  | 
cache.  | 
|
92  | 
||
93  | 
2.2 - Record the observed file hashes into the workingtree control  | 
|
94  | 
files. For the files that we just committed, we have the information  | 
|
95  | 
to store a valid hash cache entry: we know their stat information and  | 
|
96  | 
the sha1 of the file contents. This is not strictly necessary to the  | 
|
97  | 
speed of commit, but it will be useful later in avoiding reading those  | 
|
98  | 
files, and the only cost of doing it now is writing it out.  | 
|
99  | 
||
100  | 
In fact there are some user interface niceties that complicate this:  | 
|
101  | 
||
102  | 
3 - Before starting the commit proper, we prompt for a commit message  | 
|
103  | 
and in that commit message editor we show a list of the files that  | 
|
104  | 
will be committed: basically the output of bzr status. This is  | 
|
105  | 
basically the same as the list of changes we detect while storing the  | 
|
106  | 
commit, but because the user will sometimes change the tree after  | 
|
107  | 
opening the commit editor and expect the final state to be committed I  | 
|
108  | 
think we do have to look for changes twice. Since it takes the user a  | 
|
109  | 
while to enter a message this is not a big problem as long as both the  | 
|
110  | 
status summary and the commit are individually fast.  | 
|
111  | 
||
112  | 
4 - As the commit proceeds (or after?) we show another status-like  | 
|
113  | 
summary. Just printing the names of modified files as they're stored  | 
|
114  | 
would be easy. Recording deleted and renamed files or directories is  | 
|
115  | 
more work: this can only be done by reference to the primary parent  | 
|
116  | 
tree and requires it be read in. Worse, reporting renames requires  | 
|
117  | 
searching by id across the entire parent tree. Possibly full  | 
|
118  | 
reporting should be a default-off verbose option because it does  | 
|
119  | 
require more work beyond the commit itself.  | 
|
120  | 
||
121  | 
5 - Bazaar currently allows for missing files to be automatically  | 
|
122  | 
marked as removed at the time of commit. Leaving aside the ui  | 
|
123  | 
consequences, this means that we have to update the working inventory  | 
|
124  | 
to mark these files as removed. Since as discussed above we always  | 
|
125  | 
have to rewrite the dirstate on commit this is not substantial, though  | 
|
126  | 
we should make sure we do this in one pass, not two. I have  | 
|
127  | 
previously proposed to make this behaviour a non-default option.  | 
|
128  | 
||
129  | 
We may need to run hooks or generate signatures during commit, but  | 
|
130  | 
they don't seem to have substantial performance consequences.  | 
|
131  | 
||
132  | 
If one wanted to optimize solely for the speed of commit I think  | 
|
133  | 
hash-addressed file-per-text storage like in git (or bzr 0.1) is very  | 
|
134  | 
good. Remarkably, it does not need to read the inventory for the  | 
|
135  | 
previous revision. For each versioned file, we just need to get its  | 
|
136  | 
hash, either by reading the file or validating its stat data. If that  | 
|
137  | 
hash is not already in the repository, the file is just copied in and  | 
|
138  | 
compressed. As directories are traversed, they're turned into texts  | 
|
139  | 
and stored as well, and then finally the revision is too. This does  | 
|
140  | 
depend on later doing some delta compression of these texts.  | 
|
141  | 
||
142  | 
Variations on this are possible. Rather than writing a single file  | 
|
143  | 
into the repository for each text, we could fold them into a single  | 
|
144  | 
collation or pack file. That would create a smaller number of files  | 
|
145  | 
in the repository, but looking up a single text would require looking  | 
|
146  | 
into their indexes rather than just asking the filesystem.  | 
|
147  | 
||
148  | 
Rather than using hashes we can use file-id/rev-id pairs as at  | 
|
149  | 
present, which has several consequences pro and con.  | 
|
150  | 
||
151  | 
||
152  | 
Commit vs Status  | 
|
153  | 
----------------  | 
|
154  | 
||
155  | 
At first glance, commit simply stores the changes status reports. In fact,  | 
|
156  | 
this isn't technically correct: commit considers some files modified that  | 
|
157  | 
status does not. The notes below were put together by John Arbash Meinel  | 
|
158  | 
and Aaron Bentley in May 2007 to explain the finer details of commit to  | 
|
159  | 
Ian Clatworthy. They are recorded here as they are likely to be useful to  | 
|
160  | 
others new to Bazaar ...  | 
|
161  | 
||
162  | 
1) **Unknown files have a different effect.** With --no-strict (the default)  | 
|
163  | 
they have no effect and can be completely ignored. With --strict they  | 
|
164  | 
should cause the commit to abort (so you don't forget to add the two new  | 
|
165  | 
test files that you just created).  | 
|
166  | 
||
167  | 
2) **Multiple parents.** 'status' always compares 2 trees, typically the  | 
|
168  | 
last-committed tree and the current working tree. 'commit' will compare  | 
|
169  | 
more trees if there has been a merge.  | 
|
170  | 
||
171  | 
a) The "last modified" property for files.  | 
|
172  | 
A file may be marked as changed since the last commit, but that  | 
|
173  | 
change may have come in from the merge, and the change could have  | 
|
174  | 
happened several commits back. There are several edge cases to be  | 
|
175  | 
handled here, like if both branches modified the same file, or if  | 
|
176  | 
just one branch modified it.  | 
|
| 
4853.1.1
by Patrick Regan
 Removed trailing whitespace from files in doc directory  | 
177  | 
|
| 
2511.1.4
by Ian Clatworthy
 updated NEWS and added commit performance notes to doc/developers  | 
178  | 
b) The trickier case is when a file appears unmodified since last  | 
179  | 
commit, but it was modified versus one of the merged branches. I  | 
|
180  | 
believe there are a few ways this can happen, like if a merged  | 
|
181  | 
branch changes a file and then reverts it back (you still update  | 
|
182  | 
the 'last modified' field).  | 
|
183  | 
In general, if both sides disagree on the 'last-modified' flag,  | 
|
184  | 
then you need to generate a new entry pointing 'last-modified' at  | 
|
185  | 
this revision (because you are resolving the differences between  | 
|
186  | 
the 2 parents).  | 
|
187  | 
||
188  | 
3) **Automatic deletion of 'missing' files.** This is a point that we go  | 
|
189  | 
back and forth on. I think the basic idea is that 'bzr commit' by  | 
|
190  | 
default should abort if it finds a 'missing' file (in case that file was  | 
|
191  | 
renamed rather than deleted), but 'bzr commit --auto' can add unknown  | 
|
192  | 
files and remove missing files automatically.  | 
|
193  | 
||
194  | 
4) **sha1 for newly added files.** status doesn't really need this: it should  | 
|
195  | 
only care that the file is not present in base, but is present now. In  | 
|
196  | 
some ways commit doesn't care either, since it needs to read and sha the  | 
|
197  | 
file itself anyway.  | 
|
198  | 
||
199  | 
5) **Nested trees.** status doesn't recurse into nested trees, but commit does.  | 
|
200  | 
This is just because not all of the nested-trees work has been merged yet.  | 
|
201  | 
||
202  | 
A tree-reference is considered modified if the subtree has been  | 
|
203  | 
committed since the last containing-tree commit. But commit needs to  | 
|
204  | 
recurse into every subtree, to ensure that a commit is done if the  | 
|
205  | 
subtree has changed since its last commit. _iter_changes only reports  | 
|
206  | 
on tree-references that are modified, so it can't be used for doing  | 
|
207  | 
subtree commits.  | 
|
208  | 
||
209  | 
||
210  | 
Avoiding Work: Smarter Change Detection  | 
|
211  | 
---------------------------------------  | 
|
212  | 
||
213  | 
Commit currently walks through every file building an inventory. Here is  | 
|
214  | 
Aaron's brain dump on a better way ...  | 
|
215  | 
||
216  | 
_iter_changes won't tell us about tree references that haven't changed,  | 
|
217  | 
even if those subtrees have changed. (Unless we ask for unchanged  | 
|
218  | 
files, which we don't want to do, of course.)  | 
|
219  | 
||
220  | 
There is an iter_references method, but using it looks just as expensive  | 
|
221  | 
as calling kind().  | 
|
222  | 
||
223  | 
I did some work on updating commit to use iter_changes, but found for  | 
|
224  | 
multi-parent trees, I had to fall back to the slow inventory comparison  | 
|
225  | 
approach.  | 
|
226  | 
||
227  | 
Really, I think we need a call akin to iter_changes that handles  | 
|
228  | 
multiple parents, and knows to emit entries when InventoryEntry.revision  | 
|
229  | 
is all that's changed.  | 
|
230  | 
||
231  | 
||
232  | 
Avoiding Work: Better Layering  | 
|
233  | 
------------------------------  | 
|
234  | 
||
235  | 
For each file, commit is currently doing more work than it should. Here is  | 
|
236  | 
John's take on a better way ...  | 
|
237  | 
||
238  | 
Note that "_iter_changes" *does* have to touch every path on disk, but  | 
|
239  | 
it just can do it in a more efficient manner. (It doesn't have to create  | 
|
240  | 
an InventoryEntry for all the ones that haven't changed).  | 
|
241  | 
||
242  | 
I agree with Aaron that we need something a little different than  | 
|
243  | 
_iter_changes. Both because of handling multiple parents, as well as we  | 
|
244  | 
don't want it to actually read the files if we have a stat-cache miss.  | 
|
245  | 
||
246  | 
Specifically, the commit code *has* to read the files because it is  | 
|
247  | 
going to add the text to the repository, and we want it to compute the  | 
|
248  | 
sha1 at *that* time, so we are guaranteed to have the valid sha (rather  | 
|
249  | 
than just whatever the last cached one was). So we want the code to  | 
|
250  | 
return 'None' if it doesn't have an up-to-date sha1, rather than reading  | 
|
251  | 
the file and computing it, just before it returns it to the parent.  | 
|
252  | 
||
253  | 
The commit code (0.16) should really be restructured. It's layering is  | 
|
254  | 
pretty wrong.  | 
|
255  | 
||
256  | 
Specifically, calling "kind()" requires a stat of the file. But we have  | 
|
257  | 
to do a stat to get the size/whether the record is up-to-date, etc. So  | 
|
258  | 
we really need to have a "create_an_up_to_date_inventory()" function.  | 
|
259  | 
But because we are accessing every object on disk, we want to be working  | 
|
260  | 
in tuples rather than Inventory objects. And because DirState already  | 
|
261  | 
has the parent records next to the current working inventory, it can do  | 
|
262  | 
all the work to do really fast comparison and throw-away of unimportant  | 
|
263  | 
records.  | 
|
264  | 
||
265  | 
The way I made "bzr status" fast is by moving the 'ignore this record'  | 
|
266  | 
ability as deep into the stack as I could get. Status has the property  | 
|
267  | 
that you don't care about most of the records, just like commit. So the  | 
|
268  | 
sooner you can stop evaluating the 99% that you don't care about, the  | 
|
269  | 
less work you do.  | 
|
270  | 
||
| 
2513.1.2
by Martin Pool
 Remove duplicated commit use case documentation  | 
271  | 
|
| 
2513.1.3
by Martin Pool
 More commit specs  | 
272  | 
Avoiding work: avoiding reading parent data  | 
273  | 
-------------------------------------------  | 
|
274  | 
||
275  | 
We would like to avoid the work of reading any data about the parent  | 
|
276  | 
revisions. We should at least try to avoid reading anything from the  | 
|
277  | 
repository; we can also consider whether it is possible or useful to hold  | 
|
278  | 
less parent information in the working tree.  | 
|
279  | 
||
280  | 
When a commit of selected files is requested, the committed snapshot is a  | 
|
281  | 
composite of some directories from the parent revision and some from the  | 
|
282  | 
working tree. In this case it is logically necessary to have the parent  | 
|
283  | 
inventory information.  | 
|
284  | 
||
285  | 
If file last-change information or per-file graph information is stored  | 
|
286  | 
then it must be available from the parent trees.  | 
|
287  | 
||
288  | 
If the Branch's storage method does delta compression at commit time it  | 
|
289  | 
may need to retrieve file or inventory texts from the repository.  | 
|
290  | 
||
291  | 
It is desirable to avoid roundtrips to the Repository during commit,  | 
|
292  | 
particularly because it may be remote. If the WorkingTree can determine  | 
|
293  | 
by itself that a text was in the parent and therefore should be in the  | 
|
| 
4853.1.1
by Patrick Regan
 Removed trailing whitespace from files in doc directory  | 
294  | 
Repository that avoids one roundtrip per file.  | 
| 
2513.1.3
by Martin Pool
 More commit specs  | 
295  | 
|
296  | 
There is a possibility here that the parent revision is not stored, or not  | 
|
297  | 
correctly stored, in the repository the tree is being committed into, and  | 
|
298  | 
so the committed tree would not be reconstructable. We could check that  | 
|
299  | 
the parent revision is present in the inventory and rely on the invariant  | 
|
300  | 
that if a revision is present, everything to reconstruct it will be  | 
|
301  | 
present too.  | 
|
302  | 
||
| 
2513.1.5
by Martin Pool
 More analysis of commit  | 
303  | 
|
304  | 
Code structure  | 
|
305  | 
--------------  | 
|
306  | 
||
307  | 
Caller starts a commit  | 
|
308  | 
||
309  | 
>>> Branch.commit(from_tree, options)  | 
|
310  | 
||
311  | 
This creates a CommitBuilder object matched to the Branch, Repository and  | 
|
312  | 
Tree. It can vary depending on model differences or by knowledge of what  | 
|
313  | 
is efficient with the Repository and Tree. Model differences might  | 
|
| 
4853.1.1
by Patrick Regan
 Removed trailing whitespace from files in doc directory  | 
314  | 
include whether no-text-change merges need to be reported, and whether the  | 
| 
2513.1.5
by Martin Pool
 More analysis of commit  | 
315  | 
|
| 
4853.1.1
by Patrick Regan
 Removed trailing whitespace from files in doc directory  | 
316  | 
The basic CommitBuilder.commit structure can be  | 
| 
2513.1.5
by Martin Pool
 More analysis of commit  | 
317  | 
|
318  | 
1. Ask the branch if it is ready to commit (up to date with master if  | 
|
319  | 
any.)  | 
|
320  | 
||
321  | 
2. Ask the tree if it is ready to commit to the branch (up to date with  | 
|
322  | 
branch?), no conflicts, etc  | 
|
323  | 
||
324  | 
3. Commit changed files; prototype implementation:  | 
|
325  | 
||
326  | 
a. Ask the working tree for all committable files; for each it should  | 
|
327  | 
return the per-file parents, stat information, kind, etc.  | 
|
328  | 
||
329  | 
b. Ask the repository to store the new file text; the repository should  | 
|
330  | 
return the stored sha1 and new revision id.  | 
|
331  | 
||
332  | 
4. Commit changed inventory  | 
|
333  | 
||
334  | 
5. Commit revision object  | 
|
335  | 
||
336  | 
||
337  | 
||
338  | 
||
339  | 
||
340  | 
||
341  | 
||
342  | 
||
343  | 
||
| 
2513.1.3
by Martin Pool
 More commit specs  | 
344  | 
Complications of commit  | 
345  | 
-----------------------  | 
|
346  | 
||
347  | 
Bazaar (as of 0.17) does not support selective-file commit of a merge;  | 
|
348  | 
this could be done if we decide how it should be recorded - is this to be  | 
|
349  | 
stored as an overall merge revision; as a preliminary non-merge revisions;  | 
|
350  | 
or will the per-file graph diverge from the revision graph.  | 
|
351  | 
||
352  | 
There are several checks that may cause the commit to be refused, which  | 
|
353  | 
may be activated or deactivated by options.  | 
|
354  | 
||
| 
2513.1.4
by Martin Pool
 More commit documentation  | 
355  | 
* presence of conflicts in the tree  | 
356  | 
||
357  | 
* presence of unknown files  | 
|
358  | 
||
359  | 
* the working tree basis is up to date with the branch tip  | 
|
360  | 
||
| 
4853.1.1
by Patrick Regan
 Removed trailing whitespace from files in doc directory  | 
361  | 
* the local branch is up to date with the master branch, if there  | 
| 
2513.1.4
by Martin Pool
 More commit documentation  | 
362  | 
is one and --local is not specified  | 
363  | 
||
| 
4853.1.1
by Patrick Regan
 Removed trailing whitespace from files in doc directory  | 
364  | 
* an empty commit message is given,  | 
| 
2513.1.4
by Martin Pool
 More commit documentation  | 
365  | 
|
366  | 
* a hook flags an error  | 
|
367  | 
||
368  | 
* a "pointless" commit, with no inventory changes  | 
|
| 
2513.1.3
by Martin Pool
 More commit specs  | 
369  | 
|
370  | 
Most of these require walking the tree and can be easily done while  | 
|
371  | 
recording the tree shape. This does require that it be possible to abort  | 
|
372  | 
the commit after the tree changes have been recorded. It could be ok to  | 
|
373  | 
either leave the unreachable partly-committed records in the repository,  | 
|
374  | 
or to roll back.  | 
|
375  | 
||
376  | 
Other complications:  | 
|
377  | 
||
| 
2513.1.4
by Martin Pool
 More commit documentation  | 
378  | 
* when automatically adding new files or deleting missing files during  | 
379  | 
commit, they must be noted during commit and written into the working  | 
|
380  | 
tree at some point  | 
|
381  | 
||
382  | 
* refuse "pointless" commits with no file changes - should be easy by  | 
|
383  | 
just refusing to do the final step of storing a new overall inventory  | 
|
384  | 
and revision object  | 
|
385  | 
||
386  | 
* heuristic detection of renames between add and delete (out of scope for  | 
|
387  | 
this change)  | 
|
388  | 
||
389  | 
* pushing changes to a master branch if any  | 
|
390  | 
||
391  | 
* running hooks, pre and post commit  | 
|
392  | 
||
393  | 
* prompting for a commit message if necessary, including a list of the  | 
|
394  | 
changes that have already been observed  | 
|
395  | 
||
396  | 
* if there are tree references and recursing into them is enabled, then  | 
|
397  | 
do so  | 
|
| 
2513.1.3
by Martin Pool
 More commit specs  | 
398  | 
|
| 
4853.1.1
by Patrick Regan
 Removed trailing whitespace from files in doc directory  | 
399  | 
Commit needs to protect against duplicated file ids  | 
| 
2513.1.5
by Martin Pool
 More analysis of commit  | 
400  | 
|
401  | 
||
| 
2513.1.3
by Martin Pool
 More commit specs  | 
402  | 
Updates that need to be made in the working tree, either on conclusion  | 
403  | 
of commit or during the scan, include  | 
|
404  | 
||
| 
2513.1.4
by Martin Pool
 More commit documentation  | 
405  | 
* Changes made to the tree shape, including automatic adds, renames or  | 
406  | 
deletes  | 
|
407  | 
||
408  | 
* For trees (eg dirstate) that cache parent inventories, the old parent  | 
|
409  | 
information must be removed and the new one inserted  | 
|
410  | 
||
411  | 
* The tree hashcache information should be updated to reflect the stat  | 
|
412  | 
value at which the file was the same as the committed version, and the  | 
|
413  | 
content hash it was observed to have. This needs to be done carefully to  | 
|
414  | 
prevent inconsistencies if the file is modified during or shortly after  | 
|
415  | 
the commit. Perhaps it would work to read the mtime of the file before we  | 
|
416  | 
read its text to commit.  | 
|
| 
2513.1.3
by Martin Pool
 More commit specs  | 
417  | 
|
418  | 
||
| 
2513.1.2
by Martin Pool
 Remove duplicated commit use case documentation  | 
419  | 
Interface stack  | 
420  | 
---------------  | 
|
421  | 
||
422  | 
The commit api is invoked by the command interface, and copies information  | 
|
423  | 
from the tree into the branch and its repository, possibly updating the  | 
|
424  | 
WorkingTree afterwards.  | 
|
425  | 
||
426  | 
The command interface passes:  | 
|
| 
2513.1.4
by Martin Pool
 More commit documentation  | 
427  | 
|
428  | 
* a commit message (from an option, if any),  | 
|
429  | 
* or an indication that it should be read interactively from the ui object;  | 
|
430  | 
* a list of files to commit  | 
|
431  | 
* an option for a dry-run commit  | 
|
| 
4853.1.1
by Patrick Regan
 Removed trailing whitespace from files in doc directory  | 
432  | 
* verbose option, or callback to indicate  | 
| 
2513.1.4
by Martin Pool
 More commit documentation  | 
433  | 
* timestamp, timezone, committer, chosen revision id  | 
434  | 
* config (for what?)  | 
|
435  | 
* option for local-only commit on a bound branch  | 
|
436  | 
* option for strict commits (fail if there are unknown or missing files)  | 
|
437  | 
* option to allow "pointless" commits (with no tree changes)  | 
|
| 
4853.1.1
by Patrick Regan
 Removed trailing whitespace from files in doc directory  | 
438  | 
|
| 
2513.1.3
by Martin Pool
 More commit specs  | 
439  | 
(This is rather a lot of options to pass individually and just for code tidyness maybe some of them should be combine into objects.)  | 
| 
2513.1.2
by Martin Pool
 Remove duplicated commit use case documentation  | 
440  | 
|
| 
2513.1.3
by Martin Pool
 More commit specs  | 
441  | 
>>> Branch.commit(from_tree, message, files_to_commit, ...)  | 
| 
2513.1.2
by Martin Pool
 Remove duplicated commit use case documentation  | 
442  | 
|
443  | 
There will be different implementations of this for different Branch  | 
|
444  | 
classes, whether for foreign branches or Bazaar repositories using  | 
|
445  | 
different storage methods.  | 
|
446  | 
||
447  | 
Most of the commit should occur during a single lockstep iteration across  | 
|
448  | 
the workingtree and parent trees. The WorkingTree interface needs to  | 
|
449  | 
provide methods that give commit all it needs. Some of these methods  | 
|
450  | 
(such as answering the file's last change revision) may be deprecated in  | 
|
451  | 
newer working trees and there we have a choice of either calculating the  | 
|
452  | 
value from the data that is present, or refusing to support commit to  | 
|
453  | 
newer repositories.  | 
|
454  | 
||
455  | 
For a dirstate tree the iteration of changes from the parent can easily be  | 
|
456  | 
done within its own iter_changes.  | 
|
457  | 
||
458  | 
Dirstate inventories may be most easily updated in a single operation at  | 
|
459  | 
the end; however it may be best to accumulate data as we proceed through  | 
|
460  | 
the tree rather than revisiting it at the end.  | 
|
461  | 
||
462  | 
Showing a progress bar for commit may not be necessary if we report files  | 
|
463  | 
as they are committed. Alternatively we could transiently show a progress  | 
|
464  | 
bar for each directory that's scanned, even if no changes are observed.  | 
|
465  | 
||
466  | 
This needs to collect a list of added/changed/removed files, each of which  | 
|
467  | 
must have its text stored (if any) and containing directory updated. This  | 
|
468  | 
can be done by calling Tree._iter_changes on the source tree, asking for  | 
|
| 
4853.1.1
by Patrick Regan
 Removed trailing whitespace from files in doc directory  | 
469  | 
changes  | 
| 
2513.1.2
by Martin Pool
 Remove duplicated commit use case documentation  | 
470  | 
|
471  | 
In the 0.17 model the commit operation needs to know the per-file parents  | 
|
| 
4853.1.1
by Patrick Regan
 Removed trailing whitespace from files in doc directory  | 
472  | 
and per-file last-changed revision.  | 
| 
2513.1.2
by Martin Pool
 Remove duplicated commit use case documentation  | 
473  | 
|
474  | 
(In this and other operations we must avoid having multiple layers walk  | 
|
475  | 
over the tree separately. For example, it is no good to have the Command  | 
|
476  | 
layer walk the tree to generate a list of all file ids to commit, because  | 
|
477  | 
the tree will also be walked later. The layers that do need to operate  | 
|
478  | 
per-file should probably be bound together in a per-dirblock iterator,  | 
|
479  | 
rather than each iterating independently.)  | 
|
| 
2513.1.3
by Martin Pool
 More commit specs  | 
480  | 
|
481  | 
Branch->Tree interface  | 
|
482  | 
----------------------  | 
|
483  | 
||
484  | 
The Branch commit code needs to ask the Tree what should be committed, in  | 
|
485  | 
terms of changes from the parent revisions. If the Tree holds all the  | 
|
486  | 
necessary parent tree information itself it can do it single handed;  | 
|
487  | 
otherwise it may need to ask the Repository for parent information.  | 
|
488  | 
||
489  | 
This should be a streaming interface, probably like iter_changes returning  | 
|
490  | 
information per directory block.  | 
|
491  | 
||
492  | 
The interface should not return a block for directories that are  | 
|
493  | 
recursively unchanged.  | 
|
494  | 
||
| 
2513.1.4
by Martin Pool
 More commit documentation  | 
495  | 
The tree's idea of what is possibly changed may be more conservative than  | 
496  | 
that of the branch. For example the tree may report on merges of files  | 
|
497  | 
where the text is identical to the parents: this must be recorded for  | 
|
498  | 
Bazaar branches that record per-file ancestry but is not necessary for all  | 
|
499  | 
branches. If the tree is responsible for determining when directories  | 
|
500  | 
have been recursively modified then it will report on all the parents of  | 
|
501  | 
such files. There are several implementation options:  | 
|
502  | 
||
503  | 
1. Return all files and directories the branch might want to commit, even  | 
|
504  | 
if the branch ends up taking no action on them.  | 
|
505  | 
||
506  | 
2. When starting the iteration, the branch can specify what type of change  | 
|
507  | 
is considered interesting.  | 
|
508  | 
||
509  | 
Since these types of changes are probably (??) rare compared to files that  | 
|
510  | 
are either completely unmodified or substantially modified, the first may  | 
|
511  | 
be the best and simplest option.  | 
|
512  | 
||
513  | 
The branch needs to build an inventory to commit, which must include  | 
|
514  | 
unchanged files within changed directories. This should be returned from  | 
|
515  | 
the working tree too. Repositories that store per-directory inventories  | 
|
516  | 
will want to build and store these from the lowest directories up.  | 
|
517  | 
For 0.17 format repositories with an all-in-one inventory it may be  | 
|
518  | 
easiest to accumulate inventory entries in arbitrary order into an  | 
|
519  | 
in-memory Inventory and then serialize it.  | 
|
520  | 
||
521  | 
It ought to be possible to commit any Tree into a Branch, without  | 
|
522  | 
requiring a WorkingTree; the commit code should cope if the tree is not  | 
|
523  | 
interested in updating hashcache information or does not have a  | 
|
524  | 
``last_revision``.  | 
|
525  | 
||
526  | 
||
527  | 
Information from the tree to repository  | 
|
528  | 
---------------------------------------  | 
|
529  | 
||
530  | 
The main things the tree needs to tell the Branch about are:  | 
|
531  | 
||
532  | 
* A file is modified from its parent revision (in text, permissions,  | 
|
| 
4853.1.1
by Patrick Regan
 Removed trailing whitespace from files in doc directory  | 
533  | 
other), and so its text may need to be stored.  | 
534  | 
||
| 
2513.1.4
by Martin Pool
 More commit documentation  | 
535  | 
Files should also be reported if they have more than one unique parent  | 
536  | 
revision, for repositories that store per-file graphs or last-change  | 
|
537  | 
revisions. Perhaps this behaviour should be optional.  | 
|
538  | 
||
| 
2513.1.5
by Martin Pool
 More analysis of commit  | 
539  | 
**XXX:** are renames/deletions reported here too?  | 
540  | 
||
| 
2513.1.4
by Martin Pool
 More commit documentation  | 
541  | 
* The complete contents of a modified directory, so that its inventory  | 
542  | 
text may be stored. This should be done after all the contained files  | 
|
| 
4853.1.1
by Patrick Regan
 Removed trailing whitespace from files in doc directory  | 
543  | 
and directories have been reported. If there are unmodified files,  | 
544  | 
or unselected files carried through from  | 
|
| 
2513.1.4
by Martin Pool
 More commit documentation  | 
545  | 
|
| 
4853.1.1
by Patrick Regan
 Removed trailing whitespace from files in doc directory  | 
546  | 
XXX: Actually perhaps not grouped by directory, but rather grouped  | 
547  | 
appropriately for the shape of inventory storage in the repository.  | 
|
| 
2513.1.5
by Martin Pool
 More analysis of commit  | 
548  | 
|
549  | 
In a zoomed-in checkout the workingtree may not have all the shape data  | 
|
550  | 
for the entire tree.  | 
|
551  | 
||
| 
2513.1.4
by Martin Pool
 More commit documentation  | 
552  | 
* A file is missing -- could cause either automatic removal or an aborted  | 
553  | 
commit.  | 
|
554  | 
||
555  | 
* Any unknown files -- can cause automatic addition, abortion of a strict  | 
|
556  | 
commit, or just reporting.  | 
|
557  | 
||
558  | 
||
559  | 
Information from the repository to the tree  | 
|
560  | 
-------------------------------------------  | 
|
561  | 
||
562  | 
After the commit the tree needs to be updated to the new revision. Some  | 
|
563  | 
information which was accumulated during the commit must be made available  | 
|
564  | 
to the workingtree. It's probably reasonable to hold it all in memory and  | 
|
565  | 
allow the workingtree to get it in whatever order it wants.  | 
|
566  | 
||
567  | 
* A list of modified entries, and for each one:  | 
|
568  | 
||
569  | 
* The stat values observed when the file was first read.  | 
|
570  | 
||
571  | 
* The hash of the committed file text.  | 
|
572  | 
||
573  | 
* The file's last-change revision, if appropriate.  | 
|
574  | 
||
575  | 
This should include any entries automatically added or removed.  | 
|
576  | 
||
577  | 
This might be construed as an enhanced version of ``set_parent_trees``.  | 
|
578  | 
We can avoid a stat on each file by using the value that was observed when  | 
|
579  | 
it was first read.  | 
|
| 
4853.1.1
by Patrick Regan
 Removed trailing whitespace from files in doc directory  | 
580  | 
|
| 
2513.1.4
by Martin Pool
 More commit documentation  | 
581  | 
|
582  | 
||
583  | 
Selective commit  | 
|
584  | 
----------------  | 
|
585  | 
||
586  | 
For a partial commit the directory contents may need to contain a mix of  | 
|
587  | 
entries from the working tree and parent trees. This code probably  | 
|
588  | 
shouldn't live in a specific tree implementation; maybe there should be a  | 
|
589  | 
general filter that selects paths from one tree into another?  | 
|
590  | 
||
591  | 
However, the tree walking code does probably need to know about selected  | 
|
592  | 
paths to avoid examining unselected files or directories.  | 
|
593  | 
||
| 
4853.1.1
by Patrick Regan
 Removed trailing whitespace from files in doc directory  | 
594  | 
We never refuse selective file commits (except of merges).  | 
| 
2513.1.5
by Martin Pool
 More analysis of commit  | 
595  | 
|
| 
2513.1.4
by Martin Pool
 More commit documentation  | 
596  | 
|
597  | 
||
598  | 
Common commit code  | 
|
599  | 
------------------  | 
|
600  | 
||
601  | 
What is common to all commit implementations, regardless of workingtree or  | 
|
602  | 
repository format?  | 
|
603  | 
||
604  | 
* Prompting for a commit message?  | 
|
605  | 
* Strictness/conflict checks?  | 
|
606  | 
* Auto add/remove?  | 
|
607  | 
||
608  | 
How should this be separated?  | 
|
609  | 
||
610  | 
||
611  | 
||
612  | 
Order of traversal  | 
|
613  | 
------------------  | 
|
614  | 
||
615  | 
For current and contemplated Bazaar storage formats, we can only finally  | 
|
616  | 
commit a directory after its contained files and directories have been  | 
|
617  | 
committed.  | 
|
618  | 
||
619  | 
The dirstate workingtree format naturally iterates by directory in order  | 
|
620  | 
by path, yielding directories before their contents. This may also be the  | 
|
621  | 
most efficient order in which to stat and read the files.  | 
|
622  | 
||
623  | 
One option would be to construe the interface as a visitor which reports  | 
|
624  | 
when files are detected to be changed, and also when directories are  | 
|
625  | 
finished.  | 
|
| 
2513.1.3
by Martin Pool
 More commit specs  | 
626  | 
|
627  | 
||
628  | 
Open question: per-file graphs  | 
|
629  | 
------------------------------  | 
|
630  | 
||
631  | 
**XXX:** If we want to retain explicitly stored per-file graphs, it would  | 
|
632  | 
seem that we do need to record per-file parents. We have not yet finally  | 
|
633  | 
settled that we do want to remove them or treat them as a cache. This api  | 
|
634  | 
stack is still ok whether we do or not, but the internals of it may  | 
|
635  | 
change.  |