bzr branch
http://gegoxaren.bato24.eu/bzr/brz/remove-bazaar
|
2592.3.229
by Martin Pool
Initial pack format documentation |
1 |
KnitPack repository format |
2 |
========================== |
|
3 |
||
4 |
Bazaar 0.92 adds a new format (experimental at first) implemented in |
|
5 |
``bzrlib.repofmt.pack_repo.py``. |
|
6 |
||
7 |
This format provides a knit-like interface which is quite compatible |
|
8 |
with knit format repositories: you can get a VersionedFile for a |
|
9 |
particular file-id, or for revisions, or for the inventory, even though |
|
10 |
these do not correspond to single files on disk. |
|
11 |
||
12 |
The on-disk format is that the repository directory contains these |
|
13 |
files and subdirectories: |
|
14 |
||
15 |
==================== ============================================= |
|
16 |
packs/ completed readonly packs |
|
17 |
indices/ indices for completed packs |
|
18 |
upload/ temporary files for packs currently being |
|
19 |
written |
|
20 |
obsolete_packs/ packs that have been repacked and are no |
|
21 |
longer normally needed |
|
22 |
pack-names index of all live packs |
|
23 |
lock/ lockdir |
|
24 |
==================== ============================================= |
|
25 |
||
|
2592.3.230
by Martin Pool
Review comments on knitpack docs |
26 |
Note that for consistency we always write "indices" not "indexes". |
27 |
||
|
2592.3.229
by Martin Pool
Initial pack format documentation |
28 |
This is implemented on top of pack files, which are written once from |
29 |
start to end, then left alone. A pack consists of a body file, plus |
|
30 |
several index files. There are four index files for each pack, which |
|
31 |
have the same basename and an extension indicating the purpose of the |
|
32 |
index: |
|
33 |
||
|
2592.3.230
by Martin Pool
Review comments on knitpack docs |
34 |
======== ========== ======================== ========================== |
35 |
extn Purpose Key References |
|
36 |
======== ========== ======================== ========================== |
|
37 |
``.tix`` File texts ``file_id, revision_id`` per-file parents, |
|
38 |
compression basis |
|
39 |
per-file parents |
|
40 |
``.six`` Signatures ``revision_id,`` - |
|
41 |
``.rix`` Revisions ``revision_id,`` revision parents |
|
42 |
``.iix`` Inventory ``revision_id,`` revision parents, |
|
43 |
compression base |
|
44 |
======== ========== ======================== ========================== |
|
|
2592.3.229
by Martin Pool
Initial pack format documentation |
45 |
|
|
2592.3.230
by Martin Pool
Review comments on knitpack docs |
46 |
Indices are accessed through the ``bzrlib.index.GraphIndex`` class. |
|
2592.3.229
by Martin Pool
Initial pack format documentation |
47 |
Indices are stored as sorted files on disk. Each line is one record, |
48 |
and contains: |
|
49 |
||
50 |
* key fields |
|
51 |
* a value string - for all these indices, this is an ascii decimal pair |
|
52 |
of "offset length" giving the position of the refenced data within |
|
53 |
the pack body file |
|
54 |
* a list of zero or more reference lists |
|
55 |
||
56 |
The reference lists let a graph be stored within the index. Each |
|
57 |
reference list entry points to another entry in the same index. The |
|
58 |
references are represented as a byte offset for the target within the |
|
59 |
index file. |
|
60 |
||
61 |
When a compression base is given, it indicates that the body of the text |
|
62 |
or inventory is a forward delta from the referenced revision. The |
|
63 |
compression base list must have length 0 or 1. |
|
64 |
||
|
2592.3.230
by Martin Pool
Review comments on knitpack docs |
65 |
Like packs, indexes are written only once and then unmodified. A |
66 |
GraphIndex builder is a mutable in-memory graph that can be sorted, |
|
67 |
cross-referenced and written out when the write group completes. |
|
68 |
||
69 |
There can also be index entries with a value of 'a' for absent. These |
|
70 |
records exist just to be pointed to in a graph. This is used, for |
|
71 |
example, to give the revision-parent pointer when the parent revision is |
|
72 |
in a previous pack. |
|
73 |
||
|
2592.3.229
by Martin Pool
Initial pack format documentation |
74 |
The data content for each record is a knit data chunk. The knits are |
75 |
always unannotated - the annotations must be generated when needed. |
|
76 |
(We'd like to cache/memoize the annotations.) The data hunks can be |
|
77 |
moved between packs without needing to recompress them. |
|
78 |
||
79 |
It is not possible to regenerate an index from the body file, because it |
|
80 |
contains information stored in the knit index that's not in the body. |
|
81 |
(In particular, the per-file graph is only stored in the index.) |
|
|
2592.3.230
by Martin Pool
Review comments on knitpack docs |
82 |
We would like to change this in a future format. |
|
2592.3.229
by Martin Pool
Initial pack format documentation |
83 |
|
84 |
The lock is a regular LockDir lock. The lock is only held for a much |
|
85 |
reduced scope, while updating the pack-names file. The bulk of the |
|
86 |
insertion can be done without the repository locked. This is an |
|
87 |
implementation detail; the repository user should still call |
|
88 |
``repository.lock_write`` at the regular time but be aware this does not |
|
89 |
correspond to a physical mutex. |
|
90 |
||
91 |
Read locks control caching but do not affect writers. |
|
92 |
||
93 |
The newly-added repository write group concept is very important to |
|
94 |
KnitPack repositories. When ``start_write_group`` is called, a new |
|
95 |
temporary pack is created and all modifications to the repository will |
|
96 |
go into it until either ``commit_write_group`` or ``abort_write_group`` |
|
97 |
is called, at which time it is either finished and moved into place or |
|
98 |
discarded respectively. Write groups cannot be nested, only one can be |
|
99 |
underway at a time on a Repository instance and they must occur within a |
|
100 |
write lock. |
|
101 |
||
102 |
Normally the data for each revision will be entirely within a single |
|
103 |
pack but this is not required. |
|
104 |
||
105 |
When a pack is finished, it gets a final name based on the md5 of all |
|
106 |
the data written into the pack body file. |
|
107 |
||
108 |
The ``pack-names`` file gives the list of all finished non-obsolete |
|
109 |
packs. (This should always be the same as the list of files in the |
|
110 |
``packs/`` directory, but the file is needed for readonly http clients |
|
111 |
that can't easily list directories, and it includes other information.) |
|
|
2592.3.230
by Martin Pool
Review comments on knitpack docs |
112 |
The constraint on the ``pack-names`` list is that every file mentioned |
113 |
must exist in the ``packs/`` directory. |
|
114 |
||
115 |
In rare cases, when a writer is interrupted, about-to-be-removed packs |
|
116 |
may still be present in the directory but removed from the list. |
|
117 |
||
118 |
As well as the list of names, the pack-names file also contains the |
|
119 |
size, in bytes, of each of the four indices. This is used to bootstrap |
|
120 |
bisection search within the indices. |
|
|
2592.3.229
by Martin Pool
Initial pack format documentation |
121 |
|
122 |
In normal use, one pack will be created for each commit to a repository. |
|
123 |
This would build up to an inefficient number of files over time, so a |
|
124 |
``repack`` operation is available to recombine them, by producing larger |
|
125 |
files containing data on multiple revisions. This can be done manually |
|
126 |
by running ``bzr pack``, and it also may happen automatically when a |
|
127 |
write group is committed. |
|
128 |
||
129 |
The repacking strategy used at the moment tries to balance not doing too |
|
130 |
much work during commit with not having too many small files left in the |
|
131 |
repository. The algorithm is roughly this: the total number of |
|
132 |
revisions in the repository is expressed as a decimal number, e.g. |
|
133 |
"532". Then we'll repack until we have five packs containing a hundred |
|
134 |
revisions each, three packs containing ten revisions each, and two packs |
|
135 |
with single revisions. This means that each revision will normally |
|
136 |
initially be created in a single-revision pack, then moved to a |
|
137 |
ten-revision pack, then to a 100-pack, and so on. |
|
138 |
||
|
2592.3.230
by Martin Pool
Review comments on knitpack docs |
139 |
As with other repositories, in normal use data is only inserted. |
140 |
However, in some circumstances we may want to garbage-collect or prune |
|
141 |
existing data, or reconcile indexes. |
|
|
2592.3.229
by Martin Pool
Initial pack format documentation |
142 |
|
143 |
vim: tw=72 ft=rest expandtab |