1
# Copyright (C) 2006, 2007, 2008 Canonical Ltd
3
# This program is free software; you can redistribute it and/or modify
4
# it under the terms of the GNU General Public License as published by
5
# the Free Software Foundation; either version 2 of the License, or
6
# (at your option) any later version.
8
# This program is distributed in the hope that it will be useful,
9
# but WITHOUT ANY WARRANTY; without even the implied warranty of
10
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
11
# GNU General Public License for more details.
13
# You should have received a copy of the GNU General Public License
14
# along with this program; if not, write to the Free Software
15
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
17
"""DirState objects record the state of a directory and its bzr metadata.
19
Pseudo EBNF grammar for the state file. Fields are separated by NULLs, and
20
lines by NL. The field delimiters are ommitted in the grammar, line delimiters
21
are not - this is done for clarity of reading. All string data is in utf8.
23
MINIKIND = "f" | "d" | "l" | "a" | "r" | "t";
26
WHOLE_NUMBER = {digit}, digit;
28
REVISION_ID = a non-empty utf8 string;
30
dirstate format = header line, full checksum, row count, parent details,
31
ghost_details, entries;
32
header line = "#bazaar dirstate flat format 3", NL;
33
full checksum = "crc32: ", ["-"], WHOLE_NUMBER, NL;
34
row count = "num_entries: ", WHOLE_NUMBER, NL;
35
parent_details = WHOLE NUMBER, {REVISION_ID}* NL;
36
ghost_details = WHOLE NUMBER, {REVISION_ID}*, NL;
38
entry = entry_key, current_entry_details, {parent_entry_details};
39
entry_key = dirname, basename, fileid;
40
current_entry_details = common_entry_details, working_entry_details;
41
parent_entry_details = common_entry_details, history_entry_details;
42
common_entry_details = MINIKIND, fingerprint, size, executable
43
working_entry_details = packed_stat
44
history_entry_details = REVISION_ID;
47
fingerprint = a nonempty utf8 sequence with meaning defined by minikind.
49
Given this definition, the following is useful to know:
50
entry (aka row) - all the data for a given key.
51
entry[0]: The key (dirname, basename, fileid)
55
entry[1]: The tree(s) data for this path and id combination.
56
entry[1][0]: The current tree
57
entry[1][1]: The second tree
59
For an entry for a tree, we have (using tree 0 - current tree) to demonstrate:
60
entry[1][0][0]: minikind
61
entry[1][0][1]: fingerprint
63
entry[1][0][3]: executable
64
entry[1][0][4]: packed_stat
66
entry[1][1][4]: revision_id
68
There may be multiple rows at the root, one per id present in the root, so the
69
in memory root row is now:
70
self._dirblocks[0] -> ('', [entry ...]),
71
and the entries in there are
74
entries[0][2]: file_id
75
entries[1][0]: The tree data for the current tree for this fileid at /
79
'r' is a relocated entry: This path is not present in this tree with this id,
80
but the id can be found at another location. The fingerprint is used to
81
point to the target location.
82
'a' is an absent entry: In that tree the id is not present at this path.
83
'd' is a directory entry: This path in this tree is a directory with the
84
current file id. There is no fingerprint for directories.
85
'f' is a file entry: As for directory, but its a file. The fingerprint is a
87
'l' is a symlink entry: As for directory, but a symlink. The fingerprint is the
89
't' is a reference to a nested subtree; the fingerprint is the referenced
94
The entries on disk and in memory are ordered according to the following keys:
96
directory, as a list of components
100
--- Format 1 had the following different definition: ---
101
rows = dirname, NULL, basename, NULL, MINIKIND, NULL, fileid_utf8, NULL,
102
WHOLE NUMBER (* size *), NULL, packed stat, NULL, sha1|symlink target,
104
PARENT ROW = NULL, revision_utf8, NULL, MINIKIND, NULL, dirname, NULL,
105
basename, NULL, WHOLE NUMBER (* size *), NULL, "y" | "n", NULL,
108
PARENT ROW's are emitted for every parent that is not in the ghosts details
109
line. That is, if the parents are foo, bar, baz, and the ghosts are bar, then
110
each row will have a PARENT ROW for foo and baz, but not for bar.
113
In any tree, a kind of 'moved' indicates that the fingerprint field
114
(which we treat as opaque data specific to the 'kind' anyway) has the
115
details for the id of this row in that tree.
117
I'm strongly tempted to add a id->path index as well, but I think that
118
where we need id->path mapping; we also usually read the whole file, so
119
I'm going to skip that for the moment, as we have the ability to locate
120
via bisect any path in any tree, and if we lookup things by path, we can
121
accumulate an id->path mapping as we go, which will tend to match what we
124
I plan to implement this asap, so please speak up now to alter/tweak the
125
design - and once we stabilise on this, I'll update the wiki page for
128
The rationale for all this is that we want fast operations for the
129
common case (diff/status/commit/merge on all files) and extremely fast
130
operations for the less common but still occurs a lot status/diff/commit
131
on specific files). Operations on specific files involve a scan for all
132
the children of a path, *in every involved tree*, which the current
133
format did not accommodate.
137
1) Fast end to end use for bzr's top 5 uses cases. (commmit/diff/status/merge/???)
138
2) fall back current object model as needed.
139
3) scale usably to the largest trees known today - say 50K entries. (mozilla
140
is an example of this)
144
Eventually reuse dirstate objects across locks IFF the dirstate file has not
145
been modified, but will require that we flush/ignore cached stat-hit data
146
because we won't want to restat all files on disk just because a lock was
147
acquired, yet we cannot trust the data after the previous lock was released.
149
Memory representation:
150
vector of all directories, and vector of the childen ?
152
root_entrie = (direntry for root, [parent_direntries_for_root]),
154
('', ['data for achild', 'data for bchild', 'data for cchild'])
155
('dir', ['achild', 'cchild', 'echild'])
157
- single bisect to find N subtrees from a path spec
158
- in-order for serialisation - this is 'dirblock' grouping.
159
- insertion of a file '/a' affects only the '/' child-vector, that is, to
160
insert 10K elements from scratch does not generates O(N^2) memoves of a
161
single vector, rather each individual, which tends to be limited to a
162
manageable number. Will scale badly on trees with 10K entries in a
163
single directory. compare with Inventory.InventoryDirectory which has
164
a dictionary for the children. No bisect capability, can only probe for
165
exact matches, or grab all elements and sort.
166
- What's the risk of error here? Once we have the base format being processed
167
we should have a net win regardless of optimality. So we are going to
168
go with what seems reasonable.
171
Maybe we should do a test profile of the core structure - 10K simulated
172
searches/lookups/etc?
174
Objects for each row?
175
The lifetime of Dirstate objects is current per lock, but see above for
176
possible extensions. The lifetime of a row from a dirstate is expected to be
177
very short in the optimistic case: which we are optimising for. For instance,
178
subtree status will determine from analysis of the disk data what rows need to
179
be examined at all, and will be able to determine from a single row whether
180
that file has altered or not, so we are aiming to process tens of thousands of
181
entries each second within the dirstate context, before exposing anything to
182
the larger codebase. This suggests we want the time for a single file
183
comparison to be < 0.1 milliseconds. That would give us 10000 paths per second
184
processed, and to scale to 100 thousand we'll another order of magnitude to do
185
that. Now, as the lifetime for all unchanged entries is the time to parse, stat
186
the file on disk, and then immediately discard, the overhead of object creation
187
becomes a significant cost.
189
Figures: Creating a tuple from from 3 elements was profiled at 0.0625
190
microseconds, whereas creating a object which is subclassed from tuple was
191
0.500 microseconds, and creating an object with 3 elements and slots was 3
192
microseconds long. 0.1 milliseconds is 100 microseconds, and ideally we'll get
193
down to 10 microseconds for the total processing - having 33% of that be object
194
creation is a huge overhead. There is a potential cost in using tuples within
195
each row which is that the conditional code to do comparisons may be slower
196
than method invocation, but method invocation is known to be slow due to stack
197
frame creation, so avoiding methods in these tight inner loops in unfortunately
198
desirable. We can consider a pyrex version of this with objects in future if
207
from stat import S_IEXEC
226
def pack_stat(st, _encode=binascii.b2a_base64, _pack=struct.pack):
227
"""Convert stat values into a packed representation."""
228
# jam 20060614 it isn't really worth removing more entries if we
229
# are going to leave it in packed form.
230
# With only st_mtime and st_mode filesize is 5.5M and read time is 275ms
231
# With all entries, filesize is 5.9M and read time is maybe 280ms
232
# well within the noise margin
234
# base64 encoding always adds a final newline, so strip it off
235
# The current version
236
return _encode(_pack('>LLLLLL'
237
, st.st_size, int(st.st_mtime), int(st.st_ctime)
238
, st.st_dev, st.st_ino & 0xFFFFFFFF, st.st_mode))[:-1]
239
# This is 0.060s / 1.520s faster by not encoding as much information
240
# return _encode(_pack('>LL', int(st.st_mtime), st.st_mode))[:-1]
241
# This is not strictly faster than _encode(_pack())[:-1]
242
# return '%X.%X.%X.%X.%X.%X' % (
243
# st.st_size, int(st.st_mtime), int(st.st_ctime),
244
# st.st_dev, st.st_ino, st.st_mode)
245
# Similar to the _encode(_pack('>LL'))
246
# return '%X.%X' % (int(st.st_mtime), st.st_mode)
249
class DirState(object):
250
"""Record directory and metadata state for fast access.
252
A dirstate is a specialised data structure for managing local working
253
tree state information. Its not yet well defined whether it is platform
254
specific, and if it is how we detect/parameterize that.
256
Dirstates use the usual lock_write, lock_read and unlock mechanisms.
257
Unlike most bzr disk formats, DirStates must be locked for reading, using
258
lock_read. (This is an os file lock internally.) This is necessary
259
because the file can be rewritten in place.
261
DirStates must be explicitly written with save() to commit changes; just
262
unlocking them does not write the changes to disk.
265
_kind_to_minikind = {
271
'tree-reference': 't',
273
_minikind_to_kind = {
279
't': 'tree-reference',
281
_stat_to_minikind = {
286
_to_yesno = {True:'y', False: 'n'} # TODO profile the performance gain
287
# of using int conversion rather than a dict here. AND BLAME ANDREW IF
290
# TODO: jam 20070221 Figure out what to do if we have a record that exceeds
291
# the BISECT_PAGE_SIZE. For now, we just have to make it large enough
292
# that we are sure a single record will always fit.
293
BISECT_PAGE_SIZE = 4096
296
IN_MEMORY_UNMODIFIED = 1
297
IN_MEMORY_MODIFIED = 2
299
# A pack_stat (the x's) that is just noise and will never match the output
302
NULL_PARENT_DETAILS = ('a', '', 0, False, '')
304
HEADER_FORMAT_2 = '#bazaar dirstate flat format 2\n'
305
HEADER_FORMAT_3 = '#bazaar dirstate flat format 3\n'
307
def __init__(self, path):
308
"""Create a DirState object.
310
:param path: The path at which the dirstate file on disk should live.
312
# _header_state and _dirblock_state represent the current state
313
# of the dirstate metadata and the per-row data respectiely.
314
# NOT_IN_MEMORY indicates that no data is in memory
315
# IN_MEMORY_UNMODIFIED indicates that what we have in memory
316
# is the same as is on disk
317
# IN_MEMORY_MODIFIED indicates that we have a modified version
318
# of what is on disk.
319
# In future we will add more granularity, for instance _dirblock_state
320
# will probably support partially-in-memory as a separate variable,
321
# allowing for partially-in-memory unmodified and partially-in-memory
323
self._header_state = DirState.NOT_IN_MEMORY
324
self._dirblock_state = DirState.NOT_IN_MEMORY
325
# If true, an error has been detected while updating the dirstate, and
326
# for safety we're not going to commit to disk.
327
self._changes_aborted = False
331
self._state_file = None
332
self._filename = path
333
self._lock_token = None
334
self._lock_state = None
335
self._id_index = None
336
# a map from packed_stat to sha's.
337
self._packed_stat_index = None
338
self._end_of_header = None
339
self._cutoff_time = None
340
self._split_path_cache = {}
341
self._bisect_page_size = DirState.BISECT_PAGE_SIZE
342
if 'hashcache' in debug.debug_flags:
343
self._sha1_file = self._sha1_file_and_mutter
345
self._sha1_file = filters.sha_file_by_name
346
# These two attributes provide a simple cache for lookups into the
347
# dirstate in-memory vectors. By probing respectively for the last
348
# block, and for the next entry, we save nearly 2 bisections per path
350
self._last_block_index = None
351
self._last_entry_index = None
355
(self.__class__.__name__, self._filename)
357
def add(self, path, file_id, kind, stat, fingerprint):
358
"""Add a path to be tracked.
360
:param path: The path within the dirstate - '' is the root, 'foo' is the
361
path foo within the root, 'foo/bar' is the path bar within foo
363
:param file_id: The file id of the path being added.
364
:param kind: The kind of the path, as a string like 'file',
366
:param stat: The output of os.lstat for the path.
367
:param fingerprint: The sha value of the file,
368
or the target of a symlink,
369
or the referenced revision id for tree-references,
370
or '' for directories.
373
# find the block its in.
374
# find the location in the block.
375
# check its not there
377
#------- copied from inventory.ensure_normalized_name - keep synced.
378
# --- normalized_filename wants a unicode basename only, so get one.
379
dirname, basename = osutils.split(path)
380
# we dont import normalized_filename directly because we want to be
381
# able to change the implementation at runtime for tests.
382
norm_name, can_access = osutils.normalized_filename(basename)
383
if norm_name != basename:
387
raise errors.InvalidNormalization(path)
388
# you should never have files called . or ..; just add the directory
389
# in the parent, or according to the special treatment for the root
390
if basename == '.' or basename == '..':
391
raise errors.InvalidEntryName(path)
392
# now that we've normalised, we need the correct utf8 path and
393
# dirname and basename elements. This single encode and split should be
394
# faster than three separate encodes.
395
utf8path = (dirname + '/' + basename).strip('/').encode('utf8')
396
dirname, basename = osutils.split(utf8path)
397
assert file_id.__class__ == str, \
398
"must be a utf8 file_id not %s" % (type(file_id))
399
# Make sure the file_id does not exist in this tree
400
file_id_entry = self._get_entry(0, fileid_utf8=file_id)
401
if file_id_entry != (None, None):
402
path = osutils.pathjoin(file_id_entry[0][0], file_id_entry[0][1])
403
kind = DirState._minikind_to_kind[file_id_entry[1][0][0]]
404
info = '%s:%s' % (kind, path)
405
raise errors.DuplicateFileId(file_id, info)
406
first_key = (dirname, basename, '')
407
block_index, present = self._find_block_index_from_key(first_key)
409
# check the path is not in the tree
410
block = self._dirblocks[block_index][1]
411
entry_index, _ = self._find_entry_index(first_key, block)
412
while (entry_index < len(block) and
413
block[entry_index][0][0:2] == first_key[0:2]):
414
if block[entry_index][1][0][0] not in 'ar':
415
# this path is in the dirstate in the current tree.
416
raise Exception, "adding already added path!"
419
# The block where we want to put the file is not present. But it
420
# might be because the directory was empty, or not loaded yet. Look
421
# for a parent entry, if not found, raise NotVersionedError
422
parent_dir, parent_base = osutils.split(dirname)
423
parent_block_idx, parent_entry_idx, _, parent_present = \
424
self._get_block_entry_index(parent_dir, parent_base, 0)
425
if not parent_present:
426
raise errors.NotVersionedError(path, str(self))
427
self._ensure_block(parent_block_idx, parent_entry_idx, dirname)
428
block = self._dirblocks[block_index][1]
429
entry_key = (dirname, basename, file_id)
432
packed_stat = DirState.NULLSTAT
435
packed_stat = pack_stat(stat)
436
parent_info = self._empty_parent_info()
437
minikind = DirState._kind_to_minikind[kind]
439
entry_data = entry_key, [
440
(minikind, fingerprint, size, False, packed_stat),
442
elif kind == 'directory':
443
entry_data = entry_key, [
444
(minikind, '', 0, False, packed_stat),
446
elif kind == 'symlink':
447
entry_data = entry_key, [
448
(minikind, fingerprint, size, False, packed_stat),
450
elif kind == 'tree-reference':
451
entry_data = entry_key, [
452
(minikind, fingerprint, 0, False, packed_stat),
455
raise errors.BzrError('unknown kind %r' % kind)
456
entry_index, present = self._find_entry_index(entry_key, block)
458
block.insert(entry_index, entry_data)
460
assert block[entry_index][1][0][0] == 'a', " %r(%r) already added" % (basename, file_id)
461
block[entry_index][1][0] = entry_data[1][0]
463
if kind == 'directory':
464
# insert a new dirblock
465
self._ensure_block(block_index, entry_index, utf8path)
466
self._dirblock_state = DirState.IN_MEMORY_MODIFIED
468
self._id_index.setdefault(entry_key[2], set()).add(entry_key)
470
def _bisect(self, paths):
471
"""Bisect through the disk structure for specific rows.
473
:param paths: A list of paths to find
474
:return: A dict mapping path => entries for found entries. Missing
475
entries will not be in the map.
476
The list is not sorted, and entries will be populated
477
based on when they were read.
479
self._requires_lock()
480
# We need the file pointer to be right after the initial header block
481
self._read_header_if_needed()
482
# If _dirblock_state was in memory, we should just return info from
483
# there, this function is only meant to handle when we want to read
485
assert self._dirblock_state == DirState.NOT_IN_MEMORY
487
# The disk representation is generally info + '\0\n\0' at the end. But
488
# for bisecting, it is easier to treat this as '\0' + info + '\0\n'
489
# Because it means we can sync on the '\n'
490
state_file = self._state_file
491
file_size = os.fstat(state_file.fileno()).st_size
492
# We end up with 2 extra fields, we should have a trailing '\n' to
493
# ensure that we read the whole record, and we should have a precursur
494
# '' which ensures that we start after the previous '\n'
495
entry_field_count = self._fields_per_entry() + 1
497
low = self._end_of_header
498
high = file_size - 1 # Ignore the final '\0'
499
# Map from (dir, name) => entry
502
# Avoid infinite seeking
503
max_count = 30*len(paths)
505
# pending is a list of places to look.
506
# each entry is a tuple of low, high, dir_names
507
# low -> the first byte offset to read (inclusive)
508
# high -> the last byte offset (inclusive)
509
# dir_names -> The list of (dir, name) pairs that should be found in
510
# the [low, high] range
511
pending = [(low, high, paths)]
513
page_size = self._bisect_page_size
515
fields_to_entry = self._get_fields_to_entry()
518
low, high, cur_files = pending.pop()
520
if not cur_files or low >= high:
525
if count > max_count:
526
raise errors.BzrError('Too many seeks, most likely a bug.')
528
mid = max(low, (low+high-page_size)/2)
531
# limit the read size, so we don't end up reading data that we have
533
read_size = min(page_size, (high-mid)+1)
534
block = state_file.read(read_size)
537
entries = block.split('\n')
540
# We didn't find a '\n', so we cannot have found any records.
541
# So put this range back and try again. But we know we have to
542
# increase the page size, because a single read did not contain
543
# a record break (so records must be larger than page_size)
545
pending.append((low, high, cur_files))
548
# Check the first and last entries, in case they are partial, or if
549
# we don't care about the rest of this page
551
first_fields = entries[0].split('\0')
552
if len(first_fields) < entry_field_count:
553
# We didn't get the complete first entry
554
# so move start, and grab the next, which
555
# should be a full entry
556
start += len(entries[0])+1
557
first_fields = entries[1].split('\0')
560
if len(first_fields) <= 2:
561
# We didn't even get a filename here... what do we do?
562
# Try a large page size and repeat this query
564
pending.append((low, high, cur_files))
567
# Find what entries we are looking for, which occur before and
568
# after this first record.
571
first_path = first_fields[1] + '/' + first_fields[2]
573
first_path = first_fields[2]
574
first_loc = _bisect_path_left(cur_files, first_path)
576
# These exist before the current location
577
pre = cur_files[:first_loc]
578
# These occur after the current location, which may be in the
579
# data we read, or might be after the last entry
580
post = cur_files[first_loc:]
582
if post and len(first_fields) >= entry_field_count:
583
# We have files after the first entry
585
# Parse the last entry
586
last_entry_num = len(entries)-1
587
last_fields = entries[last_entry_num].split('\0')
588
if len(last_fields) < entry_field_count:
589
# The very last hunk was not complete,
590
# read the previous hunk
591
after = mid + len(block) - len(entries[-1])
593
last_fields = entries[last_entry_num].split('\0')
595
after = mid + len(block)
598
last_path = last_fields[1] + '/' + last_fields[2]
600
last_path = last_fields[2]
601
last_loc = _bisect_path_right(post, last_path)
603
middle_files = post[:last_loc]
604
post = post[last_loc:]
607
# We have files that should occur in this block
608
# (>= first, <= last)
609
# Either we will find them here, or we can mark them as
612
if middle_files[0] == first_path:
613
# We might need to go before this location
614
pre.append(first_path)
615
if middle_files[-1] == last_path:
616
post.insert(0, last_path)
618
# Find out what paths we have
619
paths = {first_path:[first_fields]}
620
# last_path might == first_path so we need to be
621
# careful if we should append rather than overwrite
622
if last_entry_num != first_entry_num:
623
paths.setdefault(last_path, []).append(last_fields)
624
for num in xrange(first_entry_num+1, last_entry_num):
625
# TODO: jam 20070223 We are already splitting here, so
626
# shouldn't we just split the whole thing rather
627
# than doing the split again in add_one_record?
628
fields = entries[num].split('\0')
630
path = fields[1] + '/' + fields[2]
633
paths.setdefault(path, []).append(fields)
635
for path in middle_files:
636
for fields in paths.get(path, []):
637
# offset by 1 because of the opening '\0'
638
# consider changing fields_to_entry to avoid the
640
entry = fields_to_entry(fields[1:])
641
found.setdefault(path, []).append(entry)
643
# Now we have split up everything into pre, middle, and post, and
644
# we have handled everything that fell in 'middle'.
645
# We add 'post' first, so that we prefer to seek towards the
646
# beginning, so that we will tend to go as early as we need, and
647
# then only seek forward after that.
649
pending.append((after, high, post))
651
pending.append((low, start-1, pre))
653
# Consider that we may want to return the directory entries in sorted
654
# order. For now, we just return them in whatever order we found them,
655
# and leave it up to the caller if they care if it is ordered or not.
658
def _bisect_dirblocks(self, dir_list):
659
"""Bisect through the disk structure to find entries in given dirs.
661
_bisect_dirblocks is meant to find the contents of directories, which
662
differs from _bisect, which only finds individual entries.
664
:param dir_list: A sorted list of directory names ['', 'dir', 'foo'].
665
:return: A map from dir => entries_for_dir
667
# TODO: jam 20070223 A lot of the bisecting logic could be shared
668
# between this and _bisect. It would require parameterizing the
669
# inner loop with a function, though. We should evaluate the
670
# performance difference.
671
self._requires_lock()
672
# We need the file pointer to be right after the initial header block
673
self._read_header_if_needed()
674
# If _dirblock_state was in memory, we should just return info from
675
# there, this function is only meant to handle when we want to read
677
assert self._dirblock_state == DirState.NOT_IN_MEMORY
679
# The disk representation is generally info + '\0\n\0' at the end. But
680
# for bisecting, it is easier to treat this as '\0' + info + '\0\n'
681
# Because it means we can sync on the '\n'
682
state_file = self._state_file
683
file_size = os.fstat(state_file.fileno()).st_size
684
# We end up with 2 extra fields, we should have a trailing '\n' to
685
# ensure that we read the whole record, and we should have a precursur
686
# '' which ensures that we start after the previous '\n'
687
entry_field_count = self._fields_per_entry() + 1
689
low = self._end_of_header
690
high = file_size - 1 # Ignore the final '\0'
691
# Map from dir => entry
694
# Avoid infinite seeking
695
max_count = 30*len(dir_list)
697
# pending is a list of places to look.
698
# each entry is a tuple of low, high, dir_names
699
# low -> the first byte offset to read (inclusive)
700
# high -> the last byte offset (inclusive)
701
# dirs -> The list of directories that should be found in
702
# the [low, high] range
703
pending = [(low, high, dir_list)]
705
page_size = self._bisect_page_size
707
fields_to_entry = self._get_fields_to_entry()
710
low, high, cur_dirs = pending.pop()
712
if not cur_dirs or low >= high:
717
if count > max_count:
718
raise errors.BzrError('Too many seeks, most likely a bug.')
720
mid = max(low, (low+high-page_size)/2)
723
# limit the read size, so we don't end up reading data that we have
725
read_size = min(page_size, (high-mid)+1)
726
block = state_file.read(read_size)
729
entries = block.split('\n')
732
# We didn't find a '\n', so we cannot have found any records.
733
# So put this range back and try again. But we know we have to
734
# increase the page size, because a single read did not contain
735
# a record break (so records must be larger than page_size)
737
pending.append((low, high, cur_dirs))
740
# Check the first and last entries, in case they are partial, or if
741
# we don't care about the rest of this page
743
first_fields = entries[0].split('\0')
744
if len(first_fields) < entry_field_count:
745
# We didn't get the complete first entry
746
# so move start, and grab the next, which
747
# should be a full entry
748
start += len(entries[0])+1
749
first_fields = entries[1].split('\0')
752
if len(first_fields) <= 1:
753
# We didn't even get a dirname here... what do we do?
754
# Try a large page size and repeat this query
756
pending.append((low, high, cur_dirs))
759
# Find what entries we are looking for, which occur before and
760
# after this first record.
762
first_dir = first_fields[1]
763
first_loc = bisect.bisect_left(cur_dirs, first_dir)
765
# These exist before the current location
766
pre = cur_dirs[:first_loc]
767
# These occur after the current location, which may be in the
768
# data we read, or might be after the last entry
769
post = cur_dirs[first_loc:]
771
if post and len(first_fields) >= entry_field_count:
772
# We have records to look at after the first entry
774
# Parse the last entry
775
last_entry_num = len(entries)-1
776
last_fields = entries[last_entry_num].split('\0')
777
if len(last_fields) < entry_field_count:
778
# The very last hunk was not complete,
779
# read the previous hunk
780
after = mid + len(block) - len(entries[-1])
782
last_fields = entries[last_entry_num].split('\0')
784
after = mid + len(block)
786
last_dir = last_fields[1]
787
last_loc = bisect.bisect_right(post, last_dir)
789
middle_files = post[:last_loc]
790
post = post[last_loc:]
793
# We have files that should occur in this block
794
# (>= first, <= last)
795
# Either we will find them here, or we can mark them as
798
if middle_files[0] == first_dir:
799
# We might need to go before this location
800
pre.append(first_dir)
801
if middle_files[-1] == last_dir:
802
post.insert(0, last_dir)
804
# Find out what paths we have
805
paths = {first_dir:[first_fields]}
806
# last_dir might == first_dir so we need to be
807
# careful if we should append rather than overwrite
808
if last_entry_num != first_entry_num:
809
paths.setdefault(last_dir, []).append(last_fields)
810
for num in xrange(first_entry_num+1, last_entry_num):
811
# TODO: jam 20070223 We are already splitting here, so
812
# shouldn't we just split the whole thing rather
813
# than doing the split again in add_one_record?
814
fields = entries[num].split('\0')
815
paths.setdefault(fields[1], []).append(fields)
817
for cur_dir in middle_files:
818
for fields in paths.get(cur_dir, []):
819
# offset by 1 because of the opening '\0'
820
# consider changing fields_to_entry to avoid the
822
entry = fields_to_entry(fields[1:])
823
found.setdefault(cur_dir, []).append(entry)
825
# Now we have split up everything into pre, middle, and post, and
826
# we have handled everything that fell in 'middle'.
827
# We add 'post' first, so that we prefer to seek towards the
828
# beginning, so that we will tend to go as early as we need, and
829
# then only seek forward after that.
831
pending.append((after, high, post))
833
pending.append((low, start-1, pre))
837
def _bisect_recursive(self, paths):
838
"""Bisect for entries for all paths and their children.
840
This will use bisect to find all records for the supplied paths. It
841
will then continue to bisect for any records which are marked as
842
directories. (and renames?)
844
:param paths: A sorted list of (dir, name) pairs
845
eg: [('', 'a'), ('', 'f'), ('a/b', 'c')]
846
:return: A dictionary mapping (dir, name, file_id) => [tree_info]
848
# Map from (dir, name, file_id) => [tree_info]
851
found_dir_names = set()
853
# Directories that have been read
854
processed_dirs = set()
855
# Get the ball rolling with the first bisect for all entries.
856
newly_found = self._bisect(paths)
859
# Directories that need to be read
861
paths_to_search = set()
862
for entry_list in newly_found.itervalues():
863
for dir_name_id, trees_info in entry_list:
864
found[dir_name_id] = trees_info
865
found_dir_names.add(dir_name_id[:2])
867
for tree_info in trees_info:
868
minikind = tree_info[0]
871
# We already processed this one as a directory,
872
# we don't need to do the extra work again.
874
subdir, name, file_id = dir_name_id
875
path = osutils.pathjoin(subdir, name)
877
if path not in processed_dirs:
878
pending_dirs.add(path)
879
elif minikind == 'r':
880
# Rename, we need to directly search the target
881
# which is contained in the fingerprint column
882
dir_name = osutils.split(tree_info[1])
883
if dir_name[0] in pending_dirs:
884
# This entry will be found in the dir search
886
if dir_name not in found_dir_names:
887
paths_to_search.add(tree_info[1])
888
# Now we have a list of paths to look for directly, and
889
# directory blocks that need to be read.
890
# newly_found is mixing the keys between (dir, name) and path
891
# entries, but that is okay, because we only really care about the
893
newly_found = self._bisect(sorted(paths_to_search))
894
newly_found.update(self._bisect_dirblocks(sorted(pending_dirs)))
895
processed_dirs.update(pending_dirs)
898
def _discard_merge_parents(self):
899
"""Discard any parents trees beyond the first.
901
Note that if this fails the dirstate is corrupted.
903
After this function returns the dirstate contains 2 trees, neither of
906
self._read_header_if_needed()
907
parents = self.get_parent_ids()
910
# only require all dirblocks if we are doing a full-pass removal.
911
self._read_dirblocks_if_needed()
912
dead_patterns = set([('a', 'r'), ('a', 'a'), ('r', 'r'), ('r', 'a')])
913
def iter_entries_removable():
914
for block in self._dirblocks:
915
deleted_positions = []
916
for pos, entry in enumerate(block[1]):
918
if (entry[1][0][0], entry[1][1][0]) in dead_patterns:
919
deleted_positions.append(pos)
920
if deleted_positions:
921
if len(deleted_positions) == len(block[1]):
924
for pos in reversed(deleted_positions):
926
# if the first parent is a ghost:
927
if parents[0] in self.get_ghosts():
928
empty_parent = [DirState.NULL_PARENT_DETAILS]
929
for entry in iter_entries_removable():
930
entry[1][1:] = empty_parent
932
for entry in iter_entries_removable():
936
self._parents = [parents[0]]
937
self._dirblock_state = DirState.IN_MEMORY_MODIFIED
938
self._header_state = DirState.IN_MEMORY_MODIFIED
940
def _empty_parent_info(self):
941
return [DirState.NULL_PARENT_DETAILS] * (len(self._parents) -
944
def _ensure_block(self, parent_block_index, parent_row_index, dirname):
945
"""Ensure a block for dirname exists.
947
This function exists to let callers which know that there is a
948
directory dirname ensure that the block for it exists. This block can
949
fail to exist because of demand loading, or because a directory had no
950
children. In either case it is not an error. It is however an error to
951
call this if there is no parent entry for the directory, and thus the
952
function requires the coordinates of such an entry to be provided.
954
The root row is special cased and can be indicated with a parent block
957
:param parent_block_index: The index of the block in which dirname's row
959
:param parent_row_index: The index in the parent block where the row
961
:param dirname: The utf8 dirname to ensure there is a block for.
962
:return: The index for the block.
964
if dirname == '' and parent_row_index == 0 and parent_block_index == 0:
965
# This is the signature of the root row, and the
966
# contents-of-root row is always index 1
968
# the basename of the directory must be the end of its full name.
969
if not (parent_block_index == -1 and
970
parent_block_index == -1 and dirname == ''):
971
assert dirname.endswith(
972
self._dirblocks[parent_block_index][1][parent_row_index][0][1])
973
block_index, present = self._find_block_index_from_key((dirname, '', ''))
975
## In future, when doing partial parsing, this should load and
976
# populate the entire block.
977
self._dirblocks.insert(block_index, (dirname, []))
980
def _entries_to_current_state(self, new_entries):
981
"""Load new_entries into self.dirblocks.
983
Process new_entries into the current state object, making them the active
984
state. The entries are grouped together by directory to form dirblocks.
986
:param new_entries: A sorted list of entries. This function does not sort
987
to prevent unneeded overhead when callers have a sorted list already.
990
assert new_entries[0][0][0:2] == ('', ''), \
991
"Missing root row %r" % (new_entries[0][0],)
992
# The two blocks here are deliberate: the root block and the
993
# contents-of-root block.
994
self._dirblocks = [('', []), ('', [])]
995
current_block = self._dirblocks[0][1]
998
append_entry = current_block.append
999
for entry in new_entries:
1000
if entry[0][0] != current_dirname:
1001
# new block - different dirname
1003
current_dirname = entry[0][0]
1004
self._dirblocks.append((current_dirname, current_block))
1005
append_entry = current_block.append
1006
# append the entry to the current block
1008
self._split_root_dirblock_into_contents()
1010
def _split_root_dirblock_into_contents(self):
1011
"""Split the root dirblocks into root and contents-of-root.
1013
After parsing by path, we end up with root entries and contents-of-root
1014
entries in the same block. This loop splits them out again.
1016
# The above loop leaves the "root block" entries mixed with the
1017
# "contents-of-root block". But we don't want an if check on
1018
# all entries, so instead we just fix it up here.
1019
assert self._dirblocks[1] == ('', [])
1021
contents_of_root_block = []
1022
for entry in self._dirblocks[0][1]:
1023
if not entry[0][1]: # This is a root entry
1024
root_block.append(entry)
1026
contents_of_root_block.append(entry)
1027
self._dirblocks[0] = ('', root_block)
1028
self._dirblocks[1] = ('', contents_of_root_block)
1030
def _entry_to_line(self, entry):
1031
"""Serialize entry to a NULL delimited line ready for _get_output_lines.
1033
:param entry: An entry_tuple as defined in the module docstring.
1035
entire_entry = list(entry[0])
1036
for tree_number, tree_data in enumerate(entry[1]):
1037
# (minikind, fingerprint, size, executable, tree_specific_string)
1038
entire_entry.extend(tree_data)
1039
# 3 for the key, 5 for the fields per tree.
1040
tree_offset = 3 + tree_number * 5
1042
entire_entry[tree_offset + 0] = tree_data[0]
1044
entire_entry[tree_offset + 2] = str(tree_data[2])
1046
entire_entry[tree_offset + 3] = DirState._to_yesno[tree_data[3]]
1047
return '\0'.join(entire_entry)
1049
def _fields_per_entry(self):
1050
"""How many null separated fields should be in each entry row.
1052
Each line now has an extra '\n' field which is not used
1053
so we just skip over it
1055
3 fields for the key
1056
+ number of fields per tree_data (5) * tree count
1059
tree_count = 1 + self._num_present_parents()
1060
return 3 + 5 * tree_count + 1
1062
def _find_block(self, key, add_if_missing=False):
1063
"""Return the block that key should be present in.
1065
:param key: A dirstate entry key.
1066
:return: The block tuple.
1068
block_index, present = self._find_block_index_from_key(key)
1070
if not add_if_missing:
1071
# check to see if key is versioned itself - we might want to
1072
# add it anyway, because dirs with no entries dont get a
1073
# dirblock at parse time.
1074
# This is an uncommon branch to take: most dirs have children,
1075
# and most code works with versioned paths.
1076
parent_base, parent_name = osutils.split(key[0])
1077
if not self._get_block_entry_index(parent_base, parent_name, 0)[3]:
1078
# some parent path has not been added - its an error to add
1080
raise errors.NotVersionedError(key[0:2], str(self))
1081
self._dirblocks.insert(block_index, (key[0], []))
1082
return self._dirblocks[block_index]
1084
def _find_block_index_from_key(self, key):
1085
"""Find the dirblock index for a key.
1087
:return: The block index, True if the block for the key is present.
1089
if key[0:2] == ('', ''):
1092
if (self._last_block_index is not None and
1093
self._dirblocks[self._last_block_index][0] == key[0]):
1094
return self._last_block_index, True
1097
block_index = bisect_dirblock(self._dirblocks, key[0], 1,
1098
cache=self._split_path_cache)
1099
# _right returns one-past-where-key is so we have to subtract
1100
# one to use it. we use _right here because there are two
1101
# '' blocks - the root, and the contents of root
1102
# we always have a minimum of 2 in self._dirblocks: root and
1103
# root-contents, and for '', we get 2 back, so this is
1104
# simple and correct:
1105
present = (block_index < len(self._dirblocks) and
1106
self._dirblocks[block_index][0] == key[0])
1107
self._last_block_index = block_index
1108
# Reset the entry index cache to the beginning of the block.
1109
self._last_entry_index = -1
1110
return block_index, present
1112
def _find_entry_index(self, key, block):
1113
"""Find the entry index for a key in a block.
1115
:return: The entry index, True if the entry for the key is present.
1117
len_block = len(block)
1119
if self._last_entry_index is not None:
1121
entry_index = self._last_entry_index + 1
1122
# A hit is when the key is after the last slot, and before or
1123
# equal to the next slot.
1124
if ((entry_index > 0 and block[entry_index - 1][0] < key) and
1125
key <= block[entry_index][0]):
1126
self._last_entry_index = entry_index
1127
present = (block[entry_index][0] == key)
1128
return entry_index, present
1131
entry_index = bisect.bisect_left(block, (key, []))
1132
present = (entry_index < len_block and
1133
block[entry_index][0] == key)
1134
self._last_entry_index = entry_index
1135
return entry_index, present
1138
def from_tree(tree, dir_state_filename):
1139
"""Create a dirstate from a bzr Tree.
1141
:param tree: The tree which should provide parent information and
1143
:return: a DirState object which is currently locked for writing.
1144
(it was locked by DirState.initialize)
1146
result = DirState.initialize(dir_state_filename)
1150
parent_ids = tree.get_parent_ids()
1151
num_parents = len(parent_ids)
1153
for parent_id in parent_ids:
1154
parent_tree = tree.branch.repository.revision_tree(parent_id)
1155
parent_trees.append((parent_id, parent_tree))
1156
parent_tree.lock_read()
1157
result.set_parent_trees(parent_trees, [])
1158
result.set_state_from_inventory(tree.inventory)
1160
for revid, parent_tree in parent_trees:
1161
parent_tree.unlock()
1164
# The caller won't have a chance to unlock this, so make sure we
1170
def update_by_delta(self, delta):
1171
"""Apply an inventory delta to the dirstate for tree 0
1173
:param delta: An inventory delta. See Inventory.apply_delta for
1176
self._read_dirblocks_if_needed()
1179
for old_path, new_path, file_id, inv_entry in sorted(delta,
1181
assert file_id not in insertions
1182
assert file_id not in removals
1183
if old_path is not None:
1184
old_path = old_path.encode('utf-8')
1185
removals[file_id] = old_path
1186
if new_path is not None:
1187
new_path = new_path.encode('utf-8')
1188
dirname, basename = osutils.split(new_path)
1189
key = (dirname, basename, file_id)
1190
minikind = DirState._kind_to_minikind[inv_entry.kind]
1192
fingerprint = inv_entry.reference_revision
1195
insertions[file_id] = (key, minikind, inv_entry.executable,
1196
fingerprint, new_path)
1197
if None not in (old_path, new_path):
1198
for child in self._iter_child_entries(0, old_path):
1199
if child[0][2] in insertions or child[0][2] in removals:
1201
child_dirname = child[0][0]
1202
child_basename = child[0][1]
1203
minikind = child[1][0][0]
1204
fingerprint = child[1][0][4]
1205
executable = child[1][0][3]
1206
old_child_path = osutils.pathjoin(child[0][0],
1208
removals[child[0][2]] = old_child_path
1209
child_suffix = child_dirname[len(old_path):]
1210
new_child_dirname = (new_path + child_suffix)
1211
key = (new_child_dirname, child_basename, child[0][2])
1212
new_child_path = os.path.join(new_child_dirname,
1214
insertions[child[0][2]] = (key, minikind, executable,
1215
fingerprint, new_child_path)
1216
self._apply_removals(removals.values())
1217
self._apply_insertions(insertions.values())
1219
def _apply_removals(self, removals):
1220
for path in sorted(removals, reverse=True):
1221
dirname, basename = osutils.split(path)
1222
block_i, entry_i, d_present, f_present = \
1223
self._get_block_entry_index(dirname, basename, 0)
1224
entry = self._dirblocks[block_i][1][entry_i]
1225
self._make_absent(entry)
1227
def _apply_insertions(self, adds):
1228
for key, minikind, executable, fingerprint, path_utf8 in sorted(adds):
1229
self.update_minimal(key, minikind, executable, fingerprint,
1230
path_utf8=path_utf8)
1232
def update_basis_by_delta(self, delta, new_revid):
1233
"""Update the parents of this tree after a commit.
1235
This gives the tree one parent, with revision id new_revid. The
1236
inventory delta is applied to the current basis tree to generate the
1237
inventory for the parent new_revid, and all other parent trees are
1240
Note that an exception during the operation of this method will leave
1241
the dirstate in a corrupt state where it should not be saved.
1243
Finally, we expect all changes to be synchronising the basis tree with
1246
:param new_revid: The new revision id for the trees parent.
1247
:param delta: An inventory delta (see apply_inventory_delta) describing
1248
the changes from the current left most parent revision to new_revid.
1250
self._read_dirblocks_if_needed()
1251
self._discard_merge_parents()
1252
if self._ghosts != []:
1253
raise NotImplementedError(self.update_basis_by_delta)
1254
if len(self._parents) == 0:
1255
# setup a blank tree, the most simple way.
1256
empty_parent = DirState.NULL_PARENT_DETAILS
1257
for entry in self._iter_entries():
1258
entry[1].append(empty_parent)
1259
self._parents.append(new_revid)
1261
self._parents[0] = new_revid
1263
delta = sorted(delta, reverse=True)
1267
# The paths this function accepts are unicode and must be encoded as we
1269
encode = cache_utf8.encode
1270
inv_to_entry = self._inv_entry_to_details
1271
# delta is now (deletes, changes), (adds) in reverse lexographical
1273
# deletes in reverse lexographic order are safe to process in situ.
1274
# renames are not, as a rename from any path could go to a path
1275
# lexographically lower, so we transform renames into delete, add pairs,
1276
# expanding them recursively as needed.
1277
# At the same time, to reduce interface friction we convert the input
1278
# inventory entries to dirstate.
1279
root_only = ('', '')
1280
for old_path, new_path, file_id, inv_entry in delta:
1281
if old_path is None:
1282
adds.append((None, encode(new_path), file_id,
1283
inv_to_entry(inv_entry), True))
1284
elif new_path is None:
1285
deletes.append((encode(old_path), None, file_id, None, True))
1286
elif (old_path, new_path) != root_only:
1288
# Because renames must preserve their children we must have
1289
# processed all relocations and removes before hand. The sort
1290
# order ensures we've examined the child paths, but we also
1291
# have to execute the removals, or the split to an add/delete
1292
# pair will result in the deleted item being reinserted, or
1293
# renamed items being reinserted twice - and possibly at the
1294
# wrong place. Splitting into a delete/add pair also simplifies
1295
# the handling of entries with ('f', ...), ('r' ...) because
1296
# the target of the 'r' is old_path here, and we add that to
1297
# deletes, meaning that the add handler does not need to check
1298
# for 'r' items on every pass.
1299
self._update_basis_apply_deletes(deletes)
1301
new_path_utf8 = encode(new_path)
1302
# Split into an add/delete pair recursively.
1303
adds.append((None, new_path_utf8, file_id,
1304
inv_to_entry(inv_entry), False))
1305
# Expunge deletes that we've seen so that deleted/renamed
1306
# children of a rename directory are handled correctly.
1307
new_deletes = reversed(list(self._iter_child_entries(1,
1309
# Remove the current contents of the tree at orig_path, and
1310
# reinsert at the correct new path.
1311
for entry in new_deletes:
1313
source_path = entry[0][0] + '/' + entry[0][1]
1315
source_path = entry[0][1]
1317
target_path = new_path_utf8 + source_path[len(old_path):]
1319
assert len(old_path) > 0, ("cannot rename directory to"
1321
target_path = source_path[len(old_path) + 1:]
1322
adds.append((None, target_path, entry[0][2], entry[1][1], False))
1324
(source_path, target_path, entry[0][2], None, False))
1326
(encode(old_path), new_path, file_id, None, False))
1328
# changes to just the root should not require remove/insertion
1330
changes.append((encode(old_path), encode(new_path), file_id,
1331
inv_to_entry(inv_entry)))
1333
# Finish expunging deletes/first half of renames.
1334
self._update_basis_apply_deletes(deletes)
1335
# Reinstate second half of renames and new paths.
1336
self._update_basis_apply_adds(adds)
1337
# Apply in-situ changes.
1338
self._update_basis_apply_changes(changes)
1340
self._dirblock_state = DirState.IN_MEMORY_MODIFIED
1341
self._header_state = DirState.IN_MEMORY_MODIFIED
1342
self._id_index = None
1345
def _update_basis_apply_adds(self, adds):
1346
"""Apply a sequence of adds to tree 1 during update_basis_by_delta.
1348
They may be adds, or renames that have been split into add/delete
1351
:param adds: A sequence of adds. Each add is a tuple:
1352
(None, new_path_utf8, file_id, (entry_details), real_add). real_add
1353
is False when the add is the second half of a remove-and-reinsert
1354
pair created to handle renames and deletes.
1356
# Adds are accumulated partly from renames, so can be in any input
1359
# adds is now in lexographic order, which places all parents before
1360
# their children, so we can process it linearly.
1362
for old_path, new_path, file_id, new_details, real_add in adds:
1363
assert old_path is None
1364
# the entry for this file_id must be in tree 0.
1365
entry = self._get_entry(0, file_id, new_path)
1366
if entry[0] is None or entry[0][2] != file_id:
1367
self._changes_aborted = True
1368
raise errors.InconsistentDelta(new_path, file_id,
1369
'working tree does not contain new entry')
1370
if real_add and entry[1][1][0] not in absent:
1371
self._changes_aborted = True
1372
raise errors.InconsistentDelta(new_path, file_id,
1373
'The entry was considered to be a genuinely new record,'
1374
' but there was already an old record for it.')
1375
# We don't need to update the target of an 'r' because the handling
1376
# of renames turns all 'r' situations into a delete at the original
1378
entry[1][1] = new_details
1380
def _update_basis_apply_changes(self, changes):
1381
"""Apply a sequence of changes to tree 1 during update_basis_by_delta.
1383
:param adds: A sequence of changes. Each change is a tuple:
1384
(path_utf8, path_utf8, file_id, (entry_details))
1387
for old_path, new_path, file_id, new_details in changes:
1388
assert old_path == new_path
1389
# the entry for this file_id must be in tree 0.
1390
entry = self._get_entry(0, file_id, new_path)
1391
if entry[0] is None or entry[0][2] != file_id:
1392
self._changes_aborted = True
1393
raise errors.InconsistentDelta(new_path, file_id,
1394
'working tree does not contain new entry')
1395
if (entry[1][0][0] in absent or
1396
entry[1][1][0] in absent):
1397
self._changes_aborted = True
1398
raise errors.InconsistentDelta(new_path, file_id,
1399
'changed considered absent')
1400
entry[1][1] = new_details
1402
def _update_basis_apply_deletes(self, deletes):
1403
"""Apply a sequence of deletes to tree 1 during update_basis_by_delta.
1405
They may be deletes, or renames that have been split into add/delete
1408
:param deletes: A sequence of deletes. Each delete is a tuple:
1409
(old_path_utf8, new_path_utf8, file_id, None, real_delete).
1410
real_delete is True when the desired outcome is an actual deletion
1411
rather than the rename handling logic temporarily deleting a path
1412
during the replacement of a parent.
1414
null = DirState.NULL_PARENT_DETAILS
1415
for old_path, new_path, file_id, _, real_delete in deletes:
1417
assert new_path is None
1419
assert new_path is not None
1420
# the entry for this file_id must be in tree 1.
1421
dirname, basename = osutils.split(old_path)
1422
block_index, entry_index, dir_present, file_present = \
1423
self._get_block_entry_index(dirname, basename, 1)
1424
if not file_present:
1425
self._changes_aborted = True
1426
raise errors.InconsistentDelta(old_path, file_id,
1427
'basis tree does not contain removed entry')
1428
entry = self._dirblocks[block_index][1][entry_index]
1429
if entry[0][2] != file_id:
1430
self._changes_aborted = True
1431
raise errors.InconsistentDelta(old_path, file_id,
1432
'mismatched file_id in tree 1')
1434
if entry[1][0][0] != 'a':
1435
self._changes_aborted = True
1436
raise errors.InconsistentDelta(old_path, file_id,
1437
'This was marked as a real delete, but the WT state'
1438
' claims that it still exists and is versioned.')
1439
del self._dirblocks[block_index][1][entry_index]
1441
if entry[1][0][0] == 'a':
1442
self._changes_aborted = True
1443
raise errors.InconsistentDelta(old_path, file_id,
1444
'The entry was considered a rename, but the source path'
1445
' is marked as absent.')
1446
# For whatever reason, we were asked to rename an entry
1447
# that was originally marked as deleted. This could be
1448
# because we are renaming the parent directory, and the WT
1449
# current state has the file marked as deleted.
1450
elif entry[1][0][0] == 'r':
1451
# implement the rename
1452
del self._dirblocks[block_index][1][entry_index]
1454
# it is being resurrected here, so blank it out temporarily.
1455
self._dirblocks[block_index][1][entry_index][1][1] = null
1457
def update_entry(self, entry, abspath, stat_value,
1458
_stat_to_minikind=_stat_to_minikind,
1459
_pack_stat=pack_stat):
1460
"""Update the entry based on what is actually on disk.
1462
:param entry: This is the dirblock entry for the file in question.
1463
:param abspath: The path on disk for this file.
1464
:param stat_value: (optional) if we already have done a stat on the
1466
:return: The sha1 hexdigest of the file (40 bytes) or link target of a
1470
minikind = _stat_to_minikind[stat_value.st_mode & 0170000]
1474
packed_stat = _pack_stat(stat_value)
1475
(saved_minikind, saved_link_or_sha1, saved_file_size,
1476
saved_executable, saved_packed_stat) = entry[1][0]
1478
if (minikind == saved_minikind
1479
and packed_stat == saved_packed_stat):
1480
# The stat hasn't changed since we saved, so we can re-use the
1485
# size should also be in packed_stat
1486
if saved_file_size == stat_value.st_size:
1487
return saved_link_or_sha1
1489
# If we have gotten this far, that means that we need to actually
1490
# process this entry.
1493
link_or_sha1 = self._sha1_file(abspath)
1494
executable = self._is_executable(stat_value.st_mode,
1496
if self._cutoff_time is None:
1497
self._sha_cutoff_time()
1498
if (stat_value.st_mtime < self._cutoff_time
1499
and stat_value.st_ctime < self._cutoff_time):
1500
entry[1][0] = ('f', link_or_sha1, stat_value.st_size,
1501
executable, packed_stat)
1503
entry[1][0] = ('f', '', stat_value.st_size,
1504
executable, DirState.NULLSTAT)
1505
elif minikind == 'd':
1507
entry[1][0] = ('d', '', 0, False, packed_stat)
1508
if saved_minikind != 'd':
1509
# This changed from something into a directory. Make sure we
1510
# have a directory block for it. This doesn't happen very
1511
# often, so this doesn't have to be super fast.
1512
block_index, entry_index, dir_present, file_present = \
1513
self._get_block_entry_index(entry[0][0], entry[0][1], 0)
1514
self._ensure_block(block_index, entry_index,
1515
osutils.pathjoin(entry[0][0], entry[0][1]))
1516
elif minikind == 'l':
1517
link_or_sha1 = self._read_link(abspath, saved_link_or_sha1)
1518
if self._cutoff_time is None:
1519
self._sha_cutoff_time()
1520
if (stat_value.st_mtime < self._cutoff_time
1521
and stat_value.st_ctime < self._cutoff_time):
1522
entry[1][0] = ('l', link_or_sha1, stat_value.st_size,
1525
entry[1][0] = ('l', '', stat_value.st_size,
1526
False, DirState.NULLSTAT)
1527
self._dirblock_state = DirState.IN_MEMORY_MODIFIED
1530
def _sha_cutoff_time(self):
1531
"""Return cutoff time.
1533
Files modified more recently than this time are at risk of being
1534
undetectably modified and so can't be cached.
1536
# Cache the cutoff time as long as we hold a lock.
1537
# time.time() isn't super expensive (approx 3.38us), but
1538
# when you call it 50,000 times it adds up.
1539
# For comparison, os.lstat() costs 7.2us if it is hot.
1540
self._cutoff_time = int(time.time()) - 3
1541
return self._cutoff_time
1543
def _lstat(self, abspath, entry):
1544
"""Return the os.lstat value for this path."""
1545
return os.lstat(abspath)
1547
def _sha1_file_and_mutter(self, abspath):
1548
# when -Dhashcache is turned on, this is monkey-patched in to log
1550
trace.mutter("dirstate sha1 " + abspath)
1551
return filters.sha_file_by_name(abspath)
1553
def _is_executable(self, mode, old_executable):
1554
"""Is this file executable?"""
1555
return bool(S_IEXEC & mode)
1557
def _is_executable_win32(self, mode, old_executable):
1558
"""On win32 the executable bit is stored in the dirstate."""
1559
return old_executable
1561
if sys.platform == 'win32':
1562
_is_executable = _is_executable_win32
1564
def _read_link(self, abspath, old_link):
1565
"""Read the target of a symlink"""
1566
# TODO: jam 200700301 On Win32, this could just return the value
1567
# already in memory. However, this really needs to be done at a
1568
# higher level, because there either won't be anything on disk,
1569
# or the thing on disk will be a file.
1570
return os.readlink(abspath)
1572
def get_ghosts(self):
1573
"""Return a list of the parent tree revision ids that are ghosts."""
1574
self._read_header_if_needed()
1577
def get_lines(self):
1578
"""Serialise the entire dirstate to a sequence of lines."""
1579
if (self._header_state == DirState.IN_MEMORY_UNMODIFIED and
1580
self._dirblock_state == DirState.IN_MEMORY_UNMODIFIED):
1581
# read whats on disk.
1582
self._state_file.seek(0)
1583
return self._state_file.readlines()
1585
lines.append(self._get_parents_line(self.get_parent_ids()))
1586
lines.append(self._get_ghosts_line(self._ghosts))
1587
# append the root line which is special cased
1588
lines.extend(map(self._entry_to_line, self._iter_entries()))
1589
return self._get_output_lines(lines)
1591
def _get_ghosts_line(self, ghost_ids):
1592
"""Create a line for the state file for ghost information."""
1593
return '\0'.join([str(len(ghost_ids))] + ghost_ids)
1595
def _get_parents_line(self, parent_ids):
1596
"""Create a line for the state file for parents information."""
1597
return '\0'.join([str(len(parent_ids))] + parent_ids)
1599
def _get_fields_to_entry(self):
1600
"""Get a function which converts entry fields into a entry record.
1602
This handles size and executable, as well as parent records.
1604
:return: A function which takes a list of fields, and returns an
1605
appropriate record for storing in memory.
1607
# This is intentionally unrolled for performance
1608
num_present_parents = self._num_present_parents()
1609
if num_present_parents == 0:
1610
def fields_to_entry_0_parents(fields, _int=int):
1611
path_name_file_id_key = (fields[0], fields[1], fields[2])
1612
return (path_name_file_id_key, [
1614
fields[3], # minikind
1615
fields[4], # fingerprint
1616
_int(fields[5]), # size
1617
fields[6] == 'y', # executable
1618
fields[7], # packed_stat or revision_id
1620
return fields_to_entry_0_parents
1621
elif num_present_parents == 1:
1622
def fields_to_entry_1_parent(fields, _int=int):
1623
path_name_file_id_key = (fields[0], fields[1], fields[2])
1624
return (path_name_file_id_key, [
1626
fields[3], # minikind
1627
fields[4], # fingerprint
1628
_int(fields[5]), # size
1629
fields[6] == 'y', # executable
1630
fields[7], # packed_stat or revision_id
1633
fields[8], # minikind
1634
fields[9], # fingerprint
1635
_int(fields[10]), # size
1636
fields[11] == 'y', # executable
1637
fields[12], # packed_stat or revision_id
1640
return fields_to_entry_1_parent
1641
elif num_present_parents == 2:
1642
def fields_to_entry_2_parents(fields, _int=int):
1643
path_name_file_id_key = (fields[0], fields[1], fields[2])
1644
return (path_name_file_id_key, [
1646
fields[3], # minikind
1647
fields[4], # fingerprint
1648
_int(fields[5]), # size
1649
fields[6] == 'y', # executable
1650
fields[7], # packed_stat or revision_id
1653
fields[8], # minikind
1654
fields[9], # fingerprint
1655
_int(fields[10]), # size
1656
fields[11] == 'y', # executable
1657
fields[12], # packed_stat or revision_id
1660
fields[13], # minikind
1661
fields[14], # fingerprint
1662
_int(fields[15]), # size
1663
fields[16] == 'y', # executable
1664
fields[17], # packed_stat or revision_id
1667
return fields_to_entry_2_parents
1669
def fields_to_entry_n_parents(fields, _int=int):
1670
path_name_file_id_key = (fields[0], fields[1], fields[2])
1671
trees = [(fields[cur], # minikind
1672
fields[cur+1], # fingerprint
1673
_int(fields[cur+2]), # size
1674
fields[cur+3] == 'y', # executable
1675
fields[cur+4], # stat or revision_id
1676
) for cur in xrange(3, len(fields)-1, 5)]
1677
return path_name_file_id_key, trees
1678
return fields_to_entry_n_parents
1680
def get_parent_ids(self):
1681
"""Return a list of the parent tree ids for the directory state."""
1682
self._read_header_if_needed()
1683
return list(self._parents)
1685
def _get_block_entry_index(self, dirname, basename, tree_index):
1686
"""Get the coordinates for a path in the state structure.
1688
:param dirname: The utf8 dirname to lookup.
1689
:param basename: The utf8 basename to lookup.
1690
:param tree_index: The index of the tree for which this lookup should
1692
:return: A tuple describing where the path is located, or should be
1693
inserted. The tuple contains four fields: the block index, the row
1694
index, the directory is present (boolean), the entire path is
1695
present (boolean). There is no guarantee that either
1696
coordinate is currently reachable unless the found field for it is
1697
True. For instance, a directory not present in the searched tree
1698
may be returned with a value one greater than the current highest
1699
block offset. The directory present field will always be True when
1700
the path present field is True. The directory present field does
1701
NOT indicate that the directory is present in the searched tree,
1702
rather it indicates that there are at least some files in some
1705
self._read_dirblocks_if_needed()
1706
key = dirname, basename, ''
1707
block_index, present = self._find_block_index_from_key(key)
1709
# no such directory - return the dir index and 0 for the row.
1710
return block_index, 0, False, False
1711
block = self._dirblocks[block_index][1] # access the entries only
1712
entry_index, present = self._find_entry_index(key, block)
1713
# linear search through entries at this path to find the one
1715
while entry_index < len(block) and block[entry_index][0][1] == basename:
1716
if block[entry_index][1][tree_index][0] not in 'ar':
1717
# neither absent or relocated
1718
return block_index, entry_index, True, True
1720
return block_index, entry_index, True, False
1722
def _get_entry(self, tree_index, fileid_utf8=None, path_utf8=None):
1723
"""Get the dirstate entry for path in tree tree_index.
1725
If either file_id or path is supplied, it is used as the key to lookup.
1726
If both are supplied, the fastest lookup is used, and an error is
1727
raised if they do not both point at the same row.
1729
:param tree_index: The index of the tree we wish to locate this path
1730
in. If the path is present in that tree, the entry containing its
1731
details is returned, otherwise (None, None) is returned
1732
0 is the working tree, higher indexes are successive parent
1734
:param fileid_utf8: A utf8 file_id to look up.
1735
:param path_utf8: An utf8 path to be looked up.
1736
:return: The dirstate entry tuple for path, or (None, None)
1738
self._read_dirblocks_if_needed()
1739
if path_utf8 is not None:
1740
assert path_utf8.__class__ == str, ('path_utf8 is not a str: %s %s'
1741
% (type(path_utf8), path_utf8))
1742
# path lookups are faster
1743
dirname, basename = osutils.split(path_utf8)
1744
block_index, entry_index, dir_present, file_present = \
1745
self._get_block_entry_index(dirname, basename, tree_index)
1746
if not file_present:
1748
entry = self._dirblocks[block_index][1][entry_index]
1749
assert entry[0][2] and entry[1][tree_index][0] not in ('a', 'r'), 'unversioned entry?!?!'
1751
if entry[0][2] != fileid_utf8:
1752
self._changes_aborted = True
1753
raise errors.BzrError('integrity error ? : mismatching'
1754
' tree_index, file_id and path')
1757
assert fileid_utf8 is not None
1758
possible_keys = self._get_id_index().get(fileid_utf8, None)
1759
if not possible_keys:
1761
for key in possible_keys:
1762
block_index, present = \
1763
self._find_block_index_from_key(key)
1764
# strange, probably indicates an out of date
1765
# id index - for now, allow this.
1768
# WARNING: DO not change this code to use _get_block_entry_index
1769
# as that function is not suitable: it does not use the key
1770
# to lookup, and thus the wrong coordinates are returned.
1771
block = self._dirblocks[block_index][1]
1772
entry_index, present = self._find_entry_index(key, block)
1774
entry = self._dirblocks[block_index][1][entry_index]
1775
if entry[1][tree_index][0] in 'fdlt':
1776
# this is the result we are looking for: the
1777
# real home of this file_id in this tree.
1779
if entry[1][tree_index][0] == 'a':
1780
# there is no home for this entry in this tree
1782
assert entry[1][tree_index][0] == 'r', \
1783
"entry %r has invalid minikind %r for tree %r" \
1785
entry[1][tree_index][0],
1787
real_path = entry[1][tree_index][1]
1788
return self._get_entry(tree_index, fileid_utf8=fileid_utf8,
1789
path_utf8=real_path)
1793
def initialize(cls, path):
1794
"""Create a new dirstate on path.
1796
The new dirstate will be an empty tree - that is it has no parents,
1797
and only a root node - which has id ROOT_ID.
1799
:param path: The name of the file for the dirstate.
1800
:return: A write-locked DirState object.
1802
# This constructs a new DirState object on a path, sets the _state_file
1803
# to a new empty file for that path. It then calls _set_data() with our
1804
# stock empty dirstate information - a root with ROOT_ID, no children,
1805
# and no parents. Finally it calls save() to ensure that this data will
1808
# root dir and root dir contents with no children.
1809
empty_tree_dirblocks = [('', []), ('', [])]
1810
# a new root directory, with a NULLSTAT.
1811
empty_tree_dirblocks[0][1].append(
1812
(('', '', inventory.ROOT_ID), [
1813
('d', '', 0, False, DirState.NULLSTAT),
1817
result._set_data([], empty_tree_dirblocks)
1824
def _inv_entry_to_details(self, inv_entry):
1825
"""Convert an inventory entry (from a revision tree) to state details.
1827
:param inv_entry: An inventory entry whose sha1 and link targets can be
1828
relied upon, and which has a revision set.
1829
:return: A details tuple - the details for a single tree at a path +
1832
kind = inv_entry.kind
1833
minikind = DirState._kind_to_minikind[kind]
1834
tree_data = inv_entry.revision
1835
assert tree_data, 'empty revision for the inv_entry %s.' % \
1837
if kind == 'directory':
1841
elif kind == 'symlink':
1842
fingerprint = inv_entry.symlink_target or ''
1845
elif kind == 'file':
1846
fingerprint = inv_entry.text_sha1 or ''
1847
size = inv_entry.text_size or 0
1848
executable = inv_entry.executable
1849
elif kind == 'tree-reference':
1850
fingerprint = inv_entry.reference_revision or ''
1854
raise Exception("can't pack %s" % inv_entry)
1855
return (minikind, fingerprint, size, executable, tree_data)
1857
def _iter_child_entries(self, tree_index, path_utf8):
1858
"""Iterate over all the entries that are children of path_utf.
1860
This only returns entries that are present (not in 'a', 'r') in
1861
tree_index. tree_index data is not refreshed, so if tree 0 is used,
1862
results may differ from that obtained if paths were statted to
1863
determine what ones were directories.
1865
Asking for the children of a non-directory will return an empty
1869
next_pending_dirs = [path_utf8]
1871
while next_pending_dirs:
1872
pending_dirs = next_pending_dirs
1873
next_pending_dirs = []
1874
for path in pending_dirs:
1875
block_index, present = self._find_block_index_from_key(
1877
if block_index == 0:
1879
if len(self._dirblocks) == 1:
1880
# asked for the children of the root with no other
1884
# children of a non-directory asked for.
1886
block = self._dirblocks[block_index]
1887
for entry in block[1]:
1888
kind = entry[1][tree_index][0]
1889
if kind not in absent:
1893
path = entry[0][0] + '/' + entry[0][1]
1896
next_pending_dirs.append(path)
1898
def _iter_entries(self):
1899
"""Iterate over all the entries in the dirstate.
1901
Each yelt item is an entry in the standard format described in the
1902
docstring of bzrlib.dirstate.
1904
self._read_dirblocks_if_needed()
1905
for directory in self._dirblocks:
1906
for entry in directory[1]:
1909
def _get_id_index(self):
1910
"""Get an id index of self._dirblocks."""
1911
if self._id_index is None:
1913
for key, tree_details in self._iter_entries():
1914
id_index.setdefault(key[2], set()).add(key)
1915
self._id_index = id_index
1916
return self._id_index
1918
def _get_output_lines(self, lines):
1919
"""Format lines for final output.
1921
:param lines: A sequence of lines containing the parents list and the
1924
output_lines = [DirState.HEADER_FORMAT_3]
1925
lines.append('') # a final newline
1926
inventory_text = '\0\n\0'.join(lines)
1927
output_lines.append('crc32: %s\n' % (zlib.crc32(inventory_text),))
1928
# -3, 1 for num parents, 1 for ghosts, 1 for final newline
1929
num_entries = len(lines)-3
1930
output_lines.append('num_entries: %s\n' % (num_entries,))
1931
output_lines.append(inventory_text)
1934
def _make_deleted_row(self, fileid_utf8, parents):
1935
"""Return a deleted row for fileid_utf8."""
1936
return ('/', 'RECYCLED.BIN', 'file', fileid_utf8, 0, DirState.NULLSTAT,
1939
def _num_present_parents(self):
1940
"""The number of parent entries in each record row."""
1941
return len(self._parents) - len(self._ghosts)
1945
"""Construct a DirState on the file at path path.
1947
:return: An unlocked DirState object, associated with the given path.
1949
result = DirState(path)
1952
def _read_dirblocks_if_needed(self):
1953
"""Read in all the dirblocks from the file if they are not in memory.
1955
This populates self._dirblocks, and sets self._dirblock_state to
1956
IN_MEMORY_UNMODIFIED. It is not currently ready for incremental block
1959
self._read_header_if_needed()
1960
if self._dirblock_state == DirState.NOT_IN_MEMORY:
1961
_read_dirblocks(self)
1963
def _read_header(self):
1964
"""This reads in the metadata header, and the parent ids.
1966
After reading in, the file should be positioned at the null
1967
just before the start of the first record in the file.
1969
:return: (expected crc checksum, number of entries, parent list)
1971
self._read_prelude()
1972
parent_line = self._state_file.readline()
1973
info = parent_line.split('\0')
1974
num_parents = int(info[0])
1975
assert num_parents == len(info)-2, 'incorrect parent info line'
1976
self._parents = info[1:-1]
1978
ghost_line = self._state_file.readline()
1979
info = ghost_line.split('\0')
1980
num_ghosts = int(info[1])
1981
assert num_ghosts == len(info)-3, 'incorrect ghost info line'
1982
self._ghosts = info[2:-1]
1983
self._header_state = DirState.IN_MEMORY_UNMODIFIED
1984
self._end_of_header = self._state_file.tell()
1986
def _read_header_if_needed(self):
1987
"""Read the header of the dirstate file if needed."""
1988
# inline this as it will be called a lot
1989
if not self._lock_token:
1990
raise errors.ObjectNotLocked(self)
1991
if self._header_state == DirState.NOT_IN_MEMORY:
1994
def _read_prelude(self):
1995
"""Read in the prelude header of the dirstate file.
1997
This only reads in the stuff that is not connected to the crc
1998
checksum. The position will be correct to read in the rest of
1999
the file and check the checksum after this point.
2000
The next entry in the file should be the number of parents,
2001
and their ids. Followed by a newline.
2003
header = self._state_file.readline()
2004
assert header == DirState.HEADER_FORMAT_3, \
2005
'invalid header line: %r' % (header,)
2006
crc_line = self._state_file.readline()
2007
assert crc_line.startswith('crc32: '), 'missing crc32 checksum'
2008
self.crc_expected = int(crc_line[len('crc32: '):-1])
2009
num_entries_line = self._state_file.readline()
2010
assert num_entries_line.startswith('num_entries: '), 'missing num_entries line'
2011
self._num_entries = int(num_entries_line[len('num_entries: '):-1])
2013
def sha1_from_stat(self, path, stat_result, _pack_stat=pack_stat):
2014
"""Find a sha1 given a stat lookup."""
2015
return self._get_packed_stat_index().get(_pack_stat(stat_result), None)
2017
def _get_packed_stat_index(self):
2018
"""Get a packed_stat index of self._dirblocks."""
2019
if self._packed_stat_index is None:
2021
for key, tree_details in self._iter_entries():
2022
if tree_details[0][0] == 'f':
2023
index[tree_details[0][4]] = tree_details[0][1]
2024
self._packed_stat_index = index
2025
return self._packed_stat_index
2028
"""Save any pending changes created during this session.
2030
We reuse the existing file, because that prevents race conditions with
2031
file creation, and use oslocks on it to prevent concurrent modification
2032
and reads - because dirstate's incremental data aggregation is not
2033
compatible with reading a modified file, and replacing a file in use by
2034
another process is impossible on Windows.
2036
A dirstate in read only mode should be smart enough though to validate
2037
that the file has not changed, and otherwise discard its cache and
2038
start over, to allow for fine grained read lock duration, so 'status'
2039
wont block 'commit' - for example.
2041
if self._changes_aborted:
2042
# Should this be a warning? For now, I'm expecting that places that
2043
# mark it inconsistent will warn, making a warning here redundant.
2044
trace.mutter('Not saving DirState because '
2045
'_changes_aborted is set.')
2047
if (self._header_state == DirState.IN_MEMORY_MODIFIED or
2048
self._dirblock_state == DirState.IN_MEMORY_MODIFIED):
2050
grabbed_write_lock = False
2051
if self._lock_state != 'w':
2052
grabbed_write_lock, new_lock = self._lock_token.temporary_write_lock()
2053
# Switch over to the new lock, as the old one may be closed.
2054
# TODO: jam 20070315 We should validate the disk file has
2055
# not changed contents. Since temporary_write_lock may
2056
# not be an atomic operation.
2057
self._lock_token = new_lock
2058
self._state_file = new_lock.f
2059
if not grabbed_write_lock:
2060
# We couldn't grab a write lock, so we switch back to a read one
2063
self._state_file.seek(0)
2064
self._state_file.writelines(self.get_lines())
2065
self._state_file.truncate()
2066
self._state_file.flush()
2067
self._header_state = DirState.IN_MEMORY_UNMODIFIED
2068
self._dirblock_state = DirState.IN_MEMORY_UNMODIFIED
2070
if grabbed_write_lock:
2071
self._lock_token = self._lock_token.restore_read_lock()
2072
self._state_file = self._lock_token.f
2073
# TODO: jam 20070315 We should validate the disk file has
2074
# not changed contents. Since restore_read_lock may
2075
# not be an atomic operation.
2077
def _set_data(self, parent_ids, dirblocks):
2078
"""Set the full dirstate data in memory.
2080
This is an internal function used to completely replace the objects
2081
in memory state. It puts the dirstate into state 'full-dirty'.
2083
:param parent_ids: A list of parent tree revision ids.
2084
:param dirblocks: A list containing one tuple for each directory in the
2085
tree. Each tuple contains the directory path and a list of entries
2086
found in that directory.
2088
# our memory copy is now authoritative.
2089
self._dirblocks = dirblocks
2090
self._header_state = DirState.IN_MEMORY_MODIFIED
2091
self._dirblock_state = DirState.IN_MEMORY_MODIFIED
2092
self._parents = list(parent_ids)
2093
self._id_index = None
2094
self._packed_stat_index = None
2096
def set_path_id(self, path, new_id):
2097
"""Change the id of path to new_id in the current working tree.
2099
:param path: The path inside the tree to set - '' is the root, 'foo'
2100
is the path foo in the root.
2101
:param new_id: The new id to assign to the path. This must be a utf8
2102
file id (not unicode, and not None).
2104
assert new_id.__class__ == str, \
2105
"path_id %r is not a plain string" % (new_id,)
2106
self._read_dirblocks_if_needed()
2108
# TODO: logic not written
2109
raise NotImplementedError(self.set_path_id)
2110
# TODO: check new id is unique
2111
entry = self._get_entry(0, path_utf8=path)
2112
if entry[0][2] == new_id:
2113
# Nothing to change.
2115
# mark the old path absent, and insert a new root path
2116
self._make_absent(entry)
2117
self.update_minimal(('', '', new_id), 'd',
2118
path_utf8='', packed_stat=entry[1][0][4])
2119
self._dirblock_state = DirState.IN_MEMORY_MODIFIED
2120
if self._id_index is not None:
2121
self._id_index.setdefault(new_id, set()).add(entry[0])
2123
def set_parent_trees(self, trees, ghosts):
2124
"""Set the parent trees for the dirstate.
2126
:param trees: A list of revision_id, tree tuples. tree must be provided
2127
even if the revision_id refers to a ghost: supply an empty tree in
2129
:param ghosts: A list of the revision_ids that are ghosts at the time
2132
# TODO: generate a list of parent indexes to preserve to save
2133
# processing specific parent trees. In the common case one tree will
2134
# be preserved - the left most parent.
2135
# TODO: if the parent tree is a dirstate, we might want to walk them
2136
# all by path in parallel for 'optimal' common-case performance.
2137
# generate new root row.
2138
self._read_dirblocks_if_needed()
2139
# TODO future sketch: Examine the existing parents to generate a change
2140
# map and then walk the new parent trees only, mapping them into the
2141
# dirstate. Walk the dirstate at the same time to remove unreferenced
2144
# sketch: loop over all entries in the dirstate, cherry picking
2145
# entries from the parent trees, if they are not ghost trees.
2146
# after we finish walking the dirstate, all entries not in the dirstate
2147
# are deletes, so we want to append them to the end as per the design
2148
# discussions. So do a set difference on ids with the parents to
2149
# get deletes, and add them to the end.
2150
# During the update process we need to answer the following questions:
2151
# - find other keys containing a fileid in order to create cross-path
2152
# links. We dont't trivially use the inventory from other trees
2153
# because this leads to either double touching, or to accessing
2155
# - find other keys containing a path
2156
# We accumulate each entry via this dictionary, including the root
2159
# we could do parallel iterators, but because file id data may be
2160
# scattered throughout, we dont save on index overhead: we have to look
2161
# at everything anyway. We can probably save cycles by reusing parent
2162
# data and doing an incremental update when adding an additional
2163
# parent, but for now the common cases are adding a new parent (merge),
2164
# and replacing completely (commit), and commit is more common: so
2165
# optimise merge later.
2167
# ---- start generation of full tree mapping data
2168
# what trees should we use?
2169
parent_trees = [tree for rev_id, tree in trees if rev_id not in ghosts]
2170
# how many trees do we end up with
2171
parent_count = len(parent_trees)
2173
# one: the current tree
2174
for entry in self._iter_entries():
2175
# skip entries not in the current tree
2176
if entry[1][0][0] in 'ar': # absent, relocated
2178
by_path[entry[0]] = [entry[1][0]] + \
2179
[DirState.NULL_PARENT_DETAILS] * parent_count
2180
id_index[entry[0][2]] = set([entry[0]])
2182
# now the parent trees:
2183
for tree_index, tree in enumerate(parent_trees):
2184
# the index is off by one, adjust it.
2185
tree_index = tree_index + 1
2186
# when we add new locations for a fileid we need these ranges for
2187
# any fileid in this tree as we set the by_path[id] to:
2188
# already_processed_tree_details + new_details + new_location_suffix
2189
# the suffix is from tree_index+1:parent_count+1.
2190
new_location_suffix = [DirState.NULL_PARENT_DETAILS] * (parent_count - tree_index)
2191
# now stitch in all the entries from this tree
2192
for path, entry in tree.inventory.iter_entries_by_dir():
2193
# here we process each trees details for each item in the tree.
2194
# we first update any existing entries for the id at other paths,
2195
# then we either create or update the entry for the id at the
2196
# right path, and finally we add (if needed) a mapping from
2197
# file_id to this path. We do it in this order to allow us to
2198
# avoid checking all known paths for the id when generating a
2199
# new entry at this path: by adding the id->path mapping last,
2200
# all the mappings are valid and have correct relocation
2201
# records where needed.
2202
file_id = entry.file_id
2203
path_utf8 = path.encode('utf8')
2204
dirname, basename = osutils.split(path_utf8)
2205
new_entry_key = (dirname, basename, file_id)
2206
# tree index consistency: All other paths for this id in this tree
2207
# index must point to the correct path.
2208
for entry_key in id_index.setdefault(file_id, set()):
2209
# TODO:PROFILING: It might be faster to just update
2210
# rather than checking if we need to, and then overwrite
2211
# the one we are located at.
2212
if entry_key != new_entry_key:
2213
# this file id is at a different path in one of the
2214
# other trees, so put absent pointers there
2215
# This is the vertical axis in the matrix, all pointing
2217
by_path[entry_key][tree_index] = ('r', path_utf8, 0, False, '')
2218
# by path consistency: Insert into an existing path record (trivial), or
2219
# add a new one with relocation pointers for the other tree indexes.
2220
if new_entry_key in id_index[file_id]:
2221
# there is already an entry where this data belongs, just insert it.
2222
by_path[new_entry_key][tree_index] = \
2223
self._inv_entry_to_details(entry)
2225
# add relocated entries to the horizontal axis - this row
2226
# mapping from path,id. We need to look up the correct path
2227
# for the indexes from 0 to tree_index -1
2229
for lookup_index in xrange(tree_index):
2230
# boundary case: this is the first occurence of file_id
2231
# so there are no id_indexs, possibly take this out of
2233
if not len(id_index[file_id]):
2234
new_details.append(DirState.NULL_PARENT_DETAILS)
2236
# grab any one entry, use it to find the right path.
2237
# TODO: optimise this to reduce memory use in highly
2238
# fragmented situations by reusing the relocation
2240
a_key = iter(id_index[file_id]).next()
2241
if by_path[a_key][lookup_index][0] in ('r', 'a'):
2242
# its a pointer or missing statement, use it as is.
2243
new_details.append(by_path[a_key][lookup_index])
2245
# we have the right key, make a pointer to it.
2246
real_path = ('/'.join(a_key[0:2])).strip('/')
2247
new_details.append(('r', real_path, 0, False, ''))
2248
new_details.append(self._inv_entry_to_details(entry))
2249
new_details.extend(new_location_suffix)
2250
by_path[new_entry_key] = new_details
2251
id_index[file_id].add(new_entry_key)
2252
# --- end generation of full tree mappings
2254
# sort and output all the entries
2255
new_entries = self._sort_entries(by_path.items())
2256
self._entries_to_current_state(new_entries)
2257
self._parents = [rev_id for rev_id, tree in trees]
2258
self._ghosts = list(ghosts)
2259
self._header_state = DirState.IN_MEMORY_MODIFIED
2260
self._dirblock_state = DirState.IN_MEMORY_MODIFIED
2261
self._id_index = id_index
2263
def _sort_entries(self, entry_list):
2264
"""Given a list of entries, sort them into the right order.
2266
This is done when constructing a new dirstate from trees - normally we
2267
try to keep everything in sorted blocks all the time, but sometimes
2268
it's easier to sort after the fact.
2271
# sort by: directory parts, file name, file id
2272
return entry[0][0].split('/'), entry[0][1], entry[0][2]
2273
return sorted(entry_list, key=_key)
2275
def set_state_from_inventory(self, new_inv):
2276
"""Set new_inv as the current state.
2278
This API is called by tree transform, and will usually occur with
2279
existing parent trees.
2281
:param new_inv: The inventory object to set current state from.
2283
if 'evil' in debug.debug_flags:
2284
trace.mutter_callsite(1,
2285
"set_state_from_inventory called; please mutate the tree instead")
2286
self._read_dirblocks_if_needed()
2288
# Two iterators: current data and new data, both in dirblock order.
2289
# We zip them together, which tells about entries that are new in the
2290
# inventory, or removed in the inventory, or present in both and
2293
# You might think we could just synthesize a new dirstate directly
2294
# since we're processing it in the right order. However, we need to
2295
# also consider there may be any number of parent trees and relocation
2296
# pointers, and we don't want to duplicate that here.
2297
new_iterator = new_inv.iter_entries_by_dir()
2298
# we will be modifying the dirstate, so we need a stable iterator. In
2299
# future we might write one, for now we just clone the state into a
2300
# list - which is a shallow copy.
2301
old_iterator = iter(list(self._iter_entries()))
2302
# both must have roots so this is safe:
2303
current_new = new_iterator.next()
2304
current_old = old_iterator.next()
2305
def advance(iterator):
2307
return iterator.next()
2308
except StopIteration:
2310
while current_new or current_old:
2311
# skip entries in old that are not really there
2312
if current_old and current_old[1][0][0] in 'ar':
2313
# relocated or absent
2314
current_old = advance(old_iterator)
2317
# convert new into dirblock style
2318
new_path_utf8 = current_new[0].encode('utf8')
2319
new_dirname, new_basename = osutils.split(new_path_utf8)
2320
new_id = current_new[1].file_id
2321
new_entry_key = (new_dirname, new_basename, new_id)
2322
current_new_minikind = \
2323
DirState._kind_to_minikind[current_new[1].kind]
2324
if current_new_minikind == 't':
2325
fingerprint = current_new[1].reference_revision or ''
2327
# We normally only insert or remove records, or update
2328
# them when it has significantly changed. Then we want to
2329
# erase its fingerprint. Unaffected records should
2330
# normally not be updated at all.
2333
# for safety disable variables
2334
new_path_utf8 = new_dirname = new_basename = new_id = \
2335
new_entry_key = None
2336
# 5 cases, we dont have a value that is strictly greater than everything, so
2337
# we make both end conditions explicit
2339
# old is finished: insert current_new into the state.
2340
self.update_minimal(new_entry_key, current_new_minikind,
2341
executable=current_new[1].executable,
2342
path_utf8=new_path_utf8, fingerprint=fingerprint)
2343
current_new = advance(new_iterator)
2344
elif not current_new:
2346
self._make_absent(current_old)
2347
current_old = advance(old_iterator)
2348
elif new_entry_key == current_old[0]:
2349
# same - common case
2350
# We're looking at the same path and id in both the dirstate
2351
# and inventory, so just need to update the fields in the
2352
# dirstate from the one in the inventory.
2353
# TODO: update the record if anything significant has changed.
2354
# the minimal required trigger is if the execute bit or cached
2356
if (current_old[1][0][3] != current_new[1].executable or
2357
current_old[1][0][0] != current_new_minikind):
2358
self.update_minimal(current_old[0], current_new_minikind,
2359
executable=current_new[1].executable,
2360
path_utf8=new_path_utf8, fingerprint=fingerprint)
2361
# both sides are dealt with, move on
2362
current_old = advance(old_iterator)
2363
current_new = advance(new_iterator)
2364
elif (cmp_by_dirs(new_dirname, current_old[0][0]) < 0
2365
or (new_dirname == current_old[0][0]
2366
and new_entry_key[1:] < current_old[0][1:])):
2368
# add a entry for this and advance new
2369
self.update_minimal(new_entry_key, current_new_minikind,
2370
executable=current_new[1].executable,
2371
path_utf8=new_path_utf8, fingerprint=fingerprint)
2372
current_new = advance(new_iterator)
2374
# we've advanced past the place where the old key would be,
2375
# without seeing it in the new list. so it must be gone.
2376
self._make_absent(current_old)
2377
current_old = advance(old_iterator)
2378
self._dirblock_state = DirState.IN_MEMORY_MODIFIED
2379
self._id_index = None
2380
self._packed_stat_index = None
2382
def _make_absent(self, current_old):
2383
"""Mark current_old - an entry - as absent for tree 0.
2385
:return: True if this was the last details entry for the entry key:
2386
that is, if the underlying block has had the entry removed, thus
2387
shrinking in length.
2389
# build up paths that this id will be left at after the change is made,
2390
# so we can update their cross references in tree 0
2391
all_remaining_keys = set()
2392
# Dont check the working tree, because it's going.
2393
for details in current_old[1][1:]:
2394
if details[0] not in 'ar': # absent, relocated
2395
all_remaining_keys.add(current_old[0])
2396
elif details[0] == 'r': # relocated
2397
# record the key for the real path.
2398
all_remaining_keys.add(tuple(osutils.split(details[1])) + (current_old[0][2],))
2399
# absent rows are not present at any path.
2400
last_reference = current_old[0] not in all_remaining_keys
2402
# the current row consists entire of the current item (being marked
2403
# absent), and relocated or absent entries for the other trees:
2404
# Remove it, its meaningless.
2405
block = self._find_block(current_old[0])
2406
entry_index, present = self._find_entry_index(current_old[0], block[1])
2407
assert present, 'could not find entry for %s' % (current_old,)
2408
block[1].pop(entry_index)
2409
# if we have an id_index in use, remove this key from it for this id.
2410
if self._id_index is not None:
2411
self._id_index[current_old[0][2]].remove(current_old[0])
2412
# update all remaining keys for this id to record it as absent. The
2413
# existing details may either be the record we are marking as deleted
2414
# (if there were other trees with the id present at this path), or may
2416
for update_key in all_remaining_keys:
2417
update_block_index, present = \
2418
self._find_block_index_from_key(update_key)
2419
assert present, 'could not find block for %s' % (update_key,)
2420
update_entry_index, present = \
2421
self._find_entry_index(update_key, self._dirblocks[update_block_index][1])
2422
assert present, 'could not find entry for %s' % (update_key,)
2423
update_tree_details = self._dirblocks[update_block_index][1][update_entry_index][1]
2424
# it must not be absent at the moment
2425
assert update_tree_details[0][0] != 'a' # absent
2426
update_tree_details[0] = DirState.NULL_PARENT_DETAILS
2427
self._dirblock_state = DirState.IN_MEMORY_MODIFIED
2428
return last_reference
2430
def update_minimal(self, key, minikind, executable=False, fingerprint='',
2431
packed_stat=None, size=0, path_utf8=None):
2432
"""Update an entry to the state in tree 0.
2434
This will either create a new entry at 'key' or update an existing one.
2435
It also makes sure that any other records which might mention this are
2438
:param key: (dir, name, file_id) for the new entry
2439
:param minikind: The type for the entry ('f' == 'file', 'd' ==
2441
:param executable: Should the executable bit be set?
2442
:param fingerprint: Simple fingerprint for new entry: sha1 for files,
2443
referenced revision id for subtrees, etc.
2444
:param packed_stat: Packed stat value for new entry.
2445
:param size: Size information for new entry
2446
:param path_utf8: key[0] + '/' + key[1], just passed in to avoid doing
2449
If packed_stat and fingerprint are not given, they're invalidated in
2452
block = self._find_block(key)[1]
2453
if packed_stat is None:
2454
packed_stat = DirState.NULLSTAT
2455
# XXX: Some callers pass '' as the packed_stat, and it seems to be
2456
# sometimes present in the dirstate - this seems oddly inconsistent.
2458
entry_index, present = self._find_entry_index(key, block)
2459
new_details = (minikind, fingerprint, size, executable, packed_stat)
2460
id_index = self._get_id_index()
2462
# new entry, synthesis cross reference here,
2463
existing_keys = id_index.setdefault(key[2], set())
2464
if not existing_keys:
2465
# not currently in the state, simplest case
2466
new_entry = key, [new_details] + self._empty_parent_info()
2468
# present at one or more existing other paths.
2469
# grab one of them and use it to generate parent
2470
# relocation/absent entries.
2471
new_entry = key, [new_details]
2472
for other_key in existing_keys:
2473
# change the record at other to be a pointer to this new
2474
# record. The loop looks similar to the change to
2475
# relocations when updating an existing record but its not:
2476
# the test for existing kinds is different: this can be
2477
# factored out to a helper though.
2478
other_block_index, present = self._find_block_index_from_key(other_key)
2479
assert present, 'could not find block for %s' % (other_key,)
2480
other_entry_index, present = self._find_entry_index(other_key,
2481
self._dirblocks[other_block_index][1])
2482
assert present, 'could not find entry for %s' % (other_key,)
2483
assert path_utf8 is not None
2484
self._dirblocks[other_block_index][1][other_entry_index][1][0] = \
2485
('r', path_utf8, 0, False, '')
2487
num_present_parents = self._num_present_parents()
2488
for lookup_index in xrange(1, num_present_parents + 1):
2489
# grab any one entry, use it to find the right path.
2490
# TODO: optimise this to reduce memory use in highly
2491
# fragmented situations by reusing the relocation
2493
update_block_index, present = \
2494
self._find_block_index_from_key(other_key)
2495
assert present, 'could not find block for %s' % (other_key,)
2496
update_entry_index, present = \
2497
self._find_entry_index(other_key, self._dirblocks[update_block_index][1])
2498
assert present, 'could not find entry for %s' % (other_key,)
2499
update_details = self._dirblocks[update_block_index][1][update_entry_index][1][lookup_index]
2500
if update_details[0] in 'ar': # relocated, absent
2501
# its a pointer or absent in lookup_index's tree, use
2503
new_entry[1].append(update_details)
2505
# we have the right key, make a pointer to it.
2506
pointer_path = osutils.pathjoin(*other_key[0:2])
2507
new_entry[1].append(('r', pointer_path, 0, False, ''))
2508
block.insert(entry_index, new_entry)
2509
existing_keys.add(key)
2511
# Does the new state matter?
2512
block[entry_index][1][0] = new_details
2513
# parents cannot be affected by what we do.
2514
# other occurences of this id can be found
2515
# from the id index.
2517
# tree index consistency: All other paths for this id in this tree
2518
# index must point to the correct path. We have to loop here because
2519
# we may have passed entries in the state with this file id already
2520
# that were absent - where parent entries are - and they need to be
2521
# converted to relocated.
2522
assert path_utf8 is not None
2523
for entry_key in id_index.setdefault(key[2], set()):
2524
# TODO:PROFILING: It might be faster to just update
2525
# rather than checking if we need to, and then overwrite
2526
# the one we are located at.
2527
if entry_key != key:
2528
# this file id is at a different path in one of the
2529
# other trees, so put absent pointers there
2530
# This is the vertical axis in the matrix, all pointing
2532
block_index, present = self._find_block_index_from_key(entry_key)
2534
entry_index, present = self._find_entry_index(entry_key, self._dirblocks[block_index][1])
2536
self._dirblocks[block_index][1][entry_index][1][0] = \
2537
('r', path_utf8, 0, False, '')
2538
# add a containing dirblock if needed.
2539
if new_details[0] == 'd':
2540
subdir_key = (osutils.pathjoin(*key[0:2]), '', '')
2541
block_index, present = self._find_block_index_from_key(subdir_key)
2543
self._dirblocks.insert(block_index, (subdir_key[0], []))
2545
self._dirblock_state = DirState.IN_MEMORY_MODIFIED
2547
def _validate(self):
2548
"""Check that invariants on the dirblock are correct.
2550
This can be useful in debugging; it shouldn't be necessary in
2553
This must be called with a lock held.
2555
# NOTE: This must always raise AssertionError not just assert,
2556
# otherwise it may not behave properly under python -O
2558
# TODO: All entries must have some content that's not 'a' or 'r',
2559
# otherwise it could just be removed.
2561
# TODO: All relocations must point directly to a real entry.
2563
# TODO: No repeated keys.
2566
from pprint import pformat
2567
self._read_dirblocks_if_needed()
2568
if len(self._dirblocks) > 0:
2569
if not self._dirblocks[0][0] == '':
2570
raise AssertionError(
2571
"dirblocks don't start with root block:\n" + \
2573
if len(self._dirblocks) > 1:
2574
if not self._dirblocks[1][0] == '':
2575
raise AssertionError(
2576
"dirblocks missing root directory:\n" + \
2578
# the dirblocks are sorted by their path components, name, and dir id
2579
dir_names = [d[0].split('/')
2580
for d in self._dirblocks[1:]]
2581
if dir_names != sorted(dir_names):
2582
raise AssertionError(
2583
"dir names are not in sorted order:\n" + \
2584
pformat(self._dirblocks) + \
2587
for dirblock in self._dirblocks:
2588
# within each dirblock, the entries are sorted by filename and
2590
for entry in dirblock[1]:
2591
if dirblock[0] != entry[0][0]:
2592
raise AssertionError(
2594
"doesn't match directory name in\n%r" %
2595
(entry, pformat(dirblock)))
2596
if dirblock[1] != sorted(dirblock[1]):
2597
raise AssertionError(
2598
"dirblock for %r is not sorted:\n%s" % \
2599
(dirblock[0], pformat(dirblock)))
2601
def check_valid_parent():
2602
"""Check that the current entry has a valid parent.
2604
This makes sure that the parent has a record,
2605
and that the parent isn't marked as "absent" in the
2606
current tree. (It is invalid to have a non-absent file in an absent
2609
if entry[0][0:2] == ('', ''):
2610
# There should be no parent for the root row
2612
parent_entry = self._get_entry(tree_index, path_utf8=entry[0][0])
2613
if parent_entry == (None, None):
2614
raise AssertionError(
2615
"no parent entry for: %s in tree %s"
2616
% (this_path, tree_index))
2617
if parent_entry[1][tree_index][0] != 'd':
2618
raise AssertionError(
2619
"Parent entry for %s is not marked as a valid"
2620
" directory. %s" % (this_path, parent_entry,))
2622
# For each file id, for each tree: either
2623
# the file id is not present at all; all rows with that id in the
2624
# key have it marked as 'absent'
2625
# OR the file id is present under exactly one name; any other entries
2626
# that mention that id point to the correct name.
2628
# We check this with a dict per tree pointing either to the present
2629
# name, or None if absent.
2630
tree_count = self._num_present_parents() + 1
2631
id_path_maps = [dict() for i in range(tree_count)]
2632
# Make sure that all renamed entries point to the correct location.
2633
for entry in self._iter_entries():
2634
file_id = entry[0][2]
2635
this_path = osutils.pathjoin(entry[0][0], entry[0][1])
2636
if len(entry[1]) != tree_count:
2637
raise AssertionError(
2638
"wrong number of entry details for row\n%s" \
2639
",\nexpected %d" % \
2640
(pformat(entry), tree_count))
2641
absent_positions = 0
2642
for tree_index, tree_state in enumerate(entry[1]):
2643
this_tree_map = id_path_maps[tree_index]
2644
minikind = tree_state[0]
2645
if minikind in 'ar':
2646
absent_positions += 1
2647
# have we seen this id before in this column?
2648
if file_id in this_tree_map:
2649
previous_path, previous_loc = this_tree_map[file_id]
2650
# any later mention of this file must be consistent with
2651
# what was said before
2653
if previous_path is not None:
2654
raise AssertionError(
2655
"file %s is absent in row %r but also present " \
2657
(file_id, entry, previous_path))
2658
elif minikind == 'r':
2659
target_location = tree_state[1]
2660
if previous_path != target_location:
2661
raise AssertionError(
2662
"file %s relocation in row %r but also at %r" \
2663
% (file_id, entry, previous_path))
2665
# a file, directory, etc - may have been previously
2666
# pointed to by a relocation, which must point here
2667
if previous_path != this_path:
2668
raise AssertionError(
2669
"entry %r inconsistent with previous path %r "
2671
(entry, previous_path, previous_loc))
2672
check_valid_parent()
2675
# absent; should not occur anywhere else
2676
this_tree_map[file_id] = None, this_path
2677
elif minikind == 'r':
2678
# relocation, must occur at expected location
2679
this_tree_map[file_id] = tree_state[1], this_path
2681
this_tree_map[file_id] = this_path, this_path
2682
check_valid_parent()
2683
if absent_positions == tree_count:
2684
raise AssertionError(
2685
"entry %r has no data for any tree." % (entry,))
2687
def _wipe_state(self):
2688
"""Forget all state information about the dirstate."""
2689
self._header_state = DirState.NOT_IN_MEMORY
2690
self._dirblock_state = DirState.NOT_IN_MEMORY
2691
self._changes_aborted = False
2694
self._dirblocks = []
2695
self._id_index = None
2696
self._packed_stat_index = None
2697
self._end_of_header = None
2698
self._cutoff_time = None
2699
self._split_path_cache = {}
2701
def lock_read(self):
2702
"""Acquire a read lock on the dirstate."""
2703
if self._lock_token is not None:
2704
raise errors.LockContention(self._lock_token)
2705
# TODO: jam 20070301 Rather than wiping completely, if the blocks are
2706
# already in memory, we could read just the header and check for
2707
# any modification. If not modified, we can just leave things
2709
self._lock_token = lock.ReadLock(self._filename)
2710
self._lock_state = 'r'
2711
self._state_file = self._lock_token.f
2714
def lock_write(self):
2715
"""Acquire a write lock on the dirstate."""
2716
if self._lock_token is not None:
2717
raise errors.LockContention(self._lock_token)
2718
# TODO: jam 20070301 Rather than wiping completely, if the blocks are
2719
# already in memory, we could read just the header and check for
2720
# any modification. If not modified, we can just leave things
2722
self._lock_token = lock.WriteLock(self._filename)
2723
self._lock_state = 'w'
2724
self._state_file = self._lock_token.f
2728
"""Drop any locks held on the dirstate."""
2729
if self._lock_token is None:
2730
raise errors.LockNotHeld(self)
2731
# TODO: jam 20070301 Rather than wiping completely, if the blocks are
2732
# already in memory, we could read just the header and check for
2733
# any modification. If not modified, we can just leave things
2735
self._state_file = None
2736
self._lock_state = None
2737
self._lock_token.unlock()
2738
self._lock_token = None
2739
self._split_path_cache = {}
2741
def _requires_lock(self):
2742
"""Check that a lock is currently held by someone on the dirstate."""
2743
if not self._lock_token:
2744
raise errors.ObjectNotLocked(self)
2747
# Try to load the compiled form if possible
2749
from bzrlib._dirstate_helpers_c import (
2750
_read_dirblocks_c as _read_dirblocks,
2751
bisect_dirblock_c as bisect_dirblock,
2752
_bisect_path_left_c as _bisect_path_left,
2753
_bisect_path_right_c as _bisect_path_right,
2754
cmp_by_dirs_c as cmp_by_dirs,
2757
from bzrlib._dirstate_helpers_py import (
2758
_read_dirblocks_py as _read_dirblocks,
2759
bisect_dirblock_py as bisect_dirblock,
2760
_bisect_path_left_py as _bisect_path_left,
2761
_bisect_path_right_py as _bisect_path_right,
2762
cmp_by_dirs_py as cmp_by_dirs,