# Copyright (C) 2006, 2007, 2008 Canonical Ltd
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

"""DirState objects record the state of a directory and its bzr metadata.

Pseudo EBNF grammar for the state file. Fields are separated by NULLs, and
lines by NL. The field delimiters are omitted in the grammar, line delimiters
are not - this is done for clarity of reading. All string data is in utf8.
MINIKIND = "f" | "d" | "l" | "a" | "r" | "t";
WHOLE_NUMBER = {digit}, digit;
REVISION_ID = a non-empty utf8 string;

dirstate format = header line, full checksum, row count, parent details,
 ghost_details, entries;
header line = "#bazaar dirstate flat format 3", NL;
full checksum = "crc32: ", ["-"], WHOLE_NUMBER, NL;
row count = "num_entries: ", WHOLE_NUMBER, NL;
parent_details = WHOLE_NUMBER, {REVISION_ID}, NL;
ghost_details = WHOLE_NUMBER, {REVISION_ID}, NL;

entry = entry_key, current_entry_details, {parent_entry_details};
entry_key = dirname, basename, fileid;
current_entry_details = common_entry_details, working_entry_details;
parent_entry_details = common_entry_details, history_entry_details;
common_entry_details = MINIKIND, fingerprint, size, executable;
working_entry_details = packed_stat;
history_entry_details = REVISION_ID;

fingerprint = a nonempty utf8 sequence with meaning defined by minikind.
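
As an illustration (schematic, not a byte-exact dump), a serialized entry
for a file in a tree with one parent carries the fields:

    dirname, basename, fileid,
    minikind, fingerprint, size, executable, packed_stat,
    minikind, fingerprint, size, executable, REVISION_ID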

Given this definition, the following is useful to know:

entry (aka row) - all the data for a given key.
entry[0]: The key (dirname, basename, fileid)
entry[0][0]: dirname
entry[0][1]: basename
entry[0][2]: fileid
entry[1]: The tree(s) data for this path and id combination.
entry[1][0]: The current tree
entry[1][1]: The second tree

For an entry for a tree, we have (using tree 0 - current tree) to demonstrate:
entry[1][0][0]: minikind
entry[1][0][1]: fingerprint
entry[1][0][2]: size
entry[1][0][3]: executable
entry[1][0][4]: packed_stat
OR (for a parent tree):
entry[1][1][4]: revision_id
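
For example (illustrative values), an entry for file 'foo/bar' with id
'bar-id' in a dirstate with one parent tree could look like:

    (('foo', 'bar', 'bar-id'),
     [('f', sha1, 42, False, packed_stat),   # tree 0, the working tree
      ('f', sha1, 42, False, revision_id)])  # tree 1, the basis tree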

There may be multiple rows at the root, one per id present in the root, so the
in memory root row is now:
self._dirblocks[0] -> ('', [entry ...]),
and the entries in there are
entries[0][0]: ''
entries[0][1]: ''
entries[0][2]: file_id
entries[1][0]: The tree data for the current tree for this fileid at /

'r' is a relocated entry: This path is not present in this tree with this id,
 but the id can be found at another location. The fingerprint is used to
 point to the target location.
'a' is an absent entry: In that tree the id is not present at this path.
'd' is a directory entry: This path in this tree is a directory with the
 current file id. There is no fingerprint for directories.
'f' is a file entry: As for directory, but it's a file. The fingerprint is the
 sha1 of the file's content.
'l' is a symlink entry: As for directory, but a symlink. The fingerprint is the
 link target.
't' is a reference to a nested subtree; the fingerprint is the referenced
 revision id.
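
For example (illustrative), if 'foo/bar' is renamed to 'baz' in the working
tree, the row still keyed at ('foo', 'bar') gets the tree-0 details
('r', 'baz', 0, False, '') - a relocation whose fingerprint is the path that
now holds the id.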

The entries on disk and in memory are ordered according to the following keys:

    directory, as a list of components
    filename
    file-id

--- Format 1 had the following different definition: ---
rows = dirname, NULL, basename, NULL, MINIKIND, NULL, fileid_utf8, NULL,
    WHOLE NUMBER (* size *), NULL, packed stat, NULL, sha1|symlink target,
    {PARENT ROW}
PARENT ROW = NULL, revision_utf8, NULL, MINIKIND, NULL, dirname, NULL,
    basename, NULL, WHOLE NUMBER (* size *), NULL, "y" | "n", NULL,
    SHA1

PARENT ROWs are emitted for every parent that is not in the ghosts details
line. That is, if the parents are foo, bar, baz, and the ghosts are bar, then
each row will have a PARENT ROW for foo and baz, but not for bar.

In any tree, a kind of 'moved' indicates that the fingerprint field
(which we treat as opaque data specific to the 'kind' anyway) has the
details for the id of this row in that tree.

I'm strongly tempted to add an id->path index as well, but I think that
where we need an id->path mapping, we also usually read the whole file, so
I'm going to skip that for the moment, as we have the ability to locate
via bisect any path in any tree, and if we lookup things by path, we can
accumulate an id->path mapping as we go, which will tend to match what we
need.

I plan to implement this asap, so please speak up now to alter/tweak the
design - and once we stabilise on this, I'll update the wiki page for it.

The rationale for all this is that we want fast operations for the
common case (diff/status/commit/merge on all files) and extremely fast
operations for the less common but still frequent case (status/diff/commit
on specific files). Operations on specific files involve a scan for all
the children of a path, *in every involved tree*, which the current
format did not accommodate.

1) Fast end to end use for bzr's top 5 use cases. (commit/diff/status/merge/???)
2) fall back to the current object model as needed.
3) scale usably to the largest trees known today - say 50K entries. (mozilla
   is an example of this)

Eventually reuse dirstate objects across locks IFF the dirstate file has not
been modified, but will require that we flush/ignore cached stat-hit data
because we won't want to restat all files on disk just because a lock was
acquired, yet we cannot trust the data after the previous lock was released.

Memory representation:
 vector of all directories, and vector of the children in each directory?
   root_entry = (direntry for root, [parent_direntries_for_root]),
   dirblocks = [
     ('', ['data for achild', 'data for bchild', 'data for cchild'])
     ('dir', ['achild', 'cchild', 'echild'])
   ]
 - single bisect to find N subtrees from a path spec
 - in-order for serialisation - this is 'dirblock' grouping.
 - insertion of a file '/a' affects only the '/' child-vector, that is, to
   insert 10K elements from scratch does not generate O(N^2) memmoves of a
   single vector, rather each individual directory's vector, which tends to
   be limited to a manageable number. Will scale badly on trees with 10K
   entries in a single directory. compare with Inventory.InventoryDirectory
   which has a dictionary for the children. No bisect capability, can only
   probe for exact matches, or grab all elements and sort.
 - What's the risk of error here? Once we have the base format being processed
   we should have a net win regardless of optimality. So we are going to
   go with what seems reasonable.

Maybe we should do a test profile of the core structure - 10K simulated
searches/lookups/etc?

Objects for each row?
The lifetime of Dirstate objects is currently per lock, but see above for
possible extensions. The lifetime of a row from a dirstate is expected to be
very short in the optimistic case, which is what we are optimising for. For
instance, subtree status will determine from analysis of the disk data what
rows need to be examined at all, and will be able to determine from a single
row whether that file has altered or not, so we are aiming to process tens of
thousands of entries each second within the dirstate context, before exposing
anything to the larger codebase. This suggests we want the time for a single
file comparison to be < 0.1 milliseconds. That would give us 10000 paths per
second processed, and to scale to 100 thousand we'll need another order of
magnitude to do that. Now, as the lifetime for all unchanged entries is the
time to parse, stat the file on disk, and then immediately discard, the
overhead of object creation becomes a significant cost.

Figures: Creating a tuple from 3 elements was profiled at 0.0625
microseconds, whereas creating an object which is subclassed from tuple was
0.500 microseconds, and creating an object with 3 elements and slots was 3
microseconds. 0.1 milliseconds is 100 microseconds, and ideally we'll get
down to 10 microseconds for the total processing - having 33% of that be object
creation is a huge overhead. There is a potential cost in using tuples within
each row which is that the conditional code to do comparisons may be slower
than method invocation, but method invocation is known to be slow due to stack
frame creation, so avoiding methods in these tight inner loops is unfortunately
desirable. We can consider a pyrex version of this with objects in future if
needed.
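
A rough sketch of how figures like those above can be measured (the absolute
numbers vary by machine and Python build):

    import timeit
    plain = timeit.Timer("(1, 2, 3)").timeit()
    subclassed = timeit.Timer("T((1, 2, 3))",
                              "T = type('T', (tuple,), {})").timeit()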
"""

from stat import S_IEXEC

# This is the Windows equivalent of ENOTDIR
# It is defined in pywin32.winerror, but we don't want a strong dependency for
# just an error code.
ERROR_PATH_NOT_FOUND = 3
ERROR_DIRECTORY = 267

if not getattr(struct, '_compile', None):
    # Cannot pre-compile the dirstate pack_stat
    def pack_stat(st, _encode=binascii.b2a_base64, _pack=struct.pack):
        """Convert stat values into a packed representation."""
        return _encode(_pack('>LLLLLL', st.st_size, int(st.st_mtime),
            int(st.st_ctime), st.st_dev, st.st_ino & 0xFFFFFFFF,
            st.st_mode))[:-1]
else:
    # compile the struct compiler we need, so as to only do it once
    from _struct import Struct
    _compiled_pack = Struct('>LLLLLL').pack
    def pack_stat(st, _encode=binascii.b2a_base64, _pack=_compiled_pack):
        """Convert stat values into a packed representation."""
        # jam 20060614 it isn't really worth removing more entries if we
        # are going to leave it in packed form.
        # With only st_mtime and st_mode filesize is 5.5M and read time is 275ms
        # With all entries, filesize is 5.9M and read time is maybe 280ms
        # well within the noise margin
        # base64 encoding always adds a final newline, so strip it off
        # The current version
        return _encode(_pack(st.st_size, int(st.st_mtime), int(st.st_ctime),
            st.st_dev, st.st_ino & 0xFFFFFFFF, st.st_mode))[:-1]
        # This is 0.060s / 1.520s faster by not encoding as much information
        # return _encode(_pack('>LL', int(st.st_mtime), st.st_mode))[:-1]
        # This is not strictly faster than _encode(_pack())[:-1]
        # return '%X.%X.%X.%X.%X.%X' % (
        #      st.st_size, int(st.st_mtime), int(st.st_ctime),
        #      st.st_dev, st.st_ino, st.st_mode)
        # Similar to the _encode(_pack('>LL'))
        # return '%X.%X' % (int(st.st_mtime), st.st_mode)
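
# An illustrative note, not part of the API: six 4-byte big-endian fields make
# 24 bytes, which base64-encodes to 32 characters once the trailing newline
# from b2a_base64 is stripped, so e.g.
#
#   key = pack_stat(os.lstat('some-file'))
#
# always yields a 32-character string. Identical stat results pack to
# identical strings, so a packed stat can be compared bytewise against the
# value cached in a dirstate entry.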


class DirState(object):
    """Record directory and metadata state for fast access.

    A dirstate is a specialised data structure for managing local working
    tree state information. It's not yet well defined whether it is platform
    specific, and if it is how we detect/parameterize that.

    Dirstates use the usual lock_write, lock_read and unlock mechanisms.
    Unlike most bzr disk formats, DirStates must be locked for reading, using
    lock_read. (This is an os file lock internally.) This is necessary
    because the file can be rewritten in place.

    DirStates must be explicitly written with save() to commit changes; just
    unlocking them does not write the changes to disk.
    """
    _kind_to_minikind = {
            'absent': 'a',
            'file': 'f',
            'directory': 'd',
            'relocated': 'r',
            'symlink': 'l',
            'tree-reference': 't',
        }
    _minikind_to_kind = {
            'a': 'absent',
            'f': 'file',
            'd': 'directory',
            'l': 'symlink',
            'r': 'relocated',
            't': 'tree-reference',
        }
    _stat_to_minikind = {
        stat.S_IFDIR:'d',
        stat.S_IFREG:'f',
        stat.S_IFLNK:'l',
    }
    _to_yesno = {True:'y', False: 'n'} # TODO profile the performance gain
     # of using int conversion rather than a dict here. AND BLAME ANDREW IF

    # TODO: jam 20070221 Figure out what to do if we have a record that exceeds
    #       the BISECT_PAGE_SIZE. For now, we just have to make it large enough
    #       that we are sure a single record will always fit.
    BISECT_PAGE_SIZE = 4096

    NOT_IN_MEMORY = 0
    IN_MEMORY_UNMODIFIED = 1
    IN_MEMORY_MODIFIED = 2

    # A pack_stat (the x's) that is just noise and will never match the output
    # of base64 encode of a real stat value.
    NULLSTAT = 'x' * 32
    NULL_PARENT_DETAILS = ('a', '', 0, False, '')

    HEADER_FORMAT_2 = '#bazaar dirstate flat format 2\n'
    HEADER_FORMAT_3 = '#bazaar dirstate flat format 3\n'

    def __init__(self, path, content_filter_stack_provider=None):
        """Create a DirState object.

        :param path: The path at which the dirstate file on disk should live.
        :param content_filter_stack_provider: a function that takes a
            path (relative to the top of the tree) and a file-id as
            parameters and returns a stack of ContentFilters.
            If None, no content filtering is performed.
        """
        # _header_state and _dirblock_state represent the current state
        # of the dirstate metadata and the per-row data respectively.
        # NOT_IN_MEMORY indicates that no data is in memory
        # IN_MEMORY_UNMODIFIED indicates that what we have in memory
        #   is the same as is on disk
        # IN_MEMORY_MODIFIED indicates that we have a modified version
        #   of what is on disk.
        # In future we will add more granularity, for instance _dirblock_state
        # will probably support partially-in-memory as a separate variable,
        # allowing for partially-in-memory unmodified and partially-in-memory
        # modified states.
        self._header_state = DirState.NOT_IN_MEMORY
        self._dirblock_state = DirState.NOT_IN_MEMORY
        # If true, an error has been detected while updating the dirstate, and
        # for safety we're not going to commit to disk.
        self._changes_aborted = False
        self._dirblocks = []
        self._ghosts = []
        self._parents = []
        self._state_file = None
        self._filename = path
        self._lock_token = None
        self._lock_state = None
        self._id_index = None
        # a map from packed_stat to sha's.
        self._packed_stat_index = None
        self._end_of_header = None
        self._cutoff_time = None
        self._split_path_cache = {}
        self._bisect_page_size = DirState.BISECT_PAGE_SIZE
        if 'hashcache' in debug.debug_flags:
            self._size_sha1_file = self._size_sha1_file_and_mutter
        else:
            self._size_sha1_file = filters.internal_size_sha_file_byname
        # These two attributes provide a simple cache for lookups into the
        # dirstate in-memory vectors. By probing respectively for the last
        # block, and for the next entry, we save nearly 2 bisections per path
        # during commit.
        self._last_block_index = None
        self._last_entry_index = None
        # Content filtering setup
        self._filter_provider = content_filter_stack_provider

    def __repr__(self):
        return "%s(%r)" % \
            (self.__class__.__name__, self._filename)

    def add(self, path, file_id, kind, stat, fingerprint):
        """Add a path to be tracked.

        :param path: The path within the dirstate - '' is the root, 'foo' is the
            path foo within the root, 'foo/bar' is the path bar within foo
            within the root.
        :param file_id: The file id of the path being added.
        :param kind: The kind of the path, as a string like 'file',
            'directory', etc.
        :param stat: The output of os.lstat for the path.
        :param fingerprint: The sha value of the file,
            or the target of a symlink,
            or the referenced revision id for tree-references,
            or '' for directories.
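
        For example (an illustrative sketch; 'bar-sha1' stands in for a real
        sha1 string):

            state.add('foo/bar', 'bar-id', 'file',
                      os.lstat('foo/bar'), 'bar-sha1')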
        """
        # find the block its in.
        # find the location in the block.
        # check its not there
        # add it.
        #------- copied from inventory.ensure_normalized_name - keep synced.
        # --- normalized_filename wants a unicode basename only, so get one.
        dirname, basename = osutils.split(path)
        # we dont import normalized_filename directly because we want to be
        # able to change the implementation at runtime for tests.
        norm_name, can_access = osutils.normalized_filename(basename)
        if norm_name != basename:
            if can_access:
                basename = norm_name
            else:
                raise errors.InvalidNormalization(path)
        # you should never have files called . or ..; just add the directory
        # in the parent, or according to the special treatment for the root
        if basename == '.' or basename == '..':
            raise errors.InvalidEntryName(path)
        # now that we've normalised, we need the correct utf8 path and
        # dirname and basename elements. This single encode and split should be
        # faster than three separate encodes.
        utf8path = (dirname + '/' + basename).strip('/').encode('utf8')
        dirname, basename = osutils.split(utf8path)
        # uses __class__ for speed; the check is needed for safety
        if file_id.__class__ is not str:
            raise AssertionError(
                "must be a utf8 file_id not %s" % (type(file_id), ))
        # Make sure the file_id does not exist in this tree
        rename_from = None
        file_id_entry = self._get_entry(0, fileid_utf8=file_id, include_deleted=True)
        if file_id_entry != (None, None):
            if file_id_entry[1][0][0] == 'a':
                if file_id_entry[0] != (dirname, basename, file_id):
                    # set the old name's current operation to rename
                    self.update_minimal(file_id_entry[0],
                        'r',
                        path_utf8='',
                        packed_stat='',
                        fingerprint=utf8path
                    )
                    rename_from = file_id_entry[0][0:2]
            else:
                path = osutils.pathjoin(file_id_entry[0][0], file_id_entry[0][1])
                kind = DirState._minikind_to_kind[file_id_entry[1][0][0]]
                info = '%s:%s' % (kind, path)
                raise errors.DuplicateFileId(file_id, info)
        first_key = (dirname, basename, '')
        block_index, present = self._find_block_index_from_key(first_key)
        if present:
            # check the path is not in the tree
            block = self._dirblocks[block_index][1]
            entry_index, _ = self._find_entry_index(first_key, block)
            while (entry_index < len(block) and
                block[entry_index][0][0:2] == first_key[0:2]):
                if block[entry_index][1][0][0] not in 'ar':
                    # this path is in the dirstate in the current tree.
                    raise Exception("adding already added path!")
                entry_index += 1
        else:
            # The block where we want to put the file is not present. But it
            # might be because the directory was empty, or not loaded yet. Look
            # for a parent entry, if not found, raise NotVersionedError
            parent_dir, parent_base = osutils.split(dirname)
            parent_block_idx, parent_entry_idx, _, parent_present = \
                self._get_block_entry_index(parent_dir, parent_base, 0)
            if not parent_present:
                raise errors.NotVersionedError(path, str(self))
            self._ensure_block(parent_block_idx, parent_entry_idx, dirname)
        block = self._dirblocks[block_index][1]
        entry_key = (dirname, basename, file_id)
        if stat is None:
            size = 0
            packed_stat = DirState.NULLSTAT
        else:
            size = stat.st_size
            packed_stat = pack_stat(stat)
        parent_info = self._empty_parent_info()
        minikind = DirState._kind_to_minikind[kind]
        if rename_from is not None:
            if rename_from[0]:
                old_path_utf8 = '%s/%s' % rename_from
            else:
                old_path_utf8 = rename_from[1]
            parent_info[0] = ('r', old_path_utf8, 0, False, '')
        if kind == 'file':
            entry_data = entry_key, [
                (minikind, fingerprint, size, False, packed_stat),
                ] + parent_info
        elif kind == 'directory':
            entry_data = entry_key, [
                (minikind, '', 0, False, packed_stat),
                ] + parent_info
        elif kind == 'symlink':
            entry_data = entry_key, [
                (minikind, fingerprint, size, False, packed_stat),
                ] + parent_info
        elif kind == 'tree-reference':
            entry_data = entry_key, [
                (minikind, fingerprint, 0, False, packed_stat),
                ] + parent_info
        else:
            raise errors.BzrError('unknown kind %r' % kind)
        entry_index, present = self._find_entry_index(entry_key, block)
        if not present:
            block.insert(entry_index, entry_data)
        else:
            if block[entry_index][1][0][0] != 'a':
                raise AssertionError(" %r(%r) already added" % (basename, file_id))
            block[entry_index][1][0] = entry_data[1][0]

        if kind == 'directory':
           # insert a new dirblock
           self._ensure_block(block_index, entry_index, utf8path)
        self._dirblock_state = DirState.IN_MEMORY_MODIFIED
        if self._id_index:
            self._id_index.setdefault(entry_key[2], set()).add(entry_key)

    def _bisect(self, paths):
        """Bisect through the disk structure for specific rows.

        :param paths: A list of paths to find
        :return: A dict mapping path => entries for found entries. Missing
            entries will not be in the map.
            The list is not sorted, and entries will be populated
            based on when they were read.
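
        For example (illustrative), self._bisect(['a/b', 'f']) returns a
        dict shaped like {'a/b': [entry], 'f': [entry]}; paths with no
        record on disk are simply absent from the result.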
        """
        self._requires_lock()
        # We need the file pointer to be right after the initial header block
        self._read_header_if_needed()
        # If _dirblock_state was in memory, we should just return info from
        # there, this function is only meant to handle when we want to read
        # from disk.
        if self._dirblock_state != DirState.NOT_IN_MEMORY:
            raise AssertionError("bad dirblock state %r" % self._dirblock_state)

        # The disk representation is generally info + '\0\n\0' at the end. But
        # for bisecting, it is easier to treat this as '\0' + info + '\0\n'
        # Because it means we can sync on the '\n'
        state_file = self._state_file
        file_size = os.fstat(state_file.fileno()).st_size
        # We end up with 2 extra fields, we should have a trailing '\n' to
        # ensure that we read the whole record, and we should have a precursor
        # '' which ensures that we start after the previous '\n'
        entry_field_count = self._fields_per_entry() + 1

        low = self._end_of_header
        high = file_size - 1 # Ignore the final '\0'
        # Map from (dir, name) => entry
        found = {}

        # Avoid infinite seeking
        max_count = 30*len(paths)
        count = 0
        # pending is a list of places to look.
        # each entry is a tuple of low, high, dir_names
        #   low -> the first byte offset to read (inclusive)
        #   high -> the last byte offset (inclusive)
        #   dir_names -> The list of (dir, name) pairs that should be found in
        #                the [low, high] range
        pending = [(low, high, paths)]

        page_size = self._bisect_page_size

        fields_to_entry = self._get_fields_to_entry()

        while pending:
            low, high, cur_files = pending.pop()

            if not cur_files or low >= high:
                # Nothing to find
                continue

            count += 1
            if count > max_count:
                raise errors.BzrError('Too many seeks, most likely a bug.')

            mid = max(low, (low+high-page_size)/2)

            state_file.seek(mid)
            # limit the read size, so we don't end up reading data that we have
            # already read.
            read_size = min(page_size, (high-mid)+1)
            block = state_file.read(read_size)

            start = mid
            entries = block.split('\n')

            if len(entries) < 2:
                # We didn't find a '\n', so we cannot have found any records.
                # So put this range back and try again. But we know we have to
                # increase the page size, because a single read did not contain
                # a record break (so records must be larger than page_size)
                page_size *= 2
                pending.append((low, high, cur_files))
                continue

            # Check the first and last entries, in case they are partial, or if
            # we don't care about the rest of this page
            first_entry_num = 0
            first_fields = entries[0].split('\0')
            if len(first_fields) < entry_field_count:
                # We didn't get the complete first entry
                # so move start, and grab the next, which
                # should be a full entry
                start += len(entries[0])+1
                first_fields = entries[1].split('\0')
                first_entry_num = 1

            if len(first_fields) <= 2:
                # We didn't even get a filename here... what do we do?
                # Try a large page size and repeat this query
                page_size *= 2
                pending.append((low, high, cur_files))
                continue

            # Find what entries we are looking for, which occur before and
            # after this first record.
            after = start
            if first_fields[1]:
                first_path = first_fields[1] + '/' + first_fields[2]
            else:
                first_path = first_fields[2]
            first_loc = _bisect_path_left(cur_files, first_path)

            # These exist before the current location
            pre = cur_files[:first_loc]
            # These occur after the current location, which may be in the
            # data we read, or might be after the last entry
            post = cur_files[first_loc:]

            if post and len(first_fields) >= entry_field_count:
                # We have files after the first entry

                # Parse the last entry
                last_entry_num = len(entries)-1
                last_fields = entries[last_entry_num].split('\0')
                if len(last_fields) < entry_field_count:
                    # The very last hunk was not complete,
                    # read the previous hunk
                    after = mid + len(block) - len(entries[-1])
                    last_entry_num -= 1
                    last_fields = entries[last_entry_num].split('\0')
                else:
                    after = mid + len(block)

                if last_fields[1]:
                    last_path = last_fields[1] + '/' + last_fields[2]
                else:
                    last_path = last_fields[2]
                last_loc = _bisect_path_right(post, last_path)

                middle_files = post[:last_loc]
                post = post[last_loc:]

                if middle_files:
                    # We have files that should occur in this block
                    # (>= first, <= last)
                    # Either we will find them here, or we can mark them as
                    # missing.

                    if middle_files[0] == first_path:
                        # We might need to go before this location
                        pre.append(first_path)
                    if middle_files[-1] == last_path:
                        post.insert(0, last_path)

                    # Find out what paths we have
                    paths = {first_path:[first_fields]}
                    # last_path might == first_path so we need to be
                    # careful if we should append rather than overwrite
                    if last_entry_num != first_entry_num:
                        paths.setdefault(last_path, []).append(last_fields)
                    for num in xrange(first_entry_num+1, last_entry_num):
                        # TODO: jam 20070223 We are already splitting here, so
                        #       shouldn't we just split the whole thing rather
                        #       than doing the split again in add_one_record?
                        fields = entries[num].split('\0')
                        if fields[1]:
                            path = fields[1] + '/' + fields[2]
                        else:
                            path = fields[2]
                        paths.setdefault(path, []).append(fields)

                    for path in middle_files:
                        for fields in paths.get(path, []):
                            # offset by 1 because of the opening '\0'
                            # consider changing fields_to_entry to avoid the
                            # extra list slice
                            entry = fields_to_entry(fields[1:])
                            found.setdefault(path, []).append(entry)

            # Now we have split up everything into pre, middle, and post, and
            # we have handled everything that fell in 'middle'.
            # We add 'post' first, so that we prefer to seek towards the
            # beginning, so that we will tend to go as early as we need, and
            # then only seek forward after that.
            if post:
                pending.append((after, high, post))
            if pre:
                pending.append((low, start-1, pre))

        # Consider that we may want to return the directory entries in sorted
        # order. For now, we just return them in whatever order we found them,
        # and leave it up to the caller if they care if it is ordered or not.
        return found

    def _bisect_dirblocks(self, dir_list):
        """Bisect through the disk structure to find entries in given dirs.

        _bisect_dirblocks is meant to find the contents of directories, which
        differs from _bisect, which only finds individual entries.

        :param dir_list: A sorted list of directory names ['', 'dir', 'foo'].
        :return: A map from dir => entries_for_dir
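
        For example (illustrative), self._bisect_dirblocks(['', 'dir'])
        returns a map shaped like {'': [entries in the root],
        'dir': [entries directly inside 'dir']}.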
        """
        # TODO: jam 20070223 A lot of the bisecting logic could be shared
        #       between this and _bisect. It would require parameterizing the
        #       inner loop with a function, though. We should evaluate the
        #       performance difference.
        self._requires_lock()
        # We need the file pointer to be right after the initial header block
        self._read_header_if_needed()
        # If _dirblock_state was in memory, we should just return info from
        # there, this function is only meant to handle when we want to read
        # from disk.
        if self._dirblock_state != DirState.NOT_IN_MEMORY:
            raise AssertionError("bad dirblock state %r" % self._dirblock_state)
        # The disk representation is generally info + '\0\n\0' at the end. But
        # for bisecting, it is easier to treat this as '\0' + info + '\0\n'
        # Because it means we can sync on the '\n'
        state_file = self._state_file
        file_size = os.fstat(state_file.fileno()).st_size
        # We end up with 2 extra fields, we should have a trailing '\n' to
        # ensure that we read the whole record, and we should have a precursor
        # '' which ensures that we start after the previous '\n'
        entry_field_count = self._fields_per_entry() + 1

        low = self._end_of_header
        high = file_size - 1 # Ignore the final '\0'
        # Map from dir => entry
        found = {}

        # Avoid infinite seeking
        max_count = 30*len(dir_list)
        count = 0
        # pending is a list of places to look.
        # each entry is a tuple of low, high, dir_names
        #   low -> the first byte offset to read (inclusive)
        #   high -> the last byte offset (inclusive)
        #   dirs -> The list of directories that should be found in
        #           the [low, high] range
        pending = [(low, high, dir_list)]

        page_size = self._bisect_page_size

        fields_to_entry = self._get_fields_to_entry()

        while pending:
            low, high, cur_dirs = pending.pop()

            if not cur_dirs or low >= high:
                # Nothing to find
                continue

            count += 1
            if count > max_count:
                raise errors.BzrError('Too many seeks, most likely a bug.')

            mid = max(low, (low+high-page_size)/2)

            state_file.seek(mid)
            # limit the read size, so we don't end up reading data that we have
            # already read.
            read_size = min(page_size, (high-mid)+1)
            block = state_file.read(read_size)

            start = mid
            entries = block.split('\n')

            if len(entries) < 2:
                # We didn't find a '\n', so we cannot have found any records.
                # So put this range back and try again. But we know we have to
                # increase the page size, because a single read did not contain
                # a record break (so records must be larger than page_size)
                page_size *= 2
                pending.append((low, high, cur_dirs))
                continue

            # Check the first and last entries, in case they are partial, or if
            # we don't care about the rest of this page
            first_entry_num = 0
            first_fields = entries[0].split('\0')
            if len(first_fields) < entry_field_count:
                # We didn't get the complete first entry
                # so move start, and grab the next, which
                # should be a full entry
                start += len(entries[0])+1
                first_fields = entries[1].split('\0')
                first_entry_num = 1

            if len(first_fields) <= 1:
                # We didn't even get a dirname here... what do we do?
                # Try a large page size and repeat this query
                page_size *= 2
                pending.append((low, high, cur_dirs))
                continue

            # Find what entries we are looking for, which occur before and
            # after this first record.
            after = start
            first_dir = first_fields[1]
            first_loc = bisect.bisect_left(cur_dirs, first_dir)

            # These exist before the current location
            pre = cur_dirs[:first_loc]
            # These occur after the current location, which may be in the
            # data we read, or might be after the last entry
            post = cur_dirs[first_loc:]

            if post and len(first_fields) >= entry_field_count:
                # We have records to look at after the first entry

                # Parse the last entry
                last_entry_num = len(entries)-1
                last_fields = entries[last_entry_num].split('\0')
                if len(last_fields) < entry_field_count:
                    # The very last hunk was not complete,
                    # read the previous hunk
                    after = mid + len(block) - len(entries[-1])
                    last_entry_num -= 1
                    last_fields = entries[last_entry_num].split('\0')
                else:
                    after = mid + len(block)

                last_dir = last_fields[1]
                last_loc = bisect.bisect_right(post, last_dir)

                middle_files = post[:last_loc]
                post = post[last_loc:]

                if middle_files:
                    # We have files that should occur in this block
                    # (>= first, <= last)
                    # Either we will find them here, or we can mark them as
                    # missing.

                    if middle_files[0] == first_dir:
                        # We might need to go before this location
                        pre.append(first_dir)
                    if middle_files[-1] == last_dir:
                        post.insert(0, last_dir)

                    # Find out what paths we have
                    paths = {first_dir:[first_fields]}
                    # last_dir might == first_dir so we need to be
                    # careful if we should append rather than overwrite
                    if last_entry_num != first_entry_num:
                        paths.setdefault(last_dir, []).append(last_fields)
                    for num in xrange(first_entry_num+1, last_entry_num):
                        # TODO: jam 20070223 We are already splitting here, so
                        #       shouldn't we just split the whole thing rather
                        #       than doing the split again in add_one_record?
                        fields = entries[num].split('\0')
                        paths.setdefault(fields[1], []).append(fields)

                    for cur_dir in middle_files:
                        for fields in paths.get(cur_dir, []):
                            # offset by 1 because of the opening '\0'
                            # consider changing fields_to_entry to avoid the
                            # extra list slice
                            entry = fields_to_entry(fields[1:])
                            found.setdefault(cur_dir, []).append(entry)

            # Now we have split up everything into pre, middle, and post, and
            # we have handled everything that fell in 'middle'.
            # We add 'post' first, so that we prefer to seek towards the
            # beginning, so that we will tend to go as early as we need, and
            # then only seek forward after that.
            if post:
                pending.append((after, high, post))
            if pre:
                pending.append((low, start-1, pre))

        return found

    def _bisect_recursive(self, paths):
        """Bisect for entries for all paths and their children.

        This will use bisect to find all records for the supplied paths. It
        will then continue to bisect for any records which are marked as
        directories. (and renames?)

        :param paths: A sorted list of (dir, name) pairs
            eg: [('', 'a'), ('', 'f'), ('a/b', 'c')]
        :return: A dictionary mapping (dir, name, file_id) => [tree_info]
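
        For example (illustrative), bisecting for [('', 'a')] where 'a' is a
        directory also pulls in the entries found inside 'a', keyed as
        ('a', name, file_id) in the returned dictionary.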
        """
        # Map from (dir, name, file_id) => [tree_info]
        found = {}

        found_dir_names = set()

        # Directories that have been read
        processed_dirs = set()
        # Get the ball rolling with the first bisect for all entries.
        newly_found = self._bisect(paths)

        while newly_found:
            # Directories that need to be read
            pending_dirs = set()
            paths_to_search = set()
            for entry_list in newly_found.itervalues():
                for dir_name_id, trees_info in entry_list:
                    found[dir_name_id] = trees_info
                    found_dir_names.add(dir_name_id[:2])
                    is_dir = False
                    for tree_info in trees_info:
                        minikind = tree_info[0]
                        if minikind == 'd':
                            if is_dir:
                                # We already processed this one as a directory,
                                # we don't need to do the extra work again.
                                continue
                            subdir, name, file_id = dir_name_id
                            path = osutils.pathjoin(subdir, name)
                            is_dir = True
                            if path not in processed_dirs:
                                pending_dirs.add(path)
                        elif minikind == 'r':
                            # Rename, we need to directly search the target
                            # which is contained in the fingerprint column
                            dir_name = osutils.split(tree_info[1])
                            if dir_name[0] in pending_dirs:
                                # This entry will be found in the dir search
                                continue
                            if dir_name not in found_dir_names:
                                paths_to_search.add(tree_info[1])
            # Now we have a list of paths to look for directly, and
            # directory blocks that need to be read.
            # newly_found is mixing the keys between (dir, name) and path
            # entries, but that is okay, because we only really care about the
            # api of _bisect_recursive anyway.
            newly_found = self._bisect(sorted(paths_to_search))
            newly_found.update(self._bisect_dirblocks(sorted(pending_dirs)))
            processed_dirs.update(pending_dirs)
        return found

    def _discard_merge_parents(self):
        """Discard any parent trees beyond the first.

        Note that if this fails the dirstate is corrupted.

        After this function returns the dirstate contains 2 trees, neither of
        which are ghosted.
        """
        self._read_header_if_needed()
        parents = self.get_parent_ids()
        if len(parents) < 1:
            return
        # only require all dirblocks if we are doing a full-pass removal.
        self._read_dirblocks_if_needed()
        dead_patterns = set([('a', 'r'), ('a', 'a'), ('r', 'r'), ('r', 'a')])
        def iter_entries_removable():
            for block in self._dirblocks:
                deleted_positions = []
                for pos, entry in enumerate(block[1]):
                    yield entry
                    if (entry[1][0][0], entry[1][1][0]) in dead_patterns:
                        deleted_positions.append(pos)
                if deleted_positions:
                    if len(deleted_positions) == len(block[1]):
                        del block[1][:]
                    else:
                        for pos in reversed(deleted_positions):
                            del block[1][pos]
        # if the first parent is a ghost:
        if parents[0] in self.get_ghosts():
            empty_parent = [DirState.NULL_PARENT_DETAILS]
            for entry in iter_entries_removable():
                entry[1][1:] = empty_parent
        else:
            for entry in iter_entries_removable():
                del entry[1][2:]

        self._ghosts = []
        self._parents = [parents[0]]
        self._dirblock_state = DirState.IN_MEMORY_MODIFIED
        self._header_state = DirState.IN_MEMORY_MODIFIED

    def _empty_parent_info(self):
        return [DirState.NULL_PARENT_DETAILS] * (len(self._parents) -
                                                 len(self._ghosts))

    def _ensure_block(self, parent_block_index, parent_row_index, dirname):
        """Ensure a block for dirname exists.

        This function exists to let callers which know that there is a
        directory dirname ensure that the block for it exists. This block can
        fail to exist because of demand loading, or because a directory had no
        children. In either case it is not an error. It is however an error to
        call this if there is no parent entry for the directory, and thus the
        function requires the coordinates of such an entry to be provided.

        The root row is special cased and can be indicated with a parent block
        and row index of -1.

        :param parent_block_index: The index of the block in which dirname's row
            exists.
        :param parent_row_index: The index in the parent block where the row
            exists.
        :param dirname: The utf8 dirname to ensure there is a block for.
        :return: The index for the block.
        """
        if dirname == '' and parent_row_index == 0 and parent_block_index == 0:
            # This is the signature of the root row, and the
            # contents-of-root row is always index 1
            return 1
        # the basename of the directory must be the end of its full name.
        if not (parent_block_index == -1 and
            parent_row_index == -1 and dirname == ''):
            if not dirname.endswith(
                    self._dirblocks[parent_block_index][1][parent_row_index][0][1]):
                raise AssertionError("bad dirname %r" % dirname)
        block_index, present = self._find_block_index_from_key((dirname, '', ''))
        if not present:
            ## In future, when doing partial parsing, this should load and
            # populate the entire block.
            self._dirblocks.insert(block_index, (dirname, []))
        return block_index

    def _entries_to_current_state(self, new_entries):
        """Load new_entries into self.dirblocks.

        Process new_entries into the current state object, making them the active
        state.  The entries are grouped together by directory to form dirblocks.

        :param new_entries: A sorted list of entries. This function does not sort
            to prevent unneeded overhead when callers have a sorted list already.
        :return: Nothing.
        """
        if new_entries[0][0][0:2] != ('', ''):
            raise AssertionError(
                "Missing root row %r" % (new_entries[0][0],))
        # The two blocks here are deliberate: the root block and the
        # contents-of-root block.
        self._dirblocks = [('', []), ('', [])]
        current_block = self._dirblocks[0][1]
        current_dirname = ''
        append_entry = current_block.append
        for entry in new_entries:
            if entry[0][0] != current_dirname:
                # new block - different dirname
                current_block = []
                current_dirname = entry[0][0]
                self._dirblocks.append((current_dirname, current_block))
                append_entry = current_block.append
            # append the entry to the current block
            append_entry(entry)
1057
def _split_root_dirblock_into_contents(self):
1058
"""Split the root dirblocks into root and contents-of-root.
1060
After parsing by path, we end up with root entries and contents-of-root
1061
entries in the same block. This loop splits them out again.
1063
# The above loop leaves the "root block" entries mixed with the
1064
# "contents-of-root block". But we don't want an if check on
1065
# all entries, so instead we just fix it up here.
1066
if self._dirblocks[1] != ('', []):
1067
raise ValueError("bad dirblock start %r" % (self._dirblocks[1],))
1069
contents_of_root_block = []
1070
for entry in self._dirblocks[0][1]:
1071
if not entry[0][1]: # This is a root entry
1072
root_block.append(entry)
1074
contents_of_root_block.append(entry)
1075
self._dirblocks[0] = ('', root_block)
1076
self._dirblocks[1] = ('', contents_of_root_block)

    def _entries_for_path(self, path):
        """Return a list with all the entries that match path for all ids."""
        dirname, basename = os.path.split(path)
        key = (dirname, basename, '')
        block_index, present = self._find_block_index_from_key(key)
        if not present:
            # the block which should contain path is absent.
            return []
        result = []
        block = self._dirblocks[block_index][1]
        entry_index, _ = self._find_entry_index(key, block)
        # we may need to look at multiple entries at this path: walk while the
        # specific_files match.
        while (entry_index < len(block) and
            block[entry_index][0][0:2] == key[0:2]):
            result.append(block[entry_index])
            entry_index += 1
        return result

    def _entry_to_line(self, entry):
        """Serialize entry to a NULL delimited line ready for _get_output_lines.

        :param entry: An entry_tuple as defined in the module docstring.
        """
        entire_entry = list(entry[0])
        for tree_number, tree_data in enumerate(entry[1]):
            # (minikind, fingerprint, size, executable, tree_specific_string)
            entire_entry.extend(tree_data)
            # 3 for the key, 5 for the fields per tree.
            tree_offset = 3 + tree_number * 5
            # minikind
            entire_entry[tree_offset + 0] = tree_data[0]
            # size
            entire_entry[tree_offset + 2] = str(tree_data[2])
            # executable
            entire_entry[tree_offset + 3] = DirState._to_yesno[tree_data[3]]
        return '\0'.join(entire_entry)

    def _fields_per_entry(self):
        """How many null separated fields should be in each entry row.

        Each line now has an extra '\n' field which is not used
        so we just skip over it
        entry size:
            3 fields for the key
            + number of fields per tree_data (5) * tree count
            + newline
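
        For example, with one parent tree present, tree_count below is 2 and
        each row carries 3 + 5 * 2 + 1 = 14 fields.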
        """
        tree_count = 1 + self._num_present_parents()
        return 3 + 5 * tree_count + 1

    def _find_block(self, key, add_if_missing=False):
        """Return the block that key should be present in.

        :param key: A dirstate entry key.
        :return: The block tuple.
        """
        block_index, present = self._find_block_index_from_key(key)
        if not present:
            if not add_if_missing:
                # check to see if key is versioned itself - we might want to
                # add it anyway, because dirs with no entries dont get a
                # dirblock at parse time.
                # This is an uncommon branch to take: most dirs have children,
                # and most code works with versioned paths.
                parent_base, parent_name = osutils.split(key[0])
                if not self._get_block_entry_index(parent_base, parent_name, 0)[3]:
                    # some parent path has not been added - its an error to add
                    # this child
                    raise errors.NotVersionedError(key[0:2], str(self))
            self._dirblocks.insert(block_index, (key[0], []))
        return self._dirblocks[block_index]

    def _find_block_index_from_key(self, key):
        """Find the dirblock index for a key.

        :return: The block index, True if the block for the key is present.
        """
        if key[0:2] == ('', ''):
            return 0, True
        try:
            if (self._last_block_index is not None and
                self._dirblocks[self._last_block_index][0] == key[0]):
                return self._last_block_index, True
        except IndexError:
            pass
        block_index = bisect_dirblock(self._dirblocks, key[0], 1,
                                      cache=self._split_path_cache)
        # _right returns one-past-where-key is so we have to subtract
        # one to use it. we use _right here because there are two
        # '' blocks - the root, and the contents of root
        # we always have a minimum of 2 in self._dirblocks: root and
        # root-contents, and for '', we get 2 back, so this is
        # simple and correct:
        present = (block_index < len(self._dirblocks) and
            self._dirblocks[block_index][0] == key[0])
        self._last_block_index = block_index
        # Reset the entry index cache to the beginning of the block.
        self._last_entry_index = -1
        return block_index, present

    def _find_entry_index(self, key, block):
        """Find the entry index for a key in a block.

        :return: The entry index, True if the entry for the key is present.
        """
        len_block = len(block)
        try:
            if self._last_entry_index is not None:
                entry_index = self._last_entry_index + 1
                # A hit is when the key is after the last slot, and before or
                # equal to the next slot.
                if ((entry_index > 0 and block[entry_index - 1][0] < key) and
                    key <= block[entry_index][0]):
                    self._last_entry_index = entry_index
                    present = (block[entry_index][0] == key)
                    return entry_index, present
        except IndexError:
            pass
        entry_index = bisect.bisect_left(block, (key, []))
        present = (entry_index < len_block and
            block[entry_index][0] == key)
        self._last_entry_index = entry_index
        return entry_index, present

    def from_tree(tree, dir_state_filename):
        """Create a dirstate from a bzr Tree.

        :param tree: The tree which should provide parent information and
            inventory ids.
        :return: a DirState object which is currently locked for writing.
            (it was locked by DirState.initialize)
        """
        result = DirState.initialize(dir_state_filename)
        try:
            tree.lock_read()
            try:
                parent_ids = tree.get_parent_ids()
                num_parents = len(parent_ids)
                parent_trees = []
                for parent_id in parent_ids:
                    parent_tree = tree.branch.repository.revision_tree(parent_id)
                    parent_trees.append((parent_id, parent_tree))
                    parent_tree.lock_read()
                result.set_parent_trees(parent_trees, [])
                result.set_state_from_inventory(tree.inventory)
            finally:
                for revid, parent_tree in parent_trees:
                    parent_tree.unlock()
        except:
            # The caller won't have a chance to unlock this, so make sure we
            # cleanup ourselves
            result.unlock()
            raise
        return result
    from_tree = staticmethod(from_tree)

    def update_by_delta(self, delta):
        """Apply an inventory delta to the dirstate for tree 0

        :param delta: An inventory delta. See Inventory.apply_delta for
            details.
        """
        self._read_dirblocks_if_needed()
        insertions = {}
        removals = {}
        for old_path, new_path, file_id, inv_entry in sorted(delta, reverse=True):
            if (file_id in insertions) or (file_id in removals):
                raise AssertionError("repeated file id in delta %r" % (file_id,))
            if old_path is not None:
                old_path = old_path.encode('utf-8')
                removals[file_id] = old_path
            if new_path is not None:
                new_path = new_path.encode('utf-8')
                dirname, basename = osutils.split(new_path)
                key = (dirname, basename, file_id)
                minikind = DirState._kind_to_minikind[inv_entry.kind]
                if minikind == 't':
                    fingerprint = inv_entry.reference_revision
                else:
                    fingerprint = ''
                insertions[file_id] = (key, minikind, inv_entry.executable,
                                       fingerprint, new_path)
            # Transform moves into delete+add pairs
            if None not in (old_path, new_path):
                for child in self._iter_child_entries(0, old_path):
                    if child[0][2] in insertions or child[0][2] in removals:
                        continue
                    child_dirname = child[0][0]
                    child_basename = child[0][1]
                    minikind = child[1][0][0]
                    fingerprint = child[1][0][4]
                    executable = child[1][0][3]
                    old_child_path = osutils.pathjoin(child[0][0],
                                                      child[0][1])
                    removals[child[0][2]] = old_child_path
                    child_suffix = child_dirname[len(old_path):]
                    new_child_dirname = (new_path + child_suffix)
                    key = (new_child_dirname, child_basename, child[0][2])
                    new_child_path = os.path.join(new_child_dirname,
                                                  child_basename)
                    insertions[child[0][2]] = (key, minikind, executable,
                                               fingerprint, new_child_path)
        self._apply_removals(removals.values())
        self._apply_insertions(insertions.values())

    def _apply_removals(self, removals):
        for path in sorted(removals, reverse=True):
            dirname, basename = osutils.split(path)
            block_i, entry_i, d_present, f_present = \
                self._get_block_entry_index(dirname, basename, 0)
            entry = self._dirblocks[block_i][1][entry_i]
            self._make_absent(entry)
            # See if we have a malformed delta: deleting a directory must not
            # leave crud behind. This increases the number of bisects needed
            # substantially, but deletion or renames of large numbers of paths
            # is rare enough it shouldn't be an issue (famous last words?) RBC
            # 20080730.
            block_i, entry_i, d_present, f_present = \
                self._get_block_entry_index(path, '', 0)
            if d_present:
                # The dir block is still present in the dirstate; this could
                # be due to it being in a parent tree, or a corrupt delta.
                for child_entry in self._dirblocks[block_i][1]:
                    if child_entry[1][0][0] not in ('r', 'a'):
                        raise errors.InconsistentDelta(path, entry[0][2],
                            "The file id was deleted but its children were "
                            "not deleted.")

    def _apply_insertions(self, adds):
        for key, minikind, executable, fingerprint, path_utf8 in sorted(adds):
            self.update_minimal(key, minikind, executable, fingerprint,
                                path_utf8=path_utf8)

    def update_basis_by_delta(self, delta, new_revid):
        """Update the parents of this tree after a commit.

        This gives the tree one parent, with revision id new_revid. The
        inventory delta is applied to the current basis tree to generate the
        inventory for the parent new_revid, and all other parent trees are
        discarded.

        Note that an exception during the operation of this method will leave
        the dirstate in a corrupt state where it should not be saved.

        Finally, we expect all changes to be synchronising the basis tree with
        the working tree.

        :param new_revid: The new revision id for the tree's parent.
        :param delta: An inventory delta (see apply_inventory_delta) describing
            the changes from the current left most parent revision to new_revid.
        """
        self._read_dirblocks_if_needed()
        self._discard_merge_parents()
        if self._ghosts != []:
            raise NotImplementedError(self.update_basis_by_delta)
        if len(self._parents) == 0:
            # setup a blank tree, the most simple way.
            empty_parent = DirState.NULL_PARENT_DETAILS
            for entry in self._iter_entries():
                entry[1].append(empty_parent)
            self._parents.append(new_revid)
        else:
            self._parents[0] = new_revid

        delta = sorted(delta, reverse=True)
        adds = []
        changes = []
        deletes = []
        # The paths this function accepts are unicode and must be encoded as we
        # go.
        encode = cache_utf8.encode
        inv_to_entry = self._inv_entry_to_details
        # delta is now (deletes, changes), (adds) in reverse lexicographical
        # order.
        # deletes in reverse lexicographic order are safe to process in situ.
        # renames are not, as a rename from any path could go to a path
        # lexicographically lower, so we transform renames into delete, add
        # pairs, expanding them recursively as needed.
        # At the same time, to reduce interface friction we convert the input
        # inventory entries to dirstate.
        root_only = ('', '')
        for old_path, new_path, file_id, inv_entry in delta:
            if old_path is None:
                adds.append((None, encode(new_path), file_id,
                    inv_to_entry(inv_entry), True))
            elif new_path is None:
                deletes.append((encode(old_path), None, file_id, None, True))
            elif (old_path, new_path) != root_only:
                # This is a rename.
                # Because renames must preserve their children we must have
                # processed all relocations and removes before hand. The sort
                # order ensures we've examined the child paths, but we also
                # have to execute the removals, or the split to an add/delete
                # pair will result in the deleted item being reinserted, or
                # renamed items being reinserted twice - and possibly at the
                # wrong place. Splitting into a delete/add pair also simplifies
                # the handling of entries with ('f', ...), ('r' ...) because
                # the target of the 'r' is old_path here, and we add that to
                # deletes, meaning that the add handler does not need to check
                # for 'r' items on every pass.
                self._update_basis_apply_deletes(deletes)
                deletes = []
                new_path_utf8 = encode(new_path)
                # Split into an add/delete pair recursively.
                adds.append((None, new_path_utf8, file_id,
                    inv_to_entry(inv_entry), False))
                # Expunge deletes that we've seen so that deleted/renamed
                # children of a rename directory are handled correctly.
                new_deletes = reversed(list(self._iter_child_entries(1,
                    encode(old_path))))
                # Remove the current contents of the tree at orig_path, and
                # reinsert at the correct new path.
                for entry in new_deletes:
                    if entry[0][0]:
                        source_path = entry[0][0] + '/' + entry[0][1]
                    else:
                        source_path = entry[0][1]
                    if new_path_utf8:
                        target_path = new_path_utf8 + source_path[len(old_path):]
                    else:
                        if old_path == '':
                            raise AssertionError("cannot rename directory to"
                                " itself")
                        target_path = source_path[len(old_path) + 1:]
                    adds.append((None, target_path, entry[0][2], entry[1][1], False))
                    deletes.append(
                        (source_path, target_path, entry[0][2], None, False))
                deletes.append(
                    (encode(old_path), new_path, file_id, None, False))
            else:
                # changes to just the root should not require remove/insertion
                # of everything.
                changes.append((encode(old_path), encode(new_path), file_id,
                    inv_to_entry(inv_entry)))

        # Finish expunging deletes/first half of renames.
        self._update_basis_apply_deletes(deletes)
        # Reinstate second half of renames and new paths.
        self._update_basis_apply_adds(adds)
        # Apply in-situ changes.
        self._update_basis_apply_changes(changes)

        self._dirblock_state = DirState.IN_MEMORY_MODIFIED
        self._header_state = DirState.IN_MEMORY_MODIFIED
        self._id_index = None

    def _update_basis_apply_adds(self, adds):
        """Apply a sequence of adds to tree 1 during update_basis_by_delta.

        They may be adds, or renames that have been split into add/delete
        pairs.

        :param adds: A sequence of adds. Each add is a tuple:
            (None, new_path_utf8, file_id, (entry_details), real_add). real_add
            is False when the add is the second half of a remove-and-reinsert
            pair created to handle renames and deletes.
        """
        # Adds are accumulated partly from renames, so can be in any input
        # order - sort it.
        adds.sort()
        # adds is now in lexicographic order, which places all parents before
        # their children, so we can process it linearly.
        absent = 'ar'
        for old_path, new_path, file_id, new_details, real_add in adds:
            # the entry for this file_id must be in tree 0.
            entry = self._get_entry(0, file_id, new_path)
            if entry[0] is None or entry[0][2] != file_id:
                self._changes_aborted = True
                raise errors.InconsistentDelta(new_path, file_id,
                    'working tree does not contain new entry')
            if real_add and entry[1][1][0] not in absent:
                self._changes_aborted = True
                raise errors.InconsistentDelta(new_path, file_id,
                    'The entry was considered to be a genuinely new record,'
                    ' but there was already an old record for it.')
            # We don't need to update the target of an 'r' because the handling
            # of renames turns all 'r' situations into a delete at the original
            # location.
            entry[1][1] = new_details

    def _update_basis_apply_changes(self, changes):
        """Apply a sequence of changes to tree 1 during update_basis_by_delta.

        :param changes: A sequence of changes. Each change is a tuple:
            (path_utf8, path_utf8, file_id, (entry_details))
        """
        absent = 'ar'
        for old_path, new_path, file_id, new_details in changes:
            # the entry for this file_id must be in tree 0.
            entry = self._get_entry(0, file_id, new_path)
            if entry[0] is None or entry[0][2] != file_id:
                self._changes_aborted = True
                raise errors.InconsistentDelta(new_path, file_id,
                    'working tree does not contain new entry')
            if (entry[1][0][0] in absent or
                entry[1][1][0] in absent):
                self._changes_aborted = True
                raise errors.InconsistentDelta(new_path, file_id,
                    'changed considered absent')
            entry[1][1] = new_details

    def _update_basis_apply_deletes(self, deletes):
        """Apply a sequence of deletes to tree 1 during update_basis_by_delta.

        They may be deletes, or renames that have been split into add/delete
        pairs.

        :param deletes: A sequence of deletes. Each delete is a tuple:
            (old_path_utf8, new_path_utf8, file_id, None, real_delete).
            real_delete is True when the desired outcome is an actual deletion
            rather than the rename handling logic temporarily deleting a path
            during the replacement of a parent.
        """
        null = DirState.NULL_PARENT_DETAILS
        for old_path, new_path, file_id, _, real_delete in deletes:
            if real_delete != (new_path is None):
                raise AssertionError("bad delete delta")
            # the entry for this file_id must be in tree 1.
            dirname, basename = osutils.split(old_path)
            block_index, entry_index, dir_present, file_present = \
                self._get_block_entry_index(dirname, basename, 1)
            if not file_present:
                self._changes_aborted = True
                raise errors.InconsistentDelta(old_path, file_id,
                    'basis tree does not contain removed entry')
            entry = self._dirblocks[block_index][1][entry_index]
            if entry[0][2] != file_id:
                self._changes_aborted = True
                raise errors.InconsistentDelta(old_path, file_id,
                    'mismatched file_id in tree 1')
            if real_delete:
                if entry[1][0][0] != 'a':
                    self._changes_aborted = True
                    raise errors.InconsistentDelta(old_path, file_id,
                        'This was marked as a real delete, but the WT state'
                        ' claims that it still exists and is versioned.')
                del self._dirblocks[block_index][1][entry_index]
            else:
                if entry[1][0][0] == 'a':
                    self._changes_aborted = True
                    raise errors.InconsistentDelta(old_path, file_id,
                        'The entry was considered a rename, but the source path'
                        ' is marked as absent.')
                    # For whatever reason, we were asked to rename an entry
                    # that was originally marked as deleted. This could be
                    # because we are renaming the parent directory, and the WT
                    # current state has the file marked as deleted.
                elif entry[1][0][0] == 'r':
                    # implement the rename
                    del self._dirblocks[block_index][1][entry_index]
                else:
                    # it is being resurrected here, so blank it out temporarily.
                    self._dirblocks[block_index][1][entry_index][1][1] = null

    def _observed_sha1(self, entry, sha1, stat_value,
        _stat_to_minikind=_stat_to_minikind, _pack_stat=pack_stat):
        """Note the sha1 of a file.

        :param entry: The entry the sha1 is for.
        :param sha1: The observed sha1.
        :param stat_value: The os.lstat for the file.
        """
        try:
            minikind = _stat_to_minikind[stat_value.st_mode & 0170000]
        except KeyError:
            # Unhandled kind
            return None
        packed_stat = _pack_stat(stat_value)
        if minikind == 'f':
            if self._cutoff_time is None:
                self._sha_cutoff_time()
            if (stat_value.st_mtime < self._cutoff_time
                and stat_value.st_ctime < self._cutoff_time):
                entry[1][0] = ('f', sha1, entry[1][0][2], entry[1][0][3],
                               packed_stat)
                self._dirblock_state = DirState.IN_MEMORY_MODIFIED

    def _sha_cutoff_time(self):
        """Return cutoff time.

        Files modified more recently than this time are at risk of being
        undetectably modified and so can't be cached.
        """
        # Cache the cutoff time as long as we hold a lock.
        # time.time() isn't super expensive (approx 3.38us), but
        # when you call it 50,000 times it adds up.
        # For comparison, os.lstat() costs 7.2us if it is hot.
        self._cutoff_time = int(time.time()) - 3
        return self._cutoff_time

    def _lstat(self, abspath, entry):
        """Return the os.lstat value for this path."""
        return os.lstat(abspath)

    def _size_sha1_file_and_mutter(self, abspath, filter_list):
        # when -Dhashcache is turned on, this is monkey-patched in to log
        # file reads
        trace.mutter("dirstate sha1 " + abspath)
        return filters.internal_size_sha_file_byname(abspath, filter_list)

    def _is_executable(self, mode, old_executable):
        """Is this file executable?"""
        return bool(S_IEXEC & mode)

    def _is_executable_win32(self, mode, old_executable):
        """On win32 the executable bit is stored in the dirstate."""
        return old_executable

    if sys.platform == 'win32':
        _is_executable = _is_executable_win32

    def _read_link(self, abspath, old_link):
        """Read the target of a symlink"""
        # TODO: jam 200700301 On Win32, this could just return the value
        #       already in memory. However, this really needs to be done at a
        #       higher level, because there either won't be anything on disk,
        #       or the thing on disk will be a file.
        return os.readlink(abspath.encode(osutils._fs_enc))
1600
    def get_ghosts(self):
        """Return a list of the parent tree revision ids that are ghosts."""
        self._read_header_if_needed()
        return self._ghosts

    def get_lines(self):
        """Serialise the entire dirstate to a sequence of lines."""
        if (self._header_state == DirState.IN_MEMORY_UNMODIFIED and
            self._dirblock_state == DirState.IN_MEMORY_UNMODIFIED):
            # read what's on disk.
            self._state_file.seek(0)
            return self._state_file.readlines()
        lines = []
        lines.append(self._get_parents_line(self.get_parent_ids()))
        lines.append(self._get_ghosts_line(self._ghosts))
        # append the root line which is special cased
        lines.extend(map(self._entry_to_line, self._iter_entries()))
        return self._get_output_lines(lines)

    def _get_ghosts_line(self, ghost_ids):
        """Create a line for the state file for ghost information."""
        return '\0'.join([str(len(ghost_ids))] + ghost_ids)

    def _get_parents_line(self, parent_ids):
        """Create a line for the state file for parents information."""
        return '\0'.join([str(len(parent_ids))] + parent_ids)

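    # Illustrative note (not part of the original source): a sketch of the
    # serialised form, using invented revision ids. Two parents, one ghost:
    #
    #   self._get_parents_line(['rev-a', 'rev-b'])  -> '2\x00rev-a\x00rev-b'
    #   self._get_ghosts_line(['rev-b'])            -> '1\x00rev-b'
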
    def _get_fields_to_entry(self):
        """Get a function which converts entry fields into an entry record.

        This handles size and executable, as well as parent records.

        :return: A function which takes a list of fields, and returns an
            appropriate record for storing in memory.
        """
        # This is intentionally unrolled for performance
        num_present_parents = self._num_present_parents()
        if num_present_parents == 0:
            def fields_to_entry_0_parents(fields, _int=int):
                path_name_file_id_key = (fields[0], fields[1], fields[2])
                return (path_name_file_id_key, [
                    ( # Current tree
                        fields[3],        # minikind
                        fields[4],        # fingerprint
                        _int(fields[5]),  # size
                        fields[6] == 'y', # executable
                        fields[7],        # packed_stat or revision_id
                    )])
            return fields_to_entry_0_parents
        elif num_present_parents == 1:
            def fields_to_entry_1_parent(fields, _int=int):
                path_name_file_id_key = (fields[0], fields[1], fields[2])
                return (path_name_file_id_key, [
                    ( # Current tree
                        fields[3],        # minikind
                        fields[4],        # fingerprint
                        _int(fields[5]),  # size
                        fields[6] == 'y', # executable
                        fields[7],        # packed_stat or revision_id
                    ),
                    ( # Parent 1
                        fields[8],         # minikind
                        fields[9],         # fingerprint
                        _int(fields[10]),  # size
                        fields[11] == 'y', # executable
                        fields[12],        # packed_stat or revision_id
                    ),
                    ])
            return fields_to_entry_1_parent
        elif num_present_parents == 2:
            def fields_to_entry_2_parents(fields, _int=int):
                path_name_file_id_key = (fields[0], fields[1], fields[2])
                return (path_name_file_id_key, [
                    ( # Current tree
                        fields[3],        # minikind
                        fields[4],        # fingerprint
                        _int(fields[5]),  # size
                        fields[6] == 'y', # executable
                        fields[7],        # packed_stat or revision_id
                    ),
                    ( # Parent 1
                        fields[8],         # minikind
                        fields[9],         # fingerprint
                        _int(fields[10]),  # size
                        fields[11] == 'y', # executable
                        fields[12],        # packed_stat or revision_id
                    ),
                    ( # Parent 2
                        fields[13],        # minikind
                        fields[14],        # fingerprint
                        _int(fields[15]),  # size
                        fields[16] == 'y', # executable
                        fields[17],        # packed_stat or revision_id
                    ),
                    ])
            return fields_to_entry_2_parents
        else:
            def fields_to_entry_n_parents(fields, _int=int):
                path_name_file_id_key = (fields[0], fields[1], fields[2])
                trees = [(fields[cur],               # minikind
                          fields[cur+1],             # fingerprint
                          _int(fields[cur+2]),       # size
                          fields[cur+3] == 'y',      # executable
                          fields[cur+4],             # stat or revision_id
                         ) for cur in xrange(3, len(fields)-1, 5)]
                return path_name_file_id_key, trees
            return fields_to_entry_n_parents

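    # Illustrative note (not part of the original source): for the 0-parent
    # case, an invented field list such as
    #   ['', 'a file', 'a-file-id', 'f', 'sha...', '12', 'y', 'packed-stat']
    # maps to
    #   (('', 'a file', 'a-file-id'),
    #    [('f', 'sha...', 12, True, 'packed-stat')])
    # i.e. sizes become ints and the executable flag becomes a bool.
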
    def get_parent_ids(self):
        """Return a list of the parent tree ids for the directory state."""
        self._read_header_if_needed()
        return list(self._parents)

    def _get_block_entry_index(self, dirname, basename, tree_index):
        """Get the coordinates for a path in the state structure.

        :param dirname: The utf8 dirname to lookup.
        :param basename: The utf8 basename to lookup.
        :param tree_index: The index of the tree for which this lookup should
            be attempted.
        :return: A tuple describing where the path is located, or should be
            inserted. The tuple contains four fields: the block index, the row
            index, the directory is present (boolean), the entire path is
            present (boolean). There is no guarantee that either
            coordinate is currently reachable unless the found field for it is
            True. For instance, a directory not present in the searched tree
            may be returned with a value one greater than the current highest
            block offset. The directory present field will always be True when
            the path present field is True. The directory present field does
            NOT indicate that the directory is present in the searched tree,
            rather it indicates that there are at least some files in some
            tree present there.
        """
        self._read_dirblocks_if_needed()
        key = dirname, basename, ''
        block_index, present = self._find_block_index_from_key(key)
        if not present:
            # no such directory - return the dir index and 0 for the row.
            return block_index, 0, False, False
        block = self._dirblocks[block_index][1] # access the entries only
        entry_index, present = self._find_entry_index(key, block)
        # linear search through entries at this path to find the one
        # requested.
        while entry_index < len(block) and block[entry_index][0][1] == basename:
            if block[entry_index][1][tree_index][0] not in 'ar':
                # neither absent nor relocated
                return block_index, entry_index, True, True
            entry_index += 1
        return block_index, entry_index, True, False

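    # Illustrative note (not part of the original source): given a state that
    # tracks 'subdir/file' in tree 0, a lookup such as
    #   self._get_block_entry_index('subdir', 'file', 0)
    # might return (2, 0, True, True); looking up a missing name in the same
    # directory would instead return (2, insertion_row, True, False).
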
    def _get_entry(self, tree_index, fileid_utf8=None, path_utf8=None, include_deleted=False):
        """Get the dirstate entry for path in tree tree_index.

        If either file_id or path is supplied, it is used as the key to lookup.
        If both are supplied, the fastest lookup is used, and an error is
        raised if they do not both point at the same row.

        :param tree_index: The index of the tree we wish to locate this path
            in. If the path is present in that tree, the entry containing its
            details is returned, otherwise (None, None) is returned
            0 is the working tree, higher indexes are successive parent
            trees.
        :param fileid_utf8: A utf8 file_id to look up.
        :param path_utf8: An utf8 path to be looked up.
        :param include_deleted: If True, and performing a lookup via
            fileid_utf8 rather than path_utf8, return an entry for deleted
            (absent) paths.
        :return: The dirstate entry tuple for path, or (None, None)
        """
        self._read_dirblocks_if_needed()
        if path_utf8 is not None:
            if type(path_utf8) is not str:
                raise AssertionError('path_utf8 is not a str: %s %s'
                    % (type(path_utf8), path_utf8))
            # path lookups are faster
            dirname, basename = osutils.split(path_utf8)
            block_index, entry_index, dir_present, file_present = \
                self._get_block_entry_index(dirname, basename, tree_index)
            if not file_present:
                return None, None
            entry = self._dirblocks[block_index][1][entry_index]
            if not (entry[0][2] and entry[1][tree_index][0] not in ('a', 'r')):
                raise AssertionError('unversioned entry?')
            if fileid_utf8:
                if entry[0][2] != fileid_utf8:
                    self._changes_aborted = True
                    raise errors.BzrError('integrity error ? : mismatching'
                                          ' tree_index, file_id and path')
            return entry
        else:
            possible_keys = self._get_id_index().get(fileid_utf8, None)
            if not possible_keys:
                return None, None
            for key in possible_keys:
                block_index, present = \
                    self._find_block_index_from_key(key)
                # strange, probably indicates an out of date
                # id index - for now, allow this.
                if not present:
                    continue
                # WARNING: DO not change this code to use _get_block_entry_index
                # as that function is not suitable: it does not use the key
                # to lookup, and thus the wrong coordinates are returned.
                block = self._dirblocks[block_index][1]
                entry_index, present = self._find_entry_index(key, block)
                if present:
                    entry = self._dirblocks[block_index][1][entry_index]
                    if entry[1][tree_index][0] in 'fdlt':
                        # this is the result we are looking for: the
                        # real home of this file_id in this tree.
                        return entry
                    if entry[1][tree_index][0] == 'a':
                        # there is no home for this entry in this tree
                        if include_deleted:
                            return entry
                        return None, None
                    if entry[1][tree_index][0] != 'r':
                        raise AssertionError(
                            "entry %r has invalid minikind %r for tree %r" \
                            % (entry,
                               entry[1][tree_index][0],
                               tree_index))
                    real_path = entry[1][tree_index][1]
                    return self._get_entry(tree_index, fileid_utf8=fileid_utf8,
                        path_utf8=real_path)
            return None, None

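    # Illustrative note (not part of the original source): typical usage is
    # one of
    #   entry = self._get_entry(0, path_utf8='subdir/file')
    #   entry = self._get_entry(1, fileid_utf8='a-file-id')
    # where a miss yields (None, None) rather than raising, so callers must
    # check before unpacking entry[0]/entry[1].
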
    @classmethod
    def initialize(cls, path):
        """Create a new dirstate on path.

        The new dirstate will be an empty tree - that is it has no parents,
        and only a root node - which has id ROOT_ID.

        :param path: The name of the file for the dirstate.
        :return: A write-locked DirState object.
        """
        # This constructs a new DirState object on a path, sets the _state_file
        # to a new empty file for that path. It then calls _set_data() with our
        # stock empty dirstate information - a root with ROOT_ID, no children,
        # and no parents. Finally it calls save() to ensure that this data will
        # persist.
        result = cls(path)
        # root dir and root dir contents with no children.
        empty_tree_dirblocks = [('', []), ('', [])]
        # a new root directory, with a NULLSTAT.
        empty_tree_dirblocks[0][1].append(
            (('', '', inventory.ROOT_ID), [
                ('d', '', 0, False, DirState.NULLSTAT),
            ]))
        result.lock_write()
        try:
            result._set_data([], empty_tree_dirblocks)
            result.save()
        except:
            result.unlock()
            raise
        return result

    @staticmethod
    def _inv_entry_to_details(inv_entry):
        """Convert an inventory entry (from a revision tree) to state details.

        :param inv_entry: An inventory entry whose sha1 and link targets can be
            relied upon, and which has a revision set.
        :return: A details tuple - the details for a single tree at a path +
            id.
        """
        kind = inv_entry.kind
        minikind = DirState._kind_to_minikind[kind]
        tree_data = inv_entry.revision
        if kind == 'directory':
            fingerprint = ''
            size = 0
            executable = False
        elif kind == 'symlink':
            # We don't support non-ascii targets for symlinks yet.
            fingerprint = str(inv_entry.symlink_target or '')
            size = 0
            executable = False
        elif kind == 'file':
            fingerprint = inv_entry.text_sha1 or ''
            size = inv_entry.text_size or 0
            executable = inv_entry.executable
        elif kind == 'tree-reference':
            fingerprint = inv_entry.reference_revision or ''
            size = 0
            executable = False
        else:
            raise Exception("can't pack %s" % inv_entry)
        return (minikind, fingerprint, size, executable, tree_data)

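    # Illustrative note (not part of the original source): for a versioned
    # file with invented attributes text_sha1='abc...', text_size=30,
    # executable=False and revision='rev-1', the returned details would be
    #   ('f', 'abc...', 30, False, 'rev-1')
    # matching the (minikind, fingerprint, size, executable, tree_data) layout.
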
    def _iter_child_entries(self, tree_index, path_utf8):
        """Iterate over all the entries that are children of path_utf8.

        This only returns entries that are present (not in 'a', 'r') in
        tree_index. tree_index data is not refreshed, so if tree 0 is used,
        results may differ from that obtained if paths were statted to
        determine which ones were directories.

        Asking for the children of a non-directory will return an empty
        iterator.
        """
        pending_dirs = []
        next_pending_dirs = [path_utf8]
        absent = 'ar'
        while next_pending_dirs:
            pending_dirs = next_pending_dirs
            next_pending_dirs = []
            for path in pending_dirs:
                block_index, present = self._find_block_index_from_key(
                    (path, '', ''))
                if block_index == 0:
                    block_index = 1
                    if len(self._dirblocks) == 1:
                        # asked for the children of the root with no other
                        # contents.
                        return
                if not present:
                    # children of a non-directory asked for.
                    continue
                block = self._dirblocks[block_index]
                for entry in block[1]:
                    kind = entry[1][tree_index][0]
                    if kind not in absent:
                        yield entry
                    if kind == 'd':
                        if entry[0][0]:
                            path = entry[0][0] + '/' + entry[0][1]
                        else:
                            path = entry[0][1]
                        next_pending_dirs.append(path)

    def _iter_entries(self):
        """Iterate over all the entries in the dirstate.

        Each yielded item is an entry in the standard format described in the
        docstring of bzrlib.dirstate.
        """
        self._read_dirblocks_if_needed()
        for directory in self._dirblocks:
            for entry in directory[1]:
                yield entry

    def _get_id_index(self):
        """Get an id index of self._dirblocks."""
        if self._id_index is None:
            id_index = {}
            for key, tree_details in self._iter_entries():
                id_index.setdefault(key[2], set()).add(key)
            self._id_index = id_index
        return self._id_index

    def _get_output_lines(self, lines):
        """Format lines for final output.

        :param lines: A sequence of lines containing the parents list and the
            path lines.
        """
        output_lines = [DirState.HEADER_FORMAT_3]
        lines.append('') # a final newline
        inventory_text = '\0\n\0'.join(lines)
        output_lines.append('crc32: %s\n' % (zlib.crc32(inventory_text),))
        # -3, 1 for num parents, 1 for ghosts, 1 for final newline
        num_entries = len(lines)-3
        output_lines.append('num_entries: %s\n' % (num_entries,))
        output_lines.append(inventory_text)
        return output_lines

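    # Illustrative note (not part of the original source): the final byte
    # layout, per the format grammar at the top of this module, is roughly
    #   #bazaar dirstate flat format 3\n
    #   crc32: <signed int>\n
    #   num_entries: <count>\n
    #   <parents>\0\n\0<ghosts>\0\n\0<entries...>\0\n
    # where the crc covers everything from the parents line onward.
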
    def _make_deleted_row(self, fileid_utf8, parents):
        """Return a deleted row for fileid_utf8."""
        return ('/', 'RECYCLED.BIN', 'file', fileid_utf8, 0, DirState.NULLSTAT,
            ''), parents

    def _num_present_parents(self):
        """The number of parent entries in each record row."""
        return len(self._parents) - len(self._ghosts)

    @staticmethod
    def on_file(path, content_filter_stack_provider=None):
        """Construct a DirState on the file at path "path".

        :param content_filter_stack_provider: a function that takes a
            path (relative to the top of the tree) and a file-id as
            parameters and returns a stack of ContentFilters.
            If None, no content filtering is performed.
        :return: An unlocked DirState object, associated with the given path.
        """
        result = DirState(path, content_filter_stack_provider)
        return result

    def _read_dirblocks_if_needed(self):
        """Read in all the dirblocks from the file if they are not in memory.

        This populates self._dirblocks, and sets self._dirblock_state to
        IN_MEMORY_UNMODIFIED. It is not currently ready for incremental block
        loading.
        """
        self._read_header_if_needed()
        if self._dirblock_state == DirState.NOT_IN_MEMORY:
            _read_dirblocks(self)

    def _read_header(self):
        """This reads in the metadata header, and the parent ids.

        After reading in, the file should be positioned at the null
        just before the start of the first record in the file.

        :return: (expected crc checksum, number of entries, parent list)
        """
        self._read_prelude()
        parent_line = self._state_file.readline()
        info = parent_line.split('\0')
        num_parents = int(info[0])
        self._parents = info[1:-1]
        ghost_line = self._state_file.readline()
        info = ghost_line.split('\0')
        num_ghosts = int(info[1])
        self._ghosts = info[2:-1]
        self._header_state = DirState.IN_MEMORY_UNMODIFIED
        self._end_of_header = self._state_file.tell()

    def _read_header_if_needed(self):
        """Read the header of the dirstate file if needed."""
        # inline this as it will be called a lot
        if not self._lock_token:
            raise errors.ObjectNotLocked(self)
        if self._header_state == DirState.NOT_IN_MEMORY:
            self._read_header()

    def _read_prelude(self):
        """Read in the prelude header of the dirstate file.

        This only reads in the stuff that is not connected to the crc
        checksum. The position will be correct to read in the rest of
        the file and check the checksum after this point.
        The next entry in the file should be the number of parents,
        and their ids. Followed by a newline.
        """
        header = self._state_file.readline()
        if header != DirState.HEADER_FORMAT_3:
            raise errors.BzrError(
                'invalid header line: %r' % (header,))
        crc_line = self._state_file.readline()
        if not crc_line.startswith('crc32: '):
            raise errors.BzrError('missing crc32 checksum: %r' % crc_line)
        self.crc_expected = int(crc_line[len('crc32: '):-1])
        num_entries_line = self._state_file.readline()
        if not num_entries_line.startswith('num_entries: '):
            raise errors.BzrError('missing num_entries line')
        self._num_entries = int(num_entries_line[len('num_entries: '):-1])

    def sha1_from_stat(self, path, stat_result, _pack_stat=pack_stat):
        """Find a sha1 given a stat lookup."""
        return self._get_packed_stat_index().get(_pack_stat(stat_result), None)

    def _get_packed_stat_index(self):
        """Get a packed_stat index of self._dirblocks."""
        if self._packed_stat_index is None:
            index = {}
            for key, tree_details in self._iter_entries():
                if tree_details[0][0] == 'f':
                    index[tree_details[0][4]] = tree_details[0][1]
            self._packed_stat_index = index
        return self._packed_stat_index

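    # Illustrative note (not part of the original source): a hedged usage
    # sketch - given a fresh os.lstat result for a tracked, unmodified file,
    #   sha1 = state.sha1_from_stat('subdir/file', os.lstat(abspath))
    # returns the cached sha1 when the packed stat matches an 'f' row, and
    # None on any miss, so callers must be ready to re-read the file.
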
"""Save any pending changes created during this session.
2069
We reuse the existing file, because that prevents race conditions with
2070
file creation, and use oslocks on it to prevent concurrent modification
2071
and reads - because dirstate's incremental data aggregation is not
2072
compatible with reading a modified file, and replacing a file in use by
2073
another process is impossible on Windows.
2075
A dirstate in read only mode should be smart enough though to validate
2076
that the file has not changed, and otherwise discard its cache and
2077
start over, to allow for fine grained read lock duration, so 'status'
2078
wont block 'commit' - for example.
2080
if self._changes_aborted:
2081
# Should this be a warning? For now, I'm expecting that places that
2082
# mark it inconsistent will warn, making a warning here redundant.
2083
trace.mutter('Not saving DirState because '
2084
'_changes_aborted is set.')
2086
if (self._header_state == DirState.IN_MEMORY_MODIFIED or
2087
self._dirblock_state == DirState.IN_MEMORY_MODIFIED):
2089
grabbed_write_lock = False
2090
if self._lock_state != 'w':
2091
grabbed_write_lock, new_lock = self._lock_token.temporary_write_lock()
2092
# Switch over to the new lock, as the old one may be closed.
2093
# TODO: jam 20070315 We should validate the disk file has
2094
# not changed contents. Since temporary_write_lock may
2095
# not be an atomic operation.
2096
self._lock_token = new_lock
2097
self._state_file = new_lock.f
2098
if not grabbed_write_lock:
2099
# We couldn't grab a write lock, so we switch back to a read one
2102
self._state_file.seek(0)
2103
self._state_file.writelines(self.get_lines())
2104
self._state_file.truncate()
2105
self._state_file.flush()
2106
self._header_state = DirState.IN_MEMORY_UNMODIFIED
2107
self._dirblock_state = DirState.IN_MEMORY_UNMODIFIED
2109
if grabbed_write_lock:
2110
self._lock_token = self._lock_token.restore_read_lock()
2111
self._state_file = self._lock_token.f
2112
# TODO: jam 20070315 We should validate the disk file has
2113
# not changed contents. Since restore_read_lock may
2114
# not be an atomic operation.
2116
    def _set_data(self, parent_ids, dirblocks):
        """Set the full dirstate data in memory.

        This is an internal function used to completely replace the objects
        in memory state. It puts the dirstate into state 'full-dirty'.

        :param parent_ids: A list of parent tree revision ids.
        :param dirblocks: A list containing one tuple for each directory in the
            tree. Each tuple contains the directory path and a list of entries
            found in that directory.
        """
        # our memory copy is now authoritative.
        self._dirblocks = dirblocks
        self._header_state = DirState.IN_MEMORY_MODIFIED
        self._dirblock_state = DirState.IN_MEMORY_MODIFIED
        self._parents = list(parent_ids)
        self._id_index = None
        self._packed_stat_index = None

    def set_path_id(self, path, new_id):
        """Change the id of path to new_id in the current working tree.

        :param path: The path inside the tree to set - '' is the root, 'foo'
            is the path foo in the root.
        :param new_id: The new id to assign to the path. This must be a utf8
            file id (not unicode, and not None).
        """
        self._read_dirblocks_if_needed()
        if len(path):
            # TODO: logic not written
            raise NotImplementedError(self.set_path_id)
        # TODO: check new id is unique
        entry = self._get_entry(0, path_utf8=path)
        if entry[0][2] == new_id:
            # Nothing to change.
            return
        # mark the old path absent, and insert a new root path
        self._make_absent(entry)
        self.update_minimal(('', '', new_id), 'd',
            path_utf8='', packed_stat=entry[1][0][4])
        self._dirblock_state = DirState.IN_MEMORY_MODIFIED
        if self._id_index is not None:
            self._id_index.setdefault(new_id, set()).add(entry[0])

    def set_parent_trees(self, trees, ghosts):
        """Set the parent trees for the dirstate.

        :param trees: A list of revision_id, tree tuples. tree must be provided
            even if the revision_id refers to a ghost: supply an empty tree in
            this case.
        :param ghosts: A list of the revision_ids that are ghosts at the time
            of setting.
        """
        # TODO: generate a list of parent indexes to preserve to save
        # processing specific parent trees. In the common case one tree will
        # be preserved - the left most parent.
        # TODO: if the parent tree is a dirstate, we might want to walk them
        # all by path in parallel for 'optimal' common-case performance.
        # generate new root row.
        self._read_dirblocks_if_needed()
        # TODO future sketch: Examine the existing parents to generate a change
        # map and then walk the new parent trees only, mapping them into the
        # dirstate. Walk the dirstate at the same time to remove unreferenced
        # entries.
        # for now:
        # sketch: loop over all entries in the dirstate, cherry picking
        # entries from the parent trees, if they are not ghost trees.
        # after we finish walking the dirstate, all entries not in the dirstate
        # are deletes, so we want to append them to the end as per the design
        # discussions. So do a set difference on ids with the parents to
        # get deletes, and add them to the end.
        # During the update process we need to answer the following questions:
        # - find other keys containing a fileid in order to create cross-path
        #   links. We don't trivially use the inventory from other trees
        #   because this leads to either double touching, or to accessing
        #   missing keys,
        # - find other keys containing a path
        # We accumulate each entry via this dictionary, including the root
        by_path = {}
        id_index = {}
        # we could do parallel iterators, but because file id data may be
        # scattered throughout, we don't save on index overhead: we have to look
        # at everything anyway. We can probably save cycles by reusing parent
        # data and doing an incremental update when adding an additional
        # parent, but for now the common cases are adding a new parent (merge),
        # and replacing completely (commit), and commit is more common: so
        # optimise merge later.

        # ---- start generation of full tree mapping data
        # what trees should we use?
        parent_trees = [tree for rev_id, tree in trees if rev_id not in ghosts]
        # how many trees do we end up with
        parent_count = len(parent_trees)

        # one: the current tree
        for entry in self._iter_entries():
            # skip entries not in the current tree
            if entry[1][0][0] in 'ar': # absent, relocated
                continue
            by_path[entry[0]] = [entry[1][0]] + \
                [DirState.NULL_PARENT_DETAILS] * parent_count
            id_index[entry[0][2]] = set([entry[0]])

        # now the parent trees:
        for tree_index, tree in enumerate(parent_trees):
            # the index is off by one, adjust it.
            tree_index = tree_index + 1
            # when we add new locations for a fileid we need these ranges for
            # any fileid in this tree as we set the by_path[id] to:
            # already_processed_tree_details + new_details + new_location_suffix
            # the suffix is from tree_index+1:parent_count+1.
            new_location_suffix = [DirState.NULL_PARENT_DETAILS] * (parent_count - tree_index)
            # now stitch in all the entries from this tree
            for path, entry in tree.inventory.iter_entries_by_dir():
                # here we process each trees details for each item in the tree.
                # we first update any existing entries for the id at other paths,
                # then we either create or update the entry for the id at the
                # right path, and finally we add (if needed) a mapping from
                # file_id to this path. We do it in this order to allow us to
                # avoid checking all known paths for the id when generating a
                # new entry at this path: by adding the id->path mapping last,
                # all the mappings are valid and have correct relocation
                # records where needed.
                file_id = entry.file_id
                path_utf8 = path.encode('utf8')
                dirname, basename = osutils.split(path_utf8)
                new_entry_key = (dirname, basename, file_id)
                # tree index consistency: All other paths for this id in this tree
                # index must point to the correct path.
                for entry_key in id_index.setdefault(file_id, set()):
                    # TODO:PROFILING: It might be faster to just update
                    # rather than checking if we need to, and then overwrite
                    # the one we are located at.
                    if entry_key != new_entry_key:
                        # this file id is at a different path in one of the
                        # other trees, so put absent pointers there
                        # This is the vertical axis in the matrix, all pointing
                        # to the real path.
                        by_path[entry_key][tree_index] = ('r', path_utf8, 0, False, '')
                # by path consistency: Insert into an existing path record (trivial), or
                # add a new one with relocation pointers for the other tree indexes.
                if new_entry_key in id_index[file_id]:
                    # there is already an entry where this data belongs, just insert it.
                    by_path[new_entry_key][tree_index] = \
                        self._inv_entry_to_details(entry)
                else:
                    # add relocated entries to the horizontal axis - this row
                    # mapping from path,id. We need to look up the correct path
                    # for the indexes from 0 to tree_index -1
                    new_details = []
                    for lookup_index in xrange(tree_index):
                        # boundary case: this is the first occurrence of file_id
                        # so there are no id_index entries, possibly take this out of
                        # the loop?
                        if not len(id_index[file_id]):
                            new_details.append(DirState.NULL_PARENT_DETAILS)
                        else:
                            # grab any one entry, use it to find the right path.
                            # TODO: optimise this to reduce memory use in highly
                            # fragmented situations by reusing the relocation
                            # records.
                            a_key = iter(id_index[file_id]).next()
                            if by_path[a_key][lookup_index][0] in ('r', 'a'):
                                # it's a pointer or missing statement, use it as is.
                                new_details.append(by_path[a_key][lookup_index])
                            else:
                                # we have the right key, make a pointer to it.
                                real_path = ('/'.join(a_key[0:2])).strip('/')
                                new_details.append(('r', real_path, 0, False, ''))
                    new_details.append(self._inv_entry_to_details(entry))
                    new_details.extend(new_location_suffix)
                    by_path[new_entry_key] = new_details
                    id_index[file_id].add(new_entry_key)
        # --- end generation of full tree mappings

        # sort and output all the entries
        new_entries = self._sort_entries(by_path.items())
        self._entries_to_current_state(new_entries)
        self._parents = [rev_id for rev_id, tree in trees]
        self._ghosts = list(ghosts)
        self._header_state = DirState.IN_MEMORY_MODIFIED
        self._dirblock_state = DirState.IN_MEMORY_MODIFIED
        self._id_index = id_index

    def _sort_entries(self, entry_list):
        """Given a list of entries, sort them into the right order.

        This is done when constructing a new dirstate from trees - normally we
        try to keep everything in sorted blocks all the time, but sometimes
        it's easier to sort after the fact.
        """
        def _key(entry):
            # sort by: directory parts, file name, file id
            return entry[0][0].split('/'), entry[0][1], entry[0][2]
        return sorted(entry_list, key=_key)

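    # Illustrative note (not part of the original source): splitting on '/'
    # makes 'a-age' sort after 'a/age' because ['a-age'] > ['a', 'age'],
    # keeping every directory's contents contiguous; a plain string sort
    # would interleave them, since '-' orders before '/'.
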
    def set_state_from_inventory(self, new_inv):
        """Set new_inv as the current state.

        This API is called by tree transform, and will usually occur with
        existing parent trees.

        :param new_inv: The inventory object to set current state from.
        """
        if 'evil' in debug.debug_flags:
            trace.mutter_callsite(1,
                "set_state_from_inventory called; please mutate the tree instead")
        self._read_dirblocks_if_needed()
        # sketch:
        # Two iterators: current data and new data, both in dirblock order.
        # We zip them together, which tells about entries that are new in the
        # inventory, or removed in the inventory, or present in both and
        # possibly changed.
        #
        # You might think we could just synthesize a new dirstate directly
        # since we're processing it in the right order. However, we need to
        # also consider there may be any number of parent trees and relocation
        # pointers, and we don't want to duplicate that here.
        new_iterator = new_inv.iter_entries_by_dir()
        # we will be modifying the dirstate, so we need a stable iterator. In
        # future we might write one, for now we just clone the state into a
        # list - which is a shallow copy.
        old_iterator = iter(list(self._iter_entries()))
        # both must have roots so this is safe:
        current_new = new_iterator.next()
        current_old = old_iterator.next()
        def advance(iterator):
            try:
                return iterator.next()
            except StopIteration:
                return None
        while current_new or current_old:
            # skip entries in old that are not really there
            if current_old and current_old[1][0][0] in 'ar':
                # relocated or absent
                current_old = advance(old_iterator)
                continue
            if current_new:
                # convert new into dirblock style
                new_path_utf8 = current_new[0].encode('utf8')
                new_dirname, new_basename = osutils.split(new_path_utf8)
                new_id = current_new[1].file_id
                new_entry_key = (new_dirname, new_basename, new_id)
                current_new_minikind = \
                    DirState._kind_to_minikind[current_new[1].kind]
                if current_new_minikind == 't':
                    fingerprint = current_new[1].reference_revision or ''
                else:
                    # We normally only insert or remove records, or update
                    # them when it has significantly changed. Then we want to
                    # erase its fingerprint. Unaffected records should
                    # normally not be updated at all.
                    fingerprint = ''
            else:
                # for safety disable variables
                new_path_utf8 = new_dirname = new_basename = new_id = \
                    new_entry_key = None
            # 5 cases, we don't have a value that is strictly greater than everything, so
            # we make both end conditions explicit
            if not current_old:
                # old is finished: insert current_new into the state.
                self.update_minimal(new_entry_key, current_new_minikind,
                    executable=current_new[1].executable,
                    path_utf8=new_path_utf8, fingerprint=fingerprint)
                current_new = advance(new_iterator)
            elif not current_new:
                # new is finished
                self._make_absent(current_old)
                current_old = advance(old_iterator)
            elif new_entry_key == current_old[0]:
                # same - common case
                # We're looking at the same path and id in both the dirstate
                # and inventory, so just need to update the fields in the
                # dirstate from the one in the inventory.
                # TODO: update the record if anything significant has changed.
                # the minimal required trigger is if the execute bit or cached
                # kind has changed.
                if (current_old[1][0][3] != current_new[1].executable or
                    current_old[1][0][0] != current_new_minikind):
                    self.update_minimal(current_old[0], current_new_minikind,
                        executable=current_new[1].executable,
                        path_utf8=new_path_utf8, fingerprint=fingerprint)
                # both sides are dealt with, move on
                current_old = advance(old_iterator)
                current_new = advance(new_iterator)
            elif (cmp_by_dirs(new_dirname, current_old[0][0]) < 0
                  or (new_dirname == current_old[0][0]
                      and new_entry_key[1:] < current_old[0][1:])):
                # new comes before:
                # add an entry for this and advance new
                self.update_minimal(new_entry_key, current_new_minikind,
                    executable=current_new[1].executable,
                    path_utf8=new_path_utf8, fingerprint=fingerprint)
                current_new = advance(new_iterator)
            else:
                # we've advanced past the place where the old key would be,
                # without seeing it in the new list. so it must be gone.
                self._make_absent(current_old)
                current_old = advance(old_iterator)
        self._dirblock_state = DirState.IN_MEMORY_MODIFIED
        self._id_index = None
        self._packed_stat_index = None

    def _make_absent(self, current_old):
        """Mark current_old - an entry - as absent for tree 0.

        :return: True if this was the last details entry for the entry key:
            that is, if the underlying block has had the entry removed, thus
            shrinking in length.
        """
        # build up paths that this id will be left at after the change is made,
        # so we can update their cross references in tree 0
        all_remaining_keys = set()
        # Don't check the working tree, because it's going.
        for details in current_old[1][1:]:
            if details[0] not in 'ar': # absent, relocated
                all_remaining_keys.add(current_old[0])
            elif details[0] == 'r': # relocated
                # record the key for the real path.
                all_remaining_keys.add(tuple(osutils.split(details[1])) + (current_old[0][2],))
            # absent rows are not present at any path.
        last_reference = current_old[0] not in all_remaining_keys
        if last_reference:
            # the current row consists entirely of the current item (being marked
            # absent), and relocated or absent entries for the other trees:
            # Remove it, it's meaningless.
            block = self._find_block(current_old[0])
            entry_index, present = self._find_entry_index(current_old[0], block[1])
            if not present:
                raise AssertionError('could not find entry for %s' % (current_old,))
            block[1].pop(entry_index)
            # if we have an id_index in use, remove this key from it for this id.
            if self._id_index is not None:
                self._id_index[current_old[0][2]].remove(current_old[0])
        # update all remaining keys for this id to record it as absent. The
        # existing details may either be the record we are marking as deleted
        # (if there were other trees with the id present at this path), or may
        # be relocations.
        for update_key in all_remaining_keys:
            update_block_index, present = \
                self._find_block_index_from_key(update_key)
            if not present:
                raise AssertionError('could not find block for %s' % (update_key,))
            update_entry_index, present = \
                self._find_entry_index(update_key, self._dirblocks[update_block_index][1])
            if not present:
                raise AssertionError('could not find entry for %s' % (update_key,))
            update_tree_details = self._dirblocks[update_block_index][1][update_entry_index][1]
            # it must not be absent at the moment
            if update_tree_details[0][0] == 'a': # absent
                raise AssertionError('bad row %r' % (update_tree_details,))
            update_tree_details[0] = DirState.NULL_PARENT_DETAILS
        self._dirblock_state = DirState.IN_MEMORY_MODIFIED
        return last_reference

    def update_minimal(self, key, minikind, executable=False, fingerprint='',
                       packed_stat=None, size=0, path_utf8=None):
        """Update an entry to the state in tree 0.

        This will either create a new entry at 'key' or update an existing one.
        It also makes sure that any other records which might mention this are
        updated as well.

        :param key: (dir, name, file_id) for the new entry
        :param minikind: The type for the entry ('f' == 'file', 'd' ==
            'directory'), etc.
        :param executable: Should the executable bit be set?
        :param fingerprint: Simple fingerprint for new entry: sha1 for files,
            referenced revision id for subtrees, etc.
        :param packed_stat: Packed stat value for new entry.
        :param size: Size information for new entry
        :param path_utf8: key[0] + '/' + key[1], just passed in to avoid doing
            extra computation.

        If packed_stat and fingerprint are not given, they're invalidated in
        the entry.
        """
        block = self._find_block(key)[1]
        if packed_stat is None:
            packed_stat = DirState.NULLSTAT
        # XXX: Some callers pass '' as the packed_stat, and it seems to be
        # sometimes present in the dirstate - this seems oddly inconsistent.
        # mbp 20071008
        entry_index, present = self._find_entry_index(key, block)
        new_details = (minikind, fingerprint, size, executable, packed_stat)
        id_index = self._get_id_index()
        if not present:
            # new entry, synthesis cross reference here,
            existing_keys = id_index.setdefault(key[2], set())
            if not existing_keys:
                # not currently in the state, simplest case
                new_entry = key, [new_details] + self._empty_parent_info()
            else:
                # present at one or more existing other paths.
                # grab one of them and use it to generate parent
                # relocation/absent entries.
                new_entry = key, [new_details]
                for other_key in existing_keys:
                    # change the record at other to be a pointer to this new
                    # record. The loop looks similar to the change to
                    # relocations when updating an existing record but it's not:
                    # the test for existing kinds is different: this can be
                    # factored out to a helper though.
                    other_block_index, present = self._find_block_index_from_key(other_key)
                    if not present:
                        raise AssertionError('could not find block for %s' % (other_key,))
                    other_entry_index, present = self._find_entry_index(other_key,
                        self._dirblocks[other_block_index][1])
                    if not present:
                        raise AssertionError('could not find entry for %s' % (other_key,))
                    if path_utf8 is None:
                        raise AssertionError('no path')
                    self._dirblocks[other_block_index][1][other_entry_index][1][0] = \
                        ('r', path_utf8, 0, False, '')

                num_present_parents = self._num_present_parents()
                for lookup_index in xrange(1, num_present_parents + 1):
                    # grab any one entry, use it to find the right path.
                    # TODO: optimise this to reduce memory use in highly
                    # fragmented situations by reusing the relocation
                    # records.
                    update_block_index, present = \
                        self._find_block_index_from_key(other_key)
                    if not present:
                        raise AssertionError('could not find block for %s' % (other_key,))
                    update_entry_index, present = \
                        self._find_entry_index(other_key, self._dirblocks[update_block_index][1])
                    if not present:
                        raise AssertionError('could not find entry for %s' % (other_key,))
                    update_details = self._dirblocks[update_block_index][1][update_entry_index][1][lookup_index]
                    if update_details[0] in 'ar': # relocated, absent
                        # it's a pointer or absent in lookup_index's tree, use
                        # it as is.
                        new_entry[1].append(update_details)
                    else:
                        # we have the right key, make a pointer to it.
                        pointer_path = osutils.pathjoin(*other_key[0:2])
                        new_entry[1].append(('r', pointer_path, 0, False, ''))
            block.insert(entry_index, new_entry)
            existing_keys.add(key)
        else:
            # Does the new state matter?
            block[entry_index][1][0] = new_details
            # parents cannot be affected by what we do.
            # other occurrences of this id can be found
            # from the id index.
            #
            # tree index consistency: All other paths for this id in this tree
            # index must point to the correct path. We have to loop here because
            # we may have passed entries in the state with this file id already
            # that were absent - where parent entries are - and they need to be
            # converted to relocated.
            if path_utf8 is None:
                raise AssertionError('no path')
            for entry_key in id_index.setdefault(key[2], set()):
                # TODO:PROFILING: It might be faster to just update
                # rather than checking if we need to, and then overwrite
                # the one we are located at.
                if entry_key != key:
                    # this file id is at a different path in one of the
                    # other trees, so put absent pointers there
                    # This is the vertical axis in the matrix, all pointing
                    # to the real path.
                    block_index, present = self._find_block_index_from_key(entry_key)
                    if not present:
                        raise AssertionError('not present: %r', entry_key)
                    entry_index, present = self._find_entry_index(entry_key, self._dirblocks[block_index][1])
                    if not present:
                        raise AssertionError('not present: %r', entry_key)
                    self._dirblocks[block_index][1][entry_index][1][0] = \
                        ('r', path_utf8, 0, False, '')
        # add a containing dirblock if needed.
        if new_details[0] == 'd':
            subdir_key = (osutils.pathjoin(*key[0:2]), '', '')
            block_index, present = self._find_block_index_from_key(subdir_key)
            if not present:
                self._dirblocks.insert(block_index, (subdir_key[0], []))

        self._dirblock_state = DirState.IN_MEMORY_MODIFIED

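    # Illustrative note (not part of the original source): a minimal sketch of
    # adding a new file row to tree 0, with every value invented:
    #   state.update_minimal(('subdir', 'new file', 'new-file-id'), 'f',
    #       executable=False, fingerprint='', packed_stat=None, size=0,
    #       path_utf8='subdir/new file')
    # Passing packed_stat=None stores DirState.NULLSTAT, leaving the row to be
    # filled in by a later update_entry() against the real on-disk file.
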
    def _validate(self):
        """Check that invariants on the dirblock are correct.

        This can be useful in debugging; it shouldn't be necessary in
        normal code.

        This must be called with a lock held.
        """
        # NOTE: This must always raise AssertionError not just assert,
        # otherwise it may not behave properly under python -O
        #
        # TODO: All entries must have some content that's not 'a' or 'r',
        # otherwise it could just be removed.
        #
        # TODO: All relocations must point directly to a real entry.
        #
        # TODO: No repeated keys.
        #
        # -- mbp 20070325
        from pprint import pformat
        self._read_dirblocks_if_needed()
        if len(self._dirblocks) > 0:
            if not self._dirblocks[0][0] == '':
                raise AssertionError(
                    "dirblocks don't start with root block:\n" + \
                    pformat(self._dirblocks))
        if len(self._dirblocks) > 1:
            if not self._dirblocks[1][0] == '':
                raise AssertionError(
                    "dirblocks missing root directory:\n" + \
                    pformat(self._dirblocks))
        # the dirblocks are sorted by their path components, name, and dir id
        dir_names = [d[0].split('/')
                for d in self._dirblocks[1:]]
        if dir_names != sorted(dir_names):
            raise AssertionError(
                "dir names are not in sorted order:\n" + \
                pformat(self._dirblocks) + \
                "\nkeys:\n" +
                pformat(dir_names))
        for dirblock in self._dirblocks:
            # within each dirblock, the entries are sorted by filename and
            # then by id.
            for entry in dirblock[1]:
                if dirblock[0] != entry[0][0]:
                    raise AssertionError(
                        "entry key for %r "
                        "doesn't match directory name in\n%r" %
                        (entry, pformat(dirblock)))
            if dirblock[1] != sorted(dirblock[1]):
                raise AssertionError(
                    "dirblock for %r is not sorted:\n%s" % \
                    (dirblock[0], pformat(dirblock)))

        def check_valid_parent():
            """Check that the current entry has a valid parent.

            This makes sure that the parent has a record,
            and that the parent isn't marked as "absent" in the
            current tree. (It is invalid to have a non-absent file in an absent
            directory.)
            """
            if entry[0][0:2] == ('', ''):
                # There should be no parent for the root row
                return
            parent_entry = self._get_entry(tree_index, path_utf8=entry[0][0])
            if parent_entry == (None, None):
                raise AssertionError(
                    "no parent entry for: %s in tree %s"
                    % (this_path, tree_index))
            if parent_entry[1][tree_index][0] != 'd':
                raise AssertionError(
                    "Parent entry for %s is not marked as a valid"
                    " directory. %s" % (this_path, parent_entry,))

        # For each file id, for each tree: either
        # the file id is not present at all; all rows with that id in the
        # key have it marked as 'absent'
        # OR the file id is present under exactly one name; any other entries
        # that mention that id point to the correct name.
        #
        # We check this with a dict per tree pointing either to the present
        # name, or None if absent.
        tree_count = self._num_present_parents() + 1
        id_path_maps = [dict() for i in range(tree_count)]
        # Make sure that all renamed entries point to the correct location.
        for entry in self._iter_entries():
            file_id = entry[0][2]
            this_path = osutils.pathjoin(entry[0][0], entry[0][1])
            if len(entry[1]) != tree_count:
                raise AssertionError(
                    "wrong number of entry details for row\n%s" \
                    ",\nexpected %d" % \
                    (pformat(entry), tree_count))
            absent_positions = 0
            for tree_index, tree_state in enumerate(entry[1]):
                this_tree_map = id_path_maps[tree_index]
                minikind = tree_state[0]
                if minikind in 'ar':
                    absent_positions += 1
                # have we seen this id before in this column?
                if file_id in this_tree_map:
                    previous_path, previous_loc = this_tree_map[file_id]
                    # any later mention of this file must be consistent with
                    # what was said before
                    if minikind == 'a':
                        if previous_path is not None:
                            raise AssertionError(
                                "file %s is absent in row %r but also present " \
                                "at %r" % \
                                (file_id, entry, previous_path))
                    elif minikind == 'r':
                        target_location = tree_state[1]
                        if previous_path != target_location:
                            raise AssertionError(
                                "file %s relocation in row %r but also at %r" \
                                % (file_id, entry, previous_path))
                    else:
                        # a file, directory, etc - may have been previously
                        # pointed to by a relocation, which must point here
                        if previous_path != this_path:
                            raise AssertionError(
                                "entry %r inconsistent with previous path %r "
                                "seen at %r" %
                                (entry, previous_path, previous_loc))
                        check_valid_parent()
                else:
                    if minikind == 'a':
                        # absent; should not occur anywhere else
                        this_tree_map[file_id] = None, this_path
                    elif minikind == 'r':
                        # relocation, must occur at expected location
                        this_tree_map[file_id] = tree_state[1], this_path
                    else:
                        this_tree_map[file_id] = this_path, this_path
                        check_valid_parent()
            if absent_positions == tree_count:
                raise AssertionError(
                    "entry %r has no data for any tree." % (entry,))

    def _wipe_state(self):
        """Forget all state information about the dirstate."""
        self._header_state = DirState.NOT_IN_MEMORY
        self._dirblock_state = DirState.NOT_IN_MEMORY
        self._changes_aborted = False
        self._parents = []
        self._ghosts = []
        self._dirblocks = []
        self._id_index = None
        self._packed_stat_index = None
        self._end_of_header = None
        self._cutoff_time = None
        self._split_path_cache = {}

    def lock_read(self):
        """Acquire a read lock on the dirstate."""
        if self._lock_token is not None:
            raise errors.LockContention(self._lock_token)
        # TODO: jam 20070301 Rather than wiping completely, if the blocks are
        #       already in memory, we could read just the header and check for
        #       any modification. If not modified, we can just leave things
        #       alone.
        self._lock_token = lock.ReadLock(self._filename)
        self._lock_state = 'r'
        self._state_file = self._lock_token.f
        self._wipe_state()

    def lock_write(self):
        """Acquire a write lock on the dirstate."""
        if self._lock_token is not None:
            raise errors.LockContention(self._lock_token)
        # TODO: jam 20070301 Rather than wiping completely, if the blocks are
        #       already in memory, we could read just the header and check for
        #       any modification. If not modified, we can just leave things
        #       alone.
        self._lock_token = lock.WriteLock(self._filename)
        self._lock_state = 'w'
        self._state_file = self._lock_token.f
        self._wipe_state()

"""Drop any locks held on the dirstate."""
2778
if self._lock_token is None:
2779
raise errors.LockNotHeld(self)
2780
# TODO: jam 20070301 Rather than wiping completely, if the blocks are
2781
# already in memory, we could read just the header and check for
2782
# any modification. If not modified, we can just leave things
2784
self._state_file = None
2785
self._lock_state = None
2786
self._lock_token.unlock()
2787
self._lock_token = None
2788
self._split_path_cache = {}
2790
    def _requires_lock(self):
        """Check that a lock is currently held by someone on the dirstate."""
        if not self._lock_token:
            raise errors.ObjectNotLocked(self)


def py_update_entry(state, entry, abspath, stat_value,
                 _stat_to_minikind=DirState._stat_to_minikind,
                 _pack_stat=pack_stat):
    """Update the entry based on what is actually on disk.

    This function only calculates the sha if it needs to - if the entry is
    uncachable, or clearly different to the first parent's entry, no sha
    is calculated, and None is returned.

    :param state: The dirstate this entry is in.
    :param entry: This is the dirblock entry for the file in question.
    :param abspath: The path on disk for this file.
    :param stat_value: The stat value done on the path.
    :return: None, or The sha1 hexdigest of the file (40 bytes) or link
        target of a symlink.
    """
    try:
        minikind = _stat_to_minikind[stat_value.st_mode & 0170000]
    except KeyError:
        # Unhandled kind
        return None
    packed_stat = _pack_stat(stat_value)
    (saved_minikind, saved_link_or_sha1, saved_file_size,
     saved_executable, saved_packed_stat) = entry[1][0]

    if (minikind == saved_minikind
        and packed_stat == saved_packed_stat):
        # The stat hasn't changed since we saved, so we can re-use the
        # saved sha hash.
        if minikind == 'd':
            return None

        # size should also be in packed_stat
        if saved_file_size == stat_value.st_size:
            return saved_link_or_sha1

    # If we have gotten this far, that means that we need to actually
    # process this entry.
    link_or_sha1 = None
    if minikind == 'f':
        executable = state._is_executable(stat_value.st_mode,
                                          saved_executable)
        if state._cutoff_time is None:
            state._sha_cutoff_time()
        if (stat_value.st_mtime < state._cutoff_time
            and stat_value.st_ctime < state._cutoff_time
            and len(entry[1]) > 1
            and entry[1][1][0] != 'a'):
            # Could check for size changes for further optimised
            # avoidance of sha1's. However the most prominent case of
            # over-shaing is during initial add, which this catches.
            # Besides, if content filtering happens, size and sha
            # need to be checked together - checking just the size
            # is not enough.
            if state._filter_provider is None:
                filter_list = []
            else:
                relpath = osutils.pathjoin(entry[0][0], entry[0][1])
                file_id = entry[0][2]
                filter_list = state._filter_provider(relpath, file_id)
            link_or_sha1 = state._size_sha1_file(abspath, filter_list)[1]
            entry[1][0] = ('f', link_or_sha1, stat_value.st_size,
                           executable, packed_stat)
        else:
            entry[1][0] = ('f', '', stat_value.st_size,
                           executable, DirState.NULLSTAT)
    elif minikind == 'd':
        link_or_sha1 = None
        entry[1][0] = ('d', '', 0, False, packed_stat)
        if saved_minikind != 'd':
            # This changed from something into a directory. Make sure we
            # have a directory block for it. This doesn't happen very
            # often, so this doesn't have to be super fast.
            block_index, entry_index, dir_present, file_present = \
                state._get_block_entry_index(entry[0][0], entry[0][1], 0)
            state._ensure_block(block_index, entry_index,
                                osutils.pathjoin(entry[0][0], entry[0][1]))
    elif minikind == 'l':
        link_or_sha1 = state._read_link(abspath, saved_link_or_sha1)
        if state._cutoff_time is None:
            state._sha_cutoff_time()
        if (stat_value.st_mtime < state._cutoff_time
            and stat_value.st_ctime < state._cutoff_time):
            entry[1][0] = ('l', link_or_sha1, stat_value.st_size,
                           False, packed_stat)
        else:
            entry[1][0] = ('l', '', stat_value.st_size,
                           False, DirState.NULLSTAT)
    state._dirblock_state = DirState.IN_MEMORY_MODIFIED
    return link_or_sha1

update_entry = py_update_entry

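# Illustrative note (not part of the original source): update_entry is bound
# to the pure-Python implementation here; a compiled replacement, where one
# is available, would be assigned over this name in the same way, e.g.
#   update_entry = compiled_update_entry  # hypothetical accelerated version
# so all callers in this module pick up whichever binding is in effect.
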
class ProcessEntryPython(object):

    __slots__ = ["old_dirname_to_file_id", "new_dirname_to_file_id", "uninteresting",
        "last_source_parent", "last_target_parent", "include_unchanged",
        "use_filesystem_for_exec", "utf8_decode", "searched_specific_files",
        "search_specific_files", "state", "source_index", "target_index",
        "want_unversioned", "tree"]

    def __init__(self, include_unchanged, use_filesystem_for_exec,
        search_specific_files, state, source_index, target_index,
        want_unversioned, tree):
        self.old_dirname_to_file_id = {}
        self.new_dirname_to_file_id = {}
        # Just a sentry, so that _process_entry can say that this
        # record is handled, but isn't interesting to process (unchanged)
        self.uninteresting = object()
        # Using a list so that we can access the values and change them in
        # nested scope. Each one is [path, file_id, entry]
        self.last_source_parent = [None, None]
        self.last_target_parent = [None, None]
        self.include_unchanged = include_unchanged
        self.use_filesystem_for_exec = use_filesystem_for_exec
        self.utf8_decode = cache_utf8._utf8_decode
        # for all search indexes in each path at or under each element of
        # search_specific_files, if the detail is relocated: add the id, and add the
        # relocated path as one to search if it's not searched already. If the
        # detail is not relocated, add the id.
        self.searched_specific_files = set()
        self.search_specific_files = search_specific_files
        self.state = state
        self.source_index = source_index
        self.target_index = target_index
        self.want_unversioned = want_unversioned
        self.tree = tree

def _process_entry(self, entry, path_info, pathjoin=osutils.pathjoin):
2925
"""Compare an entry and real disk to generate delta information.
2927
:param path_info: top_relpath, basename, kind, lstat, abspath for
2928
the path of entry. If None, then the path is considered absent.
2929
(Perhaps we should pass in a concrete entry for this ?)
2930
Basename is returned as a utf8 string because we expect this
2931
tuple will be ignored, and don't want to take the time to
2933
:return: None if these don't match
2934
A tuple of information about the change, or
2935
the object 'uninteresting' if these match, but are
2936
basically identical.
2938
if self.source_index is None:
2939
source_details = DirState.NULL_PARENT_DETAILS
2941
source_details = entry[1][self.source_index]
2942
target_details = entry[1][self.target_index]
2943
target_minikind = target_details[0]
2944
if path_info is not None and target_minikind in 'fdlt':
2945
if not (self.target_index == 0):
2946
raise AssertionError()
2947
link_or_sha1 = update_entry(self.state, entry,
2948
abspath=path_info[4], stat_value=path_info[3])
2949
# The entry may have been modified by update_entry
2950
target_details = entry[1][self.target_index]
2951
target_minikind = target_details[0]
2954
file_id = entry[0][2]
2955
source_minikind = source_details[0]
2956
if source_minikind in 'fdltr' and target_minikind in 'fdlt':
2957
# claimed content in both: diff
2958
# r | fdlt | | add source to search, add id path move and perform
2959
# | | | diff check on source-target
2960
# r | fdlt | a | dangling file that was present in the basis.
2962
if source_minikind in 'r':
2963
# add the source to the search path to find any children it
2964
# has. TODO ? : only add if it is a container ?
2965
if not osutils.is_inside_any(self.searched_specific_files,
2967
self.search_specific_files.add(source_details[1])
2968
# generate the old path; this is needed for stating later
2970
old_path = source_details[1]
2971
old_dirname, old_basename = os.path.split(old_path)
2972
path = pathjoin(entry[0][0], entry[0][1])
2973
old_entry = self.state._get_entry(self.source_index,
2975
# update the source details variable to be the real
2977
if old_entry == (None, None):
2978
raise errors.CorruptDirstate(self.state._filename,
2979
"entry '%s/%s' is considered renamed from %r"
2980
" but source does not exist\n"
2981
"entry: %s" % (entry[0][0], entry[0][1], old_path, entry))
2982
source_details = old_entry[1][self.source_index]
2983
source_minikind = source_details[0]
2985
old_dirname = entry[0][0]
2986
old_basename = entry[0][1]
2987
old_path = path = None
2988
if path_info is None:
2989
# the file is missing on disk, show as removed.
2990
content_change = True
2994
# source and target are both versioned and disk file is present.
2995
target_kind = path_info[2]
2996
if target_kind == 'directory':
2998
old_path = path = pathjoin(old_dirname, old_basename)
2999
self.new_dirname_to_file_id[path] = file_id
3000
if source_minikind != 'd':
3001
content_change = True
3003
# directories have no fingerprint
3004
content_change = False
3006
elif target_kind == 'file':
3007
if source_minikind != 'f':
3008
content_change = True
3010
# If the size is the same, check the sha:
3011
if target_details[2] == source_details[2]:
3012
if link_or_sha1 is None:
3014
file_obj = file(path_info[4], 'rb')
3016
statvalue = os.fstat(file_obj.fileno())
3017
link_or_sha1 = osutils.sha_file(file_obj)
3020
self.state._observed_sha1(entry, link_or_sha1,
3022
content_change = (link_or_sha1 != source_details[1])
3024
# Size changed, so must be different
3025
content_change = True
3026
# Target details is updated at update_entry time
3027
if self.use_filesystem_for_exec:
3028
# We don't need S_ISREG here, because we are sure
3029
# we are dealing with a file.
3030
target_exec = bool(stat.S_IEXEC & path_info[3].st_mode)
3032
target_exec = target_details[3]
3033
elif target_kind == 'symlink':
3034
if source_minikind != 'l':
3035
content_change = True
3037
content_change = (link_or_sha1 != source_details[1])
3039
elif target_kind == 'tree-reference':
3040
if source_minikind != 't':
3041
content_change = True
3043
content_change = False
3046
raise Exception, "unknown kind %s" % path_info[2]
3047
            if source_minikind == 'd':
                if path is None:
                    old_path = path = pathjoin(old_dirname, old_basename)
                self.old_dirname_to_file_id[old_path] = file_id
            # parent id is the entry for the path in the target tree
            if old_dirname == self.last_source_parent[0]:
                source_parent_id = self.last_source_parent[1]
            else:
                try:
                    source_parent_id = self.old_dirname_to_file_id[old_dirname]
                except KeyError:
                    source_parent_entry = self.state._get_entry(self.source_index,
                                                                path_utf8=old_dirname)
                    source_parent_id = source_parent_entry[0][2]
                if source_parent_id == entry[0][2]:
                    # This is the root, so the parent is None
                    source_parent_id = None
                else:
                    self.last_source_parent[0] = old_dirname
                    self.last_source_parent[1] = source_parent_id
            new_dirname = entry[0][0]
            if new_dirname == self.last_target_parent[0]:
                target_parent_id = self.last_target_parent[1]
            else:
                try:
                    target_parent_id = self.new_dirname_to_file_id[new_dirname]
                except KeyError:
                    # TODO: We don't always need to do the lookup, because the
                    #       parent entry will be the same as the source entry.
                    target_parent_entry = self.state._get_entry(self.target_index,
                                                                path_utf8=new_dirname)
                    if target_parent_entry == (None, None):
                        raise AssertionError(
                            "Could not find target parent in wt: %s\nparent of: %s"
                            % (new_dirname, entry))
                    target_parent_id = target_parent_entry[0][2]
                if target_parent_id == entry[0][2]:
                    # This is the root, so the parent is None
                    target_parent_id = None
                else:
                    self.last_target_parent[0] = new_dirname
                    self.last_target_parent[1] = target_parent_id
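            # Everything about the change is known now; emit a result tuple
            # unless the entry is unchanged and unchanged entries were not
            # requested.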
            source_exec = source_details[3]
            if (self.include_unchanged
                or content_change
                or source_parent_id != target_parent_id
                or old_basename != entry[0][1]
                or source_exec != target_exec
                ):
                if old_path is None:
                    old_path = path = pathjoin(old_dirname, old_basename)
                    old_path_u = self.utf8_decode(old_path)[0]
                    path_u = old_path_u
                else:
                    old_path_u = self.utf8_decode(old_path)[0]
                    if old_path == path:
                        path_u = old_path_u
                    else:
                        path_u = self.utf8_decode(path)[0]
                source_kind = DirState._minikind_to_kind[source_minikind]
                return (entry[0][2],
                       (old_path_u, path_u),
                       content_change,
                       (True, True),
                       (source_parent_id, target_parent_id),
                       (self.utf8_decode(old_basename)[0], self.utf8_decode(entry[0][1])[0]),
                       (source_kind, target_kind),
                       (source_exec, target_exec))
            else:
                return self.uninteresting
        elif source_minikind in 'a' and target_minikind in 'fdlt':
            # looks like a new file
            path = pathjoin(entry[0][0], entry[0][1])
            # parent id is the entry for the path in the target tree
            # TODO: these are the same for an entire directory: cache em.
            parent_id = self.state._get_entry(self.target_index,
                                              path_utf8=entry[0][0])[0][2]
            if parent_id == entry[0][2]:
                parent_id = None
            if path_info is not None:
                # Present on disk:
                if self.use_filesystem_for_exec:
                    # We need S_ISREG here, because we aren't sure if this
                    # is a file or not.
                    target_exec = bool(
                        stat.S_ISREG(path_info[3].st_mode)
                        and stat.S_IEXEC & path_info[3].st_mode)
                else:
                    target_exec = target_details[3]
                return (entry[0][2],
                       (None, self.utf8_decode(path)[0]),
                       True,
                       (False, True),
                       (None, parent_id),
                       (None, self.utf8_decode(entry[0][1])[0]),
                       (None, path_info[2]),
                       (None, target_exec))
            else:
                # It's a missing file, report it as such.
                return (entry[0][2],
                       (None, self.utf8_decode(path)[0]),
                       False,
                       (False, True),
                       (None, parent_id),
                       (None, self.utf8_decode(entry[0][1])[0]),
                       (None, None),
                       (None, False))
        elif source_minikind in 'fdlt' and target_minikind in 'a':
            # unversioned, possibly, or possibly not deleted: we don't care.
            # if it's still on disk, *and* there's no other entry at this
            # path [we don't know this in this routine at the moment -
            # perhaps we should change this] - then it would be an unknown.
            old_path = pathjoin(entry[0][0], entry[0][1])
            # parent id is the entry for the path in the target tree
            parent_id = self.state._get_entry(self.source_index,
                                              path_utf8=entry[0][0])[0][2]
            if parent_id == entry[0][2]:
                parent_id = None
            return (entry[0][2],
                   (self.utf8_decode(old_path)[0], None),
                   True,
                   (True, False),
                   (parent_id, None),
                   (self.utf8_decode(entry[0][1])[0], None),
                   (DirState._minikind_to_kind[source_minikind], None),
                   (source_details[3], None))
        elif source_minikind in 'fdlt' and target_minikind in 'r':
            # a rename; could be a true rename, or a rename inherited from
            # a renamed parent. TODO: handle this efficiently. It's not a
            # common case to rename dirs though, so a correct but slow
            # implementation will do.
            if not osutils.is_inside_any(self.searched_specific_files,
                                         target_details[1]):
                self.search_specific_files.add(target_details[1])
        elif source_minikind in 'ra' and target_minikind in 'ra':
            # neither of the selected trees contain this file,
            # so skip over it. This is not currently directly tested, but
            # is indirectly via test_too_much.TestCommands.test_conflicts.
            pass
        else:
            raise AssertionError("don't know how to compare "
                "source_minikind=%r, target_minikind=%r"
                % (source_minikind, target_minikind))
            ## import pdb;pdb.set_trace()
        return None

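    # Note on the returns above: _process_entry produces three kinds of
    # result - a full iter_changes tuple when there is something to report,
    # the self.uninteresting sentinel for a matched-but-unchanged entry that
    # the caller can filter cheaply, and None when the entry contributes
    # nothing to this comparison.
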
    def iter_changes(self):
        """Iterate over the changes."""
        utf8_decode = cache_utf8._utf8_decode
        _cmp_by_dirs = cmp_by_dirs
        _process_entry = self._process_entry
        uninteresting = self.uninteresting
        search_specific_files = self.search_specific_files
        searched_specific_files = self.searched_specific_files
        splitpath = osutils.splitpath
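
        # Each result yielded below has the shape used by the returns in
        # _process_entry above:
        # (file_id, (source_path, target_path), changed_content,
        #  (source_versioned, target_versioned),
        #  (source_parent_id, target_parent_id), (source_name, target_name),
        #  (source_kind, target_kind), (source_exec, target_exec)),
        # with None standing in for "not present in that tree".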

        # compare source_index and target_index at or under each element of search_specific_files.
        # follow the following comparison table. Note that we only want to do diff operations when
        # the target is fdl because that's when the walkdirs logic will have exposed the pathinfo
        # for the target.
        # cases:
        #
        # Source | Target | disk | action
        #   r    | fdlt   |      | add source to search, add id path move and perform
        #        |        |      | diff check on source-target
        #   r    | fdlt   |  a   | dangling file that was present in the basis.
        #        |        |      | ???
        #   r    |  a     |      | add source to search
        #   r    |  a     |  a   |
        #   r    |  r     |      | this path is present in a non-examined tree, skip.
        #   r    |  r     |  a   | this path is present in a non-examined tree, skip.
        #   a    | fdlt   |      | add new id
        #   a    | fdlt   |  a   | dangling locally added file, skip
        #   a    |  a     |      | not present in either tree, skip
        #   a    |  a     |  a   | not present in any tree, skip
        #   a    |  r     |      | not present in either tree at this path, skip as it
        #        |        |      | may not be selected by the user's list of paths.
        #   a    |  r     |  a   | not present in either tree at this path, skip as it
        #        |        |      | may not be selected by the user's list of paths.
        # fdlt   | fdlt   |      | content in both: diff them
        # fdlt   | fdlt   |  a   | deleted locally, but not unversioned - show as deleted ?
        # fdlt   |  a     |      | unversioned: output deleted id for now
        # fdlt   |  a     |  a   | unversioned and deleted: output deleted id
        # fdlt   |  r     |      | relocated in this tree, so add target to search.
        #        |        |      | Don't diff, we will see an r,fd; pair when we reach
        #        |        |      | this id at the other path.
        # fdlt   |  r     |  a   | relocated in this tree, so add target to search.
        #        |        |      | Don't diff, we will see an r,fd; pair when we reach
        #        |        |      | this id at the other path.

        # TODO: jam 20070516 - Avoid the _get_entry lookup overhead by
        #       keeping a cache of directories that we have seen.
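        # Reading the table: for instance, "fdlt | a" covers an id that is
        # present in the source tree but absent from the target, so a
        # deletion is emitted for it, while "a | fdlt" is a newly added id.
        # The disk column only matters where the walkdirs scan can supply a
        # stat value for the path.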

        while search_specific_files:
            # TODO: the pending list should be lexically sorted?  the
            #       interface doesn't require it.
            current_root = search_specific_files.pop()
            current_root_unicode = current_root.decode('utf8')
            searched_specific_files.add(current_root)
            # process the entries for this containing directory: the rest will be
            # found by their parents recursively.
            root_entries = self.state._entries_for_path(current_root)
            root_abspath = self.tree.abspath(current_root_unicode)
            try:
                root_stat = os.lstat(root_abspath)
            except OSError, e:
                if e.errno == errno.ENOENT:
                    # the path does not exist: let _process_entry know that.
                    root_dir_info = None
                else:
                    # some other random error: hand it up.
                    raise
            else:
                root_dir_info = ('', current_root,
                    osutils.file_kind_from_stat_mode(root_stat.st_mode), root_stat,
                    root_abspath)
                if root_dir_info[2] == 'directory':
                    if self.tree._directory_is_tree_reference(
                        current_root.decode('utf8')):
                        root_dir_info = root_dir_info[:2] + \
                            ('tree-reference',) + root_dir_info[3:]
            if not root_entries and not root_dir_info:
                # this specified path is not present at all, skip it.
                continue
            path_handled = False
            for entry in root_entries:
                result = _process_entry(entry, root_dir_info)
                if result is not None:
                    path_handled = True
                    if result is not uninteresting:
                        yield result
            if self.want_unversioned and not path_handled and root_dir_info:
                new_executable = bool(
                    stat.S_ISREG(root_dir_info[3].st_mode)
                    and stat.S_IEXEC & root_dir_info[3].st_mode)
                yield (None,
                       (None, current_root_unicode),
                       True,
                       (False, False),
                       (None, None),
                       (None, splitpath(current_root_unicode)[-1]),
                       (None, root_dir_info[2]),
                       (None, new_executable)
                      )
            initial_key = (current_root, '', '')
            block_index, _ = self.state._find_block_index_from_key(initial_key)
            if block_index == 0:
                # we have processed the total root already, but because the
                # initial key matched it we should skip it here.
                block_index += 1
            if root_dir_info and root_dir_info[2] == 'tree-reference':
                current_dir_info = None
            else:
                dir_iterator = osutils._walkdirs_utf8(root_abspath, prefix=current_root)
                try:
                    current_dir_info = dir_iterator.next()
                except OSError, e:
                    # on win32, python2.4 has e.errno == ERROR_DIRECTORY, but
                    # python 2.5 has e.errno == EINVAL,
                    #            and e.winerror == ERROR_DIRECTORY
                    e_winerror = getattr(e, 'winerror', None)
                    win_errors = (ERROR_DIRECTORY, ERROR_PATH_NOT_FOUND)
                    # there may be directories in the inventory even though
                    # this path is not a file on disk: so mark it as end of
                    # iterator
                    if e.errno in (errno.ENOENT, errno.ENOTDIR, errno.EINVAL):
                        current_dir_info = None
                    elif (sys.platform == 'win32'
                          and (e.errno in win_errors
                               or e_winerror in win_errors)):
                        current_dir_info = None
                    else:
                        raise
                else:
                    if current_dir_info[0][0] == '':
                        # remove .bzr from iteration
                        bzr_index = bisect.bisect_left(current_dir_info[1], ('.bzr',))
                        if current_dir_info[1][bzr_index][0] != '.bzr':
                            raise AssertionError()
                        del current_dir_info[1][bzr_index]
            # walk until both the directory listing and the versioned metadata
            # are exhausted.
            if (block_index < len(self.state._dirblocks) and
                osutils.is_inside(current_root, self.state._dirblocks[block_index][0])):
                current_block = self.state._dirblocks[block_index]
            else:
                current_block = None
            while (current_dir_info is not None or
                   current_block is not None):
                if (current_dir_info and current_block
                    and current_dir_info[0][0] != current_block[0]):
                    if _cmp_by_dirs(current_dir_info[0][0], current_block[0]) < 0:
                        # filesystem data refers to paths not covered by the dirblock.
                        # this has two possibilities:
                        # A) it is versioned but empty, so there is no block for it
                        # B) it is not versioned.

                        # if (A) then we need to recurse into it to check for
                        # new unknown files or directories.
                        # if (B) then we should ignore it, because we don't
                        # recurse into unknown directories.
                        path_index = 0
                        while path_index < len(current_dir_info[1]):
                            current_path_info = current_dir_info[1][path_index]
                            if self.want_unversioned:
                                if current_path_info[2] == 'directory':
                                    if self.tree._directory_is_tree_reference(
                                        current_path_info[0].decode('utf8')):
                                        current_path_info = current_path_info[:2] + \
                                            ('tree-reference',) + current_path_info[3:]
                                new_executable = bool(
                                    stat.S_ISREG(current_path_info[3].st_mode)
                                    and stat.S_IEXEC & current_path_info[3].st_mode)
                                yield (None,
                                    (None, utf8_decode(current_path_info[0])[0]),
                                    True,
                                    (False, False),
                                    (None, None),
                                    (None, utf8_decode(current_path_info[1])[0]),
                                    (None, current_path_info[2]),
                                    (None, new_executable))
                            # don't descend into this unversioned path if it is
                            # a dir
                            if current_path_info[2] in ('directory',
                                                        'tree-reference'):
                                del current_dir_info[1][path_index]
                                path_index -= 1
                            path_index += 1

                        # This dir info has been handled, go to the next
                        try:
                            current_dir_info = dir_iterator.next()
                        except StopIteration:
                            current_dir_info = None
                    else:
                        # We have a dirblock entry for this location, but there
                        # is no filesystem path for this. This is most likely
                        # because a directory was removed from the disk.
                        # We don't have to report the missing directory,
                        # because that should have already been handled, but we
                        # need to handle all of the files that are contained
                        # within.
                        for current_entry in current_block[1]:
                            # entry referring to file not present on disk.
                            # advance the entry only, after processing.
                            result = _process_entry(current_entry, None)
                            if result is not None:
                                if result is not uninteresting:
                                    yield result
                        block_index += 1
                        if (block_index < len(self.state._dirblocks) and
                            osutils.is_inside(current_root,
                                              self.state._dirblocks[block_index][0])):
                            current_block = self.state._dirblocks[block_index]
                        else:
                            current_block = None
                    continue
                entry_index = 0
                if current_block and entry_index < len(current_block[1]):
                    current_entry = current_block[1][entry_index]
                else:
                    current_entry = None
                advance_entry = True
                path_index = 0
                if current_dir_info and path_index < len(current_dir_info[1]):
                    current_path_info = current_dir_info[1][path_index]
                    if current_path_info[2] == 'directory':
                        if self.tree._directory_is_tree_reference(
                            current_path_info[0].decode('utf8')):
                            current_path_info = current_path_info[:2] + \
                                ('tree-reference',) + current_path_info[3:]
                else:
                    current_path_info = None
                advance_path = True
                path_handled = False
                while (current_entry is not None or
                       current_path_info is not None):
                    if current_entry is None:
                        # the check for path_handled when the path is advanced
                        # will yield this path if needed.
                        pass
                    elif current_path_info is None:
                        # no path is fine: the per entry code will handle it.
                        result = _process_entry(current_entry, current_path_info)
                        if result is not None:
                            if result is not uninteresting:
                                yield result
                    elif (current_entry[0][1] != current_path_info[1]
                          or current_entry[1][self.target_index][0] in 'ar'):
                        # The current path on disk doesn't match the dirblock
                        # record. Either the dirblock is marked as absent, or
                        # the file on disk is not present at all in the
                        # dirblock. Either way, report about the dirblock
                        # entry, and let other code handle the filesystem one.

                        # Compare the basename for these files to determine
                        # which comes first
                        if current_path_info[1] < current_entry[0][1]:
                            # extra file on disk: pass for now, but only
                            # increment the path, not the entry
                            advance_entry = False
                        else:
                            # entry referring to file not present on disk.
                            # advance the entry only, after processing.
                            result = _process_entry(current_entry, None)
                            if result is not None:
                                if result is not uninteresting:
                                    yield result
                            advance_path = False
                    else:
                        result = _process_entry(current_entry, current_path_info)
                        if result is not None:
                            path_handled = True
                            if result is not uninteresting:
                                yield result
                    if advance_entry and current_entry is not None:
                        entry_index += 1
                        if entry_index < len(current_block[1]):
                            current_entry = current_block[1][entry_index]
                        else:
                            current_entry = None
                    else:
                        advance_entry = True # reset the advance flag
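                    # The entry side has advanced; now advance the disk-path
                    # side of the join, reporting any path that no dirblock
                    # entry claimed as unversioned.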
                    if advance_path and current_path_info is not None:
                        if not path_handled:
                            # unversioned in all regards
                            if self.want_unversioned:
                                new_executable = bool(
                                    stat.S_ISREG(current_path_info[3].st_mode)
                                    and stat.S_IEXEC & current_path_info[3].st_mode)
                                try:
                                    relpath_unicode = utf8_decode(current_path_info[0])[0]
                                except UnicodeDecodeError:
                                    raise errors.BadFilenameEncoding(
                                        current_path_info[0], osutils._fs_enc)
                                yield (None,
                                    (None, relpath_unicode),
                                    True,
                                    (False, False),
                                    (None, None),
                                    (None, utf8_decode(current_path_info[1])[0]),
                                    (None, current_path_info[2]),
                                    (None, new_executable))
                            # don't descend into this unversioned path if it is
                            # a dir
                            if current_path_info[2] in ('directory',):
                                del current_dir_info[1][path_index]
                                path_index -= 1
                        # don't descend the disk iterator into any tree
                        # paths.
                        if current_path_info[2] == 'tree-reference':
                            del current_dir_info[1][path_index]
                            path_index -= 1
                        path_index += 1
                        if path_index < len(current_dir_info[1]):
                            current_path_info = current_dir_info[1][path_index]
                            if current_path_info[2] == 'directory':
                                if self.tree._directory_is_tree_reference(
                                    current_path_info[0].decode('utf8')):
                                    current_path_info = current_path_info[:2] + \
                                        ('tree-reference',) + current_path_info[3:]
                        else:
                            current_path_info = None
                        path_handled = False
                    else:
                        advance_path = True # reset the advance flag
                if current_block is not None:
                    block_index += 1
                    if (block_index < len(self.state._dirblocks) and
                        osutils.is_inside(current_root, self.state._dirblocks[block_index][0])):
                        current_block = self.state._dirblocks[block_index]
                    else:
                        current_block = None
                if current_dir_info is not None:
                    try:
                        current_dir_info = dir_iterator.next()
                    except StopIteration:
                        current_dir_info = None

_process_entry = ProcessEntryPython


# Try to load the compiled form if possible
try:
    from bzrlib._dirstate_helpers_c import (
        _read_dirblocks_c as _read_dirblocks,
        bisect_dirblock_c as bisect_dirblock,
        _bisect_path_left_c as _bisect_path_left,
        _bisect_path_right_c as _bisect_path_right,
        cmp_by_dirs_c as cmp_by_dirs,
        ProcessEntryC as _process_entry,
        update_entry as update_entry,
        )
except ImportError:
    from bzrlib._dirstate_helpers_py import (
        _read_dirblocks_py as _read_dirblocks,
        bisect_dirblock_py as bisect_dirblock,
        _bisect_path_left_py as _bisect_path_left,
        _bisect_path_right_py as _bisect_path_right,
        cmp_by_dirs_py as cmp_by_dirs,
        )
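
# Callers bind to the module-level names (_read_dirblocks, bisect_dirblock,
# cmp_by_dirs, _process_entry, ...), so the same calling code transparently
# gets the compiled helpers when the C extension is available and the pure
# Python implementations otherwise.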