# Copyright (C) 2006, 2007, 2008 Canonical Ltd
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

"""DirState objects record the state of a directory and its bzr metadata.

Pseudo EBNF grammar for the state file. Fields are separated by NULLs, and
lines by NL. The field delimiters are omitted in the grammar, line delimiters
are not - this is done for clarity of reading. All string data is in utf8.

MINIKIND = "f" | "d" | "l" | "a" | "r" | "t";
NL = "\n";
NULL = "\0";
WHOLE_NUMBER = {digit}, digit;
BOOLEAN = "y" | "n";
REVISION_ID = a non-empty utf8 string;

dirstate format = header line, full checksum, row count, parent details,
 ghost_details, entries;
header line = "#bazaar dirstate flat format 3", NL;
full checksum = "crc32: ", ["-"], WHOLE_NUMBER, NL;
row count = "num_entries: ", WHOLE_NUMBER, NL;
parent_details = WHOLE NUMBER, {REVISION_ID}*, NL;
ghost_details = WHOLE NUMBER, {REVISION_ID}*, NL;
entries = {entry};
entry = entry_key, current_entry_details, {parent_entry_details};
entry_key = dirname, basename, fileid;
current_entry_details = common_entry_details, working_entry_details;
parent_entry_details = common_entry_details, history_entry_details;
common_entry_details = MINIKIND, fingerprint, size, executable;
working_entry_details = packed_stat;
history_entry_details = REVISION_ID;
executable = BOOLEAN;
size = WHOLE_NUMBER;
fingerprint = a nonempty utf8 sequence with meaning defined by minikind.

Given this definition, the following is useful to know:
entry (aka row) - all the data for a given key.
entry[0]: The key (dirname, basename, fileid)
entry[0][0]: dirname
entry[0][1]: basename
entry[0][2]: fileid
entry[1]: The tree(s) data for this path and id combination.
entry[1][0]: The current tree
entry[1][1]: The second tree

For an entry for a tree, we have (using tree 0 - current tree) to demonstrate:
entry[1][0][0]: minikind
entry[1][0][1]: fingerprint
entry[1][0][2]: size
entry[1][0][3]: executable
entry[1][0][4]: packed_stat
OR (for non tree-0):
entry[1][1][4]: revision_id
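
For illustration (made-up values, not from the original docs), an entry for
a file with one parent tree might be:

    ('subdir', 'foo.txt', 'foo-id'), [
        ('f', '<sha1>', 42, False, '<packed_stat>'),   # tree 0, working tree
        ('f', '<sha1>', 42, False, '<revision-id>'),   # tree 1, basis
        ]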

There may be multiple rows at the root, one per id present in the root, so the
in memory root row is now:
self._dirblocks[0] -> ('', [entry ...]),
and the entries in there are
entries[0][0]: ''
entries[0][1]: ''
entries[0][2]: file_id
entries[1][0]: The tree data for the current tree for this fileid at /
etc.

'r' is a relocated entry: This path is not present in this tree with this id,
 but the id can be found at another location. The fingerprint is used to
 point to the target location.
'a' is an absent entry: In that tree the id is not present at this path.
'd' is a directory entry: This path in this tree is a directory with the
 current file id. There is no fingerprint for directories.
'f' is a file entry: As for directory, but it's a file. The fingerprint is a
 sha1 value.
'l' is a symlink entry: As for directory, but a symlink. The fingerprint is the
 link target.
't' is a reference to a nested subtree; the fingerprint is the referenced
 revision.

Ordering:

The entries on disk and in memory are ordered according to the following keys:

    directory, as a list of components
    filename
    file-id

--- Format 1 had the following different definition: ---
rows = dirname, NULL, basename, NULL, MINIKIND, NULL, fileid_utf8, NULL,
    WHOLE NUMBER (* size *), NULL, packed stat, NULL, sha1|symlink target,
    {PARENT ROW}
PARENT ROW = NULL, revision_utf8, NULL, MINIKIND, NULL, dirname, NULL,
    basename, NULL, WHOLE NUMBER (* size *), NULL, "y" | "n", NULL,
    SHA1

PARENT ROW's are emitted for every parent that is not in the ghosts details
line. That is, if the parents are foo, bar, baz, and the ghosts are bar, then
each row will have a PARENT ROW for foo and baz, but not for bar.

In any tree, a kind of 'moved' indicates that the fingerprint field
(which we treat as opaque data specific to the 'kind' anyway) has the
details for the id of this row in that tree.

I'm strongly tempted to add an id->path index as well, but I think that
where we need id->path mapping; we also usually read the whole file, so
I'm going to skip that for the moment, as we have the ability to locate
via bisect any path in any tree, and if we lookup things by path, we can
accumulate an id->path mapping as we go, which will tend to match what we
asked for.

I plan to implement this asap, so please speak up now to alter/tweak the
design - and once we stabilise on this, I'll update the wiki page for
it.

The rationale for all this is that we want fast operations for the
common case (diff/status/commit/merge on all files) and extremely fast
operations for the less common but still frequent case (status/diff/commit
on specific files). Operations on specific files involve a scan for all
the children of a path, *in every involved tree*, which the current
format did not accommodate.
----

Design priorities:
 1) Fast end to end use for bzr's top 5 use cases. (commit/diff/status/merge/???)
 2) fall back to the current object model as needed.
 3) scale usably to the largest trees known today - say 50K entries. (mozilla
    is an example of this)

Locking:

 Eventually reuse dirstate objects across locks IFF the dirstate file has not
 been modified, but will require that we flush/ignore cached stat-hit data
 because we won't want to restat all files on disk just because a lock was
 acquired, yet we cannot trust the data after the previous lock was released.

Memory representation:
 vector of all directories, and vector of the children ?
   i.e.
     root_entries = (direntry for root, [parent_direntries_for_root]), [
     ('', ['data for achild', 'data for bchild', 'data for cchild'])
     ('dir', ['achild', 'cchild', 'echild'])
     ]
 - single bisect to find N subtrees from a path spec
 - in-order for serialisation - this is 'dirblock' grouping.
 - insertion of a file '/a' affects only the '/' child-vector, that is, to
   insert 10K elements from scratch does not generate O(N^2) memmoves of a
   single vector; rather each individual directory vector is touched, and
   those tend to be limited to a manageable size. Will scale badly on trees
   with 10K entries in a single directory. Compare with
   Inventory.InventoryDirectory which has a dictionary for the children. No
   bisect capability, can only probe for exact matches, or grab all elements
   and sort.
 - What's the risk of error here? Once we have the base format being processed
   we should have a net win regardless of optimality. So we are going to
   go with what seems reasonable.

Open questions:

Maybe we should do a test profile of the core structure - 10K simulated
searches/lookups/etc?

Objects for each row?
The lifetime of Dirstate objects is currently per lock, but see above for
possible extensions. The lifetime of a row from a dirstate is expected to be
very short in the optimistic case, which we are optimising for. For instance,
subtree status will determine from analysis of the disk data what rows need to
be examined at all, and will be able to determine from a single row whether
that file has altered or not, so we are aiming to process tens of thousands of
entries each second within the dirstate context, before exposing anything to
the larger codebase. This suggests we want the time for a single file
comparison to be < 0.1 milliseconds. That would give us 10,000 paths per
second processed, and to scale to 100 thousand we'll need another order of
magnitude to do that. Now, as the lifetime for all unchanged entries is the
time to parse, stat the file on disk, and then immediately discard, the
overhead of object creation becomes a significant cost.

Figures: Creating a tuple from 3 elements was profiled at 0.0625
microseconds, whereas creating an object which is subclassed from tuple was
0.500 microseconds, and creating an object with 3 elements and slots was 3
microseconds long. 0.1 milliseconds is 100 microseconds, and ideally we'll get
down to 10 microseconds for the total processing - having 33% of that be object
creation is a huge overhead. There is a potential cost in using tuples within
each row which is that the conditional code to do comparisons may be slower
than method invocation, but method invocation is known to be slow due to stack
frame creation, so avoiding methods in these tight inner loops is unfortunately
desirable. We can consider a pyrex version of this with objects in future if
profiling shows this is a problem.
"""

import bisect
import binascii
import os
from stat import S_IEXEC
import stat
import struct
import sys
import time

from bzrlib import (
    cache_utf8,
    debug,
    errors,
    filters,
    osutils,
    trace,
    )


# This is the Windows equivalent of ENOTDIR
# It is defined in pywin32.winerror, but we don't want a strong dependency for
# just an error code.
ERROR_PATH_NOT_FOUND = 3
ERROR_DIRECTORY = 267


if not getattr(struct, '_compile', None):
    # Cannot pre-compile the dirstate pack_stat
    def pack_stat(st, _encode=binascii.b2a_base64, _pack=struct.pack):
        """Convert stat values into a packed representation."""
        return _encode(_pack('>LLLLLL', st.st_size, int(st.st_mtime),
            int(st.st_ctime), st.st_dev, st.st_ino & 0xFFFFFFFF,
            st.st_mode))[:-1]
else:
    # compile the struct compiler we need, so as to only do it once
    from _struct import Struct
    _compiled_pack = Struct('>LLLLLL').pack
    def pack_stat(st, _encode=binascii.b2a_base64, _pack=_compiled_pack):
        """Convert stat values into a packed representation."""
        # jam 20060614 it isn't really worth removing more entries if we
        # are going to leave it in packed form.
        # With only st_mtime and st_mode filesize is 5.5M and read time is 275ms
        # With all entries, filesize is 5.9M and read time is maybe 280ms
        # well within the noise margin

        # base64 encoding always adds a final newline, so strip it off
        # The current version
        return _encode(_pack(st.st_size, int(st.st_mtime), int(st.st_ctime),
            st.st_dev, st.st_ino & 0xFFFFFFFF, st.st_mode))[:-1]
        # This is 0.060s / 1.520s faster by not encoding as much information
        # return _encode(_pack('>LL', int(st.st_mtime), st.st_mode))[:-1]
        # This is not strictly faster than _encode(_pack())[:-1]
        # return '%X.%X.%X.%X.%X.%X' % (
        #      st.st_size, int(st.st_mtime), int(st.st_ctime),
        #      st.st_dev, st.st_ino, st.st_mode)
        # Similar to the _encode(_pack('>LL'))
        # return '%X.%X' % (int(st.st_mtime), st.st_mode)
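
# A quick sanity note (not in the original source): pack_stat packs six
# 4-byte big-endian fields (24 bytes), and base64-encoding 24 bytes yields
# exactly 32 characters once the trailing newline is stripped - which is why
# DirState.NULLSTAT below is 'x' * 32. For example:
#
#   st = os.lstat('.')
#   assert len(pack_stat(st)) == 32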


class DirState(object):
    """Record directory and metadata state for fast access.

    A dirstate is a specialised data structure for managing local working
    tree state information. It's not yet well defined whether it is platform
    specific, and if it is how we detect/parameterize that.

    Dirstates use the usual lock_write, lock_read and unlock mechanisms.
    Unlike most bzr disk formats, DirStates must be locked for reading, using
    lock_read. (This is an os file lock internally.) This is necessary
    because the file can be rewritten in place.

    DirStates must be explicitly written with save() to commit changes; just
    unlocking them does not write the changes to disk.
    """

    _kind_to_minikind = {
            'absent': 'a',
            'file': 'f',
            'directory': 'd',
            'relocated': 'r',
            'symlink': 'l',
            'tree-reference': 't',
        }
    _minikind_to_kind = {
            'a': 'absent',
            'f': 'file',
            'd': 'directory',
            'l': 'symlink',
            'r': 'relocated',
            't': 'tree-reference',
        }
    _stat_to_minikind = {
        stat.S_IFDIR: 'd',
        stat.S_IFREG: 'f',
        stat.S_IFLNK: 'l',
    }
    _to_yesno = {True:'y', False: 'n'} # TODO profile the performance gain
        # of using int conversion rather than a dict here. AND BLAME ANDREW IF
        # it is faster.

    # TODO: jam 20070221 Figure out what to do if we have a record that exceeds
    #       the BISECT_PAGE_SIZE. For now, we just have to make it large enough
    #       that we are sure a single record will always fit.
    BISECT_PAGE_SIZE = 4096

    NOT_IN_MEMORY = 0
    IN_MEMORY_UNMODIFIED = 1
    IN_MEMORY_MODIFIED = 2

    # A pack_stat (the x's) that is just noise and will never match the output
    # of base64 encode.
    NULLSTAT = 'x' * 32
    NULL_PARENT_DETAILS = ('a', '', 0, False, '')

    HEADER_FORMAT_2 = '#bazaar dirstate flat format 2\n'
    HEADER_FORMAT_3 = '#bazaar dirstate flat format 3\n'

    def __init__(self, path, content_filter_stack_provider=None):
        """Create a DirState object.

        :param path: The path at which the dirstate file on disk should live.
        :param content_filter_stack_provider: a function that takes a
            path (relative to the top of the tree) and a file-id as
            parameters and returns a stack of ContentFilters.
            If None, no content filtering is performed.
        """
        # _header_state and _dirblock_state represent the current state
        # of the dirstate metadata and the per-row data respectively.
        # NOT_IN_MEMORY indicates that no data is in memory
        # IN_MEMORY_UNMODIFIED indicates that what we have in memory
        #   is the same as is on disk
        # IN_MEMORY_MODIFIED indicates that we have a modified version
        #   of what is on disk.
        # In future we will add more granularity, for instance _dirblock_state
        # will probably support partially-in-memory as a separate variable,
        # allowing for partially-in-memory unmodified and partially-in-memory
        # modified states.
        self._header_state = DirState.NOT_IN_MEMORY
        self._dirblock_state = DirState.NOT_IN_MEMORY
        # If true, an error has been detected while updating the dirstate, and
        # for safety we're not going to commit to disk.
        self._changes_aborted = False
        self._dirblocks = []
        self._ghosts = []
        self._parents = []
        self._state_file = None
        self._filename = path
        self._lock_token = None
        self._lock_state = None
        self._id_index = None
        # a map from packed_stat to sha's.
        self._packed_stat_index = None
        self._end_of_header = None
        self._cutoff_time = None
        self._split_path_cache = {}
        self._bisect_page_size = DirState.BISECT_PAGE_SIZE
        if 'hashcache' in debug.debug_flags:
            self._size_sha1_file = self._size_sha1_file_and_mutter
        else:
            self._size_sha1_file = filters.internal_size_sha_file_byname
        # These two attributes provide a simple cache for lookups into the
        # dirstate in-memory vectors. By probing respectively for the last
        # block, and for the next entry, we save nearly 2 bisections per path
        # during commit.
        self._last_block_index = None
        self._last_entry_index = None
        # Content filtering setup
        self._filter_provider = content_filter_stack_provider

    def __repr__(self):
        return "%s(%r)" % \
            (self.__class__.__name__, self._filename)

    def add(self, path, file_id, kind, stat, fingerprint):
        """Add a path to be tracked.

        :param path: The path within the dirstate - '' is the root, 'foo' is the
            path foo within the root, 'foo/bar' is the path bar within foo
            within the root.
        :param file_id: The file id of the path being added.
        :param kind: The kind of the path, as a string like 'file',
            'directory', etc.
        :param stat: The output of os.lstat for the path.
        :param fingerprint: The sha value of the file,
            or the target of a symlink,
            or the referenced revision id for tree-references,
            or '' for directories.
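
        For example (hypothetical values)::

            state.add('foo/bar.txt', 'bar-txt-id', 'file',
                      os.lstat('foo/bar.txt'), '<sha1 of bar.txt>')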
        """
        # adding a file:
        # find the block its in.
        # find the location in the block.
        # check its not there
        # add it.
        #------- copied from inventory.ensure_normalized_name - keep synced.
        # --- normalized_filename wants a unicode basename only, so get one.
        dirname, basename = osutils.split(path)
        # we don't import normalized_filename directly because we want to be
        # able to change the implementation at runtime for tests.
        norm_name, can_access = osutils.normalized_filename(basename)
        if norm_name != basename:
            if can_access:
                basename = norm_name
            else:
                raise errors.InvalidNormalization(path)
        # you should never have files called . or ..; just add the directory
        # in the parent, or according to the special treatment for the root
        if basename == '.' or basename == '..':
            raise errors.InvalidEntryName(path)
        # now that we've normalised, we need the correct utf8 path and
        # dirname and basename elements. This single encode and split should be
        # faster than three separate encodes.
        utf8path = (dirname + '/' + basename).strip('/').encode('utf8')
        dirname, basename = osutils.split(utf8path)
        # uses __class__ for speed; the check is needed for safety
        if file_id.__class__ is not str:
            raise AssertionError(
                "must be a utf8 file_id not %s" % (type(file_id), ))
        # Make sure the file_id does not exist in this tree
        file_id_entry = self._get_entry(0, fileid_utf8=file_id)
        if file_id_entry != (None, None):
            path = osutils.pathjoin(file_id_entry[0][0], file_id_entry[0][1])
            kind = DirState._minikind_to_kind[file_id_entry[1][0][0]]
            info = '%s:%s' % (kind, path)
            raise errors.DuplicateFileId(file_id, info)
        first_key = (dirname, basename, '')
        block_index, present = self._find_block_index_from_key(first_key)
        if present:
            # check the path is not in the tree
            block = self._dirblocks[block_index][1]
            entry_index, _ = self._find_entry_index(first_key, block)
            while (entry_index < len(block) and
                block[entry_index][0][0:2] == first_key[0:2]):
                if block[entry_index][1][0][0] not in 'ar':
                    # this path is in the dirstate in the current tree.
                    raise Exception("adding already added path!")
                entry_index += 1
        else:
            # The block where we want to put the file is not present. But it
            # might be because the directory was empty, or not loaded yet. Look
            # for a parent entry, if not found, raise NotVersionedError
            parent_dir, parent_base = osutils.split(dirname)
            parent_block_idx, parent_entry_idx, _, parent_present = \
                self._get_block_entry_index(parent_dir, parent_base, 0)
            if not parent_present:
                raise errors.NotVersionedError(path, str(self))
            self._ensure_block(parent_block_idx, parent_entry_idx, dirname)
        block = self._dirblocks[block_index][1]
        entry_key = (dirname, basename, file_id)
        if stat is None:
            size = 0
            packed_stat = DirState.NULLSTAT
        else:
            size = stat.st_size
            packed_stat = pack_stat(stat)
        parent_info = self._empty_parent_info()
        minikind = DirState._kind_to_minikind[kind]
        if kind == 'file':
            entry_data = entry_key, [
                (minikind, fingerprint, size, False, packed_stat),
                ] + parent_info
        elif kind == 'directory':
            entry_data = entry_key, [
                (minikind, '', 0, False, packed_stat),
                ] + parent_info
        elif kind == 'symlink':
            entry_data = entry_key, [
                (minikind, fingerprint, size, False, packed_stat),
                ] + parent_info
        elif kind == 'tree-reference':
            entry_data = entry_key, [
                (minikind, fingerprint, 0, False, packed_stat),
                ] + parent_info
        else:
            raise errors.BzrError('unknown kind %r' % kind)
        entry_index, present = self._find_entry_index(entry_key, block)
        if not present:
            block.insert(entry_index, entry_data)
        else:
            if block[entry_index][1][0][0] != 'a':
                raise AssertionError(" %r(%r) already added" % (basename, file_id))
            block[entry_index][1][0] = entry_data[1][0]

        if kind == 'directory':
            # insert a new dirblock
            self._ensure_block(block_index, entry_index, utf8path)
        self._dirblock_state = DirState.IN_MEMORY_MODIFIED
        if self._id_index:
            self._id_index.setdefault(entry_key[2], set()).add(entry_key)

    def _bisect(self, paths):
        """Bisect through the disk structure for specific rows.

        :param paths: A list of paths to find
        :return: A dict mapping path => entries for found entries. Missing
                 entries will not be in the map.
                 The list is not sorted, and entries will be populated
                 based on when they were read.
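
        For example (a sketch with made-up values)::

            state._bisect(['a/b', 'c'])
            # => {'a/b': [entry_for_a_b], 'c': [entry_for_c]}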
        """
        self._requires_lock()
        # We need the file pointer to be right after the initial header block
        self._read_header_if_needed()
        # If _dirblock_state was in memory, we should just return info from
        # there, this function is only meant to handle when we want to read
        # from disk.
        if self._dirblock_state != DirState.NOT_IN_MEMORY:
            raise AssertionError("bad dirblock state %r" % self._dirblock_state)

        # The disk representation is generally info + '\0\n\0' at the end. But
        # for bisecting, it is easier to treat this as '\0' + info + '\0\n'
        # Because it means we can sync on the '\n'
        state_file = self._state_file
        file_size = os.fstat(state_file.fileno()).st_size
        # We end up with 2 extra fields, we should have a trailing '\n' to
        # ensure that we read the whole record, and we should have a precursor
        # '' which ensures that we start after the previous '\n'
        entry_field_count = self._fields_per_entry() + 1

        low = self._end_of_header
        high = file_size - 1 # Ignore the final '\0'
        # Map from (dir, name) => entry
        found = {}

        # Avoid infinite seeking
        max_count = 30*len(paths)
        count = 0
        # pending is a list of places to look.
        # each entry is a tuple of low, high, dir_names
        #   low -> the first byte offset to read (inclusive)
        #   high -> the last byte offset (inclusive)
        #   dir_names -> The list of (dir, name) pairs that should be found in
        #                the [low, high] range
        pending = [(low, high, paths)]

        page_size = self._bisect_page_size

        fields_to_entry = self._get_fields_to_entry()

        while pending:
            low, high, cur_files = pending.pop()

            if not cur_files or low >= high:
                # Nothing to find
                continue

            count += 1
            if count > max_count:
                raise errors.BzrError('Too many seeks, most likely a bug.')

            mid = max(low, (low+high-page_size)/2)

            state_file.seek(mid)
            # limit the read size, so we don't end up reading data that we have
            # already read.
            read_size = min(page_size, (high-mid)+1)
            block = state_file.read(read_size)

            start = mid
            entries = block.split('\n')

            if len(entries) < 2:
                # We didn't find a '\n', so we cannot have found any records.
                # So put this range back and try again. But we know we have to
                # increase the page size, because a single read did not contain
                # a record break (so records must be larger than page_size)
                page_size *= 2
                pending.append((low, high, cur_files))
                continue

            # Check the first and last entries, in case they are partial, or if
            # we don't care about the rest of this page
            first_entry_num = 0
            first_fields = entries[0].split('\0')
            if len(first_fields) < entry_field_count:
                # We didn't get the complete first entry
                # so move start, and grab the next, which
                # should be a full entry
                start += len(entries[0])+1
                first_fields = entries[1].split('\0')
                first_entry_num = 1

            if len(first_fields) <= 2:
                # We didn't even get a filename here... what do we do?
                # Try a large page size and repeat this query
                page_size *= 2
                pending.append((low, high, cur_files))
                continue
            else:
                # Find what entries we are looking for, which occur before and
                # after this first record.
                after = start
                if first_fields[1]:
                    first_path = first_fields[1] + '/' + first_fields[2]
                else:
                    first_path = first_fields[2]
                first_loc = _bisect_path_left(cur_files, first_path)

                # These exist before the current location
                pre = cur_files[:first_loc]
                # These occur after the current location, which may be in the
                # data we read, or might be after the last entry
                post = cur_files[first_loc:]

            if post and len(first_fields) >= entry_field_count:
                # We have files after the first entry

                # Parse the last entry
                last_entry_num = len(entries)-1
                last_fields = entries[last_entry_num].split('\0')
                if len(last_fields) < entry_field_count:
                    # The very last hunk was not complete,
                    # read the previous hunk
                    after = mid + len(block) - len(entries[-1])
                    last_entry_num -= 1
                    last_fields = entries[last_entry_num].split('\0')
                else:
                    after = mid + len(block)

                if last_fields[1]:
                    last_path = last_fields[1] + '/' + last_fields[2]
                else:
                    last_path = last_fields[2]
                last_loc = _bisect_path_right(post, last_path)

                middle_files = post[:last_loc]
                post = post[last_loc:]

                if middle_files:
                    # We have files that should occur in this block
                    # (>= first, <= last)
                    # Either we will find them here, or we can mark them as
                    # missing.

                    if middle_files[0] == first_path:
                        # We might need to go before this location
                        pre.append(first_path)
                    if middle_files[-1] == last_path:
                        post.insert(0, last_path)

                    # Find out what paths we have
                    paths = {first_path:[first_fields]}
                    # last_path might == first_path so we need to be
                    # careful if we should append rather than overwrite
                    if last_entry_num != first_entry_num:
                        paths.setdefault(last_path, []).append(last_fields)
                    for num in xrange(first_entry_num+1, last_entry_num):
                        # TODO: jam 20070223 We are already splitting here, so
                        #       shouldn't we just split the whole thing rather
                        #       than doing the split again in add_one_record?
                        fields = entries[num].split('\0')
                        if fields[1]:
                            path = fields[1] + '/' + fields[2]
                        else:
                            path = fields[2]
                        paths.setdefault(path, []).append(fields)

                    for path in middle_files:
                        for fields in paths.get(path, []):
                            # offset by 1 because of the opening '\0'
                            # consider changing fields_to_entry to avoid the
                            # extra blank entry
                            entry = fields_to_entry(fields[1:])
                            found.setdefault(path, []).append(entry)

            # Now we have split up everything into pre, middle, and post, and
            # we have handled everything that fell in 'middle'.
            # We add 'post' first, so that we prefer to seek towards the
            # beginning, so that we will tend to go as early as we need, and
            # then only seek forward after that.
            if post:
                pending.append((after, high, post))
            if pre:
                pending.append((low, start-1, pre))

        # Consider that we may want to return the directory entries in sorted
        # order. For now, we just return them in whatever order we found them,
        # and leave it up to the caller if they care if it is ordered or not.
        return found

    def _bisect_dirblocks(self, dir_list):
        """Bisect through the disk structure to find entries in given dirs.

        _bisect_dirblocks is meant to find the contents of directories, which
        differs from _bisect, which only finds individual entries.

        :param dir_list: A sorted list of directory names ['', 'dir', 'foo'].
        :return: A map from dir => entries_for_dir
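
        For example (a sketch)::

            state._bisect_dirblocks(['', 'a'])
            # => {'': [entries in root], 'a': [entries under 'a']}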
        """
        # TODO: jam 20070223 A lot of the bisecting logic could be shared
        #       between this and _bisect. It would require parameterizing the
        #       inner loop with a function, though. We should evaluate the
        #       performance difference.
        self._requires_lock()
        # We need the file pointer to be right after the initial header block
        self._read_header_if_needed()
        # If _dirblock_state was in memory, we should just return info from
        # there, this function is only meant to handle when we want to read
        # from disk.
        if self._dirblock_state != DirState.NOT_IN_MEMORY:
            raise AssertionError("bad dirblock state %r" % self._dirblock_state)
        # The disk representation is generally info + '\0\n\0' at the end. But
        # for bisecting, it is easier to treat this as '\0' + info + '\0\n'
        # Because it means we can sync on the '\n'
        state_file = self._state_file
        file_size = os.fstat(state_file.fileno()).st_size
        # We end up with 2 extra fields, we should have a trailing '\n' to
        # ensure that we read the whole record, and we should have a precursor
        # '' which ensures that we start after the previous '\n'
        entry_field_count = self._fields_per_entry() + 1

        low = self._end_of_header
        high = file_size - 1 # Ignore the final '\0'
        # Map from dir => entry
        found = {}

        # Avoid infinite seeking
        max_count = 30*len(dir_list)
        count = 0
        # pending is a list of places to look.
        # each entry is a tuple of low, high, dir_names
        #   low -> the first byte offset to read (inclusive)
        #   high -> the last byte offset (inclusive)
        #   dirs -> The list of directories that should be found in
        #           the [low, high] range
        pending = [(low, high, dir_list)]

        page_size = self._bisect_page_size

        fields_to_entry = self._get_fields_to_entry()

        while pending:
            low, high, cur_dirs = pending.pop()

            if not cur_dirs or low >= high:
                # Nothing to find
                continue

            count += 1
            if count > max_count:
                raise errors.BzrError('Too many seeks, most likely a bug.')

            mid = max(low, (low+high-page_size)/2)

            state_file.seek(mid)
            # limit the read size, so we don't end up reading data that we have
            # already read.
            read_size = min(page_size, (high-mid)+1)
            block = state_file.read(read_size)

            start = mid
            entries = block.split('\n')

            if len(entries) < 2:
                # We didn't find a '\n', so we cannot have found any records.
                # So put this range back and try again. But we know we have to
                # increase the page size, because a single read did not contain
                # a record break (so records must be larger than page_size)
                page_size *= 2
                pending.append((low, high, cur_dirs))
                continue

            # Check the first and last entries, in case they are partial, or if
            # we don't care about the rest of this page
            first_entry_num = 0
            first_fields = entries[0].split('\0')
            if len(first_fields) < entry_field_count:
                # We didn't get the complete first entry
                # so move start, and grab the next, which
                # should be a full entry
                start += len(entries[0])+1
                first_fields = entries[1].split('\0')
                first_entry_num = 1

            if len(first_fields) <= 1:
                # We didn't even get a dirname here... what do we do?
                # Try a large page size and repeat this query
                page_size *= 2
                pending.append((low, high, cur_dirs))
                continue
            else:
                # Find what entries we are looking for, which occur before and
                # after this first record.
                after = start
                first_dir = first_fields[1]
                first_loc = bisect.bisect_left(cur_dirs, first_dir)

                # These exist before the current location
                pre = cur_dirs[:first_loc]
                # These occur after the current location, which may be in the
                # data we read, or might be after the last entry
                post = cur_dirs[first_loc:]

            if post and len(first_fields) >= entry_field_count:
                # We have records to look at after the first entry

                # Parse the last entry
                last_entry_num = len(entries)-1
                last_fields = entries[last_entry_num].split('\0')
                if len(last_fields) < entry_field_count:
                    # The very last hunk was not complete,
                    # read the previous hunk
                    after = mid + len(block) - len(entries[-1])
                    last_entry_num -= 1
                    last_fields = entries[last_entry_num].split('\0')
                else:
                    after = mid + len(block)

                last_dir = last_fields[1]
                last_loc = bisect.bisect_right(post, last_dir)

                middle_files = post[:last_loc]
                post = post[last_loc:]

                if middle_files:
                    # We have files that should occur in this block
                    # (>= first, <= last)
                    # Either we will find them here, or we can mark them as
                    # missing.

                    if middle_files[0] == first_dir:
                        # We might need to go before this location
                        pre.append(first_dir)
                    if middle_files[-1] == last_dir:
                        post.insert(0, last_dir)

                    # Find out what paths we have
                    paths = {first_dir:[first_fields]}
                    # last_dir might == first_dir so we need to be
                    # careful if we should append rather than overwrite
                    if last_entry_num != first_entry_num:
                        paths.setdefault(last_dir, []).append(last_fields)
                    for num in xrange(first_entry_num+1, last_entry_num):
                        # TODO: jam 20070223 We are already splitting here, so
                        #       shouldn't we just split the whole thing rather
                        #       than doing the split again in add_one_record?
                        fields = entries[num].split('\0')
                        paths.setdefault(fields[1], []).append(fields)

                    for cur_dir in middle_files:
                        for fields in paths.get(cur_dir, []):
                            # offset by 1 because of the opening '\0'
                            # consider changing fields_to_entry to avoid the
                            # extra blank entry
                            entry = fields_to_entry(fields[1:])
                            found.setdefault(cur_dir, []).append(entry)

            # Now we have split up everything into pre, middle, and post, and
            # we have handled everything that fell in 'middle'.
            # We add 'post' first, so that we prefer to seek towards the
            # beginning, so that we will tend to go as early as we need, and
            # then only seek forward after that.
            if post:
                pending.append((after, high, post))
            if pre:
                pending.append((low, start-1, pre))

        return found

    def _bisect_recursive(self, paths):
        """Bisect for entries for all paths and their children.

        This will use bisect to find all records for the supplied paths. It
        will then continue to bisect for any records which are marked as
        directories. (and renames?)

        :param paths: A sorted list of (dir, name) pairs
             eg: [('', 'a'), ('', 'f'), ('a/b', 'c')]
        :return: A dictionary mapping (dir, name, file_id) => [tree_info]
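
        For example (a sketch): bisecting for [('', 'dir')] first finds the
        entry for 'dir', sees it is a directory, and then bisects the 'dir'
        dirblock as well, so the result also covers everything inside it.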
        """
        # Map from (dir, name, file_id) => [tree_info]
        found = {}

        found_dir_names = set()

        # Directories that have been read
        processed_dirs = set()
        # Get the ball rolling with the first bisect for all entries.
        newly_found = self._bisect(paths)

        while newly_found:
            # Directories that need to be read
            pending_dirs = set()
            paths_to_search = set()
            for entry_list in newly_found.itervalues():
                for dir_name_id, trees_info in entry_list:
                    found[dir_name_id] = trees_info
                    found_dir_names.add(dir_name_id[:2])
                    is_dir = False
                    for tree_info in trees_info:
                        minikind = tree_info[0]
                        if minikind == 'd':
                            if is_dir:
                                # We already processed this one as a directory,
                                # we don't need to do the extra work again.
                                continue
                            subdir, name, file_id = dir_name_id
                            path = osutils.pathjoin(subdir, name)
                            is_dir = True
                            if path not in processed_dirs:
                                pending_dirs.add(path)
                        elif minikind == 'r':
                            # Rename, we need to directly search the target
                            # which is contained in the fingerprint column
                            dir_name = osutils.split(tree_info[1])
                            if dir_name[0] in pending_dirs:
                                # This entry will be found in the dir search
                                continue
                            if dir_name not in found_dir_names:
                                paths_to_search.add(tree_info[1])
            # Now we have a list of paths to look for directly, and
            # directory blocks that need to be read.
            # newly_found is mixing the keys between (dir, name) and path
            # entries, but that is okay, because we only really care about the
            # file_id matching.
            newly_found = self._bisect(sorted(paths_to_search))
            newly_found.update(self._bisect_dirblocks(sorted(pending_dirs)))
            processed_dirs.update(pending_dirs)
        return found

    def _discard_merge_parents(self):
        """Discard any parents trees beyond the first.

        Note that if this fails the dirstate is corrupted.

        After this function returns the dirstate contains 2 trees, neither of
        which are ghosted.
        """
        self._read_header_if_needed()
        parents = self.get_parent_ids()
        if len(parents) < 1:
            return
        # only require all dirblocks if we are doing a full-pass removal.
        self._read_dirblocks_if_needed()
        dead_patterns = set([('a', 'r'), ('a', 'a'), ('r', 'r'), ('r', 'a')])
        def iter_entries_removable():
            for block in self._dirblocks:
                deleted_positions = []
                for pos, entry in enumerate(block[1]):
                    yield entry
                    if (entry[1][0][0], entry[1][1][0]) in dead_patterns:
                        deleted_positions.append(pos)
                if deleted_positions:
                    if len(deleted_positions) == len(block[1]):
                        del block[1][:]
                    else:
                        for pos in reversed(deleted_positions):
                            del block[1][pos]
        # if the first parent is a ghost:
        if parents[0] in self.get_ghosts():
            empty_parent = [DirState.NULL_PARENT_DETAILS]
            for entry in iter_entries_removable():
                entry[1][1:] = empty_parent
        else:
            for entry in iter_entries_removable():
                del entry[1][2:]

        self._ghosts = []
        self._parents = [parents[0]]
        self._dirblock_state = DirState.IN_MEMORY_MODIFIED
        self._header_state = DirState.IN_MEMORY_MODIFIED

    def _empty_parent_info(self):
        return [DirState.NULL_PARENT_DETAILS] * (len(self._parents) -
                                                 len(self._ghosts))

    def _ensure_block(self, parent_block_index, parent_row_index, dirname):
        """Ensure a block for dirname exists.

        This function exists to let callers which know that there is a
        directory dirname ensure that the block for it exists. This block can
        fail to exist because of demand loading, or because a directory had no
        children. In either case it is not an error. It is however an error to
        call this if there is no parent entry for the directory, and thus the
        function requires the coordinates of such an entry to be provided.

        The root row is special cased and can be indicated with a parent block
        and row index of -1.

        :param parent_block_index: The index of the block in which dirname's row
            exists.
        :param parent_row_index: The index in the parent block where the row
            exists.
        :param dirname: The utf8 dirname to ensure there is a block for.
        :return: The index for the block.
        """
        if dirname == '' and parent_row_index == 0 and parent_block_index == 0:
            # This is the signature of the root row, and the
            # contents-of-root row is always index 1
            return 1
        # the basename of the directory must be the end of its full name.
        if not (parent_block_index == -1 and
            parent_row_index == -1 and dirname == ''):
            if not dirname.endswith(
                    self._dirblocks[parent_block_index][1][parent_row_index][0][1]):
                raise AssertionError("bad dirname %r" % dirname)
        block_index, present = self._find_block_index_from_key((dirname, '', ''))
        if not present:
            ## In future, when doing partial parsing, this should load and
            # populate the entire block.
            self._dirblocks.insert(block_index, (dirname, []))
        return block_index

    def _entries_to_current_state(self, new_entries):
        """Load new_entries into self.dirblocks.

        Process new_entries into the current state object, making them the active
        state. The entries are grouped together by directory to form dirblocks.

        :param new_entries: A sorted list of entries. This function does not sort
            to prevent unneeded overhead when callers have a sorted list already.
        :return: Nothing.
        """
        if new_entries[0][0][0:2] != ('', ''):
            raise AssertionError(
                "Missing root row %r" % (new_entries[0][0],))
        # The two blocks here are deliberate: the root block and the
        # contents-of-root block.
        self._dirblocks = [('', []), ('', [])]
        current_block = self._dirblocks[0][1]
        current_dirname = ''
        append_entry = current_block.append
        for entry in new_entries:
            if entry[0][0] != current_dirname:
                # new block - different dirname
                current_block = []
                current_dirname = entry[0][0]
                self._dirblocks.append((current_dirname, current_block))
                append_entry = current_block.append
            # append the entry to the current block
            append_entry(entry)
        self._split_root_dirblock_into_contents()

    def _split_root_dirblock_into_contents(self):
        """Split the root dirblocks into root and contents-of-root.

        After parsing by path, we end up with root entries and contents-of-root
        entries in the same block. This loop splits them out again.
        """
        # The above loop leaves the "root block" entries mixed with the
        # "contents-of-root block". But we don't want an if check on
        # all entries, so instead we just fix it up here.
        if self._dirblocks[1] != ('', []):
            raise ValueError("bad dirblock start %r" % (self._dirblocks[1],))
        root_block = []
        contents_of_root_block = []
        for entry in self._dirblocks[0][1]:
            if not entry[0][1]: # This is a root entry
                root_block.append(entry)
            else:
                contents_of_root_block.append(entry)
        self._dirblocks[0] = ('', root_block)
        self._dirblocks[1] = ('', contents_of_root_block)

    def _entries_for_path(self, path):
        """Return a list with all the entries that match path for all ids."""
        dirname, basename = os.path.split(path)
        key = (dirname, basename, '')
        block_index, present = self._find_block_index_from_key(key)
        if not present:
            # the block which should contain path is absent.
            return []
        result = []
        block = self._dirblocks[block_index][1]
        entry_index, _ = self._find_entry_index(key, block)
        # we may need to look at multiple entries at this path: walk while the
        # specific_files match.
        while (entry_index < len(block) and
            block[entry_index][0][0:2] == key[0:2]):
            result.append(block[entry_index])
            entry_index += 1
        return result

    def _entry_to_line(self, entry):
        """Serialize entry to a NULL delimited line ready for _get_output_lines.

        :param entry: An entry_tuple as defined in the module docstring.
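
        For example, a row with one parent tree serialises to
        3 + 5 + 5 = 13 NULL separated fields: the key, the current tree
        details and the parent tree details.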
        """
        entire_entry = list(entry[0])
        for tree_number, tree_data in enumerate(entry[1]):
            # (minikind, fingerprint, size, executable, tree_specific_string)
            entire_entry.extend(tree_data)
            # 3 for the key, 5 for the fields per tree.
            tree_offset = 3 + tree_number * 5
            # minikind
            entire_entry[tree_offset + 0] = tree_data[0]
            # size
            entire_entry[tree_offset + 2] = str(tree_data[2])
            # executable
            entire_entry[tree_offset + 3] = DirState._to_yesno[tree_data[3]]
        return '\0'.join(entire_entry)

    def _fields_per_entry(self):
        """How many null separated fields should be in each entry row.

        Each line now has an extra '\n' field which is not used
        so we just skip over it

        entry size:
            3 fields for the key
            + number of fields per tree_data (5) * tree count
            + newline
        """
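        # Worked example (not from the original source): with no parent trees
        # this is 3 + 5*1 + 1 = 9 fields; with one parent tree, 3 + 5*2 + 1 = 14.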
        tree_count = 1 + self._num_present_parents()
        return 3 + 5 * tree_count + 1

    def _find_block(self, key, add_if_missing=False):
        """Return the block that key should be present in.

        :param key: A dirstate entry key.
        :return: The block tuple.
        """
        block_index, present = self._find_block_index_from_key(key)
        if not present:
            if not add_if_missing:
                # check to see if key is versioned itself - we might want to
                # add it anyway, because dirs with no entries don't get a
                # dirblock at parse time.
                # This is an uncommon branch to take: most dirs have children,
                # and most code works with versioned paths.
                parent_base, parent_name = osutils.split(key[0])
                if not self._get_block_entry_index(parent_base, parent_name, 0)[3]:
                    # some parent path has not been added - it's an error to
                    # add this child
                    raise errors.NotVersionedError(key[0:2], str(self))
            self._dirblocks.insert(block_index, (key[0], []))
        return self._dirblocks[block_index]

    def _find_block_index_from_key(self, key):
        """Find the dirblock index for a key.

        :return: The block index, True if the block for the key is present.
        """
        if key[0:2] == ('', ''):
            return 0, True
        try:
            if (self._last_block_index is not None and
                self._dirblocks[self._last_block_index][0] == key[0]):
                return self._last_block_index, True
        except IndexError:
            pass
        block_index = bisect_dirblock(self._dirblocks, key[0], 1,
                                      cache=self._split_path_cache)
        # _right returns one-past-where-key is so we have to subtract
        # one to use it. we use _right here because there are two
        # '' blocks - the root, and the contents of root
        # we always have a minimum of 2 in self._dirblocks: root and
        # root-contents, and for '', we get 2 back, so this is
        # simple and correct:
        present = (block_index < len(self._dirblocks) and
            self._dirblocks[block_index][0] == key[0])
        self._last_block_index = block_index
        # Reset the entry index cache to the beginning of the block.
        self._last_entry_index = -1
        return block_index, present

    def _find_entry_index(self, key, block):
        """Find the entry index for a key in a block.

        :return: The entry index, True if the entry for the key is present.
        """
        len_block = len(block)
        try:
            if self._last_entry_index is not None:
                # mini-bisect here.
                entry_index = self._last_entry_index + 1
                # A hit is when the key is after the last slot, and before or
                # equal to the next slot.
                if ((entry_index > 0 and block[entry_index - 1][0] < key) and
                    key <= block[entry_index][0]):
                    self._last_entry_index = entry_index
                    present = (block[entry_index][0] == key)
                    return entry_index, present
        except IndexError:
            pass
        entry_index = bisect.bisect_left(block, (key, []))
        present = (entry_index < len_block and
            block[entry_index][0] == key)
        self._last_entry_index = entry_index
        return entry_index, present

    @staticmethod
    def from_tree(tree, dir_state_filename):
        """Create a dirstate from a bzr Tree.

        :param tree: The tree which should provide parent information and
            inventory ids.
        :return: a DirState object which is currently locked for writing.
            (it was locked by DirState.initialize)
        """
        result = DirState.initialize(dir_state_filename)
        try:
            tree.lock_read()
            try:
                parent_ids = tree.get_parent_ids()
                num_parents = len(parent_ids)
                parent_trees = []
                for parent_id in parent_ids:
                    parent_tree = tree.branch.repository.revision_tree(parent_id)
                    parent_trees.append((parent_id, parent_tree))
                    parent_tree.lock_read()
                result.set_parent_trees(parent_trees, [])
                result.set_state_from_inventory(tree.inventory)
            finally:
                for revid, parent_tree in parent_trees:
                    parent_tree.unlock()
                tree.unlock()
        except:
            # The caller won't have a chance to unlock this, so make sure we
            # cleanup ourselves
            result.unlock()
            raise
        return result

    def update_by_delta(self, delta):
        """Apply an inventory delta to the dirstate for tree 0

        :param delta: An inventory delta. See Inventory.apply_delta for
            details.
        """
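        # A delta item is an (old_path, new_path, file_id, inv_entry) tuple;
        # e.g. (None, 'foo', 'foo-id', <InventoryEntry for foo-id>) adds
        # 'foo' (a hypothetical illustration, not from the original source).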
        self._read_dirblocks_if_needed()
        insertions = {}
        removals = {}
        for old_path, new_path, file_id, inv_entry in sorted(delta, reverse=True):
            if (file_id in insertions) or (file_id in removals):
                raise AssertionError("repeated file id in delta %r" % (file_id,))
            if old_path is not None:
                old_path = old_path.encode('utf-8')
                removals[file_id] = old_path
            if new_path is not None:
                new_path = new_path.encode('utf-8')
                dirname, basename = osutils.split(new_path)
                key = (dirname, basename, file_id)
                minikind = DirState._kind_to_minikind[inv_entry.kind]
                if minikind == 't':
                    fingerprint = inv_entry.reference_revision
                else:
                    fingerprint = ''
                insertions[file_id] = (key, minikind, inv_entry.executable,
                                       fingerprint, new_path)
            # Transform moves into delete+add pairs
            if None not in (old_path, new_path):
                for child in self._iter_child_entries(0, old_path):
                    if child[0][2] in insertions or child[0][2] in removals:
                        continue
                    child_dirname = child[0][0]
                    child_basename = child[0][1]
                    minikind = child[1][0][0]
                    fingerprint = child[1][0][4]
                    executable = child[1][0][3]
                    old_child_path = osutils.pathjoin(child[0][0],
                                                      child[0][1])
                    removals[child[0][2]] = old_child_path
                    child_suffix = child_dirname[len(old_path):]
                    new_child_dirname = (new_path + child_suffix)
                    key = (new_child_dirname, child_basename, child[0][2])
                    new_child_path = os.path.join(new_child_dirname,
                                                  child_basename)
                    insertions[child[0][2]] = (key, minikind, executable,
                                               fingerprint, new_child_path)
        self._apply_removals(removals.values())
        self._apply_insertions(insertions.values())

    def _apply_removals(self, removals):
        for path in sorted(removals, reverse=True):
            dirname, basename = osutils.split(path)
            block_i, entry_i, d_present, f_present = \
                self._get_block_entry_index(dirname, basename, 0)
            entry = self._dirblocks[block_i][1][entry_i]
            self._make_absent(entry)
            # See if we have a malformed delta: deleting a directory must not
            # leave crud behind. This increases the number of bisects needed
            # substantially, but deletion or renames of large numbers of paths
            # is rare enough it shouldn't be an issue (famous last words?) RBC
            # 20080730.
            block_i, entry_i, d_present, f_present = \
                self._get_block_entry_index(path, '', 0)
            if d_present:
                # The dir block is still present in the dirstate; this could
                # be due to it being in a parent tree, or a corrupt delta.
                for child_entry in self._dirblocks[block_i][1]:
                    if child_entry[1][0][0] not in ('r', 'a'):
                        raise errors.InconsistentDelta(path, entry[0][2],
                            "The file id was deleted but its children were "
                            "not deleted.")

    def _apply_insertions(self, adds):
        for key, minikind, executable, fingerprint, path_utf8 in sorted(adds):
            self.update_minimal(key, minikind, executable, fingerprint,
                                path_utf8=path_utf8)

    def update_basis_by_delta(self, delta, new_revid):
        """Update the parents of this tree after a commit.

        This gives the tree one parent, with revision id new_revid. The
        inventory delta is applied to the current basis tree to generate the
        inventory for the parent new_revid, and all other parent trees are
        discarded.

        Note that an exception during the operation of this method will leave
        the dirstate in a corrupt state where it should not be saved.

        Finally, we expect all changes to be synchronising the basis tree with
        the working tree.

        :param new_revid: The new revision id for the tree's parent.
        :param delta: An inventory delta (see apply_inventory_delta) describing
            the changes from the current left most parent revision to new_revid.
        """
        self._read_dirblocks_if_needed()
        self._discard_merge_parents()
        if self._ghosts != []:
            raise NotImplementedError(self.update_basis_by_delta)
        if len(self._parents) == 0:
            # setup a blank tree, the most simple way.
            empty_parent = DirState.NULL_PARENT_DETAILS
            for entry in self._iter_entries():
                entry[1].append(empty_parent)
            self._parents.append(new_revid)

        self._parents[0] = new_revid

        delta = sorted(delta, reverse=True)
        adds = []
        changes = []
        deletes = []
        # The paths this function accepts are unicode and must be encoded as we
        # go.
        encode = cache_utf8.encode
        inv_to_entry = self._inv_entry_to_details
        # delta is now (deletes, changes), (adds) in reverse lexicographical
        # order.
        # deletes in reverse lexicographic order are safe to process in situ.
        # renames are not, as a rename from any path could go to a path
        # lexicographically lower, so we transform renames into delete, add pairs,
        # expanding them recursively as needed.
        # At the same time, to reduce interface friction we convert the input
        # inventory entries to dirstate.
        root_only = ('', '')
        for old_path, new_path, file_id, inv_entry in delta:
            if old_path is None:
                adds.append((None, encode(new_path), file_id,
                    inv_to_entry(inv_entry), True))
            elif new_path is None:
                deletes.append((encode(old_path), None, file_id, None, True))
            elif (old_path, new_path) != root_only:
                # Renames:
                # Because renames must preserve their children we must have
                # processed all relocations and removes beforehand. The sort
                # order ensures we've examined the child paths, but we also
                # have to execute the removals, or the split to an add/delete
                # pair will result in the deleted item being reinserted, or
                # renamed items being reinserted twice - and possibly at the
                # wrong place. Splitting into a delete/add pair also simplifies
                # the handling of entries with ('f', ...), ('r' ...) because
                # the target of the 'r' is old_path here, and we add that to
                # deletes, meaning that the add handler does not need to check
                # for 'r' items on every pass.
                self._update_basis_apply_deletes(deletes)
                deletes = []
                new_path_utf8 = encode(new_path)
                # Split into an add/delete pair recursively.
                adds.append((None, new_path_utf8, file_id,
                    inv_to_entry(inv_entry), False))
                # Expunge deletes that we've seen so that deleted/renamed
                # children of a rename directory are handled correctly.
                new_deletes = reversed(list(self._iter_child_entries(1,
                    encode(old_path))))
                # Remove the current contents of the tree at orig_path, and
                # reinsert at the correct new path.
                for entry in new_deletes:
                    if entry[0][0]:
                        source_path = entry[0][0] + '/' + entry[0][1]
                    else:
                        source_path = entry[0][1]
                    if new_path_utf8:
                        target_path = new_path_utf8 + source_path[len(old_path):]
                    else:
                        if old_path == '':
                            raise AssertionError("cannot rename directory to"
                                " itself")
                        target_path = source_path[len(old_path) + 1:]
                    adds.append((None, target_path, entry[0][2], entry[1][1], False))
                    deletes.append(
                        (source_path, target_path, entry[0][2], None, False))
                deletes.append(
                    (encode(old_path), new_path, file_id, None, False))
            else:
                # changes to just the root should not require remove/insertion
                # of everything.
                changes.append((encode(old_path), encode(new_path), file_id,
                    inv_to_entry(inv_entry)))

        # Finish expunging deletes/first half of renames.
        self._update_basis_apply_deletes(deletes)
        # Reinstate second half of renames and new paths.
        self._update_basis_apply_adds(adds)
        # Apply in-situ changes.
        self._update_basis_apply_changes(changes)

        self._dirblock_state = DirState.IN_MEMORY_MODIFIED
        self._header_state = DirState.IN_MEMORY_MODIFIED
        self._id_index = None

    def _update_basis_apply_adds(self, adds):
        """Apply a sequence of adds to tree 1 during update_basis_by_delta.

        They may be adds, or renames that have been split into add/delete
        pairs.

        :param adds: A sequence of adds. Each add is a tuple:
            (None, new_path_utf8, file_id, (entry_details), real_add). real_add
            is False when the add is the second half of a remove-and-reinsert
            pair created to handle renames and deletes.
        """
        # Adds are accumulated partly from renames, so can be in any input
        # order - sort it.
        adds.sort()
        # adds is now in lexicographic order, which places all parents before
        # their children, so we can process it linearly.
        absent = 'ar'
        for old_path, new_path, file_id, new_details, real_add in adds:
            # the entry for this file_id must be in tree 0.
            entry = self._get_entry(0, file_id, new_path)
            if entry[0] is None or entry[0][2] != file_id:
                self._changes_aborted = True
                raise errors.InconsistentDelta(new_path, file_id,
                    'working tree does not contain new entry')
            if real_add and entry[1][1][0] not in absent:
                self._changes_aborted = True
                raise errors.InconsistentDelta(new_path, file_id,
                    'The entry was considered to be a genuinely new record,'
                    ' but there was already an old record for it.')
            # We don't need to update the target of an 'r' because the handling
            # of renames turns all 'r' situations into a delete at the original
            # location.
            entry[1][1] = new_details

    def _update_basis_apply_changes(self, changes):
        """Apply a sequence of changes to tree 1 during update_basis_by_delta.

        :param changes: A sequence of changes. Each change is a tuple:
            (path_utf8, path_utf8, file_id, (entry_details))
        """
        absent = 'ar'
        for old_path, new_path, file_id, new_details in changes:
            # the entry for this file_id must be in tree 0.
            entry = self._get_entry(0, file_id, new_path)
            if entry[0] is None or entry[0][2] != file_id:
                self._changes_aborted = True
                raise errors.InconsistentDelta(new_path, file_id,
                    'working tree does not contain new entry')
            if (entry[1][0][0] in absent or
                entry[1][1][0] in absent):
                self._changes_aborted = True
                raise errors.InconsistentDelta(new_path, file_id,
                    'changed considered absent')
            entry[1][1] = new_details

    def _update_basis_apply_deletes(self, deletes):
        """Apply a sequence of deletes to tree 1 during update_basis_by_delta.

        They may be deletes, or renames that have been split into add/delete
        pairs.

        :param deletes: A sequence of deletes. Each delete is a tuple:
            (old_path_utf8, new_path_utf8, file_id, None, real_delete).
            real_delete is True when the desired outcome is an actual deletion
            rather than the rename handling logic temporarily deleting a path
            during the replacement of a parent.
        """
        null = DirState.NULL_PARENT_DETAILS
        for old_path, new_path, file_id, _, real_delete in deletes:
            if real_delete != (new_path is None):
                raise AssertionError("bad delete delta")
            # the entry for this file_id must be in tree 1.
            dirname, basename = osutils.split(old_path)
            block_index, entry_index, dir_present, file_present = \
                self._get_block_entry_index(dirname, basename, 1)
            if not file_present:
                self._changes_aborted = True
                raise errors.InconsistentDelta(old_path, file_id,
                    'basis tree does not contain removed entry')
            entry = self._dirblocks[block_index][1][entry_index]
            if entry[0][2] != file_id:
                self._changes_aborted = True
                raise errors.InconsistentDelta(old_path, file_id,
                    'mismatched file_id in tree 1')
            if real_delete:
                if entry[1][0][0] != 'a':
                    self._changes_aborted = True
                    raise errors.InconsistentDelta(old_path, file_id,
                        'This was marked as a real delete, but the WT state'
                        ' claims that it still exists and is versioned.')
                del self._dirblocks[block_index][1][entry_index]
            else:
                if entry[1][0][0] == 'a':
                    self._changes_aborted = True
                    raise errors.InconsistentDelta(old_path, file_id,
                        'The entry was considered a rename, but the source path'
                        ' is marked as absent.')
                    # For whatever reason, we were asked to rename an entry
                    # that was originally marked as deleted. This could be
                    # because we are renaming the parent directory, and the WT
                    # current state has the file marked as deleted.
                elif entry[1][0][0] == 'r':
                    # implement the rename
                    del self._dirblocks[block_index][1][entry_index]
                else:
                    # it is being resurrected here, so blank it out temporarily.
                    self._dirblocks[block_index][1][entry_index][1][1] = null

    def _observed_sha1(self, entry, sha1, stat_value,
        _stat_to_minikind=_stat_to_minikind, _pack_stat=pack_stat):
        """Note the sha1 of a file.

        :param entry: The entry the sha1 is for.
        :param sha1: The observed sha1.
        :param stat_value: The os.lstat for the file.
        """
        try:
            minikind = _stat_to_minikind[stat_value.st_mode & 0170000]
        except KeyError:
            # Unhandled kind
            return None
        packed_stat = _pack_stat(stat_value)
        if minikind == 'f':
            if self._cutoff_time is None:
                self._sha_cutoff_time()
            if (stat_value.st_mtime < self._cutoff_time
                and stat_value.st_ctime < self._cutoff_time):
                entry[1][0] = ('f', sha1, entry[1][0][2], entry[1][0][3],
                    packed_stat)
                self._dirblock_state = DirState.IN_MEMORY_MODIFIED

    def _sha_cutoff_time(self):
        """Return cutoff time.

        Files modified more recently than this time are at risk of being
        undetectably modified and so can't be cached.
        """
        # Cache the cutoff time as long as we hold a lock.
        # time.time() isn't super expensive (approx 3.38us), but
        # when you call it 50,000 times it adds up.
        # For comparison, os.lstat() costs 7.2us if it is hot.
        self._cutoff_time = int(time.time()) - 3
        return self._cutoff_time

    def _lstat(self, abspath, entry):
        """Return the os.lstat value for this path."""
        return os.lstat(abspath)

    def _size_sha1_file_and_mutter(self, abspath, filter_list):
        # when -Dhashcache is turned on, this is monkey-patched in to log
        # file reads
        trace.mutter("dirstate sha1 " + abspath)
        return filters.internal_size_sha_file_byname(abspath, filter_list)

    def _is_executable(self, mode, old_executable):
        """Is this file executable?"""
        return bool(S_IEXEC & mode)

    def _is_executable_win32(self, mode, old_executable):
        """On win32 the executable bit is stored in the dirstate."""
        return old_executable

    if sys.platform == 'win32':
        _is_executable = _is_executable_win32

    def _read_link(self, abspath, old_link):
        """Read the target of a symlink"""
        # TODO: jam 20070301 On Win32, this could just return the value
        #       already in memory. However, this really needs to be done at a
        #       higher level, because there either won't be anything on disk,
        #       or the thing on disk will be a file.
        return os.readlink(abspath)
    def get_ghosts(self):
        """Return a list of the parent tree revision ids that are ghosts."""
        self._read_header_if_needed()
        return self._ghosts

    def get_lines(self):
        """Serialise the entire dirstate to a sequence of lines."""
        if (self._header_state == DirState.IN_MEMORY_UNMODIFIED and
            self._dirblock_state == DirState.IN_MEMORY_UNMODIFIED):
            # read what's on disk.
            self._state_file.seek(0)
            return self._state_file.readlines()
        lines = []
        lines.append(self._get_parents_line(self.get_parent_ids()))
        lines.append(self._get_ghosts_line(self._ghosts))
        # append the root line which is special cased
        lines.extend(map(self._entry_to_line, self._iter_entries()))
        return self._get_output_lines(lines)
    def _get_ghosts_line(self, ghost_ids):
        """Create a line for the state file for ghost information."""
        return '\0'.join([str(len(ghost_ids))] + ghost_ids)

    def _get_parents_line(self, parent_ids):
        """Create a line for the state file for parents information."""
        return '\0'.join([str(len(parent_ids))] + parent_ids)
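
    # Illustrative serialisation (showing NULs as \0): for two parents and
    # one ghost these helpers produce
    #
    #   _get_parents_line(['rev-1', 'rev-2'])  ->  '2\0rev-1\0rev-2'
    #   _get_ghosts_line(['ghost-rev'])        ->  '1\0ghost-rev'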
    def _get_fields_to_entry(self):
        """Get a function which converts entry fields into an entry record.

        This handles size and executable, as well as parent records.

        :return: A function which takes a list of fields, and returns an
            appropriate record for storing in memory.
        """
        # This is intentionally unrolled for performance
        num_present_parents = self._num_present_parents()
        if num_present_parents == 0:
            def fields_to_entry_0_parents(fields, _int=int):
                path_name_file_id_key = (fields[0], fields[1], fields[2])
                return (path_name_file_id_key, [
                    ( # Current tree
                    fields[3],         # minikind
                    fields[4],         # fingerprint
                    _int(fields[5]),   # size
                    fields[6] == 'y',  # executable
                    fields[7],         # packed_stat or revision_id
                    )])
            return fields_to_entry_0_parents
        elif num_present_parents == 1:
            def fields_to_entry_1_parent(fields, _int=int):
                path_name_file_id_key = (fields[0], fields[1], fields[2])
                return (path_name_file_id_key, [
                    ( # Current tree
                    fields[3],         # minikind
                    fields[4],         # fingerprint
                    _int(fields[5]),   # size
                    fields[6] == 'y',  # executable
                    fields[7],         # packed_stat or revision_id
                    ),
                    ( # Parent 1
                    fields[8],         # minikind
                    fields[9],         # fingerprint
                    _int(fields[10]),  # size
                    fields[11] == 'y', # executable
                    fields[12],        # packed_stat or revision_id
                    ),
                    ])
            return fields_to_entry_1_parent
        elif num_present_parents == 2:
            def fields_to_entry_2_parents(fields, _int=int):
                path_name_file_id_key = (fields[0], fields[1], fields[2])
                return (path_name_file_id_key, [
                    ( # Current tree
                    fields[3],         # minikind
                    fields[4],         # fingerprint
                    _int(fields[5]),   # size
                    fields[6] == 'y',  # executable
                    fields[7],         # packed_stat or revision_id
                    ),
                    ( # Parent 1
                    fields[8],         # minikind
                    fields[9],         # fingerprint
                    _int(fields[10]),  # size
                    fields[11] == 'y', # executable
                    fields[12],        # packed_stat or revision_id
                    ),
                    ( # Parent 2
                    fields[13],        # minikind
                    fields[14],        # fingerprint
                    _int(fields[15]),  # size
                    fields[16] == 'y', # executable
                    fields[17],        # packed_stat or revision_id
                    ),
                    ])
            return fields_to_entry_2_parents
        else:
            def fields_to_entry_n_parents(fields, _int=int):
                path_name_file_id_key = (fields[0], fields[1], fields[2])
                trees = [(fields[cur],          # minikind
                          fields[cur+1],        # fingerprint
                          _int(fields[cur+2]),  # size
                          fields[cur+3] == 'y', # executable
                          fields[cur+4],        # stat or revision_id
                         ) for cur in xrange(3, len(fields)-1, 5)]
                return path_name_file_id_key, trees
            return fields_to_entry_n_parents
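
    # Sketch of the 0-parent converter (field values are hypothetical):
    #
    #   to_entry = state._get_fields_to_entry()
    #   to_entry(['dir', 'name', 'an-id', 'f', 'sha...', '12', 'y', 'packed'])
    #   ->  (('dir', 'name', 'an-id'), [('f', 'sha...', 12, True, 'packed')])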
    def get_parent_ids(self):
        """Return a list of the parent tree ids for the directory state."""
        self._read_header_if_needed()
        return list(self._parents)
    def _get_block_entry_index(self, dirname, basename, tree_index):
        """Get the coordinates for a path in the state structure.

        :param dirname: The utf8 dirname to lookup.
        :param basename: The utf8 basename to lookup.
        :param tree_index: The index of the tree for which this lookup should
            be attempted.
        :return: A tuple describing where the path is located, or should be
            inserted. The tuple contains four fields: the block index, the row
            index, the directory is present (boolean), the entire path is
            present (boolean). There is no guarantee that either
            coordinate is currently reachable unless the found field for it is
            True. For instance, a directory not present in the searched tree
            may be returned with a value one greater than the current highest
            block offset. The directory present field will always be True when
            the path present field is True. The directory present field does
            NOT indicate that the directory is present in the searched tree,
            rather it indicates that there are at least some files in some
            tree present there.
        """
        self._read_dirblocks_if_needed()
        key = dirname, basename, ''
        block_index, present = self._find_block_index_from_key(key)
        if not present:
            # no such directory - return the dir index and 0 for the row.
            return block_index, 0, False, False
        block = self._dirblocks[block_index][1] # access the entries only
        entry_index, present = self._find_entry_index(key, block)
        # linear search through entries at this path to find the one
        # requested.
        while entry_index < len(block) and block[entry_index][0][1] == basename:
            if block[entry_index][1][tree_index][0] not in 'ar':
                # neither absent nor relocated
                return block_index, entry_index, True, True
            entry_index += 1
        return block_index, entry_index, True, False
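
    # Result sketch (values hypothetical): a hit returns reachable
    # coordinates, a miss returns the insertion point:
    #
    #   state._get_block_entry_index('a', 'b', 0)  ->  (3, 0, True, True)
    #   # (3, 2, True, False) would mean block 'a' exists but 'a/b' does
    #   # not; entry_index 2 is where it would be inserted.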
    def _get_entry(self, tree_index, fileid_utf8=None, path_utf8=None):
        """Get the dirstate entry for path in tree tree_index.

        If either file_id or path is supplied, it is used as the key to lookup.
        If both are supplied, the fastest lookup is used, and an error is
        raised if they do not both point at the same row.

        :param tree_index: The index of the tree we wish to locate this path
            in. If the path is present in that tree, the entry containing its
            details is returned, otherwise (None, None) is returned.
            0 is the working tree, higher indexes are successive parent
            trees.
        :param fileid_utf8: A utf8 file_id to look up.
        :param path_utf8: A utf8 path to be looked up.
        :return: The dirstate entry tuple for path, or (None, None).
        """
        self._read_dirblocks_if_needed()
        if path_utf8 is not None:
            if type(path_utf8) is not str:
                raise AssertionError('path_utf8 is not a str: %s %s'
                    % (type(path_utf8), path_utf8))
            # path lookups are faster
            dirname, basename = osutils.split(path_utf8)
            block_index, entry_index, dir_present, file_present = \
                self._get_block_entry_index(dirname, basename, tree_index)
            if not file_present:
                return None, None
            entry = self._dirblocks[block_index][1][entry_index]
            if not (entry[0][2] and entry[1][tree_index][0] not in ('a', 'r')):
                raise AssertionError('unversioned entry?')
            if fileid_utf8:
                if entry[0][2] != fileid_utf8:
                    self._changes_aborted = True
                    raise errors.BzrError('integrity error ? : mismatching'
                        ' tree_index, file_id and path')
            return entry
        else:
            possible_keys = self._get_id_index().get(fileid_utf8, None)
            if not possible_keys:
                return None, None
            for key in possible_keys:
                block_index, present = \
                    self._find_block_index_from_key(key)
                # strange, probably indicates an out of date
                # id index - for now, allow this.
                if not present:
                    continue
                # WARNING: DO not change this code to use _get_block_entry_index
                # as that function is not suitable: it does not use the key
                # to lookup, and thus the wrong coordinates are returned.
                block = self._dirblocks[block_index][1]
                entry_index, present = self._find_entry_index(key, block)
                if present:
                    entry = self._dirblocks[block_index][1][entry_index]
                    if entry[1][tree_index][0] in 'fdlt':
                        # this is the result we are looking for: the
                        # real home of this file_id in this tree.
                        return entry
                    if entry[1][tree_index][0] == 'a':
                        # there is no home for this entry in this tree
                        return None, None
                    if entry[1][tree_index][0] != 'r':
                        raise AssertionError(
                            "entry %r has invalid minikind %r for tree %r" \
                            % (entry,
                               entry[1][tree_index][0],
                               tree_index))
                    real_path = entry[1][tree_index][1]
                    return self._get_entry(tree_index, fileid_utf8=fileid_utf8,
                        path_utf8=real_path)
            return None, None
    @classmethod
    def initialize(cls, path):
        """Create a new dirstate on path.

        The new dirstate will be an empty tree - that is it has no parents,
        and only a root node - which has id ROOT_ID.

        :param path: The name of the file for the dirstate.
        :return: A write-locked DirState object.
        """
        # This constructs a new DirState object on a path, sets the _state_file
        # to a new empty file for that path. It then calls _set_data() with our
        # stock empty dirstate information - a root with ROOT_ID, no children,
        # and no parents. Finally it calls save() to ensure that this data will
        # persist.
        result = cls(path)
        # root dir and root dir contents with no children.
        empty_tree_dirblocks = [('', []), ('', [])]
        # a new root directory, with a NULLSTAT.
        empty_tree_dirblocks[0][1].append(
            (('', '', inventory.ROOT_ID), [
                ('d', '', 0, False, DirState.NULLSTAT),
            ]))
        result.lock_write()
        try:
            result._set_data([], empty_tree_dirblocks)
            result.save()
        except:
            result.unlock()
            raise
        return result
    @staticmethod
    def _inv_entry_to_details(inv_entry):
        """Convert an inventory entry (from a revision tree) to state details.

        :param inv_entry: An inventory entry whose sha1 and link targets can be
            relied upon, and which has a revision set.
        :return: A details tuple - the details for a single tree at a path +
            id.
        """
        kind = inv_entry.kind
        minikind = DirState._kind_to_minikind[kind]
        tree_data = inv_entry.revision
        if kind == 'directory':
            fingerprint = ''
            size = 0
            executable = False
        elif kind == 'symlink':
            # We don't support non-ascii targets for symlinks yet.
            fingerprint = str(inv_entry.symlink_target or '')
            size = 0
            executable = False
        elif kind == 'file':
            fingerprint = inv_entry.text_sha1 or ''
            size = inv_entry.text_size or 0
            executable = inv_entry.executable
        elif kind == 'tree-reference':
            fingerprint = inv_entry.reference_revision or ''
            size = 0
            executable = False
        else:
            raise Exception("can't pack %s" % inv_entry)
        return (minikind, fingerprint, size, executable, tree_data)
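
    # Illustrative outputs (values made up): a 10 byte executable file and a
    # symlink to 'target' convert to
    #
    #   ('f', '<text_sha1>', 10, True, '<revision-id>')
    #   ('l', 'target', 0, False, '<revision-id>')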
    def _iter_child_entries(self, tree_index, path_utf8):
        """Iterate over all the entries that are children of path_utf8.

        This only returns entries that are present (not in 'a', 'r') in
        tree_index. tree_index data is not refreshed, so if tree 0 is used,
        results may differ from that obtained if paths were statted to
        determine what ones were directories.

        Asking for the children of a non-directory will return an empty
        iterator.
        """
        pending_dirs = []
        next_pending_dirs = [path_utf8]
        absent = 'ar'
        while next_pending_dirs:
            pending_dirs = next_pending_dirs
            next_pending_dirs = []
            for path in pending_dirs:
                block_index, present = self._find_block_index_from_key(
                    (path, '', ''))
                if block_index == 0:
                    block_index = 1
                    if len(self._dirblocks) == 1:
                        # asked for the children of the root with no other
                        # contents.
                        return
                if not present:
                    # children of a non-directory asked for.
                    continue
                block = self._dirblocks[block_index]
                for entry in block[1]:
                    kind = entry[1][tree_index][0]
                    if kind not in absent:
                        yield entry
                    if kind == 'd':
                        if entry[0][0]:
                            path = entry[0][0] + '/' + entry[0][1]
                        else:
                            path = entry[0][1]
                        next_pending_dirs.append(path)
    def _iter_entries(self):
        """Iterate over all the entries in the dirstate.

        Each yielded item is an entry in the standard format described in the
        docstring of bzrlib.dirstate.
        """
        self._read_dirblocks_if_needed()
        for directory in self._dirblocks:
            for entry in directory[1]:
                yield entry

    def _get_id_index(self):
        """Get an id index of self._dirblocks."""
        if self._id_index is None:
            id_index = {}
            for key, tree_details in self._iter_entries():
                id_index.setdefault(key[2], set()).add(key)
            self._id_index = id_index
        return self._id_index
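
    # Shape sketch (hypothetical ids): after a rename, a single file id can
    # map to several keys, one per path that mentions it:
    #
    #   {'file-id': set([('', 'old-name', 'file-id'),
    #                    ('dir', 'new-name', 'file-id')])}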
    def _get_output_lines(self, lines):
        """Format lines for final output.

        :param lines: A sequence of lines containing the parents list and the
            entry lines.
        """
        output_lines = [DirState.HEADER_FORMAT_3]
        lines.append('') # a final newline
        inventory_text = '\0\n\0'.join(lines)
        output_lines.append('crc32: %s\n' % (zlib.crc32(inventory_text),))
        # -3, 1 for num parents, 1 for ghosts, 1 for final newline
        num_entries = len(lines)-3
        output_lines.append('num_entries: %s\n' % (num_entries,))
        output_lines.append(inventory_text)
        return output_lines

    def _make_deleted_row(self, fileid_utf8, parents):
        """Return a deleted row for fileid_utf8."""
        return ('/', 'RECYCLED.BIN', 'file', fileid_utf8, 0, DirState.NULLSTAT,
            ''), parents

    def _num_present_parents(self):
        """The number of parent entries in each record row."""
        return len(self._parents) - len(self._ghosts)
    @staticmethod
    def on_file(path, content_filter_stack_provider=None):
        """Construct a DirState on the file at the given path.

        :param content_filter_stack_provider: a function that takes a
            path (relative to the top of the tree) and a file-id as
            parameters and returns a stack of ContentFilters.
            If None, no content filtering is performed.
        :return: An unlocked DirState object, associated with the given path.
        """
        result = DirState(path, content_filter_stack_provider)
        return result

    def _read_dirblocks_if_needed(self):
        """Read in all the dirblocks from the file if they are not in memory.

        This populates self._dirblocks, and sets self._dirblock_state to
        IN_MEMORY_UNMODIFIED. It is not currently ready for incremental block
        loading.
        """
        self._read_header_if_needed()
        if self._dirblock_state == DirState.NOT_IN_MEMORY:
            _read_dirblocks(self)
    def _read_header(self):
        """This reads in the metadata header, and the parent ids.

        After reading in, the file should be positioned at the null
        just before the start of the first record in the file.

        :return: (expected crc checksum, number of entries, parent list)
        """
        self._read_prelude()
        parent_line = self._state_file.readline()
        info = parent_line.split('\0')
        num_parents = int(info[0])
        self._parents = info[1:-1]
        ghost_line = self._state_file.readline()
        info = ghost_line.split('\0')
        num_ghosts = int(info[1])
        self._ghosts = info[2:-1]
        self._header_state = DirState.IN_MEMORY_UNMODIFIED
        self._end_of_header = self._state_file.tell()

    def _read_header_if_needed(self):
        """Read the header of the dirstate file if needed."""
        # inline this as it will be called a lot
        if not self._lock_token:
            raise errors.ObjectNotLocked(self)
        if self._header_state == DirState.NOT_IN_MEMORY:
            self._read_header()
    def _read_prelude(self):
        """Read in the prelude header of the dirstate file.

        This only reads in the stuff that is not connected to the crc
        checksum. The position will be correct to read in the rest of
        the file and check the checksum after this point.
        The next entry in the file should be the number of parents,
        and their ids. Followed by a newline.
        """
        header = self._state_file.readline()
        if header != DirState.HEADER_FORMAT_3:
            raise errors.BzrError(
                'invalid header line: %r' % (header,))
        crc_line = self._state_file.readline()
        if not crc_line.startswith('crc32: '):
            raise errors.BzrError('missing crc32 checksum: %r' % crc_line)
        self.crc_expected = int(crc_line[len('crc32: '):-1])
        num_entries_line = self._state_file.readline()
        if not num_entries_line.startswith('num_entries: '):
            raise errors.BzrError('missing num_entries line')
        self._num_entries = int(num_entries_line[len('num_entries: '):-1])
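
    # The prelude this parses looks like (checksum and count illustrative):
    #
    #   #bazaar dirstate flat format 3\n
    #   crc32: 505926034\n
    #   num_entries: 2\n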
    def sha1_from_stat(self, path, stat_result, _pack_stat=pack_stat):
        """Find a sha1 given a stat lookup."""
        return self._get_packed_stat_index().get(_pack_stat(stat_result), None)

    def _get_packed_stat_index(self):
        """Get a packed_stat index of self._dirblocks."""
        if self._packed_stat_index is None:
            index = {}
            for key, tree_details in self._iter_entries():
                if tree_details[0][0] == 'f':
                    index[tree_details[0][4]] = tree_details[0][1]
            self._packed_stat_index = index
        return self._packed_stat_index
    def save(self):
        """Save any pending changes created during this session.

        We reuse the existing file, because that prevents race conditions with
        file creation, and use oslocks on it to prevent concurrent modification
        and reads - because dirstate's incremental data aggregation is not
        compatible with reading a modified file, and replacing a file in use by
        another process is impossible on Windows.

        A dirstate in read only mode should be smart enough though to validate
        that the file has not changed, and otherwise discard its cache and
        start over, to allow for fine grained read lock duration, so 'status'
        won't block 'commit' - for example.
        """
        if self._changes_aborted:
            # Should this be a warning? For now, I'm expecting that places that
            # mark it inconsistent will warn, making a warning here redundant.
            trace.mutter('Not saving DirState because '
                    '_changes_aborted is set.')
            return
        if (self._header_state == DirState.IN_MEMORY_MODIFIED or
            self._dirblock_state == DirState.IN_MEMORY_MODIFIED):

            grabbed_write_lock = False
            if self._lock_state != 'w':
                grabbed_write_lock, new_lock = self._lock_token.temporary_write_lock()
                # Switch over to the new lock, as the old one may be closed.
                # TODO: jam 20070315 We should validate the disk file has
                #       not changed contents. Since temporary_write_lock may
                #       not be an atomic operation.
                self._lock_token = new_lock
                self._state_file = new_lock.f
                if not grabbed_write_lock:
                    # We couldn't grab a write lock, so we switch back to a read one
                    return
            try:
                self._state_file.seek(0)
                self._state_file.writelines(self.get_lines())
                self._state_file.truncate()
                self._state_file.flush()
                self._header_state = DirState.IN_MEMORY_UNMODIFIED
                self._dirblock_state = DirState.IN_MEMORY_UNMODIFIED
            finally:
                if grabbed_write_lock:
                    self._lock_token = self._lock_token.restore_read_lock()
                    self._state_file = self._lock_token.f
                    # TODO: jam 20070315 We should validate the disk file has
                    #       not changed contents. Since restore_read_lock may
                    #       not be an atomic operation.
    def _set_data(self, parent_ids, dirblocks):
        """Set the full dirstate data in memory.

        This is an internal function used to completely replace the objects
        in memory state. It puts the dirstate into state 'full-dirty'.

        :param parent_ids: A list of parent tree revision ids.
        :param dirblocks: A list containing one tuple for each directory in the
            tree. Each tuple contains the directory path and a list of entries
            found in that directory.
        """
        # our memory copy is now authoritative.
        self._dirblocks = dirblocks
        self._header_state = DirState.IN_MEMORY_MODIFIED
        self._dirblock_state = DirState.IN_MEMORY_MODIFIED
        self._parents = list(parent_ids)
        self._id_index = None
        self._packed_stat_index = None
    def set_path_id(self, path, new_id):
        """Change the id of path to new_id in the current working tree.

        :param path: The path inside the tree to set - '' is the root, 'foo'
            is the path foo in the root.
        :param new_id: The new id to assign to the path. This must be a utf8
            file id (not unicode, and not None).
        """
        self._read_dirblocks_if_needed()
        if len(path):
            # TODO: logic not written
            raise NotImplementedError(self.set_path_id)
        # TODO: check new id is unique
        entry = self._get_entry(0, path_utf8=path)
        if entry[0][2] == new_id:
            # Nothing to change.
            return
        # mark the old path absent, and insert a new root path
        self._make_absent(entry)
        self.update_minimal(('', '', new_id), 'd',
            path_utf8='', packed_stat=entry[1][0][4])
        self._dirblock_state = DirState.IN_MEMORY_MODIFIED
        if self._id_index is not None:
            self._id_index.setdefault(new_id, set()).add(entry[0])
    def set_parent_trees(self, trees, ghosts):
        """Set the parent trees for the dirstate.

        :param trees: A list of revision_id, tree tuples. tree must be provided
            even if the revision_id refers to a ghost: supply an empty tree in
            this case.
        :param ghosts: A list of the revision_ids that are ghosts at the time
            of setting.
        """
        # TODO: generate a list of parent indexes to preserve to save
        # processing specific parent trees. In the common case one tree will
        # be preserved - the left most parent.
        # TODO: if the parent tree is a dirstate, we might want to walk them
        # all by path in parallel for 'optimal' common-case performance.
        # generate new root row.
        self._read_dirblocks_if_needed()
        # TODO future sketch: Examine the existing parents to generate a change
        # map and then walk the new parent trees only, mapping them into the
        # dirstate. Walk the dirstate at the same time to remove unreferenced
        # entries.
        # sketch: loop over all entries in the dirstate, cherry picking
        # entries from the parent trees, if they are not ghost trees.
        # after we finish walking the dirstate, all entries not in the dirstate
        # are deletes, so we want to append them to the end as per the design
        # discussions. So do a set difference on ids with the parents to
        # get deletes, and add them to the end.
        # During the update process we need to answer the following questions:
        # - find other keys containing a fileid in order to create cross-path
        #   links. We don't trivially use the inventory from other trees
        #   because this leads to either double touching, or to accessing
        #   missing keys.
        # - find other keys containing a path
        # We accumulate each entry via this dictionary, including the root
        by_path = {}
        id_index = {}
        # we could do parallel iterators, but because file id data may be
        # scattered throughout, we don't save on index overhead: we have to look
        # at everything anyway. We can probably save cycles by reusing parent
        # data and doing an incremental update when adding an additional
        # parent, but for now the common cases are adding a new parent (merge),
        # and replacing completely (commit), and commit is more common: so
        # optimise merge later.

        # ---- start generation of full tree mapping data
        # what trees should we use?
        parent_trees = [tree for rev_id, tree in trees if rev_id not in ghosts]
        # how many trees do we end up with
        parent_count = len(parent_trees)

        # one: the current tree
        for entry in self._iter_entries():
            # skip entries not in the current tree
            if entry[1][0][0] in 'ar': # absent, relocated
                continue
            by_path[entry[0]] = [entry[1][0]] + \
                [DirState.NULL_PARENT_DETAILS] * parent_count
            id_index[entry[0][2]] = set([entry[0]])

        # now the parent trees:
        for tree_index, tree in enumerate(parent_trees):
            # the index is off by one, adjust it.
            tree_index = tree_index + 1
            # when we add new locations for a fileid we need these ranges for
            # any fileid in this tree as we set the by_path[id] to:
            # already_processed_tree_details + new_details + new_location_suffix
            # the suffix is from tree_index+1:parent_count+1.
            new_location_suffix = [DirState.NULL_PARENT_DETAILS] * (parent_count - tree_index)
            # now stitch in all the entries from this tree
            for path, entry in tree.inventory.iter_entries_by_dir():
                # here we process each trees details for each item in the tree.
                # we first update any existing entries for the id at other paths,
                # then we either create or update the entry for the id at the
                # right path, and finally we add (if needed) a mapping from
                # file_id to this path. We do it in this order to allow us to
                # avoid checking all known paths for the id when generating a
                # new entry at this path: by adding the id->path mapping last,
                # all the mappings are valid and have correct relocation
                # records where needed.
                file_id = entry.file_id
                path_utf8 = path.encode('utf8')
                dirname, basename = osutils.split(path_utf8)
                new_entry_key = (dirname, basename, file_id)
                # tree index consistency: All other paths for this id in this tree
                # index must point to the correct path.
                for entry_key in id_index.setdefault(file_id, set()):
                    # TODO:PROFILING: It might be faster to just update
                    # rather than checking if we need to, and then overwrite
                    # the one we are located at.
                    if entry_key != new_entry_key:
                        # this file id is at a different path in one of the
                        # other trees, so put absent pointers there
                        # This is the vertical axis in the matrix, all pointing
                        # to the real path.
                        by_path[entry_key][tree_index] = ('r', path_utf8, 0, False, '')
                # by path consistency: Insert into an existing path record (trivial), or
                # add a new one with relocation pointers for the other tree indexes.
                if new_entry_key in id_index[file_id]:
                    # there is already an entry where this data belongs, just insert it.
                    by_path[new_entry_key][tree_index] = \
                        self._inv_entry_to_details(entry)
                else:
                    # add relocated entries to the horizontal axis - this row
                    # mapping from path,id. We need to look up the correct path
                    # for the indexes from 0 to tree_index -1
                    new_details = []
                    for lookup_index in xrange(tree_index):
                        # boundary case: this is the first occurrence of file_id
                        # so there are no id_index entries; possibly take this
                        # out of the loop?
                        if not len(id_index[file_id]):
                            new_details.append(DirState.NULL_PARENT_DETAILS)
                        else:
                            # grab any one entry, use it to find the right path.
                            # TODO: optimise this to reduce memory use in highly
                            # fragmented situations by reusing the relocation
                            # records.
                            a_key = iter(id_index[file_id]).next()
                            if by_path[a_key][lookup_index][0] in ('r', 'a'):
                                # it's a pointer or missing statement, use it as is.
                                new_details.append(by_path[a_key][lookup_index])
                            else:
                                # we have the right key, make a pointer to it.
                                real_path = ('/'.join(a_key[0:2])).strip('/')
                                new_details.append(('r', real_path, 0, False, ''))
                    new_details.append(self._inv_entry_to_details(entry))
                    new_details.extend(new_location_suffix)
                    by_path[new_entry_key] = new_details
                    id_index[file_id].add(new_entry_key)
        # --- end generation of full tree mappings

        # sort and output all the entries
        new_entries = self._sort_entries(by_path.items())
        self._entries_to_current_state(new_entries)
        self._parents = [rev_id for rev_id, tree in trees]
        self._ghosts = list(ghosts)
        self._header_state = DirState.IN_MEMORY_MODIFIED
        self._dirblock_state = DirState.IN_MEMORY_MODIFIED
        self._id_index = id_index
    def _sort_entries(self, entry_list):
        """Given a list of entries, sort them into the right order.

        This is done when constructing a new dirstate from trees - normally we
        try to keep everything in sorted blocks all the time, but sometimes
        it's easier to sort after the fact.
        """
        def _key(entry):
            # sort by: directory parts, file name, file id
            return entry[0][0].split('/'), entry[0][1], entry[0][2]
        return sorted(entry_list, key=_key)
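
    # Ordering sketch: sorting on the split directory parts keeps children
    # grouped under their parent, so 'a-b/c' sorts after 'a/z' because
    # ['a-b', 'c'] > ['a', 'z'], even though the plain string 'a-b/c' < 'a/z'.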
    def set_state_from_inventory(self, new_inv):
        """Set new_inv as the current state.

        This API is called by tree transform, and will usually occur with
        existing parent trees.

        :param new_inv: The inventory object to set current state from.
        """
        if 'evil' in debug.debug_flags:
            trace.mutter_callsite(1,
                "set_state_from_inventory called; please mutate the tree instead")
        self._read_dirblocks_if_needed()
        # sketch:
        # Two iterators: current data and new data, both in dirblock order.
        # We zip them together, which tells about entries that are new in the
        # inventory, or removed in the inventory, or present in both and
        # possibly changed.
        #
        # You might think we could just synthesize a new dirstate directly
        # since we're processing it in the right order. However, we need to
        # also consider there may be any number of parent trees and relocation
        # pointers, and we don't want to duplicate that here.
        new_iterator = new_inv.iter_entries_by_dir()
        # we will be modifying the dirstate, so we need a stable iterator. In
        # future we might write one, for now we just clone the state into a
        # list - which is a shallow copy.
        old_iterator = iter(list(self._iter_entries()))
        # both must have roots so this is safe:
        current_new = new_iterator.next()
        current_old = old_iterator.next()
        def advance(iterator):
            try:
                return iterator.next()
            except StopIteration:
                return None
        while current_new or current_old:
            # skip entries in old that are not really there
            if current_old and current_old[1][0][0] in 'ar':
                # relocated or absent
                current_old = advance(old_iterator)
                continue
            if current_new:
                # convert new into dirblock style
                new_path_utf8 = current_new[0].encode('utf8')
                new_dirname, new_basename = osutils.split(new_path_utf8)
                new_id = current_new[1].file_id
                new_entry_key = (new_dirname, new_basename, new_id)
                current_new_minikind = \
                    DirState._kind_to_minikind[current_new[1].kind]
                if current_new_minikind == 't':
                    fingerprint = current_new[1].reference_revision or ''
                else:
                    # We normally only insert or remove records, or update
                    # them when it has significantly changed. Then we want to
                    # erase its fingerprint. Unaffected records should
                    # normally not be updated at all.
                    fingerprint = ''
            else:
                # for safety disable variables
                new_path_utf8 = new_dirname = new_basename = new_id = \
                    new_entry_key = None
            # 5 cases; we don't have a value that is strictly greater than
            # everything, so we make both end conditions explicit
            if not current_old:
                # old is finished: insert current_new into the state.
                self.update_minimal(new_entry_key, current_new_minikind,
                    executable=current_new[1].executable,
                    path_utf8=new_path_utf8, fingerprint=fingerprint)
                current_new = advance(new_iterator)
            elif not current_new:
                # new is finished.
                self._make_absent(current_old)
                current_old = advance(old_iterator)
            elif new_entry_key == current_old[0]:
                # same - common case
                # We're looking at the same path and id in both the dirstate
                # and inventory, so just need to update the fields in the
                # dirstate from the one in the inventory.
                # TODO: update the record if anything significant has changed.
                # the minimal required trigger is if the execute bit or cached
                # kind has changed.
                if (current_old[1][0][3] != current_new[1].executable or
                    current_old[1][0][0] != current_new_minikind):
                    self.update_minimal(current_old[0], current_new_minikind,
                        executable=current_new[1].executable,
                        path_utf8=new_path_utf8, fingerprint=fingerprint)
                # both sides are dealt with, move on
                current_old = advance(old_iterator)
                current_new = advance(new_iterator)
            elif (cmp_by_dirs(new_dirname, current_old[0][0]) < 0
                  or (new_dirname == current_old[0][0]
                      and new_entry_key[1:] < current_old[0][1:])):
                # new comes before:
                # add an entry for this and advance new
                self.update_minimal(new_entry_key, current_new_minikind,
                    executable=current_new[1].executable,
                    path_utf8=new_path_utf8, fingerprint=fingerprint)
                current_new = advance(new_iterator)
            else:
                # we've advanced past the place where the old key would be,
                # without seeing it in the new list. so it must be gone.
                self._make_absent(current_old)
                current_old = advance(old_iterator)
        self._dirblock_state = DirState.IN_MEMORY_MODIFIED
        self._id_index = None
        self._packed_stat_index = None
    def _make_absent(self, current_old):
        """Mark current_old - an entry - as absent for tree 0.

        :return: True if this was the last details entry for the entry key:
            that is, if the underlying block has had the entry removed, thus
            shrinking in length.
        """
        # build up paths that this id will be left at after the change is made,
        # so we can update their cross references in tree 0
        all_remaining_keys = set()
        # Don't check the working tree, because it's going.
        for details in current_old[1][1:]:
            if details[0] not in 'ar': # absent, relocated
                all_remaining_keys.add(current_old[0])
            elif details[0] == 'r': # relocated
                # record the key for the real path.
                all_remaining_keys.add(tuple(osutils.split(details[1])) + (current_old[0][2],))
            # absent rows are not present at any path.
        last_reference = current_old[0] not in all_remaining_keys
        if last_reference:
            # the current row consists entirely of the current item (being
            # marked absent), and relocated or absent entries for the other
            # trees: remove it, it's meaningless.
            block = self._find_block(current_old[0])
            entry_index, present = self._find_entry_index(current_old[0], block[1])
            if not present:
                raise AssertionError('could not find entry for %s' % (current_old,))
            block[1].pop(entry_index)
            # if we have an id_index in use, remove this key from it for this id.
            if self._id_index is not None:
                self._id_index[current_old[0][2]].remove(current_old[0])
        # update all remaining keys for this id to record it as absent. The
        # existing details may either be the record we are marking as deleted
        # (if there were other trees with the id present at this path), or may
        # be relocations.
        for update_key in all_remaining_keys:
            update_block_index, present = \
                self._find_block_index_from_key(update_key)
            if not present:
                raise AssertionError('could not find block for %s' % (update_key,))
            update_entry_index, present = \
                self._find_entry_index(update_key, self._dirblocks[update_block_index][1])
            if not present:
                raise AssertionError('could not find entry for %s' % (update_key,))
            update_tree_details = self._dirblocks[update_block_index][1][update_entry_index][1]
            # it must not be absent at the moment
            if update_tree_details[0][0] == 'a': # absent
                raise AssertionError('bad row %r' % (update_tree_details,))
            update_tree_details[0] = DirState.NULL_PARENT_DETAILS
        self._dirblock_state = DirState.IN_MEMORY_MODIFIED
        return last_reference
    def update_minimal(self, key, minikind, executable=False, fingerprint='',
                       packed_stat=None, size=0, path_utf8=None):
        """Update an entry to the state in tree 0.

        This will either create a new entry at 'key' or update an existing one.
        It also makes sure that any other records which might mention this are
        updated as well.

        :param key: (dir, name, file_id) for the new entry
        :param minikind: The type for the entry ('f' == 'file', 'd' ==
            'directory'), etc.
        :param executable: Should the executable bit be set?
        :param fingerprint: Simple fingerprint for new entry: sha1 for files,
            referenced revision id for subtrees, etc.
        :param packed_stat: Packed stat value for new entry.
        :param size: Size information for new entry
        :param path_utf8: key[0] + '/' + key[1], just passed in to avoid doing
            the join again.

        If packed_stat and fingerprint are not given, they're invalidated in
        the entry.
        """
        block = self._find_block(key)[1]
        if packed_stat is None:
            packed_stat = DirState.NULLSTAT
        # XXX: Some callers pass '' as the packed_stat, and it seems to be
        # sometimes present in the dirstate - this seems oddly inconsistent.
        entry_index, present = self._find_entry_index(key, block)
        new_details = (minikind, fingerprint, size, executable, packed_stat)
        id_index = self._get_id_index()
        if not present:
            # new entry, synthesis cross reference here,
            existing_keys = id_index.setdefault(key[2], set())
            if not existing_keys:
                # not currently in the state, simplest case
                new_entry = key, [new_details] + self._empty_parent_info()
            else:
                # present at one or more existing other paths.
                # grab one of them and use it to generate parent
                # relocation/absent entries.
                new_entry = key, [new_details]
                for other_key in existing_keys:
                    # change the record at other to be a pointer to this new
                    # record. The loop looks similar to the change to
                    # relocations when updating an existing record but it's not:
                    # the test for existing kinds is different: this can be
                    # factored out to a helper though.
                    other_block_index, present = self._find_block_index_from_key(other_key)
                    if not present:
                        raise AssertionError('could not find block for %s' % (other_key,))
                    other_entry_index, present = self._find_entry_index(other_key,
                        self._dirblocks[other_block_index][1])
                    if not present:
                        raise AssertionError('could not find entry for %s' % (other_key,))
                    if path_utf8 is None:
                        raise AssertionError('no path')
                    self._dirblocks[other_block_index][1][other_entry_index][1][0] = \
                        ('r', path_utf8, 0, False, '')

                num_present_parents = self._num_present_parents()
                for lookup_index in xrange(1, num_present_parents + 1):
                    # grab any one entry, use it to find the right path.
                    # TODO: optimise this to reduce memory use in highly
                    # fragmented situations by reusing the relocation
                    # records.
                    update_block_index, present = \
                        self._find_block_index_from_key(other_key)
                    if not present:
                        raise AssertionError('could not find block for %s' % (other_key,))
                    update_entry_index, present = \
                        self._find_entry_index(other_key, self._dirblocks[update_block_index][1])
                    if not present:
                        raise AssertionError('could not find entry for %s' % (other_key,))
                    update_details = self._dirblocks[update_block_index][1][update_entry_index][1][lookup_index]
                    if update_details[0] in 'ar': # relocated, absent
                        # it's a pointer or absent in lookup_index's tree, use
                        # it as is.
                        new_entry[1].append(update_details)
                    else:
                        # we have the right key, make a pointer to it.
                        pointer_path = osutils.pathjoin(*other_key[0:2])
                        new_entry[1].append(('r', pointer_path, 0, False, ''))
            block.insert(entry_index, new_entry)
            existing_keys.add(key)
        else:
            # Does the new state matter?
            block[entry_index][1][0] = new_details
            # parents cannot be affected by what we do.
            # other occurrences of this id can be found
            # from the id index.

            # tree index consistency: All other paths for this id in this tree
            # index must point to the correct path. We have to loop here because
            # we may have passed entries in the state with this file id already
            # that were absent - where parent entries are - and they need to be
            # converted to relocated.
            if path_utf8 is None:
                raise AssertionError('no path')
            for entry_key in id_index.setdefault(key[2], set()):
                # TODO:PROFILING: It might be faster to just update
                # rather than checking if we need to, and then overwrite
                # the one we are located at.
                if entry_key != key:
                    # this file id is at a different path in one of the
                    # other trees, so put absent pointers there
                    # This is the vertical axis in the matrix, all pointing
                    # to the real path.
                    block_index, present = self._find_block_index_from_key(entry_key)
                    if not present:
                        raise AssertionError('not present: %r', entry_key)
                    entry_index, present = self._find_entry_index(entry_key, self._dirblocks[block_index][1])
                    if not present:
                        raise AssertionError('not present: %r', entry_key)
                    self._dirblocks[block_index][1][entry_index][1][0] = \
                        ('r', path_utf8, 0, False, '')
        # add a containing dirblock if needed.
        if new_details[0] == 'd':
            subdir_key = (osutils.pathjoin(*key[0:2]), '', '')
            block_index, present = self._find_block_index_from_key(subdir_key)
            if not present:
                self._dirblocks.insert(block_index, (subdir_key[0], []))

        self._dirblock_state = DirState.IN_MEMORY_MODIFIED
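
    # Hypothetical call: record a new file 'foo/bar' with file id 'bar-id' in
    # tree 0, leaving the stat/fingerprint invalidated so the next status
    # pass re-reads it:
    #
    #   state.update_minimal(('foo', 'bar', 'bar-id'), 'f', path_utf8='foo/bar')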
    def _validate(self):
        """Check that invariants on the dirblock are correct.

        This can be useful in debugging; it shouldn't be necessary in
        normal code.

        This must be called with a lock held.
        """
        # NOTE: This must always raise AssertionError not just assert,
        # otherwise it may not behave properly under python -O
        #
        # TODO: All entries must have some content that's not 'a' or 'r',
        # otherwise it could just be removed.
        #
        # TODO: All relocations must point directly to a real entry.
        #
        # TODO: No repeated keys.
        from pprint import pformat
        self._read_dirblocks_if_needed()
        if len(self._dirblocks) > 0:
            if not self._dirblocks[0][0] == '':
                raise AssertionError(
                    "dirblocks don't start with root block:\n" + \
                    pformat(self._dirblocks))
        if len(self._dirblocks) > 1:
            if not self._dirblocks[1][0] == '':
                raise AssertionError(
                    "dirblocks missing root directory:\n" + \
                    pformat(self._dirblocks))
        # the dirblocks are sorted by their path components, name, and dir id
        dir_names = [d[0].split('/')
                for d in self._dirblocks[1:]]
        if dir_names != sorted(dir_names):
            raise AssertionError(
                "dir names are not in sorted order:\n" + \
                pformat(self._dirblocks) + \
                "\nkeys:\n" + \
                pformat(dir_names))
        for dirblock in self._dirblocks:
            # within each dirblock, the entries are sorted by filename and
            # then by file id.
            for entry in dirblock[1]:
                if dirblock[0] != entry[0][0]:
                    raise AssertionError(
                        "entry key for %r "
                        "doesn't match directory name in\n%r" %
                        (entry, pformat(dirblock)))
            if dirblock[1] != sorted(dirblock[1]):
                raise AssertionError(
                    "dirblock for %r is not sorted:\n%s" % \
                    (dirblock[0], pformat(dirblock)))

        def check_valid_parent():
            """Check that the current entry has a valid parent.

            This makes sure that the parent has a record,
            and that the parent isn't marked as "absent" in the
            current tree. (It is invalid to have a non-absent file in an absent
            directory.)
            """
            if entry[0][0:2] == ('', ''):
                # There should be no parent for the root row
                return
            parent_entry = self._get_entry(tree_index, path_utf8=entry[0][0])
            if parent_entry == (None, None):
                raise AssertionError(
                    "no parent entry for: %s in tree %s"
                    % (this_path, tree_index))
            if parent_entry[1][tree_index][0] != 'd':
                raise AssertionError(
                    "Parent entry for %s is not marked as a valid"
                    " directory. %s" % (this_path, parent_entry,))

        # For each file id, for each tree: either
        # the file id is not present at all; all rows with that id in the
        # key have it marked as 'absent'
        # OR the file id is present under exactly one name; any other entries
        # that mention that id point to the correct name.
        #
        # We check this with a dict per tree pointing either to the present
        # name, or None if absent.
        tree_count = self._num_present_parents() + 1
        id_path_maps = [dict() for i in range(tree_count)]
        # Make sure that all renamed entries point to the correct location.
        for entry in self._iter_entries():
            file_id = entry[0][2]
            this_path = osutils.pathjoin(entry[0][0], entry[0][1])
            if len(entry[1]) != tree_count:
                raise AssertionError(
                    "wrong number of entry details for row\n%s" \
                    ",\nexpected %d" % \
                    (pformat(entry), tree_count))
            absent_positions = 0
            for tree_index, tree_state in enumerate(entry[1]):
                this_tree_map = id_path_maps[tree_index]
                minikind = tree_state[0]
                if minikind in 'ar':
                    absent_positions += 1
                # have we seen this id before in this column?
                if file_id in this_tree_map:
                    previous_path, previous_loc = this_tree_map[file_id]
                    # any later mention of this file must be consistent with
                    # what was said before
                    if minikind == 'a':
                        if previous_path is not None:
                            raise AssertionError(
                                "file %s is absent in row %r but also present " \
                                "at %r" % \
                                (file_id, entry, previous_path))
                    elif minikind == 'r':
                        target_location = tree_state[1]
                        if previous_path != target_location:
                            raise AssertionError(
                                "file %s relocation in row %r but also at %r" \
                                % (file_id, entry, previous_path))
                    else:
                        # a file, directory, etc - may have been previously
                        # pointed to by a relocation, which must point here
                        if previous_path != this_path:
                            raise AssertionError(
                                "entry %r inconsistent with previous path %r "
                                "seen at %r" %
                                (entry, previous_path, previous_loc))
                        check_valid_parent()
                else:
                    if minikind == 'a':
                        # absent; should not occur anywhere else
                        this_tree_map[file_id] = None, this_path
                    elif minikind == 'r':
                        # relocation, must occur at expected location
                        this_tree_map[file_id] = tree_state[1], this_path
                    else:
                        this_tree_map[file_id] = this_path, this_path
                        check_valid_parent()
            if absent_positions == tree_count:
                raise AssertionError(
                    "entry %r has no data for any tree." % (entry,))
    def _wipe_state(self):
        """Forget all state information about the dirstate."""
        self._header_state = DirState.NOT_IN_MEMORY
        self._dirblock_state = DirState.NOT_IN_MEMORY
        self._changes_aborted = False
        self._parents = []
        self._ghosts = []
        self._dirblocks = []
        self._id_index = None
        self._packed_stat_index = None
        self._end_of_header = None
        self._cutoff_time = None
        self._split_path_cache = {}
    def lock_read(self):
        """Acquire a read lock on the dirstate."""
        if self._lock_token is not None:
            raise errors.LockContention(self._lock_token)
        # TODO: jam 20070301 Rather than wiping completely, if the blocks are
        #       already in memory, we could read just the header and check for
        #       any modification. If not modified, we can just leave things
        #       alone.
        self._lock_token = lock.ReadLock(self._filename)
        self._lock_state = 'r'
        self._state_file = self._lock_token.f
        self._wipe_state()

    def lock_write(self):
        """Acquire a write lock on the dirstate."""
        if self._lock_token is not None:
            raise errors.LockContention(self._lock_token)
        # TODO: jam 20070301 Rather than wiping completely, if the blocks are
        #       already in memory, we could read just the header and check for
        #       any modification. If not modified, we can just leave things
        #       alone.
        self._lock_token = lock.WriteLock(self._filename)
        self._lock_state = 'w'
        self._state_file = self._lock_token.f
        self._wipe_state()

    def unlock(self):
        """Drop any locks held on the dirstate."""
        if self._lock_token is None:
            raise errors.LockNotHeld(self)
        # TODO: jam 20070301 Rather than wiping completely, if the blocks are
        #       already in memory, we could read just the header and check for
        #       any modification. If not modified, we can just leave things
        #       alone.
        self._state_file = None
        self._lock_state = None
        self._lock_token.unlock()
        self._lock_token = None
        self._split_path_cache = {}

    def _requires_lock(self):
        """Check that a lock is currently held by someone on the dirstate."""
        if not self._lock_token:
            raise errors.ObjectNotLocked(self)

def py_update_entry(state, entry, abspath, stat_value,
                 _stat_to_minikind=DirState._stat_to_minikind,
                 _pack_stat=pack_stat):
    """Update the entry based on what is actually on disk.

    This function only calculates the sha if it needs to - if the entry is
    uncachable, or clearly different to the first parent's entry, no sha
    is calculated, and None is returned.

    :param state: The dirstate this entry is in.
    :param entry: This is the dirblock entry for the file in question.
    :param abspath: The path on disk for this file.
    :param stat_value: The stat value done on the path.
    :return: None, or the sha1 hexdigest of the file (40 bytes) or link
        target of a symlink.
    """
    try:
        minikind = _stat_to_minikind[stat_value.st_mode & 0170000]
    except KeyError:
        # Unhandled kind
        return None
    packed_stat = _pack_stat(stat_value)
    (saved_minikind, saved_link_or_sha1, saved_file_size,
     saved_executable, saved_packed_stat) = entry[1][0]

    if (minikind == saved_minikind
        and packed_stat == saved_packed_stat):
        # The stat hasn't changed since we saved, so we can re-use the
        # saved sha hash.
        if minikind == 'd':
            return None

        # size should also be in packed_stat
        if saved_file_size == stat_value.st_size:
            return saved_link_or_sha1

    # If we have gotten this far, that means that we need to actually
    # process this entry.
    link_or_sha1 = None
    if minikind == 'f':
        executable = state._is_executable(stat_value.st_mode,
                                          saved_executable)
        if state._cutoff_time is None:
            state._sha_cutoff_time()
        if (stat_value.st_mtime < state._cutoff_time
            and stat_value.st_ctime < state._cutoff_time
            and len(entry[1]) > 1
            and entry[1][1][0] != 'a'):
            # Could check for size changes for further optimised
            # avoidance of sha1's. However the most prominent case of
            # over-shaing is during initial add, which this catches.
            # Besides, if content filtering happens, size and sha
            # need to be checked together - checking just the size
            # is meaningless.
            if state._filter_provider is None:
                filter_list = []
            else:
                relpath = osutils.pathjoin(entry[0][0], entry[0][1])
                file_id = entry[0][2]
                filter_list = state._filter_provider(relpath, file_id)
            link_or_sha1 = state._size_sha1_file(abspath, filter_list)[1]
            entry[1][0] = ('f', link_or_sha1, stat_value.st_size,
                           executable, packed_stat)
        else:
            entry[1][0] = ('f', '', stat_value.st_size,
                           executable, DirState.NULLSTAT)
    elif minikind == 'd':
        link_or_sha1 = None
        entry[1][0] = ('d', '', 0, False, packed_stat)
        if saved_minikind != 'd':
            # This changed from something into a directory. Make sure we
            # have a directory block for it. This doesn't happen very
            # often, so this doesn't have to be super fast.
            block_index, entry_index, dir_present, file_present = \
                state._get_block_entry_index(entry[0][0], entry[0][1], 0)
            state._ensure_block(block_index, entry_index,
                                osutils.pathjoin(entry[0][0], entry[0][1]))
    elif minikind == 'l':
        link_or_sha1 = state._read_link(abspath, saved_link_or_sha1)
        if state._cutoff_time is None:
            state._sha_cutoff_time()
        if (stat_value.st_mtime < state._cutoff_time
            and stat_value.st_ctime < state._cutoff_time):
            entry[1][0] = ('l', link_or_sha1, stat_value.st_size,
                           False, packed_stat)
        else:
            entry[1][0] = ('l', '', stat_value.st_size,
                           False, DirState.NULLSTAT)
    state._dirblock_state = DirState.IN_MEMORY_MODIFIED
    return link_or_sha1

update_entry = py_update_entry
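
# Usage sketch (the entry and stat value come from a real dirstate walk;
# paths here are illustrative):
#
#   stat_value = os.lstat('/wt/foo')
#   link_or_sha1 = update_entry(state, entry, '/wt/foo', stat_value)
#   # None when nothing needed hashing, else the file sha1 or link target.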

class ProcessEntryPython(object):

    __slots__ = ["old_dirname_to_file_id", "new_dirname_to_file_id", "uninteresting",
        "last_source_parent", "last_target_parent", "include_unchanged",
        "use_filesystem_for_exec", "utf8_decode", "searched_specific_files",
        "search_specific_files", "state", "source_index", "target_index",
        "want_unversioned", "tree"]

    def __init__(self, include_unchanged, use_filesystem_for_exec,
        search_specific_files, state, source_index, target_index,
        want_unversioned, tree):
        self.old_dirname_to_file_id = {}
        self.new_dirname_to_file_id = {}
        # Just a sentinel, so that _process_entry can say that this
        # record is handled, but isn't interesting to process (unchanged)
        self.uninteresting = object()
        # Using a list so that we can access the values and change them in
        # nested scope. Each one is [path, file_id, entry]
        self.last_source_parent = [None, None]
        self.last_target_parent = [None, None]
        self.include_unchanged = include_unchanged
        self.use_filesystem_for_exec = use_filesystem_for_exec
        self.utf8_decode = cache_utf8._utf8_decode
        # for all search indexes in each path at or under each element of
        # search_specific_files, if the detail is relocated: add the id, and
        # add the relocated path as one to search if it's not searched already.
        # If the detail is not relocated, add the id.
        self.searched_specific_files = set()
        self.search_specific_files = search_specific_files
        self.state = state
        self.source_index = source_index
        self.target_index = target_index
        self.want_unversioned = want_unversioned
        self.tree = tree
2901
def _process_entry(self, entry, path_info, pathjoin=osutils.pathjoin):
2902
"""Compare an entry and real disk to generate delta information.
2904
:param path_info: top_relpath, basename, kind, lstat, abspath for
2905
the path of entry. If None, then the path is considered absent.
2906
(Perhaps we should pass in a concrete entry for this ?)
2907
Basename is returned as a utf8 string because we expect this
2908
tuple will be ignored, and don't want to take the time to
2910
:return: None if these don't match
2911
A tuple of information about the change, or
2912
the object 'uninteresting' if these match, but are
2913
basically identical.
2915
if self.source_index is None:
2916
source_details = DirState.NULL_PARENT_DETAILS
2918
source_details = entry[1][self.source_index]
2919
target_details = entry[1][self.target_index]
2920
target_minikind = target_details[0]
2921
if path_info is not None and target_minikind in 'fdlt':
2922
if not (self.target_index == 0):
2923
raise AssertionError()
2924
link_or_sha1 = update_entry(self.state, entry,
2925
abspath=path_info[4], stat_value=path_info[3])
2926
# The entry may have been modified by update_entry
2927
target_details = entry[1][self.target_index]
2928
target_minikind = target_details[0]
2931
file_id = entry[0][2]
2932
source_minikind = source_details[0]
2933
if source_minikind in 'fdltr' and target_minikind in 'fdlt':
2934
# claimed content in both: diff
2935
# r | fdlt | | add source to search, add id path move and perform
2936
# | | | diff check on source-target
2937
# r | fdlt | a | dangling file that was present in the basis.
2939
if source_minikind in 'r':
2940
# add the source to the search path to find any children it
2941
# has. TODO ? : only add if it is a container ?
2942
if not osutils.is_inside_any(self.searched_specific_files,
2944
self.search_specific_files.add(source_details[1])
2945
# generate the old path; this is needed for stating later
2947
old_path = source_details[1]
2948
old_dirname, old_basename = os.path.split(old_path)
2949
path = pathjoin(entry[0][0], entry[0][1])
2950
old_entry = self.state._get_entry(self.source_index,
2952
# update the source details variable to be the real
2954
if old_entry == (None, None):
2955
raise errors.CorruptDirstate(self.state._filename,
2956
"entry '%s/%s' is considered renamed from %r"
2957
" but source does not exist\n"
2958
"entry: %s" % (entry[0][0], entry[0][1], old_path, entry))
2959
source_details = old_entry[1][self.source_index]
2960
source_minikind = source_details[0]
2962
old_dirname = entry[0][0]
2963
old_basename = entry[0][1]
2964
old_path = path = None
2965
if path_info is None:
2966
# the file is missing on disk, show as removed.
2967
content_change = True
2971
# source and target are both versioned and disk file is present.
2972
target_kind = path_info[2]
2973
if target_kind == 'directory':
2975
old_path = path = pathjoin(old_dirname, old_basename)
2976
self.new_dirname_to_file_id[path] = file_id
2977
if source_minikind != 'd':
2978
content_change = True
2980
# directories have no fingerprint
2981
content_change = False
2983
elif target_kind == 'file':
2984
if source_minikind != 'f':
2985
content_change = True
2987
# If the size is the same, check the sha:
2988
if target_details[2] == source_details[2]:
2989
if link_or_sha1 is None:
2991
file_obj = file(path_info[4], 'rb')
2993
statvalue = os.fstat(file_obj.fileno())
2994
link_or_sha1 = osutils.sha_file(file_obj)
2997
self.state._observed_sha1(entry, link_or_sha1,
2999
content_change = (link_or_sha1 != source_details[1])
3001
# Size changed, so must be different
3002
content_change = True
3003
# Target details is updated at update_entry time
3004
if self.use_filesystem_for_exec:
3005
# We don't need S_ISREG here, because we are sure
3006
# we are dealing with a file.
3007
target_exec = bool(stat.S_IEXEC & path_info[3].st_mode)
3009
target_exec = target_details[3]
3010
elif target_kind == 'symlink':
3011
if source_minikind != 'l':
3012
content_change = True
3014
content_change = (link_or_sha1 != source_details[1])
3016
elif target_kind == 'tree-reference':
3017
if source_minikind != 't':
3018
content_change = True
3020
content_change = False
3023
raise Exception, "unknown kind %s" % path_info[2]
3024
if source_minikind == 'd':
3026
old_path = path = pathjoin(old_dirname, old_basename)
3027
self.old_dirname_to_file_id[old_path] = file_id
3028
# parent id is the entry for the path in the target tree
3029
if old_dirname == self.last_source_parent[0]:
3030
source_parent_id = self.last_source_parent[1]
3033
source_parent_id = self.old_dirname_to_file_id[old_dirname]
3035
source_parent_entry = self.state._get_entry(self.source_index,
3036
path_utf8=old_dirname)
3037
source_parent_id = source_parent_entry[0][2]
3038
if source_parent_id == entry[0][2]:
3039
# This is the root, so the parent is None
3040
source_parent_id = None
3042
self.last_source_parent[0] = old_dirname
3043
self.last_source_parent[1] = source_parent_id
            new_dirname = entry[0][0]
            if new_dirname == self.last_target_parent[0]:
                target_parent_id = self.last_target_parent[1]
            else:
                try:
                    target_parent_id = self.new_dirname_to_file_id[new_dirname]
                except KeyError:
                    # TODO: We don't always need to do the lookup, because the
                    #       parent entry will be the same as the source entry.
                    target_parent_entry = self.state._get_entry(self.target_index,
                                                                path_utf8=new_dirname)
                    if target_parent_entry == (None, None):
                        raise AssertionError(
                            "Could not find target parent in wt: %s\nparent of: %s"
                            % (new_dirname, entry))
                    target_parent_id = target_parent_entry[0][2]
                if target_parent_id == entry[0][2]:
                    # This is the root, so the parent is None
                    target_parent_id = None
                else:
                    self.last_target_parent[0] = new_dirname
                    self.last_target_parent[1] = target_parent_id
            source_exec = source_details[3]
            if (self.include_unchanged
                or content_change
                or source_parent_id != target_parent_id
                or old_basename != entry[0][1]
                or source_exec != target_exec
                ):
                if old_path is None:
                    old_path = path = pathjoin(old_dirname, old_basename)
                    old_path_u = self.utf8_decode(old_path)[0]
                    path_u = old_path_u
                else:
                    old_path_u = self.utf8_decode(old_path)[0]
                    if old_path == path:
                        path_u = old_path_u
                    else:
                        path_u = self.utf8_decode(path)[0]
                source_kind = DirState._minikind_to_kind[source_minikind]
                return (entry[0][2],
                       (old_path_u, path_u),
                       content_change,
                       (True, True),
                       (source_parent_id, target_parent_id),
                       (self.utf8_decode(old_basename)[0], self.utf8_decode(entry[0][1])[0]),
                       (source_kind, target_kind),
                       (source_exec, target_exec))
            else:
                return self.uninteresting
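            # Sketch of the returned 8-tuple (values hypothetical):
            #   (file_id, (old_path, new_path), content_change,
            #    (source_versioned, target_versioned),
            #    (source_parent_id, target_parent_id),
            #    (old_name, new_name), (source_kind, target_kind),
            #    (source_exec, target_exec))
            # e.g. a plain edit to foo.txt comes back as
            #   (id, (u'foo.txt', u'foo.txt'), True, (True, True), ...).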
        elif source_minikind in 'a' and target_minikind in 'fdlt':
            # looks like a new file
            path = pathjoin(entry[0][0], entry[0][1])
            # parent id is the entry for the path in the target tree
            # TODO: these are the same for an entire directory: cache them.
            parent_id = self.state._get_entry(self.target_index,
                                              path_utf8=entry[0][0])[0][2]
            if parent_id == entry[0][2]:
                parent_id = None
            if path_info is not None:
                # Present on disk:
                if self.use_filesystem_for_exec:
                    # We need S_ISREG here, because we aren't sure if this
                    # is a file or not.
                    target_exec = bool(
                        stat.S_ISREG(path_info[3].st_mode)
                        and stat.S_IEXEC & path_info[3].st_mode)
                else:
                    target_exec = target_details[3]
                return (entry[0][2],
                       (None, self.utf8_decode(path)[0]),
                       True,
                       (False, True),
                       (None, parent_id),
                       (None, self.utf8_decode(entry[0][1])[0]),
                       (None, path_info[2]),
                       (None, target_exec))
            else:
                # It's a missing file, report it as such.
                return (entry[0][2],
                       (None, self.utf8_decode(path)[0]),
                       False,
                       (False, True),
                       (None, parent_id),
                       (None, self.utf8_decode(entry[0][1])[0]),
                       (None, None),
                       (None, False))
        elif source_minikind in 'fdlt' and target_minikind in 'a':
            # unversioned, possibly, or possibly not deleted: we don't care.
            # if it's still on disk, *and* there's no other entry at this
            # path (we don't know this in this routine at the moment -
            # perhaps we should change this), then it would be an unknown.
            old_path = pathjoin(entry[0][0], entry[0][1])
            # parent id is the entry for the path in the source tree
            parent_id = self.state._get_entry(self.source_index, path_utf8=entry[0][0])[0][2]
            if parent_id == entry[0][2]:
                parent_id = None
            return (entry[0][2],
                   (self.utf8_decode(old_path)[0], None),
                   True,
                   (True, False),
                   (parent_id, None),
                   (self.utf8_decode(entry[0][1])[0], None),
                   (DirState._minikind_to_kind[source_minikind], None),
                   (source_details[3], None))
        elif source_minikind in 'fdlt' and target_minikind in 'r':
            # a rename; could be a true rename, or a rename inherited from
            # a renamed parent. TODO: handle this efficiently. It's not a
            # common case to rename dirs though, so a correct but slow
            # implementation will do.
            if not osutils.is_inside_any(self.searched_specific_files, target_details[1]):
                self.search_specific_files.add(target_details[1])
        elif source_minikind in 'ra' and target_minikind in 'ra':
            # neither of the selected trees contains this file,
            # so skip over it. This is not currently directly tested, but
            # is indirectly via test_too_much.TestCommands.test_conflicts.
            pass
        else:
            raise AssertionError("don't know how to compare "
                "source_minikind=%r, target_minikind=%r"
                % (source_minikind, target_minikind))
        ## import pdb;pdb.set_trace()
        return None

    def __iter__(self):
        return self
    def iter_changes(self):
        """Iterate over the changes."""
        utf8_decode = cache_utf8._utf8_decode
        _cmp_by_dirs = cmp_by_dirs
        _process_entry = self._process_entry
        uninteresting = self.uninteresting
        search_specific_files = self.search_specific_files
        searched_specific_files = self.searched_specific_files
        splitpath = osutils.splitpath
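        # Note: the local names above rebind module- and instance-level
        # attributes once, outside the hot loop below; in CPython a local
        # variable lookup is cheaper than a global or attribute lookup, so
        # e.g. utf8_decode(...) avoids re-resolving cache_utf8._utf8_decode
        # for every path processed.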
        # sketch:
        # compare source_index and target_index at or under each element of
        # search_specific_files, following the comparison table below. Note
        # that we only want to do diff operations when the target is fdl
        # because that's when the walkdirs logic will have exposed the
        # pathinfo for the target.
        # cases:
        #
        # Source | Target | disk | action
        #   r    | fdlt   |      | add source to search, add id path move and perform
        #        |        |      | diff check on source-target
        #   r    | fdlt   |  a   | dangling file that was present in the basis.
        #        |        |      | ???
        #   r    |  a     |      | add source to search
        #   r    |  a     |  a   |
        #   r    |  r     |      | this path is present in a non-examined tree, skip.
        #   r    |  r     |  a   | this path is present in a non-examined tree, skip.
        #   a    | fdlt   |      | add new id
        #   a    | fdlt   |  a   | dangling locally added file, skip
        #   a    |  a     |      | not present in either tree, skip
        #   a    |  a     |  a   | not present in any tree, skip
        #   a    |  r     |      | not present in either tree at this path, skip as it
        #        |        |      | may not be selected by the user's list of paths.
        #   a    |  r     |  a   | not present in either tree at this path, skip as it
        #        |        |      | may not be selected by the user's list of paths.
        #  fdlt  | fdlt   |      | content in both: diff them
        #  fdlt  | fdlt   |  a   | deleted locally, but not unversioned - show as deleted ?
        #  fdlt  |  a     |      | unversioned: output deleted id for now
        #  fdlt  |  a     |  a   | unversioned and deleted: output deleted id
        #  fdlt  |  r     |      | relocated in this tree, so add target to search.
        #        |        |      | Don't diff, we will see an r,fd; pair when we reach
        #        |        |      | this id at the other path.
        #  fdlt  |  r     |  a   | relocated in this tree, so add target to search.
        #        |        |      | Don't diff, we will see an r,fd; pair when we reach
        #        |        |      | this id at the other path.

        # TODO: jam 20070516 - Avoid the _get_entry lookup overhead by
        #       keeping a cache of directories that we have seen.
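        # Worked example (hypothetical): if 'a/b' was renamed to 'c/b' in the
        # target, the row at 'c/b' has source minikind 'r' and target 'f'
        # (first table row): 'a/b' is added to the search list and the diff
        # runs against the real source entry, while the row at 'a/b'
        # (source 'f', target 'r') only queues the target path - so the pair
        # is reported exactly once.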
        while search_specific_files:
            # TODO: the pending list should be lexically sorted?  the
            # interface doesn't require it.
            current_root = search_specific_files.pop()
            current_root_unicode = current_root.decode('utf8')
            searched_specific_files.add(current_root)
            # process the entries for this containing directory: the rest will be
            # found by their parents recursively.
            root_entries = self.state._entries_for_path(current_root)
            root_abspath = self.tree.abspath(current_root_unicode)
            try:
                root_stat = os.lstat(root_abspath)
            except OSError, e:
                if e.errno == errno.ENOENT:
                    # the path does not exist: let _process_entry know that.
                    root_dir_info = None
                else:
                    # some other random error: hand it up.
                    raise
            else:
                root_dir_info = ('', current_root,
                    osutils.file_kind_from_stat_mode(root_stat.st_mode), root_stat,
                    root_abspath)
                if root_dir_info[2] == 'directory':
                    if self.tree._directory_is_tree_reference(
                        current_root.decode('utf8')):
                        root_dir_info = root_dir_info[:2] + \
                            ('tree-reference',) + root_dir_info[3:]
            if not root_entries and not root_dir_info:
                # this specified path is not present at all, skip it.
                continue
            path_handled = False
            for entry in root_entries:
                result = _process_entry(entry, root_dir_info)
                if result is not None:
                    path_handled = True
                    if result is not uninteresting:
                        yield result
            if self.want_unversioned and not path_handled and root_dir_info:
                new_executable = bool(
                    stat.S_ISREG(root_dir_info[3].st_mode)
                    and stat.S_IEXEC & root_dir_info[3].st_mode)
                yield (None,
                       (None, current_root_unicode),
                       True,
                       (False, False),
                       (None, None),
                       (None, splitpath(current_root_unicode)[-1]),
                       (None, root_dir_info[2]),
                       (None, new_executable)
                      )
            initial_key = (current_root, '', '')
            block_index, _ = self.state._find_block_index_from_key(initial_key)
            if block_index == 0:
                # we have processed the total root already, but because the
                # initial key matched it we should skip it here.
                block_index += 1
            if root_dir_info and root_dir_info[2] == 'tree-reference':
                current_dir_info = None
            else:
                dir_iterator = osutils._walkdirs_utf8(root_abspath,
                    prefix=current_root)
                try:
                    current_dir_info = dir_iterator.next()
                except OSError, e:
                    # on win32, python2.4 has e.errno == ERROR_DIRECTORY, but
                    # python 2.5 has e.errno == EINVAL,
                    #            and e.winerror == ERROR_DIRECTORY
                    e_winerror = getattr(e, 'winerror', None)
                    win_errors = (ERROR_DIRECTORY, ERROR_PATH_NOT_FOUND)
                    # there may be directories in the inventory even though
                    # this path is not a file on disk: so mark it as end of
                    # iterator
                    if e.errno in (errno.ENOENT, errno.ENOTDIR, errno.EINVAL):
                        current_dir_info = None
                    elif (sys.platform == 'win32'
                          and (e.errno in win_errors
                               or e_winerror in win_errors)):
                        current_dir_info = None
                    else:
                        raise
                else:
                    if current_dir_info[0][0] == '':
                        # remove .bzr from iteration
                        bzr_index = bisect.bisect_left(current_dir_info[1], ('.bzr',))
                        if current_dir_info[1][bzr_index][0] != '.bzr':
                            raise AssertionError()
                        del current_dir_info[1][bzr_index]
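                        # Note: bisect_left is usable above because
                        # current_dir_info[1] is sorted by name; e.g.
                        # (illustrative) in [('.bzr', ...), ('a', ...),
                        # ('b', ...)] it finds index 0 in O(log n) instead of
                        # scanning the whole block.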
            # walk until both the directory listing and the versioned metadata
            # are exhausted.
            if (block_index < len(self.state._dirblocks) and
                osutils.is_inside(current_root, self.state._dirblocks[block_index][0])):
                current_block = self.state._dirblocks[block_index]
            else:
                current_block = None
            while (current_dir_info is not None or
                   current_block is not None):
                if (current_dir_info and current_block
                    and current_dir_info[0][0] != current_block[0]):
                    if _cmp_by_dirs(current_dir_info[0][0], current_block[0]) < 0:
                        # filesystem data refers to paths not covered by the dirblock.
                        # this has two possibilities:
                        # A) it is versioned but empty, so there is no block for it
                        # B) it is not versioned.

                        # if (A) then we need to recurse into it to check for
                        # new unknown files or directories.
                        # if (B) then we should ignore it, because we don't
                        # recurse into unknown directories.
                        path_index = 0
                        while path_index < len(current_dir_info[1]):
                            current_path_info = current_dir_info[1][path_index]
                            if self.want_unversioned:
                                if current_path_info[2] == 'directory':
                                    if self.tree._directory_is_tree_reference(
                                        current_path_info[0].decode('utf8')):
                                        current_path_info = current_path_info[:2] + \
                                            ('tree-reference',) + current_path_info[3:]
                                new_executable = bool(
                                    stat.S_ISREG(current_path_info[3].st_mode)
                                    and stat.S_IEXEC & current_path_info[3].st_mode)
                                yield (None,
                                    (None, utf8_decode(current_path_info[0])[0]),
                                    True,
                                    (False, False),
                                    (None, None),
                                    (None, utf8_decode(current_path_info[1])[0]),
                                    (None, current_path_info[2]),
                                    (None, new_executable))
                            # don't descend into this unversioned path if it is
                            # a dir
                            if current_path_info[2] in ('directory',
                                                        'tree-reference'):
                                del current_dir_info[1][path_index]
                                path_index -= 1
                            path_index += 1

                        # This dir info has been handled, go to the next
                        try:
                            current_dir_info = dir_iterator.next()
                        except StopIteration:
                            current_dir_info = None
                    else:
                        # We have a dirblock entry for this location, but there
                        # is no filesystem path for this. This is most likely
                        # because a directory was removed from the disk.
                        # We don't have to report the missing directory,
                        # because that should have already been handled, but we
                        # need to handle all of the files that are contained
                        # within.
                        for current_entry in current_block[1]:
                            # entry referring to file not present on disk.
                            # advance the entry only, after processing.
                            result = _process_entry(current_entry, None)
                            if result is not None:
                                if result is not uninteresting:
                                    yield result
                        block_index += 1
                        if (block_index < len(self.state._dirblocks) and
                            osutils.is_inside(current_root,
                                              self.state._dirblocks[block_index][0])):
                            current_block = self.state._dirblocks[block_index]
                        else:
                            current_block = None
                    continue
                entry_index = 0
                if current_block and entry_index < len(current_block[1]):
                    current_entry = current_block[1][entry_index]
                else:
                    current_entry = None
                advance_entry = True
                path_index = 0
                if current_dir_info and path_index < len(current_dir_info[1]):
                    current_path_info = current_dir_info[1][path_index]
                    if current_path_info[2] == 'directory':
                        if self.tree._directory_is_tree_reference(
                            current_path_info[0].decode('utf8')):
                            current_path_info = current_path_info[:2] + \
                                ('tree-reference',) + current_path_info[3:]
                else:
                    current_path_info = None
                advance_path = True
                path_handled = False
                while (current_entry is not None or
                       current_path_info is not None):
                    if current_entry is None:
                        # the check for path_handled when the path is advanced
                        # will yield this path if needed.
                        pass
                    elif current_path_info is None:
                        # no path is fine: the per entry code will handle it.
                        result = _process_entry(current_entry, current_path_info)
                        if result is not None:
                            if result is not uninteresting:
                                yield result
                    elif (current_entry[0][1] != current_path_info[1]
                          or current_entry[1][self.target_index][0] in 'ar'):
                        # The current path on disk doesn't match the dirblock
                        # record. Either the dirblock is marked as absent, or
                        # the file on disk is not present at all in the
                        # dirblock. Either way, report about the dirblock
                        # entry, and let other code handle the filesystem one.

                        # Compare the basename for these files to determine
                        # which comes first
                        if current_path_info[1] < current_entry[0][1]:
                            # extra file on disk: pass for now, but only
                            # increment the path, not the entry
                            advance_entry = False
                        else:
                            # entry referring to file not present on disk.
                            # advance the entry only, after processing.
                            result = _process_entry(current_entry, None)
                            if result is not None:
                                if result is not uninteresting:
                                    yield result
                            advance_path = False
                    else:
                        result = _process_entry(current_entry, current_path_info)
                        if result is not None:
                            path_handled = True
                            if result is not uninteresting:
                                yield result
                    if advance_entry and current_entry is not None:
                        entry_index += 1
                        if entry_index < len(current_block[1]):
                            current_entry = current_block[1][entry_index]
                        else:
                            current_entry = None
                    else:
                        advance_entry = True # reset the advance flag
                    if advance_path and current_path_info is not None:
                        if not path_handled:
                            # unversioned in all regards
                            if self.want_unversioned:
                                new_executable = bool(
                                    stat.S_ISREG(current_path_info[3].st_mode)
                                    and stat.S_IEXEC & current_path_info[3].st_mode)
                                try:
                                    relpath_unicode = utf8_decode(current_path_info[0])[0]
                                except UnicodeDecodeError:
                                    raise errors.BadFilenameEncoding(
                                        current_path_info[0], osutils._fs_enc)
                                yield (None,
                                    (None, relpath_unicode),
                                    True,
                                    (False, False),
                                    (None, None),
                                    (None, utf8_decode(current_path_info[1])[0]),
                                    (None, current_path_info[2]),
                                    (None, new_executable))
                            # don't descend into this unversioned path if it is
                            # a dir
                            if current_path_info[2] in ('directory',):
                                del current_dir_info[1][path_index]
                                path_index -= 1
                        # don't descend the disk iterator into any tree
                        # paths.
                        if current_path_info[2] == 'tree-reference':
                            del current_dir_info[1][path_index]
                            path_index -= 1
                        path_index += 1
                        if path_index < len(current_dir_info[1]):
                            current_path_info = current_dir_info[1][path_index]
                            if current_path_info[2] == 'directory':
                                if self.tree._directory_is_tree_reference(
                                    current_path_info[0].decode('utf8')):
                                    current_path_info = current_path_info[:2] + \
                                        ('tree-reference',) + current_path_info[3:]
                        else:
                            current_path_info = None
                        path_handled = False
                    else:
                        advance_path = True # reset the advance flag.
                if current_block is not None:
                    block_index += 1
                    if (block_index < len(self.state._dirblocks) and
                        osutils.is_inside(current_root, self.state._dirblocks[block_index][0])):
                        current_block = self.state._dirblocks[block_index]
                    else:
                        current_block = None
                if current_dir_info is not None:
                    try:
                        current_dir_info = dir_iterator.next()
                    except StopIteration:
                        current_dir_info = None
_process_entry = ProcessEntryPython


# Try to load the compiled form if possible
try:
    from bzrlib._dirstate_helpers_c import (
        _read_dirblocks_c as _read_dirblocks,
        bisect_dirblock_c as bisect_dirblock,
        _bisect_path_left_c as _bisect_path_left,
        _bisect_path_right_c as _bisect_path_right,
        cmp_by_dirs_c as cmp_by_dirs,
        ProcessEntryC as _process_entry,
        update_entry as update_entry,
        )
except ImportError:
    from bzrlib._dirstate_helpers_py import (
        _read_dirblocks_py as _read_dirblocks,
        bisect_dirblock_py as bisect_dirblock,
        _bisect_path_left_py as _bisect_path_left,
        _bisect_path_right_py as _bisect_path_right,
        cmp_by_dirs_py as cmp_by_dirs,
        )
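# The try/except ImportError above is the usual optional-extension pattern:
# when the compiled helpers are importable they rebind the pure-Python names
# defined earlier in this module (including _process_entry); otherwise the
# _py fallbacks keep the module fully functional, and importers of this
# module never need to know which implementation won.
#
# A minimal usage sketch (illustrative only; assumes a bzr working tree in
# the current directory):
#
#   from bzrlib.workingtree import WorkingTree
#   tree = WorkingTree.open('.')
#   tree.lock_read()
#   try:
#       for change in tree.iter_changes(tree.basis_tree()):
#           print change
#   finally:
#       tree.unlock()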