# Copyright (C) 2006, 2007, 2008 Canonical Ltd
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

"""DirState objects record the state of a directory and its bzr metadata.
19
Pseudo EBNF grammar for the state file. Fields are separated by NULLs, and
20
lines by NL. The field delimiters are ommitted in the grammar, line delimiters
21
are not - this is done for clarity of reading. All string data is in utf8.
23
MINIKIND = "f" | "d" | "l" | "a" | "r" | "t";
26
WHOLE_NUMBER = {digit}, digit;
28
REVISION_ID = a non-empty utf8 string;
30
dirstate format = header line, full checksum, row count, parent details,
31
ghost_details, entries;
32
header line = "#bazaar dirstate flat format 3", NL;
33
full checksum = "crc32: ", ["-"], WHOLE_NUMBER, NL;
34
row count = "num_entries: ", WHOLE_NUMBER, NL;
35
parent_details = WHOLE NUMBER, {REVISION_ID}* NL;
36
ghost_details = WHOLE NUMBER, {REVISION_ID}*, NL;
38
entry = entry_key, current_entry_details, {parent_entry_details};
39
entry_key = dirname, basename, fileid;
40
current_entry_details = common_entry_details, working_entry_details;
41
parent_entry_details = common_entry_details, history_entry_details;
42
common_entry_details = MINIKIND, fingerprint, size, executable
43
working_entry_details = packed_stat
44
history_entry_details = REVISION_ID;
47
fingerprint = a nonempty utf8 sequence with meaning defined by minikind.
49

Given this definition, the following is useful to know:
entry (aka row) - all the data for a given key.
entry[0]: The key (dirname, basename, fileid)
entry[1]: The tree(s) data for this path and id combination.
entry[1][0]: The current tree
entry[1][1]: The second tree

For an entry for a tree, we have (using tree 0 - current tree) to demonstrate:
entry[1][0][0]: minikind
entry[1][0][1]: fingerprint
entry[1][0][2]: size
entry[1][0][3]: executable
entry[1][0][4]: packed_stat
entry[1][1][4]: revision_id

There may be multiple rows at the root, one per id present in the root, so the
in memory root row is now:
self._dirblocks[0] -> ('', [entry ...]),
and the entries in there are
entries[0][2]: file_id
entries[1][0]: The tree data for the current tree for this fileid at /
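
As an illustrative sketch (the values here are hypothetical, not taken from a
real tree), an in-memory entry for a file at foo/bar with one parent tree
might look like this:

```python
# A hypothetical in-memory dirstate entry (sketch only). The key identifies
# the path and file id; each element of the tree-data list follows
# (minikind, fingerprint, size, executable, packed_stat-or-revision_id).
entry = (
    ('foo', 'bar', 'bar-file-id'),  # entry[0]: (dirname, basename, fileid)
    [
        ('f', 'sha1-of-bar', 42, False, 'packed-stat-blob'),    # current tree
        ('f', 'sha1-of-bar', 42, False, 'parent-revision-id'),  # first parent
    ],
)

assert entry[0][2] == 'bar-file-id'   # file id lives in the key
assert entry[1][0][0] == 'f'          # minikind of the working-tree details
assert entry[1][1][4] == 'parent-revision-id'
```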

'r' is a relocated entry: This path is not present in this tree with this id,
 but the id can be found at another location. The fingerprint is used to
 point to the target location.
'a' is an absent entry: In that tree the id is not present at this path.
'd' is a directory entry: This path in this tree is a directory with the
 current file id. There is no fingerprint for directories.
'f' is a file entry: As for directory, but it's a file. The fingerprint is the
 sha1 value of the file's canonical form.
'l' is a symlink entry: As for directory, but a symlink. The fingerprint is the
 link target.
't' is a reference to a nested subtree; the fingerprint is the referenced
 revision.

The entries on disk and in memory are ordered according to the following keys:
    directory, as a list of components
    filename
    file-id

--- Format 1 had the following different definition: ---
rows = dirname, NULL, basename, NULL, MINIKIND, NULL, fileid_utf8, NULL,
    WHOLE NUMBER (* size *), NULL, packed stat, NULL, sha1|symlink target,
    {PARENT ROW}
PARENT ROW = NULL, revision_utf8, NULL, MINIKIND, NULL, dirname, NULL,
    basename, NULL, WHOLE NUMBER (* size *), NULL, "y" | "n", NULL,
    SHA1

PARENT ROWs are emitted for every parent that is not in the ghosts details
line. That is, if the parents are foo, bar, baz, and the ghosts are bar, then
each row will have a PARENT ROW for foo and baz, but not for bar.
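
The ghost-filtering rule can be sketched directly (the revision names are the
ones used in the example above):

```python
# Three parents, one of which is a ghost.
parents = ['foo', 'bar', 'baz']
ghosts = set(['bar'])

# A PARENT ROW is written only for non-ghost parents, preserving order.
parent_rows_for = [rev for rev in parents if rev not in ghosts]
assert parent_rows_for == ['foo', 'baz']
```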

In any tree, a kind of 'moved' indicates that the fingerprint field
(which we treat as opaque data specific to the 'kind' anyway) has the
details for the id of this row in that tree.

I'm strongly tempted to add an id->path index as well, but I think that
where we need id->path mapping we also usually read the whole file, so
I'm going to skip that for the moment, as we have the ability to locate
via bisect any path in any tree, and if we look things up by path, we can
accumulate an id->path mapping as we go, which will tend to match what we
asked for.

I plan to implement this asap, so please speak up now to alter/tweak the
design - and once we stabilise on this, I'll update the wiki page for it.

The rationale for all this is that we want fast operations for the
common case (diff/status/commit/merge on all files) and extremely fast
operations for the less common but still frequent case (status/diff/commit
on specific files). Operations on specific files involve a scan for all
the children of a path, *in every involved tree*, which the current
format did not accommodate.

Design priorities:
1) Fast end to end use for bzr's top 5 use cases. (commit/diff/status/merge/???)
2) fall back to the current object model as needed.
3) scale usably to the largest trees known today - say 50K entries. (mozilla
   is an example of this)

Locking:

Eventually reuse dirstate objects across locks IFF the dirstate file has not
been modified, but will require that we flush/ignore cached stat-hit data
because we won't want to restat all files on disk just because a lock was
acquired, yet we cannot trust the data after the previous lock was released.

Memory representation:
vector of all directories, and vector of the children ?
i.e.
    root_entries = (direntry for root, [parent_direntries_for_root]),
    dirblocks = [
        ('', ['data for achild', 'data for bchild', 'data for cchild'])
        ('dir', ['achild', 'cchild', 'echild'])
        ]
- single bisect to find N subtrees from a path spec
- in-order for serialisation - this is 'dirblock' grouping.
- insertion of a file '/a' affects only the '/' child-vector, that is, to
  insert 10K elements from scratch does not generate O(N^2) memoves of a
  single vector, rather each individual child-vector, which tends to be
  limited to a manageable number. Will scale badly on trees with 10K entries
  in a single directory. Compare with Inventory.InventoryDirectory which has
  a dictionary for the children. No bisect capability, can only probe for
  exact matches, or grab all elements and sort.
- What's the risk of error here? Once we have the base format being processed
  we should have a net win regardless of optimality. So we are going to
  go with what seems reasonable.

Maybe we should do a test profile of the core structure - 10K simulated
searches/lookups/etc?
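
Because the dirblocks are kept sorted by directory name, locating a
directory's children is a single `bisect` probe rather than a scan; a minimal
sketch with made-up data:

```python
import bisect

# dirblocks: a sorted list of (dirname, children) pairs, as described above.
dirblocks = [
    ('', ['data for achild', 'data for bchild', 'data for cchild']),
    ('dir', ['achild', 'cchild', 'echild']),
    ('dir/sub', ['one']),
]
names = [d[0] for d in dirblocks]

def children_of(dirname):
    # One bisect finds the block for dirname, or reports it absent.
    idx = bisect.bisect_left(names, dirname)
    if idx < len(names) and names[idx] == dirname:
        return dirblocks[idx][1]
    return None

assert children_of('dir') == ['achild', 'cchild', 'echild']
assert children_of('missing') is None
```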

Objects for each row?
The lifetime of Dirstate objects is currently per lock, but see above for
possible extensions. The lifetime of a row from a dirstate is expected to be
very short in the optimistic case: which we are optimising for. For instance,
subtree status will determine from analysis of the disk data what rows need to
be examined at all, and will be able to determine from a single row whether
that file has altered or not, so we are aiming to process tens of thousands of
entries each second within the dirstate context, before exposing anything to
the larger codebase. This suggests we want the time for a single file
comparison to be < 0.1 milliseconds. That would give us 10000 paths per second
processed, and to scale to 100 thousand we'll need another order of magnitude
to do that. Now, as the lifetime for all unchanged entries is the time to
parse, stat the file on disk, and then immediately discard, the overhead of
object creation becomes a significant cost.

Figures: Creating a tuple from 3 elements was profiled at 0.0625
microseconds, whereas creating an object which is subclassed from tuple was
0.500 microseconds, and creating an object with 3 elements and slots was 3
microseconds long. 0.1 milliseconds is 100 microseconds, and ideally we'll get
down to 10 microseconds for the total processing - having 33% of that be object
creation is a huge overhead. There is a potential cost in using tuples within
each row which is that the conditional code to do comparisons may be slower
than method invocation, but method invocation is known to be slow due to stack
frame creation, so avoiding methods in these tight inner loops is unfortunately
desirable. We can consider a pyrex version of this with objects in future if
needed.
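
The absolute figures above date from 2007-era hardware and interpreters; a
quick sketch to reproduce the tuple-versus-slotted-object comparison on
current hardware (the class and variable names are illustrative, and the
numbers will differ):

```python
import timeit

class SlottedRow(object):
    """A 3-element object with __slots__, mirroring the comparison above."""
    __slots__ = ('minikind', 'fingerprint', 'size')
    def __init__(self, minikind, fingerprint, size):
        self.minikind = minikind
        self.fingerprint = fingerprint
        self.size = size

mk, fp, sz = 'f', 'sha1-value', 42
n = 100000
tuple_time = timeit.timeit(lambda: (mk, fp, sz), number=n)
slots_time = timeit.timeit(lambda: SlottedRow(mk, fp, sz), number=n)

# Object construction runs __init__ and three attribute stores, so it is
# consistently slower than building a bare tuple.
print('tuple:   %.4fs for %d creations' % (tuple_time, n))
print('slotted: %.4fs for %d creations' % (slots_time, n))
```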
"""

import binascii
import bisect
import os
import stat
from stat import S_IEXEC

from bzrlib import (
    debug,
    errors,
    osutils,
    )
from bzrlib.osutils import pathjoin

# compile the struct compiler we need, so as to only do it once
from _struct import Struct
_compiled_pack = Struct('>LLLLLL').pack

def pack_stat(st, _encode=binascii.b2a_base64, _pack=_compiled_pack):
    """Convert stat values into a packed representation."""
    # jam 20060614 it isn't really worth removing more entries if we
    # are going to leave it in packed form.
    # With only st_mtime and st_mode filesize is 5.5M and read time is 275ms
    # With all entries, filesize is 5.9M and read time is maybe 280ms
    # well within the noise margin

    # base64 encoding always adds a final newline, so strip it off
    # The current version
    return _encode(_pack(st.st_size, int(st.st_mtime), int(st.st_ctime),
                         st.st_dev, st.st_ino & 0xFFFFFFFF, st.st_mode))[:-1]
    # This is 0.060s / 1.520s faster by not encoding as much information
    # return _encode(_pack('>LL', int(st.st_mtime), st.st_mode))[:-1]
    # This is not strictly faster than _encode(_pack())[:-1]
    # return '%X.%X.%X.%X.%X.%X' % (
    #      st.st_size, int(st.st_mtime), int(st.st_ctime),
    #      st.st_dev, st.st_ino, st.st_mode)
    # Similar to the _encode(_pack('>LL'))
    # return '%X.%X' % (int(st.st_mtime), st.st_mode)


class DirState(object):
    """Record directory and metadata state for fast access.

    A dirstate is a specialised data structure for managing local working
    tree state information. It's not yet well defined whether it is platform
    specific, and if it is, how we detect/parameterize that.

    Dirstates use the usual lock_write, lock_read and unlock mechanisms.
    Unlike most bzr disk formats, DirStates must be locked for reading, using
    lock_read. (This is an os file lock internally.) This is necessary
    because the file can be rewritten in place.

    DirStates must be explicitly written with save() to commit changes; just
    unlocking them does not write the changes to disk.
    """

    _kind_to_minikind = {
            'absent': 'a',
            'file': 'f',
            'directory': 'd',
            'relocated': 'r',
            'symlink': 'l',
            'tree-reference': 't',
        }
    _minikind_to_kind = {
            'a': 'absent',
            'f': 'file',
            'd': 'directory',
            'l': 'symlink',
            'r': 'relocated',
            't': 'tree-reference',
        }
    _stat_to_minikind = {
        stat.S_IFDIR: 'd',
        stat.S_IFREG: 'f',
        stat.S_IFLNK: 'l',
        }
    _to_yesno = {True: 'y', False: 'n'} # TODO profile the performance gain
    # of using int conversion rather than a dict here. AND BLAME ANDREW IF
    # THIS IS SLOWER.

    # TODO: jam 20070221 Figure out what to do if we have a record that exceeds
    #       the BISECT_PAGE_SIZE. For now, we just have to make it large enough
    #       that we are sure a single record will always fit.
    BISECT_PAGE_SIZE = 4096

    NOT_IN_MEMORY = 0
    IN_MEMORY_UNMODIFIED = 1
    IN_MEMORY_MODIFIED = 2

    # A pack_stat (the x's) that is just noise and will never match the output
    # of base64 encode.
    NULLSTAT = 'x' * 32
    NULL_PARENT_DETAILS = ('a', '', 0, False, '')

    HEADER_FORMAT_2 = '#bazaar dirstate flat format 2\n'
    HEADER_FORMAT_3 = '#bazaar dirstate flat format 3\n'

    def __init__(self, path):
        """Create a DirState object.

        :param path: The path at which the dirstate file on disk should live.
        """
        # _header_state and _dirblock_state represent the current state
        # of the dirstate metadata and the per-row data respectively.
        # NOT_IN_MEMORY indicates that no data is in memory
        # IN_MEMORY_UNMODIFIED indicates that what we have in memory
        #   is the same as is on disk
        # IN_MEMORY_MODIFIED indicates that we have a modified version
        #   of what is on disk.
        # In future we will add more granularity, for instance _dirblock_state
        # will probably support partially-in-memory as a separate variable,
        # allowing for partially-in-memory unmodified and partially-in-memory
        # modified states.
        self._header_state = DirState.NOT_IN_MEMORY
        self._dirblock_state = DirState.NOT_IN_MEMORY
        # If true, an error has been detected while updating the dirstate, and
        # for safety we're not going to commit to disk.
        self._changes_aborted = False
        self._state_file = None
        self._filename = path
        self._lock_token = None
        self._lock_state = None
        self._id_index = None
        # a map from packed_stat to sha's.
        self._packed_stat_index = None
        self._end_of_header = None
        self._cutoff_time = None
        self._split_path_cache = {}
        self._bisect_page_size = DirState.BISECT_PAGE_SIZE
        if 'hashcache' in debug.debug_flags:
            self._sha1_file = self._sha1_file_and_mutter
        else:
            self._sha1_file = osutils.sha_file_by_name
        # These two attributes provide a simple cache for lookups into the
        # dirstate in-memory vectors. By probing respectively for the last
        # block, and for the next entry, we save nearly 2 bisections per path
        # during commit.
        self._last_block_index = None
        self._last_entry_index = None

    def __repr__(self):
        return "%s(%r)" % \
            (self.__class__.__name__, self._filename)

    def add(self, path, file_id, kind, stat, fingerprint):
        """Add a path to be tracked.

        :param path: The path within the dirstate - '' is the root, 'foo' is the
            path foo within the root, 'foo/bar' is the path bar within foo
            within the root.
        :param file_id: The file id of the path being added.
        :param kind: The kind of the path, as a string like 'file',
            'directory', etc.
        :param stat: The output of os.lstat for the path.
        :param fingerprint: The sha value of the file,
            or the target of a symlink,
            or the referenced revision id for tree-references,
            or '' for directories.
        """
        # adding a file:
        # find the block its in.
        # find the location in the block.
        # check its not there
        # add it.
        #------- copied from inventory.ensure_normalized_name - keep synced.
        # --- normalized_filename wants a unicode basename only, so get one.
        dirname, basename = osutils.split(path)
        # we dont import normalized_filename directly because we want to be
        # able to change the implementation at runtime for tests.
        norm_name, can_access = osutils.normalized_filename(basename)
        if norm_name != basename:
            if can_access:
                basename = norm_name
            else:
                raise errors.InvalidNormalization(path)
        # you should never have files called . or ..; just add the directory
        # in the parent, or according to the special treatment for the root
        if basename == '.' or basename == '..':
            raise errors.InvalidEntryName(path)
        # now that we've normalised, we need the correct utf8 path and
        # dirname and basename elements. This single encode and split should be
        # faster than three separate encodes.
        utf8path = (dirname + '/' + basename).strip('/').encode('utf8')
        dirname, basename = osutils.split(utf8path)
        # uses __class__ for speed; the check is needed for safety
        if file_id.__class__ is not str:
            raise AssertionError(
                "must be a utf8 file_id not %s" % (type(file_id), ))
        # Make sure the file_id does not exist in this tree
        file_id_entry = self._get_entry(0, fileid_utf8=file_id)
        if file_id_entry != (None, None):
            path = osutils.pathjoin(file_id_entry[0][0], file_id_entry[0][1])
            kind = DirState._minikind_to_kind[file_id_entry[1][0][0]]
            info = '%s:%s' % (kind, path)
            raise errors.DuplicateFileId(file_id, info)
        first_key = (dirname, basename, '')
        block_index, present = self._find_block_index_from_key(first_key)
        if present:
            # check the path is not in the tree
            block = self._dirblocks[block_index][1]
            entry_index, _ = self._find_entry_index(first_key, block)
            while (entry_index < len(block) and
                block[entry_index][0][0:2] == first_key[0:2]):
                if block[entry_index][1][0][0] not in 'ar':
                    # this path is in the dirstate in the current tree.
                    raise Exception("adding already added path!")
                entry_index += 1
        else:
            # The block where we want to put the file is not present. But it
            # might be because the directory was empty, or not loaded yet. Look
            # for a parent entry, if not found, raise NotVersionedError
            parent_dir, parent_base = osutils.split(dirname)
            parent_block_idx, parent_entry_idx, _, parent_present = \
                self._get_block_entry_index(parent_dir, parent_base, 0)
            if not parent_present:
                raise errors.NotVersionedError(path, str(self))
            self._ensure_block(parent_block_idx, parent_entry_idx, dirname)
        block = self._dirblocks[block_index][1]
        entry_key = (dirname, basename, file_id)
        if stat is None:
            size = 0
            packed_stat = DirState.NULLSTAT
        else:
            size = stat.st_size
            packed_stat = pack_stat(stat)
        parent_info = self._empty_parent_info()
        minikind = DirState._kind_to_minikind[kind]
        if kind == 'file':
            entry_data = entry_key, [
                (minikind, fingerprint, size, False, packed_stat),
                ] + parent_info
        elif kind == 'directory':
            entry_data = entry_key, [
                (minikind, '', 0, False, packed_stat),
                ] + parent_info
        elif kind == 'symlink':
            entry_data = entry_key, [
                (minikind, fingerprint, size, False, packed_stat),
                ] + parent_info
        elif kind == 'tree-reference':
            entry_data = entry_key, [
                (minikind, fingerprint, 0, False, packed_stat),
                ] + parent_info
        else:
            raise errors.BzrError('unknown kind %r' % kind)
        entry_index, present = self._find_entry_index(entry_key, block)
        if not present:
            block.insert(entry_index, entry_data)
        else:
            if block[entry_index][1][0][0] != 'a':
                raise AssertionError(" %r(%r) already added" % (basename, file_id))
            block[entry_index][1][0] = entry_data[1][0]

        if kind == 'directory':
            # insert a new dirblock
            self._ensure_block(block_index, entry_index, utf8path)
        self._dirblock_state = DirState.IN_MEMORY_MODIFIED
        if self._id_index:
            self._id_index.setdefault(entry_key[2], set()).add(entry_key)

    def _bisect(self, paths):
        """Bisect through the disk structure for specific rows.

        :param paths: A list of paths to find
        :return: A dict mapping path => entries for found entries. Missing
            entries will not be in the map.
            The list is not sorted, and entries will be populated
            based on when they were read.
        """
        self._requires_lock()
        # We need the file pointer to be right after the initial header block
        self._read_header_if_needed()
        # If _dirblock_state was in memory, we should just return info from
        # there, this function is only meant to handle when we want to read
        # part of the disk.
        if self._dirblock_state != DirState.NOT_IN_MEMORY:
            raise AssertionError("bad dirblock state %r" % self._dirblock_state)

        # The disk representation is generally info + '\0\n\0' at the end. But
        # for bisecting, it is easier to treat this as '\0' + info + '\0\n'
        # Because it means we can sync on the '\n'
        state_file = self._state_file
        file_size = os.fstat(state_file.fileno()).st_size
        # We end up with 2 extra fields, we should have a trailing '\n' to
        # ensure that we read the whole record, and we should have a precursor
        # '' which ensures that we start after the previous '\n'
        entry_field_count = self._fields_per_entry() + 1

        low = self._end_of_header
        high = file_size - 1 # Ignore the final '\0'
        # Map from (dir, name) => entry
        found = {}

        # Avoid infinite seeking
        max_count = 30*len(paths)
        count = 0
        # pending is a list of places to look.
        # each entry is a tuple of low, high, dir_names
        #   low -> the first byte offset to read (inclusive)
        #   high -> the last byte offset (inclusive)
        #   dir_names -> The list of (dir, name) pairs that should be found in
        #                the [low, high] range
        pending = [(low, high, paths)]

        page_size = self._bisect_page_size

        fields_to_entry = self._get_fields_to_entry()

        while pending:
            low, high, cur_files = pending.pop()

            if not cur_files or low >= high:
                # Nothing to find
                continue

            count += 1
            if count > max_count:
                raise errors.BzrError('Too many seeks, most likely a bug.')

            mid = max(low, (low+high-page_size)/2)

            state_file.seek(mid)
            # limit the read size, so we don't end up reading data that we
            # have already read.
            read_size = min(page_size, (high-mid)+1)
            block = state_file.read(read_size)

            start = mid
            entries = block.split('\n')

            if len(entries) < 2:
                # We didn't find a '\n', so we cannot have found any records.
                # So put this range back and try again. But we know we have to
                # increase the page size, because a single read did not contain
                # a record break (so records must be larger than page_size)
                page_size *= 2
                pending.append((low, high, cur_files))
                continue

            # Check the first and last entries, in case they are partial, or if
            # we don't care about the rest of this page
            first_entry_num = 0
            first_fields = entries[0].split('\0')
            if len(first_fields) < entry_field_count:
                # We didn't get the complete first entry
                # so move start, and grab the next, which
                # should be a full entry
                start += len(entries[0])+1
                first_fields = entries[1].split('\0')
                first_entry_num = 1

            if len(first_fields) <= 2:
                # We didn't even get a filename here... what do we do?
                # Try a large page size and repeat this query
                page_size *= 2
                pending.append((low, high, cur_files))
                continue
            else:
                # Find what entries we are looking for, which occur before and
                # after this first record.
                after = start
                if first_fields[1]:
                    first_path = first_fields[1] + '/' + first_fields[2]
                else:
                    first_path = first_fields[2]
                first_loc = _bisect_path_left(cur_files, first_path)

                # These exist before the current location
                pre = cur_files[:first_loc]
                # These occur after the current location, which may be in the
                # data we read, or might be after the last entry
                post = cur_files[first_loc:]

            if post and len(first_fields) >= entry_field_count:
                # We have files after the first entry

                # Parse the last entry
                last_entry_num = len(entries)-1
                last_fields = entries[last_entry_num].split('\0')
                if len(last_fields) < entry_field_count:
                    # The very last hunk was not complete,
                    # read the previous hunk
                    after = mid + len(block) - len(entries[-1])
                    last_entry_num -= 1
                    last_fields = entries[last_entry_num].split('\0')
                else:
                    after = mid + len(block)

                if last_fields[1]:
                    last_path = last_fields[1] + '/' + last_fields[2]
                else:
                    last_path = last_fields[2]
                last_loc = _bisect_path_right(post, last_path)

                middle_files = post[:last_loc]
                post = post[last_loc:]

                if middle_files:
                    # We have files that should occur in this block
                    # (>= first, <= last)
                    # Either we will find them here, or we can mark them as
                    # missing.

                    if middle_files[0] == first_path:
                        # We might need to go before this location
                        pre.append(first_path)
                    if middle_files[-1] == last_path:
                        post.insert(0, last_path)

                    # Find out what paths we have
                    paths = {first_path:[first_fields]}
                    # last_path might == first_path so we need to be
                    # careful if we should append rather than overwrite
                    if last_entry_num != first_entry_num:
                        paths.setdefault(last_path, []).append(last_fields)
                    for num in xrange(first_entry_num+1, last_entry_num):
                        # TODO: jam 20070223 We are already splitting here, so
                        #       shouldn't we just split the whole thing rather
                        #       than doing the split again in add_one_record?
                        fields = entries[num].split('\0')
                        if fields[1]:
                            path = fields[1] + '/' + fields[2]
                        else:
                            path = fields[2]
                        paths.setdefault(path, []).append(fields)

                    for path in middle_files:
                        for fields in paths.get(path, []):
                            # offset by 1 because of the opening '\0'
                            # consider changing fields_to_entry to avoid the
                            # extra list slice
                            entry = fields_to_entry(fields[1:])
                            found.setdefault(path, []).append(entry)

            # Now we have split up everything into pre, middle, and post, and
            # we have handled everything that fell in 'middle'.
            # We add 'post' first, so that we prefer to seek towards the
            # beginning, so that we will tend to go as early as we need, and
            # then only seek forward after that.
            if post:
                pending.append((after, high, post))
            if pre:
                pending.append((low, start-1, pre))

        # Consider that we may want to return the directory entries in sorted
        # order. For now, we just return them in whatever order we found them,
        # and leave it up to the caller if they care if it is ordered or not.
        return found

    def _bisect_dirblocks(self, dir_list):
        """Bisect through the disk structure to find entries in given dirs.

        _bisect_dirblocks is meant to find the contents of directories, which
        differs from _bisect, which only finds individual entries.

        :param dir_list: A sorted list of directory names ['', 'dir', 'foo'].
        :return: A map from dir => entries_for_dir
        """
        # TODO: jam 20070223 A lot of the bisecting logic could be shared
        #       between this and _bisect. It would require parameterizing the
        #       inner loop with a function, though. We should evaluate the
        #       performance difference.
        self._requires_lock()
        # We need the file pointer to be right after the initial header block
        self._read_header_if_needed()
        # If _dirblock_state was in memory, we should just return info from
        # there, this function is only meant to handle when we want to read
        # part of the disk.
        if self._dirblock_state != DirState.NOT_IN_MEMORY:
            raise AssertionError("bad dirblock state %r" % self._dirblock_state)
        # The disk representation is generally info + '\0\n\0' at the end. But
        # for bisecting, it is easier to treat this as '\0' + info + '\0\n'
        # Because it means we can sync on the '\n'
        state_file = self._state_file
        file_size = os.fstat(state_file.fileno()).st_size
        # We end up with 2 extra fields, we should have a trailing '\n' to
        # ensure that we read the whole record, and we should have a precursor
        # '' which ensures that we start after the previous '\n'
        entry_field_count = self._fields_per_entry() + 1

        low = self._end_of_header
        high = file_size - 1 # Ignore the final '\0'
        # Map from dir => entry
        found = {}

        # Avoid infinite seeking
        max_count = 30*len(dir_list)
        count = 0
        # pending is a list of places to look.
        # each entry is a tuple of low, high, dir_names
        #   low -> the first byte offset to read (inclusive)
        #   high -> the last byte offset (inclusive)
        #   dirs -> The list of directories that should be found in
        #           the [low, high] range
        pending = [(low, high, dir_list)]

        page_size = self._bisect_page_size

        fields_to_entry = self._get_fields_to_entry()

        while pending:
            low, high, cur_dirs = pending.pop()

            if not cur_dirs or low >= high:
                # Nothing to find
                continue

            count += 1
            if count > max_count:
                raise errors.BzrError('Too many seeks, most likely a bug.')

            mid = max(low, (low+high-page_size)/2)

            state_file.seek(mid)
            # limit the read size, so we don't end up reading data that we
            # have already read.
            read_size = min(page_size, (high-mid)+1)
            block = state_file.read(read_size)

            start = mid
            entries = block.split('\n')

            if len(entries) < 2:
                # We didn't find a '\n', so we cannot have found any records.
                # So put this range back and try again. But we know we have to
                # increase the page size, because a single read did not contain
                # a record break (so records must be larger than page_size)
                page_size *= 2
                pending.append((low, high, cur_dirs))
                continue

            # Check the first and last entries, in case they are partial, or if
            # we don't care about the rest of this page
            first_entry_num = 0
            first_fields = entries[0].split('\0')
            if len(first_fields) < entry_field_count:
                # We didn't get the complete first entry
                # so move start, and grab the next, which
                # should be a full entry
                start += len(entries[0])+1
                first_fields = entries[1].split('\0')
                first_entry_num = 1

            if len(first_fields) <= 1:
                # We didn't even get a dirname here... what do we do?
                # Try a large page size and repeat this query
                page_size *= 2
                pending.append((low, high, cur_dirs))
                continue
            else:
                # Find what entries we are looking for, which occur before and
                # after this first record.
                after = start
                first_dir = first_fields[1]
                first_loc = bisect.bisect_left(cur_dirs, first_dir)

                # These exist before the current location
                pre = cur_dirs[:first_loc]
                # These occur after the current location, which may be in the
                # data we read, or might be after the last entry
                post = cur_dirs[first_loc:]

            if post and len(first_fields) >= entry_field_count:
                # We have records to look at after the first entry

                # Parse the last entry
                last_entry_num = len(entries)-1
                last_fields = entries[last_entry_num].split('\0')
                if len(last_fields) < entry_field_count:
                    # The very last hunk was not complete,
                    # read the previous hunk
                    after = mid + len(block) - len(entries[-1])
                    last_entry_num -= 1
                    last_fields = entries[last_entry_num].split('\0')
                else:
                    after = mid + len(block)

                last_dir = last_fields[1]
                last_loc = bisect.bisect_right(post, last_dir)

                middle_files = post[:last_loc]
                post = post[last_loc:]

                if middle_files:
                    # We have files that should occur in this block
                    # (>= first, <= last)
                    # Either we will find them here, or we can mark them as
                    # missing.

                    if middle_files[0] == first_dir:
                        # We might need to go before this location
                        pre.append(first_dir)
                    if middle_files[-1] == last_dir:
                        post.insert(0, last_dir)

                    # Find out what paths we have
                    paths = {first_dir:[first_fields]}
                    # last_dir might == first_dir so we need to be
                    # careful if we should append rather than overwrite
                    if last_entry_num != first_entry_num:
                        paths.setdefault(last_dir, []).append(last_fields)
                    for num in xrange(first_entry_num+1, last_entry_num):
                        # TODO: jam 20070223 We are already splitting here, so
                        #       shouldn't we just split the whole thing rather
                        #       than doing the split again in add_one_record?
                        fields = entries[num].split('\0')
                        paths.setdefault(fields[1], []).append(fields)

                    for cur_dir in middle_files:
                        for fields in paths.get(cur_dir, []):
                            # offset by 1 because of the opening '\0'
                            # consider changing fields_to_entry to avoid the
                            # extra list slice
                            entry = fields_to_entry(fields[1:])
                            found.setdefault(cur_dir, []).append(entry)

            # Now we have split up everything into pre, middle, and post, and
            # we have handled everything that fell in 'middle'.
            # We add 'post' first, so that we prefer to seek towards the
            # beginning, so that we will tend to go as early as we need, and
            # then only seek forward after that.
            if post:
                pending.append((after, high, post))
            if pre:
                pending.append((low, start-1, pre))

        return found

    def _bisect_recursive(self, paths):
        """Bisect for entries for all paths and their children.

        This will use bisect to find all records for the supplied paths. It
        will then continue to bisect for any records which are marked as
        directories. (and renames?)

        :param paths: A sorted list of (dir, name) pairs
            eg: [('', 'a'), ('', 'f'), ('a/b', 'c')]
        :return: A dictionary mapping (dir, name, file_id) => [tree_info]
        """
        # Map from (dir, name, file_id) => [tree_info]
        found = {}

        found_dir_names = set()

        # Directories that have been read
        processed_dirs = set()
        # Get the ball rolling with the first bisect for all entries.
        newly_found = self._bisect(paths)

        while newly_found:
            # Directories that need to be read
            pending_dirs = set()
            paths_to_search = set()
            for entry_list in newly_found.itervalues():
                for dir_name_id, trees_info in entry_list:
                    found[dir_name_id] = trees_info
                    found_dir_names.add(dir_name_id[:2])
                    is_dir = False
                    for tree_info in trees_info:
                        minikind = tree_info[0]
                        if minikind == 'd':
                            if is_dir:
                                # We already processed this one as a directory,
                                # we don't need to do the extra work again.
                                continue
                            subdir, name, file_id = dir_name_id
                            path = osutils.pathjoin(subdir, name)
                            is_dir = True
                            if path not in processed_dirs:
                                pending_dirs.add(path)
                        elif minikind == 'r':
                            # Rename, we need to directly search the target
                            # which is contained in the fingerprint column
                            dir_name = osutils.split(tree_info[1])
                            if dir_name[0] in pending_dirs:
                                # This entry will be found in the dir search
                                continue
                            if dir_name not in found_dir_names:
                                paths_to_search.add(tree_info[1])
            # Now we have a list of paths to look for directly, and
            # directory blocks that need to be read.
            # newly_found is mixing the keys between (dir, name) and path
            # entries, but that is okay, because we only really care about the
            # file_id's anyway.
            newly_found = self._bisect(sorted(paths_to_search))
            newly_found.update(self._bisect_dirblocks(sorted(pending_dirs)))
            processed_dirs.update(pending_dirs)
        return found
    def _discard_merge_parents(self):
        """Discard any parents trees beyond the first.

        Note that if this fails the dirstate is corrupted.

        After this function returns the dirstate contains 2 trees, neither of
        which are ghosted.
        """
        self._read_header_if_needed()
        parents = self.get_parent_ids()
        if len(parents) < 1:
            return
        # only require all dirblocks if we are doing a full-pass removal.
        self._read_dirblocks_if_needed()
        dead_patterns = set([('a', 'r'), ('a', 'a'), ('r', 'r'), ('r', 'a')])
        def iter_entries_removable():
            for block in self._dirblocks:
                deleted_positions = []
                for pos, entry in enumerate(block[1]):
                    yield entry
                    if (entry[1][0][0], entry[1][1][0]) in dead_patterns:
                        deleted_positions.append(pos)
                if deleted_positions:
                    if len(deleted_positions) == len(block[1]):
                        del block[1][:]
                    else:
                        for pos in reversed(deleted_positions):
                            del block[1][pos]
        # if the first parent is a ghost:
        if parents[0] in self.get_ghosts():
            empty_parent = [DirState.NULL_PARENT_DETAILS]
            for entry in iter_entries_removable():
                entry[1][1:] = empty_parent
        else:
            for entry in iter_entries_removable():
                del entry[1][2:]

        self._parents = [parents[0]]
        self._dirblock_state = DirState.IN_MEMORY_MODIFIED
        self._header_state = DirState.IN_MEMORY_MODIFIED

    def _empty_parent_info(self):
        return [DirState.NULL_PARENT_DETAILS] * (len(self._parents) -
                                                 len(self._ghosts))

    def _ensure_block(self, parent_block_index, parent_row_index, dirname):
        """Ensure a block for dirname exists.

        This function exists to let callers which know that there is a
        directory dirname ensure that the block for it exists. This block can
        fail to exist because of demand loading, or because a directory had no
        children. In either case it is not an error. It is however an error to
        call this if there is no parent entry for the directory, and thus the
        function requires the coordinates of such an entry to be provided.

        The root row is special cased and can be indicated with a parent block
        and row index of -1.

        :param parent_block_index: The index of the block in which dirname's row
            exists.
        :param parent_row_index: The index in the parent block where the row
            exists.
        :param dirname: The utf8 dirname to ensure there is a block for.
        :return: The index for the block.
        """
        if dirname == '' and parent_row_index == 0 and parent_block_index == 0:
            # This is the signature of the root row, and the
            # contents-of-root row is always index 1
            return 1
        # the basename of the directory must be the end of its full name.
        if not (parent_block_index == -1 and
                parent_row_index == -1 and dirname == ''):
            if not dirname.endswith(
                    self._dirblocks[parent_block_index][1][parent_row_index][0][1]):
                raise AssertionError("bad dirname %r" % dirname)
        block_index, present = self._find_block_index_from_key((dirname, '', ''))
        if not present:
            ## In future, when doing partial parsing, this should load and
            # populate the entire block.
            self._dirblocks.insert(block_index, (dirname, []))
        return block_index

    def _entries_to_current_state(self, new_entries):
        """Load new_entries into self.dirblocks.

        Process new_entries into the current state object, making them the active
        state.  The entries are grouped together by directory to form dirblocks.

        :param new_entries: A sorted list of entries. This function does not sort
            to prevent unneeded overhead when callers have a sorted list already.
        :return: Nothing.
        """
        if new_entries[0][0][0:2] != ('', ''):
            raise AssertionError(
                "Missing root row %r" % (new_entries[0][0],))
        # The two blocks here are deliberate: the root block and the
        # contents-of-root block.
        self._dirblocks = [('', []), ('', [])]
        current_block = self._dirblocks[0][1]
        current_dirname = ''
        append_entry = current_block.append
        for entry in new_entries:
            if entry[0][0] != current_dirname:
                # new block - different dirname
                current_block = []
                current_dirname = entry[0][0]
                self._dirblocks.append((current_dirname, current_block))
                append_entry = current_block.append
            # append the entry to the current block
            append_entry(entry)
        self._split_root_dirblock_into_contents()

    def _split_root_dirblock_into_contents(self):
        """Split the root dirblocks into root and contents-of-root.

        After parsing by path, we end up with root entries and contents-of-root
        entries in the same block. This loop splits them out again.
        """
        # The above loop leaves the "root block" entries mixed with the
        # "contents-of-root block". But we don't want an if check on
        # all entries, so instead we just fix it up here.
        if self._dirblocks[1] != ('', []):
            raise ValueError("bad dirblock start %r" % (self._dirblocks[1],))
        root_block = []
        contents_of_root_block = []
        for entry in self._dirblocks[0][1]:
            if not entry[0][1]: # This is a root entry
                root_block.append(entry)
            else:
                contents_of_root_block.append(entry)
        self._dirblocks[0] = ('', root_block)
        self._dirblocks[1] = ('', contents_of_root_block)

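    # For illustration, after the split self._dirblocks is shaped like:
    #     [('', [the root entry]),
    #      ('', [entries directly inside the root directory]),
    #      ('sub', [entries inside directory 'sub']), ...]
    # with one (dirname, entries) block per directory that has children.
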
    def _entry_to_line(self, entry):
        """Serialize entry to a NULL delimited line ready for _get_output_lines.

        :param entry: An entry_tuple as defined in the module docstring.
        """
        entire_entry = list(entry[0])
        for tree_number, tree_data in enumerate(entry[1]):
            # (minikind, fingerprint, size, executable, tree_specific_string)
            entire_entry.extend(tree_data)
            # 3 for the key, 5 for the fields per tree.
            tree_offset = 3 + tree_number * 5
            # minikind
            entire_entry[tree_offset + 0] = tree_data[0]
            # size
            entire_entry[tree_offset + 2] = str(tree_data[2])
            # executable
            entire_entry[tree_offset + 3] = DirState._to_yesno[tree_data[3]]
        return '\0'.join(entire_entry)

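    # As an illustration (hypothetical file id and packed_stat value), a root
    # directory entry such as
    #     (('', '', 'a-root-id'), [('d', '', 0, False, 'AAAABBBB')])
    # serializes by joining the fields '', '', 'a-root-id', 'd', '', '0',
    # 'n', 'AAAABBBB' with NUL bytes; the size is stringified and the
    # executable flag is packed as 'y'/'n' via DirState._to_yesno.
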
    def _fields_per_entry(self):
        """How many null separated fields should be in each entry row.

        Each line now has an extra '\n' field which is not used
        so we just skip over it

        entry size:
            3 fields for the key
            + number of fields per tree_data (5) * tree count
            + newline
        """
        tree_count = 1 + self._num_present_parents()
        return 3 + 5 * tree_count + 1

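    # For example, with one present parent tree there are two trees per row,
    # so each row carries 3 + 5 * 2 + 1 = 14 fields (the trailing field
    # holding the unused newline).
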
    def _find_block(self, key, add_if_missing=False):
        """Return the block that key should be present in.

        :param key: A dirstate entry key.
        :return: The block tuple.
        """
        block_index, present = self._find_block_index_from_key(key)
        if not present:
            if not add_if_missing:
                # check to see if key is versioned itself - we might want to
                # add it anyway, because dirs with no entries dont get a
                # dirblock at parse time.
                # This is an uncommon branch to take: most dirs have children,
                # and most code works with versioned paths.
                parent_base, parent_name = osutils.split(key[0])
                if not self._get_block_entry_index(parent_base, parent_name, 0)[3]:
                    # some parent path has not been added - its an error to add
                    # this entry
                    raise errors.NotVersionedError(key[0:2], str(self))
            self._dirblocks.insert(block_index, (key[0], []))
        return self._dirblocks[block_index]

    def _find_block_index_from_key(self, key):
        """Find the dirblock index for a key.

        :return: The block index, True if the block for the key is present.
        """
        if key[0:2] == ('', ''):
            return 0, True
        if (self._last_block_index is not None and
            self._dirblocks[self._last_block_index][0] == key[0]):
            return self._last_block_index, True
        block_index = bisect_dirblock(self._dirblocks, key[0], 1,
                                      cache=self._split_path_cache)
        # _right returns one-past-where-key is so we have to subtract
        # one to use it. we use _right here because there are two
        # '' blocks - the root, and the contents of root
        # we always have a minimum of 2 in self._dirblocks: root and
        # root-contents, and for '', we get 2 back, so this is
        # simple and correct:
        present = (block_index < len(self._dirblocks) and
            self._dirblocks[block_index][0] == key[0])
        self._last_block_index = block_index
        # Reset the entry index cache to the beginning of the block.
        self._last_entry_index = -1
        return block_index, present

    def _find_entry_index(self, key, block):
        """Find the entry index for a key in a block.

        :return: The entry index, True if the entry for the key is present.
        """
        len_block = len(block)
        try:
            if self._last_entry_index is not None:
                # mini-bisect here.
                entry_index = self._last_entry_index + 1
                # A hit is when the key is after the last slot, and before or
                # equal to the next slot.
                if ((entry_index > 0 and block[entry_index - 1][0] < key) and
                    key <= block[entry_index][0]):
                    self._last_entry_index = entry_index
                    present = (block[entry_index][0] == key)
                    return entry_index, present
        except IndexError:
            pass
        entry_index = bisect.bisect_left(block, (key, []))
        present = (entry_index < len_block and
            block[entry_index][0] == key)
        self._last_entry_index = entry_index
        return entry_index, present

    @staticmethod
    def from_tree(tree, dir_state_filename):
        """Create a dirstate from a bzr Tree.

        :param tree: The tree which should provide parent information and
            inventory contents.
        :return: a DirState object which is currently locked for writing.
            (it was locked by DirState.initialize)
        """
        result = DirState.initialize(dir_state_filename)
        try:
            try:
                parent_ids = tree.get_parent_ids()
                num_parents = len(parent_ids)
                parent_trees = []
                for parent_id in parent_ids:
                    parent_tree = tree.branch.repository.revision_tree(parent_id)
                    parent_trees.append((parent_id, parent_tree))
                    parent_tree.lock_read()
                result.set_parent_trees(parent_trees, [])
                result.set_state_from_inventory(tree.inventory)
            finally:
                for revid, parent_tree in parent_trees:
                    parent_tree.unlock()
        except:
            # The caller won't have a chance to unlock this, so make sure we
            # cleanup ourselves
            result.unlock()
            raise
        return result

    def update_by_delta(self, delta):
        """Apply an inventory delta to the dirstate for tree 0

        :param delta: An inventory delta.  See Inventory.apply_delta for
            details.
        """
        self._read_dirblocks_if_needed()
        insertions = {}
        removals = {}
        for old_path, new_path, file_id, inv_entry in sorted(delta, reverse=True):
            if (file_id in insertions) or (file_id in removals):
                raise AssertionError("repeated file id in delta %r" % (file_id,))
            if old_path is not None:
                old_path = old_path.encode('utf-8')
                removals[file_id] = old_path
            if new_path is not None:
                new_path = new_path.encode('utf-8')
                dirname, basename = osutils.split(new_path)
                key = (dirname, basename, file_id)
                minikind = DirState._kind_to_minikind[inv_entry.kind]
                if minikind == 't':
                    fingerprint = inv_entry.reference_revision
                else:
                    fingerprint = ''
                insertions[file_id] = (key, minikind, inv_entry.executable,
                                       fingerprint, new_path)
            # Transform moves into delete+add pairs
            if None not in (old_path, new_path):
                for child in self._iter_child_entries(0, old_path):
                    if child[0][2] in insertions or child[0][2] in removals:
                        continue
                    child_dirname = child[0][0]
                    child_basename = child[0][1]
                    minikind = child[1][0][0]
                    fingerprint = child[1][0][4]
                    executable = child[1][0][3]
                    old_child_path = osutils.pathjoin(child[0][0],
                                                      child[0][1])
                    removals[child[0][2]] = old_child_path
                    child_suffix = child_dirname[len(old_path):]
                    new_child_dirname = (new_path + child_suffix)
                    key = (new_child_dirname, child_basename, child[0][2])
                    new_child_path = os.path.join(new_child_dirname,
                                                  child_basename)
                    insertions[child[0][2]] = (key, minikind, executable,
                                               fingerprint, new_child_path)
        self._apply_removals(removals.values())
        self._apply_insertions(insertions.values())

    def _apply_removals(self, removals):
        for path in sorted(removals, reverse=True):
            dirname, basename = osutils.split(path)
            block_i, entry_i, d_present, f_present = \
                self._get_block_entry_index(dirname, basename, 0)
            entry = self._dirblocks[block_i][1][entry_i]
            self._make_absent(entry)
            # See if we have a malformed delta: deleting a directory must not
            # leave crud behind. This increases the number of bisects needed
            # substantially, but deletion or renames of large numbers of paths
            # is rare enough it shouldn't be an issue (famous last words?) RBC
            block_i, entry_i, d_present, f_present = \
                self._get_block_entry_index(path, '', 0)
            if d_present:
                # The dir block is still present in the dirstate; this could
                # be due to it being in a parent tree, or a corrupt delta.
                for child_entry in self._dirblocks[block_i][1]:
                    if child_entry[1][0][0] not in ('r', 'a'):
                        raise errors.InconsistentDelta(path, entry[0][2],
                            "The file id was deleted but its children were "
                            "not deleted.")

    def _apply_insertions(self, adds):
        for key, minikind, executable, fingerprint, path_utf8 in sorted(adds):
            self.update_minimal(key, minikind, executable, fingerprint,
                                path_utf8=path_utf8)

    def update_basis_by_delta(self, delta, new_revid):
        """Update the parents of this tree after a commit.

        This gives the tree one parent, with revision id new_revid. The
        inventory delta is applied to the current basis tree to generate the
        inventory for the parent new_revid, and all other parent trees are
        discarded.

        Note that an exception during the operation of this method will leave
        the dirstate in a corrupt state where it should not be saved.

        Finally, we expect all changes to be synchronising the basis tree with
        the working tree.

        :param new_revid: The new revision id for the trees parent.
        :param delta: An inventory delta (see apply_inventory_delta) describing
            the changes from the current left most parent revision to new_revid.
        """
        self._read_dirblocks_if_needed()
        self._discard_merge_parents()
        if self._ghosts != []:
            raise NotImplementedError(self.update_basis_by_delta)
        if len(self._parents) == 0:
            # setup a blank tree, the most simple way.
            empty_parent = DirState.NULL_PARENT_DETAILS
            for entry in self._iter_entries():
                entry[1].append(empty_parent)
            self._parents.append(new_revid)
        else:
            self._parents[0] = new_revid

        delta = sorted(delta, reverse=True)
        adds = []
        changes = []
        deletes = []
        # The paths this function accepts are unicode and must be encoded as we
        # go.
        encode = cache_utf8.encode
        inv_to_entry = self._inv_entry_to_details
        # delta is now (deletes, changes), (adds) in reverse lexographical
        # order.
        # deletes in reverse lexographic order are safe to process in situ.
        # renames are not, as a rename from any path could go to a path
        # lexographically lower, so we transform renames into delete, add pairs,
        # expanding them recursively as needed.
        # At the same time, to reduce interface friction we convert the input
        # inventory entries to dirstate.
        root_only = ('', '')
        for old_path, new_path, file_id, inv_entry in delta:
            if old_path is None:
                adds.append((None, encode(new_path), file_id,
                    inv_to_entry(inv_entry), True))
            elif new_path is None:
                deletes.append((encode(old_path), None, file_id, None, True))
            elif (old_path, new_path) != root_only:
                # Because renames must preserve their children we must have
                # processed all relocations and removes before hand. The sort
                # order ensures we've examined the child paths, but we also
                # have to execute the removals, or the split to an add/delete
                # pair will result in the deleted item being reinserted, or
                # renamed items being reinserted twice - and possibly at the
                # wrong place. Splitting into a delete/add pair also simplifies
                # the handling of entries with ('f', ...), ('r' ...) because
                # the target of the 'r' is old_path here, and we add that to
                # deletes, meaning that the add handler does not need to check
                # for 'r' items on every pass.
                self._update_basis_apply_deletes(deletes)
                deletes = []
                new_path_utf8 = encode(new_path)
                # Split into an add/delete pair recursively.
                adds.append((None, new_path_utf8, file_id,
                    inv_to_entry(inv_entry), False))
                # Expunge deletes that we've seen so that deleted/renamed
                # children of a rename directory are handled correctly.
                new_deletes = reversed(list(self._iter_child_entries(1,
                    encode(old_path))))
                # Remove the current contents of the tree at orig_path, and
                # reinsert at the correct new path.
                for entry in new_deletes:
                    if entry[0][0]:
                        source_path = entry[0][0] + '/' + entry[0][1]
                    else:
                        source_path = entry[0][1]
                    if new_path_utf8:
                        target_path = new_path_utf8 + source_path[len(old_path):]
                    else:
                        if old_path == '':
                            raise AssertionError("cannot rename directory to"
                                " itself")
                        target_path = source_path[len(old_path) + 1:]
                    adds.append((None, target_path, entry[0][2], entry[1][1], False))
                    deletes.append(
                        (source_path, target_path, entry[0][2], None, False))
                deletes.append(
                    (encode(old_path), new_path, file_id, None, False))
            else:
                # changes to just the root should not require remove/insertion
                # of everything.
                changes.append((encode(old_path), encode(new_path), file_id,
                    inv_to_entry(inv_entry)))

        # Finish expunging deletes/first half of renames.
        self._update_basis_apply_deletes(deletes)
        # Reinstate second half of renames and new paths.
        self._update_basis_apply_adds(adds)
        # Apply in-situ changes.
        self._update_basis_apply_changes(changes)

        self._dirblock_state = DirState.IN_MEMORY_MODIFIED
        self._header_state = DirState.IN_MEMORY_MODIFIED
        self._id_index = None

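    # Illustration of the rename split above (hypothetical ids): renaming
    # directory 'a' (which contains 'a/c') to 'b' is transformed into
    #     deletes: ('a/c', 'b/c', 'c-id', None, False),
    #              ('a', 'b', 'a-id', None, False)
    #     adds:    (None, 'b', 'a-id', <details>, False),
    #              (None, 'b/c', 'c-id', <details>, False)
    # so children are removed from the old location first and then
    # reinserted under the new path, parents before children.
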
    def _update_basis_apply_adds(self, adds):
        """Apply a sequence of adds to tree 1 during update_basis_by_delta.

        They may be adds, or renames that have been split into add/delete
        pairs.

        :param adds: A sequence of adds. Each add is a tuple:
            (None, new_path_utf8, file_id, (entry_details), real_add). real_add
            is False when the add is the second half of a remove-and-reinsert
            pair created to handle renames and deletes.
        """
        # Adds are accumulated partly from renames, so can be in any input
        # order - sort it.
        adds.sort()
        # adds is now in lexographic order, which places all parents before
        # their children, so we can process it linearly.
        absent = 'ar'
        for old_path, new_path, file_id, new_details, real_add in adds:
            # the entry for this file_id must be in tree 0.
            entry = self._get_entry(0, file_id, new_path)
            if entry[0] is None or entry[0][2] != file_id:
                self._changes_aborted = True
                raise errors.InconsistentDelta(new_path, file_id,
                    'working tree does not contain new entry')
            if real_add and entry[1][1][0] not in absent:
                self._changes_aborted = True
                raise errors.InconsistentDelta(new_path, file_id,
                    'The entry was considered to be a genuinely new record,'
                    ' but there was already an old record for it.')
            # We don't need to update the target of an 'r' because the handling
            # of renames turns all 'r' situations into a delete at the original
            # location.
            entry[1][1] = new_details

    def _update_basis_apply_changes(self, changes):
        """Apply a sequence of changes to tree 1 during update_basis_by_delta.

        :param changes: A sequence of changes. Each change is a tuple:
            (path_utf8, path_utf8, file_id, (entry_details))
        """
        absent = 'ar'
        for old_path, new_path, file_id, new_details in changes:
            # the entry for this file_id must be in tree 0.
            entry = self._get_entry(0, file_id, new_path)
            if entry[0] is None or entry[0][2] != file_id:
                self._changes_aborted = True
                raise errors.InconsistentDelta(new_path, file_id,
                    'working tree does not contain new entry')
            if (entry[1][0][0] in absent or
                entry[1][1][0] in absent):
                self._changes_aborted = True
                raise errors.InconsistentDelta(new_path, file_id,
                    'changed considered absent')
            entry[1][1] = new_details

    def _update_basis_apply_deletes(self, deletes):
        """Apply a sequence of deletes to tree 1 during update_basis_by_delta.

        They may be deletes, or renames that have been split into add/delete
        pairs.

        :param deletes: A sequence of deletes. Each delete is a tuple:
            (old_path_utf8, new_path_utf8, file_id, None, real_delete).
            real_delete is True when the desired outcome is an actual deletion
            rather than the rename handling logic temporarily deleting a path
            during the replacement of a parent.
        """
        null = DirState.NULL_PARENT_DETAILS
        for old_path, new_path, file_id, _, real_delete in deletes:
            if real_delete != (new_path is None):
                raise AssertionError("bad delete delta")
            # the entry for this file_id must be in tree 1.
            dirname, basename = osutils.split(old_path)
            block_index, entry_index, dir_present, file_present = \
                self._get_block_entry_index(dirname, basename, 1)
            if not file_present:
                self._changes_aborted = True
                raise errors.InconsistentDelta(old_path, file_id,
                    'basis tree does not contain removed entry')
            entry = self._dirblocks[block_index][1][entry_index]
            if entry[0][2] != file_id:
                self._changes_aborted = True
                raise errors.InconsistentDelta(old_path, file_id,
                    'mismatched file_id in tree 1')
            if real_delete:
                if entry[1][0][0] != 'a':
                    self._changes_aborted = True
                    raise errors.InconsistentDelta(old_path, file_id,
                        'This was marked as a real delete, but the WT state'
                        ' claims that it still exists and is versioned.')
                del self._dirblocks[block_index][1][entry_index]
            else:
                if entry[1][0][0] == 'a':
                    self._changes_aborted = True
                    raise errors.InconsistentDelta(old_path, file_id,
                        'The entry was considered a rename, but the source path'
                        ' is marked as absent.')
                    # For whatever reason, we were asked to rename an entry
                    # that was originally marked as deleted. This could be
                    # because we are renaming the parent directory, and the WT
                    # current state has the file marked as deleted.
                elif entry[1][0][0] == 'r':
                    # implement the rename
                    del self._dirblocks[block_index][1][entry_index]
                else:
                    # it is being resurrected here, so blank it out temporarily.
                    self._dirblocks[block_index][1][entry_index][1][1] = null

    def _sha_cutoff_time(self):
        """Return cutoff time.

        Files modified more recently than this time are at risk of being
        undetectably modified and so can't be cached.
        """
        # Cache the cutoff time as long as we hold a lock.
        # time.time() isn't super expensive (approx 3.38us), but
        # when you call it 50,000 times it adds up.
        # For comparison, os.lstat() costs 7.2us if it is hot.
        self._cutoff_time = int(time.time()) - 3
        return self._cutoff_time

    def _lstat(self, abspath, entry):
        """Return the os.lstat value for this path."""
        return os.lstat(abspath)

    def _sha1_file_and_mutter(self, abspath):
        # when -Dhashcache is turned on, this is monkey-patched in to log
        # file reads
        trace.mutter("dirstate sha1 " + abspath)
        return osutils.sha_file_by_name(abspath)

    def _is_executable(self, mode, old_executable):
        """Is this file executable?"""
        return bool(S_IEXEC & mode)

    def _is_executable_win32(self, mode, old_executable):
        """On win32 the executable bit is stored in the dirstate."""
        return old_executable

    if sys.platform == 'win32':
        _is_executable = _is_executable_win32

    def _read_link(self, abspath, old_link):
        """Read the target of a symlink"""
        # TODO: jam 200700301 On Win32, this could just return the value
        #       already in memory. However, this really needs to be done at a
        #       higher level, because there either won't be anything on disk,
        #       or the thing on disk will be a file.
        return os.readlink(abspath)

    def get_ghosts(self):
        """Return a list of the parent tree revision ids that are ghosts."""
        self._read_header_if_needed()
        return self._ghosts

    def get_lines(self):
        """Serialise the entire dirstate to a sequence of lines."""
        if (self._header_state == DirState.IN_MEMORY_UNMODIFIED and
            self._dirblock_state == DirState.IN_MEMORY_UNMODIFIED):
            # read whats on disk.
            self._state_file.seek(0)
            return self._state_file.readlines()
        lines = []
        lines.append(self._get_parents_line(self.get_parent_ids()))
        lines.append(self._get_ghosts_line(self._ghosts))
        # append the root line which is special cased
        lines.extend(map(self._entry_to_line, self._iter_entries()))
        return self._get_output_lines(lines)

    def _get_ghosts_line(self, ghost_ids):
        """Create a line for the state file for ghost information."""
        return '\0'.join([str(len(ghost_ids))] + ghost_ids)

    def _get_parents_line(self, parent_ids):
        """Create a line for the state file for parents information."""
        return '\0'.join([str(len(parent_ids))] + parent_ids)

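    # For example (hypothetical revision ids):
    #     _get_parents_line(['rev-1', 'rev-2'])  =>  '2\x00rev-1\x00rev-2'
    #     _get_ghosts_line([])                   =>  '0'
    # matching the "count, NUL-separated ids" shape described in the module
    # docstring grammar.
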
    def _get_fields_to_entry(self):
        """Get a function which converts entry fields into a entry record.

        This handles size and executable, as well as parent records.

        :return: A function which takes a list of fields, and returns an
            appropriate record for storing in memory.
        """
        # This is intentionally unrolled for performance
        num_present_parents = self._num_present_parents()
        if num_present_parents == 0:
            def fields_to_entry_0_parents(fields, _int=int):
                path_name_file_id_key = (fields[0], fields[1], fields[2])
                return (path_name_file_id_key, [
                    ( # Current tree
                        fields[3],                # minikind
                        fields[4],                # fingerprint
                        _int(fields[5]),          # size
                        fields[6] == 'y',         # executable
                        fields[7],                # packed_stat or revision_id
                    )])
            return fields_to_entry_0_parents
        elif num_present_parents == 1:
            def fields_to_entry_1_parent(fields, _int=int):
                path_name_file_id_key = (fields[0], fields[1], fields[2])
                return (path_name_file_id_key, [
                    ( # Current tree
                        fields[3],                # minikind
                        fields[4],                # fingerprint
                        _int(fields[5]),          # size
                        fields[6] == 'y',         # executable
                        fields[7],                # packed_stat or revision_id
                    ),
                    ( # Parent 1
                        fields[8],                # minikind
                        fields[9],                # fingerprint
                        _int(fields[10]),         # size
                        fields[11] == 'y',        # executable
                        fields[12],               # packed_stat or revision_id
                    ),
                    ])
            return fields_to_entry_1_parent
        elif num_present_parents == 2:
            def fields_to_entry_2_parents(fields, _int=int):
                path_name_file_id_key = (fields[0], fields[1], fields[2])
                return (path_name_file_id_key, [
                    ( # Current tree
                        fields[3],                # minikind
                        fields[4],                # fingerprint
                        _int(fields[5]),          # size
                        fields[6] == 'y',         # executable
                        fields[7],                # packed_stat or revision_id
                    ),
                    ( # Parent 1
                        fields[8],                # minikind
                        fields[9],                # fingerprint
                        _int(fields[10]),         # size
                        fields[11] == 'y',        # executable
                        fields[12],               # packed_stat or revision_id
                    ),
                    ( # Parent 2
                        fields[13],               # minikind
                        fields[14],               # fingerprint
                        _int(fields[15]),         # size
                        fields[16] == 'y',        # executable
                        fields[17],               # packed_stat or revision_id
                    ),
                    ])
            return fields_to_entry_2_parents
        else:
            def fields_to_entry_n_parents(fields, _int=int):
                path_name_file_id_key = (fields[0], fields[1], fields[2])
                trees = [(fields[cur],                # minikind
                          fields[cur+1],              # fingerprint
                          _int(fields[cur+2]),        # size
                          fields[cur+3] == 'y',       # executable
                          fields[cur+4],              # stat or revision_id
                         ) for cur in xrange(3, len(fields)-1, 5)]
                return path_name_file_id_key, trees
            return fields_to_entry_n_parents

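    # For example, with no parents present, the serialized fields
    #     ['', '', 'a-root-id', 'd', '', '0', 'n', 'packed-stat']
    # (hypothetical file id and packed_stat) become the in-memory entry
    #     (('', '', 'a-root-id'), [('d', '', 0, False, 'packed-stat')])
    # with the size converted to an int and the executable flag to a bool.
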
    def get_parent_ids(self):
        """Return a list of the parent tree ids for the directory state."""
        self._read_header_if_needed()
        return list(self._parents)

    def _get_block_entry_index(self, dirname, basename, tree_index):
        """Get the coordinates for a path in the state structure.

        :param dirname: The utf8 dirname to lookup.
        :param basename: The utf8 basename to lookup.
        :param tree_index: The index of the tree for which this lookup should
            be attempted.
        :return: A tuple describing where the path is located, or should be
            inserted. The tuple contains four fields: the block index, the row
            index, the directory is present (boolean), the entire path is
            present (boolean).  There is no guarantee that either
            coordinate is currently reachable unless the found field for it is
            True. For instance, a directory not present in the searched tree
            may be returned with a value one greater than the current highest
            block offset. The directory present field will always be True when
            the path present field is True. The directory present field does
            NOT indicate that the directory is present in the searched tree,
            rather it indicates that there are at least some files in some
            parent tree present there.
        """
        self._read_dirblocks_if_needed()
        key = dirname, basename, ''
        block_index, present = self._find_block_index_from_key(key)
        if not present:
            # no such directory - return the dir index and 0 for the row.
            return block_index, 0, False, False
        block = self._dirblocks[block_index][1] # access the entries only
        entry_index, present = self._find_entry_index(key, block)
        # linear search through entries at this path to find the one
        # requested.
        while entry_index < len(block) and block[entry_index][0][1] == basename:
            if block[entry_index][1][tree_index][0] not in 'ar':
                # neither absent or relocated
                return block_index, entry_index, True, True
            entry_index += 1
        return block_index, entry_index, True, False

    def _get_entry(self, tree_index, fileid_utf8=None, path_utf8=None):
        """Get the dirstate entry for path in tree tree_index.

        If either file_id or path is supplied, it is used as the key to lookup.
        If both are supplied, the fastest lookup is used, and an error is
        raised if they do not both point at the same row.

        :param tree_index: The index of the tree we wish to locate this path
            in. If the path is present in that tree, the entry containing its
            details is returned, otherwise (None, None) is returned
            0 is the working tree, higher indexes are successive parent
            trees.
        :param fileid_utf8: A utf8 file_id to look up.
        :param path_utf8: An utf8 path to be looked up.
        :return: The dirstate entry tuple for path, or (None, None)
        """
        self._read_dirblocks_if_needed()
        if path_utf8 is not None:
            if type(path_utf8) is not str:
                raise AssertionError('path_utf8 is not a str: %s %s'
                    % (type(path_utf8), path_utf8))
            # path lookups are faster
            dirname, basename = osutils.split(path_utf8)
            block_index, entry_index, dir_present, file_present = \
                self._get_block_entry_index(dirname, basename, tree_index)
            if not file_present:
                return None, None
            entry = self._dirblocks[block_index][1][entry_index]
            if not (entry[0][2] and entry[1][tree_index][0] not in ('a', 'r')):
                raise AssertionError('unversioned entry?')
            if fileid_utf8:
                if entry[0][2] != fileid_utf8:
                    self._changes_aborted = True
                    raise errors.BzrError('integrity error ? : mismatching'
                                          ' tree_index, file_id and path')
            return entry
        else:
            possible_keys = self._get_id_index().get(fileid_utf8, None)
            if not possible_keys:
                return None, None
            for key in possible_keys:
                block_index, present = \
                    self._find_block_index_from_key(key)
                # strange, probably indicates an out of date
                # id index - for now, allow this.
                if not present:
                    continue
                # WARNING: DO not change this code to use _get_block_entry_index
                # as that function is not suitable: it does not use the key
                # to lookup, and thus the wrong coordinates are returned.
                block = self._dirblocks[block_index][1]
                entry_index, present = self._find_entry_index(key, block)
                if present:
                    entry = self._dirblocks[block_index][1][entry_index]
                    if entry[1][tree_index][0] in 'fdlt':
                        # this is the result we are looking for: the
                        # real home of this file_id in this tree.
                        return entry
                    if entry[1][tree_index][0] == 'a':
                        # there is no home for this entry in this tree
                        return None, None
                    if entry[1][tree_index][0] != 'r':
                        raise AssertionError(
                            "entry %r has invalid minikind %r for tree %r" \
                            % (entry,
                               entry[1][tree_index][0],
                               tree_index))
                    real_path = entry[1][tree_index][1]
                    return self._get_entry(tree_index, fileid_utf8=fileid_utf8,
                        path_utf8=real_path)
            return None, None

def initialize(cls, path):
1744
"""Create a new dirstate on path.
1746
The new dirstate will be an empty tree - that is it has no parents,
1747
and only a root node - which has id ROOT_ID.
1749
:param path: The name of the file for the dirstate.
1750
:return: A write-locked DirState object.
1752
# This constructs a new DirState object on a path, sets the _state_file
1753
# to a new empty file for that path. It then calls _set_data() with our
1754
# stock empty dirstate information - a root with ROOT_ID, no children,
1755
# and no parents. Finally it calls save() to ensure that this data will
1758
# root dir and root dir contents with no children.
1759
empty_tree_dirblocks = [('', []), ('', [])]
1760
# a new root directory, with a NULLSTAT.
1761
empty_tree_dirblocks[0][1].append(
1762
(('', '', inventory.ROOT_ID), [
1763
('d', '', 0, False, DirState.NULLSTAT),
1767
result._set_data([], empty_tree_dirblocks)
    def _inv_entry_to_details(inv_entry):
        """Convert an inventory entry (from a revision tree) to state details.

        :param inv_entry: An inventory entry whose sha1 and link targets can be
            relied upon, and which has a revision set.
        :return: A details tuple - the details for a single tree at a path +
            id.
        """
        kind = inv_entry.kind
        minikind = DirState._kind_to_minikind[kind]
        tree_data = inv_entry.revision
        if kind == 'directory':
            fingerprint = ''
            size = 0
            executable = False
        elif kind == 'symlink':
            # We don't support non-ascii targets for symlinks yet.
            fingerprint = str(inv_entry.symlink_target or '')
            size = 0
            executable = False
        elif kind == 'file':
            fingerprint = inv_entry.text_sha1 or ''
            size = inv_entry.text_size or 0
            executable = inv_entry.executable
        elif kind == 'tree-reference':
            fingerprint = inv_entry.reference_revision or ''
            size = 0
            executable = False
        else:
            raise Exception("can't pack %s" % inv_entry)
        return (minikind, fingerprint, size, executable, tree_data)
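To make the packing rule above concrete, here is a hypothetical standalone sketch: a plain function with keyword arguments standing in for a real InventoryEntry. `pack_details` and `_KIND_TO_MINIKIND` are illustrative names, not part of this module.

```python
# Hypothetical sketch of the kind -> details packing performed by
# _inv_entry_to_details; the function and its arguments are illustrative only.
_KIND_TO_MINIKIND = {'directory': 'd', 'file': 'f', 'symlink': 'l',
                     'tree-reference': 't'}

def pack_details(kind, revision, sha1=None, size=None, executable=False,
                 symlink_target=None, reference_revision=None):
    """Return a (minikind, fingerprint, size, executable, tree_data) tuple."""
    minikind = _KIND_TO_MINIKIND[kind]
    if kind == 'directory':
        return (minikind, '', 0, False, revision)
    if kind == 'symlink':
        # the fingerprint is the link target, as in the method above
        return (minikind, str(symlink_target or ''), 0, False, revision)
    if kind == 'file':
        # the fingerprint is the text sha1
        return (minikind, sha1 or '', size or 0, executable, revision)
    if kind == 'tree-reference':
        # the fingerprint is the referenced revision
        return (minikind, reference_revision or '', 0, False, revision)
    raise ValueError("can't pack %s" % kind)
```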
    _inv_entry_to_details = staticmethod(_inv_entry_to_details)

    def _iter_child_entries(self, tree_index, path_utf8):
        """Iterate over all the entries that are children of path_utf8.

        This only returns entries that are present (not in 'a', 'r') in
        tree_index. tree_index data is not refreshed, so if tree 0 is used,
        results may differ from that obtained if paths were statted to
        determine what ones were directories.

        Asking for the children of a non-directory will return an empty
        iterator.
        """
        pending_dirs = []
        next_pending_dirs = [path_utf8]
        absent = 'ar'
        while next_pending_dirs:
            pending_dirs = next_pending_dirs
            next_pending_dirs = []
            for path in pending_dirs:
                block_index, present = self._find_block_index_from_key(
                    (path, '', ''))
                if block_index == 0:
                    block_index = 1
                    if len(self._dirblocks) == 1:
                        # asked for the children of the root with no other
                        # contents.
                        return
                if not present:
                    # children of a non-directory asked for.
                    continue
                block = self._dirblocks[block_index]
                for entry in block[1]:
                    kind = entry[1][tree_index][0]
                    if kind not in absent:
                        yield entry
                    if kind == 'd':
                        if entry[0][0]:
                            path = entry[0][0] + '/' + entry[0][1]
                        else:
                            path = entry[0][1]
                        next_pending_dirs.append(path)

    def _iter_entries(self):
        """Iterate over all the entries in the dirstate.

        Each yielded item is an entry in the standard format described in the
        docstring of bzrlib.dirstate.
        """
        self._read_dirblocks_if_needed()
        for directory in self._dirblocks:
            for entry in directory[1]:
                yield entry

    def _get_id_index(self):
        """Get an id index of self._dirblocks."""
        if self._id_index is None:
            id_index = {}
            for key, tree_details in self._iter_entries():
                id_index.setdefault(key[2], set()).add(key)
            self._id_index = id_index
        return self._id_index
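The id index built above can be sketched as a standalone function (a hypothetical helper, `build_id_index`, mirroring the loop in `_get_id_index`): it maps each file id (`key[2]`) to the set of entry keys that mention it.

```python
def build_id_index(entries):
    # Map file_id -> set of (dirname, basename, file_id) entry keys,
    # mirroring the loop body in _get_id_index above. Illustrative only.
    id_index = {}
    for key, tree_details in entries:
        id_index.setdefault(key[2], set()).add(key)
    return id_index
```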

    def _get_output_lines(self, lines):
        """Format lines for final output.

        :param lines: A sequence of lines containing the parents list and the
            entries.
        """
        output_lines = [DirState.HEADER_FORMAT_3]
        lines.append('') # a final newline
        inventory_text = '\0\n\0'.join(lines)
        output_lines.append('crc32: %s\n' % (zlib.crc32(inventory_text),))
        # -3, 1 for num parents, 1 for ghosts, 1 for final newline
        num_entries = len(lines)-3
        output_lines.append('num_entries: %s\n' % (num_entries,))
        output_lines.append(inventory_text)
        return output_lines
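The two header fields computed above can be restated as a standalone sketch (Python 3 flavoured bytes for testability; the module itself is Python 2, and `checksum_fields` is an illustrative name):

```python
import zlib

def checksum_fields(lines):
    # lines: the serialised body - one parents line, one ghosts line, then
    # one line per entry, mirroring the layout used by _get_output_lines.
    lines = list(lines) + [b'']  # a final newline
    inventory_text = b'\0\n\0'.join(lines)
    # -3: one for num parents, one for ghosts, one for the final newline
    return zlib.crc32(inventory_text) & 0xffffffff, len(lines) - 3
```

Note the original writes `zlib.crc32(...)` directly, which may be negative on Python 2 (hence the optional `-` in the `full checksum` grammar rule); the sketch masks to an unsigned value for portability.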

    def _make_deleted_row(self, fileid_utf8, parents):
        """Return a deleted row for fileid_utf8."""
        return ('/', 'RECYCLED.BIN', 'file', fileid_utf8, 0, DirState.NULLSTAT,
            ''), parents

    def _num_present_parents(self):
        """The number of parent entries in each record row."""
        return len(self._parents) - len(self._ghosts)

    def on_file(path):
        """Construct a DirState on the file at path path.

        :return: An unlocked DirState object, associated with the given path.
        """
        result = DirState(path)
        return result
    on_file = staticmethod(on_file)

    def _read_dirblocks_if_needed(self):
        """Read in all the dirblocks from the file if they are not in memory.

        This populates self._dirblocks, and sets self._dirblock_state to
        IN_MEMORY_UNMODIFIED. It is not currently ready for incremental block
        loading.
        """
        self._read_header_if_needed()
        if self._dirblock_state == DirState.NOT_IN_MEMORY:
            _read_dirblocks(self)

    def _read_header(self):
        """This reads in the metadata header, and the parent ids.

        After reading in, the file should be positioned at the null
        just before the start of the first record in the file.

        :return: (expected crc checksum, number of entries, parent list)
        """
        self._read_prelude()
        parent_line = self._state_file.readline()
        info = parent_line.split('\0')
        num_parents = int(info[0])
        self._parents = info[1:-1]
        ghost_line = self._state_file.readline()
        info = ghost_line.split('\0')
        num_ghosts = int(info[1])
        self._ghosts = info[2:-1]
        self._header_state = DirState.IN_MEMORY_UNMODIFIED
        self._end_of_header = self._state_file.tell()

    def _read_header_if_needed(self):
        """Read the header of the dirstate file if needed."""
        # inline this as it will be called a lot
        if not self._lock_token:
            raise errors.ObjectNotLocked(self)
        if self._header_state == DirState.NOT_IN_MEMORY:
            self._read_header()

    def _read_prelude(self):
        """Read in the prelude header of the dirstate file.

        This only reads in the stuff that is not connected to the crc
        checksum. The position will be correct to read in the rest of
        the file and check the checksum after this point.
        The next entry in the file should be the number of parents,
        and their ids. Followed by a newline.
        """
        header = self._state_file.readline()
        if header != DirState.HEADER_FORMAT_3:
            raise errors.BzrError(
                'invalid header line: %r' % (header,))
        crc_line = self._state_file.readline()
        if not crc_line.startswith('crc32: '):
            raise errors.BzrError('missing crc32 checksum: %r' % crc_line)
        self.crc_expected = int(crc_line[len('crc32: '):-1])
        num_entries_line = self._state_file.readline()
        if not num_entries_line.startswith('num_entries: '):
            raise errors.BzrError('missing num_entries line')
        self._num_entries = int(num_entries_line[len('num_entries: '):-1])
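The three prelude lines parsed above (header, crc, entry count - see the grammar in the module docstring) can be restated as a standalone sketch; `parse_prelude` is an illustrative name, and it takes a list of lines rather than reading from a file object.

```python
def parse_prelude(lines):
    # Hypothetical re-statement of _read_prelude over a list of lines:
    # validate the format-3 header, then pull out the crc and entry count.
    header, crc_line, num_line = lines[0], lines[1], lines[2]
    if header != '#bazaar dirstate flat format 3\n':
        raise ValueError('invalid header line: %r' % (header,))
    if not crc_line.startswith('crc32: '):
        raise ValueError('missing crc32 checksum: %r' % crc_line)
    crc_expected = int(crc_line[len('crc32: '):-1])  # strip prefix and newline
    if not num_line.startswith('num_entries: '):
        raise ValueError('missing num_entries line')
    num_entries = int(num_line[len('num_entries: '):-1])
    return crc_expected, num_entries
```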

    def sha1_from_stat(self, path, stat_result, _pack_stat=pack_stat):
        """Find a sha1 given a stat lookup."""
        return self._get_packed_stat_index().get(_pack_stat(stat_result), None)

    def _get_packed_stat_index(self):
        """Get a packed_stat index of self._dirblocks."""
        if self._packed_stat_index is None:
            index = {}
            for key, tree_details in self._iter_entries():
                if tree_details[0][0] == 'f':
                    index[tree_details[0][4]] = tree_details[0][1]
            self._packed_stat_index = index
        return self._packed_stat_index

    def save(self):
        """Save any pending changes created during this session.

        We reuse the existing file, because that prevents race conditions with
        file creation, and use oslocks on it to prevent concurrent modification
        and reads - because dirstate's incremental data aggregation is not
        compatible with reading a modified file, and replacing a file in use by
        another process is impossible on Windows.

        A dirstate in read only mode should be smart enough though to validate
        that the file has not changed, and otherwise discard its cache and
        start over, to allow for fine grained read lock duration, so 'status'
        won't block 'commit' - for example.
        """
        if self._changes_aborted:
            # Should this be a warning? For now, I'm expecting that places that
            # mark it inconsistent will warn, making a warning here redundant.
            trace.mutter('Not saving DirState because '
                    '_changes_aborted is set.')
            return
        if (self._header_state == DirState.IN_MEMORY_MODIFIED or
            self._dirblock_state == DirState.IN_MEMORY_MODIFIED):

            grabbed_write_lock = False
            if self._lock_state != 'w':
                grabbed_write_lock, new_lock = self._lock_token.temporary_write_lock()
                # Switch over to the new lock, as the old one may be closed.
                # TODO: jam 20070315 We should validate the disk file has
                #       not changed contents. Since temporary_write_lock may
                #       not be an atomic operation.
                self._lock_token = new_lock
                self._state_file = new_lock.f
                if not grabbed_write_lock:
                    # We couldn't grab a write lock, so we switch back to a read one
                    return
            try:
                self._state_file.seek(0)
                self._state_file.writelines(self.get_lines())
                self._state_file.truncate()
                self._state_file.flush()
                self._header_state = DirState.IN_MEMORY_UNMODIFIED
                self._dirblock_state = DirState.IN_MEMORY_UNMODIFIED
            finally:
                if grabbed_write_lock:
                    self._lock_token = self._lock_token.restore_read_lock()
                    self._state_file = self._lock_token.f
                    # TODO: jam 20070315 We should validate the disk file has
                    #       not changed contents. Since restore_read_lock may
                    #       not be an atomic operation.

    def _set_data(self, parent_ids, dirblocks):
        """Set the full dirstate data in memory.

        This is an internal function used to completely replace the objects
        in memory state. It puts the dirstate into state 'full-dirty'.

        :param parent_ids: A list of parent tree revision ids.
        :param dirblocks: A list containing one tuple for each directory in the
            tree. Each tuple contains the directory path and a list of entries
            found in that directory.
        """
        # our memory copy is now authoritative.
        self._dirblocks = dirblocks
        self._header_state = DirState.IN_MEMORY_MODIFIED
        self._dirblock_state = DirState.IN_MEMORY_MODIFIED
        self._parents = list(parent_ids)
        self._id_index = None
        self._packed_stat_index = None

    def set_path_id(self, path, new_id):
        """Change the id of path to new_id in the current working tree.

        :param path: The path inside the tree to set - '' is the root, 'foo'
            is the path foo in the root.
        :param new_id: The new id to assign to the path. This must be a utf8
            file id (not unicode, and not None).
        """
        self._read_dirblocks_if_needed()
        if len(path):
            # TODO: logic not written
            raise NotImplementedError(self.set_path_id)
        # TODO: check new id is unique
        entry = self._get_entry(0, path_utf8=path)
        if entry[0][2] == new_id:
            # Nothing to change.
            return
        # mark the old path absent, and insert a new root path
        self._make_absent(entry)
        self.update_minimal(('', '', new_id), 'd',
            path_utf8='', packed_stat=entry[1][0][4])
        self._dirblock_state = DirState.IN_MEMORY_MODIFIED
        if self._id_index is not None:
            self._id_index.setdefault(new_id, set()).add(entry[0])

    def set_parent_trees(self, trees, ghosts):
        """Set the parent trees for the dirstate.

        :param trees: A list of revision_id, tree tuples. tree must be provided
            even if the revision_id refers to a ghost: supply an empty tree in
            this case.
        :param ghosts: A list of the revision_ids that are ghosts at the time
            of setting.
        """
        # TODO: generate a list of parent indexes to preserve to save
        # processing specific parent trees. In the common case one tree will
        # be preserved - the left most parent.
        # TODO: if the parent tree is a dirstate, we might want to walk them
        # all by path in parallel for 'optimal' common-case performance.
        # generate new root row.
        self._read_dirblocks_if_needed()
        # TODO future sketch: Examine the existing parents to generate a change
        # map and then walk the new parent trees only, mapping them into the
        # dirstate. Walk the dirstate at the same time to remove unreferenced
        # entries.
        # sketch: loop over all entries in the dirstate, cherry picking
        # entries from the parent trees, if they are not ghost trees.
        # after we finish walking the dirstate, all entries not in the dirstate
        # are deletes, so we want to append them to the end as per the design
        # discussions. So do a set difference on ids with the parents to
        # get deletes, and add them to the end.
        # During the update process we need to answer the following questions:
        # - find other keys containing a fileid in order to create cross-path
        #   links. We don't trivially use the inventory from other trees
        #   because this leads to either double touching, or to accessing
        #   missing keys,
        # - find other keys containing a path
        # We accumulate each entry via this dictionary, including the root
        by_path = {}
        id_index = {}
        # we could do parallel iterators, but because file id data may be
        # scattered throughout, we don't save on index overhead: we have to look
        # at everything anyway. We can probably save cycles by reusing parent
        # data and doing an incremental update when adding an additional
        # parent, but for now the common cases are adding a new parent (merge),
        # and replacing completely (commit), and commit is more common: so
        # optimise merge later.

        # ---- start generation of full tree mapping data
        # what trees should we use?
        parent_trees = [tree for rev_id, tree in trees if rev_id not in ghosts]
        # how many trees do we end up with
        parent_count = len(parent_trees)

        # one: the current tree
        for entry in self._iter_entries():
            # skip entries not in the current tree
            if entry[1][0][0] in 'ar': # absent, relocated
                continue
            by_path[entry[0]] = [entry[1][0]] + \
                [DirState.NULL_PARENT_DETAILS] * parent_count
            id_index[entry[0][2]] = set([entry[0]])

        # now the parent trees:
        for tree_index, tree in enumerate(parent_trees):
            # the index is off by one, adjust it.
            tree_index = tree_index + 1
            # when we add new locations for a fileid we need these ranges for
            # any fileid in this tree as we set the by_path[id] to:
            # already_processed_tree_details + new_details + new_location_suffix
            # the suffix is from tree_index+1:parent_count+1.
            new_location_suffix = [DirState.NULL_PARENT_DETAILS] * (parent_count - tree_index)
            # now stitch in all the entries from this tree
            for path, entry in tree.inventory.iter_entries_by_dir():
                # here we process each trees details for each item in the tree.
                # we first update any existing entries for the id at other paths,
                # then we either create or update the entry for the id at the
                # right path, and finally we add (if needed) a mapping from
                # file_id to this path. We do it in this order to allow us to
                # avoid checking all known paths for the id when generating a
                # new entry at this path: by adding the id->path mapping last,
                # all the mappings are valid and have correct relocation
                # records where needed.
                file_id = entry.file_id
                path_utf8 = path.encode('utf8')
                dirname, basename = osutils.split(path_utf8)
                new_entry_key = (dirname, basename, file_id)
                # tree index consistency: All other paths for this id in this tree
                # index must point to the correct path.
                for entry_key in id_index.setdefault(file_id, set()):
                    # TODO:PROFILING: It might be faster to just update
                    # rather than checking if we need to, and then overwrite
                    # the one we are located at.
                    if entry_key != new_entry_key:
                        # this file id is at a different path in one of the
                        # other trees, so put absent pointers there
                        # This is the vertical axis in the matrix, all pointing
                        # to the real path.
                        by_path[entry_key][tree_index] = ('r', path_utf8, 0, False, '')
                # by path consistency: Insert into an existing path record (trivial), or
                # add a new one with relocation pointers for the other tree indexes.
                if new_entry_key in id_index[file_id]:
                    # there is already an entry where this data belongs, just insert it.
                    by_path[new_entry_key][tree_index] = \
                        self._inv_entry_to_details(entry)
                else:
                    # add relocated entries to the horizontal axis - this row
                    # mapping from path,id. We need to look up the correct path
                    # for the indexes from 0 to tree_index -1
                    new_details = []
                    for lookup_index in xrange(tree_index):
                        # boundary case: this is the first occurrence of file_id
                        # so there are no id_indexs, possibly take this out of
                        # the loop?
                        if not len(id_index[file_id]):
                            new_details.append(DirState.NULL_PARENT_DETAILS)
                        else:
                            # grab any one entry, use it to find the right path.
                            # TODO: optimise this to reduce memory use in highly
                            # fragmented situations by reusing the relocation
                            # records.
                            a_key = iter(id_index[file_id]).next()
                            if by_path[a_key][lookup_index][0] in ('r', 'a'):
                                # it's a pointer or missing statement, use it as is.
                                new_details.append(by_path[a_key][lookup_index])
                            else:
                                # we have the right key, make a pointer to it.
                                real_path = ('/'.join(a_key[0:2])).strip('/')
                                new_details.append(('r', real_path, 0, False, ''))
                    new_details.append(self._inv_entry_to_details(entry))
                    new_details.extend(new_location_suffix)
                    by_path[new_entry_key] = new_details
                    id_index[file_id].add(new_entry_key)
        # --- end generation of full tree mappings

        # sort and output all the entries
        new_entries = self._sort_entries(by_path.items())
        self._entries_to_current_state(new_entries)
        self._parents = [rev_id for rev_id, tree in trees]
        self._ghosts = list(ghosts)
        self._header_state = DirState.IN_MEMORY_MODIFIED
        self._dirblock_state = DirState.IN_MEMORY_MODIFIED
        self._id_index = id_index

    def _sort_entries(self, entry_list):
        """Given a list of entries, sort them into the right order.

        This is done when constructing a new dirstate from trees - normally we
        try to keep everything in sorted blocks all the time, but sometimes
        it's easier to sort after the fact.
        """
        def _key(entry):
            # sort by: directory parts, file name, file id
            return entry[0][0].split('/'), entry[0][1], entry[0][2]
        return sorted(entry_list, key=_key)
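The split in the key function above matters: plain string comparison would sort 'a-x' before 'a/b' (since '-' < '/'), separating a directory from its children, while comparing the path components as lists keeps each directory's block together. A small standalone check (`entry_sort_key` is an illustrative copy of `_key`):

```python
def entry_sort_key(entry):
    # directory parts, then file name, then file id - as in _sort_entries
    return entry[0][0].split('/'), entry[0][1], entry[0][2]

entries = [(('a-x', 'f', 'id1'), []),
           (('a/b', 'g', 'id2'), []),
           (('', 'a', 'id3'), [])]
# 'a/b' (inside directory 'a') sorts before the sibling directory 'a-x'
ordered = [e[0][0] for e in sorted(entries, key=entry_sort_key)]
```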

    def set_state_from_inventory(self, new_inv):
        """Set new_inv as the current state.

        This API is called by tree transform, and will usually occur with
        existing parent trees.

        :param new_inv: The inventory object to set current state from.
        """
        if 'evil' in debug.debug_flags:
            trace.mutter_callsite(1,
                "set_state_from_inventory called; please mutate the tree instead")
        self._read_dirblocks_if_needed()
        # sketch:
        # Two iterators: current data and new data, both in dirblock order.
        # We zip them together, which tells about entries that are new in the
        # inventory, or removed in the inventory, or present in both and
        # possibly changed.
        #
        # You might think we could just synthesize a new dirstate directly
        # since we're processing it in the right order. However, we need to
        # also consider there may be any number of parent trees and relocation
        # pointers, and we don't want to duplicate that here.
        new_iterator = new_inv.iter_entries_by_dir()
        # we will be modifying the dirstate, so we need a stable iterator. In
        # future we might write one, for now we just clone the state into a
        # list - which is a shallow copy.
        old_iterator = iter(list(self._iter_entries()))
        # both must have roots so this is safe:
        current_new = new_iterator.next()
        current_old = old_iterator.next()
        def advance(iterator):
            try:
                return iterator.next()
            except StopIteration:
                return None
        while current_new or current_old:
            # skip entries in old that are not really there
            if current_old and current_old[1][0][0] in 'ar':
                # relocated or absent
                current_old = advance(old_iterator)
                continue
            if current_new:
                # convert new into dirblock style
                new_path_utf8 = current_new[0].encode('utf8')
                new_dirname, new_basename = osutils.split(new_path_utf8)
                new_id = current_new[1].file_id
                new_entry_key = (new_dirname, new_basename, new_id)
                current_new_minikind = \
                    DirState._kind_to_minikind[current_new[1].kind]
                if current_new_minikind == 't':
                    fingerprint = current_new[1].reference_revision or ''
                else:
                    # We normally only insert or remove records, or update
                    # them when it has significantly changed. Then we want to
                    # erase its fingerprint. Unaffected records should
                    # normally not be updated at all.
                    fingerprint = ''
            else:
                # for safety disable variables
                new_path_utf8 = new_dirname = new_basename = new_id = \
                    new_entry_key = None
            # 5 cases, we don't have a value that is strictly greater than everything, so
            # we make both end conditions explicit
            if not current_old:
                # old is finished: insert current_new into the state.
                self.update_minimal(new_entry_key, current_new_minikind,
                    executable=current_new[1].executable,
                    path_utf8=new_path_utf8, fingerprint=fingerprint)
                current_new = advance(new_iterator)
            elif not current_new:
                # new is finished
                self._make_absent(current_old)
                current_old = advance(old_iterator)
            elif new_entry_key == current_old[0]:
                # same - common case
                # We're looking at the same path and id in both the dirstate
                # and inventory, so just need to update the fields in the
                # dirstate from the one in the inventory.
                # TODO: update the record if anything significant has changed.
                # the minimal required trigger is if the execute bit or cached
                # kind has changed.
                if (current_old[1][0][3] != current_new[1].executable or
                    current_old[1][0][0] != current_new_minikind):
                    self.update_minimal(current_old[0], current_new_minikind,
                        executable=current_new[1].executable,
                        path_utf8=new_path_utf8, fingerprint=fingerprint)
                # both sides are dealt with, move on
                current_old = advance(old_iterator)
                current_new = advance(new_iterator)
            elif (cmp_by_dirs(new_dirname, current_old[0][0]) < 0
                  or (new_dirname == current_old[0][0]
                      and new_entry_key[1:] < current_old[0][1:])):
                # new comes before:
                # add an entry for this and advance new
                self.update_minimal(new_entry_key, current_new_minikind,
                    executable=current_new[1].executable,
                    path_utf8=new_path_utf8, fingerprint=fingerprint)
                current_new = advance(new_iterator)
            else:
                # we've advanced past the place where the old key would be,
                # without seeing it in the new list. so it must be gone.
                self._make_absent(current_old)
                current_old = advance(old_iterator)
        self._dirblock_state = DirState.IN_MEMORY_MODIFIED
        self._id_index = None
        self._packed_stat_index = None

    def _make_absent(self, current_old):
        """Mark current_old - an entry - as absent for tree 0.

        :return: True if this was the last details entry for the entry key:
            that is, if the underlying block has had the entry removed, thus
            shrinking in length.
        """
        # build up paths that this id will be left at after the change is made,
        # so we can update their cross references in tree 0
        all_remaining_keys = set()
        # Don't check the working tree, because it's going.
        for details in current_old[1][1:]:
            if details[0] not in 'ar': # absent, relocated
                all_remaining_keys.add(current_old[0])
            elif details[0] == 'r': # relocated
                # record the key for the real path.
                all_remaining_keys.add(tuple(osutils.split(details[1])) + (current_old[0][2],))
            # absent rows are not present at any path.
        last_reference = current_old[0] not in all_remaining_keys
        if last_reference:
            # the current row consists entirely of the current item (being marked
            # absent), and relocated or absent entries for the other trees:
            # Remove it, it's meaningless.
            block = self._find_block(current_old[0])
            entry_index, present = self._find_entry_index(current_old[0], block[1])
            if not present:
                raise AssertionError('could not find entry for %s' % (current_old,))
            block[1].pop(entry_index)
            # if we have an id_index in use, remove this key from it for this id.
            if self._id_index is not None:
                self._id_index[current_old[0][2]].remove(current_old[0])
        # update all remaining keys for this id to record it as absent. The
        # existing details may either be the record we are marking as deleted
        # (if there were other trees with the id present at this path), or may
        # be relocations.
        for update_key in all_remaining_keys:
            update_block_index, present = \
                self._find_block_index_from_key(update_key)
            if not present:
                raise AssertionError('could not find block for %s' % (update_key,))
            update_entry_index, present = \
                self._find_entry_index(update_key, self._dirblocks[update_block_index][1])
            if not present:
                raise AssertionError('could not find entry for %s' % (update_key,))
            update_tree_details = self._dirblocks[update_block_index][1][update_entry_index][1]
            # it must not be absent at the moment
            if update_tree_details[0][0] == 'a': # absent
                raise AssertionError('bad row %r' % (update_tree_details,))
            update_tree_details[0] = DirState.NULL_PARENT_DETAILS
        self._dirblock_state = DirState.IN_MEMORY_MODIFIED
        return last_reference

    def update_minimal(self, key, minikind, executable=False, fingerprint='',
                       packed_stat=None, size=0, path_utf8=None):
        """Update an entry to the state in tree 0.

        This will either create a new entry at 'key' or update an existing one.
        It also makes sure that any other records which might mention this are
        updated as well.

        :param key: (dir, name, file_id) for the new entry
        :param minikind: The type for the entry ('f' == 'file', 'd' ==
            'directory'), etc.
        :param executable: Should the executable bit be set?
        :param fingerprint: Simple fingerprint for new entry: sha1 for files,
            referenced revision id for subtrees, etc.
        :param packed_stat: Packed stat value for new entry.
        :param size: Size information for new entry
        :param path_utf8: key[0] + '/' + key[1], just passed in to avoid doing
            the computation here.

        If packed_stat and fingerprint are not given, they're invalidated in
        the entry.
        """
        block = self._find_block(key)[1]
        if packed_stat is None:
            packed_stat = DirState.NULLSTAT
        # XXX: Some callers pass '' as the packed_stat, and it seems to be
        # sometimes present in the dirstate - this seems oddly inconsistent.
        entry_index, present = self._find_entry_index(key, block)
        new_details = (minikind, fingerprint, size, executable, packed_stat)
        id_index = self._get_id_index()
        if not present:
            # new entry, synthesis cross reference here,
            existing_keys = id_index.setdefault(key[2], set())
            if not existing_keys:
                # not currently in the state, simplest case
                new_entry = key, [new_details] + self._empty_parent_info()
            else:
                # present at one or more existing other paths.
                # grab one of them and use it to generate parent
                # relocation/absent entries.
                new_entry = key, [new_details]
                for other_key in existing_keys:
                    # change the record at other to be a pointer to this new
                    # record. The loop looks similar to the change to
                    # relocations when updating an existing record but it's not:
                    # the test for existing kinds is different: this can be
                    # factored out to a helper though.
                    other_block_index, present = self._find_block_index_from_key(other_key)
                    if not present:
                        raise AssertionError('could not find block for %s' % (other_key,))
                    other_entry_index, present = self._find_entry_index(other_key,
                        self._dirblocks[other_block_index][1])
                    if not present:
                        raise AssertionError('could not find entry for %s' % (other_key,))
                    if path_utf8 is None:
                        raise AssertionError('no path')
                    self._dirblocks[other_block_index][1][other_entry_index][1][0] = \
                        ('r', path_utf8, 0, False, '')

                num_present_parents = self._num_present_parents()
                for lookup_index in xrange(1, num_present_parents + 1):
                    # grab any one entry, use it to find the right path.
                    # TODO: optimise this to reduce memory use in highly
                    # fragmented situations by reusing the relocation
                    # records.
                    update_block_index, present = \
                        self._find_block_index_from_key(other_key)
                    if not present:
                        raise AssertionError('could not find block for %s' % (other_key,))
                    update_entry_index, present = \
                        self._find_entry_index(other_key, self._dirblocks[update_block_index][1])
                    if not present:
                        raise AssertionError('could not find entry for %s' % (other_key,))
                    update_details = self._dirblocks[update_block_index][1][update_entry_index][1][lookup_index]
                    if update_details[0] in 'ar': # relocated, absent
                        # it's a pointer or absent in lookup_index's tree, use
                        # it as is.
                        new_entry[1].append(update_details)
                    else:
                        # we have the right key, make a pointer to it.
                        pointer_path = osutils.pathjoin(*other_key[0:2])
                        new_entry[1].append(('r', pointer_path, 0, False, ''))
            block.insert(entry_index, new_entry)
            existing_keys.add(key)
        else:
            # Does the new state matter?
            block[entry_index][1][0] = new_details
            # parents cannot be affected by what we do.
            # other occurrences of this id can be found
            # from the id index.

            # tree index consistency: All other paths for this id in this tree
            # index must point to the correct path. We have to loop here because
            # we may have passed entries in the state with this file id already
            # that were absent - where parent entries are - and they need to be
            # converted to relocated.
            if path_utf8 is None:
                raise AssertionError('no path')
            for entry_key in id_index.setdefault(key[2], set()):
                # TODO:PROFILING: It might be faster to just update
                # rather than checking if we need to, and then overwrite
                # the one we are located at.
                if entry_key != key:
                    # this file id is at a different path in one of the
                    # other trees, so put absent pointers there
                    # This is the vertical axis in the matrix, all pointing
                    # to the real path.
                    block_index, present = self._find_block_index_from_key(entry_key)
                    if not present:
                        raise AssertionError('not present: %r', entry_key)
                    entry_index, present = self._find_entry_index(entry_key, self._dirblocks[block_index][1])
                    if not present:
                        raise AssertionError('not present: %r', entry_key)
                    self._dirblocks[block_index][1][entry_index][1][0] = \
                        ('r', path_utf8, 0, False, '')
        # add a containing dirblock if needed.
        if new_details[0] == 'd':
            subdir_key = (osutils.pathjoin(*key[0:2]), '', '')
            block_index, present = self._find_block_index_from_key(subdir_key)
            if not present:
                self._dirblocks.insert(block_index, (subdir_key[0], []))

        self._dirblock_state = DirState.IN_MEMORY_MODIFIED

    def _validate(self):
        """Check that invariants on the dirblock are correct.

        This can be useful in debugging; it shouldn't be necessary in
        normal code.

        This must be called with a lock held.
        """
        # NOTE: This must always raise AssertionError not just assert,
        # otherwise it may not behave properly under python -O
        #
        # TODO: All entries must have some content that's not 'a' or 'r',
        # otherwise it could just be removed.
        #
        # TODO: All relocations must point directly to a real entry.
        #
        # TODO: No repeated keys.
        #
        from pprint import pformat
        self._read_dirblocks_if_needed()
        if len(self._dirblocks) > 0:
            if not self._dirblocks[0][0] == '':
                raise AssertionError(
                    "dirblocks don't start with root block:\n" + \
                    pformat(self._dirblocks))
        if len(self._dirblocks) > 1:
            if not self._dirblocks[1][0] == '':
                raise AssertionError(
                    "dirblocks missing root directory:\n" + \
                    pformat(self._dirblocks))
        # the dirblocks are sorted by their path components, name, and dir id
        dir_names = [d[0].split('/')
                for d in self._dirblocks[1:]]
        if dir_names != sorted(dir_names):
            raise AssertionError(
                "dir names are not in sorted order:\n" + \
                pformat(self._dirblocks) + \
                "\nkeys:\n" + \
                pformat(dir_names))
        for dirblock in self._dirblocks:
            # within each dirblock, the entries are sorted by filename and
            # then by file id.
            for entry in dirblock[1]:
                if dirblock[0] != entry[0][0]:
                    raise AssertionError(
                        "entry key for %r"
                        "doesn't match directory name in\n%r" %
                        (entry, pformat(dirblock)))
            if dirblock[1] != sorted(dirblock[1]):
                raise AssertionError(
                    "dirblock for %r is not sorted:\n%s" % \
                    (dirblock[0], pformat(dirblock)))

        def check_valid_parent():
            """Check that the current entry has a valid parent.

            This makes sure that the parent has a record,
            and that the parent isn't marked as "absent" in the
            current tree. (It is invalid to have a non-absent file in an absent
            directory.)
            """
            if entry[0][0:2] == ('', ''):
                # There should be no parent for the root row
                return
            parent_entry = self._get_entry(tree_index, path_utf8=entry[0][0])
            if parent_entry == (None, None):
                raise AssertionError(
                    "no parent entry for: %s in tree %s"
                    % (this_path, tree_index))
            if parent_entry[1][tree_index][0] != 'd':
                raise AssertionError(
                    "Parent entry for %s is not marked as a valid"
                    " directory. %s" % (this_path, parent_entry,))

        # For each file id, for each tree: either
        # the file id is not present at all; all rows with that id in the
        # key have it marked as 'absent'
        # OR the file id is present under exactly one name; any other entries
        # that mention that id point to the correct name.
        #
        # We check this with a dict per tree pointing either to the present
        # name, or None if absent.
        tree_count = self._num_present_parents() + 1
        id_path_maps = [dict() for i in range(tree_count)]
        # Make sure that all renamed entries point to the correct location.
        for entry in self._iter_entries():
            file_id = entry[0][2]
            this_path = osutils.pathjoin(entry[0][0], entry[0][1])
            if len(entry[1]) != tree_count:
                raise AssertionError(
                    "wrong number of entry details for row\n%s" \
                    ",\nexpected %d" % \
                    (pformat(entry), tree_count))
            absent_positions = 0
            for tree_index, tree_state in enumerate(entry[1]):
                this_tree_map = id_path_maps[tree_index]
                minikind = tree_state[0]
                if minikind in 'ar':
                    absent_positions += 1
                # have we seen this id before in this column?
                if file_id in this_tree_map:
                    previous_path, previous_loc = this_tree_map[file_id]
                    # any later mention of this file must be consistent with
                    # what was said before
                    if minikind == 'a':
                        if previous_path is not None:
                            raise AssertionError(
                                "file %s is absent in row %r but also present " \
                                "at %r" % \
                                (file_id, entry, previous_path))
                    elif minikind == 'r':
                        target_location = tree_state[1]
                        if previous_path != target_location:
2621
raise AssertionError(
2622
"file %s relocation in row %r but also at %r" \
2623
% (file_id, entry, previous_path))
2625
# a file, directory, etc - may have been previously
2626
# pointed to by a relocation, which must point here
2627
if previous_path != this_path:
2628
raise AssertionError(
2629
"entry %r inconsistent with previous path %r "
2631
(entry, previous_path, previous_loc))
2632
check_valid_parent()
2635
# absent; should not occur anywhere else
2636
this_tree_map[file_id] = None, this_path
2637
elif minikind == 'r':
2638
# relocation, must occur at expected location
2639
this_tree_map[file_id] = tree_state[1], this_path
2641
this_tree_map[file_id] = this_path, this_path
2642
check_valid_parent()
2643
if absent_positions == tree_count:
2644
raise AssertionError(
2645
"entry %r has no data for any tree." % (entry,))
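# Illustrative sketch (not bzrlib code): the block-ordering invariant that
# _validate enforces above. Dirblocks are compared by their path *components*
# ('/'-split), not by raw strings, which keeps every directory's children
# adjacent to their parent block. dirblocks_sorted is a hypothetical helper.

```python
def dirblocks_sorted(dirblocks):
    """True if dirblock directory names obey dirstate's component ordering."""
    dir_names = [d[0].split('/') for d in dirblocks]
    return dir_names == sorted(dir_names)

# 'a-b' sorts *after* 'a/b' under component ordering (['a-b'] > ['a', 'b']),
# even though 'a-b' < 'a/b' as raw strings ('-' precedes '/' in ASCII).
blocks = [('', []), ('a', []), ('a/b', []), ('a-b', [])]
assert dirblocks_sorted(blocks)
assert not dirblocks_sorted([('a-b', []), ('a/b', [])])
```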
    def _wipe_state(self):
        """Forget all state information about the dirstate."""
        self._header_state = DirState.NOT_IN_MEMORY
        self._dirblock_state = DirState.NOT_IN_MEMORY
        self._changes_aborted = False
        self._parents = []
        self._ghosts = []
        self._dirblocks = []
        self._id_index = None
        self._packed_stat_index = None
        self._end_of_header = None
        self._cutoff_time = None
        self._split_path_cache = {}
    def lock_read(self):
        """Acquire a read lock on the dirstate."""
        if self._lock_token is not None:
            raise errors.LockContention(self._lock_token)
        # TODO: jam 20070301 Rather than wiping completely, if the blocks are
        #       already in memory, we could read just the header and check for
        #       any modification. If not modified, we can just leave things
        #       alone.
        self._lock_token = lock.ReadLock(self._filename)
        self._lock_state = 'r'
        self._state_file = self._lock_token.f
        self._wipe_state()

    def lock_write(self):
        """Acquire a write lock on the dirstate."""
        if self._lock_token is not None:
            raise errors.LockContention(self._lock_token)
        # TODO: jam 20070301 Rather than wiping completely, if the blocks are
        #       already in memory, we could read just the header and check for
        #       any modification. If not modified, we can just leave things
        #       alone.
        self._lock_token = lock.WriteLock(self._filename)
        self._lock_state = 'w'
        self._state_file = self._lock_token.f
        self._wipe_state()

    def unlock(self):
        """Drop any locks held on the dirstate."""
        if self._lock_token is None:
            raise errors.LockNotHeld(self)
        # TODO: jam 20070301 Rather than wiping completely, if the blocks are
        #       already in memory, we could read just the header and check for
        #       any modification. If not modified, we can just leave things
        #       alone.
        self._state_file = None
        self._lock_state = None
        self._lock_token.unlock()
        self._lock_token = None
        self._split_path_cache = {}

    def _requires_lock(self):
        """Check that a lock is currently held by someone on the dirstate."""
        if not self._lock_token:
            raise errors.ObjectNotLocked(self)
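# Illustrative sketch (not bzrlib code): the token-based protocol shared by
# lock_read, lock_write and unlock above, reduced to a standalone class.
# DemoLockable, RuntimeError and the bare object() token are hypothetical
# stand-ins for DirState, errors.LockContention/LockNotHeld and
# lock.ReadLock/WriteLock (which also open self._state_file).

```python
class DemoLockable(object):
    """Minimal sketch of the dirstate lock-token protocol."""

    def __init__(self):
        self._lock_token = None
        self._lock_state = None

    def lock_read(self):
        # Taking a lock while one is held is contention, not re-entry.
        if self._lock_token is not None:
            raise RuntimeError('lock contention: %r' % (self._lock_token,))
        self._lock_token = object()  # stands in for lock.ReadLock(filename)
        self._lock_state = 'r'

    def unlock(self):
        if self._lock_token is None:
            raise RuntimeError('lock not held')
        # Drop the derived state before releasing the token, mirroring
        # the ordering in DirState.unlock().
        self._lock_state = None
        self._lock_token = None

demo = DemoLockable()
demo.lock_read()
assert demo._lock_state == 'r'
demo.unlock()
assert demo._lock_token is None
```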
def py_update_entry(self, entry, abspath, stat_value,
                    _stat_to_minikind=DirState._stat_to_minikind,
                    _pack_stat=pack_stat):
    """Update the entry based on what is actually on disk.

    :param entry: This is the dirblock entry for the file in question.
    :param abspath: The path on disk for this file.
    :param stat_value: (optional) if we already have done a stat on the
        file, re-use it.
    :return: The sha1 hexdigest of the file (40 bytes) or link target of a
        symlink.
    """
    try:
        minikind = _stat_to_minikind[stat_value.st_mode & 0170000]
    except KeyError:
        # Unhandled kind
        return None
    packed_stat = _pack_stat(stat_value)
    (saved_minikind, saved_link_or_sha1, saved_file_size,
     saved_executable, saved_packed_stat) = entry[1][0]

    if (minikind == saved_minikind
        and packed_stat == saved_packed_stat):
        # The stat hasn't changed since we saved, so we can re-use the
        # saved sha hash.
        if minikind == 'd':
            return None

        # size should also be in packed_stat
        if saved_file_size == stat_value.st_size:
            return saved_link_or_sha1

    # If we have gotten this far, that means that we need to actually
    # process this entry.
    link_or_sha1 = None
    if minikind == 'f':
        link_or_sha1 = self._sha1_file(abspath)
        executable = self._is_executable(stat_value.st_mode,
                                         saved_executable)
        if self._cutoff_time is None:
            self._sha_cutoff_time()
        if (stat_value.st_mtime < self._cutoff_time
            and stat_value.st_ctime < self._cutoff_time):
            entry[1][0] = ('f', link_or_sha1, stat_value.st_size,
                           executable, packed_stat)
        else:
            entry[1][0] = ('f', '', stat_value.st_size,
                           executable, DirState.NULLSTAT)
    elif minikind == 'd':
        entry[1][0] = ('d', '', 0, False, packed_stat)
        if saved_minikind != 'd':
            # This changed from something into a directory. Make sure we
            # have a directory block for it. This doesn't happen very
            # often, so this doesn't have to be super fast.
            block_index, entry_index, dir_present, file_present = \
                self._get_block_entry_index(entry[0][0], entry[0][1], 0)
            self._ensure_block(block_index, entry_index,
                               osutils.pathjoin(entry[0][0], entry[0][1]))
    elif minikind == 'l':
        link_or_sha1 = self._read_link(abspath, saved_link_or_sha1)
        if self._cutoff_time is None:
            self._sha_cutoff_time()
        if (stat_value.st_mtime < self._cutoff_time
            and stat_value.st_ctime < self._cutoff_time):
            entry[1][0] = ('l', link_or_sha1, stat_value.st_size,
                           False, packed_stat)
        else:
            entry[1][0] = ('l', '', stat_value.st_size,
                           False, DirState.NULLSTAT)
    self._dirblock_state = DirState.IN_MEMORY_MODIFIED
    return link_or_sha1


update_entry = py_update_entry
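# Illustrative sketch (not bzrlib code): the fast path at the top of
# py_update_entry - trust the cached fingerprint while the packed stat and
# size still match, and only re-read content when they don't. cached_sha1
# and its hash_file callback are hypothetical names, not bzrlib API.

```python
def cached_sha1(saved, minikind, size, packed_stat, hash_file):
    """Return a cached sha1 when the stat fingerprint is unchanged.

    saved is (minikind, sha1, size, executable, packed_stat), shaped like
    a dirstate entry's current-tree details; hash_file re-hashes content.
    """
    saved_minikind, saved_sha1, saved_size, _, saved_packed_stat = saved
    if (minikind == saved_minikind
        and packed_stat == saved_packed_stat
        and size == saved_size):
        return saved_sha1      # stat unchanged: re-use the saved hash
    return hash_file()         # something changed: hash the content again

calls = []
def rehash():
    calls.append(1)
    return 'ffff'

saved = ('f', 'abcd', 10, False, 'PACKED')
assert cached_sha1(saved, 'f', 10, 'PACKED', rehash) == 'abcd'
assert calls == []                        # content was never re-read
assert cached_sha1(saved, 'f', 11, 'PACKED', rehash) == 'ffff'
```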
class ProcessEntryPython(object):

    __slots__ = ["old_dirname_to_file_id", "new_dirname_to_file_id", "uninteresting",
        "last_source_parent", "last_target_parent", "include_unchanged",
        "use_filesystem_for_exec", "utf8_decode", "searched_specific_files",
        "search_specific_files"]

    def __init__(self, include_unchanged, use_filesystem_for_exec,
        search_specific_files):
        self.old_dirname_to_file_id = {}
        self.new_dirname_to_file_id = {}
        # Just a sentinel, so that _process_entry can say that this
        # record is handled, but isn't interesting to process (unchanged)
        self.uninteresting = object()
        # Using a list so that we can access the values and change them in
        # nested scope. Each one is [path, file_id, entry]
        self.last_source_parent = [None, None]
        self.last_target_parent = [None, None]
        self.include_unchanged = include_unchanged
        self.use_filesystem_for_exec = use_filesystem_for_exec
        self.utf8_decode = cache_utf8._utf8_decode
        # For all search indexes in each path at or under each element of
        # search_specific_files, if the detail is relocated: add the id, and
        # add the relocated path as one to search if it's not already
        # searched. If the detail is not relocated, add the id.
        self.searched_specific_files = set()
        self.search_specific_files = search_specific_files
    def _process_entry(self, entry, path_info, source_index, target_index, state):
        """Compare an entry and real disk to generate delta information.

        :param path_info: top_relpath, basename, kind, lstat, abspath for
            the path of entry. If None, then the path is considered absent.
            (Perhaps we should pass in a concrete entry for this ?)
            Basename is returned as a utf8 string because we expect this
            tuple will be ignored, and don't want to take the time to
            decode.
        :return: None if these don't match
            A tuple of information about the change, or
            the object 'uninteresting' if these match, but are
            basically identical.
        """
        if source_index is None:
            source_details = DirState.NULL_PARENT_DETAILS
        else:
            source_details = entry[1][source_index]
        target_details = entry[1][target_index]
        target_minikind = target_details[0]
        if path_info is not None and target_minikind in 'fdlt':
            if not (target_index == 0):
                raise AssertionError()
            link_or_sha1 = update_entry(state, entry,
                abspath=path_info[4], stat_value=path_info[3])
            # The entry may have been modified by update_entry
            target_details = entry[1][target_index]
            target_minikind = target_details[0]
        else:
            link_or_sha1 = None
        file_id = entry[0][2]
        source_minikind = source_details[0]
        if source_minikind in 'fdltr' and target_minikind in 'fdlt':
            # claimed content in both: diff
            #   r    | fdlt   |      | add source to search, add id path move and perform
            #        |        |      | diff check on source-target
            #   r    | fdlt   |  a   | dangling file that was present in the basis.
            #        |        |      | ???
            if source_minikind in 'r':
                # add the source to the search path to find any children it
                # has.  TODO ? : only add if it is a container ?
                if not osutils.is_inside_any(self.searched_specific_files,
                                             source_details[1]):
                    self.search_specific_files.add(source_details[1])
                # generate the old path; this is needed for stating later
                # as well.
                old_path = source_details[1]
                old_dirname, old_basename = os.path.split(old_path)
                path = pathjoin(entry[0][0], entry[0][1])
                old_entry = state._get_entry(source_index,
                                             path_utf8=old_path)
                # update the source details variable to be the real
                # location.
                if old_entry == (None, None):
                    raise errors.CorruptDirstate(state._filename,
                        "entry '%s/%s' is considered renamed from %r"
                        " but source does not exist\n"
                        "entry: %s" % (entry[0][0], entry[0][1], old_path, entry))
                source_details = old_entry[1][source_index]
                source_minikind = source_details[0]
            else:
                old_dirname = entry[0][0]
                old_basename = entry[0][1]
                old_path = path = None
            if path_info is None:
                # the file is missing on disk, show as removed.
                content_change = True
                target_kind = None
                target_exec = False
            else:
                # source and target are both versioned and disk file is present.
                target_kind = path_info[2]
                if target_kind == 'directory':
                    if path is None:
                        old_path = path = pathjoin(old_dirname, old_basename)
                    self.new_dirname_to_file_id[path] = file_id
                    if source_minikind != 'd':
                        content_change = True
                    else:
                        # directories have no fingerprint
                        content_change = False
                    target_exec = False
                elif target_kind == 'file':
                    if source_minikind != 'f':
                        content_change = True
                    else:
                        # We could check the size, but we already have the
                        # sha1 hash.
                        content_change = (link_or_sha1 != source_details[1])
                    # Target details is updated at update_entry time
                    if self.use_filesystem_for_exec:
                        # We don't need S_ISREG here, because we are sure
                        # we are dealing with a file.
                        target_exec = bool(stat.S_IEXEC & path_info[3].st_mode)
                    else:
                        target_exec = target_details[3]
                elif target_kind == 'symlink':
                    if source_minikind != 'l':
                        content_change = True
                    else:
                        content_change = (link_or_sha1 != source_details[1])
                    target_exec = False
                elif target_kind == 'tree-reference':
                    if source_minikind != 't':
                        content_change = True
                    else:
                        content_change = False
                    target_exec = False
                else:
                    raise Exception("unknown kind %s" % path_info[2])
            if source_minikind == 'd':
                if path is None:
                    old_path = path = pathjoin(old_dirname, old_basename)
                self.old_dirname_to_file_id[old_path] = file_id
            # parent id is the entry for the path in the target tree
            if old_dirname == self.last_source_parent[0]:
                source_parent_id = self.last_source_parent[1]
            else:
                try:
                    source_parent_id = self.old_dirname_to_file_id[old_dirname]
                except KeyError:
                    source_parent_entry = state._get_entry(source_index,
                                                           path_utf8=old_dirname)
                    source_parent_id = source_parent_entry[0][2]
                if source_parent_id == entry[0][2]:
                    # This is the root, so the parent is None
                    source_parent_id = None
                else:
                    self.last_source_parent[0] = old_dirname
                    self.last_source_parent[1] = source_parent_id
            new_dirname = entry[0][0]
            if new_dirname == self.last_target_parent[0]:
                target_parent_id = self.last_target_parent[1]
            else:
                try:
                    target_parent_id = self.new_dirname_to_file_id[new_dirname]
                except KeyError:
                    # TODO: We don't always need to do the lookup, because the
                    #       parent entry will be the same as the source entry.
                    target_parent_entry = state._get_entry(target_index,
                                                           path_utf8=new_dirname)
                    if target_parent_entry == (None, None):
                        raise AssertionError(
                            "Could not find target parent in wt: %s\nparent of: %s"
                            % (new_dirname, entry))
                    target_parent_id = target_parent_entry[0][2]
                if target_parent_id == entry[0][2]:
                    # This is the root, so the parent is None
                    target_parent_id = None
                else:
                    self.last_target_parent[0] = new_dirname
                    self.last_target_parent[1] = target_parent_id

            source_exec = source_details[3]
            if (self.include_unchanged
                or content_change
                or source_parent_id != target_parent_id
                or old_basename != entry[0][1]
                or source_exec != target_exec
                ):
                if old_path is None:
                    old_path = path = pathjoin(old_dirname, old_basename)
                    old_path_u = self.utf8_decode(old_path)[0]
                    path_u = old_path_u
                else:
                    old_path_u = self.utf8_decode(old_path)[0]
                    if old_path == path:
                        path_u = old_path_u
                    else:
                        path_u = self.utf8_decode(path)[0]
                source_kind = DirState._minikind_to_kind[source_minikind]
                return (entry[0][2],
                       (old_path_u, path_u),
                       content_change,
                       (True, True),
                       (source_parent_id, target_parent_id),
                       (self.utf8_decode(old_basename)[0], self.utf8_decode(entry[0][1])[0]),
                       (source_kind, target_kind),
                       (source_exec, target_exec))
            else:
                return self.uninteresting
        elif source_minikind in 'a' and target_minikind in 'fdlt':
            # looks like a new file
            path = pathjoin(entry[0][0], entry[0][1])
            # parent id is the entry for the path in the target tree
            # TODO: these are the same for an entire directory: cache em.
            parent_id = state._get_entry(target_index,
                                         path_utf8=entry[0][0])[0][2]
            if parent_id == entry[0][2]:
                parent_id = None
            if path_info is not None:
                # Present on disk:
                if self.use_filesystem_for_exec:
                    # We need S_ISREG here, because we aren't sure if this
                    # is a file or not.
                    target_exec = bool(
                        stat.S_ISREG(path_info[3].st_mode)
                        and stat.S_IEXEC & path_info[3].st_mode)
                else:
                    target_exec = target_details[3]
                return (entry[0][2],
                       (None, self.utf8_decode(path)[0]),
                       True,
                       (False, True),
                       (None, parent_id),
                       (None, self.utf8_decode(entry[0][1])[0]),
                       (None, path_info[2]),
                       (None, target_exec))
            else:
                # Its a missing file, report it as such.
                return (entry[0][2],
                       (None, self.utf8_decode(path)[0]),
                       False,
                       (False, True),
                       (None, parent_id),
                       (None, self.utf8_decode(entry[0][1])[0]),
                       (None, None),
                       (None, False))
        elif source_minikind in 'fdlt' and target_minikind in 'a':
            # unversioned, possibly, or possibly not deleted: we dont care.
            # if its still on disk, *and* theres no other entry at this
            # path [we dont know this in this routine at the moment -
            # perhaps we should change this - then it would be an unknown.
            old_path = pathjoin(entry[0][0], entry[0][1])
            # parent id is the entry for the path in the target tree
            parent_id = state._get_entry(source_index, path_utf8=entry[0][0])[0][2]
            if parent_id == entry[0][2]:
                parent_id = None
            return (entry[0][2],
                   (self.utf8_decode(old_path)[0], None),
                   True,
                   (True, False),
                   (parent_id, None),
                   (self.utf8_decode(entry[0][1])[0], None),
                   (DirState._minikind_to_kind[source_minikind], None),
                   (source_details[3], None))
        elif source_minikind in 'fdlt' and target_minikind in 'r':
            # a rename; could be a true rename, or a rename inherited from
            # a renamed parent. TODO: handle this efficiently. Its not
            # common case to rename dirs though, so a correct but slow
            # implementation will do.
            if not osutils.is_inside_any(self.searched_specific_files, target_details[1]):
                self.search_specific_files.add(target_details[1])
        elif source_minikind in 'ra' and target_minikind in 'ra':
            # neither of the selected trees contain this file,
            # so skip over it. This is not currently directly tested, but
            # is indirectly via test_too_much.TestCommands.test_conflicts.
            pass
        else:
            raise AssertionError("don't know how to compare "
                "source_minikind=%r, target_minikind=%r"
                % (source_minikind, target_minikind))
        ## import pdb;pdb.set_trace()
        return None
_process_entry = ProcessEntryPython


# Try to load the compiled form if possible
try:
    from bzrlib._dirstate_helpers_c import (
        _read_dirblocks_c as _read_dirblocks,
        bisect_dirblock_c as bisect_dirblock,
        _bisect_path_left_c as _bisect_path_left,
        _bisect_path_right_c as _bisect_path_right,
        cmp_by_dirs_c as cmp_by_dirs,
        update_entry as update_entry,
        ProcessEntryC as _process_entry,
        )
except ImportError:
    from bzrlib._dirstate_helpers_py import (
        _read_dirblocks_py as _read_dirblocks,
        bisect_dirblock_py as bisect_dirblock,
        _bisect_path_left_py as _bisect_path_left,
        _bisect_path_right_py as _bisect_path_right,
        cmp_by_dirs_py as cmp_by_dirs,
        )
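# Illustrative sketch (not bzrlib code): the try/except ImportError pattern
# used above to prefer compiled helpers with a pure-Python fallback.
# _no_such_helpers_c is a deliberately missing module name, and json merely
# stands in for the pure-Python implementation module.

```python
try:
    # Compiled form; may not be built on this installation.
    from _no_such_helpers_c import loads
except ImportError:
    # Fall back to the pure-Python implementation.
    from json import loads

assert loads('[1, 2, 3]') == [1, 2, 3]
```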