/brz/remove-bazaar

To get this branch, use:
bzr branch http://gegoxaren.bato24.eu/bzr/brz/remove-bazaar
# Copyright (C) 2005, 2006, 2008 Canonical Ltd
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA


"""Copying of history from one branch to another.
19
20
The basic plan is that every branch knows the history of everything
21
that has merged into it.  As the first step of a merge, pull, or
22
branch operation we copy history from the source into the destination
23
branch.
24
25
The copying is done in a slightly complicated order.  We don't want to
26
add a revision to the store until everything it refers to is also
27
stored, so that if a revision is present we can totally recreate it.
28
However, we can't know what files are included in a revision until we
1563.2.34 by Robert Collins
Remove the commit and rollback transaction methods as misleading, and implement a WriteTransaction
29
read its inventory.  So we query the inventory store of the source for
3316.2.14 by Robert Collins
Spelling in NEWS.
30
the ids we need, and then pull those ids and then return to the inventories.
1231 by Martin Pool
- more progress on fetch on top of weaves
31
"""
32
3350.6.4 by Robert Collins
First cut at pluralised VersionedFiles. Some rather massive API incompatabilities, primarily because of the difficulty of coherence among competing stores.
33
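The ordering constraint described in the docstring above -- never store a revision before everything it references is stored -- is essentially a topological sort over the revision graph. A minimal, hypothetical sketch using plain Python 3 `graphlib` (not bzrlib's own `tsort.topo_sort`), with made-up revision names:

```python
# Illustrative only: parents must be inserted before children, so that any
# revision present in the store can always be fully recreated.
from graphlib import TopologicalSorter

# revision -> revisions it references (its parents); names are invented
refs = {"rev-c": ["rev-b"], "rev-b": ["rev-a"], "rev-a": []}
order = list(TopologicalSorter(refs).static_order())
# the chain has exactly one valid insertion order
assert order == ["rev-a", "rev-b", "rev-c"]
```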
import operator

import bzrlib
import bzrlib.errors as errors
from bzrlib.errors import InstallFailed
from bzrlib.progress import ProgressPhase
from bzrlib.revision import NULL_REVISION
from bzrlib.tsort import topo_sort
from bzrlib.trace import mutter
import bzrlib.ui
from bzrlib.versionedfile import filter_absent, FulltextContentFactory

# TODO: Avoid repeatedly opening weaves so many times.

# XXX: This doesn't handle ghost (not present in branch) revisions at
# all yet.  I'm not sure they really should be supported.

# NOTE: This doesn't copy revisions which may be present but not
# merged into the last revision.  I'm not sure we want to do that.

# - get a list of revisions that need to be pulled in
# - for each one, pull in that revision file
#   and get the inventory, and store the inventory with right
#   parents.
# - and get the ancestry, and store that with right parents too
# - and keep a note of all file ids and version seen
# - then go through all files; for each one get the weave,
#   and add in all file versions

def _pb_stream_adapter(pb, msg, num_keys, stream):
    def adapter():
        for idx, record in enumerate(stream):
            pb.update(msg, idx, num_keys)
            yield record
    return adapter


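A self-contained sketch of how an adapter like `_pb_stream_adapter` behaves: every record passes through unchanged while the progress bar is ticked once per record. `FakePB` is a stand-in for a bzrlib progress bar, used here purely for illustration:

```python
class FakePB(object):
    """Records progress updates instead of drawing a bar (illustrative)."""
    def __init__(self):
        self.updates = []
    def update(self, msg, cur, total):
        self.updates.append((msg, cur, total))

def pb_stream_adapter(pb, msg, num_keys, stream):
    # Same shape as _pb_stream_adapter above: wrap the stream so that each
    # yielded record also reports progress.
    def adapter():
        for idx, record in enumerate(stream):
            pb.update(msg, idx, num_keys)
            yield record
    return adapter

pb = FakePB()
records = list(pb_stream_adapter(pb, "copy", 3, iter("abc"))())
assert records == ["a", "b", "c"]
assert pb.updates == [("copy", 0, 3), ("copy", 1, 3), ("copy", 2, 3)]
```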
class RepoFetcher(object):
    """Pull revisions and texts from one repository to another.

    last_revision
        if set, try to limit to the data this revision references.

    after running:
    count_copied -- number of revisions copied

    This should not be used directly; it's essentially an object to
    encapsulate the logic in InterRepository.fetch().
    """

    def __init__(self, to_repository, from_repository, last_revision=None, pb=None,
        find_ghosts=True):
        """Create a repo fetcher.

        :param find_ghosts: If True search the entire history for ghosts.
        """
        # result variables.
        self.failed_revisions = []
        self.count_copied = 0
        if to_repository.has_same_location(from_repository):
            # repository.fetch should be taking care of this case.
            raise errors.BzrError('RepoFetcher run '
                    'between two objects at the same location: '
                    '%r and %r' % (to_repository, from_repository))
        self.to_repository = to_repository
        self.from_repository = from_repository
        self.sink = to_repository._get_sink()
        # must not mutate self._last_revision as it's potentially a shared instance
        self._last_revision = last_revision
        self.find_ghosts = find_ghosts
        if pb is None:
            self.pb = bzrlib.ui.ui_factory.nested_progress_bar()
            self.nested_pb = self.pb
        else:
            self.pb = pb
            self.nested_pb = None
        self.from_repository.lock_read()
        try:
            self.to_repository.lock_write()
            try:
                self.to_repository.start_write_group()
                try:
                    self.__fetch()
                except:
                    self.to_repository.abort_write_group(suppress_errors=True)
                    raise
                else:
                    self.to_repository.commit_write_group()
            finally:
                try:
                    if self.nested_pb is not None:
                        self.nested_pb.finished()
                finally:
                    self.to_repository.unlock()
        finally:
            self.from_repository.unlock()

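The nested try blocks in `__init__` implement a commit-on-success, abort-on-error discipline around the write group. A hypothetical stand-in repository (not bzrlib's API) demonstrates the contract:

```python
class StubRepo(object):
    """Stand-in exposing the same write-group surface as used above."""
    def __init__(self):
        self.state = "idle"
    def start_write_group(self):
        self.state = "open"
    def commit_write_group(self):
        self.state = "committed"
    def abort_write_group(self, suppress_errors=False):
        self.state = "aborted"

def run_in_write_group(repo, work):
    # Mirrors the pattern above: commit if work() succeeds, abort and
    # re-raise if it fails.
    repo.start_write_group()
    try:
        work()
    except:
        repo.abort_write_group(suppress_errors=True)
        raise
    else:
        repo.commit_write_group()

ok = StubRepo()
run_in_write_group(ok, lambda: None)
assert ok.state == "committed"

bad = StubRepo()
def boom():
    raise ValueError("fetch failed")
try:
    run_in_write_group(bad, boom)
except ValueError:
    pass
assert bad.state == "aborted"
```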
    def __fetch(self):
        """Primary worker function.

        This initialises all the needed variables, and then fetches the
        requested revisions, finally clearing the progress bar.
        """
        # Roughly this is what we're aiming for fetch to become:
        #
        # missing = self.sink.insert_stream(self.source.get_stream(search))
        # if missing:
        #     missing = self.sink.insert_stream(self.source.get_items(missing))
        # assert not missing
        self.count_total = 0
        self.file_ids_names = {}
        pp = ProgressPhase('Transferring', 4, self.pb)
        try:
            pp.next_phase()
            search = self._revids_to_fetch()
            if search is None:
                return
            self._fetch_everything_for_search(search, pp)
        finally:
            self.pb.clear()

    def _fetch_everything_for_search(self, search, pp):
        """Fetch all data for the given set of revisions."""
        # The first phase is "file".  We pass the progress bar for it directly
        # into item_keys_introduced_by, which has more information about how
        # that phase is progressing than we do.  Progress updates for the other
        # phases are taken care of in this function.
        # XXX: there should be a clear owner of the progress reporting.  Perhaps
        # item_keys_introduced_by should have a richer API than it does at the
        # moment, so that it can feed the progress information back to this
        # function?
        self.pb = bzrlib.ui.ui_factory.nested_progress_bar()
        try:
            from_format = self.from_repository._format
            stream = self.get_stream(search, pp)
            missing_keys = self.sink.insert_stream(stream, from_format)
            if missing_keys:
                stream = self.get_stream_for_missing_keys(missing_keys)
                missing_keys = self.sink.insert_stream(stream, from_format)
            if missing_keys:
                raise AssertionError(
                    "second push failed to complete a fetch %r." % (
                        missing_keys,))
            self.sink.finished()
        finally:
            if self.pb is not None:
                self.pb.finished()

    def get_stream(self, search, pp):
        phase = 'file'
        revs = search.get_keys()
        graph = self.from_repository.get_graph()
        revs = list(graph.iter_topo_order(revs))
        data_to_fetch = self.from_repository.item_keys_introduced_by(
            revs, self.pb)
        text_keys = []
        for knit_kind, file_id, revisions in data_to_fetch:
            if knit_kind != phase:
                phase = knit_kind
                # Make a new progress bar for this phase
                self.pb.finished()
                pp.next_phase()
                self.pb = bzrlib.ui.ui_factory.nested_progress_bar()
            if knit_kind == "file":
                # Accumulate file texts
                text_keys.extend([(file_id, revision) for revision in
                    revisions])
            elif knit_kind == "inventory":
                # Now copy the file texts.
                to_texts = self.to_repository.texts
                from_texts = self.from_repository.texts
                yield ('texts', from_texts.get_record_stream(
                    text_keys, self.to_repository._fetch_order,
                    not self.to_repository._fetch_uses_deltas))
                # Cause an error if a text occurs after we have done the
                # copy.
                text_keys = None
                # Before we process the inventory we generate the root
                # texts (if necessary) so that the inventories' references
                # will be valid.
                for _ in self._generate_root_texts(revs):
                    yield _
                # NB: This currently reopens the inventory weave in source;
                # using a single stream interface instead would avoid this.
                self.pb.update("fetch inventory", 0, 1)
                # we fetch only the referenced inventories because we do not
                # know for unselected inventories whether all their required
                # texts are present in the other repository - it could be
                # corrupt.
                for info in self._get_inventory_stream(revs):
                    yield info
            elif knit_kind == "signatures":
                # Nothing to do here; this will be taken care of when
                # _fetch_revision_texts happens.
                pass
            elif knit_kind == "revisions":
                for _ in self._fetch_revision_texts(revs, self.pb):
                    yield _
            else:
                raise AssertionError("Unknown knit kind %r" % knit_kind)
        self.count_copied += len(revs)

    def get_stream_for_missing_keys(self, missing_keys):
        # missing keys can only occur when we are byte copying and not
        # translating (because translation means we don't send
        # unreconstructable deltas ever).
        keys = {}
        keys['texts'] = set()
        keys['revisions'] = set()
        keys['inventories'] = set()
        keys['signatures'] = set()
        for key in missing_keys:
            keys[key[0]].add(key[1:])
        if len(keys['revisions']):
            # If we allowed copying revisions at this point, we could end up
            # copying a revision without copying its required texts: a
            # violation of the requirements for repository integrity.
            raise AssertionError(
                'cannot copy revisions to fill in missing deltas %s' % (
                    keys['revisions'],))
        for substream_kind, keys in keys.iteritems():
            vf = getattr(self.from_repository, substream_kind)
            # Ask for full texts always so that we don't need more round trips
            # after this stream.
            stream = vf.get_record_stream(keys,
                self.to_repository._fetch_order, True)
            yield substream_kind, stream

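The bucketing in `get_stream_for_missing_keys` above groups composite keys by their first element (the substream kind) and keeps the remainder as the per-kind key. Illustrated with made-up key tuples:

```python
# Illustrative key tuples: ('texts', file_id, revision), ('inventories', rev)
missing_keys = [
    ("texts", "file-1", "rev-1"),
    ("texts", "file-2", "rev-2"),
    ("inventories", "rev-1"),
]
keys = {"texts": set(), "revisions": set(),
        "inventories": set(), "signatures": set()}
for key in missing_keys:
    # key[0] names the substream; the rest identifies the record
    keys[key[0]].add(key[1:])
assert keys["texts"] == {("file-1", "rev-1"), ("file-2", "rev-2")}
assert keys["inventories"] == {("rev-1",)}
assert not keys["revisions"]  # revisions must never appear here
```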
    def _revids_to_fetch(self):
        """Determines the exact revisions needed from self.from_repository to
        install self._last_revision in self.to_repository.

        If no revisions need to be fetched, then this just returns None.
        """
        mutter('fetch up to rev {%s}', self._last_revision)
        if self._last_revision is NULL_REVISION:
            # explicit limit of no revisions needed
            return None
        if (self._last_revision is not None and
            self.to_repository.has_revision(self._last_revision)):
            return None
        try:
            return self.to_repository.search_missing_revision_ids(
                self.from_repository, self._last_revision,
                find_ghosts=self.find_ghosts)
        except errors.NoSuchRevision, e:
            raise InstallFailed([self._last_revision])

    def _get_inventory_stream(self, revision_ids):
        if (self.from_repository._format.supports_chks and
            self.to_repository._format.supports_chks
            and (self.from_repository._format._serializer
                 == self.to_repository._format._serializer)):
            # Both sides support chks, and they use the same serializer, so it
            # is safe to transmit the chk pages and inventory pages across
            # as-is.
            return self._get_chk_inventory_stream(revision_ids)
        elif (not self.from_repository._format.supports_chks):
            # Source repository doesn't support chks. So we can transmit the
            # inventories 'as-is' and either they are just accepted on the
            # target, or the Sink will properly convert it.
            return self._get_simple_inventory_stream(revision_ids)
        else:
            # XXX: Hack to make not-chk->chk fetch: copy the inventories as
            #      inventories. Note that this should probably be done somehow
            #      as part of bzrlib.repository.StreamSink. Except JAM couldn't
            #      figure out how a non-chk repository could possibly handle
            #      deserializing an inventory stream from a chk repo, as it
            #      doesn't have a way to understand individual pages.
            return self._get_convertable_inventory_stream(revision_ids)

    def _get_simple_inventory_stream(self, revision_ids):
        from_weave = self.from_repository.inventories
        yield ('inventories', from_weave.get_record_stream(
            [(rev_id,) for rev_id in revision_ids],
            self.inventory_fetch_order(),
            not self.delta_on_metadata()))

    def _get_chk_inventory_stream(self, revision_ids):
        """Fetch the inventory texts, along with the associated chk maps."""
        from bzrlib import inventory, chk_map
        # We want an inventory outside of the search set, so that we can filter
        # out uninteresting chk pages. For now we use
        # _find_revision_outside_set, but if we had a Search with cut_revs, we
        # could use that instead.
        start_rev_id = self.from_repository._find_revision_outside_set(
                            revision_ids)
        start_rev_key = (start_rev_id,)
        inv_keys_to_fetch = [(rev_id,) for rev_id in revision_ids]
        if start_rev_id != NULL_REVISION:
            inv_keys_to_fetch.append((start_rev_id,))
        # Any repo that supports chk_bytes must also support out-of-order
        # insertion. At least, that is how we expect it to work.
        # We use get_record_stream instead of iter_inventories because we want
        # to be able to insert the stream as well. We could instead fetch
        # allowing deltas, and then iter_inventories, but we don't know whether
        # source or target is more 'local' anyway.
        inv_stream = self.from_repository.inventories.get_record_stream(
            inv_keys_to_fetch, 'unordered',
            True) # We need them as full-texts so we can find their references
        uninteresting_chk_roots = set()
        interesting_chk_roots = set()
        def filter_inv_stream(inv_stream):
            for idx, record in enumerate(inv_stream):
                ### child_pb.update('fetch inv', idx, len(inv_keys_to_fetch))
                bytes = record.get_bytes_as('fulltext')
                chk_inv = inventory.CHKInventory.deserialise(
                    self.from_repository.chk_bytes, bytes, record.key)
                if record.key == start_rev_key:
                    uninteresting_chk_roots.add(chk_inv.id_to_entry.key())
                    p_id_map = chk_inv.parent_id_basename_to_file_id
                    if p_id_map is not None:
                        uninteresting_chk_roots.add(p_id_map.key())
                else:
                    yield record
                    interesting_chk_roots.add(chk_inv.id_to_entry.key())
                    p_id_map = chk_inv.parent_id_basename_to_file_id
                    if p_id_map is not None:
                        interesting_chk_roots.add(p_id_map.key())
        ### pb.update('fetch inventory', 0, 2)
        yield ('inventories', filter_inv_stream(inv_stream))
        # Now that we have worked out all of the interesting root nodes, grab
        # all of the interesting pages and insert them
        ### pb.update('fetch inventory', 1, 2)
        interesting = chk_map.iter_interesting_nodes(
            self.from_repository.chk_bytes, interesting_chk_roots,
            uninteresting_chk_roots)
        def to_stream_adapter():
            """Adapt the iter_interesting_nodes result to a single stream.

            iter_interesting_nodes returns records as it processes them, which
            can be in batches. But we only want a single stream to be inserted.
            """
            for record, items in interesting:
                for value in record.itervalues():
                    yield value
        # XXX: We could instead call get_record_stream(records.keys())
        #      ATM, this will always insert the records as fulltexts, and
        #      requires that you can hang on to records once you have gone
        #      on to the next one. Further, it causes the target to
        #      recompress the data. Testing shows it to be faster than
        #      requesting the records again, though.
        yield ('chk_bytes', to_stream_adapter())
        ### pb.update('fetch inventory', 2, 2)

    def _get_convertable_inventory_stream(self, revision_ids):
        # XXX: One of source or target is using chks, and they don't have
        #      compatible serializations. The StreamSink code expects to be
        #      able to convert on the target, so we need to put
        #      bytes-on-the-wire that can be converted.
        yield ('inventories', self._stream_invs_as_fulltexts(revision_ids))

    def _stream_invs_as_fulltexts(self, revision_ids):
        from_serializer = self.from_repository._format._serializer
        revision_keys = [(rev_id,) for rev_id in revision_ids]
        parent_map = self.from_repository.inventory.get_parent_map(revision_keys)
        for inv in self.from_repository.iter_inventories(revision_ids):
            # XXX: This is a bit hackish, but it works. Basically,
            #      CHKSerializer 'accidentally' supports
            #      read/write_inventory_to_string, even though that is never
            #      the format that is stored on disk. It *does* give us a
            #      single string representation for an inventory, so live with
            #      it for now.
            #      This would be far better if we had a 'serialized inventory
            #      delta' form. Then we could use 'inventory._make_delta', and
            #      transmit that. This would both be faster to generate, and
            #      result in fewer bytes-on-the-wire.
            as_bytes = from_serializer.write_inventory_to_string(inv)
            key = (inv.revision_id,)
            parent_keys = parent_map.get(key, ())
            yield FulltextContentFactory(key, parent_keys, None, as_bytes)

    def _fetch_revision_texts(self, revs, pb):
        # fetch signatures first and then the revision texts
        # may need to be an InterRevisionStore call here.
        from_sf = self.from_repository.signatures
        # A missing signature is just skipped.
        keys = [(rev_id,) for rev_id in revs]
        signatures = filter_absent(from_sf.get_record_stream(
            keys,
            self.to_repository._fetch_order,
            not self.to_repository._fetch_uses_deltas))
        # If a revision has a delta, this is actually expanded inside the
        # insert_record_stream code now, which is an alternate fix for
        # bug #261339
        from_rf = self.from_repository.revisions
        revisions = from_rf.get_record_stream(
            keys,
            self.to_repository._fetch_order,
            not self.delta_on_metadata())
        return [('signatures', signatures), ('revisions', revisions)]

    def _generate_root_texts(self, revs):
        """This will be called by __fetch between fetching weave texts and
        fetching the inventory weave.

        Subclasses should override this if they need to generate root texts
        after fetching weave texts.
        """
        return []

    def inventory_fetch_order(self):
        return self.to_repository._fetch_order

    def delta_on_metadata(self):
        src_serializer = self.from_repository._format._serializer
        target_serializer = self.to_repository._format._serializer
        return (self.to_repository._fetch_uses_deltas and
            src_serializer == target_serializer)


class Inter1and2Helper(object):
    """Helper for operations that convert data between model 1 and 2.

    This is for use by fetchers and converters.
    """

    def __init__(self, source):
        """Constructor.

        :param source: The repository the data comes from
        """
        self.source = source

    def iter_rev_trees(self, revs):
        """Iterate through RevisionTrees efficiently.

        Additionally, the inventory's revision_id is set if unset.

        Trees are retrieved in batches of 100, and then yielded in the order
        they were requested.

        :param revs: A list of revision ids
        """
        # In case revs is not a list.
        revs = list(revs)
        while revs:
            for tree in self.source.revision_trees(revs[:100]):
                if tree.inventory.revision_id is None:
                    tree.inventory.revision_id = tree.get_revision_id()
                yield tree
            revs = revs[100:]

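The loop in iter_rev_trees is a general batching idiom: consume a list in fixed-size slices so the backend fetch stays bounded, while still yielding results one at a time. A minimal standalone sketch (the batch size and names are illustrative; the real "fetch" is self.source.revision_trees, stood in for here by the slice itself):

```python
def iter_in_batches(items, batch_size=100):
    """Yield items one at a time, fetching them in batches of batch_size.

    Matches the shape of iter_rev_trees above: slice off a batch, yield
    each element of the batch, then advance past it.
    """
    items = list(items)  # in case a generator was passed in
    while items:
        batch = items[:batch_size]  # stand-in for a bulk fetch call
        for item in batch:
            yield item
        items = items[batch_size:]

collected = list(iter_in_batches(range(250), batch_size=100))
```

The caller still sees a plain iterator in request order; only the cost profile changes, since at most one batch is materialised at a time.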
    def _find_root_ids(self, revs, parent_map, graph):
        revision_root = {}
        planned_versions = {}
        for tree in self.iter_rev_trees(revs):
            revision_id = tree.inventory.root.revision
            root_id = tree.get_root_id()
            planned_versions.setdefault(root_id, []).append(revision_id)
            revision_root[revision_id] = root_id
        # Find out which parents we don't already know root ids for
        parents = set()
        for revision_parents in parent_map.itervalues():
            parents.update(revision_parents)
        parents.difference_update(revision_root.keys() + [NULL_REVISION])
        # Limit to revisions present in the versionedfile
        parents = graph.get_parent_map(parents).keys()
        for tree in self.iter_rev_trees(parents):
            root_id = tree.get_root_id()
            revision_root[tree.get_revision_id()] = root_id
        return revision_root, planned_versions

    def generate_root_texts(self, revs):
        """Generate VersionedFiles for all root ids.

        :param revs: the revisions to include
        """
        graph = self.source.get_graph()
        parent_map = graph.get_parent_map(revs)
        rev_order = topo_sort(parent_map)
        rev_id_to_root_id, root_id_to_rev_ids = self._find_root_ids(
            revs, parent_map, graph)
        root_id_order = [(rev_id_to_root_id[rev_id], rev_id) for rev_id in
            rev_order]
        # Guaranteed stable, this groups all the file id operations together
        # retaining topological order within the revisions of a file id.
        # File id splits and joins would invalidate this, but they don't exist
        # yet, and are unlikely to in non-rich-root environments anyway.
        root_id_order.sort(key=operator.itemgetter(0))
        # Create a record stream containing the roots to create.
        def yield_roots():
            for key in root_id_order:
                root_id, rev_id = key
                rev_parents = parent_map[rev_id]
                # We drop revision parents with different file-ids, because
                # that represents a rename of the root to a different location
                # - it's not actually a parent for us. (We could look for that
                # file id in the revision tree at considerably more expense,
                # but for now this is sufficient (and reconcile will catch and
                # correct this anyway).
                # When a parent revision is a ghost, we guess that its root id
                # was unchanged (rather than trimming it from the parent list).
                parent_keys = tuple((root_id, parent) for parent in rev_parents
                    if parent != NULL_REVISION and
                        rev_id_to_root_id.get(parent, root_id) == root_id)
                yield FulltextContentFactory(key, parent_keys, None, '')
        return [('texts', yield_roots())]


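The "Guaranteed stable" comment in generate_root_texts leans on a documented property of Python's sort: sorting (root_id, rev_id) pairs by root_id alone groups all the work for one file id together while preserving, inside each group, the topological order the pairs arrived in. A small illustration (the ids are made up):

```python
import operator

# Topologically ordered (root_id, rev_id) pairs for two file ids.
rev_order = [('root-A', 'r1'), ('root-B', 'r2'),
             ('root-A', 'r3'), ('root-B', 'r4')]

# Sort by root_id only; the sort is stable, so equal keys keep their
# original relative order.
grouped = sorted(rev_order, key=operator.itemgetter(0))
```

After the sort all 'root-A' entries precede 'root-B', and within each group the topological order (r1 before r3, r2 before r4) survives untouched.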
class Model1toKnit2Fetcher(RepoFetcher):
    """Fetch from a Model1 repository into a Knit2 repository."""

    def __init__(self, to_repository, from_repository, last_revision=None,
                 pb=None, find_ghosts=True):
        self.helper = Inter1and2Helper(from_repository)
        RepoFetcher.__init__(self, to_repository, from_repository,
            last_revision, pb, find_ghosts)

    def _generate_root_texts(self, revs):
        return self.helper.generate_root_texts(revs)

    def inventory_fetch_order(self):
        return 'topological'


Knit1to2Fetcher = Model1toKnit2Fetcher