/brz/remove-bazaar

To get this branch, use:
bzr branch http://gegoxaren.bato24.eu/bzr/brz/remove-bazaar

« back to all changes in this revision

Viewing changes to ancestry_test.py

  • Committer: John Arbash Meinel
  • Date: 2009-08-07 03:29:09 UTC
  • mto: This revision was merged to the branch mainline in revision 4613.
  • Revision ID: john@arbash-meinel.com-20090807032909-9xg0nwcqoxi9763y
Removing the min(keys) and max(keys) calls saves 100ms in the inner loop
(get_ancestry() over all of bzr.dev in a single index is 347ms => 245ms).
The current breakdown is roughly:
0.4789      0.1127   bzrlib.btree_index:1129(get_ancestry)
0.0418      0.0418   +<method 'update' of 'set' objects>
0.0480      0.0325   +bzrlib.btree_index:966(_multi_bisect_right)
0.0274      0.0274   +<method 'difference' of 'set' objects>
0.0081      0.0081   +<method 'add' of 'set' objects>
0.0075      0.0075   +<sorted>
0.0048      0.0004   +bzrlib.btree_index:899(_get_internal_nodes)
0.2275      0.0004   +bzrlib.btree_index:917(_get_leaf_nodes)
0.0002      0.0002   +<method 'extend' of 'list' objects>
0.0009      0.0001   +bzrlib.btree_index:1375(key_count)

So we have a bit of just general overhead (112ms), and then
50ms spent in _multi_bisect_right, which we could move to a C extension.
50ms in set.update and 28ms in set.difference
227ms in reading and parsing the 222 nodes from disk.
It seems a little unfortunate that parsing is the primary overhead,
but previous investigation did not reveal much fat that could be trimmed.

It is 3.8MB of uncompressed data that is being parsed. That's got to
take some amount of time. 200ms might be reasonable.
Which would hint that the only way to speed it up would be:
1) a different format
2) don't read the whole thing, stupid :)

Show diffs side-by-side

added added

removed removed

Lines of Context:
 
1
 
 
2
from bzrlib import branch
 
3
 
 
4
def get_gcindex(path):
 
5
    b = branch.Branch.open(path)
 
6
    b.lock_read()
 
7
    r = b.repository
 
8
    rev_id = b.last_revision()
 
9
    rev_key = (rev_id,)
 
10
    gcindex = r.revisions._index
 
11
    return b, rev_key, gcindex
 
12
 
 
13
 
 
14
def get_bindex(path):
 
15
    b = branch.Branch.open(path)
 
16
    b.lock_read()
 
17
    r = b.repository
 
18
    rev_id = b.last_revision()
 
19
    rev_key = (rev_id,)
 
20
    bindex = r.revisions._index._graph_index._indices[0]
 
21
    return b, rev_key, bindex
 
22
 
 
23
 
 
24
def ancestry_from_get_ancestry(path):
 
25
    b, rev_key, bindex = get_bindex(path)
 
26
    keys = set([rev_key])
 
27
    search_keys = set([rev_key])
 
28
    parent_map = {}
 
29
    generation = 0
 
30
    while search_keys:
 
31
        generation += 1
 
32
        missing_keys, search_keys = bindex.get_ancestry(search_keys, 0,
 
33
                                                        parent_map)
 
34
        # print '%4d\t%5d\t%5d' % (generation, len(search_keys),
 
35
        #                          len(parent_map))
 
36
    b.unlock()
 
37
 
 
38
def ancestry_from_get_parent_map(path):
 
39
    b, rev_key, gcindex = get_gcindex(path)
 
40
    search_keys = set([rev_key])
 
41
    parent_map = {}
 
42
    generation = 0
 
43
    while search_keys:
 
44
        next_parent_map = gcindex.get_parent_map(search_keys)
 
45
        next_parent_keys = set()
 
46
        map(next_parent_keys.update, next_parent_map.itervalues())
 
47
        parent_map.update(next_parent_map)
 
48
        next_parent_keys = next_parent_keys.difference(parent_map)
 
49
        generation += 1
 
50
        # print '%4d\t%5d\t%5d' % (generation, len(search_keys),
 
51
        #                          len(parent_map))
 
52
        search_keys = next_parent_keys
 
53
    b.unlock()