/brz/remove-bazaar

To get this branch, use:
bzr branch http://gegoxaren.bato24.eu/bzr/brz/remove-bazaar
4763.2.4 by John Arbash Meinel
merge bzr.2.1 in preparation for NEWS entry.
1
# Copyright (C) 2009, 2010 Canonical Ltd
4371.3.19 by John Arbash Meinel
Get an initial pyrex implementation.
2
#
3
# This program is free software; you can redistribute it and/or modify
4
# it under the terms of the GNU General Public License as published by
5
# the Free Software Foundation; either version 2 of the License, or
6
# (at your option) any later version.
7
#
8
# This program is distributed in the hope that it will be useful,
9
# but WITHOUT ANY WARRANTY; without even the implied warranty of
10
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
11
# GNU General Public License for more details.
12
#
13
# You should have received a copy of the GNU General Public License
14
# along with this program; if not, write to the Free Software
15
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
16
17
"""Implementation of Graph algorithms when we have already loaded everything.
18
"""
19
6656.2.2 by Jelmer Vernooij
Use absolute_import.
20
from __future__ import absolute_import
21
4371.3.19 by John Arbash Meinel
Get an initial pyrex implementation.
22
cdef extern from "python-compat.h":
23
    pass
24
25
cdef extern from "Python.h":
26
    ctypedef int Py_ssize_t
27
    ctypedef struct PyObject:
28
        pass
29
4641.5.5 by John Arbash Meinel
Implement initial gc_sort in pyrex.
30
    int PyString_CheckExact(object)
4641.5.12 by John Arbash Meinel
small cleanup.
31
32
    int PyObject_RichCompareBool(object, object, int)
4641.5.11 by John Arbash Meinel
Using PyObject_RichCompareBool is a significant benefit. 8.8ms down to 7.7ms.
33
    int Py_LT
4641.5.5 by John Arbash Meinel
Implement initial gc_sort in pyrex.
34
4641.5.7 by John Arbash Meinel
A few tweaks to the internals, and we are down to:
35
    int PyTuple_CheckExact(object)
4371.3.31 by John Arbash Meinel
Change _KnownGraph.parents to be a tuple rather than a list.
36
    object PyTuple_New(Py_ssize_t n)
4371.4.23 by John Arbash Meinel
Clean up the imports at the top.
37
    Py_ssize_t PyTuple_GET_SIZE(object t)
38
    PyObject * PyTuple_GET_ITEM(object t, Py_ssize_t o)
4371.3.31 by John Arbash Meinel
Change _KnownGraph.parents to be a tuple rather than a list.
39
    void PyTuple_SET_ITEM(object t, Py_ssize_t o, object v)
4371.4.23 by John Arbash Meinel
Clean up the imports at the top.
40
4641.5.7 by John Arbash Meinel
A few tweaks to the internals, and we are down to:
41
    int PyList_CheckExact(object)
4371.4.23 by John Arbash Meinel
Clean up the imports at the top.
42
    Py_ssize_t PyList_GET_SIZE(object l)
43
    PyObject * PyList_GET_ITEM(object l, Py_ssize_t o)
44
    int PyList_SetItem(object l, Py_ssize_t o, object l) except -1
45
    int PyList_Append(object l, object v) except -1
46
47
    int PyDict_CheckExact(object d)
48
    Py_ssize_t PyDict_Size(object d) except -1
4371.3.28 by John Arbash Meinel
Finish converting away from 'cdef list' syntax.
49
    PyObject * PyDict_GetItem(object d, object k)
4371.4.23 by John Arbash Meinel
Clean up the imports at the top.
50
    int PyDict_SetItem(object d, object k, object v) except -1
4371.4.18 by John Arbash Meinel
Big performance win, back to 650ms.
51
    int PyDict_DelItem(object d, object k) except -1
4371.3.28 by John Arbash Meinel
Finish converting away from 'cdef list' syntax.
52
    int PyDict_Next(object d, Py_ssize_t *pos, PyObject **k, PyObject **v)
4371.4.21 by John Arbash Meinel
Spend some time optimizing the inner heads() loop, approx 2x faster.
53
4371.3.20 by John Arbash Meinel
A few tweaks to the pyrex version.
54
    void Py_INCREF(object)
55
6800.1.3 by Jelmer Vernooij
Avoid modifying possibly lazily imported module.
56
import collections
4593.5.15 by John Arbash Meinel
Use the same implementation of stack handling to avoid append/pop.
57
import gc
4371.3.21 by John Arbash Meinel
Attempt to make things better by splitting out left_parent and right_parent into
58
6656.2.2 by Jelmer Vernooij
Use absolute_import.
59
from . import errors, revision
4371.3.19 by John Arbash Meinel
Get an initial pyrex implementation.
60
4371.4.21 by John Arbash Meinel
Spend some time optimizing the inner heads() loop, approx 2x faster.
61
cdef object NULL_REVISION
62
NULL_REVISION = revision.NULL_REVISION
63
64
4371.3.19 by John Arbash Meinel
Get an initial pyrex implementation.
65
cdef class _KnownGraphNode:
66
    """Represents a single object in the known graph."""
67
4371.3.35 by John Arbash Meinel
Changing the heapq algorithm actually ends up with better overall performance.
68
    cdef object key
4371.3.28 by John Arbash Meinel
Finish converting away from 'cdef list' syntax.
69
    cdef object parents
70
    cdef object children
4371.4.13 by Vincent Ladeuil
Fixed as per John's review.
71
    cdef public long gdfo
4371.4.22 by John Arbash Meinel
Changing from a set of seen to a list to cleanup and a seen attribute.
72
    cdef int seen
4593.5.19 by John Arbash Meinel
Switch from having a MergeSortNode => KnownGraphNode to
73
    cdef object extra
4371.3.19 by John Arbash Meinel
Get an initial pyrex implementation.
74
4371.3.31 by John Arbash Meinel
Change _KnownGraph.parents to be a tuple rather than a list.
75
    def __init__(self, key):
4371.3.19 by John Arbash Meinel
Get an initial pyrex implementation.
76
        self.key = key
4371.3.31 by John Arbash Meinel
Change _KnownGraph.parents to be a tuple rather than a list.
77
        self.parents = None
4371.3.21 by John Arbash Meinel
Attempt to make things better by splitting out left_parent and right_parent into
78
4371.3.19 by John Arbash Meinel
Get an initial pyrex implementation.
79
        self.children = []
80
        # Greatest distance from origin
81
        self.gdfo = -1
4371.4.22 by John Arbash Meinel
Changing from a set of seen to a list to cleanup and a seen attribute.
82
        self.seen = 0
4593.5.19 by John Arbash Meinel
Switch from having a MergeSortNode => KnownGraphNode to
83
        self.extra = None
4371.3.19 by John Arbash Meinel
Get an initial pyrex implementation.
84
85
    property child_keys:
86
        def __get__(self):
87
            cdef _KnownGraphNode child
88
89
            keys = []
90
            for child in self.children:
4371.3.28 by John Arbash Meinel
Finish converting away from 'cdef list' syntax.
91
                PyList_Append(keys, child.key)
4371.3.19 by John Arbash Meinel
Get an initial pyrex implementation.
92
            return keys
93
4648.2.1 by Gary van der Merwe
Add get_parent_keys, and get_child_keys to KnownGraph.
94
    property parent_keys:
95
        def __get__(self):
96
            if self.parents is None:
97
                return None
98
            
99
            cdef _KnownGraphNode parent
100
101
            keys = []
102
            for parent in self.parents:
103
                PyList_Append(keys, parent.key)
104
            return keys
105
    
4371.3.19 by John Arbash Meinel
Get an initial pyrex implementation.
106
    cdef clear_references(self):
107
        self.parents = None
108
        self.children = None
109
110
    def __repr__(self):
4371.3.35 by John Arbash Meinel
Changing the heapq algorithm actually ends up with better overall performance.
111
        cdef _KnownGraphNode node
112
4371.3.19 by John Arbash Meinel
Get an initial pyrex implementation.
113
        parent_keys = []
4371.3.35 by John Arbash Meinel
Changing the heapq algorithm actually ends up with better overall performance.
114
        if self.parents is not None:
115
            for node in self.parents:
116
                parent_keys.append(node.key)
4371.3.19 by John Arbash Meinel
Get an initial pyrex implementation.
117
        child_keys = []
4371.3.35 by John Arbash Meinel
Changing the heapq algorithm actually ends up with better overall performance.
118
        if self.children is not None:
119
            for node in self.children:
120
                child_keys.append(node.key)
4371.4.10 by Vincent Ladeuil
Cleanup.
121
        return '%s(%s  gdfo:%s par:%s child:%s)' % (
4371.3.19 by John Arbash Meinel
Get an initial pyrex implementation.
122
            self.__class__.__name__, self.key, self.gdfo,
4371.4.12 by Vincent Ladeuil
Fix typos.
123
            parent_keys, child_keys)
4371.4.10 by Vincent Ladeuil
Cleanup.
124
4371.3.37 by John Arbash Meinel
cleanup passes.
125
4634.117.8 by John Arbash Meinel
Use a class regex to find other signatures to exclude.
126
cdef _KnownGraphNode _get_list_node(lst, Py_ssize_t pos):
4371.4.14 by John Arbash Meinel
Optimize _find_gdfo and make known_parent_gdfos a member of KnownGraphNode
127
    cdef PyObject *temp_node
128
129
    temp_node = PyList_GET_ITEM(lst, pos)
130
    return <_KnownGraphNode>temp_node
131
132
4634.117.8 by John Arbash Meinel
Use a class regex to find other signatures to exclude.
133
cdef _KnownGraphNode _get_tuple_node(tpl, Py_ssize_t pos):
4371.4.14 by John Arbash Meinel
Optimize _find_gdfo and make known_parent_gdfos a member of KnownGraphNode
134
    cdef PyObject *temp_node
135
4641.5.7 by John Arbash Meinel
A few tweaks to the internals, and we are down to:
136
    temp_node = PyTuple_GET_ITEM(tpl, pos)
4371.4.14 by John Arbash Meinel
Optimize _find_gdfo and make known_parent_gdfos a member of KnownGraphNode
137
    return <_KnownGraphNode>temp_node
138
139
4641.5.5 by John Arbash Meinel
Implement initial gc_sort in pyrex.
140
def get_key(node):
141
    cdef _KnownGraphNode real_node
4641.5.7 by John Arbash Meinel
A few tweaks to the internals, and we are down to:
142
    real_node = node
4641.5.5 by John Arbash Meinel
Implement initial gc_sort in pyrex.
143
    return real_node.key
144
145
4641.5.7 by John Arbash Meinel
A few tweaks to the internals, and we are down to:
146
cdef object _sort_list_nodes(object lst_or_tpl, int reverse):
147
    """Sort a list of _KnownGraphNode objects.
148
149
    If lst_or_tpl is a list, it is allowed to mutate in place. It may also
150
    just return the input list if everything is already sorted.
151
    """
4641.5.5 by John Arbash Meinel
Implement initial gc_sort in pyrex.
152
    cdef _KnownGraphNode node1, node2
4641.5.9 by John Arbash Meinel
Explore just walking the parents directly, without sorting.
153
    cdef int do_swap, is_tuple
4641.5.7 by John Arbash Meinel
A few tweaks to the internals, and we are down to:
154
    cdef Py_ssize_t length
4641.5.5 by John Arbash Meinel
Implement initial gc_sort in pyrex.
155
4641.5.9 by John Arbash Meinel
Explore just walking the parents directly, without sorting.
156
    is_tuple = PyTuple_CheckExact(lst_or_tpl)
157
    if not (is_tuple or PyList_CheckExact(lst_or_tpl)):
4641.5.7 by John Arbash Meinel
A few tweaks to the internals, and we are down to:
158
        raise TypeError('lst_or_tpl must be a list or tuple.')
159
    length = len(lst_or_tpl)
160
    if length == 0 or length == 1:
161
        return lst_or_tpl
162
    if length == 2:
4641.5.9 by John Arbash Meinel
Explore just walking the parents directly, without sorting.
163
        if is_tuple:
4641.5.7 by John Arbash Meinel
A few tweaks to the internals, and we are down to:
164
            node1 = _get_tuple_node(lst_or_tpl, 0)
165
            node2 = _get_tuple_node(lst_or_tpl, 1)
166
        else:
167
            node1 = _get_list_node(lst_or_tpl, 0)
168
            node2 = _get_list_node(lst_or_tpl, 1)
4641.5.5 by John Arbash Meinel
Implement initial gc_sort in pyrex.
169
        if reverse:
4641.5.11 by John Arbash Meinel
Using PyObject_RichCompareBool is a significant benefit. 8.8ms down to 7.7ms.
170
            do_swap = PyObject_RichCompareBool(node1.key, node2.key, Py_LT)
4641.5.5 by John Arbash Meinel
Implement initial gc_sort in pyrex.
171
        else:
4641.5.11 by John Arbash Meinel
Using PyObject_RichCompareBool is a significant benefit. 8.8ms down to 7.7ms.
172
            do_swap = PyObject_RichCompareBool(node2.key, node1.key, Py_LT)
4641.5.7 by John Arbash Meinel
A few tweaks to the internals, and we are down to:
173
        if not do_swap:
174
            return lst_or_tpl
4641.5.9 by John Arbash Meinel
Explore just walking the parents directly, without sorting.
175
        if is_tuple:
4641.5.7 by John Arbash Meinel
A few tweaks to the internals, and we are down to:
176
            return (node2, node1)
177
        else:
4641.5.9 by John Arbash Meinel
Explore just walking the parents directly, without sorting.
178
            # Swap 'in-place', since lists are mutable
4641.5.5 by John Arbash Meinel
Implement initial gc_sort in pyrex.
179
            Py_INCREF(node1)
4641.5.7 by John Arbash Meinel
A few tweaks to the internals, and we are down to:
180
            PyList_SetItem(lst_or_tpl, 1, node1)
4641.5.5 by John Arbash Meinel
Implement initial gc_sort in pyrex.
181
            Py_INCREF(node2)
4641.5.7 by John Arbash Meinel
A few tweaks to the internals, and we are down to:
182
            PyList_SetItem(lst_or_tpl, 0, node2)
183
            return lst_or_tpl
4641.5.5 by John Arbash Meinel
Implement initial gc_sort in pyrex.
184
    # For all other sizes, we just use 'sorted()'
4641.5.9 by John Arbash Meinel
Explore just walking the parents directly, without sorting.
185
    if is_tuple:
4641.5.7 by John Arbash Meinel
A few tweaks to the internals, and we are down to:
186
        # Note that sorted() is just list(iterable).sort()
187
        lst_or_tpl = list(lst_or_tpl)
188
    lst_or_tpl.sort(key=get_key, reverse=reverse)
189
    return lst_or_tpl
4641.5.5 by John Arbash Meinel
Implement initial gc_sort in pyrex.
190
191
4593.5.8 by John Arbash Meinel
Approximate implementation in Pyrex.
192
cdef class _MergeSorter
193
4371.3.19 by John Arbash Meinel
Get an initial pyrex implementation.
194
cdef class KnownGraph:
195
    """This is a class which assumes we already know the full graph."""
196
197
    cdef public object _nodes
4838.1.2 by John Arbash Meinel
Implement KnownGraph.add_node() for the pyrex version.
198
    cdef public object _known_heads
4371.3.19 by John Arbash Meinel
Get an initial pyrex implementation.
199
    cdef public int do_cache
200
201
    def __init__(self, parent_map, do_cache=True):
202
        """Create a new KnownGraph instance.
203
204
        :param parent_map: A dictionary mapping key => parent_keys
205
        """
4371.4.29 by John Arbash Meinel
Pre-allocating self._nodes turned out to be slower.
206
        # tests at pre-allocating the node dict actually slowed things down
4371.3.19 by John Arbash Meinel
Get an initial pyrex implementation.
207
        self._nodes = {}
208
        # Maps {sorted(revision_id, revision_id): heads}
209
        self._known_heads = {}
210
        self.do_cache = int(do_cache)
4593.5.29 by John Arbash Meinel
Remove the gc enable/disable code.
211
        # TODO: consider disabling gc since we are allocating a lot of nodes
212
        #       that won't be collectable anyway. real world testing has not
213
        #       shown a specific impact, yet.
4371.3.19 by John Arbash Meinel
Get an initial pyrex implementation.
214
        self._initialize_nodes(parent_map)
215
        self._find_gdfo()
216
217
    def __dealloc__(self):
218
        cdef _KnownGraphNode child
4371.3.28 by John Arbash Meinel
Finish converting away from 'cdef list' syntax.
219
        cdef Py_ssize_t pos
220
        cdef PyObject *temp_node
4371.3.19 by John Arbash Meinel
Get an initial pyrex implementation.
221
4371.3.28 by John Arbash Meinel
Finish converting away from 'cdef list' syntax.
222
        while PyDict_Next(self._nodes, &pos, NULL, &temp_node):
223
            child = <_KnownGraphNode>temp_node
4371.3.19 by John Arbash Meinel
Get an initial pyrex implementation.
224
            child.clear_references()
225
4634.117.8 by John Arbash Meinel
Use a class regex to find other signatures to exclude.
226
    cdef _KnownGraphNode _get_or_create_node(self, key):
4371.3.31 by John Arbash Meinel
Change _KnownGraph.parents to be a tuple rather than a list.
227
        cdef PyObject *temp_node
228
        cdef _KnownGraphNode node
229
230
        temp_node = PyDict_GetItem(self._nodes, key)
231
        if temp_node == NULL:
232
            node = _KnownGraphNode(key)
233
            PyDict_SetItem(self._nodes, key, node)
234
        else:
235
            node = <_KnownGraphNode>temp_node
236
        return node
237
4838.1.2 by John Arbash Meinel
Implement KnownGraph.add_node() for the pyrex version.
238
    cdef _populate_parents(self, _KnownGraphNode node, parent_keys):
239
        cdef Py_ssize_t num_parent_keys, pos
240
        cdef _KnownGraphNode parent_node
241
242
        num_parent_keys = len(parent_keys)
243
        # We know how many parents, so we pre allocate the tuple
244
        parent_nodes = PyTuple_New(num_parent_keys)
245
        for pos from 0 <= pos < num_parent_keys:
246
            # Note: it costs us 10ms out of 40ms to lookup all of these
247
            #       parents, it doesn't seem to be an allocation overhead,
248
            #       but rather a lookup overhead. There doesn't seem to be
249
            #       a way around it, and that is one reason why
250
            #       KnownGraphNode maintains a direct pointer to the parent
251
            #       node.
252
            # We use [] because parent_keys may be a tuple or list
253
            parent_node = self._get_or_create_node(parent_keys[pos])
254
            # PyTuple_SET_ITEM will steal a reference, so INCREF first
255
            Py_INCREF(parent_node)
256
            PyTuple_SET_ITEM(parent_nodes, pos, parent_node)
257
            PyList_Append(parent_node.children, node)
258
        node.parents = parent_nodes
259
4371.3.19 by John Arbash Meinel
Get an initial pyrex implementation.
260
    def _initialize_nodes(self, parent_map):
261
        """Populate self._nodes.
262
4371.4.9 by Vincent Ladeuil
Simplify gdfo computing by finding tails when at graph build time.
263
        After this has finished:
264
        - self._nodes will have an entry for every entry in parent_map.
265
        - ghosts will have a parent_keys = None,
4371.4.12 by Vincent Ladeuil
Fix typos.
266
        - all nodes found will also have child_keys populated with all known
267
          child keys,
4371.3.19 by John Arbash Meinel
Get an initial pyrex implementation.
268
        """
4371.3.28 by John Arbash Meinel
Finish converting away from 'cdef list' syntax.
269
        cdef PyObject *temp_key, *temp_parent_keys, *temp_node
4838.1.2 by John Arbash Meinel
Implement KnownGraph.add_node() for the pyrex version.
270
        cdef Py_ssize_t pos
4371.3.19 by John Arbash Meinel
Get an initial pyrex implementation.
271
        cdef _KnownGraphNode node
272
        cdef _KnownGraphNode parent_node
273
4371.3.28 by John Arbash Meinel
Finish converting away from 'cdef list' syntax.
274
        if not PyDict_CheckExact(parent_map):
275
            raise TypeError('parent_map should be a dict of {key:parent_keys}')
276
        # for key, parent_keys in parent_map.iteritems():
277
        pos = 0
278
        while PyDict_Next(parent_map, &pos, &temp_key, &temp_parent_keys):
279
            key = <object>temp_key
280
            parent_keys = <object>temp_parent_keys
4371.4.28 by John Arbash Meinel
We spent a lot of time adding and removing keys from tails.
281
            node = self._get_or_create_node(key)
4838.1.2 by John Arbash Meinel
Implement KnownGraph.add_node() for the pyrex version.
282
            self._populate_parents(node, parent_keys)
4371.4.30 by John Arbash Meinel
We don't need self._tails anymore, nor does it need to be a set.
283
284
    def _find_tails(self):
285
        cdef PyObject *temp_node
286
        cdef _KnownGraphNode node
287
        cdef Py_ssize_t pos
288
289
        tails = []
4371.4.28 by John Arbash Meinel
We spent a lot of time adding and removing keys from tails.
290
        pos = 0
291
        while PyDict_Next(self._nodes, &pos, NULL, &temp_node):
292
            node = <_KnownGraphNode>temp_node
293
            if node.parents is None or PyTuple_GET_SIZE(node.parents) == 0:
4371.4.30 by John Arbash Meinel
We don't need self._tails anymore, nor does it need to be a set.
294
                node.gdfo = 1
295
                PyList_Append(tails, node)
296
        return tails
4371.3.19 by John Arbash Meinel
Get an initial pyrex implementation.
297
4641.5.5 by John Arbash Meinel
Implement initial gc_sort in pyrex.
298
    def _find_tips(self):
299
        cdef PyObject *temp_node
300
        cdef _KnownGraphNode node
301
        cdef Py_ssize_t pos
302
303
        tips = []
304
        pos = 0
305
        while PyDict_Next(self._nodes, &pos, NULL, &temp_node):
306
            node = <_KnownGraphNode>temp_node
307
            if PyList_GET_SIZE(node.children) == 0:
308
                PyList_Append(tips, node)
309
        return tips
310
4371.3.19 by John Arbash Meinel
Get an initial pyrex implementation.
311
    def _find_gdfo(self):
4371.4.8 by Vincent Ladeuil
Replace the existing KnownGraph._find_gdfo.
312
        cdef _KnownGraphNode node
313
        cdef _KnownGraphNode child
4371.4.14 by John Arbash Meinel
Optimize _find_gdfo and make known_parent_gdfos a member of KnownGraphNode
314
        cdef PyObject *temp
4371.4.30 by John Arbash Meinel
We don't need self._tails anymore, nor does it need to be a set.
315
        cdef Py_ssize_t pos
4371.4.14 by John Arbash Meinel
Optimize _find_gdfo and make known_parent_gdfos a member of KnownGraphNode
316
        cdef int replace
4371.4.26 by John Arbash Meinel
Don't shrink the pending list.
317
        cdef Py_ssize_t last_item
4371.4.25 by John Arbash Meinel
We had an 'child_node.gdfo is None' check in the inner loop.
318
        cdef long next_gdfo
4371.4.8 by Vincent Ladeuil
Replace the existing KnownGraph._find_gdfo.
319
4371.4.30 by John Arbash Meinel
We don't need self._tails anymore, nor does it need to be a set.
320
        pending = self._find_tails()
4371.4.8 by Vincent Ladeuil
Replace the existing KnownGraph._find_gdfo.
321
4371.4.26 by John Arbash Meinel
Don't shrink the pending list.
322
        last_item = PyList_GET_SIZE(pending) - 1
323
        while last_item >= 0:
4371.4.14 by John Arbash Meinel
Optimize _find_gdfo and make known_parent_gdfos a member of KnownGraphNode
324
            # Avoid pop followed by push, instead, peek, and replace
4371.4.16 by John Arbash Meinel
Document some perf issues with using known_parent_gdfos, but leave it as a dict.
325
            # timing shows this is 930ms => 770ms for OOo
4371.4.26 by John Arbash Meinel
Don't shrink the pending list.
326
            node = _get_list_node(pending, last_item)
327
            last_item = last_item - 1
4371.4.25 by John Arbash Meinel
We had an 'child_node.gdfo is None' check in the inner loop.
328
            next_gdfo = node.gdfo + 1
4371.4.30 by John Arbash Meinel
We don't need self._tails anymore, nor does it need to be a set.
329
            for pos from 0 <= pos < PyList_GET_SIZE(node.children):
330
                child = _get_list_node(node.children, pos)
4371.4.25 by John Arbash Meinel
We had an 'child_node.gdfo is None' check in the inner loop.
331
                if next_gdfo > child.gdfo:
332
                    child.gdfo = next_gdfo
4371.4.27 by John Arbash Meinel
Re-use the child.seen attribute for tracking num parents.
333
                child.seen = child.seen + 1
334
                if child.seen == PyTuple_GET_SIZE(child.parents):
4371.4.17 by John Arbash Meinel
tried a different tack, to use a new list and append.
335
                    # This child is populated, queue it to be walked
4371.4.26 by John Arbash Meinel
Don't shrink the pending list.
336
                    last_item = last_item + 1
337
                    if last_item < PyList_GET_SIZE(pending):
4371.4.14 by John Arbash Meinel
Optimize _find_gdfo and make known_parent_gdfos a member of KnownGraphNode
338
                        Py_INCREF(child) # SetItem steals a ref
4371.4.26 by John Arbash Meinel
Don't shrink the pending list.
339
                        PyList_SetItem(pending, last_item, child)
4371.4.14 by John Arbash Meinel
Optimize _find_gdfo and make known_parent_gdfos a member of KnownGraphNode
340
                    else:
341
                        PyList_Append(pending, child)
4371.4.27 by John Arbash Meinel
Re-use the child.seen attribute for tracking num parents.
342
                    # We have queued this node, we don't need to track it
343
                    # anymore
344
                    child.seen = 0
4371.4.8 by Vincent Ladeuil
Replace the existing KnownGraph._find_gdfo.
345
4838.1.2 by John Arbash Meinel
Implement KnownGraph.add_node() for the pyrex version.
346
    def add_node(self, key, parent_keys):
347
        """Add a new node to the graph.
348
349
        If this fills in a ghost, then the gdfos of all children will be
350
        updated accordingly.
351
        
352
        :param key: The node being added. If this is a duplicate, this is a
353
            no-op.
354
        :param parent_keys: The parents of the given node.
355
        :return: None (should we return if this was a ghost, etc?)
356
        """
357
        cdef PyObject *maybe_node
358
        cdef _KnownGraphNode node, parent_node, child_node
359
        cdef long parent_gdfo, next_gdfo
360
361
        maybe_node = PyDict_GetItem(self._nodes, key)
362
        if maybe_node != NULL:
363
            node = <_KnownGraphNode>maybe_node
364
            if node.parents is None:
365
                # We are filling in a ghost
366
                self._populate_parents(node, parent_keys)
367
                # We can't trust cached heads anymore
368
                self._known_heads.clear()
369
            else: # Ensure that the parent_key list matches
370
                existing_parent_keys = []
371
                for parent_node in node.parents:
372
                    existing_parent_keys.append(parent_node.key)
373
                # Make sure we use a list for the comparison, in case it was a
374
                # tuple, etc
375
                parent_keys = list(parent_keys)
376
                if existing_parent_keys == parent_keys:
377
                    # Exact match, nothing more to do
378
                    return
379
                else:
380
                    raise ValueError('Parent key mismatch, existing node %s'
381
                        ' has parents of %s not %s'
382
                        % (key, existing_parent_keys, parent_keys))
383
        else:
384
            node = _KnownGraphNode(key)
385
            PyDict_SetItem(self._nodes, key, node)
386
            self._populate_parents(node, parent_keys)
387
        parent_gdfo = 0
388
        for parent_node in node.parents:
389
            if parent_node.gdfo == -1:
390
                # This is a newly introduced ghost, so it gets gdfo of 1
391
                parent_node.gdfo = 1
392
            if parent_gdfo < parent_node.gdfo:
393
                parent_gdfo = parent_node.gdfo
394
        node.gdfo = parent_gdfo + 1
395
        # Now fill the gdfo to all children
396
        # Note that this loop is slightly inefficient, in that we may visit the
397
        # same child (and its decendents) more than once, however, it is
398
        # 'efficient' in that we only walk to nodes that would be updated,
399
        # rather than all nodes
400
        # We use a deque rather than a simple list stack, to go for BFD rather
401
        # than DFD. So that if a longer path is possible, we walk it before we
402
        # get to the final child
6800.1.3 by Jelmer Vernooij
Avoid modifying possibly lazily imported module.
403
        pending = collections.deque([node])
4838.1.2 by John Arbash Meinel
Implement KnownGraph.add_node() for the pyrex version.
404
        pending_popleft = pending.popleft
405
        pending_append = pending.append
406
        while pending:
407
            node = pending_popleft()
408
            next_gdfo = node.gdfo + 1
409
            for child_node in node.children:
410
                if child_node.gdfo < next_gdfo:
411
                    # This child is being updated, we need to check its
412
                    # children
413
                    child_node.gdfo = next_gdfo
414
                    pending_append(child_node)
415
4371.3.19 by John Arbash Meinel
Get an initial pyrex implementation.
416
    def heads(self, keys):
417
        """Return the heads from amongst keys.
418
419
        This is done by searching the ancestries of each key.  Any key that is
420
        reachable from another key is not returned; all the others are.
421
4371.4.7 by Vincent Ladeuil
Simple Pyrex version for the new heads() method.
422
        This operation scales with the relative depth between any two keys. It
423
        uses gdfo to avoid walking all ancestry.
424
425
        :param keys: An iterable of keys.
426
        :return: A set of the heads. Note that as a set there is no ordering
427
            information. Callers will need to filter their input to create
428
            order if they need it.
429
        """
430
        cdef PyObject *maybe_node
431
        cdef PyObject *maybe_heads
432
        cdef PyObject *temp_node
433
        cdef _KnownGraphNode node
4371.4.31 by John Arbash Meinel
Apply the same 'never pop' logic to the heads() code.
434
        cdef Py_ssize_t pos, last_item
4371.4.21 by John Arbash Meinel
Spend some time optimizing the inner heads() loop, approx 2x faster.
435
        cdef long min_gdfo
4371.4.7 by Vincent Ladeuil
Simple Pyrex version for the new heads() method.
436
4526.10.1 by John Arbash Meinel
Avoid using PyFrozenSet_New
437
        heads_key = frozenset(keys)
4371.4.7 by Vincent Ladeuil
Simple Pyrex version for the new heads() method.
438
        maybe_heads = PyDict_GetItem(self._known_heads, heads_key)
439
        if maybe_heads != NULL:
440
            return <object>maybe_heads
441
        # Not cached, compute it ourselves
442
        candidate_nodes = {}
443
        for key in keys:
4371.4.28 by John Arbash Meinel
We spent a lot of time adding and removing keys from tails.
444
            maybe_node = PyDict_GetItem(self._nodes, key)
4371.4.7 by Vincent Ladeuil
Simple Pyrex version for the new heads() method.
445
            if maybe_node == NULL:
446
                raise KeyError('key %s not in nodes' % (key,))
447
            PyDict_SetItem(candidate_nodes, key, <object>maybe_node)
4371.4.21 by John Arbash Meinel
Spend some time optimizing the inner heads() loop, approx 2x faster.
448
        maybe_node = PyDict_GetItem(candidate_nodes, NULL_REVISION)
449
        if maybe_node != NULL:
4371.4.7 by Vincent Ladeuil
Simple Pyrex version for the new heads() method.
450
            # NULL_REVISION is only a head if it is the only entry
4371.4.21 by John Arbash Meinel
Spend some time optimizing the inner heads() loop, approx 2x faster.
451
            candidate_nodes.pop(NULL_REVISION)
4371.4.7 by Vincent Ladeuil
Simple Pyrex version for the new heads() method.
452
            if not candidate_nodes:
4371.4.21 by John Arbash Meinel
Spend some time optimizing the inner heads() loop, approx 2x faster.
453
                return frozenset([NULL_REVISION])
4371.4.7 by Vincent Ladeuil
Simple Pyrex version for the new heads() method.
454
            # The keys changed, so recalculate heads_key
4526.10.1 by John Arbash Meinel
Avoid using PyFrozenSet_New
455
            heads_key = frozenset(candidate_nodes)
4371.4.21 by John Arbash Meinel
Spend some time optimizing the inner heads() loop, approx 2x faster.
456
        if PyDict_Size(candidate_nodes) < 2:
4371.4.7 by Vincent Ladeuil
Simple Pyrex version for the new heads() method.
457
            return heads_key
458
4371.4.22 by John Arbash Meinel
Changing from a set of seen to a list to cleanup and a seen attribute.
459
        cleanup = []
4371.4.7 by Vincent Ladeuil
Simple Pyrex version for the new heads() method.
460
        pending = []
4371.4.21 by John Arbash Meinel
Spend some time optimizing the inner heads() loop, approx 2x faster.
461
        # we know a gdfo cannot be longer than a linear chain of all nodes
462
        min_gdfo = PyDict_Size(self._nodes) + 1
463
        # Build up nodes that need to be walked, note that starting nodes are
464
        # not added to seen()
4371.4.7 by Vincent Ladeuil
Simple Pyrex version for the new heads() method.
465
        pos = 0
466
        while PyDict_Next(candidate_nodes, &pos, NULL, &temp_node):
467
            node = <_KnownGraphNode>temp_node
468
            if node.parents is not None:
469
                pending.extend(node.parents)
4371.4.21 by John Arbash Meinel
Spend some time optimizing the inner heads() loop, approx 2x faster.
470
            if node.gdfo < min_gdfo:
4371.4.7 by Vincent Ladeuil
Simple Pyrex version for the new heads() method.
471
                min_gdfo = node.gdfo
4371.4.21 by John Arbash Meinel
Spend some time optimizing the inner heads() loop, approx 2x faster.
472
473
        # Now do all the real work
4371.4.31 by John Arbash Meinel
Apply the same 'never pop' logic to the heads() code.
474
        last_item = PyList_GET_SIZE(pending) - 1
475
        while last_item >= 0:
476
            node = _get_list_node(pending, last_item)
477
            last_item = last_item - 1
4371.4.22 by John Arbash Meinel
Changing from a set of seen to a list to cleanup and a seen attribute.
478
            if node.seen:
4371.4.7 by Vincent Ladeuil
Simple Pyrex version for the new heads() method.
479
                # node already appears in some ancestry
480
                continue
4371.4.22 by John Arbash Meinel
Changing from a set of seen to a list to cleanup and a seen attribute.
481
            PyList_Append(cleanup, node)
482
            node.seen = 1
4371.4.7 by Vincent Ladeuil
Simple Pyrex version for the new heads() method.
483
            if node.gdfo <= min_gdfo:
484
                continue
4371.4.21 by John Arbash Meinel
Spend some time optimizing the inner heads() loop, approx 2x faster.
485
            if node.parents is not None and PyTuple_GET_SIZE(node.parents) > 0:
486
                for pos from 0 <= pos < PyTuple_GET_SIZE(node.parents):
4641.5.7 by John Arbash Meinel
A few tweaks to the internals, and we are down to:
487
                    parent_node = _get_tuple_node(node.parents, pos)
4371.4.31 by John Arbash Meinel
Apply the same 'never pop' logic to the heads() code.
488
                    last_item = last_item + 1
489
                    if last_item < PyList_GET_SIZE(pending):
490
                        Py_INCREF(parent_node) # SetItem steals a ref
491
                        PyList_SetItem(pending, last_item, parent_node)
4371.4.21 by John Arbash Meinel
Spend some time optimizing the inner heads() loop, approx 2x faster.
492
                    else:
493
                        PyList_Append(pending, parent_node)
4371.4.22 by John Arbash Meinel
Changing from a set of seen to a list to cleanup and a seen attribute.
494
        heads = []
495
        pos = 0
496
        while PyDict_Next(candidate_nodes, &pos, NULL, &temp_node):
497
            node = <_KnownGraphNode>temp_node
498
            if not node.seen:
499
                PyList_Append(heads, node.key)
4526.10.1 by John Arbash Meinel
Avoid using PyFrozenSet_New
500
        heads = frozenset(heads)
4371.4.22 by John Arbash Meinel
Changing from a set of seen to a list to cleanup and a seen attribute.
501
        for pos from 0 <= pos < PyList_GET_SIZE(cleanup):
502
            node = _get_list_node(cleanup, pos)
503
            node.seen = 0
4371.4.7 by Vincent Ladeuil
Simple Pyrex version for the new heads() method.
504
        if self.do_cache:
4371.4.21 by John Arbash Meinel
Spend some time optimizing the inner heads() loop, approx 2x faster.
505
            PyDict_SetItem(self._known_heads, heads_key, heads)
4371.4.7 by Vincent Ladeuil
Simple Pyrex version for the new heads() method.
506
        return heads
4577.3.2 by John Arbash Meinel
Implement KnownGraph.topo_sort.
507
508
    def topo_sort(self):
509
        """Return the nodes in topological order.
510
511
        All parents must occur before all children.
512
        """
513
        # This is, for the most part, the same iteration order that we used for
514
        # _find_gdfo, consider finding a way to remove the duplication
515
        # In general, we find the 'tails' (nodes with no parents), and then
516
        # walk to the children. For children that have all of their parents
517
        # yielded, we queue up the child to be yielded as well.
518
        cdef _KnownGraphNode node
519
        cdef _KnownGraphNode child
520
        cdef PyObject *temp
521
        cdef Py_ssize_t pos
522
        cdef int replace
523
        cdef Py_ssize_t last_item
524
525
        pending = self._find_tails()
4593.5.3 by John Arbash Meinel
Bring in the optimized tsort code.
526
        if PyList_GET_SIZE(pending) == 0 and len(self._nodes) > 0:
527
            raise errors.GraphCycleError(self._nodes)
4577.3.2 by John Arbash Meinel
Implement KnownGraph.topo_sort.
528
529
        topo_order = []
530
531
        last_item = PyList_GET_SIZE(pending) - 1
532
        while last_item >= 0:
533
            # Avoid pop followed by push, instead, peek, and replace
534
            # timing shows this is 930ms => 770ms for OOo
535
            node = _get_list_node(pending, last_item)
536
            last_item = last_item - 1
4593.5.3 by John Arbash Meinel
Bring in the optimized tsort code.
537
            if node.parents is not None:
538
                # We don't include ghost parents
539
                PyList_Append(topo_order, node.key)
4577.3.2 by John Arbash Meinel
Implement KnownGraph.topo_sort.
540
            for pos from 0 <= pos < PyList_GET_SIZE(node.children):
541
                child = _get_list_node(node.children, pos)
4593.5.3 by John Arbash Meinel
Bring in the optimized tsort code.
542
                if child.gdfo == -1:
543
                    # We know we have a graph cycle because a node has a parent
544
                    # which we couldn't find
545
                    raise errors.GraphCycleError(self._nodes)
4577.3.2 by John Arbash Meinel
Implement KnownGraph.topo_sort.
546
                child.seen = child.seen + 1
547
                if child.seen == PyTuple_GET_SIZE(child.parents):
548
                    # All parents of this child have been yielded, queue this
549
                    # one to be yielded as well
550
                    last_item = last_item + 1
551
                    if last_item < PyList_GET_SIZE(pending):
552
                        Py_INCREF(child) # SetItem steals a ref
553
                        PyList_SetItem(pending, last_item, child)
554
                    else:
555
                        PyList_Append(pending, child)
556
                    # We have queued this node, we don't need to track it
557
                    # anymore
558
                    child.seen = 0
559
        # We started from the parents, so we don't need to do anymore work
560
        return topo_order
4593.5.6 by John Arbash Meinel
Create the crude merge_sort implementation that just thunks over to the old implementation.
561
4641.5.5 by John Arbash Meinel
Implement initial gc_sort in pyrex.
562
    def gc_sort(self):
563
        """Return a reverse topological ordering which is 'stable'.
564
565
        There are a few constraints:
566
          1) Reverse topological (all children before all parents)
567
          2) Grouped by prefix
568
          3) 'stable' sorting, so that we get the same result, independent of
569
             machine, or extra data.
570
        To do this, we use the same basic algorithm as topo_sort, but when we
571
        aren't sure what node to access next, we sort them lexicographically.
572
        """
573
        cdef PyObject *temp
4641.5.7 by John Arbash Meinel
A few tweaks to the internals, and we are down to:
574
        cdef Py_ssize_t pos, last_item
4641.5.5 by John Arbash Meinel
Implement initial gc_sort in pyrex.
575
        cdef _KnownGraphNode node, node2, parent_node
576
577
        tips = self._find_tips()
578
        # Split the tips based on prefix
579
        prefix_tips = {}
580
        for pos from 0 <= pos < PyList_GET_SIZE(tips):
581
            node = _get_list_node(tips, pos)
582
            if PyString_CheckExact(node.key) or len(node.key) == 1:
583
                prefix = ''
584
            else:
585
                prefix = node.key[0]
4641.5.7 by John Arbash Meinel
A few tweaks to the internals, and we are down to:
586
            temp = PyDict_GetItem(prefix_tips, prefix)
587
            if temp == NULL:
588
                prefix_tips[prefix] = [node]
589
            else:
590
                tip_nodes = <object>temp
591
                PyList_Append(tip_nodes, node)
4641.5.5 by John Arbash Meinel
Implement initial gc_sort in pyrex.
592
593
        result = []
4641.5.7 by John Arbash Meinel
A few tweaks to the internals, and we are down to:
594
        for prefix in sorted(prefix_tips):
595
            temp = PyDict_GetItem(prefix_tips, prefix)
596
            assert temp != NULL
597
            tip_nodes = <object>temp
598
            pending = _sort_list_nodes(tip_nodes, 1)
599
            last_item = PyList_GET_SIZE(pending) - 1
600
            while last_item >= 0:
601
                node = _get_list_node(pending, last_item)
602
                last_item = last_item - 1
4641.5.6 by John Arbash Meinel
Add support for skipping ghost nodes.
603
                if node.parents is None:
4641.5.7 by John Arbash Meinel
A few tweaks to the internals, and we are down to:
604
                    # Ghost
4641.5.6 by John Arbash Meinel
Add support for skipping ghost nodes.
605
                    continue
4641.5.7 by John Arbash Meinel
A few tweaks to the internals, and we are down to:
606
                PyList_Append(result, node.key)
4641.5.11 by John Arbash Meinel
Using PyObject_RichCompareBool is a significant benefit. 8.8ms down to 7.7ms.
607
                # Sorting the parent keys isn't strictly necessary for stable
608
                # sorting of a given graph. But it does help minimize the
609
                # differences between graphs
610
                # For bzr.dev ancestry:
611
                #   4.73ms  no sort
612
                #   7.73ms  RichCompareBool sort
4641.5.10 by John Arbash Meinel
Go back to sorted parents mode.
613
                parents = _sort_list_nodes(node.parents, 1)
614
                for pos from 0 <= pos < len(parents):
4641.5.7 by John Arbash Meinel
A few tweaks to the internals, and we are down to:
615
                    if PyTuple_CheckExact(parents):
616
                        parent_node = _get_tuple_node(parents, pos)
617
                    else:
618
                        parent_node = _get_list_node(parents, pos)
4641.5.5 by John Arbash Meinel
Implement initial gc_sort in pyrex.
619
                    # TODO: GraphCycle detection
620
                    parent_node.seen = parent_node.seen + 1
621
                    if (parent_node.seen
622
                        == PyList_GET_SIZE(parent_node.children)):
623
                        # All children have been processed, queue up this
624
                        # parent
4641.5.7 by John Arbash Meinel
A few tweaks to the internals, and we are down to:
625
                        last_item = last_item + 1
626
                        if last_item < PyList_GET_SIZE(pending):
627
                            Py_INCREF(parent_node) # SetItem steals a ref
628
                            PyList_SetItem(pending, last_item, parent_node)
629
                        else:
630
                            PyList_Append(pending, parent_node)
4641.5.5 by John Arbash Meinel
Implement initial gc_sort in pyrex.
631
                        parent_node.seen = 0
632
        return result
4593.5.6 by John Arbash Meinel
Create the crude merge_sort implementation that just thunks over to the old implementation.
633
634
    def merge_sort(self, tip_key):
635
        """Compute the merge sorted graph output."""
4593.5.8 by John Arbash Meinel
Approximate implementation in Pyrex.
636
        cdef _MergeSorter sorter
637
4593.5.29 by John Arbash Meinel
Remove the gc enable/disable code.
638
        # TODO: consider disabling gc since we are allocating a lot of nodes
639
        #       that won't be collectable anyway. real world testing has not
640
        #       shown a specific impact, yet.
4593.5.8 by John Arbash Meinel
Approximate implementation in Pyrex.
641
        sorter = _MergeSorter(self, tip_key)
642
        return sorter.topo_order()
4648.2.1 by Gary van der Merwe
Add get_parent_keys, and get_child_keys to KnownGraph.
643
    
644
    def get_parent_keys(self, key):
645
        """Get the parents for a key
646
        
647
        Returns a list containg the parents keys. If the key is a ghost,
648
        None is returned. A KeyError will be raised if the key is not in
649
        the graph.
650
        
651
        :param keys: Key to check (eg revision_id)
652
        :return: A list of parents
653
        """
654
        return self._nodes[key].parent_keys 
655
656
    def get_child_keys(self, key):
657
        """Get the children for a key
658
        
659
        Returns a list containg the children keys. A KeyError will be raised
660
        if the key is not in the graph.
661
        
662
        :param keys: Key to check (eg revision_id)
663
        :return: A list of children
664
        """
665
        return self._nodes[key].child_keys    
4593.5.7 by John Arbash Meinel
start working on a pyrex implementation of MergeSorter
666
667
668
cdef class _MergeSortNode:
669
    """Tracks information about a node during the merge_sort operation."""
670
4593.5.34 by John Arbash Meinel
Change the KnownGraph.merge_sort api.
671
    # Public api
672
    cdef public object key
673
    cdef public long merge_depth
674
    cdef public object end_of_merge # True/False Is this the end of the current merge
675
676
    # Private api, used while computing the information
4593.5.19 by John Arbash Meinel
Switch from having a MergeSortNode => KnownGraphNode to
677
    cdef _KnownGraphNode left_parent
678
    cdef _KnownGraphNode left_pending_parent
679
    cdef object pending_parents # list of _KnownGraphNode for non-left parents
4593.5.34 by John Arbash Meinel
Change the KnownGraph.merge_sort api.
680
    cdef long _revno_first
681
    cdef long _revno_second
682
    cdef long _revno_last
4593.5.12 by John Arbash Meinel
Move some sets/dicts into attributes.
683
    # TODO: turn these into flag/bit fields rather than individual members
4593.5.7 by John Arbash Meinel
start working on a pyrex implementation of MergeSorter
684
    cdef int is_first_child # Is this the first child?
4593.5.12 by John Arbash Meinel
Move some sets/dicts into attributes.
685
    cdef int seen_by_child # A child node has seen this parent
686
    cdef int completed # Fully Processed
687
4593.5.34 by John Arbash Meinel
Change the KnownGraph.merge_sort api.
688
    def __init__(self, key):
689
        self.key = key
4593.5.12 by John Arbash Meinel
Move some sets/dicts into attributes.
690
        self.merge_depth = -1
4593.5.19 by John Arbash Meinel
Switch from having a MergeSortNode => KnownGraphNode to
691
        self.left_parent = None
4593.5.12 by John Arbash Meinel
Move some sets/dicts into attributes.
692
        self.left_pending_parent = None
693
        self.pending_parents = None
4593.5.34 by John Arbash Meinel
Change the KnownGraph.merge_sort api.
694
        self._revno_first = -1
695
        self._revno_second = -1
696
        self._revno_last = -1
4593.5.12 by John Arbash Meinel
Move some sets/dicts into attributes.
697
        self.is_first_child = 0
698
        self.seen_by_child = 0
699
        self.completed = 0
4593.5.8 by John Arbash Meinel
Approximate implementation in Pyrex.
700
701
    def __repr__(self):
4634.11.1 by John Arbash Meinel
Fix bug #419241. If a graph had a mainline ghost
702
        return '%s(%s depth:%s rev:%s,%s,%s first:%s seen:%s)' % (
703
            self.__class__.__name__, self.key,
4593.5.19 by John Arbash Meinel
Switch from having a MergeSortNode => KnownGraphNode to
704
            self.merge_depth,
4593.5.34 by John Arbash Meinel
Change the KnownGraph.merge_sort api.
705
            self._revno_first, self._revno_second, self._revno_last,
4593.5.12 by John Arbash Meinel
Move some sets/dicts into attributes.
706
            self.is_first_child, self.seen_by_child)
707
4634.117.10 by John Arbash Meinel
Change 'no except' to 'cannot_raise'
708
    cdef int has_pending_parents(self): # cannot_raise
4593.5.12 by John Arbash Meinel
Move some sets/dicts into attributes.
709
        if self.left_pending_parent is not None or self.pending_parents:
710
            return 1
711
        return 0
4593.5.7 by John Arbash Meinel
start working on a pyrex implementation of MergeSorter
712
4593.5.34 by John Arbash Meinel
Change the KnownGraph.merge_sort api.
713
    cdef object _revno(self):
714
        if self._revno_first == -1:
715
            if self._revno_second != -1:
716
                raise RuntimeError('Something wrong with: %s' % (self,))
717
            return (self._revno_last,)
718
        else:
719
            return (self._revno_first, self._revno_second, self._revno_last)
720
721
    property revno:
722
        def __get__(self):
723
            return self._revno()
724
4593.5.7 by John Arbash Meinel
start working on a pyrex implementation of MergeSorter
725
726
cdef class _MergeSorter:
727
    """This class does the work of computing the merge_sort ordering.
728
729
    We have some small advantages, in that we get all the extra information
730
    that KnownGraph knows, like knowing the child lists, etc.
731
    """
732
4593.5.11 by John Arbash Meinel
Without doing any real tuning yet, we see a decent speedup for merge_sort:
733
    # Current performance numbers for merge_sort(bzr_dev_parent_map):
4593.5.25 by John Arbash Meinel
Add tests that we detect GraphCycleError
734
    #  302ms tsort.merge_sort()
735
    #   91ms graph.KnownGraph().merge_sort()
4593.5.28 by John Arbash Meinel
Cleanup pass
736
    #   40ms kg.merge_sort()
4593.5.11 by John Arbash Meinel
Without doing any real tuning yet, we see a decent speedup for merge_sort:
737
4593.5.7 by John Arbash Meinel
start working on a pyrex implementation of MergeSorter
738
    cdef KnownGraph graph
4593.5.15 by John Arbash Meinel
Use the same implementation of stack handling to avoid append/pop.
739
    cdef object _depth_first_stack  # list
740
    cdef Py_ssize_t _last_stack_item # offset to last item on stack
4593.5.19 by John Arbash Meinel
Switch from having a MergeSortNode => KnownGraphNode to
741
    # cdef object _ms_nodes # dict of key => _MergeSortNode
4593.5.8 by John Arbash Meinel
Approximate implementation in Pyrex.
742
    cdef object _revno_to_branch_count # {revno => num child branches}
743
    cdef object _scheduled_nodes # List of nodes ready to be yielded
4593.5.7 by John Arbash Meinel
start working on a pyrex implementation of MergeSorter
744
745
    def __init__(self, known_graph, tip_key):
4593.5.12 by John Arbash Meinel
Move some sets/dicts into attributes.
746
        cdef _KnownGraphNode node
4593.5.19 by John Arbash Meinel
Switch from having a MergeSortNode => KnownGraphNode to
747
4593.5.7 by John Arbash Meinel
start working on a pyrex implementation of MergeSorter
748
        self.graph = known_graph
4593.5.19 by John Arbash Meinel
Switch from having a MergeSortNode => KnownGraphNode to
749
        # self._ms_nodes = {}
4593.5.8 by John Arbash Meinel
Approximate implementation in Pyrex.
750
        self._revno_to_branch_count = {}
4593.5.15 by John Arbash Meinel
Use the same implementation of stack handling to avoid append/pop.
751
        self._depth_first_stack = []
752
        self._last_stack_item = -1
4593.5.8 by John Arbash Meinel
Approximate implementation in Pyrex.
753
        self._scheduled_nodes = []
4593.5.42 by John Arbash Meinel
Turns out that some code assumed passing NULL_REVISION to merge_sort
754
        if (tip_key is not None and tip_key != NULL_REVISION
755
            and tip_key != (NULL_REVISION,)):
4593.5.7 by John Arbash Meinel
start working on a pyrex implementation of MergeSorter
756
            node = self.graph._nodes[tip_key]
4593.5.19 by John Arbash Meinel
Switch from having a MergeSortNode => KnownGraphNode to
757
            self._push_node(node, 0)
4593.5.12 by John Arbash Meinel
Move some sets/dicts into attributes.
758
4634.117.8 by John Arbash Meinel
Use a class regex to find other signatures to exclude.
759
    cdef _MergeSortNode _get_ms_node(self, _KnownGraphNode node):
4593.5.12 by John Arbash Meinel
Move some sets/dicts into attributes.
760
        cdef PyObject *temp_node
761
        cdef _MergeSortNode ms_node
762
4593.5.19 by John Arbash Meinel
Switch from having a MergeSortNode => KnownGraphNode to
763
        if node.extra is None:
4593.5.34 by John Arbash Meinel
Change the KnownGraph.merge_sort api.
764
            ms_node = _MergeSortNode(node.key)
4593.5.19 by John Arbash Meinel
Switch from having a MergeSortNode => KnownGraphNode to
765
            node.extra = ms_node
4593.5.12 by John Arbash Meinel
Move some sets/dicts into attributes.
766
        else:
4593.5.19 by John Arbash Meinel
Switch from having a MergeSortNode => KnownGraphNode to
767
            ms_node = <_MergeSortNode>node.extra
4593.5.12 by John Arbash Meinel
Move some sets/dicts into attributes.
768
        return ms_node
769
4593.5.24 by John Arbash Meinel
Switch some variables from Py_ssize_t to long
770
    cdef _push_node(self, _KnownGraphNode node, long merge_depth):
4593.5.7 by John Arbash Meinel
start working on a pyrex implementation of MergeSorter
771
        cdef _KnownGraphNode parent_node
4593.5.19 by John Arbash Meinel
Switch from having a MergeSortNode => KnownGraphNode to
772
        cdef _MergeSortNode ms_node, ms_parent_node
4593.5.12 by John Arbash Meinel
Move some sets/dicts into attributes.
773
        cdef Py_ssize_t pos
4593.5.7 by John Arbash Meinel
start working on a pyrex implementation of MergeSorter
774
4593.5.27 by John Arbash Meinel
bit of cleanup passes
775
        ms_node = self._get_ms_node(node)
4593.5.7 by John Arbash Meinel
start working on a pyrex implementation of MergeSorter
776
        ms_node.merge_depth = merge_depth
4634.11.1 by John Arbash Meinel
Fix bug #419241. If a graph had a mainline ghost
777
        if node.parents is None:
778
            raise RuntimeError('ghost nodes should not be pushed'
779
                               ' onto the stack: %s' % (node,))
4593.5.19 by John Arbash Meinel
Switch from having a MergeSortNode => KnownGraphNode to
780
        if PyTuple_GET_SIZE(node.parents) > 0:
4641.5.7 by John Arbash Meinel
A few tweaks to the internals, and we are down to:
781
            parent_node = _get_tuple_node(node.parents, 0)
4593.5.19 by John Arbash Meinel
Switch from having a MergeSortNode => KnownGraphNode to
782
            ms_node.left_parent = parent_node
4634.11.1 by John Arbash Meinel
Fix bug #419241. If a graph had a mainline ghost
783
            if parent_node.parents is None: # left-hand ghost
784
                ms_node.left_pending_parent = None
785
                ms_node.left_parent = None
786
            else:
787
                ms_node.left_pending_parent = parent_node
4593.5.19 by John Arbash Meinel
Switch from having a MergeSortNode => KnownGraphNode to
788
        if PyTuple_GET_SIZE(node.parents) > 1:
4593.5.20 by John Arbash Meinel
Expose KnownGraph off of VersionedFiles
789
            ms_node.pending_parents = []
790
            for pos from 1 <= pos < PyTuple_GET_SIZE(node.parents):
4641.5.7 by John Arbash Meinel
A few tweaks to the internals, and we are down to:
791
                parent_node = _get_tuple_node(node.parents, pos)
4593.5.20 by John Arbash Meinel
Expose KnownGraph off of VersionedFiles
792
                if parent_node.parents is None: # ghost
793
                    continue
794
                PyList_Append(ms_node.pending_parents, parent_node)
4593.5.12 by John Arbash Meinel
Move some sets/dicts into attributes.
795
4593.5.7 by John Arbash Meinel
start working on a pyrex implementation of MergeSorter
796
        ms_node.is_first_child = 1
4593.5.19 by John Arbash Meinel
Switch from having a MergeSortNode => KnownGraphNode to
797
        if ms_node.left_parent is not None:
4593.5.27 by John Arbash Meinel
bit of cleanup passes
798
            ms_parent_node = self._get_ms_node(ms_node.left_parent)
4593.5.19 by John Arbash Meinel
Switch from having a MergeSortNode => KnownGraphNode to
799
            if ms_parent_node.seen_by_child:
4593.5.8 by John Arbash Meinel
Approximate implementation in Pyrex.
800
                ms_node.is_first_child = 0
4593.5.19 by John Arbash Meinel
Switch from having a MergeSortNode => KnownGraphNode to
801
            ms_parent_node.seen_by_child = 1
4593.5.15 by John Arbash Meinel
Use the same implementation of stack handling to avoid append/pop.
802
        self._last_stack_item = self._last_stack_item + 1
803
        if self._last_stack_item < PyList_GET_SIZE(self._depth_first_stack):
4593.5.19 by John Arbash Meinel
Switch from having a MergeSortNode => KnownGraphNode to
804
            Py_INCREF(node) # SetItem steals a ref
4593.5.15 by John Arbash Meinel
Use the same implementation of stack handling to avoid append/pop.
805
            PyList_SetItem(self._depth_first_stack, self._last_stack_item,
4593.5.19 by John Arbash Meinel
Switch from having a MergeSortNode => KnownGraphNode to
806
                           node)
4593.5.15 by John Arbash Meinel
Use the same implementation of stack handling to avoid append/pop.
807
        else:
4593.5.19 by John Arbash Meinel
Switch from having a MergeSortNode => KnownGraphNode to
808
            PyList_Append(self._depth_first_stack, node)
4593.5.8 by John Arbash Meinel
Approximate implementation in Pyrex.
809
810
    cdef _pop_node(self):
4593.5.16 by John Arbash Meinel
Change the revno handling code.
811
        cdef PyObject *temp
4593.5.23 by John Arbash Meinel
Move the end-of-merge computation into _pop_node
812
        cdef _MergeSortNode ms_node, ms_parent_node, ms_prev_node
813
        cdef _KnownGraphNode node, parent_node, prev_node
4593.5.8 by John Arbash Meinel
Approximate implementation in Pyrex.
814
4593.5.19 by John Arbash Meinel
Switch from having a MergeSortNode => KnownGraphNode to
815
        node = _get_list_node(self._depth_first_stack, self._last_stack_item)
816
        ms_node = <_MergeSortNode>node.extra
4593.5.15 by John Arbash Meinel
Use the same implementation of stack handling to avoid append/pop.
817
        self._last_stack_item = self._last_stack_item - 1
4593.5.19 by John Arbash Meinel
Switch from having a MergeSortNode => KnownGraphNode to
818
        if ms_node.left_parent is not None:
4593.5.27 by John Arbash Meinel
bit of cleanup passes
819
            # Assign the revision number from the left-hand parent
4593.5.19 by John Arbash Meinel
Switch from having a MergeSortNode => KnownGraphNode to
820
            ms_parent_node = <_MergeSortNode>ms_node.left_parent.extra
4593.5.12 by John Arbash Meinel
Move some sets/dicts into attributes.
821
            if ms_node.is_first_child:
822
                # First child just increments the final digit
4593.5.34 by John Arbash Meinel
Change the KnownGraph.merge_sort api.
823
                ms_node._revno_first = ms_parent_node._revno_first
824
                ms_node._revno_second = ms_parent_node._revno_second
825
                ms_node._revno_last = ms_parent_node._revno_last + 1
4593.5.12 by John Arbash Meinel
Move some sets/dicts into attributes.
826
            else:
4593.5.10 by John Arbash Meinel
finish getting the revno numbers working correctly.
827
                # Not the first child, make a new branch
4593.5.27 by John Arbash Meinel
bit of cleanup passes
828
                #  (mainline_revno, branch_count, 1)
4593.5.34 by John Arbash Meinel
Change the KnownGraph.merge_sort api.
829
                if ms_parent_node._revno_first == -1:
4593.5.16 by John Arbash Meinel
Change the revno handling code.
830
                    # Mainline ancestor, the increment is on the last digit
4593.5.34 by John Arbash Meinel
Change the KnownGraph.merge_sort api.
831
                    base_revno = ms_parent_node._revno_last
4593.5.16 by John Arbash Meinel
Change the revno handling code.
832
                else:
4593.5.34 by John Arbash Meinel
Change the KnownGraph.merge_sort api.
833
                    base_revno = ms_parent_node._revno_first
4593.5.16 by John Arbash Meinel
Change the revno handling code.
834
                temp = PyDict_GetItem(self._revno_to_branch_count,
4593.5.17 by John Arbash Meinel
No need to pop one item off the stack after another.
835
                                      base_revno)
4593.5.16 by John Arbash Meinel
Change the revno handling code.
836
                if temp == NULL:
837
                    branch_count = 1
838
                else:
839
                    branch_count = (<object>temp) + 1
840
                PyDict_SetItem(self._revno_to_branch_count, base_revno,
841
                               branch_count)
4593.5.34 by John Arbash Meinel
Change the KnownGraph.merge_sort api.
842
                ms_node._revno_first = base_revno
843
                ms_node._revno_second = branch_count
844
                ms_node._revno_last = 1
4593.5.8 by John Arbash Meinel
Approximate implementation in Pyrex.
845
        else:
4593.5.27 by John Arbash Meinel
bit of cleanup passes
846
            temp = PyDict_GetItem(self._revno_to_branch_count, 0)
847
            if temp == NULL:
4593.5.16 by John Arbash Meinel
Change the revno handling code.
848
                # The first root node doesn't have a 3-digit revno
4593.5.27 by John Arbash Meinel
bit of cleanup passes
849
                root_count = 0
4593.5.34 by John Arbash Meinel
Change the KnownGraph.merge_sort api.
850
                ms_node._revno_first = -1
851
                ms_node._revno_second = -1
852
                ms_node._revno_last = 1
4593.5.27 by John Arbash Meinel
bit of cleanup passes
853
            else:
854
                root_count = (<object>temp) + 1
4593.5.34 by John Arbash Meinel
Change the KnownGraph.merge_sort api.
855
                ms_node._revno_first = 0
856
                ms_node._revno_second = root_count
857
                ms_node._revno_last = 1
4593.5.27 by John Arbash Meinel
bit of cleanup passes
858
            PyDict_SetItem(self._revno_to_branch_count, 0, root_count)
4593.5.12 by John Arbash Meinel
Move some sets/dicts into attributes.
859
        ms_node.completed = 1
4593.5.23 by John Arbash Meinel
Move the end-of-merge computation into _pop_node
860
        if PyList_GET_SIZE(self._scheduled_nodes) == 0:
4593.5.27 by John Arbash Meinel
bit of cleanup passes
861
            # The first scheduled node is always the end of merge
4593.5.23 by John Arbash Meinel
Move the end-of-merge computation into _pop_node
862
            ms_node.end_of_merge = True
863
        else:
864
            prev_node = _get_list_node(self._scheduled_nodes,
865
                                    PyList_GET_SIZE(self._scheduled_nodes) - 1)
866
            ms_prev_node = <_MergeSortNode>prev_node.extra
867
            if ms_prev_node.merge_depth < ms_node.merge_depth:
868
                # The previously pushed node is to our left, so this is the end
869
                # of this right-hand chain
870
                ms_node.end_of_merge = True
871
            elif (ms_prev_node.merge_depth == ms_node.merge_depth
872
                  and prev_node not in node.parents):
873
                # The next node is not a direct parent of this node
874
                ms_node.end_of_merge = True
875
            else:
876
                ms_node.end_of_merge = False
4593.5.19 by John Arbash Meinel
Switch from having a MergeSortNode => KnownGraphNode to
877
        PyList_Append(self._scheduled_nodes, node)
4593.5.8 by John Arbash Meinel
Approximate implementation in Pyrex.
878
879
    cdef _schedule_stack(self):
4593.5.19 by John Arbash Meinel
Switch from having a MergeSortNode => KnownGraphNode to
880
        cdef _KnownGraphNode last_node, next_node
4593.5.12 by John Arbash Meinel
Move some sets/dicts into attributes.
881
        cdef _MergeSortNode ms_node, ms_last_node, ms_next_node
4593.5.24 by John Arbash Meinel
Switch some variables from Py_ssize_t to long
882
        cdef long next_merge_depth
4593.5.8 by John Arbash Meinel
Approximate implementation in Pyrex.
883
        ordered = []
4593.5.15 by John Arbash Meinel
Use the same implementation of stack handling to avoid append/pop.
884
        while self._last_stack_item >= 0:
4593.5.8 by John Arbash Meinel
Approximate implementation in Pyrex.
885
            # Peek at the last item on the stack
4593.5.19 by John Arbash Meinel
Switch from having a MergeSortNode => KnownGraphNode to
886
            last_node = _get_list_node(self._depth_first_stack,
887
                                       self._last_stack_item)
4593.5.25 by John Arbash Meinel
Add tests that we detect GraphCycleError
888
            if last_node.gdfo == -1:
4593.5.28 by John Arbash Meinel
Cleanup pass
889
                # if _find_gdfo skipped a node, that means there is a graph
890
                # cycle, error out now
4593.5.25 by John Arbash Meinel
Add tests that we detect GraphCycleError
891
                raise errors.GraphCycleError(self.graph._nodes)
4593.5.19 by John Arbash Meinel
Switch from having a MergeSortNode => KnownGraphNode to
892
            ms_last_node = <_MergeSortNode>last_node.extra
4593.5.12 by John Arbash Meinel
Move some sets/dicts into attributes.
893
            if not ms_last_node.has_pending_parents():
894
                # Processed all parents, pop this node
4593.5.8 by John Arbash Meinel
Approximate implementation in Pyrex.
895
                self._pop_node()
896
                continue
4593.5.12 by John Arbash Meinel
Move some sets/dicts into attributes.
897
            while ms_last_node.has_pending_parents():
898
                if ms_last_node.left_pending_parent is not None:
4593.5.8 by John Arbash Meinel
Approximate implementation in Pyrex.
899
                    # recurse depth first into the primary parent
4593.5.19 by John Arbash Meinel
Switch from having a MergeSortNode => KnownGraphNode to
900
                    next_node = ms_last_node.left_pending_parent
4593.5.12 by John Arbash Meinel
Move some sets/dicts into attributes.
901
                    ms_last_node.left_pending_parent = None
4593.5.8 by John Arbash Meinel
Approximate implementation in Pyrex.
902
                else:
903
                    # place any merges in right-to-left order for scheduling
904
                    # which gives us left-to-right order after we reverse
4593.5.28 by John Arbash Meinel
Cleanup pass
905
                    # the scheduled queue.
906
                    # Note: This has the effect of allocating common-new
907
                    #       revisions to the right-most subtree rather than the
908
                    #       left most, which will display nicely (you get
909
                    #       smaller trees at the top of the combined merge).
4593.5.19 by John Arbash Meinel
Switch from having a MergeSortNode => KnownGraphNode to
910
                    next_node = ms_last_node.pending_parents.pop()
4593.5.27 by John Arbash Meinel
bit of cleanup passes
911
                ms_next_node = self._get_ms_node(next_node)
4593.5.12 by John Arbash Meinel
Move some sets/dicts into attributes.
912
                if ms_next_node.completed:
4593.5.8 by John Arbash Meinel
Approximate implementation in Pyrex.
913
                    # this parent was completed by a child on the
914
                    # call stack. skip it.
915
                    continue
916
                # otherwise transfer it from the source graph into the
917
                # top of the current depth first search stack.
918
4593.5.19 by John Arbash Meinel
Switch from having a MergeSortNode => KnownGraphNode to
919
                if next_node is ms_last_node.left_parent:
4593.5.24 by John Arbash Meinel
Switch some variables from Py_ssize_t to long
920
                    next_merge_depth = ms_last_node.merge_depth
4593.5.12 by John Arbash Meinel
Move some sets/dicts into attributes.
921
                else:
4593.5.24 by John Arbash Meinel
Switch some variables from Py_ssize_t to long
922
                    next_merge_depth = ms_last_node.merge_depth + 1
4593.5.19 by John Arbash Meinel
Switch from having a MergeSortNode => KnownGraphNode to
923
                self._push_node(next_node, next_merge_depth)
4593.5.8 by John Arbash Meinel
Approximate implementation in Pyrex.
924
                # and do not continue processing parents until this 'call'
925
                # has recursed.
926
                break
927
928
    cdef topo_order(self):
4593.5.23 by John Arbash Meinel
Move the end-of-merge computation into _pop_node
929
        cdef _MergeSortNode ms_node
930
        cdef _KnownGraphNode node
4593.5.17 by John Arbash Meinel
No need to pop one item off the stack after another.
931
        cdef Py_ssize_t pos
4593.5.28 by John Arbash Meinel
Cleanup pass
932
        cdef PyObject *temp_key, *temp_node
4593.5.8 by John Arbash Meinel
Approximate implementation in Pyrex.
933
4593.5.28 by John Arbash Meinel
Cleanup pass
934
        # Note: allocating a _MergeSortNode and deallocating it for all nodes
935
        #       costs approx 8.52ms (21%) of the total runtime
936
        #       We might consider moving the attributes into the base
937
        #       KnownGraph object.
4593.5.8 by John Arbash Meinel
Approximate implementation in Pyrex.
938
        self._schedule_stack()
939
940
        # We've set up the basic schedule, now we can continue processing the
941
        # output.
4593.5.29 by John Arbash Meinel
Remove the gc enable/disable code.
942
        # Note: This final loop costs us 40.0ms => 28.8ms (11ms, 25%) on
4593.5.24 by John Arbash Meinel
Switch some variables from Py_ssize_t to long
943
        #       bzr.dev, to convert the internal Object representation into a
944
        #       Tuple representation...
945
        #       2ms is walking the data and computing revno tuples
946
        #       7ms is computing the return tuple
947
        #       4ms is PyList_Append()
4593.5.8 by John Arbash Meinel
Approximate implementation in Pyrex.
948
        ordered = []
4593.5.34 by John Arbash Meinel
Change the KnownGraph.merge_sort api.
949
        # output the result in reverse order, and separate the generated info
4593.5.23 by John Arbash Meinel
Move the end-of-merge computation into _pop_node
950
        for pos from PyList_GET_SIZE(self._scheduled_nodes) > pos >= 0:
951
            node = _get_list_node(self._scheduled_nodes, pos)
952
            ms_node = <_MergeSortNode>node.extra
4593.5.34 by John Arbash Meinel
Change the KnownGraph.merge_sort api.
953
            PyList_Append(ordered, ms_node)
4593.5.23 by John Arbash Meinel
Move the end-of-merge computation into _pop_node
954
            node.extra = None
4593.5.34 by John Arbash Meinel
Change the KnownGraph.merge_sort api.
955
        # Clear out the scheduled nodes now that we're done
4593.5.17 by John Arbash Meinel
No need to pop one item off the stack after another.
956
        self._scheduled_nodes = []
4593.5.8 by John Arbash Meinel
Approximate implementation in Pyrex.
957
        return ordered