/brz/remove-bazaar

To get this branch, use:
bzr branch http://gegoxaren.bato24.eu/bzr/brz/remove-bazaar
3890.2.7 by John Arbash Meinel
A Pyrex extension is about 5x faster than the fastest python code I could write.
# Copyright (C) 2008 Canonical Ltd
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
#
# cython: language_level=3

"""Pyrex extensions for converting chunks to lines."""


cdef extern from "python-compat.h":
    pass

from cpython.bytes cimport (
    PyBytes_CheckExact,
    PyBytes_FromStringAndSize,
    PyBytes_AS_STRING,
    PyBytes_GET_SIZE,
    )
from cpython.list cimport (
    PyList_Append,
    )

from libc.string cimport memchr


def chunks_to_lines(chunks):
    """Re-split chunks into simple lines.

    Each entry in the result should contain a single newline at the end,
    except for the last entry, which may not have a final newline. If chunks
    is already a simple list of lines, we return it directly.

    :param chunks: A list/tuple of byte strings. If chunks is already a list
        of lines, then we will return it as-is.
    :return: A list of byte strings.
    """
    cdef char *c_str
    cdef char *newline
    cdef char *c_last
    cdef Py_ssize_t the_len
    cdef int last_no_newline

    # Check to see if the chunks are already lines
    last_no_newline = 0
    for chunk in chunks:
        if last_no_newline:
            # We have a chunk which followed a chunk without a newline, so this
            # is not a simple list of lines.
            break
        # Switching from PyBytes_AsStringAndSize to PyBytes_CheckExact and
        # then the macros GET_SIZE and AS_STRING saved us 40us / 470us.
        # It seems PyBytes_AsStringAndSize can actually trigger a conversion,
        # which we don't want anyway.
        if not PyBytes_CheckExact(chunk):
            raise TypeError('chunk is not a string')
        the_len = PyBytes_GET_SIZE(chunk)
        if the_len == 0:
            # An empty string is never a valid line
            break
        c_str = PyBytes_AS_STRING(chunk)
        c_last = c_str + the_len - 1
        newline = <char *>memchr(c_str, c'\n', the_len)
        if newline != c_last:
            if newline == NULL:
                # Missing a newline. Only valid as the last line
                last_no_newline = 1
            else:
                # There is a newline in the middle, so we must re-split
                break
    else:
        # Everything was already a list of lines
        return chunks

    # We know we need to create a new list of lines
    lines = []
    tail = None  # Any remainder from the previous chunk
    for chunk in chunks:
        if tail is not None:
            chunk = tail + chunk
            tail = None
        if not PyBytes_CheckExact(chunk):
            raise TypeError('chunk is not a string')
        the_len = PyBytes_GET_SIZE(chunk)
        if the_len == 0:
            # An empty string is never a valid line, and we don't need to
            # append anything
            continue
        c_str = PyBytes_AS_STRING(chunk)
        c_last = c_str + the_len - 1
        newline = <char *>memchr(c_str, c'\n', the_len)
        if newline == c_last:
            # A simple line
            PyList_Append(lines, chunk)
        elif newline == NULL:
            # A chunk without a newline: keep it as the tail so it is joined
            # to the next chunk, or emitted as the final line if it is last
            tail = chunk
        else:
            # We have a newline in the middle; loop until we've consumed all
            # lines
            while newline != NULL:
                line = PyBytes_FromStringAndSize(c_str, newline - c_str + 1)
                PyList_Append(lines, line)
                c_str = newline + 1
                if c_str > c_last:  # We are done
                    break
                the_len = c_last - c_str + 1
                newline = <char *>memchr(c_str, c'\n', the_len)
                if newline == NULL:
                    tail = PyBytes_FromStringAndSize(c_str, the_len)
                    break
    if tail is not None:
        PyList_Append(lines, tail)
    return lines
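To make the two-pass structure easier to follow without the C pointer arithmetic, here is a minimal pure-Python sketch of the same algorithm. It assumes the input is a list or tuple of `bytes`, and it uses `bytes.find` where the extension calls `memchr`. The function name `chunks_to_lines_py` is hypothetical; this is an illustration, not Breezy's own pure-Python fallback, and it is expected to be slower than the compiled module.

```python
from typing import List, Optional, Sequence


def chunks_to_lines_py(chunks: Sequence[bytes]) -> Sequence[bytes]:
    """Hypothetical pure-Python mirror of the Cython chunks_to_lines."""
    # Pass 1: check whether ``chunks`` is already a list of lines, i.e. every
    # chunk is non-empty and ends with its only newline, except that the
    # final chunk may have no newline at all.
    last_no_newline = False
    for chunk in chunks:
        if last_no_newline:
            break  # a newline-less chunk was followed by another chunk
        if type(chunk) is not bytes:  # plays the role of PyBytes_CheckExact
            raise TypeError('chunk is not a string')
        if not chunk:
            break  # an empty chunk is never a valid line
        newline = chunk.find(b'\n')  # plays the role of memchr()
        if newline != len(chunk) - 1:
            if newline == -1:
                last_no_newline = True  # only acceptable for the last chunk
            else:
                break  # newline in the middle: we must re-split
    else:
        return chunks  # fast path: hand back the input object unchanged

    # Pass 2: re-split, carrying any trailing partial line forward as ``tail``.
    lines: List[bytes] = []
    tail: Optional[bytes] = None
    for chunk in chunks:
        if tail is not None:
            chunk = tail + chunk
            tail = None
        if type(chunk) is not bytes:
            raise TypeError('chunk is not a string')
        start = 0
        while start < len(chunk):  # empty chunks contribute nothing
            newline = chunk.find(b'\n', start)
            if newline == -1:
                tail = chunk[start:]  # partial line: join it to the next chunk
                break
            lines.append(chunk[start:newline + 1])
            start = newline + 1
    if tail is not None:
        lines.append(tail)  # final line without a trailing newline
    return lines
```

The split mirrors the extension's design: the first pass only inspects the chunks and allocates nothing, so input that is already split into lines is returned after a single scan, and the more expensive re-split runs only when it is actually needed.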