# Copyright (C) 2008 Canonical Ltd
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
#

"""Pyrex extensions for converting chunks to lines."""

from __future__ import absolute_import


cdef extern from "python-compat.h":
    pass

from cpython.bytes cimport (
    PyBytes_CheckExact,
    PyBytes_FromStringAndSize,
    PyBytes_AS_STRING,
    PyBytes_GET_SIZE,
    )
from cpython.list cimport (
    PyList_Append,
    )

from libc.string cimport memchr


def chunks_to_lines(chunks):
    """Re-split chunks into simple lines.

    Each entry in the result contains a single newline, at the end, except
    for the last entry, which may lack a final newline. If chunks is already
    a simple list of lines, we return it directly.

    :param chunks: A list/tuple of byte strings.
    :return: A list of byte strings, or ``chunks`` itself (unchanged, and
        possibly a tuple) if it was already a list of lines.
    """
    cdef char *c_str
    cdef char *newline
    cdef char *c_last
    cdef Py_ssize_t the_len
    cdef int last_no_newline

    # Check to see if the chunks are already lines
    last_no_newline = 0
    for chunk in chunks:
        if last_no_newline:
            # We have a chunk which followed a chunk without a newline, so this
            # is not a simple list of lines.
            break
        # Switching from PyBytes_AsStringAndSize to PyBytes_CheckExact and
        # then the macros GET_SIZE and AS_STRING saved us 40us (of 470us).
        # It seems PyBytes_AsStringAndSize can actually trigger a conversion,
        # which we don't want anyway.
        if not PyBytes_CheckExact(chunk):
            raise TypeError('chunk is not a string')
        the_len = PyBytes_GET_SIZE(chunk)
        if the_len == 0:
            # An empty string is never a valid line
            break
        c_str = PyBytes_AS_STRING(chunk)
        c_last = c_str + the_len - 1
        newline = <char *>memchr(c_str, c'\n', the_len)
        if newline != c_last:
            if newline == NULL:
                # Missing a newline. Only valid as the last line
                last_no_newline = 1
            else:
                # There is a newline in the middle, so we must resplit
                break
    else:
        # Everything was already a list of lines
        return chunks

    # We know we need to create a new list of lines
    lines = []
    tail = None # Any remainder from the previous chunk
    for chunk in chunks:
        if tail is not None:
            chunk = tail + chunk
            tail = None
        if not PyBytes_CheckExact(chunk):
            raise TypeError('chunk is not a string')
        the_len = PyBytes_GET_SIZE(chunk)
        if the_len == 0:
            # An empty string is never a valid line, and we don't need to
            # append anything
            continue
        c_str = PyBytes_AS_STRING(chunk)
        c_last = c_str + the_len - 1
        newline = <char *>memchr(c_str, c'\n', the_len)
        if newline == c_last:
            # A simple line
            PyList_Append(lines, chunk)
        elif newline == NULL:
            # A chunk without a newline: hold it as the tail so it is joined
            # with the next chunk, or appended as-is if it is the last entry
            tail = chunk
        else:
            # We have a newline in the middle, so loop until we've consumed
            # all lines
            while newline != NULL:
                line = PyBytes_FromStringAndSize(c_str, newline - c_str + 1)
                PyList_Append(lines, line)
                c_str = newline + 1
                if c_str > c_last: # We are done
                    break
                the_len = c_last - c_str + 1
                newline = <char *>memchr(c_str, c'\n', the_len)
                if newline == NULL:
                    tail = PyBytes_FromStringAndSize(c_str, the_len)
                    break
    if tail is not None:
        PyList_Append(lines, tail)
    return lines
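

# Hypothetical helper, not part of the original module: a minimal
# pure-Python sketch of the same re-splitting contract, assuming only that
# every chunk is a bytes object. It is much slower than chunks_to_lines()
# because it joins all the chunks first. However, it is easy to read and can
# be used to cross-check the C-level loops above. Unlike chunks_to_lines(),
# it always builds a new list, even when chunks is already a list of lines.
# It also does not replicate the exact-bytes type check.
def _chunks_to_lines_reference(chunks):
    data = b''.join(chunks)  # raises TypeError for str chunks
    lines = []
    start = 0
    while True:
        # Find the next b'\n' at or after start.
        end = data.find(b'\n', start)
        if end == -1:
            break
        # Keep the newline on each line, as chunks_to_lines() does.
        lines.append(data[start:end + 1])
        start = end + 1
    if start < len(data):
        # Trailing bytes without a final newline become the last entry.
        lines.append(data[start:])
    return lines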