# Copyright (C) 2011 Canonical Ltd
#
# UTextWrapper._handle_long_word, UTextWrapper._wrap_chunks,
# UTextWrapper._fix_sentence_endings, wrap and fill are copied from Python's
# textwrap module (under PSF license) and modified to support CJK.
# Original Copyright for these functions:
#
# Copyright (C) 1999-2001 Gregory P. Ward.
# Copyright (C) 2002, 2003 Python Software Foundation.
#
# Written by Greg Ward <gward@python.net>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA

import textwrap
from unicodedata import east_asian_width as _eawidth

from . import osutils

__all__ = ["UTextWrapper", "fill", "wrap"]


class UTextWrapper(textwrap.TextWrapper):
    """
    Extend TextWrapper for Unicode.

    This text wrapper handles East Asian double-width characters and
    splits a word that contains double-width characters even when
    break_long_words is false.

    :param ambiguous_width: (keyword argument) width for a character when
                            unicodedata.east_asian_width(c) == 'A'
                            (default: 1)

    Limitations:
    * expand_tabs is not fixed. It uses len() to calculate the width
      of the string to the left of a TAB.
    * Handles one code unit as a single character of width 1 or 2.
      This is not correct when there are surrogate pairs, combining
      characters or zero-width characters.
    * Treats all Asian characters as line breakable. This is not
      true, because line breaking is prohibited around some characters.
      (For example, breaking before a punctuation mark is prohibited.)
      See UAX #14 "Unicode Line Breaking Algorithm".
    """

    def __init__(self, width=None, **kwargs):
        if width is None:
            width = (osutils.terminal_width() or
                     osutils.default_terminal_width) - 1

        ambi_width = kwargs.pop('ambiguous_width', 1)
        if ambi_width == 1:
            self._east_asian_doublewidth = 'FW'
        elif ambi_width == 2:
            self._east_asian_doublewidth = 'FWA'
        else:
            raise ValueError("ambiguous_width should be 1 or 2")

        self.max_lines = kwargs.get('max_lines', None)
        textwrap.TextWrapper.__init__(self, width, **kwargs)

    def _unicode_char_width(self, uc):
        """Return width of character `uc`.

        :param uc: Single unicode character.
        """
        # 'A' means the width of the character is ambiguous: it cannot be
        # determined from the character alone. Such characters count as
        # double width only when ambiguous_width=2 was requested.
        return (_eawidth(uc) in self._east_asian_doublewidth and 2) or 1

    def _width(self, s):
        """Return the display width of s.

        s is expected to be unicode. East Asian double-width characters
        count as two columns and every other character as one.
        """
        charwidth = self._unicode_char_width
        return sum(charwidth(c) for c in s)

    def _cut(self, s, width):
        """Return head and rest of s. (head + rest == s)

        head is the longest prefix of s with _width(head) <= width.
        """
        w = 0
        charwidth = self._unicode_char_width
        for pos, c in enumerate(s):
            w += charwidth(c)
            if w > width:
                return s[:pos], s[pos:]
        return s, u''

    def _fix_sentence_endings(self, chunks):
        """_fix_sentence_endings(chunks : [string])

        Correct for sentence endings buried in 'chunks'.  Eg. when the
        original text contains "... foo.\nBar ...", munge_whitespace()
        and split() will convert that to [..., "foo.", " ", "Bar", ...]
        which has one too few spaces; this method simply changes the one
        space to two.

        Note: This function is copied from textwrap.TextWrapper and
        modified to always use unicode.
        """
        i = 0
        L = len(chunks) - 1
        patsearch = self.sentence_end_re.search
        while i < L:
            if chunks[i + 1] == u" " and patsearch(chunks[i]):
                chunks[i + 1] = u"  "
                i += 2
            else:
                i += 1

    def _handle_long_word(self, chunks, cur_line, cur_len, width):
        # Figure out when indent is larger than the specified width, and make
        # sure at least one character is stripped off on every pass
        if width < 2:
            space_left = chunks[-1] and self._width(chunks[-1][0]) or 1
        else:
            space_left = width - cur_len

        # If we're allowed to break long words, then do so: put as much
        # of the next chunk onto the current line as will fit.
        if self.break_long_words:
            head, rest = self._cut(chunks[-1], space_left)
            cur_line.append(head)
            if rest:
                chunks[-1] = rest
            else:
                del chunks[-1]

        # Otherwise, we have to preserve the long word intact.  Only add
        # it to the current line if there's nothing already there --
        # that minimizes how much we violate the width constraint.
        elif not cur_line:
            cur_line.append(chunks.pop())

        # If we're not allowed to break long words, and there's already
        # text on the current line, do nothing.  Next time through the
        # main loop of _wrap_chunks(), we'll wind up here again, but
        # cur_len will be zero, so the next line will be entirely
        # devoted to the long word that we can't handle right now.

    def _wrap_chunks(self, chunks):
        lines = []
        if self.width <= 0:
            raise ValueError("invalid width %r (must be > 0)" % self.width)
        if self.max_lines is not None:
            if self.max_lines > 1:
                indent = self.subsequent_indent
            else:
                indent = self.initial_indent
            if self._width(indent) + self._width(self.placeholder.lstrip()) > self.width:
                raise ValueError("placeholder too large for max width")

        # Arrange in reverse order so items can be efficiently popped
        # from a stack of chunks.
        chunks.reverse()

        while chunks:

            # Start the list of chunks that will make up the current line.
            # cur_len is just the length of all the chunks in cur_line.
            cur_line = []
            cur_len = 0

            # Figure out which static string will prefix this line.
            if lines:
                indent = self.subsequent_indent
            else:
                indent = self.initial_indent

            # Maximum width for this line.
            width = self.width - len(indent)

            # First chunk on line is whitespace -- drop it, unless this
            # is the very beginning of the text (ie. no lines started yet).
            if self.drop_whitespace and chunks[-1].strip() == '' and lines:
                del chunks[-1]

            while chunks:
                # Use _width instead of len for east asian width
                l = self._width(chunks[-1])

                # Can at least squeeze this chunk onto the current line.
                if cur_len + l <= width:
                    cur_line.append(chunks.pop())
                    cur_len += l

                # Nope, this line is full.
                else:
                    break

            # The current line is full, and the next chunk is too big to
            # fit on *any* line (not just this one).
            if chunks and self._width(chunks[-1]) > width:
                self._handle_long_word(chunks, cur_line, cur_len, width)
                # Recompute with _width, not len, so that double-width
                # characters are counted as two columns.
                cur_len = sum(map(self._width, cur_line))

            # If the last chunk on this line is all whitespace, drop it.
            if self.drop_whitespace and cur_line and not cur_line[-1].strip():
                del cur_line[-1]

            # Store the current line, or truncate the output with the
            # placeholder once max_lines has been reached.
            if cur_line:
                if (self.max_lines is None or
                    len(lines) + 1 < self.max_lines or
                    (not chunks or
                     self.drop_whitespace and
                     len(chunks) == 1 and
                     not chunks[0].strip()) and cur_len <= width):
                    # Convert current line back to a string and store it in
                    # list of all lines (return value).
                    lines.append(indent + u''.join(cur_line))
                else:
                    while cur_line:
                        if (cur_line[-1].strip() and
                                cur_len + self._width(self.placeholder) <= width):
                            cur_line.append(self.placeholder)
                            lines.append(indent + ''.join(cur_line))
                            break
                        cur_len -= self._width(cur_line[-1])
                        del cur_line[-1]
                    else:
                        if lines:
                            prev_line = lines[-1].rstrip()
                            if (self._width(prev_line) + self._width(self.placeholder) <=
                                    self.width):
                                lines[-1] = prev_line + self.placeholder
                                break
                        lines.append(indent + self.placeholder.lstrip())
                    break

        return lines

    def _split(self, text):
        chunks = textwrap.TextWrapper._split(self, osutils.safe_unicode(text))
        # Make every double-width character a chunk of its own, so that a
        # line may break between any two of them.
        cjk_split_chunks = []
        for chunk in chunks:
            prev_pos = 0
            for pos, char in enumerate(chunk):
                if self._unicode_char_width(char) == 2:
                    if prev_pos < pos:
                        cjk_split_chunks.append(chunk[prev_pos:pos])
                    cjk_split_chunks.append(char)
                    prev_pos = pos + 1
            if prev_pos < len(chunk):
                cjk_split_chunks.append(chunk[prev_pos:])
        return cjk_split_chunks

    def wrap(self, text):
        # ensure text is unicode
        return textwrap.TextWrapper.wrap(self, osutils.safe_unicode(text))

# -- Convenience interface ---------------------------------------------


def wrap(text, width=None, **kwargs):
    """Wrap a single paragraph of text, returning a list of wrapped lines.

    Reformat the single paragraph in 'text' so it fits in lines of no
    more than 'width' columns, and return a list of wrapped lines.  By
    default, tabs in 'text' are expanded with string.expandtabs(), and
    all other whitespace characters (including newline) are converted to
    space.  See TextWrapper class for available keyword args to customize
    wrapping behaviour.
    """
    return UTextWrapper(width=width, **kwargs).wrap(text)
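

# Illustrative sketch, not part of this module: wrap() can break a line
# between two CJK characters because UTextWrapper._split turns every
# double-width character into a chunk of its own. The standalone function
# below restates that step. split_wide_chars is a hypothetical name, and
# ambiguous-width characters are treated as narrow (the ambiguous_width=1
# default).

```python
from unicodedata import east_asian_width


def split_wide_chars(chunks):
    """Return chunks with each double-width character as its own chunk."""
    result = []
    for chunk in chunks:
        start = 0
        for pos, char in enumerate(chunk):
            if east_asian_width(char) in 'FW':
                # Flush the narrow run collected so far, then the wide
                # character itself as a separate chunk.
                if start < pos:
                    result.append(chunk[start:pos])
                result.append(char)
                start = pos + 1
        if start < len(chunk):
            result.append(chunk[start:])
    return result
```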


def fill(text, width=None, **kwargs):
    """Fill a single paragraph of text, returning a new string.

    Reformat the single paragraph in 'text' to fit in lines of no more
    than 'width' columns, and return a new string containing the entire
    wrapped paragraph.  As with wrap(), tabs are expanded and other
    whitespace characters converted to space.  See TextWrapper class for
    available keyword args to customize wrapping behaviour.
    """
    return UTextWrapper(width=width, **kwargs).fill(text)
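

# Illustrative sketch, not part of this module: when width is None, wrap()
# and fill() fall back to the terminal width minus one column (see
# UTextWrapper.__init__). osutils is not importable outside the package, so
# this sketch assumes shutil.get_terminal_size() as a stand-in for
# osutils.terminal_width() and a hypothetical FALLBACK_COLUMNS in place of
# osutils.default_terminal_width.

```python
import shutil

# Hypothetical fallback; the real value is osutils.default_terminal_width.
FALLBACK_COLUMNS = 80


def default_wrap_width():
    """Return the terminal width minus one column."""
    size = shutil.get_terminal_size(fallback=(FALLBACK_COLUMNS, 24))
    # Mirror "(terminal_width() or default) - 1" from UTextWrapper.__init__.
    return (size.columns or FALLBACK_COLUMNS) - 1
```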