# Copyright (C) 2011 Canonical Ltd
#
# UTextWrapper._handle_long_word, UTextWrapper._wrap_chunks,
# UTextWrapper._fix_sentence_endings, wrap and fill are copied from Python's
# textwrap module (under PSF license) and modified to support CJK.
# Original Copyright for these functions:
#
# Copyright (C) 1999-2001 Gregory P. Ward.
# Copyright (C) 2002, 2003 Python Software Foundation.
#
# Written by Greg Ward <gward@python.net>
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA

from __future__ import absolute_import

import sys
import textwrap
from unicodedata import east_asian_width as _eawidth

from . import osutils
from .sixish import text_type

__all__ = ["UTextWrapper", "fill", "wrap"]

class UTextWrapper(textwrap.TextWrapper):
    """
    Extend TextWrapper for Unicode.

    This text wrapper handles East Asian double-width characters, and
    splits a word that contains double-width characters even when
    break_long_words is false.

    :param ambiguous_width: (keyword argument) width to use for a character
                            when unicodedata.east_asian_width(c) == 'A'
                            (default: 1)

    Limitations:
    * expand_tabs is not fixed. It uses len() to calculate the width
      of the string to the left of a TAB.
    * Handles one code unit as a single character of width 1 or 2.
      This is not correct when there are surrogate pairs, combining
      characters or zero-width characters.
    * Treats all East Asian characters as line breakable. This is not
      true, because line breaking is prohibited around some characters.
      (For example, breaking before a punctuation mark is prohibited.)
      See UAX #14, "Unicode Line Breaking Algorithm".
    """

    def __init__(self, width=None, **kwargs):
        if width is None:
            width = (osutils.terminal_width() or
                        osutils.default_terminal_width) - 1

        # 'F' (fullwidth) and 'W' (wide) always count as double width;
        # 'A' (ambiguous) counts as double width only when requested.
        ambi_width = kwargs.pop('ambiguous_width', 1)
        if ambi_width == 1:
            self._east_asian_doublewidth = 'FW'
        elif ambi_width == 2:
            self._east_asian_doublewidth = 'FWA'
        else:
            raise ValueError("ambiguous_width should be 1 or 2")

        textwrap.TextWrapper.__init__(self, width, **kwargs)

    def _unicode_char_width(self, uc):
        """Return width of character `uc`.

        :param uc: Single unicode character.
        """
        # 'A' means the width of the character cannot be determined.
        # It is counted as 2 only when ambiguous_width=2 was given.
        # Overestimating wraps lines earlier than needed, while
        # underestimating may let a line run past the terminal width.
        return (_eawidth(uc) in self._east_asian_doublewidth and 2) or 1

    def _width(self, s):
        """Return the display width of s.

        East Asian double-width characters count as 2, all others as 1.
        s is expected to be a unicode string; wrap() converts its input
        to text before any width is measured.
        """
        charwidth = self._unicode_char_width
        return sum(charwidth(c) for c in s)

    def _cut(self, s, width):
        """Return head and rest of s, where head + rest == s.

        head is the longest prefix of s such that _width(head) <= width.
        """
        w = 0
        charwidth = self._unicode_char_width
        for pos, c in enumerate(s):
            w += charwidth(c)
            if w > width:
                return s[:pos], s[pos:]
        return s, u''

    def _fix_sentence_endings(self, chunks):
        """_fix_sentence_endings(chunks : [string])

        Correct for sentence endings buried in 'chunks'.  Eg. when the
        original text contains "... foo.\nBar ...", munge_whitespace()
        and split() will convert that to [..., "foo.", " ", "Bar", ...]
        which has one too few spaces; this method simply changes the one
        space to two.

        Note: This function is copied from textwrap.TextWrapper and modified
        to always use unicode.
        """
        i = 0
        L = len(chunks)-1
        patsearch = self.sentence_end_re.search
        while i < L:
            if chunks[i+1] == u" " and patsearch(chunks[i]):
                chunks[i+1] = u"  "
                i += 2
            else:
                i += 1

    def _handle_long_word(self, chunks, cur_line, cur_len, width):
        # Figure out when indent is larger than the specified width, and make
        # sure at least one character is stripped off on every pass
        if width < 2:
            space_left = chunks[-1] and self._width(chunks[-1][0]) or 1
        else:
            space_left = width - cur_len

        # If we're allowed to break long words, then do so: put as much
        # of the next chunk onto the current line as will fit.
        if self.break_long_words:
            head, rest = self._cut(chunks[-1], space_left)
            cur_line.append(head)
            if rest:
                chunks[-1] = rest
            else:
                del chunks[-1]

        # Otherwise, we have to preserve the long word intact.  Only add
        # it to the current line if there's nothing already there --
        # that minimizes how much we violate the width constraint.
        elif not cur_line:
            cur_line.append(chunks.pop())

        # If we're not allowed to break long words, and there's already
        # text on the current line, do nothing.  Next time through the
        # main loop of _wrap_chunks(), we'll wind up here again, but
        # cur_len will be zero, so the next line will be entirely
        # devoted to the long word that we can't handle right now.

    def _wrap_chunks(self, chunks):
        lines = []
        if self.width <= 0:
            raise ValueError("invalid width %r (must be > 0)" % self.width)

        # Arrange in reverse order so items can be efficiently popped
        # from a stack of chunks.
        chunks.reverse()

        while chunks:

            # Start the list of chunks that will make up the current line.
            # cur_len is the total display width of the chunks in cur_line.
            cur_line = []
            cur_len = 0

            # Figure out which static string will prefix this line.
            if lines:
                indent = self.subsequent_indent
            else:
                indent = self.initial_indent

            # Maximum width for this line.
            width = self.width - len(indent)

            # First chunk on line is whitespace -- drop it, unless this
            # is the very beginning of the text (ie. no lines started yet).
            if self.drop_whitespace and chunks[-1].strip() == '' and lines:
                del chunks[-1]

            while chunks:
                # Use _width instead of len for east asian width
                l = self._width(chunks[-1])

                # Can at least squeeze this chunk onto the current line.
                if cur_len + l <= width:
                    cur_line.append(chunks.pop())
                    cur_len += l

                # Nope, this line is full.
                else:
                    break

            # The current line is full, and the next chunk is too big to
            # fit on *any* line (not just this one).
            if chunks and self._width(chunks[-1]) > width:
                self._handle_long_word(chunks, cur_line, cur_len, width)

            # If the last chunk on this line is all whitespace, drop it.
            if self.drop_whitespace and cur_line and not cur_line[-1].strip():
                del cur_line[-1]

            # Convert current line back to a string and store it in list
            # of all lines (return value).
            if cur_line:
                lines.append(indent + u''.join(cur_line))

        return lines

    def _split(self, text):
        chunks = textwrap.TextWrapper._split(self, text)
        cjk_split_chunks = []
        for chunk in chunks:
            prev_pos = 0
            for pos, char in enumerate(chunk):
                if self._unicode_char_width(char) == 2:
                    if prev_pos < pos:
                        cjk_split_chunks.append(chunk[prev_pos:pos])
                    cjk_split_chunks.append(char)
                    prev_pos = pos+1
            if prev_pos < len(chunk):
                cjk_split_chunks.append(chunk[prev_pos:])
        return cjk_split_chunks

    def wrap(self, text):
        # ensure text is unicode
        return textwrap.TextWrapper.wrap(self, text_type(text))

# -- Convenience interface ---------------------------------------------

def wrap(text, width=None, **kwargs):
    """Wrap a single paragraph of text, returning a list of wrapped lines.

    Reformat the single paragraph in 'text' so it fits in lines of no
    more than 'width' columns, and return a list of wrapped lines.  By
    default, tabs in 'text' are expanded with string.expandtabs(), and
    all other whitespace characters (including newline) are converted to
    space.  See TextWrapper class for available keyword args to customize
    wrapping behaviour.
    """
    return UTextWrapper(width=width, **kwargs).wrap(text)

def fill(text, width=None, **kwargs):
    """Fill a single paragraph of text, returning a new string.

    Reformat the single paragraph in 'text' to fit in lines of no more
    than 'width' columns, and return a new string containing the entire
    wrapped paragraph.  As with wrap(), tabs are expanded and other
    whitespace characters converted to space.  See TextWrapper class for
    available keyword args to customize wrapping behaviour.
    """
    return UTextWrapper(width=width, **kwargs).fill(text)
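
# -- Illustrative sketch -------------------------------------------------
#
# The width rule used by UTextWrapper can be tried on its own.  What
# follows is a minimal sketch, not part of this module's API: it assumes
# only the standard library, and the names _sketch_char_width,
# _sketch_width and _sketch_cut are hypothetical helpers introduced here
# for illustration.  They mirror _unicode_char_width, _width and _cut
# above, and share the same limitations (no handling of combining or
# zero-width characters).

import unicodedata


def _sketch_char_width(char, ambiguous_width=1):
    """Return 2 for a double-width character, otherwise 1."""
    # 'F' (fullwidth) and 'W' (wide) always count as 2; 'A' (ambiguous)
    # counts as 2 only when ambiguous_width is 2.
    double = 'FWA' if ambiguous_width == 2 else 'FW'
    return 2 if unicodedata.east_asian_width(char) in double else 1


def _sketch_width(text, ambiguous_width=1):
    """Return the number of terminal columns that text would occupy."""
    return sum(_sketch_char_width(c, ambiguous_width) for c in text)


def _sketch_cut(text, width):
    """Split text into (head, rest) with _sketch_width(head) <= width."""
    used = 0
    for pos, char in enumerate(text):
        used += _sketch_char_width(char)
        if used > width:
            return text[:pos], text[pos:]
    return text, u''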