To get this branch, use:
bzr branch http://gegoxaren.bato24.eu/bzr/brz/remove-bazaar

# Copyright (C) 2006 Canonical Ltd
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA

"""http/https transport using pycurl"""

# TODO: test reporting of http errors
#
# TODO: Transport option to control caching of particular requests; broadly we
# would want to offer "caching allowed" or "must revalidate", depending on
# whether we expect a particular file will be modified after it's committed.
# It's probably safer to just always revalidate.  mbp 20060321

# TODO: Some refactoring could be done to avoid the strange idiom
# used to capture data and headers while setting up the request
# (and having to pass 'header' to _curl_perform to handle
# redirections). This could be achieved by creating a
# specialized Curl object and returning code, headers and data
# from _curl_perform.  Not done because we may deprecate pycurl in the
# future -- vila 20070212

import os
from cStringIO import StringIO
import sys

from bzrlib import (
    errors,
    __version__ as bzrlib_version,
    )
import bzrlib
from bzrlib.errors import (NoSuchFile,
                           ConnectionError,
                           DependencyNotPresent)
from bzrlib.trace import mutter
from bzrlib.transport import register_urlparse_netloc_protocol
from bzrlib.transport.http import (
    ca_bundle,
    _extract_headers,
    HttpTransportBase,
    _pycurl_errors,
    response,
    )

try:
    import pycurl
except ImportError, e:
    mutter("failed to import pycurl: %s", e)
    raise DependencyNotPresent('pycurl', e)

try:
    # see if we can actually initialize PyCurl - sometimes it will load but
    # fail to start up due to this bug:
    #
    #   32. (At least on Windows) If libcurl is built with c-ares and there's
    #   no DNS server configured in the system, the ares_init() call fails and
    #   thus curl_easy_init() fails as well. This causes weird effects for
    #   people who use numerical IP addresses only.
    #
    # reported by Alexander Belchenko, 2006-04-26
    pycurl.Curl()
except pycurl.error, e:
    mutter("failed to initialize pycurl: %s", e)
    raise DependencyNotPresent('pycurl', e)


register_urlparse_netloc_protocol('http+pycurl')


class PyCurlTransport(HttpTransportBase):
    """http client transport using pycurl

    PyCurl is a Python binding to the C "curl" multiprotocol client.

    This transport can be significantly faster than the builtin
    Python client.  Advantages include DNS caching.
    """

    def __init__(self, base, from_transport=None):
        super(PyCurlTransport, self).__init__(base)
        if base.startswith('https'):
            # Check that https is among the protocols supported by this
            # pycurl build (element 8 of version_info() is that list)
            supported = pycurl.version_info()[8]
            if 'https' not in supported:
                raise DependencyNotPresent('pycurl', 'no https support')
        self.cabundle = ca_bundle.get_ca_path()
        if from_transport is not None:
            self._curl = from_transport._curl
        else:
            mutter('using pycurl %s' % pycurl.version)
            self._curl = pycurl.Curl()

    def should_cache(self):
        """Return True if the data pulled across should be cached locally.
        """
        return True

    def has(self, relpath):
        """See Transport.has()"""
        # _setup_get_request resets NOBODY to 0 before every GET, so it
        # is safe to re-use the shared curl object for this HEAD request
        curl = self._curl
        abspath = self._real_abspath(relpath)
        curl.setopt(pycurl.URL, abspath)
        self._set_curl_options(curl)
        curl.setopt(pycurl.HTTPGET, 1)
        # don't want the body - i.e. just do a HEAD request
        # This means "NO BODY" not 'nobody'
        curl.setopt(pycurl.NOBODY, 1)
        # But we need headers to handle redirections
        header = StringIO()
        curl.setopt(pycurl.HEADERFUNCTION, header.write)
        # In some erroneous cases, pycurl will emit text on
        # stdout if we don't catch it (see InvalidStatus tests
        # for one such occurrence).
        blackhole = StringIO()
        curl.setopt(pycurl.WRITEFUNCTION, blackhole.write)
        self._curl_perform(curl, header)
        code = curl.getinfo(pycurl.HTTP_CODE)
        if code == 404: # not found
            return False
        elif code == 200: # "ok"
            return True
        else:
            self._raise_curl_http_error(curl)

    def _get(self, relpath, ranges, tail_amount=0):
        # This just switches based on the type of request
        if ranges is not None or tail_amount not in (0, None):
            return self._get_ranged(relpath, ranges, tail_amount=tail_amount)
        else:
            return self._get_full(relpath)

    def _setup_get_request(self, curl, relpath):
        # Make sure we do a GET request. versions > 7.14.1 also set the
        # NO BODY flag, but we'll do it ourselves in case it is an older
        # pycurl version
        curl.setopt(pycurl.NOBODY, 0)
        curl.setopt(pycurl.HTTPGET, 1)
        return self._setup_request(curl, relpath)

    def _setup_request(self, curl, relpath):
        """Do the common setup stuff for making a request

        :param curl: The curl object to place the request on
        :param relpath: The relative path that we want to get
        :return: (abspath, data, header)
                 abspath: full url
                 data: file that will be filled with the body
                 header: file that will be filled with the headers
        """
        abspath = self._real_abspath(relpath)
        curl.setopt(pycurl.URL, abspath)
        self._set_curl_options(curl)

        data = StringIO()
        header = StringIO()
        curl.setopt(pycurl.WRITEFUNCTION, data.write)
        curl.setopt(pycurl.HEADERFUNCTION, header.write)

        return abspath, data, header

    def _get_full(self, relpath):
        """Make a request for the entire file"""
        curl = self._curl
        abspath, data, header = self._setup_get_request(curl, relpath)
        self._curl_perform(curl, header)

        code = curl.getinfo(pycurl.HTTP_CODE)
        data.seek(0)

        if code == 404:
            raise NoSuchFile(abspath)
        if code != 200:
            self._raise_curl_http_error(
                curl, 'expected 200 or 404 for full response.')

        return code, data

    def _get_ranged(self, relpath, ranges, tail_amount):
        """Make a request for just part of the file."""
        curl = self._curl
        abspath, data, header = self._setup_get_request(curl, relpath)

        range_header = self.attempted_range_header(ranges, tail_amount)
        if range_header is None:
            # Forget ranges, the server can't handle them
            return self._get_full(relpath)

        # Send the value computed by attempted_range_header() so that the
        # request matches what the server is expected to accept
        self._curl_perform(curl, header, ['Range: bytes=%s' % range_header])
        data.seek(0)

        code = curl.getinfo(pycurl.HTTP_CODE)
        # mutter('header:\n%r', header.getvalue())
        headers = _extract_headers(header.getvalue(), abspath)
        # handle_response raises NoSuchFile, etc., based on the response code
        return code, response.handle_response(abspath, code, headers, data)

    def _post(self, body_bytes):
        fake_file = StringIO(body_bytes)
        curl = self._curl
        # Other places that use self._curl for GET requests explicitly set
        # HTTPGET, so it should be safe to re-use the same object for both GETs
        # and POSTs.
        curl.setopt(pycurl.POST, 1)
        curl.setopt(pycurl.POSTFIELDSIZE, len(body_bytes))
        curl.setopt(pycurl.READFUNCTION, fake_file.read)
        abspath, data, header = self._setup_request(curl, '.bzr/smart')
        # We override the Expect: header so that pycurl will send the POST
        # body immediately.
        self._curl_perform(curl, header, ['Expect: '])
        data.seek(0)
        code = curl.getinfo(pycurl.HTTP_CODE)
        headers = _extract_headers(header.getvalue(), abspath)
        return code, response.handle_response(abspath, code, headers, data)

    def _raise_curl_http_error(self, curl, info=None):
        code = curl.getinfo(pycurl.HTTP_CODE)
        url = curl.getinfo(pycurl.EFFECTIVE_URL)
        # Some error codes can be handled the same way for all
        # requests
        if code == 403:
            raise errors.TransportError(
                'Server refuses to fulfill the request for: %s' % url)
        else:
            if info is None:
                msg = ''
            else:
                msg = ': ' + info
            raise errors.InvalidHttpResponse(
                url, 'Unable to handle http code %d%s' % (code, msg))

    def _set_curl_options(self, curl):
        """Set options for all requests"""
        ## curl.setopt(pycurl.VERBOSE, 1)
        # TODO: maybe include a summary of the pycurl version
        ua_str = 'bzr/%s (pycurl)' % (bzrlib.__version__,)
        curl.setopt(pycurl.USERAGENT, ua_str)
        if self.cabundle:
            curl.setopt(pycurl.CAINFO, self.cabundle)

    def _curl_perform(self, curl, header, more_headers=[]):
        """Perform curl operation and translate exceptions."""
        try:
            # There's no way in http/1.0 to say "must
            # revalidate"; we don't want to force it to always
            # retrieve.  So just turn off the default Pragma
            # provided by Curl.
            headers = ['Cache-control: max-age=0',
                       'Pragma: no-cache',
                       'Connection: Keep-Alive']
            curl.setopt(pycurl.HTTPHEADER, headers + more_headers)
            curl.perform()
        except pycurl.error, e:
            url = curl.getinfo(pycurl.EFFECTIVE_URL)
            mutter('got pycurl error: %s, %s, %s, url: %s ',
                    e[0], _pycurl_errors.errorcode[e[0]], e, url)
            if e[0] in (_pycurl_errors.CURLE_COULDNT_RESOLVE_HOST,
                        _pycurl_errors.CURLE_COULDNT_CONNECT,
                        _pycurl_errors.CURLE_GOT_NOTHING,
                        _pycurl_errors.CURLE_COULDNT_RESOLVE_PROXY):
                raise ConnectionError('curl connection error (%s)\non %s'
                              % (e[1], url))
            elif e[0] == _pycurl_errors.CURLE_PARTIAL_FILE:
                # PyCurl itself has detected a short read.  We do
                # not have all the information for the
                # ShortReadvError, but that should be enough
                raise errors.ShortReadvError(url,
                                             offset='unknown', length='unknown',
                                             actual='unknown',
                                             extra='Server aborted the request')
            # jam 20060713 The code didn't use to re-raise the exception here,
            # but that seemed bogus
            raise
        code = curl.getinfo(pycurl.HTTP_CODE)
        if code in (301, 302, 303, 307):
            url = curl.getinfo(pycurl.EFFECTIVE_URL)
            headers = _extract_headers(header.getvalue(), url)
            redirected_to = headers['Location']
            raise errors.RedirectRequested(url,
                                           redirected_to,
                                           is_permament=(code == 301),
                                           qual_proto=self._qualified_proto)


def get_test_permutations():
    """Return the permutations to be used in testing."""
    from bzrlib.tests.HttpServer import HttpServer_PyCurl
    return [(PyCurlTransport, HttpServer_PyCurl),
            ]
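
The TODO near the top of the module describes the idiom used here. Headers are collected through HEADERFUNCTION into a StringIO and parsed afterwards with `_extract_headers`. The following is a minimal, self-contained sketch of such a parsing step. It assumes the captured text holds one or more CRLF-terminated header blocks and that only the last block (the final response) matters. `parse_last_header_block` is a hypothetical name. It is not bzrlib's `_extract_headers`, whose exact behaviour is not shown in this file. The sketch is written for Python 3 and works on text, so header bytes would need decoding first (ISO-8859-1 is the usual choice for HTTP headers).

```python
def parse_last_header_block(raw):
    """Return (status_code, headers) for the final response in `raw`.

    Hypothetical helper.  `raw` is everything a header callback collected,
    which may hold several blocks (e.g. "100 Continue" before the real
    reply).  Header names are lower-cased; later duplicates overwrite
    earlier ones, and folded (multi-line) headers are not supported.
    """
    # Normalise line endings, then split on the blank line ending each block.
    text = raw.replace('\r\n', '\n')
    blocks = [block.strip() for block in text.split('\n\n') if block.strip()]
    if not blocks:
        raise ValueError('no header block found')
    lines = blocks[-1].split('\n')
    # The status line looks like "HTTP/1.1 200 OK"; the code is field 2.
    status_code = int(lines[0].split()[1])
    headers = {}
    for line in lines[1:]:
        # Split at the first colon only, so URLs in values stay intact.
        name, _, value = line.partition(':')
        headers[name.strip().lower()] = value.strip()
    return status_code, headers


raw = ('HTTP/1.1 100 Continue\r\n\r\n'
       'HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n'
       'Content-Length: 3\r\n\r\n')
print(parse_last_header_block(raw))
```

Keeping only the last block means an interim `100 Continue` (relevant to the POST path, which also clears `Expect:`) does not mask the real status line.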
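
`_get_ranged` sends a `Range: bytes=...` header whose value comes from `attempted_range_header()`, inherited from HttpTransportBase and not defined in this file. The sketch below shows one way such a value could be assembled. It assumes `ranges` is a list of inclusive (start, end) byte offsets and that `tail_amount` maps to an HTTP suffix range (`-N`, the last N bytes), as in the byte-range syntax of RFC 7233. Both helpers are hypothetical. `coalesce_ranges` only illustrates a single-range fallback for servers that reject multi-range requests; it is not a description of what bzrlib does.

```python
def format_range_header(ranges, tail_amount=0):
    """Build the value that follows 'bytes=' in an HTTP Range header.

    Hypothetical helper, assuming `ranges` holds (start, end) pairs with
    inclusive offsets.  A non-zero `tail_amount` adds a suffix range.
    """
    parts = ['%d-%d' % (start, end) for start, end in ranges]
    if tail_amount:
        # A suffix range "-N" asks for the final N bytes of the resource.
        parts.append('-%d' % tail_amount)
    return ','.join(parts)


def coalesce_ranges(ranges):
    """Collapse one or more ranges into a single covering range.

    Hypothetical fallback: fetches the gaps too, but needs only one range.
    """
    start = min(s for s, _ in ranges)
    end = max(e for _, e in ranges)
    return [(start, end)]


print('Range: bytes=' + format_range_header([(0, 9), (20, 29)], 100))
print('Range: bytes=' + format_range_header(coalesce_ranges([(0, 9), (20, 29)])))
```

Coalescing trades bandwidth (the gap between ranges is downloaded) for compatibility with servers that only honour a single range.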
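
Interpreting a ranged reply is delegated to `response.handle_response`, which is not part of this file. For a single-part `206 Partial Content` reply, the server describes the returned slice in a `Content-Range` header of the form `bytes start-end/total`, where `total` may be `*` if unknown (RFC 7233). The sketch below is an illustrative parser for that form only. `parse_content_range` is a hypothetical name, and `multipart/byteranges` bodies, where each part carries its own `Content-Range`, are out of scope.

```python
import re

# "bytes 0-9/100", or "bytes 0-9/*" when the total length is unknown.
_CONTENT_RANGE = re.compile(r'^bytes (\d+)-(\d+)/(\d+|\*)$')


def parse_content_range(value):
    """Return (start, end, total) from a Content-Range header value.

    `total` is None when the server sends '*'.  Raises ValueError for
    anything other than a satisfied byte range.
    """
    match = _CONTENT_RANGE.match(value.strip())
    if match is None:
        raise ValueError('unsupported Content-Range: %r' % (value,))
    start, end = int(match.group(1)), int(match.group(2))
    total = None if match.group(3) == '*' else int(match.group(3))
    if end < start:
        raise ValueError('invalid range %d-%d' % (start, end))
    return start, end, total


print(parse_content_range('bytes 0-9/100'))
```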
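
`_curl_perform` turns 301, 302, 303 and 307 responses into `RedirectRequested`, marks only 301 as permanent, and reads the target from the `Location` header. The sketch below restates that decision as a pure function so it can be tested without a server. Two things are additions of this sketch and are not done by the module: resolving a relative `Location` against the effective URL with `urllib.parse.urljoin` (Python 3), and expecting lower-cased header names. `redirect_target` is a hypothetical name.

```python
from urllib.parse import urljoin

# Status codes the module treats as redirections.
REDIRECT_CODES = (301, 302, 303, 307)


def redirect_target(code, effective_url, headers):
    """Return (target_url, is_permanent) for a redirect, else None.

    `headers` maps lower-cased header names to values (an assumption of
    this sketch).  A relative Location is resolved against the URL that
    produced the response.
    """
    if code not in REDIRECT_CODES:
        return None
    location = headers.get('location')
    if location is None:
        raise ValueError('redirect %d without a Location header' % code)
    # Only 301 is treated as permanent, mirroring the module's choice.
    return urljoin(effective_url, location), code == 301


print(redirect_target(301, 'http://example.com/a/b', {'location': '/c'}))
```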
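
The `except pycurl.error` branch of `_curl_perform` sorts libcurl error codes into three outcomes. Host, connect, proxy-resolution and "got nothing" failures become `ConnectionError`. A partial transfer becomes `ShortReadvError`. Everything else is re-raised. The table-driven sketch below mirrors that split without needing pycurl. The numeric values are those defined in libcurl's `curl.h`; the module reads them from `_pycurl_errors`, so check them against the libcurl headers in use. `classify_curl_error` and its return labels are hypothetical.

```python
# libcurl error codes (curl/curl.h) that the pycurl transport handles.
CURLE_COULDNT_RESOLVE_PROXY = 5
CURLE_COULDNT_RESOLVE_HOST = 6
CURLE_COULDNT_CONNECT = 7
CURLE_PARTIAL_FILE = 18
CURLE_GOT_NOTHING = 52

# Failures reported as a connection problem rather than a protocol error.
CONNECTION_ERRORS = frozenset([
    CURLE_COULDNT_RESOLVE_HOST,
    CURLE_COULDNT_CONNECT,
    CURLE_GOT_NOTHING,
    CURLE_COULDNT_RESOLVE_PROXY,
])


def classify_curl_error(code):
    """Map a libcurl error code to the kind of failure to report.

    Returns 'connection', 'short-read', or 'other' (to be re-raised).
    """
    if code in CONNECTION_ERRORS:
        return 'connection'
    if code == CURLE_PARTIAL_FILE:
        return 'short-read'
    return 'other'


print(classify_curl_error(CURLE_COULDNT_CONNECT))
```

Keeping the codes in a set makes it easy to extend the "connection" category without growing the `if` chain.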