annotate mercurial/hgweb/protocol.py @ 29792:58467204cac0

hgweb: tweak zlib chunking behavior

When doing streaming compression with zlib, zlib appears to emit output only after roughly 20-30kb of input has accumulated, on average. In other words, most calls to compress() return an empty string. On the mozilla-unified repo, only 48,433 of 921,167 calls to compress() (5.26%) returned data. This means we were sending hundreds of thousands of empty chunks through a generator, where they touched who knows how many frames (my guess is millions). Filtering out the empty chunks from the generator cuts down on overhead.

In addition, we were previously feeding 8kb chunks into zlib compression. Since this function tends to emit *compressed* data only after 20-30kb is available, it would take several calls before data was produced. We increase the amount of data fed in at a time to 32kb. This reduces the number of calls to compress() from 921,167 to 115,146. It also reduces the number of output chunks from 48,433 to 31,377. This slightly increases the average output chunk size, but I don't think this will matter in most scenarios.

The combination of these two changes appears to shave ~6s of CPU time, or ~3%, from a server serving the mozilla-unified repo.
author Gregory Szorc <gregory.szorc@gmail.com>
date Sun, 14 Aug 2016 21:29:46 -0700
parents b1809f5d7630
children d34cf260d15b
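The behaviour described in the commit message can be observed with Python's standard `zlib` module. This is a minimal illustration, not Mercurial code. `count_compress_calls` is a hypothetical helper. The synthetic, repetitive payload stands in for a real changegroup, so the counts it prints will differ from the mozilla-unified figures quoted above.

```python
import zlib


def count_compress_calls(data, chunk_size, level=-1):
    """Feed `data` to one streaming compressor, `chunk_size` bytes at a time.

    Returns (calls, nonempty, stream): the number of compress() calls, how
    many of them returned any bytes, and the complete zlib stream.
    """
    z = zlib.compressobj(level)
    calls = nonempty = 0
    out = []
    for start in range(0, len(data), chunk_size):
        piece = z.compress(data[start:start + chunk_size])
        calls += 1
        if piece:  # many calls return b'' while zlib buffers input
            nonempty += 1
            out.append(piece)
    out.append(z.flush())  # emit buffered data plus the stream trailer
    return calls, nonempty, b"".join(out)


if __name__ == "__main__":
    # Hypothetical input standing in for changegroup data (about 1.1 MB).
    payload = b"".join(b"%08d some changeset text\n" % i for i in range(40000))
    for size in (8192, 32768):
        calls, nonempty, stream = count_compress_calls(payload, size)
        assert zlib.decompress(stream) == payload
        print(f"{size:>6} bytes per call: {calls} calls, {nonempty} returned data")
```

Larger input pieces mean fewer compress() calls for the same payload. Whether a given call returns data depends on how much zlib has buffered internally.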
rev   line source
5598
d534ba1c4eb4 separate the wire protocol commands from the user interface commands
Dirkjan Ochtman <dirkjan@ochtman.nl>
1 #
2 # Copyright 21 May 2005 - (c) 2005 Jake Edge <jake@edge2.net>
3 # Copyright 2005-2007 Matt Mackall <mpm@selenic.com>
4 #
8225
46293a0c7e9f updated license to be explicit about GPL version 2
Martin Geisler <mg@lazybytes.net>
parents: 8109
5 # This software may be used and distributed according to the terms of the
10263
25e572394f5c Update license to GPLv2+
Matt Mackall <mpm@selenic.com>
parents: 9713
6 # GNU General Public License version 2 or any later version.
5598
d534ba1c4eb4 separate the wire protocol commands from the user interface commands
Dirkjan Ochtman <dirkjan@ochtman.nl>
7
27046
37fcfe52c68c hgweb: use absolute_import
Yuya Nishihara <yuya@tcha.org>
parents: 20903
8 from __future__ import absolute_import
9
10 import cgi
11 import zlib
12
13 from .common import (
14     HTTP_OK,
15 )
16
17 from .. import (
18     util,
19     wireproto,
20 )
28861
86db5cb55d46 pycompat: switch to util.stringio for py3 compat
timeless <timeless@mozdev.org>
parents: 28530
21 stringio = util.stringio
5963
5be210afe1b8 hgweb: explicitly check if requested command exists
Dirkjan Ochtman <dirkjan@ochtman.nl>
parents: 5915
22
28883
032c4c2f802a pycompat: switch to util.urlreq/util.urlerr for py3 compat
timeless <timeless@mozdev.org>
parents: 28861
23 urlerr = util.urlerr
24 urlreq = util.urlreq
25
5993
948a41e77902 hgweb: explicit response status
Dirkjan Ochtman <dirkjan@ochtman.nl>
parents: 5963
26 HGTYPE = 'application/mercurial-0.1'
15017
f4522df38c65 wireproto: add out-of-band error class to allow remote repo to report errors
Andrew Pritchard <andrewp@fogcreek.com>
parents: 14614
27 HGERRTYPE = 'application/hg-error'
11595
368cd5325348 protocol: move hgweb protocol support back into protocol.py
Matt Mackall <mpm@selenic.com>
parents: 11594
28
20903
8d477543882b wireproto: introduce an abstractserverproto class
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 18352
29 class webproto(wireproto.abstractserverproto):
14614
afccc64eea73 ui: use I/O descriptors internally
Idan Kamara <idankk86@gmail.com>
parents: 14494
30     def __init__(self, req, ui):
11595
368cd5325348 protocol: move hgweb protocol support back into protocol.py
Matt Mackall <mpm@selenic.com>
parents: 11594
31         self.req = req
32         self.response = ''
14614
afccc64eea73 ui: use I/O descriptors internally
Idan Kamara <idankk86@gmail.com>
parents: 14494
33         self.ui = ui
11595
368cd5325348 protocol: move hgweb protocol support back into protocol.py
Matt Mackall <mpm@selenic.com>
parents: 11594
diff changeset
34 def getargs(self, args):
14093
ce99d887585f httprepo: long arguments support (issue2126)
Steven Brown <StevenGBrown@gmail.com>
parents: 13721
diff changeset
35 knownargs = self._args()
11595
368cd5325348 protocol: move hgweb protocol support back into protocol.py
Matt Mackall <mpm@selenic.com>
parents: 11594
diff changeset
36 data = {}
368cd5325348 protocol: move hgweb protocol support back into protocol.py
Matt Mackall <mpm@selenic.com>
parents: 11594
diff changeset
37 keys = args.split()
368cd5325348 protocol: move hgweb protocol support back into protocol.py
Matt Mackall <mpm@selenic.com>
parents: 11594
diff changeset
38 for k in keys:
368cd5325348 protocol: move hgweb protocol support back into protocol.py
Matt Mackall <mpm@selenic.com>
parents: 11594
diff changeset
39 if k == '*':
368cd5325348 protocol: move hgweb protocol support back into protocol.py
Matt Mackall <mpm@selenic.com>
parents: 11594
diff changeset
40 star = {}
14093
ce99d887585f httprepo: long arguments support (issue2126)
Steven Brown <StevenGBrown@gmail.com>
parents: 13721
diff changeset
41 for key in knownargs.keys():
13721
3458c15ab2f0 wireproto: fix handling of '*' args for HTTP and SSH
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 12704
diff changeset
42 if key != 'cmd' and key not in keys:
14093
ce99d887585f httprepo: long arguments support (issue2126)
Steven Brown <StevenGBrown@gmail.com>
parents: 13721
diff changeset
43 star[key] = knownargs[key][0]
11595
368cd5325348 protocol: move hgweb protocol support back into protocol.py
Matt Mackall <mpm@selenic.com>
parents: 11594
diff changeset
44 data['*'] = star
368cd5325348 protocol: move hgweb protocol support back into protocol.py
Matt Mackall <mpm@selenic.com>
parents: 11594
diff changeset
45 else:
14093
ce99d887585f httprepo: long arguments support (issue2126)
Steven Brown <StevenGBrown@gmail.com>
parents: 13721
diff changeset
46 data[k] = knownargs[k][0]
11595
368cd5325348 protocol: move hgweb protocol support back into protocol.py
Matt Mackall <mpm@selenic.com>
parents: 11594
diff changeset
47 return [data[k] for k in keys]
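The `'*'` branch of getargs() is easiest to follow with concrete input. This standalone sketch applies the same resolution logic to a plain dictionary shaped like `cgi.parse_qs` output, where each name maps to a list of values. `resolve_args` and the sample arguments are hypothetical.

```python
def resolve_args(spec, known):
    """Return argument values in the order named by `spec`.

    `known` maps each argument name to a list of values. A '*' entry in
    `spec` becomes a dict of every other argument except 'cmd' and the
    names that `spec` already lists, keeping only the first value of each.
    """
    keys = spec.split()
    data = {}
    for k in keys:
        if k == '*':
            data['*'] = {key: values[0] for key, values in known.items()
                         if key != 'cmd' and key not in keys}
        else:
            data[k] = known[k][0]
    return [data[k] for k in keys]


if __name__ == "__main__":
    form = {'cmd': ['getbundle'], 'heads': ['abc'], 'common': ['def']}
    print(resolve_args('heads *', form))  # ['abc', {'common': 'def'}]
```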
14093
ce99d887585f httprepo: long arguments support (issue2126)
Steven Brown <StevenGBrown@gmail.com>
parents: 13721
diff changeset
48 def _args(self):
ce99d887585f httprepo: long arguments support (issue2126)
Steven Brown <StevenGBrown@gmail.com>
parents: 13721
diff changeset
49 args = self.req.form.copy()
28530
fd2acc5046f6 http: support sending hgargs via POST body instead of in GET or headers
Augie Fackler <augie@google.com>
parents: 27046
diff changeset
50 postlen = int(self.req.env.get('HTTP_X_HGARGS_POST', 0))
fd2acc5046f6 http: support sending hgargs via POST body instead of in GET or headers
Augie Fackler <augie@google.com>
parents: 27046
diff changeset
51 if postlen:
fd2acc5046f6 http: support sending hgargs via POST body instead of in GET or headers
Augie Fackler <augie@google.com>
parents: 27046
diff changeset
52 args.update(cgi.parse_qs(
fd2acc5046f6 http: support sending hgargs via POST body instead of in GET or headers
Augie Fackler <augie@google.com>
parents: 27046
diff changeset
53 self.req.read(postlen), keep_blank_values=True))
fd2acc5046f6 http: support sending hgargs via POST body instead of in GET or headers
Augie Fackler <augie@google.com>
parents: 27046
diff changeset
54 return args
14093
ce99d887585f httprepo: long arguments support (issue2126)
Steven Brown <StevenGBrown@gmail.com>
parents: 13721
diff changeset
55 chunks = []
14094
d10c6835497e http: minor tweaks to long arg handling
Matt Mackall <mpm@selenic.com>
parents: 14093
diff changeset
56 i = 1
14494
1ffeeb91c55d check-code: flag 0/1 used as constant Boolean expression
Martin Geisler <mg@lazybytes.net>
parents: 14094
diff changeset
57 while True:
14094
d10c6835497e http: minor tweaks to long arg handling
Matt Mackall <mpm@selenic.com>
parents: 14093
diff changeset
58 h = self.req.env.get('HTTP_X_HGARG_' + str(i))
14093
ce99d887585f httprepo: long arguments support (issue2126)
Steven Brown <StevenGBrown@gmail.com>
parents: 13721
diff changeset
59 if h is None:
ce99d887585f httprepo: long arguments support (issue2126)
Steven Brown <StevenGBrown@gmail.com>
parents: 13721
diff changeset
60 break
ce99d887585f httprepo: long arguments support (issue2126)
Steven Brown <StevenGBrown@gmail.com>
parents: 13721
diff changeset
61 chunks += [h]
14094
d10c6835497e http: minor tweaks to long arg handling
Matt Mackall <mpm@selenic.com>
parents: 14093
diff changeset
62 i += 1
14093
ce99d887585f httprepo: long arguments support (issue2126)
Steven Brown <StevenGBrown@gmail.com>
parents: 13721
diff changeset
63 args.update(cgi.parse_qs(''.join(chunks), keep_blank_values=True))
ce99d887585f httprepo: long arguments support (issue2126)
Steven Brown <StevenGBrown@gmail.com>
parents: 13721
diff changeset
64 return args
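_args() has two fallbacks beyond the query-string form. One is a urlencoded POST body whose length is given by `HTTP_X_HGARGS_POST`. The other is a query string split across numbered `HTTP_X_HGARG_<n>` environ keys. Both can be exercised without a server.

This is a minimal sketch, assuming a plain WSGI-style environ dict and a request body already read into bytes. `collect_args` is a hypothetical name. The sketch uses Python 3's `urllib.parse.parse_qs`, because the `cgi.parse_qs` alias in the listing no longer exists in current Python.

```python
from urllib.parse import parse_qs


def collect_args(environ, form=None, body=b''):
    """Merge wire-protocol arguments into a copy of the query-string form.

    If HTTP_X_HGARGS_POST is set, that many bytes of `body` hold the
    urlencoded arguments. Otherwise the encoded string is split across
    HTTP_X_HGARG_1, HTTP_X_HGARG_2, ... and is concatenated first.
    """
    args = dict(form or {})
    postlen = int(environ.get('HTTP_X_HGARGS_POST', 0))
    if postlen:
        # Assumption: urlencoded arguments are plain ASCII.
        encoded = body[:postlen].decode('ascii')
        args.update(parse_qs(encoded, keep_blank_values=True))
        return args
    chunks = []
    i = 1
    while True:
        h = environ.get('HTTP_X_HGARG_' + str(i))
        if h is None:  # stop at the first missing header number
            break
        chunks.append(h)
        i += 1
    args.update(parse_qs(''.join(chunks), keep_blank_values=True))
    return args
```

Splitting happens on the encoded string, so a single value such as `heads=abcd` may straddle two headers. It only becomes parseable after the pieces are joined.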
11621
e46a8b2331a6 protocol: shuffle server methods to group send methods
Dirkjan Ochtman <dirkjan@ochtman.nl>
parents: 11618
65     def getfile(self, fp):
66         length = int(self.req.env['CONTENT_LENGTH'])
67         for s in util.filechunkiter(self.req, limit=length):
68             fp.write(s)
69     def redirect(self):
14614
afccc64eea73 ui: use I/O descriptors internally
Idan Kamara <idankk86@gmail.com>
parents: 14494
70         self.oldio = self.ui.fout, self.ui.ferr
28861
86db5cb55d46 pycompat: switch to util.stringio for py3 compat
timeless <timeless@mozdev.org>
parents: 28530
71         self.ui.ferr = self.ui.fout = stringio()
14614
afccc64eea73 ui: use I/O descriptors internally
Idan Kamara <idankk86@gmail.com>
parents: 14494
72     def restore(self):
73         val = self.ui.fout.getvalue()
74         self.ui.ferr, self.ui.fout = self.oldio
75         return val
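redirect() and restore() temporarily point both of the ui's output streams at one in-memory buffer. call() then reads the captured text back through restore() for push responses. The sketch below shows the same save, swap and restore pattern on a hypothetical `FakeUI` object.

Line 74 unpacks the saved `(fout, ferr)` tuple into `ferr, fout`, which appears to swap the two streams on restore. The sketch restores them in the order they were saved.

```python
import io


class FakeUI:
    """Hypothetical stand-in for Mercurial's ui object."""

    def __init__(self):
        self.fout = io.StringIO()
        self.ferr = io.StringIO()


class Capture:
    def __init__(self, ui):
        self.ui = ui
        self.oldio = None

    def redirect(self):
        # Remember the real streams, then send both to a single buffer so
        # normal output and errors are captured in write order.
        self.oldio = self.ui.fout, self.ui.ferr
        self.ui.ferr = self.ui.fout = io.StringIO()

    def restore(self):
        # Read what was captured, then put the saved streams back.
        val = self.ui.fout.getvalue()
        self.ui.fout, self.ui.ferr = self.oldio
        return val
```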
11623
31d0a6d50ee2 protocol: extract compression from streaming mechanics
Dirkjan Ochtman <dirkjan@ochtman.nl>
parents: 11622
76     def groupchunks(self, cg):
29788
b1809f5d7630 hgweb: document why we don't allow untrusted settings to control zlib
Gregory Szorc <gregory.szorc@gmail.com>
parents: 29748
77         # Don't allow untrusted settings because disabling compression or
78         # setting a very high compression level could lead to flooding
79         # the server's network or CPU.
29748
5e2365698d44 hgweb: config option to control zlib compression level
Gregory Szorc <gregory.szorc@gmail.com>
parents: 28883
80         z = zlib.compressobj(self.ui.configint('server', 'zliblevel', -1))
14494
1ffeeb91c55d check-code: flag 0/1 used as constant Boolean expression
Martin Geisler <mg@lazybytes.net>
parents: 14094
81         while True:
29792
58467204cac0 hgweb: tweak zlib chunking behavior
Gregory Szorc <gregory.szorc@gmail.com>
parents: 29788
82             chunk = cg.read(32768)
11595
368cd5325348 protocol: move hgweb protocol support back into protocol.py
Matt Mackall <mpm@selenic.com>
parents: 11594
83             if not chunk:
84                 break
29792
58467204cac0 hgweb: tweak zlib chunking behavior
Gregory Szorc <gregory.szorc@gmail.com>
parents: 29788
85             data = z.compress(chunk)
86             # Not all calls to compress() emit data. It is cheaper to inspect
87             # that here than to send it via the generator.
88             if data:
89                 yield data
11623
31d0a6d50ee2 protocol: extract compression from streaming mechanics
Dirkjan Ochtman <dirkjan@ochtman.nl>
parents: 11622
90         yield z.flush()
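The loop in groupchunks() can be reproduced outside hgweb to check that skipping empty compress() results leaves the stream intact. This is a minimal sketch, assuming any binary file-like object in place of the changegroup `cg`. It uses a fixed compression level instead of the `server.zliblevel` setting. `compress_chunks` is a hypothetical name.

```python
import io
import zlib


def compress_chunks(fh, level=-1, readsize=32768):
    """Yield zlib-compressed pieces of everything readable from `fh`."""
    z = zlib.compressobj(level)
    while True:
        chunk = fh.read(readsize)
        if not chunk:
            break
        data = z.compress(chunk)
        # Skip the empty results zlib returns while it buffers input.
        if data:
            yield data
    # flush() finishes the stream; its output is never empty because it
    # always ends with the zlib checksum trailer.
    yield z.flush()


if __name__ == "__main__":
    payload = bytes(range(256)) * 4096
    pieces = list(compress_chunks(io.BytesIO(payload)))
    assert zlib.decompress(b"".join(pieces)) == payload
    print(len(pieces), "non-empty pieces")
```

Empty byte strings never reach the consumer, so each yielded piece can be written to the response without a further check.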
11595
368cd5325348 protocol: move hgweb protocol support back into protocol.py
Matt Mackall <mpm@selenic.com>
parents: 11594
91     def _client(self):
92         return 'remote:%s:%s:%s' % (
93             self.req.env.get('wsgi.url_scheme') or 'http',
28883
032c4c2f802a pycompat: switch to util.urlreq/util.urlerr for py3 compat
timeless <timeless@mozdev.org>
parents: 28861
94             urlreq.quote(self.req.env.get('REMOTE_HOST', '')),
95             urlreq.quote(self.req.env.get('REMOTE_USER', '')))
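_client() builds an identifier of the form `remote:<scheme>:<host>:<user>` from the WSGI environment, percent-quoting the host and user. The sketch below shows the same formatting, with Python 3's `urllib.parse.quote` standing in for `urlreq.quote`. `client_id` is a hypothetical name.

```python
from urllib.parse import quote


def client_id(environ):
    """Format the 'remote:<scheme>:<host>:<user>' client identifier."""
    return 'remote:%s:%s:%s' % (
        environ.get('wsgi.url_scheme') or 'http',  # missing or empty -> http
        quote(environ.get('REMOTE_HOST', '')),
        quote(environ.get('REMOTE_USER', '')))


if __name__ == "__main__":
    print(client_id({'wsgi.url_scheme': 'https',
                     'REMOTE_HOST': 'build host', 'REMOTE_USER': 'alice'}))
    # remote:https:build%20host:alice
```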
11595
368cd5325348 protocol: move hgweb protocol support back into protocol.py
Matt Mackall <mpm@selenic.com>
parents: 11594
diff changeset
96
368cd5325348 protocol: move hgweb protocol support back into protocol.py
Matt Mackall <mpm@selenic.com>
parents: 11594
diff changeset
97 def iscmd(cmd):
368cd5325348 protocol: move hgweb protocol support back into protocol.py
Matt Mackall <mpm@selenic.com>
parents: 11594
diff changeset
98 return cmd in wireproto.commands
368cd5325348 protocol: move hgweb protocol support back into protocol.py
Matt Mackall <mpm@selenic.com>
parents: 11594
diff changeset
99
368cd5325348 protocol: move hgweb protocol support back into protocol.py
Matt Mackall <mpm@selenic.com>
parents: 11594
diff changeset
100 def call(repo, req, cmd):
14614
afccc64eea73 ui: use I/O descriptors internally
Idan Kamara <idankk86@gmail.com>
parents: 14494
diff changeset
101 p = webproto(req, repo.ui)
11625
cdeb861335d5 protocol: wrap non-string protocol responses in classes
Dirkjan Ochtman <dirkjan@ochtman.nl>
parents: 11623
diff changeset
102 rsp = wireproto.dispatch(repo, p, cmd)
11626
2f8adc60e013 protocol: use generators instead of req.write() for hgweb stream responses
Dirkjan Ochtman <dirkjan@ochtman.nl>
parents: 11625
diff changeset
103 if isinstance(rsp, str):
18352
e33b9b92a200 hgweb: pass the actual response body to request.response, not just the length
Mads Kiilerich <mads@kiilerich.com>
parents: 18346
diff changeset
104 req.respond(HTTP_OK, HGTYPE, body=rsp)
e33b9b92a200 hgweb: pass the actual response body to request.response, not just the length
Mads Kiilerich <mads@kiilerich.com>
parents: 18346
diff changeset
105 return []
11626
2f8adc60e013 protocol: use generators instead of req.write() for hgweb stream responses
Dirkjan Ochtman <dirkjan@ochtman.nl>
parents: 11625
diff changeset
106 elif isinstance(rsp, wireproto.streamres):
2f8adc60e013 protocol: use generators instead of req.write() for hgweb stream responses
Dirkjan Ochtman <dirkjan@ochtman.nl>
parents: 11625
diff changeset
107 req.respond(HTTP_OK, HGTYPE)
2f8adc60e013 protocol: use generators instead of req.write() for hgweb stream responses
Dirkjan Ochtman <dirkjan@ochtman.nl>
parents: 11625
diff changeset
108 return rsp.gen
2f8adc60e013 protocol: use generators instead of req.write() for hgweb stream responses
Dirkjan Ochtman <dirkjan@ochtman.nl>
parents: 11625
diff changeset
109 elif isinstance(rsp, wireproto.pushres):
14614
afccc64eea73 ui: use I/O descriptors internally
Idan Kamara <idankk86@gmail.com>
parents: 14494
diff changeset
110 val = p.restore()
18346
6c2563b2c1c6 hgweb: use Content-Length for pushres
Mads Kiilerich <mads@kiilerich.com>
parents: 15017
diff changeset
111 rsp = '%d\n%s' % (rsp.res, val)
18352
e33b9b92a200 hgweb: pass the actual response body to request.response, not just the length
Mads Kiilerich <mads@kiilerich.com>
parents: 18346
diff changeset
112 req.respond(HTTP_OK, HGTYPE, body=rsp)
e33b9b92a200 hgweb: pass the actual response body to request.response, not just the length
Mads Kiilerich <mads@kiilerich.com>
parents: 18346
diff changeset
113 return []
12703
40bb5853fc4b wireproto: introduce pusherr() to deal with "unsynced changes" error
Benoit Boissinot <benoit.boissinot@ens-lyon.org>
parents: 11626
diff changeset
114 elif isinstance(rsp, wireproto.pusherr):
12704
ca6e2adc3e4d wireproto/http: drain the incoming bundle in case of errors
Benoit Boissinot <benoit.boissinot@ens-lyon.org>
parents: 12703
diff changeset
115 # drain the incoming bundle
ca6e2adc3e4d wireproto/http: drain the incoming bundle in case of errors
Benoit Boissinot <benoit.boissinot@ens-lyon.org>
parents: 12703
diff changeset
116 req.drain()
14614
afccc64eea73 ui: use I/O descriptors internally
Idan Kamara <idankk86@gmail.com>
parents: 14494
diff changeset
117 p.restore()
12703
40bb5853fc4b wireproto: introduce pusherr() to deal with "unsynced changes" error
Benoit Boissinot <benoit.boissinot@ens-lyon.org>
parents: 11626
diff changeset
118 rsp = '0\n%s\n' % rsp.res
18352
e33b9b92a200 hgweb: pass the actual response body to request.response, not just the length
Mads Kiilerich <mads@kiilerich.com>
parents: 18346
diff changeset
119 req.respond(HTTP_OK, HGTYPE, body=rsp)
e33b9b92a200 hgweb: pass the actual response body to request.response, not just the length
Mads Kiilerich <mads@kiilerich.com>
parents: 18346
diff changeset
120 return []
15017
f4522df38c65 wireproto: add out-of-band error class to allow remote repo to report errors
Andrew Pritchard <andrewp@fogcreek.com>
parents: 14614
diff changeset
121 elif isinstance(rsp, wireproto.ooberror):
f4522df38c65 wireproto: add out-of-band error class to allow remote repo to report errors
Andrew Pritchard <andrewp@fogcreek.com>
parents: 14614
diff changeset
122 rsp = rsp.message
18352
e33b9b92a200 hgweb: pass the actual response body to request.response, not just the length
Mads Kiilerich <mads@kiilerich.com>
parents: 18346
diff changeset
123 req.respond(HTTP_OK, HGERRTYPE, body=rsp)
e33b9b92a200 hgweb: pass the actual response body to request.response, not just the length
Mads Kiilerich <mads@kiilerich.com>
parents: 18346
diff changeset
124 return []
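call() maps each kind of wireproto result to a content type and a response body. The sketch below isolates that mapping as a pure function, so it can be tested without a request object.

Some parts are assumptions made for this sketch:

- The result classes are minimal hypothetical stand-ins for `wireproto.streamres`, `pushres`, `pusherr` and `ooberror`. They carry only the attributes call() reads.
- `captured` stands in for the text returned by `p.restore()`.
- Raising `TypeError` for an unknown result is a choice made here. call() itself falls through and returns None in that case.

```python
HGTYPE = 'application/mercurial-0.1'
HGERRTYPE = 'application/hg-error'


class streamres:  # hypothetical stand-in
    def __init__(self, gen):
        self.gen = gen


class pushres:  # hypothetical stand-in
    def __init__(self, res):
        self.res = res


class pusherr:  # hypothetical stand-in
    def __init__(self, res):
        self.res = res


class ooberror:  # hypothetical stand-in
    def __init__(self, message):
        self.message = message


def render(rsp, captured=''):
    """Return (content_type, body) for a wire-protocol result.

    Streaming results return their generator as the body unchanged.
    """
    if isinstance(rsp, str):
        return HGTYPE, rsp
    if isinstance(rsp, streamres):
        return HGTYPE, rsp.gen
    if isinstance(rsp, pushres):
        # Result code on the first line, captured ui output after it.
        return HGTYPE, '%d\n%s' % (rsp.res, captured)
    if isinstance(rsp, pusherr):
        # call() also restores the ui here but discards the captured text.
        return HGTYPE, '0\n%s\n' % rsp.res
    if isinstance(rsp, ooberror):
        return HGERRTYPE, rsp.message
    raise TypeError('unsupported response: %r' % (rsp,))
```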