comparison mercurial/hgweb/hgweb_mod.py @ 36814:69b2d0900cd7

hgweb: parse WSGI request into a data structure

Currently, our WSGI applications (hgweb_mod and hgwebdir_mod) process the raw WSGI request instance themselves. This means they have to talk in terms of system strings. And they need to know details about what's in the WSGI request. And in the case of hgweb_mod, it is doing some very funky things with URL parsing to impact dispatching. The code is difficult to read and maintain.

This commit introduces parsing of the WSGI request into a higher-level and easier-to-reason-about data structure. To prove it works, we hook it up to hgweb_mod and use it for populating the relative URL on the request instance.

We hold off on using it in more places because the logic in hgweb_mod is crazy and I don't want to involve those changes with review of the parsing code.

The URL construction code has variations that use the HTTP: Host header (the canonical WSGI way of reconstructing the URL) and with the use of SERVER_NAME. We need to differentiate because hgweb is currently using SERVER_NAME for URL construction.

Differential Revision: https://phab.mercurial-scm.org/D2734
author Gregory Szorc <gregory.szorc@gmail.com>
date Sat, 10 Mar 2018 10:20:51 -0800
parents ec46415ed826
children 1e2194e0ef62
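
The commit message distinguishes URLs built from the Host header from URLs built from SERVER_NAME. The following is a minimal sketch of that distinction, following the URL reconstruction recipe in PEP 3333. The names `ParsedRequest`, `parse_request_from_env`, `host_url` and `server_name_url` are hypothetical and chosen for illustration; the real `requestmod.parserequestfromenv` is not shown in this comparison and may differ.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class ParsedRequest:
    """Hypothetical container for a parsed WSGI request (not Mercurial's type)."""
    host_url: str          # URL built from the Host header when it is present
    server_name_url: str   # URL built from SERVER_NAME / SERVER_PORT only


def _base_url(env, use_host_header):
    """Return scheme://netloc, optionally preferring the Host header."""
    scheme = env["wsgi.url_scheme"]
    if use_host_header and env.get("HTTP_HOST"):
        netloc = env["HTTP_HOST"]
    else:
        netloc = env["SERVER_NAME"]
        port = env["SERVER_PORT"]
        default_port = "443" if scheme == "https" else "80"
        # Only spell out the port when it is not the scheme's default.
        if port != default_port:
            netloc += ":" + port
    return scheme + "://" + netloc


def parse_request_from_env(env):
    """Parse a WSGI environ dict into a ParsedRequest (illustrative only)."""
    path = quote(env.get("SCRIPT_NAME", "")) + quote(env.get("PATH_INFO", ""))
    if env.get("QUERY_STRING"):
        path += "?" + env["QUERY_STRING"]
    return ParsedRequest(
        host_url=_base_url(env, True) + path,
        server_name_url=_base_url(env, False) + path,
    )
```

Keeping both variants on one object lets a caller pick the SERVER_NAME form where existing behavior depends on it, without re-reading the raw environ.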
comparison
36813:ec46415ed826 36814:69b2d0900cd7
314 314           with profiling.profile(repo.ui, enabled=profile):
315 315               for r in self._runwsgi(wsgireq, repo):
316 316                   yield r
317 317
318 318       def _runwsgi(self, wsgireq, repo):
    319 +         req = requestmod.parserequestfromenv(wsgireq.env)
319 320           rctx = requestcontext(self, repo)
320 321
321 322           # This state is global across all threads.
322 323           encoding.encoding = rctx.config('web', 'encoding')
323 324           rctx.repo.ui.environ = wsgireq.env

327 328               # replace it.
328 329               wsgireq.headers = [h for h in wsgireq.headers
329 330                                  if h[0] != 'Content-Security-Policy']
330 331               wsgireq.headers.append(('Content-Security-Policy', rctx.csp))
331 332
332     -         # work with CGI variables to create coherent structure
333     -         # use SCRIPT_NAME, PATH_INFO and QUERY_STRING as well as our REPO_NAME
334     -
335     -         wsgireq.url = wsgireq.env[r'SCRIPT_NAME']
336     -         if not wsgireq.url.endswith(r'/'):
337     -             wsgireq.url += r'/'
338     -         if wsgireq.env.get('REPO_NAME'):
339     -             wsgireq.url += wsgireq.env[r'REPO_NAME'] + r'/'
    333 +         wsgireq.url = pycompat.sysstr(req.apppath)
340 334
341 335           if r'PATH_INFO' in wsgireq.env:
342 336               parts = wsgireq.env[r'PATH_INFO'].strip(r'/').split(r'/')
343 337               repo_parts = wsgireq.env.get(r'REPO_NAME', r'').split(r'/')
344 338               if parts[:len(repo_parts)] == repo_parts:
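
The lines removed on the old side built the relative URL by hand from SCRIPT_NAME and REPO_NAME. As a standalone sketch of that logic, the function below reproduces it outside of hgweb. The name `app_path` is hypothetical, and this comparison does not show how `req.apppath` itself is computed, so treat the equivalence as an assumption.

```python
def app_path(env):
    """Build the application path the way the removed hgweb_mod lines did.

    Illustrative helper only; not part of Mercurial.
    """
    url = env["SCRIPT_NAME"]
    # Make sure the script path ends with a slash.
    if not url.endswith("/"):
        url += "/"
    # hgwebdir sets REPO_NAME when dispatching to a specific repository.
    if env.get("REPO_NAME"):
        url += env["REPO_NAME"] + "/"
    return url
```

For example, `app_path({"SCRIPT_NAME": "/hg", "REPO_NAME": "project/repo"})` yields `/hg/project/repo/`.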