comparison mercurial/hgweb/request.py @ 36897:d7fd203e36cc

hgweb: refactor repository name URL parsing

The hgwebdir WSGI application detects when a requested URL is for a known repository and it effectively forwards the request to the hgweb WSGI application.

The hgweb WSGI application needs to route the request based on the base URL for the repository. The way this normally works is SCRIPT_NAME is used to resolve the base URL and PATH_INFO contains the path after the script.

But with hgwebdir, SCRIPT_NAME refers to hgwebdir, not the base URL for the repository. So, there was a hacky REPO_NAME environment variable being set to convey the part of the URL that represented the repository so hgweb could ignore this path component for routing purposes.

The use of the environment variable for passing internal state is pretty hacky. Plus, it wasn't clear from the perspective of the URL parsing code what was going on.

This commit improves matters by making the repository name an explicit argument to the request parser. The logic around handling of this value has been shored up. We add various checks that the argument is used properly - that the repository name does represent the prefix of the PATH_INFO.

Differential Revision: https://phab.mercurial-scm.org/D2819
author Gregory Szorc <gregory.szorc@gmail.com>
date Sun, 11 Mar 2018 13:11:13 -0700
parents b2a3308d6a21
children d0b0fedbfb53
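
The description says hgwebdir detects when a URL is for a known repository and then forwards the request. A minimal sketch of that detection step follows, assuming a longest-match lookup over a list of known repository names. The helper `find_repo` and the sample names are hypothetical and are not taken from hgwebdir's actual code; the point is only to show what value a caller could pass on as the explicit repository name.

```python
def find_repo(path_info, known_repos):
    """Return (reponame, rest) for a known repo that prefixes path_info.

    Hypothetical helper for illustration only; not hgwebdir's real lookup.
    Returns (None, path_info) when no known name matches.
    """
    path = path_info.strip('/')
    # Assumption: try longer names first so 'team/repo' wins over 'team'.
    for name in sorted(known_repos, key=len, reverse=True):
        # Match only at a path boundary, so 'repo' does not match
        # 'repository'.
        if path == name or path.startswith(name + '/'):
            return name, path[len(name):]
    return None, path_info


repos = ['repo', 'team/repo']
print(find_repo('/team/repo/rev/tip', repos))  # ('team/repo', '/rev/tip')
print(find_repo('/repo', repos))               # ('repo', '')
print(find_repo('/repository/file', repos))    # (None, '/repository/file')
```

The matched name is what would be handed to the request parser as an argument, instead of being stored in the WSGI environment.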
--- a/mercurial/hgweb/request.py	36896:b2a3308d6a21
+++ b/mercurial/hgweb/request.py	36897:d7fd203e36cc
@@ -153,15 +153,20 @@
     # insensitive keys.
     headers = attr.ib()
     # Request body input stream.
     bodyfh = attr.ib()
 
-def parserequestfromenv(env, bodyfh):
+def parserequestfromenv(env, bodyfh, reponame=None):
     """Parse URL components from environment variables.
 
     WSGI defines request attributes via environment variables. This function
     parses the environment variables into a data structure.
+
+    If ``reponame`` is defined, the leading path components matching that
+    string are effectively shifted from ``PATH_INFO`` to ``SCRIPT_NAME``.
+    This simulates the world view of a WSGI application that processes
+    requests from the base URL of a repo.
     """
     # PEP-0333 defines the WSGI spec and is a useful reference for this code.
 
     # We first validate that the incoming object conforms with the WSGI spec.
     # We only want to be dealing with spec-conforming WSGI implementations.
@@ -213,32 +218,38 @@
 
     if env.get('QUERY_STRING'):
         fullurl += '?' + env['QUERY_STRING']
         advertisedfullurl += '?' + env['QUERY_STRING']
 
-    # When dispatching requests, we look at the URL components (PATH_INFO
-    # and QUERY_STRING) after the application root (SCRIPT_NAME). But hgwebdir
-    # has the concept of "virtual" repositories. This is defined via REPO_NAME.
-    # If REPO_NAME is defined, we append it to SCRIPT_NAME to form a new app
-    # root. We also exclude its path components from PATH_INFO when resolving
-    # the dispatch path.
+    # If ``reponame`` is defined, that must be a prefix on PATH_INFO
+    # that represents the repository being dispatched to. When computing
+    # the dispatch info, we ignore these leading path components.
 
     apppath = env.get('SCRIPT_NAME', '')
 
-    if env.get('REPO_NAME'):
-        if not apppath.endswith('/'):
-            apppath += '/'
-
-        apppath += env.get('REPO_NAME')
-
-    if 'PATH_INFO' in env:
+    if reponame:
+        repoprefix = '/' + reponame.strip('/')
+
+        if not env.get('PATH_INFO'):
+            raise error.ProgrammingError('reponame requires PATH_INFO')
+
+        if not env['PATH_INFO'].startswith(repoprefix):
+            raise error.ProgrammingError('PATH_INFO does not begin with repo '
+                                         'name: %s (%s)' % (env['PATH_INFO'],
+                                                            reponame))
+
+        dispatchpath = env['PATH_INFO'][len(repoprefix):]
+
+        if dispatchpath and not dispatchpath.startswith('/'):
+            raise error.ProgrammingError('reponame prefix of PATH_INFO does '
+                                         'not end at path delimiter: %s (%s)' %
+                                         (env['PATH_INFO'], reponame))
+
+        apppath = apppath.rstrip('/') + repoprefix
+        dispatchparts = dispatchpath.strip('/').split('/')
+    elif env.get('PATH_INFO', '').strip('/'):
         dispatchparts = env['PATH_INFO'].strip('/').split('/')
-
-        # Strip out repo parts.
-        repoparts = env.get('REPO_NAME', '').split('/')
-        if dispatchparts[:len(repoparts)] == repoparts:
-            dispatchparts = dispatchparts[len(repoparts):]
     else:
         dispatchparts = []
 
     dispatchpath = '/'.join(dispatchparts)
 
@@ -281,11 +292,11 @@
                          remoteuser=env.get('REMOTE_USER'),
                          remotehost=env.get('REMOTE_HOST'),
                          apppath=apppath,
                          dispatchparts=dispatchparts, dispatchpath=dispatchpath,
                          havepathinfo='PATH_INFO' in env,
-                         reponame=env.get('REPO_NAME'),
+                         reponame=reponame,
                          querystring=querystring,
                          qsparams=qsparams,
                          headers=headers,
                          bodyfh=bodyfh)
 
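
The reponame handling introduced by this change can be tried outside Mercurial with a standalone sketch like the one below. It mirrors the new branch of `parserequestfromenv` but is not the real function: the name `shift_repo_prefix` is made up for illustration, `ValueError` stands in for `error.ProgrammingError`, the error messages are shortened, and only `apppath` and `dispatchparts` are returned.

```python
def shift_repo_prefix(env, reponame=None):
    """Return (apppath, dispatchparts) for a WSGI-style ``env`` dict.

    Illustrative stand-in for the reponame branch shown in the change;
    ValueError replaces Mercurial's error.ProgrammingError.
    """
    apppath = env.get('SCRIPT_NAME', '')

    if reponame:
        repoprefix = '/' + reponame.strip('/')
        pathinfo = env.get('PATH_INFO')

        # The repo name must be backed by a PATH_INFO that starts with it.
        if not pathinfo:
            raise ValueError('reponame requires PATH_INFO')
        if not pathinfo.startswith(repoprefix):
            raise ValueError('PATH_INFO does not begin with repo name')

        rest = pathinfo[len(repoprefix):]
        # Reject partial component matches such as 'repo' vs '/repository'.
        if rest and not rest.startswith('/'):
            raise ValueError('repo prefix does not end at path delimiter')

        # Move the repo prefix from PATH_INFO onto the application root.
        apppath = apppath.rstrip('/') + repoprefix
        dispatchparts = rest.strip('/').split('/')
    elif env.get('PATH_INFO', '').strip('/'):
        dispatchparts = env['PATH_INFO'].strip('/').split('/')
    else:
        dispatchparts = []

    return apppath, dispatchparts


env = {'SCRIPT_NAME': '/hg', 'PATH_INFO': '/team/repo/rev/tip'}
print(shift_repo_prefix(env, 'team/repo'))  # ('/hg/team/repo', ['rev', 'tip'])
print(shift_repo_prefix(env))  # ('/hg', ['team', 'repo', 'rev', 'tip'])

try:
    shift_repo_prefix({'SCRIPT_NAME': '/hg', 'PATH_INFO': '/repository'},
                      'repo')
except ValueError as exc:
    print('rejected:', exc)
```

With a repository name, the prefix moves from the dispatch path to the application root. Without one, the whole `PATH_INFO` is dispatched. A name that matches only part of a path component is rejected rather than silently ignored, which is the stricter behavior the description refers to.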