comparison mercurial/hgweb/request.py @ 36897:d7fd203e36cc
hgweb: refactor repository name URL parsing

The hgwebdir WSGI application detects when a requested URL is for
a known repository and effectively forwards the request to the
hgweb WSGI application.

The hgweb WSGI application needs to route the request based on the
base URL for the repository. Normally, SCRIPT_NAME is used to
resolve the base URL and PATH_INFO contains the path after the
script.

But with hgwebdir, SCRIPT_NAME refers to hgwebdir, not the base
URL for the repository. So a REPO_NAME environment variable was
set to convey the part of the URL that represented the repository,
so that hgweb could ignore this path component for routing
purposes.

Using an environment variable to pass internal state is hacky.
Plus, it wasn't clear from the perspective of the URL parsing
code what was going on.

This commit improves matters by making the repository name an
explicit argument to the request parser. The logic around
handling of this value has been shored up. We add various checks
that the argument is used properly, in particular that the
repository name really is a prefix of PATH_INFO.

Differential Revision: https://phab.mercurial-scm.org/D2819
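
The REPO_NAME mechanism the commit message describes can be sketched as a standalone function. This is a minimal reconstruction based on the removed lines in the comparison, not the actual hgweb code: the name `old_dispatch` and the sample environments are hypothetical. It illustrates one reason the logic needed shoring up: when REPO_NAME does not match the start of PATH_INFO, the mismatch is accepted silently.

```python
def old_dispatch(env):
    """Hypothetical stand-in for the pre-change REPO_NAME handling."""
    apppath = env.get('SCRIPT_NAME', '')

    # REPO_NAME is appended to the application root unconditionally.
    if env.get('REPO_NAME'):
        if not apppath.endswith('/'):
            apppath += '/'
        apppath += env.get('REPO_NAME')

    if 'PATH_INFO' in env:
        dispatchparts = env['PATH_INFO'].strip('/').split('/')

        # The repo components are stripped only if they happen to match.
        repoparts = env.get('REPO_NAME', '').split('/')
        if dispatchparts[:len(repoparts)] == repoparts:
            dispatchparts = dispatchparts[len(repoparts):]
    else:
        dispatchparts = []

    return apppath, dispatchparts


# Matching prefix: the repo name moves out of the dispatch path.
matching = {'SCRIPT_NAME': '/hgwebdir',
            'PATH_INFO': '/myrepo/rev/tip',
            'REPO_NAME': 'myrepo'}
print(old_dispatch(matching))    # ('/hgwebdir/myrepo', ['rev', 'tip'])

# Mismatched prefix: no error, and the two results disagree with each other.
mismatched = {'SCRIPT_NAME': '/hgwebdir',
              'PATH_INFO': '/other/rev/tip',
              'REPO_NAME': 'myrepo'}
print(old_dispatch(mismatched))  # ('/hgwebdir/myrepo', ['other', 'rev', 'tip'])
```

In the mismatched case the application root claims the request is for `myrepo` while the dispatch path still carries `other`, which is the kind of inconsistency an explicit, validated argument can rule out.
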
author | Gregory Szorc <gregory.szorc@gmail.com> |
---|---|
date | Sun, 11 Mar 2018 13:11:13 -0700 |
parents | b2a3308d6a21 |
children | d0b0fedbfb53 |
comparison
36896:b2a3308d6a21 | 36897:d7fd203e36cc |
---|---|
153 # insensitive keys. | 153 # insensitive keys. |
154 headers = attr.ib() | 154 headers = attr.ib() |
155 # Request body input stream. | 155 # Request body input stream. |
156 bodyfh = attr.ib() | 156 bodyfh = attr.ib() |
157 | 157 |
158 def parserequestfromenv(env, bodyfh): | 158 def parserequestfromenv(env, bodyfh, reponame=None): |
159 """Parse URL components from environment variables. | 159 """Parse URL components from environment variables. |
160 | 160 |
161 WSGI defines request attributes via environment variables. This function | 161 WSGI defines request attributes via environment variables. This function |
162 parses the environment variables into a data structure. | 162 parses the environment variables into a data structure. |
| | 163 |
| | 164 If ``reponame`` is defined, the leading path components matching that |
| | 165 string are effectively shifted from ``PATH_INFO`` to ``SCRIPT_NAME``. |
| | 166 This simulates the world view of a WSGI application that processes |
| | 167 requests from the base URL of a repo. |
163 """ | 168 """ |
164 # PEP-0333 defines the WSGI spec and is a useful reference for this code. | 169 # PEP-0333 defines the WSGI spec and is a useful reference for this code. |
165 | 170 |
166 # We first validate that the incoming object conforms with the WSGI spec. | 171 # We first validate that the incoming object conforms with the WSGI spec. |
167 # We only want to be dealing with spec-conforming WSGI implementations. | 172 # We only want to be dealing with spec-conforming WSGI implementations. |
213 | 218 |
214 if env.get('QUERY_STRING'): | 219 if env.get('QUERY_STRING'): |
215 fullurl += '?' + env['QUERY_STRING'] | 220 fullurl += '?' + env['QUERY_STRING'] |
216 advertisedfullurl += '?' + env['QUERY_STRING'] | 221 advertisedfullurl += '?' + env['QUERY_STRING'] |
217 | 222 |
218 # When dispatching requests, we look at the URL components (PATH_INFO | 223 # If ``reponame`` is defined, that must be a prefix on PATH_INFO |
219 # and QUERY_STRING) after the application root (SCRIPT_NAME). But hgwebdir | 224 # that represents the repository being dispatched to. When computing |
220 # has the concept of "virtual" repositories. This is defined via REPO_NAME. | 225 # the dispatch info, we ignore these leading path components. |
221 # If REPO_NAME is defined, we append it to SCRIPT_NAME to form a new app | |
222 # root. We also exclude its path components from PATH_INFO when resolving | |
223 # the dispatch path. | |
224 | 226 |
225 apppath = env.get('SCRIPT_NAME', '') | 227 apppath = env.get('SCRIPT_NAME', '') |
226 | 228 |
227 if env.get('REPO_NAME'): | 229 if reponame: |
228 if not apppath.endswith('/'): | 230 repoprefix = '/' + reponame.strip('/') |
229 apppath += '/' | 231 |
230 | 232 if not env.get('PATH_INFO'): |
231 apppath += env.get('REPO_NAME') | 233 raise error.ProgrammingError('reponame requires PATH_INFO') |
232 | 234 |
233 if 'PATH_INFO' in env: | 235 if not env['PATH_INFO'].startswith(repoprefix): |
| | 236 raise error.ProgrammingError('PATH_INFO does not begin with repo ' |
| | 237 'name: %s (%s)' % (env['PATH_INFO'], |
| | 238 reponame)) |
| | 239 |
| | 240 dispatchpath = env['PATH_INFO'][len(repoprefix):] |
| | 241 |
| | 242 if dispatchpath and not dispatchpath.startswith('/'): |
| | 243 raise error.ProgrammingError('reponame prefix of PATH_INFO does ' |
| | 244 'not end at path delimiter: %s (%s)' % |
| | 245 (env['PATH_INFO'], reponame)) |
| | 246 |
| | 247 apppath = apppath.rstrip('/') + repoprefix |
| | 248 dispatchparts = dispatchpath.strip('/').split('/') |
| | 249 elif env.get('PATH_INFO', '').strip('/'): |
234 dispatchparts = env['PATH_INFO'].strip('/').split('/') | 250 dispatchparts = env['PATH_INFO'].strip('/').split('/') |
235 | |
236 # Strip out repo parts. | |
237 repoparts = env.get('REPO_NAME', '').split('/') | |
238 if dispatchparts[:len(repoparts)] == repoparts: | |
239 dispatchparts = dispatchparts[len(repoparts):] | |
240 else: | 251 else: |
241 dispatchparts = [] | 252 dispatchparts = [] |
242 | 253 |
243 dispatchpath = '/'.join(dispatchparts) | 254 dispatchpath = '/'.join(dispatchparts) |
244 | 255 |
281 remoteuser=env.get('REMOTE_USER'), | 292 remoteuser=env.get('REMOTE_USER'), |
282 remotehost=env.get('REMOTE_HOST'), | 293 remotehost=env.get('REMOTE_HOST'), |
283 apppath=apppath, | 294 apppath=apppath, |
284 dispatchparts=dispatchparts, dispatchpath=dispatchpath, | 295 dispatchparts=dispatchparts, dispatchpath=dispatchpath, |
285 havepathinfo='PATH_INFO' in env, | 296 havepathinfo='PATH_INFO' in env, |
286 reponame=env.get('REPO_NAME'), | 297 reponame=reponame, |
287 querystring=querystring, | 298 querystring=querystring, |
288 qsparams=qsparams, | 299 qsparams=qsparams, |
289 headers=headers, | 300 headers=headers, |
290 bodyfh=bodyfh) | 301 bodyfh=bodyfh) |
291 | 302 |
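
Because the side-by-side view loses indentation, the new `reponame` handling may be easier to follow as a standalone function. The sketch below mirrors the inserted lines under a few assumptions: `resolve_dispatch` is a hypothetical name, `ValueError` stands in for Mercurial's `error.ProgrammingError` so the example runs without Mercurial installed, and only the path-related return values are modeled.

```python
def resolve_dispatch(env, reponame=None):
    """Sketch of the post-change path handling (hypothetical helper)."""
    apppath = env.get('SCRIPT_NAME', '')

    if reponame:
        repoprefix = '/' + reponame.strip('/')

        if not env.get('PATH_INFO'):
            # ValueError is a stand-in for error.ProgrammingError.
            raise ValueError('reponame requires PATH_INFO')

        if not env['PATH_INFO'].startswith(repoprefix):
            raise ValueError('PATH_INFO does not begin with repo '
                             'name: %s (%s)' % (env['PATH_INFO'], reponame))

        dispatchpath = env['PATH_INFO'][len(repoprefix):]

        # Reject prefixes that end mid-component, e.g. "myrepo" vs "/myrepo2".
        if dispatchpath and not dispatchpath.startswith('/'):
            raise ValueError('reponame prefix of PATH_INFO does '
                             'not end at path delimiter: %s (%s)' %
                             (env['PATH_INFO'], reponame))

        # Shift the repo prefix from PATH_INFO onto the application root.
        apppath = apppath.rstrip('/') + repoprefix
        dispatchparts = dispatchpath.strip('/').split('/')
    elif env.get('PATH_INFO', '').strip('/'):
        dispatchparts = env['PATH_INFO'].strip('/').split('/')
    else:
        dispatchparts = []

    dispatchpath = '/'.join(dispatchparts)
    return apppath, dispatchparts, dispatchpath


env = {'SCRIPT_NAME': '/hgwebdir', 'PATH_INFO': '/myrepo/rev/tip'}
print(resolve_dispatch(env, reponame='myrepo'))
# ('/hgwebdir/myrepo', ['rev', 'tip'], 'rev/tip')

try:
    resolve_dispatch({'SCRIPT_NAME': '/hgwebdir',
                      'PATH_INFO': '/myrepo2/rev/tip'}, reponame='myrepo')
except ValueError as exc:
    print('rejected:', exc)
```

The second call shows the delimiter check at work: `/myrepo2/rev/tip` starts with `/myrepo`, but the remainder `2/rev/tip` does not begin at a path delimiter, so the request is rejected rather than routed to the wrong repository.
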