comparison mercurial/hgweb/hgwebdir_mod.py @ 36845:ff2370a70fe8 stable

hgweb: garbage collect on every request

There appears to be a cycle in localrepository or hgweb that is preventing repositories from being garbage collected when hgwebdir dispatches to hgweb. Every request creates a new repository instance and then leaks that object and other referenced objects.

A periodic GC to find cycles will eventually collect the old repositories. But these don't run reliably, and rapid requests to hgwebdir can result in rapidly increasing memory consumption.

With the Firefox repository, repeated requests to raw-file URLs leak ~100 MB per hgwebdir request (most of this appears to be cached manifest data structures). WSGI processes quickly grow to >1 GB RSS.

Breaking the cycles in localrepository is going to be a bit of work.

Because we know that hgwebdir leaks localrepository instances, let's put a band-aid on the problem in the form of an explicit gc.collect() on every hgwebdir request.

As the inline comment states, ideally we'd do this in a finally block for the current request iff it dispatches to hgweb. But _runwsgi() returns an explicit value. We need the finally to run after generator exhaustion. So we'd need to refactor _runwsgi() to "yield" instead of "return." That's too much change for a patch to stable. So we implement this hack one function above and run it on every request.

The performance impact of this change should be minimal. Any impact should be offset by benefits from not having hgwebdir processes leak memory.
author Gregory Szorc <gregory.szorc@gmail.com>
date Mon, 12 Mar 2018 13:15:00 -0700
parents d1fccbd50fcd
children c479692690ef
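The timing constraint in the message (the finally block must run after generator exhaustion, not when a function returns its iterable) can be seen in isolation. This is a minimal, self-contained sketch; make_body, returning_handler, and yielding_wrapper are hypothetical names chosen for illustration, not Mercurial functions. returning_handler mimics a function that returns its response iterable, as the message says _runwsgi() does, and yielding_wrapper mimics the generator wrapper that run_wsgi() becomes in this patch.

```python
# make_body, returning_handler and yielding_wrapper are hypothetical
# illustrations, not Mercurial APIs.
events = []


def make_body():
    # Stand-in for a lazily produced response body.
    for chunk in ("a", "b", "c"):
        events.append("produce " + chunk)
        yield chunk


def returning_handler():
    # Returns an iterable, like a function that "returns" its response:
    # the finally block runs when the function returns, before any chunk
    # has been produced.
    try:
        return make_body()
    finally:
        events.append("cleanup")


def yielding_wrapper(inner):
    # Generator wrapper, like run_wsgi() after this patch: the finally block
    # runs only once the consumer exhausts (or closes) this generator.
    try:
        for chunk in inner():
            yield chunk
    finally:
        events.append("cleanup")


del events[:]
list(returning_handler())
early = list(events)

del events[:]
list(yielding_wrapper(make_body))
late = list(events)

print(early)  # ['cleanup', 'produce a', 'produce b', 'produce c']
print(late)   # ['produce a', 'produce b', 'produce c', 'cleanup']
```

PEP 3333 requires a WSGI server to call close() on a response iterable that has one, even when the client disconnects early. A generator's close() raises GeneratorExit at the paused yield, so the wrapper's finally block also runs for aborted responses.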
--- a/mercurial/hgweb/hgwebdir_mod.py	36844:eeb87b24aea7
+++ b/mercurial/hgweb/hgwebdir_mod.py	36845:ff2370a70fe8
@@ -6,10 +6,11 @@
 # This software may be used and distributed according to the terms of the
 # GNU General Public License version 2 or any later version.
 
 from __future__ import absolute_import
 
+import gc
 import os
 import re
 import time
 
 from ..i18n import _
@@ -222,12 +223,22 @@
         return False
 
     def run_wsgi(self, req):
         profile = self.ui.configbool('profiling', 'enabled')
         with profiling.profile(self.ui, enabled=profile):
-            for r in self._runwsgi(req):
-                yield r
+            try:
+                for r in self._runwsgi(req):
+                    yield r
+            finally:
+                # There are known cycles in localrepository that prevent
+                # those objects (and tons of held references) from being
+                # collected through normal refcounting. We mitigate those
+                # leaks by performing an explicit GC on every request.
+                # TODO remove this once leaks are fixed.
+                # TODO only run this on requests that create localrepository
+                # instances instead of every request.
+                gc.collect()
 
     def _runwsgi(self, req):
         try:
             self.refresh()
 
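The inline comment's point about cycles and refcounting can be reproduced with a small, self-contained sketch, assuming CPython (where refcounting frees acyclic garbage immediately and reference cycles are left to the gc module). Repo, Helper, and handle_request are hypothetical stand-ins, not Mercurial's localrepository; the bytearray only imitates the cached data that the commit message says accounts for most of the leaked memory.

```python
import gc
import weakref


class Repo(object):
    """Hypothetical stand-in for a repository object (not localrepository)."""

    def __init__(self):
        self.cache = bytearray(1024 * 1024)  # plays the role of cached data
        self.helper = Helper(self)


class Helper(object):
    """Hypothetical helper that keeps a back-reference to its repository."""

    def __init__(self, repo):
        self.repo = repo  # Repo -> Helper -> Repo forms a reference cycle


def handle_request():
    # Create a repository per "request" and hand back only a weak reference,
    # so no strong reference from outside the cycle survives the call.
    repo = Repo()
    return weakref.ref(repo)


gc.disable()  # stop the automatic cyclic collector so the demo is deterministic
try:
    ref = handle_request()
    alive_after_return = ref() is not None  # refcounting alone did not free it
    unreachable = gc.collect()              # explicit full collection
    alive_after_collect = ref() is not None
finally:
    gc.enable()

print(alive_after_return, alive_after_collect)  # True False
```

gc.disable() is there only to keep the automatic collector from interfering with the demonstration. In a long-running WSGI process that collector would eventually reclaim such cycles, but on its own schedule, which is the unreliability the patch sidesteps by calling gc.collect() after each request.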