comparison mercurial/changelog.py @ 32292:0ad0d26ff703

changelog: load pending file directly

When changelogs are written, a copy of the index (or inline revlog) may be written to a 00changelog.i.a file so that hooks and other processes can access the pending data before it is finalized.

The way it works today, the localrepo class loads the changelog like normal. Then, if it detects a pending transaction, it asks the changelog class to load a pending changelog. The changelog class looks for a 00changelog.i.a file. If it exists, it is loaded and internal data structures on the new revlog class are copied to the original instance.

The existing mechanism is inefficient because it loads 2 revlog files. The index, node map, and chunk cache for 00changelog.i are thrown away and replaced by those for 00changelog.i.a.

The existing mechanism is also brittle because it is a layering violation to access the data structures being accessed. For example, the code copies the "chunk cache" because for inline revlogs this cache contains the raw revision chunks and allows the original changelog/revlog instance to access revision data for these pending revisions. This whole behavior of course relies on the revlog constructor reading the entirety of an inline revlog into memory and caching it. That's why it is brittle. (I discovered all this as part of modifying behavior of the chunk cache.)

This patch streamlines the loading of a pending 00changelog.i.a revlog by doing it directly in the changelog constructor if told to do so. When this code path is active, we no longer load the 00changelog.i file at all.

The only negative outcome I see from this change is if loading 00changelog.i was somehow serving a purpose. But I can't imagine what that would be, because we throw away its data (the index data structures are replaced and inline revision data is replaced via the chunk cache). And since 00changelog.i.a is a copy of 00changelog.i, file content should be identical, so there should be no meaningful file integrity checking at play. I think this was all just sub-optimal code.
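The file selection described in the message can be sketched in isolation. This is a minimal illustration, not Mercurial code: `FakeOpener` and `pick_changelog_index` are hypothetical names invented here, and the only opener behaviour modelled is the `exists()` check that the new constructor performs.

```python
class FakeOpener:
    """Hypothetical stand-in for a Mercurial opener; only exists() is modelled."""

    def __init__(self, files):
        self._files = set(files)

    def exists(self, path):
        return path in self._files


def pick_changelog_index(opener, trypending=False):
    """Return the index file name that would be handed to the revlog constructor.

    Mirrors the branch added to changelog.__init__: the pending file is used
    only when the caller asks for it AND the file is actually present.
    """
    if trypending and opener.exists('00changelog.i.a'):
        return '00changelog.i.a'
    return '00changelog.i'


if __name__ == "__main__":
    in_transaction = FakeOpener(['00changelog.i', '00changelog.i.a'])
    no_transaction = FakeOpener(['00changelog.i'])

    # A reader inside a pending transaction asks for pending data.
    print(pick_changelog_index(in_transaction, trypending=True))
    # Same request, but no pending file was written: fall back to the default.
    print(pick_changelog_index(no_transaction, trypending=True))
    # A normal reader never sees the pending file, even if it exists.
    print(pick_changelog_index(in_transaction, trypending=False))
```

Because the choice is made before the revlog constructor runs, only one index file is parsed, which is the efficiency gain the message describes.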
author Gregory Szorc <gregory.szorc@gmail.com>
date Sat, 13 May 2017 16:26:43 -0700
parents 85ef5a073114
children 3caec778774b
32291:bd872f64a8ba 32292:0ad0d26ff703
256 @property 256 @property
257 def description(self): 257 def description(self):
258 return encoding.tolocal(self._text[self._offsets[3] + 2:]) 258 return encoding.tolocal(self._text[self._offsets[3] + 2:])
259 259
260 class changelog(revlog.revlog): 260 class changelog(revlog.revlog):
261 def __init__(self, opener): 261 def __init__(self, opener, trypending=False):
262 revlog.revlog.__init__(self, opener, "00changelog.i", 262 """Load a changelog revlog using an opener.
263 checkambig=True) 263
264 If ``trypending`` is true, we attempt to load the index from a
265 ``00changelog.i.a`` file instead of the default ``00changelog.i``.
266 The ``00changelog.i.a`` file contains index (and possibly inline
267 revision) data for a transaction that hasn't been finalized yet.
268 It exists in a separate file to facilitate readers (such as
269 hooks processes) accessing data before a transaction is finalized.
270 """
271 if trypending and opener.exists('00changelog.i.a'):
272 indexfile = '00changelog.i.a'
273 else:
274 indexfile = '00changelog.i'
275
276 revlog.revlog.__init__(self, opener, indexfile, checkambig=True)
277
264 if self._initempty: 278 if self._initempty:
265 # changelogs don't benefit from generaldelta 279 # changelogs don't benefit from generaldelta
266 self.version &= ~revlog.REVLOGGENERALDELTA 280 self.version &= ~revlog.REVLOGGENERALDELTA
267 self._generaldelta = False 281 self._generaldelta = False
268 282
399 self._delaybuf = None 413 self._delaybuf = None
400 self._divert = False 414 self._divert = False
401 # split when we're done 415 # split when we're done
402 self.checkinlinesize(tr) 416 self.checkinlinesize(tr)
403 417
404 def readpending(self, file):
405 """read index data from a "pending" file
406
407 During a transaction, the actual changeset data is already stored in the
408 main file, but not yet finalized in the on-disk index. Instead, a
409 "pending" index is written by the transaction logic. If this function
410 is running, we are likely in a subprocess invoked in a hook. The
411 subprocess is informed that it is within a transaction and needs to
412 access its content.
413
414 This function will read all the index data out of the pending file and
415 overwrite the main index."""
416
417 if not self.opener.exists(file):
418 return # no pending data for changelog
419 r = revlog.revlog(self.opener, file)
420 self.index = r.index
421 self.nodemap = r.nodemap
422 self._nodecache = r._nodecache
423 self._chunkcache = r._chunkcache
424
425 def _writepending(self, tr): 418 def _writepending(self, tr):
426 "create a file containing the unfinalized state for pretxnchangegroup" 419 "create a file containing the unfinalized state for pretxnchangegroup"
427 if self._delaybuf: 420 if self._delaybuf:
428 # make a temporary copy of the index 421 # make a temporary copy of the index
429 fp1 = self._realopener(self.indexfile) 422 fp1 = self._realopener(self.indexfile)