comparison mercurial/revlog.py @ 40670:87a872555e90

revlog: detect incomplete revlog reads

_readsegment() is supposed to return N bytes of revlog revision
data starting at a file offset.

Surprisingly, its behavior before this patch never verified that
it actually read and returned N bytes! Instead, it would perform
the read(), then return whatever data was available. And even more
surprisingly, nothing in the call chain appears to have been
validating that it received all the data it was expecting.

This behavior could lead to partial or incomplete revision chunks
being operated on. This could result in e.g. cached deltas being
applied against incomplete base revisions. The delta application
process would happily perform this operation. Only hash
verification would detect the corruption and save us.

This commit changes the behavior of raw revlog reading to validate
that we actually read() the number of bytes that were requested.
We will raise a more specific error faster, rather than possibly
have it go undetected or manifest later in the call stack, at
delta application or hash verification.

Differential Revision: https://phab.mercurial-scm.org/D5266

author Gregory Szorc <gregory.szorc@gmail.com>
date Tue, 13 Nov 2018 12:30:59 -0800
parents 39369475445c
children e9293c5f8bb9
40669:39369475445c 40670:87a872555e90
@@ -1340,10 +1340,12 @@
 
         If an existing file handle is passed, it will be seeked and the
         original seek position will NOT be restored.
 
         Returns a str or buffer of raw byte data.
+
+        Raises if the requested number of bytes could not be read.
         """
         # Cache data both forward and backward around the requested
         # data, in a fixed size window. This helps speed up operations
         # involving reading the revlog backwards.
         cachesize = self._chunkcachesize
@@ -1351,13 +1353,30 @@
         reallength = (((offset + length + cachesize) & ~(cachesize - 1))
                       - realoffset)
         with self._datareadfp(df) as df:
             df.seek(realoffset)
             d = df.read(reallength)
+
         self._cachesegment(realoffset, d)
         if offset != realoffset or reallength != length:
-            return util.buffer(d, offset - realoffset, length)
+            startoffset = offset - realoffset
+            if len(d) - startoffset < length:
+                raise error.RevlogError(
+                    _('partial read of revlog %s; expected %d bytes from '
+                      'offset %d, got %d') %
+                    (self.indexfile if self._inline else self.datafile,
+                     length, realoffset, len(d) - startoffset))
+
+            return util.buffer(d, startoffset, length)
+
+        if len(d) < length:
+            raise error.RevlogError(
+                _('partial read of revlog %s; expected %d bytes from offset '
+                  '%d, got %d') %
+                (self.indexfile if self._inline else self.datafile,
+                 length, offset, len(d)))
+
         return d
 
     def _getsegment(self, offset, length, df=None):
         """Obtain a segment of raw data from the revlog.
 
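
For readers who want to try the idea outside of Mercurial, the following is a minimal standalone sketch of the same short-read check. It is not the revlog code: read_segment and PartialReadError are hypothetical names, memoryview stands in for util.buffer, the file object is any seekable binary stream, and the way realoffset is computed (rounding the offset down to a multiple of a power-of-two cachesize) is an assumption, since that line is not visible in the comparison above.

```python
import io


class PartialReadError(Exception):
    """Hypothetical stand-in for error.RevlogError."""


def read_segment(fp, offset, length, cachesize=65536):
    """Read `length` bytes at `offset`, failing loudly on a short read.

    `cachesize` is assumed to be a power of two so that the bit masks
    below align the read window to multiples of it.
    """
    # Assumption: round the window start down to a cachesize boundary.
    realoffset = offset & ~(cachesize - 1)
    # Round the window end up, mirroring the expression in the patch.
    reallength = (((offset + length + cachesize) & ~(cachesize - 1))
                  - realoffset)

    fp.seek(realoffset)
    d = fp.read(reallength)  # may return fewer bytes near end of file

    # Bytes of the requested range that were actually read.
    startoffset = offset - realoffset
    available = max(len(d) - startoffset, 0)
    if available < length:
        raise PartialReadError(
            'partial read; expected %d bytes from offset %d, got %d'
            % (length, offset, available))

    # Slicing a memoryview returns the requested range without copying.
    return memoryview(d)[startoffset:startoffset + length]
```

With a 100-byte io.BytesIO stream and cachesize=16, asking for 5 bytes at offset 10 succeeds, while asking for 5 bytes at offset 98 raises PartialReadError because only 2 of the requested bytes exist. The point of the design is the same as in the commit: the caller gets an error at the read site instead of silently receiving a truncated chunk.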