comparison mercurial/revlogutils/deltas.py @ 40603:2f7e531ef3e7
sparse-revlog: skip the span check in the sparse-revlog case
This significantly improves unbundling performance on smaller
repositories.

Mercurial: unbundling 1K revisions
    no-sparse-revlog:     500 ms
    sparse-revlog-before: 689 ms
    sparse-revlog-after:  484 ms

Pypy: unbundling 1K revisions
    no-sparse-revlog:     1.242 s
    sparse-revlog-before: 1.135 s
    sparse-revlog-after:  0.860 s

NetBeans: unbundling 1K revisions
    no-sparse-revlog:     1.386 s
    sparse-revlog-before: 2.368 s
    sparse-revlog-after:  1.191 s

Mozilla: unbundling 1K revisions
    no-sparse-revlog:     3.103 s
    sparse-revlog-before: 3.367 s
    sparse-revlog-after:  3.093 s

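
For readers skimming the diff, the new behavior of the span check can be sketched as a standalone function. This is only an illustration under simplifying assumptions: `span_check_passes` and its plain-value parameters are hypothetical stand-ins for `revlog._sparserevlog`, `deltainfo.distance`, `revinfo.textlen` and `revlog._maxdeltachainspan`, and the sketch covers the read-span test alone, not the later checks in the same function.

```python
def span_check_passes(sparse_revlog, distance, textlen, max_chain_span):
    """Return True when a candidate delta passes the read-span check.

    Hypothetical helper: arguments are plain values standing in for the
    revlog and deltainfo attributes used in deltas.py.
    """
    if sparse_revlog:
        # With sparse-revlog, sparse reading bounds the I/O, so the
        # span check is skipped entirely after this change.
        return True

    maxdist = max_chain_span
    if not maxdist:
        # No configured limit: make sure the comparison below passes.
        maxdist = distance
    # Always allow a span of at least four times the text length.
    maxdist = max(maxdist, textlen * 4)

    # The delta is rejected only when the read span exceeds the limit.
    return not (maxdist < distance)


if __name__ == "__main__":
    print(span_check_passes(True, 10**9, 10, 100))
    print(span_check_passes(False, 1000, 10, 100))
```

Before this change, the sparse-revlog path computed the largest sliced chunk span on every candidate delta (`_deltachain`, `slicechunk`, `segmentspan`); the early return above is what removes that cost.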
author | Boris Feld <boris.feld@octobus.net> |
---|---|
date | Mon, 15 Oct 2018 15:45:08 +0200 |
parents | 324ba8b14d78 |
children | 3ac23dad6364 |
40602:c36175456350 | 40603:2f7e531ef3e7 |
---|---|
487 # bounding it limits the amount of I/O we need to do. | 487 # bounding it limits the amount of I/O we need to do. |
488 # - 'deltainfo.compresseddeltalen' is the sum of the total size of | 488 # - 'deltainfo.compresseddeltalen' is the sum of the total size of |
489 # deltas we need to apply -- bounding it limits the amount of CPU | 489 # deltas we need to apply -- bounding it limits the amount of CPU |
490 # we consume. | 490 # we consume. |
491 | 491 |
492 if revlog._sparserevlog: | |
493 # As sparse-read will be used, we can consider that the distance, | |
494 # instead of being the span of the whole chunk, | |
495 # is the span of the largest read chunk | |
496 base = deltainfo.base | |
497 | |
498 if base != nullrev: | |
499 deltachain = revlog._deltachain(base)[0] | |
500 else: | |
501 deltachain = [] | |
502 | |
503 # search for the first non-snapshot revision | |
504 for idx, r in enumerate(deltachain): | |
505 if not revlog.issnapshot(r): | |
506 break | |
507 deltachain = deltachain[idx:] | |
508 chunks = slicechunk(revlog, deltachain, deltainfo) | |
509 all_span = [segmentspan(revlog, revs, deltainfo) | |
510 for revs in chunks] | |
511 distance = max(all_span) | |
512 else: | |
513 distance = deltainfo.distance | |
514 | |
515 textlen = revinfo.textlen | 492 textlen = revinfo.textlen |
516 defaultmax = textlen * 4 | 493 defaultmax = textlen * 4 |
517 maxdist = revlog._maxdeltachainspan | 494 maxdist = revlog._maxdeltachainspan |
518 if not maxdist: | 495 if not maxdist: |
519 maxdist = distance # ensure the conditional pass | 496 maxdist = deltainfo.distance # ensure the conditional pass |
520 maxdist = max(maxdist, defaultmax) | 497 maxdist = max(maxdist, defaultmax) |
521 if revlog._sparserevlog and maxdist < revlog._srmingapsize: | |
522 # In multiple place, we are ignoring irrelevant data range below a | |
523 # certain size. Be also apply this tradeoff here and relax span | |
524 # constraint for small enought content. | |
525 maxdist = revlog._srmingapsize | |
526 | 498 |
527 # Bad delta from read span: | 499 # Bad delta from read span: |
528 # | 500 # |
529 # If the span of data read is larger than the maximum allowed. | 501 # If the span of data read is larger than the maximum allowed. |
530 if maxdist < distance: | 502 # |
503 # In the sparse-revlog case, we rely on the associated "sparse reading" | |
504 # to avoid issue related to the span of data. In theory, it would be | |
505 # possible to build pathological revlog where delta pattern would lead | |
506 # to too many reads. However, they do not happen in practice at all. So | |
507 # we skip the span check entirely. | |
508 if not revlog._sparserevlog and maxdist < deltainfo.distance: | |
531 return False | 509 return False |
532 | 510 |
533 # Bad delta from new delta size: | 511 # Bad delta from new delta size: |
534 # | 512 # |
535 # If the delta size is larger than the target text, storing the | 513 # If the delta size is larger than the target text, storing the |