comparison mercurial/revlogutils/deltas.py @ 40991:42f59d3f714d

delta: exclude base candidate much smaller than the target

If a revision's full text is much bigger than a base candidate's full text (more than LIMIT_BASE2TEXT = 50 times, checked for sparse revlogs only), we no longer consider that candidate.

This solves a pathological case we encountered on a very specific repository. It contains a long series of changesets with a very small manifest (one file) co-existing with other changesets using a very large manifest.

Without this filtering, we ended up considering a large number of tiny full snapshots as potential bases. This resulted in very large deltas (the size of the full text) and Mercurial spending 99% of its time compressing these deltas.

The time for a commit moved from about 400s to about 10s (still slow, but not ridiculously slow).
author Boris Feld <boris.feld@octobus.net>
date Mon, 17 Dec 2018 10:42:19 +0100
parents f960c51eebf3
children ba09db267cb6
comparison
40990:21a9cace4bbf 40991:42f59d3f714d
599 and revlog.length(deltainfo.base) < deltainfo.deltalen): 599 and revlog.length(deltainfo.base) < deltainfo.deltalen):
600 return False 600 return False
601 601
602 return True 602 return True
603 603
604 # If a revision's full text is that much bigger than a base candidate full
605 # text's, it is very unlikely that it will produce a valid delta. We no longer
606 # consider these candidates.
607 LIMIT_BASE2TEXT = 50
608
604 def _candidategroups(revlog, textlen, p1, p2, cachedelta): 609 def _candidategroups(revlog, textlen, p1, p2, cachedelta):
605 """Provides group of revision to be tested as delta base 610 """Provides group of revision to be tested as delta base
606 611
607 This top level function focus on emitting groups with unique and worthwhile 612 This top level function focus on emitting groups with unique and worthwhile
608 content. See _raw_candidate_groups for details about the group order. 613 content. See _raw_candidate_groups for details about the group order.
612 yield None 617 yield None
613 return 618 return
614 619
615 deltalength = revlog.length 620 deltalength = revlog.length
616 deltaparent = revlog.deltaparent 621 deltaparent = revlog.deltaparent
622 sparse = revlog._sparserevlog
617 good = None 623 good = None
618 624
619 deltas_limit = textlen * LIMIT_DELTA2TEXT 625 deltas_limit = textlen * LIMIT_DELTA2TEXT
620 626
621 tested = set([nullrev]) 627 tested = set([nullrev])
641 if rev in tested: 647 if rev in tested:
642 continue 648 continue
643 tested.add(rev) 649 tested.add(rev)
644 # filter out delta base that will never produce good delta 650 # filter out delta base that will never produce good delta
645 if deltas_limit < revlog.length(rev): 651 if deltas_limit < revlog.length(rev):
652 continue
653 if sparse and revlog.rawsize(rev) < (textlen // LIMIT_BASE2TEXT):
646 continue 654 continue
647 # no delta for rawtext-changing revs (see "candelta" for why) 655 # no delta for rawtext-changing revs (see "candelta" for why)
648 if revlog.flags(rev) & REVIDX_RAWTEXT_CHANGING_FLAGS: 656 if revlog.flags(rev) & REVIDX_RAWTEXT_CHANGING_FLAGS:
649 continue 657 continue
650 group.append(rev) 658 group.append(rev)
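
The two size filters in the loop above can be tried in isolation with the following minimal sketch. It is not Mercurial's code: FakeRevlog and filter_bases are hypothetical names invented for illustration, and the value of LIMIT_DELTA2TEXT is an assumption, since the diff shows only its name. LIMIT_BASE2TEXT = 50 is taken from the changeset.

```python
# Illustrative sketch only; FakeRevlog is a hypothetical stand-in,
# not Mercurial's revlog class.
LIMIT_DELTA2TEXT = 2   # assumed value; the diff only shows the name
LIMIT_BASE2TEXT = 50   # value introduced by this changeset


class FakeRevlog:
    """Hypothetical minimal revlog: maps rev -> (stored length, raw size)."""

    def __init__(self, sizes, sparse=True):
        self._sizes = sizes
        self._sparserevlog = sparse

    def length(self, rev):
        # size of the chunk stored for this revision
        return self._sizes[rev][0]

    def rawsize(self, rev):
        # size of the revision's full text
        return self._sizes[rev][1]


def filter_bases(revlog, textlen, revs):
    """Return the revs that survive the two size filters from the diff."""
    deltas_limit = textlen * LIMIT_DELTA2TEXT
    sparse = revlog._sparserevlog
    kept = []
    for rev in revs:
        # the stored chunk is already larger than any acceptable delta
        if deltas_limit < revlog.length(rev):
            continue
        # new filter: the base full text is far smaller than the target
        if sparse and revlog.rawsize(rev) < (textlen // LIMIT_BASE2TEXT):
            continue
        kept.append(rev)
    return kept


sizes = {
    0: (40, 100),           # tiny full snapshot
    1: (900, 20000),        # comparable in size to the target
    2: (50000, 100000),     # stored chunk exceeds the delta limit
}
textlen = 10000
sparse_result = filter_bases(FakeRevlog(sizes, sparse=True), textlen, [0, 1, 2])
plain_result = filter_bases(FakeRevlog(sizes, sparse=False), textlen, [0, 1, 2])
```

With a target of 10000 bytes the cutoff is 10000 // 50 = 200 bytes, so the 100-byte snapshot at rev 0 is dropped only when the sparse flag is set, mirroring the `sparse and ...` guard in the changeset.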