comparison mercurial/util.py @ 27371:45d996a566d7

util: reimplement lrucachedict

As part of attempting to more aggressively use the existing lrucachedict,
collections.deque operations were frequently showing up in profiling output,
negating the benefits of caching.

Searching the internet seems to tell me that the most efficient way to
implement an LRU cache in Python is to have a dict indexing the cached
entries and a doubly linked list tracking the freshness of each entry. So,
this patch replaces our existing lrucachedict with a version using such a
pattern.

The recently introduced perflrucachedict command reveals the following
timings for 10,000 operations at various cache sizes for the existing cache:

n=4      init=0.004079 gets=0.003632 sets=0.005188 mixed=0.005402
n=8      init=0.004045 gets=0.003998 sets=0.005064 mixed=0.005328
n=16     init=0.004011 gets=0.004496 sets=0.005021 mixed=0.005555
n=32     init=0.004064 gets=0.005611 sets=0.005188 mixed=0.006189
n=64     init=0.003975 gets=0.007684 sets=0.005178 mixed=0.007245
n=128    init=0.004121 gets=0.012005 sets=0.005422 mixed=0.009471
n=256    init=0.004143 gets=0.020295 sets=0.005227 mixed=0.013612
n=512    init=0.004039 gets=0.036703 sets=0.005243 mixed=0.020685
n=1024   init=0.004193 gets=0.068142 sets=0.005251 mixed=0.033064
n=2048   init=0.004070 gets=0.133383 sets=0.005160 mixed=0.050359
n=4096   init=0.004053 gets=0.265194 sets=0.004868 mixed=0.048352
n=8192   init=0.004087 gets=0.542218 sets=0.004562 mixed=0.032753
n=16384  init=0.004106 gets=1.064055 sets=0.004179 mixed=0.020367
n=32768  init=0.004034 gets=2.097620 sets=0.004260 mixed=0.013031
n=65536  init=0.004108 gets=4.106390 sets=0.004268 mixed=0.010191

As the data shows, the existing cache's retrieval performance degrades
linearly with cache size. (Keep in mind the microbenchmark is testing a
100% cache hit rate.)

The new cache implementation reveals the following:

n=4      init=0.006665 gets=0.006541 sets=0.005733 mixed=0.006876
n=8      init=0.006649 gets=0.006374 sets=0.005663 mixed=0.006899
n=16     init=0.006570 gets=0.006504 sets=0.005799 mixed=0.007057
n=32     init=0.006854 gets=0.006459 sets=0.005747 mixed=0.007034
n=64     init=0.006580 gets=0.006495 sets=0.005740 mixed=0.006992
n=128    init=0.006534 gets=0.006739 sets=0.005648 mixed=0.007124
n=256    init=0.006669 gets=0.006773 sets=0.005824 mixed=0.007151
n=512    init=0.006701 gets=0.007061 sets=0.006042 mixed=0.007372
n=1024   init=0.006641 gets=0.007620 sets=0.006387 mixed=0.007464
n=2048   init=0.006517 gets=0.008598 sets=0.006871 mixed=0.008077
n=4096   init=0.006720 gets=0.010933 sets=0.007854 mixed=0.008663
n=8192   init=0.007383 gets=0.015969 sets=0.010288 mixed=0.008896
n=16384  init=0.006660 gets=0.025447 sets=0.011208 mixed=0.008826
n=32768  init=0.006658 gets=0.044390 sets=0.011192 mixed=0.008943
n=65536  init=0.006836 gets=0.082736 sets=0.011151 mixed=0.008826

Let's go through the results.

The new cache takes longer to construct: ~6.6ms vs ~4.1ms. However, this is
measuring 10,000 __init__ calls, so the difference is ~0.2us/instance. We
currently only create lrucachedict instances for manifests, so this
regression is not likely relevant.

The new cache is slightly slower for retrievals at cache sizes < 1024. It's
worth noting that the only existing use of lrucachedict is in manifest.py,
where the default cache size is 4. This regression is worrisome. However,
for n=4 the delta is ~2.9ms for 10,000 lookups, or ~0.29us/op. Again, this
is a marginal regression and likely not relevant in the real world. Timing
`hg log -p -l 100` for mozilla-central reveals that cache lookup times are
dominated by decompression and fulltext resolution (even with lz4
manifests).

The new cache is significantly faster for retrievals at larger capacities.
Whereas the old implementation's retrieval time grows linearly with cache
capacity, the new cache is effectively constant time until much larger
capacities, and when retrieval time does start to climb, it remains a few
orders of magnitude faster than the current cache.

The new cache does appear to be slower for sets when capacity is large, but
performance is similar at smaller capacities. Of course, caches should
generally be optimized for retrieval performance, because a cache that
receives more sets than gets isn't worth having. If this regression is
worrisome: taking the largest regression, at n=65536, of ~6.9ms per 10,000
sets yields a regression of ~0.68us/op. This is not significant in the
grand scheme of things.

Overall, the new cache performs well at much larger capacities, which makes
it a generally more useful cache backend. While there are regressions, their
absolute values are extremely small. Since we aren't using lrucachedict
aggressively today, these regressions should not be relevant. The improved
scalability of lrucachedict should enable us to use it more aggressively for
more granular caching (read: higher-capacity caches) in the near future.

The impetus for this patch is to establish a cache of decompressed revlog
revisions, notably manifest revisions. Since delta chains can grow to
>10,000 entries and the cache hit rate can be high, the improved retrieval
performance of lrucachedict should be relevant.
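As a quick usage sketch of the new interface (illustrative only: it assumes
the class is reachable as mercurial.util.lrucachedict and uses the manifest
default capacity of 4):

    from mercurial import util

    d = util.lrucachedict(4)

    # Fill to capacity; entries are now ordered newest to oldest: d, c, b, a.
    for k in ('a', 'b', 'c', 'd'):
        d[k] = k.upper()

    d['a']        # a lookup refreshes 'a', making it the newest entry
    d['e'] = 'E'  # at capacity, so the oldest node ('b') is recycled

    assert 'b' not in d
    assert 'a' in d
    assert len(d) == 4

Note that eviction reuses the oldest node in the circular list rather than
allocating a new one, which is what keeps sets cheap once the cache is full.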
author Gregory Szorc <gregory.szorc@gmail.com>
date Sun, 06 Dec 2015 19:04:10 -0800
parents c7ab2087d436
children 4eeef1b2d689
comparison
27368:409a20314c64 27371:45d996a566d7
508 yield k, self[k] 508 yield k, self[k]
509 def insert(self, index, key, val): 509 def insert(self, index, key, val):
510 self._list.insert(index, key) 510 self._list.insert(index, key)
511 dict.__setitem__(self, key, val) 511 dict.__setitem__(self, key, val)
512 512
513 class _lrucachenode(object):
514 """A node in a doubly linked list.
515
516 Holds a reference to nodes on either side as well as a key-value
517 pair for the dictionary entry.
518 """
519 __slots__ = ('next', 'prev', 'key', 'value')
520
521 def __init__(self):
522 self.next = None
523 self.prev = None
524
525 self.key = _notset
526 self.value = None
527
528 def markempty(self):
529 """Mark the node as emptied."""
530 self.key = _notset
531
513 class lrucachedict(object): 532 class lrucachedict(object):
514 '''cache most recent gets from or sets to this dictionary''' 533 """Dict that caches most recent accesses and sets.
515 def __init__(self, maxsize): 534
535 The dict consists of an actual backing dict - indexed by original
536 key - and a doubly linked circular list defining the order of entries in
537 the cache.
538
539 The head node is the newest entry in the cache. If the cache is full,
540 we recycle head.prev and make it the new head. Cache accesses result in
541 the node being moved to before the existing head and being marked as the
542 new head node.
543
544 NOTE: construction of this class doesn't scale well if the cache size
545 is in the thousands. Avoid creating hundreds or thousands of instances
546 with large capacities.
547 """
548 def __init__(self, max):
516 self._cache = {} 549 self._cache = {}
517 self._maxsize = maxsize 550
518 self._order = collections.deque() 551 self._head = head = _lrucachenode()
519 552 head.prev = head
520 def __getitem__(self, key): 553 head.next = head
521 value = self._cache[key] 554 self._size = 1
522 self._order.remove(key) 555 self._capacity = max
523 self._order.append(key) 556
524 return value 557 def __len__(self):
525 558 return len(self._cache)
526 def __setitem__(self, key, value): 559
527 if key not in self._cache: 560 def __contains__(self, k):
528 if len(self._cache) >= self._maxsize: 561 return k in self._cache
529 del self._cache[self._order.popleft()] 562
563 def __iter__(self):
564 # We don't have to iterate in cache order, but why not.
565 n = self._head
566 for i in range(len(self._cache)):
567 yield n.key
568 n = n.next
569
570 def __getitem__(self, k):
571 node = self._cache[k]
572 self._movetohead(node)
573 return node.value
574
575 def __setitem__(self, k, v):
576 node = self._cache.get(k)
577 # Replace existing value and mark as newest.
578 if node is not None:
579 node.value = v
580 self._movetohead(node)
581 return
582
583 if self._size < self._capacity:
584 node = self._addcapacity()
530 else: 585 else:
531 self._order.remove(key) 586 # Grab the last/oldest item.
532 self._cache[key] = value 587 node = self._head.prev
533 self._order.append(key) 588
534 589 # At capacity. Kill the old entry.
535 def __contains__(self, key): 590 if node.key is not _notset:
536 return key in self._cache 591 del self._cache[node.key]
592
593 node.key = k
594 node.value = v
595 self._cache[k] = node
596 # And mark it as newest entry. No need to adjust order since it
597 # is already self._head.prev.
598 self._head = node
599
600 def __delitem__(self, k):
601 node = self._cache.pop(k)
602 node.markempty()
603
604 # Temporarily mark as newest item before re-adjusting head to make
605 # this node the oldest item.
606 self._movetohead(node)
607 self._head = node.next
608
609 # Additional dict methods.
610
611 def get(self, k, default=None):
612 try:
613 return self[k]
614 except KeyError:
615 return default
537 616
538 def clear(self): 617 def clear(self):
618 n = self._head
619 while n.key is not _notset:
620 n.markempty()
621 n = n.next
622
539 self._cache.clear() 623 self._cache.clear()
540 self._order = collections.deque() 624
625 def _movetohead(self, node):
626 """Mark a node as the newest, making it the new head.
627
628 When a node is accessed, it becomes the freshest entry in the LRU
629 list, which is denoted by self._head.
630
631 Visually, let's make ``N`` the new head node (* denotes head):
632
633 previous/oldest <-> head <-> next/next newest
634
635 ----<->--- A* ---<->-----
636 | |
637 E <-> D <-> N <-> C <-> B
638
639 To:
640
641 ----<->--- N* ---<->-----
642 | |
643 E <-> D <-> C <-> B <-> A
644
645 This requires the following moves:
646
647 C.next = D (node.prev.next = node.next)
648 D.prev = C (node.next.prev = node.prev)
649 E.next = N (head.prev.next = node)
650 N.prev = E (node.prev = head.prev)
651 N.next = A (node.next = head)
652 A.prev = N (head.prev = node)
653 """
654 head = self._head
655 # C.next = D
656 node.prev.next = node.next
657 # D.prev = C
658 node.next.prev = node.prev
659 # N.prev = E
660 node.prev = head.prev
661 # N.next = A
662 # It is tempting to do just "head" here, however if node is
663 # adjacent to head, this will do bad things.
664 node.next = head.prev.next
665 # E.next = N
666 node.next.prev = node
667 # A.prev = N
668 node.prev.next = node
669
670 self._head = node
671
672 def _addcapacity(self):
673 """Add a node to the circular linked list.
674
675 The new node is inserted before the head node.
676 """
677 head = self._head
678 node = _lrucachenode()
679 head.prev.next = node
680 node.prev = head.prev
681 node.next = head
682 head.prev = node
683 self._size += 1
684 return node
541 685
542 def lrucachefunc(func): 686 def lrucachefunc(func):
543 '''cache most recent results of function calls''' 687 '''cache most recent results of function calls'''
544 cache = {} 688 cache = {}
545 order = collections.deque() 689 order = collections.deque()