comparison mercurial/util.py @ 23097:30124c40d11f stable

util.fspath: use a dict rather than a linear scan for lookups Previously, we'd scan through the entire directory listing looking for a normalized match. This is O(N) in the number of files in the directory. If we decide to call util.fspath on each file in it, the overall complexity works out to O(N^2). This becomes a problem with directories a few thousand files or larger. Switch to using a dictionary instead. There is a slightly higher upfront cost to pay, but for cases like the above this is amortized O(1). Plus there is a lower constant factor because generator comprehensions are faster than for loops, so overall it works out to be a very small loss in performance for 1 file, and a huge gain when there's more. For a large repo with around 200k files in it on a case-insensitive file system, for a large directory with over 30,000 files in it, the following command was tested: ls | shuf -n $COUNT | xargs hg status This command leads to util.fspath being called on $COUNT files in the directory. COUNT before after 1 0.77s 0.78s 100 1.42s 0.80s 1000 6.3s 0.96s I also tested with COUNT=10000, but before took too long so I gave up.
author Siddharth Agarwal <sid0@fb.com>
date Fri, 24 Oct 2014 11:39:39 -0700
parents c312ef382033
children e53f6b72a0e4
comparison
equal deleted inserted replaced
23096:a622c3fa10a5 23097:30124c40d11f
897 Note that this function is unnecessary, and should not be 897 Note that this function is unnecessary, and should not be
898 called, for case-sensitive filesystems (simply because it's expensive). 898 called, for case-sensitive filesystems (simply because it's expensive).
899 899
900 The root should be normcase-ed, too. 900 The root should be normcase-ed, too.
901 ''' 901 '''
902 def find(p, contents): 902 def _makefspathcacheentry(dir):
903 for n in contents: 903 return dict((normcase(n), n) for n in os.listdir(dir))
904 if normcase(n) == p:
905 return n
906 return None
907 904
908 seps = os.sep 905 seps = os.sep
909 if os.altsep: 906 if os.altsep:
910 seps = seps + os.altsep 907 seps = seps + os.altsep
911 # Protect backslashes. This gets silly very quickly. 908 # Protect backslashes. This gets silly very quickly.
917 if sep: 914 if sep:
918 result.append(sep) 915 result.append(sep)
919 continue 916 continue
920 917
921 if dir not in _fspathcache: 918 if dir not in _fspathcache:
922 _fspathcache[dir] = os.listdir(dir) 919 _fspathcache[dir] = _makefspathcacheentry(dir)
923 contents = _fspathcache[dir] 920 contents = _fspathcache[dir]
924 921
925 found = find(part, contents) 922 found = contents.get(part)
926 if not found: 923 if not found:
927 # retry "once per directory" per "dirstate.walk" which 924 # retry "once per directory" per "dirstate.walk" which
928 # may take place for each patches of "hg qpush", for example 925 # may take place for each patches of "hg qpush", for example
929 contents = os.listdir(dir) 926 _fspathcache[dir] = contents = _makefspathcacheentry(dir)
930 _fspathcache[dir] = contents 927 found = contents.get(part)
931 found = find(part, contents)
932 928
933 result.append(found or part) 929 result.append(found or part)
934 dir = os.path.join(dir, part) 930 dir = os.path.join(dir, part)
935 931
936 return ''.join(result) 932 return ''.join(result)