Mercurial > public > mercurial-scm > hg-stable
comparison mercurial/util.py @ 23097:30124c40d11f stable
util.fspath: use a dict rather than a linear scan for lookups
Previously, we'd scan through the entire directory listing looking for a
normalized match. This is O(N) in the number of files in the directory. If we
decide to call util.fspath on each file in it, the overall complexity works out
to O(N^2). This becomes a problem with directories a few thousand files or
larger.
Switch to using a dictionary instead. There is a slightly higher upfront cost
to pay, but for cases like the above this is amortized O(1). Plus there is a
lower constant factor because generator comprehensions are faster than for
loops, so overall it works out to be a very small loss in performance for 1
file, and a huge gain when there's more.
For a large repo with around 200k files in it on a case-insensitive file
system, for a large directory with over 30,000 files in it, the following
command was tested:
ls | shuf -n $COUNT | xargs hg status
This command leads to util.fspath being called on $COUNT files in the
directory.
COUNT before after
1 0.77s 0.78s
100 1.42s 0.80s
1000 6.3s 0.96s
I also tested with COUNT=10000, but before took too long so I gave up.
author | Siddharth Agarwal <sid0@fb.com> |
---|---|
date | Fri, 24 Oct 2014 11:39:39 -0700 |
parents | c312ef382033 |
children | e53f6b72a0e4 |
comparison
equal
deleted
inserted
replaced
23096:a622c3fa10a5 | 23097:30124c40d11f |
---|---|
897 Note that this function is unnecessary, and should not be | 897 Note that this function is unnecessary, and should not be |
898 called, for case-sensitive filesystems (simply because it's expensive). | 898 called, for case-sensitive filesystems (simply because it's expensive). |
899 | 899 |
900 The root should be normcase-ed, too. | 900 The root should be normcase-ed, too. |
901 ''' | 901 ''' |
902 def find(p, contents): | 902 def _makefspathcacheentry(dir): |
903 for n in contents: | 903 return dict((normcase(n), n) for n in os.listdir(dir)) |
904 if normcase(n) == p: | |
905 return n | |
906 return None | |
907 | 904 |
908 seps = os.sep | 905 seps = os.sep |
909 if os.altsep: | 906 if os.altsep: |
910 seps = seps + os.altsep | 907 seps = seps + os.altsep |
911 # Protect backslashes. This gets silly very quickly. | 908 # Protect backslashes. This gets silly very quickly. |
917 if sep: | 914 if sep: |
918 result.append(sep) | 915 result.append(sep) |
919 continue | 916 continue |
920 | 917 |
921 if dir not in _fspathcache: | 918 if dir not in _fspathcache: |
922 _fspathcache[dir] = os.listdir(dir) | 919 _fspathcache[dir] = _makefspathcacheentry(dir) |
923 contents = _fspathcache[dir] | 920 contents = _fspathcache[dir] |
924 | 921 |
925 found = find(part, contents) | 922 found = contents.get(part) |
926 if not found: | 923 if not found: |
927 # retry "once per directory" per "dirstate.walk" which | 924 # retry "once per directory" per "dirstate.walk" which |
928 # may take place for each patches of "hg qpush", for example | 925 # may take place for each patches of "hg qpush", for example |
929 contents = os.listdir(dir) | 926 _fspathcache[dir] = contents = _makefspathcacheentry(dir) |
930 _fspathcache[dir] = contents | 927 found = contents.get(part) |
931 found = find(part, contents) | |
932 | 928 |
933 result.append(found or part) | 929 result.append(found or part) |
934 dir = os.path.join(dir, part) | 930 dir = os.path.join(dir, part) |
935 | 931 |
936 return ''.join(result) | 932 return ''.join(result) |