comparison mercurial/revset.py @ 27028:f92053df8f0b

revset: speed up '_matchfiles'

File matching is done by applying the matcher to all elements in the 'file' field of all changesets in the repository. This requires reading and parsing all changesets in the repository and doing a lot of matching. However, about 1/3 of the function's time is spent creating 'changectx' objects and retrieving their 'file' field. This is far too much overhead, so we skip the changectx layer and access the data directly from the changelog.

This provides a significant speedup:

repository: mozilla central (252524 revisions)
command: hg perfrevset '_matchfiles("p:browser")'
Before: 15.899687s
After:  10.011705s

The slowdown from the changectx layer is even more significant if you have a lot of namespaces that slow down lookup.

The time is now spent with this approximate breakdown:

Matcher: 20%
  regexp matching: 10%
changelog.read: 80%
  reading revision: 60%
    checking hash: 15%
    decompression: 15%
    reading chunk: 30%
  changelog parsing: 20%
    decoding to local: 10%

The next easy win is probably to have more of the changelog stack implemented using the CPython API.
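As a quick check on the figures above, the following sketch computes the time saved and the speedup factor from the two reported timings. The variable names are illustrative only and do not come from Mercurial.

```python
# Timings reported in the commit message, in seconds.
before = 15.899687
after = 10.011705

# Fraction of the original run time that was removed.
saved_fraction = (before - after) / before

# How many times faster the new code is.
speedup = before / after

print("time saved: {:.1%}".format(saved_fraction))
print("speedup: {:.2f}x".format(speedup))
```

The saved fraction comes out a little above one third, which is in line with the "about 1/3" overhead attributed to the changectx layer.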
author Pierre-Yves David <pierre-yves.david@fb.com>
date Wed, 18 Nov 2015 23:23:03 -0800
parents a95c975f42e3
children 9e06e7fb037d
comparison
equal deleted inserted replaced
--- mercurial/revset.py @ 27027:a01ecbcfaf84
+++ mercurial/revset.py @ 27028:f92053df8f0b
@@ -1162,12 +1162,20 @@
         default = 'glob'
 
     m = matchmod.match(repo.root, repo.getcwd(), pats, include=inc,
                        exclude=exc, ctx=repo[rev], default=default)
 
+    # This directly read the changelog data as creating changectx for all
+    # revisions is quite expensive.
+    getchangeset = repo.changelog.read
+    wdirrev = node.wdirrev
     def matches(x):
-        for f in repo[x].files():
+        if x == wdirrev:
+            files = repo[x].files()
+        else:
+            files = getchangeset(x)[3]
+        for f in files:
             if m(f):
                 return True
         return False
 
     return subset.filter(matches)
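The pattern in this change can be tried outside Mercurial with a minimal sketch. Everything here is a stand-in: ToyChangelog, build_matches, and the sample entries are hypothetical, and the sketch only assumes what the diff itself shows, namely that read(rev) returns a tuple whose index 3 is the list of files and that the working directory revision is a sentinel that must take the slower path. The WDIRREV value is likewise an assumption made for illustration.

```python
# Assumed sentinel for the working directory pseudo-revision.
WDIRREV = 0x7fffffff


class ToyChangelog:
    """Hypothetical stand-in: read(rev) returns a tuple with files at index 3."""

    def __init__(self, entries):
        self._entries = entries

    def read(self, rev):
        return self._entries[rev]


def build_matches(changelog, wdir_files, match):
    # Bind the reader once so the per-revision closure avoids repeated
    # attribute lookups, as the diff does with getchangeset.
    getchangeset = changelog.read

    def matches(rev):
        if rev == WDIRREV:
            # The working directory has no changelog entry; use the slow path.
            files = wdir_files()
        else:
            files = getchangeset(rev)[3]
        for f in files:
            if match(f):
                return True
        return False

    return matches


log = ToyChangelog([
    ("m0", "alice", (0, 0), ["browser/app.js"], "first", {}),
    ("m1", "bob", (0, 0), ["docs/readme.txt"], "second", {}),
    ("m2", "carol", (0, 0), ["build.mk", "browser/ui.css"], "third", {}),
])
matches = build_matches(
    log,
    lambda: ["browser/wip.js"],           # files of the toy working directory
    lambda f: f.startswith("browser/"),   # toy matcher
)
selected = [rev for rev in (0, 1, 2, WDIRREV) if matches(rev)]
print(selected)
```

The special case for the working directory matters because that revision is not stored in the changelog, so only the context object can report its files.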