comparison mercurial/revset.py @ 16446:984e0412e82b

revset: speedup matching() by first matching fields that take less time to match This patch sorts the fields that are passed to the matching function so that it always starts by matching those fields that take less time to match. Not all fields take the same amount of time to match. I've done several measurements running the following command: hg --time log -r "matching(1, field)" on the mercurial repository, and where 'field' was each one of the fields accepted by match. In order to avoid the print overhead (which could be different for different fields, given the different number of matches) I used a modified version of the matching() function which always returns no matches. These tests showed that different fields take wildly different amounts of time to match. Particulary the substate field takes up to 25 seconds to match on my machine, compared to the 0.3 seconds that takes to match the phase field or the 2 seconds (approx) that takes to match most fields. With this patch, matching both the phase and the substate of a revision takes the same amount of time as matching the phase. The field match order introduced by this patch is as follows: phase, parents, user, date, branch, summary, files, description, substate An extra nice thing about this patch is that it makes the match time stable.
author Angel Ezquerra <angel.ezquerra@gmail.com>
date Sat, 14 Apr 2012 01:41:03 +0200
parents 453c8670566c
children 8a72805034d5
comparison
equal deleted inserted replaced
16445:453c8670566c 16446:984e0412e82b
948 if 'summary' in fields and 'description' in fields: 948 if 'summary' in fields and 'description' in fields:
949 # If a revision matches its description it also matches its summary 949 # If a revision matches its description it also matches its summary
950 fields.discard('summary') 950 fields.discard('summary')
951 951
952 # We may want to match more than one field 952 # We may want to match more than one field
953 # Not all fields take the same amount of time to be matched
954 # Sort the selected fields in order of increasing matching cost
955 fieldorder = ('phase', 'parents', 'user', 'date', 'branch', 'summary',
956 'files', 'description', 'substate',)
957 def fieldkeyfunc(f):
958 try:
959 return fieldorder.index(f)
960 except ValueError:
961 # assume an unknown field is very costly
962 return len(fieldorder)
963 fields = list(fields)
964 fields.sort(key=fieldkeyfunc)
965
953 # Each field will be matched with its own "getfield" function 966 # Each field will be matched with its own "getfield" function
954 # which will be added to the getfieldfuncs array of functions 967 # which will be added to the getfieldfuncs array of functions
955 getfieldfuncs = [] 968 getfieldfuncs = []
956 _funcs = { 969 _funcs = {
957 'user': lambda r: repo[r].user(), 970 'user': lambda r: repo[r].user(),