changeset 52622:aa5844ade247
git: speed up possible head processing during indexing by ~100x
Benchmarking of 50 iterations of indexing (see below) shows that small
repos (<1k commits) see essentially no difference, medium repos (~12k
commits) see some benefit that is completely overwhelmed by other
overheads, and large repos (~122k commits) see an 80-100x speedup that is
clearly visible to the user.
All of the numbers are in seconds and were measured with time.time() calls
placed in _index_repo(). The times exclude the time taken by changedfiles
processing.
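
The measurement approach described above might look roughly like the sketch below. `time_iterations` and `summarize` are hypothetical helper names, not part of hgext/git; the actual numbers came from time.time() calls placed inside _index_repo() itself, and the quantile method used here is an assumption.

```python
import statistics
import time


def time_iterations(fn, iterations=50):
    """Call fn() repeatedly and return each call's wall-clock duration in seconds."""
    samples = []
    for _ in range(iterations):
        start = time.time()
        fn()
        samples.append(time.time() - start)
    return samples


def summarize(samples):
    """Return the six summary columns used in the tables (min, quartiles, mean, max)."""
    # Assumption: 'inclusive' quartiles; the commit does not say which method was used.
    q1, median, q3 = statistics.quantiles(samples, n=4, method='inclusive')
    return {
        'min': min(samples),
        'q1': q1,
        'median': median,
        'mean': statistics.fmean(samples),
        'q3': q3,
        'max': max(samples),
    }
```

Passing the code under test as `fn` and feeding the result to `summarize` yields one row in the same shape as the before/after rows.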
Small repo (guilt, 553 commits, 1 head):
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
0.0008781 0.0009274 0.0009800 0.0012285 0.0014637 0.0024107 (before)
0.0003092 0.0003281 0.0003519 0.0003777 0.0003927 0.0006843 (after)
Medium repo (hamlib, 12k commits, 53 heads):
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
 0.04881  0.05135  0.07632  0.06672  0.08042  0.09415 (before)
0.004249 0.004420 0.004799 0.004809 0.005051 0.006416 (after)
Large repo (qemu, 122k commits, 50 heads):
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  4.274   4.595   4.832   6.578   8.397   9.721 (before)
0.05180 0.05643 0.05865 0.06130 0.06712 0.06872 (after)
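
As a quick check of the 80-100x figure, the before/after ratios can be computed from the median and mean columns of the three tables. The dictionary layout and function name are illustrative only; the numbers are copied from the tables above.

```python
# (median, mean) in seconds, copied from the benchmark tables.
TIMINGS = {
    'guilt': {'before': (0.0009800, 0.0012285), 'after': (0.0003519, 0.0003777)},
    'hamlib': {'before': (0.07632, 0.06672), 'after': (0.004799, 0.004809)},
    'qemu': {'before': (4.832, 6.578), 'after': (0.05865, 0.06130)},
}


def speedups(timings):
    """Return {repo: (median_ratio, mean_ratio)}, each computed as before / after."""
    result = {}
    for repo, t in timings.items():
        (med_before, mean_before) = t['before']
        (med_after, mean_after) = t['after']
        result[repo] = (med_before / med_after, mean_before / mean_after)
    return result


if __name__ == '__main__':
    for repo, (med, mean) in speedups(TIMINGS).items():
        print(f'{repo}: median {med:.1f}x, mean {mean:.1f}x')
```

For qemu this works out to roughly 82x on the median and 107x on the mean. On the median, hamlib comes to about 16x and guilt to about 2.8x.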
author | Josef 'Jeff' Sipek <jeffpc@josefsipek.net> |
---|---|
date | Wed, 02 Oct 2024 15:01:26 -0400 |
parents | ab4fb2d15bc9 |
children | 4e2ea270ba6a |
files | hgext/git/index.py |
diffstat | 1 files changed, 18 insertions(+), 9 deletions(-) |
--- a/hgext/git/index.py	Wed Oct 02 14:53:24 2024 -0400
+++ b/hgext/git/index.py	Wed Oct 02 15:01:26 2024 -0400
@@ -18,7 +18,7 @@
 
 pygit2 = gitutil.get_pygit2()
 
-_CURRENT_SCHEMA_VERSION = 1
+_CURRENT_SCHEMA_VERSION = 2
 _SCHEMA = (
     """
 CREATE TABLE refs (
@@ -35,6 +35,8 @@
   node TEXT NOT NULL
 );
 
+CREATE UNIQUE INDEX possible_heads_idx ON possible_heads(node);
+
 -- The topological heads of the changelog, which hg depends on.
 CREATE TABLE heads (
   node TEXT NOT NULL
@@ -331,14 +333,21 @@
         )
     db.execute('DELETE FROM heads')
     db.execute('DELETE FROM possible_heads')
-    for hid in possible_heads:
-        h = hid.hex
-        db.execute('INSERT INTO possible_heads (node) VALUES(?)', (h,))
-        haschild = db.execute(
-            'SELECT COUNT(*) FROM changelog WHERE p1 = ? OR p2 = ?', (h, h)
-        ).fetchone()[0]
-        if not haschild:
-            db.execute('INSERT INTO heads (node) VALUES(?)', (h,))
+    db.executemany(
+        'INSERT INTO possible_heads (node) VALUES(?)',
+        [(hid.hex,) for hid in possible_heads],
+    )
+    db.execute(
+        '''
+        INSERT INTO heads (node)
+            SELECT node FROM possible_heads WHERE
+                node NOT IN (
+                    SELECT DISTINCT possible_heads.node FROM changelog, possible_heads WHERE
+                        changelog.p1 = possible_heads.node OR
+                        changelog.p2 = possible_heads.node
+                )
+        '''
+    )
     db.commit()
 
     if prog is not None:
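
To illustrate that the rewritten query selects the same heads as the old per-head loop, here is a minimal, self-contained sketch using Python's sqlite3 module on an in-memory database. The schema is a simplified assumption (the real changelog table in hgext/git/index.py may have more columns), and the helper names, the NULL_ID placeholder, and the toy history are all hypothetical.

```python
import sqlite3

NULL_ID = '0' * 40  # hypothetical placeholder for "no parent"

# Toy history: a <- b <- c, plus a second child d of b.
COMMITS = [
    ('a', NULL_ID, NULL_ID),
    ('b', 'a', NULL_ID),
    ('c', 'b', NULL_ID),
    ('d', 'b', NULL_ID),
]


def build_db(commits, possible_heads):
    """Create a simplified (assumed) schema and load the sample data."""
    db = sqlite3.connect(':memory:')
    db.executescript('''
        CREATE TABLE changelog (node TEXT NOT NULL, p1 TEXT, p2 TEXT);
        CREATE TABLE possible_heads (node TEXT NOT NULL);
        CREATE UNIQUE INDEX possible_heads_idx ON possible_heads(node);
        CREATE TABLE heads (node TEXT NOT NULL);
    ''')
    db.executemany(
        'INSERT INTO changelog (node, p1, p2) VALUES(?, ?, ?)', commits
    )
    db.executemany(
        'INSERT INTO possible_heads (node) VALUES(?)',
        [(h,) for h in possible_heads],
    )
    return db


def heads_per_row(db):
    """Old approach: one child-count query per possible head."""
    heads = []
    for (h,) in db.execute('SELECT node FROM possible_heads').fetchall():
        haschild = db.execute(
            'SELECT COUNT(*) FROM changelog WHERE p1 = ? OR p2 = ?', (h, h)
        ).fetchone()[0]
        if not haschild:
            heads.append(h)
    return sorted(heads)


def heads_set_based(db):
    """New approach: a single INSERT ... SELECT fills the heads table."""
    db.execute('DELETE FROM heads')
    db.execute('''
        INSERT INTO heads (node)
        SELECT node FROM possible_heads WHERE
            node NOT IN (
                SELECT DISTINCT possible_heads.node
                FROM changelog, possible_heads WHERE
                    changelog.p1 = possible_heads.node OR
                    changelog.p2 = possible_heads.node
            )
    ''')
    return sorted(row[0] for row in db.execute('SELECT node FROM heads'))


if __name__ == '__main__':
    db = build_db(COMMITS, ['b', 'c', 'd'])
    print(heads_per_row(db), heads_set_based(db))
```

With b, c and d as possible heads, both functions report c and d as the topological heads, since b has children. The old approach issues one SELECT per possible head, while the new one lets the database resolve all of them in a single statement.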