mercurial-scm/hg: hgext/git/index.py comparison

comparison hgext/git/index.py @ 52622:aa5844ade247

git: speed up possible head processing during indexing by ~100x

Benchmarking of 50 iterations of indexing (see below) shows essentially no difference for small repos (<1k commits). Medium repos (~12k commits) see some benefit, but other overheads completely overwhelm it. For large repos (~122k commits), the 80-100x speedup is clearly visible to the user.

All of the numbers are in seconds and were measured with time.time() calls placed in _index_repo(). The times exclude the time taken by changedfiles processing.

Small repo (guilt, 553 commits, 1 head):

     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
0.0008781 0.0009274 0.0009800 0.0012285 0.0014637 0.0024107  (before)
0.0003092 0.0003281 0.0003519 0.0003777 0.0003927 0.0006843  (after)

Medium repo (hamlib, 12k commits, 53 heads):

    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
 0.04881  0.05135  0.07632  0.06672  0.08042  0.09415  (before)
0.004249 0.004420 0.004799 0.004809 0.005051 0.006416  (after)

Large repo (qemu, 122k commits, 50 heads):

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  4.274   4.595   4.832   6.578   8.397   9.721  (before)
0.05180 0.05643 0.05865 0.06130 0.06712 0.06872  (after)
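The speedup implied by these figures can be checked with a little arithmetic. The Python sketch below divides the reported "before" median and mean by the corresponding "after" values for each repo. The dictionary layout and the names `timings`, `speedups` and `results` are illustrative only and are not part of the commit.

```python
# Median and mean indexing times in seconds, copied from the benchmark
# figures in the commit message.
# Layout (illustrative): repo -> {"before": (median, mean), "after": (median, mean)}
timings = {
    "guilt": {"before": (0.0009800, 0.0012285), "after": (0.0003519, 0.0003777)},
    "hamlib": {"before": (0.07632, 0.06672), "after": (0.004799, 0.004809)},
    "qemu": {"before": (4.832, 6.578), "after": (0.05865, 0.06130)},
}


def speedups(entry):
    """Return (median ratio, mean ratio) of before/after times."""
    (median_before, mean_before) = entry["before"]
    (median_after, mean_after) = entry["after"]
    return median_before / median_after, mean_before / mean_after


results = {repo: speedups(entry) for repo, entry in timings.items()}

for repo, (by_median, by_mean) in results.items():
    print(f"{repo}: {by_median:.1f}x by median, {by_mean:.1f}x by mean")
```

For qemu this works out to roughly 82x by median and 107x by mean, in line with the quoted 80-100x. The ratios for the smaller repos are lower, and the absolute time saved there is at most tens of milliseconds.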

author	Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
date	Wed, 02 Oct 2024 15:01:26 -0400
parents	f4733654f144
children	4e2ea270ba6a


-:ab4fb2d15bc9
+:aa5844ade247
 from . import gitutil
 pygit2 = gitutil.get_pygit2()
-_CURRENT_SCHEMA_VERSION = 1
+_CURRENT_SCHEMA_VERSION = 2
 _SCHEMA = (
 """
 CREATE TABLE refs (
 -- node and name are unique together. There may be more than one name for
 -- a given node, and there may be no name at all for a given node (in the
 -- The "possible heads" of the repository, which we use to figure out
 -- if we need to re-walk the changelog.
 CREATE TABLE possible_heads (
 node TEXT NOT NULL
 );
+CREATE UNIQUE INDEX possible_heads_idx ON possible_heads(node);
 -- The topological heads of the changelog, which hg depends on.
 CREATE TABLE heads (
 node TEXT NOT NULL
 );
 'p2filenode) VALUES(?, ?, ?, ?, ?, ?, ?)',
 (commit.id.hex, p, n, None, None, None, None),
 )
 db.execute('DELETE FROM heads')
 db.execute('DELETE FROM possible_heads')
-for hid in possible_heads:
-    h = hid.hex
-    db.execute('INSERT INTO possible_heads (node) VALUES(?)', (h,))
-    haschild = db.execute(
-        'SELECT COUNT(*) FROM changelog WHERE p1 = ? OR p2 = ?', (h, h)
-    ).fetchone()[0]
-    if not haschild:
-        db.execute('INSERT INTO heads (node) VALUES(?)', (h,))
+db.executemany(
+    'INSERT INTO possible_heads (node) VALUES(?)',
+    [(hid.hex,) for hid in possible_heads],
+)
+db.execute(
+    '''
+    INSERT INTO heads (node)
+    SELECT node FROM possible_heads WHERE
+    node NOT IN (
+    SELECT DISTINCT possible_heads.node FROM changelog, possible_heads WHERE
+    changelog.p1 = possible_heads.node OR
+    changelog.p2 = possible_heads.node
+    )
+    '''
+)
 db.commit()
 if prog is not None:
 prog.complete()
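
The change replaces one COUNT(*) query per possible head with a single batched insert followed by one set-based INSERT ... SELECT. The following is a minimal, self-contained sketch of that pattern using Python's sqlite3 module and an in-memory database. The `changelog` table here is deliberately simplified to three columns, and the node names, the `compute_heads` helper and the toy history are hypothetical; only the two statements inside the helper mirror the diff.

```python
import sqlite3


def compute_heads(db, possible_heads):
    """Rebuild possible_heads and heads the way the new code does."""
    db.execute('DELETE FROM heads')
    db.execute('DELETE FROM possible_heads')
    # One batched insert instead of one execute() call per head.
    db.executemany(
        'INSERT INTO possible_heads (node) VALUES(?)',
        [(h,) for h in possible_heads],
    )
    # A possible head is a real head only if no changelog row names it
    # as p1 or p2; SQLite evaluates this as one set-based statement.
    db.execute(
        '''
        INSERT INTO heads (node)
        SELECT node FROM possible_heads WHERE
        node NOT IN (
            SELECT DISTINCT possible_heads.node
            FROM changelog, possible_heads WHERE
                changelog.p1 = possible_heads.node OR
                changelog.p2 = possible_heads.node
        )
        '''
    )
    db.commit()
    return sorted(row[0] for row in db.execute('SELECT node FROM heads'))


db = sqlite3.connect(':memory:')
db.executescript(
    '''
    -- Simplified stand-in for the real changelog table (assumption).
    CREATE TABLE changelog (node TEXT NOT NULL, p1 TEXT, p2 TEXT);
    CREATE TABLE possible_heads (node TEXT NOT NULL);
    CREATE UNIQUE INDEX possible_heads_idx ON possible_heads(node);
    CREATE TABLE heads (node TEXT NOT NULL);
    '''
)

# Toy history: a <- b <- c, plus d branching off b.
db.executemany(
    'INSERT INTO changelog (node, p1, p2) VALUES(?, ?, ?)',
    [('a', None, None), ('b', 'a', None), ('c', 'b', None), ('d', 'b', None)],
)

heads = compute_heads(db, ['b', 'c', 'd'])
print(heads)
```

In this toy history `b` is a parent of both `c` and `d`, so it is filtered out and only `c` and `d` end up in `heads`. Note that the unique index added by the schema change also means a duplicate entry in `possible_heads` would raise sqlite3.IntegrityError in this sketch.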
