view rust/hg-core/src/lib.rs @ 52556:1866119cbad7
rust-ignore: construct regex Hir object directly, avoiding large regex string
Rework how we convert patterns to regexes in Rust.
Instead of going patterns -> string -> Regex, which is slow and causes
some correctness issues, build a structured regex_syntax::hir::Hir value,
which is faster and also prevents patterns from escaping their intended scope.
This change makes the time of `build_regex_match` go from ~70-80ms
to ~40ms in my testing (for a large hgignore).
The bug I mentioned involves regex patterns that "escape" their
intended scope. For example, a sequence of hgignore regexp patterns like
this would previously lead to surprising behavior:
foo(?:
bar
baz
)
This matches foobar and foobaz, and doesn't match bar and baz.
The new behavior is to report a pattern parse error.
The Python hg also has this bug, so this bugfix is
not really helping much, but it's probably better to
fall back to real Python bugs than to simulate them.
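The scope-escape problem described above can be illustrated without a regex engine. The sketch below is not the code from this change: it assumes the patterns are joined with `|` (an assumption about the old string-based path), and it stands in for real regex parsing with a toy parenthesis counter that ignores escapes and character classes. The names `parens_balanced`, `validate_joined` and `validate_each` are hypothetical.

```rust
/// Returns true when every `(` in `pattern` has a matching `)`.
/// Toy check only: it ignores escapes (`\(`) and character classes (`[(]`).
fn parens_balanced(pattern: &str) -> bool {
    let mut depth: i32 = 0;
    for c in pattern.chars() {
        match c {
            '(' => depth += 1,
            ')' => {
                depth -= 1;
                if depth < 0 {
                    return false;
                }
            }
            _ => {}
        }
    }
    depth == 0
}

/// String-based approach: glue all patterns together, then validate once.
/// Joining with `|` is an assumption made for this illustration.
fn validate_joined(patterns: &[&str]) -> bool {
    parens_balanced(&patterns.join("|"))
}

/// Structured approach: validate each pattern on its own and report the
/// index of the first pattern that does not stand alone.
fn validate_each(patterns: &[&str]) -> Result<(), usize> {
    match patterns.iter().position(|p| !parens_balanced(p)) {
        Some(index) => Err(index),
        None => Ok(()),
    }
}

fn main() {
    let patterns = ["foo(?:", "bar", "baz", ")"];
    // The joined string is well formed, so the broken pieces go unnoticed.
    println!("joined accepted: {}", validate_joined(&patterns));
    // Checking each pattern separately flags the first one.
    println!("per-pattern result: {:?}", validate_each(&patterns));
}
```

Validating (or, in the real change, parsing) each pattern separately is what lets an unbalanced pattern such as `foo(?:` be reported as a parse error instead of silently absorbing the patterns that follow it.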
author | Arseniy Alekseyev <aalekseyev@janestreet.com> |
---|---|
date | Fri, 06 Dec 2024 20:27:59 +0000 |
parents | 22d24f6d6411 |
children | 1b7a57a5b47a |
// Copyright 2018-2020 Georges Racinet <georges.racinet@octobus.net>
// and Mercurial contributors
//
// This software may be used and distributed according to the terms of the
// GNU General Public License version 2 or any later version.

mod ancestors;
pub mod dagops;
pub mod errors;
pub mod narrow;
pub mod sparse;
pub use ancestors::{AncestorsIterator, MissingAncestors};
pub mod dirstate;
pub mod discovery;
pub mod exit_codes;
pub mod fncache;
pub mod requirements;
pub mod testing; // unconditionally built, for use from integration tests

// Export very common type to make discovery easier
pub use dirstate::DirstateParents;
pub mod copy_tracing;
pub mod filepatterns;
pub mod matchers;
pub mod repo;
pub mod revlog;
// Export very common types to make discovery easier
pub use revlog::{
    BaseRevision, Graph, GraphError, Node, NodePrefix, Revision,
    UncheckedRevision, NULL_NODE, NULL_NODE_ID, NULL_REVISION,
    WORKING_DIRECTORY_HEX, WORKING_DIRECTORY_REVISION,
};
pub mod checkexec;
pub mod config;
pub mod lock;
pub mod logging;
pub mod operations;
mod pre_regex;
pub mod progress;
pub mod revset;
pub mod transaction;
pub mod update;
pub mod utils;
pub mod vfs;

use std::{collections::HashMap, sync::atomic::AtomicBool};
use twox_hash::RandomXxHashBuilder64;

/// Used to communicate with threads spawned from code within this crate that
/// they should stop their work (SIGINT was received).
pub static INTERRUPT_RECEIVED: AtomicBool = AtomicBool::new(false);

pub type LineNumber = usize;

/// Rust's default hasher is too slow because it tries to prevent collision
/// attacks. We are not concerned about those: if an ill-minded person has
/// write access to your repository, you have other issues.
pub type FastHashMap<K, V> = HashMap<K, V, RandomXxHashBuilder64>;

// TODO: should this be the default `FastHashMap` for all of hg-core, not just
// dirstate? How does XxHash compare with AHash, hashbrown’s default?
pub type FastHashbrownMap<K, V> =
    hashbrown::HashMap<K, V, RandomXxHashBuilder64>;
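The `FastHashMap` alias above works by swapping the third type parameter of `HashMap`, the hasher builder. As a minimal std-only sketch of that mechanism, the example below plugs in a hand-written FNV-1a hasher instead of xxHash. `Fnv1a` and `DemoFastMap` are hypothetical names used only for illustration, and unlike `RandomXxHashBuilder64` this stand-in is not randomly seeded.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// 64-bit FNV-1a: a simple non-cryptographic hash, used here only as a
/// stand-in for xxHash so that the example needs no external crates.
pub struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf29ce484222325) // FNV-1a 64-bit offset basis
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x100000001b3); // FNV 64-bit prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// Hypothetical alias with the same shape as `FastHashMap`: only the
/// hasher builder differs from a plain `HashMap<K, V>`.
pub type DemoFastMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1a>>;

fn main() {
    // `HashMap::new()` is only defined for the default hasher, so maps
    // with a custom hasher are created through `Default` instead.
    let mut map: DemoFastMap<&str, u32> = DemoFastMap::default();
    map.insert("tip", 52556);
    println!("{:?}", map.get("tip"));
}
```

Because the hasher is part of the map's type, callers of such an alias use the usual `HashMap` API unchanged; the trade-off stated in the doc comment above (speed over resistance to collision attacks) is made once, at the alias definition.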