Mercurial > public > mercurial-scm > hg
diff rust/hg-core/src/matchers.rs @ 44832:ad1ec40975aa
rust-regex: fix issues with regex anchoring and performance
It turns out that the way I tried to work around `regex`'s behavior difference
with `re2` and Python's `re` was 1) buggy and 2) much more complicated than
needed.
In a few words:
`regex` adds `.*` on either side of patterns when no start or end anchor is
present. My previous workaround put `^` or `$` for every pattern, which is
wrong even without the other 2 bugs on top of it.
Using `^(?:<patterns>)` right at the end of the `regex` path fixes the issue.
I've opened an issue to get a build option instead:
https://github.com/rust-lang/regex/issues/675
Differential Revision: https://phab.mercurial-scm.org/D8506
author | Rapha?l Gom?s <rgomes@octobus.net> |
---|---|
date | Thu, 07 May 2020 23:52:08 +0200 |
parents | de0fb4463a3d |
children | fd3b94f1712d |
line wrap: on
line diff
--- a/rust/hg-core/src/matchers.rs Thu May 07 16:56:03 2020 -0400 +++ b/rust/hg-core/src/matchers.rs Thu May 07 23:52:08 2020 +0200 @@ -347,7 +347,9 @@ ) -> PatternResult<impl Fn(&HgPath) -> bool + Sync> { use std::io::Write; - let mut escaped_bytes = vec![]; + // The `regex` crate adds `.*` to the start and end of expressions if there + // are no anchors, so add the start anchor. + let mut escaped_bytes = vec![b'^', b'(', b'?', b':']; for byte in pattern { if *byte > 127 { write!(escaped_bytes, "\\x{:x}", *byte).unwrap(); @@ -355,6 +357,7 @@ escaped_bytes.push(*byte); } } + escaped_bytes.push(b')'); // Avoid the cost of UTF8 checking //