diff rust/hg-core/src/matchers.rs @ 44832:ad1ec40975aa

rust-regex: fix issues with regex anchoring and performance

It turns out that the way I tried to work around `regex`'s behavior difference with `re2` and Python's `re` was 1) buggy and 2) much more complicated than needed.

In a few words: `regex` adds `.*` on either side of patterns when no start or end anchor is present. My previous workaround put `^` or `$` for every pattern, which is wrong even without the other 2 bugs on top of it. Using `^(?:<patterns>)` right at the end of the `regex` path fixes the issue.

I've opened an issue to get a build option instead: https://github.com/rust-lang/regex/issues/675

Differential Revision: https://phab.mercurial-scm.org/D8506
author Raphaël Gomès <rgomes@octobus.net>
date Thu, 07 May 2020 23:52:08 +0200
parents de0fb4463a3d
children fd3b94f1712d
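
The fix described in the commit message can be sketched in isolation. This is a minimal standalone illustration, not the code from `matchers.rs`: the function name `anchor_and_escape` is hypothetical, and it uses only the standard library, so it shows how the pattern bytes are built but does not compile them with `regex`. The non-capturing group matters because alternation binds more loosely than an anchor: `^foo|bar` anchors only `foo`, while `^(?:foo|bar)` anchors both alternatives.

```rust
use std::io::Write;

/// Hypothetical helper (name assumed for this sketch): wraps `pattern` in
/// `^(?:...)` and escapes every non-ASCII byte as `\xNN`, mirroring the
/// byte handling shown in the diff.
fn anchor_and_escape(pattern: &[u8]) -> Vec<u8> {
    // Start anchor plus the opening of a non-capturing group, so the anchor
    // applies to the whole pattern and not just its first alternative.
    let mut escaped = b"^(?:".to_vec();
    for &byte in pattern {
        if byte > 127 {
            // Bytes 0x80..=0xff always format as two hex digits.
            write!(escaped, "\\x{:x}", byte).unwrap();
        } else {
            escaped.push(byte);
        }
    }
    // Close the non-capturing group.
    escaped.push(b')');
    escaped
}
```

Under these assumptions, `anchor_and_escape(b"foo|bar")` yields the bytes of `^(?:foo|bar)`, and a non-ASCII byte such as `0xe9` is emitted as the four characters `\xe9`.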
--- a/rust/hg-core/src/matchers.rs	Thu May 07 16:56:03 2020 -0400
+++ b/rust/hg-core/src/matchers.rs	Thu May 07 23:52:08 2020 +0200
@@ -347,7 +347,9 @@
 ) -> PatternResult<impl Fn(&HgPath) -> bool + Sync> {
     use std::io::Write;
 
-    let mut escaped_bytes = vec![];
+    // The `regex` crate adds `.*` to the start and end of expressions if there
+    // are no anchors, so add the start anchor.
+    let mut escaped_bytes = vec![b'^', b'(', b'?', b':'];
     for byte in pattern {
         if *byte > 127 {
             write!(escaped_bytes, "\\x{:x}", *byte).unwrap();
@@ -355,6 +357,7 @@
             escaped_bytes.push(*byte);
         }
     }
+    escaped_bytes.push(b')');
 
     // Avoid the cost of UTF8 checking
     //