view rust/hg-core/src/operations/cat.rs @ 50692:1c31b343e514
match: add `filepath:` pattern to match an exact filepath relative to the root
It's useful in certain automated workflows to make sure we recurse in
directories whose name conflicts with files in other revisions.
In addition it makes it possible to avoid building a potentially costly regex,
improving performance when the set of files to match explicitly is large.
The benchmarks below were run in the following configuration:
# data-env-vars.name = mozilla-central-2018-08-01-zstd-sparse-revlog
# benchmark.name = files
# benchmark.variants.rev = tip
# benchmark.variants.files = all-list-filepath-sorted
# bin-env-vars.hg.flavor = no-rust
It also includes timings using the re2 engine (through the `google-re2` module)
to show how much can be saved by just using a better regexp engine.
Pattern                 time (seconds)   time using re2 (seconds)
-----------------------------------------------------------------
just "."                0.4              0.4
list of "filepath:…"    1.3              1.3
list of "path:…"        25.7             3.9
list of patterns        29.7             10.4
As you can see, without re2, using "filepath:" instead of "path:" is a huge
win. With re2, it is still about three times faster to not have to build the
regex.
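
To illustrate why an exact-path pattern can skip the regex entirely, here is a minimal sketch, assuming every pattern in the list carries the `filepath:` prefix. The `ExactMatcher` type and its methods are hypothetical names for illustration, not Mercurial's matcher API, and `String` stands in for Mercurial's byte-based paths.

```rust
use std::collections::HashSet;

/// Hypothetical matcher for a list made only of `filepath:` patterns.
/// Matching is a hash-set lookup, so no regex is ever compiled.
struct ExactMatcher {
    paths: HashSet<String>,
}

impl ExactMatcher {
    /// Returns `None` if any pattern lacks the `filepath:` prefix,
    /// in which case a caller would fall back to a general matcher.
    fn from_patterns<'a>(
        patterns: impl IntoIterator<Item = &'a str>,
    ) -> Option<Self> {
        let mut paths = HashSet::new();
        for pattern in patterns {
            paths.insert(pattern.strip_prefix("filepath:")?.to_owned());
        }
        Some(Self { paths })
    }

    /// True only when `path` equals one of the listed files exactly;
    /// directories and longer paths do not match.
    fn matches(&self, path: &str) -> bool {
        self.paths.contains(path)
    }
}
```

With this shape, the cost of building the matcher grows linearly with the number of files, and each lookup is a single hash probe.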
author | Raphaël Gomès <rgomes@octobus.net> |
---|---|
date | Mon, 12 Jun 2023 16:51:08 +0200 |
parents | 750409505286 |
children | 1928b770e3e7 |
// cat.rs
//
// Copyright 2020 Antoine Cezar <antoine.cezar@octobus.net>
//
// This software may be used and distributed according to the terms of the
// GNU General Public License version 2 or any later version.

use crate::repo::Repo;
use crate::revlog::Node;
use crate::revlog::RevlogError;

use crate::utils::hg_path::HgPath;

use crate::errors::HgError;
use crate::manifest::Manifest;
use crate::manifest::ManifestEntry;
use itertools::put_back;
use itertools::PutBack;
use std::cmp::Ordering;

pub struct CatOutput<'a> {
    /// Whether any file in the manifest matched the paths given as CLI
    /// arguments
    pub found_any: bool,
    /// The contents of matching files, in manifest order
    pub results: Vec<(&'a HgPath, Vec<u8>)>,
    /// Which of the CLI arguments did not match any manifest file
    pub missing: Vec<&'a HgPath>,
    /// The node ID that the given revset was resolved to
    pub node: Node,
}

// Find an item in an iterator over a sorted collection.
//
// Entries smaller than `needle` are consumed. The first entry greater than
// `needle` is put back so that the next lookup can still see it.
fn find_item<'a>(
    i: &mut PutBack<impl Iterator<Item = Result<ManifestEntry<'a>, HgError>>>,
    needle: &HgPath,
) -> Result<Option<Node>, HgError> {
    loop {
        match i.next() {
            None => return Ok(None),
            Some(result) => {
                let entry = result?;
                match needle.as_bytes().cmp(entry.path.as_bytes()) {
                    Ordering::Less => {
                        i.put_back(Ok(entry));
                        return Ok(None);
                    }
                    Ordering::Greater => continue,
                    Ordering::Equal => return Ok(Some(entry.node_id()?)),
                }
            }
        }
    }
}

// Tuple of (found, missing) paths in the manifest
type ManifestQueryResponse<'a> = (Vec<(&'a HgPath, Node)>, Vec<&'a HgPath>);

// `query` must yield paths in sorted order, since the manifest iterator is
// only walked forward.
fn find_files_in_manifest<'query>(
    manifest: &Manifest,
    query: impl Iterator<Item = &'query HgPath>,
) -> Result<ManifestQueryResponse<'query>, HgError> {
    let mut manifest = put_back(manifest.iter());
    let mut res = vec![];
    let mut missing = vec![];

    for file in query {
        match find_item(&mut manifest, file)? {
            None => missing.push(file),
            Some(item) => res.push((file, item)),
        }
    }
    Ok((res, missing))
}

/// Output the given revision of files
///
/// * `repo`: The repository to read from.
/// * `revset`: The revset resolving to the revision to cat the files from.
/// * `files`: The files to output.
pub fn cat<'a>(
    repo: &Repo,
    revset: &str,
    mut files: Vec<&'a HgPath>,
) -> Result<CatOutput<'a>, RevlogError> {
    let rev = crate::revset::resolve_single(revset, repo)?;
    let manifest = repo.manifest_for_rev(rev)?;
    let node = *repo
        .changelog()?
        .node_from_rev(rev)
        .expect("should succeed when repo.manifest did");
    let mut results: Vec<(&'a HgPath, Vec<u8>)> = vec![];
    let mut found_any = false;

    // The manifest lookup is a single forward pass, so the query must be
    // sorted first.
    files.sort_unstable();

    let (found, missing) =
        find_files_in_manifest(&manifest, files.into_iter())?;

    for (file_path, file_node) in found {
        found_any = true;
        let file_log = repo.filelog(file_path)?;
        results.push((
            file_path,
            file_log.data_for_node(file_node)?.into_file_data()?,
        ));
    }

    Ok(CatOutput {
        found_any,
        results,
        missing,
        node,
    })
}
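
The lookup above is a single forward pass over two sorted sequences. The standalone sketch below shows the same idea under simplifying assumptions: `&str` stands in for `HgPath`, `u32` stands in for `Node`, manifest entries cannot fail to parse, and the standard library's `Peekable::next_if` replaces `itertools::PutBack`. The name `split_found_missing` is hypothetical.

```rust
/// Split a sorted `query` into (found, missing) against a sorted `manifest`.
/// Both slices must be sorted in the same order for the single pass to work.
fn split_found_missing<'q>(
    manifest: &[(&str, u32)],
    query: &[&'q str],
) -> (Vec<(&'q str, u32)>, Vec<&'q str>) {
    let mut entries = manifest.iter().peekable();
    let mut found = Vec::new();
    let mut missing = Vec::new();

    for &needle in query {
        // Consume every manifest entry that sorts before the needle.
        while entries.next_if(|&&(path, _)| path < needle).is_some() {}

        // The next entry is either the needle itself or something larger.
        // A larger entry stays in the iterator for the next needle.
        match entries.next_if(|&&(path, _)| path == needle) {
            Some(&(_, id)) => found.push((needle, id)),
            None => missing.push(needle),
        }
    }
    (found, missing)
}
```

Because neither sequence is ever rewound, the cost is linear in the manifest size plus the query size, which is why the query has to be sorted before the lookup starts.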