author Andrew Gallant <jamslam@gmail.com> 2019-04-07 18:43:01 -0400
committer Andrew Gallant <jamslam@gmail.com> 2019-04-07 19:11:03 -0400
commit 09108b7fda7af6db7c1c4f0366301f9a21cc485d
tree fc1afd8e8b036312f97965f51d7de1e5e9d0db7d /Cargo.lock
parent 743d64f2e4093a3302895e128fbbc58e6fb8ed18
regex: make multi-literal searcher faster
This makes the case of searching for a dictionary of a very large number of literals much much faster. (~10x or so.) In particular, we achieve this by short-circuiting the construction of a full regex when we know we have a simple alternation of literals. Building the regex for a large dictionary (>100,000 literals) turns out to be quite slow, even if it internally will dispatch to Aho-Corasick.

Even that isn't quite enough. It turns out that even *parsing* such a regex is quite slow. So when the -F/--fixed-strings flag is set, we short circuit regex parsing completely and jump straight to Aho-Corasick.

We aren't quite as fast as GNU grep here, but it's much closer (less than 2x slower).

In general, this is somewhat of a hack. In particular, it seems plausible that this optimization could be implemented entirely in the regex engine. Unfortunately, the regex engine's internals are just not amenable to this at all, so it would require a larger refactoring effort. For now, it's good enough to add this fairly simple hack at a higher level.

Unfortunately, if you don't pass -F/--fixed-strings, then ripgrep will be slower, because of the aforementioned missing optimization. Moreover, passing flags like `-i` or `-S` will cause ripgrep to abandon this optimization and fall back to something potentially much slower. Again, this fix really needs to happen inside the regex engine, although we might be able to special case -i when the input literals are pure ASCII via Aho-Corasick's `ascii_case_insensitive`.

Fixes #497, Fixes #838
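The flag interplay described above can be sketched as a small decision function. This is only an illustration of the -F fast path, not the code from this commit: the `Strategy` enum, the `Flags` struct, and `choose_strategy` are hypothetical names invented here, and the sketch does not cover the separate case where an already-parsed regex turns out to be a plain alternation of literals.

```rust
/// Hypothetical search strategies. The names are illustrative and are
/// not taken from ripgrep's source.
#[derive(Debug, PartialEq, Eq)]
pub enum Strategy {
    /// Skip regex parsing and hand the literals straight to a
    /// multi-pattern matcher such as Aho-Corasick.
    MultiLiteral,
    /// Parse and build a full regex (potentially much slower for a
    /// very large dictionary of literals).
    FullRegex,
}

/// Hypothetical subset of command-line flags relevant to the decision.
#[derive(Debug, Clone, Copy, Default)]
pub struct Flags {
    /// -F/--fixed-strings
    pub fixed_strings: bool,
    /// -i
    pub case_insensitive: bool,
    /// -S
    pub smart_case: bool,
}

/// Pick the fast path only when the patterns are known to be literals
/// (-F) and no case-folding flag forces a fallback to the regex engine.
pub fn choose_strategy(flags: Flags) -> Strategy {
    if flags.fixed_strings && !flags.case_insensitive && !flags.smart_case {
        Strategy::MultiLiteral
    } else {
        Strategy::FullRegex
    }
}
```

Under these assumptions, only `-F` on its own selects the fast path; adding `-i` or `-S`, or omitting `-F`, falls back to the full regex. A special case for `-i` with pure-ASCII literals, as the message speculates, would add a third branch here.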
Diffstat (limited to 'Cargo.lock')
-rw-r--r-- Cargo.lock | 1
1 files changed, 1 insertions, 0 deletions
diff --git a/Cargo.lock b/Cargo.lock
index 9c633d45..1a895d17 100644
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -211,6 +211,7 @@ dependencies = [
 name = "grep-regex"
 version = "0.1.2"
 dependencies = [
+ "aho-corasick 0.7.3 (registry+https://github.com/rust-lang/crates.io-index)",
  "grep-matcher 0.1.1",
  "log 0.4.6 (registry+https://github.com/rust-lang/crates.io-index)",
  "regex 1.1.5 (registry+https://github.com/rust-lang/crates.io-index)",