summaryrefslogtreecommitdiffstats
path: root/CHANGELOG.md
diff options
context:
space:
mode:
authorAndrew Gallant <jamslam@gmail.com>2019-04-07 18:43:01 -0400
committerAndrew Gallant <jamslam@gmail.com>2019-04-07 19:11:03 -0400
commit09108b7fda7af6db7c1c4f0366301f9a21cc485d (patch)
treefc1afd8e8b036312f97965f51d7de1e5e9d0db7d /CHANGELOG.md
parent743d64f2e4093a3302895e128fbbc58e6fb8ed18 (diff)
regex: make multi-literal searcher faster
This makes the case of searching for a dictionary of a very large number of literals much much faster. (~10x or so.) In particular, we achieve this by short-circuiting the construction of a full regex when we know we have a simple alternation of literals. Building the regex for a large dictionary (>100,000 literals) turns out to be quite slow, even if it internally will dispatch to Aho-Corasick. Even that isn't quite enough. It turns out that even *parsing* such a regex is quite slow. So when the -F/--fixed-strings flag is set, we short circuit regex parsing completely and jump straight to Aho-Corasick. We aren't quite as fast as GNU grep here, but it's much closer (less than 2x slower). In general, this is somewhat of a hack. In particular, it seems plausible that this optimization could be implemented entirely in the regex engine. Unfortunately, the regex engine's internals are just not amenable to this at all, so it would require a larger refactoring effort. For now, it's good enough to add this fairly simple hack at a higher level. Unfortunately, if you don't pass -F/--fixed-strings, then ripgrep will be slower, because of the aforementioned missing optimization. Moreover, passing flags like `-i` or `-S` will cause ripgrep to abandon this optimization and fall back to something potentially much slower. Again, this fix really needs to happen inside the regex engine, although we might be able to special case -i when the input literals are pure ASCII via Aho-Corasick's `ascii_case_insensitive`. Fixes #497, Fixes #838
Diffstat (limited to 'CHANGELOG.md')
-rw-r--r--CHANGELOG.md6
1 files changed, 6 insertions, 0 deletions
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 9f026268..0e9f381d 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -17,6 +17,12 @@ TODO.
available, however, it does increase compilation times substantially at the
moment.
+Performance improvements:
+
+* [PERF #497](https://github.com/BurntSushi/ripgrep/issues/497),
+ [PERF #838](https://github.com/BurntSushi/ripgrep/issues/838):
+ Make `rg -F -f dictionary-of-literals` much faster.
+
Feature enhancements:
* [FEATURE #1099](https://github.com/BurntSushi/ripgrep/pull/1099):