| Field | Value | Date |
|---|---|---|
| author | Andrew Gallant <jamslam@gmail.com> | 2023-10-09 19:51:44 -0400 |
| committer | Andrew Gallant <jamslam@gmail.com> | 2023-10-09 20:29:52 -0400 |
| commit | 5011f6e9f1da44ffd923d612e75e70411d63a0ea | |
| tree | bd4af8c3c00736d39b4a8736ec109758d863cdf3 | |
| parent | a2799ccb41078c75a0a0420299a80ca4b1361632 | |
changelog: add perf bug fix for \b
Like the previous CHANGELOG entry, this marks a bug that was likely
fixed with the introduction of regex 1.9:

    $ hyperfine "rg-13.0.0 -ic '\bfoo\b \bbar\b' git-3a06386e.txt" "rg -ic '\bfoo\b \bbar\b' git-3a06386e.txt"
    Benchmark 1: rg-13.0.0 -ic '\bfoo\b \bbar\b' git-3a06386e.txt
      Time (mean ± σ): 1.034 s ± 0.011 s [User: 1.030 s, System: 0.004 s]
      Range (min … max): 1.021 s … 1.053 s 10 runs

    Benchmark 2: rg -ic '\bfoo\b \bbar\b' git-3a06386e.txt
      Time (mean ± σ): 6.3 ms ± 0.3 ms [User: 4.6 ms, System: 1.6 ms]
      Range (min … max): 5.6 ms … 7.3 ms 343 runs

    Summary
      'rg -ic '\bfoo\b \bbar\b' git-3a06386e.txt' ran
      164.95 ± 7.70 times faster than 'rg-13.0.0 -ic '\bfoo\b \bbar\b' git-3a06386e.txt'

This was not fixed by making \b itself faster, but rather by improving
inner literal extraction. Consequently, if no literals can be extracted
from the regex, search can still be quite slow:

    $ time rg-13.0.0 -ic '\b[a-z]{3}\b\s\b[a-z]{3}\b' git-3a06386e.txt
    57538

    real    0.427
    user    0.423
    sys     0.003
    maxmem  46 MB
    faults  0

    $ time rg -ic '\b[a-z]{3}\b\s\b[a-z]{3}\b' git-3a06386e.txt
    57538

    real    0.337
    user    0.333
    sys     0.003
    maxmem  46 MB
    faults  0

But then again, grep is slow here too, since it doesn't benefit from
any literal optimizations either:

    $ time grep -E -ic '\b[a-z]{3}\b\s\b[a-z]{3}\b' git-3a06386e.txt
    62396

    real    1.316
    user    1.292
    sys     0.007
    maxmem  13 MB
    faults  7

The count mismatch (57538 lines for rg versus 62396 for grep) should
probably be investigated.
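To illustrate why inner literal extraction helps a pattern like `\bfoo\b \bbar\b`, here is a minimal, hypothetical sketch in std-only Rust. It is not how ripgrep or the regex crate implement this; the function names (`line_matches`, `count_matches`, `is_word_boundary`) are made up for the example, and it assumes ASCII-only `\b` semantics, whereas ripgrep's `\b` is Unicode-aware by default. The idea: a fast substring search for the inner literal `foo bar` finds candidates, and the comparatively expensive `\b` checks run only at those candidates.

```rust
// Hypothetical sketch of an inner-literal prefilter for `\bfoo\b \bbar\b`
// with case-insensitive (-i) matching. NOT ripgrep's implementation.

/// ASCII-only word byte test (assumption: real `\b` is Unicode-aware).
fn is_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// True if there is a word boundary at byte offset `i` of `hay`.
fn is_word_boundary(hay: &[u8], i: usize) -> bool {
    let before = i > 0 && is_word_byte(hay[i - 1]);
    let after = i < hay.len() && is_word_byte(hay[i]);
    before != after
}

/// Returns true if `line` matches `\bfoo\b \bbar\b` case-insensitively.
fn line_matches(line: &str) -> bool {
    const LIT: &str = "foo bar";
    // ASCII lowercasing leaves byte offsets unchanged, so offsets found
    // in `lower` line up with the original line.
    let lower = line.to_ascii_lowercase();
    let hay = lower.as_bytes();
    // Step 1: fast substring search for the inner literal.
    // Step 2: verify the outer `\b` assertions only at candidates. The
    // inner ones (after "foo", before "bar") always hold, since a space
    // is not a word byte while 'o' and 'b' are.
    lower.match_indices(LIT).any(|(start, _)| {
        let end = start + LIT.len();
        is_word_boundary(hay, start) && is_word_boundary(hay, end)
    })
}

/// Counts matching lines, in the spirit of `rg -ic`.
fn count_matches(text: &str) -> usize {
    text.lines().filter(|l| line_matches(l)).count()
}

fn main() {
    let text = "Foo bar\nfoobar\nafoo bar\nthe FOO BAR.\nfoo  bar\n";
    // prints 2 ("Foo bar" and "the FOO BAR.")
    println!("{}", count_matches(text));
}
```

For a pattern such as `\b[a-z]{3}\b\s\b[a-z]{3}\b`, there is no literal to search for in step 1, so nothing lets the search skip most of the input, which is consistent with the slower timings above.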
Fixes #1760
Diffstat (limited to 'CHANGELOG.md')
-rw-r--r-- | CHANGELOG.md | 2 |
1 files changed, 2 insertions, 0 deletions

    diff --git a/CHANGELOG.md b/CHANGELOG.md
    index 3c160b9f..f9180118 100644
    --- a/CHANGELOG.md
    +++ b/CHANGELOG.md
    @@ -10,6 +10,8 @@ Unreleased changes. Release notes have not yet been written.
     Performance improvements:
    +* [PERF #1760](https://github.com/BurntSushi/ripgrep/issues/1760):
    +  Make most searches with `\b` look-arounds (among others) much faster.
     * [PERF #2591](https://github.com/BurntSushi/ripgrep/pull/2591):
       Parallel directory traversal now uses work stealing for faster searches.