path: root/README.md
author Andrew Gallant <jamslam@gmail.com> 2020-03-15 13:19:45 -0400
committer Andrew Gallant <jamslam@gmail.com> 2020-03-15 13:21:18 -0400
commit a38913b63a104d779f7870be5cc92d6523449177
tree 7582a73ccc856665f60edcf9d8031054475db898 /README.md
parent e772a95b58b77604e9a4657e949431539ec15b08
readme: update benchmarks

This also updates the corpora used, so previous times (and counts) are not comparable.

We also remove some tools, like pt, sift and ucg, since they appear to be no longer maintained. ag isn't really maintained either, but it still has significant mind share, so we retain a benchmark for it.

We also upgrade ack to version 3, and remove the clarification on how `-w` is implemented.

We also add `git grep -P` (uses PCRE2), which appears to be much faster than `git grep -E`.

Finally, we add ugrep, which is a new up-and-comer in this space.

Fixes #1474

Diffstat (limited to 'README.md')
-rw-r--r-- README.md 57
1 files changed, 28 insertions, 29 deletions
diff --git a/README.md b/README.md
index 760643fd..3e20e159 100644
--- a/README.md
+++ b/README.md
@@ -38,10 +38,11 @@ Please see the [CHANGELOG](CHANGELOG.md) for a release history.
### Quick examples comparing tools
-This example searches the entire Linux kernel source tree (after running
-`make defconfig && make -j8`) for `[A-Z]+_SUSPEND`, where all matches must be
-words. Timings were collected on a system with an Intel i7-6900K 3.2 GHz, and
-ripgrep was compiled with SIMD enabled.
+This example searches the entire
+[Linux kernel source tree](https://github.com/BurntSushi/linux)
+(after running `make defconfig && make -j8`) for `[A-Z]+_SUSPEND`, where
+all matches must be words. Timings were collected on a system with an Intel
+i7-6900K 3.2 GHz.
Please remember that a single benchmark is never enough! See my
[blog post on ripgrep](https://blog.burntsushi.net/ripgrep/)
@@ -49,40 +50,38 @@ for a very detailed comparison with more benchmarks and analysis.
| Tool | Command | Line count | Time |
| ---- | ------- | ---------- | ---- |
-| ripgrep (Unicode) | `rg -n -w '[A-Z]+_SUSPEND'` | 450 | **0.106s** |
-| [git grep](https://www.kernel.org/pub/software/scm/git/docs/git-grep.html) | `LC_ALL=C git grep -E -n -w '[A-Z]+_SUSPEND'` | 450 | 0.553s |
-| [The Silver Searcher](https://github.com/ggreer/the_silver_searcher) | `ag -w '[A-Z]+_SUSPEND'` | 450 | 0.589s |
-| [git grep (Unicode)](https://www.kernel.org/pub/software/scm/git/docs/git-grep.html) | `LC_ALL=en_US.UTF-8 git grep -E -n -w '[A-Z]+_SUSPEND'` | 450 | 2.266s |
-| [sift](https://github.com/svent/sift) | `sift --git -n -w '[A-Z]+_SUSPEND'` | 450 | 3.505s |
-| [ack](https://github.com/beyondgrep/ack2) | `ack -w '[A-Z]+_SUSPEND'` | 1878 | 6.823s |
-| [The Platinum Searcher](https://github.com/monochromegane/the_platinum_searcher) | `pt -w -e '[A-Z]+_SUSPEND'` | 450 | 14.208s |
-
-(Yes, `ack` [has](https://github.com/beyondgrep/ack2/issues/445) a
-[bug](https://github.com/beyondgrep/ack2/issues/14).)
-
-Here's another benchmark that disregards gitignore files and searches with a
-whitelist instead. The corpus is the same as in the previous benchmark, and the
-flags passed to each command ensure that they are doing equivalent work:
+| ripgrep (Unicode) | `rg -n -w '[A-Z]+_SUSPEND'` | 452 | **0.136s** |
+| [git grep](https://www.kernel.org/pub/software/scm/git/docs/git-grep.html) | `git grep -P -n -w '[A-Z]+_SUSPEND'` | 452 | 0.348s |
+| [ugrep](https://github.com/Genivia/ugrep) | `ugrep -r --ignore-files --no-hidden -I -w '[A-Z]+_SUSPEND'` | 452 | 0.506s |
+| [git grep](https://www.kernel.org/pub/software/scm/git/docs/git-grep.html) | `LC_ALL=C git grep -E -n -w '[A-Z]+_SUSPEND'` | 452 | 1.150s |
+| [The Silver Searcher](https://github.com/ggreer/the_silver_searcher) | `ag -w '[A-Z]+_SUSPEND'` | 452 | 0.654s |
+| [ack](https://github.com/beyondgrep/ack3) | `ack -w '[A-Z]+_SUSPEND'` | 452 | 4.054s |
+| [git grep (Unicode)](https://www.kernel.org/pub/software/scm/git/docs/git-grep.html) | `LC_ALL=en_US.UTF-8 git grep -E -n -w '[A-Z]+_SUSPEND'` | 452 | 4.205s |
+
+Here's another benchmark on the same corpus as above that disregards gitignore
+files and searches with a whitelist instead. The corpus is the same as in the
+previous benchmark, and the flags passed to each command ensure that they are
+doing equivalent work:
| Tool | Command | Line count | Time |
| ---- | ------- | ---------- | ---- |
-| ripgrep | `rg -L -u -tc -n -w '[A-Z]+_SUSPEND'` | 404 | **0.079s** |
-| [ucg](https://github.com/gvansickle/ucg) | `ucg --type=cc -w '[A-Z]+_SUSPEND'` | 390 | 0.163s |
-| [GNU grep](https://www.gnu.org/software/grep/) | `egrep -R -n --include='*.c' --include='*.h' -w '[A-Z]+_SUSPEND'` | 404 | 0.611s |
-
-(`ucg` [has slightly different behavior in the presence of symbolic links](https://github.com/gvansickle/ucg/issues/106).)
+| ripgrep | `rg -uuu -tc -n -w '[A-Z]+_SUSPEND'` | 388 | **0.096s** |
+| [ugrep](https://github.com/Genivia/ugrep) | `ugrep -r -n --include='*.c' --include='*.h' -w '[A-Z]+_SUSPEND'` | 388 | 0.493s |
+| [GNU grep](https://www.gnu.org/software/grep/) | `egrep -r -n --include='*.c' --include='*.h' -w '[A-Z]+_SUSPEND'` | 388 | 0.806s |
-And finally, a straight-up comparison between ripgrep and GNU grep on a single
-large file (~9.3GB,
-[`OpenSubtitles2016.raw.en.gz`](http://opus.lingfil.uu.se/OpenSubtitles2016/mono/OpenSubtitles2016.raw.en.gz)):
+And finally, a straight-up comparison between ripgrep, ugrep and GNU grep on a
+single large file
+(~9.3GB, [`OpenSubtitles.raw.en.gz`](http://opus.nlpl.eu/download.php?f=OpenSubtitles/v2018/mono/OpenSubtitles.raw.en.gz)):
| Tool | Command | Line count | Time |
| ---- | ------- | ---------- | ---- |
-| ripgrep | `rg -w 'Sherlock [A-Z]\w+'` | 5268 | **2.108s** |
-| [GNU grep](https://www.gnu.org/software/grep/) | `LC_ALL=C egrep -w 'Sherlock [A-Z]\w+'` | 5268 | 7.014s |
+| ripgrep | `rg -w 'Sherlock [A-Z]\w+'` | 7882 | **2.769s** |
+| [ugrep](https://github.com/Genivia/ugrep) | `ugrep -w 'Sherlock [A-Z]\w+'` | 7882 | 6.802s |
+| [GNU grep](https://www.gnu.org/software/grep/) | `LC_ALL=en_US.UTF-8 egrep -w 'Sherlock [A-Z]\w+'` | 7882 | 9.027s |
In the above benchmark, passing the `-n` flag (for showing line numbers)
-increases the times to `2.640s` for ripgrep and `10.277s` for GNU grep.
+increases the times to `3.423s` for ripgrep and `13.031s` for GNU grep. ugrep
+times are unaffected by the presence or absence of `-n`.
### Why should I use ripgrep?
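
The patch above does not say how the "Line count" and "Time" columns were collected. As a rough illustration only, a single measurement could be taken along the following lines. This is a minimal sketch: the helper name `time_command` is hypothetical, and a single run with no control over the filesystem cache is an assumption made here, not the README author's method.

```python
import subprocess
import sys
import time


def time_command(argv, cwd=None):
    """Run argv once and return (newline-terminated output lines, seconds).

    Hypothetical helper for illustration; not part of ripgrep or its README.
    """
    start = time.perf_counter()
    result = subprocess.run(
        argv,
        cwd=cwd,
        stdout=subprocess.PIPE,      # capture matches so they can be counted
        stderr=subprocess.DEVNULL,   # ignore warnings such as unreadable files
        check=False,                 # grep-like tools exit non-zero on no match
    )
    elapsed = time.perf_counter() - start
    # Count newline bytes, the same way `wc -l` would.
    return result.stdout.count(b"\n"), elapsed
```

For example, `time_command(["rg", "-n", "-w", "[A-Z]+_SUSPEND"], cwd="linux")` would return a line count and a wall-clock time for the first ripgrep row, assuming `rg` is installed and `linux` is a local checkout of the corpus (both are assumptions). Capturing the output in memory adds some overhead, and a single sample is noisy, so repeating the run several times on a warm cache and comparing the samples is likely to be more informative.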