author: Andrew Gallant <jamslam@gmail.com> 2019-04-08 19:28:38 -0400
committer: Andrew Gallant <jamslam@gmail.com> 2019-04-14 19:29:27 -0400
commit: a7d26c8f144a4957b75f71087a66692d0b25759a
tree: 4888ac5ea66643ac919d4e12c60cc51992bef11a /grep-printer/src/summary.rs
parent: bd222ae93fa0cabe7d51ba8db40ece99579bdaed
binary: rejigger ripgrep's handling of binary files
This commit attempts to surface binary filtering in a slightly more user friendly way. Namely, before, ripgrep would silently stop searching a file if it detected a NUL byte, even if it had previously printed a match. This can lead to the user quite reasonably assuming that there are no more matches, since a partial search is fairly unintuitive. (ripgrep has this behavior by default because it really wants to NOT search binary files at all, just like it doesn't search gitignored or hidden files.)

With this commit, if a match has already been printed and ripgrep detects a NUL byte, then it will print a warning message indicating that the search stopped prematurely.

Moreover, this commit adds a new flag, --binary, which causes ripgrep to stop filtering binary files, but in a way that still avoids dumping binary data into terminals. That is, the --binary flag makes ripgrep behave more like grep's default behavior.

For files explicitly specified in a search, e.g., `rg foo some-file`, no binary filtering is applied (just like no gitignore and no hidden file filtering is applied). Instead, ripgrep behaves as if you gave the --binary flag for all explicitly given files.

This was a fairly invasive change, and potentially increases the UX complexity of ripgrep around binary files. (Before, there were two binary modes, whereas now there are three.) However, ripgrep is now a bit louder with warning messages when binary file detection might otherwise be hiding potential matches, so hopefully this is a net improvement.

Finally, the `-uuu` convenience now maps to `--no-ignore --hidden --binary`, since this is closer to the actual intent of the `--unrestricted` flag, i.e., to reduce ripgrep's smart filtering. As a consequence, `rg -uuu foo` should now search roughly the same number of bytes as `grep -r foo`, and `rg -uuua foo` should search roughly the same number of bytes as `grep -ra foo`. (The "roughly" weasel word is used because grep's and ripgrep's binary file detection might differ somewhat, perhaps based on buffer sizes, which can impact exactly what is and isn't searched.)

See the numerous tests in tests/binary.rs for intended behavior.

Fixes #306, Fixes #855
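The behavior described above, together with the summary printer change in the diff that follows, can be sketched as a small standalone model. This is only an illustration under stated assumptions: `BinaryMode`, `binary_mode` and `summary_match_count` are hypothetical names and are not ripgrep's types or functions. The assumption that the third mode corresponds to the existing `-a/--text` flag is a reading of the `rg -uuua` example in the message, not something the commit states outright.

```rust
/// Hypothetical model of the three binary modes mentioned in the commit
/// message. These names are illustrative and are not ripgrep's own types.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum BinaryMode {
    /// Default for files found by directory traversal: stop searching at
    /// the first NUL byte, so binary detection acts as a filter.
    Quit,
    /// `--binary`, or a file named explicitly on the command line: search
    /// the whole file, but avoid dumping binary data to the terminal.
    SearchAndSuppress,
    /// `-a/--text` (assumed to be the third mode): treat the file as text.
    Text,
}

/// Pick a mode from the (assumed) inputs: whether the path was given
/// explicitly, and whether `--binary` or `-a/--text` was passed.
pub fn binary_mode(explicit_path: bool, binary_flag: bool, text_flag: bool) -> BinaryMode {
    if text_flag {
        BinaryMode::Text
    } else if explicit_path || binary_flag {
        BinaryMode::SearchAndSuppress
    } else {
        BinaryMode::Quit
    }
}

/// Mirror of the check added to the summary printer: if binary data was
/// seen and the mode quits on binary data, the "official" match count is
/// squashed to zero so that a partial count is never reported.
pub fn summary_match_count(
    mode: BinaryMode,
    binary_byte_offset: Option<u64>,
    match_count: u64,
) -> u64 {
    if binary_byte_offset.is_some() && mode == BinaryMode::Quit {
        0
    } else {
        match_count
    }
}
```

In this model, a file reached by directory traversal with no flags is searched in `Quit` mode, so three matches followed by a NUL byte at offset 1024 are reported as zero matches. The same file named explicitly on the command line is searched in `SearchAndSuppress` mode and keeps its count of three.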
Diffstat (limited to 'grep-printer/src/summary.rs')
-rw-r--r-- grep-printer/src/summary.rs 28
1 files changed, 28 insertions, 0 deletions
diff --git a/grep-printer/src/summary.rs b/grep-printer/src/summary.rs
index deb7e609..a1c7785e 100644
--- a/grep-printer/src/summary.rs
+++ b/grep-printer/src/summary.rs
@@ -636,6 +636,34 @@ impl<'p, 's, M: Matcher, W: WriteColor> Sink for SummarySink<'p, 's, M, W> {
             stats.add_bytes_searched(finish.byte_count());
             stats.add_bytes_printed(self.summary.wtr.borrow().count());
         }
+        // If our binary detection method says to quit after seeing binary
+        // data, then we shouldn't print any results at all, even if we've
+        // found a match before detecting binary data. The intent here is to
+        // keep BinaryDetection::quit as a form of filter. Otherwise, we can
+        // present a matching file with a smaller number of matches than
+        // there might be, which can be quite misleading.
+        //
+        // If our binary detection method is to convert binary data, then we
+        // don't quit and therefore search the entire contents of the file.
+        //
+        // There is an unfortunate inconsistency here. Namely, when using
+        // Quiet or PathWithMatch, then the printer can quit after the first
+        // match seen, which could be long before seeing binary data. This
+        // means that using PathWithMatch can print a path where as using
+        // Count might not print it at all because of binary data.
+        //
+        // It's not possible to fix this without also potentially significantly
+        // impacting the performance of Quiet or PathWithMatch, so we accept
+        // the bug.
+        if self.binary_byte_offset.is_some()
+            && searcher.binary_detection().quit_byte().is_some()
+        {
+            // Squash the match count. The statistics reported will still
+            // contain the match count, but the "official" match count should
+            // be zero.
+            self.match_count = 0;
+            return Ok(());
+        }
         let show_count =
             !self.summary.config.exclude_zero