author Andrew Gallant <jamslam@gmail.com> 2019-04-08 19:28:38 -0400
committer Andrew Gallant <jamslam@gmail.com> 2019-04-14 19:29:27 -0400
commit a7d26c8f144a4957b75f71087a66692d0b25759a
tree 4888ac5ea66643ac919d4e12c60cc51992bef11a /src
parent bd222ae93fa0cabe7d51ba8db40ece99579bdaed
binary: rejigger ripgrep's handling of binary files
This commit attempts to surface binary filtering in a slightly more user friendly way. Namely, before, ripgrep would silently stop searching a file if it detected a NUL byte, even if it had previously printed a match. This can lead to the user quite reasonably assuming that there are no more matches, since a partial search is fairly unintuitive. (ripgrep has this behavior by default because it really wants to NOT search binary files at all, just like it doesn't search gitignored or hidden files.)

With this commit, if a match has already been printed and ripgrep detects a NUL byte, then it will print a warning message indicating that the search stopped prematurely.

Moreover, this commit adds a new flag, --binary, which causes ripgrep to stop filtering binary files, but in a way that still avoids dumping binary data into terminals. That is, the --binary flag makes ripgrep behave more like grep's default behavior.

For files explicitly specified in a search, e.g., `rg foo some-file`, no binary filtering is applied (just like no gitignore and no hidden file filtering is applied). Instead, ripgrep behaves as if you gave the --binary flag for all explicitly given files.

This was a fairly invasive change, and potentially increases the UX complexity of ripgrep around binary files. (Before, there were two binary modes, whereas now there are three.) However, ripgrep is now a bit louder with warning messages when binary file detection might otherwise be hiding potential matches, so hopefully this is a net improvement.

Finally, the `-uuu` convenience now maps to `--no-ignore --hidden --binary`, since this is closer to the actual intent of the `--unrestricted` flag, i.e., to reduce ripgrep's smart filtering. As a consequence, `rg -uuu foo` should now search roughly the same number of bytes as `grep -r foo`, and `rg -uuua foo` should search roughly the same number of bytes as `grep -ra foo`. (The "roughly" weasel word is used because grep's and ripgrep's binary file detection might differ somewhat---perhaps based on buffer sizes---which can impact exactly what is and isn't searched.)

See the numerous tests in tests/binary.rs for intended behavior.

Fixes #306, Fixes #855
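
The three binary modes and the explicit/implicit split described above can be summarized in a small sketch. This is not code from the commit: `BinaryMode`, `Flags`, and `choose_mode` are hypothetical names, and the sketch only mirrors the decision logic of `binary_detection_implicit` and `binary_detection_explicit` in the src/args.rs hunk of this diff. It does not perform any searching.

```rust
/// Hypothetical stand-in for the three binary modes named in the commit
/// message. The real code uses `grep::searcher::BinaryDetection`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BinaryMode {
    /// Stop searching a file once a NUL byte is seen. This is the default
    /// for files found via recursive directory traversal.
    Quit,
    /// Keep searching after a NUL byte, but avoid dumping binary data
    /// (the `--binary` behavior).
    Convert,
    /// No special NUL byte handling at all (the `-a/--text` behavior).
    None,
}

/// Hypothetical summary of the command line flags that matter here.
#[derive(Debug, Clone, Copy, Default)]
struct Flags {
    text: bool,              // -a/--text
    null_data: bool,         // --null-data
    binary: bool,            // --binary
    unrestricted_count: u32, // number of -u flags given
}

/// Pick a mode for one file. `explicit` is true when the file (or stdin)
/// was named directly by the user rather than found by traversal.
fn choose_mode(flags: Flags, explicit: bool) -> BinaryMode {
    if flags.text || flags.null_data {
        // Both the implicit and explicit cases disable detection here.
        BinaryMode::None
    } else if explicit || flags.binary || flags.unrestricted_count >= 3 {
        // Explicitly given files are never filtered, so they behave as if
        // `--binary` had been passed.
        BinaryMode::Convert
    } else {
        BinaryMode::Quit
    }
}
```

Under these assumptions, `choose_mode(Flags::default(), false)` yields `BinaryMode::Quit`, while the same flags with `explicit` set to `true` yield `BinaryMode::Convert`, which matches the `rg foo some-file` case in the commit message.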
Diffstat (limited to 'src')
-rw-r--r-- src/app.rs | 73
-rw-r--r-- src/args.rs | 45
-rw-r--r-- src/search.rs | 45
-rw-r--r-- src/subject.rs | 33
4 files changed, 167 insertions, 29 deletions
diff --git a/src/app.rs b/src/app.rs
index 66eaedb4..d062699f 100644
--- a/src/app.rs
+++ b/src/app.rs
@@ -27,6 +27,9 @@ configuration file. The file can specify one shell argument per line. Lines
starting with '#' are ignored. For more details, see the man page or the
README.
+Tip: to disable all smart filtering and make ripgrep behave a bit more like
+classical grep, use 'rg -uuu'.
+
Project home page: https://github.com/BurntSushi/ripgrep
Use -h for short descriptions and --help for more details.";
@@ -545,6 +548,7 @@ pub fn all_args_and_flags() -> Vec<RGArg> {
// "positive" flag.
flag_after_context(&mut args);
flag_before_context(&mut args);
+ flag_binary(&mut args);
flag_block_buffered(&mut args);
flag_byte_offset(&mut args);
flag_case_sensitive(&mut args);
@@ -691,6 +695,55 @@ This overrides the --context flag.
args.push(arg);
}
+fn flag_binary(args: &mut Vec<RGArg>) {
+ const SHORT: &str = "Search binary files.";
+ const LONG: &str = long!("\
+Enabling this flag will cause ripgrep to search binary files. By default,
+ripgrep attempts to automatically skip binary files in order to improve the
+relevance of results and make the search faster.
+
+Binary files are heuristically detected based on whether they contain a NUL
+byte or not. By default (without this flag set), once a NUL byte is seen,
+ripgrep will stop searching the file. Usually, NUL bytes occur in the beginning
+of most binary files. If a NUL byte occurs after a match, then ripgrep will
+still stop searching the rest of the file, but a warning will be printed.
+
+In contrast, when this flag is provided, ripgrep will continue searching a file
+even if a NUL byte is found. In particular, if a NUL byte is found then ripgrep
+will continue searching until either a match is found or the end of the file is
+reached, whichever comes sooner. If a match is found, then ripgrep will stop
+and print a warning saying that the search stopped prematurely.
+
+If you want ripgrep to search a file without any special NUL byte handling at
+all (and potentially print binary data to stdout), then you should use the
+'-a/--text' flag.
+
+The '--binary' flag is a flag for controlling ripgrep's automatic filtering
+mechanism. As such, it does not need to be used when searching a file
+explicitly or when searching stdin. That is, it is only applicable when
+recursively searching a directory.
+
+Note that when the '-u/--unrestricted' flag is provided for a third time, then
+this flag is automatically enabled.
+
+This flag can be disabled with '--no-binary'. It overrides the '-a/--text'
+flag.
+");
+ let arg = RGArg::switch("binary")
+ .help(SHORT).long_help(LONG)
+ .overrides("no-binary")
+ .overrides("text")
+ .overrides("no-text");
+ args.push(arg);
+
+ let arg = RGArg::switch("no-binary")
+ .hidden()
+ .overrides("binary")
+ .overrides("text")
+ .overrides("no-text");
+ args.push(arg);
+}
+
fn flag_block_buffered(args: &mut Vec<RGArg>) {
const SHORT: &str = "Force block buffering.";
const LONG: &str = long!("\
@@ -1874,7 +1927,7 @@ fn flag_pre(args: &mut Vec<RGArg>) {
For each input FILE, search the standard output of COMMAND FILE rather than the
contents of FILE. This option expects the COMMAND program to either be an
absolute path or to be available in your PATH. Either an empty string COMMAND
-or the `--no-pre` flag will disable this behavior.
+or the '--no-pre' flag will disable this behavior.
WARNING: When this flag is set, ripgrep will unconditionally spawn a
process for every file that is searched. Therefore, this can incur an
@@ -2208,20 +2261,23 @@ escape codes to be printed that alter the behavior of your terminal.
When binary file detection is enabled it is imperfect. In general, it uses
a simple heuristic. If a NUL byte is seen during search, then the file is
considered binary and search stops (unless this flag is present).
+Alternatively, if the '--binary' flag is used, then ripgrep will only quit
+when it sees a NUL byte after it sees a match (or searches the entire file).
-Note that when the `-u/--unrestricted` flag is provided for a third time, then
-this flag is automatically enabled.
-
-This flag can be disabled with --no-text.
+This flag can be disabled with '--no-text'. It overrides the '--binary' flag.
");
let arg = RGArg::switch("text").short("a")
.help(SHORT).long_help(LONG)
- .overrides("no-text");
+ .overrides("no-text")
+ .overrides("binary")
+ .overrides("no-binary");
args.push(arg);
let arg = RGArg::switch("no-text")
.hidden()
- .overrides("text");
+ .overrides("text")
+ .overrides("binary")
+ .overrides("no-binary");
args.push(arg);
}
@@ -2350,8 +2406,7 @@ Reduce the level of \"smart\" searching. A single -u won't respect .gitignore
(etc.) files. Two -u flags will additionally search hidden files and
directories. Three -u flags will additionally search binary files.
--uu is roughly equivalent to grep -r and -uuu is roughly equivalent to grep -a
--r.
+'rg -uuu' is roughly equivalent to 'grep -r'.
");
let arg = RGArg::switch("unrestricted").short("u")
.help(SHORT).long_help(LONG)
diff --git a/src/args.rs b/src/args.rs
index 61e1d4f3..6a5f09f9 100644
--- a/src/args.rs
+++ b/src/args.rs
@@ -286,15 +286,18 @@ impl Args {
&self,
wtr: W,
) -> Result<SearchWorker<W>> {
+ let matches = self.matches();
let matcher = self.matcher().clone();
let printer = self.printer(wtr)?;
- let searcher = self.matches().searcher(self.paths())?;
+ let searcher = matches.searcher(self.paths())?;
let mut builder = SearchWorkerBuilder::new();
builder
- .json_stats(self.matches().is_present("json"))
- .preprocessor(self.matches().preprocessor())
- .preprocessor_globs(self.matches().preprocessor_globs()?)
- .search_zip(self.matches().is_present("search-zip"));
+ .json_stats(matches.is_present("json"))
+ .preprocessor(matches.preprocessor())
+ .preprocessor_globs(matches.preprocessor_globs()?)
+ .search_zip(matches.is_present("search-zip"))
+ .binary_detection_implicit(matches.binary_detection_implicit())
+ .binary_detection_explicit(matches.binary_detection_explicit());
Ok(builder.build(matcher, searcher, printer))
}
@@ -802,8 +805,7 @@ impl ArgMatches {
.before_context(ctx_before)
.after_context(ctx_after)
.passthru(self.is_present("passthru"))
- .memory_map(self.mmap_choice(paths))
- .binary_detection(self.binary_detection());
+ .memory_map(self.mmap_choice(paths));
match self.encoding()? {
EncodingMode::Some(enc) => {
builder.encoding(Some(enc));
@@ -862,19 +864,42 @@ impl ArgMatches {
///
/// Methods are sorted alphabetically.
impl ArgMatches {
- /// Returns the form of binary detection to perform.
- fn binary_detection(&self) -> BinaryDetection {
+ /// Returns the form of binary detection to perform on files that are
+ /// implicitly searched via recursive directory traversal.
+ fn binary_detection_implicit(&self) -> BinaryDetection {
let none =
self.is_present("text")
- || self.unrestricted_count() >= 3
|| self.is_present("null-data");
+ let convert =
+ self.is_present("binary")
+ || self.unrestricted_count() >= 3;
if none {
BinaryDetection::none()
+ } else if convert {
+ BinaryDetection::convert(b'\x00')
} else {
BinaryDetection::quit(b'\x00')
}
}
+ /// Returns the form of binary detection to perform on files that are
+ /// explicitly searched via the user invoking ripgrep on a particular
+ /// file or files or stdin.
+ ///
+ /// In general, this should never be BinaryDetection::quit, since that acts
+ /// as a filter (but quitting immediately once a NUL byte is seen), and we
+ /// should never filter out files that the user wants to explicitly search.
+ fn binary_detection_explicit(&self) -> BinaryDetection {
+ let none =
+ self.is_present("text")
+ || self.is_present("null-data");
+ if none {
+ BinaryDetection::none()
+ } else {
+ BinaryDetection::convert(b'\x00')
+ }
+ }
+
/// Returns true if the command line configuration implies that a match
/// can never be shown.
fn can_never_match(&self, patterns: &[String]) -> bool {
diff --git a/src/search.rs b/src/search.rs
index 048f882b..149f67c3 100644
--- a/src/search.rs
+++ b/src/search.rs
@@ -10,7 +10,7 @@ use grep::matcher::Matcher;
use grep::pcre2::{RegexMatcher as PCRE2RegexMatcher};
use grep::printer::{JSON, Standard, Summary, Stats};
use grep::regex::{RegexMatcher as RustRegexMatcher};
-use grep::searcher::Searcher;
+use grep::searcher::{BinaryDetection, Searcher};
use ignore::overrides::Override;
use serde_json as json;
use serde_json::json;
@@ -27,6 +27,8 @@ struct Config {
preprocessor: Option<PathBuf>,
preprocessor_globs: Override,
search_zip: bool,
+ binary_implicit: BinaryDetection,
+ binary_explicit: BinaryDetection,
}
impl Default for Config {
@@ -36,6 +38,8 @@ impl Default for Config {
preprocessor: None,
preprocessor_globs: Override::empty(),
search_zip: false,
+ binary_implicit: BinaryDetection::none(),
+ binary_explicit: BinaryDetection::none(),
}
}
}
@@ -134,6 +138,37 @@ impl SearchWorkerBuilder {
self.config.search_zip = yes;
self
}
+
+ /// Set the binary detection that should be used when searching files
+ /// found via a recursive directory search.
+ ///
+ /// Generally, this binary detection may be `BinaryDetection::quit` if
+ /// we want to skip binary files completely.
+ ///
+ /// By default, no binary detection is performed.
+ pub fn binary_detection_implicit(
+ &mut self,
+ detection: BinaryDetection,
+ ) -> &mut SearchWorkerBuilder {
+ self.config.binary_implicit = detection;
+ self
+ }
+
+ /// Set the binary detection that should be used when searching files
+ /// explicitly supplied by an end user.
+ ///
+ /// Generally, this binary detection should NOT be `BinaryDetection::quit`,
+ /// since we never want to automatically filter files supplied by the end
+ /// user.
+ ///
+ /// By default, no binary detection is performed.
+ pub fn binary_detection_explicit(
+ &mut self,
+ detection: BinaryDetection,
+ ) -> &mut SearchWorkerBuilder {
+ self.config.binary_explicit = detection;
+ self
+ }
}
/// The result of executing a search.
@@ -308,6 +343,14 @@ impl<W: WriteColor> SearchWorker<W> {
/// Search the given subject using the appropriate strategy.
fn search_impl(&mut self, subject: &Subject) -> io::Result<SearchResult> {
+ let bin =
+ if subject.is_explicit() {
+ self.config.binary_explicit.clone()
+ } else {
+ self.config.binary_implicit.clone()
+ };
+ self.searcher.set_binary_detection(bin);
+
let path = subject.path();
if subject.is_stdin() {
let stdin = io::stdin();
diff --git a/src/subject.rs b/src/subject.rs
index 0eae5c26..38e92359 100644
--- a/src/subject.rs
+++ b/src/subject.rs
@@ -59,17 +59,12 @@ impl SubjectBuilder {
if let Some(ignore_err) = subj.dent.error() {
ignore_message!("{}", ignore_err);
}
- // If this entry represents stdin, then we always search it.
- if subj.dent.is_stdin() {
+ // If this entry was explicitly provided by an end user, then we always
+ // want to search it.
+ if subj.is_explicit() {
return Some(subj);
}
- // If this subject has a depth of 0, then it was provided explicitly
- // by an end user (or via a shell glob). In this case, we always want
- // to search it if it even smells like a file (e.g., a symlink).
- if subj.dent.depth() == 0 && !subj.is_dir() {
- return Some(subj);
- }
- // At this point, we only want to search something it's explicitly a
+ // At this point, we only want to search something if it's explicitly a
// file. This omits symlinks. (If ripgrep was configured to follow
// symlinks, then they have already been followed by the directory
// traversal.)
@@ -127,6 +122,26 @@ impl Subject {
self.dent.is_stdin()
}
+ /// Returns true if and only if this entry corresponds to a subject to
+ /// search that was explicitly supplied by an end user.
+ ///
+ /// Generally, this corresponds to either stdin or an explicit file path
+ /// argument. e.g., in `rg foo some-file ./some-dir/`, `some-file` is
+ /// an explicit subject, but, e.g., `./some-dir/some-other-file` is not.
+ ///
+ /// However, note that ripgrep does not see through shell globbing. e.g.,
+ /// in `rg foo ./some-dir/*`, `./some-dir/some-other-file` will be treated
+ /// as an explicit subject.
+ pub fn is_explicit(&self) -> bool {
+ // stdin is obvious. When an entry has a depth of 0, that means it
+ // was explicitly provided to our directory iterator, which means it
+ // was in turn explicitly provided by the end user. The !is_dir check
+ // means that we want to search files even if they're symlinks, again,
+ // because they were explicitly provided. (And we never want to try
+ // to search a directory.)
+ self.is_stdin() || (self.dent.depth() == 0 && !self.is_dir())
+ }
+
/// Returns true if and only if this subject points to a directory after
/// following symbolic links.
fn is_dir(&self) -> bool {