path: root/tests
Age  Commit message  Author
2017-03-12  Add support for additional text encodings.  Andrew Gallant
This includes, but is not limited to, UTF-16, latin-1, GBK, EUC-JP and Shift_JIS. (Courtesy of the `encoding_rs` crate.) Specifically, this feature enables ripgrep to search files that are encoded in an encoding other than UTF-8. The list of available encodings is tied directly to what the `encoding_rs` crate supports, which is in turn tied to the Encoding Standard. The full list of available encodings can be found here: https://encoding.spec.whatwg.org/#concept-encoding-get

This pull request also introduces the notion that text encodings can be automatically detected on a best effort basis. Currently, the only support for this is checking for a UTF-16 BOM. In all other cases, a text encoding of `auto` (the default) implies a UTF-8 or ASCII compatible source encoding. When a text encoding is otherwise specified, it is unconditionally used for all files searched.

Since ripgrep's regex engine is fundamentally built on top of UTF-8, this feature works by transcoding the files to be searched from their source encoding to UTF-8. This transcoding only happens when:

1. `auto` is specified and a non-UTF-8 encoding is detected.
2. A specific encoding is given by end users (including UTF-8).

When transcoding occurs, errors are handled by automatically inserting the Unicode replacement character. In this case, ripgrep's output is guaranteed to be valid UTF-8 (excluding non-UTF-8 file paths, if they are printed).

In all other cases, the source text is searched directly, which implies an assumption that it is at least ASCII compatible, but where UTF-8 is most useful. In this scenario, encoding errors are not detected. In this case, ripgrep's output will match the input exactly, byte-for-byte.

This design may not be optimal in all cases, but it has some advantages:

1. The happy path ("UTF-8 everywhere") remains happy. I have not been able to witness any performance regressions.
2. In the non-UTF-8 path, implementation complexity is kept relatively low. The cost here is transcoding itself. A potentially superior implementation might build decoding of any encoding into the regex engine itself. In particular, the fundamental problem with transcoding everything first is that literal optimizations are nearly negated.

Future work should entail improving the user experience. For example, we might want to auto-detect more text encodings. A more elaborate UX might permit end users to specify multiple text encodings, although this seems hard to pull off in an ergonomic way.

Fixes #1
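The `auto` behavior described in this commit (sniff a UTF-16 BOM, transcode to UTF-8 with replacement characters, otherwise search the bytes as-is) can be sketched with the standard library alone. This is only an illustration, not ripgrep's implementation, which relies on `encoding_rs`; the names `Utf16Bom`, `sniff_utf16_bom` and `to_searchable` are hypothetical.

```rust
use std::borrow::Cow;

/// Byte order implied by a UTF-16 BOM (hypothetical helper type).
#[derive(Debug, PartialEq, Clone, Copy)]
enum Utf16Bom {
    LittleEndian,
    BigEndian,
}

/// Returns the byte order if `bytes` begins with a UTF-16 BOM.
fn sniff_utf16_bom(bytes: &[u8]) -> Option<Utf16Bom> {
    match bytes {
        [0xFF, 0xFE, ..] => Some(Utf16Bom::LittleEndian),
        [0xFE, 0xFF, ..] => Some(Utf16Bom::BigEndian),
        _ => None,
    }
}

/// Transcodes to UTF-8 only when a UTF-16 BOM is detected. Otherwise the
/// input is borrowed unchanged, so it would be searched byte-for-byte.
fn to_searchable(bytes: &[u8]) -> Cow<'_, [u8]> {
    let bom = match sniff_utf16_bom(bytes) {
        Some(bom) => bom,
        None => return Cow::Borrowed(bytes),
    };
    // Skip the two BOM bytes and assemble 16-bit code units.
    let units = bytes[2..].chunks(2).map(|pair| match (pair, bom) {
        ([a, b], Utf16Bom::LittleEndian) => u16::from_le_bytes([*a, *b]),
        ([a, b], Utf16Bom::BigEndian) => u16::from_be_bytes([*a, *b]),
        // A trailing odd byte is mapped to an unpaired surrogate so that
        // decoding turns it into U+FFFD below.
        _ => 0xD800,
    });
    // Decoding errors become the Unicode replacement character.
    let text: String = char::decode_utf16(units)
        .map(|unit| unit.unwrap_or(char::REPLACEMENT_CHARACTER))
        .collect();
    Cow::Owned(text.into_bytes())
}
```

Returning a `Cow` keeps the UTF-8 path free of copies, which mirrors the stated goal that the happy path should not regress.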
2017-03-12  Fix test on Windows.  Andrew Gallant
(This is what I get for directly pushing to master.)
2017-03-12  Fix leading slash bug when used with `!`.  Andrew Gallant
When writing paths like `!/foo` in gitignore files (or when using the -g/--glob flag), the presence of `!` would prevent the gitignore builder from noticing the leading slash, which causes absolute path matching to fail. Fixes #405
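The ordering issue can be illustrated with a small parser that strips the `!` before looking for the leading slash. This is a sketch only, not the gitignore builder's actual code; `ParsedGlob` and `parse_ignore_line` are hypothetical names.

```rust
/// Result of parsing one ignore line (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
struct ParsedGlob {
    /// The line began with `!`, so a match whitelists the path.
    negated: bool,
    /// A leading `/` was present after any `!`, so the glob is anchored.
    anchored: bool,
    /// The remaining glob text.
    glob: String,
}

/// Strips `!` first and only then checks for `/`. Checking for the slash
/// on the original line would miss it in patterns such as `!/foo`.
fn parse_ignore_line(line: &str) -> ParsedGlob {
    let (negated, rest) = match line.strip_prefix('!') {
        Some(rest) => (true, rest),
        None => (false, line),
    };
    let (anchored, rest) = match rest.strip_prefix('/') {
        Some(rest) => (true, rest),
        None => (false, rest),
    };
    ParsedGlob { negated, anchored, glob: rest.to_string() }
}
```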
2017-03-08  Remove clap validator + add max-filesize integration tests  Marc Tiehuis
2017-03-08  Add `--max-filesize` option to cli  tiehuis
The --max-filesize option filters out files that are larger than the specified limit. This is potentially useful if one is attempting to search a number of large files without common file-types/suffixes. See #369.
2017-03-08  Add enclosing group to alternations in globs  Marc Tiehuis
Fixes #391.
2017-01-11  Make --column imply --line-number.  Andrew Gallant
Closes #243
2017-01-10  Add --path-separator flag.  Andrew Gallant
This flag permits setting the path separator used for all file paths printed by ripgrep in normal operation. Fixes #275
2017-01-07  Fix type compose test.  Andrew Gallant
2017-01-07  Provide a mechanism to compose type definitions  Ian Kerins
This extends the syntax of the --type-add flag to allow including the globs of other already defined types. Fixes #83.
2017-01-06  Add --sort-files flag.  Andrew Gallant
When used, parallelism is disabled but the results are sorted by file path. Closes #263
2016-12-12  Fix a non-termination bug.  Andrew Gallant
This was a very silly bug. Instead of creating a particular atomic once and cloning it, we created a new value for each worker. Fixes #279
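The difference between "create once and clone" and "create per worker" is easy to see in a small sketch. The commit does not say what the atomic tracked, so the counter of active workers below is an assumption, and `run_workers` is a hypothetical name.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Spawns `num_workers` threads that share ONE atomic counter and returns
/// the counter's final value once every worker has finished.
fn run_workers(num_workers: usize) -> usize {
    // Created once, outside the loop. Building a fresh Arc<AtomicUsize>
    // inside the closure below would give each worker a private counter
    // that no other thread could ever observe reaching zero.
    let active = Arc::new(AtomicUsize::new(num_workers));

    let handles: Vec<_> = (0..num_workers)
        .map(|_| {
            // Cloning the Arc shares the same underlying atomic.
            let active = Arc::clone(&active);
            thread::spawn(move || {
                // ... a real worker would do its search work here ...
                active.fetch_sub(1, Ordering::SeqCst);
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    active.load(Ordering::SeqCst)
}
```

With a shared counter, `run_workers(4)` ends at zero, which is the condition a termination check could rely on.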
2016-12-06  Fix leading hyphen bug by updating clap.  Andrew Gallant
Fixes #270
2016-12-05  Fix bug reading root symlink.  Andrew Gallant
When given an explicit file path on the command line like `foo` where `foo` is a symlink, ripgrep should follow it even if `-L` isn't set. This is consistent with the behavior of `foo/`. Fixes #256
2016-11-28  Disable Unicode mode for literal regex.  Andrew Gallant
When ripgrep detects a literal, it emits them as raw hex escaped byte sequences to Regex::new. This permits literal optimizations for arbitrary byte sequences (i.e., possibly invalid UTF-8). The problem is that Regex::new interprets hex escaped byte sequences as *Unicode codepoints* by default, but we want them to actually stand for their raw byte values. Therefore, disable Unicode mode. This is OK, since the regex is composed entirely of literals and literal extraction does Unicode case folding. Fixes #251
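To make the byte-versus-codepoint distinction concrete, here is a sketch of hex-escaping a literal, plus a helper showing why the interpretation matters: the codepoint U+00FF occupies two bytes in UTF-8, while the raw byte 0xFF is one byte. Both function names are hypothetical, and this is not ripgrep's code.

```rust
/// Renders arbitrary bytes as a pattern made only of `\xNN` escapes.
/// The bytes need not be valid UTF-8.
fn bytes_to_hex_pattern(bytes: &[u8]) -> String {
    let mut pattern = String::with_capacity(bytes.len() * 4);
    for byte in bytes {
        pattern.push_str(&format!("\\x{:02X}", byte));
    }
    pattern
}

/// Number of UTF-8 bytes needed if `byte` were read as the codepoint
/// U+00NN instead of as a raw byte value.
fn utf8_len_if_codepoint(byte: u8) -> usize {
    char::from(byte).len_utf8()
}
```

For ASCII bytes both readings agree, but for bytes at or above 0x80 the codepoint reading needs two bytes, so a pattern built this way only means what was intended when the escapes stand for raw byte values.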
2016-11-28  Detect more uppercase literals for --smart-case.  Andrew Gallant
This changes the uppercase literal detection for the "smart case" functionality. In particular, a character class is considered to have an uppercase literal if at least one of its ranges starts or stops with an uppercase literal. Fixes #229
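The rule for character classes can be sketched as a check over range endpoints. The representation of a class as `(start, end)` pairs and both function names are assumptions made for illustration; the real check operates on the parsed regex.

```rust
/// True if any range in the class starts or stops with an uppercase
/// character, e.g. `[A-Z]` or `[0-9X-Z]`.
fn class_has_uppercase(ranges: &[(char, char)]) -> bool {
    ranges
        .iter()
        .any(|&(start, end)| start.is_uppercase() || end.is_uppercase())
}

/// Smart case searches case-insensitively only when neither the plain
/// literals nor any character class contain an uppercase character.
fn smart_case_insensitive(literals: &str, class_ranges: &[(char, char)]) -> bool {
    !literals.chars().any(char::is_uppercase) && !class_has_uppercase(class_ranges)
}
```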
2016-11-19  Rename --files-without-matches to --files-without-match.  Andrew Gallant
This is to be consistent with grep.
2016-11-19  Add --files-without-matches flag.  Daniel Luz
Performs the opposite of --files-with-matches: only shows paths of files that contain zero matches. Closes #138
2016-11-17  Fix issue number mixup.  Andrew Gallant
Thanks @bluss!
2016-11-17  Switch from Docopt to Clap.  Andrew Gallant
There were two important reasons for the switch:

1. Performance. Docopt does poorly when the argv becomes large, which is a reasonably common use case for search tools (e.g., use with xargs).
2. Better failure modes. Clap knows a lot more about how a particular argv might be invalid, and can therefore provide much clearer error messages.

While both were important, (1) made it urgent.

Note that since Clap requires at least Rust 1.11, this will in turn increase the minimum Rust version supported by ripgrep from Rust 1.9 to Rust 1.11. It is therefore a breaking change, so the soonest release of ripgrep with Clap will have to be 0.3.

There is also at least one subtle breaking change in real usage. Previous to this commit, this used to work:

    rg -e -foo

Where this would cause ripgrep to search for the string `-foo`. Clap currently has problems supporting this use case (see: https://github.com/kbknapp/clap-rs/issues/742), but it can be worked around by using this instead:

    rg -e [-]foo

or even

    rg [-]foo

and this still works:

    rg -- -foo

This commit also adds Bash, Fish and PowerShell completion files to the release, fixes a bug that prevented ripgrep from working on file paths containing invalid UTF-8 and shows short descriptions in the output of `-h` but longer descriptions in the output of `--help`.

Fixes #136, Fixes #189, Fixes #210, Fixes #230
2016-11-15  Allow specifying patterns with `-f FILE` and `-f-`  Eric Kidd
This is a somewhat basic implementation of `-f-` (#7), with unit tests. Changes include:

1. The internals of the `pattern` function have been refactored to avoid code duplication, but there's a lot more we could do. Right now we read the entire pattern list into a `Vec`.
2. There's now a `WorkDir::pipe` command that allows sending standard input to `rg` when testing.

Not implemented: aho-corasick.
2016-11-11  Disable symlink tests on Windows.  Andrew Gallant
For some reason, these work on AppVeyor but not in other build systems. Let's just disable them. See: https://github.com/rust-lang/rust/pull/37149
2016-11-09  Fix a bug with handling --ignore-file.  Andrew Gallant
Namely, passing a directory to --ignore-file caused ripgrep to allocate memory without bound. The issue was that I got a bit overzealous with partial error reporting. Namely, when processing a gitignore file, we should try to use every pattern even if some patterns are invalid globs (e.g., a**b). In the process, I applied the same logic to I/O errors. In this case, it manifested as an attempt to read lines from a directory, which appears to yield Results forever, where each Result is an error of the form "you can't read from a directory silly." Since I treated it as a partial error, ripgrep was just spinning and accruing each error in memory, which caused the OOM killer to kick in. Fixes #228
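The distinction between the two kinds of error can be sketched as follows: invalid globs are recorded and skipped, while an I/O error stops the loop at once. Everything here is hypothetical and for illustration only, including the names and the stand-in validity check, which merely rejects `**` that is not a whole path component.

```rust
use std::io::{self, BufRead, Read};

/// Stand-in for real glob validation (an assumption for this sketch):
/// `**` is accepted only as a whole path component, so `a**b` is invalid.
fn is_valid_glob(glob: &str) -> bool {
    glob.split('/').all(|part| part == "**" || !part.contains("**"))
}

/// Returns (usable patterns, invalid patterns). Invalid globs are partial
/// errors and do not stop processing. An I/O error aborts immediately
/// instead of being accumulated on every iteration.
fn collect_patterns<R: BufRead>(rdr: R) -> io::Result<(Vec<String>, Vec<String>)> {
    let mut patterns = Vec::new();
    let mut invalid = Vec::new();
    for line in rdr.lines() {
        let line = line?; // bail out on the first I/O error
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        if is_valid_glob(line) {
            patterns.push(line.to_string());
        } else {
            invalid.push(line.to_string());
        }
    }
    Ok((patterns, invalid))
}

/// A reader whose every read fails, mimicking an attempt to read lines
/// from a directory.
struct AlwaysFails;

impl Read for AlwaysFails {
    fn read(&mut self, _buf: &mut [u8]) -> io::Result<usize> {
        Err(io::Error::new(io::ErrorKind::Other, "cannot read from a directory"))
    }
}
```

Wrapping `AlwaysFails` in a `std::io::BufReader` and passing it to `collect_patterns` returns an error after the first failed read, rather than looping and collecting errors without bound.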
2016-11-06  Add -m/--max-count flag.  Andrew Gallant
This flag limits the number of matches printed *per file*. Closes #159
2016-11-06  Fixes a bug with --smart-case.  Andrew Gallant
This was a subtle bug, but the big picture was that the smart case information wasn't being carried through to the literal extraction in some cases. When this happened, it was possible to get back an incomplete set of literals, which would therefore miss some valid matches. The fix to this is to actually parse the regex and determine whether smart case applies before doing anything else. It's a little extra work, but parsing is pretty fast. Fixes #199
2016-11-05  Use the bytecount crate for fast line counting.  Andre Bogus
Fixes #128
2016-10-31  Fixes a matching bug in the glob override matcher.  Andrew Gallant
This was probably a transcription error when moving the ignore matcher code out of ripgrep core. Specifically, the override glob matcher should not ignore directories if they don't match. Fixes #206
2016-10-29  Move all gitignore matching to separate crate.  Andrew Gallant
This PR introduces a new sub-crate, `ignore`, which primarily provides a fast recursive directory iterator that respects ignore files like gitignore and other configurable filtering rules based on globs or even file types. This results in a substantial source of complexity moved out of ripgrep's core and into a reusable component that others can now (hopefully) benefit from.

While much of the ignore code carried over from ripgrep's core, a substantial portion of it was rewritten with the following goals in mind:

1. Reuse matchers built from gitignore files across directory iteration.
2. Design the matcher data structure to be amenable for parallelizing directory iteration. (Indeed, writing the parallel iterator is the next step.)

Fixes #9, #44, #45
2016-10-16  Fix bug when processing parent gitignore files.  Andrew Gallant
This particular bug was triggered whenever a search was run in a directory with a parent directory that contains a relevant .gitignore file. In particular, before matching against a parent directory's gitignore rules, a path's leading `./` was not stripped, which results in errant matching. We now make sure `./` is stripped. Fixes #184.
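The normalization step amounts to dropping a leading `./` before a path is compared against a parent directory's rules. A minimal sketch using only the standard library (the function name is hypothetical):

```rust
use std::path::Path;

/// Removes a leading `./` if present. Paths without it are returned
/// unchanged.
fn strip_dot_slash(path: &Path) -> &Path {
    path.strip_prefix("./").unwrap_or(path)
}
```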
2016-10-10  Update regex-syntax for bug fix.  Andrew Gallant
The bug fix was in expression pretty printing. ripgrep parses the regex into an AST and may do some modifications to it, which requires the ability to go from string -> AST -> string' -> AST' where string == string' implies AST == AST'. Also, add a regression test for the specific regex that tripped the bug. Fixes #156.
2016-10-10  Update darwin cfg attributes.  Andrew Gallant
2016-10-10  Disable regression_131 test on darwin.  Andrew Gallant
It's not clear why it's failing. Maybe it doesn't permit certain characters in file paths?
2016-10-10  Fix symlink test.  Andrew Gallant
We attempt to run it on Windows, but I'm getting "access denied" errors when trying to create a file symlink. So we disable the test on Windows.
2016-10-10  style nits  Andrew Gallant
2016-10-10  Finish overhaul of glob matching.  Andrew Gallant
This commit completes the initial move of glob matching to an external crate, including fixing up cross platform support, polishing the external crate for others to use and fixing a number of bugs in the process. Fixes #87, #127, #131
2016-10-08  Always follow symlinks on explicit file arguments.  Ian Kerins
2016-10-04  Refactor and test glob sets.  Andrew Gallant
This commit goes a long way toward refactoring glob sets so that the code is easier to maintain going forward. In particular, it makes the literal optimizations that glob sets used a lot more structured and much easier to extend. Tests have also been modified to include glob sets. There's still a bit of polish work left to do before a release. This also fixes the immediate issue where large gitignore files were causing ripgrep to slow way down. While we don't technically fix it for good, we're a lot better about reducing the number of regexes we compile. In particular, if a gitignore file contains thousands of patterns that can't be matched more simply using literals, then ripgrep will slow down again. We could fix this for good by avoiding RegexSet if the number of regexes grows too large. Fixes #134.
2016-09-28  Add -s/--case-sensitive flag.  Andrew Gallant
This flag overrides both --smart-case and --ignore-case. Closes #124.
2016-09-27  add a max-depth option for directory traversal  Garrett Squire
CR and add integration test
2016-09-26  Don't print empty lines in single threaded mode.  Andrew Gallant
Fixes #99.
2016-09-26  Add a --null flag.  Andrew Gallant
This flag causes a NUL byte to follow any file path in ripgrep's output. Closes #89.
2016-09-26  Fix an off-by-one error with --column.  Andrew Gallant
Fixes #105.
2016-09-25  Don't replace NUL bytes when searching binary files as text.  Andrew Gallant
This was a result of misinterpreting a feature in grep where NUL bytes are replaced with \n. The primary reason for doing this is to avoid excessive memory usage on truly binary data. However, grep only does this when searching binary files as if they were binary, in which case it only reports whether the file matched or not. When grep is told to search binary data as text (the -a/--text flag), then it doesn't do any replacement so we shouldn't either. In general, this makes sense, because the user is essentially asserting that a particular file that looks like binary is actually text. In that case, we shouldn't try to replace any NUL bytes. ripgrep doesn't actually support searching binary data for whether it matches or not, so we don't actually need the replace_buf function. However, it does seem like a potentially useful feature.
2016-09-25  Don't union inner literals of repetitions.  Andrew Gallant
If we do, this results in extracting `foofoofoo` from `(\wfoo){3}`, which is wrong. This does prevent us from extracting `foofoofoo` from `foo{3}`, which is unfortunate, but we miss plenty of other stuff too. Literal extracting needs a good rethink (all the way down into the regex engine). Fixes #93
2016-09-25  Permit whitelisting hidden files in ignores.  Andrew Gallant
Fixes #90
2016-09-25  Fix tests on Windows.  Andrew Gallant
Mostly this is just using \\ instead of / in paths reported by the OS.
2016-09-24  Add --files-with-matches flag.  Andrew Schwartzmeyer
Closes #26. Acts like --count but emits only the paths of files with matches, suitable for piping to xargs. Both mmap and no-mmap searches terminate after the first match is found. Documentation updated and tests added.
2016-09-24  Add --smart-case.  Andrew Gallant
It does what it says on the tin. Closes #70.
2016-09-24  Add --no-ignore-vcs flag.  Andrew Gallant
This flag will respect .ignore but not .gitignore. Closes #68.
2016-09-24  Don't ignore first path when using --files.  Andrew Gallant
This is a docopt oddity, but probably not a bug. If --files is given, then just interpret the pattern (if not empty) as the first file path. Fixes #64.