summaryrefslogtreecommitdiffstats
path: root/src
AgeCommit message (Collapse)Author
2017-03-12Add new -M/--max-columns option.Ralf Jung
This permits setting the maximum line width with respect to the number of bytes in a line. Omitted lines (whether part of a match, replacement or context) are replaced with a message stating that the line was elided. Fixes #129
2017-03-12No line numbers when searching only stdin.Andrew Gallant
This changes the default behavior of ripgrep to *not* show line numbers when it is printing to a tty and is only searching stdin. Fixes #380 [breaking-change]
2017-03-12Stop aggressive inlining.Andrew Gallant
It's not clear what exactly is happening here, but the Read implementation for text decoding appears a bit sensitive. Small pertubations in the code appear to have a nearly 100% impact on the overall speed of ripgrep when searching UTF-16 files. I haven't had the time to examine the generated code in detail, but `perf stat` seems to think that the instruction cache is performing a lot worse when the code slows down. This might mean that excessive inlining causes a different code structure that leads to less-than-optimal icache usage, but it's at best a guess. Explicitly disabling the inline for the cold path seems to help the optimizer figure out the right thing.
2017-03-12Add support for additional text encodings.Andrew Gallant
This includes, but is not limited to, UTF-16, latin-1, GBK, EUC-JP and Shift_JIS. (Courtesy of the `encoding_rs` crate.) Specifically, this feature enables ripgrep to search files that are encoded in an encoding other than UTF-8. The list of available encodings is tied directly to what the `encoding_rs` crate supports, which is in turn tied to the Encoding Standard. The full list of available encodings can be found here: https://encoding.spec.whatwg.org/#concept-encoding-get This pull request also introduces the notion that text encodings can be automatically detected on a best effort basis. Currently, the only support for this is checking for a UTF-16 bom. In all other cases, a text encoding of `auto` (the default) implies a UTF-8 or ASCII compatible source encoding. When a text encoding is otherwise specified, it is unconditionally used for all files searched. Since ripgrep's regex engine is fundamentally built on top of UTF-8, this feature works by transcoding the files to be searched from their source encoding to UTF-8. This transcoding only happens when: 1. `auto` is specified and a non-UTF-8 encoding is detected. 2. A specific encoding is given by end users (including UTF-8). When transcoding occurs, errors are handled by automatically inserting the Unicode replacement character. In this case, ripgrep's output is guaranteed to be valid UTF-8 (excluding non-UTF-8 file paths, if they are printed). In all other cases, the source text is searched directly, which implies an assumption that it is at least ASCII compatible, but where UTF-8 is most useful. In this scenario, encoding errors are not detected. In this case, ripgrep's output will match the input exactly, byte-for-byte. This design may not be optimal in all cases, but it has some advantages: 1. In the happy path ("UTF-8 everywhere") remains happy. I have not been able to witness any performance regressions. 2. In the non-UTF-8 path, implementation complexity is kept relatively low. The cost here is transcoding itself. A potentially superior implementation might build decoding of any encoding into the regex engine itself. In particular, the fundamental problem with transcoding everything first is that literal optimizations are nearly negated. Future work should entail improving the user experience. For example, we might want to auto-detect more text encodings. A more elaborate UX experience might permit end users to specify multiple text encodings, although this seems hard to pull off in an ergonomic way. Fixes #1
2017-03-08Remove clap validator + add max-filesize integration testsMarc Tiehuis
2017-03-08Add `--max-filesize` option to clitiehuis
The --max-filesize option allows filtering files which are larger than the specified limit. This is potentially useful if one is attempting to search a number of large files without common file-types/suffixes. See #369.
2017-02-18Tweak how binary files are handled internally.Andrew Gallant
This commit fixes two issues. The first issue is that if a file contained many NUL bytes without any LF bytes, then the InputBuffer would read the entire file into memory. This is not typically a problem, but if you run rg on /proc, then bad things can happen when reading virtual memory mapping files. Arguably, such files should be ignored, but we should also try to avoid exhausting memory too. We fix this by pushing the `-a/--text` flag option down into InputBuffer, so that it knows to stop immediately if it finds a NUL byte. The other issue this fixes is that binary detection is now applied to every buffer instead of just the first one. This helps avoid detecting too many files as plain text if the first parts of a binary file happen to contain no NUL bytes. This issue still persists somewhat in the memory map searcher, since we probably don't want to search the entire file upfront for NUL bytes before actually performing our search. Instead, we search the first 10KB for now. Fixes #52, Fixes #311
2017-02-18Don't parses regexes with --files.Andrew Gallant
When the --files flag is given, ripgrep would still try to parse some of the positional arguments as regexes. Don't do that. Fixes #326
2017-02-18Permit --heading to override --no-heading.Andrew Gallant
@kbknapp <3 Fixes #327
2017-02-18Remove Windows deps from ripgrep proper.Andrew Gallant
All Windows specific code has been (mostly) pushed out of ripgrep and into its constituent libraries.
2017-02-09termcolor: add support for output to standard errorPeter Williams
This is essentially a rename of the existing `Stdout` type to `StandardStream` and a change of its constructor from a single `new()` function to have two `stdout()` and `stderr()` functions. Under the hood, we add add internal IoStandardStream{,Lock} enums that allow us to abstract between Stdout and Stderr conveniently. The rest of the needed changes then fall out fairly naturally. Fixes #324. [breaking-change]
2017-01-15Replace internal atty module with atty crate.Andrew Gallant
This removes all use of explicit unsafe in ripgrep proper except for one: accessing the contents of a memory map. (Which may never go away.)
2017-01-13Use basic SGR sequences when possible.Andrew Gallant
In Emacs, its terminal apparently doesn't support "extended" sets of foreground/background colors. Unless we need to set an "intense" color, we should instead use one of the eight basic color codes. Also, remove the "intense" setting from the default set of colors. It doesn't do much anyway and enables the default color settings to work in Emacs out of the box. Fixes #182 (again)
2017-01-11Make --column imply --line-number.Andrew Gallant
Closes #243
2017-01-10Add --path-separator flag.Andrew Gallant
This flag permits setting the path separator used for all file paths printed by ripgrep in normal operation. Fixes #275
2017-01-10Add example to -r/--replace docs.Andrew Gallant
Fixes #308
2017-01-09Don't search stdout redirected file.Andrew Gallant
When running ripgrep like this: rg foo > output we must be careful not to search `output` since ripgrep is actively writing to it. Searching it can cause massive blowups where the file grows without bound. While this is conceptually easy to fix (check the inode of the redirection and the inode of the file you're about to search), there are a few problems with it. First, inodes are a Unix thing, so we need a Windows specific solution to this as well. To resolve this concern, I created a new crate, `same-file`, which provides a cross platform abstraction. Second, stat'ing every file is costly. This is not avoidable on Windows, but on Unix, we can get the inode number directly from directory traversal. However, this information wasn't exposed, but now it is (through both the ignore and walkdir crates). Fixes #286
2017-01-08Remove trivial condition.Daniel Luz
2017-01-07Provide a mechanism to compose type definitionsIan Kerins
This extends the syntax of the --type-add flag to allow including the globs of other already defined types. Fixes #83.
2017-01-06Fix spacing issue in --help output.Andrew Gallant
2017-01-06Add --sort-files flag.Andrew Gallant
When used, parallelism is disabled but the results are sorted by file path. Closes #263
2017-01-06Rejigger bold and intense settings.Andrew Gallant
Previously, ripgrep would only emit the 'bold' ANSI escape sequence if no foreground or background color was set. Instead, it would convert colors to their "intense" versions if bold was set. The intent was to do the same thing on Windows and Unix. However, this had a few negative side effects: 1. Omitting the 'bold' ANSI escape when 'bold' was set is surprising. 2. Intense colors can look quite bad and be hard to read. To fix this, we introduce a new setting called 'intense' in the --colors flag, and thread that down through to the public API of the `termcolor` crate. The 'intense' setting has environment specific behavior: 1. In ANSI mode, it will convert the selected color to its "intense" variant. 2. In the Windows console, it will make the text "intense." There is no longer any "smart" handling of the 'bold' style. The 'bold' ANSI escape is always emitted when it is selected. In the Windows console, the 'bold' setting now has no effect. Note that this is a breaking change. Fixes #266, #293
2017-01-01Update to regex 0.2.Andrew Gallant
2016-12-24Remove special ^C handling.Andrew Gallant
This means that ripgrep will no longer try to reset your colors in your terminal if you kill it while searching. This could result in messing up the colors in your terminal, and the fix is to simply run some other command that resets them for you. For example: $ echo -ne "\033[0m" The reason why the ^C handling was removed is because it is irrevocably broken on Windows and is impossible to do correctly and efficiently in ANSI terminals. Fixes #281
2016-12-24Small code cleanups.Andrew Gallant
2016-12-23fix some clippy lints (#288)Leonardo Yvens
2016-12-12Make backreference support clear.Andrew Gallant
Fixes #268.
2016-12-06Fix leading hypen bug by updating clap.Andrew Gallant
Fixes #270
2016-12-04Simplify code.Andrew Gallant
Instead of `Ok(n) if n == 0` we can just write `Ok(0)`.
2016-11-28Clarify use of --heading/--no-heading.Andrew Gallant
Fixes #247.
2016-11-21Only emit bold ANSI code if bold is true.Andrew Gallant
This was a simple logic error. Also, avoid emitting ANSI escape codes if there are no color settings. Fixes #242
2016-11-20Get rid of special mmap decision on Windows.Andrew Gallant
I spent some quality time on my Windows 10 laptop and it appears to suffer from a similar trade-off as on Linux: mmaps are bad for large directory traversals but good for single large files. Darwin continues to reject memory maps in all cases (unless explicitly requested), but more testing should be done there.
2016-11-20Propagate no_messages option to worker.Andrew Gallant
Fixes #241
2016-11-20Completely re-work colored output and tty handling.Andrew Gallant
This commit completely guts all of the color handling code and replaces most of it with two new crates: wincolor and termcolor. wincolor provides a simple API to coloring using the Windows console and termcolor provides a platform independent coloring API tuned for multithreaded command line programs. This required a lot more flexibility than what the `term` crate provided, so it was dropped. We instead switch to writing ANSI escape sequences directly and ignore the TERMINFO database. In addition to fixing several bugs, this commit also permits end users to customize colors to a certain extent. For example, this command will set the match color to magenta and the line number background to yellow: rg --colors 'match:fg:magenta' --colors 'line:bg:yellow' foo For tty handling, we've adopted a hack from `git` to do tty detection in MSYS/mintty terminals. As a result, ripgrep should get both color detection and piping correct on Windows regardless of which terminal you use. Finally, switch to line buffering. Performance doesn't seem to be impacted and it's an otherwise more user friendly option. Fixes #37, Fixes #51, Fixes #94, Fixes #117, Fixes #182, Fixes #231
2016-11-19Rename --files-without-matches to --files-without-match.Andrew Gallant
This is to be consistent with grep.
2016-11-19Add --files-without-matches flag.Daniel Luz
Performs the opposite of --files-with-matches: only shows paths of files that contain zero matches. Closes #138
2016-11-17Fix stdin bug with --file.Andrew Gallant
When `rg -f-` is used, the default search path should be `./` and not `-`.
2016-11-17Switch from Docopt to Clap.Andrew Gallant
There were two important reasons for the switch: 1. Performance. Docopt does poorly when the argv becomes large, which is a reasonable common use case for search tools. (e.g., use with xargs) 2. Better failure modes. Clap knows a lot more about how a particular argv might be invalid, and can therefore provide much clearer error messages. While both were important, (1) made it urgent. Note that since Clap requires at least Rust 1.11, this will in turn increase the minimum Rust version supported by ripgrep from Rust 1.9 to Rust 1.11. It is therefore a breaking change, so the soonest release of ripgrep with Clap will have to be 0.3. There is also at least one subtle breaking change in real usage. Previous to this commit, this used to work: rg -e -foo Where this would cause ripgrep to search for the string `-foo`. Clap currently has problems supporting this use case (see: https://github.com/kbknapp/clap-rs/issues/742), but it can be worked around by using this instead: rg -e [-]foo or even rg [-]foo and this still works: rg -- -foo This commit also adds Bash, Fish and PowerShell completion files to the release, fixes a bug that prevented ripgrep from working on file paths containing invalid UTF-8 and shows short descriptions in the output of `-h` but longer descriptions in the output of `--help`. Fixes #136, Fixes #189, Fixes #210, Fixes #230
2016-11-15Allow specifying patterns with `-f FILE` and `-f-`Eric Kidd
This is a somewhat basic implementation of `-f-` (#7), with unit tests. Changes include: 1. The internals of the `pattern` function have been refactored to avoid code duplication, but there's a lot more we could do. Right now we read the entire pattern list into a `Vec`. 2. There's now a `WorkDir::pipe` command that allows sending standard input to `rg` when testing. Not implemented: aho-corasick.
2016-11-09Rework parallelism in directory iterator.Andrew Gallant
Previously, ignore::WalkParallel would invoke the callback for all *explicitly* given file paths in a single thread, which effectively meant that `rg pattern foo bar baz ...` didn't actually search foo, bar and baz in parallel. The code was structured that way to avoid spinning up workers if no directory paths were given. The original intention was probably to have a separate pool of threads responsible for searching, but ripgrep ended up just reusing the ignore::WalkParallel workers themselves for searching, and thereby subjected to its sub-par performance in this case. The code has been restructured so that file paths are sent to the workers, which brings back parallelism. Fixes #226
2016-11-06rewordAndrew Gallant
2016-11-06rewordAndrew Gallant
2016-11-06Don't ever search directories.Andrew Gallant
2016-11-06Always search paths given by user.Andrew Gallant
This permits doing `rg -a test /dev/sda1` for example, where as before /dev/sda1 was skipped because it wasn't a regular file.
2016-11-06Add --no-messages flag.Andrew Gallant
This flag is similar to what's found in grep: it will suppress all error messages, such as those shown when a particular file couldn't be read. Closes #149
2016-11-06Add -m/--max-count flag.Andrew Gallant
This flag limits the number of matches printed *per file*. Closes #159
2016-11-06Include the name "ripgrep" in more places.Andrew Gallant
Fixes #203
2016-11-06Note -e/--regexp's additional usefulness.Andrew Gallant
Specifically, it can be used when searching for patterns that start with a dash. Fixes #215
2016-11-05Use the bytecount crate for fast line counting.Andre Bogus
Fixes #128
2016-11-05Add parallel recursive directory iterator.Andrew Gallant
This adds a new walk type in the `ignore` crate, `WalkParallel`, which provides a way for recursively iterating over a set of paths in parallel while respecting various ignore rules. The API is a bit strange, as a closure producing a closure isn't something one often sees, but it does seem to work well. This also allowed us to simplify much of the worker logic in ripgrep proper, where MultiWorker is now gone.