summaryrefslogtreecommitdiffstats
path: root/src/worker.rs
AgeCommit message (Collapse)Author
2018-08-20ripgrep: migrate to libripgrepAndrew Gallant
This commit does the work to delete the old `grep` crate and effectively rewrite most of ripgrep core to use the new libripgrep crates. The new `grep` crate is now a facade that collects the various crates that make up libripgrep. The most complex part of ripgrep core is now arguably the translation between command line parameters and the library options, which is ultimately where we want to be.
2018-07-21ripgrep: replace decoder with encoding_rs_ioAndrew Gallant
This commit mostly moves the transcoder implementation to its own crate: https://github.com/BurntSushi/encoding_rs_io The new crate adds clear documentation and cleans up the implementation to fully implement the contract of io::Read.
2018-07-21ripgrep: add --pre flagCharles Blake
The preprocessor flag accepts a command program and executes this program for every input file that is searched. Instead of searching the file directly, ripgrep will instead search the stdout contents of the program. Closes #978, Closes #981
2018-03-10search: add a --count-matches flagBalaji Sivaraman
This commit introduces a new flag, --count-matches, which will cause ripgrep to report a total count of all matches instead of a count of total lines matched. Closes #566, Closes #814
2018-03-10search: add -b/--byte-offset flagBalaji Sivaraman
This commit adds support for printing 0-based byte offset before each line. We handle corner cases such as `-o/--only-matching` and `-C/--context` as well. Closes #812
2018-03-10mmap: handle ENOMEM errorAndrew Gallant
This commit causes a memory map strategy to fall back to a file backed strategy if the mmap call fails with an `ENOMEM` error. Fixes #852
2018-01-31worker: better error handling for memory mapsAndrew Gallant
Previously, we would bail out of using memory maps if we could detect ahead of time that opening a memory map would fail. The only case we checked was whether the file size was 0 or not. This is actually insufficient. The mmap call can return ENODEV errors when a file doesn't support memory maps. This is the case for new files exposed by Linux, for example, /sys/devices/system/cpu/vulnerabilities/meltdown. We fix this by checking the actual error codes returned by the mmap call. If ENODEV (or EOVERFLOW) is returned, then we fall back to regular `read` calls. If any other error occurs, we report it to the user. Fixes #760
2018-01-30search: add support for searching compressed filesBalaji Sivaraman
This commit adds opt-in support for searching compressed files during recursive search. This behavior is only enabled when the `-z/--search-zip` flag is passed to ripgrep. When enabled, a limited set of common compression formats are recognized via file extension, and a new process is spawned to perform the decompression. ripgrep then searches the stdout of that spawned process. Closes #539
2018-01-01cleanup: replace try! with ?Balaji Sivaraman
2017-11-22Update to memmap 0.6Dan Burkert
`memmap` 0.6.0 introduces major API changes in anticipation of a 1.0 release. See https://github.com/danburkert/memmap-rs/releases/tag/0.6.0 for more information. CC danburkert/memmap-rs#33.
2017-03-12Add support for additional text encodings.Andrew Gallant
This includes, but is not limited to, UTF-16, latin-1, GBK, EUC-JP and Shift_JIS. (Courtesy of the `encoding_rs` crate.) Specifically, this feature enables ripgrep to search files that are encoded in an encoding other than UTF-8. The list of available encodings is tied directly to what the `encoding_rs` crate supports, which is in turn tied to the Encoding Standard. The full list of available encodings can be found here: https://encoding.spec.whatwg.org/#concept-encoding-get This pull request also introduces the notion that text encodings can be automatically detected on a best effort basis. Currently, the only support for this is checking for a UTF-16 bom. In all other cases, a text encoding of `auto` (the default) implies a UTF-8 or ASCII compatible source encoding. When a text encoding is otherwise specified, it is unconditionally used for all files searched. Since ripgrep's regex engine is fundamentally built on top of UTF-8, this feature works by transcoding the files to be searched from their source encoding to UTF-8. This transcoding only happens when: 1. `auto` is specified and a non-UTF-8 encoding is detected. 2. A specific encoding is given by end users (including UTF-8). When transcoding occurs, errors are handled by automatically inserting the Unicode replacement character. In this case, ripgrep's output is guaranteed to be valid UTF-8 (excluding non-UTF-8 file paths, if they are printed). In all other cases, the source text is searched directly, which implies an assumption that it is at least ASCII compatible, but where UTF-8 is most useful. In this scenario, encoding errors are not detected. In this case, ripgrep's output will match the input exactly, byte-for-byte. This design may not be optimal in all cases, but it has some advantages: 1. In the happy path ("UTF-8 everywhere") remains happy. I have not been able to witness any performance regressions. 2. In the non-UTF-8 path, implementation complexity is kept relatively low. The cost here is transcoding itself. A potentially superior implementation might build decoding of any encoding into the regex engine itself. In particular, the fundamental problem with transcoding everything first is that literal optimizations are nearly negated. Future work should entail improving the user experience. For example, we might want to auto-detect more text encodings. A more elaborate UX experience might permit end users to specify multiple text encodings, although this seems hard to pull off in an ergonomic way. Fixes #1
2016-12-23fix some clippy lints (#288)Leonardo Yvens
2016-11-20Propagate no_messages option to worker.Andrew Gallant
Fixes #241
2016-11-20Completely re-work colored output and tty handling.Andrew Gallant
This commit completely guts all of the color handling code and replaces most of it with two new crates: wincolor and termcolor. wincolor provides a simple API to coloring using the Windows console and termcolor provides a platform independent coloring API tuned for multithreaded command line programs. This required a lot more flexibility than what the `term` crate provided, so it was dropped. We instead switch to writing ANSI escape sequences directly and ignore the TERMINFO database. In addition to fixing several bugs, this commit also permits end users to customize colors to a certain extent. For example, this command will set the match color to magenta and the line number background to yellow: rg --colors 'match:fg:magenta' --colors 'line:bg:yellow' foo For tty handling, we've adopted a hack from `git` to do tty detection in MSYS/mintty terminals. As a result, ripgrep should get both color detection and piping correct on Windows regardless of which terminal you use. Finally, switch to line buffering. Performance doesn't seem to be impacted and it's an otherwise more user friendly option. Fixes #37, Fixes #51, Fixes #94, Fixes #117, Fixes #182, Fixes #231
2016-11-19Add --files-without-matches flag.Daniel Luz
Performs the opposite of --files-with-matches: only shows paths of files that contain zero matches. Closes #138
2016-11-06Add --no-messages flag.Andrew Gallant
This flag is similar to what's found in grep: it will suppress all error messages, such as those shown when a particular file couldn't be read. Closes #149
2016-11-06Add -m/--max-count flag.Andrew Gallant
This flag limits the number of matches printed *per file*. Closes #159
2016-11-05Add parallel recursive directory iterator.Andrew Gallant
This adds a new walk type in the `ignore` crate, `WalkParallel`, which provides a way for recursively iterating over a set of paths in parallel while respecting various ignore rules. The API is a bit strange, as a closure producing a closure isn't something one often sees, but it does seem to work well. This also allowed us to simplify much of the worker logic in ripgrep proper, where MultiWorker is now gone.