summaryrefslogtreecommitdiffstats
path: root/grep-searcher
AgeCommit message (Collapse)Author
2019-04-05searcher: partially migrate to bstrAndrew Gallant
This commit causes grep-searcher to use byte strings internally for its line buffer support. We manage to remove a use of `unsafe` by doing this (by pushing it down into `bstr`). We stop short of using byte strings everywhere else because we rely heavily on the `impl ops::Index<[u8]> for grep_matcher::Match` impl, which isn't available for byte strings. (It is premature to make bstr a public dep of a core crate like grep-matcher, but maybe some day.)
2019-02-10grep-searcher-0.1.3grep-searcher-0.1.3Andrew Gallant
2019-02-10searcher: revert big-endian patchAndrew Gallant
This undoes the patch to stop using bytecount on big-endian architectures. In particular, we bump our bytecount dependency to the latest release, which has a fix. This reverts commit a4868b88351318182eed3b801d0c97a106a7d38f. Fixes #1144 (again), Closes #1194
2019-02-09grep-searcher-0.1.2grep-searcher-0.1.2Andrew Gallant
2019-02-09searcher: use naive line counting on big-endianAndrew Gallant
This patches out bytecount's "fast" vectorized algorithm on big-endian machines, where it has been observed to fail. Going forward, bytecount should probably fix this on their end, but for now, we take a small performance hit on big-endian machines. Fixes #1144
2019-01-26regex: make CRLF hack more robustAndrew Gallant
This commit improves the CRLF hack to be more robust. In particular, in addition to rewriting `$` as `(?:\r??$)`, we now strip `\r` from the end of a match if and only if the regex has an ending line anchor required for a match. This doesn't quite make the hack 100% correct, but should fix most use cases in practice. An example of a regex that will still be incorrect is `foo|bar$`, since the analysis isn't quite sophisticated enough to determine that a `\r` can be safely stripped from any match. Even if we fix that, regexes like `foo\r|bar$` still won't be handled correctly. Alas, more work on this front should really be focused on enabling this in the regex engine itself. The specific cause of this bug was that grep-searcher was sneakily stripping CRLF from matching lines when it really shouldn't have. We remove that code now, and instead rely on better match semantics provided at a lower level. Fixes #1095
2019-01-25searcher: always strip BOMAndrew Gallant
This fixes a bug where a BOM prefix was included. While this was somewhat intentional in order to have a faithful "UTF8 passthru" option, in practice, this causes problems such as breaking patterns like `^` in a really non-obvious way. The actual fix was to add a new API to encoding_rs_io, which this commit brings in. Fixes #1163
2019-01-19deps: update memmapAndrew Gallant
2019-01-19deps: update to bytecount 0.5Andrew Gallant
bytecount now uses runtime dispatch for enabling SIMD, which means we can no longer need the avx-accel features. We remove it from ripgrep since the next release will be a minor version bump, but leave them as no-ops for the crates that previously used it.
2019-01-19deps: update various dependenciesAndrew Gallant
We also increase the MSRV to 1.32, the current stable release, which sets the stage for migrating to Rust 2018.
2019-01-11grep-searcher: add docs for assert_eq_printedAndrew Gallant
Looks like the deny(missing_docs) lint got a bit stronger.
2018-10-22deps: update encoding_rsAndrew Gallant
This commit bumps the version of encoding_rs to use the latest release. This appears to fix a panic in UTF-16 decoding. Fixes #1089
2018-09-25grep-searcher: update to encoding_rs_io 0.1.3Andrew Gallant
This update includes a work-around for a presumed bug in encoding_rs that causes a panic: https://github.com/hsivonen/encoding_rs/issues/34 Specifically, to reproduce this in ripgrep, one can run the following: $ curl -LO https://cache.ruby-lang.org/pub/ruby/2.5/ruby-2.5.1.tar.gz $ tar xf ruby-2.5.1.tar.gz $ rg ZZZZZ ruby-2.5.1/test/rexml/data/t63-2.svg thread 'main' panicked at 'index out of bounds: the len is 1 but the index is 1' Fixes #1052
2018-09-07deps: update versions for all cratesAndrew Gallant
I don't think every change here is needed, but this ensures we're using the latest version of every direct dependency.
2018-09-07grep-searcher: add Box<...> impl for SinkAndrew Gallant
We initially did not have this impl because the first revision of the Sink trait was much more complicated. In particular, each method was parameterized over a Matcher. But not every Sink impl actually needs a Matcher, and it is just as easy to borrow a Matcher explicitly, so the added parameterization wasn't holding its own. This does permit Sink implementations to be used as trait objects. One key use case here is to reduce compile times, since there is quite a bit of code inside grep-searcher that is parameterized on Sink. Unfortunately, that code is *also* parameterized on Matcher, and the various printers in grep-printer are also parameterized on Matcher, which means Sink trait objects are necessary but no sufficient for a major reduction in compile times. Unfortunately, the path to making Matcher object safe isn't quite clear. Extension traits maybe? There's also stuff in the Serde ecosystem that might help, but the type shenanigans can get pretty gnarly.
2018-09-07grep-printer: delete unused codeAndrew Gallant
2018-08-20deps: update libripgrep crate versionsAndrew Gallant
This prepares them for an initial 0.1.0 release.
2018-08-20libripgrep: initial commit introducing libripgrepAndrew Gallant
libripgrep is not any one library, but rather, a collection of libraries that roughly separate the following key distinct phases in a grep implementation: 1. Pattern matching (e.g., by a regex engine). 2. Searching a file using a pattern matcher. 3. Printing results. Ultimately, both (1) and (3) are defined by de-coupled interfaces, of which there may be multiple implementations. Namely, (1) is satisfied by the `Matcher` trait in the `grep-matcher` crate and (3) is satisfied by the `Sink` trait in the `grep2` crate. The searcher (2) ties everything together and finds results using a matcher and reports those results using a `Sink` implementation. Closes #162