Age | Commit message (Collapse) | Author |
|
|
This comes with PCRE 10.32 and a few new options we'll use in subsequent
commits.
|
|
This makes the case of searching for a dictionary of a very large number
of literals much much faster. (~10x or so.) In particular, we achieve this
by short-circuiting the construction of a full regex when we know we have
a simple alternation of literals. Building the regex for a large dictionary
(>100,000 literals) turns out to be quite slow, even if it internally will
dispatch to Aho-Corasick.
Even that isn't quite enough. It turns out that even *parsing* such a regex
is quite slow. So when the -F/--fixed-strings flag is set, we
short-circuit regex parsing completely and jump straight to Aho-Corasick.
We aren't quite as fast as GNU grep here, but it's much closer (less than
2x slower).
In general, this is somewhat of a hack. In particular, it seems plausible
that this optimization could be implemented entirely in the regex engine.
Unfortunately, the regex engine's internals are just not amenable to this
at all, so it would require a larger refactoring effort. For now, it's
good enough to add this fairly simple hack at a higher level.
Unfortunately, if you don't pass -F/--fixed-strings, then ripgrep will
be slower, because of the aforementioned missing optimization. Moreover,
passing flags like `-i` or `-S` will cause ripgrep to abandon this
optimization and fall back to something potentially much slower. Again,
this fix really needs to happen inside the regex engine, although we
might be able to special case -i when the input literals are pure ASCII
via Aho-Corasick's `ascii_case_insensitive`.
Fixes #497, Fixes #838
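As a rough illustration of the short-circuit described above, here is a
minimal sketch in Rust. Everything in it is hypothetical: `Strategy`,
`choose_strategy`, `escape` and `literals_match` are made-up names and are
not ripgrep's actual code. The naive substring loop only stands in for
Aho-Corasick, and the `case_insensitive` parameter is a simplification of
flags like `-i` and `-S`.

```rust
/// Hypothetical strategy chosen for a set of patterns.
#[derive(Debug, PartialEq)]
enum Strategy {
    /// Skip regex parsing and hand the literals to a multi-literal matcher.
    Literals(Vec<String>),
    /// Build one big alternation for the regex engine (the slow path).
    Regex(String),
}

/// Decide how to search. Assumption: only -F without case-insensitive
/// matching takes the fast path, as the commit message describes.
fn choose_strategy(patterns: &[&str], fixed_strings: bool, case_insensitive: bool) -> Strategy {
    if fixed_strings && !case_insensitive {
        return Strategy::Literals(patterns.iter().map(|p| p.to_string()).collect());
    }
    let alternation = patterns
        .iter()
        .map(|p| if fixed_strings { escape(p) } else { p.to_string() })
        .collect::<Vec<_>>()
        .join("|");
    Strategy::Regex(alternation)
}

/// Simplified escaping of regex metacharacters (illustrative only).
fn escape(literal: &str) -> String {
    let mut out = String::with_capacity(literal.len());
    for c in literal.chars() {
        if r"\.+*?()|[]{}^$".contains(c) {
            out.push('\\');
        }
        out.push(c);
    }
    out
}

/// Stand-in for the multi-literal search. This is a naive loop over the
/// literals, NOT Aho-Corasick; it only shows where that matcher plugs in.
fn literals_match(literals: &[String], line: &str) -> bool {
    literals.iter().any(|l| line.contains(l.as_str()))
}
```

With this shape, `choose_strategy(&["a.b", "c"], true, false)` never builds
a pattern string at all, while adding a case-insensitive flag falls back to
the escaped alternation `a\.b|c`.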
|
|
This commit adds a new encoding feature where the -E/--encoding flag
will now accept a value of 'none'. When given this value, all encoding
related machinery is disabled and ripgrep will search the raw bytes of
the file, including the BOM if it's present.
Closes #1207, Closes #1208
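A minimal sketch of the difference, assuming the default mode drops a
leading UTF-8 BOM before searching. `EncodingMode` and `bytes_to_search`
are hypothetical names used only for illustration; the real machinery also
handles other encodings and transcoding, which this sketch ignores.

```rust
/// Hypothetical stand-in for the value given to -E/--encoding.
#[derive(Clone, Copy)]
enum EncodingMode {
    /// Default behavior, sketched for UTF-8 only: sniff a BOM and drop it.
    Auto,
    /// `-E none`: all encoding related machinery is disabled.
    None,
}

/// The three bytes of a UTF-8 byte order mark.
const UTF8_BOM: &[u8] = b"\xEF\xBB\xBF";

/// Return the bytes that would actually be searched.
fn bytes_to_search(contents: &[u8], mode: EncodingMode) -> &[u8] {
    match mode {
        // Raw bytes, including the BOM if it's present.
        EncodingMode::None => contents,
        // Assumption: the default strips a leading UTF-8 BOM.
        EncodingMode::Auto => contents.strip_prefix(UTF8_BOM).unwrap_or(contents),
    }
}
```

For a file containing a BOM followed by `foo`, the `Auto` mode searches
just `foo`, while `None` searches all six bytes as they are on disk.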
|
|
We do the simplest possible change to migrate to the new version.
Fixes #1228
|
|
This updates all dependencies to their latest versions.
We tolerate a duplicative aho-corasick for now, which we will fix in the
next commit.
|
|
This brings in a new API for disabling BOM sniffing.
This is part of the work toward completing
https://github.com/BurntSushi/ripgrep/issues/1207
|
|
See
* https://github.com/rust-lang/regex/commit/661bf53d5b2b6dde25549aaad601ad8c59b37bfd
* https://github.com/rust-lang/regex/commit/edf45e6f5fa54705298ba14f3216cfb5277c0908
for details on the bug fix, which was in the regex engine.
Fixes #1203
|
|
This gets rid of the unmaintained crates `unreachable` and `void`. Yay!
|
|
This brings in a fix for this bug:
https://github.com/BurntSushi/aho-corasick/issues/37
Fixes #1079
|
|
This undoes the patch to stop using bytecount on big-endian
architectures. In particular, we bump our bytecount dependency to the
latest release, which has a fix.
This reverts commit a4868b88351318182eed3b801d0c97a106a7d38f.
Fixes #1144 (again), Closes #1194
|
|
This brings in an updated `encoding_rs` crate that uses `packed_simd`,
which compiles on the latest nightly. Compilation times do appear to be
impacted significantly though.
Fixes #1175 (again)
|
|
This drops dependencies on parking_lot and rand from ripgrep.
(rand is still used for tests.)
|
|
This is going to be annoying for a while if one switches between the
latest nightly compiler and older compilers. Sigh.
|
|
This commit fixes a bug where ripgrep only treated files beginning with
a `.` as hidden. On Windows, we continue this tradition, but
additionally check whether a file has the special Windows "hidden"
attribute set. If so, we treat it as a hidden file.
In order to make this work without an additional stat call, we had to
rearrange some of the plumbing from the directory traverser.
Fixes #1154
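The check described above might look roughly like the following sketch.
The names `is_hidden` and `attrs_of` are hypothetical and this is not the
actual code in the directory traverser. It assumes the attributes come from
the metadata already attached to the directory entry, which on Windows
should not need an extra system call.

```rust
use std::ffi::OsStr;

/// Value of the Windows FILE_ATTRIBUTE_HIDDEN flag.
const FILE_ATTRIBUTE_HIDDEN: u32 = 0x2;

/// A file is hidden if its name starts with `.` or, when Windows file
/// attributes are available, if the "hidden" attribute is set.
fn is_hidden(file_name: &OsStr, windows_attrs: Option<u32>) -> bool {
    let dot_prefixed = file_name.to_string_lossy().starts_with('.');
    let attr_hidden = windows_attrs.map_or(false, |a| (a & FILE_ATTRIBUTE_HIDDEN) != 0);
    dot_prefixed || attr_hidden
}

/// On Windows, read the attributes from the directory entry's metadata.
#[cfg(windows)]
#[allow(dead_code)]
fn attrs_of(entry: &std::fs::DirEntry) -> Option<u32> {
    use std::os::windows::fs::MetadataExt;
    entry.metadata().ok().map(|m| m.file_attributes())
}

/// Elsewhere there are no such attributes, so only the `.` rule applies.
#[cfg(not(windows))]
#[allow(dead_code)]
fn attrs_of(_entry: &std::fs::DirEntry) -> Option<u32> {
    None
}
```

Keeping `is_hidden` free of any file system access makes the rule easy to
test on any platform; only `attrs_of` is platform specific.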
|
|
This is necessary for the use of the new is_line_anchored_{start,end}
APIs.
|
|
This fixes a bug where a BOM prefix was included. While this was somewhat
intentional in order to have a faithful "UTF8 passthru" option, in
practice, this causes problems such as breaking patterns like `^` in a
really non-obvious way.
The actual fix was to add a new API to encoding_rs_io, which this commit
brings in.
Fixes #1163
|
|
bytecount now uses runtime dispatch for enabling SIMD, which means we no
longer need the avx-accel features. We remove the feature from ripgrep
since the next release will be a minor version bump, but leave it as a
no-op in the crates that previously used it.
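For readers unfamiliar with the idea, runtime dispatch can be sketched with
the standard library alone. This is not bytecount's implementation; the
function names are made up, and the "AVX2" variant here is just the same
loop compiled with AVX2 enabled so the compiler is free to vectorize it.

```rust
/// Count occurrences of `needle`, picking an implementation at runtime
/// instead of through a compile-time feature like avx-accel.
fn count_byte(haystack: &[u8], needle: u8) -> usize {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::is_x86_feature_detected!("avx2") {
            // SAFETY: we just checked that the running CPU supports AVX2.
            return unsafe { count_byte_avx2(haystack, needle) };
        }
    }
    count_byte_fallback(haystack, needle)
}

/// Same loop as the fallback, but compiled with AVX2 enabled.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn count_byte_avx2(haystack: &[u8], needle: u8) -> usize {
    haystack.iter().filter(|&&b| b == needle).count()
}

/// Portable implementation used when no faster variant is available.
fn count_byte_fallback(haystack: &[u8], needle: u8) -> usize {
    haystack.iter().filter(|&&b| b == needle).count()
}
```

Because the CPU check happens when the program runs, a single binary works
on machines with and without AVX2, which is why a build-time feature is no
longer needed.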
|
|
We also increase the MSRV to 1.32, the current stable release, which sets
the stage for migrating to Rust 2018.
|
|
This commit is the result of doing:
$ cargo update
$ cargo update -p encoding_rs --precise 0.8.10
where the latter line prevents encoding_rs from updating to 0.8.11 (or
newer). In particular, the 0.8.11 release increased the minimum Rust
version to 1.29, whereas ripgrep 0.10.x is still on 1.28. We stay on an
older version for now until ripgrep is ready to move to 0.11.x.
|
|
This also requires corresponding updates to both rand and rand_core. Doing
an update of rand without doing an update of rand_core results in
compilation errors because two distinct versions of rand_core are included
in the build, and the traits they expose are distinct and incompatible.
We also switch over to using tempfile instead of tempdir, which drops the
last remaining thing keeping rand 0.4 in the build.
Fixes #1141, Fixes #1142
|
|
This brings in some new Unicode properties, such as \p{Emoji}.
It is now also technically possible to construct a regex that recognizes
grapheme clusters.
|
|
This commit bumps the version of encoding_rs to use the latest release.
This appears to fix a panic in UTF-16 decoding.
Fixes #1089
|
|
This update includes a work-around for a presumed bug in encoding_rs
that causes a panic:
https://github.com/hsivonen/encoding_rs/issues/34
Specifically, to reproduce this in ripgrep, one can run the following:
$ curl -LO https://cache.ruby-lang.org/pub/ruby/2.5/ruby-2.5.1.tar.gz
$ tar xf ruby-2.5.1.tar.gz
$ rg ZZZZZ ruby-2.5.1/test/rexml/data/t63-2.svg
thread 'main' panicked at 'index out of bounds: the len is 1 but the index is 1'
Fixes #1052
|
|
This will allow PCRE2 to fall back to non-JIT matching when running on
platforms without JIT support.
ref https://github.com/BurntSushi/rust-pcre2/issues/3
|
|
I don't think every change here is needed, but this ensures we're using
the latest version of every direct dependency.
|
|
These are (or will be) used in grep's examples.
|
|
|