Age | Commit message (Collapse) | Author |
|
Treat anything with a `.lock` extension as a lock file, with
an extra rule or two for special cases, e.g., package-lock.json.
|
|
This makes the case of searching for a dictionary of a very large number
of literals much much faster. (~10x or so.) In particular, we achieve this
by short-circuiting the construction of a full regex when we know we have
a simple alternation of literals. Building the regex for a large dictionary
(>100,000 literals) turns out to be quite slow, even if it internally will
dispatch to Aho-Corasick.
Even that isn't quite enough. It turns out that even *parsing* such a regex
is quite slow. So when the -F/--fixed-strings flag is set, we short
circuit regex parsing completely and jump straight to Aho-Corasick.
We aren't quite as fast as GNU grep here, but it's much closer (less than
2x slower).
In general, this is somewhat of a hack. In particular, it seems plausible
that this optimization could be implemented entirely in the regex engine.
Unfortunately, the regex engine's internals are just not amenable to this
at all, so it would require a larger refactoring effort. For now, it's
good enough to add this fairly simple hack at a higher level.
Unfortunately, if you don't pass -F/--fixed-strings, then ripgrep will
be slower, because of the aforementioned missing optimization. Moreover,
passing flags like `-i` or `-S` will cause ripgrep to abandon this
optimization and fall back to something potentially much slower. Again,
this fix really needs to happen inside the regex engine, although we
might be able to special case -i when the input literals are pure ASCII
via Aho-Corasick's `ascii_case_insensitive`.
Fixes #497, Fixes #838
|
|
|
|
This commit adds a new encoding feature where the -E/--encoding flag
will now accept a value of 'none'. When given this value, all encoding
related machinery is disabled and ripgrep will search the raw bytes of
the file, including the BOM if it's present.
Closes #1207, Closes #1208
|
|
PR #1205
|
|
This adds a badge to the README.md file indicating to users that click
on it if their os/distro carries that latest version of ripgrep.
PR #1213
|
|
PR #1222
|
|
|
|
This is useful for debugging to see what regex is actually being run.
We put this as a trace since the regex can be quite gnarly. (It is not
pretty printed.)
|
|
When looking for an inner literal to speed up searches, if only a prefix
is found, then we generally give up doing inner literal optimizations since
the regex engine will generally handle it for us. Unfortunately, this
decision was being made *before* we wrap the regex in (^|\W)...($|\W) when
using the -w/--word-regexp flag, which would then defeat the literal
optimizations inside the regex engine.
We fix this with a bit of a hack that says, "if we're doing a word regexp,
then give me back any literal you find, even if it's a prefix."
|
|
This tweaks the path handling functions slightly to make them a hair
faster. In particular, `file_name` is called on every path that ripgrep
visits, and it was possible to remove a few branches without changing
behavior.
|
|
This simplifies the various path related functions and pushed more platform
dependent code down into bstr. This likely also makes things a bit more
efficient on Windows, since we now only do a single UTF-8 check for each
file path.
|
|
This uses bstr in the unescaping logic. This lets us remove some platform
specific code, and also lets us remove a hacked UTF-8 decoder on raw
bytes.
|
|
This lets us implement correct Unicode trimming and also simplifies the
parsing logic a bit. This also removes the last platform specific bits of
code in ripgrep core.
|
|
This starts the usage of bstr in the printer. We don't use it too much
yet, but it comes in handy for implementing PrinterPath and lets us push
down some platform specific code into bstr.
|
|
This commit causes grep-searcher to use byte strings internally for its
line buffer support. We manage to remove a use of `unsafe` by doing this
(by pushing it down into `bstr`).
We stop short of using byte strings everywhere else because we rely
heavily on the `impl ops::Index<[u8]> for grep_matcher::Match` impl,
which isn't available for byte strings. (It is premature to make bstr a
public dep of a core crate like grep-matcher, but maybe some day.)
|
|
Rust is having problems with trusty, in particular, see this bug I
filed: https://github.com/rust-lang/rust/issues/59411
This was purpotedly fixed in
https://github.com/rust-lang/rust/pull/59468,
but it appears the issue is still occurring.
This commit tries to update to Ubuntu 16.04 in the hope that it will fix
this problem.
|
|
We do the simplest possible change to migrate to the new version.
Fixes #1228
|
|
This updates all dependencies to their latest versions.
We tolerate a duplicative aho-corasick for now, which we will fix in the
next commit.
|
|
This brings in a new API for disabling BOM sniffing.
This is part of the work toward completing
https://github.com/BurntSushi/ripgrep/issues/1207
|
|
See
* https://github.com/rust-lang/regex/commit/661bf53d5b2b6dde25549aaad601ad8c59b37bfd
* https://github.com/rust-lang/regex/commit/edf45e6f5fa54705298ba14f3216cfb5277c0908
for details on the bug fix, which was in the regex engine.
Fixes #1203
|
|
Despite the fact that we mention this in several places, people are
still surprised by ripgrep's "smart" filtering.
|
|
|
|
This gets rid of the unmaintained crates `unreachable` and `void`. Yay!
|
|
This brings in a fix for this bug:
https://github.com/BurntSushi/aho-corasick/issues/37
Fixes #1079
|
|
|
|
|
|
This attempts to make Appveyor more conservative in what tags it thinks
are releases. I don't know for sure, but it looks like the previous
regex could match anywhere, so we anchor it.
Fixes #1195
|
|
|
|
|
|
This undoes the patch to stop using bytecount on big-endian
architectures. In particular, we bump our bytecount dependency to the
latest release, which has a fix.
This reverts commit a4868b88351318182eed3b801d0c97a106a7d38f.
Fixes #1144 (again), Closes #1194
|
|
We did this in 05411b2b for core ripgrep, but didn't carry it over to
tests.
|
|
|
|
This patches out bytecount's "fast" vectorized algorithm on big-endian
machines, where it has been observed to fail. Going forward, bytecount
should probably fix this on their end, but for now, we take a small
performance hit on big-endian machines.
Fixes #1144
|
|
PR #1191
|
|
This brings in an updated `encoding_rs` crate that uses `packed_simd`,
which compiles on the latest nightly. Compilation times do appear to be
impacted significantly though.
Fixes #1175 (again)
|
|
PR #1182
|
|
We bumped it a while back in the CI configuration, but didn't update the
README.
|
|
|
|
This drops dependencies on parking_lot and rand from ripgrep.
(rand is still used for tests.)
|
|
Our MSRV is high enough that we can use const functions now.
|
|
This is going to be annoying for a while if one switches between the
latest nightly compiler and older compilers. Sigh.
|
|
Fedora 27 and below are past their EOL, so it can now be said that it's
supported regularly on Fedora.
PR #1177
|
|
This was fixed by bumping the MSRV above Rust 1.28.
Fixes #916
|
|
|
|
This commit fixes a bug where ripgrep only treated files beginning with
a `.` as hidden. On Windows, we continue this tradition, but
additionally check whether a file has the special Windows "hidden"
attribute set. If so, we treat it as a hidden file.
In order to make this work without an additional stat call, we had to
rearrange some of the plumbing from the directory traverser.
Fixes #1154
|
|
Notably, ripgrep can do multiline search now. We also update the
supported compression format list and replace deprecated flags like
`--sort-files` with `--sort path`.
|
|
This fixes what appears to be a pretty egregious regression where the
`-F/--fixed-strings` flag wasn't be applied to patterns supplied via
the `-f/--file` flag. The same bug existed for the `-x/--line-regexp`
flag as well, which we fix here.
Fixes #1176
|
|
This changes how ripgrep emit exit status codes. In particular, any error
that occurs while searching will now cause ripgrep to emit a `2` exit
code, where as it previously would emit either a `0` or a `1` code based
on whether it matched or not. That is, ripgrep would only emit a `2` exit
code for a catastrophic error.
This tweak includes additional logic that GNU grep adheres to, which seems
like good sense. Namely, if -q/--quiet is given, and an error occurs and
a match occurs, then ripgrep will emit a `0` exit code.
Closes #1159
|
|
Previously, we relied on clap to handle printing either an error
message, or --help/--version output, in addition to setting the exit
status code. Unfortunately, for --help/--version output, clap was
panicking if the write failed, which can happen in fairly common
scenarios via a broken pipe error. e.g., `rg -h | head`.
We fix this by using clap's "safe" API and doing the printing ourselves.
We also set the exit code to `2` when an invalid command has been given.
Fixes #1125 and partially addresses #1159
|