path: root/crates
Age Commit message Author
2023-08-05globset-0.4.13globset-0.4.13Andrew Gallant
2023-08-05globset: use non-capture groups in regex transformAndrew Gallant
We currently implement globs by converting them to regexes, and in doing so, sometimes use grouping. In all but one case, we used non-capturing groups. But for alternations, we used capturing groups, which was likely just an oversight. We don't make use of capture groups at all, and while they usually don't have any overhead, they lead to weird cases like this one: https://github.com/rust-lang/regex/issues/1059 That particular issue is also a bug in the regex crate itself, which is fixed in https://github.com/rust-lang/regex/pull/1062. Note though that the bug fix in the regex crate is required. Even with this patch to globset, memory usage is reduced (by about half in rust-lang/regex#1059) but is not returned to where it was prior to the regex 1.9 release.
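As a rough illustration of the change described above, the sketch below joins already-translated glob alternates into a single non-capturing regex group. The function name `alternation_to_regex` is hypothetical and is not taken from globset; the real transform handles far more syntax than this.

```rust
/// Hypothetical helper: turn the branches of a glob alternation such as
/// `{foo,bar}` into one regex group. The branches are assumed to have been
/// translated to regex syntax already.
fn alternation_to_regex(alternates: &[&str]) -> String {
    // `(?:...)` groups without capturing, so the regex engine does not need
    // to track a capture slot for a group whose value is never read.
    format!("(?:{})", alternates.join("|"))
}
```

With this helper, the branches `foo` and `bar` become `(?:foo|bar)` rather than `(foo|bar)`.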
2023-07-31regex: fix fast path for -w/--word-regexp flag (#2576)Andrew Gallant
It turns out our fast path for -w/--word-regexp wasn't quite correct in some cases. Namely, we use `(?m:^|\W)(<original-regex>)(?m:\W|$)` as the implementation of -w/--word-regexp since `\b(<original-regex>)\b` has some unintuitive results in certain cases, specifically when <original-regex> matches non-word characters at match boundaries. The problem is that using this formulation means that you need to extract the capture group around <original-regex> to find the "real" match, since the surrounding (^|\W) and (\W|$) aren't part of the match. This is fine, but the capture group engine is usually slow, so we have a fast path where we try to deduce the correct match boundary after an initial match (before running capture groups). The problem is that doing this is rather tricky because it's hard to know, in general, whether the `^` or the `\W` matched. This still doesn't seem quite right overall, but we at least fix one more case. Fixes #2574
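The wrapping described above can be sketched as plain string construction. The function name `word_regexp` is hypothetical; only the pattern shape comes from the commit message. The sketch builds the pattern and does not run it, so the fast path and the capture-group extraction are not shown.

```rust
/// Hypothetical helper: wrap a user pattern the way the commit message
/// describes for -w/--word-regexp.
fn word_regexp(original: &str) -> String {
    // The "real" match is capture group 1. The surrounding `(?m:^|\W)` and
    // `(?m:\W|$)` only assert the context and are not part of the match.
    format!(r"(?m:^|\W)({})(?m:\W|$)", original)
}
```

A caller would then need to read group 1 to recover the bounds of the original pattern's match, which is the slow step that the fast path tries to avoid.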
2023-07-31ignore/types: add csprojVidar
Supports the .NET C# Project file extension. PR #2575
2023-07-26globset-0.4.12globset-0.4.12Andrew Gallant
2023-07-26api: impl Deserialize for GlobSetDavid Tolnay
PR #2569
2023-07-18grep-cli-0.1.9grep-cli-0.1.9Andrew Gallant
2023-07-12globset-0.4.11globset-0.4.11Andrew Gallant
2023-07-10ignore/types: add Windows Command Prompt filesmataha
This PR adds `*.bat` and `*.cmd` file types. In doing so, it makes a distinction between batch files (old standard from the MS-DOS era) and command scripts (new flavor - can operate on batch files, although `*.cmd` is preferred for various reasons, the main one being batch files will set `ERRORLEVEL` following inconsistent MS-DOS style rules[1]). PR #2556 [1]: https://groups.google.com/g/microsoft.public.win2000.cmdprompt.admin/c/XHeUq8oe2wk/m/LIEViGNmkK0J#i106
2023-07-09cli: fix non-path sorting behaviornguyenvukhang
Previously, sorting worked by sorting the parents and then sorting the children within each parent. This was done during traversal, but it only works when sorting parents preserves the overall order. This generally only works for '--sort path' in ascending order. This commit fixes the rest of the sorting behavior by collecting all of the paths to search and then sorting them before searching. We only collect all of the paths when sorting was requested. Fixes #2243, Closes #2361
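A minimal sketch of the "collect, then sort" approach, assuming a hypothetical `Entry` type that pairs a path with its modification time. Neither `Entry` nor `sort_by_modified` is ripgrep code. It shows why per-directory sorting is not enough: a global order by modification time can interleave files from different directories.

```rust
use std::path::PathBuf;
use std::time::SystemTime;

/// Hypothetical record gathered during traversal.
struct Entry {
    path: PathBuf,
    modified: SystemTime,
}

/// Sort every collected entry globally instead of sorting within each parent.
fn sort_by_modified(mut entries: Vec<Entry>, reverse: bool) -> Vec<PathBuf> {
    entries.sort_by(|a, b| {
        let ord = a.modified.cmp(&b.modified);
        if reverse { ord.reverse() } else { ord }
    });
    entries.into_iter().map(|e| e.path).collect()
}
```

For files `a/old.txt`, `b/mid.txt` and `a/new.txt` with increasing modification times, the ascending result visits directory `a`, then `b`, then `a` again, which sorting children within each parent cannot produce.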
2023-07-08cli: add --stop-on-nonmatch flagEdoardo Pirovano
This causes ripgrep to stop searching an individual file after it has found a non-matching line. But this only occurs after it has found a matching line. Fixes #1790, Closes #1930
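The behavior described above can be sketched over an in-memory list of lines. The function `search_stop_on_nonmatch` and its predicate argument are hypothetical stand-ins for the real searcher and matcher.

```rust
/// Hypothetical sketch: collect matching lines, but stop at the first
/// non-matching line that follows at least one match.
fn search_stop_on_nonmatch<'a>(
    lines: &[&'a str],
    is_match: impl Fn(&str) -> bool,
) -> Vec<&'a str> {
    let mut matches = Vec::new();
    for &line in lines {
        if is_match(line) {
            matches.push(line);
        } else if !matches.is_empty() {
            // A non-match after a match ends the search of this file.
            break;
        }
        // Non-matching lines before the first match are skipped.
    }
    matches
}
```

Given the lines `x`, `foo 1`, `foo 2`, `y`, `foo 3` and a predicate that looks for `foo`, only `foo 1` and `foo 2` are reported.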
2023-07-08core: lock stdout before printing an error message to stderrGarrett Thornburg
Adds a new eprintln_locked macro which locks STDOUT before logging to STDERR. This patch also replaces instances of eprintln with eprintln_locked to avoid interleaving lines. Fixes #1941, Closes #1968
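A macro along these lines could look like the sketch below. The body is an assumption about how such a macro might be written, not a copy of ripgrep's definition, and `report_error` is a hypothetical caller.

```rust
/// Sketch: hold the stdout lock while writing to stderr, so that a thread
/// printing results to stdout cannot interleave with the error message.
macro_rules! eprintln_locked {
    ($($tt:tt)*) => {{
        let stdout = std::io::stdout();
        // The guard is released at the end of this block, after the write.
        let _guard = stdout.lock();
        eprintln!($($tt)*);
    }};
}

/// Hypothetical caller that reports a per-file error.
fn report_error(path: &str, msg: &str) {
    eprintln_locked!("{}: {}", path, msg);
}
```

The stdout lock is reentrant within a thread, so calling this from code that already holds the lock on the same thread should not deadlock.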
2023-07-08globset: add 'escape' routinepiegames
Fixes #2060, Closes #2061
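One way such an escape routine might work is to wrap each glob meta character in a bracket class so that it matches itself literally. The function name `escape_glob` and the exact set of escaped characters are assumptions made for this sketch.

```rust
/// Hypothetical sketch: escape glob meta characters by wrapping each one in
/// a single-character class, e.g. `*` becomes `[*]`.
fn escape_glob(s: &str) -> String {
    let mut escaped = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '?' | '*' | '[' | ']' => {
                escaped.push('[');
                escaped.push(c);
                escaped.push(']');
            }
            _ => escaped.push(c),
        }
    }
    escaped
}
```

For example, `a*b?` becomes `a[*]b[?]`, which a glob matcher reads as the literal text `a*b?`.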
2023-07-08cli: force binary existence checkSeth Stadick
Previously, we were only doing a binary existence check on Windows. And in fact, the main point there wasn't binary existence, but ensuring we didn't accidentally resolve a binary name relative to the CWD, which could result in executing a program one didn't mean to run. However, it is useful to be able to check whether a binary exists on any platform when associating a glob with a binary. If the binary doesn't exist, then the association can fail eagerly and let some other glob apply. Closes #1946
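The eager failure described above can be sketched with a PATH lookup that works on any platform. Both `find_in_path` and `first_available` are hypothetical names, and the rule list is a simplified stand-in for the glob-to-binary associations.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::PathBuf;

/// Look for `prog` in each directory of a PATH-style value.
fn find_in_path(prog: &str, path_var: &OsStr) -> Option<PathBuf> {
    env::split_paths(path_var)
        // An empty entry would resolve relative to the CWD, so skip it.
        .filter(|dir| !dir.as_os_str().is_empty())
        .map(|dir| dir.join(prog))
        .find(|candidate| candidate.is_file())
}

/// Return the first (glob, binary) rule whose binary actually exists, so a
/// rule with a missing binary fails eagerly and a later rule can apply.
fn first_available<'a>(
    rules: &[(&'a str, &'a str)],
    path_var: &OsStr,
) -> Option<(&'a str, PathBuf)> {
    rules
        .iter()
        .find_map(|&(glob, prog)| find_in_path(prog, path_var).map(|p| (glob, p)))
}
```

This sketch only checks that a file exists; it does not check that the file is executable.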
2023-07-08doc: clarify the comment on `Worker.work_done`Michal Terepeta
We call `work_done` only once the work has been actually performed (otherwise `num_pending` could go to 0 before the actual work is done). Closes #2039
2023-07-08doc: improve -r/--replace flag syntax docsKyle Todeschini
Fixes #2108, Closes #2123
2023-07-08ignore/types: name aliases for file typeskotborealis
We also make py/python, md/markdown and ts/typescript aliases of one another. Note that this only introduces aliases at the point where default types are defined. This just makes them a bit easier to read/write, and also makes it easier to expose more names that describe the same thing. Fixes #1857, Closes #1895
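Defining aliases at the point where default types are declared might look like the sketch below: each entry lists every name once, followed by the shared globs. The table contents and the function `expand_aliases` are illustrative assumptions, not the crate's actual definitions.

```rust
use std::collections::BTreeMap;

/// Illustrative table: (names, globs). Every name in a row is an alias for
/// the same set of globs.
const DEFAULT_TYPES: &[(&[&str], &[&str])] = &[
    (&["py", "python"], &["*.py", "*.pyi"]),
    (&["md", "markdown"], &["*.md", "*.markdown"]),
    (&["rust"], &["*.rs"]),
];

/// Expand the table into a name -> globs map with one entry per alias.
fn expand_aliases(defs: &[(&[&str], &[&str])]) -> BTreeMap<String, Vec<String>> {
    let mut types = BTreeMap::new();
    for (names, globs) in defs {
        for name in names.iter() {
            let globs: Vec<String> = globs.iter().map(|g| g.to_string()).collect();
            types.insert(name.to_string(), globs);
        }
    }
    types
}
```

Because aliases exist only in the table, code that consumes the expanded map sees ordinary independent type names.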
2023-07-08ignore/types: add 'typescript' alias for 'ts'Klas Mellbourn
Closes #2009
2023-07-08ignore/types: add Ada filetypes, including gprbuild and alireTama McGlinn
*.adb and *.ads are the usual extensions for Ada source code, and *.gpr indicates a GPRbuild project file used for Ada, which these days is often combined with alire for package dependency resolution. Alire stores a bunch of files named alire.toml in different directories in your (gitignored) cache/dependencies/... Closes #2013
2023-07-08ignore/types: add raku extensions to ignore typesJuan Francisco Cantero Hurtado
Closes #2117
2023-07-08ignore/types: add MDX format to Markdown typesAndrew Gallant
Ref https://mdxjs.com/ Closes #2142
2023-07-08ignore/types: add DITA (Darwin Information Typing Architecture)chrispy
Closes #2148
2023-07-08doc: fix typoLudi Rehak
Closes #2153
2023-07-08doc: fix some typoscuishuang
Closes #2195
2023-07-08cli: '--no-ignore-dot' should also apply to '.rgignore'Richard Sternagel
Fixes #2198, Closes #2202
2023-07-08ignore/types: fix formattingAndrew Gallant
2023-07-08ignore/types: added V typeedam
V (http://vlang.io) uses '.v' files. Closes #2302
2023-07-08globset: introduce option to keep empty alternatesAlex Rawson
Add a method GlobBuilder::empty_alternates and supporting mechanisms. Ref #1368 Closes #2369
2023-07-08globset: permit deserializing Glob from StringJérome Eertmans
Closes #2386, Closes #2388
2023-07-08ignore/types: add USD to the default file typesMark Sisson
Closes #2432
2023-07-08ignore/types: add Gentoo eclass typeSam James
Eclasses are "ebuild libraries" and generally if you're filtering for/filtering out an ebuild/eclass, you don't want the other either. Followup to 4dfea016b915bb1e88679361de83a91e60447835 Closes #2437
2023-07-08ignore/types: improve Elixir globsangrycandy
Closes #2450
2023-07-08core: don't let context flags override each otherAndrew Gallant
This matches the behavior of GNU grep which does not ignore before-context and after-context completely if the context flag is also provided. Note that this change wasn't done just to match GNU grep. In this case, GNU grep has the more sensible behavior. Fixes #2288, Closes #2451
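One plausible reading of this behavior is that the general context flag supplies a default for both sides, and a before-context or after-context value replaces it only on its own side. The sketch below encodes that reading; the `ContextFlags` struct and `resolve_context` function are hypothetical, and the precedence rule itself is an assumption rather than something the commit message spells out.

```rust
/// Hypothetical container for the three context flags as given by the user.
#[derive(Debug, Default, Clone, Copy)]
struct ContextFlags {
    before: Option<usize>, // -B/--before-context
    after: Option<usize>,  // -A/--after-context
    both: Option<usize>,   // -C/--context
}

/// Resolve to (before, after) line counts without letting -C erase -A/-B.
fn resolve_context(flags: ContextFlags) -> (usize, usize) {
    let both = flags.both.unwrap_or(0);
    (flags.before.unwrap_or(both), flags.after.unwrap_or(both))
}
```

Under this reading, `-B1 -C5` yields one line of context before each match and five after, regardless of the order in which the flags were given.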
2023-07-08doc: note '-n' and '-N' override each otherMisaki
Closes #2460
2023-07-08ignore/gitignore: expose `gitconfig_excludes_path`Eric Arellano
I have reservations about this, but it looks useful and doesn't seem terribly onerous to support. The `ignore` crate will really always need to have some kind of logic supporting this in some form I think. Closes #2482
2023-07-08ignore: tweak regex crate featuresJakub Jirutka
This removes most of the Unicode features as they aren't currently used. We can always add them back later if necessary. We can avoid the unicode-perl feature by changing `\s` to `[[:space:]]`, which uses the ASCII-only definition of `\s`. Since we don't expect non-ASCII whitespace in git config files, this seems okay. Closes #2502
2023-07-08ignore/types: add 'graphql' typeJon Parise
GraphQL file extensions: .graphql and .graphqls (schema) We could also add `.gql`, but perhaps it's less correct to do so. We'll start conservatively here, and we can always add `.gql` later. Closes #2439, Closes #2508
2023-07-08cli: make resolve_binary take COM executables into accountmataha
When `resolve_binary()` attempts to resolve a path to a program on Windows while searching for a program in `PATH` without an extension, `ripgrep` will assume the extension of the file to be `.exe` as it's the *de facto* standard, which will work most (99.99%) of the time... ...unless the binary is a COM executable (we're on Windows, duh). Closes #2523
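The extension handling can be sketched as a pure function that lists the candidate paths to probe in one PATH directory. The name `windows_candidates` is hypothetical, and the order in which `.com` and `.exe` are tried is an assumption made for this sketch.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical sketch: candidate file names for `prog` inside `dir`. When
/// `prog` has no extension, also try the COM and EXE variants.
fn windows_candidates(dir: &Path, prog: &str) -> Vec<PathBuf> {
    let base = dir.join(prog);
    let mut candidates = vec![base.clone()];
    if base.extension().is_none() {
        for ext in ["com", "exe"].iter() {
            candidates.push(base.with_extension(ext));
        }
    }
    candidates
}
```

A resolver would check each candidate in turn and return the first one that exists, so `more` could resolve to `more.com` instead of failing on a missing `more.exe`.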
2023-07-08ignore/types: add cml to the default types listYifei Teng
It's used in Fuchsia to mean "component manifest language."[1] [1]: https://fuchsia.dev/reference/cml?hl=en Closes #2529
2023-07-05grep-cli-0.1.8grep-cli-0.1.8Andrew Gallant
2023-07-05regex: remove old inner literal extractorAndrew Gallant
(It had already been removed from the crate.)
2023-07-05deps: drop temporary patch and move to bstr 1.6Andrew Gallant
Now that regex 1.9 is out, we can depend on it from crates.io.
2023-07-05regex: add new inner literal extractorAndrew Gallant
This is mostly a copy of the prefix literal extractor in regex-syntax, but with a tweaked notion of Seq that keeps track of whether it's a prefix of an expression or not. If it isn't, then we can't cross it as a suffix to another Seq. This new extractor should be a lot more robust than the old one. We actually will keep going through the regex to try and find the "best" literals to search for (according to some heuristic).
2023-07-05regex: tweak formatting of regex-automata version specAndrew Gallant
This makes it easier to enable the `logging` feature for regex-automata. I wish I could just enable it unconditionally, but it winds up producing a lot of output because ripgrep uses regexes for things other than the primary search (like every glob). Sigh.
2023-07-05regex: refactor matcher constructionAndrew Gallant
This does a little bit of refactoring so that we can pass both a ConfiguredHIR and a Regex to the inner literal extraction routine. One downside of this approach is that a regex object hangs on to a ConfiguredHIR. But the extra memory usage is probably negligible. A benefit though is that converting the HIR to its concrete syntax is now lazy and only happens when logging is enabled.
2023-07-05regex: tweak DFA settingsAndrew Gallant
This increases the limits a bit for when the regex engine will build and use a fully compiled DFA. Fully compiled DFAs can be faster in some circumstances. For example, '(?-u)^\w{30,}$' gets a nice speed boost from state acceleration. We are also able to remove `regex` proper as a dependency. Wow.
2023-07-05regex: push more pattern handling to matcher constructionAndrew Gallant
Previously, ripgrep core was responsible for escaping regex patterns and implementing the --line-regexp flag. This commit moves that responsibility down into the matchers such that ripgrep just needs to hand the patterns it gets off to the matcher builder. The builder will then take care of escaping and all that. This was done to make pattern construction completely owned by the matcher builders. With the arrival of regex-automata, this means we can move to the HIR very quickly and then never move back to the concrete syntax. We can then build our regex directly from the HIR. This overall can save quite a bit of time, especially when searching for large dictionaries. We still aren't quite as fast as GNU grep when searching something on the scale of /usr/share/dict/words, but we are basically within spitting distance. Prior to this, we were about an order of magnitude slower. This architecture in particular lets us write a pretty simple fast path that avoids AST parsing and HIR translation entirely: the case where one is just searching for a literal. In that case, we can hand construct the HIR directly.
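To illustrate the division of responsibility only, the sketch below shows a builder that receives raw patterns plus flags and performs the escaping and the --line-regexp wrapping itself. It works on concrete syntax strings, which is a deliberate simplification: the commit does this at the HIR level. The names `escape_literal` and `build_pattern` and the set of escaped characters are assumptions.

```rust
/// Hypothetical helper: backslash-escape characters that are special in
/// regex syntax so the input matches literally.
fn escape_literal(lit: &str) -> String {
    let mut out = String::with_capacity(lit.len());
    for c in lit.chars() {
        if matches!(
            c,
            '\\' | '.' | '+' | '*' | '?' | '(' | ')' | '|' | '[' | ']' | '{' | '}' | '^' | '$'
        ) {
            out.push('\\');
        }
        out.push(c);
    }
    out
}

/// Hypothetical builder entry point: the caller hands over raw patterns and
/// flags, and the builder owns escaping and line anchoring.
fn build_pattern(patterns: &[&str], fixed_strings: bool, line_regexp: bool) -> String {
    let alternates: Vec<String> = patterns
        .iter()
        .map(|p| {
            if fixed_strings {
                escape_literal(p)
            } else {
                format!("(?:{})", p)
            }
        })
        .collect();
    let joined = alternates.join("|");
    if line_regexp {
        format!("^(?:{})$", joined)
    } else {
        joined
    }
}
```

The caller no longer needs to know how escaping or anchoring is done, which is what allows the real builder to skip concrete syntax entirely for literal searches.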
2023-07-05globset: fix build error in testsAndrew Gallant
I guess we haven't been testing with the Serde feature enabled? Weird.
2023-07-05deps: update to pcre2 0.2.4Andrew Gallant
0.2.4 updates to PCRE2 10.42 and has a few other nice changes. For example, when `utf` is enabled, the crate will always set the PCRE2_MATCH_INVALID_UTF option. That means we no longer need to do transcoding or UTF-8 validity checks. Because of this, we actually get to remove one of the two uses of `unsafe` in ripgrep's `main` program. (This also updates a couple other dependencies for convenience.)
2023-07-05regex: small cleanupsAndrew Gallant
Just some small polishing. We also get rid of thread_local in favor of using regex-automata, mostly just in the name of reducing dependencies. (We should eventually be able to drop thread_local completely.)