path: root/ignore/src
Age, Commit message, Author
2020-02-17repo: move all source code in crates directoryAndrew Gallant
The top-level listing was just getting a bit too long for my taste. So put all of the code in one directory and shrink the large top-level mess to a small top-level mess. NOTE: This commit only contains renames. The subsequent commit will actually make ripgrep build again. We do it this way with the naive hope that this will make it easier for git history to track the renames. Sigh.
2020-02-17style: rustfmt everythingAndrew Gallant
This is why I was so intent on clearing the PR queue. This will effectively invalidate all existing patches, so I wanted to start from a clean slate. We do make one little tweak: we put the default type definitions in their own file and tell rustfmt to keep its grubby mitts off of it. We also sort it lexicographically and hopefully will enforce that from here on.
2020-02-17ignore: rework inter-thread messagingzsugabubus
Change the meaning of `Quit` message. Now it means terminate. The final "dance" is unnecessary, because by the time quitting begins, no thread will ever spawn a new `Work`. The trick was to replace the heuristic spin-loop with blocking receive. Closes #1337
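As a rough illustration of the idea (not the crate's actual code), the sketch below shows a worker that blocks on `recv` and treats `Quit` as "terminate now". The `Message` enum, `run_worker`, and `demo` are hypothetical names, and `std::sync::mpsc` is used only to keep the example dependency-free.

```rust
use std::sync::mpsc::{channel, Receiver};
use std::thread;

/// Hypothetical message type; the crate's real type is not shown in this log.
enum Message {
    Work(u64),
    Quit,
}

/// Process messages with a blocking `recv` until `Quit` arrives,
/// returning the sum of all work items seen.
fn run_worker(rx: Receiver<Message>) -> u64 {
    let mut total = 0;
    // `recv` parks the thread until a message is available, so no spin loop is needed.
    while let Ok(msg) = rx.recv() {
        match msg {
            Message::Work(n) => total += n,
            // `Quit` means terminate: stop immediately, no final handshake.
            Message::Quit => break,
        }
    }
    total
}

/// Send three work items followed by `Quit` and return what the worker processed.
fn demo() -> u64 {
    let (tx, rx) = channel();
    let handle = thread::spawn(move || run_worker(rx));
    for n in 1..=3 {
        tx.send(Message::Work(n)).unwrap();
    }
    tx.send(Message::Quit).unwrap();
    // Anything sent after `Quit` is never processed; the send may fail if the
    // worker has already dropped its receiver, so the result is ignored.
    let _ = tx.send(Message::Work(100));
    handle.join().unwrap()
}
```

Calling `demo()` sums only the items sent before `Quit`.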
2020-02-17ignore: treat symbolic links to directories as directoriesAndrew Gallant
Due to how walkdir works if symlinks are not followed, symlinks to directories are seen as simple files by ripgrep. This caused a panic in some cases due to receiving a WalkEvent::Exit event without a corresponding WalkEvent::Dir event. This is fixed by looking at the metadata of the file in the case of a symlink to determine if it's a directory. We are careful to only do this stat check when the depth of the entry is 0, as this bug only impacts us when 1) we aren't following symlinks generally and 2) the user provides a symlinked directory that we do follow as a top-level path to search. Fixes #1389, Closes #1397
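A minimal sketch of the check described above, assuming a hypothetical helper `treat_as_dir` whose `depth`, `is_symlink`, and `is_dir` arguments stand in for what a directory walker would report about an entry. It is not the crate's implementation.

```rust
use std::fs;
use std::path::Path;

/// Decide whether a walked entry should be handled as a directory.
/// Hypothetical helper: the arguments stand in for what a walker reports.
fn treat_as_dir(path: &Path, depth: usize, is_symlink: bool, is_dir: bool) -> bool {
    if is_dir {
        return true;
    }
    // Only pay for the extra stat on top-level paths given by the user.
    if depth == 0 && is_symlink {
        // `fs::metadata` follows symlinks, so this inspects the link target.
        return fs::metadata(path).map(|md| md.is_dir()).unwrap_or(false);
    }
    false
}
```

Restricting the stat to depth 0 keeps the cost at one extra system call per top-level symlink rather than one per symlink in the tree.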
2020-02-17cli: add --no-require-git flagAndrew Gallant
This flag prevents ripgrep from requiring one to search a git repository in order to respect git-related ignore rules (global, .gitignore and local excludes). This actually corresponds to behavior ripgrep had long ago, but #934 changed that. It turns out that users were relying on this buggy behavior. In most cases, fixing it is as simple as converting one's rules to .ignore or .rgignore files. Unfortunately, there are other use cases---like Perforce automatically respecting .gitignore files---that make a strong case for ripgrep to at least support this. The UX of a flag like this is absolutely atrocious. It's so obscure that it's really not worth explicitly calling it out anywhere. Moreover, the error cases that occur when this flag isn't used (but its behavior is desirable) will not be intuitive, do not seem easily detectable and will not guide users to this flag. Nevertheless, the motivation for this is just barely strong enough for me to begrudgingly accept this. Fixes #1414, Closes #1416
2020-02-17ignore: make walker visit untraversable directoriesJakub Wieczorek
This commit fixes an inconsistency between the serial and the parallel directory walkers around visiting a directory for which the user holds insufficient permissions to descend into. The serial walker does produce a successful entry for a directory that it cannot descend into due to insufficient permissions. However, before this change that has not been the case for the parallel walker, which would produce an `Err` item not only when descending into a directory that it cannot read from but also for the directory entry itself. This change brings the behaviour of the parallel variant in line with that of the serial one. Fixes #1346, Closes #1365
2020-02-17ignore: allow post-processing at end-of-threadEd Page
On top of the parallel-walk's closures, this provides a Visitor API. This clarifies the role of the two different closures in the `run` API and allows implementing `Drop` for post-processing once traversal is finished. The closure API is maintained not just for compatibility but also for convenience in simple cases. Fixes #469, Closes #1430
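To illustrate why a per-thread visitor with `Drop` is handy, here is a standalone sketch. The `Visitor` trait, `Collector`, and `walk_in_threads` are hypothetical stand-ins written for this example, not the `ignore` crate's API. Each thread buffers results locally and merges them once, when its visitor is dropped.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical per-thread visitor trait, loosely modeled on the description above.
trait Visitor: Send {
    fn visit(&mut self, entry: &str);
}

/// Buffers entries locally and publishes them when the visitor is dropped.
struct Collector {
    local: Vec<String>,
    shared: Arc<Mutex<Vec<String>>>,
}

impl Visitor for Collector {
    fn visit(&mut self, entry: &str) {
        self.local.push(entry.to_string());
    }
}

impl Drop for Collector {
    fn drop(&mut self) {
        // End-of-thread post-processing: take the lock once per thread.
        self.shared.lock().unwrap().append(&mut self.local);
    }
}

/// Simulate a parallel walk: each chunk of entries is visited on its own thread.
fn walk_in_threads(chunks: Vec<Vec<&'static str>>) -> Vec<String> {
    let shared = Arc::new(Mutex::new(Vec::new()));
    let handles: Vec<_> = chunks
        .into_iter()
        .map(|chunk| {
            let mut visitor = Collector {
                local: Vec::new(),
                shared: Arc::clone(&shared),
            };
            thread::spawn(move || {
                for entry in chunk {
                    visitor.visit(entry);
                }
                // `visitor` is dropped here, which merges its buffered entries.
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let mut all = shared.lock().unwrap().clone();
    all.sort();
    all
}
```

With plain closures there is no natural hook for the final merge, which is the gap the commit message describes.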
2020-02-17ignore: allow parallel walker to borrow dataEd Page
This makes it so the caller can more easily refactor from single-threaded to multi-threaded walking. If they want to support both, this makes it easier to do so with a single initialization code-path. In particular, it side-steps the need to put everything into an `Arc`. This is not a breaking change because it strictly increases the number of allowed inputs to `WalkParallel::run`. Closes #1410, Closes #1432
2020-02-17ignore: use git commondir for sourcing .git/info/excludeJohannes Altmanninger
Git looks for this file in GIT_COMMON_DIR, which is usually the same as GIT_DIR (.git). However, when searching inside a linked worktree, .git is usually a file that contains the path of the actual git dir, which in turn contains a file "commondir" which references the directory where info/exclude may reside, alongside other configuration shared across all worktrees. This directory is usually the git dir of the main worktree. Unlike git this does *not* read environment variables GIT_DIR and GIT_COMMON_DIR, because it is not clear how to interpret them when searching multiple repositories. Fixes #1445, Closes #1446
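The lookup chain above can be sketched with a few pure helpers. The function names are hypothetical and the code only handles the string and path manipulation, assuming the usual `gitdir: <path>` format of a worktree's `.git` file and a `commondir` file whose relative contents are resolved against the git dir. It performs no file I/O and is not the crate's implementation.

```rust
use std::path::{Path, PathBuf};

/// Parse the contents of a `.git` *file* in a linked worktree,
/// e.g. "gitdir: /repo/.git/worktrees/wt\n". Returns None for anything else.
fn parse_gitdir_file(contents: &str) -> Option<PathBuf> {
    let rest = contents.trim_end().strip_prefix("gitdir:")?;
    Some(PathBuf::from(rest.trim_start()))
}

/// Resolve the common dir from the real git dir and the contents of its
/// `commondir` file. `Path::join` keeps an absolute path as-is and resolves
/// a relative one against `git_dir` (without normalizing `..`).
fn resolve_commondir(git_dir: &Path, commondir_contents: &str) -> PathBuf {
    git_dir.join(commondir_contents.trim_end())
}

/// Location of the exclude file inside the common dir.
fn exclude_path(common_dir: &Path) -> PathBuf {
    common_dir.join("info").join("exclude")
}
```

When no `commondir` file exists, the git dir itself would play the role of the common dir, so `exclude_path` can be applied to it directly.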
2020-02-17ignore/types: add spec file typeMatěj Cepl
This is for RPM package SPEC files. Fixes #946, Closes #1449
2020-02-17ignore/types: add xhtml to xml file typeluh2
Closes #1426
2020-02-17ignore/types: add 'diff' file typeSven-Hendrik Haase
This includes .patch and .diff files. Fixes #1418, Closes #1419
2020-02-17ignore: add existence check for ignore filessharkdp
This commit adds a simple `.exists()` check for `.gitignore`, `.ignore`, and other similar files before actually calling `File::open(…)` in `GitIgnoreBuilder::add`. The reason is that a simple existence check via `stat` can be faster than actually trying to `open` the file, see https://stackoverflow.com/a/12774387/704831.

As we typically expect(?) the number of directories *without* ignore files to be much larger than the number of directories *with* ignore files, this leads to an overall speedup. The performance gain is not huge for `rg`, but can be quite significant if more `.gitignore`-like files are added via `add_custom_ignore_filename`. The speedup is *larger* for folders with *low* files-per-directory ratios.

Note though that we do not do this check on Windows until a specific analysis there suggests this is beneficial. Namely, Windows generally has slower file system operations, so it's not clear whether this speculative check is actually a benefit or not.

Benchmark results
-----------------

`rg --files` in my home folder (200k results, 6.5 files per directory):

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./rg-master --files` | 396.4 ± 3.2 | 390.9 | 400.0 | 1.05 |
| `./rg-feature --files` | 376.0 ± 3.6 | 369.3 | 383.5 | 1.00 |

`rg --files --hidden` in my home folder (800k results, 5.4 files per directory):

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `./rg-master --files --hidden` | 1.575 ± 0.012 | 1.560 | 1.597 | 1.06 |
| `./rg-feature --files --hidden` | 1.479 ± 0.011 | 1.464 | 1.496 | 1.00 |

`rg --files` in the chromium-79.0.3915.2 source tree (300k results, 12.7 files per directory):

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `~/rg-master --files` | 445.2 ± 5.3 | 435.6 | 453.0 | 1.04 |
| `~/rg-feature --files` | 428.9 ± 7.0 | 418.2 | 440.0 | 1.00 |

`rg --files` in the linux-5.3 source tree (65k results, 15.1 files per directory):

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./rg-master --files` | 94.5 ± 1.9 | 89.8 | 98.5 | 1.02 |
| `./rg-feature --files` | 92.6 ± 2.7 | 88.4 | 98.7 | 1.00 |

Closes #1381
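The existence check described in this commit might look roughly like the sketch below. `open_ignore_file` is a hypothetical helper, and mapping a late `NotFound` error to `None` is an assumption of this example (to cover a file vanishing between the check and the open), not something the commit message states.

```rust
use std::fs::File;
use std::io;
use std::path::Path;

/// Open an ignore file if it exists, returning `Ok(None)` when it does not.
/// Hypothetical helper illustrating the stat-before-open idea.
fn open_ignore_file(path: &Path) -> io::Result<Option<File>> {
    // Outside Windows, a cheap existence check lets us skip the open entirely
    // for the (assumed common) case of directories without ignore files.
    if !cfg!(windows) && !path.exists() {
        return Ok(None);
    }
    match File::open(path) {
        Ok(file) => Ok(Some(file)),
        // Assumption of this sketch: treat a missing file as "no ignore file".
        Err(err) if err.kind() == io::ErrorKind::NotFound => Ok(None),
        Err(err) => Err(err),
    }
}
```

On Windows the `cfg!(windows)` branch skips the speculative check, matching the caveat in the commit message.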
2020-02-15ignore/types: add HAML and ERBJonathan Mast
These are commonly used templating languages for Ruby, so add their extensions to the filetypes list for convenient filtering. PR #1407
2020-02-15ignore/types: add slim, slime, and skim templatesJeff S
PR #1391
2020-02-10ignore: allow use of Error::descriptionAndrew Gallant
We can remove it in the next semver incompatible release.
2020-02-07ignore/types: add typoscript file typeLuca Kredel
Add the file types for TypoScript - the configuration language of the TYPO3 CMS. PR #1477
2020-01-29ignore/types: add *.org_archive to org file typeRobert Irelan
.org_archive is the default extension for Org archive files, created when entries from an Org-mode file are archived (see <https://orgmode.org/org.html#Moving-subtrees>). These files are still in Org mode format, so it's worth searching them at the same time as non-archive Org mode files. PR #1475
2020-01-23ignore/types: make 'gradle' its own typeTristan Waddington
This change maintains the existing behavior of the 'groovy' type, which includes both .groovy and .gradle files. PR #1470
2020-01-20ignore/types: fix postscript globsJan Verbeek
The postscript globs were missing asterisks, so they were treated as literal filenames. PR #1461
2020-01-10deps: update to thread_local 1.0Andrew Gallant
We also update the pcre2 and regex dependencies, which removes any other lingering uses of thread_local 0.3.
2019-08-28ignore: remove unused parameterAndrew Gallant
2019-08-09doc: use XDG_CONFIG_HOME in commentsTodd Walton
XDG_CONFIG_DIR does not actually exist. PR #1347
2019-08-06deps: drop tempfileAndrew Gallant
We were only using it to create temporary directories for `ignore` tests, but it pulls in a bunch of dependencies and we don't really need randomness. So just use our own simple wrapper instead.
2019-07-29ignore/types: add edn type from Clojure ecosystemMatthew Davidson
PR #1330
2019-07-24ignore: support compilation on wasmTiziano Santoro
Currently the crate assumes that exactly one of `cfg(windows)` or `cfg(unix)` is true, but this is not actually the case, for instance when compiling for `wasm32`. Implement the missing functions so that the crate can compile on other platforms, even though those functions will always return an error. PR #1327
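The pattern described here can be sketched with three `cfg` branches. The function `raw_attrs` is a hypothetical example chosen because both platforms expose a stable `u32` accessor; it is not one of the functions the commit actually touched.

```rust
use std::io;
use std::path::Path;

/// Unix: return the raw mode bits of the file.
#[cfg(unix)]
fn raw_attrs(path: &Path) -> io::Result<u32> {
    use std::os::unix::fs::MetadataExt;
    path.metadata().map(|md| md.mode())
}

/// Windows: return the raw file attribute flags.
#[cfg(windows)]
fn raw_attrs(path: &Path) -> io::Result<u32> {
    use std::os::windows::fs::MetadataExt;
    path.metadata().map(|md| md.file_attributes())
}

/// Any other target (e.g. wasm32): compile, but always report an error.
#[cfg(not(any(unix, windows)))]
fn raw_attrs(_path: &Path) -> io::Result<u32> {
    Err(io::Error::new(
        io::ErrorKind::Other,
        "raw_attrs is not supported on this platform",
    ))
}
```

Without the third branch, a target where neither `cfg(unix)` nor `cfg(windows)` holds would fail to compile because `raw_attrs` would not exist.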
2019-07-14ignore/types: add Robot FrameworkConrad Olega
PR #1322
2019-06-16style: fix deprecationsAndrew Gallant
Use `dyn` for trait objects and use `..=` for inclusive ranges.
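For readers unfamiliar with the two syntax changes, the following small example shows both. The functions are made up for illustration and are unrelated to the crate's code.

```rust
use std::fmt::Display;

/// Trait objects are written `dyn Trait` (previously the bare `Trait`).
fn describe(items: &[Box<dyn Display>]) -> Vec<String> {
    items.iter().map(|item| item.to_string()).collect()
}

/// Inclusive ranges in patterns are written `a..=b` (previously `a...b`).
fn is_ascii_lowercase_byte(byte: u8) -> bool {
    match byte {
        b'a'..=b'z' => true,
        _ => false,
    }
}
```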
2019-06-12ignore/types: add more nim typesHitesh Jasani
PR #1297
2019-05-29ignore: remove .git check in some casesAndrew Gallant
When we know we aren't going to process gitignores, we shouldn't waste the syscall in every directory to check for a git repo.
2019-04-16ignore/types: add GAPMax Horn
Add support for file types used by the GAP language, a research system for computational discrete algebra, see <https://www.gap-system.org> PR #1249
2019-04-14ignore/types: add additional java filesMarco Herrn
- .jspx for XHTML JSP files
- .properties for Java Properties files (resource bundles, etc.)

Closes #1242
2019-04-09ignore/types: add more extensions for xmlhupfdule
This includes:

*.dtd for Document Type Definitions
*.xsl and *.xslt for XSL Transformation descriptions
*.xsd for XML Schema definitions
*.xjb for JAXB bindings
*.rng for Relax NG files
*.sch for Schematron files

PR #1243
2019-04-09ignore/types: add lock filestonypai
Treat anything with a `.lock` extension as a lock file, with an extra rule or two for special cases, e.g., package-lock.json.
2019-04-06ignore/types: add *.am and *.in for C/C++/makedana
PR #1205
2019-02-08ignore/types: add zigJohn Schmidt
PR #1191
2019-01-31ignore/types: *.dtx and *.ins added for texSteffen Banhardt
PR #1182
2019-01-27ignore: correctly detect hidden files on WindowsAndrew Gallant
This commit fixes a bug where ripgrep only treated files beginning with a `.` as hidden. On Windows, we continue this tradition, but additionally check whether a file has the special Windows "hidden" attribute set. If so, we treat it as a hidden file. In order to make this work without an additional stat call, we had to rearrange some of the plumbing from the directory traverser. Fixes #1154
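The combined rule can be sketched as a pure function. `is_hidden` and its `windows_attrs` parameter are hypothetical: the caller is assumed to pass the Windows file attribute flags when they are already available from the directory traversal and `None` elsewhere. The constant is the Win32 `FILE_ATTRIBUTE_HIDDEN` flag value.

```rust
/// Win32 `FILE_ATTRIBUTE_HIDDEN` flag.
const FILE_ATTRIBUTE_HIDDEN: u32 = 0x2;

/// Hypothetical helper: a file is hidden if its name starts with a dot or,
/// when Windows attributes are supplied, if the hidden bit is set.
fn is_hidden(file_name: &str, windows_attrs: Option<u32>) -> bool {
    if file_name.starts_with('.') {
        return true;
    }
    match windows_attrs {
        Some(attrs) => (attrs & FILE_ATTRIBUTE_HIDDEN) != 0,
        None => false,
    }
}
```

Passing the attributes in, rather than looking them up inside the function, mirrors the goal of avoiding an additional stat call.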
2019-01-23ignore/types: add method for retrieving file type definitionAwad Mackie
Fixes #1116, Closes #1120
2019-01-23globset: permit ** to appear anywhereAndrew Gallant
Previously, `man gitignore` specified that `**` was invalid unless it was used in one of a few specific circumstances, i.e., `**`, `a/**`, `**/b` or `a/**/b`. That is, `**` always had to be surrounded by either a path separator or the beginning/end of the pattern. It turns out that git itself has treated `**` outside the above contexts as valid for quite a while, so there was an inconsistency between the spec `man gitignore` and the implementation, and it wasn't clear which was actually correct. @okdana filed a bug against git[1] and got this fixed. The spec was wrong, which has now been fixed[2] and updated[3]. This commit brings ripgrep in line with git and treats `**` outside of the above contexts as two consecutive `*` patterns. We deprecate the `InvalidRecursive` error since it is no longer used. Fixes #373, Fixes #1098

[1] - https://public-inbox.org/git/C16A9F17-0375-42F9-90A9-A92C9F3D8BBA@dana.is
[2] - https://github.com/git/git/commit/627186d0206dcb219c43f8e6670b4487802a4921
[3] - https://git-scm.com/docs/gitignore
2019-01-23globset: fix repeated use of **Andrew Gallant
This fixes a bug where repeated use of ** didn't behave as it should. In particular, each use of `**` added a new directory depth requirement. For example, something like `**/**/b` would match `foo/bar/b`, but it wouldn't match `foo/b` even though it should. In particular, `**` semantics demand "infinite" depth, so repeated uses of `**` should just coalesce as if only one was given. We do this coalescing in the parser. It's a little tricky because we treat `**/a`, `a/**` and `a/**/b` as distinct tokens with their own regex conversions. We also test the crap out of it. Fixes #1174
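The coalescing rule can be approximated at the string level, as in the sketch below. This is only an illustration: per the message above, the real fix works on parser tokens, and `coalesce_recursive` is a hypothetical name.

```rust
/// Collapse runs of consecutive `**` path components into a single `**`.
/// String-level approximation of the idea; the real fix operates on tokens.
fn coalesce_recursive(pattern: &str) -> String {
    let mut components: Vec<&str> = Vec::new();
    for component in pattern.split('/') {
        // Skip a `**` that directly follows another `**`.
        if component == "**" && components.last() == Some(&"**") {
            continue;
        }
        components.push(component);
    }
    components.join("/")
}
```

After coalescing, `**/**/b` becomes `**/b`, which no longer demands an extra directory level and can therefore match `foo/b`.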
2019-01-23ignore: fix handling of **Andrew Gallant
When deciding whether to add the `**/` prefix or not, we should choose not to add it if the pattern is simply a bare `**`. Previously, we were only not adding it if it was `**/`, which is correct, but we also need to do it for `**` since `**` can already match anywhere. There's likely a more principled solution to this, but this works for now. Fixes #1173
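A minimal sketch of the decision described above, using a hypothetical helper `with_recursive_prefix`. It covers only the "should the prefix be added" step; whether a given pattern is eligible for the prefix at all is outside what this commit message describes and is not modeled here.

```rust
/// Add a `**/` prefix unless the pattern already matches at any depth,
/// i.e. it is a bare `**` or already starts with `**/`.
fn with_recursive_prefix(pattern: &str) -> String {
    if pattern == "**" || pattern.starts_with("**/") {
        pattern.to_string()
    } else {
        format!("**/{}", pattern)
    }
}
```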
2019-01-23ignore: always use literal_separator for gitignore patterns (#1093)dana
PR #1093
2019-01-22ignore/types: add/update brotli, bzip2, gzip, xz, zstddana
2019-01-22ripgrep: add --ignore-file-case-insensitiveDavid Torosyan
The --ignore-file-case-insensitive flag causes all .gitignore/.rgignore/.ignore files to have their globs matched without regard for case. Because this introduces a potentially significant performance regression, this is always disabled by default. Users that need case insensitive matching can enable it on a case by case basis. Closes #1164, Closes #1170
2019-01-18ignore/types: add QMLP M
PR #1165
2018-12-30ignore: permit use of deprecated trim_rightAndrew Gallant
2018-12-15deps: update to crossbeam-channel 0.3Andrew Gallant
This also requires corresponding updates to both rand and rand_core. Doing an update of rand without doing an update of rand_core results in compilation errors because two distinct versions of rand_core are included in the build, and the traits they expose are distinct and incompatible. We also switch over to using tempfile instead of tempdir, which drops the last remaining thing keeping rand 0.4 in the build. Fixes #1141, Fixes #1142
2018-12-07ignore/types: add ASPSimon Morgan
PR #1134
2018-11-23ignore/types: add postscriptAntony Lee
Although postscript/encapsulated postscript is usually thought of as a binary format, it's actually mostly ASCII, so ripgrep will not ignore these files. The situation is basically the same as for pdf, which is also already present in the list of known filetypes. PR #1118