author: Andrew Gallant <jamslam@gmail.com>, 2019-01-26 13:55:17 -0500
committer: Andrew Gallant <jamslam@gmail.com>, 2019-01-26 13:55:47 -0500
commit: 6d5dba85bd455c348655c3f91347989abb160cd4
tree: 4414b8e33321195c23aa00e1b79dc38d93ac827b
parent: afb89bcdadf85d9c721739d004552ed3c8a9afa0
doc: clarify automatic encoding detection
Fixes #1103
-rw-r--r-- CHANGELOG.md | 2
-rw-r--r-- GUIDE.md | 3
-rw-r--r-- src/app.rs | 9
3 files changed, 11 insertions, 3 deletions
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 44f4ae79..5c7e61f0 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -27,6 +27,8 @@ Bug fixes:
`**` is now accepted as valid syntax anywhere in a glob.
* [BUG #1095](https://github.com/BurntSushi/ripgrep/issues/1095):
Fix corner cases involving the `--crlf` flag.
+* [BUG #1103](https://github.com/BurntSushi/ripgrep/issues/1103):
+ Clarify what `--encoding auto` does.
* [BUG #1106](https://github.com/BurntSushi/ripgrep/issues/1106):
`--files-with-matches` and `--files-without-match` work with one file.
* [BUG #1093](https://github.com/BurntSushi/ripgrep/pull/1093):
diff --git a/GUIDE.md b/GUIDE.md
index 8523b6a5..39ccb52d 100644
--- a/GUIDE.md
+++ b/GUIDE.md
@@ -609,7 +609,8 @@ topic, but we can try to summarize its relevancy to ripgrep:
the most popular encodings likely consist of ASCII, latin1 or UTF-8. As
a special exception, UTF-16 is prevalent in Windows environments
-In light of the above, here is how ripgrep behaves:
+In light of the above, here is how ripgrep behaves when `--encoding auto` is
+given, which is the default:
* All input is assumed to be ASCII compatible (which means every byte that
corresponds to an ASCII codepoint actually is an ASCII codepoint). This
diff --git a/src/app.rs b/src/app.rs
index 59fd6f23..b4c81a7c 100644
--- a/src/app.rs
+++ b/src/app.rs
@@ -982,10 +982,15 @@ fn flag_encoding(args: &mut Vec<RGArg>) {
const LONG: &str = long!("\
Specify the text encoding that ripgrep will use on all files searched. The
default value is 'auto', which will cause ripgrep to do a best effort automatic
-detection of encoding on a per-file basis. Other supported values can be found
-in the list of labels here:
+detection of encoding on a per-file basis. Automatic detection in this case
+only applies to files that begin with a UTF-8 or UTF-16 byte-order mark (BOM).
+No other automatic detection is performed.
+
+Other supported values can be found in the list of labels here:
https://encoding.spec.whatwg.org/#concept-encoding-get
+For more details on encoding and how ripgrep deals with it, see GUIDE.md.
+
This flag can be disabled with --no-encoding.
");
let arg = RGArg::flag("encoding", "ENCODING").short("E")
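
The help text above says that `--encoding auto` only looks for a UTF-8 or UTF-16 byte-order mark at the start of a file. As a rough illustration of what BOM-only detection means, here is a minimal sketch. It is not ripgrep's actual implementation; the `BomEncoding` type and the `sniff_bom` function are hypothetical names introduced only for this example.

```rust
/// Encodings that can be identified from a leading byte-order mark.
/// (Hypothetical type for illustration; not part of ripgrep.)
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum BomEncoding {
    Utf8,
    Utf16Le,
    Utf16Be,
}

/// Returns the encoding indicated by a BOM at the start of `bytes`.
/// Returns `None` when no BOM is present, in which case no further
/// guessing is attempted in this sketch.
fn sniff_bom(bytes: &[u8]) -> Option<BomEncoding> {
    if bytes.starts_with(&[0xEF, 0xBB, 0xBF]) {
        // UTF-8 BOM: EF BB BF
        Some(BomEncoding::Utf8)
    } else if bytes.starts_with(&[0xFF, 0xFE]) {
        // UTF-16 little-endian BOM: FF FE
        Some(BomEncoding::Utf16Le)
    } else if bytes.starts_with(&[0xFE, 0xFF]) {
        // UTF-16 big-endian BOM: FE FF
        Some(BomEncoding::Utf16Be)
    } else {
        None
    }
}
```

Under this sketch, a latin1 file or a UTF-16 file without a BOM yields `None`, which corresponds to the case where the user would need to pass an explicit `--encoding` value.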