Diffstat (limited to 'crates/cli/src/lib.rs')
-rw-r--r-- | crates/cli/src/lib.rs | 250 |
1 files changed, 250 insertions, 0 deletions
diff --git a/crates/cli/src/lib.rs b/crates/cli/src/lib.rs
new file mode 100644
index 00000000..452ea141
--- /dev/null
+++ b/crates/cli/src/lib.rs
@@ -0,0 +1,250 @@
+/*!
+This crate provides common routines used in command line applications, with a
+focus on routines useful for search oriented applications. As a utility
+library, there is no central type or function. However, a key focus of this
+crate is to improve failure modes and provide user friendly error messages
+when things go wrong.
+
+To the best extent possible, everything in this crate works on Windows, macOS
+and Linux.
+
+
+# Standard I/O
+
+The
+[`is_readable_stdin`](fn.is_readable_stdin.html),
+[`is_tty_stderr`](fn.is_tty_stderr.html),
+[`is_tty_stdin`](fn.is_tty_stdin.html)
+and
+[`is_tty_stdout`](fn.is_tty_stdout.html)
+routines query aspects of standard I/O. `is_readable_stdin` determines whether
+stdin can be usefully read from, while the `tty` methods determine whether a
+tty is attached to stdin/stdout/stderr.
+
+`is_readable_stdin` is useful when writing an application that changes behavior
+based on whether the application was invoked with data on stdin. For example,
+`rg foo` might recursively search the current working directory for
+occurrences of `foo`, but `rg foo < file` might only search the contents of
+`file`.
+
+The `tty` methods are useful for similar reasons. Namely, commands like `ls`
+will change their output depending on whether they are printing to a terminal
+or not. For example, `ls` shows a file on each line when stdout is redirected
+to a file or a pipe, but condenses the output to show possibly many files on
+each line when stdout is connected to a tty.
+
+
+# Coloring and buffering
+
+The
+[`stdout`](fn.stdout.html),
+[`stdout_buffered_block`](fn.stdout_buffered_block.html)
+and
+[`stdout_buffered_line`](fn.stdout_buffered_line.html)
+routines are alternative constructors for
+[`StandardStream`](struct.StandardStream.html).
+A `StandardStream` implements `termcolor::WriteColor`, which provides a way
+to emit colors to terminals. Its key use is the encapsulation of buffering
+style. Namely, `stdout` will return a line buffered `StandardStream` if and
+only if stdout is connected to a tty, and will otherwise return a block
+buffered `StandardStream`. Line buffering is important for use with a tty
+because it typically decreases the latency at which the end user sees output.
+Block buffering is used otherwise because it is faster, and redirecting stdout
+to a file typically doesn't benefit from the decreased latency that line
+buffering provides.
+
+The `stdout_buffered_block` and `stdout_buffered_line` can be used to
+explicitly set the buffering strategy regardless of whether stdout is connected
+to a tty or not.
+
+
+# Escaping
+
+The
+[`escape`](fn.escape.html),
+[`escape_os`](fn.escape_os.html),
+[`unescape`](fn.unescape.html)
+and
+[`unescape_os`](fn.unescape_os.html)
+routines provide a user friendly way of dealing with UTF-8 encoded strings that
+can express arbitrary bytes. For example, you might want to accept a string
+containing arbitrary bytes as a command line argument, but most interactive
+shells make such strings difficult to type. Instead, we can ask users to use
+escape sequences.
+
+For example, `a\xFFz` is itself a valid UTF-8 string corresponding to the
+following bytes:
+
+```ignore
+[b'a', b'\\', b'x', b'F', b'F', b'z']
+```
+
+However, we can
+interpret `\xFF` as an escape sequence with the `unescape`/`unescape_os`
+routines, which will yield
+
+```ignore
+[b'a', b'\xFF', b'z']
+```
+
+instead. For example:
+
+```
+use grep_cli::unescape;
+
+// Note the use of a raw string!
+assert_eq!(vec![b'a', b'\xFF', b'z'], unescape(r"a\xFFz"));
+```
+
+The `escape`/`escape_os` routines provide the reverse transformation, which
+makes it easy to show user friendly error messages involving arbitrary bytes.
+
+
+# Building patterns
+
+Typically, regular expression patterns must be valid UTF-8. However, command
+line arguments aren't guaranteed to be valid UTF-8. Unfortunately, the
+standard library's UTF-8 conversion functions from `OsStr`s do not provide
+good error messages. However, the
+[`pattern_from_bytes`](fn.pattern_from_bytes.html)
+and
+[`pattern_from_os`](fn.pattern_from_os.html)
+do, including reporting exactly where the first invalid UTF-8 byte is seen.
+
+Additionally, it can be useful to read patterns from a file while reporting
+good error messages that include line numbers. The
+[`patterns_from_path`](fn.patterns_from_path.html),
+[`patterns_from_reader`](fn.patterns_from_reader.html)
+and
+[`patterns_from_stdin`](fn.patterns_from_stdin.html)
+routines do just that. If any pattern is found that is invalid UTF-8, then the
+error includes the file path (if available) along with the line number and the
+byte offset at which the first invalid UTF-8 byte was observed.
+
+
+# Read process output
+
+Sometimes a command line application needs to execute other processes and read
+its stdout in a streaming fashion. The
+[`CommandReader`](struct.CommandReader.html)
+provides this functionality with an explicit goal of improving failure modes.
+In particular, if the process exits with an error code, then stderr is read
+and converted into a normal Rust error to show to end users. This makes the
+underlying failure modes explicit and gives more information to end users for
+debugging the problem.
+
+As a special case,
+[`DecompressionReader`](struct.DecompressionReader.html)
+provides a way to decompress arbitrary files by matching their file extensions
+up with corresponding decompression programs (such as `gzip` and `xz`). This
+is useful as a means of performing simplistic decompression in a portable
+manner without binding to specific compression libraries. This does come with
+some overhead though, so if you need to decompress lots of small files, this
+may not be an appropriate convenience to use.
+
+Each reader has a corresponding builder for additional configuration, such as
+whether to read stderr asynchronously in order to avoid deadlock (which is
+enabled by default).
+
+
+# Miscellaneous parsing
+
+The
+[`parse_human_readable_size`](fn.parse_human_readable_size.html)
+routine parses strings like `2M` and converts them to the corresponding number
+of bytes (`2 * 1<<20` in this case). If an invalid size is found, then a good
+error message is crafted that typically tells the user how to fix the problem.
+*/
+
+#![deny(missing_docs)]
+
+extern crate atty;
+extern crate bstr;
+extern crate globset;
+#[macro_use]
+extern crate lazy_static;
+#[macro_use]
+extern crate log;
+extern crate regex;
+extern crate same_file;
+extern crate termcolor;
+#[cfg(windows)]
+extern crate winapi_util;
+
+mod decompress;
+mod escape;
+mod human;
+mod pattern;
+mod process;
+mod wtr;
+
+pub use decompress::{
+    DecompressionMatcher, DecompressionMatcherBuilder, DecompressionReader,
+    DecompressionReaderBuilder,
+};
+pub use escape::{escape, escape_os, unescape, unescape_os};
+pub use human::{parse_human_readable_size, ParseSizeError};
+pub use pattern::{
+    pattern_from_bytes, pattern_from_os, patterns_from_path,
+    patterns_from_reader, patterns_from_stdin, InvalidPatternError,
+};
+pub use process::{CommandError, CommandReader, CommandReaderBuilder};
+pub use wtr::{
+    stdout, stdout_buffered_block, stdout_buffered_line, StandardStream,
+};
+
+/// Returns true if and only if stdin is believed to be readable.
+///
+/// When stdin is readable, command line programs may choose to behave
+/// differently than when stdin is not readable. For example, `command foo`
+/// might search the current directory for occurrences of `foo` whereas
+/// `command foo < some-file` or `cat some-file | command foo` might instead
+/// only search stdin for occurrences of `foo`.
+pub fn is_readable_stdin() -> bool {
+    #[cfg(unix)]
+    fn imp() -> bool {
+        use same_file::Handle;
+        use std::os::unix::fs::FileTypeExt;
+
+        let ft = match Handle::stdin().and_then(|h| h.as_file().metadata()) {
+            Err(_) => return false,
+            Ok(md) => md.file_type(),
+        };
+        ft.is_file() || ft.is_fifo()
+    }
+
+    #[cfg(windows)]
+    fn imp() -> bool {
+        use winapi_util as winutil;
+
+        winutil::file::typ(winutil::HandleRef::stdin())
+            .map(|t| t.is_disk() || t.is_pipe())
+            .unwrap_or(false)
+    }
+
+    !is_tty_stdin() && imp()
+}
+
+/// Returns true if and only if stdin is believed to be connected to a tty
+/// or a console.
+pub fn is_tty_stdin() -> bool {
+    atty::is(atty::Stream::Stdin)
+}
+
+/// Returns true if and only if stdout is believed to be connected to a tty
+/// or a console.
+///
+/// This is useful for when you want your command line program to produce
+/// different output depending on whether it's printing directly to a user's
+/// terminal or whether it's being redirected somewhere else. For example,
+/// implementations of `ls` will often show one item per line when stdout is
+/// redirected, but will condense output when printing to a tty.
+pub fn is_tty_stdout() -> bool {
+    atty::is(atty::Stream::Stdout)
+}
+
+/// Returns true if and only if stderr is believed to be connected to a tty
+/// or a console.
+pub fn is_tty_stderr() -> bool {
+    atty::is(atty::Stream::Stderr)
+}
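
The "Escaping" section of the crate docs describes turning `\xFF` in a UTF-8 string into a raw byte. The sketch below illustrates that idea using only the standard library. It is not the crate's `escape` module, which this diff does not show. The names `unescape_sketch` and `hex_pair` are hypothetical, and the set of recognized escapes and the choice to copy invalid sequences through unchanged are assumptions made for illustration.

```rust
/// Hypothetical helper (not part of grep_cli): interprets `\xHH`, `\n`,
/// `\r`, `\t`, `\0` and `\\` escapes and returns the resulting raw bytes.
/// Assumption: any other backslash sequence is copied through unchanged.
fn unescape_sketch(s: &str) -> Vec<u8> {
    let src = s.as_bytes();
    let mut out = Vec::with_capacity(src.len());
    let mut i = 0;
    while i < src.len() {
        // Ordinary bytes, and a trailing lone backslash, are copied as-is.
        if src[i] != b'\\' || i + 1 == src.len() {
            out.push(src[i]);
            i += 1;
            continue;
        }
        match src[i + 1] {
            b'n' => { out.push(b'\n'); i += 2; }
            b'r' => { out.push(b'\r'); i += 2; }
            b't' => { out.push(b'\t'); i += 2; }
            b'0' => { out.push(0); i += 2; }
            b'\\' => { out.push(b'\\'); i += 2; }
            b'x' => match hex_pair(src, i + 2) {
                // A complete `\xHH` escape consumes four input bytes.
                Some(byte) => { out.push(byte); i += 4; }
                // Incomplete or invalid `\x`: keep the backslash literally.
                None => { out.push(b'\\'); i += 1; }
            },
            // Unknown escape: keep the backslash literally.
            _ => { out.push(b'\\'); i += 1; }
        }
    }
    out
}

/// Decodes the two hex digits at `src[at]` and `src[at + 1]`, if both exist.
fn hex_pair(src: &[u8], at: usize) -> Option<u8> {
    let hi = (*src.get(at)? as char).to_digit(16)?;
    let lo = (*src.get(at + 1)? as char).to_digit(16)?;
    Some((hi * 16 + lo) as u8)
}
```

With this sketch, `unescape_sketch(r"a\xFFz")` yields the three bytes `[b'a', 0xFF, b'z']`, matching the example in the doc comment. Returning `Vec<u8>` rather than `String` is deliberate, since the decoded bytes need not be valid UTF-8.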