diff options
author | Justus Winter <justus@sequoia-pgp.org> | 2018-11-26 22:40:35 +0100 |
---|---|---|
committer | Justus Winter <justus@sequoia-pgp.org> | 2018-11-26 23:18:45 +0100 |
commit | 421f4d97d9f3e548a5e795599489ef1428f88f13 (patch) | |
tree | 9bbfb87f2140f91e97e979db5ef4968d52583dc6 /buffered-reader | |
parent | 02f4973085e15c944c85231fa49605ac33c8f4b7 (diff) |
buffered-reader: Add a README.
Diffstat (limited to 'buffered-reader')
-rw-r--r-- | buffered-reader/Cargo.toml | 2 | ||||
-rw-r--r-- | buffered-reader/README.md | 90 |
2 files changed, 91 insertions, 1 deletions
diff --git a/buffered-reader/Cargo.toml b/buffered-reader/Cargo.toml index ba0251f4..53e1fc95 100644 --- a/buffered-reader/Cargo.toml +++ b/buffered-reader/Cargo.toml @@ -10,7 +10,7 @@ authors = [ documentation = "https://docs.sequoia-pgp.org/buffered_reader" homepage = "https://sequoia-pgp.org/" repository = "https://gitlab.com/sequoia-pgp/sequoia" -readme = "../README.md" +readme = "README.md" license = "GPL-3.0" [badges] diff --git a/buffered-reader/README.md b/buffered-reader/README.md new file mode 100644 index 00000000..d927ad21 --- /dev/null +++ b/buffered-reader/README.md @@ -0,0 +1,90 @@ +A `BufferedReader` is a super-powered `Read`er. + +Like the [`BufRead`] trait, the `BufferedReader` trait has an +internal buffer that is directly exposed to the user. This design +enables two performance optimizations. First, the use of an +internal buffer amortizes system calls. Second, exposing the +internal buffer allows the user to work with data in place, which +avoids another copy. + +The [`BufRead`] trait, however, has a significant limitation for +parsers: the user of a [`BufRead`] object can't control the amount +of buffering. This is essential for being able to conveniently +work with data in place, and being able to lookahead without +consuming data. The result is that either the sizing has to be +handled by the instantiator of the [`BufRead`] object---assuming +the [`BufRead`] object provides such a mechanism---which is a +layering violation, or the parser has to fallback to buffering if +the internal buffer is too small, which eliminates most of the +advantages of the [`BufRead`] abstraction. The `BufferedReader` +trait addresses this shortcoming by allowing the user to control +the size of the internal buffer. + +The `BufferedReader` trait also has some functionality, +specifically, a generic interface to work with a stack of +`BufferedReader` objects, that simplifies using multiple parsers +simultaneously. This is helpful when one parser deals with +framing (e.g., something like [HTTP's chunk transfer encoding]), +and another decodes the actual objects. It is also useful when +objects are nested. + +# Details + +Because the [`BufRead`] trait doesn't provide a mechanism for the +user to size the interal buffer, a parser can't generally be sure +that the internal buffer will be large enough to allow it to work +with all data in place. + +Using the standard [`BufRead`] implementation, [`BufReader`], the +instantiator can set the size of the internal buffer at creation +time. Unfortunately, this mechanism is ugly, and not always +adequate. First, the parser is typically not the instantiator. +Thus, the instantiator needs to know about the implementation +details of all of the parsers, which turns an implementation +detail into a cross-cutting concern. Second, when working with +dynamically sized data, the maximum amount of the data that needs +to be worked with in place may not be known apriori, or the +maximum amount may be significantly larger than the typical +amount. This leads to poorly sized buffers. + +Alternatively, the code that uses, but does not instantiate a +[`BufRead`] object, can be changed to stream the data, or to +fallback to reading the data into a local buffer if the internal +buffer is too small. Both of these approaches increase code +complexity, and the latter approach is contrary to the +[`BufRead`]'s goal of reducing unnecessary copying. + +The `BufferedReader` trait solves this problem by allowing the +user to dynamically (i.e., at read time, not open time) ensure +that the internal buffer has a certain amount of data. + +The ability to control the size of the internal buffer is also +essential to straightforward support for speculative lookahead. +The reason that speculative lookahead with a [`BufRead`] object is +difficult is that speculative lookahead is /speculative/, i.e., if +the parser backtracks, the data that was read must not be +consumed. Using a [`BufRead`] object, this is not possible if the +amount of lookahead is larger than the internal buffer. That is, +if the amount of lookahead data is larger than the [`BufRead`]'s +internal buffer, the parser first has to `BufRead::consume`() some +data to be able to examine more data. But, if the parser then +decides to backtrack, it has no way to return the unused data to +the [`BufRead`] object. This forces the parser to manage a buffer +of read, but unconsumed data, which significantly complicates the +code. + +The `BufferedReader` trait also simplifies working with a stack of +`BufferedReader`s in two ways. First, the `BufferedReader` trait +provides *generic* methods to access the underlying +`BufferedReader`. Thus, even when dealing with a trait object, it +is still possible to recover the underlying `BufferedReader`. +Second, the `BufferedReader` provides a mechanism to associate +generic state with each `BufferedReader` via a cookie. Although +it is possible to realize this functionality using a custom trait +that extends the `BufferedReader` trait and wraps existing +`BufferedReader` implementations, this approach eliminates a lot +of error-prone, boilerplate code. + +[`BufRead`]: https://doc.rust-lang.org/stable/std/io/trait.BufRead.html +[`BufReader`]: https://doc.rust-lang.org/stable/std/io/struct.BufReader.html +[HTTP's chunk transfer encoding]: https://en.wikipedia.org/wiki/Chunked_transfer_encoding |