|
## Motivation
Currently, the per-task `tracing` spans generated by tokio's `tracing`
feature flag include the `std::any::type_name` of the future that was
spawned. When future combinators and/or libraries like Tower are in use,
these future names can get _quite_ long. Furthermore, when formatting
the `tracing` spans with their parent spans as context, any other task
spans in the span context where the future was spawned from can _also_
include extremely long future names.
In some cases, this can result in extremely high memory use just to
store the future names. For example, in Linkerd, when we enable
`tokio=trace` to enable the task spans, there's a spawned task whose
future name is _232990 characters long_. A proxy with only 14 spawned
tasks generates a task list that's over 690 KB. Enabling task spans
under load results in the process getting OOM killed very quickly.
## Solution
This branch removes future type names from the spans generated by
`spawn`. As a replacement, to allow identifying which `spawn` call a
span corresponds to, the task span now contains the source code location
where `spawn` was called, when the compiler supports the
`#[track_caller]` attribute. Since `track_caller` was stabilized in Rust
1.46.0, and our minimum supported Rust version is 1.45.0, we can't
assume that `#[track_caller]` is always available. Instead, we have a
RUSTFLAGS cfg, `tokio_track_caller`, that guards whether or not we use
it. I've also added a `build.rs` that detects the compiler minor
version, and sets the cfg flag automatically if the current compiler
version is >= 1.46. This means users shouldn't have to enable
`tokio_track_caller` manually.
Here's the trace output from the `chat` example, before this change:
![Screenshot_20201030_110157](https://user-images.githubusercontent.com/2796466/97741071-6d408800-1a9f-11eb-9ed6-b25e72f58c7b.png)
...and after:
![Screenshot_20201030_110303](https://user-images.githubusercontent.com/2796466/97741112-7e899480-1a9f-11eb-9197-c5a3f9ea1c05.png)
Closes #3073
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
|
|
## Motivation
When debugging asynchronous systems, it can be very valuable to inspect
what tasks are currently active (see #2510). The [`tracing` crate] and
related libraries provide an interface for Rust libraries and
applications to emit and consume structured, contextual, and async-aware
diagnostic information. Because this diagnostic information is
structured and machine-readable, it is a better fit for the
task-tracking use case than textual logging — `tracing` spans can be
consumed to generate metrics ranging from a simple counter of active
tasks to histograms of poll durations, idle durations, and total task
lifetimes. This information is potentially valuable to both Tokio users
*and* to maintainers.
Additionally, `tracing` is maintained by the Tokio project and is
becoming widely adopted by other libraries in the "Tokio stack", such as
[`hyper`], [`h2`], and [`tonic`] and in [other] [parts] of the broader Rust
ecosystem. Therefore, it is suitable for use in Tokio itself.
[`tracing` crate]: https://github.com/tokio-rs/tracing
[`hyper`]: https://github.com/hyperium/hyper/pull/2204
[`h2`]: https://github.com/hyperium/h2/pull/475
[`tonic`]: https://github.com/hyperium/tonic/blob/570c606397e47406ec148fe1763586e87a8f5298/tonic/Cargo.toml#L48
[other]: https://github.com/rust-lang/chalk/pull/525
[parts]: https://github.com/rust-lang/compiler-team/issues/331
## Solution
This PR is an MVP for instrumenting Tokio with `tracing` spans. When the
"tracing" optional dependency is enabled, every spawned future will be
instrumented with a `tracing` span.
The generated spans are at the `TRACE` verbosity level, and have the
target "tokio::task", which may be used by consumers to filter whether
they should be recorded. They include fields for the type name of the
spawned future and for what kind of task the span corresponds to (a
standard `spawn`ed task, a local task spawned by `spawn_local`, or a
`blocking` task spawned by `spawn_blocking`). Because `tracing` has
separate concepts of "opening/closing" and "entering/exiting" a span, we
enter these spans every time the spawned task is polled. This allows
collecting data such as:
- the total lifetime of the task from `spawn` to `drop`
- the number of times the task was polled before it completed
- the duration of each individual time that the span was polled (and
therefore, aggregated metrics like histograms or averages of poll
durations)
- the total time a span was actively being polled, and the total time
it was alive but **not** being polled
- the time between when the task was `spawn`ed and the first poll
As an example, here is the output of a version of the `chat` example
instrumented with `tracing`:
![image](https://user-images.githubusercontent.com/2796466/87231927-e50f6900-c36f-11ea-8a90-6da9b93b9601.png)
And, with multiple connections actually sending messages:
![trace_example_1](https://user-images.githubusercontent.com/2796466/87231876-8d70fd80-c36f-11ea-91f1-0ad1a5b3112f.png)
I haven't added any `tracing` spans in the example, only converted the
existing `println!`s to `tracing::info` and `tracing::error` for
consistency. The span durations in the above output are generated by
`tracing-subscriber`. Of course, a Tokio-specific subscriber could
generate even more detailed statistics, but that's follow-up work once
basic tracing support has been added.
Note that the `Instrumented` type from `tracing-futures`, which attaches
a `tracing` span to a future, was reimplemented inside of Tokio to avoid
a dependency on that crate. `tracing-futures` has a feature flag that
enables an optional dependency on Tokio, and I believe that if another
crate in a dependency graph enables that feature while Tokio's `tracing`
support is also enabled, it would create a circular dependency that
Cargo wouldn't be able to handle. Also, it avoids a dependency for a
very small amount of code that is unlikely to ever change.
There is, of course, room for plenty of future work here. This might
include:
- instrumenting other parts of `tokio`, such as I/O resources and
channels (possibly via waker instrumentation)
- instrumenting the threadpool so that the state of worker threads
can be inspected
- writing `tracing-subscriber` `Layer`s to collect and display
Tokio-specific data from these traces
- using `track_caller` (when it's stable) to record _where_ a task
was `spawn`ed from
However, this is intended as an MVP to get us started on that path.
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
|