Age | Commit message | Author |
|
Previously, the runtime shutdown logic would first hand control over all cores
to a single thread, which would sequentially shut down all tasks on each core
and then wait for them to complete.
This could deadlock when one task is waiting for a later core's task to
complete. For example, in the newly added test, we have a `block_in_place` task
that is waiting for another task to be dropped. If the latter task adds its
core to the shutdown list later than the former, we end up waiting forever for
the `block_in_place` task to complete.
Additionally, there was a bug wherein we'd attempt to park on the parker
after shutting it down; this was fixed as part of the refactors above.
This change restructures the code to bring all tasks to a halt (and do any
parking needed) before we collapse to a single thread to avoid this deadlock.
There was also an issue in which canceled tasks would not unpark the
originating thread, due to what appears to be some sort of optimization gone
wrong. The fix is now much more conservative about when it skips unparking the
source thread (this may be too conservative; please take a look at the changes
to `release()`).
Fixes: #2789
|
|
A call to `JoinHandle::abort` releases a task. When called from outside the runtime,
this panics because the current implementation checks for a thread-local worker context.
This change makes accessing the thread-local context optional during release, by falling
back to remotely marking the task as dropped, behaving the same as if the core
had been stolen by another worker.
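For illustration, a minimal sketch of the scenario this fixes; the spawned task and the durations are arbitrary:

```rust
use std::time::Duration;

fn main() {
    let rt = tokio::runtime::Runtime::new().unwrap();

    // Spawn a long-running task and hand its JoinHandle out of the runtime.
    let handle = rt.block_on(async {
        tokio::spawn(async {
            tokio::time::sleep(Duration::from_secs(60)).await;
        })
    });

    // Aborting from the main thread, outside of any worker context,
    // previously panicked; it now falls back to marking the task as
    // dropped remotely.
    handle.abort();
}
```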
Fixes #3157
|
|
|
|
This removes the last slab dependency by replacing the current slab-based
`JoinHandle` tracking with a `HashMap`-based implementation.
Co-authored-by: Bryan Donlan <bdonlan@amazon.com>
|
|
Signed-off-by: Marc-Antoine Perennou <Marc-Antoine@Perennou.com>
Co-authored-by: Alice Ryhl <alice@ryhl.io>
Co-authored-by: Carl Lerche <me@carllerche.com>
|
|
|
|
## Motivation
Currently, the per-task `tracing` spans generated by tokio's `tracing`
feature flag include the `std::any::type_name` of the future that was
spawned. When future combinators and/or libraries like Tower are in use,
these future names can get _quite_ long. Furthermore, when formatting
the `tracing` spans with their parent spans as context, any other task
spans in the span context where the future was spawned from can _also_
include extremely long future names.
In some cases, this can result in extremely high memory use just to
store the future names. For example, in Linkerd, when we enable
`tokio=trace` to enable the task spans, there's a spawned task whose
future name is _232990 characters long_. A proxy with only 14 spawned
tasks generates a task list that's over 690 KB. Enabling task spans
under load results in the process getting OOM killed very quickly.
## Solution
This branch removes future type names from the spans generated by
`spawn`. As a replacement, to allow identifying which `spawn` call a
span corresponds to, the task span now contains the source code location
where `spawn` was called, when the compiler supports the
`#[track_caller]` attribute. Since `track_caller` was stabilized in Rust
1.46.0, and our minimum supported Rust version is 1.45.0, we can't
assume that `#[track_caller]` is always available. Instead, we have a
RUSTFLAGS cfg, `tokio_track_caller`, that guards whether or not we use
it. I've also added a `build.rs` that detects the compiler minor
version, and sets the cfg flag automatically if the current compiler
version is >= 1.46. This means users shouldn't have to enable
`tokio_track_caller` manually.
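Roughly, the version probe works like this (a simplified sketch of the approach, not the exact `build.rs` added here):

```rust
// build.rs (sketch): enable the `tokio_track_caller` cfg when the
// compiler is new enough to support #[track_caller].
use std::process::Command;

fn main() {
    let rustc = std::env::var("RUSTC").unwrap_or_else(|_| "rustc".to_string());
    let output = Command::new(rustc)
        .arg("--version")
        .output()
        .expect("failed to run `rustc --version`");
    let version = String::from_utf8_lossy(&output.stdout);

    // Expect output like "rustc 1.46.0 (…)": take the minor version.
    let minor: u32 = version
        .split('.')
        .nth(1)
        .and_then(|s| s.parse().ok())
        .unwrap_or(0);

    if minor >= 46 {
        println!("cargo:rustc-cfg=tokio_track_caller");
    }
}
```

Call sites inside the library can then be annotated along the lines of `#[cfg_attr(tokio_track_caller, track_caller)]`, with the captured location read via `std::panic::Location::caller()`.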
Here's the trace output from the `chat` example, before this change:
![Screenshot_20201030_110157](https://user-images.githubusercontent.com/2796466/97741071-6d408800-1a9f-11eb-9ed6-b25e72f58c7b.png)
...and after:
![Screenshot_20201030_110303](https://user-images.githubusercontent.com/2796466/97741112-7e899480-1a9f-11eb-9197-c5a3f9ea1c05.png)
Closes #3073
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
|
|
|
|
Move common code and tracing integration into Handle
Fixes #2998
Closes #3004
Signed-off-by: Marc-Antoine Perennou <Marc-Antoine@Perennou.com>
|
|
|
|
This allows writing

    rt.spawn_blocking(f);

instead of

    let _enter = rt.enter();
    tokio::task::spawn_blocking(f);

Signed-off-by: Marc-Antoine Perennou <Marc-Antoine@Perennou.com>
|
|
|
|
|
|
This PR updates the runtime module docs to reflect the changes made in the `0.3`
release of tokio.
Closes #2720
|
|
|
|
tokio:
- merge `rt-core` and `rt-util` into `rt`
- rename `rt-threaded` to `rt-multi-thread`

tokio-util:
- rename `rt-core` to `rt`

Closes #2942
|
|
|
|
Co-authored-by: Alice Ryhl <alice@ryhl.io>
|
|
|
|
Co-authored-by: Alice Ryhl <alice@ryhl.io>
Co-authored-by: Carl Lerche <me@carllerche.com>
|
|
|
|
Closes: #2929
Co-authored-by: Bryan Donlan <bdonlan@amazon.com>
|
|
Uses the infrastructure added by #2828 to enable switching
`TcpListener::accept` to use `&self`.
This also switches `poll_accept` to use `&self`. While doing so introduces
a hazard, `poll_*` style functions are considered low-level. Most users
will use the `async fn` variants, which are more misuse-resistant.
`TcpListener::incoming()` is temporarily removed, as it has the same
problem as `TcpSocket::by_ref()` and will be implemented later.
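For illustration, a minimal sketch of what `&self` enables here: one listener shared across tasks behind an `Arc`, with no exclusive borrow (address and task count are arbitrary):

```rust
use std::sync::Arc;
use tokio::net::TcpListener;

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let listener = Arc::new(TcpListener::bind("127.0.0.1:8080").await?);

    // Because accept() now takes &self, the listener can be shared
    // between tasks behind an Arc without any locking.
    for _ in 0..2 {
        let listener = Arc::clone(&listener);
        tokio::spawn(async move {
            loop {
                match listener.accept().await {
                    Ok((_socket, addr)) => println!("accepted connection from {}", addr),
                    Err(e) => eprintln!("accept error: {}", e),
                }
            }
        });
    }

    // ... the rest of the application would run here.
    Ok(())
}
```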
|
|
|
|
|
|
|
|
As tokio does not rely on poisoning, we can
avoid always unwrapping when locking by handling
the `PoisonError` in the Mutex shim.
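A minimal sketch of such a shim over `std::sync::Mutex` (the names are illustrative, not tokio's exact internals):

```rust
use std::sync::{Mutex as StdMutex, MutexGuard};

pub(crate) struct Mutex<T>(StdMutex<T>);

impl<T> Mutex<T> {
    pub(crate) fn new(t: T) -> Self {
        Mutex(StdMutex::new(t))
    }

    pub(crate) fn lock(&self) -> MutexGuard<'_, T> {
        // tokio never relies on poisoning for correctness, so recover the
        // guard from a PoisonError instead of unwrapping and panicking.
        match self.0.lock() {
            Ok(guard) => guard,
            Err(poisoned) => poisoned.into_inner(),
        }
    }
}
```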
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
|
|
|
|
This change will still internally compile any `signal` resources
required when `process` is enabled on Unix systems, but it will not
publicly turn on the `signal` cargo feature.
|
|
|
|
|
|
Refactors the signal infrastructure to move the driver to the runtime
thread. This follows the model put forth by the I/O driver and time
driver.
|
|
|
|
|
|
Co-authored-by: Eliza Weisman <eliza@buoyant.io>
Fixes: #2585
|
|
(#2819)
|
|
|
|
|
|
Co-authored-by: Eliza Weisman <eliza@buoyant.io>
|
|
Fixes: #1907
|
|
The `JoinHandle`s of threads created by the pool are now tracked and properly
joined at shutdown. If a thread does not return within the timeout, it is not
joined and is left to the OS for cleanup.
This also breaks a cycle between wakers held by the timer and the runtime.
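For example, driving this through the public API might look roughly like the following sketch, assuming `shutdown_timeout` is the timeout referred to above (durations are arbitrary):

```rust
use std::time::Duration;
use tokio::runtime::Runtime;

fn main() {
    let rt = Runtime::new().unwrap();

    rt.block_on(async {
        // Kick off a blocking task that may outlive the shutdown deadline.
        tokio::task::spawn_blocking(|| std::thread::sleep(Duration::from_secs(60)));
    });

    // Wait up to one second for worker and blocking threads to finish;
    // any thread still running after that is left to the OS to clean up.
    rt.shutdown_timeout(Duration::from_secs(1));
}
```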
Fixes #2641, #2535
|
|
Previously, we would fail to reset the coop budget in this case, making
it so that `coop::poll_proceed` would perpetually yield `Poll::Pending`
in nested executors even when run in `block_in_place`.
This is also a further improvement on #2645.
|
|
Dropping a runtime normally involves waiting for any outstanding blocking tasks
to complete. When this drop happens in an asynchronous context, we previously
would issue a cryptic panic because the drop itself tries to block inside that context.
This change improves the panic message and adds a `shutdown_blocking()` function
which can be used to shut down a runtime without blocking at all, as an out for
cases where this really is necessary.
Co-authored-by: Bryan Donlan <bdonlan@amazon.com>
Co-authored-by: Alice Ryhl <alice@ryhl.io>
|
|
* Update doc comments
* Remove trailing whitespace
|
|
A fast path in `block_in_place` was failing to call `exit()` in the case where we
were already inside a `block_on` call.
Fixes: #2639
Co-authored-by: Bryan Donlan <bdonlan@amazon.com>
|
|
|
|
* Fix clippy warnings
* Pin rustc version to 1.43.1 on macOS
Refs: https://github.com/rust-lang/rust/issues/73030
|
|
|
|
This patch updates the coop logic so that the budget is only decremented
if a future makes progress (that is, if it returns `Ready`). This is
realized by restoring the budget to its former value after
`poll_proceed` _unless_ the caller indicates that it made progress.
The thinking here is that we always want tasks to make progress when we
poll them. With the way things were, if a task polled 128 resources that
could make no progress, and just returned `Pending`, then a 129th
resource that _could_ make progress would not be polled. Worse yet, this
could manifest as a deadlock, if the first 128 resources were all
_waiting_ for the 129th resource, since it would _never_ be polled.
The downside of this change is that `Pending` resources now do not take
up any part of the budget, even though they _do_ take up time on the
executor. If a task is particularly aggressive (or unoptimized), and
polls a large number of resources that cannot make progress whenever it
is polled, then coop will allow it to run potentially much longer before
yielding than it could before. The impact of this should be relatively
contained though, because tasks that behaved in this way in the past
probably ignored `Pending` _anyway_, so whether a resource returned
`Pending` due to coop or due to lack of progress may not make a
difference to it.
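As a rough sketch of the restore-on-`Pending` pattern described above, using a plain thread-local counter as a stand-in for tokio's actual internals:

```rust
use std::cell::Cell;
use std::task::Poll;

thread_local! {
    // Remaining polls this task may perform before it must yield.
    static BUDGET: Cell<u32> = Cell::new(128);
}

/// Guard that restores the budget on drop unless progress was made.
struct RestoreOnPending(Cell<Option<u32>>);

impl RestoreOnPending {
    /// The resource returned `Ready`, so the budget decrement should stick.
    fn made_progress(&self) {
        self.0.set(None);
    }
}

impl Drop for RestoreOnPending {
    fn drop(&mut self) {
        // No progress was reported: give the budget point back.
        if let Some(prev) = self.0.get() {
            BUDGET.with(|b| b.set(prev));
        }
    }
}

/// Decrement the budget, yielding if it is already exhausted.
fn poll_proceed() -> Poll<RestoreOnPending> {
    BUDGET.with(|b| {
        let prev = b.get();
        if prev == 0 {
            Poll::Pending
        } else {
            b.set(prev - 1);
            Poll::Ready(RestoreOnPending(Cell::new(Some(prev))))
        }
    })
}
```

A leaf resource would call `poll_proceed()` before doing work and invoke `made_progress()` on the returned guard only when it is about to return `Ready`, so the budget is consumed only when the poll actually advances the task.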
|
|
(#2493)
|