path: root/tokio/src/coop.rs
Age | Commit message | Author
2020-10-12 | rt: simplify rt-* features (#2949) | Taiki Endo
tokio: merge rt-core and rt-util as rt; rename rt-threaded to rt-multi-thread.
tokio-util: rename rt-core to rt.

Closes #2942
2020-10-12 | rt: Remove `threaded_scheduler()` and `basic_scheduler()` (#2876) | Lucio Franco
Co-authored-by: Alice Ryhl <alice@ryhl.io>
Co-authored-by: Carl Lerche <me@carllerche.com>
2020-05-31 | docs: use intra-links in the docs (#2575) | xliiv
2020-05-21 | coop: Undo budget decrement on Pending (#2549) | Jon Gjengset
This patch updates the coop logic so that the budget is only decremented if a future makes progress (that is, if it returns `Ready`). This is realized by restoring the budget to its former value after `poll_proceed` _unless_ the caller indicates that it made progress.

The thinking here is that we always want tasks to make progress when we poll them. With the way things were, if a task polled 128 resources that could make no progress and just returned `Pending`, then a 129th resource that _could_ make progress would not be polled. Worse yet, this could manifest as a deadlock if the first 128 resources were all _waiting_ for the 129th resource, since it would _never_ be polled.

The downside of this change is that `Pending` resources now do not take up any part of the budget, even though they _do_ take up time on the executor. If a task is particularly aggressive (or unoptimized) and polls a large number of resources that cannot make progress whenever it is polled, coop will allow it to run potentially much longer before yielding than it could before. The impact of this should be relatively contained, though, because tasks that behaved this way in the past probably ignored `Pending` _anyway_, so whether a resource returns `Pending` due to coop or due to lack of progress may not make a difference to them.
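A minimal, self-contained sketch of this restore-on-pending pattern (the thread-local budget, `poll_proceed`, and the `RestoreOnPending` guard below illustrate the idea and are not a copy of Tokio's internal code):

```rust
use std::cell::Cell;
use std::task::Poll;

thread_local! {
    // Per-task poll budget; the scheduler would reset this before each task poll.
    static BUDGET: Cell<u8> = Cell::new(128);
}

/// Guard handed out when the budget allowed the poll to proceed. Dropping it
/// without reporting progress gives the budget unit back.
struct RestoreOnPending {
    made_progress: bool,
}

impl RestoreOnPending {
    fn made_progress(&mut self) {
        self.made_progress = true;
    }
}

impl Drop for RestoreOnPending {
    fn drop(&mut self) {
        if !self.made_progress {
            BUDGET.with(|b| b.set(b.get() + 1));
        }
    }
}

/// Spend one unit of budget, or return `Pending` (forcing a yield) once it is gone.
fn poll_proceed() -> Poll<RestoreOnPending> {
    BUDGET.with(|b| {
        let left = b.get();
        if left == 0 {
            Poll::Pending
        } else {
            b.set(left - 1);
            Poll::Ready(RestoreOnPending { made_progress: false })
        }
    })
}

fn main() {
    // A resource that cannot make progress: the guard drops without reporting
    // progress, so the decrement is undone.
    if let Poll::Ready(_guard) = poll_proceed() {
        // the resource would return `Pending` here
    }
    // A resource that does make progress keeps the decrement.
    if let Poll::Ready(mut guard) = poll_proceed() {
        guard.made_progress();
    }
    BUDGET.with(|b| assert_eq!(b.get(), 127));
}
```

Tying the restore to `Drop` means a caller that forgets to report progress errs on the side of keeping its budget, which matches the intent of only charging for polls that do useful work.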
2020-05-07 | rt: set task budget after block_in_place call (#2502) | Carl Lerche
In some cases, when a call to `block_in_place` completes, the runtime is reinstated on the thread. In this case, the task budget must also be set in order to avoid starving other tasks on the worker.
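For context, a sketch of the scenario using Tokio's public `block_in_place` API (the attribute syntax below assumes a current multi-threaded Tokio runtime rather than the exact release this commit targeted):

```rust
use tokio::task;

#[tokio::main(flavor = "multi_thread")]
async fn main() {
    task::spawn(async {
        // Run a blocking computation without moving off this worker thread.
        let sum: u64 = task::block_in_place(|| (0..1_000_000u64).sum());
        // After `block_in_place` returns, this task continues as ordinary async
        // code on the same worker, so the coop budget has to apply here again.
        println!("sum = {sum}");
    })
    .await
    .unwrap();
}
```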
2020-05-06 | rt: simplify coop implementation (#2498) | Carl Lerche
Simplifies the coop implementation: prunes unused code and creates a `Budget` type to track the current budget.
2020-04-30 | task: fix LocalSet having a single shared task budget (#2462) | Eliza Weisman
## Motivation

Currently, an issue exists where a `LocalSet` has a single cooperative task budget that's shared across all futures spawned on the `LocalSet` _and_ by any future passed to `LocalSet::run_until` or `LocalSet::block_on`. Because these methods will poll the `run_until` future before polling spawned tasks, it is possible for that future to _always_ deterministically starve the entire `LocalSet` so that no local tasks can proceed. When the completion of that future _itself_ depends on other tasks on the `LocalSet`, this will then result in a deadlock, as in issue #2460.

A detailed description of why this is the case, taken from [this comment][1]:

`LocalSet` wraps each poll of a local task in `budget`:
https://github.com/tokio-rs/tokio/blob/947045b9445f15fb9314ba0892efa2251076ae73/tokio/src/task/local.rs#L406

This is identical to what tokio's other schedulers do when running tasks, and in theory should give each task its own budget every time it's polled. _However_, `LocalSet` is different from other schedulers. Unlike the runtime schedulers, a `LocalSet` is itself a future that's run on another scheduler, in `block_on`. `block_on` _also_ sets a budget:
https://github.com/tokio-rs/tokio/blob/947045b9445f15fb9314ba0892efa2251076ae73/tokio/src/runtime/basic_scheduler.rs#L131

The docs for `budget` state that:
https://github.com/tokio-rs/tokio/blob/947045b9445f15fb9314ba0892efa2251076ae73/tokio/src/coop.rs#L73

This means that inside of a `LocalSet`, the calls to `budget` are no-ops. Instead, each future polled by the `LocalSet` is subtracting from a single global budget.

`LocalSet`'s `RunUntil` future polls the provided future before polling any other tasks spawned on the local set:
https://github.com/tokio-rs/tokio/blob/947045b9445f15fb9314ba0892efa2251076ae73/tokio/src/task/local.rs#L525-L535

In this case, the provided future is `JoinAll`. Unfortunately, every time a `JoinAll` is polled, it polls _every_ joined future that has not yet completed. When the number of futures in the `JoinAll` is >= 128, this means that the `JoinAll` immediately exhausts the task budget. This would, in theory, be a _good_ thing --- if the `JoinAll` had a huge number of `JoinHandle`s in it and none of them are ready, it would limit the time we spend polling those join handles.

However, because the `LocalSet` _actually_ has a single shared task budget, this means polling the `JoinAll` _always_ exhausts the entire budget. There is now no budget remaining to poll any other tasks spawned on the `LocalSet`, and they are never able to complete.

[1]: https://github.com/tokio-rs/tokio/issues/2460#issuecomment-621403122

## Solution

This branch solves this issue by resetting the task budget when polling a `LocalSet`. I've added a new function to `coop` for resetting the task budget to `UNCONSTRAINED` for the duration of a closure, thus allowing the `budget` calls in `LocalSet` to _actually_ create a new budget for each spawned local task. Additionally, I've changed `LocalSet` to _also_ ensure that a separate task budget is applied to any future passed to `block_on`/`run_until`.

Additionally, I've added a test reproducing the issue described in #2460. This test fails prior to this change, and passes after it.

Fixes #2460

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
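A sketch of the failure mode described above, as it would have looked before this fix (this outline uses `futures::future::join_all` from the external `futures` crate and current Tokio attribute syntax; it is an illustration, not the test added by the patch):

```rust
use futures::future::join_all;
use tokio::task::LocalSet;

#[tokio::main(flavor = "current_thread")]
async fn main() {
    let local = LocalSet::new();
    local
        .run_until(async {
            // More than 128 local tasks, all awaited via a single `join_all`.
            let handles: Vec<_> = (0..200)
                .map(|_| tokio::task::spawn_local(async {}))
                .collect();
            // Before this change, polling `join_all` could spend the shared
            // budget on every poll, leaving nothing for the spawned tasks it
            // is waiting on, so this await would never complete.
            for res in join_all(handles).await {
                res.unwrap();
            }
        })
        .await;
}
```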
2020-04-12 | docs: replace some html links with rustdoc paths (#2381) | xliiv
Included changes:
- all simple references like `<type>.<name>.html` for these types: enum, fn, struct, trait, type
- simple references for methods, like `struct.DelayQueue.html#method.poll`

Refs: #1473
2020-03-28 | rt: cap fifo scheduler slot to avoid starvation (#2349) | Carl Lerche
The work-stealing scheduler includes an optimization where each worker includes a single slot to store the **last** scheduled task. Tasks in the scheduler's LIFO slot are executed next. This speeds up message passing patterns and reduces their latency.

Previously, this optimization was susceptible to starving other tasks in certain cases. If two tasks ping-pong between each other without ever yielding, the worker would never execute other tasks.

An earlier PR (#2160) introduced a form of pre-emption. Each task is allocated a per-poll operation budget. Tokio resources will return `Ready` until the budget is depleted, at which point Tokio resources will always return `Pending`.

This patch leverages the operation budget to limit the LIFO scheduler optimization. When executing tasks from the LIFO slot, the budget is **not** reset. Once the budget goes to zero, the task in the LIFO slot is pushed to the back of the queue.
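For illustration, a sketch of the kind of ping-pong workload the cap guards against (the channel sizes, message count, and single worker thread are arbitrary choices for the example):

```rust
use tokio::sync::mpsc;

#[tokio::main(flavor = "multi_thread", worker_threads = 1)]
async fn main() {
    let (tx_a, mut rx_a) = mpsc::channel::<u32>(1);
    let (tx_b, mut rx_b) = mpsc::channel::<u32>(1);

    // Task A: forward every message it receives over to task B.
    tokio::spawn(async move {
        while let Some(n) = rx_a.recv().await {
            if tx_b.send(n + 1).await.is_err() {
                break;
            }
        }
    });

    // Task B: bounce messages straight back to task A until a limit is hit.
    let seed = tx_a.clone();
    let ping_pong = tokio::spawn(async move {
        while let Some(n) = rx_b.recv().await {
            if n >= 1_000 || tx_a.send(n + 1).await.is_err() {
                return n;
            }
        }
        0
    });

    // A third task that should still get scheduled between the bounces.
    let bystander = tokio::spawn(async { println!("bystander ran") });

    seed.send(0).await.unwrap();
    println!("bounced up to {}", ping_pong.await.unwrap());
    bystander.await.unwrap();
}
```

With the budget cap, the task occupying the LIFO slot is eventually pushed back onto the run queue, so the bystander task gets a chance to run even while the two channel tasks keep feeding each other.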
2020-03-23 | sync: new internal semaphore based on intrusive lists (#2325) | Eliza Weisman
## Motivation

Many of Tokio's synchronization primitives (`RwLock`, `Mutex`, `Semaphore`, and the bounded MPSC channel) are based on the internal semaphore implementation, called `semaphore_ll`. This semaphore type provides a lower-level internal API for the semaphore implementation than the public `Semaphore` type, and supports "batch" operations, where waiters may acquire more than one permit at a time, and batches of permits may be released back to the semaphore.

Currently, `semaphore_ll` uses an atomic singly-linked list for the waiter queue. The linked list implementation is specific to the semaphore. This implementation therefore requires a heap allocation for every waiter in the queue. These allocations are owned by the semaphore, rather than by the task awaiting permits from the semaphore. Critically, they are only _deallocated_ when permits are released back to the semaphore, at which point it dequeues as many waiters from the front of the queue as can be satisfied with the released permits. If a task attempts to acquire permits from the semaphore and is cancelled (such as by timing out), its waiter nodes remain in the list until they are dequeued while releasing permits. In cases where large numbers of tasks are cancelled while waiting for permits, this results in extremely high memory use for the semaphore (see #2237).

## Solution

@Matthias247 has proposed that Tokio adopt the approach used in his `futures-intrusive` crate: using an _intrusive_ linked list to store the wakers of tasks waiting on a synchronization primitive. In an intrusive list, each list node is stored as part of the entry that node represents, rather than in a heap allocation that owns the entry. Because futures must be pinned in order to be polled, the necessary invariant of such a list --- that entries may not move while in the list --- may be upheld by making the waiter node `!Unpin`. In this approach, the waiter node can be stored inline in the future, rather than requiring a separate heap allocation, and cancelled futures may remove their nodes from the list.

This branch adds a new semaphore implementation that uses the intrusive list added to Tokio in #2210. The implementation is essentially a hybrid of the old `semaphore_ll` and the semaphore used in `futures-intrusive`: while a `Mutex` around the wait list is necessary, since the intrusive list is not thread-safe, the permit state is stored outside of the mutex and updated atomically. The mutex is acquired only when accessing the wait list --- if a task can acquire sufficient permits without waiting, it does not need to acquire the lock. When releasing permits, we iterate over the wait list from the end of the queue until we run out of permits to release, and split off all the nodes that received enough permits to wake up into a separate list. Then, we can drain the new list and notify those wakers *after* releasing the lock. Because the split operation only modifies the pointers on the head node of the split-off list and the new tail node of the old list, it is O(1) and does not require an allocation to return a variable-length number of waiters to notify.

Because of the intrusive list invariants, the API provided by the new `batch_semaphore` is somewhat different than that of `semaphore_ll`. In particular, the `Permit` type has been removed. This type was primarily intended to allow the reuse of a wait list node allocated on the heap. Since the intrusive list means we can avoid heap-allocating waiters, this is no longer necessary.
Instead, acquiring permits is done by polling an `Acquire` future returned by the `Semaphore` type. The use of a future here ensures that the waiter node is always pinned while waiting to acquire permits, and that a reference to the semaphore is available to remove the waiter if the future is cancelled.

Unfortunately, the current implementation of the bounded MPSC requires a `poll_acquire` operation, and has methods that call it while outside of a pinned context. Therefore, I've left the old `semaphore_ll` implementation in place to be used by the bounded MPSC, and updated the `Mutex`, `RwLock`, and `Semaphore` APIs to use the new implementation. Hopefully, a subsequent change can update the bounded MPSC to use the new semaphore as well.

Fixes #2237

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
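As a usage-level illustration of the future-based acquire pattern, here is a sketch against the public `tokio::sync::Semaphore` (shown with the `acquire_owned` API of current Tokio releases; the commit itself concerns the internal `batch_semaphore`):

```rust
use std::sync::Arc;
use tokio::sync::Semaphore;

#[tokio::main]
async fn main() {
    // Allow at most two tasks to hold a permit at once.
    let semaphore = Arc::new(Semaphore::new(2));

    let mut handles = Vec::new();
    for i in 0..5 {
        let semaphore = Arc::clone(&semaphore);
        handles.push(tokio::spawn(async move {
            // `acquire_owned` returns a future; awaiting it pins the waiter,
            // and dropping the future (e.g. on timeout) removes the waiter.
            let _permit = semaphore.acquire_owned().await.unwrap();
            println!("task {i} holds a permit");
            // The permit is released when `_permit` is dropped here.
        }));
    }
    for handle in handles {
        handle.await.unwrap();
    }
}
```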
2020-03-16 | Add cooperative task yielding (#2160) | Jon Gjengset
A single call to `poll` on a top-level task may potentially do a lot of work before it returns `Poll::Pending`. If a task runs for a long period of time without yielding back to the executor, it can starve other tasks waiting on that executor to execute them, or drive underlying resources. See for example rust-lang/futures-rs#2047, rust-lang/futures-rs#1957, and rust-lang/futures-rs#869.

Since Rust does not have a runtime, it is difficult to forcibly preempt a long-running task. Consider a future like this one:

```rust
use tokio::stream::{Stream, StreamExt};

async fn drop_all<I: Stream + Unpin>(mut input: I) {
    while let Some(_) = input.next().await {}
}
```

It may look harmless, but consider what happens under heavy load if the input stream is _always_ ready. If we spawn `drop_all`, the task will never yield, and will starve other tasks and resources on the same executor.

This patch adds a `coop` module that provides an opt-in mechanism for futures to cooperate with the executor to avoid starvation. This alleviates the problem above:

```rust
use tokio::stream::{Stream, StreamExt};

async fn drop_all<I: Stream + Unpin>(mut input: I) {
    while let Some(_) = input.next().await {
        tokio::coop::proceed().await;
    }
}
```

The call to `proceed` will coordinate with the executor to make sure that every so often control is yielded back to the executor so it can run other tasks.

The implementation uses a thread-local counter that simply counts how many "cooperation points" we have passed since the task was first polled. Once the "budget" has been spent, any subsequent points will return `Poll::Pending`, eventually making the top-level task yield. When it finally does yield, the executor resets the budget before running the next task.

The budget per task poll is currently hard-coded to 128. Eventually, we may want to make it dynamic as more cooperation points are added. The number 128 was chosen more or less arbitrarily to balance the cost of yielding unnecessarily against the time an executor may be "held up".

At the moment, all the tokio leaf futures ("resources") call into coop, but external futures have no way of doing so. We probably want to continue limiting coop points to leaf futures in the future, but may want to also enable third-party leaf futures to cooperate to benefit the ecosystem as a whole. This is reflected in the methods marked as `pub` in `mod coop` (even though the module is only `pub(crate)`). We will likely also eventually want to expose `coop::limit`, which enables sub-executors and manual `impl Future` blocks to avoid one sub-task spending all of their poll budget.

Benchmarks (see tokio-rs/tokio#2160) suggest that the overhead of `coop` is marginal.