summaryrefslogtreecommitdiffstats
path: root/ci
diff options
context:
space:
mode:
authorAndrew Gallant <jamslam@gmail.com>2020-02-20 15:08:37 -0500
committerAndrew Gallant <jamslam@gmail.com>2020-02-20 16:07:51 -0500
commitf314b0d55f2fa7c9f19b677624a19a7ba24c7cf4 (patch)
treeb9f9cf80d7ac7ed9540f336eac8daa821637d1c1 /ci
parentfab5c812f31627ea5d57d5710cd3ccbd73244a2b (diff)
ignore: fix parallel traversal
It turns out that the previous version wasn't quite correct. Namely, it was possible for the following sequence to occur: 1. Consider that all workers, except for one, are `waiting`. 2. The last remaining worker finds one more job to do and sends it on the channel. 3. One of the previously `waiting` workers wakes up from the job that the last running worker sent, but `self.resume()` has not been called yet. 4. The last worker, from (2), calls `get_work` and sees that the channel has nothing on it, so it executes `self.waiting() == 1`. Since the worker in (3) hasn't called `self.resume()` yet, `self.waiting() == 1` evaluates to true. 5. This sets off a chain reaction that stops all workers, despite that fact that (3) got more work (which could itself spawn more work). The end result is that the traversal may terminate while their are still outstanding work items to process. This problem was observed through spurious failures in CI. I was not actually able to reproduce the bug locally. We fix this by changing our strategy to detect termination using a counter. Namely, we increment the counter just before sending new work and decrement the counter just after finishing work. In this way, we guarantee that the counter only ever reaches 0 once there is no more work to process. See #1337 for more discussion. Many thanks to @zsugabubus for helping me work through this.
Diffstat (limited to 'ci')
0 files changed, 0 insertions, 0 deletions