
Race conditions from fetch to compute while AMM requests replica #6248

Merged: 8 commits merged into dask:main on May 6, 2022

Conversation

@fjetter (Member) commented Apr 29, 2022:

Closes #6244

There are two things in here:


The order in which we fill the recommendations impacts the order in which we send messages. In this specific case that could mean:

  • Worker A holds key T1
  • Worker B computes key T2 that depends on T1
  • Worker B fetches T1 (not yet in flight, just queued up)
  • Worker A dies
  • Scheduler transitions T1 back to released
  • This transition recommends transitioning T2 back to released which will generate a worker message to B to release T2 again
  • Scheduler transitions T2 to waiting and afterwards to processing

Depending on the order, by the time the handle-compute T1 signal arrives at B, the state will be either

  • T1: waiting
  • T2: waiting (which is strictly speaking false but shouldn't be harmful)

or just

  • T1: waiting, i.e. T1 has no dependents from the POV of the worker

The problem in #6244 is that a task is requested to be fetched and has no dependents on the worker. Therefore, if the worker is asked to release this task, there is no reason to hold on to it, so the worker transitions it to forgotten. There is currently no way out of forgotten other than through another handle_compute or acquire_replica (i.e. another ensure_task_exists).

The particular transition fetch->compute expands to fetch->released->compute, and we do not want to forget the task along the way. There are two options to avoid this:

  • If we encounter a forgotten task while doing these transitions, we could call ensure_task_exists to "revert" this
  • We explicitly block forgotten transitions in such a chain (this is what I am proposing, since I think it's the simplest approach); see the sketch below.
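
A minimal sketch of that second option, assuming a hypothetical worker-side transition loop; run_transition_chain and handle_edge are illustrative names, not the actual distributed.worker API, and the real change may look different:

    # Illustrative only, not the actual distributed.worker code: drain a
    # recommendations dict, but skip any "forgotten" recommendation while an
    # outer chain (e.g. fetch -> released -> compute) is still being expanded.
    def run_transition_chain(tasks, recommendations, handle_edge, block_forgotten=True):
        # handle_edge stands in for the per-edge transition handlers and
        # returns follow-up recommendations, which we keep processing.
        while recommendations:
            key, finish = recommendations.popitem()
            ts = tasks.get(key)
            if ts is None:
                continue
            if block_forgotten and finish == "forgotten":
                # Keep the task around (in released) instead of dropping all
                # of its state, so a follow-up compute-task message can still
                # pick it up.
                continue
            recommendations.update(handle_edge(ts, finish))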

@fjetter fjetter marked this pull request as ready for review April 29, 2022 14:14
@fjetter fjetter changed the title WIP Race conditions from fetch to compute Race conditions from fetch to compute Apr 29, 2022
@fjetter fjetter changed the title Race conditions from fetch to compute Race conditions from fetch to compute while AMM requests replica Apr 29, 2022
@mrocklin (Member) commented:

I took a brief look at this. No objection from me, but I didn't dive deeply into the logic. If tests pass and you're feeling confident about the added value @fjetter I think that it's ok to merge. If you can get someone like @crusaderky or @gjoseph92 to take a look that would be better of course.

@fjetter (Member, Author) commented Apr 29, 2022:

CI seems to get stuck on distributed/tests/test_steal.py::test_steal_when_more_tasks. It crashes immediately with an InvalidTransition locally. I'll have a look

@github-actions bot commented Apr 29, 2022:

Unit Test Results

    16 files ±0   16 suites ±0   duration 7h 17m 0s ⏱️ (−16m 32s)
 2 759 tests  +2:   2 677 passed ✔️  +2     78 skipped 💤 −1   4 failures +1
22 034 runs  +16:  21 011 passed ✔️ +13  1 018 skipped 💤 +2   5 failures +1

For more details on these failures, see this check.

Results for commit e6e4b50. ± Comparison against base commit 2286896.

♻️ This comment has been updated with latest results.

@gjoseph92 (Collaborator) left a comment:

Honestly, I don't fully understand what these changes are doing (especially reordering the transition_memory_released code). But if the tests you've added don't pass without them, that seems like a good sign?

Comment on lines +2226 to +2230
for dts in ts.waiters:
    if dts.state in ("no-worker", "processing"):
        recommendations[dts.key] = "waiting"
    elif dts.state == "waiting":
        dts.waiting_on.add(ts)
A Collaborator commented:

To confirm: the reason for moving this code later is because transitions are insertion-ordered, so this way key gets transitioned to forgotten/waiting before its waiters get transitioned to waiting?

I wonder if this block should even be in transition_memory_released at all? Why shouldn't this part be done in transition_released_waiting and transition_released_forgotten? The fact that we need to make this transition after we make the waiting/forgotten transition makes me think we're overstepping our job in this function, and this should be the job of the other transitions.

I guess I didn't know that the recommendations dict was considered ordered. (Obviously dicts are now ordered in Python, but did it used to be an OrderedDict in py<=3.6?) If there's some dependency structure in the recommendations (transition A has to happen before transition B), I'd think transition A should be responsible for recommending transition B, not that they should be mixed together. That seems easier to reason about.

@fjetter (Member, Author) replied May 1, 2022:

> To confirm: the reason for moving this code later is because transitions are insertion-ordered, […]

Yes, dictionaries are insertion-ordered and popitem pops from the end:

In [1]: adict = {}

In [2]: for x in range(10):
   ...:     adict[x] = x
   ...:

In [3]: adict.popitem()
Out[3]: (9, 9)

> I wonder if this block should even be in transition_memory_released at all?

Yes, it should. If a task that was in memory is no longer in memory and a waiter (i.e. a task to be executed) exists, we need to ensure that this waiter is released. These few lines allow tasks to be resilient to worker failures.

> Why shouldn't this part be done in transition_released_waiting

This transition is not there to reset/forget/release something; it schedules something for compute.

> and transition_released_forgotten

This transition should only be triggered after the scheduler deletes the entire graph. It should only ever have scheduling consequences if there are multiple graphs scheduled that share keys. Otherwise it is simply a sophisticated way to pop the task.

> If there's some dependency structure in the recommendations (transition A has to happen before transition B), I'd think transition A should be responsible for recommending transition B, not that they should be mixed together.

It's not about a dependency but about order.

In this specific case (see the test), the recommendations dict

{'f1': 'waiting', 'f2': 'waiting'}

means:

  1. Transition f2 from processing back to waiting
  2. Then transition f1 from released to waiting (f1 was in memory/released before)

I don't see how we could ever infer "please transition f1 to waiting" after we released f1. From a causality perspective, I don't see how we could map this as a dependency.

Edit: In a previous statement I argued that it's about finishing a chain of a task's transitions, but this was false. It's quite the opposite.
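
For illustration, a tiny standalone snippet (not the scheduler's actual transition loop) showing why {'f1': 'waiting', 'f2': 'waiting'} is handled in that order when the recommendations dict is drained with popitem():

    # dict.popitem() removes the most recently inserted item first, so the
    # recommendations are effectively drained LIFO.
    recommendations = {"f1": "waiting", "f2": "waiting"}
    order = []
    while recommendations:
        order.append(recommendations.popitem())
    print(order)  # [('f2', 'waiting'), ('f1', 'waiting')]: f2 is handled before f1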

@fjetter (Member, Author) commented May 5, 2022:

The OSX 3.10 (not ci1) failure is known: #6233.
The OSX 3.8 ci1 failure is, I think, related to #6211.

@crusaderky (Collaborator) left a comment:

Only two outstanding cosmetic issues for me

@crusaderky (Collaborator) commented:

I think I'm seeing two genuine regressions in the unit tests

@fjetter (Member, Author) commented May 6, 2022:

The remaining test regression is fixed by #6297.

@crusaderky crusaderky merged commit 70e5c90 into dask:main May 6, 2022
@@ -647,6 +648,7 @@ def __init__(
             ("ready", "released"): self.transition_generic_released,
             ("released", "error"): self.transition_generic_error,
             ("released", "fetch"): self.transition_released_fetch,
+            ("released", "missing"): self.transition_released_fetch,
A Collaborator commented:

Typo - this should be transition_generic_missing.
This causes an infinite transition loop (#6305):

  1. The scheduler calls handle_compute_task with an empty who_has. This is a broken use case to begin with, but it is happening and the worker should either cope with it gracefully or crash loudly in handle_compute_task itself.
  2. handle_compute_task creates the dependencies with ensure_task_exists in state=released and requests a transition to fetch
  3. The released->fetch transition, handled by transition_released_fetch, notices that who_has is empty, so instead of transitioning to fetch it returns {ts: missing} and keeps the task in released state
  4. The released->missing transition, erroneously handled by the same method, repeats point 3 forever.
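
For reference, the one-line correction described in this comment would look roughly like this (a sketch only; per the follow-up below, the actual fix landed in #6318):

    # corrected handler for the released -> missing edge, as suggested above
    ("released", "missing"): self.transition_generic_missing,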

@fjetter (Member, Author) commented:

Is this "fixed" by #6318?

A Collaborator replied:

Yes.

Development

Successfully merging this pull request may close these issues.

Deadlock fetching key from retiring worker, when scheduler thinks we already have the key
4 participants