Speculative task assignment #4264
base: main
Conversation
and a fix for a weird nbytes issue
Eight comments. Ah! Ah! Ah!
if self.validate:
    # All dependencies are on the same worker
    assert len({dts.processing_on for dts in ts.dependencies}) == 1
We probably also want to assert that at least one of the dependencies is in a "processing" state.
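A minimal sketch of what that stronger validation might look like, reusing the attributes from the snippet above (a suggestion, not code from the PR):

```python
if self.validate:
    # All dependencies are on the same worker
    assert len({dts.processing_on for dts in ts.dependencies}) == 1
    # Suggested extra check: at least one dependency is still being processed,
    # otherwise there is nothing left to speculate on
    assert any(dts.state == "processing" for dts in ts.dependencies)
```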
self.send_task_to_worker(worker, key)

return _recommend_speculative_assignment(ts)
Ah, so we would send long chains down to a worker all at once?
That's what currently happens, and I can see it leading to less-than-stellar performance, especially if a cluster is scaling up -- we could add a config item for the number of speculative tasks to assign to a given worker, a la max_height in the fuse options?
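For reference, the fuse limits live under `optimization.fuse` in the dask config; a speculative-assignment cap could be exposed the same way. A small sketch, where the second key is hypothetical and does not exist in this PR:

```python
import dask

# Existing fuse option referenced above
dask.config.set({"optimization.fuse.max-height": 8})

# Hypothetical analogue for capping speculative tasks per worker (not implemented):
# dask.config.set({"distributed.scheduler.speculative.max-tasks-per-worker": 4})
```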
Oh, I think it's fine. Let's leave it like this until there is a problem.
@@ -3837,7 +3885,14 @@ def get_comm_cost(self, ts, ws):
    Get the estimated communication cost (in s.) to compute the task
    on the given worker.
    """
    return sum(dts.nbytes for dts in ts.dependencies - ws.has_what) / self.bandwidth
    # TODO: How is it possible for nbytes to be None when there's a getter that is supposed to
    # stop that from happening?
This looks like an excellent "who-dunnit?" mystery to enjoy
@@ -1441,7 +1443,7 @@ def add_task(
    self.tasks[key] = ts = TaskState(
        key=key, runspec=SerializedTask(function, args, kwargs, task)
    )
-   ts.state = "waiting"
+   ts.state = "waiting" if not speculative else "speculative"
Hrm, what if the dependency has just finished running? Maybe there is a way to trust the local state here more than what the scheduler said.
It shouldn't make a difference; when add_task finishes we'll have noted that the dependency is in memory, and then the task will transition from speculative -> ready.
("waiting", "released"): self.transition_waiting_released, | ||
("waiting", "processing"): self.transition_waiting_processing, | ||
("waiting", "memory"): self.transition_waiting_memory, | ||
("waiting", "speculative"): self.transition_waiting_speculative, | ||
("speculative", "processing"): self.transition_speculative_processing, |
We'll eventually want to see what happens when a speculative task gets pushed back to released/waiting again, such as if a worker goes down. The catch-all behavior is, I think, to move things back to released. If you handle that, things should be ok, though we may also want a speculative->waiting transition. Same with errors: I suspect that when a processing task errs it will mark all of its dependents as errored, and so the scheduler will try to enact a speculative->erred transition. In some of these cases the current self.transition_processing_* methods may work without modification, but we'll probably have to verify this.
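A sketch of the extra transition entries that might be needed; the handler names here are hypothetical, and as noted above some of them may be able to delegate to the existing transition_processing_* methods:

```python
# Hypothetical additions to the scheduler's transition table (not in this PR):
("speculative", "released"): self.transition_speculative_released,
("speculative", "waiting"): self.transition_speculative_waiting,
("speculative", "erred"): self.transition_speculative_erred,
```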
This removes one extra dictionary to track -- it was the source of some tricky-to-debug issues over in dask#4264, so I thought I'd break it out into a separate PR instead of jamming everything into one enormous PR.
This was an initial attempted fix for a test failure in `test_worker.py::test_clean_nbytes` that turned out to be unnecessary. On top of that, it breaks actors, so it shouldn't be in here. For reference, the values of actor tasks go in a separate `self.actors` dict, not `self.data`.
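A simplified sketch of that distinction on the worker, using the two attributes mentioned above (not the exact code in distributed):

```python
# Simplified sketch of storing a finished task's result on the worker:
if ts.key in self.actors:
    # Actor tasks keep their live object in the separate self.actors mapping
    self.actors[ts.key] = value
else:
    # Ordinary task results go into self.data
    self.data[ts.key] = value
```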
This is a preliminary cut at introducing speculative task assignment to the distributed scheduler and closes #3974
As @mrocklin mentions in #3974, if tasks are speculatively assigned to a worker that contains all of their dependencies (which may still be processing or awaiting execution), this removes the need for task fusion.
The current approach I've started sketching out here is to use the existing recommendation system in the scheduler to examine a task's dependent(s) -- (I'm going to switch to parent/child terminology here to make this less confusing): if a task has only one child and all parents of that child task are on the same worker, speculatively assign that child to the same worker.
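As a rough illustration of that rule (my paraphrase of the logic; not the actual body of `_recommend_speculative_assignment` in this PR):

```python
def _recommend_speculative_assignment(ts):
    """Sketch: if ts has exactly one dependent (child) and every parent of
    that child is processing on the same worker, recommend moving the child
    into the "speculative" state so it can be shipped to that worker early.
    """
    if len(ts.dependents) != 1:
        return {}
    (child,) = ts.dependents
    workers = {dts.processing_on for dts in child.dependencies}
    if len(workers) == 1 and None not in workers:
        return {child.key: "speculative"}
    return {}
```

The returned recommendations dict would then be fed through the scheduler's existing transition machinery, which performs the actual waiting -> speculative transition.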
Still TODO: