
Use a layer for p2p shuffle #7180

Merged: 1 commit merged into dask:main from p2p_layer on Oct 26, 2022
Conversation

@fjetter (Member) commented Oct 24, 2022

This moves the graph generation for p2p shuffling to a layer. The benefit is that we can generate a "new graph" when culling.
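For context, "culling" means trimming a task graph down to only the tasks needed for the requested outputs. A toy sketch of the idea (this is a simplified illustration, not dask's actual `Layer.cull` implementation):

```python
def cull(dsk, keys):
    # Keep only the tasks reachable from the requested output keys.
    # Tasks are (callable, *args) tuples; args naming other keys are
    # treated as dependencies. Simplified stand-in for dask's culling.
    def deps(task):
        if isinstance(task, tuple) and task and callable(task[0]):
            return [a for a in task[1:] if a in dsk]
        return []

    out, stack = {}, list(keys)
    while stack:
        k = stack.pop()
        if k in out:
            continue
        out[k] = dsk[k]
        stack.extend(deps(dsk[k]))
    return out

dsk = {"a": 1, "b": (str, "a"), "c": (str, "b"), "d": 2}
# Asking only for "c" drops the unrelated task "d".
assert set(cull(dsk, ["c"])) == {"a", "b", "c"}
```

With the p2p shuffle expressed as a layer, this trimming step can additionally regenerate the graph so the culled shuffle is a distinct operation.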

This specifically helps with problem cases like Case 2 in #6105

```python
df = ...
x = df.shuffle(on="x")
y = x.partitions[x.npartitions // 2].persist()
sleep(0.1)
z = x.persist()
```

as exhibited by test case test_add_some_results.

In the above example, y and z will be treated as unique shuffles, which makes releasing keys much more straightforward and decouples the problem significantly.

Of course, we may need to submit some data twice but I doubt this is a problem in practice.

FWIW, I only inherited from SimpleShuffleLayer because I didn't want to duplicate too much code. For now, it is sufficiently similar to what we need to justify subclassing.

@github-actions (Contributor) commented:

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

- 15 files ±0, 15 suites ±0, duration 6h 13m 13s (−21m 38s)
- 3 153 tests ±0: 3 061 passed (−2), 85 skipped (±0), 6 failed (+1), 1 errored (+1)
- 23 329 runs (+1): 22 396 passed (+2), 911 skipped (−3), 21 failed (+1), 1 errored (+1)

For more details on these failures and errors, see this check.

Results for commit dda2ad9. ± Comparison against base commit 8f25111.

@wence- (Contributor) left a comment:


On the code-movement aspects.

```python
def _construct_graph(self, deserializing=None):
    token = tokenize(self.name_input, self.column, self.npartitions, self.parts_out)
```
@fjetter (Member, Author):

FWIW, putting self.parts_out here is the actual functional change. It makes culled graphs unique, which matches how the shuffle service works very well.
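To illustrate why this works: once parts_out feeds into the token, two graphs that request different output partitions hash to different shuffle IDs. A minimal sketch using hashlib as a stand-in for dask.base.tokenize (layer_token is a hypothetical helper, not the PR's code):

```python
import hashlib

def layer_token(name_input, column, npartitions, parts_out):
    # Stand-in for dask.base.tokenize: hash every input that defines
    # the layer, *including* parts_out, so a culled layer gets a
    # token distinct from the full shuffle's token.
    payload = repr((name_input, column, npartitions, sorted(parts_out)))
    return hashlib.sha256(payload.encode()).hexdigest()[:16]

full = layer_token("shuffle-input", "x", 8, range(8))
culled = layer_token("shuffle-input", "x", 8, [4])
assert full != culled  # the culled graph is a brand-new shuffle
```

Because the scheduler-side shuffle service keys its state on this token, the culled run (y in the example above) and the full run (z) never share state that would complicate key release.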

Comment on lines -73 to -81:

```python
transferred = df.map_partitions(
    shuffle_transfer,
    id=token,
    npartitions=npartitions,
    column=column,
    meta=df,
    enforce_metadata=False,
    transform_divisions=False,
)
```
@fjetter (Member, Author):

The one downside of this is that we'll no longer get input task fusion. The previous code here was a Blockwise layer, and we lose that. However, I consider consistency more important than input task fusion.

FWIW, "fixing"/enabling input (and output) task fusion for graphs of this type would benefit other types of graph as well, but I consider that out of scope for now.
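For readers unfamiliar with the term: task fusion collapses a chain of tasks into one, so the scheduler tracks fewer keys and no intermediate result is stored. A toy illustration of the transformation (not dask's optimizer; `fuse_pair` is a hypothetical helper):

```python
def fuse_pair(dsk, inner, outer):
    # Inline task `inner` into its sole consumer `outer`, removing
    # the intermediate key from the graph. Tasks are (fn, *args)
    # tuples in the dict-based graph convention used above.
    fused = dict(dsk)
    inner_task = fused.pop(inner)
    fused[outer] = tuple(
        inner_task if arg == inner else arg for arg in fused[outer]
    )
    return fused

inc = lambda x: x + 1
dbl = lambda x: 2 * x
dsk = {"a": (inc, 1), "b": (dbl, "a")}
# After fusion, "a" disappears and its task is nested inside "b".
assert fuse_pair(dsk, "a", "b") == {"b": (dbl, (inc, 1))}
```

Blockwise layers get this optimization for free; the new shuffle layer does not, which is the trade-off being accepted here.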

```python
dsk = {
    (name, i): (shuffle_unpack, token, i, barrier_key)
    for i in range(npartitions)
}
layer = MaterializedLayer(dsk, annotations={"shuffle": lambda key: key[1]})
```
@fjetter (Member, Author):

cherry on top: no MaterializedLayer
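For context on what that dict builds: there is one unpack task per output partition, and the "shuffle" annotation maps each key to its partition index. A stand-alone sketch (shuffle_unpack here is a stub; the real function fetches the partition's data from the shuffle service):

```python
def shuffle_unpack(token, i, barrier_key):
    # Stub standing in for the real unpack: would return partition i
    # of shuffle `token` once the barrier task has completed.
    return (token, i)

name, token, barrier_key, npartitions = "unpack", "tok", "barrier", 4
dsk = {
    (name, i): (shuffle_unpack, token, i, barrier_key)
    for i in range(npartitions)
}
# The annotation extracts the output partition index from each key.
annotation = lambda key: key[1]
assert sorted(annotation(k) for k in dsk) == [0, 1, 2, 3]
```

Replacing the MaterializedLayer wrapper around this dict with a proper layer would defer materialization, which is the "cherry on top" suggested above.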

Comment on lines +291 to +300:

```python
x = dd.shuffle.shuffle(df, "x", shuffle="p2p")
full = await x.persist()
ntasks_full = len(s.tasks)
del full
while s.tasks:
    await asyncio.sleep(0)
partial = await x.tail(compute=False).persist()  # Only ask for one key

assert len(s.tasks) < df.npartitions * 2
assert len(s.tasks) < ntasks_full
del partial
```
@fjetter (Member, Author):

Since we're no longer fusing, this test was breaking. The new test ensures that the culled graph has fewer tasks, instead of asserting on the ratio to input partitions.

@fjetter fjetter self-assigned this Oct 26, 2022
@fjetter fjetter merged commit c349f4f into dask:main Oct 26, 2022
@fjetter fjetter deleted the p2p_layer branch October 26, 2022 13:39
@fjetter fjetter mentioned this pull request Oct 26, 2022