[Pallas] Integrate FlashAttention with SPMD #6935

Merged: 9 commits into master on Apr 18, 2024

Conversation

alanwaketan (Collaborator)

Summary:
This pull request integrates FlashAttention with SPMD. It works by creating a manual sharding region for the kernel: all inputs are wrapped with enable_manual_sharding and all outputs with disable_manual_sharding.
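
A minimal sketch of that wrapping pattern, assuming enable_manual_sharding/disable_manual_sharding live in torch_xla.distributed.spmd, take a tensor, a partition spec, and a mesh, and return a sharded wrapper exposing global_tensor; _flash_attention_kernel and _flash_attention_manual_sharding are placeholder names, and exact signatures may differ from the merged code:

import torch_xla.distributed.spmd as xs

def _flash_attention_manual_sharding(q, k, v, causal, partition_spec, mesh):
  # Remember the global (unsharded) shape before entering the manual region.
  full_shape = q.shape

  # Enter the manual-sharding region: the compiler hands the Pallas kernel
  # only the local shard of each input.
  q = xs.enable_manual_sharding(q, partition_spec, mesh=mesh).global_tensor
  k = xs.enable_manual_sharding(k, partition_spec, mesh=mesh).global_tensor
  v = xs.enable_manual_sharding(v, partition_spec, mesh=mesh).global_tensor

  # Placeholder for the actual FlashAttention Pallas kernel invocation.
  o = _flash_attention_kernel(q, k, v, causal)

  # Leave the manual-sharding region: re-annotate the output with the full
  # global shape so downstream ops see an SPMD-sharded tensor again.
  o = xs.disable_manual_sharding(o, partition_spec, full_shape, mesh=mesh).global_tensor
  return o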

Added a new test file because the original test file is not SPMD aware.

Test Plan:
PJRT_DEVICE=TPU python test/test_pallas_spmd.py

@alanwaketan alanwaketan self-assigned this Apr 17, 2024
@JackCaoG (Collaborator) left a comment

lgtm, minor comments

Comment on lines 14 to 19
if xr.device_type() == 'TPU':
  from torch_xla.experimental.custom_kernel import jax_import_guard
  # Grab the TPU for torch_xla before any jax import, so jax does not lock
  # the device first.
  jax_import_guard()
  import jax
  import jax.numpy as jnp
  from jax.experimental import pallas as pl

alanwaketan (Collaborator, Author)

Is python import global?

jax.config.update('jax_default_matmul_precision', jax.lax.Precision.HIGHEST)
from torch_xla.experimental.custom_kernel import flash_attention

xr.use_spmd()
Collaborator

nit, this should be called in the setup class since it is a one-time global config.

alanwaketan (Collaborator, Author)

I probably can call this in main as well. The setup class seems like overkill for this.
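
A hedged sketch of that alternative, with the one-time xr.use_spmd() call moved into the test module's entry point (the test class name below is illustrative):

import unittest

import torch_xla.runtime as xr


class PallasSpmdTest(unittest.TestCase):
  # SPMD-aware FlashAttention tests go here.
  ...


if __name__ == '__main__':
  # One-time global switch into SPMD mode for every test in this module.
  xr.use_spmd()
  unittest.main()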

@jonb377 (Collaborator) left a comment

Awesome stuff Jiewen!

  @unittest.skipIf(xr.device_type() != 'TPU' or tpu.version() < 3,
                   "This test only works on TPUv3+.")
  def test_flash_attention_spmd_data_parallel(self):
    jax.config.update('jax_default_matmul_precision', jax.lax.Precision.HIGHEST)
Collaborator

Does this impact the resulting kernel?

alanwaketan (Collaborator, Author)

Yea.

@@ -184,15 +185,29 @@ class FlashAttention(torch.autograd.Function):
   }
 
   @staticmethod
-  def forward(ctx, q, k, v, causal=False):
+  def forward(ctx, q, k, v, causal=False, sharding_spec=None, mesh=None):
Collaborator

nit: sharding_spec -> partition_spec?
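
For reference, a hedged usage sketch of the new keyword arguments from an SPMD caller, written with the suggested partition_spec name; the mesh construction, axis names, and tensor shapes below are illustrative assumptions rather than code from this PR:

import numpy as np
import torch
import torch_xla.core.xla_model as xm
import torch_xla.runtime as xr
import torch_xla.distributed.spmd as xs
from torch_xla.experimental.custom_kernel import flash_attention

xr.use_spmd()

# A 1D "data" mesh over all devices; shard q/k/v along the batch dimension.
num_devices = xr.global_runtime_device_count()
mesh = xs.Mesh(np.arange(num_devices), (num_devices, 1), ('data', 'model'))
partition_spec = ('data', None, None, None)  # [batch, num_heads, seq_len, head_dim]

q = torch.randn(4, 2, 128, 64).to(xm.xla_device())
k = torch.randn(4, 2, 128, 64).to(xm.xla_device())
v = torch.randn(4, 2, 128, 64).to(xm.xla_device())

o = flash_attention(q, k, v, causal=True, partition_spec=partition_spec, mesh=mesh)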

@alanwaketan (Collaborator, Author)

Thanks Jon and Jack for the reviews.

@alanwaketan alanwaketan merged commit 9f2b82d into master Apr 18, 2024
23 checks passed