Fast Hadamard Transform #1249

barronalex · 2024-07-03T03:00:03Z

Supports n = m*2^k where m is one of (1, 12, 20, 28). For example, Llama 3 70B has an MLP intermediate size of 28672 = 28*1024.
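To make the size rule concrete, here is a minimal sketch of how a length n could be checked against n = m*2^k. It is not the PR's code: `SUPPORTED_M` and `decompose_hadamard_size` are hypothetical names, and the sketch does not enforce any upper bound on 2^k.

```python
# Hypothetical helper, not MLX code: split n into m * 2**k with m in (1, 12, 20, 28).
SUPPORTED_M = (1, 12, 20, 28)


def decompose_hadamard_size(n):
    """Return (m, k) such that n == m * 2**k and m is supported, else None."""
    if n < 1:
        return None
    # 12, 20 and 28 have odd parts 3, 5 and 7, so at most one m can match.
    for m in sorted(SUPPORTED_M, reverse=True):
        if n % m != 0:
            continue
        q = n // m
        if q & (q - 1) == 0:  # q is a power of two (q >= 1 here)
            return m, q.bit_length() - 1
    return None


print(decompose_hadamard_size(28672))  # the 28*1024 example above
print(decompose_hadamard_size(8192))   # a plain power of two
print(decompose_hadamard_size(6))      # unsupported size
```

Under this rule a size such as 6 or 84 is rejected, because its odd part is not covered by any of the supported m values.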

Due to shared memory limits, we support 2^k <= 8192 for FP32 and 2^k <= 16384 for FP16/BF16 (both limits correspond to 32 KiB of input elements).

Planning to use this to enable low-bit, online quantization of the KV cache, similar to QuaRot/SpinQuant.

Benchmarks

We get close to full bandwidth for 2^k and about half bandwidth for m*2^k (since we do those in two passes). This is much faster than manually computing hadamard(N) @ x for non-trivial batch sizes.
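For readers comparing the two approaches, the sketch below contrasts the dense hadamard(N) @ x product with a butterfly-style fast transform for power-of-two N. It is plain Python for illustration only, not the Metal kernel from this PR; `sylvester_hadamard`, `dense_transform` and `fwht` are hypothetical names, and the Sylvester ordering of the matrix is an assumption.

```python
def sylvester_hadamard(n):
    """Dense n x n Hadamard matrix (Sylvester construction), n a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    H = [[1]]
    while len(H) < n:
        # Build [[H, H], [H, -H]] from the previous matrix.
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H


def dense_transform(x, scale=1.0):
    """Reference: scale * (H @ x), costing O(n^2) operations."""
    H = sylvester_hadamard(len(x))
    return [scale * sum(h * v for h, v in zip(row, x)) for row in H]


def fwht(x, scale=1.0):
    """Fast Walsh-Hadamard transform: same result in O(n log n) operations."""
    n = len(x)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    y = list(x)
    h = 1
    while h < n:
        # One butterfly stage: combine elements that are h apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = y[i], y[i + h]
                y[i], y[i + h] = a + b, a - b
        h *= 2
    return [scale * v for v in y]


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
print(fwht(x) == dense_transform(x))
```

The butterfly version touches each element log2(n) times instead of n times, which is why a fused kernel can approach memory bandwidth while the dense matmul cannot.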

awni · 2024-07-03T13:45:38Z

Very nice!!

From an API perspective, I'm wondering if this should live in the fast namespace?

The primitive does not have any transforms implemented (which is fine). I guess the question is really whether we intend to implement them eventually. If yes, then maybe it makes sense to keep it as is. But if not, I would consider putting it in the fast package for now and maybe adding a transformable fallback (hadamard(n) @ x) if it's not too tedious.

barronalex · 2024-07-03T19:56:59Z

Agreed! The transforms are pretty simple so I added vjp/jvp/vmap.
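One way to see why these transforms can be simple, at least for power-of-two sizes: the operation is linear, so the jvp is the same transform applied to the tangent, and the Sylvester Hadamard matrix is symmetric, so the vjp is the same transform applied to the cotangent. The sketch below checks that identity numerically. It is an illustration under those assumptions, not the PR's implementation; `fwht` and `hadamard_vjp` are hypothetical names, and the symmetry argument is not claimed here for the m = 12, 20, 28 base matrices.

```python
def fwht(x, scale=1.0):
    """Fast Walsh-Hadamard transform for power-of-two lengths."""
    n = len(x)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    y = list(x)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = y[i], y[i + h]
                y[i], y[i + h] = a + b, a - b
        h *= 2
    return [scale * v for v in y]


def hadamard_vjp(cotangent, scale=1.0):
    # Assumption: H is the symmetric Sylvester matrix, so (scale * H)^T == scale * H
    # and the vjp reuses the forward transform with the same scale.
    return fwht(cotangent, scale)


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


x = [1.0, -2.0, 3.0, 4.0]
g = [2.0, 0.0, -1.0, 5.0]  # cotangent
scale = 0.5

# The defining property of a vjp for a linear map A: <A x, g> == <x, A^T g>.
lhs = dot(fwht(x, scale), g)
rhs = dot(x, hadamard_vjp(g, scale))
print(lhs, rhs)
```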

angeloskath

Looks fantastic!

I left a few nitpicks; I think we can merge after that.

P.S.: Do we want a CPU implementation from the get-go, or will we simply add it later?

angeloskath · 2024-07-08T21:57:49Z

mlx/primitives.cpp

+  if (axes[0] == inputs[0].ndim() - 1) {
+    auto a = moveaxis(inputs[0], axes[0], 0, s);
+    auto b = hadamard_transform(a, scale_, s);
+    return {{moveaxis(b, 0, axes[0], s)}, axes};


No need to move it back; you can just return it with the batch axis at 0, i.e. return {{b}, {0}};

angeloskath · 2024-07-08T22:09:51Z

mlx/primitives.h

+  DEFINE_VMAP()
+  DEFINE_GRADS()
+  DEFINE_PRINT(Hadamard)
+  DEFINE_DEFAULT_IS_EQUIVALENT()


Unfortunately the default is_equivalent is incorrect since we also need to check the scales. In general this is best left undefined unless it is certainly correct, as it can cause errors that are quite hard to debug.

angeloskath · 2024-07-08T22:14:14Z

docs/src/python/ops.rst

@@ -72,6 +72,7 @@ Operations
   gather_qmm
   greater
   greater_equal
+   hadamard


Typo: this should be hadamard_transform.

angeloskath

Left a suggestion for the is_equivalent. Otherwise looks awesome!

angeloskath · 2024-07-09T22:02:24Z

mlx/primitives.cpp

@@ -3950,4 +3950,37 @@ bool View::is_equivalent(const Primitive& other) const {
  return (dtype_ == a_other.dtype_);
 }

+std::pair<std::vector<array>, std::vector<int>> Hadamard::vmap(


Suggested change

-std::pair<std::vector<array>, std::vector<int>> Hadamard::vmap(
+bool Hadamard::is_equivalent(const Primitive& other) const {
+  const Hadamard& h_other = static_cast<const Hadamard&>(other);
+  return scale_ == h_other.scale_;
+}
+std::pair<std::vector<array>, std::vector<int>> Hadamard::vmap(

Also needs the declaration in primitives.h of course.

barronalex · 2024-07-09

Thanks! Added this in the latest commit.

Alex Barron added 6 commits July 1, 2024 16:22

Working hadamard for powers of 2

c635260

working for m*2^k

81e5f18

add scale and check contiguity

05d1662

add size check

6bdfe7d

clean up

92f1fe4

fix test

7a1f7a7

barronalex requested review from angeloskath, awni and jagrit06 July 3, 2024 03:00

Alex Barron added 4 commits July 3, 2024 12:50

add grads + vmap

9149368

gpu only

bbe137b

skip on linux

d3c53d7

test typo

9d9ddc5

angeloskath requested changes Jul 8, 2024

Alex Barron added 2 commits July 9, 2024 14:48

add cpu impl

6d58dcc

remove gpu only tests

78402f9

angeloskath approved these changes Jul 9, 2024

fix linux build + add is_equivalent

94debdf

barronalex merged commit a3c2873 into main Jul 10, 2024
3 checks passed

barronalex deleted the ab-hadamard branch July 10, 2024 03:39
