CPU mx.linalg.cholesky_inverse and mx.linalg.tri_inv #1307
Conversation
Looks perfect except for the `Device::cpu` hardcoding.
mlx/linalg.cpp (Outdated)

```cpp
      "matrices.");
}

array L_inv = tri_inv(L, upper, Device::cpu);
```
I would probably still pass `s` instead of `Device::cpu` here. I understand we are leaving performance on the table by doing the matmul on the CPU, but generally all ops dispatch on the same exact stream.
Makes sense! Good motivation to have some more of the linear algebra ops on GPU in the future.
Does it make sense to add a `solve_triangular` op instead? Would that be usable in place of `cholesky_inverse` and `tri_inv` for the use case you are going for?
You could definitely implement the above with just `solve_triangular`:

```python
import numpy as np
from scipy.linalg import solve_triangular
from scipy.linalg.lapack import strtri

N = 1024
A = np.random.normal(size=(N, N)).astype(np.float32)
A = A @ A.T  # symmetric positive definite
L = np.linalg.cholesky(A)

L_inv_ref = np.linalg.inv(L)
L_inv_i, _ = strtri(L, 1)  # LAPACK triangular inverse, lower=1
L_inv_t = solve_triangular(L, np.eye(N), lower=True)

np.testing.assert_allclose(L_inv_i, L_inv_ref, atol=1e-4)
np.testing.assert_allclose(L_inv_t, L_inv_ref, atol=1e-4)

# time_fn is a simple timing helper (definition not shown here)
time_fn(strtri, L, 1)
time_fn(solve_triangular, L, np.eye(N), lower=True)
time_fn(np.linalg.inv, L)
```

Output:
Do you think that justifies having a separate op? Either way.
Right, I meant more like: could you replace explicitly computing `L_inv` (which may not be so numerically stable) with the use of `solve_triangular`? But I guess it depends on what you do with it. If all we want is to apply the inverse to some right-hand side, a triangular solve is enough; but if we need the full inverse, then we need it. If we keep this API, can we make the naming consistent?
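For illustration, here is a small NumPy/SciPy sketch of the trade-off discussed above (not part of this PR): applying `L^{-1}` to a right-hand side via `solve_triangular` avoids ever materializing the inverse, which is generally the more numerically stable route; the two routes agree when the factor is well-conditioned.

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(0)
N = 64
A = rng.normal(size=(N, N))
A = A @ A.T + N * np.eye(N)  # well-conditioned SPD matrix
L = np.linalg.cholesky(A)
b = rng.normal(size=N)

# Route 1: explicitly invert L, then multiply (what an explicit L_inv enables).
x_inv = np.linalg.inv(L) @ b

# Route 2: triangular solve; no inverse is ever materialized.
x_solve = solve_triangular(L, b, lower=True)

assert np.allclose(x_inv, x_solve)
```

The solve route only helps when the inverse is consumed by a matvec or matmul; if the full inverse matrix itself is required downstream (as in the GPTQ case below), an explicit triangular inversion is still needed.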
I played around with the GPTQ use case and I don't think a triangular solve is sufficient, unfortunately. Definitely agree on the naming consistency; I think it's nice to have both.
Sounds good to me!
Adds `mx.linalg.cholesky_inverse` with identical functionality to `torch.cholesky_inverse`. It uses LAPACK's optimized `strtri` routine for inverting triangular matrices, which makes it ~2x faster than `mlx.linalg.inv` with less than half the memory usage. E.g. for an N by N matrix with `N = 8192`: before, 4.59 sec; after, 2.45 sec. This is used by GPTQ, so it will be nice to have.
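For reference, the semantics can be sketched in NumPy/SciPy (this is an illustration of what the op computes, not the MLX implementation): given the lower Cholesky factor `L` of an SPD matrix `A = L @ L.T`, `cholesky_inverse(L)` returns `A^{-1}`, which can be assembled from the triangular inverse as `L^{-T} L^{-1}`.

```python
import numpy as np
from scipy.linalg.lapack import strtri

rng = np.random.default_rng(0)
N = 16
A = rng.normal(size=(N, N))
A = A @ A.T + N * np.eye(N)  # well-conditioned SPD matrix
L = np.linalg.cholesky(A)

# Invert the lower-triangular factor with LAPACK's trtri (float32 variant).
L_inv, info = strtri(L.astype(np.float32), 1)  # lower=1
assert info == 0

# A^{-1} = (L @ L.T)^{-1} = L^{-T} @ L^{-1}
A_inv = L_inv.T @ L_inv
assert np.allclose(A_inv, np.linalg.inv(A), atol=1e-3)
```

Inverting only the triangular factor touches half the matrix, which is where the speed and memory savings over a general `inv` come from.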