✨ Extend _l2norm for sparse input #114

mbuttner · 2023-06-27T22:17:36Z

Is your feature request related to a problem? Please describe.
I tried to compute a WNN integration on a MuData object, i.e. I executed the following steps:

mu.pp.l2norm(mdata)
mu.pp.neighbors(mdata)
mu.tl.umap(mdata)

However, my Python session keeps crashing at the l2norm step. Omitting this step, computing neighbors and a UMAP works fine. I looked into the mu.pp.l2norm function and isolated the function np.linalg.norm() in the _l2norm() function as the culprit. It crashes reliably for sparse matrices. Converting the data matrix into dense format fixes the issue. This PR extends the _l2norm function to sparse matrices.
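For illustration, a row-wise L2 normalization that never densifies the matrix could look roughly like the sketch below. It is not the code merged in this PR: the helper name `l2norm_rows_sparse` is hypothetical, and it assumes the input is a SciPy sparse matrix or something `csr_matrix` can wrap.

```python
import numpy as np
from scipy import sparse


def l2norm_rows_sparse(X):
    """Return a sparse matrix whose rows are X's rows scaled to unit L2 norm.

    Hypothetical helper for illustration; not part of muon.
    """
    X = sparse.csr_matrix(X, dtype=np.float64)
    # Row-wise sum of squares, computed from the stored entries only.
    sq_sums = np.asarray(X.multiply(X).sum(axis=1)).ravel()
    norms = np.sqrt(sq_sums)
    # All-zero rows get a scale of 0 instead of triggering a division by zero.
    scale = np.divide(1.0, norms, out=np.zeros_like(norms), where=norms > 0)
    # Left-multiplying by a diagonal matrix rescales each row and stays sparse.
    return (sparse.diags(scale) @ X).tocsr()


X = sparse.csr_matrix(np.array([[3.0, 4.0, 0.0], [0.0, 0.0, 0.0], [0.0, 5.0, 12.0]]))
normed = l2norm_rows_sparse(X)
```

Scaling through a diagonal matrix avoids both `np.linalg.norm` on a sparse object and any intermediate dense copy of the data.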

ilia-kats

Thanks, looks good to me. I'll merge as soon as you make the CI pass (I think black is complaining about the trailing whitespace in line 159)

mbuttner · 2023-06-28T20:26:06Z

Thanks! I did some more testing today and spotted different behavior in my testing environment, where the norm matrix is still sparse after dividing by the l2norm vector. I therefore added handling of infinite values for sparse matrices.

ilia-kats · 2023-06-28T20:38:20Z

muon/_core/preproc.py

+            i = i[isfin]
+            j = j[isfin]
+            val = val[isfin]
+            norm = csr_matrix((val, (i, j)), shape=X.shape)


I think for csr_matrix, csc_matrix, and coo_matrix it would probably be more efficient to do something like norm.data[~np.isfinite(norm.data)] = 0. Sparse matrices in AnnData are usually CSR or CSC (in fact, IO only has support for CSR and CSC implemented), so it would probably make sense to also add that special case.

That's a very good point! I simplified it in commit e7b7b12.
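A small sketch of the masking approach suggested above, on made-up data. The example matrix is an assumption for illustration only; the point is that for CSR, CSC, and COO matrices the `.data` attribute holds just the explicitly stored values, so the mask can be applied there directly.

```python
import numpy as np
from scipy import sparse

# A CSR matrix whose stored values include inf and nan, as could happen
# after dividing by a zero norm (example data, not taken from the PR).
values = np.array([1.0, np.inf, np.nan, 2.0])
rows = np.array([0, 0, 1, 2])
cols = np.array([0, 2, 1, 2])
norm = sparse.csr_matrix((values, (rows, cols)), shape=(3, 3))

# Zero out the non-finite stored values in place; the index arrays are
# untouched, so the sparsity structure stays the same.
norm.data[~np.isfinite(norm.data)] = 0
```

The affected entries remain stored as explicit zeros; whether to drop them afterwards is a separate decision, since doing so changes the sparsity structure.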

ilia-kats · 2023-06-28T20:48:22Z

muon/_core/preproc.py

    X.astype(norm.dtype, copy=False)
+    if sparse_X and not issparse(norm):
+        norm[~np.isfinite(norm)] = 0
+        X = X.toarray()


I don't think that will do what you're expecting it to do. The line below (X[:] = norm) overwrites the matrix in the AnnData object with the normalized values. Now you're replacing the local X variable with something else, so the normalized values will be lost. The question is: is there ever a situation where X is sparse and norm is not? At least if X is a csr_matrix or a csc_matrix, norm should always be a COO matrix.

Then you could also get rid of the if clause in line 162.

Thanks for the feedback. I removed the lines in commit e7b7b12
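The rebinding problem described in this thread can be reproduced in isolation. The sketch below uses two hypothetical functions (`normalize_rebinding` and `normalize_in_place`, neither of which exists in muon) and assumes that X and norm share the same sparsity pattern.

```python
import numpy as np
from scipy import sparse


def normalize_rebinding(X, norm):
    # X now names a brand-new dense array; the caller's sparse matrix
    # is no longer referenced by this local variable.
    X = X.toarray()
    X[:] = norm  # writes into the temporary dense copy only


def normalize_in_place(X, norm):
    # Writes into the value buffer of the matrix the caller holds.
    X.data[:] = norm.data


orig = sparse.csr_matrix(np.array([[3.0, 4.0], [0.0, 2.0]]))
norm = sparse.csr_matrix(np.array([[0.6, 0.8], [0.0, 1.0]]))

a = orig.copy()
normalize_rebinding(a, norm.toarray())
unchanged = np.array_equal(a.toarray(), orig.toarray())

b = orig.copy()
normalize_in_place(b, norm)
updated = np.allclose(b.toarray(), norm.toarray())
```

Here `unchanged` shows that the rebinding variant leaves the caller's matrix as it was, while `updated` shows that writing through `.data` reaches it.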

ilia-kats · 2023-06-28T20:54:08Z

muon/_core/preproc.py

    X.astype(norm.dtype, copy=False)
+    if sparse_X and not issparse(norm):
+        norm[~np.isfinite(norm)] = 0
+        X = X.toarray()
    X[:] = norm


I'm not sure at the moment, but can you test if this does what we expect it to do for sparse matrices? I.e. it doesn't change the sparsity structure? Otherwise it would perhaps make sense to do something like

if sparse_X and (isspmatrix_csc(X) or isspmatrix_csr(X) or isspmatrix_coo(X)):
    X.data[:] = norm.data[:]
else:
    X[:] = norm

(I don't know enough about the other sparse matrix types off the top of my head, but this should cover most use cases)

I had a weird case under Python 3.8 and scipy==1.10.0 where norm became dense while X was still sparse, and then X[:] = norm crashed my kernel. That's why I wanted to catch that case.

I'll test your suggestion.

I tested your suggestion and included it in the code with minimal adaptations (commit cd3f942).

Following up on the part where norm became dense and X was still sparse. I added a clause to convert norm into a csr_matrix in that case.
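One way to write the normalized values back without changing the sparsity structure is sketched below. The helper `write_back` is hypothetical and differs from the merged code: it assumes X is CSR or CSC, converts norm to the same format, and checks that the index arrays match before copying `.data`.

```python
import numpy as np
from scipy import sparse


def write_back(X, norm):
    """Copy the values of norm into X in place (hypothetical helper)."""
    if sparse.issparse(X) and X.format in ("csr", "csc"):
        # Bring norm into the same format as X. This also covers the case
        # where norm came back as COO or as a dense array.
        norm = sparse.csr_matrix(norm) if X.format == "csr" else sparse.csc_matrix(norm)
        # Canonical index order on both sides so the .data arrays line up.
        X.sort_indices()
        norm.sort_indices()
        same = np.array_equal(X.indptr, norm.indptr) and np.array_equal(X.indices, norm.indices)
        if not same:
            raise ValueError("sparsity patterns of X and norm differ")
        X.data[:] = norm.data
    else:
        X[:] = norm


X = sparse.csr_matrix(np.array([[3.0, 4.0, 0.0], [0.0, 0.0, 5.0]]))
indptr_before, indices_before = X.indptr.copy(), X.indices.copy()

row_norms = np.array([[5.0], [5.0]])
# Sparse times a broadcast dense column; the format of the result may
# differ between SciPy versions, which is why write_back converts it.
norm = X.multiply(1.0 / row_norms)
write_back(X, norm)
```

Raising on a pattern mismatch is a deliberate choice in this sketch: copying `.data` between matrices with different index arrays would silently assign values to the wrong positions.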

mbuttner · 2023-06-29T18:26:46Z

Please merge if you think it fits muon. I tested my version with sparse matrices and did not spot any issues.

ilia-kats · 2023-07-03T08:03:18Z

Thank you for the contribution.

Extend _l2norm for sparse input

5151d6a

Zethson requested a review from ilia-kats June 28, 2023 06:20

ilia-kats approved these changes Jun 28, 2023

mbuttner added 3 commits June 28, 2023 13:10

✨ implement l2norm for sparse matrix

dc793df

Fix black

4cf69fc

⚡ Add sparse check

522c31b

🐛 Bugfix

7a20ae6

ilia-kats reviewed Jun 28, 2023

mbuttner added 5 commits June 28, 2023 14:08

✨ convert coo_matrix

9f4923b

✨ simplify sparse matrix handling

e7b7b12

⚡ improve sparse handling

cd3f942

Fix black

042149c

remove dependency

8b8e0c1

ilia-kats merged commit d31ab16 into scverse:master Jul 3, 2023
1 check passed
