masking compatible with fullgraph compile #91

Open · wants to merge 3 commits into master
Conversation

theAdamColton (Contributor)

This adds some slightly confusing masking code, but it improves speed by 3x by keeping the shapes of intermediate tensors static (non-dynamic). The masked_mean code is equivalent, up to floating-point precision, to the old code that used tensor indexing.

Previously, using LFQ with masking was not compatible with torch.compile with fullgraph=True or with dynamic=False. It did work with plain torch.compile, but the masked tensor indexing caused graph breaks.

I added an example that uses masked sequences to make sure it works properly.

I ran a benchmark using the masking example code on a 3090 GPU:

  • the previous masked LFQ implementation, using torch.compile(model, fullgraph=False, mode='max-autotune'), had an average model.forward time of 1.18 milliseconds
  • with this commit, using torch.compile(model, fullgraph=True, mode='max-autotune'), the average time is 0.40 milliseconds

The speedup might be worth the extra complexity in the code.
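
Roughly, the idea is the following (a simplified sketch, not the actual diff; static_mean stands in for the new masked_mean):

import torch

def indexed_mean(x, mask):
    # x[mask] has a data-dependent length, which forces dynamic shapes / graph breaks
    return x[mask].mean()

def static_mean(x, mask, eps = 1e-5):
    # same result up to floating-point precision, but every intermediate shape is static
    mask = mask.to(x.dtype)
    return (x * mask).sum() / mask.sum().clamp(min = eps)

x = torch.randn(4, 16)
mask = torch.rand(4, 16) > 0.5
assert torch.allclose(indexed_mean(x, mask), static_mean(x, mask), atol = 1e-5)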

@lucidrains (Owner)

ah yea, that does look a bit confusing, needs a tiny bit more work

do you think you can try fitting all the logic into one function, masked_mean, where if mask is None, it simply takes a regular .mean()?
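
something roughly like this is what I have in mind (just a sketch, names and dims are placeholders):

def masked_mean(t, mask = None, dim = 1, eps = 1e-5):
    # no mask -> plain mean over the sequence dimension
    if mask is None:
        return t.mean(dim = dim)

    mask = mask.to(t.dtype)
    # zero out masked positions, then divide by the number of kept positions
    num = (t * mask.unsqueeze(-1)).sum(dim = dim)
    den = mask.sum(dim = dim, keepdim = True).clamp(min = eps)
    return num / den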

@lucidrains (Owner)

we can reassess after your refactor

@lucidrains (Owner)

@theAdamColton have you tried the updated LFQ? curious how you got good results on the previous broken one

@theAdamColton (Contributor, Author)

With the previous LFQ, I set the entropy loss and commit loss to very low weights and it did actually work.

@theAdamColton (Contributor, Author)

I've also been experimenting with the entropy loss from MaskGIT; it does it slightly differently than the current LFQ code here. The one there seems to work pretty well.

@theAdamColton (Contributor, Author)

Also, this is a different issue, but here where the entropy is computed, I think it should maybe use F.log_softmax to compute the log probs directly from the distances, instead of taking the log of the probs to get the log probs.

@lucidrains (Owner)

Also, this is a different issue, but here where the entropy is computed, I think it should maybe use F.log_softmax to compute the log probs directly from the distances, instead of taking the log of the probs to get the log probs.

@theAdamColton how is that different? can you show me in code?

@theAdamColton (Contributor, Author)

@lucidrains
for example, instead of

prob = (-distance * inv_temperature).softmax(dim = -1)
per_sample_entropy = (-prob * log(prob)).sum(dim = -1).mean()  # log() applied to the already-softmaxed probs

this is what I mean:

logits = -distance * inv_temperature
prob = logits.softmax(dim = -1)
log_prob = F.log_softmax(logits, dim = -1)  # log probs computed directly from the logits
per_sample_entropy = (-prob * log_prob).sum(dim = -1).mean()

I don't know if it would make a difference, but it's what the MaskGIT code does. Using log_softmax might fix precision issues.

From the PyTorch log_softmax docs:
"While mathematically equivalent to log(softmax(x)), doing these two operations separately is slower and numerically unstable. This function uses an alternative formulation to compute the output and gradient correctly."

@lucidrains (Owner)

I think the numerical stability is accounted for by the epsilon in the log I have in the file, but do let me know otherwise
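
(the log helper referred to is the usual clamp-before-log pattern, roughly like the below; illustrative, not necessarily verbatim from the file)

def log(t, eps = 1e-20):
    # clamp away exact zeros so the log never returns -inf
    return t.clamp(min = eps).log()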

@lucidrains (Owner) commented Dec 9, 2023

anyways, I've put in my hours today, happy Saturday! See if you can get that mask to go into the masked mean fn and I'll review it again

lucidrains force-pushed the master branch 2 times, most recently from d9967be to 34b9e97 on May 10, 2024 at 14:52