Enable quantization of tied embeddings #1703
Merged
Tied embeddings are usually implemented as subclasses of `torch.nn.Embedding` that add the custom logic for sharing the same weight matrix between the embedding layer and the LM head. Before this PR, the quantization modifier could not find tied embedding modules because their type is not exactly `torch.nn.Embedding`, but rather something custom defined by the LLM implementation. Since all of these modules are still subclasses of `torch.nn.Embedding`, this PR uses that fact to detect them during quantization.
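
A minimal sketch of the distinction this relies on, with a hypothetical `TiedEmbedding` class standing in for a model's custom implementation (real implementations vary by model): an exact type check misses the subclass, while an `isinstance` check finds it.

```python
import torch

# Hypothetical tied-embedding module for illustration: it subclasses
# torch.nn.Embedding and reuses the embedding matrix for the LM head.
class TiedEmbedding(torch.nn.Embedding):
    def as_lm_head(self, hidden: torch.Tensor) -> torch.Tensor:
        # Project hidden states with the shared embedding weight.
        return hidden @ self.weight.t()

module = TiedEmbedding(num_embeddings=32, embedding_dim=8)

# Exact type matching does not recognize the custom subclass...
print(type(module) is torch.nn.Embedding)      # False
# ...while a subclass-aware check detects it.
print(isinstance(module, torch.nn.Embedding))  # True
```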