feat: adjust attn weight loading logic #1975

drbh · 2024-05-29T15:08:57Z

This PR updates load_attention to check the model type first and load that model's specific attention weights when it has them. Additionally, TensorParallelColumnLinear.load_multi was previously called in two separate cases; this PR reduces that to a single path.
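
For readers unfamiliar with the loader, the control flow described above can be sketched roughly as follows. This is an illustrative stand-in, not the code from the diff: `plan_attention_load`, the `fused_names` table, and the tensor names in it are assumptions made for the example, and the function returns a description of what would be loaded rather than calling `TensorParallelColumnLinear`.

```python
from typing import List, Tuple


def plan_attention_load(model_type: str, prefix: str) -> Tuple[str, List[str]]:
    """Hypothetical helper: describe which loader path would be taken.

    Returns (method_name, tensor_prefixes) instead of loading real weights.
    """
    # Assumed mapping of model types to fused QKV tensor names (illustrative only).
    fused_names = {"phi3": "qkv_proj", "baichuan": "W_pack"}

    # Model-specific attention is checked first.
    if model_type in fused_names:
        return ("load_qkv", [f"{prefix}.{fused_names[model_type]}"])

    # Everything else falls through to one default path over separate q/k/v tensors.
    return (
        "load_multi",
        [f"{prefix}.{name}" for name in ("q_proj", "k_proj", "v_proj")],
    )
```

For example, `plan_attention_load("phi3", "model.layers.0.self_attn")` selects the fused path, while any model type missing from the table reaches the single default branch.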

Narsil

LGTM

feat: adjust attn weight loading logic

3cf4354

drbh mentioned this pull request May 29, 2024

Cannot load microsoft/Phi-3-medium and microsoft/Phi-3-small with TGI-2.0.4 #1974

Closed

4 tasks

Narsil approved these changes May 29, 2024


drbh merged commit cbced7f into main May 29, 2024
6 of 8 checks passed

drbh deleted the simplify-llama-attn-load branch May 29, 2024 16:42
