❓ Alternatives
No response
📝 Additional Context
No response
Acknowledgements
My issue title is concise, descriptive, and in title casing.
I have searched the existing issues to make sure this feature has not been requested yet.
I have provided enough information for the maintainers to understand and evaluate this request.
🔖 Feature description
The axolotl implementation of the Mixtral router is not aligned with the MegaBlocks implementation.
The current implementation:
https://github.com/OpenAccess-AI-Collective/axolotl/blob/68b227a7d8045d0f428d7ca3b9750f837d03611f/src/axolotl/models/mixtral/modeling_moe_mistral.py#L223-L232
Mistral also commented on this:
✔️ Solution
Investigate how we can adopt the most correct solution for the router. One way to test this is to measure the initial loss. For reference, back when I implemented sliding windows for Mistral, the initial loss dropped from 9.98 on main to 1.9 with the PR.
Measure loss on short and long context data, e.g. use casperhansen/longalpaca_1k_test with alpaca format.
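To make the initial-loss check above concrete, here is a minimal sketch of how the per-token loss could be averaged separately for short and long samples. It is not axolotl code: the function names, the `(logits, targets)` sample format, and the length threshold are all hypothetical, and in practice the logits would come from a forward pass of the model on the tokenized dataset. As a reference point, a model that predicts uniformly over a 32,000-token vocabulary (an assumption about the tokenizer) would score about 10.37, so an initial loss close to that suggests something in the model path is effectively broken.

```python
import math


def token_cross_entropy(logits, target):
    """Cross-entropy (in nats) of one next-token prediction from raw logits."""
    # Numerically stable log-sum-exp.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def mean_loss_by_bucket(samples, long_threshold):
    """Average per-token loss separately for short and long sequences.

    samples: list of (logits_per_position, target_ids) pairs.
             This format is hypothetical, chosen only for illustration.
    long_threshold: sequences with at least this many targets count as "long".
    """
    totals = {"short": [0.0, 0], "long": [0.0, 0]}
    for logits_seq, targets in samples:
        bucket = "long" if len(targets) >= long_threshold else "short"
        for logits, target in zip(logits_seq, targets):
            totals[bucket][0] += token_cross_entropy(logits, target)
            totals[bucket][1] += 1
    return {
        name: (total / count if count else float("nan"))
        for name, (total, count) in totals.items()
    }


def uniform_baseline(vocab_size):
    """Loss of a model that assigns equal probability to every token."""
    return math.log(vocab_size)
```

Comparing the two bucket averages on main and on a candidate branch would show whether a router change helps or hurts long-context samples specifically, rather than only moving the overall average.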