
Mixtral: More correct MoE, lower loss #932

Merged: 2 commits merged into main from mixtral_better_loss on Dec 10, 2023
Conversation

@casper-hansen (Collaborator) commented Dec 10, 2023

Measured on 2x A100 with the default Mixtral config. Resolves #931. This PR applies the hint we got from Mistral; the resulting loss is lower, which suggests the implementation is now more correct.

Better loss

Main:

  • Step 1: 1.3638, Step 2: 1.2806

PR:

  • Step 1: 1.3389, Step 2: 1.2417

@casper-hansen changed the title from "More correct MoE, lower loss" to "Mixtral: More correct MoE, lower loss" on Dec 10, 2023
@winglian merged commit 86487c2 into main on Dec 10, 2023 (4 checks passed)
casper-hansen referenced this pull request in vllm-project/vllm on Dec 10, 2023
@winglian deleted the mixtral_better_loss branch on Dec 12, 2023
mkeoliya pushed a commit to mkeoliya/axolotl that referenced this pull request on Dec 15, 2023
Successfully merging this pull request may close these issues:

  • Investigate high loss of Mixtral (#931)