Invalid loss with BTLM 3B training #546

Closed

AlpinDale opened this issue Sep 10, 2023 · 1 comment

Labels: bug (Something isn't working)

Please check that this issue hasn't been reported before.

  • I searched previous Bug Reports and didn't find any similar reports.

Expected Behavior

Typically, a training run starts at some initial loss value that gradually decreases.

Current behaviour

Training BTLM 3B 8k base model has several issues, including:

  • Does not work with gradient_checkpointing set to true
  • Train loss stays at 0.0
  • Eval loss is nan

As a result of the first issue, memory usage at micro_batch_size=1 and a sequence length of 2048 is ~70 GB per GPU on 8x NVIDIA H100s. As a result of the second and third issues, training produces nothing usable.

P.S. Flash Attention itself works; I've tested both with and without it.

Steps to reproduce

  1. Use this config file (an illustrative sketch is included below).
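
The linked config is not reproduced in the issue body. Below is a minimal sketch of the setup described above, assuming standard axolotl config keys; the dataset path and several values are placeholders, not the reporter's actual settings.

```yaml
# Illustrative sketch only; not the actual config linked in the issue.
base_model: cerebras/btlm-3b-8k-base   # BTLM 3B 8k base model
trust_remote_code: true                # BTLM ships custom modeling code
sequence_len: 2048
micro_batch_size: 1
gradient_accumulation_steps: 1         # placeholder value
gradient_checkpointing: true           # reported as not working with BTLM
flash_attention: true                  # reporter tested both with and without
bf16: true
datasets:
  - path: your/dataset                 # placeholder
    type: completion
output_dir: ./btlm-3b-out
```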

Possible solution

No idea why this happens; opening this issue so it's brought to the maintainers' attention.

Which Operating Systems are you using?

  • Linux
  • macOS
  • Windows

Python Version

3.10.12

axolotl branch-commit

main/c1921c9acb66c2a8b6542584f62bb02bc543acbf

Acknowledgements

  • My issue title is concise, descriptive, and in title casing.
  • I have searched the existing issues to make sure this bug has not been reported yet.
  • I am using the latest version of axolotl.
  • I have provided enough information for the maintainers to reproduce and diagnose the issue.
AlpinDale added the bug label on Sep 10, 2023
winglian (Collaborator) commented:

Sample packing is not supported with BTLM yet. Flash Attention support was added for BTLM in #566, which should make supporting packing easier down the line.
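
If the linked config enables sample packing, a plausible interim workaround (an assumption based on this reply, not confirmed in the thread) is to turn packing off while keeping Flash Attention enabled:

```yaml
# Assumed workaround: disable sample packing for BTLM until it is supported.
sample_packing: false   # packing not yet supported for BTLM
flash_attention: true   # supported for BTLM as of #566
```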
