Something wrong with fixing ema #2343

Closed
MolchanovYA opened this issue Mar 2, 2021 · 2 comments
Labels
bug (Something isn't working), Stale

Comments

@MolchanovYA


## 🐛 Bug


I get an error when trying to resume training.

## To Reproduce (REQUIRED)

Input:
```
!python /content/yolov5/train.py --resume /content/drive/MyDrive/XRay/202102283/weights/last.pt
```

Output:
```
wandb: Run `wandb offline` to turn off syncing.

Traceback (most recent call last):
  File "train.py", line 532, in <module>
    train(hyp, opt, device, tb_writer, wandb)
  File "train.py", line 154, in train
    ema.ema.load_state_dict(ckpt['ema'].float().state_dict())
AttributeError: 'tuple' object has no attribute 'float'

wandb: Waiting for W&B process to finish, PID 8842
```

## Expected behavior
Training resumes from the checkpoint.


## Environment
Google Colab



## Additional context
Removing `.float()` from the failing line (train.py, line 154) fixes the bug.
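For anyone hitting the same traceback, here is a minimal sketch of a more tolerant way to read the `ema` entry. It is not the project's actual fix. The helper name `ema_state_dict` is hypothetical, and the layouts it accepts are assumptions: the thread only confirms that `ckpt['ema']` was a tuple in this checkpoint, not what the tuple contains.

```python
from collections.abc import Mapping


def ema_state_dict(entry):
    """Return a state dict from a checkpoint's 'ema' entry.

    Assumed layouts (not confirmed by this thread):
      * a module-like object exposing .state_dict() (and usually .float())
      * a tuple/list whose first item is such an object or a state dict
      * a plain mapping that already is a state dict
    """
    if isinstance(entry, (tuple, list)):
        if not entry:
            raise TypeError("empty 'ema' entry")
        entry = entry[0]  # assumption: the model or its weights come first
    if isinstance(entry, Mapping):
        return dict(entry)
    if hasattr(entry, "state_dict"):
        # Cast to float when the object supports it, as the original line does.
        model = entry.float() if hasattr(entry, "float") else entry
        return model.state_dict()
    raise TypeError(f"unsupported 'ema' entry: {type(entry).__name__}")
```

Under those assumptions, the failing line could become `ema.ema.load_state_dict(ema_state_dict(ckpt['ema']))`. If the tuple holds something else, the helper raises a `TypeError` instead of silently loading the wrong object.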
MolchanovYA added the bug label Mar 2, 2021
@glenn-jocher
Member

@MolchanovYA sorry, there have been recent PRs to improve EMA resume behavior. Everything should now work correctly when a run is started and resumed with the same version of the code. However, if you started and resumed your run with different versions of the code, this will likely lead to problems.

I would suggest restarting training from a new clone.
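One way to check whether a checkpoint was written by a different version of the code is to look at the type stored under each top-level key before resuming. The sketch below is only an illustration: `summarize_checkpoint` is a hypothetical helper, and the stand-in dict mimics the tuple-valued `ema` entry from the traceback rather than a real `last.pt`.

```python
def summarize_checkpoint(ckpt):
    """Map each top-level checkpoint key to the type name of its value."""
    return {key: type(value).__name__ for key, value in ckpt.items()}


# Stand-in checkpoint for illustration; real keys and values may differ.
fake_ckpt = {"epoch": 3, "ema": (object(), 10)}
print(summarize_checkpoint(fake_ckpt))  # {'epoch': 'int', 'ema': 'tuple'}

# With PyTorch installed and the YOLOv5 repo importable, the same helper
# could be applied to a real file (path is a placeholder):
#   import torch
#   ckpt = torch.load("last.pt", map_location="cpu")
#   print(summarize_checkpoint(ckpt))
```

If `ema` shows up as `tuple` while the current `train.py` calls `.float()` on it directly, the checkpoint and the code most likely come from different versions.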

@github-actions
Contributor

github-actions bot commented Apr 2, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
