
Adding support for wandb #1144

Merged: 7 commits merged on Feb 25, 2022


@manangoel99 (Contributor) commented Feb 22, 2022

This PR fixes issues #1030 and #1049 by adding support for the wandb (Weights & Biases) logger.
I've added a new command-line argument, --logger, which accepts "tensorboard" or "wandb" and raises an error for any other value.
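A minimal sketch of how a --logger flag limited to these two values could be declared, assuming Python's standard argparse module. Using `choices` is an assumption made for illustration: with it, argparse itself rejects any other value with a usage error, whereas the PR may validate the value elsewhere. The "tensorboard" default and the trailing `opts` positional are also assumptions, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical, trimmed-down training parser (a sketch, not tools/train.py)."""
    parser = argparse.ArgumentParser("yolox-train-sketch")
    parser.add_argument(
        "--logger",
        type=str,
        default="tensorboard",  # assumed default
        choices=["tensorboard", "wandb"],  # any other value makes argparse exit with a usage error
        help="logger used for training metrics",
    )
    # Assumed: everything after the known flags is collected as a flat list of
    # key/value strings, e.g. ["wandb-project", "my-project"].
    parser.add_argument("opts", nargs=argparse.REMAINDER)
    return parser
```

For example, `build_parser().parse_args(["--logger", "wandb", "wandb-project", "demo"])` yields `logger="wandb"` and `opts=["wandb-project", "demo"]`, while `--logger mlflow` stops with a usage error.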
Additional wandb arguments, such as project and entity, can be passed on the command line with the prefix wandb-. For example:

python tools/train.py -n yolox-s -d 8 -b 64 --fp16 -o [--cache] --logger wandb wandb-project <project name>

Replace yolox-s with yolox-m, yolox-l, or yolox-x to train the other model sizes.
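The wandb- prefix convention can be illustrated with a small helper that turns trailing key/value options into keyword arguments for `wandb.init`, which accepts `project` and `entity`. This is a hypothetical sketch (`extract_wandb_params` is not a name from the PR). It assumes the options arrive as a flat list of alternating keys and values, and it ignores keys without the prefix because those may be ordinary config overrides.

```python
from typing import Dict, List


def extract_wandb_params(opts: List[str], prefix: str = "wandb-") -> Dict[str, str]:
    """Map ["wandb-project", "demo", ...] to {"project": "demo", ...} (hypothetical helper)."""
    if len(opts) % 2 != 0:
        raise ValueError(f"expected key/value pairs, got {len(opts)} items: {opts}")
    params: Dict[str, str] = {}
    # Walk the list two items at a time: keys at even indices, values at odd ones.
    for key, value in zip(opts[0::2], opts[1::2]):
        if key.startswith(prefix):
            # Strip the prefix so "wandb-project" becomes the wandb.init keyword "project".
            params[key[len(prefix):]] = value
    return params
```

The resulting dict could then be forwarded as `wandb.init(**extract_wandb_params(opts))`. Raising on an odd-length list is a design choice in this sketch; silently dropping the unpaired item would hide typos on the command line.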

All metrics are logged to the W&B dashboard along with the hyperparameters. Model checkpoints are stored as wandb artifacts, with the best and latest models tagged accordingly.
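One way to store checkpoints as W&B artifacts and tag them is sketched below, assuming the standard `wandb` client (`wandb.Artifact`, `Artifact.add_file`, and `Run.log_artifact` with an `aliases` list). The artifact name, the metadata keys, and the rule of adding "best" only when the evaluation AP improves are assumptions for illustration, not necessarily what this PR does. `checkpoint_aliases` and `log_checkpoint` are hypothetical names.

```python
from typing import List


def checkpoint_aliases(ap: float, best_ap: float) -> List[str]:
    """Aliases for a newly saved checkpoint: always "latest", plus "best" if AP improved."""
    aliases = ["latest"]
    if ap > best_ap:
        aliases.append("best")
    return aliases


def log_checkpoint(run, ckpt_path: str, epoch: int, ap: float, best_ap: float) -> None:
    """Upload one checkpoint file as a W&B model artifact (sketch).

    `run` is assumed to be the object returned by wandb.init(); the artifact
    name below is a hypothetical choice.
    """
    import wandb  # imported here so checkpoint_aliases works without wandb installed

    artifact = wandb.Artifact(
        name=f"run_{run.id}_model",
        type="model",
        metadata={"epoch": epoch, "ap": ap},
    )
    artifact.add_file(ckpt_path)
    run.log_artifact(artifact, aliases=checkpoint_aliases(ap, best_ap))
```

Keeping the alias rule in a separate pure function makes the tagging logic easy to check without a W&B run.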

The example dashboard for this PR was generated with yolox-tiny.

CLAassistant commented Feb 22, 2022

CLA assistant check
All committers have signed the CLA.

@FateScript (Member)

Thanks for your contribution, @manangoel99. Please lint your code so that it passes the workflow.

@manangoel99 (Contributor, Author)


Fixed, thanks!

@FateScript (Member) left a comment

PTAL, @manangoel99. Please also use isort 4.3.21 to sort the imports in your code. Thanks a lot!

Review threads: docs/quick_run.md (resolved), yolox/core/trainer.py (outdated, resolved), yolox/utils/logger.py (resolved)
@FateScript (Member) left a comment

LGTM

@FateScript merged commit cc6bff6 into Megvii-BaseDetection:main on Feb 25, 2022
@samithaFHSS
I'm getting an error when using wandb with custom data: 'VOCDetection' object has no attribute 'cats'. Is there any solution for this?

@scottire
Hi @samithaFHSS, have you been able to resolve this issue? If not, could you please share code to reproduce it?

@Hyper-Devil
@scottire Sorry to bother you; here is my log.

2022-08-12 21:29:26.829 | INFO | yolox.core.trainer:before_train:136 - Model Summary: Params: 8.94M, Gflops: 26.76
2022-08-12 21:29:28.867 | INFO | yolox.core.trainer:before_train:155 - init prefetcher, this might take one minute or less...
2022-08-12 21:29:50.369 | ERROR | yolox.core.launch:_distributed_worker:147 - An error has been caught in function '_distributed_worker', process 'SpawnProcess-1' (78), thread 'MainThread' (140542829814976):
Traceback (most recent call last):

File "<string>", line 1, in <module>
File "/usr/local/lib/python3.9/multiprocessing/spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
│ │ └ 5
│ └ 8
└ <function _main at 0x7fd2ad313940>
File "/usr/local/lib/python3.9/multiprocessing/spawn.py", line 129, in _main
return self._bootstrap(parent_sentinel)
│ │ └ 5
│ └ <function BaseProcess._bootstrap at 0x7fd2ad41ca60>

File "/usr/local/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
│ └ <function BaseProcess.run at 0x7fd2ad41c0d0>

File "/usr/local/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └
│ │ │ └ (<function _distributed_worker at 0x7fd120eccee0>, 0, (<function main at 0x7fd11498f790>, 2, 2, 0, 'nccl', 'tcp://127.0.0.1:4...
│ │ └
│ └ <function _wrap at 0x7fd1212ad700>

File "/usr/local/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
│ │ └ (<function main at 0x7fd11498f790>, 2, 2, 0, 'nccl', 'tcp://127.0.0.1:40017', (╒═══════════════════╤═════════════════════════...
│ └ 0
└ <function _distributed_worker at 0x7fd120eccee0>

File "/root/YOLOX-docker/yolox/core/launch.py", line 147, in _distributed_worker
main_func(*args)
│ └ (╒═══════════════════╤═══════════════════════════════════════════════════════════════════════════════════════════════════════...
└ <function main at 0x7fd11498f790>

File "/root/YOLOX-docker/tools/train.py", line 118, in main
trainer.train()
│ └ <function Trainer.train at 0x7fd1143de4c0>
└ <yolox.core.trainer.Trainer object at 0x7fd1143f4b80>

File "/root/YOLOX-docker/yolox/core/trainer.py", line 74, in train
self.before_train()
│ └ <function Trainer.before_train at 0x7fd1143ded30>
└ <yolox.core.trainer.Trainer object at 0x7fd1143f4b80>

File "/root/YOLOX-docker/yolox/core/trainer.py", line 183, in before_train
self.wandb_logger = WandbLogger.initialize_wandb_logger(
│ │ └ <classmethod object at 0x7fd11f38ae50>
│ └ <class 'yolox.utils.logger.WandbLogger'>
└ <yolox.core.trainer.Trainer object at 0x7fd1143f4b80>

File "/root/YOLOX-docker/yolox/utils/logger.py", line 384, in initialize_wandb_logger
return cls(config=vars(exp), val_dataset=val_dataset, **wandb_params)
│ │ │ └ {'project': 'yolox-s'}
│ │ └ <yolox.data.datasets.voc.VOCDetection object at 0x7fd146cec5e0>
│ └ ╒═══════════════════╤════════════════════════════════════════════════════════════════════════════════════════════════════════...
└ <class 'yolox.utils.logger.WandbLogger'>

File "/root/YOLOX-docker/yolox/utils/logger.py", line 206, in __init__
self.cats = val_dataset.cats
│ └ <yolox.data.datasets.voc.VOCDetection object at 0x7fd146cec5e0>
└ <yolox.utils.logger.WandbLogger object at 0x7fd146f0ddc0>

AttributeError: 'VOCDetection' object has no attribute 'cats'

@manangoel99 (Contributor, Author)

Hi @samithaFHSS! This error occurs because the logger currently supports only COCO datasets, not VOC datasets. We will try to add VOC support as soon as possible.
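A defensive category lookup illustrates one possible workaround: the traceback fails on `val_dataset.cats`, which COCO-style datasets provide and `VOCDetection` does not. The sketch below is hypothetical (`category_map` is not part of YOLOX). It assumes `cats` is a list of dicts with "id" and "name" keys and that a VOC-style dataset exposes its class names as a `_classes` sequence indexed from 0; neither assumption is confirmed in this thread.

```python
from typing import Any, Dict


def category_map(dataset: Any) -> Dict[int, str]:
    """Return {category_id: class_name} for COCO- or VOC-style datasets (sketch)."""
    cats = getattr(dataset, "cats", None)
    if cats is not None:
        # Assumed COCO-style: [{"id": 1, "name": "person"}, ...]
        return {cat["id"]: cat["name"] for cat in cats}
    classes = getattr(dataset, "_classes", None)
    if classes is not None:
        # Assumed VOC-style: ("aeroplane", "bicycle", ...), ids taken from the position.
        return {index: name for index, name in enumerate(classes)}
    raise AttributeError(
        f"{type(dataset).__name__} has neither 'cats' nor '_classes'; "
        "cannot map category ids to names"
    )
```

Failing with an explicit message when neither attribute exists keeps the error as clear as the original AttributeError while naming both supported layouts.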

Githubinme pushed a commit to Githubinme/YOLOX that referenced this pull request Jun 6, 2023

6 participants