OSError: ./data/weneed/mask_r50/epoch_12.pth is not a checkpoint file #19

hhhhhhxl opened this issue Jul 9, 2021 · 2 comments


hhhhhhxl commented Jul 9, 2021

I followed the steps and tried to train with 2 GPUs and got this error.
./tools/dist_train.sh ./configs/bags/gs_mask_rcnn_r50_fpn_1x_lvis.py 2


Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.


[07/09 11:22:20] root WARNING: The model and loaded state dict do not match exactly

unexpected key in source state_dict: fc.weight, fc.bias

Train fc_cls only.
--Dist-train--IS:False--ISout:False
Dist-train --- Not using image sampling.
Train fc_cls only.
--Dist-train--IS:False--ISout:False
Dist-train --- Not using image sampling.
[07/09 11:22:39] mmcv.runner.runner INFO: load checkpoint from ./data/weneed/mask_r50/epoch_12.pth
Traceback (most recent call last):
  File "./tools/train.py", line 169, in <module>
    main()
  File "./tools/train.py", line 165, in main
    logger=logger)
  File "/home/sy/Desktop/xlhuang/BalancedGroupSoftmax-master/mmdet/apis/train.py", line 58, in train_detector
    _dist_train(model, dataset, cfg, validate=validate)
  File "/home/sy/Desktop/xlhuang/BalancedGroupSoftmax-master/mmdet/apis/train.py", line 204, in _dist_train
    runner.load_checkpoint(cfg.load_from)
  File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/site-packages/mmcv/runner/runner.py", line 234, in load_checkpoint
    self.logger)
  File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/site-packages/mmcv/runner/checkpoint.py", line 171, in load_checkpoint
    raise IOError('{} is not a checkpoint file'.format(filename))
OSError: ./data/weneed/mask_r50/epoch_12.pth is not a checkpoint file
[07/09 11:22:39] mmcv.runner.runner INFO: load checkpoint from ./data/weneed/mask_r50/epoch_12.pth
Traceback (most recent call last):
  File "./tools/train.py", line 169, in <module>
    main()
  File "./tools/train.py", line 165, in main
    logger=logger)
  File "/home/sy/Desktop/xlhuang/BalancedGroupSoftmax-master/mmdet/apis/train.py", line 58, in train_detector
    _dist_train(model, dataset, cfg, validate=validate)
  File "/home/sy/Desktop/xlhuang/BalancedGroupSoftmax-master/mmdet/apis/train.py", line 204, in _dist_train
    runner.load_checkpoint(cfg.load_from)
  File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/site-packages/mmcv/runner/runner.py", line 234, in load_checkpoint
    self.logger)
  File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/site-packages/mmcv/runner/checkpoint.py", line 171, in load_checkpoint
    raise IOError('{} is not a checkpoint file'.format(filename))
OSError: ./data/weneed/mask_r50/epoch_12.pth is not a checkpoint file
Traceback (most recent call last):
  File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/distributed/launch.py", line 253, in <module>
    main()
  File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/distributed/launch.py", line 249, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/sy/anaconda3/envs/mmdet/bin/python', '-u', './tools/train.py', '--local_rank=1', './configs/bags/gs_mask_rcnn_r50_fpn_1x_lvis.py', '--launcher', 'pytorch']' returned non-zero exit status 1.


hhhhhhxl commented Jul 9, 2021

It seems the file './data/weneed/mask_r50/epoch_12.pth' is missing.
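
For anyone hitting the same error: as far as I can tell, the mmcv version in the traceback raises "is not a checkpoint file" when the path given in cfg.load_from is not an existing file, so the message can simply mean "file not found". Here is a minimal sketch for checking the path before launching training. The helper name diagnose_checkpoint is my own and is not part of mmcv or this repo; it only uses the standard library.

```python
import os


def diagnose_checkpoint(path):
    """Return a short diagnosis for a checkpoint path (hypothetical helper)."""
    if os.path.isfile(path):
        return "ok"
    if os.path.isdir(path):
        return "is a directory"
    # An empty dirname means the path is relative to the current directory.
    parent = os.path.dirname(path) or "."
    if os.path.isdir(parent):
        return "file missing"
    return "parent directory missing"


if __name__ == "__main__":
    # Path taken from the log above; run this from the repo root,
    # because the path is relative to the working directory.
    ckpt = "./data/weneed/mask_r50/epoch_12.pth"
    print(os.path.abspath(ckpt), "->", diagnose_checkpoint(ckpt))
```

If this reports "file missing" or "parent directory missing", the checkpoint has to be placed at that path (or load_from in the config pointed at wherever the file actually is) before dist_train.sh will get past this step.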


xiaohe6 commented Jun 24, 2022

Hello, I can't download the ./data/download_models/faster_rcnn_r50_fpn_2x_20181010-443129e1.pth file right now. May I ask whether you managed to download it?
