
My running outputs remain the same for a long time #53

Closed
hosea7456 opened this issue Jul 21, 2021 · 6 comments

Comments

@hosea7456

Hi, when I run the code, the output stays the same for a long time and training doesn't continue.

2021-07-21 12:24:37 | INFO | yolox.core.trainer:130 - Model Summary: Params: 99.00M, Gflops: 281.52
2021-07-21 12:24:37 | INFO | apex.amp.frontend:328 - Selected optimization level O1: Insert automatic casts around Pytorch functions and Tensor methods.
2021-07-21 12:24:37 | INFO | apex.amp.frontend:329 - Defaults for this optimization level are:
2021-07-21 12:24:37 | INFO | apex.amp.frontend:331 - enabled : True
2021-07-21 12:24:37 | INFO | apex.amp.frontend:331 - opt_level : O1
2021-07-21 12:24:37 | INFO | apex.amp.frontend:331 - cast_model_type : None
2021-07-21 12:24:37 | INFO | apex.amp.frontend:331 - patch_torch_functions : True
2021-07-21 12:24:37 | INFO | apex.amp.frontend:331 - keep_batchnorm_fp32 : None
2021-07-21 12:24:37 | INFO | apex.amp.frontend:331 - master_weights : None
2021-07-21 12:24:37 | INFO | apex.amp.frontend:331 - loss_scale : dynamic
2021-07-21 12:24:37 | INFO | apex.amp.frontend:336 - Processing user overrides (additional kwargs that are not None)...
2021-07-21 12:24:37 | INFO | apex.amp.frontend:354 - After processing overrides, optimization options are:
2021-07-21 12:24:37 | INFO | apex.amp.frontend:356 - enabled : True
2021-07-21 12:24:37 | INFO | apex.amp.frontend:356 - opt_level : O1
2021-07-21 12:24:37 | INFO | apex.amp.frontend:356 - cast_model_type : None
2021-07-21 12:24:37 | INFO | apex.amp.frontend:356 - patch_torch_functions : True
2021-07-21 12:24:37 | INFO | apex.amp.frontend:356 - keep_batchnorm_fp32 : None
2021-07-21 12:24:37 | INFO | apex.amp.frontend:356 - master_weights : None
2021-07-21 12:24:37 | INFO | apex.amp.frontend:356 - loss_scale : dynamic
2021-07-21 12:24:37 | INFO | apex.amp.scaler:69 - Warning: multi_tensor_applier fused unscale kernel is unavailable, possibly because apex was installed without --cuda_ext --cpp_ext. Using Python fallback. Original ImportError was: ModuleNotFoundError("No module named 'amp_C'")
2021-07-21 12:24:37 | INFO | yolox.core.trainer:283 - loading checkpoint for fine tuning
2021-07-21 12:24:38 | WARNING | yolox.utils.checkpoint:26 - Shape of head.cls_preds.0.weight in checkpoint is torch.Size([80, 320, 1, 1]), while shape of head.cls_preds.0.weight in model is torch.Size([3, 320, 1, 1]).
2021-07-21 12:24:38 | WARNING | yolox.utils.checkpoint:26 - Shape of head.cls_preds.0.bias in checkpoint is torch.Size([80]), while shape of head.cls_preds.0.bias in model is torch.Size([3]).
2021-07-21 12:24:38 | WARNING | yolox.utils.checkpoint:26 - Shape of head.cls_preds.1.weight in checkpoint is torch.Size([80, 320, 1, 1]), while shape of head.cls_preds.1.weight in model is torch.Size([3, 320, 1, 1]).
2021-07-21 12:24:38 | WARNING | yolox.utils.checkpoint:26 - Shape of head.cls_preds.1.bias in checkpoint is torch.Size([80]), while shape of head.cls_preds.1.bias in model is torch.Size([3]).
2021-07-21 12:24:38 | WARNING | yolox.utils.checkpoint:26 - Shape of head.cls_preds.2.weight in checkpoint is torch.Size([80, 320, 1, 1]), while shape of head.cls_preds.2.weight in model is torch.Size([3, 320, 1, 1]).
2021-07-21 12:24:38 | WARNING | yolox.utils.checkpoint:26 - Shape of head.cls_preds.2.bias in checkpoint is torch.Size([80]), while shape of head.cls_preds.2.bias in model is torch.Size([3]).
2021-07-21 12:24:38 | INFO | yolox.data.datasets.coco:44 - loading annotations into memory...
2021-07-21 12:24:38 | INFO | yolox.data.datasets.coco:44 - Done (t=0.28s)
2021-07-21 12:24:38 | INFO | pycocotools.coco:92 - creating index...
2021-07-21 12:24:38 | INFO | pycocotools.coco:92 - index created!
2021-07-21 12:24:38 | INFO | yolox.core.trainer:149 - init prefetcher, this might take one minute or less...

What's the problem?

@Joker316701882
Member

How long does it remain here?

@hosea7456
Author

@Joker316701882
More than half an hour, then I closed it. It didn't print any error or warning, but it kept occupying GPU memory.

@Joker316701882
Member

self.data_num_workers = 4

Please try to set this value to 0.
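For reference, a minimal sketch of a custom exp file with that change (the class layout is illustrative; num_classes = 3 just matches the 3-class head in the shape warnings above). Setting data_num_workers to 0 makes the DataLoader load data in the main process, which rules out hangs in worker processes during prefetcher init:

from yolox.exp import Exp as MyExp

class Exp(MyExp):
    def __init__(self):
        super(Exp, self).__init__()
        self.num_classes = 3       # matches the 3-class head in the log above
        self.data_num_workers = 0  # was 4; 0 disables DataLoader worker processes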

@Joker316701882
Member

You may also need to try using a single GPU:
-d 1 -b 8
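For reference, a single-GPU launch would look roughly like this (the exp file and checkpoint paths are placeholders for your own):

python tools/train.py -f exps/example/custom/your_exp.py -d 1 -b 8 -c yolox_x.pth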

@hosea7456
Author

@Joker316701882
Yeah, I have already set it to 0. I also tried a single GPU with -d 1 -b 2, but it still doesn't work.

@GOATmessi7
Member

See #103, please pull the latest updates and retry.
