Please read & provide the following #5265

Open
skylark-joe opened this issue Apr 19, 2024 · 2 comments
Comments

@skylark-joe

Hi, I trained the model for 3000 iterations, only to find the evaluation results are all zero. During training there is no error, except a warning saying:
Category ids in annotations are not in [1, #categories]! We'll apply a mapping for you.
Does it have something to do with the argument config.eval_only? I did not set that argument, so it should take the default.
Any help would be appreciated.

Instructions To Reproduce the Issue:

  1. Full runnable code or full changes you made:
    I followed the tutorial, loading my dataset using register_coco_json(), then started training (a registration sketch is included after the config below).
    Here is my .yaml file; Base-RCNN-FPN.yaml is the original one in the configs directory.

_BASE_: "../Base-RCNN-FPN.yaml"
MODEL:
  WEIGHTS: ""
  MASK_ON: True
  RESNETS:
    DEPTH: 50
  ROI_HEADS:
    NUM_CLASSES: 6
DATASETS:
  TRAIN: ("steel_train",) # ("coco_2017_train",)
  TEST: ("steel_val",) # ("coco_2017_val",)
DATALOADER:
  NUM_WORKERS: 8
SOLVER:
  STEPS: () # (210000, 250000)
  MAX_ITER: 270000
  IMS_PER_BATCH: 16
  BASE_LR: 0.001 # 0.02
  MAX_ITER: 90000
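
For reference, here is a minimal sketch of how the dataset registration could look. This is only an illustrative assumption, not the exact code from the issue: the image-root paths are placeholders (only the steel_val.json path appears in the logs below), and it uses the standard register_coco_instances API rather than the register_coco_json name mentioned above.

from detectron2.data.datasets import register_coco_instances

# Register the custom steel datasets in COCO format:
# register_coco_instances(name, metadata, json_file, image_root)
register_coco_instances(
    "steel_train", {},
    "../datasets/steel/annotations/steel_train.json",  # placeholder path
    "../datasets/steel/images/train",                   # placeholder image root
)
register_coco_instances(
    "steel_val", {},
    "../datasets/steel/annotations/steel_val.json",
    "../datasets/steel/images/val",                      # placeholder image root
)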

  2. In the command line, I run
    python plain_train_net.py --config-file ../configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml
    and it seems to run successfully.

  3. Full logs or other relevant observations:
    [04/19 04:37:58] d2.data.datasets.coco INFO: Loaded 360 images in COCO format from ../datasets/steel/annotations/steel_val.json
    [04/19 04:37:58] d2.data.build INFO: Distribution of instances among all 6 categories:
    | category      | #instances   | category      | #instances   | category   | #instances   |
    |:-------------:|:-------------|:-------------:|:-------------|:----------:|:-------------|
    | crazing       | 154          | inclusion     | 195          | patches    | 170          |
    | pitted_surf.. | 82           | rolled-in_s.. | 132          | scratches  | 102          |
    |               |              |               |              |            |              |
    | total         | 835          |               |              |            |              |
    [04/19 04:37:58] d2.data.dataset_mapper INFO: [DatasetMapper] Augmentations used in inference: [ResizeShortestEdge(short_edge_length=(800, 800), max_size=1333, sample_style='choice')]
    [04/19 04:37:58] d2.data.common INFO: Serializing the dataset using: <class 'detectron2.data.common._TorchSerializedList'>
    [04/19 04:37:58] d2.data.common INFO: Serializing 360 elements to byte tensors and concatenating them all ...
    [04/19 04:37:58] d2.data.common INFO: Serialized dataset takes 0.19 MiB
    [04/19 04:37:58] d2.evaluation.evaluator INFO: Start inference on 360 batches
    [04/19 04:38:00] d2.evaluation.evaluator INFO: Inference done 11/360. Dataloading: 0.0006 s/iter. Inference: 0.0690 s/iter. Eval: 0.0027 s/iter. Total: 0.0723 s/iter. ETA=0:00:25
    [04/19 04:38:05] d2.evaluation.evaluator INFO: Inference done 80/360. Dataloading: 0.0011 s/iter. Inference: 0.0684 s/iter. Eval: 0.0030 s/iter. Total: 0.0725 s/iter. ETA=0:00:20
    [04/19 04:38:10] d2.evaluation.evaluator INFO: Inference done 151/360. Dataloading: 0.0012 s/iter. Inference: 0.0676 s/iter. Eval: 0.0028 s/iter. Total: 0.0717 s/iter. ETA=0:00:14
    [04/19 04:38:15] d2.evaluation.evaluator INFO: Inference done 221/360. Dataloading: 0.0012 s/iter. Inference: 0.0679 s/iter. Eval: 0.0028 s/iter. Total: 0.0719 s/iter. ETA=0:00:09
    [04/19 04:38:20] d2.evaluation.evaluator INFO: Inference done 290/360. Dataloading: 0.0012 s/iter. Inference: 0.0681 s/iter. Eval: 0.0028 s/iter. Total: 0.0721 s/iter. ETA=0:00:05
    [04/19 04:38:25] d2.evaluation.evaluator INFO: Total inference time: 0:00:25.371791 (0.071470 s / iter per device, on 1 devices)
    [04/19 04:38:25] d2.evaluation.evaluator INFO: Total inference pure compute time: 0:00:23 (0.067332 s / iter per device, on 1 devices)
    [04/19 04:38:25] d2.evaluation.coco_evaluation INFO: Preparing results for COCO format ...
    [04/19 04:38:25] d2.evaluation.coco_evaluation INFO: Saving results to ./output/inference/steel_val/coco_instances_results.json
    [04/19 04:38:25] d2.evaluation.coco_evaluation INFO: Evaluating predictions with unofficial COCO API...
    [04/19 04:38:25] d2.evaluation.fast_eval_api INFO: Evaluate annotation type bbox
    [04/19 04:38:25] d2.evaluation.fast_eval_api INFO: COCOeval_opt.evaluate() finished in 0.04 seconds.
    [04/19 04:38:25] d2.evaluation.fast_eval_api INFO: Accumulating evaluation results...
    [04/19 04:38:25] d2.evaluation.fast_eval_api INFO: COCOeval_opt.accumulate() finished in 0.02 seconds.
    [04/19 04:38:25] d2.evaluation.coco_evaluation INFO: Evaluation results for bbox:
    | AP | AP50 | AP75 | APs | APm | APl |
    |:-----:|:------:|:------:|:-----:|:-----:|:-----:|
    | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
    [04/19 04:38:25] d2.evaluation.coco_evaluation INFO: Per-category bbox AP:
    | category | AP | category | AP | category | AP |
    |:---------------|:------|:----------------|:------|:-----------|:------|
    | crazing | 0.000 | inclusion | 0.000 | patches | 0.000 |
    | pitted_surface | 0.000 | rolled-in_scale | 0.000 | scratches | 0.000 |
    [04/19 04:38:26] d2.evaluation.fast_eval_api INFO: Evaluate annotation type segm
    [04/19 04:38:26] d2.evaluation.fast_eval_api INFO: COCOeval_opt.evaluate() finished in 0.13 seconds.
    [04/19 04:38:26] d2.evaluation.fast_eval_api INFO: Accumulating evaluation results...
    [04/19 04:38:26] d2.evaluation.fast_eval_api INFO: COCOeval_opt.accumulate() finished in 0.02 seconds.
    [04/19 04:38:26] d2.evaluation.coco_evaluation INFO: Evaluation results for segm:
    | AP | AP50 | AP75 | APs | APm | APl |
    |:-----:|:------:|:------:|:-----:|:-----:|:-----:|
    | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
    [04/19 04:38:26] d2.evaluation.coco_evaluation INFO: Per-category segm AP:
    | category | AP | category | AP | category | AP |
    |:---------------|:------|:----------------|:------|:-----------|:------|
    | crazing | 0.000 | inclusion | 0.000 | patches | 0.000 |
    | pitted_surface | 0.000 | rolled-in_scale | 0.000 | scratches | 0.000 |
    [04/19 04:38:26] detectron2 INFO: Evaluation results for steel_val in csv format:
    [04/19 04:38:26] d2.evaluation.testing INFO: copypaste: Task: bbox
    [04/19 04:38:26] d2.evaluation.testing INFO: copypaste: AP,AP50,AP75,APs,APm,APl
    [04/19 04:38:26] d2.evaluation.testing INFO: copypaste: 0.0000,0.0000,0.0000,0.0000,0.0000,0.0000
    [04/19 04:38:26] d2.evaluation.testing INFO: copypaste: Task: segm
    [04/19 04:38:26] d2.evaluation.testing INFO: copypaste: AP,AP50,AP75,APs,APm,APl
    [04/19 04:38:26] d2.evaluation.testing INFO: copypaste: 0.0000,0.0000,0.0000,0.0000,0.0000,0.0000

Expected behavior:

At the very least, there should be a non-zero result; I do not know what causes the problem.

Environment:

The environment was set up following the tutorial.

-------------------------------  -----------------------------------------------------------------------------------
sys.platform                     linux
Python                           3.8.15 (default, Nov 24 2022, 15:19:38) [GCC 11.2.0]
numpy                            1.23.5
detectron2                       0.6 @/home/detectron2/detectron2
Compiler                         GCC 9.4
CUDA compiler                    CUDA 11.6
detectron2 arch flags            7.5
DETECTRON2_ENV_MODULE            <not set>
PyTorch                          1.12.1+cu116 @/root/miniconda3/envs/myconda/lib/python3.8/site-packages/torch
PyTorch debug build              False
torch._C._GLIBCXX_USE_CXX11_ABI  False
GPU available                    Yes
GPU 0                            NVIDIA GeForce RTX 2080 Ti (arch=7.5)
Driver version                   510.54
CUDA_HOME                        /usr/local/cuda
Pillow                           9.3.0
torchvision                      0.13.1+cu116 @/root/miniconda3/envs/myconda/lib/python3.8/site-packages/torchvision
torchvision arch flags           3.5, 5.0, 6.0, 7.0, 7.5, 8.0, 8.6
fvcore                           0.1.5.post20221221
iopath                           0.1.9
cv2                              4.6.0
-------------------------------  -----------------------------------------------------------------------------------
@Huxwell

Huxwell commented Apr 23, 2024

It's hard to tell why your model is not learning based on the limited info you provided.
A good sanity check would be using the same train set and validation set during your first training run.
You can even use 2-10 images instead of 360.

http://karpathy.github.io/2019/04/25/recipe/
'overfit one batch. Overfit a single batch of only a few examples (e.g. as little as two). To do so we increase the capacity of our model (e.g. add layers or filters) and verify that we can reach the lowest achievable loss (e.g. zero). I also like to visualize in the same plot both the label and the prediction and ensure that they end up aligning perfectly once we reach the minimum loss. If they do not, there is a bug somewhere and we cannot continue to the next stage.'
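
To make that concrete (this sketch is not part of the original comment; the dataset name and paths are hypothetical), the sanity check amounts to registering a tiny subset and pointing both TRAIN and TEST at it:

from detectron2.config import get_cfg
from detectron2.data.datasets import register_coco_instances

# Hypothetical tiny subset (e.g. 2-10 images) used for both training and evaluation.
register_coco_instances(
    "steel_tiny", {},
    "../datasets/steel/annotations/steel_tiny.json",  # placeholder annotation file
    "../datasets/steel/images/train",                  # placeholder image root
)

cfg = get_cfg()
cfg.merge_from_file("../configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.DATASETS.TRAIN = ("steel_tiny",)
cfg.DATASETS.TEST = ("steel_tiny",)  # evaluate on the same images you train on
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 6
cfg.SOLVER.MAX_ITER = 1000

# If the model cannot reach near-perfect AP on this tiny set, the problem is
# upstream of the model: labels, category ids, or config.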

@skylark-joe
Author

Thanks for your advice; it is kind of you to share that article, and I will read it.
In fact, I think I have accidentally solved the problem by changing the category_id values in my JSON file to lie in [1, #categories].

The warning we got comes from detectron2\detectron2\data\datasets\coco.py line 104, which says it will apply a mapping.
However, when I read the following code, I found no operation applied to category_id as it claims.

In addition, at line 437 in the same file, I see that the
"id" field must start with 1 if we want to use the COCO API,
and after changing it, it works.
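
For anyone who runs into the same thing, here is a rough sketch of the remapping described above (the file name is a placeholder; the field names are standard COCO format):

import json

# Load the COCO-format annotation file (placeholder name).
with open("steel_val.json") as f:
    coco = json.load(f)

# Build a mapping so category ids become contiguous and start at 1.
old_ids = sorted(c["id"] for c in coco["categories"])
id_map = {old: new for new, old in enumerate(old_ids, start=1)}

for cat in coco["categories"]:
    cat["id"] = id_map[cat["id"]]
for ann in coco["annotations"]:
    ann["category_id"] = id_map[ann["category_id"]]

with open("steel_val_fixed.json", "w") as f:
    json.dump(coco, f)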
