Please read & provide the following #5265

Open
skylark-joe opened this issue Apr 19, 2024 · 2 comments
Comments

@skylark-joe

Hi, I trained the model for 3000 iterations, only to find the evaluation results are all zero. During training there is no error, except a warning saying:
Category ids in annotations are not in [1, #categories]! We'll apply a mapping for you.
Does it have something to do with the argument config.eval_only? I did not set that argument, so it should take the default.
Any help would be appreciated.

Instructions To Reproduce the Issue:

  1. Full runnable code or full changes you made:
    I followed the tutorial, loading my dataset using register_coco_json(), then started training (a registration sketch is included after the config below).
    Here is my .yaml file; Base-RCNN-FPN.yaml is the original one in the configs directory.

_BASE_: "../Base-RCNN-FPN.yaml"
MODEL:
  WEIGHTS: ""
  MASK_ON: True
  RESNETS:
    DEPTH: 50
  ROI_HEADS:
    NUM_CLASSES: 6
DATASETS:
  TRAIN: ("steel_train",) # ("coco_2017_train",)
  TEST: ("steel_val",) # ("coco_2017_val",)
DATALOADER:
  NUM_WORKERS: 8
SOLVER:
  STEPS: () # (210000, 250000)
  MAX_ITER: 270000
  IMS_PER_BATCH: 16
  BASE_LR: 0.001 # 0.02
  MAX_ITER: 90000
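
For reference, here is a minimal sketch of how the dataset registration could look. This is only an illustrative assumption, not the exact code from the issue: the image-root paths are placeholders (only the steel_val.json path appears in the logs below), and it uses the standard register_coco_instances API rather than the register_coco_json name mentioned above.

from detectron2.data.datasets import register_coco_instances

# Register the custom steel datasets in COCO format:
# register_coco_instances(name, metadata, json_file, image_root)
register_coco_instances(
    "steel_train", {},
    "../datasets/steel/annotations/steel_train.json",  # placeholder path
    "../datasets/steel/images/train",                   # placeholder image root
)
register_coco_instances(
    "steel_val", {},
    "../datasets/steel/annotations/steel_val.json",
    "../datasets/steel/images/val",                      # placeholder image root
)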

  2. In the command line, I run
    python plain_train_net.py --config-file ../configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml
    and it seems to run successfully.

  3. Full logs or other relevant observations:
    [04/19 04:37:58] d2.data.datasets.coco INFO: Loaded 360 images in COCO format from ../datasets/steel/annotations/steel_val.json
    [04/19 04:37:58] d2.data.build INFO: Distribution of instances among all 6 categories:
    | category      | #instances   | category      | #instances   | category   | #instances   |
    |:-------------:|:-------------|:-------------:|:-------------|:----------:|:-------------|
    | crazing       | 154          | inclusion     | 195          | patches    | 170          |
    | pitted_surf.. | 82           | rolled-in_s.. | 132          | scratches  | 102          |
    |               |              |               |              |            |              |
    | total         | 835          |               |              |            |              |
    [04/19 04:37:58] d2.data.dataset_mapper INFO: [DatasetMapper] Augmentations used in inference: [ResizeShortestEdge(short_edge_length=(800, 800), max_size=1333, sample_style='choice')]
    [04/19 04:37:58] d2.data.common INFO: Serializing the dataset using: <class 'detectron2.data.common._TorchSerializedList'>
    [04/19 04:37:58] d2.data.common INFO: Serializing 360 elements to byte tensors and concatenating them all ...
    [04/19 04:37:58] d2.data.common INFO: Serialized dataset takes 0.19 MiB
    [04/19 04:37:58] d2.evaluation.evaluator INFO: Start inference on 360 batches
    [04/19 04:38:00] d2.evaluation.evaluator INFO: Inference done 11/360. Dataloading: 0.0006 s/iter. Inference: 0.0690 s/iter. Eval: 0.0027 s/iter. Total: 0.0723 s/iter. ETA=0:00:25
    [04/19 04:38:05] d2.evaluation.evaluator INFO: Inference done 80/360. Dataloading: 0.0011 s/iter. Inference: 0.0684 s/iter. Eval: 0.0030 s/iter. Total: 0.0725 s/iter. ETA=0:00:20
    [04/19 04:38:10] d2.evaluation.evaluator INFO: Inference done 151/360. Dataloading: 0.0012 s/iter. Inference: 0.0676 s/iter. Eval: 0.0028 s/iter. Total: 0.0717 s/iter. ETA=0:00:14
    [04/19 04:38:15] d2.evaluation.evaluator INFO: Inference done 221/360. Dataloading: 0.0012 s/iter. Inference: 0.0679 s/iter. Eval: 0.0028 s/iter. Total: 0.0719 s/iter. ETA=0:00:09
    [04/19 04:38:20] d2.evaluation.evaluator INFO: Inference done 290/360. Dataloading: 0.0012 s/iter. Inference: 0.0681 s/iter. Eval: 0.0028 s/iter. Total: 0.0721 s/iter. ETA=0:00:05
    [04/19 04:38:25] d2.evaluation.evaluator INFO: Total inference time: 0:00:25.371791 (0.071470 s / iter per device, on 1 devices)
    [04/19 04:38:25] d2.evaluation.evaluator INFO: Total inference pure compute time: 0:00:23 (0.067332 s / iter per device, on 1 devices)
    [04/19 04:38:25] d2.evaluation.coco_evaluation INFO: Preparing results for COCO format ...
    [04/19 04:38:25] d2.evaluation.coco_evaluation INFO: Saving results to ./output/inference/steel_val/coco_instances_results.json
    [04/19 04:38:25] d2.evaluation.coco_evaluation INFO: Evaluating predictions with unofficial COCO API...
    [04/19 04:38:25] d2.evaluation.fast_eval_api INFO: Evaluate annotation type bbox
    [04/19 04:38:25] d2.evaluation.fast_eval_api INFO: COCOeval_opt.evaluate() finished in 0.04 seconds.
    [04/19 04:38:25] d2.evaluation.fast_eval_api INFO: Accumulating evaluation results...
    [04/19 04:38:25] d2.evaluation.fast_eval_api INFO: COCOeval_opt.accumulate() finished in 0.02 seconds.
    [04/19 04:38:25] d2.evaluation.coco_evaluation INFO: Evaluation results for bbox:
    | AP | AP50 | AP75 | APs | APm | APl |
    |:-----:|:------:|:------:|:-----:|:-----:|:-----:|
    | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
    [04/19 04:38:25] d2.evaluation.coco_evaluation INFO: Per-category bbox AP:
    | category | AP | category | AP | category | AP |
    |:---------------|:------|:----------------|:------|:-----------|:------|
    | crazing | 0.000 | inclusion | 0.000 | patches | 0.000 |
    | pitted_surface | 0.000 | rolled-in_scale | 0.000 | scratches | 0.000 |
    [04/19 04:38:26] d2.evaluation.fast_eval_api INFO: Evaluate annotation type segm
    [04/19 04:38:26] d2.evaluation.fast_eval_api INFO: COCOeval_opt.evaluate() finished in 0.13 seconds.
    [04/19 04:38:26] d2.evaluation.fast_eval_api INFO: Accumulating evaluation results...
    [04/19 04:38:26] d2.evaluation.fast_eval_api INFO: COCOeval_opt.accumulate() finished in 0.02 seconds.
    [04/19 04:38:26] d2.evaluation.coco_evaluation INFO: Evaluation results for segm:
    | AP | AP50 | AP75 | APs | APm | APl |
    |:-----:|:------:|:------:|:-----:|:-----:|:-----:|
    | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
    [04/19 04:38:26] d2.evaluation.coco_evaluation INFO: Per-category segm AP:
    | category | AP | category | AP | category | AP |
    |:---------------|:------|:----------------|:------|:-----------|:------|
    | crazing | 0.000 | inclusion | 0.000 | patches | 0.000 |
    | pitted_surface | 0.000 | rolled-in_scale | 0.000 | scratches | 0.000 |
    [04/19 04:38:26] detectron2 INFO: Evaluation results for steel_val in csv format:
    [04/19 04:38:26] d2.evaluation.testing INFO: copypaste: Task: bbox
    [04/19 04:38:26] d2.evaluation.testing INFO: copypaste: AP,AP50,AP75,APs,APm,APl
    [04/19 04:38:26] d2.evaluation.testing INFO: copypaste: 0.0000,0.0000,0.0000,0.0000,0.0000,0.0000
    [04/19 04:38:26] d2.evaluation.testing INFO: copypaste: Task: segm
    [04/19 04:38:26] d2.evaluation.testing INFO: copypaste: AP,AP50,AP75,APs,APm,APl
    [04/19 04:38:26] d2.evaluation.testing INFO: copypaste: 0.0000,0.0000,0.0000,0.0000,0.0000,0.0000

Expected behavior:

At the very least, there should be a non-zero result; I do not know what causes the problem.

Environment:

The environment was set up following the tutorial.

-------------------------------  -----------------------------------------------------------------------------------
sys.platform                     linux
Python                           3.8.15 (default, Nov 24 2022, 15:19:38) [GCC 11.2.0]
numpy                            1.23.5
detectron2                       0.6 @/home/detectron2/detectron2
Compiler                         GCC 9.4
CUDA compiler                    CUDA 11.6
detectron2 arch flags            7.5
DETECTRON2_ENV_MODULE            <not set>
PyTorch                          1.12.1+cu116 @/root/miniconda3/envs/myconda/lib/python3.8/site-packages/torch
PyTorch debug build              False
torch._C._GLIBCXX_USE_CXX11_ABI  False
GPU available                    Yes
GPU 0                            NVIDIA GeForce RTX 2080 Ti (arch=7.5)
Driver version                   510.54
CUDA_HOME                        /usr/local/cuda
Pillow                           9.3.0
torchvision                      0.13.1+cu116 @/root/miniconda3/envs/myconda/lib/python3.8/site-packages/torchvision
torchvision arch flags           3.5, 5.0, 6.0, 7.0, 7.5, 8.0, 8.6
fvcore                           0.1.5.post20221221
iopath                           0.1.9
cv2                              4.6.0
-------------------------------  -----------------------------------------------------------------------------------
@Huxwell

Huxwell commented Apr 23, 2024

It's hard to tell why your model is not learning based on the limited info you provided.
A good sanity check would be using the same train set and validation set during your first training run.
You can even use 2-10 images instead of 360.

http://karpathy.github.io/2019/04/25/recipe/
'overfit one batch. Overfit a single batch of only a few examples (e.g. as little as two). To do so we increase the capacity of our model (e.g. add layers or filters) and verify that we can reach the lowest achievable loss (e.g. zero). I also like to visualize in the same plot both the label and the prediction and ensure that they end up aligning perfectly once we reach the minimum loss. If they do not, there is a bug somewhere and we cannot continue to the next stage.'
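
To make that concrete (this sketch is not part of the original comment; the dataset name and paths are hypothetical), the sanity check amounts to registering a tiny subset and pointing both TRAIN and TEST at it:

from detectron2.config import get_cfg
from detectron2.data.datasets import register_coco_instances

# Hypothetical tiny subset (e.g. 2-10 images) used for both training and evaluation.
register_coco_instances(
    "steel_tiny", {},
    "../datasets/steel/annotations/steel_tiny.json",  # placeholder annotation file
    "../datasets/steel/images/train",                  # placeholder image root
)

cfg = get_cfg()
cfg.merge_from_file("../configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.DATASETS.TRAIN = ("steel_tiny",)
cfg.DATASETS.TEST = ("steel_tiny",)  # evaluate on the same images you train on
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 6
cfg.SOLVER.MAX_ITER = 1000

# If the model cannot reach near-perfect AP on this tiny set, the problem is
# upstream of the model: labels, category ids, or config.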

@skylark-joe
Author

Thanks for your advice; it is kind of you to share that article, and I will read it.
In fact, I think I have accidentally solved the problem by changing the category_id values in my JSON file to lie in [1, #categories].

The warning we got comes from detectron2\detectron2\data\datasets\coco.py line 104, which says it will apply a mapping.
However, when I read the following code, I found no operation applied to category_id as it claims.

In addition, at line 437 in the same file, I see that the
"id" field must start with 1 if we want to use the COCO API,
and after changing it, it works.
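
For anyone who runs into the same thing, here is a rough sketch of the remapping described above (the file name is a placeholder; the field names are standard COCO format):

import json

# Load the COCO-format annotation file (placeholder name).
with open("steel_val.json") as f:
    coco = json.load(f)

# Build a mapping so category ids become contiguous and start at 1.
old_ids = sorted(c["id"] for c in coco["categories"])
id_map = {old: new for new, old in enumerate(old_ids, start=1)}

for cat in coco["categories"]:
    cat["id"] = id_map[cat["id"]]
for ann in coco["annotations"]:
    ann["category_id"] = id_map[ann["category_id"]]

with open("steel_val_fixed.json", "w") as f:
    json.dump(coco, f)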
