Issue when training and predicting with a custom dataset and the YOLO_NAS_S model #2028

Open
Esidell opened this issue Jul 2, 2024 · 2 comments

Esidell commented Jul 2, 2024

💡 Your Question

Hi everyone,

I'm trying to use the YOLO_NAS_S model on a custom dataset of 2D Gaussians that simulate galaxies, so that it can recognise them in astronomical images. However, during training many components of the loss and of the validation metrics are equal to zero, which prevents the model from making any predictions when I call model.predict().

I've already checked the labels and they seem to be correct (a .txt file with c x y w h, where the box coordinates and dimensions are normalised).
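A label check like the one described above can also be automated. The following is only a minimal sketch, assuming one .txt file per image with lines of the form `c x y w h`. The function names and the directory path in the usage note are made up for illustration and are not part of super_gradients.

```python
from pathlib import Path


def check_yolo_label_lines(lines, num_classes):
    """Return (line_number, message) pairs for lines that break the c x y w h convention."""
    problems = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # blank lines are harmless
        if len(fields) != 5:
            problems.append((number, f"expected 5 fields, got {len(fields)}"))
            continue
        try:
            class_id = int(fields[0])
            cx, cy, w, h = (float(v) for v in fields[1:])
        except ValueError:
            problems.append((number, "could not parse a field as a number"))
            continue
        if not 0 <= class_id < num_classes:
            problems.append((number, f"class id {class_id} out of range"))
        if not all(0.0 <= v <= 1.0 for v in (cx, cy, w, h)):
            problems.append((number, "coordinates are not normalised to [0, 1]"))
        elif w == 0.0 or h == 0.0:
            problems.append((number, "box has zero width or height"))
    return problems


def check_label_dir(label_dir, num_classes):
    """Map each .txt file in label_dir to its problems (files without problems are omitted)."""
    report = {}
    for path in sorted(Path(label_dir).glob("*.txt")):
        problems = check_yolo_label_lines(path.read_text().splitlines(), num_classes)
        if problems:
            report[path.name] = problems
    return report
```

Calling `check_label_dir("/data/split/train/labels", 1)` (the path is a guess at the layout, adjust it to yours) should return an empty dict if every file follows the convention.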

The model has only one class and uses the PPYoloELoss loss function. Here are the training parameters and other related parts:

from super_gradients.training.losses import PPYoloELoss
from super_gradients.training.metrics import DetectionMetrics_050
from super_gradients.training.models.detection_models.pp_yolo_e import PPYoloEPostPredictionCallback


CLASS_NAMES = ['zero_order']
NUM_CLASSES = len(CLASS_NAMES)

train_params = {
    "warmup_initial_lr": 1e-5,
    "initial_lr": 5e-4,
    "lr_mode": "cosine",
    "cosine_final_lr_ratio": 0.5,
    "optimizer": "SGD",
    "zero_weight_decay_on_bias_and_bn": True,
    "lr_warmup_epochs": 1,
    "warmup_mode": "LinearEpochLRWarmup",
    "optimizer_params": {"weight_decay": 0.0001},
    "ema": False,
    "average_best_models": False,
    "ema_params": {"beta": 25, "decay_type": "exp"},
    "max_epochs": 20,
    "mixed_precision": True,
    "loss": PPYoloELoss(use_static_assigner=True, num_classes=NUM_CLASSES, reg_max=None),
    "valid_metrics_list": [
        DetectionMetrics_050(
            score_thres=0.1,
            top_k_predictions=300,
            num_cls=NUM_CLASSES,
            normalize_targets=True,
            include_classwise_ap=True,
            class_names=CLASS_NAMES,
            post_prediction_callback=PPYoloEPostPredictionCallback(score_threshold=0.01, nms_top_k=1000, max_predictions=300, nms_threshold=0.7),
        )
    ],
    "metric_to_watch": "mAP@0.50",
}

from super_gradients.training import Trainer
from super_gradients.common.object_names import Models
from super_gradients.training import models

trainer = Trainer(experiment_name="yolo_nas_s", ckpt_root_dir="CHECKPOINT_DIR")
model = models.get(Models.YOLO_NAS_S, num_classes=NUM_CLASSES, pretrained_weights="coco")
trainer.train(model=model, training_params=train_params, train_loader=train_loader, valid_loader=valid_loader)

IMAGE = "/data/split/test/images/image_0010.jpg"

images_predictions = model.to("cuda").predict(IMAGE, conf = 0.1)

images_predictions.show(box_thickness=2, show_confidence=True)

Here is an epoch summary to demonstrate the problem:

SUMMARY OF EPOCH 1
├── Train
│   ├── Ppyoloeloss/loss_cls = 0.0076
│   │   ├── Epoch N-1 = 0.4161 (↘ -0.4085)
│   │   └── Best until now = 0.4161 (↘ -0.4085)
│   ├── Ppyoloeloss/loss_iou = 0.0
│   │   ├── Epoch N-1 = 0.0 (= 0.0)
│   │   └── Best until now = 0.0 (= 0.0)
│   ├── Ppyoloeloss/loss_dfl = 0.0
│   │   ├── Epoch N-1 = 0.0 (= 0.0)
│   │   └── Best until now = 0.0 (= 0.0)
│   └── Ppyoloeloss/loss = 0.0076
│       ├── Epoch N-1 = 0.4161 (↘ -0.4085)
│       └── Best until now = 0.4161 (↘ -0.4085)
└── Validation
    ├── Ppyoloeloss/loss_cls = 0.0024
    │   ├── Epoch N-1 = 0.0582 (↘ -0.0558)
    │   └── Best until now = 0.0582 (↘ -0.0558)
    ├── Ppyoloeloss/loss_iou = 0.0
    │   ├── Epoch N-1 = 0.0 (= 0.0)
    │   └── Best until now = 0.0 (= 0.0)
    ├── Ppyoloeloss/loss_dfl = 0.0
    │   ├── Epoch N-1 = 0.0 (= 0.0)
    │   └── Best until now = 0.0 (= 0.0)
...
        ├── Epoch N-1 = 0.0 (= 0.0)
        └── Best until now = 0.0 (= 0.0)

Thank you for your help!

Versions

No response

Collaborator

BloodAxe commented Jul 9, 2024

The first thing I would try is to double-check that the dataset is loaded properly.
Use this callback to visualize the data during training and check that the boxes are drawn correctly:

class ExtremeBatchDetectionVisualizationCallback(ExtremeBatchCaseVisualizationCallback):
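
Besides drawing the boxes, a quick numeric look at one batch can help. The sketch below is not part of super_gradients. It assumes the collated targets are rows of six values `[image_index, class_id, cx, cy, w, h]`, which is an assumption about the loader, so adapt the column indices if your collate function uses another layout. `summarize_targets` is a made-up helper name.

```python
def summarize_targets(targets, batch_size):
    """Summarise one batch of detection targets.

    ASSUMPTION: each row is [image_index, class_id, cx, cy, w, h].
    Accepts a tensor/array (anything with .tolist()) or a list of rows.
    """
    rows = targets.tolist() if hasattr(targets, "tolist") else [list(r) for r in targets]
    images_with_boxes = {int(r[0]) for r in rows}
    box_sides = [r[4] for r in rows] + [r[5] for r in rows]
    return {
        "num_boxes": len(rows),
        "images_without_boxes": batch_size - len(images_with_boxes),
        "class_ids": sorted({int(r[1]) for r in rows}),
        # Largest cx/cy/w/h value: about 1.0 suggests normalised boxes,
        # hundreds suggest pixel coordinates.
        "max_coordinate": max((max(r[2:6]) for r in rows), default=None),
        "min_box_side": min(box_sides, default=None),
    }


# Hypothetical usage with the loaders from the question:
# images, targets = next(iter(train_loader))[:2]
# print(images.shape, summarize_targets(targets, batch_size=images.shape[0]))
```

If `num_boxes` is 0, or if the coordinate scale does not match what the loss expects (for example boxes in [0, 1] where pixels are expected), no prediction may be matched to a target, which could explain loss_iou and loss_dfl staying at exactly 0.0.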

Author

Esidell commented Jul 19, 2024

I've tried a different approach: I added normalization to my dataset class instead of loading images that were already normalized. That seems to have fixed the issue of the model not predicting. However, I now get the following error instead, and only on certain images:


AttributeError Traceback (most recent call last)
AttributeError: 'int' object has no attribute 'sqrt'

The above exception was the direct cause of the following exception:
in ImageDetectionPrediction.show(self, box_thickness, show_confidence, color_mapping, target_bboxes, target_bboxes_format, target_class_ids, class_names)

line 52 : diag_length = np.sqrt(bbox_width**2 + bbox_height**2)

TypeError: loop of ufunc does not support argument 0 of type int which has no callable sqrt method

Does anyone know what could be causing this in the normalization process?
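
One situation known to produce this exact TypeError is np.sqrt receiving a value that NumPy has to treat as a generic Python object, for example a Python int too large for a 64-bit integer. If that is what happens here, some predicted boxes on the failing images would have extremely large widths or heights. The sketch below is a hedged way to test that hypothesis before calling show(). `find_suspicious_boxes` and its `margin` parameter are made up for illustration, and the boxes are assumed to be (x1, y1, x2, y2) in pixels.

```python
import math


def find_suspicious_boxes(boxes_xyxy, image_width, image_height, margin=2.0):
    """Return (index, reason) pairs for boxes that are non-finite, degenerate, or far outside the image.

    boxes_xyxy: iterable of (x1, y1, x2, y2) in pixels (assumed format).
    margin: hypothetical tolerance; a box may extend up to `margin` image sizes
            beyond the borders before it is flagged.
    """
    limit_x = margin * image_width
    limit_y = margin * image_height
    flagged = []
    for index, box in enumerate(boxes_xyxy):
        x1, y1, x2, y2 = (float(v) for v in box)
        if not all(math.isfinite(v) for v in (x1, y1, x2, y2)):
            flagged.append((index, "non-finite coordinate"))
        elif x2 <= x1 or y2 <= y1:
            flagged.append((index, "non-positive width or height"))
        elif (
            x1 < -limit_x
            or x2 > image_width + limit_x
            or y1 < -limit_y
            or y2 > image_height + limit_y
        ):
            flagged.append((index, "far outside the image"))
    return flagged
```

Running this on the boxes of a failing image (assuming the prediction object exposes them, for instance as something like `prediction.bboxes_xyxy`) and comparing with an image that displays correctly should show whether oversized boxes are involved. If they are, the input scaling in the dataset class would be the first place to look.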
