Add trainer to yolov8 validator when calling w/o recipe #1599

rahul-tuli · 2023-05-31T22:58:08Z

This PR fixes a bug in our current yolov8 training flow.

The bug reveals itself when sparseml.ultralytics.train utility is invoked without a recipe

(sparsezoo) 🥃 sparseml (main) 💍 sparseml.ultralytics.train --epochs 1
/home/rahul/.venvs/sparsezoo/lib/python3.10/site-packages/requests/__init__.py:102: RequestsDependencyWarning: urllib3 (1.26.16) or chardet (5.1.0)/charset_normalizer (2.0.12) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({})/charset_normalizer ({}) doesn't match a supported "
Status code 429 from http://ipinfo.io/96.89.198.177/json: ERROR - 429 Client Error: Too Many Requests for url: http://ipinfo.io/96.89.198.177/json
Status code 429 from http://ipinfo.io/96.89.198.177/json: ERROR - 429 Client Error: Too Many Requests for url: http://ipinfo.io/96.89.198.177/json
Status code 429 from http://ipinfo.io/96.89.198.177/json: ERROR - 429 Client Error: Too Many Requests for url: http://ipinfo.io/96.89.198.177/json

                   from  n    params  module                                       arguments                     
  0                  -1  1       464  ultralytics.nn.modules.Conv                  [3, 16, 3, 2]                 
  1                  -1  1      4672  ultralytics.nn.modules.Conv                  [16, 32, 3, 2]                
  2                  -1  1      7360  ultralytics.nn.modules.C2f                   [32, 32, 1, True]             
  3                  -1  1     18560  ultralytics.nn.modules.Conv                  [32, 64, 3, 2]                
  4                  -1  2     49664  ultralytics.nn.modules.C2f                   [64, 64, 2, True]             
  5                  -1  1     73984  ultralytics.nn.modules.Conv                  [64, 128, 3, 2]               
  6                  -1  2    197632  ultralytics.nn.modules.C2f                   [128, 128, 2, True]           
  7                  -1  1    295424  ultralytics.nn.modules.Conv                  [128, 256, 3, 2]              
  8                  -1  1    460288  ultralytics.nn.modules.C2f                   [256, 256, 1, True]           
  9                  -1  1    164608  ultralytics.nn.modules.SPPF                  [256, 256, 5]                 
 10                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 11             [-1, 6]  1         0  ultralytics.nn.modules.Concat                [1]                           
 12                  -1  1    148224  ultralytics.nn.modules.C2f                   [384, 128, 1]                 
 13                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 14             [-1, 4]  1         0  ultralytics.nn.modules.Concat                [1]                           
 15                  -1  1     37248  ultralytics.nn.modules.C2f                   [192, 64, 1]                  
 16                  -1  1     36992  ultralytics.nn.modules.Conv                  [64, 64, 3, 2]                
 17            [-1, 12]  1         0  ultralytics.nn.modules.Concat                [1]                           
 18                  -1  1    123648  ultralytics.nn.modules.C2f                   [192, 128, 1]                 
 19                  -1  1    147712  ultralytics.nn.modules.Conv                  [128, 128, 3, 2]              
 20             [-1, 9]  1         0  ultralytics.nn.modules.Concat                [1]                           
 21                  -1  1    493056  ultralytics.nn.modules.C2f                   [384, 256, 1]                 
 22        [15, 18, 21]  1    897664  ultralytics.nn.modules.Detect                [80, [64, 128, 256]]          
YOLOv8n summary: 225 layers, 3157200 parameters, 3157184 gradients, 8.9 GFLOPs

Ultralytics YOLOv8.0.30 🚀 Python-3.10.11 torch-1.12.1+cu102 CUDA:0 (NVIDIA TITAN RTX, 24218MiB)
yolo/engine/trainer: recipe=None, recipe_args=None, datasets_dir=None, task=detect, mode=train, model=yolov8n.yaml, data=coco128.yaml, epochs=1, patience=50, batch=16, imgsz=640, save=True, save_period=-1, cache=False, device=None, workers=8, project=None, name=None, exist_ok=False, pretrained=False, optimizer=SGD, verbose=False, seed=0, deterministic=True, single_cls=False, image_weights=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, min_memory=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, show=False, save_txt=False, save_conf=False, save_crop=False, hide_labels=False, hide_conf=False, vid_stride=1, line_thickness=3, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, boxes=True, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=17, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, fl_gamma=0.0, label_smoothing=0.0, nbs=64.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0, cfg=None, v5loader=False, save_dir=runs/detect/train56

                   from  n    params  module                                       arguments                     
  0                  -1  1       464  ultralytics.nn.modules.Conv                  [3, 16, 3, 2]                 
  1                  -1  1      4672  ultralytics.nn.modules.Conv                  [16, 32, 3, 2]                
  2                  -1  1      7360  ultralytics.nn.modules.C2f                   [32, 32, 1, True]             
  3                  -1  1     18560  ultralytics.nn.modules.Conv                  [32, 64, 3, 2]                
  4                  -1  2     49664  ultralytics.nn.modules.C2f                   [64, 64, 2, True]             
  5                  -1  1     73984  ultralytics.nn.modules.Conv                  [64, 128, 3, 2]               
  6                  -1  2    197632  ultralytics.nn.modules.C2f                   [128, 128, 2, True]           
  7                  -1  1    295424  ultralytics.nn.modules.Conv                  [128, 256, 3, 2]              
  8                  -1  1    460288  ultralytics.nn.modules.C2f                   [256, 256, 1, True]           
  9                  -1  1    164608  ultralytics.nn.modules.SPPF                  [256, 256, 5]                 
 10                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 11             [-1, 6]  1         0  ultralytics.nn.modules.Concat                [1]                           
 12                  -1  1    148224  ultralytics.nn.modules.C2f                   [384, 128, 1]                 
 13                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 14             [-1, 4]  1         0  ultralytics.nn.modules.Concat                [1]                           
 15                  -1  1     37248  ultralytics.nn.modules.C2f                   [192, 64, 1]                  
 16                  -1  1     36992  ultralytics.nn.modules.Conv                  [64, 64, 3, 2]                
 17            [-1, 12]  1         0  ultralytics.nn.modules.Concat                [1]                           
 18                  -1  1    123648  ultralytics.nn.modules.C2f                   [192, 128, 1]                 
 19                  -1  1    147712  ultralytics.nn.modules.Conv                  [128, 128, 3, 2]              
 20             [-1, 9]  1         0  ultralytics.nn.modules.Concat                [1]                           
 21                  -1  1    493056  ultralytics.nn.modules.C2f                   [384, 256, 1]                 
 22        [15, 18, 21]  1    897664  ultralytics.nn.modules.Detect                [80, [64, 128, 256]]          
YOLOv8n summary: 225 layers, 3157200 parameters, 3157184 gradients, 8.9 GFLOPs

Received torch.nn.Module, not loading from checkpoint
optimizer: SGD(lr=0.01) with parameter groups 57 weight(decay=0.0), 64 weight(decay=0.0005), 63 bias
train: Scanning /home/rahul/datasets/coco128/labels/train2017.cache... 126 images, 2 backgrounds, 0 corrupt: 100%|██████████| 128/128 [00:00<?, ?it/s]
val: Scanning /home/rahul/datasets/coco128/labels/train2017.cache... 126 images, 2 backgrounds, 0 corrupt: 100%|██████████| 128/128 [00:00<?, ?it/s]
/home/rahul/github_projects/sparseml/src/sparseml/yolov8/trainers.py:294: UserWarning: Unable to import wandb for logging
  warnings.warn("Unable to import wandb for logging")
Image sizes 640 train, 640 val
Using 8 dataloader workers
Logging results to runs/detect/train56
Starting training for 1 epochs...

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
        1/1      3.46G       3.61       5.75      4.257        172        640: 100%|██████████| 8/8 [00:03<00:00,  2.06it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 4/4 [00:03<00:00,  1.16it/s]
                   all        128        929          0          0          0          0

1 epochs completed in 0.003 hours.
Optimizer stripped from runs/detect/train56/weights/last.pt, 6.5MB
Optimizer stripped from runs/detect/train56/weights/best.pt, 6.5MB

Validating runs/detect/train56/weights/best.pt...
Ultralytics YOLOv8.0.30 🚀 Python-3.10.11 torch-1.12.1+cu102 CUDA:0 (NVIDIA TITAN RTX, 24218MiB)
Traceback (most recent call last):
  File "/home/rahul/.venvs/sparsezoo/bin/sparseml.ultralytics.train", line 8, in <module>
    sys.exit(main())
  File "/home/rahul/.venvs/sparsezoo/lib/python3.10/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/home/rahul/.venvs/sparsezoo/lib/python3.10/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/home/rahul/.venvs/sparsezoo/lib/python3.10/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/rahul/.venvs/sparsezoo/lib/python3.10/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/home/rahul/github_projects/sparseml/src/sparseml/yolov8/train.py", line 225, in main
    model.train(**kwargs)
  File "/home/rahul/github_projects/sparseml/src/sparseml/yolov8/trainers.py", line 796, in train
    self.trainer.train()
  File "/home/rahul/github_projects/sparseml/src/sparseml/yolov8/trainers.py", line 164, in train
    self._do_train(int(os.getenv("RANK", -1)), world_size)
  File "/home/rahul/.venvs/sparsezoo/lib/python3.10/site-packages/ultralytics/yolo/engine/trainer.py", line 371, in _do_train
    self.final_eval()
  File "/home/rahul/github_projects/sparseml/src/sparseml/yolov8/trainers.py", line 491, in final_eval
    return super().final_eval()
  File "/home/rahul/.venvs/sparsezoo/lib/python3.10/site-packages/ultralytics/yolo/engine/trainer.py", line 514, in final_eval
    self.metrics = self.validator(model=f)
  File "/home/rahul/github_projects/sparseml/src/sparseml/yolov8/validators.py", line 108, in __call__
    trainer.model = model.model
AttributeError: 'NoneType' object has no attribute 'model'

This happens due to ultralytics codebase, in which a trainer is not passed to the validator on this line: https://github.com/ultralytics/ultralytics/blob/6c65934b555e64bf26edd699865754b5ff651d0c/ultralytics/yolo/engine/trainer.py#L551

And this pathway is only hit when a recipe is not given to the train command, for example here's a dummy run with a recipe and it completes without any errors:

test_recipe.yaml

version: 1.1.0

training_modifiers:
  - !EpochRangeModifier
    start_epoch: 0
    end_epoch: 1

pruning_modifiers:
  - !GMPruningModifier
    init_sparsity: 0.05
    final_sparsity: 0.10
    start_epoch: 0.25
    end_epoch: 0.5
    update_frequency: 1.0
    params: ["model.0.conv.weight", "model.1.conv.weight"]

quantization_modifiers:
  - !QuantizationModifier
    start_epoch: 0.5
    freeze_bn_stats_epoch: 0.5
    disable_quantization_observer_epoch: 0.75
    ignore: ["Upsample", "Concat", "SiLU"]

sparseml.ultralytics.train --recipe src/sparseml/yolov8/test_recipe.yaml
sparseml.ultralytics.train --recipe src/sparseml/yolov8/test_recipe.yaml 
/home/rahul/.venvs/sparsezoo/lib/python3.10/site-packages/requests/__init__.py:102: RequestsDependencyWarning: urllib3 (1.26.16) or chardet (5.1.0)/charset_normalizer (2.0.12) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({})/charset_normalizer ({}) doesn't match a supported "
Status code 429 from http://ipinfo.io/96.89.198.177/json: ERROR - 429 Client Error: Too Many Requests for url: http://ipinfo.io/96.89.198.177/json
Status code 429 from http://ipinfo.io/96.89.198.177/json: ERROR - 429 Client Error: Too Many Requests for url: http://ipinfo.io/96.89.198.177/json
Status code 429 from http://ipinfo.io/96.89.198.177/json: ERROR - 429 Client Error: Too Many Requests for url: http://ipinfo.io/96.89.198.177/json

                   from  n    params  module                                       arguments                     
  0                  -1  1       464  ultralytics.nn.modules.Conv                  [3, 16, 3, 2]                 
  1                  -1  1      4672  ultralytics.nn.modules.Conv                  [16, 32, 3, 2]                
  2                  -1  1      7360  ultralytics.nn.modules.C2f                   [32, 32, 1, True]             
  3                  -1  1     18560  ultralytics.nn.modules.Conv                  [32, 64, 3, 2]                
  4                  -1  2     49664  ultralytics.nn.modules.C2f                   [64, 64, 2, True]             
  5                  -1  1     73984  ultralytics.nn.modules.Conv                  [64, 128, 3, 2]               
  6                  -1  2    197632  ultralytics.nn.modules.C2f                   [128, 128, 2, True]           
  7                  -1  1    295424  ultralytics.nn.modules.Conv                  [128, 256, 3, 2]              
  8                  -1  1    460288  ultralytics.nn.modules.C2f                   [256, 256, 1, True]           
  9                  -1  1    164608  ultralytics.nn.modules.SPPF                  [256, 256, 5]                 
 10                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 11             [-1, 6]  1         0  ultralytics.nn.modules.Concat                [1]                           
 12                  -1  1    148224  ultralytics.nn.modules.C2f                   [384, 128, 1]                 
 13                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 14             [-1, 4]  1         0  ultralytics.nn.modules.Concat                [1]                           
 15                  -1  1     37248  ultralytics.nn.modules.C2f                   [192, 64, 1]                  
 16                  -1  1     36992  ultralytics.nn.modules.Conv                  [64, 64, 3, 2]                
 17            [-1, 12]  1         0  ultralytics.nn.modules.Concat                [1]                           
 18                  -1  1    123648  ultralytics.nn.modules.C2f                   [192, 128, 1]                 
 19                  -1  1    147712  ultralytics.nn.modules.Conv                  [128, 128, 3, 2]              
 20             [-1, 9]  1         0  ultralytics.nn.modules.Concat                [1]                           
 21                  -1  1    493056  ultralytics.nn.modules.C2f                   [384, 256, 1]                 
 22        [15, 18, 21]  1    897664  ultralytics.nn.modules.Detect                [80, [64, 128, 256]]          
YOLOv8n summary: 225 layers, 3157200 parameters, 3157184 gradients, 8.9 GFLOPs

Ultralytics YOLOv8.0.30 🚀 Python-3.10.11 torch-1.12.1+cu102 CUDA:0 (NVIDIA TITAN RTX, 24218MiB)
yolo/engine/trainer: recipe=src/sparseml/yolov8/test_recipe.yaml, recipe_args=None, datasets_dir=None, task=detect, mode=train, model=yolov8n.yaml, data=coco128.yaml, epochs=100, patience=50, batch=16, imgsz=640, save=True, save_period=-1, cache=False, device=None, workers=8, project=None, name=None, exist_ok=False, pretrained=False, optimizer=SGD, verbose=False, seed=0, deterministic=True, single_cls=False, image_weights=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, min_memory=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, show=False, save_txt=False, save_conf=False, save_crop=False, hide_labels=False, hide_conf=False, vid_stride=1, line_thickness=3, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, boxes=True, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=17, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, fl_gamma=0.0, label_smoothing=0.0, nbs=64.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0, cfg=None, v5loader=False, save_dir=runs/detect/train57

                   from  n    params  module                                       arguments                     
  0                  -1  1       464  ultralytics.nn.modules.Conv                  [3, 16, 3, 2]                 
  1                  -1  1      4672  ultralytics.nn.modules.Conv                  [16, 32, 3, 2]                
  2                  -1  1      7360  ultralytics.nn.modules.C2f                   [32, 32, 1, True]             
  3                  -1  1     18560  ultralytics.nn.modules.Conv                  [32, 64, 3, 2]                
  4                  -1  2     49664  ultralytics.nn.modules.C2f                   [64, 64, 2, True]             
  5                  -1  1     73984  ultralytics.nn.modules.Conv                  [64, 128, 3, 2]               
  6                  -1  2    197632  ultralytics.nn.modules.C2f                   [128, 128, 2, True]           
  7                  -1  1    295424  ultralytics.nn.modules.Conv                  [128, 256, 3, 2]              
  8                  -1  1    460288  ultralytics.nn.modules.C2f                   [256, 256, 1, True]           
  9                  -1  1    164608  ultralytics.nn.modules.SPPF                  [256, 256, 5]                 
 10                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 11             [-1, 6]  1         0  ultralytics.nn.modules.Concat                [1]                           
 12                  -1  1    148224  ultralytics.nn.modules.C2f                   [384, 128, 1]                 
 13                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 14             [-1, 4]  1         0  ultralytics.nn.modules.Concat                [1]                           
 15                  -1  1     37248  ultralytics.nn.modules.C2f                   [192, 64, 1]                  
 16                  -1  1     36992  ultralytics.nn.modules.Conv                  [64, 64, 3, 2]                
 17            [-1, 12]  1         0  ultralytics.nn.modules.Concat                [1]                           
 18                  -1  1    123648  ultralytics.nn.modules.C2f                   [192, 128, 1]                 
 19                  -1  1    147712  ultralytics.nn.modules.Conv                  [128, 128, 3, 2]              
 20             [-1, 9]  1         0  ultralytics.nn.modules.Concat                [1]                           
 21                  -1  1    493056  ultralytics.nn.modules.C2f                   [384, 256, 1]                 
 22        [15, 18, 21]  1    897664  ultralytics.nn.modules.Detect                [80, [64, 128, 256]]          
YOLOv8n summary: 225 layers, 3157200 parameters, 3157184 gradients, 8.9 GFLOPs

Received torch.nn.Module, not loading from checkpoint
optimizer: SGD(lr=0.01) with parameter groups 57 weight(decay=0.0), 64 weight(decay=0.0005), 63 bias
train: Scanning /home/rahul/datasets/coco128/labels/train2017.cache... 126 images, 2 backgrounds, 0 corrupt: 100%|██████████| 128/128 [00:00<?, ?it/s]
val: Scanning /home/rahul/datasets/coco128/labels/train2017.cache... 126 images, 2 backgrounds, 0 corrupt: 100%|██████████| 128/128 [00:00<?, ?it/s]
/home/rahul/github_projects/sparseml/src/sparseml/yolov8/trainers.py:294: UserWarning: Unable to import wandb for logging
  warnings.warn("Unable to import wandb for logging")
Image sizes 640 train, 640 val
Using 8 dataloader workers
Logging results to runs/detect/train57
Starting training for 1 epochs...

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
        1/1      4.47G      3.617      5.715      4.202        172        640: 100%|██████████| 8/8 [00:04<00:00,  1.86it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 4/4 [00:03<00:00,  1.02it/s]
                   all        128        929          0          0          0          0

1 epochs completed in 0.003 hours.
Results saved to runs/detect/train57

The recipe pathway succeeds because the final validation is explicitly not invoked when a recipe is provided:

sparseml/src/sparseml/yolov8/trainers.py

Line 490 in 9c7c285

if self.manager is None and self.checkpoint_manager is None:

NOTE: This circumvents the error, but/and does not run the final validation pathway 😲 and should probably investigated more.

The proposed fix however, handles the failing case by wrapping self.validator in a functools.partial callable object with trainer information; this not only fixes the error, but also successfully runs the ultralytics' final_eval:

(yolov8-without-recipe-fix) 💍 sparseml.ultralytics.train --epochs 1                                   
/home/rahul/.venvs/sparsezoo/lib/python3.10/site-packages/requests/__init__.py:102: RequestsDependencyWarning: urllib3 (1.26.16) or chardet (5.1.0)/charset_normalizer (2.0.12) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({})/charset_normalizer ({}) doesn't match a supported "
Status code 429 from http://ipinfo.io/96.89.198.177/json: ERROR - 429 Client Error: Too Many Requests for url: http://ipinfo.io/96.89.198.177/json
Status code 429 from http://ipinfo.io/96.89.198.177/json: ERROR - 429 Client Error: Too Many Requests for url: http://ipinfo.io/96.89.198.177/json
Status code 429 from http://ipinfo.io/96.89.198.177/json: ERROR - 429 Client Error: Too Many Requests for url: http://ipinfo.io/96.89.198.177/json

                   from  n    params  module                                       arguments                     
  0                  -1  1       464  ultralytics.nn.modules.Conv                  [3, 16, 3, 2]                 
  1                  -1  1      4672  ultralytics.nn.modules.Conv                  [16, 32, 3, 2]                
  2                  -1  1      7360  ultralytics.nn.modules.C2f                   [32, 32, 1, True]             
  3                  -1  1     18560  ultralytics.nn.modules.Conv                  [32, 64, 3, 2]                
  4                  -1  2     49664  ultralytics.nn.modules.C2f                   [64, 64, 2, True]             
  5                  -1  1     73984  ultralytics.nn.modules.Conv                  [64, 128, 3, 2]               
  6                  -1  2    197632  ultralytics.nn.modules.C2f                   [128, 128, 2, True]           
  7                  -1  1    295424  ultralytics.nn.modules.Conv                  [128, 256, 3, 2]              
  8                  -1  1    460288  ultralytics.nn.modules.C2f                   [256, 256, 1, True]           
  9                  -1  1    164608  ultralytics.nn.modules.SPPF                  [256, 256, 5]                 
 10                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 11             [-1, 6]  1         0  ultralytics.nn.modules.Concat                [1]                           
 12                  -1  1    148224  ultralytics.nn.modules.C2f                   [384, 128, 1]                 
 13                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 14             [-1, 4]  1         0  ultralytics.nn.modules.Concat                [1]                           
 15                  -1  1     37248  ultralytics.nn.modules.C2f                   [192, 64, 1]                  
 16                  -1  1     36992  ultralytics.nn.modules.Conv                  [64, 64, 3, 2]                
 17            [-1, 12]  1         0  ultralytics.nn.modules.Concat                [1]                           
 18                  -1  1    123648  ultralytics.nn.modules.C2f                   [192, 128, 1]                 
 19                  -1  1    147712  ultralytics.nn.modules.Conv                  [128, 128, 3, 2]              
 20             [-1, 9]  1         0  ultralytics.nn.modules.Concat                [1]                           
 21                  -1  1    493056  ultralytics.nn.modules.C2f                   [384, 256, 1]                 
 22        [15, 18, 21]  1    897664  ultralytics.nn.modules.Detect                [80, [64, 128, 256]]          
YOLOv8n summary: 225 layers, 3157200 parameters, 3157184 gradients, 8.9 GFLOPs

Ultralytics YOLOv8.0.30 🚀 Python-3.10.11 torch-1.12.1+cu102 CUDA:0 (NVIDIA TITAN RTX, 24218MiB)
yolo/engine/trainer: recipe=None, recipe_args=None, datasets_dir=None, task=detect, mode=train, model=yolov8n.yaml, data=coco128.yaml, epochs=1, patience=50, batch=16, imgsz=640, save=True, save_period=-1, cache=False, device=None, workers=8, project=None, name=None, exist_ok=False, pretrained=False, optimizer=SGD, verbose=False, seed=0, deterministic=True, single_cls=False, image_weights=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, min_memory=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, show=False, save_txt=False, save_conf=False, save_crop=False, hide_labels=False, hide_conf=False, vid_stride=1, line_thickness=3, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, boxes=True, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=17, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, fl_gamma=0.0, label_smoothing=0.0, nbs=64.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0, cfg=None, v5loader=False, save_dir=runs/detect/train60

                   from  n    params  module                                       arguments                     
  0                  -1  1       464  ultralytics.nn.modules.Conv                  [3, 16, 3, 2]                 
  1                  -1  1      4672  ultralytics.nn.modules.Conv                  [16, 32, 3, 2]                
  2                  -1  1      7360  ultralytics.nn.modules.C2f                   [32, 32, 1, True]             
  3                  -1  1     18560  ultralytics.nn.modules.Conv                  [32, 64, 3, 2]                
  4                  -1  2     49664  ultralytics.nn.modules.C2f                   [64, 64, 2, True]             
  5                  -1  1     73984  ultralytics.nn.modules.Conv                  [64, 128, 3, 2]               
  6                  -1  2    197632  ultralytics.nn.modules.C2f                   [128, 128, 2, True]           
  7                  -1  1    295424  ultralytics.nn.modules.Conv                  [128, 256, 3, 2]              
  8                  -1  1    460288  ultralytics.nn.modules.C2f                   [256, 256, 1, True]           
  9                  -1  1    164608  ultralytics.nn.modules.SPPF                  [256, 256, 5]                 
 10                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 11             [-1, 6]  1         0  ultralytics.nn.modules.Concat                [1]                           
 12                  -1  1    148224  ultralytics.nn.modules.C2f                   [384, 128, 1]                 
 13                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 14             [-1, 4]  1         0  ultralytics.nn.modules.Concat                [1]                           
 15                  -1  1     37248  ultralytics.nn.modules.C2f                   [192, 64, 1]                  
 16                  -1  1     36992  ultralytics.nn.modules.Conv                  [64, 64, 3, 2]                
 17            [-1, 12]  1         0  ultralytics.nn.modules.Concat                [1]                           
 18                  -1  1    123648  ultralytics.nn.modules.C2f                   [192, 128, 1]                 
 19                  -1  1    147712  ultralytics.nn.modules.Conv                  [128, 128, 3, 2]              
 20             [-1, 9]  1         0  ultralytics.nn.modules.Concat                [1]                           
 21                  -1  1    493056  ultralytics.nn.modules.C2f                   [384, 256, 1]                 
 22        [15, 18, 21]  1    897664  ultralytics.nn.modules.Detect                [80, [64, 128, 256]]          
YOLOv8n summary: 225 layers, 3157200 parameters, 3157184 gradients, 8.9 GFLOPs

Received torch.nn.Module, not loading from checkpoint
optimizer: SGD(lr=0.01) with parameter groups 57 weight(decay=0.0), 64 weight(decay=0.0005), 63 bias
train: Scanning /home/rahul/datasets/coco128/labels/train2017.cache... 126 images, 2 backgrounds, 0 corrupt: 100%|██████████| 128/128 [00:00<?, ?it/s]
val: Scanning /home/rahul/datasets/coco128/labels/train2017.cache... 126 images, 2 backgrounds, 0 corrupt: 100%|██████████| 128/128 [00:00<?, ?it/s]
/home/rahul/github_projects/sparseml/src/sparseml/yolov8/trainers.py:295: UserWarning: Unable to import wandb for logging
  warnings.warn("Unable to import wandb for logging")
Image sizes 640 train, 640 val
Using 8 dataloader workers
Logging results to runs/detect/train60
Starting training for 1 epochs...

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
        1/1      3.46G       3.61       5.75      4.257        172        640: 100%|██████████| 8/8 [00:03<00:00,  2.08it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 4/4 [00:03<00:00,  1.19it/s]
                   all        128        929          0          0          0          0

1 epochs completed in 0.003 hours.
Optimizer stripped from runs/detect/train60/weights/last.pt, 6.5MB
Optimizer stripped from runs/detect/train60/weights/best.pt, 6.5MB

Validating runs/detect/train60/weights/best.pt...
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 4/4 [00:03<00:00,  1.12it/s]
                   all        128        929          0          0          0          0
Results saved to runs/detect/train60

To see the specific tasks where the Asana app for GitHub is being used, see below:
- https://app.asana.com/0/0/1204725623587413

dbogunowicz

Sweet, thanks for the exhaustive explanation R.

Add trainer to validator when calling w/o recipe

ded8b4f

rahul-tuli requested review from bfineran, markurtz, KSGulin and dbogunowicz May 31, 2023 22:58

rahul-tuli self-assigned this May 31, 2023

rahul-tuli added bug Something isn't working 1.5 release For fixes that must also be put in place for release/1.5 labels May 31, 2023

rahul-tuli mentioned this pull request May 31, 2023

[Cherry Pick]Add trainer to validator when calling w/o recipe #1600

Merged

rahul-tuli and others added 2 commits June 1, 2023 10:05

Re-work comment message to better describe what was done

a0a8926

Merge branch 'main' into yolov8-without-recipe-fix

a9039bf

dbogunowicz approved these changes Jun 1, 2023

View reviewed changes

KSGulin approved these changes Jun 1, 2023

View reviewed changes

KSGulin merged commit b9b5b19 into main Jun 1, 2023

KSGulin deleted the yolov8-without-recipe-fix branch June 1, 2023 14:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add trainer to yolov8 validator when calling w/o recipe #1599

Add trainer to yolov8 validator when calling w/o recipe #1599

rahul-tuli commented May 31, 2023 •

edited

Loading

dbogunowicz left a comment

Add trainer to yolov8 validator when calling w/o recipe #1599

Add trainer to yolov8 validator when calling w/o recipe #1599

Conversation

rahul-tuli commented May 31, 2023 • edited Loading

dbogunowicz left a comment

Choose a reason for hiding this comment

rahul-tuli commented May 31, 2023 •

edited

Loading