Fix redundant outputs via Logging in DDP training #500
Conversation
@NanoCode012 I think changing only the multi-GPU affected regions makes sense. Should we wait for the test / detect refactor first, or do this one first? This PR does not affect test.py and detect.py. "Scanning labels" messages should show up twice (as it treats the train and val datasets as independent, even if they both point to the same images). I see it show up 3 times, but this is not a huge problem. Also, depending on your console, tqdm messages may sometimes accidentally repeat by themselves even on a single GPU.
Okay I see!
I think so; then we can encapsulate it under one PR. However, you did mention that you want PRs in small chunks... Edit: I can create a new PR later to deal with the upcoming changes if you want. I will set the unit test to run tomorrow in case I broke something. Edit 2: Unit test succeeded.
@NanoCode012 just tried some multi-GPU training myself and saw the redundant output phenomenon. Let's see if we can resolve these conflicts and get this merged. Can you do a rebase on your side to origin/master?
@NanoCode012 BTW, actual DDP ops seem to be working flawlessly. My use case is 2x V100 training of v5x. I'll report the time difference once I get a few epochs completed.
nvidia-smi from epoch 0 and later, to test the high device 0 memory usage issue.
Epoch 0 train:
Epoch 0 test:
Epoch 1 train:
Just re-ran CI tests. Error on non-persistent buffers. Since this PR predates the 1.6 update, this seems logical. Ok, this PR will need some work to properly rebase with origin/master. Perhaps the simplest step is to merge #504 first (which just re-passed all CI tests now), and then tackle this one. @NanoCode012 sound good?
Hi @glenn-jocher, the reason this PR is not ready is that if we merge the "multi-node" PR later, I will have to redo another "logging" PR to address logging under multi-node.
Yep! Sounds good!
Thanks @glenn-jocher, with multi-node merged, I will get working on this PR to bring it up to speed and add multi-node logging. It should be done by tomorrow or the day after, as it is late here.
@NanoCode012 great, no rush. Glad to see we are making steady progress :)
Hi @glenn-jocher, for the GPU memory issue that was mentioned in #610, tkianai and I ran tests, and we found out that it spiked after the epoch 1 test: #610 (comment) and #610 (comment). Specifically, the issue lies in ... I am not sure why it does not happen for you. Maybe it is because your maximum memory is 16 GB and it is already training at 14 GB (near its maximum)? I will run my own test with 2 GPUs as comparison. I think this should be in a separate issue. Edit: Add table. Command:
python -m torch.distributed.launch --nproc_per_node 2 train.py --batch-size 64 --data coco.yaml --cfg yolov5s.yaml --weights ''
Edit 2: Due to losing track of time, I did not get to track down the in-betweens; however, we see that it doesn't spike as badly. I am confused why it spiked before. Was it because of 8 GPUs? Edit 3: The rebase may take a bit longer now. Got some other work to handle.
DDP time difference: 50 min/epoch on 1x V100 with --batch 16; 2x GPUs need 58% of the min/epoch of 1 GPU (about 29 min/epoch). Awesome, DDP seems to be working great!! 👍
Hi @glenn-jocher, I've done a rebase and fixed some leftovers. Sorry it took a while; I was a bit occupied with other stuff. Logging output for 4-GPU DDP:
Current master output for 4-GPU DDP:
Logging output on a non-master machine for multi-node training:
Edit: Sorry for the multiple small commits. I just kept finding small inconsistencies while viewing the differences via the site and was correcting them. Please squash when merging. Unit tests passed. This PR is finally ready. Please tell me if you want anything changed/explained.
@NanoCode012 thanks! I had an interesting idea that might make this change a little easier on the eyes. If we redefine print at the beginning of the functions that use logger.info, would this cause any problems? I thought this might improve readability, i.e.:

```python
def function():
    print = logger.info
    # logger.info('hello')
    print('hello')
```

Could this work, or am I forgetting something?
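For context, a minimal self-contained sketch of the idea (names here are illustrative, not from the PR): assigning print = logger.info shadows the built-in only within that function's local scope.

```python
import logging

logging.basicConfig(format='%(message)s', level=logging.INFO)
logger = logging.getLogger(__name__)

def train_step():
    print = logger.info      # shadows the built-in print inside this function only
    print('hello')           # actually calls logger.info('hello')

train_step()                 # 'hello' is emitted through the logging handler
print('plain print')         # outside the function, the built-in print is untouched
```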
I think it may be possible, but I'm not sure it's good for code readability later on. If someone else reads the code, they may miss the "logging" assignment.
@NanoCode012 yes, maybe you are right. I've never used the logging package before, but replacing it with print would obscure the command. Ok, I'll go ahead and merge!
@NanoCode012 it looks like the update works error-free, but some of the screen printing is different. For example, Colab isn't showing the CUDA information anymore, and Fusing ... should be followed by an updated model.info() showing the new parameter count. I'll take a look. Before:
After:
Hi Glenn, I'll take a look. It seems we have to call set_logging in the test/detect scripts as well. We might've missed other scripts too. Not sure what's the best choice. Maybe call set_logging in the select_device function, because they go hand in hand.
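A minimal sketch of that suggestion, with simplified signatures (the repo's actual utilities may differ): let select_device configure logging first, so every entry point gets both in one call.

```python
import logging
import torch

def set_logging(rank=-1):
    # Verbose logs only on the master process; other ranks log warnings only
    logging.basicConfig(format='%(message)s',
                        level=logging.INFO if rank in (-1, 0) else logging.WARN)

def select_device(device='', rank=-1):
    set_logging(rank)  # logging setup and device selection go hand in hand
    logger = logging.getLogger(__name__)
    cuda = device.lower() != 'cpu' and torch.cuda.is_available()
    logger.info('Using CUDA' if cuda else 'Using CPU')
    return torch.device('cuda:0' if cuda else 'cpu')
```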
Hi @glenn-jocher. New fix: master...NanoCode012:logging-fix. Could not let detect.py ...

test.py:
yolo.py:
export.py:

The solution is not the cleanest. It would be better to add
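To illustrate the duplication being described, a hypothetical sketch of the per-script fix (the import path and call site are assumptions, not the PR's exact code): each entry point makes the same setup call itself.

```python
# detect.py (sketch) -- test.py, yolo.py and export.py would repeat the same lines
from utils.general import set_logging, select_device  # hypothetical import path

if __name__ == '__main__':
    set_logging()                # configure logging before anything prints
    device = select_device('')   # then pick the device
    # ... rest of the script
```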
@NanoCode012 looks good! Can you submit a PR? Fusing is ok, I understand.
* Change print to logging
* Clean function set_logging
* Add line spacing
* Change leftover prints to log
* Fix scanning labels output
* Fix rank naming
* Change leftover print to logging
* Reorganized DDP variables
* Fix type error
* Make quotes consistent
* Fix spelling
* Clean function call
* Add line spacing
* Update datasets.py
* Update train.py

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
This PR fixes the second point of #463.
Tested on coco128
Old output (current master):
New output:
The Old Output still looks somewhat clean because it is only outputting from 2 devices; 4 or 8 creates a mess. I'm not sure whether I should use logging everywhere for consistency, or only in the places that are affected by multi-GPU training.

To do:
* Decide whether to change to logging everywhere or not. (Decision: only multi-GPU areas)
* Wait for the test/detect refactor for multi-GPU
* Wait for the multi-node DDP support merge
* Fix logging for multi-node

Edit: There is a "Scanning labels.." message which repeats, but it's part of tqdm, not print. Not sure how to handle it right now.
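One common way to handle that (a generic sketch, not necessarily what this PR ended up doing): gate the tqdm progress bar on rank, so only the master process draws it.

```python
from tqdm import tqdm

def scan_labels(files, rank=-1):
    # Only the master process (rank -1 or 0) shows the progress bar;
    # other DDP workers iterate over the same files silently.
    pbar = tqdm(files, desc='Scanning labels') if rank in (-1, 0) else files
    for f in pbar:
        pass  # parse the label file here
```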
🛠️ PR Summary
Made with ❤️ by Ultralytics Actions
🌟 Summary
Enhanced logging information for better tracking of model training processes.
📊 Key Changes
* Added logging module imports to various files.
* Replaced print statements with logger.info to standardize logging.
* Added a set_logging function to configure logging based on rank (necessary for distributed training setups).
* Renamed the rank variable for consistency in train.py.
* Updated create_dataloader methods across train.py and datasets.py to ensure correct data processing in distributed environments.

🎯 Purpose & Impact
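As a rough illustration of the last point (a generic rank-aware pattern, not the PR's exact code): in DDP, each process should draw a distinct shard of the dataset, typically via DistributedSampler.

```python
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

def create_dataloader(dataset, batch_size, rank=-1, workers=8):
    # rank == -1 means single-process training; otherwise shard per DDP rank.
    # DistributedSampler requires torch.distributed to be initialized.
    sampler = DistributedSampler(dataset) if rank != -1 else None
    return DataLoader(dataset,
                      batch_size=batch_size,
                      shuffle=(sampler is None),  # the sampler handles shuffling in DDP
                      sampler=sampler,
                      num_workers=workers)
```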