
Add support for W&B bounding box debugging and metric logging #1108

Closed
wants to merge 9 commits into from

Conversation

AyushExel (Contributor) commented Oct 9, 2020

This PR adds support for debugging models using W&B, but only if the library is already installed. When using W&B, YOLOv5 users can debug their models easily inside a customizable dashboard by logging and comparing performance metrics, system usage metrics (such as GPU memory), and predictions.

W&B is free for all individuals, academics and open source projects

Here is a live dashboard comparing all the standard YOLOv5 models

Features:

  • Bounding Box Debugging

Debug your bounding box predictions in real time.

  • Automatically log and compare the performance of multiple models


  • Supports Resuming

When training is resumed from a previous checkpoint, metrics and images will continue to be logged to the same W&B dashboard if it exists; otherwise, a new W&B run will be created.

  • Adds no dependencies

YOLOv5 will work as usual if wandb is not installed, and will only log metrics and media files to W&B when it is installed. To enable W&B logging, you just need to install the library with pip install wandb

To disable logging, set os.environ['WANDB_DISABLED'] = 'true' in your code or run wandb off from the command line.
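A minimal sketch of the optional-import pattern described above (illustrative only; the metric key and project name are assumptions, not the exact code in this PR):

try:
    import wandb  # logging is enabled only when the package is importable
except ImportError:
    wandb = None

if wandb and wandb.run is None:
    wandb.init(project='yolov5')  # start a run only when wandb is installed

if wandb:
    wandb.log({'metrics/precision': 0.7})  # example metric key; real keys depend on the training loop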

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Enhanced YOLOv5 testing and training modules with improved logging, code refactoring, and Weights & Biases integration.

📊 Key Changes

  • Added save_conf and num_predictions arguments to control output during testing.
  • Integrated Weights & Biases (W&B) for better experiment tracking and visualization.
  • Improved file management during model testing, such as creating output directories and saving results.
  • Code refactoring for clarity, such as renaming variables and updating comments.
  • Added bounding box logging to W&B for better debugging.
  • Modified JSON saving logic to reflect the correct file naming and handling.
  • Enhanced the training pipeline with W&B logging, including bounding box images if W&B is initialized.

🎯 Purpose & Impact

  • Enhanced logging: With W&B integration, users gain access to advanced experiment tracking and visualization tools. 📊
  • Robust file handling: Ensures consistent output directory creation and deletion to avoid errors and keep file structure clean. 🗂️
  • Better debuggability: Bounding box image logging helps users visualize model performance and debug more effectively. 🖼️
  • Clearer code: Refactoring code improves readability and maintainability for developers. 💻
  • Consistent naming: Standardized JSON file naming aligns with expected outputs for easier result identification. 🏷️

Users will experience a more seamless testing and training process with enhanced tools for monitoring their model's performance, leading to a smoother development cycle and potential improvements in model accuracy.

github-actions bot (Contributor) left a comment

Hello @AyushExel, thank you for submitting a PR! To allow your work to be integrated as seamlessly as possible, we advise you to:

  • Verify your PR is up-to-date with origin/master. If your PR is behind origin/master, update it by running the following, replacing 'feature' with the name of your local branch:
git remote add upstream https://github.com/ultralytics/yolov5.git
git fetch upstream
git checkout feature  # <----- replace 'feature' with local branch name
git rebase upstream/master
git push -u origin -f
  • Verify all Continuous Integration (CI) checks are passing.
  • Reduce changes to the absolute minimum required for your bug fix or feature addition. "It is not daily increase but daily decrease, hack away the unessential. The closer to the source, the less wastage there is." -Bruce Lee

glenn-jocher (Member)

@AyushExel very interesting, thanks for the PR! I was not aware of this logging tool. One thing I noticed is that you have a ceiling in place on box quantity. An alternative approach, which would allow you to remove this argument and which we currently use for inference and for plotting predictions (i.e. test_batch0.jpg), is to only plot boxes above a sensible real-world confidence:

yolov5/utils/general.py

Lines 1078 to 1081 in 5fac5ad

if gt or conf[j] > 0.3:  # 0.3 conf thresh
    label = '%s' % cls if gt else '%s %.1f' % (cls, conf[j])
    plot_one_box(box, mosaic, label=label, color=color, line_thickness=tl)

Not to be confused with the mAP confidence threshold, which is set as close to zero as feasible to record the highest mAP:

yolov5/test.py

Line 257 in 5fac5ad

parser.add_argument('--conf-thres', type=float, default=0.001, help='object confidence threshold')
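For illustration, a plotting-style confidence filter along those lines could be applied when building the W&B box data (a minimal sketch, assuming boxes are (x1, y1, x2, y2, conf, cls) rows as in the PR's logging code; the threshold value is only an example, not part of this PR):

plot_conf_thres = 0.3  # plotting threshold, distinct from the mAP --conf-thres above
bbox_data = [{
    "position": {"minX": float(xyxy[0]), "minY": float(xyxy[1]),
                 "maxX": float(xyxy[2]), "maxY": float(xyxy[3])},
    "class_id": int(cls),
    "scores": {"class_score": float(conf)},
    "domain": "pixel"
} for *xyxy, conf, cls in pred if float(conf) > plot_conf_thres]  # keep only confident boxes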

Also, a bit of a separate topic, but based on your training plots you may not have trained for sufficient epochs. One clue to this (other than validation confidences that do not overfit) is low confidences on your detections (i.e. around 40% confidence in your road-traffic gif).

AyushExel (Contributor, Author) commented Oct 10, 2020

@glenn-jocher The ceiling is on the number of images being logged per epoch, not on the number of bounding boxes; all detected bounding boxes are logged. This means that at most 100 images will be logged per epoch. This serves two purposes:

  • To make sure the UI feels snappy
  • It's easier to debug fewer images.

Also, plotting all the boxes serves a purpose: W&B's bounding box debugger supports tuning confidence scores in real time (like the GIF above), so I thought applying a filter would not make sense.

But we can relax this restriction in the future. Would love your thoughts on this as well.

As for the training, I trained all the models for 100 epochs to keep the comparison fair. But I'm training again for 200 epochs to improve confidence.

glenn-jocher (Member)

@AyushExel ah, got it, thanks for the clarification. We've found image logging to be a very costly operation, in wall-clock time as well as hard-drive space, hence the current approach of only plotting the first few batches of the first and final epoch during training.

The slowdown is more pronounced for smaller datasets, like the COCO128 tutorial, where each epoch may only last a few seconds, but it will naturally affect all datasets to varying degrees. Walking this line is part of the compromise we are trying to strike between adding features and speeding up training.

AyushExel (Contributor, Author) commented Oct 27, 2020

@glenn-jocher Here are some results of the profiling tests that I ran for the wandb logging code.

Time unit: 1e-3 seconds (ms)

Time profile for all steps in W&B image logging:

Line   Hits     Time   Time/hit   %Time
 146    128      0.6        0.0     0.0    if len(wandb_image_log) < num_predictions:
 147     50      1.5        0.0     0.0        x = pred.clone()
 148    100    875.5        8.8     3.8        bbox_data = [{
 149                                               "position": {
 150                                                   "minX": float(xyxy[0]),
 151                                                   "minY": float(xyxy[1]),
 152                                                   "maxX": float(xyxy[2]),
 153                                                   "maxY": float(xyxy[3])
 154                                               },
 155                                               "class_id": int(cls),
 156                                               "scores": {
 157                                                   "class_score": float(conf)
 158                                               },
 159                                               "domain": "pixel"
 160     50      0.2        0.0     0.0        } for *xyxy, conf, cls in x]
 161     50   5889.7      117.8    25.5        im = wandb.Image(img[si], boxes={"predictions": {"box_data": bbox_data}})
 162     50      0.3        0.0     0.0        wandb_image_log.append(im)

 224      1      0.0        0.0     0.0    if len(wandb_image_log) > 0:
 225      1    108.6      108.6     0.5        wandb.log({"outputs": wandb_image_log})

Time profile for all steps in the existing image plotting:

Line   Hits     Time   Time/hit   %Time
 217     32      0.1        0.0     0.0        if plots and batch_i < 1:
 218      1      0.1        0.1     0.0            f = save_dir / ('test_batch%g_gt.jpg' % batch_i)  # filename
 219      1     22.9       22.9     0.1            plot_images(img, targets, paths, str(f), names)  # ground truth
 220      1      0.0        0.0     0.0            f = save_dir / ('test_batch%g_pred.jpg' % batch_i)
 221      1    347.2      347.2     1.5            plot_images(img, output_to_target(output, width, height), paths, str(f), names)  # predictions

Summary:

One thing to note here is that the time taken will remain constant on a given system, as we're logging a constant number of images (50 in this case). For smaller datasets (like COCO128) the penalty will be noticeable, but intuitively we should never log around 50% of the dataset for debugging. For larger datasets the penalty won't be that noticeable. In either case, the number of images to be logged should be available as an argparse parameter to give the user full control, as sketched below.
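A minimal sketch of such a flag (the name --log-imgs and the surrounding variable names are assumptions for illustration, not what this PR uses):

parser.add_argument('--log-imgs', type=int, default=100,
                    help='max W&B images to log per epoch (0 disables image logging)')

# later, inside the test loop, cap the number of logged images at the user-provided limit
if wandb and len(wandb_image_log) < opt.log_imgs:
    wandb_image_log.append(wandb.Image(img[si], boxes={'predictions': {'box_data': bbox_data}}))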

glenn-jocher (Member)

@AyushExel thanks for the profile info! Before going any further we'll want to rebase this against master, as there have been many commits since the PR was originally submitted. We have an action that can sometimes do this automatically; I'll try it, but failing that you'll likely need to do it by hand by following the directions in #1108 (review).

/rebase

glenn-jocher added the enhancement (New feature or request) label on Oct 28, 2020
glenn-jocher self-requested a review on October 28, 2020, 16:06
glenn-jocher (Member) commented Oct 28, 2020

@AyushExel ok, I ran some speed tests for an alternative implementation.

If you put a breakpoint at L146 of test.py (after your rebase) and test the following lines, you'll see that the second accomplishes the same as the first in about 2% of the time (94 µs vs 5540 µs, or about 50X faster). So you should switch to .tolist() for sending the values to your dict of dicts.

If wandb can accept a Python float (the default type produced by the .tolist() method) for the class_id key then you are all set; otherwise you can retain the "class_id": int(cls) line.

%timeit [[float(x1), float(x2), float(x3), float(x4), float(x5), float(x6)] for x1, x2, x3, x4, x5, x6 in pred.clone()]
5.54 ms ± 33.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit [[x1, x2, x3, x4, x5, x6] for x1, x2, x3, x4, x5, x6 in pred.clone().tolist()]
94.1 µs ± 409 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
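For illustration, the PR's box-data construction could adopt that suggestion roughly as follows (a sketch based on the profiled code above, not the final implementation):

# Convert the prediction tensor to nested Python lists once, instead of calling
# float() on individual tensor elements inside the comprehension.
bbox_data = [{
    "position": {"minX": x1, "minY": y1, "maxX": x2, "maxY": y2},
    "class_id": int(cls),  # keep the int() cast unless wandb accepts a float class_id
    "scores": {"class_score": conf},
    "domain": "pixel"
} for x1, y1, x2, y2, conf, cls in pred.tolist()]
im = wandb.Image(img[si], boxes={"predictions": {"box_data": bbox_data}})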

AyushExel (Contributor, Author)

@glenn-jocher Thanks for pointing this out. I was not able to merge this branch with the updated version of the master branch as there were a lot of conflicts. I ended up with broken code when trying to resolve the conflicts. So, I started fresh with a clean branch and made a new PR with the changes you requested.

Here is the new PR: #1235. Feel free to close this PR.

glenn-jocher (Member)

That's a good idea. I was actually thinking of recommending that. Will close.
