
Add support for W&B bounding box debugging and metric logging #1108

Closed
wants to merge 9 commits into from

Conversation

AyushExel (Contributor) commented Oct 9, 2020

This PR adds support for debugging models using W&B, but only if the library is already installed. When using W&B, YOLOv5 users can debug their models easily inside a customizable dashboard by logging and comparing performance metrics, system usage metrics (such as GPU memory), and predictions.

W&B is free for all individuals, academics and open source projects

Here is a live dashboard comparing all the standard YOLOv5 models

Features:

  • Bounding Box Debugging

Debug your bounding box predictions in real time.

  • Automatically log and compare the performance of multiple models


  • Supports Resuming

When training is resumed from a previous checkpoint, metrics and images will continue to be logged to the same W&B dashboard if it exists; otherwise, a new W&B run will be created.

  • Adds no dependencies

YOLOv5 will work as usual if wandb is not installed, and will only log metrics and media files to W&B when it is installed. To enable W&B logging, you just need to install the library with pip install wandb

To disable logging, set os.environ['WANDB_DISABLED'] = 'true' in your code or run wandb off from the command line.
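A minimal sketch of the optional-import pattern described above (illustrative only; the metric key and project name are assumptions, not the exact code in this PR):

try:
    import wandb  # logging is enabled only when the package is importable
except ImportError:
    wandb = None

if wandb and wandb.run is None:
    wandb.init(project='yolov5')  # start a run only when wandb is installed

if wandb:
    wandb.log({'metrics/precision': 0.7})  # example metric key; real keys depend on the training loop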

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Enhanced YOLOv5 testing and training modules with improved logging, code refactoring, and Weights & Biases integration.

📊 Key Changes

  • Added save_conf and num_predictions arguments to control output during testing.
  • Integrated Weights & Biases (W&B) for better experiment tracking and visualization.
  • Improved file management during model testing, such as creating output directories and saving results.
  • Code refactoring for clarity, such as renaming variables and updating comments.
  • Added bounding box logging to W&B for better debugging.
  • Modified JSON saving logic to reflect the correct file naming and handling.
  • Enhanced the training pipeline with W&B logging, including bounding box images if W&B is initialized.

🎯 Purpose & Impact

  • Enhanced logging: With W&B integration, users gain access to advanced experiment tracking and visualization tools. 📊
  • Robust file handling: Ensures consistent output directory creation and deletion to avoid errors and keep file structure clean. 🗂️
  • Better debuggability: Bounding box image logging helps users visualize model performance and debug more effectively. 🖼️
  • Clearer code: Refactoring code improves readability and maintainability for developers. 💻
  • Consistent naming: Standardized JSON file naming aligns with expected outputs for easier result identification. 🏷️

Users will experience a more seamless testing and training process with enhanced tools for monitoring their model's performance, leading to a smoother development cycle and potential improvements in model accuracy.

github-actions bot (Contributor) left a comment

Hello @AyushExel, thank you for submitting a PR! To allow your work to be integrated as seamlessly as possible, we advise you to:

  • Verify your PR is up-to-date with origin/master. If your PR is behind origin/master, update it by running the following, replacing 'feature' with the name of your local branch:
git remote add upstream https://github.com/ultralytics/yolov5.git
git fetch upstream
git checkout feature  # <----- replace 'feature' with local branch name
git rebase upstream/master
git push -u origin -f
  • Verify all Continuous Integration (CI) checks are passing.
  • Reduce changes to the absolute minimum required for your bug fix or feature addition. "It is not daily increase but daily decrease, hack away the unessential. The closer to the source, the less wastage there is." -Bruce Lee

glenn-jocher (Member)

@AyushExel very interesting, thanks for the PR! I was not aware of this logging tool. One thing I noticed is that you have a ceiling in place on box quantity. An alternative approach, which would allow you to remove this argument and which we currently use for inference and for plotting predictions (i.e. test_batch0.jpg), is to only plot boxes above a sensible real-world confidence:

yolov5/utils/general.py

Lines 1078 to 1081 in 5fac5ad

if gt or conf[j] > 0.3:  # 0.3 conf thresh
    label = '%s' % cls if gt else '%s %.1f' % (cls, conf[j])
    plot_one_box(box, mosaic, label=label, color=color, line_thickness=tl)

Not to be confused with the mAP confidence threshold, which is set as close to zero as feasible to record the highest mAP:

yolov5/test.py

Line 257 in 5fac5ad

parser.add_argument('--conf-thres', type=float, default=0.001, help='object confidence threshold')
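For illustration, a plotting-style confidence filter along those lines could be applied when building the W&B box data (a minimal sketch, assuming boxes are (x1, y1, x2, y2, conf, cls) rows as in the PR's logging code; the threshold value is only an example, not part of this PR):

plot_conf_thres = 0.3  # plotting threshold, distinct from the mAP --conf-thres above
bbox_data = [{
    "position": {"minX": float(xyxy[0]), "minY": float(xyxy[1]),
                 "maxX": float(xyxy[2]), "maxY": float(xyxy[3])},
    "class_id": int(cls),
    "scores": {"class_score": float(conf)},
    "domain": "pixel"
} for *xyxy, conf, cls in pred if float(conf) > plot_conf_thres]  # keep only confident boxes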

Also, a bit of a separate topic, but based on your training plots you may not have trained for sufficient epochs. One clue to this (other than validation confidences that do not overfit) is low confidences on your detections (i.e. around 40% confidence in your road-traffic gif).

AyushExel (Contributor, Author) commented Oct 10, 2020

@glenn-jocher The ceiling is on the number of images being logged per epoch, not on the number of bounding boxes; all detected bounding boxes are logged. This means that at most 100 images will be logged per epoch. This serves two purposes:

  • To make sure the UI feels snappy
  • It's easier to debug fewer images.

Also, plotting all the boxes serves a purpose: W&B's bounding box debugger supports tuning confidence scores in real time (like the GIF above), so I thought applying a filter would not make sense.

But we can relax this restriction in the future. Would love your thoughts on this as well.

As for the training, I trained all the models for 100 epochs to keep the comparison fair. But I'm training again for 200 epochs to improve confidence.

glenn-jocher (Member)

@AyushExel ah, got it, thanks for the clarification. We've found image logging to be a very costly operation, in wall-clock time as well as hard-drive space, hence the current approach of only plotting the first few batches of the first and final epoch during training.

The slowdown is more pronounced for smaller datasets, like the COCO128 tutorial, where each epoch may only last a few seconds, but it will naturally affect all datasets to varying degrees. Walking this line is part of the compromise we are trying to strike between adding features and speeding up training.

AyushExel (Contributor, Author) commented Oct 27, 2020

@glenn-jocher Here are some results of the profiling tests that I ran for the wandb logging code.

Time unit: 1e-3 seconds (ms)

Time profile for all steps in W&B image logging:

Line   Hits     Time   Time/hit   %Time
 146    128      0.6        0.0     0.0    if len(wandb_image_log) < num_predictions:
 147     50      1.5        0.0     0.0        x = pred.clone()
 148    100    875.5        8.8     3.8        bbox_data = [{
 149                                               "position": {
 150                                                   "minX": float(xyxy[0]),
 151                                                   "minY": float(xyxy[1]),
 152                                                   "maxX": float(xyxy[2]),
 153                                                   "maxY": float(xyxy[3])
 154                                               },
 155                                               "class_id": int(cls),
 156                                               "scores": {
 157                                                   "class_score": float(conf)
 158                                               },
 159                                               "domain": "pixel"
 160     50      0.2        0.0     0.0        } for *xyxy, conf, cls in x]
 161     50   5889.7      117.8    25.5        im = wandb.Image(img[si], boxes={"predictions": {"box_data": bbox_data}})
 162     50      0.3        0.0     0.0        wandb_image_log.append(im)

 224      1      0.0        0.0     0.0    if len(wandb_image_log) > 0:
 225      1    108.6      108.6     0.5        wandb.log({"outputs": wandb_image_log})

Time profile for all steps in the existing image plotting:

Line   Hits     Time   Time/hit   %Time
 217     32      0.1        0.0     0.0        if plots and batch_i < 1:
 218      1      0.1        0.1     0.0            f = save_dir / ('test_batch%g_gt.jpg' % batch_i)  # filename
 219      1     22.9       22.9     0.1            plot_images(img, targets, paths, str(f), names)  # ground truth
 220      1      0.0        0.0     0.0            f = save_dir / ('test_batch%g_pred.jpg' % batch_i)
 221      1    347.2      347.2     1.5            plot_images(img, output_to_target(output, width, height), paths, str(f), names)  # predictions

Summary:

One thing to note here is that the time taken will remain constant on a given system, as we're logging a constant number of images (50 in this case). For smaller datasets (like COCO128) the penalty will be noticeable, but intuitively we should never log around 50% of the dataset for debugging. For larger datasets the penalty won't be that noticeable. In either case, the number of images to be logged should be available as an argparse parameter to give the user full control, as sketched below.
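A minimal sketch of such a flag (the name --log-imgs and the surrounding variable names are assumptions for illustration, not what this PR uses):

parser.add_argument('--log-imgs', type=int, default=100,
                    help='max W&B images to log per epoch (0 disables image logging)')

# later, inside the test loop, cap the number of logged images at the user-provided limit
if wandb and len(wandb_image_log) < opt.log_imgs:
    wandb_image_log.append(wandb.Image(img[si], boxes={'predictions': {'box_data': bbox_data}}))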

glenn-jocher (Member)

@AyushExel thanks for the profile info! Before going any further we'll want to rebase this against master, as there have been many commits since the PR was originally submitted. We have an action that can sometimes do this automatically; I'll try it, but failing that you'll likely need to do it by hand by following the directions in #1108 (review).

/rebase

glenn-jocher added the enhancement (New feature or request) label on Oct 28, 2020
glenn-jocher self-requested a review on October 28, 2020, 16:06
glenn-jocher (Member) commented Oct 28, 2020

@AyushExel ok, I ran some speed tests for an alternative implementation.

If you put a breakpoint at L146 of test.py (after your rebase) and test the following lines, you'll see that the second accomplishes the same as the first in about 2% of the time (94 µs vs 5540 µs, or about 50X faster). So you should switch to .tolist() for sending the values to your dict of dicts.

If wandb can accept a Python float (the default type produced by the .tolist() method) for the class_id key then you are all set; otherwise you can retain the "class_id": int(cls) line.

%timeit [[float(x1), float(x2), float(x3), float(x4), float(x5), float(x6)] for x1, x2, x3, x4, x5, x6 in pred.clone()]
5.54 ms ± 33.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit [[x1, x2, x3, x4, x5, x6] for x1, x2, x3, x4, x5, x6 in pred.clone().tolist()]
94.1 µs ± 409 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
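For illustration, the PR's box-data construction could adopt that suggestion roughly as follows (a sketch based on the profiled code above, not the final implementation):

# Convert the prediction tensor to nested Python lists once, instead of calling
# float() on individual tensor elements inside the comprehension.
bbox_data = [{
    "position": {"minX": x1, "minY": y1, "maxX": x2, "maxY": y2},
    "class_id": int(cls),  # keep the int() cast unless wandb accepts a float class_id
    "scores": {"class_score": conf},
    "domain": "pixel"
} for x1, y1, x2, y2, conf, cls in pred.tolist()]
im = wandb.Image(img[si], boxes={"predictions": {"box_data": bbox_data}})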

AyushExel (Contributor, Author)

@glenn-jocher Thanks for pointing this out. I was not able to merge this branch with the updated version of the master branch as there were a lot of conflicts. I ended up with broken code when trying to resolve the conflicts. So, I started fresh with a clean branch and made a new PR with the changes you requested.

Here is the new PR: #1235. Feel free to close this PR.

glenn-jocher (Member)

That's a good idea. I was actually thinking of recommending that. Will close.
