wandb upload/syncing very slow #6389

DavidBaldsiefen · 2022-01-22T02:45:38Z

Search before asking

I have searched the YOLOv5 issues and discussions and found no similar questions.

Question

I recently noticed that wandb syncing at the end of a training run is always extremely slow for me. It often takes around 10 minutes to upload less than 5 MB of data. The progress indicator stays at "0.00MB" for most of that time and then jumps to the maximum, where it stays for another 1-2 minutes.

This has become especially problematic as I am trying to use hyperparameter evolution, where data is synced once after every run. Right now when training 10 epochs per generation, syncing takes longer than training itself.

I already checked my internet connection through manual uploads and saw significantly faster speeds.

My questions are as follows:

Is there any known reason why syncing would be so slow?
Is it possible to make wandb only sync once after the 300 evolutions have been performed, as opposed to syncing once per evolution?
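
One possible way to defer uploads until all generations have finished is wandb's offline mode: with the environment variable WANDB_MODE=offline set, runs are written to disk and can be uploaded later with the `wandb sync` CLI. The sketch below is not confirmed anywhere in this thread and has not been tested against YOLOv5's evolve loop. The helper names `offline_env` and `sync_commands` are hypothetical, and the code assumes the runs land in the default `wandb` directory as `offline-run-*` folders.

```python
import glob
import os


def offline_env(base_env=None):
    """Return a copy of the environment with W&B switched to offline mode."""
    env = dict(os.environ if base_env is None else base_env)
    # With WANDB_MODE=offline, wandb writes runs to disk instead of uploading them.
    env["WANDB_MODE"] = "offline"
    return env


def sync_commands(wandb_dir="wandb"):
    """Build one `wandb sync` command per offline run directory found on disk.

    Assumption: offline runs are stored as `<wandb_dir>/offline-run-*`.
    """
    run_dirs = sorted(glob.glob(os.path.join(wandb_dir, "offline-run-*")))
    return [["wandb", "sync", run_dir] for run_dir in run_dirs]
```

A caller could pass `offline_env()` as the `env` argument of `subprocess.run` when launching the evolve job, and afterwards run each command returned by `sync_commands()` to upload everything in one go.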

Additional

No response

github-actions · 2022-01-22T02:46:15Z

👋 Hello @DavidBaldsiefen, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://ultralytics.com or email Glenn Jocher at glenn.jocher@ultralytics.com.

Requirements

Python>=3.6.0 with all requirements.txt installed including PyTorch>=1.7. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Google Colab and Kaggle notebooks with free GPU
Google Cloud Deep Learning VM. See GCP Quickstart Guide
Amazon Deep Learning AMI. See AWS Quickstart Guide
Docker Image. See Docker Quickstart Guide

Status

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu every 24 hours and on every commit.

glenn-jocher · 2022-01-22T06:16:54Z

@DavidBaldsiefen thanks for the feedback! @AyushExel are there any network issues recently on the wandb side that might be causing delays?

AyushExel · 2022-01-23T14:28:48Z

@glenn-jocher There are no known network issues. It probably has something to do with the user's network. Nevertheless, there are some things we can do to reduce the size of the upload.

First, do you like the idea of disabling media logging in evolve?
Second, should we use just one wandb run for all runs in an evolve?

glenn-jocher · 2022-01-24T18:54:40Z

@AyushExel I think disabling media logging is a good idea during evolve. I don't produce any of the normal imagery like confusion matrices and validation batch images etc. when evolve is on. Let's start with that. Thanks!
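
The idea of skipping media during evolve can be sketched as a simple gate on what gets handed to the logger. This is only an illustration of the approach discussed here, not the code from the eventual PR; the function name `select_log_payload` and its arguments are hypothetical.

```python
def select_log_payload(metrics, media, evolve):
    """Return what to send to the logger.

    Scalar metrics are always kept; media entries (images, plots) are
    included only when hyperparameter evolution is off.
    """
    payload = dict(metrics)
    if not evolve:
        payload.update(media)
    return payload
```

With `evolve=True`, only the lightweight scalar metrics are returned, so each generation uploads far less data.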

glenn-jocher · 2022-02-12T12:05:26Z

@DavidBaldsiefen good news 😃! Your original issue may now be fixed ✅ in PR #6617 by @AyushExel. To receive this update:

Git – git pull from within your yolov5/ directory or git clone https://github.com/ultralytics/yolov5 again
PyTorch Hub – Force-reload model = torch.hub.load('ultralytics/yolov5', 'yolov5s', force_reload=True)
Notebooks – View updated notebooks
Docker – sudo docker pull ultralytics/yolov5:latest to update your image

Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀!

DavidBaldsiefen added the question label Jan 22, 2022

AyushExel mentioned this issue Feb 11, 2022 in W&B: don't log media in evolve #6617 (merged)

glenn-jocher closed this as completed in #6617 Feb 12, 2022
