wandb: Network error (ReadTimeout), entering retry loop. See wandb\debug-internal.log for full traceback. #2840

Zigars · 2021-04-19T02:45:54Z

Current repo: yolov5-5.0 release version
Common dataset: VisDrone.yaml
Common environment: Colab, Google Cloud, or Docker image. See https://github.com/ultralytics/yolov5#environments

🐛 Bug

I am trying to use your repo to train YOLOv4's network, because the YOLOv4 code (https://github.com/WongKinYiu/PyTorch_YOLOv4) is outdated, unmaintained, and has many bugs.
When I train my own yolov4-tiny.yaml, I get this error. I think it happens because my network cannot connect to wandb's server. Before today I could train normally, but a few minutes ago I tried running python train.py many times and training still will not start.

To Reproduce (REQUIRED)

python train.py

Output:

YOLOv5  2021-4-15 torch 1.7.1 CUDA:0 (GRID V100D-32Q, 32638.0MB)

Namespace(adam=False, artifact_alias='latest', batch_size=64, bbox_interval=-1, bucket='', cache_images=False, cfg='models/yolov4-tiny.yaml', data='datai/Visdrone.yaml', device='', entity=None, epochs=300, evolve=False, exist_ok=False, global_rank=-1, hyp='data/hyp.scratch.yaml', image_weights=False, img_size=[640, 640], label_smoothing=0.0, linear_lr=False, local_rank=-1, multi_scale=False, name='exp', noautoanchor=False, nosave=False, notest=False, project='runs/train', quad=False, rect=False, resume=False, save_dir='runs\\train\\exp8', save_period=-1, single_cls=False, sync_bn=False, total_batch_size=64, upload_dataset=False, weights='', workers=8, world_size=1)
tensorboard: Start with 'tensorboard --logdir runs/train', view at http://localhost:6006/
hyperparameters: lr0=0.01, lrf=0.2, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0
wandb: Currently logged in as: zigar (use `wandb login --relogin` to force relogin)
wandb: Network error (ReadTimeout), entering retry loop. See wandb\debug-internal.log for full traceback.

Environment

OS: Windows 10
GPU: GRID V100D-32Q, 32638.0MB

github-actions · 2021-04-19T02:46:30Z

👋 Hello @Zigars, thank you for your interest in 🚀 YOLOv5! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://www.ultralytics.com or email Glenn Jocher at glenn.jocher@ultralytics.com.

Requirements

Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.7. To install run:

$ pip install -r requirements.txt

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Google Colab and Kaggle notebooks with free GPU
Google Cloud Deep Learning VM. See GCP Quickstart Guide
Amazon Deep Learning AMI. See AWS Quickstart Guide
Docker Image. See Docker Quickstart Guide

Status

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), testing (test.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu every 24 hours and on every commit.

Zigars · 2021-04-19T02:53:37Z

Also, when I switch to yolov5s.yaml I still cannot train normally. Is there some way to turn off wandb so that I can train normally? I have logged in to wandb before, and my network cannot open wandb.ai either.

Zigars · 2021-04-19T03:10:40Z

I uninstalled wandb and that solved this bug...

glenn-jocher · 2021-04-19T09:24:18Z

@Zigars hi sorry to hear about your logging issues! Sometimes network interruptions can prevent logging to wandb, though this should not cause an error. @AyushExel, our W&B contact may have some more info.

I saw you have a VisDrone.yaml. I've seen this dataset is pretty popular, so please consider submitting a Pull Request to add your VisDrone.yaml and, if possible, a get_visdrone.sh file to help future users auto-download this dataset. Thank you!

Zigars · 2021-04-19T10:15:52Z

@glenn-jocher I'm so happy to get your reply! I enjoy using your yolov5 code to train object detection tasks, it's a great repo! Recently I have been doing some research that uses YOLO to detect objects in the VisDrone dataset. I'm sorry that I'm not familiar with git and scripts, so a PR or a get_visdrone.sh is difficult for me. If you want the VisDrone.yaml and the ready-made VisDrone dataset (I downloaded it from VisDrone and transformed it to COCO format), I can send these to your email.

glenn-jocher · 2021-04-19T10:25:50Z

@Zigars hey great! I think you can attach files directly to these messages, so maybe you can just attach your visdrone yaml and the code you used to download and convert to YOLO format and I could do the PR.

Zigars · 2021-04-19T13:47:59Z

Hi @glenn-jocher, I spent some time rewriting my conversion code, because the original code was a little ugly. :(

I am giving you the visdrone.yaml, the code trans_yolo.py, and a VisDrone-test.zip dataset archive.

visdrone.yaml includes the data path, nc=10, and the class names.

You can convert VisDrone to YOLO format by using trans_yolo.py.

Because the original dataset is too large, you can download VisDrone-DET from GitHub and put the annotations and images in one directory, VisDrone-DET, laid out like VisDrone-test.zip.

VisDrone-test.zip is a test set for the conversion code. It includes test-dev, train, and val data, with 10 images and annotations for each of the 3 splits. You can delete every file except the annotations and images, then run python trans_yolo.py to test the conversion code. Remember to fix the path: I provide both a relative path and an absolute path option, and I tested both successfully with your train.py code.
VisDrone.zip

AyushExel · 2021-04-19T14:54:31Z

@Zigars Thanks for filing this issue. As @glenn-jocher said, network interruptions can cause wandb to not log data to the dashboard, but they should not cause errors. Can you please confirm what version of the wandb client you're using (run pip list and see what version of wandb is installed)?
But if you're facing network issues and you're not able to log files using wandb at the moment, there's another recommended way to handle this case:

Run wandb offline to enable offline mode. This will track your experiments but will not try to upload anything to the cloud.
Run wandb sync when you're back online to sync everything to your wandb dashboard.
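For anyone who prefers to switch modes from a launcher script rather than the CLI, here is a minimal sketch of the same idea. It assumes the installed wandb client honors the WANDB_MODE environment variable with the value offline (older clients may expect a different value), and the helper names offline_wandb_env and train_offline are hypothetical, not part of YOLOv5 or wandb.

```python
import os
import subprocess
import sys


def offline_wandb_env(base_env=None):
    """Return a copy of the environment with W&B asked to run in offline mode."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: the wandb client reads WANDB_MODE and skips uploads when it is "offline".
    env["WANDB_MODE"] = "offline"
    return env


def train_offline(train_args):
    """Run train.py in a child process whose environment requests offline logging."""
    cmd = [sys.executable, "train.py", *train_args]
    # check=False: return the exit code instead of raising, so the caller decides what to do.
    return subprocess.run(cmd, env=offline_wandb_env(), check=False).returncode
```

Calling train_offline(["--cfg", "models/yolov5s.yaml"]) from the repository root would start training with that environment; the runs stored locally can then be uploaded later with wandb sync, as described above.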

Zigars · 2021-04-19T15:29:09Z

@AyushExel Sorry, I already solved this bug by uninstalling wandb. I remember that I had updated to the latest version of wandb. The terminal would hang, and train.py still did not work at that time. I can give you a debug-internal.log so that you can fix this bug. Thank you for your reply!
It's 23:24 now in China, so I will try your recommended way tomorrow. Good night!
debug-internal.log

glenn-jocher · 2021-04-21T12:25:23Z

@Zigars awesome thanks! I'll see if I can convert this into a PR so future users can autodownload VisDrone more easily.

TODO: VisDrone autodownload PR

glenn-jocher · 2021-04-21T19:18:20Z

@Zigars I've used your example to create a working visdrone.yaml with autodownload capability in PR #2882. Please take a look there and let me know what you think. One thing I don't understand is this line, I'm guessing this is an ignore region?

if row[4] == '0':  # TODO explain this line
    continue

glenn-jocher · 2021-04-21T19:20:21Z

@Zigars actually, even better, could you update this line in the PR with a better explanation for this? Then you will also show up as an official PR author for the repo, giving you credit for your work!

Zigars · 2021-04-22T01:25:06Z

@glenn-jocher hi! I'm SOOOO happy to contribute the PR for yolov5! Thanks so much! I can answer your question: this line is there because the original VisDrone-DET has 12 classes. It includes two extra classes, 'ignored regions' and 'others'. Skipping original annotations with row[4] == '0' deletes these two classes, so that we get the 10 useful classes to train our network. Also, this dataset is particularly difficult to train on. With yolov5s and 300 epochs I can only get 17.0 mAP, maybe because most of the targets are small. Recently I have been running an experiment to get faster and higher-mAP small object detection.
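To make the filtering step concrete, here is a small sketch of converting one annotation row into a YOLO label. It is not the trans_yolo.py attached earlier. The column order (left, top, width, height, score, category, ...) and the shift from categories 1..10 to classes 0..9 are assumptions about the VisDrone-DET annotation files, and the function name visdrone_to_yolo is hypothetical.

```python
def visdrone_to_yolo(line, img_w, img_h):
    """Convert one comma-separated VisDrone-DET annotation row to a YOLO label.

    Assumed columns: left, top, width, height, score, category, truncation, occlusion.
    Returns (cls, x_center, y_center, width, height) with the box normalized to 0..1,
    or None when the row is skipped.
    """
    row = line.strip().split(",")
    if row[4] == "0":  # the check discussed in this thread: drop rows that are not real targets
        return None
    cls = int(row[5]) - 1  # assumption: categories 1..10 map to YOLO classes 0..9
    left, top, w, h = map(int, row[:4])
    x_center = (left + w / 2) / img_w  # YOLO stores the box center, not the top-left corner
    y_center = (top + h / 2) / img_h
    return cls, x_center, y_center, w / img_w, h / img_h
```

For a 100x200 image, the row "10,20,30,40,1,4,0,0" would become class 3 with a box centered at (0.25, 0.2), while any row whose fifth field is 0 returns None and is left out of the label file.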

glenn-jocher · 2021-04-22T20:01:18Z

@Zigars ah I understand now! Yes the dataset is difficult. With this sort of data (very small objects) you should really train at higher resolution with a P6 model, i.e.:

python train.py --data visdrone.yaml --weights yolov5m6.pt --batch-size 32 --img 1280

EDIT: actually maybe the P6 model doesn't matter, as it's targeted for larger objects, but definitely a higher resolution like 1280 or 1920 would help this dataset.
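A quick back-of-the-envelope sketch of why resolution matters here, assuming the image's long side is resized to the --img value. The 20-pixel object and 2000-pixel image are hypothetical numbers chosen for illustration, not measurements from VisDrone, and scaled_object_size is a hypothetical helper.

```python
def scaled_object_size(obj_px, image_long_side_px, train_img_size):
    """Approximate object size in pixels after the long side is resized to train_img_size."""
    return obj_px * train_img_size / image_long_side_px


# Hypothetical example: a 20 px object in an image whose long side is 2000 px.
for size in (640, 1280, 1920):
    print(size, round(scaled_object_size(20, 2000, size), 1))
```

Under these assumptions the object shrinks to about 6.4 pixels at 640 but keeps about 12.8 pixels at 1280 and 19.2 pixels at 1920, which leaves the detector far more signal to work with.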

Zigars added the bug Something isn't working label Apr 19, 2021

glenn-jocher added TODO and removed TODO labels Apr 21, 2021

glenn-jocher linked a pull request Apr 21, 2021 that will close this issue

VisDrone2019-DET Dataset Auto-Download #2882

Merged

11 tasks

glenn-jocher closed this as completed in #2882 Apr 22, 2021
