When I run train.py, the val phase is very slow #11474

Closed
HerrAskinSM opened this issue May 2, 2023 · 14 comments
Labels
question (Further information is requested), Stale

Comments

@HerrAskinSM

Search before asking

Question

Greetings, colleagues!
I run training on my own dataset with the command:
python train.py --batch 512 --weights runs/train/exp/weights/best.pt --data custom.yaml --epochs 300 --img 96 --cache --patience 20 --freeze 2
Training is fast (7+ it/s), but validation is very slow (3.7 s/it).
It feels as though GPU acceleration is disabled or the cache is not being used, even though the images are read into cache before training starts.
P.S. YOLOv5 🚀 v7.0-155-g8ecc727 Python-3.9.5 torch-1.12.1+cu116 CUDA:0 (NVIDIA GeForce RTX 3090, 24268MiB)

Additional

  Epoch    GPU_mem   box_loss   obj_loss   cls_loss  Instances       Size
  0/299      4.05G    0.03435   0.008149   0.009415       3140         96: 100%|██████████| 555/555 [01:17<00:00,  7.18it/s]
             Class     Images  Instances          P          R      mAP50   mAP50-95: 100%|██████████| 31/31 [01:54<00:00,  3.68s/it]
               all      31539     267968      0.998      0.986      0.994      0.825
HerrAskinSM added the question (Further information is requested) label May 2, 2023
@github-actions
Contributor

github-actions bot commented May 2, 2023

👋 Hello @HerrAskinSM, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Requirements

Python>=3.7.0 with all requirements.txt installed including PyTorch>=1.7. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Notebooks with free GPU
Google Cloud Deep Learning VM. See GCP Quickstart Guide
Amazon Deep Learning AMI. See AWS Quickstart Guide
Docker Image. See Docker Quickstart Guide

Status

YOLOv5 CI

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

Introducing YOLOv8 🚀

We're excited to announce the launch of our latest state-of-the-art (SOTA) object detection model for 2023 - YOLOv8 🚀!

Designed to be fast, accurate, and easy to use, YOLOv8 is an ideal choice for a wide range of object detection, image segmentation and image classification tasks. With YOLOv8, you'll be able to quickly and accurately detect objects in real-time, streamline your workflows, and achieve new levels of accuracy in your projects.

Check out our YOLOv8 Docs for details and get started with:

pip install ultralytics

@glenn-jocher
Member

@HerrAskinSM hello! Thank you for using YOLOv5.

Validation iterations are often slower than training iterations, since each one adds post-processing such as non-maximum suppression and metric computation on top of the forward pass. 3.7 s/it for validation is not necessarily unusual.

However, you can often increase validation speed by reducing the batch size used for validation, which reduces the number of images processed in each validation iteration.
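
For example (a hedged illustration reusing the paths from the question above; --batch-size and --img are standard YOLOv5 val.py flags), you could time a standalone validation run at a smaller batch to see whether batch size is the bottleneck:

python val.py --weights runs/train/exp/weights/best.pt --data custom.yaml --img 96 --batch-size 128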

I hope this helps! If you have any additional questions or concerns, don't hesitate to ask.

@HerrAskinSM
Author

Thanks for the answer!
However, if I run training on the COCO128 dataset, the output is as follows:
python train.py --img 640 --epochs 3 --data coco128.yaml --weights yolov5m.pt

  Epoch    GPU_mem   box_loss   obj_loss   cls_loss  Instances       Size
    1/2      8.37G    0.03795    0.05817    0.01471        193        640: 100%|██████████| 8/8 [00:01<00:00,  6.35it/s]
             Class     Images  Instances          P          R      mAP50   mAP50-95: 100%|██████████| 4/4 [00:00<00:00,  4.88it/s]
               all        128        929      0.763      0.707      0.792      0.574

Roughly speaking, validation here is only somewhat slower than training: 4.88 it/s versus 6.35 it/s. I also remember past training sessions where the speed dropped by about 2×. But in the post above, the speed dropped 26 times (7.18 it/s for training is about 0.14 s/it, versus 3.68 s/it for validation)!

@glenn-jocher
Member

@HerrAskinSM, thank you for bringing that to our attention. The decrease in validation speed may be due to the size of the dataset or the hardware that you are using. Validation on larger datasets can take longer because the model needs to process more images. Additionally, certain hardware configurations may result in slower validation speeds.

The speed drop you reported in your previous post may be abnormal and is likely due to some other issue. If you have any other concerns or notice any unusual behavior during validation, please let us know and we'd be happy to help you troubleshoot.

@github-actions
Contributor

github-actions bot commented Jun 3, 2023

👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO 🚀 and Vision AI ⭐

github-actions bot added the Stale label Jun 3, 2023
github-actions bot closed this as not planned Jun 14, 2023
@BossCrab-jyj

BossCrab-jyj commented Aug 27, 2023

I modified the code so that val_loader is created the same way as train_loader:

val_loader = create_dataloader(val_path, imgsz, batch_size // WORLD_SIZE, gs, single_cls,
                               hyp=hyp, cache=None if noval else opt.cache,
                               rect=opt.rect, rank=LOCAL_RANK, workers=workers, pad=0.5,
                               prefix=colorstr('val: '))[0]

With this change, validation time during training dropped from 25 seconds to 3 seconds. Is validation normally run on only one GPU? Modifying the code this way speeds up YOLOv5, but it does not work on YOLOv8, where the following error appears. Is it correct to modify YOLOv5 like this, and if so, how should I make the equivalent change in YOLOv8?

File "yolov8/ultralytics/data/build.py",line 38,in iteryield next(self.iterator)
File "yolov8/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 633,innextdata = self. next data()
File "yolov8/lib/python3.8/site-packages/torch/utils/data/dataloader.py".line 1325,in next datareturn self. process data(data)line 1371,in process datacile
"volov8/lib/python3,8/site-packages/torch/utils/data/dataloader.py"data.reraise()File "yolov8/lib/python3.8/site-packages/torch/ utils,py", line 644, in reraiseraise exceptionRuntimeError: Caught RuntimeError in Dataloader worker process 18.Original Traceback (most recent call last):
File "yolov8/lib/python3.8/site-packages/torch/utils/data/ utils/worker,py", line 308,in worker loop=fetcher.fetch(index)
yolov8/lib/python3.8/site-packages/torch/utils/data/ utils/fetch.py", line 54, in fetchFilereturn self.collate fn(data)
File "yolov8/ultralytics/data/dataset.py", line 192,in collate fnwaluetorch.stack(value, 0)runtimeError: stack expects each tensor to be ecual size, but got 13, 832, 640] at entry 0 and (3, 832, 544) at entry 50

@glenn-jocher
Member

@yingjie-jiang the issue you are experiencing is likely related to a difference in input sizes. In YOLOv8, the collate_fn function used by the validation dataloader requires every tensor in a batch to be the same size, but in your modified code the validation images come out at different sizes, resulting in the error message you mentioned.

To resolve this issue, you need to ensure that the images in both the training and validation datasets have the same size. Make sure that the imgsz parameter passed to the create_dataloader function is the same for both the training and validation sets.
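
As a minimal illustration of the failure mode in the traceback above (plain PyTorch, not YOLOv8 code): torch.stack() refuses to batch tensors of unequal shape, which is exactly what happens when rectangular validation images with different letterboxed widths land in the same batch.

import torch

# Two "images" with equal height but different letterboxed widths,
# mirroring the shapes [3, 832, 640] and [3, 832, 544] from the traceback.
a = torch.zeros(3, 832, 640)
b = torch.zeros(3, 832, 544)

# Raises: RuntimeError: stack expects each tensor to be equal size
batch = torch.stack([a, b], 0)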

Additionally, please note that modifying the code in this manner may have unintended consequences and is not the recommended approach. It's best to follow the default configuration and settings provided by the YOLOv8 repository.

If you have any further questions or issues, please don't hesitate to ask for assistance.

@BossCrab-jyj

BossCrab-jyj commented Aug 28, 2023

I know the reason the validation image size differs from training. I modified DetectionTrainer.build_dataset() in ultralytics/models/yolo/detect/train.py, changing the return from:

return build_yolo_dataset(self.args, img_path, batch, self.data, mode=mode, rect=mode == 'val', stride=gs)

to:

return build_yolo_dataset(self.args, img_path, batch, self.data, mode=mode, rect=False, stride=gs)

Validation time dropped from 40 seconds to 5 seconds, but during validation the terminal prints a lot of duplicated progress output:

18/500 22.5G 1.7 1.133 1.263 174 800: 100%|██████████| 81/81 [00:37<00:00, 2.15it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 21/21 [00:06<00:00, 3.13it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 21/21 [00:06<00:00, 3.10it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 21/21 [00:06<00:00, 3.02it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 67%|██████▋ | 14/21 [00:06<00:02, 2.45it/s] all 1209 3927 0.385 0.298 0.271 0.132
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 21/21 [00:07<00:00, 2.81it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 81%|████████ | 17/21 [00:07<00:01, 3.20it/s]
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 21/21 [00:08<00:00, 2.48it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 21/21 [00:08<00:00, 2.33it/s]
19/500 22.5G 1.687 1.116 1.258 149 800: 100%|██████████| 81/81 [00:39<00:00, 2.03it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 21/21 [00:06<00:00, 3.20it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 21/21 [00:06<00:00, 3.16it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 21/21 [00:06<00:00, 3.08it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 67%|██████▋ | 14/21 [00:06<00:02, 3.06it/s] all 1209 3927 0.459 0.278 0.281 0.135
Class Images Instances Box(P R mAP50 mAP50-95): 90%|█████████ | 19/21 [00:07<00:00, 3.21it/s]
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 21/21 [00:07<00:00, 2.64it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 21/21 [00:08<00:00, 2.47it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 21/21 [00:08<00:00, 2.43it/s]

I don't know whether this modification is correct, but it does greatly improve validation speed during training.

@glenn-jocher
Member

@yingjie-jiang the modification you made to the DetectionTrainer.build_dataset() function in ultralytics/models/yolo/detect/train.py appears to have improved the validation speed during training. However, it's important to note that modifying the code in this way may have unintended consequences and could potentially impact the accuracy or reliability of the model.

To ensure that the modification is correct and has no negative effects, it would be ideal to thoroughly test the trained model and evaluate its performance on the validation set. Specifically, you should assess the mAP50 and mAP50-95 scores to ensure that the accuracy of the model has not been compromised.
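
For instance (a sketch using the Ultralytics YOLOv8 Python API; the weights path and dataset yaml here are assumptions, not taken from this thread), a standalone validation run with default, unmodified settings gives reference mAP values to compare against:

from ultralytics import YOLO

# Load the trained weights and validate with default settings.
model = YOLO('runs/detect/train/weights/best.pt')  # assumed path
metrics = model.val(data='custom.yaml')            # assumed dataset yaml

# DetMetrics exposes the box metrics; compare these against the
# values logged by the modified training run.
print(metrics.box.map50)  # mAP50
print(metrics.box.map)    # mAP50-95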

If the modified code consistently produces good results on the validation set without any negative impact on the model's accuracy, then it could be considered a valid modification to improve speed during training. However, it is always recommended to follow the default configuration and settings provided by the YOLOv8 repository unless you have strong reasons to make modifications.

If you have any further questions or concerns, please feel free to ask.

@BossCrab-jyj

Thank you for your suggestion! Will this project optimize this validation speed issue in a future release (I am not sure whether the question makes sense)?

@glenn-jocher
Member

@yingjie-jiang thank you for your suggestion! We appreciate your feedback regarding the speed issue in the project. Our team is always working on improving the performance and optimizing the codebase of YOLOv5. While we cannot guarantee specific timelines or outcomes, we will certainly take your suggestion into consideration for future updates.

Please continue to share any other issues or feature requests you come across. We value your contributions to YOLOv5 and Vision AI.

@ASharpSword

@BossCrab-jyj Hello, I would like to ask how exactly you changed this, because I noticed that the validation phase is gated behind an if RANK in {-1, 0}: check. If you only change the create_dataloader parameters used to build the validation set, validate.run() still ends up executing on a single GPU.

@ASharpSword

@BossCrab-jyj I don't know exactly how you changed the code, but I tried it myself by moving the results, maps, _ = validate.run(...) call outside the if RANK in {-1, 0}: block. This makes the n GPUs split the validation set equally, but then either only the cuda:0 results are printed or every GPU process competes to print to the console. I was never able to combine the results from each GPU process into metrics for the full validation set; I only got the fragmented per-process validator results.

@ASharpSword

@BossCrab-jyj "but when validation,the terminal will print many information." I think it should be validate.run (...) In, each GPU process creates a tqdm(pbar = tqdm(dataloader, desc=s, bar_format=TQDM_BAR_FORMAT)), and the tqdms interfere with each other, eventually printing many tqdm. If I had to do it, I would remove the tqdm and print the information myself
