When I run train.py, the val phase is very slow #11474

Closed
HerrAskinSM opened this issue May 2, 2023 · 14 comments
Labels
question (Further information is requested), Stale

Comments

@HerrAskinSM

Search before asking

Question

Greetings, colleagues!
I run training on my own dataset with the command:
python train.py --batch 512 --weights runs/train/exp/weights/best.pt --data custom.yaml --epochs 300 --img 96 --cache --patience 20 --freeze 2
Training is fast (7+ it/s), but validation is very slow (3.7 s/it).
It feels as though GPU acceleration is disabled or the cache is not being used, even though the images are read into cache before training starts.
P.S. YOLOv5 🚀 v7.0-155-g8ecc727 Python-3.9.5 torch-1.12.1+cu116 CUDA:0 (NVIDIA GeForce RTX 3090, 24268MiB)

Additional

  Epoch    GPU_mem   box_loss   obj_loss   cls_loss  Instances       Size
  0/299      4.05G    0.03435   0.008149   0.009415       3140         96: 100%|██████████| 555/555 [01:17<00:00,  7.18it/s]
             Class     Images  Instances          P          R      mAP50   mAP50-95: 100%|██████████| 31/31 [01:54<00:00,  3.68s/it]
               all      31539     267968      0.998      0.986      0.994      0.825
HerrAskinSM added the question (Further information is requested) label May 2, 2023
@github-actions
Contributor

github-actions bot commented May 2, 2023

👋 Hello @HerrAskinSM, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Requirements

Python>=3.7.0 with all requirements.txt installed including PyTorch>=1.7. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Notebooks with free GPU
Google Cloud Deep Learning VM. See GCP Quickstart Guide
Amazon Deep Learning AMI. See AWS Quickstart Guide
Docker Image. See Docker Quickstart Guide

Status

YOLOv5 CI

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

Introducing YOLOv8 🚀

We're excited to announce the launch of our latest state-of-the-art (SOTA) object detection model for 2023 - YOLOv8 🚀!

Designed to be fast, accurate, and easy to use, YOLOv8 is an ideal choice for a wide range of object detection, image segmentation and image classification tasks. With YOLOv8, you'll be able to quickly and accurately detect objects in real-time, streamline your workflows, and achieve new levels of accuracy in your projects.

Check out our YOLOv8 Docs for details and get started with:

pip install ultralytics

@glenn-jocher
Member

@HerrAskinSM hello! Thank you for using YOLOv5.

Validation iterations are often slower than training iterations, since each one adds post-processing such as non-maximum suppression and metric computation on top of the forward pass. 3.7 s/it for validation is not necessarily unusual.

However, you can often increase validation speed by reducing the batch size used for validation, which reduces the number of images processed in each validation iteration.
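
For example (a hedged illustration reusing the paths from the question above; --batch-size and --img are standard YOLOv5 val.py flags), you could time a standalone validation run at a smaller batch to see whether batch size is the bottleneck:

python val.py --weights runs/train/exp/weights/best.pt --data custom.yaml --img 96 --batch-size 128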

I hope this helps! If you have any additional questions or concerns, don't hesitate to ask.

@HerrAskinSM
Author

Thanks for the answer!
However, if I run training on the COCO128 dataset, the output is as follows:
python train.py --img 640 --epochs 3 --data coco128.yaml --weights yolov5m.pt

  Epoch    GPU_mem   box_loss   obj_loss   cls_loss  Instances       Size
    1/2      8.37G    0.03795    0.05817    0.01471        193        640: 100%|██████████| 8/8 [00:01<00:00,  6.35it/s]
             Class     Images  Instances          P          R      mAP50   mAP50-95: 100%|██████████| 4/4 [00:00<00:00,  4.88it/s]
               all        128        929      0.763      0.707      0.792      0.574

Roughly speaking, validation here is only somewhat slower than training: 4.88 it/s versus 6.35 it/s. I also remember past training sessions where the speed dropped by about 2×. But in the post above, the speed dropped 26 times (7.18 it/s for training is about 0.14 s/it, versus 3.68 s/it for validation)!

@glenn-jocher
Member

@HerrAskinSM, thank you for bringing that to our attention. The decrease in validation speed may be due to the size of the dataset or the hardware that you are using. Validation on larger datasets can take longer because the model needs to process more images. Additionally, certain hardware configurations may result in slower validation speeds.

The speed drop you reported in your previous post may be abnormal and is likely due to some other issue. If you have any other concerns or notice any unusual behavior during validation, please let us know and we'd be happy to help you troubleshoot.

@github-actions
Contributor

github-actions bot commented Jun 3, 2023

👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO 🚀 and Vision AI ⭐

github-actions bot added the Stale label Jun 3, 2023
github-actions bot closed this as not planned Jun 14, 2023
@BossCrab-jyj

BossCrab-jyj commented Aug 27, 2023

I modified the code so that val_loader is created the same way as train_loader:

val_loader = create_dataloader(val_path, imgsz, batch_size // WORLD_SIZE, gs, single_cls,
                               hyp=hyp, cache=None if noval else opt.cache,
                               rect=opt.rect, rank=LOCAL_RANK, workers=workers, pad=0.5,
                               prefix=colorstr('val: '))[0]

With this change, validation time during training dropped from 25 seconds to 3 seconds. Is validation normally run on only one GPU? Modifying the code this way speeds up YOLOv5, but it does not work on YOLOv8, where the following error appears. Is it correct to modify YOLOv5 like this, and if so, how should I make the equivalent change in YOLOv8?

File "yolov8/ultralytics/data/build.py",line 38,in iteryield next(self.iterator)
File "yolov8/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 633,innextdata = self. next data()
File "yolov8/lib/python3.8/site-packages/torch/utils/data/dataloader.py".line 1325,in next datareturn self. process data(data)line 1371,in process datacile
"volov8/lib/python3,8/site-packages/torch/utils/data/dataloader.py"data.reraise()File "yolov8/lib/python3.8/site-packages/torch/ utils,py", line 644, in reraiseraise exceptionRuntimeError: Caught RuntimeError in Dataloader worker process 18.Original Traceback (most recent call last):
File "yolov8/lib/python3.8/site-packages/torch/utils/data/ utils/worker,py", line 308,in worker loop=fetcher.fetch(index)
yolov8/lib/python3.8/site-packages/torch/utils/data/ utils/fetch.py", line 54, in fetchFilereturn self.collate fn(data)
File "yolov8/ultralytics/data/dataset.py", line 192,in collate fnwaluetorch.stack(value, 0)runtimeError: stack expects each tensor to be ecual size, but got 13, 832, 640] at entry 0 and (3, 832, 544) at entry 50

@glenn-jocher
Member

@yingjie-jiang the issue you are experiencing is likely related to a difference in input sizes. In YOLOv8, the collate_fn function used by the validation dataloader requires every tensor in a batch to be the same size, but in your modified code the validation images come out at different sizes, resulting in the error message you mentioned.

To resolve this issue, you need to ensure that the images in both the training and validation datasets have the same size. Make sure that the imgsz parameter passed to the create_dataloader function is the same for both the training and validation sets.
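
As a minimal illustration of the failure mode in the traceback above (plain PyTorch, not YOLOv8 code): torch.stack() refuses to batch tensors of unequal shape, which is exactly what happens when rectangular validation images with different letterboxed widths land in the same batch.

import torch

# Two "images" with equal height but different letterboxed widths,
# mirroring the shapes [3, 832, 640] and [3, 832, 544] from the traceback.
a = torch.zeros(3, 832, 640)
b = torch.zeros(3, 832, 544)

# Raises: RuntimeError: stack expects each tensor to be equal size
batch = torch.stack([a, b], 0)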

Additionally, please note that modifying the code in this manner may have unintended consequences and is not the recommended approach. It's best to follow the default configuration and settings provided by the YOLOv8 repository.

If you have any further questions or issues, please don't hesitate to ask for assistance.

@BossCrab-jyj

BossCrab-jyj commented Aug 28, 2023

I know the reason the validation image size differs from training. I modified DetectionTrainer.build_dataset() in ultralytics/models/yolo/detect/train.py, changing the return from:

return build_yolo_dataset(self.args, img_path, batch, self.data, mode=mode, rect=mode == 'val', stride=gs)

to:

return build_yolo_dataset(self.args, img_path, batch, self.data, mode=mode, rect=False, stride=gs)

Validation time dropped from 40 seconds to 5 seconds, but during validation the terminal prints a lot of duplicated progress output:

18/500 22.5G 1.7 1.133 1.263 174 800: 100%|██████████| 81/81 [00:37<00:00, 2.15it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 21/21 [00:06<00:00, 3.13it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 21/21 [00:06<00:00, 3.10it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 21/21 [00:06<00:00, 3.02it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 67%|██████▋ | 14/21 [00:06<00:02, 2.45it/s] all 1209 3927 0.385 0.298 0.271 0.132
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 21/21 [00:07<00:00, 2.81it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 81%|████████ | 17/21 [00:07<00:01, 3.20it/s]
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 21/21 [00:08<00:00, 2.48it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 21/21 [00:08<00:00, 2.33it/s]
19/500 22.5G 1.687 1.116 1.258 149 800: 100%|██████████| 81/81 [00:39<00:00, 2.03it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 21/21 [00:06<00:00, 3.20it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 21/21 [00:06<00:00, 3.16it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 21/21 [00:06<00:00, 3.08it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 67%|██████▋ | 14/21 [00:06<00:02, 3.06it/s] all 1209 3927 0.459 0.278 0.281 0.135
Class Images Instances Box(P R mAP50 mAP50-95): 90%|█████████ | 19/21 [00:07<00:00, 3.21it/s]
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 21/21 [00:07<00:00, 2.64it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 21/21 [00:08<00:00, 2.47it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 21/21 [00:08<00:00, 2.43it/s]

I don't know whether this modification is correct, but it does greatly improve validation speed during training.

@glenn-jocher
Member

@yingjie-jiang the modification you made to the DetectionTrainer.build_dataset() function in ultralytics/models/yolo/detect/train.py appears to have improved the validation speed during training. However, it's important to note that modifying the code in this way may have unintended consequences and could potentially impact the accuracy or reliability of the model.

To ensure that the modification is correct and has no negative effects, it would be ideal to thoroughly test the trained model and evaluate its performance on the validation set. Specifically, you should assess the mAP50 and mAP50-95 scores to ensure that the accuracy of the model has not been compromised.
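
For instance (a sketch using the Ultralytics YOLOv8 Python API; the weights path and dataset yaml here are assumptions, not taken from this thread), a standalone validation run with default, unmodified settings gives reference mAP values to compare against:

from ultralytics import YOLO

# Load the trained weights and validate with default settings.
model = YOLO('runs/detect/train/weights/best.pt')  # assumed path
metrics = model.val(data='custom.yaml')            # assumed dataset yaml

# DetMetrics exposes the box metrics; compare these against the
# values logged by the modified training run.
print(metrics.box.map50)  # mAP50
print(metrics.box.map)    # mAP50-95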

If the modified code consistently produces good results on the validation set without any negative impact on the model's accuracy, then it could be considered a valid modification to improve speed during training. However, it is always recommended to follow the default configuration and settings provided by the YOLOv8 repository unless you have strong reasons to make modifications.

If you have any further questions or concerns, please feel free to ask.

@BossCrab-jyj

Thank you for your suggestion! Will this project optimize this validation speed issue in a future release (I am not sure whether the question makes sense)?

@glenn-jocher
Member

@yingjie-jiang thank you for your suggestion! We appreciate your feedback regarding the speed issue in the project. Our team is always working on improving the performance and optimizing the codebase of YOLOv5. While we cannot guarantee specific timelines or outcomes, we will certainly take your suggestion into consideration for future updates.

Please continue to share any other issues or feature requests you come across. We value your contributions to YOLOv5 and Vision AI.

@ASharpSword

@BossCrab-jyj Hello, I would like to ask how exactly you changed this, because I noticed that the validation phase is gated behind an if RANK in {-1, 0}: check. If you only change the create_dataloader parameters used to build the validation set, validate.run() still ends up executing on a single GPU.

@ASharpSword

@BossCrab-jyj I don't know exactly how you changed the code, but I tried it myself by moving the results, maps, _ = validate.run(...) call outside the if RANK in {-1, 0}: block. This makes the n GPUs split the validation set equally, but then either only the cuda:0 results are printed or every GPU process competes to print to the console. I was never able to combine the results from each GPU process into metrics for the full validation set; I only got the fragmented per-process validator results.

@ASharpSword

@BossCrab-jyj "but when validation,the terminal will print many information." I think it should be validate.run (...) In, each GPU process creates a tqdm(pbar = tqdm(dataloader, desc=s, bar_format=TQDM_BAR_FORMAT)), and the tqdms interfere with each other, eventually printing many tqdm. If I had to do it, I would remove the tqdm and print the information myself
