when i run train.py calculate val is very slow #11474
Comments
👋 Hello @HerrAskinSM, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution. If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it. If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Requirements

Python>=3.7.0 with all requirements.txt dependencies installed, including PyTorch>=1.7. To get started:

```bash
git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install
```

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled): […]

Status

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

Introducing YOLOv8 🚀

We're excited to announce the launch of our latest state-of-the-art (SOTA) object detection model for 2023 - YOLOv8 🚀! Designed to be fast, accurate, and easy to use, YOLOv8 is an ideal choice for a wide range of object detection, image segmentation and image classification tasks. With YOLOv8, you'll be able to quickly and accurately detect objects in real time, streamline your workflows, and achieve new levels of accuracy in your projects. Check out our YOLOv8 Docs for details and get started with:

```bash
pip install ultralytics
```
@HerrAskinSM hello! Thank you for using YOLOv5. The training process is generally faster per iteration than validation, and it is normal for validation to be slower than training; 3.7 s/it for validation is not unusual. However, you can increase the validation speed by reducing the […]. I hope this helps! If you have any additional questions or concerns, don't hesitate to ask.
Thanks for the answer!

```
Epoch  GPU_mem  box_loss  obj_loss  cls_loss  Instances  Size
```

[the numeric rows of the training log were lost in extraction]

Very roughly, the speed drops from 6.35 to 4.38 iterations per second (about a 1.5x slowdown). I also remember my past training sessions, where the speed dropped about 2 times as well.
@HerrAskinSM, thank you for bringing that to our attention. The decrease in validation speed may be due to the size of the dataset or the hardware that you are using. Validation on larger datasets can take longer because the model needs to process more images. Additionally, certain hardware configurations may result in slower validation speeds. The speed drop you reported in your previous post may be abnormal and is likely due to some other issue. If you have any other concerns or notice any unusual behavior during validation, please let us know and we'd be happy to help you troubleshoot.
👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help. For additional resources and information, please see the links below: […]

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed! Thank you for your contributions to YOLO 🚀 and Vision AI ⭐
I modified the code so that val_loader is built like train_loader. During training, validation time dropped from 25 seconds to 3 seconds. Is only one GPU running validation? This change reduces the time in YOLOv5, but it does not work in YOLOv8, where the following error appears:

```
File "yolov8/ultralytics/data/build.py", line 38, in __iter__
    yield next(self.iterator)
```

Is it correct for me to modify YOLOv5 like this, and if so, how should I make the same modification in YOLOv8?
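Building the val_loader the same way as the train_loader, as described above, essentially applies train-style padded (rectangular) batching to validation. The sketch below is an illustrative, simplified version of that idea — the function names are made up and this is not the Ultralytics source code: each batch is padded only to its own maximum height and width, rounded up to a multiple of the model stride, rather than every image being letterboxed to one fixed square size.

```python
import math

def make_divisible(x: int, divisor: int) -> int:
    """Round x up to the nearest multiple of divisor."""
    return int(math.ceil(x / divisor) * divisor)

def batch_shapes(image_shapes, batch_size, stride=32):
    """For each batch of (h, w) image shapes, return the single padded
    (h, w) the whole batch is letterboxed to: the per-batch maximum in
    each dimension, rounded up to a multiple of the model stride.
    Simplified sketch of rectangular batching, not the YOLOv5 source."""
    out = []
    for i in range(0, len(image_shapes), batch_size):
        batch = image_shapes[i:i + batch_size]
        h = make_divisible(max(s[0] for s in batch), stride)
        w = make_divisible(max(s[1] for s in batch), stride)
        out.append((h, w))
    return out
```

Padding to the per-batch maximum instead of a global square size means fewer wasted pixels per batch, which is one reason train-style loading can validate faster.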
@yingjie-jiang the issue you are experiencing is likely related to the difference in input sizes between the training and validation datasets. In YOLOv8, the […]. To resolve this issue, you need to ensure that the images in both the training and validation datasets have the same size. Make sure that the […]. Additionally, please note that modifying the code in this manner may have unintended consequences and is not the recommended approach. It's best to follow the default configuration and settings provided by the YOLOv8 repository. If you have any further questions or issues, please don't hesitate to ask for assistance.
I know the reason why the image size of validation is not the same as training. I modified ultralytics/models/yolo/detect/train.py, in DetectionTrainer.build_dataset(), changing the return value: […]

```
18/500   22.5G   1.7   1.133   1.263   174   800: 100%|██████████| 81/81 [00:37<00:00, 2.15it/s]
```

I don't know if this modification is correct, but it does greatly improve the speed of validation during training.
@yingjie-jiang the modification you made to the […]. To ensure that the modification is correct and has no negative effects, it would be ideal to thoroughly test the trained model and evaluate its performance on the validation set. Specifically, you should assess the mAP50 and mAP50-95 scores to ensure that the accuracy of the model has not been compromised. If the modified code consistently produces good results on the validation set without any negative impact on the model's accuracy, then it could be considered a valid modification to improve speed during training. However, it is always recommended to follow the default configuration and settings provided by the YOLOv5 repository unless you have strong reasons to make modifications. If you have any further questions or concerns, please feel free to ask.
Thank you for your suggestion! Will this project optimize this speed issue in a future release (not sure if my question makes sense)?
@yingjie-jiang thank you for your suggestion! We appreciate your feedback regarding the speed issue in the project. Our team is always working on improving the performance and optimizing the codebase of YOLOv5. While we cannot guarantee specific timelines or outcomes, we will certainly take your suggestion into consideration for future updates. Please continue to share any other issues or feature requests you come across. We value your contributions to YOLOv5 and Vision AI. |
@BossCrab-jyj Hello, I would like to ask how you changed this. I noticed that there is an `if RANK in {-1, 0}:` constraint around the validation phase, so if you only change the create_dataloader parameters when creating the validation set, in the end only one GPU runs the validate.run() method.
@BossCrab-jyj I don't know exactly how you changed the code, but I changed it myself by calling `results, maps, _ = validate.run(...)` outside the `if RANK in {-1, 0}:` block. This leads to a problem: the n GPUs split the validation set equally, but only the cuda:0 result is printed, or each GPU competes to print its information to the console. I was never able to combine the results from each GPU process to get full validation-set results, only the fragmented validator results from each GPU process.
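Combining per-GPU fragments like the ones described above is essentially a reduction problem: each rank holds partial counts, and the main process sums them before computing metrics (in real DDP code this is what torch.distributed.all_gather / all_reduce are for). Here is a toy, framework-free sketch; the field names are illustrative only:

```python
def merge_val_stats(per_rank_stats):
    """Sum partial validation counts collected by each GPU process into
    one global dict. Toy stand-in for a distributed all-reduce; the
    field names ("images", "tp", "fp") are made up for illustration."""
    total = {"images": 0, "tp": 0, "fp": 0}
    for stats in per_rank_stats:
        for key in total:
            total[key] += stats[key]
    return total
```

Note that a metric like mAP cannot be naively averaged across ranks: the raw per-image prediction/label statistics must be gathered first, and the metric computed once over the combined set.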
@BossCrab-jyj "but when validating, the terminal prints a lot of information" - I think that inside validate.run(...) each GPU process creates its own tqdm (`pbar = tqdm(dataloader, desc=s, bar_format=TQDM_BAR_FORMAT)`), and the tqdm bars interfere with each other, eventually printing many progress bars. If I had to do it, I would remove the tqdm and print the information myself.
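Gating progress output on the process rank, as suggested above, can look like the minimal stand-in for tqdm below. This is only a sketch: in real DDP code the rank would come from the launcher's environment, and only the main process (rank -1 or 0, matching the `RANK in {-1, 0}` convention seen in this thread) prints anything.

```python
def progress(iterable, rank=0, desc=""):
    """Yield items from iterable, printing a one-line progress counter
    only on the main process (rank -1 or 0) so that multiple DDP
    workers do not interleave their progress bars. Plain-print
    stand-in for tqdm; illustrative only."""
    items = list(iterable)
    for i, item in enumerate(items, 1):
        if rank in (-1, 0):
            print(f"\r{desc} {i}/{len(items)}", end="", flush=True)
        yield item
    if rank in (-1, 0):
        print()  # final newline after the counter
```

Non-main ranks still iterate over their shard of the data; they simply stay silent, so the console shows one coherent progress line.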
Search before asking
Question
Greetings, colleagues!
I run the training on my own dataset using the python command:

```bash
python train.py --batch 512 --weights runs/train/exp/weights/best.pt --data custom.yaml --epochs 300 --img 96 --cache --patience 20 --freeze 2
```
The train process is fast (7+ it/s), but the val process is very slow (3.7 s/it). It feels as if GPU acceleration is turned off, or the cache is not used, although judging by the console output the images were cached before training started.
P.S. YOLOv5 🚀 v7.0-155-g8ecc727 Python-3.9.5 torch-1.12.1+cu116 CUDA:0 (NVIDIA GeForce RTX 3090, 24268MiB)
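For scale, the throughput figures reported above imply a large per-iteration gap. Note, though, that training runs at --batch 512 while validation may use a different batch size, so iterations are not directly comparable; this is just the raw arithmetic:

```python
train_rate = 7.0  # training throughput, iterations per second (reported above)
val_time = 3.7    # validation cost, seconds per iteration (reported above)

# One validation iteration takes as long as this many training iterations:
slowdown = val_time * train_rate
print(f"{slowdown:.1f}x")  # prints "25.9x"
```

So each validation step costs roughly 26 training steps' worth of wall-clock time, which is why the GPU-utilization question above is a reasonable one to ask.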
Additional