Training is getting slower and slower #11490
Comments
@ylc2580 hi there,

Based on the output you shared, your training appears to be progressing normally. The warnings about sBIT chunk length are PNG metadata warnings and are not indicative of an issue with the training process. The NMS time-limit warning is also not necessarily a problem unless you are seeing drastically poor performance or other issues during training.

Since you have a relatively powerful server setup, I recommend ensuring that your dataset is properly formatted and optimized for training with YOLOv5. Additionally, you may want to experiment with adjusting your hyperparameters to see if you can improve performance (a sketch of passing a custom hyperparameter file follows below).

Let us know if you have any further questions or concerns.

Best,
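To make the hyperparameter suggestion concrete, here is a minimal sketch of passing a custom hyperparameter file to train.py. The file names data.yaml and hyp.custom.yaml are placeholders for your own files; a common starting point is to copy one of the stock files under data/hyps/ and edit values such as lr0 or the augmentation strengths:

```bash
# Sketch only: data.yaml and hyp.custom.yaml are placeholder names,
# and the batch size should be adjusted to fit your GPUs.
python train.py --data data.yaml --hyp hyp.custom.yaml --weights yolov5s.pt \
    --img 640 --batch-size 64 --epochs 300 --device 0,1
```

For true multi-GPU training across both 3090s, the YOLOv5 docs recommend launching with DDP, e.g. `python -m torch.distributed.run --nproc_per_node 2 train.py ...` with the same arguments.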
@ylc2580 hello,

Based on your description, the issue might be related to the hyperparameters and the size of your validation set. The default hyperparameters may not be optimal for your specific use case, and a validation set of 30,000 images may be too large; both factors can lead to a drop in GPU utilization during the mAP calculation stage.

To address this, I suggest experimenting with different hyperparameters to find the most suitable ones for your use case. Additionally, you can try reducing the size of your validation set to see if that improves GPU utilization during the mAP calculation stage (see the sketch below).

If these suggestions do not resolve the issue, please provide more details, such as the YOLOv5 version and the specific command used to run the training/validation.

I hope this helps. Let us know if you need further assistance.

Best,
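One quick way to test the smaller-validation-set idea, assuming your dataset YAML's `val:` entry points at a text file of image paths (one common YOLOv5 layout), is to sample a subset of that list and point the YAML at it. The names val.txt/val_small.txt and the 3,000-image count are illustrative only:

```bash
# Randomly sample 3000 of the ~30000 validation images into a smaller list.
shuf -n 3000 val.txt > val_small.txt
# Then edit the dataset YAML to use the smaller list, e.g.:
#   val: val_small.txt
```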
The YOLOv5 versions are 7.0 and 6.1.
I will also try a smaller validation set.
Thank you.
Hello @ylc2580,

Thank you for your response. I recommend trying different hyperparameters to optimize training for your specific use case, and reducing the size of your validation set may help improve GPU utilization during the mAP calculation stage. If you want to confirm the utilization drop, the monitoring sketch below is one way to watch it live.

Please let us know if you encounter any other issues or have further questions.

Best regards,
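To verify the utilization drop directly, one option (an illustrative command, not something from the original thread) is to poll the GPUs with nvidia-smi while the validation/mAP stage runs:

```bash
# Poll both GPUs once per second; utilization near 0% during the mAP stage
# would suggest CPU-bound post-processing (e.g. NMS) rather than GPU work.
nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv -l 1
```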
Question
Hi,

When I am training, it always gets stuck at this point. Could you help me?

My training server hardware:
2× RTX 3090 GPUs, Intel Xeon Silver 4210 CPU
```
AutoAnchor: 4.64 anchors/target, 1.000 Best Possible Recall (BPR). Current anchors are a good fit to dataset ✅
Plotting labels to runs/train/exp/labels.jpg...
Image sizes 640 train, 640 val
Using 8 dataloader workers
Logging results to runs/train/exp
Starting training for 300 epochs...
```