Training a model on MBP M1 extremely slow #7308

Closed
1 of 2 tasks
lesleypotters opened this issue Apr 6, 2022 · 18 comments
Labels
bug Something isn't working

Comments

@lesleypotters

lesleypotters commented Apr 6, 2022

Search before asking

  • I have searched the YOLOv5 issues and found no similar bug report.

YOLOv5 Component

Training

Bug

Hi all,

I am working on an M1 MacBook Pro in a PyTorch environment with torchvision 0.12.0 and torch 1.11.0 (as recommended). I am trying out YOLOv5 with a wheat detection dataset. When running:
python train.py --img 1024 --batch 8 --epochs 100 --data wheat.yaml --cfg models/yolov5s.yaml --name wm
it does start training for 100 epochs, but the estimated time is about an hour per epoch. I find it suspicious that no GPU memory is allocated (gpu_mem shows 0G), although I should say I am a newbie to both YOLOv5 and the M1 MacBook Pro.

This is a screenshot:
image

What could I change to improve this? I have tried --device cpu, but to no avail. Are there any other options? Thanks!

This is my python detect.py output (which, if I am correct, is ok):
image

Environment

  • YOLOv5 🚀 v6.1-105-gd257c75 torch 1.11.0 CPU
  • Setup complete ✅ (8 CPUs, 16.0 GB RAM, 156.2/926.4 GB disk)
  • Python 3.8.13
  • torch 1.11.0
  • torchvision 0.12.0
  • OS: macOS Monterey 12.3

Minimal Reproducible Example

No response

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!
@github-actions
Contributor

github-actions bot commented Apr 6, 2022

👋 Hello @lesleypotters, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://ultralytics.com or email support@ultralytics.com.

Requirements

Python>=3.7.0 with all requirements.txt installed including PyTorch>=1.7. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu every 24 hours and on every commit.

@glenn-jocher
Member

glenn-jocher commented Apr 6, 2022

@lesleypotters the base M1 is pretty slow; the Pro and Max are much faster but still not as fast as a CUDA GPU. Yes, your times look right. This is essentially just fast CPU training: PyTorch does not use the Neural Engine, although CoreML exported models do.

gpu_mem displays CUDA memory usage only.
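
To see which backends the installed PyTorch can actually use on a given machine, something like the following could be run. It is only a minimal sketch: the helper name `describe_backends` is made up for illustration, and `torch.backends.mps` only exists from PyTorch 1.12 onward, so the code guards against its absence (as it would be on torch 1.11.0).

```python
def describe_backends() -> dict:
    """Report which accelerators the installed PyTorch can see (hypothetical helper)."""
    info = {"torch": None, "cuda": False, "mps": False}
    try:
        import torch
    except ImportError:
        return info  # PyTorch is not installed, so there is nothing to probe

    info["torch"] = torch.__version__
    info["cuda"] = bool(torch.cuda.is_available())
    mps = getattr(torch.backends, "mps", None)  # attribute is absent before 1.12
    info["mps"] = bool(mps is not None and mps.is_available())
    return info


if __name__ == "__main__":
    print(describe_backends())
```

If both `cuda` and `mps` come back `False`, training runs on the CPU, which would be consistent with a gpu_mem column that stays at 0G.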

See my Reddit post here: https://www.reddit.com/r/MachineLearning/comments/tbj4lf/comment/i083o5s/?utm_source=share&utm_medium=web2x&context=3

Screenshot 2022-04-06 at 11 01 10

@lesleypotters
Author

@glenn-jocher Many thanks for your answer, very helpful.
Can we expect PyTorch to use the Neural Engine at some point? (I should ask them.)

Can I add --include coreml in the training code that I provided? Thanks again!

@glenn-jocher
Member

glenn-jocher commented Apr 6, 2022

@lesleypotters yes, the PyTorch team is working on M1 support, though no timeline is currently available.

Export is something you do after training has completed. See the TFLite, ONNX, CoreML, TensorRT Export tutorial for details.
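
The train-then-export order can be sketched as two separate commands. This is a rough illustration, not something taken from the tutorial: the helper names are hypothetical, and the weights path `runs/train/wm/weights/best.pt` is an assumption about where a run named `wm` saves its best checkpoint, so it should be checked against the actual training output.

```python
import shlex


def train_command(data: str, cfg: str, name: str,
                  img: int = 1024, batch: int = 8, epochs: int = 100) -> list:
    """Build the train.py call from the original post (hypothetical helper)."""
    return ["python", "train.py", "--img", str(img), "--batch", str(batch),
            "--epochs", str(epochs), "--data", data, "--cfg", cfg, "--name", name]


def export_command(weights: str, include=("coreml",)) -> list:
    """Build an export.py call that runs on already-trained weights (hypothetical helper)."""
    return ["python", "export.py", "--weights", weights, "--include", *include]


if __name__ == "__main__":
    # Step 1: train. Step 2: export the resulting checkpoint.
    print(shlex.join(train_command("wheat.yaml", "models/yolov5s.yaml", "wm")))
    # Assumed checkpoint location; verify it in the training log.
    print(shlex.join(export_command("runs/train/wm/weights/best.pt")))
```

In other words, `--include coreml` belongs to the export step rather than to the train.py call.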

YOLOv5 Tutorials

Good luck 🍀 and let us know if you have any other questions!

@lesleypotters
Author

Ok great, thanks for the instant support! I will close this thread.

@sphrak

sphrak commented Aug 7, 2022

@glenn-jocher is this what we need to get faster training with YOLOv5? https://pytorch.org/blog/introducing-accelerated-pytorch-training-on-mac/

Also, do you think it would take a lot of work to get YOLOv5 running with this, or would it be enough to bump PyTorch to v1.12?

@glenn-jocher
Member

@sphrak use the Python universal2 installer for ARM devices, and torch nightly if you expect to use MPS.

@sphrak

sphrak commented Aug 9, 2022

@glenn-jocher thanks, but is that for training or only for the detection phase?

I think I got it: I just pass --device mps to either detect.py or train.py on nightly, and then it seems to use the MPS backend.

@glenn-jocher
Member

@sphrak yes, both, but full MPS support is not working yet due to unsupported PyTorch ATen ops. Regardless, running YOLOv5 with an ARM Python version will significantly speed things up on M1/M2 devices compared with Intel CPU speeds (though not as much as full MPS support would).
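
One way to check whether the running interpreter is an ARM build is to look at `platform.machine()`. The sketch below uses only the standard library; the helper name `python_arch_report` is hypothetical, and the check assumes a native ARM Python reports `arm64` (or `aarch64`) while an Intel build reports `x86_64`.

```python
import platform
import sys


def python_arch_report() -> dict:
    """Summarize the architecture of the running interpreter (hypothetical helper)."""
    machine = platform.machine()
    return {
        "machine": machine,
        # Assumption: native ARM builds report "arm64" or "aarch64".
        "native_arm": machine.lower() in ("arm64", "aarch64"),
        "python": sys.version.split()[0],
    }


if __name__ == "__main__":
    print(python_arch_report())
```

If `native_arm` is `False` on an M1/M2 machine, the interpreter is probably an Intel build running under translation, and reinstalling an ARM build may be worth trying first.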

@suveerudayashankara

But if I use --device mps for train.py, it shows a not implemented error, though it works fine for detect.py.
Is there something I am missing?

@jasonrichdarmawan

@glenn-jocher thanks, but is that for training or only for the detection phase?

I think I got it: I just pass --device mps to either detect.py or train.py on nightly, and then it seems to use the MPS backend.

Running python detect.py --device mps --weights yolov7-e6e.pt --img-size 1280 --source 0 throws an error.

(pytorch) jason@Jasons-Mac-mini yolov7 % python detect.py --device mps --weights yolov7-e6e.pt --img-size 1280 --source 0
Namespace(weights=['yolov7-e6e.pt'], source='0', img_size=1280, conf_thres=0.25, iou_thres=0.45, device='mps', view_img=False, save_txt=False, save_conf=False, nosave=False, classes=None, agnostic_nms=False, augment=False, update=False, project='runs/detect', name='exp', exist_ok=False, no_trace=False)
Traceback (most recent call last):
  File "/Volumes/T7Touch/Learn/yolov7/detect.py", line 196, in <module>
    detect()
  File "/Volumes/T7Touch/Learn/yolov7/detect.py", line 30, in detect
    device = select_device(opt.device)
  File "/Volumes/T7Touch/Learn/yolov7/utils/torch_utils.py", line 71, in select_device
    assert torch.cuda.is_available(), f'CUDA unavailable, invalid device {device} requested'  # check availability
AssertionError: CUDA unavailable, invalid device mps requested

@OliveCHU

@glenn-jocher thanks, but is that for training or only for the detection phase?
I think I got it: I just pass --device mps to either detect.py or train.py on nightly, and then it seems to use the MPS backend.

Running python detect.py --device mps --weights yolov7-e6e.pt --img-size 1280 --source 0 throws an error.

(pytorch) jason@Jasons-Mac-mini yolov7 % python detect.py --device mps --weights yolov7-e6e.pt --img-size 1280 --source 0
Namespace(weights=['yolov7-e6e.pt'], source='0', img_size=1280, conf_thres=0.25, iou_thres=0.45, device='mps', view_img=False, save_txt=False, save_conf=False, nosave=False, classes=None, agnostic_nms=False, augment=False, update=False, project='runs/detect', name='exp', exist_ok=False, no_trace=False)
Traceback (most recent call last):
  File "/Volumes/T7Touch/Learn/yolov7/detect.py", line 196, in <module>
    detect()
  File "/Volumes/T7Touch/Learn/yolov7/detect.py", line 30, in detect
    device = select_device(opt.device)
  File "/Volumes/T7Touch/Learn/yolov7/utils/torch_utils.py", line 71, in select_device
    assert torch.cuda.is_available(), f'CUDA unavailable, invalid device {device} requested'  # check availability
AssertionError: CUDA unavailable, invalid device mps requested

Hi, did you solve the problem? I get the same error here when running python train.py --device mps --data vehicle/data.yaml --cfg cfg/training/yolov7.yaml --weights '' --name yolov7 --hyp data/hyp.scratch.p5.yaml
I use a MacBook Pro with an M1 chip and am trying to train YOLOv7 on custom data.
My torch version is 2.0.0.dev20230131.
I checked MPS with torch.backends.mps.is_available() and it returns True.

@Mopele

Mopele commented Mar 5, 2023

Same here! I am also trying to train YOLOv7 with MPS and have confirmed that MPS is available, but I get the same error.

@il-nietos

image

When I compare the yolov7/utils/torch_utils.py (on the left) and yolov5/utils/torch_utils.py (on the right), the v7 doesn't seem to have an option for mps. Has anyone had any luck configuring these?

@Crear12

Crear12 commented May 20, 2023

image

When I compare the yolov7/utils/torch_utils.py (on the left) and yolov5/utils/torch_utils.py (on the right), the v7 doesn't seem to have an option for mps. Has anyone had any luck configuring these?

I tried modifying it and eventually got “mps” activated, but to be honest it’s not worth it because there are more incompatibility issues; for example, the older code uses float64, which MPS doesn’t support. It’s endless. I would suggest just using the latest versions.
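
For reference, the traceback earlier in the thread comes from a check that asserts CUDA is available for any non-CPU device string. The sketch below shows roughly what an MPS-aware check could look like. It is not the actual code from either repository: `pick_device` and `probe_backends` are hypothetical names, and the availability flags are passed in explicitly so the logic can be followed without a GPU.

```python
def pick_device(requested: str, cuda_ok: bool, mps_ok: bool) -> str:
    """Map a --device style string to a backend name (hypothetical helper)."""
    req = requested.strip().lower()
    if req == "cpu":
        return "cpu"
    if req == "mps":
        # Check MPS on its own instead of asserting that CUDA is available.
        if not mps_ok:
            raise ValueError("MPS unavailable, invalid device 'mps' requested")
        return "mps"
    if req == "":
        # Nothing requested: prefer CUDA, then MPS, then fall back to CPU.
        if cuda_ok:
            return "cuda:0"
        return "mps" if mps_ok else "cpu"
    # Anything else is treated as a CUDA index such as "0" or "0,1".
    if not cuda_ok:
        raise ValueError(f"CUDA unavailable, invalid device {requested!r} requested")
    return "cuda:" + req.split(",")[0]


def probe_backends() -> tuple:
    """Return (cuda_ok, mps_ok) for the installed PyTorch, or (False, False) without it."""
    try:
        import torch
    except ImportError:
        return False, False
    mps = getattr(torch.backends, "mps", None)  # absent before PyTorch 1.12
    return bool(torch.cuda.is_available()), bool(mps is not None and mps.is_available())


if __name__ == "__main__":
    print(pick_device("cpu", *probe_backends()))
```

With `cuda_ok=False` and `mps_ok=True`, as on the M1 machines in this thread, `pick_device("mps", False, True)` returns `"mps"` rather than tripping a CUDA assertion. Selecting the device is only the first step, though; ops or dtypes that MPS does not support, such as float64, would still fail later.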

@glenn-jocher
Member

@Crear12 thank you for sharing your experience with modifying the torch_utils.py file to activate MPS on YOLOv7. It's good to know that while you were able to modify the file to activate MPS, you have also encountered incompatibility issues which made the whole process not worth the effort. For those who want to use MPS, it is recommended to use the latest versions and updates of YOLOv5 and PyTorch. Thank you again for sharing your experience!

@ez4bk

ez4bk commented Jun 11, 2023

Using the PyTorch nightly version and adding the flag device=mps could solve this problem:
yolo train data=data.yaml epochs=100 batch=64 device=mps model=yolov8n.pt
Everything is working fine for me, and the M1 Pro GPU significantly speeds up the training process.

@glenn-jocher
Member

@ez4bk That's great news! Thank you for sharing your solution with us. It's good to hear that using the PyTorch Nightly version with the device=mps flag has resolved the issue and significantly sped up the training process on your M1Pro GPU. Your input will undoubtedly be helpful to others facing similar challenges. Thank you for your contribution!
