
Exclude torch==1.12.0, torchvision==0.13.0 (Fix #8395) #8497

Merged · 1 commit · Jul 6, 2022

Conversation

@mjun0812 (Contributor) commented on Jul 6, 2022

Fix #8395

PyTorch 1.12.0 has a bug in its CUDA initialization: the environment variable CUDA_VISIBLE_DEVICES has no effect if it is set after import torch.

This is a fatal problem, because YOLOv5 relies on CUDA_VISIBLE_DEVICES to allocate GPUs in select_device():

def select_device(device='', batch_size=0, newline=True):
    # device = None or 'cpu' or 0 or '0' or '0,1,2,3'
    s = f'YOLOv5 🚀 {git_describe() or file_date()} Python-{platform.python_version()} torch-{torch.__version__} '
    device = str(device).strip().lower().replace('cuda:', '').replace('none', '')  # to string, 'cuda:0' to '0'
    cpu = device == 'cpu'
    mps = device == 'mps'  # Apple Metal Performance Shaders (MPS)
    if cpu or mps:
        os.environ['CUDA_VISIBLE_DEVICES'] = '-1'  # force torch.cuda.is_available() = False
    elif device:  # non-cpu device requested
        os.environ['CUDA_VISIBLE_DEVICES'] = device  # set environment variable - must be before assert is_available()
        assert torch.cuda.is_available() and torch.cuda.device_count() >= len(device.replace(',', '')), \
            f"Invalid CUDA '--device {device}' requested, use '--device cpu' or pass valid CUDA device(s)"

    if not (cpu or mps) and torch.cuda.is_available():  # prefer GPU if available
        devices = device.split(',') if device else '0'  # range(torch.cuda.device_count())  # i.e. 0,1,6,7
        n = len(devices)  # device count
        if n > 1 and batch_size > 0:  # check batch_size is divisible by device_count
            assert batch_size % n == 0, f'batch-size {batch_size} not multiple of GPU count {n}'
        space = ' ' * (len(s) + 1)
        for i, d in enumerate(devices):
            p = torch.cuda.get_device_properties(i)
            s += f"{'' if i == 0 else space}CUDA:{d} ({p.name}, {p.total_memory / (1 << 20):.0f}MiB)\n"  # bytes to MB
        arg = 'cuda:0'
    elif mps and getattr(torch, 'has_mps', False) and torch.backends.mps.is_available():  # prefer MPS if available
        s += 'MPS\n'
        arg = 'mps'
    else:  # revert to CPU
        s += 'CPU\n'
        arg = 'cpu'

    if not newline:
        s = s.rstrip()
    LOGGER.info(s.encode().decode('ascii', 'ignore') if platform.system() == 'Windows' else s)  # emoji-safe
    return torch.device(arg)
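The ordering constraint that breaks here (CUDA_VISIBLE_DEVICES must be visible before CUDA is initialized) can be illustrated with a small, torch-free sketch. FakeCudaRuntime below is a hypothetical stand-in for a library that snapshots the variable once at initialization, which is the failure mode reported for torch 1.12.0; it is not actual PyTorch internals.

```python
import os

# Hypothetical stand-in (NOT actual PyTorch internals): a runtime that
# reads CUDA_VISIBLE_DEVICES exactly once, at initialization, mimicking
# the eager CUDA init reported in torch 1.12.0.
class FakeCudaRuntime:
    def __init__(self):
        self.visible = os.environ.get('CUDA_VISIBLE_DEVICES')  # snapshot at init

os.environ.pop('CUDA_VISIBLE_DEVICES', None)  # start from a clean environment
runtime = FakeCudaRuntime()                   # analogous to `import torch`
os.environ['CUDA_VISIBLE_DEVICES'] = '0'      # set too late, as select_device() does
print(runtime.visible)                        # None: the late assignment was never seen
```

In PyTorch 1.12.1 the lazy behavior is restored, so a value assigned after import (but before the first CUDA call) is honored again.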

As mentioned in that issue, the upcoming PyTorch 1.12.1 release will resolve the problem, so it is better to exclude the buggy 1.12.0 release, together with its companion torchvision 0.13.0, from the requirements.txt file.
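A change like this would use pip's `!=` exclusion specifier in requirements.txt. The sketch below is an assumption about the shape of the diff, not a copy of it; the exact lower bounds in the real file may differ:

```
# requirements.txt (sketch) -- skip the broken pair, allow everything else
torch>=1.7.0,!=1.12.0
torchvision>=0.8.1,!=0.13.0
```

With this constraint, pip resolves to the newest allowed pair (e.g. it will pick up 1.12.1 once released) while refusing to install 1.12.0/0.13.0.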

If you have any comments, please do not hesitate to let me know.
Thanks.

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Updated PyTorch requirements to exclude buggy versions.

📊 Key Changes

  • Excluded PyTorch version 1.12.0 due to identified issues.
  • Excluded torchvision version 0.13.0, the companion release pinned to PyTorch 1.12.0.

🎯 Purpose & Impact

  • 🎯 Purpose: To prevent installation of specific PyTorch and torchvision versions that are known to cause problems with the YOLOv5 code, ensuring stability and reliability for users.
  • 💥 Impact: Users will avoid potential bugs by not installing these problematic versions, leading to a smoother experience with the YOLOv5 project.

@github-actions bot left a comment
👋 Hello @mjun0812, thank you for submitting a YOLOv5 🚀 PR! To allow your work to be integrated as seamlessly as possible, we advise you to:

  • ✅ Verify your PR is up-to-date with upstream/master. If your PR is behind upstream/master, an automatic GitHub Actions merge may be attempted by writing /rebase in a new comment, or by running the following commands (replace 'feature' with the name of your local branch):
git remote add upstream https://github.com/ultralytics/yolov5.git
git fetch upstream
# git checkout feature  # <--- replace 'feature' with local branch name
git merge upstream/master
git push -u origin -f
  • ✅ Verify all Continuous Integration (CI) checks are passing.
  • ✅ Reduce changes to the absolute minimum required for your bug fix or feature addition. "It is not daily increase but daily decrease, hack away the unessential. The closer to the source, the less wastage there is." -Bruce Lee

@glenn-jocher glenn-jocher self-assigned this Jul 6, 2022
@glenn-jocher glenn-jocher merged commit 1ab23fc into ultralytics:master Jul 6, 2022
@glenn-jocher (Member) commented:

@mjun0812 thanks for the investigation and the PR. PR is merged!

Shivvrat pushed a commit to Shivvrat/epic-yolov5 that referenced this pull request Jul 12, 2022
@zhiqwang zhiqwang mentioned this pull request Jul 15, 2022
ctjanuhowski pushed a commit to ctjanuhowski/yolov5 that referenced this pull request Sep 8, 2022
Linked issues

Successfully merging this pull request may close these issues:

  • YOLOv5 issues with torch==1.12 on Multi-GPU systems