BrokenPipeError: [Errno 32] Broken pipe #758
Comments
Hello @dapsjj, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook , Docker Image, and Google Cloud Quickstart Guide for example environments. If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you. If this is a custom model or data training question, please note Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients, such as:
For more information please visit https://www.ultralytics.com. |
Reduce the number of workers using --workers 2 or even --workers 0. You don't need 4 workers for a batch size of 4. @glenn-jocher I was getting the same kind of error on Windows; maybe we could revisit the workers formula. |
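For anyone wondering why the log says "Using 4 dataloader workers" when the Namespace shows workers=8: a cap of the following shape would explain it. This is only a sketch, not a quote of train.py. The function name `dataloader_workers` and the 8-core machine are my assumptions.

```python
import os


def dataloader_workers(batch_size, requested_workers, cpu_count=None):
    """Cap the requested worker count by CPU cores and by batch size.

    Hypothetical helper for illustration; not copied from the repo.
    """
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    # With a batch size of 1 there is nothing to parallelize, so use 0 workers
    # (0 means data is loaded in the main process, no subprocesses spawned).
    by_batch = batch_size if batch_size > 1 else 0
    return min(cpu_count, by_batch, requested_workers)


# Assumed 8-core machine, --batch-size 4, default workers=8 as in the log
print(dataloader_workers(batch_size=4, requested_workers=8, cpu_count=8))  # 4
# Passing --workers 0 avoids spawning worker processes entirely
print(dataloader_workers(batch_size=4, requested_workers=0, cpu_count=8))  # 0
```

On Windows every worker is a freshly spawned Python process that re-imports train.py and torch, so fewer workers also means less memory pressure at startup.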
@dapsjj it appears you may have environment problems. Please ensure you meet all dependency requirements if you are attempting to run YOLOv5 locally. If in doubt, create a new virtual Python 3.8 environment (conda is not recommended), clone the latest repo (code changes daily), and
Requirements
Python 3.8 or later with all requirements.txt dependencies installed, including
$ pip install -r requirements.txt
Environments
YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
Status
If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are passing. These tests evaluate proper operation of basic YOLOv5 functionality, including training (train.py), testing (test.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu. |
You are right, my GPU performance is not good; I can only run it by setting batch-size to 1. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
Reducing the batch size from 8 to 4 solved this issue for me! |
It works perfectly, thank you. @Ownmarc |
@glenn-jocher I have some problems working with CUDA because the requirements install the CPU build of torch, so to work with CUDA I'll need to install another version of torch (the base environment works perfectly with the CPU build). another version of torch for cuda #8395 |
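A quick way to tell which torch build an environment actually has is sketched below. `describe_torch_build` is a made-up helper name. It only uses `torch.version.cuda`, `torch.cuda.is_available()` and `torch.cuda.get_device_name()`, and it falls back to a plain message if torch is not importable.

```python
import importlib.util


def describe_torch_build():
    """Return a one-line summary of the installed torch build (illustrative helper)."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch

    # torch.version.cuda is None when the build has no CUDA support compiled in
    if torch.version.cuda is None:
        return f"torch {torch.__version__} has no CUDA support compiled in"
    if not torch.cuda.is_available():
        return (f"torch {torch.__version__} was built for CUDA "
                f"{torch.version.cuda}, but no usable GPU was found")
    return (f"torch {torch.__version__} with CUDA {torch.version.cuda} "
            f"on {torch.cuda.get_device_name(0)}")


print(describe_torch_build())
```

If this reports a build without CUDA support, a CUDA-enabled torch has to be installed separately, which matches what is described above.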
@martingaudio94 So glad I found your reply and it works. Can I ask why conda is not recommended? |
I use this command to train the model:
python train.py --img-size 640 --batch-size 4 --epochs 300 --data ./data/garbage.yaml --cfg ./models/yolov5m.yaml --weights weights/yolov5m.pt
But the error message is:
Using CUDA device0 _CudaDeviceProperties(name='GeForce GTX 1050', total_memory=4096MB)
Namespace(adam=False, batch_size=4, bucket='', cache_images=False, cfg='./models/yolov5m.yaml', data='./data/garbage.yaml', device='', epochs=300, evolve=False, global_rank=-1, hyp='data/hyp.finetune.yaml', img_size=[640, 640], local_rank=-1, logdir='runs/', multi_scale=False, name='', noautoanchor=False, nosave=False, notest=False, rect=False, resume=False, single_cls=False, sync_bn=False, total_batch_size=4, weights='weights/yolov5m.pt', workers=8, world_size=1)
Start Tensorboard with "tensorboard --logdir runs/", view at http://localhost:6006/
Hyperparameters {'lr0': 0.01, 'momentum': 0.937, 'weight_decay': 0.0005, 'giou': 0.05, 'cls': 0.5, 'cls_pw': 1.0, 'obj': 1.0, 'obj_pw': 1.0, 'iou_t': 0.2, 'anchor_t': 4.0, 'fl_gamma': 0.0, 'hsv_h': 0.015, 'hsv_s': 0.7, 'hsv_v': 0.4, 'degrees': 0.0, 'translate': 0.1, 'scale': 0.5, 'shear': 0.0, 'perspective': 0.0, 'flipud': 0.0, 'fliplr': 0.5, 'mixup': 0.0}
Overriding ./models/yolov5m.yaml nc=80 with nc=13
0 -1 1 5280 models.common.Focus [3, 48, 3]
1 -1 1 41664 models.common.Conv [48, 96, 3, 2]
2 -1 1 67680 models.common.BottleneckCSP [96, 96, 2]
3 -1 1 166272 models.common.Conv [96, 192, 3, 2]
4 -1 1 639168 models.common.BottleneckCSP [192, 192, 6]
5 -1 1 664320 models.common.Conv [192, 384, 3, 2]
6 -1 1 2550144 models.common.BottleneckCSP [384, 384, 6]
7 -1 1 2655744 models.common.Conv [384, 768, 3, 2]
8 -1 1 1476864 models.common.SPP [768, 768, [5, 9, 13]]
9 -1 1 4283136 models.common.BottleneckCSP [768, 768, 2, False]
10 -1 1 295680 models.common.Conv [768, 384, 1, 1]
11 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
12 [-1, 6] 1 0 models.common.Concat [1]
13 -1 1 1219968 models.common.BottleneckCSP [768, 384, 2, False]
14 -1 1 74112 models.common.Conv [384, 192, 1, 1]
15 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
16 [-1, 4] 1 0 models.common.Concat [1]
17 -1 1 305856 models.common.BottleneckCSP [384, 192, 2, False]
18 -1 1 332160 models.common.Conv [192, 192, 3, 2]
19 [-1, 14] 1 0 models.common.Concat [1]
20 -1 1 1072512 models.common.BottleneckCSP [384, 384, 2, False]
21 -1 1 1327872 models.common.Conv [384, 384, 3, 2]
22 [-1, 10] 1 0 models.common.Concat [1]
23 -1 1 4283136 models.common.BottleneckCSP [768, 768, 2, False]
24 [17, 20, 23] 1 72738 models.yolo.Detect [13, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [192, 384, 768]]
Model Summary: 263 layers, 2.15343e+07 parameters, 2.15343e+07 gradients
Transferred 506/514 items from weights/yolov5m.pt
Optimizer groups: 86 .bias, 94 conv.weight, 83 other
Scanning labels E:\test_opencv\yolov5-master\dataset\labels\train_small_image.cache (12752 found, 0 missing, 0 empty, 0 duplicate, for 12752 images): 12752it [00:00, 25988.17it/s]
Scanning labels E:\test_opencv\yolov5-master\dataset\labels\test_small_image.cache (3443 found, 0 missing, 0 empty, 134 duplicate, for 3443 images): 3443it [00:00, 23168.64it/s]
Analyzing anchors... anchors/target = 4.14, Best Possible Recall (BPR) = 1.0000
Image sizes 640 train, 640 test
Using 4 dataloader workers
Starting training for 300 epochs...
You have uninstalled pretty_errors but it is still present in your python startup. Please remove its section from file:
E:\Anaconda3\sitecustomize.py
You have uninstalled pretty_errors but it is still present in your python startup. Please remove its section from file:
E:\Anaconda3\sitecustomize.py
You have uninstalled pretty_errors but it is still present in your python startup. Please remove its section from file:
E:\Anaconda3\sitecustomize.py
Traceback (most recent call last):
File "", line 1, in
Traceback (most recent call last):
File "train.py", line 453, in
File "E:\Anaconda3\lib\multiprocessing\spawn.py", line 105, in spawn_main
train(hyp, opt, device, tb_writer) exitcode = _main(fd)
File "E:\Anaconda3\lib\multiprocessing\spawn.py", line 114, in _main
File "train.py", line 237, in train
prepare(preparation_data)
File "E:\Anaconda3\lib\multiprocessing\spawn.py", line 225, in prepare
pbar = enumerate(dataloader)
_fixup_main_from_path(data['init_main_from_path'])
File "E:\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 291, in iter
File "E:\Anaconda3\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
return _MultiProcessingDataLoaderIter(self)
run_name="mp_main")
File "E:\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 737, in init
File "E:\Anaconda3\lib\runpy.py", line 263, in run_path
pkg_name=pkg_name, script_name=fname)
File "E:\Anaconda3\lib\runpy.py", line 96, in _run_module_code
w.start() mod_name, mod_spec, pkg_name, script_name)
File "E:\Anaconda3\lib\runpy.py", line 85, in _run_code
File "E:\Anaconda3\lib\multiprocessing\process.py", line 112, in start
exec(code, run_globals)
File "E:\test_opencv\yolov5-master\train.py", line 10, in
import torch.distributed as dist
self._popen = self.Popen(self) File "E:\Anaconda3\lib\site-packages\torch_init.py", line 116, in
File "E:\Anaconda3\lib\multiprocessing\context.py", line 223, in _Popen
raise err
OSError: [WinError 1455] Error loading "E:\Anaconda3\lib\site-packages\torch\lib\caffe2_detectron_ops_gpu.dll" or one of its dependencies.
return _default_context.get_context().Process._Popen(process_obj)
File "E:\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "E:\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 89, in init
reduction.dump(process_obj, to_child)
File "E:\Anaconda3\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe