BrokenPipeError: [Errno 32] Broken pipe #758

Closed
dapsjj opened this issue Aug 17, 2020 · 9 comments
Labels
question Further information is requested Stale

Comments


dapsjj commented Aug 17, 2020

I use this command to train the model: python train.py --img-size 640 --batch-size 4 --epochs 300 --data ./data/garbage.yaml --cfg ./models/yolov5m.yaml --weights weights/yolov5m.pt

But the error message is:
Using CUDA device0 _CudaDeviceProperties(name='GeForce GTX 1050', total_memory=4096MB)

Namespace(adam=False, batch_size=4, bucket='', cache_images=False, cfg='./models/yolov5m.yaml', data='./data/garbage.yaml', device='', epochs=300, evolve=False, global_rank=-1, hyp='data/hyp.finetune.yaml', img_size=[640, 640], local_rank=-1, logdir='runs/', multi_scale=False, name='', noautoanchor=False, nosave=False, notest=False, rect=False, resume=False, single_cls=False, sync_bn=False, total_batch_size=4, weights='weights/yolov5m.pt', workers=8, world_size=1)
Start Tensorboard with "tensorboard --logdir runs/", view at http://localhost:6006/
Hyperparameters {'lr0': 0.01, 'momentum': 0.937, 'weight_decay': 0.0005, 'giou': 0.05, 'cls': 0.5, 'cls_pw': 1.0, 'obj': 1.0, 'obj_pw': 1.0, 'iou_t': 0.2, 'anchor_t': 4.0, 'fl_gamma': 0.0, 'hsv_h': 0.015, 'hsv_s': 0.7, 'hsv_v': 0.4, 'degrees': 0.0, 'translate': 0.1, 'scale': 0.5, 'shear': 0.0, 'perspective': 0.0, 'flipud': 0.0, 'fliplr': 0.5, 'mixup': 0.0}
Overriding ./models/yolov5m.yaml nc=80 with nc=13

                 from  n    params  module                                  arguments
  0                -1  1      5280  models.common.Focus                     [3, 48, 3]
  1                -1  1     41664  models.common.Conv                      [48, 96, 3, 2]
  2                -1  1     67680  models.common.BottleneckCSP             [96, 96, 2]
  3                -1  1    166272  models.common.Conv                      [96, 192, 3, 2]
  4                -1  1    639168  models.common.BottleneckCSP             [192, 192, 6]
  5                -1  1    664320  models.common.Conv                      [192, 384, 3, 2]
  6                -1  1   2550144  models.common.BottleneckCSP             [384, 384, 6]
  7                -1  1   2655744  models.common.Conv                      [384, 768, 3, 2]
  8                -1  1   1476864  models.common.SPP                       [768, 768, [5, 9, 13]]
  9                -1  1   4283136  models.common.BottleneckCSP             [768, 768, 2, False]
 10                -1  1    295680  models.common.Conv                      [768, 384, 1, 1]
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']
 12           [-1, 6]  1         0  models.common.Concat                    [1]
 13                -1  1   1219968  models.common.BottleneckCSP             [768, 384, 2, False]
 14                -1  1     74112  models.common.Conv                      [384, 192, 1, 1]
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']
 16           [-1, 4]  1         0  models.common.Concat                    [1]
 17                -1  1    305856  models.common.BottleneckCSP             [384, 192, 2, False]
 18                -1  1    332160  models.common.Conv                      [192, 192, 3, 2]
 19          [-1, 14]  1         0  models.common.Concat                    [1]
 20                -1  1   1072512  models.common.BottleneckCSP             [384, 384, 2, False]
 21                -1  1   1327872  models.common.Conv                      [384, 384, 3, 2]
 22          [-1, 10]  1         0  models.common.Concat                    [1]
 23                -1  1   4283136  models.common.BottleneckCSP             [768, 768, 2, False]
 24      [17, 20, 23]  1     72738  models.yolo.Detect                      [13, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [192, 384, 768]]
Model Summary: 263 layers, 2.15343e+07 parameters, 2.15343e+07 gradients

Transferred 506/514 items from weights/yolov5m.pt
Optimizer groups: 86 .bias, 94 conv.weight, 83 other
Scanning labels E:\test_opencv\yolov5-master\dataset\labels\train_small_image.cache (12752 found, 0 missing, 0 empty, 0 duplicate, for 12752 images): 12752it [00:00, 25988.17it/s]
Scanning labels E:\test_opencv\yolov5-master\dataset\labels\test_small_image.cache (3443 found, 0 missing, 0 empty, 134 duplicate, for 3443 images): 3443it [00:00, 23168.64it/s]

Analyzing anchors... anchors/target = 4.14, Best Possible Recall (BPR) = 1.0000
Image sizes 640 train, 640 test
Using 4 dataloader workers
Starting training for 300 epochs...
You have uninstalled pretty_errors but it is still present in your python startup. Please remove its section from file:
E:\Anaconda3\sitecustomize.py

You have uninstalled pretty_errors but it is still present in your python startup. Please remove its section from file:
E:\Anaconda3\sitecustomize.py

You have uninstalled pretty_errors but it is still present in your python startup. Please remove its section from file:
E:\Anaconda3\sitecustomize.py

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "E:\Anaconda3\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "E:\Anaconda3\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "E:\Anaconda3\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "E:\Anaconda3\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "E:\Anaconda3\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "E:\Anaconda3\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "E:\Anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "E:\test_opencv\yolov5-master\train.py", line 10, in <module>
    import torch.distributed as dist
  File "E:\Anaconda3\lib\site-packages\torch\__init__.py", line 116, in <module>
    raise err
OSError: [WinError 1455] Error loading "E:\Anaconda3\lib\site-packages\torch\lib\caffe2_detectron_ops_gpu.dll" or one of its dependencies.

Traceback (most recent call last):
  File "train.py", line 453, in <module>
    train(hyp, opt, device, tb_writer)
  File "train.py", line 237, in train
    pbar = enumerate(dataloader)
  File "E:\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 291, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "E:\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 737, in __init__
    w.start()
  File "E:\Anaconda3\lib\multiprocessing\process.py", line 112, in start
    self._popen = self.Popen(self)
  File "E:\Anaconda3\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "E:\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "E:\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
    reduction.dump(process_obj, to_child)
  File "E:\Anaconda3\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe

dapsjj added the question label on Aug 17, 2020
Contributor

github-actions bot commented Aug 17, 2020

Hello @dapsjj, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook, Docker Image, and Google Cloud Quickstart Guide for example environments.

If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom model or data training question, please note Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients, such as:

  • Cloud-based AI systems operating on hundreds of HD video streams in realtime.
  • Edge AI integrated into custom iOS and Android apps for realtime 30 FPS video inference.
  • Custom data training, hyperparameter evolution, and model exportation to any destination.

For more information please visit https://www.ultralytics.com.

Contributor

Ownmarc commented Aug 17, 2020

Reduce the number of workers using --workers 2 or even --workers 0. You don't need 4 workers for a batch size of 4.

@glenn-jocher same kind of error I was getting on Windows, maybe we could revisit the workers formula
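
For illustration, here is one way a more conservative worker count could be picked. This is only a sketch: `choose_num_workers` is a hypothetical helper, not the formula train.py uses, and the cap of 2 workers on Windows is an assumption made for this example.

```python
import os
import sys


def choose_num_workers(batch_size, requested_workers, cpu_count=None, platform=None):
    """Pick a conservative DataLoader worker count (hypothetical helper)."""
    cpu_count = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    platform = platform if platform is not None else sys.platform

    # Never use more workers than CPUs, samples per batch, or the user's request.
    workers = min(cpu_count, batch_size, requested_workers)

    # Assumption: on Windows each worker is a spawned process that re-imports
    # the training script (and torch with it), so cap the count at 2 there.
    if platform.startswith("win"):
        workers = min(workers, 2)

    # A batch of one gains nothing from parallel loading; use the main process.
    if batch_size <= 1:
        workers = 0
    return max(workers, 0)


if __name__ == "__main__":
    # Settings from the report above: batch size 4, workers=8, on Windows.
    print(choose_num_workers(batch_size=4, requested_workers=8, cpu_count=8, platform="win32"))
```

The result would then be passed on the command line as --workers, which is the flag suggested above.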

Member

glenn-jocher commented Aug 17, 2020

@dapsjj it appears you may have environment problems. Please ensure you meet all dependency requirements if you are attempting to run YOLOv5 locally. If in doubt, create a new virtual Python 3.8 environment (conda is not recommended), clone the latest repo (code changes daily), and pip install -r requirements.txt again. We also highly recommend using one of our verified environments below.

Requirements

Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.6. To install run:

$ pip install -r requirements.txt
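
To confirm the two minimums stated above (Python 3.8 and torch 1.6) before training, a quick check along these lines may help. It is a rough sketch: `parse_version`, `meets_minimum`, and `check_environment` are names made up for this example, and the version parsing is deliberately simplistic.

```python
import sys
from importlib import metadata


def parse_version(text):
    """Turn '1.6.0+cu101' into (1, 6, 0), ignoring any non-numeric suffix."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings by their leading numeric components."""
    return parse_version(installed) >= parse_version(minimum)


def check_environment():
    """Return a list of problems; an empty list means both minimums are met."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8 or later is required")
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        problems.append("torch is not installed")
    else:
        if not meets_minimum(torch_version, "1.6"):
            problems.append(f"torch {torch_version} is older than 1.6")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

This only covers the two versions named here; the remaining entries in requirements.txt are not checked.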

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are passing. These tests evaluate proper operation of basic YOLOv5 functionality, including training (train.py), testing (test.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu.

Author

dapsjj commented Aug 18, 2020

> Reduce the number of workers using --workers 2 or even --workers 0. You don't need 4 workers for a batch size of 4.
>
> @glenn-jocher same kind of error I was getting on Windows, maybe we could revisit the workers formula

You are right. My GPU performance is not good; I can only run it by setting batch-size to 1.

@github-actions
Contributor

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@jithin8mathew

Reducing the batch size from 8 to 4 solved this issue for me!
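
The two workarounds reported in this thread (fewer workers, smaller batch) can be tried in order from least to most drastic. The sketch below only builds the command lines; `fallback_commands` is a hypothetical helper, and the order (workers first, then batch size) is an assumption rather than a recommendation from the maintainers.

```python
def fallback_commands(batch_size, workers, base_args):
    """Yield train.py commands with progressively smaller memory footprints."""
    while True:
        yield ["python", "train.py", "--batch-size", str(batch_size),
               "--workers", str(workers)] + list(base_args)
        if batch_size <= 1 and workers == 0:
            return
        # Drop workers first, then halve the batch size down to 1.
        if workers > 0:
            workers = workers // 2
        else:
            batch_size = max(1, batch_size // 2)


if __name__ == "__main__":
    extra = ["--img-size", "640", "--data", "./data/garbage.yaml"]
    for command in fallback_commands(batch_size=8, workers=4, base_args=extra):
        print(" ".join(command))
```

Each printed line is a candidate command to run by hand; the sketch does not launch training itself.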

@martingaudio94

> Reduce the number of workers using --workers 2 or even --workers 0. You don't need 4 workers for a batch size of 4.
>
> @glenn-jocher same kind of error I was getting on Windows, maybe we could revisit the workers formula

It works perfectly, thank you @Ownmarc.


martingaudio94 commented Jul 13, 2022

> @dapsjj it appears you may have environment problems. Please ensure you meet all dependency requirements if you are attempting to run YOLOv5 locally. If in doubt, create a new virtual Python 3.8 environment (conda is not recommended), clone the latest repo (code changes daily), and pip install -r requirements.txt again. We also highly recommend using one of our verified environments below.

@glenn-jocher I had some problems using CUDA because the requirements work with the CPU build of torch, so to use CUDA I need to install a different build of torch. (The base environment works perfectly with the CPU build.)

See "another version of torch for cuda" #8395. Also, @Ownmarc's fix for the workers problem works for me.
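
A quick way to tell which kind of torch build is installed is sketched below. `describe_torch_build` is a name invented for this example; it relies on `torch.version.cuda` being `None` for CPU-only builds.

```python
def describe_torch_build():
    """Report whether the installed torch is a CPU-only or a CUDA build."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"

    cuda_build = torch.version.cuda  # None when torch was built without CUDA
    if cuda_build is None:
        return f"torch {torch.__version__}: CPU-only build"
    if not torch.cuda.is_available():
        return f"torch {torch.__version__}: built for CUDA {cuda_build}, but no usable GPU was found"
    return f"torch {torch.__version__}: CUDA {cuda_build} on {torch.cuda.get_device_name(0)}"


if __name__ == "__main__":
    print(describe_torch_build())
```

If this reports a CPU-only build, training cannot use the GPU regardless of the --device setting.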


jsmlau commented Aug 18, 2022

@martingaudio94 So glad I found your reply and it works. Can I ask why conda is not recommended?
