RuntimeError: DataLoader worker (pid 16126) exited unexpectedly with exit code 1 #5

Open
Kyle0936 opened this issue Jul 24, 2018 · 2 comments

@Kyle0936

Sorry to bother you, but I ran into another issue when I tried to train with my own dataset:
python main.py --mode='train' --train_path='data/images' --label_path='data/ground_truth_mask' --batch_size=8 --visdom=False

It seemed to run correctly at first, but then this error appeared:

......
The number of parameters: 62238175
/Users/kyle/Documents/MATLAB/DSS-py/solver.py:138: UserWarning: torch.nn.utils.clip_grad_norm is now deprecated in favor of torch.nn.utils.clip_grad_norm_.
utils.clip_grad_norm(self.net.parameters(), self.config.clip_gradient)
/Users/kyle/Documents/MATLAB/DSS-py/solver.py:141: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
loss_epoch += loss.cpu().data[0]
/Users/kyle/Documents/MATLAB/DSS-py/solver.py:143: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
epoch, self.config.epoch, i, iter_num, loss.cpu().data[0]))
epoch: [0/500], iter: [0/3], loss: [4.8447]
/Users/kyle/Documents/MATLAB/DSS-py/solver.py:145: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
error = OrderedDict([('loss:', loss.cpu().data[0])])
epoch: [0/500], iter: [1/3], loss: [4.8428]
epoch: [0/500], iter: [2/3], loss: [4.8411]
thread_monitor No such process in pthread_detach
Traceback (most recent call last):
  File "main.py", line 86, in <module>
    main(config)
  File "main.py", line 25, in main
    train.train()
  File "/Users/kyle/Documents/MATLAB/DSS-py/solver.py", line 127, in train
    for i, data_batch in enumerate(self.train_loader):
  File "/Users/kyle/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 280, in __next__
    idx, batch = self._get_batch()
  File "/Users/kyle/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 259, in _get_batch
    return self.data_queue.get()
  File "/Users/kyle/anaconda3/lib/python3.6/multiprocessing/queues.py", line 335, in get
    res = self._reader.recv_bytes()
  File "/Users/kyle/anaconda3/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/Users/kyle/anaconda3/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/Users/kyle/anaconda3/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
  File "/Users/kyle/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 178, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 16126) exited unexpectedly with exit code 1.

I really appreciate your generous help!
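
For anyone debugging a similar crash: the traceback only reports that a worker process died, not which sample caused it. One way to surface the underlying exception is to load every sample in the main process. Constructing the DataLoader with num_workers=0 has that effect, and so does indexing the dataset directly. Below is a minimal sketch of the second approach. `find_failing_samples` and `ToyDataset` are hypothetical names used only for illustration; the real dataset class from this repository would take the place of `ToyDataset`.

```python
def find_failing_samples(dataset):
    """Load every sample in the current process and collect the ones that raise."""
    failures = []
    for index in range(len(dataset)):
        try:
            dataset[index]  # same call a DataLoader worker would make
        except Exception as exc:
            failures.append((index, repr(exc)))
    return failures


class ToyDataset:
    """Hypothetical stand-in for the real dataset; None marks an unreadable sample."""

    def __init__(self, items):
        self.items = items

    def __len__(self):
        return len(self.items)

    def __getitem__(self, i):
        if self.items[i] is None:
            raise ValueError("cannot decode sample")
        return self.items[i]


if __name__ == "__main__":
    # Prints the index and exception of each sample that failed to load.
    for index, error in find_failing_samples(ToyDataset(["a", None, "c"])):
        print(index, error)
```

Because nothing runs in a subprocess here, the full exception from the failing `__getitem__` call is available instead of a bare worker exit code.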

@AceCoooool
Owner

Please check your image and label directories: are there any non-image files in them? (I ran the training on my computer and did not get the error you encountered.)
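
A quick way to run that check is a short script that lists anything in a directory that does not look like an image. This is only a sketch: the extension set in `IMAGE_EXTS` is an assumption, not taken from this repository's loader, and `find_non_image_files` is a hypothetical helper name. The two paths are the ones from the command in the report.

```python
import os

# Assumed set of image extensions; adjust to whatever the dataset loader accepts.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}


def find_non_image_files(directory, allowed_exts=IMAGE_EXTS):
    """Return sorted names of entries in `directory` that do not look like images."""
    suspects = []
    for name in os.listdir(directory):  # os.listdir never returns '.' or '..'
        path = os.path.join(directory, name)
        ext = os.path.splitext(name)[1].lower()
        # Hidden files (e.g. .DS_Store), subdirectories, and unknown
        # extensions are all reported as suspects.
        if name.startswith(".") or not os.path.isfile(path) or ext not in allowed_exts:
            suspects.append(name)
    return sorted(suspects)


if __name__ == "__main__":
    for d in ("data/images", "data/ground_truth_mask"):
        if os.path.isdir(d):
            print(d, find_non_image_files(d))
```

An extension check will not catch a corrupt file that merely has an image extension, so an empty result narrows the search rather than ruling the data out.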

@Kyle0936
Author

[Screenshot, 2018-07-25 1:41:20 AM]

[Screenshot, 2018-07-25 1:41:39 AM]

Here are my image set and ground truth set. I have also used "ls -a" to check, and I removed '.DS_Store'. The only non-image entries left are '.' and '..', which clearly should not be deleted. Are there any requirements for the names or formats of the images?
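
One thing that can be ruled out without knowing the loader's exact requirements is a mismatch between the two directories. The sketch below assumes that each image is paired with a mask sharing the same file name without extension; that pairing rule is a guess, not something confirmed in this thread, and `unmatched_stems` is a hypothetical helper name.

```python
import os


def unmatched_stems(image_dir, mask_dir):
    """Return (images without a mask, masks without an image), compared by file stem."""

    def stems(directory):
        # Ignore hidden files such as .DS_Store; strip the extension from the rest.
        return {
            os.path.splitext(name)[0]
            for name in os.listdir(directory)
            if not name.startswith(".")
        }

    images, masks = stems(image_dir), stems(mask_dir)
    return sorted(images - masks), sorted(masks - images)


if __name__ == "__main__":
    image_dir, mask_dir = "data/images", "data/ground_truth_mask"
    if os.path.isdir(image_dir) and os.path.isdir(mask_dir):
        missing_masks, missing_images = unmatched_stems(image_dir, mask_dir)
        print("images without a mask:", missing_masks)
        print("masks without an image:", missing_images)
```

Two empty lists mean every image has a same-named mask under that assumption; it says nothing about image size or color mode, which would need a separate check.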
