
ConnectionResetError: [Errno 104] Connection reset by peer #125

Closed
xiaomujiang opened this issue Jul 24, 2021 · 7 comments

@xiaomujiang

I used YOLOX to train. It finished 20 epochs, then an error happened, but looking at the log I don't know why. Can you help me?

2021-07-24 08:49:52.170 | INFO | yolox.core.trainer:after_iter:245 - epoch: 20/20, iter: 10/10, mem: 12556Mb, iter_time: 0.953s, data_time: 0.130s, total_loss: 6.4, iou_loss: 2.1, l1_loss: 1.3, conf_loss: 2.2, cls_loss: 0.8, lr: 6.250e-05, size: 800, ETA: 0:00:00
2021-07-24 08:49:59.031 | INFO | yolox.evaluators.voc_evaluator:evaluate_prediction:142 - Evaluate in main process...


Results computed with the unofficial Python eval code.
Results should be very close to the official MATLAB eval code.
Recompute with ./tools/reval.py --matlab ... for your paper.
-- Thanks, The Management

Eval IoU : 0.55
Eval IoU : 0.60
Eval IoU : 0.65
Eval IoU : 0.70
Eval IoU : 0.75
Eval IoU : 0.80
Eval IoU : 0.85
Eval IoU : 0.90
Eval IoU : 0.95
2021-07-24 08:49:59.604 | INFO | yolox.core.trainer:evaluate_and_save_model:298 -
Average forward time: 21.82 ms, Average NMS time: 1.45 ms, Average inference time: 23.27 ms


map_5095: 0.3851118401510251
map_50: 0.5957307663890736

2021-07-24 08:49:59.604 | INFO | yolox.core.trainer:save_ckpt:307 - Save weights to ../models/yolox_voc_s
2021-07-24 08:50:00.101 | INFO | yolox.core.trainer:after_train:184 - Training of experiment is done and the best AP is 38.51
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/pin_memory.py", line 25, in _pin_memory_loop
r = in_queue.get(timeout=MP_STATUS_CHECK_INTERVAL)
File "/usr/lib/python3.6/multiprocessing/queues.py", line 113, in get
return _ForkingPickler.loads(res)
File "/usr/local/lib/python3.6/dist-packages/torch/multiprocessing/reductions.py", line 282, in rebuild_storage_fd
fd = df.detach()
File "/usr/lib/python3.6/multiprocessing/resource_sharer.py", line 57, in detach
with _resource_sharer.get_connection(self._id) as conn:
File "/usr/lib/python3.6/multiprocessing/resource_sharer.py", line 87, in get_connection
c = Client(address, authkey=process.current_process().authkey)
File "/usr/lib/python3.6/multiprocessing/connection.py", line 493, in Client
answer_challenge(c, authkey)
File "/usr/lib/python3.6/multiprocessing/connection.py", line 737, in answer_challenge
response = connection.recv_bytes(256) # reject large message
File "/usr/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/usr/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/usr/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
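
For context, the traceback comes from the DataLoader's pin-memory thread: with num_workers > 0 and pin_memory=True, PyTorch starts a background thread (the _pin_memory_loop in the traceback) that receives tensors from worker processes over multiprocessing connections, and the ConnectionResetError fires when a worker goes away while that thread is still reading. A minimal sketch of the kind of setup in which this appears, using a generic DataLoader rather than YOLOX's actual loader:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset standing in for the VOC training data.
dataset = TensorDataset(torch.randn(64, 3, 32, 32))

# num_workers > 0 spawns worker processes; pin_memory=True starts the
# pin-memory thread (_pin_memory_loop in the traceback above). If a worker
# process dies or is torn down while that thread is still receiving a
# batch, the read on the inter-process connection fails with
# ConnectionResetError: [Errno 104].
loader = DataLoader(dataset, batch_size=8, num_workers=4, pin_memory=True)

for (batch,) in loader:
    pass  # the training step would run here
```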

@lavender-ling

I have the same problem; it finished 6 epochs.

@xiaomujiang
Author

> I have the same problem; it finished 6 epochs.

Do you have any idea how to solve this problem?

@lavender-ling

> I have the same problem; it finished 6 epochs.
>
> Do you have any idea how to solve this problem?

I haven't found one yet.

@Abandon-ht

I have the same problem:
AttributeError: 'Namespace' object has no attribute 'occumpy'

@xiaomujiang
Author

> I have the same problem:
>
> AttributeError: 'Namespace' object has no attribute 'occumpy'

I think your problem is not the same. For the occumpy error, you can pull the new project; I remember it has been updated.
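
For what it's worth, that AttributeError just means the argparse Namespace was built by a parser that never declared the flag, which matches the advice to pull the updated code. A hypothetical sketch (the flag name is copied from the error message; the real YOLOX option may be spelled differently):

```python
import argparse

parser = argparse.ArgumentParser()
# If this add_argument line is missing (as in an outdated checkout),
# reading args.occumpy raises:
# AttributeError: 'Namespace' object has no attribute 'occumpy'
parser.add_argument("--occumpy", action="store_true",
                    help="hypothetical flag mirroring the error message")
args = parser.parse_args([])

# Defensive access that also works against an old parser lacking the flag:
occumpy = getattr(args, "occumpy", False)
```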

@lavender-ling

This is a known PyTorch dataloader memory-leak bug; see #103. It is still being traced line by line... Increasing memory or reducing the number of workers can mitigate it temporarily.
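
A minimal sketch of that workaround, assuming a standard PyTorch DataLoader (YOLOX's own loader construction may differ):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(64, 3, 32, 32))  # placeholder dataset

# Temporary mitigation from the comment above: fewer workers means less
# shared memory held by worker processes and fewer inter-process
# connections that can be reset. num_workers=0 loads data in the main
# process and sidesteps the problem entirely, at the cost of throughput.
loader = DataLoader(dataset, batch_size=8, num_workers=0, pin_memory=True)

# The other half of the suggestion, "increase memory": when training in
# Docker, that usually means enlarging shared memory, e.g.
#   docker run --shm-size=8g ...
```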

@FateScript
Member

This issue is solved by the memory-leak fix in #216.
