Skip to content
This repository has been archived by the owner on Feb 1, 2023. It is now read-only.

[WIP] newStreamToPeer hangs #476

Closed
schomatis opened this issue Apr 22, 2021 · 7 comments
Closed

[WIP] newStreamToPeer hangs #476

schomatis opened this issue Apr 22, 2021 · 7 comments
Assignees
Labels
need/triage Needs initial labeling and prioritization

Comments

@schomatis
Copy link
Contributor

schomatis commented Apr 22, 2021

As explained in ipfs/kubo#7972 (comment) newStreamToPeer doesn't have a timeout in its context and stalls forever.

The most direct fix (#477) is to add a timeout for a hardcoded (or make it an option somewhere) amount of time in

func (bsnet *impl) SendMessage(
ctx context.Context,
p peer.ID,
outgoing bsmsg.BitSwapMessage) error {
s, err := bsnet.newStreamToPeer(ctx, p)

I'm opening this issue to discuss potentially better fixes (which can happen either as a complement or instead of the above one).

Without a deeper BS knowledge I can find another newStreamToPeer call which actually does have a send timeout along with other sane options that I would expect to have when connecting/sending.

func (s *streamMessageSender) Connect(ctx context.Context) (network.Stream, error) {
if s.connected {
return s.stream, nil
}
tctx, cancel := context.WithTimeout(ctx, s.opts.SendTimeout)
defer cancel()
if err := s.bsnet.ConnectTo(tctx, s.to); err != nil {
return nil, err
}
stream, err := s.bsnet.newStreamToPeer(tctx, s.to)

They happen in the context of the streamMessageSender and the interface MessageSender (which seems close to the more broadly used BitSwapNetwork). This logic seems to be used only for the want list managment (MessageQueue), which is handled by a different part of the code than the WANT/HAVE message processing. I don't know the exact rationale for this distinction since both deal with a lot of overlap.

Looking into this some more...

@schomatis schomatis added the need/triage Needs initial labeling and prioritization label Apr 22, 2021
@schomatis schomatis self-assigned this Apr 22, 2021
@welcome
Copy link

welcome bot commented Apr 22, 2021

Thank you for submitting your first issue to this repository! A maintainer will be here shortly to triage and review.
In the meantime, please double-check that you have provided all the necessary information to make this process easy! Any information that can help save additional round trips is useful! We currently aim to give initial feedback within two business days. If this does not happen, feel free to leave a comment.
Please keep an eye on how this issue will be labeled, as labels give an overview of priorities, assignments and additional actions requested by the maintainers:

  • "Priority" labels will show how urgent this is for the team.
  • "Status" labels will show if this is ready to be worked on, blocked, or in progress.
  • "Need" labels will indicate if additional input or analysis is required.

Finally, remember to use https://discuss.ipfs.io if you just need general support.

@Stebalien
Copy link
Member

I don't know the exact rationale for this distinction since both deal with a lot of overlap.

Yeah, I'm a bit confused as well. I think the main difference is: SendMessage is used for sending blocks, one per stream. SendMsg is used for sending wantlist messages, where we want to use a single stream. But I'm not entirely sure why...

A hard-coded large enough timeout should be fine (for now) so I merged the patch. My main concern is that 5s may be too small due to backpressure in some cases... but it should be fine.

@Stebalien
Copy link
Member

This also brings up the more general question: we should likely be setting more timeouts in bitswap when sending data to deal with slow peers.

@schomatis
Copy link
Contributor Author

we should likely be setting more timeouts in bitswap when sending data to deal with slow peers.

Yes. At the very least I would expect (as a consumer of this library) to have some catch-all maximum timeout (even if in the order of minutes) to never reach a deadlock situation.

@schomatis
Copy link
Contributor Author

SendMsg is used for sending wantlist messages, where we want to use a single stream. But I'm not entirely sure why...

I'm assuming because the wanlist broadcasting has a more regular flow to each peer than on-deman blocks, but it's just a guess.

@Stebalien
Copy link
Member

Yes. At the very least I would expect (as a consumer of this library) to have some catch-all maximum timeout (even if in the order of minutes) to never reach a deadlock situation.

Yeah, we probably should have one.

I'm assuming because the wanlist broadcasting has a more regular flow to each peer than on-deman blocks, but it's just a guess.

That and the fact that wantlist messages tend to be small.

@schomatis
Copy link
Contributor Author

I don't see a way forward that doesn't imply a big refactor, and with #477 in place I'm going to consider this fixed (at least for the short term).

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
need/triage Needs initial labeling and prioritization
Projects
None yet
Development

No branches or pull requests

2 participants