Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use timeouts when sending messages for stream open, close, and reset. #52

Merged
merged 18 commits into from
May 10, 2019

Conversation

vyzo
Copy link
Contributor

@vyzo vyzo commented May 8, 2019

Closes #51 #48

@vyzo vyzo requested review from Stebalien and raulk May 8, 2019 10:36
@ghost ghost assigned vyzo May 8, 2019
@ghost ghost added the status/in-progress In progress label May 8, 2019
@vyzo
Copy link
Contributor Author

vyzo commented May 8, 2019

test failures are #47

multiplex.go Outdated Show resolved Hide resolved
multiplex.go Outdated Show resolved Hide resolved
@vyzo
Copy link
Contributor Author

vyzo commented May 8, 2019

We might also consider not sending a reset message at all, just swallow the remote error.

raulk
raulk previously requested changes May 8, 2019
Copy link
Member

@raulk raulk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this closer, I think this patches the wrong sites. Calls to Reset() and Close() can still block forever. It seems that patching sendMsg to fallback to a grace period iff the context has no deadline would cover all bases more accurately:

go-mplex/multiplex.go

Lines 150 to 155 in dec60e0

dl, hasDl := ctx.Deadline()
if hasDl {
if err := mp.con.SetWriteDeadline(dl); err != nil {
return err
}
}

@vyzo
Copy link
Contributor Author

vyzo commented May 8, 2019

I think this patches the wrong sites.

Not the wrong sites, these sites clearly need to be patched. It's just not all of them.

Calls to Reset() and Close() can still block forever.

We have no mechanism to propagate the error to the user in these cases, and I don't know what the semantics of failing to reset or close because of a timeout should be.
Maybe we should just close the connection when it happens?

@vyzo
Copy link
Contributor Author

vyzo commented May 8, 2019

Also patched the Reset call site, but with hard reset semantics.
If it fails to reset within the timeout (1min), it kills the connection entirely.
I don't think we need to patch Close, as it returns the error synchronously.

@vyzo
Copy link
Contributor Author

vyzo commented May 8, 2019

actually we probably need a timeout there, and return the error.

@vyzo
Copy link
Contributor Author

vyzo commented May 8, 2019

actually we probably need a timeout there, and return the error.

We don't really take action on the error in our code, and at best we would reset the stream.
So I added the timeout and make it kill the connection, hard reset style, if the Close fails.

@vyzo vyzo changed the title use timeouts when sending messages for stream open and error reset. use timeouts when sending messages for stream open, close, and reset. May 8, 2019
@vyzo vyzo requested a review from raulk May 8, 2019 17:07
@vyzo
Copy link
Contributor Author

vyzo commented May 9, 2019

summoning @Stebalien @raulk -- this has been thoroughly tested, let's review and get it merged!

@vyzo vyzo dismissed raulk’s stale review May 10, 2019 19:21

issues addressed

@vyzo vyzo merged commit 03b91c7 into master May 10, 2019
@ghost ghost removed the status/in-progress In progress label May 10, 2019
@vyzo vyzo deleted the fix/issue-51 branch May 10, 2019 19:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

stalled writer goroutines
3 participants