Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(DeadlockTest): Handle draining of closed channel, speed up test. #1957

Merged
merged 1 commit into from
May 30, 2023

Conversation

billprovince
Copy link
Contributor

@billprovince billprovince commented May 30, 2023

Fixes DGRAPHCORE-153

Problem

We have had frequent problems with the TestPublisherDeadlockTest. Every once in a while, we would witness timeouts on the test itself. I witnessed that in at least one case, the problem could be attributed to a channel being "drained" but having incorrect handling in the case where the channel is closed. While this was not a majority root-cause for this issue, it is nevertheless something we should correctly handle. Additionally, we find that these tests can take quite a while -- most of the time simply waiting for go-routines to finish their time.Sleep(...) invocations. The use of time.Sleep to handle timing issues is typically a sign that the messaging has not been fully thought through.

Solution

For the issue with the "draining" not handling closed channels, the process is simple: we just add a check for the channel already being closed during the drain process. For the issue of the overuse of time.Sleep, and long-running invocations, I replaced the time.Sleep invocations with WaitGroup instances. After these changes, I am unable to prove that the full issue has been resolved, but I find that I have been unable to reproduce the original failures after these changes. Invoking go test -race -v --count=1000 -run TestPublisherDeadlock now completes in roughly 80 seconds (for the 1000 tests) with each test taking roughly 0.08 seconds as opposed to roughly 20 seconds (and multiple hours required to run 1000 tests). Moreover, on the 10 or so trials I have made to reproduced the original problem after this change, I get 100% test success.

@billprovince billprovince merged commit f9b9e4d into main May 30, 2023
@billprovince billprovince deleted the HandleDrainingClosedChannel branch May 30, 2023 18:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants