CI sometimes hangs #85

Closed
brettle opened this issue Jul 2, 2024 · 8 comments · Fixed by #91

@brettle
Member

brettle commented Jul 2, 2024

See 8f0f4e9.

@brettle
Member Author

brettle commented Jul 2, 2024

Draft PR #86 currently hangs on my local machine and might be useful for isolating the problem.

@brettle
Member Author

brettle commented Jul 2, 2024

There appears to be a race condition when switching between :example:systemTest and :tests:systemTest in the same build run.

@brettle
Member Author

brettle commented Jul 2, 2024

Further investigation reveals that the problem with PR #86 is that GradleRIO runs :example:systemTest in parallel with :tests:systemTest, and one of them fails because it can't access the TCP ports used by NT and the halsim WebSockets server. I'm planning to use a file lock to ensure that only one test process runs a set of tests at a time. Let me know if you have a better idea.
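
A minimal sketch of the file-lock idea, assuming the test processes are separate JVMs that can agree on a shared lock file path. The class name `SystemTestLock`, the method `runExclusively`, and the lock file location are hypothetical and are not taken from the project. It relies only on the standard `java.nio.channels.FileChannel.lock()` call, which blocks until the operating system grants an exclusive lock on the file.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper for illustration; not part of the project. */
final class SystemTestLock {
  private SystemTestLock() {}

  /**
   * Runs {@code tests} while holding an exclusive OS-level lock on {@code lockFile}.
   * A second process calling this with the same file blocks in lock() until the
   * first process releases the lock or exits.
   */
  static void runExclusively(Path lockFile, Runnable tests) throws IOException {
    try (FileChannel channel =
            FileChannel.open(lockFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        FileLock lock = channel.lock()) {
      // Only one process at a time gets here, so the NT and halsim
      // WebSockets ports are not contended by another test process.
      tests.run();
    } // The lock is released first, then the channel is closed.
  }
}
```

One caveat with this approach: the lock is held on behalf of the whole JVM, so it serializes separate processes but not threads inside one process. A second overlapping `lock()` call from the same JVM throws `OverlappingFileLockException` instead of blocking.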

@brettle
Member Author

brettle commented Jul 4, 2024

Just some notes:

The log for the macOS CI run for 8f0f4e9 seems to indicate that the first test failed because Webots was not ready in time (it was still reloading the world when the timeout occurred). The second test then hung after the world reload. Perhaps the first test failure left something in a bad state that caused the second test to hang. Worth trying to reproduce.

@brettle
Member Author

brettle commented Jul 4, 2024

The log for Ubuntu indicates that the second test timed out because Webots didn't load the world. It doesn't look like Webots even attempted to load it, because there is no indication that the controller disconnected from NT. Not sure why. The NT server restarts between tests, so perhaps the controller hadn't reconnected in time to get the reload request.

@brettle
Member Author

brettle commented Jul 7, 2024

This is not fully fixed. See this comment.

@brettle brettle reopened this Jul 7, 2024
@brettle
Member Author

brettle commented Jul 7, 2024

Perhaps restart the NT server every time we remind the user to load the world? That should work around any race condition that causes the server to miss the messages indicating that the reload completed.
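
A rough sketch of that retry idea, with the server restart and the "reload completed" check passed in as callbacks so that nothing here depends on the real NT API. The class `ReloadWaiter`, the method `waitForReload`, and all parameter names are hypothetical.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper for illustration; not part of the project. */
final class ReloadWaiter {
  private ReloadWaiter() {}

  /**
   * Polls {@code reloadCompleted} until it returns true. Each time
   * {@code reminderIntervalMs} elapses without success, prints a reminder and
   * runs {@code restartServer}, giving up after {@code maxReminders} reminders.
   *
   * @return true if the reload was observed, false if every attempt timed out
   */
  static boolean waitForReload(
      BooleanSupplier reloadCompleted,
      Runnable restartServer,
      int maxReminders,
      long reminderIntervalMs,
      long pollIntervalMs)
      throws InterruptedException {
    for (int reminder = 0; reminder < maxReminders; reminder++) {
      long deadline = System.nanoTime() + reminderIntervalMs * 1_000_000L;
      while (System.nanoTime() - deadline < 0) {
        if (reloadCompleted.getAsBoolean()) {
          return true;
        }
        Thread.sleep(pollIntervalMs);
      }
      System.err.println("Still waiting for the world to load; restarting the server.");
      // Restarting forces clients to reconnect and resend their state, which
      // is the proposed workaround for a missed "reload completed" message.
      restartServer.run();
    }
    // One last check in case the final restart let the message through.
    return reloadCompleted.getAsBoolean();
  }
}
```

Bounding the number of reminders means a genuinely stuck simulator ends in a test failure rather than a hung CI job.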

@brettle brettle self-assigned this Jul 7, 2024
@brettle
Member Author

brettle commented Jul 17, 2024

I believe these CI hangs are fixed in PR #98. The remaining CI hangs are covered by issue #113.

@brettle brettle closed this as completed Jul 17, 2024