Intermittent Sim and e2e Test Stalls and/or Failures #5913

Closed
matthewkeil opened this issue Aug 25, 2023 · 4 comments · Fixed by #6017
Labels
  • meta-bug: Issues that identify a bug and require a fix.
  • meta-investigate: Issues found that require further investigation and may not have a specific resolution/fix.
  • prio-medium: Resolve this some time soon (tm).
  • scope-testing: Issues for adding test coverage, fixing existing tests or testing strategies.

Comments

@matthewkeil
Member

Describe the bug

Recently, and with increasing frequency, our end-to-end and simulation tests have been hanging or failing for unknown reasons. In many instances the tests run for several hours, which also increases our CI costs through excess VM time.

Expected behavior

Sim tests and e2e tests should run to completion in a “reasonable” amount of time. They should also all be passing.

Steps to reproduce

Open a PR that targets unstable and CI will run.

Additional context

This behavior seems to have started within the last month or two, so around June/July, though it may have begun earlier.

Operating system

Linux

Lodestar version or commit hash

Branches from unstable

@matthewkeil added the prio-medium, scope-testing, meta-investigate, and meta-bug labels on Aug 25, 2023
@twoeths
Contributor

twoeths commented Aug 25, 2023

from https://github.com/ChainSafe/lodestar/actions/runs/5969671307/job/16195943714?pr=5912

the issue happens in the network thread e2e test, packages/beacon-node/test/e2e/network/reqresp.test.ts

  • the tests for the main thread are done
  • the 1st test of the network thread is done: should send/receive signed blocks
  • the 2nd test of the network thread passes, but the code in afterEach never finishes
@lodestar/beacon-node:     ✔ should send/receive a light client bootstrap message (11014ms)
@lodestar/beacon-node:     1) "after each" hook for "should send/receive a light client bootstrap message"
  • none of the following tests in the worker thread run, and the test suite hangs
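
A hook that never finishes is easier to diagnose when the stalled teardown step fails with its own name. The following is a minimal sketch of that idea, not code from the Lodestar repository: `withTimeout` and the `closeNetwork` call in the usage comment are hypothetical names. It does not fix the hang itself; it only makes the stuck step explicit.

```typescript
// Hypothetical helper (not part of Lodestar): rejects if `task` does not
// settle within `ms` milliseconds, naming the step that stalled.
async function withTimeout<T>(task: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  try {
    // Whichever promise settles first decides the outcome.
    return await Promise.race([task, timeout]);
  } finally {
    // Clear the timer so it cannot keep the process alive after the race settles.
    if (timer !== undefined) clearTimeout(timer);
  }
}

// Usage sketch inside a test hook; `closeNetwork` is a placeholder for the real teardown:
// afterEach(async () => {
//   await withTimeout(closeNetwork(), 10_000, "network teardown");
// });
```

Note that the underlying task keeps running after the timeout fires, so a worker that refuses to shut down can still keep the process alive.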

@twoeths
Contributor

twoeths commented Aug 25, 2023

root cause is the network worker not being able to finish shutting down, see #5775
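
One common way to keep a stuck worker from blocking a test run is to ask for a graceful close, wait a bounded time for the `exit` event, and then force termination. This is a sketch under stated assumptions and not the change made in #5775 or any later fix: `shutdownWorker`, `WorkerLike` and `requestClose` are hypothetical names. `WorkerLike` is shaped after the `once("exit", ...)` and `terminate()` members of a `node:worker_threads` `Worker`.

```typescript
// Minimal structural view of a worker; only the members used below.
interface WorkerLike {
  once(event: "exit", listener: (exitCode: number) => void): unknown;
  terminate(): Promise<number>;
}

// Hypothetical helper: `requestClose` stands in for whatever graceful-close
// message the real worker API uses.
async function shutdownWorker(
  worker: WorkerLike,
  requestClose: () => void,
  graceMs: number
): Promise<"graceful" | "forced"> {
  // Register the exit listener before asking the worker to close.
  const exited = new Promise<"graceful">((resolve) => {
    worker.once("exit", () => resolve("graceful"));
  });
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<"forced">((resolve) => {
    timer = setTimeout(() => resolve("forced"), graceMs);
  });

  requestClose();
  const outcome = await Promise.race([exited, deadline]);
  if (timer !== undefined) clearTimeout(timer);

  if (outcome === "forced") {
    // Graceful close did not finish in time; stop the thread so the process can exit.
    await worker.terminate();
  }
  return outcome;
}
```

Forced termination can hide the real shutdown bug, so logging the `"forced"` outcome in tests is probably worthwhile.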

@nazarhussain self-assigned this on Sep 8, 2023
@nazarhussain
Contributor

Fixed by #5946

@nflaig
Member

nflaig commented Oct 3, 2023

Not yet resolved. We might want to look into short-term workarounds to make e2e tests less annoying:

  • mark tests as passed after 15 minutes if there was no failed test
  • lower the timeout: 6h is too long, a normal e2e run takes ~10-15 minutes
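
The first workaround can be sketched as a small decision function. This is only an illustration with hypothetical names (`RunState`, `Verdict`, `verdict`, `BUDGET_MS`), not Lodestar's CI logic. The 15-minute budget is taken from the comment above.

```typescript
// Hypothetical names throughout; this is not Lodestar's CI logic.
type RunState = {elapsedMs: number; failedTests: number; finished: boolean};
type Verdict = "running" | "passed" | "failed";

const BUDGET_MS = 15 * 60 * 1000; // 15 minutes, as proposed in the comment above

function verdict(state: RunState, budgetMs: number = BUDGET_MS): Verdict {
  // Any failed test fails the run, whether or not it has finished.
  if (state.failedTests > 0) return "failed";
  if (state.finished) return "passed";
  // No failures so far: once the budget is spent, treat a stalled run as passed.
  return state.elapsedMs >= budgetMs ? "passed" : "running";
}
```

A caveat of this approach: when a hang prevents later tests from running at all, as in the reqresp case reported earlier in this thread, those tests would be counted as passed without ever executing.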
