[Bug]: task manager panic #919

Open
lxl66566 opened this issue Jul 26, 2024 · 2 comments · May be fixed by #925
Labels: bug (Something isn't working), kind/flake (Categorizes issue or PR as related to a flaky test.)

Comments

@lxl66566
Collaborator

After #907 was fixed in #918, repeatedly running the test shutdown_rpc_should_shutdown_the_cluster() still fails intermittently with a task manager panic. This panic was not introduced by #918: a panic case had already been observed before #918.

This bug cannot be reproduced reliably. The panic probability is roughly 1/525 per round, based on three test runs that failed at rounds 231, 134, and 1211 (an average of about 525 rounds per failure).
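The 1/525 figure can be made explicit with a small worked estimate. This assumes that rounds are independent and that each run stops at its first failing round. Under those assumptions the rounds-to-failure count follows a geometric distribution, and the maximum-likelihood estimate of the per-round panic probability is the number of failures divided by the total number of rounds:

$$
\hat{p} = \frac{n}{\sum_i k_i} = \frac{3}{231 + 134 + 1211} = \frac{3}{1576} \approx \frac{1}{525.3} \approx 0.19\%
$$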

Log

  1. failed3.log is the panic log recorded before the change in #918 (fix: shutdown cluster test timeout).
  2. panic8.log, panic9.log, and panic10.log are the three panic logs recorded while testing #918 (fix: shutdown cluster test timeout) at commit 1607307.
lxl66566 added the bug label on Jul 26, 2024
lxl66566 self-assigned this on Jul 29, 2024
lxl66566 linked a pull request on Jul 29, 2024 that will close this issue
@lxl66566
Collaborator Author

lxl66566 commented Aug 8, 2024

While the cluster is running, every task should exist in the task manager, so it is safe to use unwrap when fetching a task. During cluster shutdown, however, tasks are removed from the task manager from the top down.
The removal is performed by worker_as() in cmd_worker, which runs on a separate, parallel thread, so the timing of each removal is not deterministic. As a result, a lookup may return None, meaning worker_as() has already removed that task, and calling unwrap on that None panics.
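Here is a minimal sketch of the race and of one way a caller could tolerate it. All names are stand-ins: TaskManager, poll_task and the task names "top", "middle" and "bottom" are hypothetical and are not the project's real API. The sketch models the task manager as a mutex-guarded HashMap, and a spawned thread plays the role of worker_as(), removing tasks top-down. The point is only that a lookup yields an Option, and the caller treats None as "already shut down" instead of calling unwrap.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical, simplified stand-in for the task manager:
/// a shared map from task name to some task state.
#[derive(Default)]
struct TaskManager {
    tasks: Mutex<HashMap<String, u32>>,
}

impl TaskManager {
    fn insert(&self, name: &str, state: u32) {
        self.tasks.lock().unwrap().insert(name.to_owned(), state);
    }

    /// Returns `None` if the task has already been removed.
    fn get(&self, name: &str) -> Option<u32> {
        self.tasks.lock().unwrap().get(name).copied()
    }

    fn remove(&self, name: &str) {
        self.tasks.lock().unwrap().remove(name);
    }
}

/// Caller side: instead of `tm.get(name).unwrap()`, which panics if the
/// shutdown thread won the race, report a missing task as an error.
fn poll_task(tm: &TaskManager, name: &str) -> Result<u32, String> {
    match tm.get(name) {
        Some(state) => Ok(state),
        None => Err(format!("task {name} already removed (cluster shutting down)")),
    }
}

fn main() {
    let tm = Arc::new(TaskManager::default());
    for name in ["top", "middle", "bottom"] {
        tm.insert(name, 0);
    }

    // Simulated shutdown thread (standing in for worker_as()):
    // removes tasks top-down, concurrently with the lookups below.
    let shutdown = {
        let tm = Arc::clone(&tm);
        thread::spawn(move || {
            for name in ["top", "middle", "bottom"] {
                tm.remove(name);
            }
        })
    };

    // Depending on thread scheduling, each lookup may or may not see its task.
    for name in ["top", "middle", "bottom"] {
        match poll_task(&tm, name) {
            Ok(state) => println!("{name}: state {state}"),
            Err(msg) => println!("{msg}"),
        }
    }

    shutdown.join().unwrap();
    // Once shutdown has finished, lookups report removal instead of panicking.
    assert!(poll_task(&tm, "top").is_err());
}
```

The printed lines depend on thread scheduling and can differ between runs, which mirrors why the original panic shows up only occasionally. This thread does not say whether the fix in #925 takes this approach.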

liangyuanpeng added the kind/flake label on Aug 29, 2024