-
-
Notifications
You must be signed in to change notification settings - Fork 718
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP/POC] ordered RPC #8430
base: main
Are you sure you want to change the base?
[WIP/POC] ordered RPC #8430
Conversation
After implementing this, I'm not entirely sure if this is what we actually need/want. I'll have to think a little more about it. Either way, this doesn't handle many cases. |
Unit Test ResultsSee test report for an extended history of previous test failures. This is useful for diagnosing flaky tests. 27 files ± 0 27 suites ±0 9h 39m 36s ⏱️ - 11m 22s For more details on these failures, see this check. Results for commit 570af51. ± Comparison against base commit 1c74474. This pull request removes 2 and adds 5 tests. Note that renamed tests count towards both.
♻️ This comment has been updated with latest results. |
I cleaned this up and took it a step further. This now moves the worker heartbeat to the scheduler stream. This was not possible earlier because the heartbeat expected a scheduler response to adjust its interval. This is the only reason why the heartbeat used a dedicated connection pool. |
I'm starting to like this after all. Moving the heartbeat was seamless and I believe this avoids a lot of anti patterns. The transfer of actual payload data should not be done over this interface so we can't get rid of the old |
Well, thinking about the above statement again, I think there is nothing wrong with payload data on that stream as long as we're not re-using/abusing the primary administrative stream between scheduler/worker. However, to use this interface for smth like P2P where we would want to use the same stream that is also used for all the task-finished/erred/etc. messages we'll likely need to extend this multiplexing to move data off-band after all and just transmit administrative metadata over the stream. |
@@ -597,7 +597,7 @@ async def test_new_metrics_during_heartbeat(c, s, a): | |||
a.digest_metric(("execute", span.id, "x", "test", "test"), 1) | |||
await asyncio.sleep(0) | |||
await hb_task | |||
assert n > 9 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This 9
feels very magical. since the heartbeat doesn't have to open a new comm, we need fewer ticks until it completes and this is not guaranteed to be past 9
.
I'm not entirely convinced this test still works/make sense/is relevant.
@hendrikmakait this now moved the P2P logic to the ordered PRC. Do you recall which logic was put in P2P to ensure ordering? I would hope that we could remove some logic to test this (and ultimately reduce complexity) |
The one test failure right now is |
Closes #7480
cc @hendrikmakait