ClientActor's message queues are sometimes very congested #6438
Comments
Yes - and the proper solution is to 'split' ClientActor into a couple of separate Actors/processes -- so that the runtime execution doesn't block it.
@nikurt did you assign yourself to fix the underlying issue or to add metrics?
@bowenwang1996 to add metrics.
Could you create a separate issue for the metrics?
@bowenwang1996 @nikurt ok filed #6460
This issue has been automatically marked as stale because it has not had recent activity in the last 2 months.
Recently we've seen a mainnet archival node built from b949483 process blocks too slowly to sync. What happens is that intermittently it'll fall behind and then stay around 500 blocks behind HEAD, as it doesn't process blocks fast enough to catch up after falling behind.
After adding logging like this:
We can see that there's a big delay (sometimes seconds long) between when the peer actor passes the partial encoded chunk message to the client actor and when the client actor actually receives and processes it:
Mar 15 20:15:57.277 DEBUG network: peer actor -> client actor chunk part ChunkHash(
CooC7KRNfFsuUuEWaCAYeghCCujXZbm5LGUhEJmPWYm4
) [52]...
...
Mar 15 20:16:00.799 DEBUG client: recv partial chunk response ChunkHash(
CooC7KRNfFsuUuEWaCAYeghCCujXZbm5LGUhEJmPWYm4
) [52]
So the node ends up taking too long to sync even though there's nothing wrong with its block/chunk processing speed. Note that this is a node built from b949483 (plus the extra logging), which contains #6333. So that PR definitely helps, but doesn't fully get rid of the delay.
In debugging this, it could be helpful to add metrics that show this delay between when messages are sent to the client actor and when they're received (unless calculating it is too invasive/expensive with the volume of messages coming in, not sure).
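To make the metric idea concrete, here is a minimal sketch of one way to measure that delay: stamp each message with the time it was sent and compute the elapsed time when the receiver picks it up. This is not nearcore's code. `Timed`, `queue_delay`, and `measure_delay` are hypothetical names, and a plain `std::sync::mpsc` channel stands in for the actor mailbox. In a real node the measured value would presumably go into a histogram metric rather than being returned.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical wrapper: a message plus the instant it was handed to the queue.
struct Timed<T> {
    sent_at: Instant,
    msg: T,
}

impl<T> Timed<T> {
    fn new(msg: T) -> Self {
        Timed { sent_at: Instant::now(), msg }
    }

    /// Time elapsed since the message was wrapped, i.e. time spent waiting.
    fn queue_delay(&self) -> Duration {
        self.sent_at.elapsed()
    }
}

/// Sends one message to a receiver that is busy for `busy` before it reads
/// its queue, and returns the message together with the measured delay.
fn measure_delay(busy: Duration) -> (&'static str, Duration) {
    let (tx, rx) = mpsc::channel::<Timed<&'static str>>();

    // Stand-in for the peer actor: stamp the message and enqueue it.
    let sender = thread::spawn(move || {
        tx.send(Timed::new("partial encoded chunk")).unwrap();
    });
    sender.join().unwrap();

    // Stand-in for the client actor being blocked by other work.
    thread::sleep(busy);

    // The receiver finally handles the message and records how long it waited.
    let timed = rx.recv().unwrap();
    let delay = timed.queue_delay();
    (timed.msg, delay)
}

fn main() {
    let (msg, delay) = measure_delay(Duration::from_millis(50));
    println!("{msg} waited {delay:?} in the queue");
}
```

The per-message overhead is one `Instant::now()` call on send and one on receive. Whether that is acceptable at the message volume described above would need to be measured.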
cc @nikurt @bowenwang1996 @mm-near