-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG] Serialization bugs can cause node drops #624
Comments
Hi @Bukhtawar, Could you elaborate more about this issue? we don't know if this issue is really happening or you think it could happen. Are you able to reproduce it? or this is something you anticipated that it will break based in your understanding of the code. |
I have observed this happening while making bw incompatible change and can also be reproduced with a small code change on one node that serializes message in a format that cannot be read or deserialized on the other node I can see if I can write an integration test that would demonstrate that |
@Bukhtawar I think this is related, elastic/elasticsearch#75334, right? |
Looks similar to me @reta |
@Bukhtawar do you mind if I pick this one? have few ideas in mind, thank you |
@Bukhtawar a bit more details regarding the (de)serialization handling inside
Interestingly, it is only raised when input stream has unconsumed bytes, the case when stream has less bytes than needed to reconstruct the payload is not classified as serialization issue (resulting in
The suggested changes are intended to report more details regarding the cause of the (de)serialization failure and provide uniformity in handling (de)serialization issues:
|
@nknize sorry cannot do that myself, could you please add |
* Fix for CVE-2976 + add CVE checker Signed-off-by: Omar Khasawneh <okhasawn@amazon.com> * Updated Changelog Signed-off-by: Omar Khasawneh <okhasawn@amazon.com> --------- Signed-off-by: Omar Khasawneh <okhasawn@amazon.com>
Describe the bug
Most commonly backward incompatible changes to APIs like stats can cause ser/de over the wire to fail
To Reproduce
If there is a serialization bug it would manifest itself in the InboundHandler#messageReceived causing to throw an
IllegalStateException
The exception is then caught at TcpTransport#handleException which ends up closing the specific TCP channel that caused the serialization issue mostly the REG channel
Each channel has a close listener attached which on being closed closes all channels, thereby removing the connection
NodeChannels is actually a TransportConnection implementation which again has a close listener attached which removes it from the list of connected nodes.
At this point the connection target is gone and requests should see a NodeNotConnectedException
NodeConnectionService has a periodic ConnectionChecker which then tries to re-establish all connections if a single connection target closes.
Expected behavior
Node disconnections and reconnections are too intrusive and ser de exceptions can be handled more gracefully
Host/Environment (please complete the following information):
The text was updated successfully, but these errors were encountered: