Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Circular reference in Exception when checkpoint writing fails #15941

Closed
brwe opened this issue Jan 12, 2016 · 1 comment · Fixed by #15952
Closed

Circular reference in Exception when checkpoint writing fails #15941

brwe opened this issue Jan 12, 2016 · 1 comment · Fixed by #15952
Labels

Comments

@brwe
Copy link
Contributor

brwe commented Jan 12, 2016

We can end up with a circular reference in an Exception A, which has as suppressed Exception B which has as cause A again if an Exception is thrown in the try block of Translog.ensureSynced() (Translog.java#L562).

This is easy to see if you pick brwe@bb10899 and run the TranslogTests.

The way this happens is so:

  1. We call Translog.ensureSynced() but unfortunately when TranslogWriter.sync() is called (via current.syncUpTo()) the writing of the checkpoint fails (TranslogWriter.java#L151) with Exception A. This causes TranslogWriter.tragedy to be set to Exception A before it is thrown (TranslogWriter.closeWithTragicEvent()).
  2. Because of Exception A we now get to the catch clause in Translog.ensureSynced() where we call Translog.closeOnTragicEvent() and pass as parameter the Exception A that we just set as tragic exception (TranslogWriter.tragedy).
  3. In Translog.closeOnTragicEvent() we call close() but that will fail too because we closed the translog already before (because of the tragic event A). We will get Exception B which is an AlreadyClosedException with cause TranslogWriter.tragedy (Exception A). (It is created in TranslogWriter.ensureOpen()).
  4. Because of Exception B now we get to the catch clause of Translog.closeOnTragicEvent() and have
    • inner Exception B, the AlreadyClosedException with cause TranslogWriter.tragedy (Exception A)
    • method parameter Exception (ex) which is TranslogWriter.tragedy (Exception A)

We the set B as suppressed Exception for the parameter exception A and therefore end up with a circular reference A that has suppressed B that has cause A.

As a result, when we try to serialize this, we get a stackoverflow.

This can be seen also in this build failure: http://build-us-00.elastic.co/job/es_core_21_metal/326/

If this explanation is too confusing let me know.

@brwe brwe changed the title Circular dependency in Exception when checkpoint writing fails Circular reference in Exception when checkpoint writing fails Jan 12, 2016
@s1monw
Copy link
Contributor

s1monw commented Jan 13, 2016

thanks so much @brwe for tracking this down. I know exactly what kind of time-sink this is/can be. The explanation makes perfect sense to me after reading it a couple of times :). I think the fix is relatively straight forward, no? I mean we should no add AlreadyClosedExceptions as suppressed exceptions when we call Translog#close() inside `Translog#closeOnTragicEvent(Throwable), what do you think?

brwe added a commit to brwe/elasticsearch that referenced this issue Jan 13, 2016
Don't set the suppressed Exception in Translog.closeOnTragicEvent(Exception ex) if it is an
AlreadyClosedException. ACE is thrown by the TranslogWriter and as cause might
contain the Exception that we add the suppressed ACE to. We then end up with a
circular reference where Exception A has a suppressed Exception B that has as cause A.
This would cause a stackoverflow when we try to serialize it.
For a more detailed description see elastic#15941

closes elastic#15941
brwe added a commit that referenced this issue Jan 13, 2016
Avoid circular reference in exception

Don't set the suppressed Exception in Translog.closeOnTragicEvent(Exception ex) if it is an
AlreadyClosedException. ACE is thrown by the TranslogWriter and as cause might
contain the Exception that we add the suppressed ACE to. We then end up with a
circular reference where Exception A has a suppressed Exception B that has as cause A.
This would cause a stackoverflow when we try to serialize it.
For a more detailed description see #15941

closes #15941
brwe added a commit that referenced this issue Jan 13, 2016
Avoid circular reference in exception

Don't set the suppressed Exception in Translog.closeOnTragicEvent(Exception ex) if it is an
AlreadyClosedException. ACE is thrown by the TranslogWriter and as cause might
contain the Exception that we add the suppressed ACE to. We then end up with a
circular reference where Exception A has a suppressed Exception B that has as cause A.
This would cause a stackoverflow when we try to serialize it.
For a more detailed description see #15941

closes #15941
brwe added a commit that referenced this issue Jan 13, 2016
Avoid circular reference in exception

Don't set the suppressed Exception in Translog.closeOnTragicEvent(Exception ex) if it is an
AlreadyClosedException. ACE is thrown by the TranslogWriter and as cause might
contain the Exception that we add the suppressed ACE to. We then end up with a
circular reference where Exception A has a suppressed Exception B that has as cause A.
This would cause a stackoverflow when we try to serialize it.
For a more detailed description see #15941

closes #15941
brwe added a commit that referenced this issue Jan 13, 2016
Avoid circular reference in exception

Don't set the suppressed Exception in Translog.closeOnTragicEvent(Exception ex) if it is an
AlreadyClosedException. ACE is thrown by the TranslogWriter and as cause might
contain the Exception that we add the suppressed ACE to. We then end up with a
circular reference where Exception A has a suppressed Exception B that has as cause A.
This would cause a stackoverflow when we try to serialize it.
For a more detailed description see #15941

closes #15941
fixmebot bot referenced this issue in VectorXz/elasticsearch Apr 22, 2021
fixmebot bot referenced this issue in VectorXz/elasticsearch May 28, 2021
fixmebot bot referenced this issue in VectorXz/elasticsearch Aug 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants