[ML] Unnecessary transform warning message is logged very often #48379

Closed
dolaru opened this issue Oct 23, 2019 · 1 comment · Fixed by #48423

@dolaru
Member

dolaru commented Oct 23, 2019

Spotted in 7.4.0

In a multi-node environment, when running a continuous transform, the following warning is logged repeatedly:

[instance-0000000009] [some_transform_id] data frame transform encountered an exception: 
java.lang.RuntimeException: Failed to retrieve checkpoint due to Failed to create checkpoint
	at org.elasticsearch.xpack.dataframe.transforms.DataFrameTransformTask$ClientDataFrameIndexer.lambda$createCheckpoint$17(DataFrameTransformTask.java:1084) [data-frame-7.4.0.jar:7.4.0]
	at org.elasticsearch.action.ActionListener$1.onFailure(ActionListener.java:70) [elasticsearch-7.4.0.jar:7.4.0]
...

After @hendrikmuhs investigated this, we found that it is caused by a mismatch of global checkpoints between the copies (replicas) of the same shard. Such a mismatch is expected by design and is nothing to worry about, but the transform is overly strict and throws an exception. It should be safe to ignore the mismatch and, for example, take the maximum of all global checkpoints reported for that shard.

As a result, this warning is unnecessary and should be removed.
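To illustrate the proposed change (take the maximum instead of throwing), here is a minimal Java sketch contrasting the strict behaviour with the lenient one. It assumes a hypothetical `ShardCopyStats` holder with one entry per shard copy and a hypothetical `index/shardId` key; it is not the actual transform checkpointing code or any Elasticsearch API, which this issue does not show.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CheckpointMerge {

    // Hypothetical holder: the global checkpoint reported by one copy
    // (primary or replica) of a shard.
    static final class ShardCopyStats {
        final String index;
        final int shardId;
        final long globalCheckpoint;

        ShardCopyStats(String index, int shardId, long globalCheckpoint) {
            this.index = index;
            this.shardId = shardId;
            this.globalCheckpoint = globalCheckpoint;
        }

        String key() {
            return index + "/" + shardId;
        }
    }

    // Strict variant: fails as soon as two copies of the same shard
    // disagree, which is what produced the noisy warning.
    static Map<String, Long> mergeStrict(List<ShardCopyStats> stats) {
        Map<String, Long> result = new HashMap<>();
        for (ShardCopyStats s : stats) {
            Long previous = result.putIfAbsent(s.key(), s.globalCheckpoint);
            if (previous != null && previous.longValue() != s.globalCheckpoint) {
                throw new IllegalStateException("global checkpoint mismatch for " + s.key());
            }
        }
        return result;
    }

    // Lenient variant: copies may legitimately lag behind each other,
    // so keep the maximum global checkpoint seen for each shard.
    static Map<String, Long> mergeTakingMax(List<ShardCopyStats> stats) {
        Map<String, Long> result = new HashMap<>();
        for (ShardCopyStats s : stats) {
            result.merge(s.key(), s.globalCheckpoint, Long::max);
        }
        return result;
    }
}
```

With two copies of shard 0 reporting 42 and 40, `mergeStrict` throws, while `mergeTakingMax` records 42 for that shard and never fails on a mismatch.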

@dolaru dolaru added the :ml/Transform Transform label Oct 23, 2019
@elasticmachine
Collaborator

Pinging @elastic/ml-core (:ml/Transform)

@hendrikmuhs hendrikmuhs self-assigned this Oct 23, 2019
hendrikmuhs pushed a commit that referenced this issue Oct 24, 2019
…mismatch (#48423)

Take the max if global checkpoints mismatch instead of throwing an exception. It turned out global
checkpoints can mismatch by design

fixes #48379