Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Delete unused data frame analytics state #50243

Conversation

dimitris-athanasiou
Copy link
Contributor

This commit adds removal of unused data frame analytics state
from the _delete_expired_data API (and in extend th ML daily
maintenance task). At the moment the potential state docs
include the progress document and state for regression and
classification analyses.

@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core (:ml)

This commit adds removal of unused data frame analytics state
from the _delete_expired_data API (and in extend th ML daily
maintenance task). At the moment the potential state docs
include the progress document and state for regression and
classification analyses.
@dimitris-athanasiou dimitris-athanasiou force-pushed the delete-unused-data-frame-analytics-state branch from 92dd2e7 to 8d2b5ec Compare December 16, 2019 17:33
@@ -39,6 +39,8 @@
public static final ParseField TRAINING_PERCENT = new ParseField("training_percent");
public static final ParseField RANDOMIZE_SEED = new ParseField("randomize_seed");

private static final String STATE_DOC_ID_SUFFIX = "_classification_state#1";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the suffix have the #1 attached, it feels like thats a number that will change

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also thought about this. We currently handle just a single doc. But we can fix this if and when we switch to handling state split over multiple docs.

assertInferenceModelPersisted(jobId);

// Delete the config straight from the config index
DeleteResponse deleteResponse = client().prepareDelete(".ml-config", DataFrameAnalyticsConfig.documentId(jobId))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why delete the config? Does DeleteExpiredDataAction require the config to be deleted

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, for the state to become "unused", ie. there is no job owning those state docs.

Copy link
Member

@davidkyle davidkyle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@davidkyle davidkyle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dimitris-athanasiou dimitris-athanasiou merged commit d2aee62 into elastic:master Dec 17, 2019
@dimitris-athanasiou dimitris-athanasiou deleted the delete-unused-data-frame-analytics-state branch December 17, 2019 15:20
dimitris-athanasiou added a commit to dimitris-athanasiou/elasticsearch that referenced this pull request Dec 17, 2019
This commit adds removal of unused data frame analytics state
from the _delete_expired_data API (and in extend th ML daily
maintenance task). At the moment the potential state docs
include the progress document and state for regression and
classification analyses.

Backport of elastic#50243
dimitris-athanasiou added a commit that referenced this pull request Dec 18, 2019
This commit adds removal of unused data frame analytics state
from the _delete_expired_data API (and in extend th ML daily
maintenance task). At the moment the potential state docs
include the progress document and state for regression and
classification analyses.

Backport of #50243
SivagurunathanV pushed a commit to SivagurunathanV/elasticsearch that referenced this pull request Jan 23, 2020
This commit adds removal of unused data frame analytics state
from the _delete_expired_data API (and in extend th ML daily
maintenance task). At the moment the potential state docs
include the progress document and state for regression and
classification analyses.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants