[ML] Validate existing cluster state differently to newly submitted configs #30084

Closed
elasticmachine opened this issue Sep 5, 2017 · 1 comment
Labels
>enhancement :ml Machine learning

Comments

@elasticmachine
Collaborator

Original comment by @droberts195:

If we're going to introduce completely new job types in the future, we need to change the way unknown job/datafeed cluster state is validated.

While trying to add categorizer jobs, which are quite similar to anomaly_detector jobs, I ran into the following problem:

  • Logically, a categorizer job should have no detectors
  • But the AnalysisConfig class requires detectors
  • There are two possible solutions that seem reasonable at first glance:
    1. Have categorizer jobs have a categorization_config instead of analysis_config
    2. Change analysis_config so that detectors is not required if the job_type is categorizer
  • Unfortunately neither of these works:
    1. Old nodes will ignore categorization_config when parsing metadata, but then error because Job requires an analysis_config
    2. Old nodes will not tolerate an analysis_config with no detectors
  • This leaves a messy workaround: categorizer jobs will have to carry an analysis_config that includes unnecessary fields. New nodes will ignore these fields and mask them when printing the config in REST responses, but old nodes will show the unnecessary bits.
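
To make the failure mode concrete, here is a minimal sketch of how an old node's strict parsing rejects both candidate layouts. It is only a model of the behaviour described above, not the real Job or AnalysisConfig code: the class StrictOldNodeParser, its methods, and the placeholder count detector in the workaround are all hypothetical names chosen for illustration.

```java
import java.util.List;
import java.util.Map;

// Hypothetical model of an old node's strict parsing; not actual Elasticsearch code.
class StrictOldNodeParser {

    // Throws if the job breaks the rules that the old node enforces.
    static void parse(Map<String, Object> job) {
        // Unknown top-level fields such as categorization_config are never
        // inspected, which models "old nodes will ignore" them.
        Object analysisConfig = job.get("analysis_config");
        if (!(analysisConfig instanceof Map)) {
            throw new IllegalArgumentException("analysis_config is required");
        }
        Object detectors = ((Map<?, ?>) analysisConfig).get("detectors");
        if (!(detectors instanceof List) || ((List<?>) detectors).isEmpty()) {
            throw new IllegalArgumentException("analysis_config requires detectors");
        }
    }

    // Convenience wrapper: true if the old node would accept the job.
    static boolean accepts(Map<String, Object> job) {
        try {
            parse(job);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Option 1: a separate categorization_config and no analysis_config.
        Map<String, Object> option1 = Map.of(
                "job_type", "categorizer",
                "categorization_config", Map.of());
        // Option 2: an analysis_config with an empty detectors list.
        Map<String, Object> option2 = Map.of(
                "job_type", "categorizer",
                "analysis_config", Map.of("detectors", List.of()));
        // Workaround: an analysis_config padded with a placeholder detector (assumed shape).
        Map<String, Object> workaround = Map.of(
                "job_type", "categorizer",
                "analysis_config", Map.of("detectors", List.of(Map.of("function", "count"))));

        System.out.println("option 1 accepted: " + accepts(option1));
        System.out.println("option 2 accepted: " + accepts(option2));
        System.out.println("workaround accepted: " + accepts(workaround));
    }
}
```

Only the padded layout gets past the strict parser, which is why the workaround ends up carrying fields that a categorizer job does not need.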

I think the only long-term solution that allows the necessary degree of extensibility is to hold Jobs as an arbitrary Map<String, Object> or BytesReference when parsing from cluster state, and to interpret what's in the Map or BytesReference only if the job_type is understood. This is pretty much how index settings work.
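
A rough sketch of what that could look like, using the Map<String, Object> variant. Everything here is an assumption made for illustration: the class LenientJobHolder, the set of understood job types, and the choice to treat a missing job_type as anomaly_detector are not taken from the Elasticsearch codebase.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of lenient cluster-state handling; not actual Elasticsearch code.
class LenientJobHolder {

    // Assumption: this node version only understands anomaly_detector jobs.
    static final Set<String> UNDERSTOOD_JOB_TYPES = Set.of("anomaly_detector");

    // The raw job is always kept, so unknown job types survive untouched.
    private final Map<String, Object> source;

    LenientJobHolder(Map<String, Object> source) {
        this.source = Collections.unmodifiableMap(new LinkedHashMap<>(source));
    }

    // Assumption: a missing job_type is treated as anomaly_detector.
    String jobType() {
        Object type = source.get("job_type");
        return type == null ? "anomaly_detector" : type.toString();
    }

    boolean isUnderstood() {
        return UNDERSTOOD_JOB_TYPES.contains(jobType());
    }

    // Raw content, returned as-is so it can be written back unchanged.
    Map<String, Object> source() {
        return source;
    }

    // Validates only jobs whose type this node understands; others pass through.
    void validate() {
        if (!isUnderstood()) {
            return; // not ours to interpret
        }
        Object analysisConfig = source.get("analysis_config");
        if (!(analysisConfig instanceof Map)) {
            throw new IllegalArgumentException("analysis_config is required");
        }
        Object detectors = ((Map<?, ?>) analysisConfig).get("detectors");
        if (!(detectors instanceof List) || ((List<?>) detectors).isEmpty()) {
            throw new IllegalArgumentException("analysis_config requires detectors");
        }
    }

    public static void main(String[] args) {
        LenientJobHolder categorizer = new LenientJobHolder(Map.of(
                "job_id", "cat-1",
                "job_type", "categorizer",
                "categorization_config", Map.of()));
        categorizer.validate(); // unknown type, so nothing is checked
        System.out.println("understood: " + categorizer.isUnderstood());
        System.out.println("fields kept: " + categorizer.source().keySet());
    }
}
```

The point of the design is that strict validation still applies to newly submitted configs of a known type, while cluster state written by a newer node is carried along without being interpreted.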

@droberts195
Contributor

We decided to avoid this problem by using a completely different class to store new types of jobs that are not anomaly_detectors.
