Index age tracking since rollover #191

Closed
Lawryy opened this issue Oct 15, 2021 · 3 comments

Comments

Lawryy commented Oct 15, 2021

Index State Management (ISM) transitions that use min_index_age don't guarantee a minimum retention time when indices are rolled over by size.

Example: data must be retained for at least 30 days.

The index rolls over roughly once a week, when the shard size hits 30GB. It then moves to the delete phase, with min_index_age set to 30d. The index is deleted 30 days after its creation, but because logs were still being written to it for a week after creation, the newest messages are only about 23 days old when they are deleted.
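The shortfall can be worked through with a few lines of date arithmetic. This is only a sketch: the calendar date is made up, and the one-week rollover and 30d delete condition are taken from the example above.

```python
from datetime import datetime, timedelta

# Illustrative date only; the example gives durations, not calendar dates.
created = datetime(2021, 1, 1)
rolled_over = created + timedelta(days=7)   # size-based rollover after about a week
min_index_age = timedelta(days=30)          # delete condition, measured from creation

deleted = created + min_index_age

# The last document was written just before the rollover.
newest_doc_age_at_delete = deleted - rolled_over
oldest_doc_age_at_delete = deleted - created

print(newest_doc_age_at_delete.days)  # 23
print(oldest_doc_age_at_delete.days)  # 30
```

Only the very first documents reach the full 30 days; everything written later in the week is deleted early.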

Add a min_age_since_rollover (or similar) variable that can be used with Index State Management.

Or add a variable to "lock" the index in a state for a defined amount of time, to ensure sufficient retention.

Elasticsearch ILM already behaves this way: once an index has rolled over, phase ages are measured from the rollover time.
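As a rough illustration, the policy described above and the requested variant might look like the following, built here as Python dicts in the shape of an ISM policy body. The state names and description are made up, `min_size` is used as an approximation of "shard size hits 30GB", and `min_age_since_rollover` is only the name suggested in this issue, not an existing condition.

```python
import json


def build_policy(transition_condition: str) -> dict:
    """Return an ISM-style policy body whose delete transition uses the given condition."""
    return {
        "policy": {
            "description": "Size-based rollover, then delete after 30d",
            "default_state": "hot",  # state names are illustrative
            "states": [
                {
                    "name": "hot",
                    "actions": [{"rollover": {"min_size": "30gb"}}],
                    "transitions": [
                        {
                            "state_name": "delete",
                            "conditions": {transition_condition: "30d"},
                        }
                    ],
                },
                {"name": "delete", "actions": [{"delete": {}}], "transitions": []},
            ],
        }
    }


# Today's behaviour: age is counted from index creation.
current = build_policy("min_index_age")

# Requested behaviour: hypothetical condition name taken from this issue.
proposed = build_policy("min_age_since_rollover")

print(json.dumps(proposed, indent=2))
```

The two policies differ only in the transition condition, which is the point of the request: the rest of the policy would not need to change.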

dbbaughe (Contributor) commented

@CEHENKLE Can you move this to index management?

CEHENKLE transferred this issue from opensearch-project/OpenSearch on Nov 11, 2021

dbbaughe (Contributor) commented

@Lawryy

We should be able to get the time a rollover happened for an index from the metadata.
That being said, an index can have multiple aliases that were rolled over on it, e.g.:
Create foo-index with alias bar and alias baz.

You can call /bar/_rollover/new-index-1 and /baz/_rollover/another-index-1.
Now the metadata for foo-index will show that it had two rollovers, one for bar and another for baz, each with a different timestamp.
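To make that concrete, here is a sketch of what the rollover metadata for foo-index could look like after both calls. The shape (one entry per alias with a millisecond epoch `time`) is an assumption modelled on the `rollover_info` section of index metadata, and the timestamps are invented.

```python
from datetime import datetime, timezone

# Assumed shape: one entry per alias that was rolled over on this index.
# Timestamps are made-up millisecond epoch values.
foo_index_rollover_info = {
    "bar": {"time": 1636588800000},
    "baz": {"time": 1636675200000},
}

for alias, info in foo_index_rollover_info.items():
    rolled_at = datetime.fromtimestamp(info["time"] / 1000, tz=timezone.utc)
    print(f"{alias}: rolled over at {rolled_at.isoformat()}")
```

With two entries there is no single "rollover time" for the index, which is why a policy condition needs a rule for choosing one.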

We do not want concrete alias names in the policy itself to tell it which rollover to use, as that would make the policy hard to reuse across multiple groups of indices.

So what do you think about:

Introducing a min_age_since_rollover condition where you can specify a time value like 10h, 5d, 30d, etc. By default it will use the oldest rollover it finds for that index (in the most common case there will probably be only a single rollover).

If it finds that no rollover has happened, we need to decide whether it should keep waiting for a rollover or fail, on the grounds that this condition requires a rollover and its absence is an error.

We could then introduce a new optional index setting that lets you explicitly say which alias to use for the check on a specific index, in case you have multiple rollovers on that index.
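The proposal above can be sketched as a small function. All names here (`min_age_since_rollover_met`, `NoRolloverError`, the `alias` override) are hypothetical and exist only for illustration. Raising on a missing rollover is one of the two options mentioned; the waiting variant would return False instead.

```python
from datetime import datetime, timedelta, timezone
from typing import Mapping, Optional


class NoRolloverError(Exception):
    """Hypothetical error: the condition was checked on an index that never rolled over."""


def min_age_since_rollover_met(
    rollover_times: Mapping[str, datetime],
    min_age: timedelta,
    now: datetime,
    alias: Optional[str] = None,
) -> bool:
    """Return True once min_age has elapsed since the chosen rollover.

    rollover_times maps alias name to rollover time for a single index.
    alias stands in for the optional per-index setting; when omitted,
    the oldest rollover is used.
    """
    if not rollover_times:
        raise NoRolloverError("no rollover recorded for this index")
    if alias is not None:
        reference = rollover_times[alias]  # KeyError if that alias never rolled over
    else:
        reference = min(rollover_times.values())
    return now - reference >= min_age


# Example values are invented.
times = {
    "bar": datetime(2021, 11, 11, tzinfo=timezone.utc),
    "baz": datetime(2021, 11, 12, tzinfo=timezone.utc),
}
now = datetime(2021, 12, 11, tzinfo=timezone.utc)

default_result = min_age_since_rollover_met(times, timedelta(days=30), now)
baz_result = min_age_since_rollover_met(times, timedelta(days=30), now, alias="baz")
print(default_result, baz_result)
```

Defaulting to the oldest rollover keeps the policy free of alias names, while the explicit alias covers the less common case of several rollovers on one index.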

Lawryy commented Nov 12, 2021

Using the oldest rollover sounds fine to me.

I think it would be wise to give an error if there is no rollover, as the check should only be applied after the rollover has happened.

The optional index setting also seems like a good idea.

Thanks!
