Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

store/cache: fix broken metric and current index cache size handling #873

Merged
merged 5 commits into from
Mar 4, 2019

Conversation

GiedriusS
Copy link
Member

@GiedriusS GiedriusS commented Feb 27, 2019

Potpourri of fixes to the cache:

  • Do not forget to increase the c.current metric with relevant labels when
    adding items to the cache. Without this the metric thanos_store_index_cache_items is unavailable.

  • Properly adjust c.curSize when adding items to the cache.

  • Switch the order of operands in a check to avoid a potential uint64 overflow and thus not removing enough items to satisfy our size constraints.

Without these last two fixes essentially the metric thanos_store_index_cache_items_evicted_total is always zero together with thanos_store_index_cache_items_overflowed_total (most of the time). This is because we never enter the loop, and we never increase c.curSize before this.

Verification:

  • New smoke tests pass which check if curSize is properly adjusted.

@GiedriusS GiedriusS changed the title store/cache: do not forget to increase c.current on adding new items store/cache: fix broken metric and current index cache size handling Feb 28, 2019
Adding uint64(len(b)) to c.curSize might potentially overflow uint64 if
the numbers are big enough and then we might not remove enough items
from the LRU to satisfy the request. On the other hand, switching the
operands avoids this problem because we check before if uint64(len(b))
is bigger than c.maxSize so subtracting uint64(len(b)) will *never*
overflow because we know that it is less or equal to c.maxSize.
for c.curSize+uint64(len(b)) > c.maxSize {
c.lru.RemoveOldest()
for c.curSize > c.maxSize-uint64(len(b)) {
if _, val, ok := c.lru.RemoveOldest(); ok {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we handle something if that does not ok ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How we could inform the users about such case? AFAICT there is no way ok can be false due to the check before-hand.

Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm . Isn't currSize updated in "onEvict"?

@GiedriusS
Copy link
Member Author

Ah yes, that's true but it is never increased. Will revert fbaae7c.

c.curSize is lowered in onEvict.
Add smoke tests for the index cache which check if we set curSize
properly, and if removal works.
Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@bwplotka
Copy link
Member

bwplotka commented Mar 4, 2019

2 flakes on tests, but now is green: https://circleci.com/gh/improbable-eng/thanos/2353

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants