Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

receive, rule: Lock TSDB directories #2915

Merged
merged 7 commits into from
Jul 24, 2020
Merged

Conversation

kakkoyun
Copy link
Member

@kakkoyun kakkoyun commented Jul 21, 2020

This PR attempts to prevent any concurrent access to TSDB directories on a single session.

  • I added CHANGELOG entry for this change.

Changes

  • Lock TSDB dir for Receiver
  • Lock TSDB dir for Ruler

TODO

  • Introduce a flag or a default mode to clean up flags, in case of a dirty shutdown in the next startup.

Verification

  • make test-local
  • make test-e2e

@kakkoyun kakkoyun changed the title [WIP] Lock TSDB directories [WIP] receive, rule: Lock TSDB directories Jul 21, 2020
@kakkoyun kakkoyun changed the title [WIP] receive, rule: Lock TSDB directories receive, rule: Lock TSDB directories Jul 21, 2020
@kakkoyun kakkoyun marked this pull request as ready for review July 21, 2020 14:40
@kakkoyun kakkoyun requested review from bwplotka and brancz July 21, 2020 18:03
@brancz
Copy link
Member

brancz commented Jul 22, 2020

For multi-tsdb this lgtm, but I'm not entirely sure about the ruler, why do we think we need this there? I think it would be ok to have configurable, so people can make this decision themselves, much like with Prometheus (this goes for both receive and ruler), as locking like this may already be handled by kubernetes for example.

@kakkoyun
Copy link
Member Author

@brancz It was just premature optimization for Ruler, I can revert it. We can add a flag to control this behaviour. What should be the default? Prometheus uses lock file by default, should we as well?

@brancz
Copy link
Member

brancz commented Jul 22, 2020

I believe creating the lock by default is correct, with the option to opt out. As long this is clear in the changelog, this is fine in my opinion. Would be good to be confirmed by @bwplotka though.

Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, Generally LGTM, just some suggestions 🤗
Thanks! 💪


ignoreBlockSize := cmd.Flag("shipper.ignore-unequal-block-size", "If true receive will not require min and max block size flags to be set to the same value. Only use this if you want to keep long retention and compaction enabled, as in the worst case it can result in ~2h data loss for your Thanos bucket storage.").Default("false").Hidden().Bool()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ignoreBlockSize := cmd.Flag("shipper.ignore-unequal-block-size", "If true receive will not require min and max block size flags to be set to the same value. Only use this if you want to keep long retention and compaction enabled, as in the worst case it can result in ~2h data loss for your Thanos bucket storage.").Default("false").Hidden().Bool()
ignoreBlockSize := cmd.Flag("shipper.ignore-unequal-block-size", "If true receive will not require min and max block size flags to be set to the same value. Only use this if you want to keep long retention and compaction enabled, as in the worst case it can result in ~2h data loss for your Thanos bucket storage.").Default("false").Hidden().Bool()

Hm... this is actually something we can improve. Let's add TODO and ensure there is issue for this. We can leverage vertical compaciton just fine in this case.

cmd/thanos/receive.go Outdated Show resolved Hide resolved
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but some tiny nits to improve, we can do them in next iter, feel free to merge (:

}
return err
}
level.Info(logger).Log("msg", "a leftover lockfile found and removed")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice and explict! 👍

if !fi.IsDir() {
continue
}
if err := os.Remove(filepath.Join(t.defaultTenantDataDir(fi.Name()), "lock")); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would reuse the removeLockfileIfAny, because... why not? (:

@kakkoyun kakkoyun merged commit bebaa62 into thanos-io:master Jul 24, 2020
@kakkoyun kakkoyun deleted the tsdb_lock branch July 24, 2020 07:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants