store: Distributed filesystem S3-like object storage implementation #326

Closed
jdfalk opened this issue May 8, 2018 · 13 comments

jdfalk (Contributor) commented May 8, 2018

We would like the Thanos instances to back up to a central Thanos instance, which would write the data to its local HD.

i.e. prom1/thanos1 -> thanosstorage1 -> /var/lib/thanos/feddata

bwplotka (Member) commented May 8, 2018

Hey, can I have more details on this?

  • What do you mean by Thanos instance? Which component exactly?
  • What is a central Thanos instance?
  • What data exactly do you have in mind? TSDB blocks?
  • Can you share what problem/goal you want to solve? Do you want to use a local HD as "object storage" (i.e. browse the data using the Thanos store gateway)?

Currently, data can be uploaded to object storage first, and then you could write a straightforward tool to download it to a local HD, but I'm lacking context on what you want to achieve.
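
For example, such a one-off download tool could be sketched roughly like this in Go (the Bucket interface below is a simplified stand-in for an object-store client, not our exact objstore API):

```go
// Rough sketch of a "download everything" tool. Bucket is an assumed,
// minimal read-only view of an object store, NOT the exact Thanos interface.
package bucketcopy

import (
	"context"
	"io"
	"os"
	"path/filepath"
)

type Bucket interface {
	// Iter calls f with the name of every object under dir.
	Iter(ctx context.Context, dir string, f func(name string) error) error
	// Get returns a reader for the named object.
	Get(ctx context.Context, name string) (io.ReadCloser, error)
}

// DownloadAll copies every object in bkt into dstDir, mapping object names to file paths.
func DownloadAll(ctx context.Context, bkt Bucket, dstDir string) error {
	return bkt.Iter(ctx, "", func(name string) error {
		rc, err := bkt.Get(ctx, name)
		if err != nil {
			return err
		}
		defer rc.Close()

		dst := filepath.Join(dstDir, filepath.FromSlash(name))
		if err := os.MkdirAll(filepath.Dir(dst), 0755); err != nil {
			return err
		}
		f, err := os.Create(dst)
		if err != nil {
			return err
		}
		defer f.Close()

		_, err = io.Copy(f, rc)
		return err
	})
}
```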

jdfalk (Contributor, Author) commented May 14, 2018

Sorry about the late reply. Hopefully this helps clear it up.

My goal is to back up the TSDB data to a local system rather than to GCS or AWS. We have two main datacenters and several smaller locations. At our main datacenters we purchased some large storage systems for long-term storage of all our Prometheus data. We were originally going to federate all data upwards to these machines, but the scrape times became too high to get a detailed picture of the data.

A breakdown of what I am proposing:
Prometheus Storage Server:

  • Thanos Backup / Store Gateway — Saving TSDB data to the local system for long-term storage.
  • Thanos Query Layer / Thanos Query Access — Point Grafana towards the query layer.
  • Thanos Compactor — Compact storage.

Prometheus Scrape Server:

  • Thanos Backup saving to Prometheus Storage Server
  • Thanos Query Access — Not sure if that’s necessary as all data would be migrated to the Prometheus Storage Server.

On each Prometheus scrape server we would run the Thanos backup (and probably query access). As data is added to Prometheus, it would be backed up to the long-term storage systems via Thanos, allowing us to make the scrape servers almost completely stateless.

On the Prometheus storage servers we would have the backup and gateway running so that individual nodes can do calculations against the larger dataset. We would also have the query layer, which would provide an interface for Grafana.

bwplotka (Member) commented

Hmm, I think there are some minor misunderstandings, but overall what I get from this is that you would like to have a "Filesystem Provider" or something like that, right? That is, to be able to upload metrics to your NAS server (or something like it) and query metrics from there in the same way?

Something like this is achievable by "simply" implementing another adapter for our Bucket interface. See this short tutorial on how to do it (currently in a PR): https://github.com/improbable-eng/thanos/blob/71634c354202d96f83ea22c3c1e1f194701b368e/docs/object_stores.md

It is up to you how you would like to implement this; maybe iSCSI or NFS would help here, but the GetRange operation (getting an arbitrary range of bytes from an object) is basically non-trivial.
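
For a plain filesystem, the range read itself is mostly a seek plus a limited reader; a rough Go sketch (the names and signature are illustrative only, not necessarily our exact Bucket interface):

```go
// Rough sketch of GetRange for a filesystem-backed bucket; rootDir plays the
// role of the bucket. Names and signatures are illustrative, not the real ones.
package fsbucket

import (
	"context"
	"io"
	"os"
	"path/filepath"
)

type Bucket struct {
	rootDir string
}

// GetRange returns a reader over length bytes of the object, starting at off.
// A negative length is treated as "read until the end of the object".
func (b *Bucket) GetRange(_ context.Context, name string, off, length int64) (io.ReadCloser, error) {
	f, err := os.Open(filepath.Join(b.rootDir, filepath.FromSlash(name)))
	if err != nil {
		return nil, err
	}
	if _, err := f.Seek(off, io.SeekStart); err != nil {
		f.Close()
		return nil, err
	}
	if length < 0 {
		return f, nil // caller reads to EOF and closes the file
	}
	// Limit the reader to length bytes while keeping Close bound to the underlying file.
	return struct {
		io.Reader
		io.Closer
	}{io.LimitReader(f, length), f}, nil
}
```

The non-trivial part is doing this efficiently and with sane consistency over a network filesystem, which is where the NFS/iSCSI question comes in.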

I'm not sure what the consistency model would look like here, or the latency and cost, but technically it is doable.

Regarding potential misunderstandings:
Thanos Query does not care about object storage; it only cares about the gRPC StoreAPIs it has access to, so I'm not sure Thanos Query is relevant in this particular discussion.

On the Prometheus storage servers we would have the backup and gateway

Thanos Store does not back up anything. It just uses "read" bucket operations.

On each Prometheus scrape server we would run the Thanos backup (and probably query access.)

I don't understand that part; if you set up the Thanos sidecar, it gives you query access (it exposes the StoreAPI used by the Thanos Query components) and optional backup logic.

This indeed makes Prometheus almost stateless even in its current form (GCS/S3 backup), but "almost" means there are still ~3h of fresh data that need to be available locally.

jdfalk (Contributor, Author) commented May 15, 2018

So my thought was to have Prometheus scrape systems with SSDs, and then have Thanos push that data, using its backup logic, to a centralized store (S3/GCS/NFS) so that individual Prometheus servers don't keep that data locally. That's the data I was hoping to back up to a filesystem. My other option is to set up something that provides an S3/GCS-compatible interface but saves to the filesystem, but I was hoping that solution could be natively provided. I will look at making a provider that writes to NFS.
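
For a rough starting point, the write side of such a provider could look something like this in Go (the function name and layout are just an illustration, not an existing Thanos interface):

```go
// Rough sketch of the "write" side of a filesystem provider: stream an object
// into a file under a mounted path (e.g. an NFS mount). Illustrative only.
package fsupload

import (
	"io"
	"io/ioutil"
	"os"
	"path/filepath"
)

// upload writes the object `name` under rootDir, using a temp file plus rename
// so readers never observe a partially written object.
func upload(rootDir, name string, r io.Reader) error {
	dst := filepath.Join(rootDir, filepath.FromSlash(name))
	if err := os.MkdirAll(filepath.Dir(dst), 0755); err != nil {
		return err
	}
	tmp, err := ioutil.TempFile(filepath.Dir(dst), ".upload-")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // best-effort cleanup; harmless after a successful rename

	if _, err := io.Copy(tmp, r); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), dst)
}
```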

Thank you

bwplotka changed the title from "Allow Local Storage Backup" to "store: Distributed filesystem S3-like object storage implementation" on May 16, 2018
bwplotka (Member) commented

Does the new title of this issue make sense to you @jdfalk? (:

jdfalk (Contributor, Author) commented May 16, 2018 via email

dupondje commented

Hi,

As discussed on Slack, this is something we might also like to have!
We have redundant SAN systems, so we would like to have data stored there instead of on GCS or S3.

Is implementing GetRange just as easy as doing a Seek on the io object?

Thanks!

bwplotka (Member) commented

Cool, any volunteers? (: I won't have time for this currently, and I don't have any filesystem-like system available right now.

BenoitKnecht commented

My other option is to set up something that provides an S3/GCS-compatible interface but saves to the filesystem, but I was hoping that solution could be natively provided.

@jdfalk If you're interested in setting up something like that until this issue is resolved, I recommend you check out Minio. It provides an S3-compatible API and, in its simplest form, just writes objects as files and directories on a local filesystem.

Getting started is as easy as running

docker run -p 9000:9000 --name minio -v /mnt/data:/data minio/minio server /data

If instead you already have a NAS set up, Minio can act as an S3-to-NAS gateway:

minio gateway nas /path/to/nfs-volume

It can also act as a gateway in front of other object stores, such as Azure or GCS, if needed.

bwplotka (Member) commented

I have seen this, but had a hard time figuring out what Minio actually provides via the gateway. Good to know!

jdfalk (Contributor, Author) commented Jun 14, 2018

That's pretty slick. I wonder how it will scale with 10-20k machines dumping data into Prometheus and Thanos pushing that much to it. I will have to run some load tests, but thanks, that's a promising solution.

Really, I wouldn't even be opposed to Thanos just writing all the blocks as-is to the long-term storage system's filesystem so it looks like a giant Prometheus folder, and then having it perform compaction, maintenance, etc. My biggest goal is ensuring the data is replicated, backed up, and queryable from our remote datacenters.

adrien-f (Member) commented

I think you'll find the receive component able to do what you want; it's still WIP and you can track progress here: #1093

In your use case, I guess you could have Prometheus remote-writing to the Thanos Receive component in your main datacenter/storage system. Would that be alright?

stale bot commented Jan 11, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the stale label on Jan 11, 2020
stale bot closed this as completed on Jan 18, 2020