Design doc: object store scrape #2107
Conversation
Codecov Report

```diff
@@            Coverage Diff            @@
##           master    #2107    +/-   ##
=========================================
  Coverage    61.21%   61.22%
=========================================
  Files          146      146
  Lines        11187    11187
=========================================
+ Hits          6848     6849      +1
+ Misses        3792     3789      -3
- Partials       547      549      +2
```
### Where should we keep the `positions.yaml` file?
Since the files we scrape are placed remotely in an object store, should we also keep the `positions.yaml` there? The advantage of keeping `positions.yaml` in the object store is that it helps us recover the state where scraping left off when promtail runs in a Docker container and the container crashes. However, the `user` reading the files from the object store would then also need write access to it. Getting write access may not be possible in some cases, where people don't want to risk the files being modified by granting write access.
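For reference, promtail's local positions file is a small YAML map from file path to the last byte offset read. A minimal sketch (paths and offsets are made up; how an object-store key would be encoded here is exactly the open question above):

```yaml
# Sketch of a positions.yaml — entries are illustrative only.
positions:
  /var/log/nginx/access.log: "57349"
  # whether/how object-store keys would appear here is an open design question
  s3://my-bucket/alb-logs/2020/06/01/app.log.gz: "0"
```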
You gotta be very careful with this due to the data consistency model in object storage systems. You may get stale data on a read after write, meaning you scrape the same data twice. Potentially consider a highly consistent system like consul/etcd, similar to how Loki uses these for its distributor/ingester ring.
This would also enable usage of locking/leasing so that multiple promtail instances could coordinate scraping. I think this is pretty important since it's possible during rolling updates that you end up with multiple promtail instances running, and this could also lead to duplicated scrapes/data.
Very valid point. Especially with S3, we should be very careful with any operation we do, like LIST, PUT, GET, because of eventual-consistency issues.
I think this is fine for now if we just skip out-of-order entries and try to snapshot the position file both when terminating and from time to time. This file should only be used for resuming/starting.
True. You still need to think about concurrency though.
### How should we support reading of compressed files?
If the files in the object store are stored in a compressed format, promtail should support reading the compressed data, and we should make sure we support most of the compression algorithms we already support for chunks (Snappy, LZ4, Gzip).
I'd like to have this addition, but to keep it simple we should add it afterwards. We could also add it for static config. Let's not include it in the first MVP; it can still stay in this doc though.
Agreed on this, we should add features one by one.
### How should we identify and watch newly added files?
To identify and start tailing newly added files, we should list the files in the object store at some desired interval. How frequently should we do this? Once we add a file to `watch`, to identify any changes to it we should frequently check the file's modified `datetime`. The list of files might be large, each watch will run in a `goroutine`, and each `goroutine` will make frequent `HTTP` requests to the object store. We should make sure this doesn't affect promtail's performance.
I'm not worried about goroutines. I think the frequency should be configurable.
Just a thought: since this is client-side code, we should make sure we don't consume a lot of CPU cycles.
S3 is event driven and can be configured to trigger an event when updates occur. By setting up a Lambda in combination with SQS, promtail can be notified that new content has arrived. I'm ingesting data from S3 into Loki via promtail today using a similar architecture. This eliminates polling S3 for changes. I am interested in your proposed feature as I would use it for much more than load balancer logs.
Hi @mrmikemarshall
Thanks for the input. We definitely want to consider other options like SQS. However, SQS doesn't support FIFO for S3 event notifications, so we might get out-of-order entries. Also, as @Punkoivan mentioned below, AWS doesn't have any plan to support FIFO.
In our preliminary version, we just want to keep things simple. Later, we can plan to support both general polling and SQS
> There is no appending of lines to a file in the object store
In positions.yaml, along with the `offset` we should also store the file's modified `datetime`. If we see that the date has changed, we should reset the offset and tail the contents from the beginning. This matches the behaviour on the local filesystem: when a file is (re-)`opened`, `hpcloud/tail` tails it from the beginning.
What's the problem if we resume from where the offset was when a file is modified? I think it's fine to expect a file to be updated with bigger content.
There are two scenarios here. One, the file is just appended with more content and re-uploaded. Two, the full content of the file is replaced. How will we distinguish them? If the whole file was replaced, then we are expected to tail from the beginning.
Can you do some research to see how it works for AWS load balancer logs, which are the essence of this feature? I'm wondering if they just drop new files.
I'm OK with not supporting updates in the first version and leaving that for later. I don't think file updates are a big requirement.
/cc @Punkoivan
> If we see that date is modified we should reset the offset and tail the contents from the beginning

I was thinking that the modified time will also be updated if some content was just appended.
We could store the timestamp of the last line in the file and then compare it with the appended/replaced lines' timestamps.
Pseudo code:

```
if previous_line_ts > new_line_ts:
    content was fully replaced
if previous_line_ts < new_line_ts:
    content was appended
```

That's based on the statement "Loki doesn't support out-of-order messages". In other words, even if we have new lines with an older timestamp, they will be rejected.
I assume this is a bad approach that will lead to huge overhead, especially when we have hundreds of files, but it might help in developing a good approach :)
> How should we support reading of compressed files?
It would be good to support most of the compression algorithms we already support for chunks (Snappy, LZ4, Gzip).
Let's add this afterwards; it will be super easy.
Sure
> Where should we keep the `positions.yaml` file?
I think this should be a user choice. One thing we need to consider here is that there is an API limit per account per region, and if we store the `positions.yaml` in the object store itself, we may end up making a lot of API calls and getting `RequestLimitExceeded` very frequently. Still, backoff and retry should be implemented in case we get this error even when making API calls less frequently. So, placing and updating the `positions.yaml` should be left to the user, who can decide based on the API usage of their setup whether to keep it locally or remotely.
I would start with storing the file on disk first; with Docker and Kubernetes it's pretty simple to ensure the file is always mounted and kept safe.
We can store the file in the object store in a later iteration. The main reason for this is that running multiple promtails will not work, at least not in the first iteration, so I don't see any big advantage in saving it in the cloud, except to run a stateless workload, which again is not a big deal nowadays (e.g. statefulset, volume mount, etc.).
> How should we identify and watch newly added files?
The limit on API calls should also be considered when syncing newly added files. To start tailing newly added files, we should do a `ListObjects` call at a desired interval. We should make sure we don't keep this interval too small.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All good points !
```yaml
## path to store positions.yaml
filename: /path/to/positions.yaml
```
I think we should support a glob to limit the path in the bucket, for two reasons:
- You can use it for scaling horizontally, aka running multiple promtails, each of them with a different path. (That's cool until we have something better, like a ring to shard files across multiple promtails.)
- You may not want to read and send from all files in the bucket.
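If glob support were added, the scrape config might look something like this sketch; every key name here is hypothetical, not an existing promtail option:

```yaml
# Hypothetical object-store scrape config — key names are illustrative only.
- job_name: s3-alb-logs
  s3:
    bucket: my-bucket
    # glob limiting which objects this promtail instance tails,
    # which also allows naive horizontal sharding by prefix
    path_glob: "alb-logs/2020/06/*/*.log.gz"
    sync_interval: 1m
```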
Agree
Apart from a couple of comments, this proposal looks good.
When you start working on this, please don't try to do it all at once; let's focus on an MVP, aka sending text logs from the object store to Loki with the position file on disk.
Let's add the rest of the noise later. This will ease reviews and also lets us tackle one problem at a time.
Will update the doc with the list of features we would try to achieve in the first iteration.
Instead of frequently doing `ListObjects`, we can consider using S3 event notifications with `SQS`, through which we can learn about newly added files and then fetch only those.
@cyriltovena WDYT?
@cyriltovena WDYT?
Let me add a few words, please.
SQS doesn't support FIFO for event notifications from S3. That means we will have "out of order" errors in Loki.
We're using S3, SQS, fluentd and Loki now and sometimes get this error. AWS has no plan to support FIFO for S3 notifications; we asked them.
> SQS doesn't support FIFO for event notifications from S3. That means we will have "out of order" errors in Loki.

Nice point.

> AWS has no plan to support FIFO for S3 notifications, we asked them

This is very good information. It's also mentioned here: https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
Thanks @Punkoivan
### How to fetch the file and read the contents?
Here, we have two options. One, just get the file with a `reader` and iterate over its lines; but if the files we are fetching are very large, doing a `GetObject` will consume more memory, which might be a serious performance issue. Two, we can download the file locally and then tail it the same way we do other log files, which would reduce memory consumption when fetching large files. We should also consider implementing multipart download to make downloading large files faster.
@cyriltovena Need your inputs here
1. We will use Cortex's object client code to list and read objects.
2. `positions.yaml` will be kept locally.
3. If the object is modified, we will resume reading from the previously known position. If the modified object's size is less than the previously known position, we will read from the beginning.
4. We will support scraping logs only from S3 in the first iteration.
5. Only plain files and `gzip` files will be supported.
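Point 3 can be sketched as a tiny decision function (hypothetical name, not promtail code):

```go
package main

// resumeOffset implements the resume-or-reset rule: continue from the
// previously known position, unless the modified object shrank below it,
// in which case read from the beginning.
func resumeOffset(objectSize, prevOffset int64) int64 {
	if objectSize < prevOffset {
		return 0
	}
	return prevOffset
}
```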
@cyriltovena This is the first iteration plan. WDYT?
ELB logs are published to S3 every 5 minutes by default, possibly as more than one file depending on traffic. So, I think it's OK if we don't support reading modified files in the first iteration.
FYR: https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-access-logs.html
This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
@cyriltovena Just curious, any progress on this? It would be useful for accessing AWS ALB logs. Also, as an aside, those "Sign our SLA" bots are pretty annoying. I mostly skip contributing to projects that require them. If you really need it, make it dead simple, e.g. ask the user to comment in the PR and acknowledge that they agree: "Please add a comment to this thread saying 'I agree to the SLA per grafana.com/sla'" and be done with it.
Closing this as there has been no activity for a long time. Feel free to send a new PR if anyone wants to revive the work.
@kavirajk You'd probably want to update your automation, since this isn't done. 😄
This may be relevant for any repo watchers: #5065
What this PR does / why we need it:
This PR starts the discussion on how we should implement the object store scraping feature described in #2045.
Thanks to @cyriltovena for his help and guidance.
Also, thanks to @Punkoivan; some of the points are taken from his comments in #2045.