-
Notifications
You must be signed in to change notification settings - Fork 413
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new fields file.origin_referrer_url & file.origin_url #2348
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks generally pretty good, just some small things to fix up
- name: origin_referrer_url | ||
level: extended | ||
type: keyword | ||
ignore_above: 8192 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a fairly large ignore_above value, do you expect that there will be data this size? Have you looked into the performance impact?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you expect that there will be data this size?
Yes. When defining the URL size, I did some research and came across a discussion from a past ECS thread about URL size, which I used as a reference. There was a discussion that the URL field should be at least 4096 bytes, and ideally 8192 bytes based on the actual URL log data, so I followed that suggestion. In addition, after discussing it within the team, we concluded that since the typical maximum URL size accepted by servers is 8KB, 8KB seemed like the most reasonable choice.
Have you looked into the performance impact?
However, I haven't look at the performance impact of setting ignore_above
as 8192. If performance testing is required, I would be happy to carry it out. If there is any documentation on how to do it, could you please share it with me?
schemas/file.yml
Outdated
level: extended | ||
type: keyword | ||
ignore_above: 8192 | ||
description: The url of the webpage that linked to the file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Capitalize URL here and in the below description
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion! I have changed url to URL!
schemas/file.yml
Outdated
level: extended | ||
type: keyword | ||
ignore_above: 8192 | ||
description: The url where the file is hosted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you considered redirects? For example, when using CDNs, the initial URL can be redirected to various download URLs. Is that meaningful for your use case? Should the description specify if this is the initial request or a redirected URL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I have implemented this field (origin_url
) to retrieve the actual URL from which the file was downloaded using the Mark of the Web (MoTW) information in Windows. (MoTW is metadata added to files downloaded from the internet in Windows. )
For example, if you download ejabberd_20.12-0_amd64.deb
from https://www.process-one.net/downloads/downloads-action.php?file=/20.12/ejabberd_20.12-0_amd64.deb, the download is redirected, and the file is actually retrieved from https://static.process-one.net/ejabberd/downloads/20.12/ejabberd_20.12-0_amd64.deb.
In MoTW, as shown in the attached image, the redirect URL is saved as the file's host URL.
Co-authored-by: Michael Wolf <michael.wolf@elastic.co>
@mjwolf |
level: extended | ||
type: keyword | ||
ignore_above: 8192 | ||
description: The URL of the webpage that linked to the file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite understand what does it mean? Could you elaborate it a bit, give some examples maybe? Is it url, where file was located before downloading?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @trisch-me,
Thank you for the comments/feedbacks!🙇🏻♀️
Following is the actual example of the file.origin_referrer_url
and file.origin_url
.
When you download an image file (for example, news30_img3.png
) from the following web page,
the file.origin_url
will be the url for the downloaded file, and the file.origin_referrer_url
will be the url which hosted the file (http://girls.seccon.jp/news30.html).
Hey @AsuNa-jp I think it would be great to follow RFC process for this change, because it's a new addition and it should usually go through the process. Currently looking into fields I'm not sure we can add those to the file namespace because from the semantic logic they are of in Otel we have this differentiation between schema, i.e. https://github.com/open-telemetry/semantic-conventions/blob/main/model/registry/url.yaml and usage of schema (or how we call it - usecases), where you have fields from different namespaces together in 1 place, describing the usecase (https://github.com/open-telemetry/semantic-conventions/blob/main/model/trace/database.yaml#L263) Translating this into your PR and new fields, I would argue that they should be a part of the file namespace (though it's possible too) but rather be a part of that mix for specific usecase. Also another possibility to have them not as strings but as a reference to a In current Otel version one can't rename referenced attributes, it means url.full is always |
Checklist
make test
?make
and committed those changes?PR summary
This PR adds
file.origin_referrer_url
andfile.origin_url
To reviewers
Size of the fields
The two fields added in this PR are intended to store URL. Therefore, the default field size of 1024 bytes is insufficient. As a result, we want to set ignore_above to 8k(8192) bytes, if possible.
Default Field
This field will not necessarily be included in all file events.
Therefore, I set it as
default_field: false
, but please let me know if this is incorrect.