Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new fields file.origin_referrer_url & file.origin_url #2348

Open
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

AsuNa-jp
Copy link

@AsuNa-jp AsuNa-jp commented Jun 27, 2024

Checklist

  • Have you signed the contributor license agreement?
    • YES
  • Have you followed the contributor guidelines?
    • YES
  • For proposing substantial changes or additions to the schema, have you reviewed the RFC process?
    • Not substantial changes
  • If submitting code/script changes, have you verified all tests pass locally using make test?
    • Not code/script changes changes
  • If submitting schema/fields updates, have you generated new artifacts by running make and committed those changes?
    • YES
  • Is your pull request against main? Unless there is a good reason otherwise, we prefer pull requests against main and will backport as needed.
    • YES
  • Have you added an entry to the CHANGELOG.next.md?
    • YES

PR summary

This PR adds file.origin_referrer_url and file.origin_url

To reviewers

Size of the fields
The two fields added in this PR are intended to store URL. Therefore, the default field size of 1024 bytes is insufficient. As a result, we want to set ignore_above to 8k(8192) bytes, if possible.

Default Field
This field will not necessarily be included in all file events.
Therefore, I set it as default_field: false, but please let me know if this is incorrect.

@AsuNa-jp AsuNa-jp requested a review from a team as a code owner June 27, 2024 08:19
@AsuNa-jp AsuNa-jp self-assigned this Jun 27, 2024
@AsuNa-jp AsuNa-jp added New Field Request review endpoint Relevant to elastic endpoint security Team: ECS labels Aug 6, 2024
Copy link
Contributor

@mjwolf mjwolf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks generally pretty good, just some small things to fix up

CHANGELOG.next.md Outdated Show resolved Hide resolved
- name: origin_referrer_url
level: extended
type: keyword
ignore_above: 8192
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a fairly large ignore_above value, do you expect that there will be data this size? Have you looked into the performance impact?

Copy link
Author

@AsuNa-jp AsuNa-jp Sep 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you expect that there will be data this size?

Yes. When defining the URL size, I did some research and came across a discussion from a past ECS thread about URL size, which I used as a reference. There was a discussion that the URL field should be at least 4096 bytes, and ideally 8192 bytes based on the actual URL log data, so I followed that suggestion. In addition, after discussing it within the team, we concluded that since the typical maximum URL size accepted by servers is 8KB, 8KB seemed like the most reasonable choice.

Have you looked into the performance impact?

However, I haven't look at the performance impact of setting ignore_above as 8192. If performance testing is required, I would be happy to carry it out. If there is any documentation on how to do it, could you please share it with me?

schemas/file.yml Outdated
level: extended
type: keyword
ignore_above: 8192
description: The url of the webpage that linked to the file.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Capitalize URL here and in the below description

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion! I have changed url to URL!

schemas/file.yml Outdated
level: extended
type: keyword
ignore_above: 8192
description: The url where the file is hosted.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered redirects? For example, when using CDNs, the initial URL can be redirected to various download URLs. Is that meaningful for your use case? Should the description specify if this is the initial request or a redirected URL?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I have implemented this field (origin_url) to retrieve the actual URL from which the file was downloaded using the Mark of the Web (MoTW) information in Windows. (MoTW is metadata added to files downloaded from the internet in Windows. )

For example, if you download ejabberd_20.12-0_amd64.deb from https://www.process-one.net/downloads/downloads-action.php?file=/20.12/ejabberd_20.12-0_amd64.deb, the download is redirected, and the file is actually retrieved from https://static.process-one.net/ejabberd/downloads/20.12/ejabberd_20.12-0_amd64.deb.

In MoTW, as shown in the attached image, the redirect URL is saved as the file's host URL.

image

@AsuNa-jp
Copy link
Author

AsuNa-jp commented Sep 4, 2024

@mjwolf
Thank you for the detailed comments/feedbacks🙇🏻‍♀️ & I’m sorry for the delay in responding, as I was out of the office from mid to late August. I’ve updated the PR based on your feedback, and I’ve also answered your questions above. I’d appreciate it if you could take a look.🙇🏻‍♀️

level: extended
type: keyword
ignore_above: 8192
description: The URL of the webpage that linked to the file.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand what does it mean? Could you elaborate it a bit, give some examples maybe? Is it url, where file was located before downloading?

Copy link
Author

@AsuNa-jp AsuNa-jp Sep 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @trisch-me,
Thank you for the comments/feedbacks!🙇🏻‍♀️
Following is the actual example of the file.origin_referrer_url and file.origin_url.

When you download an image file (for example, news30_img3.png) from the following web page,

the file.origin_url will be the url for the downloaded file, and the file.origin_referrer_url will be the url which hosted the file (http://girls.seccon.jp/news30.html).

image

@trisch-me
Copy link
Contributor

trisch-me commented Sep 12, 2024

Hey @AsuNa-jp I think it would be great to follow RFC process for this change, because it's a new addition and it should usually go through the process.

Currently looking into fields I'm not sure we can add those to the file namespace because from the semantic logic they are of type file and would be great to have them as such.

in Otel we have this differentiation between schema, i.e. https://github.com/open-telemetry/semantic-conventions/blob/main/model/registry/url.yaml and usage of schema (or how we call it - usecases), where you have fields from different namespaces together in 1 place, describing the usecase (https://github.com/open-telemetry/semantic-conventions/blob/main/model/trace/database.yaml#L263)

Translating this into your PR and new fields, I would argue that they should be a part of the file namespace (though it's possible too) but rather be a part of that mix for specific usecase. Also another possibility to have them not as strings but as a reference to a file namespace. Why - because then file.url will have the same restrictions as all other urls (for example redacting sensitive data in url)

In current Otel version one can't rename referenced attributes, it means url.full is always url.full but there will be a change soon for this possibility

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
endpoint Relevant to elastic endpoint security New Field Request review Team: ECS
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants