Apply speculativeFixup before evaluating meta content #595

Merged: 1 commit, Aug 8, 2024

Conversation

kris-sigur
Collaborator

This avoids treating meta content values like "Example.com" as relative URLs, because they are converted to absolute URLs first. This is already done for speculative JS extraction.

The example cited above is common in meta "sitename" elements, where the site name is "something dot com" or similar.
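
To make the idea concrete, here is a minimal sketch of a fixup step of this kind. It is not the project's actual `speculativeFixup` code: the class name, the regular expression, and the tiny TLD list are all assumptions made for illustration. It only shows the general shape of "if the value looks like a bare domain, make it absolute before resolving it against the page URL".

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MetaContentFixupSketch {
    // Assumption: a tiny, hard-coded TLD list for illustration only.
    private static final Set<String> KNOWN_TLDS = Set.of("com", "org", "net", "is");

    // Matches a bare dotted host name with an optional path,
    // e.g. "Example.com" or "example.org/about". Group 2 is the last label.
    private static final Pattern BARE_HOST =
            Pattern.compile("^[A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)*\\.([A-Za-z]+)(/.*)?$");

    // Prefix "http://" when the candidate looks like a scheme-less domain;
    // otherwise return the candidate unchanged.
    static String fixup(String candidate) {
        Matcher m = BARE_HOST.matcher(candidate);
        if (m.matches() && KNOWN_TLDS.contains(m.group(2).toLowerCase(Locale.ROOT))) {
            return "http://" + candidate;
        }
        return candidate;
    }

    // Apply the fixup first, then resolve against the page URL.
    static URI resolve(URI base, String candidate) {
        return base.resolve(fixup(candidate));
    }
}
```

With this sketch, "Example.com" becomes `http://Example.com` instead of a path under the current page, while values such as "style.css" or "images/logo.png" are still resolved as relative references.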

@kris-sigur
Collaborator Author

For additional context: one of the main banks here in Iceland includes this meta tag on every page:

<meta property="og:site_name" content="Landsbankinn.is"/>

This leads to a lot of false positives. Worse, because they run a very sensitive security tool, these false-positive requests are flagged as malicious and the crawler is quickly blacklisted.
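
For readers unfamiliar with how such a false positive arises, the following sketch shows what plain relative-reference resolution does with a value like this. It uses only `java.net.URI` from the standard library. The page URL and the class and method names are hypothetical, and the crawler's own URI classes may differ in detail.

```java
import java.net.URI;

public class MetaFalsePositiveDemo {
    // Resolve a meta content value against the page URL without any fixup,
    // i.e. treat it as an ordinary relative reference.
    static URI naiveResolve(String pageUrl, String metaContent) {
        return URI.create(pageUrl).resolve(metaContent);
    }

    public static void main(String[] args) {
        // Hypothetical page URL; the content value is the one from the meta tag above.
        URI result = naiveResolve("https://www.example.org/en/individuals", "Landsbankinn.is");
        // prints https://www.example.org/en/Landsbankinn.is
        System.out.println(result);
    }
}
```

Because the value has no scheme and no leading slash, it replaces the last path segment of the page URL, so every crawled page yields a request for a path that does not exist on the site.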

@kris-sigur kris-sigur merged commit 0faf338 into master Aug 8, 2024
5 checks passed