Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid URL Fix: Decoding manually for some reserved characters #1682

Merged
merged 14 commits into from
Aug 27, 2024

Conversation

this-is-shivamsingh
Copy link
Contributor

@this-is-shivamsingh this-is-shivamsingh commented Jul 31, 2024

Reference PR: #1664

  • While testing the previous PR, we found for 11/18 reserved characters, JS neither encodes/decodes this.
  • When a user gives original characters like # : /, then we don't have an issue, but if user given an encoded value of these characters, it does get's not decoded, and while encoding the % part of get encoded again
decodeURI('%23') -> output: %23
encodeURI('%23') -> output: %2523
  • To resolve this issue we are decoding these encoded characters manually.
  • General public URLs/endpoints are not like this, but if someone is using a localhost, and hosted a file(containing this encoded value) for taking a snapshot then creates an issue, when the response is 404 after modifying the URL.
  • We tested for all reserved and unreserved characters from the official Wikipedia docs(https://en.wikipedia.org/wiki/Percent-encoding) and only found these 11 characters that were causing this issue.

Changes:

  • Manually decoding some reserved characters.
  • A feature flag to disable this modification in case user encounter some issues with this.

List of all reserved characters for which decodeURI, doesn't work.

# -> %23
$ -> %24
: -> %3A
& -> %26
+ -> %2B
, -> %2C
/ -> %2F
; -> %3B
= -> %3D
? -> %3F
@ -> %40

Revisiting solution for the above issue with an update:

  1. While decoding manually for reserved characters we found that if a file system is hosted locally, and if the filename contains ? and #, then the browser converts them to their encoded form.
  2. Suppose a file - abd?s.html is hosted locally then the browser opens this file as http://localhost:9000/abd%3Fs.html, now once we decode it manually, it becomes http://localhost:9000/abd?s.html that is a valid file path, but the browser is not able to recognise it and return 404, because for ? browser think it to be a query params, but it is actually a filename, so it only accepts the encoded value of it.
  3. It is happening for ? and # only out of all reserved characters. And only happens for if a file name contains these 2 reserved characters.
  4. To solve generically and for all reserved characters, we are just neither encoding nor decoding the reserved characters from the above list, so if the URL contains an encoded value of a reserved character, it just used as it is, without changing it. We preserve the encoded reserved characters from being encoding/decoding.

Changes:

  1. We just replace the encoded reserved character with a placeholder value and once after encoding/decoding is done we replace placeholder values with their encoded reserved character so that nothing is changed for encoded reserved characters

@this-is-shivamsingh this-is-shivamsingh changed the title fix: decoding manually for reserved characters Invalid URL Fix: Decoding manually for some reserved characters Aug 1, 2024
Copy link
Contributor

@pankaj443 pankaj443 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -342,7 +342,7 @@ export function createDiscoveryQueue(percy) {
authorization: snapshot.discovery.authorization,
userAgent: snapshot.discovery.userAgent,
captureMockedServiceWorker: snapshot.discovery.captureMockedServiceWorker,
meta: snapshot.meta,
meta: { ...snapshot.meta, snapshotURL: snapshot.url },
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • We are adding a snapshot url there, to use it in network.js

// This code checks for issues such as `%` and leading spaces and warns the user accordingly.
decodeAndEncodeURLWithLogging(request.url, this.log, {
meta: { ...this.meta, url: request.url },
shouldLogWarning: request.url !== this.meta?.snapshotURL,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Since snapshot.url also goes via network.js, so if snapshot.url is invalid the error message is already been shown before so no need to show again. Only show for assets urls

for (let key of Object.keys(RESERVED_CHARACTERS)) {
let regex = new RegExp(key, 'g');
if (regex.test(result)) {
let placeholder = `__PERCY_PLACEHOLDER_${placeHolderCount}__`;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using __PERCY_PLACEHOLDER_0__ as a placeholder, as these alphanumerical and _ doesn't get encoded as it is a part of a valid URL and not a part of reserved characters.

Copy link
Contributor

@rishigupta1599 rishigupta1599 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@this-is-shivamsingh this-is-shivamsingh merged commit d6f9fc2 into master Aug 27, 2024
36 checks passed
@this-is-shivamsingh this-is-shivamsingh deleted the PPLT_3177_2.0 branch August 27, 2024 07:24
@this-is-shivamsingh this-is-shivamsingh added the ✨ enhancement New feature or request label Aug 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
✨ enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants