Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Address inconsistency in single quote normalization in JSON reader #15324

Merged

Conversation

shrshi
Copy link
Contributor

@shrshi shrshi commented Mar 15, 2024

Description

This PR addresses the inconsistency in processing single quotes within a quoted string in the single quote normalizer.
In the current implementation, when we have an escaped single quote within a single quoted string, the normalizer removes the backslash escape on converting the string to double quotes. However, the normalizer retains the contents of double quoted strings as-is i.e. if there are escaped single quotes within a double quoted string, the backslash character is retained in the output.

We address this inconsistency by removing the escape character for single quotes in all double quoted string in the output.

Tackles #15303 to mimic Spark behavior.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Mar 15, 2024
@shrshi shrshi added feature request New feature or request non-breaking Non-breaking change labels Mar 15, 2024
@shrshi shrshi marked this pull request as ready for review March 18, 2024 16:35
@shrshi shrshi requested a review from a team as a code owner March 18, 2024 16:35
@shrshi shrshi requested review from vuule and davidwendt March 18, 2024 16:35
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fine to me.

Copy link
Contributor

@elstehle elstehle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Just two optional nitpicks.

cpp/src/io/json/json_normalization.cu Outdated Show resolved Hide resolved
cpp/src/io/json/json_normalization.cu Outdated Show resolved Hide resolved
Copy link
Contributor

@vuule vuule left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with @elstehle 's suggestions, otherwise looks great.
Love the test name changes! 👍

@shrshi
Copy link
Contributor Author

shrshi commented Mar 19, 2024

/merge

@rapids-bot rapids-bot bot merged commit 4a5fab7 into rapidsai:branch-24.04 Mar 19, 2024
75 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

4 participants