Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add column name to JSONParsing exception message during index build #12151

Merged
merged 9 commits into from
Jan 3, 2024

Conversation

vrajat
Copy link
Collaborator

@vrajat vrajat commented Dec 14, 2023

JSON Index build may fail due to errors in JSON content. When there are multiple JSON indices or when the schema is not immediately available, the column for which index build broke is not clear.

Add column name to the JSONParsing Exception message to help debug JSON index builds.

Previously, a JSON Parse Exception emitted the following message:

Unexpected end-of-input: expected close marker for Object (start marker at [Source: (String)"<json content>"; line: 1, column: 91]

With this change, the above error message is prepended with Column: <column name>

Copy link
Contributor

@swaminathanmanish swaminathanmanish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good otherwise. Could you add before/after of this change? How we see this error now and how that'll be improved with this change.

@codecov-commenter
Copy link

codecov-commenter commented Dec 15, 2023

Codecov Report

Attention: 7 lines in your changes are missing coverage. Please review.

Comparison is base (d38e15d) 61.61% compared to head (f802b33) 61.51%.
Report is 37 commits behind head on master.

Files Patch % Lines
...egment/creator/impl/ColumnJsonParserException.java 0.00% 5 Missing ⚠️
...ment/creator/impl/SegmentColumnarIndexCreator.java 60.00% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #12151      +/-   ##
============================================
- Coverage     61.61%   61.51%   -0.11%     
  Complexity     1153     1153              
============================================
  Files          2407     2416       +9     
  Lines        130922   131175     +253     
  Branches      20223    20245      +22     
============================================
+ Hits          80669    80692      +23     
- Misses        44366    44589     +223     
- Partials       5887     5894       +7     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 61.48% <30.00%> (-0.10%) ⬇️
java-21 61.38% <30.00%> (-0.11%) ⬇️
skip-bytebuffers-false 61.50% <30.00%> (-0.10%) ⬇️
skip-bytebuffers-true 61.37% <30.00%> (-0.10%) ⬇️
temurin 61.51% <30.00%> (-0.11%) ⬇️
unittests 61.51% <30.00%> (-0.11%) ⬇️
unittests1 46.60% <0.00%> (-0.07%) ⬇️
unittests2 27.69% <30.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ege-st
Copy link
Contributor

ege-st commented Dec 18, 2023

By the way, besides unit tests, were you able to do a manual test and see what happens when Pinot is running and this error happens?

@vrajat
Copy link
Collaborator Author

vrajat commented Dec 18, 2023

By the way, besides unit tests, were you able to do a manual test and see what happens when Pinot is running and this error happens?

No I did not. Do I have to implement an integration test as well?

@ege-st
Copy link
Contributor

ege-st commented Dec 18, 2023

By the way, besides unit tests, were you able to do a manual test and see what happens when Pinot is running and this error happens?

No I did not. Do I have to implement an integration test as well?

For the manual test: probably best to do a simple one, Pinot's a complex system and I like to do quick manual tests after my unit tests just to make sure there are no unexpected interactions.

For the integration test: that's a good question, yeah, I think it'd be great to have integration tests to check that Pinot correctly handles invalid inputs. @Jackie-Jiang what are your thoughts?

@vrajat
Copy link
Collaborator Author

vrajat commented Dec 18, 2023

I'll do the manual test.

re: integration test, the current test builds a segment for real. I copied the idea from another test that tests segment build. So hopefully this test is enough w.r.t automation?

@vrajat
Copy link
Collaborator Author

vrajat commented Dec 19, 2023

Ran a manual test and was able to repro the error as well get the modified error message.

I used this diff and ran JsonIndexQuickStart:

diff --git a/pinot-tools/src/main/resources/examples/batch/githubEvents/githubEvents_schema.json b/pinot-tools/src/main/resources/examples/batch/githubEvents/githubEvents_schema.json
index aafd259146..30ee206eea 100644
--- a/pinot-tools/src/main/resources/examples/batch/githubEvents/githubEvents_schema.json
+++ b/pinot-tools/src/main/resources/examples/batch/githubEvents/githubEvents_schema.json
@@ -10,7 +10,7 @@
     },
     {
       "name": "actor",
-      "dataType": "JSON"
+      "dataType": "STRING"
     },
     {
       "name": "repo",
diff --git a/pinot-tools/src/main/resources/examples/batch/githubEvents/rawdata/githubEvents_data.json b/pinot-tools/src/main/resources/examples/batch/githubEvents/rawdata/githubEvents_data.json
index ece16582b3..2baf2a760f 100644
--- a/pinot-tools/src/main/resources/examples/batch/githubEvents/rawdata/githubEvents_data.json
+++ b/pinot-tools/src/main/resources/examples/batch/githubEvents/rawdata/githubEvents_data.json
@@ -1,4 +1,4 @@

< Introduced an error in the actor field>

Error message is:

Caused by: org.apache.pinot.segment.local.segment.creator.impl.ColumnJsonParserException: Column: actor
Unexpected character ('1' (code 49)): was expecting a colon to separate field name and value
 at [Source: (String)"{"id"18542751,"login":"LimeVista","display_login":"LimeVista","gravatar_id":"","url":"https://github.com/gitapi/users/LimeVista","avatar_url":"https://github.com/avatars/u/18542751?"}"; line: 1, column: 192]

vrajat and others added 4 commits December 20, 2023 11:46
Co-authored-by: Xiaotian (Jackie) Jiang <17555551+Jackie-Jiang@users.noreply.github.com>
@walterddr walterddr merged commit f9df57a into apache:master Jan 3, 2024
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants