Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

skip invalid json string rather than throwing error during json indexing #12238

Merged
merged 7 commits into from
Feb 8, 2024

Conversation

wirybeaver
Copy link
Contributor

Add a new configuration to skip broken json string and continue indexing following json strings instead of throw exception.

@codecov-commenter
Copy link

codecov-commenter commented Jan 8, 2024

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison is base (640ebe5) 61.58% compared to head (fc5a157) 61.73%.
Report is 48 commits behind head on master.

Files Patch % Lines
...apache/pinot/spi/config/table/JsonIndexConfig.java 50.00% 3 Missing ⚠️
...ain/java/org/apache/pinot/spi/utils/JsonUtils.java 77.77% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #12238      +/-   ##
============================================
+ Coverage     61.58%   61.73%   +0.14%     
+ Complexity     1153      207     -946     
============================================
  Files          2415     2425      +10     
  Lines        131502   132728    +1226     
  Branches      20310    20533     +223     
============================================
+ Hits          80988    81940     +952     
- Misses        44587    44770     +183     
- Partials       5927     6018      +91     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 61.69% <70.58%> (+0.18%) ⬆️
java-21 61.60% <70.58%> (+0.13%) ⬆️
skip-bytebuffers-false 61.71% <70.58%> (+0.16%) ⬆️
skip-bytebuffers-true 61.58% <70.58%> (+0.14%) ⬆️
temurin 61.73% <70.58%> (+0.14%) ⬆️
unittests 61.73% <70.58%> (+0.14%) ⬆️
unittests1 46.88% <58.82%> (+0.26%) ⬆️
unittests2 27.75% <11.76%> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

JsonNode jsonNode;
try {
jsonNode = JsonUtils.stringToJsonNode(jsonString);
} catch (IOException e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why limit to IOException? Can we catch a Throwable here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think the actual exception here we supposed to cathc is malform json exception only. all other issues should still throw --> e..g we should not return incorrect result or simply replace that with the default when there are other issues.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation @walterddr ,

Copy link
Contributor Author

@wirybeaver wirybeaver Jan 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ankitsultana @walterddr
I change the source catch JsonProcessingException in the most recent commit because the stringToJsonNode internal invoke the DEFAULT_READER.readTree() which throws JsonProcessingException only.

JsonUtils.stringToJsonNode(...) throws IOException {
        DEFAULT_READER.readTree(...);
}

class ObjectReader {
      public JsonNode readTree(String json) throws JsonProcessingException, JsonMappingException;
}

JsonMappingException is a child of JsonProcessingException.

JsonNode jsonNode;
try {
jsonNode = JsonUtils.stringToJsonNode(jsonString);
} catch (IOException e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think the actual exception here we supposed to cathc is malform json exception only. all other issues should still throw --> e..g we should not return incorrect result or simply replace that with the default when there are other issues.

Copy link
Contributor

@walterddr walterddr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm mostly. one more comment on the testing.

@Test
public void testSkipInvalidJson() throws Exception {
JsonIndexConfig jsonIndexConfig = new JsonIndexConfig();
jsonIndexConfig.setSkipInvalidJson(true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you

  1. add a test when it is false it still throws?
  2. add a test for other exception is still throws when this is set to true?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

Copy link
Contributor Author

@wirybeaver wirybeaver Jan 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you

  1. add a test when it is false it still throws?
  2. add a test for other exception is still throws when this is set to true?

for the 2nd requirement, I haven't figure a way to test since the JsonUtils make the objectReader as a static final member and I cannot replace a static final member without changing the constructor signature or not using more powerful mock libraray.

  public static final ObjectReader DEFAULT_READER = DEFAULT_MAPPER.reader();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for the late reply. that fine if we can't find it.

@wirybeaver
Copy link
Contributor Author

wirybeaver commented Jan 24, 2024

Deployed this change to the company's prod cluster with following config. The ERROR segments containing invalid json rows become ONLINE.

    "fieldConfigList": [
      {
        "name": "foo",
        "encodingType": "RAW",
        "indexTypes": [],
        "compressionCodec": "ZSTANDARD",
        "indexes": {
          "json": {
            "maxLevels": 2,
            "disableCrossArrayUnnest": true,
            "skipInvalidJson": true
          },
          "text": {}
        },
        "properties": {
          "deriveNumDocsPerChunkForRawIndex": "true",
          "rawIndexWriterVersion": "3",
        },
        "tierOverwrites": null
      }
    ]

Copy link
Contributor

@ankitsultana ankitsultana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies. Had stamped but had a question. @wirybeaver can you take a look?

@wirybeaver
Copy link
Contributor Author

Apologies. Had stamped but had a question. @wirybeaver can you take a look?

no worries

@wirybeaver
Copy link
Contributor Author

@walterddr Could you review again to see if there's any objection?

@ankitsultana ankitsultana merged commit 8684e04 into apache:master Feb 8, 2024
19 checks passed
@ankitsultana
Copy link
Contributor

ankitsultana commented Feb 8, 2024

Merged since very minor changes since Rong's last review. (Rong is OOO for a week)

suyashpatel98 pushed a commit to suyashpatel98/pinot that referenced this pull request Feb 28, 2024
@tibrewalpratik17 tibrewalpratik17 mentioned this pull request Mar 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants