Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Json extract index filter support #12683

Merged
merged 41 commits into from
Mar 26, 2024

Conversation

saurabhd336
Copy link
Contributor

@saurabhd336 saurabhd336 commented Mar 21, 2024

This PR depends on #12532

Allows contextual filtering of documents when using jsonExtractIndex. Eg

columnName: jsonField
value:
{
    "stringField": "xyz",
    "arrField": [{"f1": 1, "f2": 2}, {"f1": 3, "f2": 4}, {"f2": 6}, {"f1": 0, "f2": 5}]
}

jsonExtractIndex(jsonField, '$.arrField[*].f1', 'INT_ARRAY', '0', '"$.arrField[*].f2" > 2') returns [3, 0]
i.e. Only return matching array elements that match the given json predicate.

@saurabhd336
Copy link
Contributor Author

saurabhd336 commented Mar 21, 2024

@wirybeaver @Jackie-Jiang Split the original PR (https://github.com/apache/pinot/pull/12532/files) into two as discussed. Please take a look at this one for filter support once #12532 is merged. Ty!

@codecov-commenter
Copy link

codecov-commenter commented Mar 21, 2024

Codecov Report

Attention: Patch coverage is 0% with 49 lines in your changes are missing coverage. Please review.

Project coverage is 0.00%. Comparing base (59551e4) to head (f8b9b3c).
Report is 156 commits behind head on master.

Files Patch % Lines
...t/index/readers/json/ImmutableJsonIndexReader.java 0.00% 23 Missing ⚠️
...local/realtime/impl/json/MutableJsonIndexImpl.java 0.00% 13 Missing ⚠️
...rm/function/JsonExtractIndexTransformFunction.java 0.00% 12 Missing ⚠️
...e/pinot/common/function/TransformFunctionType.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##             master   #12683       +/-   ##
=============================================
- Coverage     61.75%    0.00%   -61.76%     
=============================================
  Files          2436     2381       -55     
  Lines        133233   130795     -2438     
  Branches      20636    20267      -369     
=============================================
- Hits          82274        0    -82274     
- Misses        44911   130795    +85884     
+ Partials       6048        0     -6048     
Flag Coverage Δ
custom-integration1 ?
integration 0.00% <0.00%> (-0.01%) ⬇️
integration1 ?
integration2 0.00% <0.00%> (ø)
java-11 ?
java-21 0.00% <0.00%> (-61.63%) ⬇️
skip-bytebuffers-false ?
skip-bytebuffers-true 0.00% <0.00%> (-27.73%) ⬇️
temurin 0.00% <0.00%> (-61.76%) ⬇️
unittests ?
unittests1 ?
unittests2 ?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

return getMatchingFlattenedDocIds(filter, false);
}

private RoaringBitmap getMatchingFlattenedDocIds(FilterContext filter, boolean allowNestedExclusivePredicate) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to understand why we cannot always allow nested exclusive predicate?

for (int i = 1; i < numChildren; i++) {
matchingDocIds.or(getMatchingFlattenedDocIds(children.get(i)));
matchingDocIds.or(getMatchingFlattenedDocIds(children.get(i), allowNestedExclusivePredicate));
}
return matchingDocIds;
}
case PREDICATE: {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add NOT since we allow exclusive predicate?

@@ -367,10 +379,18 @@ public void convertFlattenedDocIdsToDocIds(Map<String, RoaringBitmap> valueToFla
}

@Override
public Map<String, RoaringBitmap> getMatchingFlattenedDocsMap(String jsonPathKey) {
public Map<String, RoaringBitmap> getMatchingFlattenedDocsMap(String jsonPathKey, @Nullable String filterString) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not introduced in this PR, but I think we should support json path key with and without $ prefix. See getMatchingFlattenedDocIds() for reference

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the support

@saurabhd336 saurabhd336 force-pushed the jsonExtractIndexFilterSupport branch from cb36a1b to 8e45c63 Compare March 22, 2024 05:43
@saurabhd336 saurabhd336 force-pushed the jsonExtractIndexFilterSupport branch from 2581212 to 3771339 Compare March 22, 2024 08:56
@saurabhd336
Copy link
Contributor Author

@Jackie-Jiang updated the PR to let getMatchingFlattenedDocsMap accept jsonPath string with $ itself (similar to getMatchingFlattenedDocIds)

As for the nested exclusive predicates, I think my understanding was incorrect. We can't infact support nested exclusive predicates (see https://github.com/apache/pinot/blob/master/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/json/ImmutableJsonIndexReader.java#L106). Turns out, the contract is for the predicates to maintain context, and therefore trying to solve nested exclusive predicates by flipping results of individual flattened doc id bitmaps would be wrong. Since our usecase does not require nested exclusive predicates, I removed that part entirely. Do have a look ty!

@Jackie-Jiang Jackie-Jiang added feature documentation release-notes Referenced by PRs that need attention when compiling the next release notes labels Mar 25, 2024
@saurabhd336 saurabhd336 merged commit 57f50d3 into apache:master Mar 26, 2024
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation feature release-notes Referenced by PRs that need attention when compiling the next release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants