Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix partition handling by always using string values #12115

Merged
merged 2 commits into from
Dec 20, 2023

Conversation

Jackie-Jiang
Copy link
Contributor

Always use string representation of the value to compute the partition so that the value from query can always match the value from the segment.
Fix the wrong handling of BYTES and BIG_DECIMAL type as partition column

@Jackie-Jiang Jackie-Jiang added incompatible Indicate PR that introduces backward incompatibility bugfix labels Dec 7, 2023
@codecov-commenter
Copy link

codecov-commenter commented Dec 7, 2023

Codecov Report

Attention: 27 lines in your changes are missing coverage. Please review.

Comparison is base (d38e15d) 61.61% compared to head (14567e5) 61.61%.

Files Patch % Lines
...ot/common/request/context/RequestContextUtils.java 33.33% 8 Missing and 2 partials ⚠️
...main/java/org/apache/pinot/spi/data/FieldSpec.java 37.50% 5 Missing ⚠️
...local/indexsegment/mutable/MutableSegmentImpl.java 0.00% 2 Missing ⚠️
.../stats/BigDecimalColumnPreIndexStatsCollector.java 0.00% 1 Missing and 1 partial ⚠️
...impl/stats/BytesColumnPredIndexStatsCollector.java 0.00% 1 Missing and 1 partial ⚠️
...impl/stats/DoubleColumnPreIndexStatsCollector.java 0.00% 1 Missing and 1 partial ⚠️
.../impl/stats/FloatColumnPreIndexStatsCollector.java 0.00% 1 Missing and 1 partial ⚠️
...r/impl/stats/LongColumnPreIndexStatsCollector.java 0.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #12115      +/-   ##
============================================
- Coverage     61.61%   61.61%   -0.01%     
  Complexity     1153     1153              
============================================
  Files          2407     2407              
  Lines        130922   130871      -51     
  Branches      20223    20222       -1     
============================================
- Hits          80669    80632      -37     
+ Misses        44366    44351      -15     
- Partials       5887     5888       +1     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (ø)
integration <0.01% <0.00%> (ø)
integration1 <0.01% <0.00%> (ø)
integration2 0.00% <0.00%> (ø)
java-11 61.55% <63.01%> (-0.02%) ⬇️
java-21 61.48% <63.01%> (-0.01%) ⬇️
skip-bytebuffers-false 61.58% <63.01%> (-0.02%) ⬇️
skip-bytebuffers-true 61.46% <63.01%> (+<0.01%) ⬆️
temurin 61.61% <63.01%> (-0.01%) ⬇️
unittests 61.60% <63.01%> (-0.01%) ⬇️
unittests1 46.64% <52.17%> (-0.03%) ⬇️
unittests2 27.69% <13.69%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

return String.valueOf(_partitionFunction.getPartition(convertToString(genericRow.getValue(_column))));
}

private static String convertToString(Object value) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to catch unsupported types here to ensure that for new types, we do the string conversion correctly ? Same for the logic in MutableSegmentImpl.java (similar to what you have in getStringValue utility)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can potentially check for all possible classes supported (e.g. Integer, Long, Float, Double, String etc.), but that might involve performance overhead (one if check per type). Since we don't add new data types frequently, I prefer keeping the check only for types that need to be converted

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok sounds good. I assume that if a new type is introduced, we anyway have to modify RequestContextUtils to add the new literal type? (where the exception is thrown). if thats the case, we can consider moving this logic and the one in MutableSegmentImpl.java to RequestContextUtils, so that they are in the same file and easy to discover all the places that needs the change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Moved them into FieldSpec where the DataType is defined

Copy link
Contributor

@swaminathanmanish swaminathanmanish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM . Thanks for doing this in a generic way.

return String.valueOf(_partitionFunction.getPartition(convertToString(genericRow.getValue(_column))));
}

private static String convertToString(Object value) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok sounds good. I assume that if a new type is introduced, we anyway have to modify RequestContextUtils to add the new literal type? (where the exception is thrown). if thats the case, we can consider moving this logic and the one in MutableSegmentImpl.java to RequestContextUtils, so that they are in the same file and easy to discover all the places that needs the change.

Copy link
Contributor

@snleee snleee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LTGM!

@Jackie-Jiang Jackie-Jiang merged commit c706fd0 into apache:master Dec 20, 2023
19 checks passed
@Jackie-Jiang Jackie-Jiang deleted the fix_partition_function branch December 20, 2023 00:41
@jackjlli
Copy link
Member

jackjlli commented Jan 9, 2024

Hi @Jackie-Jiang @snleee @swaminathanmanish , thanks for making the code changes in this PR! We really appreciate the contribution here. While we do have some concerns on making the backward incompatible changes not just for this PR (i.e. changing the signature of a public API) but for all the future PRs, as any incompatible changes could easily impact the existing running Pinot use cases and requires admins to verify, not just for 1 cluster in 1 company but for all the Pinot clusters run in different places/companies.

If possible, can we do the following if the backward incompatibility is inevitable in the future PR:

  • tag the PR with incompatible label, and at least ask for more reviewers from different companies (like StarTree, LinkedIn, Uber, etc).
  • add the migration plan/action items to the PR if it's marked as backward incompatible, so that same steps can be followed by different companies to avoid anything unexpected.

By following this steps we can at least have someone to keep an eye on the incompatible changes from multiple stakeholders and have consensus promptly instead of merging it silently. Again, we really appreciate the contribution here and look forward to building the great milestones to come together! Thanks!

@Jackie-Jiang
Copy link
Contributor Author

@jackjlli Thanks for the suggestion! Going forward, will add a section to list all the public API changes in the PR description.
The backward incompatible changes (non API changes) are more critical because it might be hard to catch. We should always handle backward compatibility for consecutive releases, and use multiple releases to maintain backward compatibility.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix incompatible Indicate PR that introduces backward incompatibility
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants