Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Support IsDistinctFrom and IsNotDistinctFrom #9312

Merged
merged 11 commits into from
Sep 14, 2022

Conversation

61yao
Copy link
Contributor

@61yao 61yao commented Aug 31, 2022

The operators IsDistinctFrom and IsNotDistinctFrom only supports column names as argument for now.
When null option is disabled, the row is considered as not null by default.
Expected value:
Null is IsDistinctFrom ValueA: True
Null is IsDistinctFrom Null: False
ValueA is IsDistinctFrom ValueB: NotEquals(ValueA, ValueB)
Null is IsNotDistinctFrom ValueA: False
Null is IsNotDistinctFrom Null: True
ValueA is IsNotDistinctFrom ValueB: Equals(ValueA, ValueB)

Example Usage:
ColumnA IsDistinctFrom ColumnB
ColumnA IsNotDistinctFrom ColumnB

@61yao 61yao force-pushed the is_distinct_from branch 2 times, most recently from 3b1e21a to cffbec4 Compare August 31, 2022 22:35
@codecov-commenter
Copy link

codecov-commenter commented Sep 1, 2022

Codecov Report

Merging #9312 (4610d4f) into master (ff2a333) will decrease coverage by 38.56%.
The diff coverage is 64.10%.

@@              Coverage Diff              @@
##             master    #9312       +/-   ##
=============================================
- Coverage     63.40%   24.84%   -38.57%     
+ Complexity     4762       53     -4709     
=============================================
  Files          1832     1876       +44     
  Lines         98146   100105     +1959     
  Branches      15020    15251      +231     
=============================================
- Hits          62231    24871    -37360     
- Misses        31321    72701    +41380     
+ Partials       4594     2533     -2061     
Flag Coverage Δ
integration2 24.84% <64.10%> (?)
unittests1 ?
unittests2 ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...form/function/BinaryOperatorTransformFunction.java 20.61% <ø> (-19.25%) ⬇️
...nsform/function/DistinctFromTransformFunction.java 54.83% <54.83%> (ø)
...e/pinot/common/function/TransformFunctionType.java 88.42% <100.00%> (-11.58%) ⬇️
...form/function/IsDistinctFromTransformFunction.java 100.00% <100.00%> (ø)
...m/function/IsNotDistinctFromTransformFunction.java 100.00% <100.00%> (ø)
...r/transform/function/TransformFunctionFactory.java 88.35% <100.00%> (+1.55%) ⬆️
...in/java/org/apache/pinot/spi/utils/BytesUtils.java 0.00% <0.00%> (-100.00%) ⬇️
.../java/org/apache/pinot/spi/utils/BooleanUtils.java 0.00% <0.00%> (-100.00%) ⬇️
...java/org/apache/pinot/spi/trace/BaseRecording.java 0.00% <0.00%> (-100.00%) ⬇️
...java/org/apache/pinot/spi/trace/NoOpRecording.java 0.00% <0.00%> (-100.00%) ⬇️
... and 1537 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@61yao 61yao force-pushed the is_distinct_from branch 5 times, most recently from da884fe to 1213bf8 Compare September 6, 2022 18:38
@Jackie-Jiang Jackie-Jiang added feature release-notes Referenced by PRs that need attention when compiling the next release notes labels Sep 7, 2022
@61yao 61yao force-pushed the is_distinct_from branch 4 times, most recently from e4589f6 to 37724d7 Compare September 8, 2022 18:09
@61yao 61yao force-pushed the is_distinct_from branch 2 times, most recently from eb5275d to 5237004 Compare September 10, 2022 05:57
Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM otherwise

/**
* @param distinctType is set to true for IsDistinctFrom, otherwise it is for IsNotDistinctFrom.
*/
protected DistinctFromTransformFunction(boolean distinctType) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
protected DistinctFromTransformFunction(boolean distinctType) {
protected DistinctFromTransformFunction(boolean distinct) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

leftNull.forEach((IntConsumer) i -> _results[i] = _distinctType);
return _results;
}
RoaringBitmap xorNull = RoaringBitmap.xor(leftNull, rightNull);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be faster if we iterate in the following way:

  • Iterate over left null and set _distinctResult
  • Iterate over right null and set _distinctResult
  • Iterate over left AND right null and set _notDistinctResult

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to iterate through xor and and bitset

@walterddr walterddr merged commit d2b65db into apache:master Sep 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature release-notes Referenced by PRs that need attention when compiling the next release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants