Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not allow implicit cast for BOOLEAN and TIMESTAMP #9385

Merged
merged 1 commit into from
Sep 19, 2022

Conversation

Jackie-Jiang
Copy link
Contributor

@Jackie-Jiang Jackie-Jiang commented Sep 12, 2022

We have observed performance regression after #9287 where we try to support implicit conversion from BOOLEAN and TIMESTAMP to STRING. Several implicit conversions to BOOLEAN and TIMESTAMP is still not supported.
This PR removes the implicit conversion support for BOOLEAN and TIMESTAMP, and enhances CAST function to support all explicit conversions. It also includes several bugfixes and cleanups related to type conversion.
To fully support implicit conversion for BOOLEAN and TIMESTAMP without big overhead, we need to add read APIs for them (similar to how we handle BIG_DECIMAL. We will address that separately.

Release Notes

Implicit conversion from/to BOOLEAN and TIMESTAMP is not supported. The conversion can be done explicitly with CAST transform function

@Jackie-Jiang Jackie-Jiang added enhancement backward-incompat Referenced by PRs that introduce or fix backward compat issues labels Sep 12, 2022
@Jackie-Jiang Jackie-Jiang added the release-notes Referenced by PRs that need attention when compiling the next release notes label Sep 13, 2022
Copy link
Contributor

@walterddr walterddr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me overall. several questions

  1. is this backward incompatible, should we add release note?
  2. can we write a detail rules on what implicit type case are supported and what are not. from my understanding it is now it only support between any number types, yes?

@@ -43,6 +43,7 @@ private DataTypeConversionFunctions() {
public static Object cast(Object value, String targetTypeLiteral) {
try {
Class<?> clazz = value.getClass();
// TODO: Support cast for MV
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you meant element-wise? how would the semantic work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It is supported in the transform function, but not in the scalar function

Comment on lines 235 to 245
testFetchStringValues(INT_COLUMN);
testFetchStringValues(LONG_COLUMN);
testFetchStringValues(FLOAT_COLUMN);
testFetchStringValues(DOUBLE_COLUMN);
testFetchStringValues(BIG_DECIMAL_COLUMN);
testFetchStringValues(STRING_COLUMN);
testFetchStringValues(NO_DICT_INT_COLUMN);
testFetchStringValues(NO_DICT_LONG_COLUMN);
testFetchStringValues(NO_DICT_FLOAT_COLUMN);
testFetchStringValues(NO_DICT_DOUBLE_COLUMN);
testFetchStringValues(NO_DICT_BIG_DECIMAL_COLUMN);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand implicit type cast from STRING to NUMBER is not a good practice. but do we want to support the otherway around?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think implicit type cast from other type to STRING is good practice either, and we have observed performance regression doing it in #9287

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good. good to add release note and backward incompatible tags to explain

import static org.testng.Assert.assertNull;
import static org.testng.Assert.assertThrows;
import static org.testng.Assert.assertTrue;
import static org.testng.Assert.*;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import Assert instead of star import?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We usually import Assert.* in test to simplify the code

Copy link
Contributor

@walterddr walterddr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm mostly.

Copy link
Member

@jackjlli jackjlli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reference available that talks about the restriction on implicit cast between them? If yes, can you link it to the description section of this PR?

We might need to carefully monitor the deployment in order to detect any existing use cases once this PR is merged.

@walterddr
Copy link
Contributor

Is there any reference available that talks about the restriction on implicit cast between them? If yes, can you link it to the description section of this PR?

We might need to carefully monitor the deployment in order to detect any existing use cases once this PR is merged.

yes I was suggesting either a release note or a backward incompatible section in the PR but also in Pinot doc.

In terms of the rationale, AFAIK,

  • nothing should be implicitly cast in logical plan. what we do NOW is b/c:

    • if a function expects foo(int, long) then they data type should be exact
    • however many system implements overload foo(int, int), foo(int, long), foo(long, long) ... so it will always find one properly. Pinot doesn't support these type of operator/function overloading thus we need to provide implicit cast.
  • many logical planner insert explicit CAST operator if the type doesn't match in user's provided SQL, for example:

    • SELECT intCol + strCol FROM tbl will result in a
LogicalProject(
  [+(intCol, CAST(strCol, INT))], 
  LogicalScan(
    tbl, 
    [intCol, strCol]
  )
)

so IMO as long as we state clearly in our user-facing implicit cast rule in doc we are good. I am incline to the following:

  • all numeric should have implicit cast and (1) throw exception (2) put MAX/MIN_VAL, or NaN when overflow
  • all DataType sharing the same StorageType should be able to implicitly cast between (for example JSON / STRING)
  • all other should not have default implicit cast rule

@Jackie-Jiang
Copy link
Contributor Author

Jackie-Jiang commented Sep 15, 2022

@walterddr @jackjlli Seems different DB have different implicit conversion behavior between number and string:
Presto: not allow - https://prestodb.io/docs/current/functions/conversion.html
MySQL: allow - https://dev.mysql.com/doc/refman/8.0/en/type-conversion.html
SQL Server: allow - https://docs.microsoft.com/en-us/sql/t-sql/data-types/data-type-conversion-database-engine?view=sql-server-ver16

Let me see if we can support it without paying too much overhead

@Jackie-Jiang Jackie-Jiang changed the title Do not allow implicit cast between number, string, binary Do not allow implicit cast for BOOLEAN and TIMESTAMP Sep 16, 2022
@Jackie-Jiang Jackie-Jiang added bugfix and removed backward-incompat Referenced by PRs that introduce or fix backward compat issues labels Sep 16, 2022
@Jackie-Jiang
Copy link
Contributor Author

After some discussion, decided to keep the existing supported conversions, but revert the new added #9287 which causes the performance regression.

@codecov-commenter
Copy link

codecov-commenter commented Sep 17, 2022

Codecov Report

Merging #9385 (ef1a234) into master (6bf11c8) will increase coverage by 0.03%.
The diff coverage is 58.25%.

@@             Coverage Diff              @@
##             master    #9385      +/-   ##
============================================
+ Coverage     69.73%   69.76%   +0.03%     
+ Complexity     5093     4792     -301     
============================================
  Files          1890     1890              
  Lines        100754   100858     +104     
  Branches      15350    15333      -17     
============================================
+ Hits          70259    70365     +106     
- Misses        25499    25522      +23     
+ Partials       4996     4971      -25     
Flag Coverage Δ
integration1 25.94% <17.75%> (+<0.01%) ⬆️
integration2 24.72% <17.75%> (-0.08%) ⬇️
unittests1 67.11% <58.25%> (+0.11%) ⬆️
unittests2 15.36% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...n/function/scalar/DataTypeConversionFunctions.java 52.38% <ø> (ø)
...ator/transform/function/CaseTransformFunction.java 61.83% <0.00%> (ø)
...main/java/org/apache/pinot/spi/data/FieldSpec.java 81.33% <ø> (ø)
...r/transform/function/ValueInTransformFunction.java 81.81% <20.00%> (ø)
...ava/org/apache/pinot/spi/utils/ArrayCopyUtils.java 68.36% <43.85%> (+4.40%) ⬆️
...ator/transform/function/BaseTransformFunction.java 79.71% <50.81%> (+8.67%) ⬆️
...ator/transform/function/CastTransformFunction.java 57.00% <55.26%> (-19.92%) ⬇️
...or/transform/function/LookupTransformFunction.java 77.10% <75.67%> (-2.07%) ⬇️
...java/org/apache/pinot/core/common/DataFetcher.java 88.97% <84.61%> (+11.15%) ⬆️
...orm/function/LogicalOperatorTransformFunction.java 95.83% <100.00%> (+4.52%) ⬆️
... and 47 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@Jackie-Jiang Jackie-Jiang merged commit 985d0b5 into apache:master Sep 19, 2022
@Jackie-Jiang Jackie-Jiang deleted the implicit_cast branch September 19, 2022 23:14
61yao pushed a commit to 61yao/pinot that referenced this pull request Oct 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix enhancement release-notes Referenced by PRs that need attention when compiling the next release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants