-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for Base64 Encode/Decode Scalar Functions #9114
Conversation
In the description, can we add a short overview of the issue being addressed along with a summary of changes. |
pinot-common/src/test/java/org/apache/pinot/sql/parsers/CalciteSqlCompilerTest.java
Outdated
Show resolved
Hide resolved
pinot-common/src/main/java/org/apache/pinot/common/function/scalar/StringFunctions.java
Outdated
Show resolved
Hide resolved
pinot-common/src/main/java/org/apache/pinot/common/function/scalar/StringFunctions.java
Outdated
Show resolved
Hide resolved
0f550c9
to
577c35e
Compare
93eab25
to
cb9007b
Compare
Codecov Report
@@ Coverage Diff @@
## master #9114 +/- ##
============================================
- Coverage 69.96% 66.96% -3.01%
- Complexity 4749 4800 +51
============================================
Files 1838 1378 -460
Lines 97350 70889 -26461
Branches 14672 11211 -3461
============================================
- Hits 68113 47469 -20644
+ Misses 24495 20018 -4477
+ Partials 4742 3402 -1340
Flags with carried forward coverage won't be shown. Click here to find out more.
Help us with your feedback. Take ten seconds to tell us how you rate us. |
pinot-common/src/main/java/org/apache/pinot/common/function/scalar/StringFunctions.java
Outdated
Show resolved
Hide resolved
...roker/src/test/java/org/apache/pinot/broker/requesthandler/LiteralOnlyBrokerRequestTest.java
Outdated
Show resolved
Hide resolved
pinot-common/src/main/java/org/apache/pinot/common/function/scalar/StringFunctions.java
Outdated
Show resolved
Hide resolved
pinot-common/src/test/java/org/apache/pinot/sql/parsers/CalciteSqlCompilerTest.java
Outdated
Show resolved
Hide resolved
...roker/src/test/java/org/apache/pinot/broker/requesthandler/LiteralOnlyBrokerRequestTest.java
Show resolved
Hide resolved
ca55cbd
to
7a40588
Compare
*/ | ||
@ScalarFunction | ||
public static String binaryToBase64(byte[] input) { | ||
return Base64.getEncoder().encodeToString(input); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the expectation on charset of input bytes / binary data ? Looks like we assume it is UTF8. What if the input bytes provided here was encoded using UTF16
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved - we will not have a fixed charset of input bytes now with the new implementation. For example, if user wants to encode the String "hello!", he/she can choose to do toBase64(toUtf8("hello"))
or toBase64(toUtf16("hello"))
(Pinot currently only supports toUtf8 and toASCII functions). During decoding, he/she can do fromUtf8(fromBase64("fvduivheui"))
to fromUtf16(fromBase64("fvduivheui"))
to get "hello!" depending on if they used UTF8 or UTF16 when they encoded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we error if the user passes a string to the toBase64
function, or some implicit type casting will happen? MySQL users might not expect the syntax of toBase64(toUtf8("hello"))
. Similarly, if the user calls on fromBase64('aGVsbG8h')
, they might not expecting us to return hex string. We should at least call this out in the documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved - added tests for calling toBase64 on a string. Will add in documentation that BYTES col will be represented in HEX string(also link:https://docs.pinot.apache.org/users/user-guide-query/querying-pinot#bytes-column).
* @return decoded string | ||
*/ | ||
@ScalarFunction | ||
public static String fromBase64(String input) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are providing a toBase64
with a user specified charset encoding and a default utf8 charset flavor then shouldn't we also provide fromBase64
in both flavors as well ?
What if someone calls toBase64("blahblah", "UTF16")
to get "something" and then fromBase64("something")
- this will convert "something" to bytes and then to String using UTF8 which can be wrong because "something" was created from "blahblah" using UTF16.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved - fromBase64
will decode into byte[] (the layer from byte[] to String is removed now) and the user can decide to encode the byte[] to UTF8 strings or other charsets by calling other functions such as fromUtf8
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution
@SabrinaZhaozyf - please update the PR description to reflect the latest discussion / approach |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor questions regarding invalid input / unexpected usage.
...on-tests/src/test/java/org/apache/pinot/integration/tests/OfflineClusterIntegrationTest.java
Show resolved
Hide resolved
*/ | ||
@ScalarFunction | ||
public static String binaryToBase64(byte[] input) { | ||
return Base64.getEncoder().encodeToString(input); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we error if the user passes a string to the toBase64
function, or some implicit type casting will happen? MySQL users might not expect the syntax of toBase64(toUtf8("hello"))
. Similarly, if the user calls on fromBase64('aGVsbG8h')
, they might not expecting us to return hex string. We should at least call this out in the documentation.
pinot-common/src/main/java/org/apache/pinot/common/function/scalar/StringFunctions.java
Outdated
Show resolved
Hide resolved
@SabrinaZhaozyf - let's user docs to gitbook please |
Hello, was this added into the docs? I cannot find it in https://docs.pinot.apache.org/configuration-reference/functions |
@npawar Changes to user doc has been merged here (pinot-contrib/pinot-docs#109). Not sure why it hasn't been updated on documentation yet. |
Oh i see. lemme check on that |
@npawar - User docs were added some time ago by @SabrinaZhaozyf Just linking them here for reference - https://docs.pinot.apache.org/configuration-reference/functions/base64 |
Label =
feature
This PR implements
toBase64(byte[])
andfromBase64(str)
encoding and decoding functionality.Use Case Examples
toBase64(binaryCol)
toBase64(toUtf8('hello'))
,toBase64(toUtf8(stringCol))
fromBase64('aGVsbG8h')
,fromBase64(stringCol)
fromUtf8(fromBase64('aGVsbG8h'))