Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding new function decode_base64_utf8 and expr macro #14943

Merged
merged 4 commits into from
Sep 21, 2023

Conversation

pranavbhole
Copy link
Contributor

@pranavbhole pranavbhole commented Sep 6, 2023

Description

Adding a new function called decode_base64_utf8 that takes base64 encoded string, decode it and gives back the utf8 encoded string.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@pranavbhole
Copy link
Contributor Author

meanwhile review, I am adding more tests.
Thank you


import javax.annotation.Nullable;

public class DecodeBase64UTFOperatorConversion implements SqlOperatorConversion
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can just use DirectOperatorConversion which will save you from having to implement toDruidExpression

if (toDecode.value() == null) {
return ExprEval.of(null);
}
return new StringExpr(new String(StringUtils.decodeBase64String(toDecode.asString()), StandardCharsets.UTF_8)).eval(bindings);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also use StringUtils.fromUtf8(StringUtils.decodeBase64String(...))

@@ -144,4 +145,47 @@ public Object getLiteralValue()
}
}
}

public static class StringDecodeBase64UTFExprMacro implements ExprMacroTable.ExprMacro
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be useful to implement getOutputType, isLiteral, isNullLiteral, and getLiteralValue similar to ComplexDecodeBase64Expression.

Copy link
Contributor

@somu-imply somu-imply Sep 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General Q, can this be implemented as an extension of UnivariateFunction? But then this is an expression and not exactly a function

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BaseScalarUnivariateMacroFunctionExpr is the macro version of right thing to extend here since this is a single arugment function, +1 for re-using it since it would simplify things a bit here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@@ -44,7 +44,8 @@
public class ExprMacroTable
{
private static final List<ExprMacro> BUILT_IN = ImmutableList.of(
new BuiltInExprMacros.ComplexDecodeBase64ExprMacro()
new BuiltInExprMacros.ComplexDecodeBase64ExprMacro(),
new BuiltInExprMacros.StringDecodeBase64UTFExprMacro()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it doesn't necessarily have to be in this PR, but it would be nice to add an alias for ComplexDecodeBase64ExprMacro (and also the SQL layer), perhaps just by extending the class and overriding the usage of NAME so it has consistent naming with this new function, e.g. decode_base64_complex

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be done registering an AliasedOperatorConversion similar to how SUBSTR or TRUNC is done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that would fix the SQL layer, but this is native expression macros, which do not currently have a generic aliasing thing here.

@pranavbhole what are your plans for this aliasing? (I just want to make sure this actually gets done whether or not in this PR, and not lost)

Copy link
Contributor Author

@pranavbhole pranavbhole Sep 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add aliasing in the followup PR.

Copy link
Contributor

@somu-imply somu-imply left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'll be good to add

  1. a test in CalciteQueryTest
  2. Update website/.spelling with decode_base64_utf8

assertExpr("decode_base64_utf8('aGVsbG8=')", "hello");
assertExpr("decode_base64_utf8('V2hlbiBhbiBvbmlvbiBpcyBjdXQsIGNlcnRhaW4gKGxhY2hyeW1hdG9yKSBjb21wb3VuZHMgYXJlIHJlbGVhc2VkIGNhdXNpbmcgdGhlIG5lcnZlcyBhcm91bmQgdGhlIGV5ZXMgKGxhY3JpbWFsIGdsYW5kcykgdG8gYmVjb21lIGlycml0YXRlZC4=')", "When an onion is cut, certain (lachrymator) compounds are released causing the nerves around the eyes (lacrimal glands) to become irritated.");
assertExpr("decode_base64_utf8('eyJ0ZXN0IjogMX0=')", "{\"test\": 1}");
assertExpr("decode_base64_utf8('')", "");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit. this can be null in SQL compatible mode

@pranavbhole
Copy link
Contributor Author

addressed comments.

@pranavbhole pranavbhole force-pushed the decode_base64_utf8 branch 2 times, most recently from 040c898 to ad20520 Compare September 11, 2023 20:30
@soumyava
Copy link
Contributor

LGTM. One question, is there a way say I have a column of encoded strings. Atm I understand the function works on string values. Not this PR but can we later try to make it work on a column too ? Will that be useful ? Something like SELECT DECODE_BASE64_UTF8("<column>") FROM <table>. Currently this throws an error like

SELECT DECODE_BASE64_UTF8("dim1") FROM foo

Error: RUNTIME_FAILURE (OPERATOR)
Illegal base64 character 2e
java.lang.IllegalArgumentException

So might want to think about a better error message. But need not be done in context of this PR

@pranavbhole
Copy link
Contributor Author

LGTM. One question, is there a way say I have a column of encoded strings. Atm I understand the function works on string values. Not this PR but can we later try to make it work on a column too ? Will that be useful ? Something like SELECT DECODE_BASE64_UTF8("<column>") FROM <table>. Currently this throws an error like

SELECT DECODE_BASE64_UTF8("dim1") FROM foo

Error: RUNTIME_FAILURE (OPERATOR)
Illegal base64 character 2e
java.lang.IllegalArgumentException

So might want to think about a better error message. But need not be done in context of this PR

It works on the column as well, provided you have encoded base64 string value stored in column.
If you column value has any other non base64 chars then it will throw above error.

Copy link
Contributor

@soumyava soumyava left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, the aliasing will be done in a followup PR

@soumyava soumyava merged commit 883c269 into apache:master Sep 21, 2023
74 checks passed
@LakshSingla LakshSingla added this to the 28.0 milestone Oct 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants