Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support functions in Data Prepper expressions #2626 #2644

Merged
merged 10 commits into from
May 11, 2023

Conversation

kkondaka
Copy link
Collaborator

@kkondaka kkondaka commented May 5, 2023

Description

Adds supports for functions in expressions.

Only length() function is supported initially.

Resolves #2626 #2639

Issues Resolved

#2626
#2639

Check List

  • [ X] New functionality includes testing.
  • New functionality has been documented.
    • New functionality has javadoc added
  • [X ] Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Krishna Kondaka <krishkdk@amazon.com>
public Object applyFunction(final String functionName, final String[] args, Event event) {
// Supports length(String) and length(JsonPointer)
// For example., length("abcd"), length("/keyName")
if (functionName.equals("length")) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should take a factory-based approach similar to what we did for operators. See the OperatorProvider class and how it is used.

At a high-level, The OperatorProvider takes a List<Operator<?>> as input. Spring will inject this list. It builds it up from all the actual Operator implementations (see AndOperator as one example). Because they are @Named, they are injected into the application context and thus become part of the List<Operator<?>>.

I think the interface might look something like the following:

interface ExpressionFunction<T> {
    int getNumberOfArguments();
    String getFunctionName();
    T evaluate(final Object ... args);
}

Copy link
Collaborator Author

@kkondaka kkondaka May 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dlvenable Not sure if this suggestion works for functions unless we plan to put function names in the grammar. Is that what you are suggesting?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed offline. No, this will not be a change to the grammar. This will use the Spring application context to get the List<ExpressionFunction>.

A class like this should work:

@Named
public class ExpressionFunctionProvider {

    private final Map<String, ExpressionFunction> functionsMap;

    @Inject
    public ExpressionFunctionProvider(List<ExpressionFunction> expressionFunctions) {
        functionsMap = expressionFunctions.stream().collect(Collectors.toMap(e -> e.getFunctionName(), e -> e));
    }
    
    public Object provideFunction(String functionName, String[] args, Event event) {
        return functionsMap.get(functionName);
    }
}

@dlvenable
Copy link
Member

This should also be resolving: #2639

…rovider and implementation

Signed-off-by: Krishna Kondaka <krishkdk@amazon.com>
Signed-off-by: Krishna Kondaka <krishkdk@amazon.com>
@codecov-commenter
Copy link

codecov-commenter commented May 9, 2023

Codecov Report

Merging #2644 (1b3d76d) into main (94974df) will increase coverage by 0.12%.
The diff coverage is 100.00%.

@@             Coverage Diff              @@
##               main    #2644      +/-   ##
============================================
+ Coverage     93.45%   93.57%   +0.12%     
- Complexity     2185     2238      +53     
============================================
  Files           258      261       +3     
  Lines          6107     6275     +168     
  Branches        497      519      +22     
============================================
+ Hits           5707     5872     +165     
+ Misses          268      266       -2     
- Partials        132      137       +5     
Impacted Files Coverage Δ
...prepper/expression/ExpressionFunctionProvider.java 100.00% <100.00%> (ø)
...taprepper/expression/LengthExpressionFunction.java 100.00% <100.00%> (ø)
...taprepper/expression/ParseTreeCoercionService.java 100.00% <100.00%> (ø)

... and 12 files with indirect coverage changes

Copy link
Member

@dlvenable dlvenable left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes should have a small integration test within the data-prepper-expression package to verify behavior end-to-end.

You may be able to get some ideas on this from ConditionalExpressionEvaluatorIT.

if (args.size() != 1) {
throw new RuntimeException("length() takes only one argument");
}
Object arg = args.get(0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is currently evaluating a string directly, right?

In the description for #2639, we are looking to evaluate a JSON Path within an Event.

With the current implementation:

length("/log_level")

Output: 10

I believe we'd want this to evaluate as such:

Event:

{"log_level": "INFO"}

Output 4

Event:

{"log_level": "ERROR"}

Output 5

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. The JsonPointers are already resolved before passing them to this function. So, it would work as you mentioned.

* @return the result of function evaluation
* @since 2.3
*/
Object evaluate(final List<Object> args);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this include Event? See my comments below on how to evaluate length.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. JsonPointers are already resolved when this is called.

return new LengthExpressionFunction();
}

@Test
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd make this an @ParameterizedTest with a few values: 0, 1, 2, 5, 20.

Signed-off-by: Krishna Kondaka <krishkdk@amazon.com>
List<Object> argList = new ArrayList<>();
for (final String arg: args) {
String trimmedArg = arg.trim();
if (trimmedArg.charAt(0) == '/') {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I see now that this is where you resolve the JSON Pointer.

I think we should let the function decide which arguments are JSON Pointers and which are not. For example, the length function should state that the first argument is a JSON Pointer.

Also, we generally support a simpler JSON Pointer syntax without the leading slash if you are getting from the root object. So, this logic will not work for this situation.

Also, see that for CIDR checks, we want to include arguments which are not part of the event. #2625 (comment)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate more on what is "simpler JSON Pointer"? It looks like grammar only supports JSON pointer with forward slash.

Regarding CIDR checks, Could you provide some example usecases? Are you referring to issues like #2625? To any function that may be dealing with CIDR blocks, it would get a string argument like 192.168.10.* and it is up to the function to convert it to network addresses. The grammar that I defined in this PR, only allows string or JsonPointer arguments. JsonPointer argument will be looked in the event and the resultant value can be of any "type". That's why the argument list is a list of Objects and not list of Strings

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thoughts, I think we need to pass event to evaluate because we may have to use the string arguments as an argument to event tags or event metadata but resolving JsonPointer in ParseTreeCoercionService still makes sense because it avoids code duplication.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate more on what is "simpler JSON Pointer"? It looks like grammar only supports JSON pointer with forward slash.

Yes, that is what I meant. I'm sorry about the confusion. I know we support this in the Event model (see this test case), but perhaps not in the grammar.


I think that the function itself needs to know what its arguments are and how to interpret them.

Consider the possibility of a hasSubstring method defined as hasSubstring(eventPointer, substring).

We can't be sure how pipeline authors would use it. Perhaps they have the following:

hasSubstring("/file", "/usr/var/linux/path");

This is a reasonable scenario. However, the current code would interpret that the second argument should be a JSON Pointer.


On second thoughts, I think we need to pass event to evaluate because we may have to use the string arguments as an argument to event tags or event metadata

This is a good point. Indeed, getTags() will have not arguments (at least as initially planned).

but resolving JsonPointer in ParseTreeCoercionService still makes sense because it avoids code duplication.

I think we should look to see how we can generalize resolveJsonPointerValue. One simple approach could be to also pass a BiFunction<String, Event, Object> into the evaluate method.

return expressionFunctionProvider.provideFunction(functionName, event, argList, this::resolveJsonPointerValue)

Another option is an Event decorator so that we could call Event::get, but that might be a larger undertaking.

@@ -160,11 +165,16 @@ private static Stream<Arguments> validExpressionArguments() {
complexEvent(ALL_JACKSON_EVENT_GET_SUPPORTED_CHARACTERS, true),
true),
Arguments.of("/durationInNanos > 5000000000", event("{\"durationInNanos\": 6000000000}"), true),
Arguments.of("/response == \"OK\"", event("{\"response\": \"OK\"}"), true)
Arguments.of("/response == \"OK\"", event("{\"response\": \"OK\"}"), true),
Arguments.of("length(/response) == "+testStringLength, event("{\"response\": \""+testString+"\"}"), true),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I'm glad this works.

@dlvenable
Copy link
Member

Also, there is a test failure:

LengthExpressionFunctionTest > testWithOneStringArgumentWithOutQuotes(int) > org.opensearch.dataprepper.expression.LengthExpressionFunctionTest.testWithOneStringArgumentWithOutQuotes(int)[1] FAILED
    java.lang.StringIndexOutOfBoundsException at LengthExpressionFunctionTest.java:30

Signed-off-by: Krishna Kondaka <krishkdk@amazon.com>
Signed-off-by: Krishna Kondaka <krishkdk@amazon.com>

public Object provideFunction(final String functionName, final List<Object> argList) {
if (!expressionFunctionsMap.containsKey(functionName)) {
return null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning null is an anti pattern.

I would recommend we throw and exception here that the function does not exist rather than rely on null checks or have to debug a NPE down the road.

import javax.inject.Named;

@Named
public class LengthExpressionFunction implements ExpressionFunction {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's be sure to add documentation for this new feature here: https://github.com/opensearch-project/data-prepper/blob/main/docs/expression_syntax.md

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also create a GitHub issue in this repo for us to update the syntax in the user-facing documents.

https://github.com/opensearch-project/documentation-website

@@ -183,7 +193,8 @@ private static Stream<Arguments> invalidExpressionArguments() {
Arguments.of("not null", event("{}")),
Arguments.of("not/status_code", event("{\"status_code\": 200}")),
Arguments.of("trueand/status_code", event("{\"status_code\": 200}")),
Arguments.of("trueor/status_code", event("{\"status_code\": 200}"))
Arguments.of("trueor/status_code", event("{\"status_code\": 200}")),
Arguments.of("length(\""+testString+") == "+testStringLength, event("{\"response\": \""+testString+"\"}"), true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test which uses this list of args takes two arguments. You have three on this line.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed it. Thanks!

Comment on lines 29 to 33
if (argStr.charAt(0) == '\"') {
if (argStr.charAt(argStr.length()-1) != '\"') {
throw new RuntimeException("Invalid string passed to length() "+argStr);
}
return Integer.valueOf(argStr.length()-2);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does excluding the extra quotes make sense to the user excluding only the first set of extra quotes? How would this work for string data like: "\"\"\"\"foobar\"\"\"\""?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should work like Java Strings. If the input is "abcd", the length would be 4. If the input is ""abcd"", the length would be 6.

…e common infra

Signed-off-by: Krishna Kondaka <krishkdk@amazon.com>
Copy link
Member

@dlvenable dlvenable left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There appear to be some situations not covered by the tests and the build is failing.

[ant:jacocoReport] Rule violated for bundle data-prepper-expression: instructions covered ratio is 0.9, but expected minimum is 1.0

Please address those and the comment from @cmanning09 . Aside from that, this looks good to me.

if (argStr.length() == 0) {
return 0;
}
if (argStr.charAt(0) == '\"') {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this something we should support?

I'm not sure it adds a lot of value. And it could lead to confusing situations for users.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to think we should start simply by assuming that the first (and only) argument must be a JSON Pointer. If it is not, the expression should fail.

Signed-off-by: Krishna Kondaka <krishkdk@amazon.com>
…er expressions

Signed-off-by: Krishna Kondaka <krishkdk@amazon.com>
dlvenable
dlvenable previously approved these changes May 11, 2023
Copy link
Member

@dlvenable dlvenable left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your work on this!

Currently the following functions are supported
* `length`
- takes one argument of string or JsonPointer type
- returns the length of the string if the input is of the string type. For example, `length("abcd")` returns 4.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are not supporting literals at the moment. The argument must be a jsonPointer

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right. I have updated the doc.

Signed-off-by: Krishna Kondaka <krishkdk@amazon.com>
@kkondaka kkondaka merged commit a63e5d8 into opensearch-project:main May 11, 2023
@kkondaka kkondaka deleted the expr-functions branch July 13, 2023 04:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support functions in Data Prepper expressions
4 participants