Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support functions in Data Prepper expressions #2626 #2644

Merged
merged 10 commits into from
May 11, 2023
1 change: 1 addition & 0 deletions data-prepper-expression/build.gradle
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@ dependencies {
implementation 'org.apache.logging.log4j:log4j-core'
implementation 'org.apache.logging.log4j:log4j-slf4j2-impl'
testImplementation testLibs.spring.test
testImplementation "org.apache.commons:commons-lang3:3.12.0"
testImplementation 'com.fasterxml.jackson.core:jackson-databind'
}

Expand Down
26 changes: 24 additions & 2 deletions data-prepper-expression/src/main/antlr/DataPrepperExpression.g4
Original file line number Diff line number Diff line change
Expand Up @@ -86,14 +86,14 @@ setInitializer
: LBRACE primary (SET_DELIMITER primary)* RBRACE
;


unaryOperator
: NOT
| SUBTRACT
;

primary
: jsonPointer
| function
| variableIdentifier
| setInitializer
| literal
Expand All @@ -104,6 +104,25 @@ jsonPointer
| EscapedJsonPointer
;

function
: Function
;

Function
: JsonPointerCharacters LPAREN FunctionArgs RPAREN
;

fragment
FunctionArgs
: (FunctionArg SPACE* COMMA)* SPACE* FunctionArg
;

fragment
FunctionArg
: JsonPointer
| String
;

variableIdentifier
: variableName
;
Expand Down Expand Up @@ -227,7 +246,10 @@ EscapeSequence
: '\\' [btnfr"'\\$]
;

SET_DELIMITER : ',';
SET_DELIMITER
: COMMA
;
COMMA : ',';
EQUAL : '==';
NOT_EQUAL : '!=';
LT : '<';
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
/*
* Copyright OpenSearch Contributors
* SPDX-License-Identifier: Apache-2.0
*/

package org.opensearch.dataprepper.expression;

import org.opensearch.dataprepper.model.event.Event;
import java.util.List;
import java.util.function.Function;

interface ExpressionFunction {
/**
* @return function name
* @since 2.3
*/
String getFunctionName();

/**
* evaluates the function and returns the result
* @param args list of arguments to the function
* @return the result of function evaluation
* @since 2.3
*/
Object evaluate(final List<Object> args, Event event, Function<Object, Object> convertLiteralType);
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
/*
* Copyright OpenSearch Contributors
* SPDX-License-Identifier: Apache-2.0
*/

package org.opensearch.dataprepper.expression;

import org.opensearch.dataprepper.model.event.Event;
import javax.inject.Named;
import javax.inject.Inject;
import java.util.Map;
import java.util.List;
import java.util.stream.Collectors;
import java.util.function.Function;

@Named
public class ExpressionFunctionProvider {
private Map<String, ExpressionFunction> expressionFunctionsMap;

@Inject
public ExpressionFunctionProvider(final List<ExpressionFunction> expressionFunctions) {
expressionFunctionsMap = expressionFunctions.stream().collect(Collectors.toMap(e -> e.getFunctionName(), e -> e));
}

public Object provideFunction(final String functionName, final List<Object> argList, Event event, Function<Object, Object> convertLiteralType) {
if (!expressionFunctionsMap.containsKey(functionName)) {
throw new RuntimeException("Unknown function in the expression");
}
return expressionFunctionsMap.get(functionName).evaluate(argList, event, convertLiteralType);
}

}
Original file line number Diff line number Diff line change
@@ -0,0 +1,47 @@
/*
* Copyright OpenSearch Contributors
* SPDX-License-Identifier: Apache-2.0
*/

package org.opensearch.dataprepper.expression;

import org.opensearch.dataprepper.model.event.Event;
import java.util.List;
import javax.inject.Named;
import java.util.function.Function;

@Named
public class LengthExpressionFunction implements ExpressionFunction {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's be sure to add documentation for this new feature here: https://github.com/opensearch-project/data-prepper/blob/main/docs/expression_syntax.md

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also create a GitHub issue in this repo for us to update the syntax in the user-facing documents.

https://github.com/opensearch-project/documentation-website

public String getFunctionName() {
return "length";
}

public Object evaluate(final List<Object> args, Event event, Function<Object, Object> convertLiteralType) {
if (args.size() != 1) {
throw new RuntimeException("length() takes only one argument");
}
Object arg = args.get(0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is currently evaluating a string directly, right?

In the description for #2639, we are looking to evaluate a JSON Path within an Event.

With the current implementation:

length("/log_level")

Output: 10

I believe we'd want this to evaluate as such:

Event:

{"log_level": "INFO"}

Output 4

Event:

{"log_level": "ERROR"}

Output 5

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. The JsonPointers are already resolved before passing them to this function. So, it would work as you mentioned.

if (!(arg instanceof String)) {
throw new RuntimeException("length() takes only String type arguments");
}
String argStr = ((String)arg).trim();
if (argStr.length() == 0) {
return 0;
}
if (argStr.charAt(0) == '\"') {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this something we should support?

I'm not sure it adds a lot of value. And it could lead to confusing situations for users.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to think we should start simply by assuming that the first (and only) argument must be a JSON Pointer. If it is not, the expression should fail.

throw new RuntimeException("Literal strings not supported as arguments to length()");
} else {
// argStr must be JsonPointer
final Object value = event.get(argStr, Object.class);
if (value == null) {
return null;
}

if (!(value instanceof String)) {
throw new RuntimeException(argStr + " is not String type");
}
return Integer.valueOf(((String)value).length());
}
}
}

Original file line number Diff line number Diff line change
Expand Up @@ -13,21 +13,54 @@
import javax.inject.Named;
import java.io.Serializable;
import java.util.Map;
import java.util.List;
import java.util.ArrayList;
import java.util.function.Function;

@Named
class ParseTreeCoercionService {
private final Map<Class<? extends Serializable>, Function<Object, Object>> literalTypeConversions;
private ExpressionFunctionProvider expressionFunctionProvider;
private Function<Object, Object> convertLiteralType;

@Inject
public ParseTreeCoercionService(final Map<Class<? extends Serializable>, Function<Object, Object>> literalTypeConversions) {
public ParseTreeCoercionService(final Map<Class<? extends Serializable>, Function<Object, Object>> literalTypeConversions, ExpressionFunctionProvider expressionFunctionProvider) {
this.literalTypeConversions = literalTypeConversions;
convertLiteralType = (value) -> {
if (literalTypeConversions.containsKey(value.getClass())) {
return literalTypeConversions.get(value.getClass()).apply(value);
} else {
throw new ExpressionCoercionException("Unsupported type for value " + value);
}
};
this.expressionFunctionProvider = expressionFunctionProvider;
}

public Object coercePrimaryTerminalNode(final TerminalNode node, final Event event) {
final int nodeType = node.getSymbol().getType();
final String nodeStringValue = node.getText();
switch (nodeType) {
case DataPrepperExpressionParser.Function:
final int funcNameIndex = nodeStringValue.indexOf("(");
final String functionName = nodeStringValue.substring(0, funcNameIndex);
final int argsEndIndex = nodeStringValue.indexOf(")", funcNameIndex);
final String argsStr = nodeStringValue.substring(funcNameIndex+1, argsEndIndex);
final String[] args = argsStr.split(",");
List<Object> argList = new ArrayList<>();
for (final String arg: args) {
String trimmedArg = arg.trim();
if (trimmedArg.charAt(0) == '/') {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I see now that this is where you resolve the JSON Pointer.

I think we should let the function decide which arguments are JSON Pointers and which are not. For example, the length function should state that the first argument is a JSON Pointer.

Also, we generally support a simpler JSON Pointer syntax without the leading slash if you are getting from the root object. So, this logic will not work for this situation.

Also, see that for CIDR checks, we want to include arguments which are not part of the event. #2625 (comment)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate more on what is "simpler JSON Pointer"? It looks like grammar only supports JSON pointer with forward slash.

Regarding CIDR checks, Could you provide some example usecases? Are you referring to issues like #2625? To any function that may be dealing with CIDR blocks, it would get a string argument like 192.168.10.* and it is up to the function to convert it to network addresses. The grammar that I defined in this PR, only allows string or JsonPointer arguments. JsonPointer argument will be looked in the event and the resultant value can be of any "type". That's why the argument list is a list of Objects and not list of Strings

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thoughts, I think we need to pass event to evaluate because we may have to use the string arguments as an argument to event tags or event metadata but resolving JsonPointer in ParseTreeCoercionService still makes sense because it avoids code duplication.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate more on what is "simpler JSON Pointer"? It looks like grammar only supports JSON pointer with forward slash.

Yes, that is what I meant. I'm sorry about the confusion. I know we support this in the Event model (see this test case), but perhaps not in the grammar.


I think that the function itself needs to know what its arguments are and how to interpret them.

Consider the possibility of a hasSubstring method defined as hasSubstring(eventPointer, substring).

We can't be sure how pipeline authors would use it. Perhaps they have the following:

hasSubstring("/file", "/usr/var/linux/path");

This is a reasonable scenario. However, the current code would interpret that the second argument should be a JSON Pointer.


On second thoughts, I think we need to pass event to evaluate because we may have to use the string arguments as an argument to event tags or event metadata

This is a good point. Indeed, getTags() will have not arguments (at least as initially planned).

but resolving JsonPointer in ParseTreeCoercionService still makes sense because it avoids code duplication.

I think we should look to see how we can generalize resolveJsonPointerValue. One simple approach could be to also pass a BiFunction<String, Event, Object> into the evaluate method.

return expressionFunctionProvider.provideFunction(functionName, event, argList, this::resolveJsonPointerValue)

Another option is an Event decorator so that we could call Event::get, but that might be a larger undertaking.

argList.add(trimmedArg);
} else if (trimmedArg.charAt(0) == '"') {
if (trimmedArg.charAt(trimmedArg.length()-1) != '"') {
throw new RuntimeException("Invalid string argument. Missing double quote at the end");
}
argList.add(trimmedArg);
} else {
throw new RuntimeException("Unsupported type passed as function argument");
}
}
return expressionFunctionProvider.provideFunction(functionName, argList, event, convertLiteralType);
case DataPrepperExpressionParser.EscapedJsonPointer:
final String jsonPointerWithoutQuotes = nodeStringValue.substring(1, nodeStringValue.length() - 1);
return resolveJsonPointerValue(jsonPointerWithoutQuotes, event);
Expand Down Expand Up @@ -65,10 +98,7 @@ private Object resolveJsonPointerValue(final String jsonPointer, final Event eve
final Object value = event.get(jsonPointer, Object.class);
if (value == null) {
return null;
} else if (literalTypeConversions.containsKey(value.getClass())) {
return literalTypeConversions.get(value.getClass()).apply(value);
} else {
throw new ExpressionCoercionException("Unsupported type for value " + value);
}
}
return convertLiteralType.apply(value);
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.stream.Stream;
import java.util.Random;

import static org.awaitility.Awaitility.await;
import static org.hamcrest.CoreMatchers.equalTo;
Expand All @@ -33,6 +34,7 @@
import static org.hamcrest.CoreMatchers.sameInstance;
import static org.hamcrest.MatcherAssert.assertThat;
import static org.junit.jupiter.api.Assertions.assertThrows;
import org.apache.commons.lang3.RandomStringUtils;

class ConditionalExpressionEvaluatorIT {
/**
Expand Down Expand Up @@ -127,6 +129,9 @@ private static Stream<Arguments> validExpressionArguments() {
.withData(eventMap)
.build();

Random random = new Random();
int testStringLength = random.nextInt(10);
String testString = RandomStringUtils.randomAlphabetic(testStringLength);
return Stream.of(
Arguments.of("true", event("{}"), true),
Arguments.of("/status_code == 200", event("{\"status_code\": 200}"), true),
Expand Down Expand Up @@ -160,11 +165,15 @@ private static Stream<Arguments> validExpressionArguments() {
complexEvent(ALL_JACKSON_EVENT_GET_SUPPORTED_CHARACTERS, true),
true),
Arguments.of("/durationInNanos > 5000000000", event("{\"durationInNanos\": 6000000000}"), true),
Arguments.of("/response == \"OK\"", event("{\"response\": \"OK\"}"), true)
Arguments.of("/response == \"OK\"", event("{\"response\": \"OK\"}"), true),
Arguments.of("length(/response) == "+testStringLength, event("{\"response\": \""+testString+"\"}"), true)
);
}

private static Stream<Arguments> invalidExpressionArguments() {
Random random = new Random();
int testStringLength = random.nextInt(10);
String testString = RandomStringUtils.randomAlphabetic(testStringLength);
return Stream.of(
Arguments.of("/missing", event("{}")),
Arguments.of("/success < /status_code", event("{\"success\": true, \"status_code\": 200}")),
Expand All @@ -183,7 +192,9 @@ private static Stream<Arguments> invalidExpressionArguments() {
Arguments.of("not null", event("{}")),
Arguments.of("not/status_code", event("{\"status_code\": 200}")),
Arguments.of("trueand/status_code", event("{\"status_code\": 200}")),
Arguments.of("trueor/status_code", event("{\"status_code\": 200}"))
Arguments.of("trueor/status_code", event("{\"status_code\": 200}")),
Arguments.of("length(\""+testString+") == "+testStringLength, event("{\"response\": \""+testString+"\"}")),
Arguments.of("length(\""+testString+"\") == "+testStringLength, event("{\"response\": \""+testString+"\"}"))
);
}

Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,57 @@
/*
* Copyright OpenSearch Contributors
* SPDX-License-Identifier: Apache-2.0
*/

package org.opensearch.dataprepper.expression;

import org.opensearch.dataprepper.model.event.Event;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;
import org.mockito.junit.jupiter.MockitoExtension;
import org.apache.commons.lang3.RandomStringUtils;
import static org.hamcrest.CoreMatchers.equalTo;
import static org.hamcrest.MatcherAssert.assertThat;
import static org.junit.jupiter.api.Assertions.assertThrows;
import static org.mockito.ArgumentMatchers.any;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.lenient;

import java.util.List;
import java.util.function.Function;

@ExtendWith(MockitoExtension.class)
class ExpressionFunctionProviderTest {
private ExpressionFunctionProvider objectUnderTest;
private String testFunctionName;
private Object testResultObject;
private ExpressionFunction expressionFunction;
private Function<Object, Object> testFunction;
private Event testEvent;

public ExpressionFunctionProvider createObjectUnderTest() {
expressionFunction = mock(ExpressionFunction.class);
testFunction = mock(Function.class);
testFunctionName = RandomStringUtils.randomAlphabetic(8);
testEvent = mock(Event.class);
testResultObject = mock(Object.class);
lenient().when(expressionFunction.evaluate(any(List.class), any(Event.class), any(Function.class))).thenReturn(testResultObject);
lenient().when(expressionFunction.getFunctionName()).thenReturn(testFunctionName);

return new ExpressionFunctionProvider(List.of(expressionFunction));
}

@Test
void testUnknownFunction() {
objectUnderTest = createObjectUnderTest();
String unknownFunctionName = RandomStringUtils.randomAlphabetic(8);
assertThrows(RuntimeException.class, () -> objectUnderTest.provideFunction(unknownFunctionName, List.of(), testEvent, testFunction));
}

@Test
void testFunctionBasic() {
objectUnderTest = createObjectUnderTest();
assertThat(objectUnderTest.provideFunction(testFunctionName, List.of(), testEvent, testFunction), equalTo(testResultObject));
}

}
Loading