Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support functions in Data Prepper expressions #2626 #2644

Merged
merged 10 commits into from
May 11, 2023
1 change: 1 addition & 0 deletions data-prepper-expression/build.gradle
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@ dependencies {
implementation 'org.apache.logging.log4j:log4j-core'
implementation 'org.apache.logging.log4j:log4j-slf4j2-impl'
testImplementation testLibs.spring.test
testImplementation "org.apache.commons:commons-lang3:3.12.0"
testImplementation 'com.fasterxml.jackson.core:jackson-databind'
}

Expand Down
26 changes: 24 additions & 2 deletions data-prepper-expression/src/main/antlr/DataPrepperExpression.g4
Original file line number Diff line number Diff line change
Expand Up @@ -86,14 +86,14 @@ setInitializer
: LBRACE primary (SET_DELIMITER primary)* RBRACE
;


unaryOperator
: NOT
| SUBTRACT
;

primary
: jsonPointer
| function
| variableIdentifier
| setInitializer
| literal
Expand All @@ -104,6 +104,25 @@ jsonPointer
| EscapedJsonPointer
;

function
: Function
;

Function
: JsonPointerCharacters LPAREN FunctionArgs RPAREN
;

fragment
FunctionArgs
: (FunctionArg SPACE* COMMA)* SPACE* FunctionArg
;

fragment
FunctionArg
: JsonPointer
| String
;

variableIdentifier
: variableName
;
Expand Down Expand Up @@ -227,7 +246,10 @@ EscapeSequence
: '\\' [btnfr"'\\$]
;

SET_DELIMITER : ',';
SET_DELIMITER
: COMMA
;
COMMA : ',';
EQUAL : '==';
NOT_EQUAL : '!=';
LT : '<';
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
/*
* Copyright OpenSearch Contributors
* SPDX-License-Identifier: Apache-2.0
*/

package org.opensearch.dataprepper.expression;

import java.util.List;

interface ExpressionFunction {
/**
* @return function name
* @since 2.3
*/
String getFunctionName();

/**
* evaluates the function and returns the result
* @param args list of arguments to the function
* @return the result of function evaluation
* @since 2.3
*/
Object evaluate(final List<Object> args);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this include Event? See my comments below on how to evaluate length.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. JsonPointers are already resolved when this is called.

}
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
/*
* Copyright OpenSearch Contributors
* SPDX-License-Identifier: Apache-2.0
*/

package org.opensearch.dataprepper.expression;

import javax.inject.Named;
import javax.inject.Inject;
import java.util.Map;
import java.util.List;
import java.util.stream.Collectors;

@Named
public class ExpressionFunctionProvider {
private Map<String, ExpressionFunction> expressionFunctionsMap;

@Inject
public ExpressionFunctionProvider(final List<ExpressionFunction> expressionFunctions) {
expressionFunctionsMap = expressionFunctions.stream().collect(Collectors.toMap(e -> e.getFunctionName(), e -> e));
}

public Object provideFunction(final String functionName, final List<Object> argList) {
if (!expressionFunctionsMap.containsKey(functionName)) {
return null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning null is an anti pattern.

I would recommend we throw and exception here that the function does not exist rather than rely on null checks or have to debug a NPE down the road.

}
return expressionFunctionsMap.get(functionName).evaluate(argList);
}

}
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
/*
* Copyright OpenSearch Contributors
* SPDX-License-Identifier: Apache-2.0
*/

package org.opensearch.dataprepper.expression;

import java.util.List;
import javax.inject.Named;

@Named
public class LengthExpressionFunction implements ExpressionFunction {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's be sure to add documentation for this new feature here: https://github.com/opensearch-project/data-prepper/blob/main/docs/expression_syntax.md

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also create a GitHub issue in this repo for us to update the syntax in the user-facing documents.

https://github.com/opensearch-project/documentation-website

public String getFunctionName() {
return "length";
}

public Object evaluate(final List<Object> args) {
if (args.size() != 1) {
throw new RuntimeException("length() takes only one argument");
}
Object arg = args.get(0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is currently evaluating a string directly, right?

In the description for #2639, we are looking to evaluate a JSON Path within an Event.

With the current implementation:

length("/log_level")

Output: 10

I believe we'd want this to evaluate as such:

Event:

{"log_level": "INFO"}

Output 4

Event:

{"log_level": "ERROR"}

Output 5

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. The JsonPointers are already resolved before passing them to this function. So, it would work as you mentioned.

if (!(arg instanceof String)) {
throw new RuntimeException("length() takes only String type arguments");
}
String argStr = ((String)arg).trim();
if (argStr.charAt(0) == '\"') {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this something we should support?

I'm not sure it adds a lot of value. And it could lead to confusing situations for users.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to think we should start simply by assuming that the first (and only) argument must be a JSON Pointer. If it is not, the expression should fail.

if (argStr.charAt(argStr.length()-1) != '\"') {
throw new RuntimeException("Invalid string passed to length() "+argStr);
}
return Integer.valueOf(argStr.length()-2);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does excluding the extra quotes make sense to the user excluding only the first set of extra quotes? How would this work for string data like: "\"\"\"\"foobar\"\"\"\""?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should work like Java Strings. If the input is "abcd", the length would be 4. If the input is ""abcd"", the length would be 6.

} else {
return Integer.valueOf(argStr.length());
}
}
}

Original file line number Diff line number Diff line change
Expand Up @@ -13,21 +13,43 @@
import javax.inject.Named;
import java.io.Serializable;
import java.util.Map;
import java.util.List;
import java.util.ArrayList;
import java.util.function.Function;

@Named
class ParseTreeCoercionService {
private final Map<Class<? extends Serializable>, Function<Object, Object>> literalTypeConversions;
private ExpressionFunctionProvider expressionFunctionProvider;

@Inject
public ParseTreeCoercionService(final Map<Class<? extends Serializable>, Function<Object, Object>> literalTypeConversions) {
public ParseTreeCoercionService(final Map<Class<? extends Serializable>, Function<Object, Object>> literalTypeConversions, ExpressionFunctionProvider expressionFunctionProvider) {
this.literalTypeConversions = literalTypeConversions;
this.expressionFunctionProvider = expressionFunctionProvider;
}

public Object coercePrimaryTerminalNode(final TerminalNode node, final Event event) {
final int nodeType = node.getSymbol().getType();
final String nodeStringValue = node.getText();
switch (nodeType) {
case DataPrepperExpressionParser.Function:
final int funcNameIndex = nodeStringValue.indexOf("(");
final String functionName = nodeStringValue.substring(0, funcNameIndex);
final int argsEndIndex = nodeStringValue.indexOf(")", funcNameIndex);
final String argsStr = nodeStringValue.substring(funcNameIndex+1, argsEndIndex);
final String[] args = argsStr.split(",");
List<Object> argList = new ArrayList<>();
for (final String arg: args) {
String trimmedArg = arg.trim();
if (trimmedArg.charAt(0) == '/') {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I see now that this is where you resolve the JSON Pointer.

I think we should let the function decide which arguments are JSON Pointers and which are not. For example, the length function should state that the first argument is a JSON Pointer.

Also, we generally support a simpler JSON Pointer syntax without the leading slash if you are getting from the root object. So, this logic will not work for this situation.

Also, see that for CIDR checks, we want to include arguments which are not part of the event. #2625 (comment)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate more on what is "simpler JSON Pointer"? It looks like grammar only supports JSON pointer with forward slash.

Regarding CIDR checks, Could you provide some example usecases? Are you referring to issues like #2625? To any function that may be dealing with CIDR blocks, it would get a string argument like 192.168.10.* and it is up to the function to convert it to network addresses. The grammar that I defined in this PR, only allows string or JsonPointer arguments. JsonPointer argument will be looked in the event and the resultant value can be of any "type". That's why the argument list is a list of Objects and not list of Strings

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thoughts, I think we need to pass event to evaluate because we may have to use the string arguments as an argument to event tags or event metadata but resolving JsonPointer in ParseTreeCoercionService still makes sense because it avoids code duplication.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate more on what is "simpler JSON Pointer"? It looks like grammar only supports JSON pointer with forward slash.

Yes, that is what I meant. I'm sorry about the confusion. I know we support this in the Event model (see this test case), but perhaps not in the grammar.


I think that the function itself needs to know what its arguments are and how to interpret them.

Consider the possibility of a hasSubstring method defined as hasSubstring(eventPointer, substring).

We can't be sure how pipeline authors would use it. Perhaps they have the following:

hasSubstring("/file", "/usr/var/linux/path");

This is a reasonable scenario. However, the current code would interpret that the second argument should be a JSON Pointer.


On second thoughts, I think we need to pass event to evaluate because we may have to use the string arguments as an argument to event tags or event metadata

This is a good point. Indeed, getTags() will have not arguments (at least as initially planned).

but resolving JsonPointer in ParseTreeCoercionService still makes sense because it avoids code duplication.

I think we should look to see how we can generalize resolveJsonPointerValue. One simple approach could be to also pass a BiFunction<String, Event, Object> into the evaluate method.

return expressionFunctionProvider.provideFunction(functionName, event, argList, this::resolveJsonPointerValue)

Another option is an Event decorator so that we could call Event::get, but that might be a larger undertaking.

argList.add(resolveJsonPointerValue(arg, event));
} else if (trimmedArg.charAt(0) == '"') {
argList.add(trimmedArg);
} else {
throw new RuntimeException("Unsupported type passed as function argument");
}
}
return expressionFunctionProvider.provideFunction(functionName, argList);
case DataPrepperExpressionParser.EscapedJsonPointer:
final String jsonPointerWithoutQuotes = nodeStringValue.substring(1, nodeStringValue.length() - 1);
return resolveJsonPointerValue(jsonPointerWithoutQuotes, event);
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
/*
* Copyright OpenSearch Contributors
* SPDX-License-Identifier: Apache-2.0
*/

package org.opensearch.dataprepper.expression;

import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;
import org.mockito.junit.jupiter.MockitoExtension;
import org.apache.commons.lang3.RandomStringUtils;
import static org.hamcrest.CoreMatchers.equalTo;
import static org.hamcrest.MatcherAssert.assertThat;
import static org.mockito.ArgumentMatchers.any;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.lenient;

import java.util.List;

@ExtendWith(MockitoExtension.class)
class ExpressionFunctionProviderTest {
private ExpressionFunctionProvider objectUnderTest;
private String testFunctionName;
private Object testResultObject;
private ExpressionFunction expressionFunction;

public ExpressionFunctionProvider createObjectUnderTest() {
expressionFunction = mock(ExpressionFunction.class);
testFunctionName = RandomStringUtils.randomAlphabetic(8);
testResultObject = mock(Object.class);
lenient().when(expressionFunction.evaluate(any(List.class))).thenReturn(testResultObject);
lenient().when(expressionFunction.getFunctionName()).thenReturn(testFunctionName);

return new ExpressionFunctionProvider(List.of(expressionFunction));
}

@Test
void testUnknownFunction() {
objectUnderTest = createObjectUnderTest();
String unknownFunctionName = RandomStringUtils.randomAlphabetic(8);
assertThat(objectUnderTest.provideFunction(unknownFunctionName, List.of()), equalTo(null));
}

@Test
void testFunctionBasic() {
objectUnderTest = createObjectUnderTest();
assertThat(objectUnderTest.provideFunction(testFunctionName, List.of()), equalTo(testResultObject));
}

}
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
/*
* Copyright OpenSearch Contributors
* SPDX-License-Identifier: Apache-2.0
*/

package org.opensearch.dataprepper.expression;

import org.junit.jupiter.api.Test;
import static org.hamcrest.CoreMatchers.equalTo;
import static org.hamcrest.MatcherAssert.assertThat;
import static org.junit.jupiter.api.Assertions.assertThrows;
import org.apache.commons.lang3.RandomStringUtils;

import java.util.List;

class LengthExpressionFunctionTest {
private LengthExpressionFunction lengthExpressionFunction;

public LengthExpressionFunction createObjectUnderTest() {
return new LengthExpressionFunction();
}

@Test
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd make this an @ParameterizedTest with a few values: 0, 1, 2, 5, 20.

void testWithOneStringArgumentWithOutQuotes() {
lengthExpressionFunction = createObjectUnderTest();
String testString = RandomStringUtils.randomAlphabetic(5);
assertThat(lengthExpressionFunction.evaluate(List.of(testString)), equalTo(testString.length()));
}

@Test
void testWithOneStringArgumentWithQuotes() {
lengthExpressionFunction = createObjectUnderTest();
String testString = RandomStringUtils.randomAlphabetic(5);
assertThat(lengthExpressionFunction.evaluate(List.of("\""+testString + "\"")), equalTo(testString.length()));
}

@Test
void testWithTwoArgs() {
lengthExpressionFunction = createObjectUnderTest();
assertThrows(RuntimeException.class, () -> lengthExpressionFunction.evaluate(List.of("arg1", "arg2")));
}

@Test
void testWithNonStringArgument() {
lengthExpressionFunction = createObjectUnderTest();
assertThrows(RuntimeException.class, () -> lengthExpressionFunction.evaluate(List.of(10)));
}

@Test
void testWithInvalidStringArgument() {
lengthExpressionFunction = createObjectUnderTest();
assertThrows(RuntimeException.class, () -> lengthExpressionFunction.evaluate(List.of("\"arg1")));
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -19,9 +19,11 @@
import org.mockito.junit.jupiter.MockitoExtension;
import org.opensearch.dataprepper.expression.antlr.DataPrepperExpressionParser;
import org.opensearch.dataprepper.expression.util.TestObject;
import org.apache.commons.lang3.RandomStringUtils;

import java.util.HashMap;
import java.util.Map;
import java.util.List;
import java.util.Random;
import java.util.stream.Stream;

Expand All @@ -31,6 +33,7 @@
import static org.hamcrest.MatcherAssert.assertThat;
import static org.junit.jupiter.api.Assertions.assertThrows;
import static org.mockito.ArgumentMatchers.any;
import static org.mockito.ArgumentMatchers.eq;
import static org.mockito.ArgumentMatchers.anyString;
import static org.mockito.Mockito.lenient;
import static org.mockito.Mockito.mock;
Expand All @@ -48,8 +51,9 @@ class ParseTreeCoercionServiceTest {

private final LiteralTypeConversionsConfiguration literalTypeConversionsConfiguration =
new LiteralTypeConversionsConfiguration();
private final ExpressionFunctionProvider expressionFunctionProvider = mock(ExpressionFunctionProvider.class);
private final ParseTreeCoercionService objectUnderTest = new ParseTreeCoercionService(
literalTypeConversionsConfiguration.literalTypeConversions());
literalTypeConversionsConfiguration.literalTypeConversions(), expressionFunctionProvider);

@Test
void testCoerceTerminalNodeStringType() {
Expand Down Expand Up @@ -229,6 +233,42 @@ void testCoerceFailure() {
assertThrows(ExpressionCoercionException.class, () -> objectUnderTest.coerce(testObj, String.class));
}

@Test
void testCoerceTerminalNodeLengthFunction() {
final String key = RandomStringUtils.randomAlphabetic(5);
final String value = RandomStringUtils.randomAlphabetic(10);
final Event testEvent = createTestEvent(Map.of(key, value));
when(terminalNode.getSymbol()).thenReturn(token);
when(terminalNode.getText()).thenReturn("length(/"+key+")");
when(expressionFunctionProvider.provideFunction(eq("length"), any(List.class))).thenReturn(value.length());
when(token.getType()).thenReturn(DataPrepperExpressionParser.Function);
assertThat(objectUnderTest.coercePrimaryTerminalNode(terminalNode, testEvent), equalTo(value.length()));
final String testString = RandomStringUtils.randomAlphabetic(10);
when(terminalNode.getText()).thenReturn("length(\""+testString+"\")");
when(expressionFunctionProvider.provideFunction(eq("length"), any(List.class))).thenReturn(testString.length());
assertThat(objectUnderTest.coercePrimaryTerminalNode(terminalNode, testEvent), equalTo(testString.length()));
}

@Test
void testCoerceTerminalNodeLengthFunctionKeyNotInEvent() {
final String key = RandomStringUtils.randomAlphabetic(5);
final String value = RandomStringUtils.randomAlphabetic(10);
final String key2 = RandomStringUtils.randomAlphabetic(5);
final Event testEvent = createTestEvent(Map.of(key, value));
when(terminalNode.getSymbol()).thenReturn(token);
when(terminalNode.getText()).thenReturn("length(/"+key2+")");
when(token.getType()).thenReturn(DataPrepperExpressionParser.Function);
assertThat(objectUnderTest.coercePrimaryTerminalNode(terminalNode, testEvent), equalTo(null));
}

@Test
void testCoerceTerminalNodeWithUnknownFunction() {
when(terminalNode.getSymbol()).thenReturn(token);
when(token.getType()).thenReturn(DataPrepperExpressionParser.Function);
when(terminalNode.getText()).thenReturn("xyz(arg1)");
assertThrows(RuntimeException.class, () -> objectUnderTest.coercePrimaryTerminalNode(terminalNode, null));
}

private Event createTestEvent(final Object data) {
final Event event = mock(Event.class);
final JsonNode node = mapper.valueToTree(data);
Expand Down
Loading