Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for regexpReplace scalar function #9123

Merged
merged 2 commits into from
Aug 2, 2022

Conversation

vvivekiyer
Copy link
Contributor

@vvivekiyer vvivekiyer commented Jul 28, 2022

Fixes issue = #9079
Label = feature

This PR adds a scalar function to perform REGEXP REPLACE in Pinot. The semantics of using this scalar function is as follows:

  • If matches is not found, inputStr is returned as such.
  • By default, all occurrences of the matched patterns are replaced unless the occurence parameter is not -1.
  • Default match flag is case sensitive. Can be overriden with the flag parameter

Parameters:

  1. inputStr (required): Input string against which regexp match and replace should be performed.
  2. matchStr (required): String or regexp to match against inputStr
  3. replaceStr (required): String or regexp to replace if a pattern is found.
  4. startPos (optional): Starting index from where matching should be performed. Default value is 0.
  5. occurrence (optional): The occurrence of the matched pattern must be replaced. Default is -1 where all found patterns will be replaced.
  6. flag (optional): Single character flag that controls how the regex finds matches in inputStr. Default flag of empty is case sensitive match. If an incorrect flag is specified, the function applies default case sensitive match. Only one flag can be specified. Supported flags are:
  • i -> Case insensitive
  • m -> Treats the string to match as multiple lines
  • x -> Add comments to the regular expression

We can support more flags in the future.

@codecov-commenter
Copy link

codecov-commenter commented Jul 28, 2022

Codecov Report

Merging #9123 (655b1b4) into master (913f0d8) will increase coverage by 2.28%.
The diff coverage is 0.00%.

@@             Coverage Diff              @@
##             master    #9123      +/-   ##
============================================
+ Coverage     61.23%   63.51%   +2.28%     
+ Complexity     4842     4735     -107     
============================================
  Files          1836     1799      -37     
  Lines         97524    95732    -1792     
  Branches      14703    14497     -206     
============================================
+ Hits          59716    60806    +1090     
+ Misses        33342    30542    -2800     
+ Partials       4466     4384      -82     
Flag Coverage Δ
integration1 ?
unittests1 66.97% <0.00%> (+0.01%) ⬆️
unittests2 15.37% <0.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
.../pinot/common/function/scalar/StringFunctions.java 54.90% <0.00%> (-17.25%) ⬇️
...va/org/apache/pinot/core/routing/RoutingTable.java 0.00% <0.00%> (-100.00%) ⬇️
...va/org/apache/pinot/common/config/NettyConfig.java 0.00% <0.00%> (-100.00%) ⬇️
...a/org/apache/pinot/common/metrics/MinionMeter.java 0.00% <0.00%> (-100.00%) ⬇️
...g/apache/pinot/common/metrics/ControllerMeter.java 0.00% <0.00%> (-100.00%) ⬇️
.../apache/pinot/common/metrics/BrokerQueryPhase.java 0.00% <0.00%> (-100.00%) ⬇️
.../apache/pinot/common/metrics/MinionQueryPhase.java 0.00% <0.00%> (-100.00%) ⬇️
...ache/pinot/server/access/AccessControlFactory.java 0.00% <0.00%> (-100.00%) ⬇️
...he/pinot/common/messages/TableDeletionMessage.java 0.00% <0.00%> (-100.00%) ⬇️
...pinot/core/data/manager/realtime/TimerService.java 0.00% <0.00%> (-100.00%) ⬇️
... and 557 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@vvivekiyer vvivekiyer changed the title [WIP; Do not Review] Add support for regexpReplace scalar function Add support for regexpReplace scalar function Jul 29, 2022
@vvivekiyer vvivekiyer marked this pull request as ready for review July 29, 2022 00:11
@vvivekiyer vvivekiyer changed the title Add support for regexpReplace scalar function [WIP] Add support for regexpReplace scalar function Jul 29, 2022
@vvivekiyer vvivekiyer changed the title [WIP] Add support for regexpReplace scalar function Add support for regexpReplace scalar function Jul 29, 2022
@vvivekiyer
Copy link
Contributor Author

@siddharthteotia @Jackie-Jiang please take a look and provide feedback. Thanks!

@siddharthteotia
Copy link
Contributor

@vvivekiyer - can you rebase this once and also look at test failures ?

@vvivekiyer vvivekiyer force-pushed the regexp_replace branch 2 times, most recently from 8760195 to 4b44076 Compare August 2, 2022 16:19
@vvivekiyer
Copy link
Contributor Author

@siddharthteotia Rebased and reran the tests. Please review.

* @return replaced input string
*/
@ScalarFunction
public static String regexpReplace(String inputStr, String matchStr, String replaceStr, int startPos,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) startPos -> matchStartPos ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

patternFlag = Pattern.MULTILINE;
break;
case "x":
patternFlag = Pattern.COMMENTS;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow the purpose / use of this flag. Can you please elaborate ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest not handling m and x now. We can add them later and keep it simple for now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed m and x.

@siddharthteotia
Copy link
Contributor

@vvivekiyer - can you please rebase on top of #9114

@vvivekiyer
Copy link
Contributor Author

@siddharthteotia rebased and all tests are passing now.

@siddharthteotia siddharthteotia merged commit 561e471 into apache:master Aug 2, 2022
@siddharthteotia
Copy link
Contributor

@vvivekiyer - let's please add user docs to gitbook

@Jackie-Jiang
Copy link
Contributor

Not specific to this PR, but suggest using draft to mark the WIP PR instead of putting it as the title.
I saw the WIP; Do not Review was removed 5 days ago, but the github notification emails are still grouped under the original name, which is very confusing. I thought this PR is not yet ready for review, but is already merged...

@vvivekiyer
Copy link
Contributor Author

@Jackie-Jiang Sure, makes sense.
If there is any feedback from your review, please let me know. I can address them as a followup.

@vvivekiyer vvivekiyer deleted the regexp_replace branch August 10, 2022 20:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants