Regex String Functions #1516

Merged: 1 commit, May 8, 2023

Conversation

gaurav8297
Collaborator

@gaurav8297 gaurav8297 commented May 5, 2023

Functions added:

  1. regexp_matches(string, regex)
    Returns true if any part of string matches the regex.

  2. regexp_replace(string, regex, replacement)
    Replaces the first occurrence of regex with the replacement.

  3. regexp_extract(string, regex[, group = 0])
    Splits the string along the regex and extracts the first occurrence of group.

  4. regexp_extract_all(string, regex[, group = 0])
    Splits the string along the regex and extracts all occurrences of group.
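
For readers who want to see the intended semantics of the four functions side by side, here is a minimal sketch. It is not the code from this PR: it uses `std::regex` from the standard library as a stand-in for RE2, the function names are hypothetical, and returning an empty string when nothing matches is an assumption. Pattern and replacement syntax also differ between the two libraries (for example, `std::regex` refers to groups in a replacement as `$1`, while RE2 uses `\1`).

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Semantics sketch only: std::regex stands in for RE2, which the PR uses.

// True if any part of `input` matches `pattern`.
bool regexpMatches(const std::string& input, const std::string& pattern) {
    return std::regex_search(input, std::regex(pattern));
}

// Replaces only the first match of `pattern` with `replacement`.
std::string regexpReplace(const std::string& input, const std::string& pattern,
                          const std::string& replacement) {
    return std::regex_replace(input, std::regex(pattern), replacement,
                              std::regex_constants::format_first_only);
}

// Returns capture group `group` of the first match.
// Assumption: an empty string is returned when there is no match.
std::string regexpExtract(const std::string& input, const std::string& pattern,
                          std::size_t group = 0) {
    std::regex re(pattern);
    std::smatch match;
    if (!std::regex_search(input, match, re) || group >= match.size()) {
        return "";
    }
    return match[group].str();
}

// Returns capture group `group` of every non-overlapping match.
std::vector<std::string> regexpExtractAll(const std::string& input,
                                          const std::string& pattern,
                                          std::size_t group = 0) {
    std::vector<std::string> result;
    std::regex re(pattern);
    for (std::sregex_iterator it(input.begin(), input.end(), re), end; it != end; ++it) {
        if (group < it->size()) {
            result.push_back((*it)[group].str());
        }
    }
    return result;
}
```

With these definitions, `regexpReplace("aXbXc", "X", "-")` yields `a-bXc`, since only the first occurrence is replaced, and `regexpExtractAll("key=val; k2=v2", "(\\w+)=(\\w+)", 1)` yields `key` and `k2`.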

I have read and agree to the terms under CLA.md

Collaborator

@acquamarin acquamarin left a comment

Good start!

  1. Our CI uses clang-format to check code formatting. You can get a copy of the format-checking script here: https://github.com/Sarcasm/run-clang-format/blob/master/run-clang-format.py. Please run it on your code before pushing.
  2. Please make sure your code passes all CI checks before submitting.
  3. We prefer to squash all commits into one for each PR.

If you have any questions about my comments, feel free to message me.

.gitignore (outdated, resolved)
src/common/re2_regex.cpp (outdated, resolved)
src/common/re2_regex.cpp (outdated, resolved)
src/common/re2_regex.cpp (outdated, resolved)
src/function/built_in_vector_operations.cpp (outdated, resolved)
src/include/common/re2_regex.h (outdated, resolved)
src/include/common/re2_regex.h (outdated, resolved)
src/include/common/re2_regex.h (outdated, resolved)
src/include/common/re2_regex.h (outdated, resolved)
test/test_files/tinysnb/function/string.test (resolved)
@gaurav8297 gaurav8297 force-pushed the string_functions branch 4 times, most recently from 13ace76 to 9e3b399 Compare May 6, 2023 20:18
@codecov

codecov bot commented May 6, 2023

Codecov Report

Patch coverage: 92.66% and project coverage change: +0.02 🎉

Comparison is base (3108a21) 91.91% compared to head (a682d24) 91.93%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1516      +/-   ##
==========================================
+ Coverage   91.91%   91.93%   +0.02%     
==========================================
  Files         678      685       +7     
  Lines       24458    24640     +182     
==========================================
+ Hits        22480    22653     +173     
- Misses       1978     1987       +9     
Impacted Files                                            Coverage Δ
...include/function/string/vector_string_operations.h     95.83% <ø> (ø)
src/function/vector_string_operations.cpp                 84.54% <86.84%> (+1.21%) ⬆️
...n/string/operations/regexp_extract_all_operation.h     92.30% <92.30%> (ø)
...ction/string/operations/regexp_extract_operation.h     94.11% <94.11%> (ø)
src/function/built_in_vector_operations.cpp               96.00% <100.00%> (+0.08%) ⬆️
...function/string/operations/base_regexp_operation.h     100.00% <100.00%> (ø)
...on/string/operations/regexp_full_match_operation.h     100.00% <100.00%> (ø)
...ction/string/operations/regexp_matches_operation.h     100.00% <100.00%> (ø)
...ction/string/operations/regexp_replace_operation.h     100.00% <100.00%> (ø)
src/parser/transformer.cpp                                95.91% <100.00%> (ø)

... and 30 files with indirect coverage changes


@gaurav8297 gaurav8297 requested a review from acquamarin May 6, 2023 21:23
src/include/common/re2_regex.h (outdated, resolved)
src/common/re2_regex.cpp (outdated, resolved)
test/test_files/tinysnb/function/string.test (outdated, resolved)
src/function/vector_string_operations.cpp (resolved)
Collaborator

@acquamarin acquamarin left a comment

Let's add a base RegexOperation class and have the other regex operations inherit from it.
We can put the common methods, parseCypherPattern and copyToKuzuString, in this class.
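
One way the suggested refactor could look is sketched below. Everything beyond the two method names is an assumption: the thread does not say what `parseCypherPattern` or `copyToKuzuString` actually do or what their signatures are, so this sketch assumes the first collapses doubled backslashes in a pattern and the second copies a result into an engine string type. `DemoString`, `RegexpMatches`, and `RegexpExtract` are hypothetical names, and `std::regex` again stands in for RE2.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical stand-in for the engine's own string type.
struct DemoString {
    std::string data;
};

// Base class holding the helpers shared by all regex operations.
struct BaseRegexpOperation {
    // Assumed behavior: collapse each doubled backslash into a single one.
    static std::string parseCypherPattern(const std::string& pattern) {
        std::string out;
        for (std::size_t i = 0; i < pattern.size(); ++i) {
            if (pattern[i] == '\\' && i + 1 < pattern.size() && pattern[i + 1] == '\\') {
                ++i; // skip the second backslash of the pair
            }
            out.push_back(pattern[i]);
        }
        return out;
    }

    // Assumed behavior: copy a std::string into the engine string type.
    static void copyToKuzuString(const std::string& value, DemoString& result) {
        result.data = value;
    }
};

// Each operation inherits the helpers instead of re-implementing them.
struct RegexpMatches : BaseRegexpOperation {
    static bool operation(const std::string& input, const std::string& pattern) {
        return std::regex_search(input, std::regex(parseCypherPattern(pattern)));
    }
};

struct RegexpExtract : BaseRegexpOperation {
    static void operation(const std::string& input, const std::string& pattern,
                          DemoString& result) {
        std::regex re(parseCypherPattern(pattern));
        std::smatch match;
        bool found = std::regex_search(input, match, re);
        copyToKuzuString(found ? match[0].str() : std::string(), result);
    }
};
```

The point of the design is that pattern normalization and result copying live in one place, so a fix to either helper applies to every regex function at once.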

Rename RE_MATCH function to REGEXP_FULL_MATCH

The latter is more descriptive and compliant with
DuckDB's naming convention.

Introduce regexp utils based on re2

Refactor regex_full_match implementation

Functions added:

1. regexp_matches(string, regex)
Returns true if any part of string matches the
regex.

2. regexp_replace(string, regex, replacement)
Replaces the first occurrence of regex with the
replacement.

3. regexp_extract(string, regex[, group = 0])
Splits the string along the regex and extracts
the first occurrence of group.

4. regexp_extract_all(string, regex[, group = 0])
Splits the string along the regex and extracts
all occurrences of group.
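
The commit message mentions both REGEXP_FULL_MATCH and regexp_matches, so a short sketch of how the two presumably differ may help. It assumes, as the name suggests, that REGEXP_FULL_MATCH requires the whole string to match, while regexp_matches succeeds if any part does. The function names `fullMatch` and `partialMatch` are hypothetical, and `std::regex` is used in place of RE2.

```cpp
#include <regex>
#include <string>

// Full match: the pattern must cover the entire input.
bool fullMatch(const std::string& input, const std::string& pattern) {
    return std::regex_match(input, std::regex(pattern));
}

// Partial match: the pattern may match any substring of the input.
bool partialMatch(const std::string& input, const std::string& pattern) {
    return std::regex_search(input, std::regex(pattern));
}
```

For the input `xabcx` and the pattern `a.c`, `partialMatch` returns true and `fullMatch` returns false; for the input `abc`, both return true.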
@gaurav8297 gaurav8297 merged commit e12c903 into kuzudb:master May 8, 2023
7 checks passed
@gaurav8297 gaurav8297 deleted the string_functions branch May 8, 2023 18:29
@andyfengHKU andyfengHKU mentioned this pull request May 29, 2023
14 tasks
2 participants