String utf8 test #3287

Merged
merged 4 commits into from
Apr 16, 2024
Changes from 3 commits
2 changes: 1 addition & 1 deletion src/include/function/string/functions/ltrim_function.h
@@ -21,7 +21,7 @@ struct Ltrim {
break;
}
}
- memcpy(data, data + counter, len - counter);
+ memmove(data, data + counter, len - counter);
return len - counter;
}
};
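
Note on the change above: the copy shifts the trimmed string toward the start of the same buffer, so the source and destination ranges overlap whenever fewer than half of the bytes are removed. `memcpy` has undefined behavior for overlapping ranges, while `memmove` is specified to handle them. The following is a minimal sketch of that pattern, not the project's code: `ltrimInPlace` is a hypothetical stand-in for the `Ltrim` helper, and it assumes only ASCII spaces are stripped.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical stand-in for the Ltrim helper in the diff: strips leading
// ASCII spaces in place and returns the new length in bytes.
static uint32_t ltrimInPlace(char* data, uint32_t len) {
    uint32_t counter = 0;
    while (counter < len && data[counter] == ' ') {
        ++counter;
    }
    // Destination [0, len - counter) and source [counter, len) overlap
    // whenever counter < len - counter, so memmove is used instead of memcpy.
    std::memmove(data, data + counter, len - counter);
    return len - counter;
}
```

Multi-byte UTF-8 input makes the overlap more likely in practice, because a short run of leading spaces is followed by many bytes of payload.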
1 change: 1 addition & 0 deletions test/include/test_runner/test_parser.h
@@ -90,6 +90,7 @@ class TestParser {

void openFile();
void tokenize();
std::vector<std::string> splitString();
void parseHeader();
void parseBody();
void extractExpectedResult(TestStatement* statement);
121 changes: 121 additions & 0 deletions test/test_files/function/list_of_string_utf8.test
@@ -0,0 +1,121 @@
-GROUP TinySnbReadTest2
-DATASET CSV empty

--

-CASE ListOfStringFunctionUTF8
-STATEMENT RETURN list_extract(["成績評価","の甘","業が"], 2)
---- 1
の甘

-STATEMENT RETURN list_element(["成績評価","の甘","業が"], 1)
---- 1
成績評価

-STATEMENT RETURN list_concat(["成績評価","の甘","業が"], ["这是中文","的语句"])
---- 1
[成績評価,の甘,業が,这是中文,的语句]

-STATEMENT RETURN list_cat(["成績評価","の甘","業が"], ["这是中文","的语句"])
---- 1
[成績評価,の甘,業が,这是中文,的语句]

-STATEMENT RETURN array_cat(["成績評価","の甘","業が"], ["这是中文","的语句"])
---- 1
[成績評価,の甘,業が,这是中文,的语句]

-STATEMENT RETURN array_concat(["成績評価","の甘","業が"], ["这是中文","的语句"])
---- 1
[成績評価,の甘,業が,这是中文,的语句]

-STATEMENT RETURN list_append(["成績評価","の甘","業が", "这是中文"], "的语句")
---- 1
[成績評価,の甘,業が,这是中文,的语句]

-STATEMENT RETURN array_append(["成績評価","の甘","業が", "这是中文"], "的语句")
---- 1
[成績評価,の甘,業が,这是中文,的语句]

-STATEMENT RETURN array_push_back(["成績評価","の甘","業が", "这是中文"], "的语句")
---- 1
[成績評価,の甘,業が,这是中文,的语句]

-STATEMENT RETURN list_prepend(["成績評価","の甘","業が", "这是中文"], "的语句")
---- 1
[的语句,成績評価,の甘,業が,这是中文]

-STATEMENT RETURN array_prepend(["成績評価","の甘","業が", "这是中文"], "的语句")
---- 1
[的语句,成績評価,の甘,業が,这是中文]

-STATEMENT RETURN array_push_front(["成績評価","の甘","業が", "这是中文"], "的语句")
---- 1
[的语句,成績評価,の甘,業が,这是中文]

-STATEMENT RETURN list_position(["成績評価","の甘","業が", "这是中文"], "这是中文")
---- 1
4
-STATEMENT RETURN list_indexof(["成績評価","の甘","業が", "这是中文"], "这是中文")
---- 1
4

-STATEMENT RETURN array_position(["成績評価","の甘","業が", "这是中文"], "这是中文")
---- 1
4
-STATEMENT RETURN array_indexof(["成績評価","の甘","業が", "这是中文"], "这是中文")
---- 1
4

-STATEMENT RETURN list_contains(["成績評価","の甘","業が", "这是中文"], "这是中文")
---- 1
True

-STATEMENT RETURN array_has(["成績評価","の甘","業が", "这是中文"], "这是中文吗")
---- 1
False

-STATEMENT RETURN list_slice(["成績評価","の甘","業が", "这是中文"], 1, 2)
---- 1
[成績評価]

-STATEMENT RETURN array_slice(["成績評価","の甘","業が", "这是中文"], 1, 4)
---- 1
[成績評価,の甘,業が]

-STATEMENT RETURN list_reverse(["成績評価","の甘","業が", "这是中文"])
---- 1
[这是中文,業が,の甘,成績評価]

-STATEMENT RETURN list_sort(["成績評価","の甘","業が", "这是中文"])
---- 1
[の甘,成績評価,業が,这是中文]

-STATEMENT RETURN list_reverse_sort(["成績評価","の甘","業が", "这是中文"])
---- 1
[这是中文,業が,成績評価,の甘]


-STATEMENT RETURN list_sum(["成績評価","の甘","業が", "这是中文"])
---- error
Binder exception: Unsupported inner data type for LIST_SUM: STRING


-STATEMENT RETURN list_sum(["toronto","waterloo"])
---- error
Binder exception: Unsupported inner data type for LIST_SUM: STRING

-STATEMENT RETURN list_product(["成績評価","の甘","業が", "这是中文"])
---- error
Binder exception: Unsupported inner data type for LIST_PRODUCT: STRING

-STATEMENT RETURN list_distinct(["成績評価","成績評価","成績評価", "这是中文"])
---- 1
[成績評価,这是中文]

-STATEMENT RETURN list_unique(["成績評価","成績評価","成績評価", "这是中文"])
---- 1
2

-STATEMENT RETURN list_any_value([null, "成績評価","成績評価","成績評価", "这是中文"])
---- 1
成績評価
172 changes: 172 additions & 0 deletions test/test_files/function/string_utf8.test
@@ -0,0 +1,172 @@
-GROUP TinySnbReadTest
-DATASET CSV tinysnb

--

-CASE StringFunctionUTF8
-LOG StrAddOperation
-STATEMENT MATCH (a:movies) RETURN a.name + "suffix"
---- 3
Sóló cón tu párejâsuffix
The 😂😃🧘🏻‍♂️🌍🌦️🍞🚗 moviesuffix
Romasuffix

-LOG StrAdd
-STATEMENT return string("The 😂😃🧘🏻‍♂️🌍🌦️🍞🚗 movies") + string("成績評価の甘い授業が高く評価");
---- 1
The 😂😃🧘🏻‍♂️🌍🌦️🍞🚗 movies成績評価の甘い授業が高く評価

-LOG StrConcat
-STATEMENT return concat(string("The 😂😃🧘🏻‍♂️🌍🌦️🍞🚗 movies"),string("成績評価の甘い授業が高く評価"));
---- 1
The 😂😃🧘🏻‍♂️🌍🌦️🍞🚗 movies成績評価の甘い授業が高く評価

-LOG StrEndsWith
-STATEMENT return ends_with(string("The 😂😃🧘🏻‍♂️🌍🌦️🍞🚗 movies"),string("🍞🚗 movies"));
---- 1
True
-STATEMENT return ends_with(string("The 😂😃🧘🏻‍♂️🌍🌦️🍞🚗 movies"),string("成績評価の甘い授業が高く評価"));
---- 1
False

-LOG StrLower
-STATEMENT MATCH (m:movies) RETURN lower(m.name)
---- 3
sóló cón tu párejâ
the 😂😃🧘🏻‍♂️🌍🌦️🍞🚗 movie
roma

-LOG StrLcase
-STATEMENT MATCH (m:movies) RETURN lcase(m.name)
---- 3
sóló cón tu párejâ
the 😂😃🧘🏻‍♂️🌍🌦️🍞🚗 movie
roma

-LOG StrLeft
-STATEMENT MATCH (m:movies) RETURN left(m.name, 6)
---- 3
Sóló c
The 😂😃
Roma

-LOG StrLevenshtein
-STATEMENT return levenshtein('成績評価の甘い授業が高く評価', '成績評価の甘い授業が高く‍');
---- 1
6

-LOG StrSize
-STATEMENT return size('abc');
---- 1
3
-STATEMENT return size('成績評価の甘い授業が高く評価');
---- 1
14

-LOG StrLpad
-STATEMENT RETURN lpad(string('成績評価'), 10, "<")
---- 1
<<<<<<成績評価

-LOG strReverse
-STATEMENT RETURN reverse('成績評価の甘い授業が高く評価')
---- 1
価評く高が業授い甘の価評績成

-LOG strltrim
-STATEMENT RETURN ltrim(' 😃🧘🏻‍♂️🌍🌦️🍞🚗')
---- 1
😃🧘🏻‍♂️🌍🌦️🍞🚗

-LOG strprefix
-STATEMENT RETURN prefix('😃🧘🏻‍♂️🌍🌦️🍞🚗','😃🧘🏻‍')
---- 1
True

-LOG strrepeat
-STATEMENT RETURN repeat('😃🧘🏻‍♂️🌍🌦️🍞🚗',3)
---- 1
😃🧘🏻‍♂️🌍🌦️🍞🚗😃🧘🏻‍♂️🌍🌦️🍞🚗😃🧘🏻‍♂️🌍🌦️🍞🚗

-LOG strRight
-STATEMENT RETURN right('😃🧘🏻‍♂️🌍🌦️🍞🚗', 3)
---- 1
🌦️🍞🚗

-LOG strRpad
-STATEMENT RETURN rpad('😃🌍♂', 5,'<')
---- 1
😃🌍♂<<
-STATEMENT RETURN rpad('😃🌍♂🍞🚗', 7,'<')
---- 1
😃🌍♂🍞🚗<<
-STATEMENT RETURN rpad('😃🌍♂🌦🍞🚗', 7,'<')
---- 1
😃🌍♂🌦🍞🚗<

-LOG strstartwith
-STATEMENT RETURN starts_with('😃🧘🏻‍♂️🌍🌦️🍞🚗', '😃🧘🏻‍')
---- 1
True
-STATEMENT RETURN starts_with('成績評価の甘い授業が高く評価', '成績')
---- 1
True

-LOG strsubstring
-STATEMENT RETURN substring('😃🌍🌦️🍞🚗', 1,3)
---- 1
😃🌍🌦️
-STATEMENT RETURN substring('成績評価の甘い授業が高く評価', 1,3)
---- 1
成績評
-STATEMENT RETURN substring('😃🧘♂🌍', 1,3)
---- 1
😃🧘♂

-LOG strsubstr
-STATEMENT RETURN substr('成績評価の甘い授業が高く評価', 1,3)
---- 1
成績評

-LOG strsuffix
-STATEMENT RETURN suffix('成績評価の甘い授業が高く評価', '高く評価')
---- 1
True

-LOG strtrim
-STATEMENT RETURN trim(' 成績評価の甘い授業が高く評価')
---- 1
成績評価の甘い授業が高く評価
-STATEMENT RETURN trim(' 成 績 評価の甘い授業が高く評価')
---- 1
成 績 評価の甘い授業が高く評価


-LOG strupper
-STATEMENT MATCH (m:movies) RETURN upper(m.name)
---- 3
SÓLÓ CÓN TU PÁREJÂ
THE 😂😃🧘🏻‍♂️🌍🌦️🍞🚗 MOVIE
ROMA

-LOG strlower
-STATEMENT MATCH (m:movies) RETURN lower(m.name)
---- 3
sóló cón tu párejâ
the 😂😃🧘🏻‍♂️🌍🌦️🍞🚗 movie
roma


-LOG listfunc
-STATEMENT RETURN list_element("成績評価の甘い授業が高く評価", 3)
---- 1
-STATEMENT RETURN list_extract("成績評価の甘い授業が高く評価", 4)
---- 1
-STATEMENT RETURN array_slice("成績評価の甘い授業が高く評価", 1, 3)
---- 1
成績評
-STATEMENT RETURN array_extract("成績評価の甘い授業が高く評価", 4)
---- 1
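
Note on the expected values in this file: `size('abc')` is 3 and `size('成績評価の甘い授業が高く評価')` is 14, although the second string occupies 42 bytes in UTF-8, so lengths are evidently counted in characters rather than bytes. The sketch below reproduces those two numbers by counting code points. It is an illustration only: `countCodePoints` is a hypothetical name, well-formed UTF-8 is assumed, and the engine may well count grapheme clusters instead (the `right(..., 3)` case above suggests so), which this sketch does not model.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: counts UTF-8 code points by skipping continuation
// bytes, which always have the bit pattern 10xxxxxx.
// Assumes well-formed UTF-8 input; no validation is performed.
static std::size_t countCodePoints(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;  // lead byte or ASCII byte starts a new code point
        }
    }
    return n;
}
```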
14 changes: 13 additions & 1 deletion test/test_runner/test_parser.cpp
@@ -422,8 +422,20 @@ void TestParser::openFile() {
fileStream.open(path);
}

std::vector<std::string> TestParser::splitString() {
std::vector<std::string> matches;
std::regex re(R"((?:[^'"\s\\]+|'[^'\\]*(?:\\.[^'\\]*)*'|"[^"\\]*(?:\\.[^"\\]*)*"|\S+)+)");
auto wordsBegin = std::sregex_iterator(line.begin(), line.end(), re);
auto wordsEnd = std::sregex_iterator();
for (std::sregex_iterator i = wordsBegin; i != wordsEnd; ++i) {
std::smatch match = *i;
matches.push_back(match.str());
}
return matches;
}

void TestParser::tokenize() {
- currentToken.params = StringUtils::splitBySpace(line);
+ currentToken.params = splitString();
if ((currentToken.params.size() == 0) || (currentToken.params[0][0] == '#')) {
currentToken.type = TokenType::EMPTY;
} else {
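
Note on the tokenizer change: splitting on whitespace alone would break a quoted argument such as `' 😃'` into separate tokens, which is presumably why the new tests with leading spaces inside quotes need `splitString()`. The regex treats a single-quoted or double-quoted run (with backslash escapes) as part of the surrounding token. Below is a standalone sketch that applies the same pattern to an arbitrary line; `splitQuoted` is a hypothetical free function, whereas the PR version is a member that reads `line` from the parser.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical free-function variant of TestParser::splitString().
// Splits on whitespace, but keeps quoted substrings inside their token.
static std::vector<std::string> splitQuoted(const std::string& line) {
    // Same pattern as in the diff: bare characters, a '...' run, a "..." run,
    // or any other non-space run, repeated to form one token.
    static const std::regex re(
        R"((?:[^'"\s\\]+|'[^'\\]*(?:\\.[^'\\]*)*'|"[^"\\]*(?:\\.[^"\\]*)*"|\S+)+)");
    std::vector<std::string> tokens;
    for (auto it = std::sregex_iterator(line.begin(), line.end(), re);
         it != std::sregex_iterator(); ++it) {
        tokens.push_back(it->str());
    }
    return tokens;
}
```

With this pattern, `-STATEMENT RETURN trim(' a b')` yields three tokens, the last being `trim(' a b')` with its inner spaces preserved. An unterminated quote falls back to the `\S+` alternative, so such a line is still split on whitespace.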