-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[feature](function)support url domain functions #42488
base: master
Are you sure you want to change the base?
Conversation
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website. |
need to add fe fold const when adding new scalar function to fe, which can refer to: https://selectdb.feishu.cn/wiki/BGfGwgY2uiQrK7krGUVcYJFknVg?from=from_copylink |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
There were too many comments to post at once. Showing the first 25 out of 39. Check the log or trigger a new build to see more.
run buildall |
run buildall |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
requires(0 <= sizeof...(symbols) && sizeof...(symbols) <= 16) | ||
{ | ||
#if defined(__SSE4_2__) | ||
if (sizeof...(symbols) >= 5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: statement should be inside braces [readability-braces-around-statements]
if (sizeof...(symbols) >= 5) | |
if (sizeof...(symbols) >= 5) { |
be/src/vec/functions/url/find_symbols.h:344:
- else
+ } else
if (sizeof...(symbols) >= 5) | ||
return find_first_symbols_sse42<positive, return_mode, sizeof...(symbols), symbols...>( | ||
begin, end); | ||
else |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: statement should be inside braces [readability-braces-around-statements]
else | |
else { |
be/src/vec/functions/url/find_symbols.h:346:
- return find_first_symbols_sse2<positive, return_mode, symbols...>(begin, end);
+ return find_first_symbols_sse2<positive, return_mode, symbols...>(begin, end);
+ }
inline const char* find_first_symbols_dispatch(const std::string_view haystack, | ||
const SearchSymbols& symbols) { | ||
#if defined(__SSE4_2__) | ||
if (symbols.str.size() >= 5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: statement should be inside braces [readability-braces-around-statements]
if (symbols.str.size() >= 5) | |
if (symbols.str.size() >= 5) { |
be/src/vec/functions/url/find_symbols.h:356:
- else
+ } else
if (symbols.str.size() >= 5) | ||
return find_first_symbols_sse42<positive, return_mode>(haystack.begin(), haystack.end(), | ||
symbols); | ||
else |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: statement should be inside braces [readability-braces-around-statements]
else | |
else { |
be/src/vec/functions/url/find_symbols.h:359:
- haystack.begin(), haystack.end(), symbols.str.data(), symbols.str.size());
+ haystack.begin(), haystack.end(), symbols.str.data(), symbols.str.size());
+ }
while (pos < end) { | ||
const char* delimiter_or_end = find_first_symbols<symbols...>(pos, end); | ||
|
||
if (!token_compress || pos < delimiter_or_end) to.emplace_back(pos, delimiter_or_end - pos); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: statement should be inside braces [readability-braces-around-statements]
if (!token_compress || pos < delimiter_or_end) to.emplace_back(pos, delimiter_or_end - pos); | |
if (!token_compress || pos < delimiter_or_end) { to.emplace_back(pos, delimiter_or_end - pos); | |
} |
#include <cstring> | ||
|
||
#define TOTAL_KEYWORDS 5045 | ||
#define MIN_WORD_LENGTH 4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: macro 'MIN_WORD_LENGTH' defines an integral constant; prefer an enum instead [modernize-macro-to-enum]
#define MIN_WORD_LENGTH 4
^
|
||
#define TOTAL_KEYWORDS 5045 | ||
#define MIN_WORD_LENGTH 4 | ||
#define MAX_WORD_LENGTH 34 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: macro 'MAX_WORD_LENGTH' defines an integral constant; prefer an enum instead [modernize-macro-to-enum]
#define MAX_WORD_LENGTH 34
^
#define TOTAL_KEYWORDS 5045 | ||
#define MIN_WORD_LENGTH 4 | ||
#define MAX_WORD_LENGTH 34 | ||
#define MIN_HASH_VALUE 75 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: macro 'MIN_HASH_VALUE' defines an integral constant; prefer an enum instead [modernize-macro-to-enum]
#define MIN_HASH_VALUE 75
^
#define MIN_WORD_LENGTH 4 | ||
#define MAX_WORD_LENGTH 34 | ||
#define MIN_HASH_VALUE 75 | ||
#define MAX_HASH_VALUE 110600 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: macro 'MAX_HASH_VALUE' defines an integral constant; prefer an enum instead [modernize-macro-to-enum]
#define MAX_HASH_VALUE 110600
^
TeamCity be ut coverage result: |
struct ExtractTopLevelDomain { | ||
static size_t get_reserve_length_for_element() { return 5; } | ||
|
||
static void execute(const char* data, size_t size, const char*& res_data, size_t& res_size) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not return StringRef
, we better refactor the code
run buildall |
TeamCity be ut coverage result: |
#include <cstdint> | ||
#include <string> | ||
|
||
#if defined(__SSE2__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only define SSE4_2 enoungh. and here need add header
|
||
const auto* host_end = host_view.data() + host_view.size(); | ||
const char* last_dot = find_last_symbols_or_null<'.'>(host_view.data(), host_end); | ||
if (!last_dot) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if not find the last_dot
what should the res_data
and res_size
? DANGER ! it's may be UB
}; | ||
|
||
struct FirstSignificantSubdomainDefaultLookup { | ||
bool operator()(StringRef host) const { return tldLookup::isValid(host.data, host.size); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_valid
|
||
static void execute(const Pos data, const size_t size, Pos& res_data, size_t& res_size, | ||
Pos* out_domain_end = nullptr) { | ||
FirstSignificantSubdomainDefaultLookup loookup; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why here we need a class, not direct call the static func tldLooup::is_valid
Proposed changes
support top_level_domain/first_significant_subdomain/cut_to_first_significant_subdomain functions
doc: apache/doris-website#1230