-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update string limit to 256KB for non-primary-key strings #2472
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #2472 +/- ##
=======================================
Coverage 91.48% 91.48%
=======================================
Files 1025 1024 -1
Lines 37646 37647 +1
=======================================
+ Hits 34440 34442 +2
+ Misses 3206 3205 -1 ☔ View full report in Codecov by Sentry. |
8610674
to
dfcbf20
Compare
dfcbf20
to
07e854d
Compare
07e854d
to
552766e
Compare
std::string ExceptionMessage::overLargeStringValueException(uint64_t length) { | ||
return stringFormat("Maximum length of strings is 4096. Input string's length is {}.", length); | ||
return stringFormat( | ||
"The maximum length of strings is 262144 bytes. The input string's length was {}.", length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe just say 256KB? so it's more readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking it's more comparable to the length, which we're showing in bytes. It might not be very easy to compare, for example, 262145 bytes to 256KB at a glance, but if we list the exact number of bytes you can do so more easily.
@@ -427,6 +428,9 @@ void StringVector::addString(ValueVector* vector, uint32_t vectorPos, ku_string_ | |||
if (ku_string_t::isShortString(srcStr.len)) { | |||
dstStr.setShortString(srcStr); | |||
} else { | |||
if (srcStr.len > BufferPoolConstants::PAGE_256KB_SIZE) { | |||
throw RuntimeException(ExceptionMessage::overLargeStringValueException(srcStr.len)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
codecov says this is not covered, is it a false positive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess only two of the addString variants are covered by the long string tests. I'm not sure how to trigger the others via an end-to-end test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one seems to be used by addBlob, which is used in the rdf reader.
The other one that's missed seems to be used in encode_function.h
and decode_function.h
Needs some tests for the primary key string limits now that those are different from non-primary-key strings.Because the limiting part is now where the value vector allocates space in-memory for the string, I've made that be the only place we check the maximum string length (excluding the primary key check that is). The other checks (e.g. in
cast_from_string_functions.cpp
) seem redundant now.I've also added a similar check to InMemOverflowFile, though it seems a little out of place given that InMemOverflowFile wasn't designed to be just for primary keys, but in practice I think that's all it's now being used for. I'd prefer to have some sort of exception at the lowest level to make sure there always is some exception, however maybe that could be an exception specific to InMemOverflowFile and we could have an earlier exception in the hash index insert/append methods which is more user-friendly.