Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update string limit to 256KB for non-primary-key strings #2472

Merged
merged 1 commit into from
Nov 21, 2023

Conversation

benjaminwinger
Copy link
Collaborator

@benjaminwinger benjaminwinger commented Nov 20, 2023

Needs some tests for the primary key string limits now that those are different from non-primary-key strings.

Because the limiting part is now where the value vector allocates space in-memory for the string, I've made that be the only place we check the maximum string length (excluding the primary key check that is). The other checks (e.g. in cast_from_string_functions.cpp) seem redundant now.

I've also added a similar check to InMemOverflowFile, though it seems a little out of place given that InMemOverflowFile wasn't designed to be just for primary keys, but in practice I think that's all it's now being used for. I'd prefer to have some sort of exception at the lowest level to make sure there always is some exception, however maybe that could be an exception specific to InMemOverflowFile and we could have an earlier exception in the hash index insert/append methods which is more user-friendly.

Copy link

codecov bot commented Nov 20, 2023

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (072c256) 91.48% compared to head (552766e) 91.48%.
Report is 3 commits behind head on master.

Files Patch % Lines
src/common/vector/value_vector.cpp 75.00% 2 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2472   +/-   ##
=======================================
  Coverage   91.48%   91.48%           
=======================================
  Files        1025     1024    -1     
  Lines       37646    37647    +1     
=======================================
+ Hits        34440    34442    +2     
+ Misses       3206     3205    -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@benjaminwinger benjaminwinger force-pushed the string-limits branch 3 times, most recently from 8610674 to dfcbf20 Compare November 20, 2023 19:33
@benjaminwinger benjaminwinger marked this pull request as ready for review November 20, 2023 19:33
std::string ExceptionMessage::overLargeStringValueException(uint64_t length) {
return stringFormat("Maximum length of strings is 4096. Input string's length is {}.", length);
return stringFormat(
"The maximum length of strings is 262144 bytes. The input string's length was {}.", length);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe just say 256KB? so it's more readable.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking it's more comparable to the length, which we're showing in bytes. It might not be very easy to compare, for example, 262145 bytes to 256KB at a glance, but if we list the exact number of bytes you can do so more easily.

@@ -427,6 +428,9 @@ void StringVector::addString(ValueVector* vector, uint32_t vectorPos, ku_string_
if (ku_string_t::isShortString(srcStr.len)) {
dstStr.setShortString(srcStr);
} else {
if (srcStr.len > BufferPoolConstants::PAGE_256KB_SIZE) {
throw RuntimeException(ExceptionMessage::overLargeStringValueException(srcStr.len));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

codecov says this is not covered, is it a false positive?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess only two of the addString variants are covered by the long string tests. I'm not sure how to trigger the others via an end-to-end test.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one seems to be used by addBlob, which is used in the rdf reader.

The other one that's missed seems to be used in encode_function.h and decode_function.h

@ray6080 ray6080 merged commit cd0530d into master Nov 21, 2023
12 checks passed
@ray6080 ray6080 deleted the string-limits branch November 21, 2023 01:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants