Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update string limit to 256KB for non-primary-key strings #2472

Merged
merged 1 commit into from
Nov 21, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion dataset/copy-fault-tests/long-string/vPerson.csv

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions dataset/copy-fault-tests/long-string/vPersonPK.csv
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
AliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAlice,AliceAlice,1
2,Bob,2
8 changes: 7 additions & 1 deletion src/common/exception/message.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -21,8 +21,14 @@ std::string ExceptionMessage::invalidPKType(const std::string& type) {
type);
}

std::string ExceptionMessage::overLargeStringPKValueException(uint64_t length) {
return stringFormat("The maximum length of primary key strings is 4096 bytes. The input "
"string's length was {}.",
length);
}
std::string ExceptionMessage::overLargeStringValueException(uint64_t length) {
return stringFormat("Maximum length of strings is 4096. Input string's length is {}.", length);
return stringFormat(
"The maximum length of strings is 262144 bytes. The input string's length was {}.", length);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe just say 256KB? so it's more readable.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking it's more comparable to the length, which we're showing in bytes. It might not be very easy to compare, for example, 262145 bytes to 256KB at a glance, but if we list the exact number of bytes you can do so more easily.

}

std::string ExceptionMessage::violateUniquenessOfRelAdjColumn(const std::string& tableName,
Expand Down
13 changes: 13 additions & 0 deletions src/common/vector/value_vector.cpp
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
#include "common/vector/value_vector.h"

#include "common/exception/message.h"
#include "common/null_buffer.h"
#include "common/types/value/nested.h"
#include "common/types/value/value.h"
Expand Down Expand Up @@ -427,6 +428,9 @@
if (ku_string_t::isShortString(srcStr.len)) {
dstStr.setShortString(srcStr);
} else {
if (srcStr.len > BufferPoolConstants::PAGE_256KB_SIZE) {
throw RuntimeException(ExceptionMessage::overLargeStringValueException(srcStr.len));

Check warning on line 432 in src/common/vector/value_vector.cpp

View check run for this annotation

Codecov / codecov/patch

src/common/vector/value_vector.cpp#L432

Added line #L432 was not covered by tests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

codecov says this is not covered, is it a false positive?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess only two of the addString variants are covered by the long string tests. I'm not sure how to trigger the others via an end-to-end test.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one seems to be used by addBlob, which is used in the rdf reader.

The other one that's missed seems to be used in encode_function.h and decode_function.h

}
dstStr.overflowPtr = reinterpret_cast<uint64_t>(stringBuffer->allocateOverflow(srcStr.len));
dstStr.setLongString(srcStr);
}
Expand All @@ -440,6 +444,9 @@
if (ku_string_t::isShortString(length)) {
dstStr.setShortString(srcStr, length);
} else {
if (length > BufferPoolConstants::PAGE_256KB_SIZE) {
throw RuntimeException(ExceptionMessage::overLargeStringValueException(length));
}
dstStr.overflowPtr = reinterpret_cast<uint64_t>(stringBuffer->allocateOverflow(length));
dstStr.setLongString(srcStr, length);
}
Expand All @@ -462,6 +469,9 @@
if (ku_string_t::isShortString(srcStr.len)) {
dstStr.setShortString(srcStr);
} else {
if (srcStr.len > BufferPoolConstants::PAGE_256KB_SIZE) {
throw RuntimeException(ExceptionMessage::overLargeStringValueException(srcStr.len));

Check warning on line 473 in src/common/vector/value_vector.cpp

View check run for this annotation

Codecov / codecov/patch

src/common/vector/value_vector.cpp#L473

Added line #L473 was not covered by tests
}
dstStr.overflowPtr = reinterpret_cast<uint64_t>(stringBuffer->allocateOverflow(srcStr.len));
dstStr.setLongString(srcStr);
}
Expand All @@ -474,6 +484,9 @@
if (ku_string_t::isShortString(length)) {
dstStr.setShortString(srcStr, length);
} else {
if (length > BufferPoolConstants::PAGE_256KB_SIZE) {
throw RuntimeException(ExceptionMessage::overLargeStringValueException(length));
}
dstStr.overflowPtr = reinterpret_cast<uint64_t>(stringBuffer->allocateOverflow(length));
dstStr.setLongString(srcStr, length);
}
Expand Down
4 changes: 0 additions & 4 deletions src/function/cast_from_string_functions.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,6 @@
#include "common/exception/parser.h"
#include "common/string_format.h"
#include "common/types/blob.h"
#include "storage/store/table_copy_utils.h"
#include "utf8proc_wrapper.h"

using namespace kuzu::common;
Expand Down Expand Up @@ -130,7 +129,6 @@ template<>
void CastStringHelper::cast(const char* input, uint64_t len, blob_t& /*result*/,
ValueVector* vector, uint64_t rowToAdd, const CSVReaderConfig* /*csvReaderConfig*/) {
// base case: blob
storage::TableCopyUtils::validateStrLen(len);
auto blobBuffer = std::make_unique<uint8_t[]>(len);
auto blobLen = Blob::fromString(input, len, blobBuffer.get());
BlobVector::addBlob(vector, rowToAdd, blobBuffer.get(), blobLen);
Expand Down Expand Up @@ -712,7 +710,6 @@ static bool tryCastUnionField(
testAndSetValue(vector, rowToAdd, result, success);
} break;
case LogicalTypeID::STRING: {
storage::TableCopyUtils::validateStrLen(len);
if (!utf8proc::Utf8Proc::isValid(input, len)) {
throw CopyException{"Invalid UTF8-encoded string."};
}
Expand Down Expand Up @@ -838,7 +835,6 @@ void CastString::copyStringToVector(ValueVector* vector, uint64_t rowToAdd, std:
strVal.data(), strVal.length(), val, vector, rowToAdd, csvReaderConfig);
} break;
case LogicalTypeID::STRING: {
storage::TableCopyUtils::validateStrLen(strVal.length());
if (!utf8proc::Utf8Proc::isValid(strVal.data(), strVal.length())) {
throw CopyException{"Invalid UTF8-encoded string."};
}
Expand Down
1 change: 1 addition & 0 deletions src/include/common/exception/message.h
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@ struct ExceptionMessage {
static inline std::string notAllowCopyOnNonEmptyTableException() {
return "COPY commands can only be executed once on a table.";
}
static std::string overLargeStringPKValueException(uint64_t length);
static std::string overLargeStringValueException(uint64_t length);
static std::string violateUniquenessOfRelAdjColumn(const std::string& tableName,
const std::string& offset, const std::string& multiplicity, const std::string& direction);
Expand Down
15 changes: 0 additions & 15 deletions src/include/storage/store/table_copy_utils.h

This file was deleted.

7 changes: 0 additions & 7 deletions src/storage/local_storage/local_table.cpp
Original file line number Diff line number Diff line change
@@ -1,6 +1,5 @@
#include "storage/local_storage/local_table.h"

#include "common/exception/message.h"
#include "storage/local_storage/local_node_table.h"
#include "storage/local_storage/local_rel_table.h"
#include "storage/store/column.h"
Expand All @@ -18,12 +17,6 @@ void LocalVector::read(
void LocalVector::append(ValueVector* valueVector) {
KU_ASSERT(valueVector->state->selVector->selectedSize == 1);
auto pos = valueVector->state->selVector->selectedPositions[0];
if (valueVector->dataType.getPhysicalType() == PhysicalTypeID::STRING) {
auto kuStr = valueVector->getValue<ku_string_t>(pos);
if (kuStr.len > BufferPoolConstants::PAGE_4KB_SIZE) {
throw RuntimeException(ExceptionMessage::overLargeStringValueException(kuStr.len));
}
}
vector->copyFromVectorData(numValues, valueVector, pos);
numValues++;
}
Expand Down
4 changes: 4 additions & 0 deletions src/storage/storage_structure/in_mem_file.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
#include <mutex>

#include "common/exception/copy.h"
#include "common/exception/message.h"
#include "common/type_utils.h"
#include "common/types/value/nested.h"

Expand Down Expand Up @@ -56,6 +57,9 @@ ku_string_t InMemOverflowFile::appendString(const char* rawString) {
std::memcpy(result.prefix, rawString,
length <= ku_string_t::SHORT_STR_LENGTH ? length : ku_string_t::PREFIX_LENGTH);
if (length > ku_string_t::SHORT_STR_LENGTH) {
if (length > BufferPoolConstants::PAGE_4KB_SIZE) {
throw CopyException(ExceptionMessage::overLargeStringPKValueException(length));
}
std::unique_lock lck{lock};
// Allocate a new page if necessary.
if (nextOffsetInPageToAppend + length >= BufferPoolConstants::PAGE_4KB_SIZE) {
Expand Down
1 change: 0 additions & 1 deletion src/storage/store/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,6 @@ add_library(kuzu_storage_store
string_column.cpp
struct_column_chunk.cpp
struct_column.cpp
table_copy_utils.cpp
table_data.cpp
var_list_column_chunk.cpp
var_list_column.cpp)
Expand Down
19 changes: 0 additions & 19 deletions src/storage/store/table_copy_utils.cpp

This file was deleted.

1 change: 1 addition & 0 deletions test/answers/long_string_valid.csv

Large diffs are not rendered by default.

20 changes: 19 additions & 1 deletion test/test_files/copy/copy_long_string.test
Original file line number Diff line number Diff line change
Expand Up @@ -5,9 +5,27 @@

-CASE LongStringError

-LOG LongStringSuccess
-STATEMENT create node table person0 (ID SERIAL, fName STRING, PRIMARY KEY (ID));
---- ok
-STATEMENT copy person0 from '${KUZU_ROOT_DIRECTORY}/test/answers/long_string_valid.csv';
---- ok
-STATEMENT match (p:person0) return p.fName;
---- 1
<FILE>:long_string_valid.csv

-LOG LongStringError
-STATEMENT create node table person1 (ID INT64, fName STRING, gender INT64, PRIMARY KEY (ID));
---- ok
-STATEMENT copy person1 from '${KUZU_ROOT_DIRECTORY}/dataset/copy-fault-tests/long-string/vPerson.csv';
---- error
Copy exception: Maximum length of strings is 4096. Input string's length is 5625.
Runtime exception: The maximum length of strings is 262144 bytes. The input string's length was 270000.

-CASE LongStringPKError

-LOG LongStringPKError
-STATEMENT create node table person1 (ID STRING, fName STRING, gender INT64, PRIMARY KEY (ID));
---- ok
-STATEMENT copy person1 from '${KUZU_ROOT_DIRECTORY}/dataset/copy-fault-tests/long-string/vPersonPK.csv';
---- error
Copy exception: The maximum length of primary key strings is 4096 bytes. The input string's length was 4100.
2 changes: 1 addition & 1 deletion test/test_files/transaction/set/set_transaction.test
Original file line number Diff line number Diff line change
Expand Up @@ -57,6 +57,6 @@ Alice
-LOG Set Very Long String Errors Test
-STATEMENT BEGIN TRANSACTION;
---- ok
-DEFINE test_long_string REPEAT 4096 "a"
-DEFINE test_long_string REPEAT 262144 "a"
-STATEMENT MATCH (a:person) WHERE a.ID=0 SET a.fName='${test_long_string}';
---- ok
16 changes: 13 additions & 3 deletions test/test_files/update_node/set_tinysnb.test
Original file line number Diff line number Diff line change
Expand Up @@ -74,13 +74,23 @@ False
---- 1
abcdefghijklmnopqrstuvwxyz

-CASE SetLongListTest
-DEFINE STRING_EXCEEDS_PAGE ARANGE 0 5990
-STATEMENT BEGIN TRANSACTION
---- ok
-STATEMENT MATCH (a:person) WHERE a.ID=0 SET a.fName="${STRING_EXCEEDS_PAGE}"
---- ok
-STATEMENT MATCH (a:person) WHERE a.ID=0 RETURN a.fName;
---- 1
${STRING_EXCEEDS_PAGE}

-CASE SetVeryLongListErrorsTest
-DEFINE STRING_EXCEEDS_OVERFLOW ARANGE 0 5990
-DEFINE STRING_EXCEEDS_MEMORY_MANAGER_LIMIT ARANGE 0 50000
-STATEMENT BEGIN TRANSACTION
---- ok
-STATEMENT MATCH (a:person) WHERE a.ID=0 SET a.fName="${STRING_EXCEEDS_OVERFLOW}"
-STATEMENT MATCH (a:person) WHERE a.ID=0 SET a.fName="${STRING_EXCEEDS_MEMORY_MANAGER_LIMIT}"
---- error
Runtime exception: Maximum length of strings is 4096. Input string's length is 28846.
Runtime exception: The maximum length of strings is 262144 bytes. The input string's length was 288897.

-CASE SetNodeIntervalPropTest
-STATEMENT MATCH (a:person) WHERE a.ID=0 SET a.lastJobDuration=interval('1 years 1 days')
Expand Down