Skip to content

Commit

Permalink
Update string limit to 256KB for non-primary-key strings
Browse files Browse the repository at this point in the history
  • Loading branch information
benjaminwinger committed Nov 20, 2023
1 parent 072c256 commit 07e854d
Show file tree
Hide file tree
Showing 14 changed files with 49 additions and 53 deletions.
2 changes: 1 addition & 1 deletion dataset/copy-fault-tests/long-string/vPerson.csv

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions dataset/copy-fault-tests/long-string/vPersonPK.csv
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
AliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAliceAlice,AliceAlice,1
2,Bob,2
8 changes: 7 additions & 1 deletion src/common/exception/message.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -21,8 +21,14 @@ std::string ExceptionMessage::invalidPKType(const std::string& type) {
type);
}

std::string ExceptionMessage::overLargeStringPKValueException(uint64_t length) {
return stringFormat("The maximum length of primary key strings is 4096 bytes. The input "
"string's length was {}.",
length);
}
std::string ExceptionMessage::overLargeStringValueException(uint64_t length) {
return stringFormat("Maximum length of strings is 4096. Input string's length is {}.", length);
return stringFormat(
"The maximum length of strings is 262144 bytes. The input string's length was {}.", length);
}

std::string ExceptionMessage::violateUniquenessOfRelAdjColumn(const std::string& tableName,
Expand Down
13 changes: 13 additions & 0 deletions src/common/vector/value_vector.cpp
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
#include "common/vector/value_vector.h"

#include "common/exception/message.h"
#include "common/null_buffer.h"
#include "common/types/value/nested.h"
#include "common/types/value/value.h"
Expand Down Expand Up @@ -427,6 +428,9 @@ void StringVector::addString(ValueVector* vector, uint32_t vectorPos, ku_string_
if (ku_string_t::isShortString(srcStr.len)) {
dstStr.setShortString(srcStr);
} else {
if (srcStr.len > BufferPoolConstants::PAGE_256KB_SIZE) {
throw RuntimeException(ExceptionMessage::overLargeStringValueException(srcStr.len));

Check warning on line 432 in src/common/vector/value_vector.cpp

View check run for this annotation

Codecov / codecov/patch

src/common/vector/value_vector.cpp#L432

Added line #L432 was not covered by tests
}
dstStr.overflowPtr = reinterpret_cast<uint64_t>(stringBuffer->allocateOverflow(srcStr.len));
dstStr.setLongString(srcStr);
}
Expand All @@ -440,6 +444,9 @@ void StringVector::addString(
if (ku_string_t::isShortString(length)) {
dstStr.setShortString(srcStr, length);
} else {
if (length > BufferPoolConstants::PAGE_256KB_SIZE) {
throw RuntimeException(ExceptionMessage::overLargeStringValueException(length));
}
dstStr.overflowPtr = reinterpret_cast<uint64_t>(stringBuffer->allocateOverflow(length));
dstStr.setLongString(srcStr, length);
}
Expand All @@ -462,6 +469,9 @@ void StringVector::addString(ValueVector* vector, ku_string_t& dstStr, ku_string
if (ku_string_t::isShortString(srcStr.len)) {
dstStr.setShortString(srcStr);
} else {
if (srcStr.len > BufferPoolConstants::PAGE_256KB_SIZE) {
throw RuntimeException(ExceptionMessage::overLargeStringValueException(srcStr.len));

Check warning on line 473 in src/common/vector/value_vector.cpp

View check run for this annotation

Codecov / codecov/patch

src/common/vector/value_vector.cpp#L473

Added line #L473 was not covered by tests
}
dstStr.overflowPtr = reinterpret_cast<uint64_t>(stringBuffer->allocateOverflow(srcStr.len));
dstStr.setLongString(srcStr);
}
Expand All @@ -474,6 +484,9 @@ void StringVector::addString(
if (ku_string_t::isShortString(length)) {
dstStr.setShortString(srcStr, length);
} else {
if (length > BufferPoolConstants::PAGE_256KB_SIZE) {
throw RuntimeException(ExceptionMessage::overLargeStringValueException(length));
}
dstStr.overflowPtr = reinterpret_cast<uint64_t>(stringBuffer->allocateOverflow(length));
dstStr.setLongString(srcStr, length);
}
Expand Down
4 changes: 0 additions & 4 deletions src/function/cast_from_string_functions.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,6 @@
#include "common/exception/parser.h"
#include "common/string_format.h"
#include "common/types/blob.h"
#include "storage/store/table_copy_utils.h"
#include "utf8proc_wrapper.h"

using namespace kuzu::common;
Expand Down Expand Up @@ -130,7 +129,6 @@ template<>
void CastStringHelper::cast(const char* input, uint64_t len, blob_t& /*result*/,
ValueVector* vector, uint64_t rowToAdd, const CSVReaderConfig* /*csvReaderConfig*/) {
// base case: blob
storage::TableCopyUtils::validateStrLen(len);
auto blobBuffer = std::make_unique<uint8_t[]>(len);
auto blobLen = Blob::fromString(input, len, blobBuffer.get());
BlobVector::addBlob(vector, rowToAdd, blobBuffer.get(), blobLen);
Expand Down Expand Up @@ -712,7 +710,6 @@ static bool tryCastUnionField(
testAndSetValue(vector, rowToAdd, result, success);
} break;
case LogicalTypeID::STRING: {
storage::TableCopyUtils::validateStrLen(len);
if (!utf8proc::Utf8Proc::isValid(input, len)) {
throw CopyException{"Invalid UTF8-encoded string."};
}
Expand Down Expand Up @@ -838,7 +835,6 @@ void CastString::copyStringToVector(ValueVector* vector, uint64_t rowToAdd, std:
strVal.data(), strVal.length(), val, vector, rowToAdd, csvReaderConfig);
} break;
case LogicalTypeID::STRING: {
storage::TableCopyUtils::validateStrLen(strVal.length());
if (!utf8proc::Utf8Proc::isValid(strVal.data(), strVal.length())) {
throw CopyException{"Invalid UTF8-encoded string."};
}
Expand Down
1 change: 1 addition & 0 deletions src/include/common/exception/message.h
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@ struct ExceptionMessage {
static inline std::string notAllowCopyOnNonEmptyTableException() {
return "COPY commands can only be executed once on a table.";
}
static std::string overLargeStringPKValueException(uint64_t length);
static std::string overLargeStringValueException(uint64_t length);
static std::string violateUniquenessOfRelAdjColumn(const std::string& tableName,
const std::string& offset, const std::string& multiplicity, const std::string& direction);
Expand Down
15 changes: 0 additions & 15 deletions src/include/storage/store/table_copy_utils.h

This file was deleted.

7 changes: 0 additions & 7 deletions src/storage/local_storage/local_table.cpp
Original file line number Diff line number Diff line change
@@ -1,6 +1,5 @@
#include "storage/local_storage/local_table.h"

#include "common/exception/message.h"
#include "storage/local_storage/local_node_table.h"
#include "storage/local_storage/local_rel_table.h"
#include "storage/store/column.h"
Expand All @@ -18,12 +17,6 @@ void LocalVector::read(
void LocalVector::append(ValueVector* valueVector) {
KU_ASSERT(valueVector->state->selVector->selectedSize == 1);
auto pos = valueVector->state->selVector->selectedPositions[0];
if (valueVector->dataType.getPhysicalType() == PhysicalTypeID::STRING) {
auto kuStr = valueVector->getValue<ku_string_t>(pos);
if (kuStr.len > BufferPoolConstants::PAGE_4KB_SIZE) {
throw RuntimeException(ExceptionMessage::overLargeStringValueException(kuStr.len));
}
}
vector->copyFromVectorData(numValues, valueVector, pos);
numValues++;
}
Expand Down
4 changes: 4 additions & 0 deletions src/storage/storage_structure/in_mem_file.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
#include <mutex>

#include "common/exception/copy.h"
#include "common/exception/message.h"
#include "common/type_utils.h"
#include "common/types/value/nested.h"

Expand Down Expand Up @@ -56,6 +57,9 @@ ku_string_t InMemOverflowFile::appendString(const char* rawString) {
std::memcpy(result.prefix, rawString,
length <= ku_string_t::SHORT_STR_LENGTH ? length : ku_string_t::PREFIX_LENGTH);
if (length > ku_string_t::SHORT_STR_LENGTH) {
if (length > BufferPoolConstants::PAGE_4KB_SIZE) {
throw CopyException(ExceptionMessage::overLargeStringPKValueException(length));
}
std::unique_lock lck{lock};
// Allocate a new page if necessary.
if (nextOffsetInPageToAppend + length >= BufferPoolConstants::PAGE_4KB_SIZE) {
Expand Down
1 change: 0 additions & 1 deletion src/storage/store/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,6 @@ add_library(kuzu_storage_store
string_column.cpp
struct_column_chunk.cpp
struct_column.cpp
table_copy_utils.cpp
table_data.cpp
var_list_column_chunk.cpp
var_list_column.cpp)
Expand Down
19 changes: 0 additions & 19 deletions src/storage/store/table_copy_utils.cpp

This file was deleted.

11 changes: 10 additions & 1 deletion test/test_files/copy/copy_long_string.test
Original file line number Diff line number Diff line change
Expand Up @@ -10,4 +10,13 @@
---- ok
-STATEMENT copy person1 from '${KUZU_ROOT_DIRECTORY}/dataset/copy-fault-tests/long-string/vPerson.csv';
---- error
Copy exception: Maximum length of strings is 4096. Input string's length is 5625.
Runtime exception: The maximum length of strings is 262144 bytes. The input string's length was 270000.

-CASE LongStringPKError

-LOG LongStringPKError
-STATEMENT create node table person1 (ID STRING, fName STRING, gender INT64, PRIMARY KEY (ID));
---- ok
-STATEMENT copy person1 from '${KUZU_ROOT_DIRECTORY}/dataset/copy-fault-tests/long-string/vPersonPK.csv';
---- error
Copy exception: The maximum length of primary key strings is 4096 bytes. The input string's length was 4100.
2 changes: 1 addition & 1 deletion test/test_files/transaction/set/set_transaction.test
Original file line number Diff line number Diff line change
Expand Up @@ -57,6 +57,6 @@ Alice
-LOG Set Very Long String Errors Test
-STATEMENT BEGIN TRANSACTION;
---- ok
-DEFINE test_long_string REPEAT 4096 "a"
-DEFINE test_long_string REPEAT 262144 "a"
-STATEMENT MATCH (a:person) WHERE a.ID=0 SET a.fName='${test_long_string}';
---- ok
13 changes: 10 additions & 3 deletions test/test_files/update_node/set_tinysnb.test
Original file line number Diff line number Diff line change
Expand Up @@ -74,13 +74,20 @@ False
---- 1
abcdefghijklmnopqrstuvwxyz

-CASE SetLongListTest
-DEFINE STRING_EXCEEDS_PAGE ARANGE 0 5990
-STATEMENT BEGIN TRANSACTION
---- ok
-STATEMENT MATCH (a:person) WHERE a.ID=0 SET a.fName="${STRING_EXCEEDS_PAGE}"
---- ok

-CASE SetVeryLongListErrorsTest
-DEFINE STRING_EXCEEDS_OVERFLOW ARANGE 0 5990
-DEFINE STRING_EXCEEDS_MEMORY_MANAGER_LIMIT ARANGE 0 50000
-STATEMENT BEGIN TRANSACTION
---- ok
-STATEMENT MATCH (a:person) WHERE a.ID=0 SET a.fName="${STRING_EXCEEDS_OVERFLOW}"
-STATEMENT MATCH (a:person) WHERE a.ID=0 SET a.fName="${STRING_EXCEEDS_MEMORY_MANAGER_LIMIT}"
---- error
Runtime exception: Maximum length of strings is 4096. Input string's length is 28846.
Runtime exception: The maximum length of strings is 262144 bytes. The input string's length was 288897.

-CASE SetNodeIntervalPropTest
-STATEMENT MATCH (a:person) WHERE a.ID=0 SET a.lastJobDuration=interval('1 years 1 days')
Expand Down

0 comments on commit 07e854d

Please sign in to comment.