List valueVector refactor #1503

Merged
merged 1 commit into from
May 4, 2023

Conversation

acquamarin
Collaborator

@acquamarin acquamarin commented May 1, 2023

This PR refactors the query-processing logic for the List data type.

  1. Refactors how we store lists in the valueVector:
    We use two vectors to store lists in the system:
    a. An offset vector: each entry is a ListEntry that holds the starting index and the length of one list within the data vector.
    b. A data vector: this holds the actual list elements in flat, contiguous storage. Each list is a contiguous subsequence of elements in this vector.
    E.g. we want to store [1,3] and [4,8,9] in an unflat valueVector.
    We first copy [1,3] into the dataVector and store the ListEntry with startOffset: 0, length: 2 in the offsetVector.
    Then we append [4,8,9] to the dataVector and store the ListEntry with startOffset: 2, length: 3 in the offsetVector.
    To access the second list of the listVector, we first read its ListEntry (startOffset: 2, length: 3). This tells us the second list is stored at offset 2 of the dataVector and has length 3.
  2. Todos:
    Storage refactor for the List data type.
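
The two-vector layout described above can be sketched roughly as follows. This is only an illustration under simplifying assumptions: `ListEntry` and `SimpleListVector` are hypothetical names, the elements are fixed to `int64_t`, and `std::vector` stands in for the real buffers; none of this is the actual code in the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the entry stored in the offset vector.
struct ListEntry {
    uint64_t offset; // start index of the list inside the data vector
    uint64_t size;   // number of elements in the list
};

// Hypothetical, simplified list vector that only holds int64 elements.
struct SimpleListVector {
    std::vector<ListEntry> offsetVector; // one entry per list
    std::vector<int64_t> dataVector;     // all elements, stored contiguously

    // Appends a list to the data vector and records where it landed.
    ListEntry append(const std::vector<int64_t>& list) {
        ListEntry entry{dataVector.size(), list.size()};
        dataVector.insert(dataVector.end(), list.begin(), list.end());
        offsetVector.push_back(entry);
        return entry;
    }

    // Reads the list at position pos by slicing the data vector.
    std::vector<int64_t> get(std::size_t pos) const {
        const ListEntry& entry = offsetVector.at(pos);
        auto first = dataVector.begin() + static_cast<std::ptrdiff_t>(entry.offset);
        auto last = first + static_cast<std::ptrdiff_t>(entry.size);
        return std::vector<int64_t>(first, last);
    }
};
```

Appending [1,3] and then [4,8,9] to this sketch leaves five elements in `dataVector`, and the second entry reads offset 2, size 3, matching the example in the description.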

@acquamarin acquamarin requested a review from ray6080 May 1, 2023 05:05
@codecov
codecov bot commented May 1, 2023

Codecov Report

Patch coverage: 93.91% and project coverage change: -0.36 ⚠️

Comparison is base (4eebacf) 92.23% compared to head (4906bf1) 91.87%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1503      +/-   ##
==========================================
- Coverage   92.23%   91.87%   -0.36%     
==========================================
  Files         676      677       +1     
  Lines       24314    24396      +82     
==========================================
- Hits        22425    22415      -10     
- Misses       1889     1981      +92     
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/common/in_mem_overflow_buffer_utils.cpp | 25.00% <ø> (-75.00%) | ⬇️ |
| src/function/vector_cast_operations.cpp | 83.50% <ø> (ø) | |
| src/include/common/in_mem_overflow_buffer_utils.h | 100.00% <ø> (ø) | |
| src/include/common/null_mask.h | 100.00% <ø> (ø) | |
| src/include/common/type_utils.h | 100.00% <ø> (+6.06%) | ⬆️ |
| ...c/include/expression_evaluator/literal_evaluator.h | 100.00% <ø> (ø) | |
| ...lude/function/boolean/boolean_operation_executor.h | 75.79% <ø> (ø) | |
| .../include/function/schema/vector_label_operations.h | 100.00% <ø> (ø) | |
| ...de/function/string/operations/base_str_operation.h | 45.45% <0.00%> (ø) | |
| ...include/function/string/vector_string_operations.h | 95.83% <ø> (ø) | |

... and 56 more

... and 8 files with indirect coverage changes


src/common/vector/value_vector.cpp (outdated, resolved)
} else {
copyNonNullDataWithSameType(srcVector.dataType,
srcVector.getData() + pos * srcVector.getNumBytesPerValue(), dstData,
dstOverflowBuffer);
}
}

void ValueVectorUtils::appendElementToList(
Contributor

Move this to list creation (as a static function in the .cpp file) if no other function needs it.

}
}

void ValueVectorUtils::copyElementOutFromListVector(ValueVector& listVector,
Contributor

Ditto: move this to unwind as a static function.

src/include/common/vector/value_vector.h (outdated, resolved)
src/include/common/vector/value_vector.h (outdated, resolved)
src/include/common/types/types.h (outdated, resolved)
src/include/common/types/types.h (outdated, resolved)
src/include/common/vector/value_vector.h (outdated, resolved)
src/common/type_utils.cpp (outdated, resolved)
src/common/vector/value_vector.cpp (outdated, resolved)
src/include/function/binary_operation_executor.h (outdated, resolved)
src/include/common/vector/value_vector.h (outdated, resolved)
src/include/common/vector/value_vector.h (outdated, resolved)
@acquamarin acquamarin force-pushed the list-vector branch 2 times, most recently from 48e4fd5 to 567b638 Compare May 4, 2023 18:05
src/common/vector/auxiliary_buffer.cpp (resolved)
src/include/common/type_utils.h (outdated, resolved)
src/include/common/type_utils.h (outdated, resolved)
src/common/type_utils.cpp (resolved)
}
} break;
case STRING: {
common::InMemOverflowBufferUtils::copyString(*(common::ku_string_t*)srcValue,
Contributor

Open an issue about removing InMemOverflowBufferUtils::copyString. I think this function may no longer be justified, since we always copy either from memory to a vector or from a vector to memory. If so, this logic could be written in the vector.

Collaborator Author

opened issue #1511

resultVector->setValue<T>(resultPos, val);
if (thenVector.dataType.typeID == common::VAR_LIST) {
auto srcListEntry = thenVector.getValue<list_entry_t>(thenPos);
list_entry_t resultEntry = ListVector::addList(resultVector.get(), srcListEntry.size);
Contributor

l97 - l100 seem to be unified logic for list entries, i.e. copying a list_entry from one list vector to another requires:

  1. l97: create the list entry in the result vector
  2. l98-99: copy the data vector
  3. l100: copy the list entry vector
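
One way to picture the three steps as a single helper is sketched below. Everything here is an assumption made for illustration: `copyListEntry`, `SimpleListVector`, and the `int64_t` element type are hypothetical, and the `addList` shown is a simplified stand-in for the `ListVector::addList` call in the diff, not its real signature.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for list_entry_t.
struct ListEntry {
    uint64_t offset;
    uint64_t size;
};

// Hypothetical, simplified list vector with int64 elements.
struct SimpleListVector {
    std::vector<ListEntry> entries; // list entry vector
    std::vector<int64_t> data;      // data vector

    // Reserves room for `size` elements at the end of the data vector.
    ListEntry addList(uint64_t size) {
        ListEntry entry{data.size(), size};
        data.resize(data.size() + size);
        return entry;
    }
};

// Copies the list at srcPos in src to position dstPos in dst.
void copyListEntry(const SimpleListVector& src, std::size_t srcPos,
    SimpleListVector& dst, std::size_t dstPos) {
    const ListEntry srcEntry = src.entries.at(srcPos);
    // Step 1: create the list entry in the result vector.
    const ListEntry dstEntry = dst.addList(srcEntry.size);
    // Step 2: copy the elements between the data vectors.
    std::copy_n(src.data.begin() + static_cast<std::ptrdiff_t>(srcEntry.offset),
        srcEntry.size, dst.data.begin() + static_cast<std::ptrdiff_t>(dstEntry.offset));
    // Step 3: store the new entry in the result's list entry vector.
    if (dst.entries.size() <= dstPos) {
        dst.entries.resize(dstPos + 1);
    }
    dst.entries[dstPos] = dstEntry;
}
```

Keeping the three steps in one function means callers such as the case expression evaluator would not need to repeat them inline.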

src/include/function/function_definition.h (outdated, resolved)
src/include/common/types/types.h (outdated, resolved)
}

static inline void resetOverflowBuffer(ValueVector* vector) {
if (vector->dataType.typeID == STRING) {
Contributor

assert STRING

Collaborator Author

We can't assert STRING here, because vectors of other types may call this function as well. That is why the
if (vector->dataType.typeID == STRING) { check is there.
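
The difference between the guard and an assertion can be illustrated with a small sketch. The types below (`TypeID`, `OverflowBuffer`, `SimpleVector`) are hypothetical stand-ins, and the assumption that only STRING vectors own an overflow buffer is made purely for this example.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical, reduced set of type IDs.
enum class TypeID { INT64, STRING };

// Hypothetical overflow buffer holding out-of-line string bytes.
struct OverflowBuffer {
    std::vector<char> bytes;
    void reset() { bytes.clear(); }
};

// Hypothetical vector: in this sketch only STRING vectors own a buffer.
struct SimpleVector {
    TypeID typeID;
    std::unique_ptr<OverflowBuffer> overflow;

    explicit SimpleVector(TypeID id) : typeID(id) {
        if (id == TypeID::STRING) {
            overflow.reset(new OverflowBuffer());
        }
    }
};

// Guarded version: callers may pass any vector, and non-STRING vectors
// are simply left untouched. An assert here would abort for them instead.
inline void resetOverflowBuffer(SimpleVector& vector) {
    if (vector.typeID == TypeID::STRING) {
        vector.overflow->reset();
    }
}
```

With the guard, generic code can call `resetOverflowBuffer` on every vector it handles without checking the type first.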

class StringVector {
public:
static inline InMemOverflowBuffer* getInMemOverflowBuffer(ValueVector* vector) {
return vector->dataType.typeID == STRING ?
Contributor

assert STRING

Collaborator Author

ditto

src/common/type_utils.cpp (resolved)
@acquamarin acquamarin merged commit 7f9d84c into master May 4, 2023
7 checks passed
@acquamarin acquamarin deleted the list-vector branch May 4, 2023 21:10
3 participants