-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
List valueVector refactor #1503
Conversation
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## master #1503 +/- ##
==========================================
- Coverage 92.23% 91.87% -0.36%
==========================================
Files 676 677 +1
Lines 24314 24396 +82
==========================================
- Hits 22425 22415 -10
- Misses 1889 1981 +92
☔ View full report in Codecov by Sentry. |
} else { | ||
copyNonNullDataWithSameType(srcVector.dataType, | ||
srcVector.getData() + pos * srcVector.getNumBytesPerValue(), dstData, | ||
dstOverflowBuffer); | ||
} | ||
} | ||
|
||
void ValueVectorUtils::appendElementToList( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move to list creation (as a static function in cpp) if no other function need it.
} | ||
} | ||
|
||
void ValueVectorUtils::copyElementOutFromListVector(ValueVector& listVector, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto move to unwind as a static function
48e4fd5
to
567b638
Compare
} | ||
} break; | ||
case STRING: { | ||
common::InMemOverflowBufferUtils::copyString(*(common::ku_string_t*)srcValue, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
open an issue about removing InMemOverflowBufferUtils::copyString. I think this function may not be justified anymore in the sense that we always copy from memory to vector or vector to memory. If so, this logic might be written in vector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
opened issue #1511
resultVector->setValue<T>(resultPos, val); | ||
if (thenVector.dataType.typeID == common::VAR_LIST) { | ||
auto srcListEntry = thenVector.getValue<list_entry_t>(thenPos); | ||
list_entry_t resultEntry = ListVector::addList(resultVector.get(), srcListEntry.size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
l97 - l100 seems to be a unified logic for list entry, i.e. copy a list_entry from one list vector to another list vector requires
- l97 create list entry in result vector
- l98-99 copy data vector
- l100 copy list entry vector
} | ||
|
||
static inline void resetOverflowBuffer(ValueVector* vector) { | ||
if (vector->dataType.typeID == STRING) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert STRING
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't assert string here, other vectors that are not strings may call this function as well. So we have:
if (vector->dataType.typeID == STRING) {
check there.
class StringVector { | ||
public: | ||
static inline InMemOverflowBuffer* getInMemOverflowBuffer(ValueVector* vector) { | ||
return vector->dataType.typeID == STRING ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert STRING
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
This PR refactors the QueryProcessing logic for List DataType.
We use two vectors to store lists in the system:
a. A vector for the list offsets (called offset vector): This vector contains the starting index and length for each list within the data vector. Each entry in this vector represents the starting index and length of the list in the data vector.
b. A data vector to store the actual list elements: This vector holds the actual elements of the lists in a flat, continuous storage. Each list would be represented as a contiguous subsequence of elements in this vector.
E.g. We want to store
[1,3]
and[4,8,9]
in an unflat valueVector.We firstly copy
[1,3]
into the dataVector, and store the ListEntry with startOffset: 0, length: 2 in the offsetVector,Then we append
[4,8,9]
to the dataVector, and store the ListEntry with startOffset: 2, length: 3 in the offfsetVector.To access the second list of the listVector, we firstly need to get the ListEntry: startoffset: 2, length: 3. Then we know the second list is stored at offset 2 of the dataVector with length=3.
Storage refactor for list dataType.