Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework FIXED_LIST #3057

Merged
merged 1 commit into from
Mar 18, 2024
Merged

Rework FIXED_LIST #3057

merged 1 commit into from
Mar 18, 2024

Conversation

manh9203
Copy link
Contributor

@manh9203 manh9203 commented Mar 14, 2024

This PR resolves the issue #2975

What is changed:

  • Rename FIXED_LIST to ARRAY
  • Fix ARRAY to use the same physical layout as VAR_LIST so it can support nested data types
  • Unify string casting functions of VAR_LIST and ARRAY
  • Unify casting functions between nested data types
  • Fix NpyReader to write to ListVector
  • Fix StorageDriver to read from VarListColumn

To-do list:

  • Implement implicit casting rules between VAR_LIST and ARRAY so we can use list functions for array
  • Support array native functions Feature Request: Vector Operations #3068
  • Optimize the physical layout of array by removing the entry information vector

@manh9203 manh9203 marked this pull request as draft March 14, 2024 16:37
@manh9203 manh9203 changed the title Fixed-list rework Rework FIXED_LIST Mar 14, 2024
tools/rust_api/src/ffi.rs Outdated Show resolved Hide resolved
src/include/common/type_utils.h Outdated Show resolved Hide resolved
src/include/common/types/types.h Outdated Show resolved Hide resolved
src/include/common/types/types.h Outdated Show resolved Hide resolved
src/common/type_utils.cpp Outdated Show resolved Hide resolved
void Value::copyFromVarList(ku_list_t& list, const LogicalType& childType) {
if (dataType->getLogicalTypeID() == LogicalTypeID::ARRAY) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss with @ray6080 to see if we should have this type of check everywhere in the code base. I tend to not have it. And maybe we should have ARRAY being a physical type as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a physical type ARRAY to avoid these checks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should do it in another PR. I still have array using the same physical layout as var-list in this PR. And I removed the size check since we didn't have these checks for the old fixed-list vector.

// Used by fixedList only.
uint64_t numValuesPerList;
// Used by array only.
uint64_t fixedNumValues;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure should we remove it since this field is still needed in other parts of the code.

src/function/cast/cast_fixed_list.cpp Outdated Show resolved Hide resolved
src/function/cast/cast_fixed_list.cpp Outdated Show resolved Hide resolved
@ray6080
Copy link
Contributor

ray6080 commented Mar 15, 2024

Don't forget to update docs or open an issue on docs repo.

@manh9203 manh9203 marked this pull request as ready for review March 16, 2024 03:21
Copy link

codecov bot commented Mar 17, 2024

Codecov Report

Attention: Patch coverage is 99.41860% with 1 lines in your changes are missing coverage. Please review.

Project coverage is 92.94%. Comparing base (bd963c1) to head (35b9438).

Files Patch % Lines
src/common/arrow/arrow_row_batch.cpp 94.44% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3057      +/-   ##
==========================================
- Coverage   93.00%   92.94%   -0.06%     
==========================================
  Files        1155     1155              
  Lines       43286    42919     -367     
==========================================
- Hits        40259    39892     -367     
  Misses       3027     3027              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@manh9203 manh9203 force-pushed the fixed-list-rework branch 2 times, most recently from 1c52155 to b90ff5d Compare March 18, 2024 05:04

switch (resultType->getPhysicalType()) {
case PhysicalTypeID::VAR_LIST: {
if (inputType.getPhysicalType() == PhysicalTypeID::VAR_LIST) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would write as

if input == ARRAY && output == ARRAY
     check numElements
     return
if input == LIST && output == ARRAY:
    check numElements
    return
validate recursively

numValuesInChild *
storage::StorageUtils::getDataTypeSize(LogicalType{typeInfo.childrenTypesInfo[0]->typeID}));
for (auto i = 0u; i < numValuesPerList; i++) {
if (typeInfo.childrenTypesInfo[0]->typeID != LogicalTypeID::VAR_LIST &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leave a comment say why we don't need to resize for VAR_LIST and ARRAY

@manh9203 manh9203 merged commit e7c6d73 into master Mar 18, 2024
17 checks passed
@manh9203 manh9203 deleted the fixed-list-rework branch March 18, 2024 19:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants