Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Fixed-list DataType to System #1298

Merged
merged 1 commit into from
Feb 22, 2023
Merged

Add Fixed-list DataType to System #1298

merged 1 commit into from
Feb 22, 2023

Conversation

acquamarin
Copy link
Collaborator

@acquamarin acquamarin commented Feb 17, 2023

FIXED-LIST DataType

1. Grammar Rule:

We can define a colum with FIXED-LIST type using the following grammar:

DataType[SIZE]

E.g. INT64[3] => This means the list size is 3 and type is INT64.

2. Note:

a. The number of elements in the FIXED-LIST must be a positive number.
b. Supported element types: INT64, FLOAT (numeric types)

@acquamarin acquamarin marked this pull request as draft February 17, 2023 10:14
Copy link
Contributor

@andyfengHKU andyfengHKU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but @ray6080 should take another look

src/antlr4/Cypher.g4 Outdated Show resolved Hide resolved
src/include/common/types/types.h Outdated Show resolved Hide resolved
src/include/common/types/value.h Outdated Show resolved Hide resolved
src/catalog/catalog.cpp Outdated Show resolved Hide resolved
src/common/csv_reader/csv_reader.cpp Outdated Show resolved Hide resolved
src/common/types/types.cpp Outdated Show resolved Hide resolved
src/common/types/types.cpp Outdated Show resolved Hide resolved
src/common/types/types.cpp Outdated Show resolved Hide resolved
src/common/types/types.cpp Outdated Show resolved Hide resolved
src/function/built_in_aggregate_functions.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@ray6080 ray6080 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The high level suggestion is we should not introduce TENSOR as a logical data type to users. Instead, LIST is a more generic data type that covers what we need already.
From the users' perspective, only LIST is exposed. We support defining fixed-sized and variable-sized LIST in the front-end. Internally, we have two separate physical types to hand this, which are FIXED_LIST and VAR_LIST.
We can discuss the changes in more details.

Currently, what's missing from this pr are two things:

  1. we need to check if the defined LIST size is larger than a default page size. In that case, we don't support it for now.
  2. we need to disable all functions over fixed sized LIST. maybe in the binder or somewhere else, @andyfengHKU can help with this.

src/antlr4/Cypher.g4 Outdated Show resolved Hide resolved
src/include/common/types/value.h Outdated Show resolved Hide resolved
src/common/csv_reader/csv_reader.cpp Outdated Show resolved Hide resolved
src/include/common/types/types.h Outdated Show resolved Hide resolved
src/include/common/types/types.h Outdated Show resolved Hide resolved
@@ -320,6 +320,62 @@ std::unique_ptr<Value> CopyStructuresArrow::getArrowList(std::string& l, int64_t
DataType(LIST, std::make_unique<DataType>(childDataType)), std::move(values));
}

std::unique_ptr<uint8_t[]> CopyStructuresArrow::getArrowTensor(std::string& l, int64_t from,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we not reuse getArrowList?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some differences between fixed/var list for now:

  1. getArrowVarList returns a vector of values, and then convert that vector to ku_list_t, and store the elements in the overflow.
  2. getArrowFixedList should return a pointer to a block of memory which stores the elements of the list. We should copy that block of memory to inMemColumn directly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Let's try to reuse the part of parsing lists.

src/include/common/types/types.h Outdated Show resolved Hide resolved
src/function/built_in_aggregate_functions.cpp Outdated Show resolved Hide resolved
src/common/vector/value_vector.cpp Outdated Show resolved Hide resolved
src/common/types/types.cpp Outdated Show resolved Hide resolved
@acquamarin acquamarin marked this pull request as ready for review February 21, 2023 04:46
@acquamarin acquamarin changed the title Add Tensor dataType to system Add Fixed-list DataType to System Feb 21, 2023
src/common/arrow/arrow_converter.cpp Show resolved Hide resolved
src/storage/copy_arrow/copy_structures_arrow.cpp Outdated Show resolved Hide resolved
@@ -320,6 +320,62 @@ std::unique_ptr<Value> CopyStructuresArrow::getArrowList(std::string& l, int64_t
DataType(LIST, std::make_unique<DataType>(childDataType)), std::move(values));
}

std::unique_ptr<uint8_t[]> CopyStructuresArrow::getArrowTensor(std::string& l, int64_t from,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Let's try to reuse the part of parsing lists.

src/binder/bind/bind_ddl.cpp Outdated Show resolved Hide resolved
src/common/types/types.cpp Outdated Show resolved Hide resolved
src/common/types/types.cpp Outdated Show resolved Hide resolved
src/common/types/types.cpp Outdated Show resolved Hide resolved
src/common/types/types.cpp Outdated Show resolved Hide resolved
src/common/types/value.cpp Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants