Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove DataTypeInfo and use LogicalType and column names #2539

Merged
merged 11 commits into from
Apr 9, 2024

Conversation

russell-liu
Copy link
Contributor

@russell-liu russell-liu commented Dec 4, 2023

fixes #2423

Copy link

codecov bot commented Dec 4, 2023

Codecov Report

Attention: Patch coverage is 81.69935% with 28 lines in your changes are missing coverage. Please review.

Project coverage is 93.31%. Comparing base (d946982) to head (84700b3).
Report is 11 commits behind head on master.

❗ Current head 84700b3 differs from pull request most recent head 4d9b7bf. Consider uploading reports for the commit 4d9b7bf to get more accurate results

Files Patch % Lines
src/common/arrow/arrow_row_batch.cpp 78.46% 28 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2539      +/-   ##
==========================================
+ Coverage   92.28%   93.31%   +1.03%     
==========================================
  Files        1161     1036     -125     
  Lines       44150    38819    -5331     
==========================================
- Hits        40744    36225    -4519     
+ Misses       3406     2594     -812     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

ArrowVector* vector, const main::DataTypeInfo& typeInfo, std::int64_t capacity) {
initializeStructVector(vector, typeInfo, capacity);
ArrowVector* vector, const LogicalType& type, std::int64_t capacity) {
initializeStructVector(vector, type, capacity);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add test to cover all uncovered lines.


ArrowArray toArray();

private:
std::vector<std::unique_ptr<main::DataTypeInfo>> typesInfo;
std::vector<LogicalType> types;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use common::logical_types_t

@russell-liu
Copy link
Contributor Author

After attempting to add more Arrow tests, I've discovered a problem with this removal of DataTypeInfo that I do not know how to resolve. Details:

I tried to return a node, rel, or internal ID in the Arrow tests, and this led to segmentation faults. After further investigation, I believe this is because the following logic in the old implementation using DataTypeInfo is not preserved:
https://github.com/kuzudb/kuzu/blob/master/src/main/query_result.cpp#L24
https://github.com/kuzudb/kuzu/blob/master/src/main/query_result.cpp#L102
https://github.com/kuzudb/kuzu/blob/master/src/main/query_result.cpp#L115
That is, when DataTypeInfo used to be created for internal IDs, nodes, or rels, some additional childrenTypesInfo would be created; when we switch to using DataType only, these fields are missing.

I tried to modify arrow_row_batch.cpp (including templateInitializeVector and templateCopyNonNullValue for internal IDs, nodes, and rels) to replicate this logic, but I didn't really know what I was doing and I ended up with a segmentation fault occurring during std::memcpy that I didn't know what to do about. My approach was probably incorrect and messy I won't push the code here.

@mxwli
Copy link
Collaborator

mxwli commented Apr 8, 2024

Should close #3224 as well

@mxwli mxwli linked an issue Apr 8, 2024 that may be closed by this pull request
Copy link
Contributor

@andyfengHKU andyfengHKU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure we add corresponding test cases for coverage.

Makefile Outdated Show resolved Hide resolved
tools/python_api/src_cpp/py_query_result.cpp Show resolved Hide resolved
src/include/common/arrow/arrow_row_batch.h Outdated Show resolved Hide resolved
src/common/arrow/arrow_type.cpp Show resolved Hide resolved
@mxwli mxwli merged commit 0b70b02 into master Apr 9, 2024
18 checks passed
@mxwli mxwli deleted the remove-datatypeinfo branch April 9, 2024 17:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Python API: Segfault with nodes() and rels() when using get_as_arrow() Remove DataTypeInfo from codebase
3 participants