Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reader function refactor #1979

Merged
merged 1 commit into from
Aug 31, 2023
Merged

Reader function refactor #1979

merged 1 commit into from
Aug 31, 2023

Conversation

acquamarin
Copy link
Collaborator

No description provided.

@codecov
Copy link

codecov bot commented Aug 29, 2023

Codecov Report

Patch coverage: 96.23% and project coverage change: +0.01% 🎉

Comparison is base (440eb0a) 89.66% compared to head (a1997be) 89.68%.
Report is 6 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1979      +/-   ##
==========================================
+ Coverage   89.66%   89.68%   +0.01%     
==========================================
  Files         876      877       +1     
  Lines       32337    32362      +25     
==========================================
+ Hits        28996    29024      +28     
+ Misses       3341     3338       -3     
Files Changed Coverage Δ
src/include/common/types/types.h 100.00% <ø> (ø)
src/include/common/vector/value_vector.h 100.00% <ø> (ø)
...c/include/expression_evaluator/literal_evaluator.h 100.00% <ø> (ø)
src/include/storage/copier/rel_copier.h 95.74% <ø> (ø)
src/include/storage/copier/reader_state.h 92.30% <75.00%> (-3.70%) ⬇️
src/common/vector/value_vector.cpp 96.77% <81.25%> (-2.69%) ⬇️
src/expression_evaluator/literal_evaluator.cpp 100.00% <100.00%> (+7.89%) ⬆️
src/include/processor/operator/persistent/reader.h 100.00% <100.00%> (ø)
src/include/storage/storage_info.h 100.00% <100.00%> (ø)
src/processor/operator/persistent/reader.cpp 100.00% <100.00%> (ø)
... and 2 more

... and 41 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@ray6080 ray6080 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we trigger a reload on the benchmark dataset? Just want to set a routine that non-trivial changes to COPY should always run on large dataset.

src/include/storage/copier/reader_state.h Outdated Show resolved Hide resolved
src/include/processor/operator/persistent/reader.h Outdated Show resolved Hide resolved
src/include/processor/operator/persistent/reader.h Outdated Show resolved Hide resolved
src/include/storage/copier/reader_state.h Outdated Show resolved Hide resolved
src/storage/copier/reader_state.cpp Outdated Show resolved Hide resolved
src/storage/copier/reader_state.cpp Outdated Show resolved Hide resolved
src/storage/copier/reader_state.cpp Outdated Show resolved Hide resolved
src/storage/copier/rel_copier.cpp Show resolved Hide resolved
std::unique_ptr<ReaderMorsel> getParallelMorsel();

inline void lock() { mtx.lock(); }
inline void unlock() { mtx.unlock(); }
inline common::row_idx_t& getNumRowsRef() { return std::ref(numRows); }

static void appendToArrayVectors(std::vector<arrow::ArrayVector>& arrayVectors,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The names of these two functions appendToArrayVectors and appendArrowArraysToVectors are quite confusing. I really can't differentiate them easily as they read so similarly...

For appendToArrayVectors, I think you mean appendArraysToVectors?
and for appendArrowArraysToVectors, maybe extractArraysToVectors?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, appendToArrayVectors means append data from valueVector to vector<arrow::ArrayVector>
appendArrowArraysToVectors means append data from ArrayVector to arrow vector

}
}

arrow::ArrayVector ReaderSharedState::extractArrayVectorToAppend(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a part I'm a bit confused. Isn't the slicing and append logic the same for all arrays within the same set of batches? Why do we perform them repeatedly column by column? It seems if we don't organize data as per ArrayVector, instead per DataChunk (std::vector<std::shared_ptr<DataChunk>>), we can simplify this. Let's discuss a bit more on this, maybe I'm missing something.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the slicing algorithm

@acquamarin acquamarin merged commit a260a35 into master Aug 31, 2023
10 checks passed
@acquamarin acquamarin deleted the csv-reader branch August 31, 2023 14:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants