reader/csv: sniff column name and type #2116

Merged
merged 1 commit into from
Sep 30, 2023

Conversation

Contributor

@Riolku Riolku commented Sep 29, 2023

The format is COLUMN_NAME:COLUMN_TYPE.
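
As a rough illustration of the `COLUMN_NAME:COLUMN_TYPE` header format, the sketch below splits a single header field into its name and type. `splitHeaderField` is a hypothetical helper, not a function from this PR. Splitting on the first colon and falling back to `STRING` when no type is given are both assumptions made for the example.

```cpp
#include <string>
#include <string_view>
#include <utility>

// Hypothetical helper (not the PR's code): split one header field of the
// form COLUMN_NAME:COLUMN_TYPE into {name, type}.
// Assumptions: the first ':' separates name from type, and a field with no
// ':' is treated as a STRING column.
std::pair<std::string, std::string> splitHeaderField(std::string_view field) {
    auto pos = field.find(':');
    if (pos == std::string_view::npos) {
        return {std::string(field), "STRING"};
    }
    return {std::string(field.substr(0, pos)), std::string(field.substr(pos + 1))};
}
```

Under these assumptions, a header field such as `age:INT64` yields the name `age` and the type `INT64`, while a bare `name` yields `name` and `STRING`.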

@Riolku Riolku force-pushed the sniff-header branch 4 times, most recently from 9097cb8 to c8ec87e Compare September 29, 2023 21:17
readerConfig->columnTypes.push_back(stringType.copy());
columns.push_back(createVariable(columnName, stringType));
readerConfig->columnTypes.push_back(std::move(columnType));
columns.push_back(createVariable(columnName, columnTypeID));
Collaborator

You don't need to create a variable for columnTypeID since it is only used once. Just inline the call.

void addRow(common::DataChunk&, common::column_id_t column);

template<typename Driver>
bool addRow(Driver&, uint64_t rowNum, common::column_id_t column_count);
Collaborator

Add a parameter name for Driver&.

return;
}
template<typename Driver>
bool BaseCSVReader::addRow(Driver& driver, uint64_t rowNum, column_id_t column) {
Collaborator

Inline this function

codecov bot commented Sep 29, 2023

Codecov Report

Attention: 8 lines in your changes are missing coverage. Please review.

Comparison is base (9783bf4) 89.78% compared to head (5121c62) 89.76%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2116      +/-   ##
==========================================
- Coverage   89.78%   89.76%   -0.03%     
==========================================
  Files         986      988       +2     
  Lines       35563    35596      +33     
==========================================
+ Hits        31932    31953      +21     
- Misses       3631     3643      +12     
| Files | Coverage Δ |
|---|---|
| src/binder/bind/bind_reading_clause.cpp | 97.00% <100.00%> (ø) |
| ...r/operator/persistent/reader/csv/base_csv_reader.h | 100.00% <ø> (ø) |
| .../processor/operator/persistent/reader/csv/driver.h | 100.00% <100.00%> (ø) |
| ...erator/persistent/reader/csv/parallel_csv_reader.h | 100.00% <ø> (ø) |
| ...operator/persistent/reader/csv/serial_csv_reader.h | 100.00% <ø> (ø) |
| ...operator/persistent/reader/csv/base_csv_reader.cpp | 96.70% <100.00%> (+0.65%) ⬆️ |
| ...ator/persistent/reader/csv/parallel_csv_reader.cpp | 100.00% <100.00%> (ø) |
| ...erator/persistent/reader/csv/serial_csv_reader.cpp | 100.00% <100.00%> (ø) |
| ...rocessor/operator/persistent/reader/csv/driver.cpp | 94.16% <94.16%> (ø) |

... and 2 files with indirect coverage changes


The format is `COLUMN_NAME:COLUMN_TYPE`.

As part of this change, the CSV reader module has been refactored to use
"Drivers" that handle the actual storing of values. This makes sniffing,
header skipping, and parsing easy to implement without affecting
performance (the driver is a template parameter, so dispatch is resolved
at compile time), and it keeps the reader code clean.
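
The driver idea described above can be sketched roughly as follows. This is a simplified illustration, not the PR's implementation: `HeaderDriver`, `CountingDriver`, `parseCSV`, and the `addValue`/`addRow` member signatures are all hypothetical names chosen for the example, and the parsing loop ignores quoting and escaping. The point is only that one templated loop can serve several purposes, with the driver chosen at compile time.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical driver: records the fields of the first row only
// (a stand-in for header sniffing).
struct HeaderDriver {
    std::vector<std::string> names;
    void addValue(uint64_t /*rowNum*/, uint64_t /*column*/, std::string_view value) {
        names.emplace_back(value);
    }
    // Returning false tells the reader loop to stop after this row.
    bool addRow(uint64_t /*rowNum*/, uint64_t /*columnCount*/) { return false; }
};

// Hypothetical driver: ignores values and counts complete rows.
struct CountingDriver {
    uint64_t rows = 0;
    void addValue(uint64_t, uint64_t, std::string_view) {}
    bool addRow(uint64_t, uint64_t) { ++rows; return true; }
};

// Simplified reader loop (no quoting or escaping). The driver type is a
// template parameter, so the calls below are resolved at compile time.
// Returns the number of rows handed to the driver.
template<typename Driver>
uint64_t parseCSV(std::string_view input, Driver& driver) {
    uint64_t rowNum = 0, column = 0;
    std::size_t start = 0;
    for (std::size_t i = 0; i <= input.size(); ++i) {
        bool atEnd = i == input.size();
        if (!atEnd && input[i] != ',' && input[i] != '\n') continue;
        // Input ended right after a newline: there is no partial row to emit.
        if (atEnd && i == start && column == 0) break;
        driver.addValue(rowNum, column++, input.substr(start, i - start));
        start = i + 1;
        if (atEnd || input[i] == '\n') {
            bool keepGoing = driver.addRow(rowNum, column);
            ++rowNum;
            column = 0;
            if (!keepGoing) break;
        }
    }
    return rowNum;
}
```

In this sketch, passing a `HeaderDriver` stops after the first row and leaves the raw header fields in `names`, while passing a `CountingDriver` walks the whole input. Neither case needs a virtual call or a runtime branch on the reader's mode.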
@andyfengHKU andyfengHKU merged commit 9f2b790 into master Sep 30, 2023
10 of 11 checks passed
@andyfengHKU andyfengHKU deleted the sniff-header branch September 30, 2023 14:57
3 participants