-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Validate file header for LOAD and COPY #2210
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #2210 +/- ##
==========================================
+ Coverage 89.46% 89.58% +0.12%
==========================================
Files 1007 1007
Lines 36241 36249 +8
==========================================
+ Hits 32422 32475 +53
+ Misses 3819 3774 -45
☔ View full report in Codecov by Sentry. |
void Binder::sniffFiles(const common::ReaderConfig& readerConfig, | ||
std::vector<std::string>& columnNames, | ||
std::vector<std::unique_ptr<common::LogicalType>>& columnTypes) { | ||
assert(readerConfig.getNumFiles() > 0); | ||
sniffFile(readerConfig, 0, columnNames, columnTypes); | ||
for (auto i = 1; i < readerConfig.getNumFiles(); ++i) { | ||
std::vector<std::string> tmpColumnNames; | ||
std::vector<std::unique_ptr<LogicalType>> tmpColumnTypes; | ||
sniffFile(readerConfig, i, tmpColumnNames, tmpColumnTypes); | ||
switch (readerConfig.fileType) { | ||
case FileType::CSV: { | ||
validateNumColumns(columnTypes.size(), tmpColumnTypes.size()); | ||
} | ||
case FileType::PARQUET: { | ||
validateNumColumns(columnTypes.size(), tmpColumnTypes.size()); | ||
validateColumnTypes(columnNames, columnTypes, tmpColumnTypes); | ||
} break; | ||
case FileType::NPY: { | ||
validateNumColumns(1, tmpColumnTypes.size()); | ||
columnNames.push_back(tmpColumnNames[0]); | ||
columnTypes.push_back(tmpColumnTypes[0]->copy()); | ||
} break; | ||
default: | ||
break; | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have some duplication here, right? Either we should validate the number of columns here, or in each bind
step, but not both.
I think it'd be best to move the validation entirely to a separate function to consolidate the shared logic.
reader/csv: skip empty lines when sniffing On CSVs without headers, we should skip any leading empty lines, and return zero if all lines are empty. Co-authored-by: Keenan G <41458184+Riolku@users.noreply.github.com>
37e51ea
to
80e9905
Compare
This PR partially solves issue #2139.
We add several validation to COPY and LOAD FROM