Write file-level statistics when writing ORC files with zero rows #14707
/**
 * @brief Skip writing the footer and close the writer.
 */
void skip_close();
I can't think of a better name for this function. I fear the name implies that it's the close() that's skipped. skip_footer_write_on_close() is too long.
If we can't think of a succinct name, maybe we should keep skip_close().
omg, my updated code for this function got lost between commits.
I think skip_close is a pretty accurate name, given that we raise the _closed flag in skip_close and close() returns immediately if that flag is raised.
I'll update the docs above, sorry for the confusion.
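For readers following the naming discussion, here is a minimal sketch of the behavior described above: skip_close() raises the _closed flag, so a later close() returns before writing the footer. The class name sketch_writer and the footer_written() and closed() accessors are hypothetical and exist only for illustration; this is not the actual writer code from the PR.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a file writer; NOT the real ORC writer class.
class sketch_writer {
 public:
  // Buffer a chunk of data; writing after the writer is closed is a logic error.
  void write(std::string chunk)
  {
    assert(!_closed);
    _chunks.push_back(std::move(chunk));
  }

  // Normal path: write the footer once, then mark the writer as closed.
  void close()
  {
    if (_closed) { return; }  // already closed, or skip_close() was called
    _footer_written = true;   // stands in for the real footer write
    _closed         = true;
  }

  // Mark the writer as closed without writing the footer.
  void skip_close() { _closed = true; }

  bool footer_written() const { return _footer_written; }
  bool closed() const { return _closed; }

 private:
  std::vector<std::string> _chunks;
  bool _footer_written = false;
  bool _closed         = false;
};
```

With this shape, calling skip_close() and then close() leaves the writer closed with no footer, while calling close() alone writes the footer.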
LGTM!
Thanks for working on this.
std::generate_n(
  std::back_inserter(expected_column_names),
  expected.num_columns(),
  [starting_index = 0]() mutable { return "_col" + std::to_string(starting_index++); });
Wow mutable lambda is great.
FWIW, I was told it's a code smell (but this is test code).
I think in theory a mutable lambda means we should use a different STL algorithm instead.
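To make the trade-off concrete, here is a small sketch that builds the same "_colN" names two ways: with a mutable lambda passed to std::generate_n, as in the test code, and with std::iota followed by std::transform, which keeps the lambda stateless. The function names are hypothetical, and the second variant is only one possible reading of "use an STL algorithm instead", not something proposed in the review.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <string>
#include <vector>

// Variant 1: the lambda carries its own counter, so it must be `mutable`.
std::vector<std::string> names_with_mutable_lambda(std::size_t num_columns)
{
  std::vector<std::string> names;
  names.reserve(num_columns);
  std::generate_n(std::back_inserter(names), num_columns, [index = std::size_t{0}]() mutable {
    return "_col" + std::to_string(index++);
  });
  return names;
}

// Variant 2: materialize the indices first, then map each index to a name
// with a stateless lambda.
std::vector<std::string> names_with_transform(std::size_t num_columns)
{
  std::vector<std::size_t> indices(num_columns);
  std::iota(indices.begin(), indices.end(), std::size_t{0});

  std::vector<std::string> names;
  names.reserve(indices.size());
  std::transform(indices.begin(), indices.end(), std::back_inserter(names), [](std::size_t index) {
    return "_col" + std::to_string(index);
  });
  return names;
}
```

The second variant costs an extra temporary vector of indices, which is one reason the mutable lambda can be a reasonable choice in test code.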
/merge
Description
Fixes #14675
Write file-level statistics even when stripe-level statistics don't exist (no stripes).
Written statistics are in line with Pandas - zero sum, no min/max.
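As a rough illustration of "zero sum, no min/max", the sketch below folds per-stripe integer statistics into a file-level aggregate. The int_stats struct and merge_stripe_stats function are hypothetical and do not reflect the ORC statistics layout or the writer's implementation; the point is only that an empty list of stripes still yields a well-defined result with a sum of zero and absent min/max.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical integer column statistics; min/max are absent when no value was seen.
struct int_stats {
  std::optional<std::int64_t> min;
  std::optional<std::int64_t> max;
  std::int64_t sum = 0;
};

// Fold stripe-level statistics into a single file-level aggregate.
// With zero stripes the loop never runs and the defaults are returned:
// sum == 0, min and max empty.
int_stats merge_stripe_stats(std::vector<int_stats> const& stripes)
{
  int_stats file_level;
  for (auto const& stripe : stripes) {
    file_level.sum += stripe.sum;
    if (stripe.min && (!file_level.min || *stripe.min < *file_level.min)) {
      file_level.min = stripe.min;
    }
    if (stripe.max && (!file_level.max || *stripe.max > *file_level.max)) {
      file_level.max = stripe.max;
    }
  }
  return file_level;
}
```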