-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Standalone 12/N: Implement and test ZarrV3ArrayWriter #304
Conversation
…andalone-sequence-4
…andalone-sequence-4b
…sc compression parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few questions and comments but otherwise looks good.
#ifdef max | ||
#undef max | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is max
defined?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somewhere in minio, which is included in s3.connection.hh which is included in sink.creator.hh which is included here. I have an issue here to address it.
namespace { | ||
std::string | ||
sample_type_to_dtype(ZarrDataType t) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove unnecessary empty lines.
src/streaming/zarrv3.array.writer.hh
Outdated
#include "array.writer.hh" | ||
|
||
namespace zarr { | ||
struct ZarrV3ArrayWriter final : public ArrayWriter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As previously suggested (apologies if I missed your response), I think we should avoid using final
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been removing them, but missed this one.
struct ZarrV3ArrayWriter final : public ArrayWriter | ||
{ | ||
public: | ||
ZarrV3ArrayWriter(ArrayWriterConfig&& config, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is an r-value reference necessary in this context? I assume you want to enforce a move operation, but I'm unsure why the config should be moved rather than passed as a pointer, which is one alternative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ArrayWriterConfig
has a std::unique_ptr<ArrayDimensions>
member, though the next PR reveals this does need to be a shared pointer after all, because the Stream object needs to hold on to them as well for metadata purposes.
std::fill_n( | ||
table.begin(), table.size(), std::numeric_limits<uint64_t>::max()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is std::fill_n
necessary if you're filling the entire container? If not, you should use std::fill
or even better, use <ranges>
instead.
ranges::fill(table, std::numeric_limits<uint64_t>::max());
} | ||
|
||
// write out chunks to shards | ||
bool write_table = is_finalizing_ || should_rollover_(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think using auto
consistently is a good idea.
const auto is_write_table = is_finalizing_ || should_rollover_();
for (auto i = 0; i < n_shards; ++i) { | ||
const auto& chunks = chunk_in_shards.at(i); | ||
auto& chunk_table = shard_tables_.at(i); | ||
size_t* file_offset = &shard_file_offsets_.at(i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use auto*
here too if you want.
file_offset, | ||
write_table, | ||
&latch, | ||
this](std::string& err) mutable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this lambda need to be mutable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know. At one point it did, but I can't remember why anymore. It doesn't need it now.
try { | ||
for (const auto& chunk_idx : chunks) { | ||
auto& chunk = chunk_buffers_.at(chunk_idx); | ||
std::span data{ reinterpret_cast<std::byte*>(chunk.data()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Depends on #303.