Standalone 12/N: Implement and test ZarrV3ArrayWriter #304

Merged: 93 commits into main on Oct 4, 2024

Conversation

aliddell
Member

Depends on #303.

Collaborator

@shlomnissan left a comment

A few questions and comments but otherwise looks good.

Comment on lines +12 to +14
#ifdef max
#undef max
#endif
Collaborator

Where is max defined?

Member Author

Somewhere in minio, which is included by s3.connection.hh, which is included by sink.creator.hh, which is included here. I have an issue open to address it.
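For readers unfamiliar with this kind of collision, here is a minimal sketch of two common workarounds. The `max` macro below is a stand-in defined only for illustration (the actual definition leaking through minio was not identified in this thread), and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Stand-in for a function-like `max` macro leaked by a third-party or
// platform header. This definition is an assumption made for the example.
#define max(a, b) (((a) > (b)) ? (a) : (b))

// Workaround 1: parenthesize the qualified name. The preprocessor only
// expands a function-like macro when the name is directly followed by `(`,
// so the macro is not applied here.
inline std::uint64_t
sentinel_with_parens()
{
    return (std::numeric_limits<std::uint64_t>::max)();
}

// Workaround 2 (the approach taken in the PR): drop the macro before use.
#ifdef max
#undef max
#endif

inline std::uint64_t
sentinel_after_undef()
{
    return std::numeric_limits<std::uint64_t>::max();
}

// Both forms yield the same value.
inline void
check_sentinels()
{
    assert(sentinel_with_parens() == sentinel_after_undef());
}
```

The `#undef` approach affects every later use of `max` in the translation unit, while the parenthesized form has to be repeated at each call site. Which one fits better likely depends on how many call sites there are.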

namespace {
std::string
sample_type_to_dtype(ZarrDataType t)

Collaborator

Remove unnecessary empty lines.

#include "array.writer.hh"

namespace zarr {
struct ZarrV3ArrayWriter final : public ArrayWriter
Collaborator

As previously suggested (apologies if I missed your response), I think we should avoid using final.

Member Author

I've been removing them, but missed this one.

struct ZarrV3ArrayWriter final : public ArrayWriter
{
public:
ZarrV3ArrayWriter(ArrayWriterConfig&& config,
Collaborator

Why is an r-value reference necessary in this context? I assume you want to enforce a move operation, but I'm unsure why the config should be moved rather than passed as a pointer, which is one alternative.

Member Author

ArrayWriterConfig has a std::unique_ptr<ArrayDimensions> member, though the next PR reveals this does need to be a shared pointer after all, because the Stream object needs to hold on to them as well for metadata purposes.
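To illustrate the trade-off discussed here, the following is a rough sketch, not the project's actual types. `Dimensions`, `WriterConfig`, `Writer`, and `SharedWriterConfig` are hypothetical stand-ins for `ArrayDimensions`, `ArrayWriterConfig`, and the writer. It shows why a config holding a `std::unique_ptr` has to be moved in, and how a `std::shared_ptr` member would lift that restriction.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for ArrayDimensions.
struct Dimensions
{
    std::vector<std::size_t> extents;
};

// Hypothetical stand-in for ArrayWriterConfig. The unique_ptr member makes
// the whole struct move-only.
struct WriterConfig
{
    std::unique_ptr<Dimensions> dimensions;
    int level_of_detail = 0;
};

struct Writer
{
    // The config cannot be copied, so ownership is transferred through an
    // rvalue reference and callers must write std::move(config).
    explicit Writer(WriterConfig&& config)
      : config_(std::move(config))
    {
    }

    std::size_t rank() const { return config_.dimensions->extents.size(); }

  private:
    WriterConfig config_;
};

// Variant with shared ownership: the config becomes copyable, and a second
// object can keep the same dimensions alive.
struct SharedWriterConfig
{
    std::shared_ptr<Dimensions> dimensions;
};

inline Writer
make_writer()
{
    WriterConfig config;
    config.dimensions = std::make_unique<Dimensions>();
    config.dimensions->extents = { 64, 48, 32 };
    return Writer{ std::move(config) };
}
```

With the shared variant, copying the config leaves both copies pointing at the same `Dimensions` object, which matches the need described above for a second owner to hold on to the dimensions.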

Comment on lines 70 to 71
std::fill_n(
table.begin(), table.size(), std::numeric_limits<uint64_t>::max());
Collaborator

Is std::fill_n necessary if you're filling the entire container? If not, you should use std::fill or, even better, the <ranges> version:

std::ranges::fill(table, std::numeric_limits<uint64_t>::max());

src/streaming/zarrv3.array.writer.cpp
}

// write out chunks to shards
bool write_table = is_finalizing_ || should_rollover_();
Collaborator

I think using auto consistently is a good idea.

const auto is_write_table = is_finalizing_ || should_rollover_();

for (auto i = 0; i < n_shards; ++i) {
const auto& chunks = chunk_in_shards.at(i);
auto& chunk_table = shard_tables_.at(i);
size_t* file_offset = &shard_file_offsets_.at(i);
Collaborator

You can use auto* here too if you want.
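As a quick reference for what `auto` deduces in the cases raised in this thread, here is a small sketch. The variable names and the `offsets` vector are illustrative only and do not come from the PR.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

inline std::size_t
auto_deduction_demo()
{
    std::vector<std::size_t> offsets{ 0, 128, 256 };

    // `auto*` and plain `auto` both deduce a pointer here; `auto*` just
    // makes the pointer visible at the declaration.
    auto* p = &offsets.at(1);
    auto q = &offsets.at(1);
    static_assert(std::is_same_v<decltype(p), std::size_t*>);
    static_assert(std::is_same_v<decltype(q), std::size_t*>);

    // `const auto` on a boolean expression deduces const bool.
    [[maybe_unused]] const auto flag = offsets.empty() || offsets.size() > 2;
    static_assert(std::is_same_v<decltype(flag), const bool>);

    // An integer literal deduces int, not std::size_t, which matters when
    // the variable is later compared against an unsigned count.
    [[maybe_unused]] auto i = 0;
    static_assert(std::is_same_v<decltype(i), int>);

    *p += 1;   // writes through to offsets[1]
    return *q; // q aliases the same element
}
```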

file_offset,
write_table,
&latch,
this](std::string& err) mutable {
Collaborator

Why does this lambda need to be mutable?

Member Author

I don't know. At one point it did, but I can't remember why anymore. It doesn't need it now.
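For context on when `mutable` matters, a short sketch follows. It assumes nothing about the real job body: the function names and the 512-byte increment are invented for the example. A lambda needs `mutable` only to modify its own by-value captures; writing through a captured pointer, as with `file_offset` in the quoted capture list, does not require it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Modifying a by-value capture requires `mutable`; the change is made to
// the lambda's private copy and persists across calls.
inline std::size_t
count_calls_with_mutable()
{
    std::size_t calls = 0;
    auto job = [calls](std::string& err) mutable {
        ++calls;
        err.clear();
        return calls;
    };

    std::string err;
    job(err);
    return job(err);
}

// Writing through a captured pointer needs no `mutable`: the pointer itself
// is const inside the lambda, but the object it points to is not.
inline bool
write_through_pointer()
{
    std::size_t file_offset = 0;
    std::size_t* offset_ptr = &file_offset;
    auto job = [offset_ptr](std::string& err) {
        *offset_ptr += 512;
        err.clear();
        return true;
    };

    std::string err = "stale";
    job(err);
    return file_offset == 512 && err.empty();
}
```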

try {
for (const auto& chunk_idx : chunks) {
auto& chunk = chunk_buffers_.at(chunk_idx);
std::span data{ reinterpret_cast<std::byte*>(chunk.data()),
Collaborator

👍

@aliddell aliddell changed the base branch from standalone-sequence-11 to main October 4, 2024 12:52
@aliddell aliddell merged commit aa59079 into main Oct 4, 2024
3 checks passed
aliddell added a commit that referenced this pull request Oct 4, 2024