[Managed Iceberg] Support partitioning time types (year, month, day, hour) #32939

Open
wants to merge 4 commits into
base: master

ahmedabu98
Contributor

Fixes #32865

@ahmedabu98
Contributor Author

@DanielMorales9 can you take a look at this one too?

Contributor

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment "assign set of reviewers".

@DanielMorales9

DanielMorales9 commented Oct 25, 2024

I think a more scalable approach here would be to encapsulate the writing logic within a Parquet writer class. This would be similar to how Spark or Flink handle Parquet writes (e.g., SparkParquetWriter, FlinkParquetWriter), and it would let you manage the type conversions and partitioning requirements specific to Iceberg in one centralized, reusable place.

@ahmedabu98
Contributor Author

ahmedabu98 commented Oct 25, 2024

We have a relatively thin RecordWriter wrapper that uses Parquet and Avro writers. A RecordWriter is blind to its data file's partition key and spec.

There's one RecordWriter for each destination-partition pair, and RecordWriterManager takes care of routing records to the correct destination and partition. If it helps, we can certainly move the new partition logic in this PR to RecordWriterManager. I can see that it belongs there more than in utils.
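
To make the routing idea concrete, here is a minimal sketch of how records could be grouped by destination-partition pair under a time-based transform. It is not the code in this PR: `PartitionRoutingSketch`, `TimeTransform`, `partitionValue`, and `Router` are hypothetical names, the "writer" is just a list of records, and timestamps are assumed to be in UTC. The partition values follow Iceberg's convention of counting whole years, months, days, or hours from 1970-01-01, floored so that pre-epoch timestamps map to negative values.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PartitionRoutingSketch {

  /** The four time-based partition transforms discussed in this PR. */
  enum TimeTransform { YEAR, MONTH, DAY, HOUR }

  /** Offset from 1970-01-01T00:00 (assumed UTC), floored so pre-epoch values are negative. */
  static long partitionValue(TimeTransform transform, LocalDateTime ts) {
    switch (transform) {
      case YEAR:
        return ts.getYear() - 1970L;
      case MONTH:
        return (ts.getYear() - 1970L) * 12 + (ts.getMonthValue() - 1);
      case DAY:
        return ts.toLocalDate().toEpochDay();
      case HOUR:
        return Math.floorDiv(ts.toEpochSecond(ZoneOffset.UTC), 3600L);
      default:
        throw new IllegalArgumentException("Unsupported transform: " + transform);
    }
  }

  /** Hypothetical router: one record list (standing in for a writer) per destination-partition pair. */
  static final class Router {
    private final TimeTransform transform;
    private final Map<String, List<String>> writers = new HashMap<>();

    Router(TimeTransform transform) {
      this.transform = transform;
    }

    void write(String destination, LocalDateTime ts, String record) {
      // The key combines the destination with the computed partition value.
      String key = destination + "/" + partitionValue(transform, ts);
      writers.computeIfAbsent(key, k -> new ArrayList<>()).add(record);
    }

    int writerCount() {
      return writers.size();
    }
  }

  public static void main(String[] args) {
    Router router = new Router(TimeTransform.HOUR);
    router.write("db.events", LocalDateTime.of(2024, 10, 25, 13, 5), "a");
    router.write("db.events", LocalDateTime.of(2024, 10, 25, 13, 55), "b"); // same hour, same writer
    router.write("db.events", LocalDateTime.of(2024, 10, 25, 14, 0), "c");  // new hour, new writer
    System.out.println(router.writerCount()); // prints 2
  }
}
```

Keeping the transform in the router rather than in the writer matches the description above: the writer stays blind to the partition key, and only the routing layer needs to know how a timestamp maps to a partition.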

Development

Successfully merging this pull request may close these issues.

[Bug]: IcebergIO cannot write data into an hourly partitioned table
2 participants