[SEDONA-632] Use direct committer when writing raster files using df.write.format("raster") #1528

Merged


Kontinuation
Member

Did you read the Contributor Guide?

Is this PR related to a JIRA ticket?

Yes, SEDONA-632.

What changes were proposed in this PR?

Writing large numbers of raster files to distributed file systems or object stores is very slow, because the output committer has to move every file from its temporary location to its target location. On object stores such as S3, which have no atomic rename, each move is typically a copy followed by a delete. Users see that all tasks have completed, but the driver is stuck in the commit phase.

This patch adds an option, useDirectCommitter, to the raster format. It defaults to true, in which case the raster format uses a direct committer that writes raster files straight to their target locations. Users can set it to false to restore the original behavior.
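To make the trade-off concrete, here is a toy model in plain Python, using only the local file system. It is not Spark's, Hadoop's, or Sedona's committer. The function write_task_outputs and its use_direct_committer flag are hypothetical names chosen to mirror the option above. The staged path writes each task's files under a `_temporary` directory and then moves them into place in a separate commit loop. The direct path writes straight to the destination, so the commit phase has nothing to move. In Sedona itself the switch is the useDirectCommitter writer option, e.g. `df.write.format("raster").option("useDirectCommitter", "false")` to get the original staged behavior back.

```python
import os
import shutil
import tempfile
from pathlib import Path


def write_task_outputs(out_dir, task_outputs, use_direct_committer=True):
    """Toy model of two commit strategies (hypothetical, not a real committer API).

    task_outputs maps a task id to a list of (file_name, data_bytes) pairs.
    Returns the number of file moves performed in the commit phase.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    if use_direct_committer:
        # Direct: every task writes its files to the final location,
        # so the commit phase has nothing left to move.
        for files in task_outputs.values():
            for name, data in files:
                (out / name).write_bytes(data)
        return 0

    # Staged: tasks first write into a per-task temporary directory.
    staging = out / "_temporary"
    for task_id, files in task_outputs.items():
        task_dir = staging / str(task_id)
        task_dir.mkdir(parents=True, exist_ok=True)
        for name, data in files:
            (task_dir / name).write_bytes(data)

    # Commit phase: move every file into place one by one. On an object store
    # each move would be a copy plus a delete, which is why this step is slow.
    moves = 0
    for task_id, files in task_outputs.items():
        for name, _ in files:
            os.replace(staging / str(task_id) / name, out / name)
            moves += 1
    shutil.rmtree(staging)
    return moves


# Three "raster files" produced by two tasks.
tasks = {0: [("a.tif", b"x")], 1: [("b.tif", b"y"), ("c.tif", b"z")]}

with tempfile.TemporaryDirectory() as d:
    print(write_task_outputs(os.path.join(d, "staged"), tasks, use_direct_committer=False))  # 3
    print(write_task_outputs(os.path.join(d, "direct"), tasks))  # 0
```

Both strategies end with the same files in the output directory. The difference is that the staged one pays one move per file at commit time, a cost that grows with the number of output files. The direct approach trades away the atomicity that staging provides: a failed or retried task can leave partial files behind.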

How was this patch tested?

Passing existing tests

Did this PR include necessary documentation updates?

  • Yes, I have updated the documentation.

Kontinuation marked this pull request as ready for review July 25, 2024 13:56
jiayuasu added this to the sedona-1.6.1 milestone Jul 25, 2024
jiayuasu merged commit 47bd817 into apache:master Jul 25, 2024
50 checks passed
2 participants