[SEDONA-405] Replace Metric with LongAccumulator to reduce memory overhead for spatial join #1041
Just my two cents. I would prefer removing the Sedona Metric class and using the Spark built-in LongAccumulator instead. The LongAccumulator has no overhead. There would be no need for a debug option. Less code, fewer knobs, and metrics for everyone. Keeping the Sedona Metric class means there is no way to have metrics for larger jobs, where you would really want them for tuning.
@umartin The Metric is designed to capture the <partitionId, buildSideCount> and <partitionId, streamSideCount> for each partition in a task. If we use the LongAccumulator without the partitionId, then there is no partition-wise information and there is no point in even having this LongAccumulator. Or maybe I misunderstood your suggestion. Please advise.
@umartin Oh, now it is clear to me. Thanks for the suggestion. I will fix it accordingly.
Thank you! I'm really looking forward to the 1.5 release. There are tons of great changes. You've been really busy!
…rhead for spatial join (apache#1041)
Did you read the Contributor Guide?
Is this PR related to a JIRA ticket?
[SEDONA-405] my subject
What changes were proposed in this PR?
Replace the old Metric class with the Spark built-in LongAccumulator to reduce the memory overhead of spatial join.
This should significantly reduce the memory overhead of the Sedona driver program for both RDD-based and SQL-based joins.
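The trade-off discussed in the thread can be illustrated with a small Spark-free model. This is only a sketch under stated assumptions: `PerPartitionMetric` and `SumAccumulator` are hypothetical stand-ins, not Sedona's Metric class or Spark's LongAccumulator, and the partition counts are made-up illustration data. It shows that a map-style metric keeps one entry per partition on the driver, while a sum-style accumulator keeps a fixed pair of numbers no matter how many partitions report in.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PerPartitionMetric:
    """Hypothetical map-style metric: one <partitionId, count> entry per partition."""
    counts: Dict[int, int] = field(default_factory=dict)

    def add(self, partition_id: int, n: int) -> None:
        self.counts[partition_id] = self.counts.get(partition_id, 0) + n

    def merge(self, other: "PerPartitionMetric") -> None:
        # Driver-side state grows with the number of distinct partitions.
        for partition_id, n in other.counts.items():
            self.add(partition_id, n)


@dataclass
class SumAccumulator:
    """Hypothetical sum-style accumulator: constant-size state (sum and count)."""
    total: int = 0
    count: int = 0

    def add(self, n: int) -> None:
        self.total += n
        self.count += 1

    def merge(self, other: "SumAccumulator") -> None:
        # Driver-side state stays two integers regardless of partition count.
        self.total += other.total
        self.count += other.count


# Made-up build-side counts per partition, for illustration only.
build_side_counts = {0: 120, 1: 80, 2: 200}

driver_map = PerPartitionMetric()
driver_acc = SumAccumulator()

for partition_id, n in build_side_counts.items():
    # Each task updates a local copy, which is then merged on the driver.
    task_map = PerPartitionMetric()
    task_map.add(partition_id, n)
    driver_map.merge(task_map)

    task_acc = SumAccumulator()
    task_acc.add(n)
    driver_acc.merge(task_acc)

print(len(driver_map.counts), driver_acc.total, driver_acc.count)
```

The sum-style version gives up the partition-wise breakdown in exchange for constant-size state, which is the trade-off raised in the conversation above.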
How was this patch tested?
Passed existing tests.
Did this PR include necessary documentation updates?