-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add partition level Force Commit #12088
Add partition level Force Commit #12088
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #12088 +/- ##
============================================
- Coverage 61.62% 61.56% -0.07%
Complexity 1152 1152
============================================
Files 2389 2407 +18
Lines 129824 130957 +1133
Branches 20083 20235 +152
============================================
+ Hits 80009 80625 +616
- Misses 43989 44440 +451
- Partials 5826 5892 +66
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments, otherwise lgtm
* @param tableNameWithType table name with type | ||
* @param partitionsToCommit comma separated list of partitions to commit | ||
* @param segmentsToCommit comma separated list of consuming segments to commit | ||
* @return the set of consuming segments that were committed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We really don't wait for them to commit, so perhaps we should say something like:
"set of segments for which commit was initiated"
* Commit all partitions unless either partitionsToCommit or segmentsToCommit are provided. | ||
* | ||
* @param tableNameWithType table name with type | ||
* @param partitionsToCommit comma separated list of partitions to commit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be consistent, should it be "partitionGroupIdsToCommit" instead of "partitionsToCommit"?
@ApiParam(value = "Name of the table", required = true) @PathParam("tableName") String tableName, | ||
@ApiParam(value = "Comma separated list of partition group IDs to be committed") @QueryParam("partitions") | ||
String partitionGroupIds, | ||
@ApiParam(value = "Comma separated list of consuming segments to be committed") @QueryParam("segments") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let us modify this documentation as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the variableName in the method as well as the documentation, but for the actual API's query param, I believe the shorter forms - partitions
and segments
- are better as they make the endpoint shorter:
POST /tables/{tableName}/forceCommit?segments=s1,s2
instead of
POST /tables/{tableName}/forceCommit?segmentsToCommit=s1,s2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait. Why do we have both partitiongroupIds as well as consuming segments? There is only one consuming segment per partition, so it is just enough to provide the segment names right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having the option to provide either makes it easier for operators.
Keep in mind that both params cannot be provided at the same time, we fail the request. It's either partitions or segments.
if (partitionGroupIdsToCommitStr == null && segmentsToCommitStr == null) { | ||
return allConsumingSegments; | ||
} | ||
Preconditions.checkState(partitionGroupIdsToCommitStr == null || segmentsToCommitStr == null, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(minor) Suggest moving this check to the first line of forceCommit()
and change it to checkArgument()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for making the changes
Currently, the force-commit operation is applied to all partitions within a table. However, in certain instances, problems may arise in only a specific subset of consuming segments. Addressing these issues requires force-committing the problematic partitions, but applying force-commit to the entire table affects all partitions. This is undesirable for tables with a large number of partitions. This PR introduces partition-level force-commit functionality, expanding the endpoint to accept a comma-separated list of partitions or consuming segment names.