Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add partition level Force Commit #12088

Merged

Conversation

sajjad-moradi
Copy link
Contributor

@sajjad-moradi sajjad-moradi commented Dec 4, 2023

Currently, the force-commit operation is applied to all partitions within a table. However, in certain instances, problems may arise in only a specific subset of consuming segments. Addressing these issues requires force-committing the problematic partitions, but applying force-commit to the entire table affects all partitions. This is undesirable for tables with a large number of partitions. This PR introduces partition-level force-commit functionality, expanding the endpoint to accept a comma-separated list of partitions or consuming segment names.

@codecov-commenter
Copy link

codecov-commenter commented Dec 4, 2023

Codecov Report

Attention: 16 lines in your changes are missing coverage. Please review.

Comparison is base (d7c76b9) 61.62% compared to head (7e00e40) 61.56%.
Report is 60 commits behind head on master.

Files Patch % Lines
...ller/api/resources/PinotRealtimeTableResource.java 0.00% 8 Missing ⚠️
.../core/realtime/PinotLLCRealtimeSegmentManager.java 65.21% 5 Missing and 3 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #12088      +/-   ##
============================================
- Coverage     61.62%   61.56%   -0.07%     
  Complexity     1152     1152              
============================================
  Files          2389     2407      +18     
  Lines        129824   130957    +1133     
  Branches      20083    20235     +152     
============================================
+ Hits          80009    80625     +616     
- Misses        43989    44440     +451     
- Partials       5826     5892      +66     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 61.52% <48.38%> (-0.08%) ⬇️
java-21 61.42% <48.38%> (-0.07%) ⬇️
skip-bytebuffers-false 61.55% <48.38%> (-0.07%) ⬇️
skip-bytebuffers-true 61.39% <48.38%> (-0.08%) ⬇️
temurin 61.56% <48.38%> (-0.07%) ⬇️
unittests 61.56% <48.38%> (-0.07%) ⬇️
unittests1 46.62% <ø> (-0.28%) ⬇️
unittests2 27.67% <48.38%> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@mcvsubbu mcvsubbu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments, otherwise lgtm

* @param tableNameWithType table name with type
* @param partitionsToCommit comma separated list of partitions to commit
* @param segmentsToCommit comma separated list of consuming segments to commit
* @return the set of consuming segments that were committed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really don't wait for them to commit, so perhaps we should say something like:
"set of segments for which commit was initiated"

* Commit all partitions unless either partitionsToCommit or segmentsToCommit are provided.
*
* @param tableNameWithType table name with type
* @param partitionsToCommit comma separated list of partitions to commit
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be consistent, should it be "partitionGroupIdsToCommit" instead of "partitionsToCommit"?

@ApiParam(value = "Name of the table", required = true) @PathParam("tableName") String tableName,
@ApiParam(value = "Comma separated list of partition group IDs to be committed") @QueryParam("partitions")
String partitionGroupIds,
@ApiParam(value = "Comma separated list of consuming segments to be committed") @QueryParam("segments")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let us modify this documentation as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the variableName in the method as well as the documentation, but for the actual API's query param, I believe the shorter forms - partitions and segments - are better as they make the endpoint shorter:

POST /tables/{tableName}/forceCommit?segments=s1,s2

instead of

POST /tables/{tableName}/forceCommit?segmentsToCommit=s1,s2

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait. Why do we have both partitiongroupIds as well as consuming segments? There is only one consuming segment per partition, so it is just enough to provide the segment names right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having the option to provide either makes it easier for operators.
Keep in mind that both params cannot be provided at the same time, we fail the request. It's either partitions or segments.

@Jackie-Jiang Jackie-Jiang added release-notes Referenced by PRs that need attention when compiling the next release notes documentation labels Dec 5, 2023
if (partitionGroupIdsToCommitStr == null && segmentsToCommitStr == null) {
return allConsumingSegments;
}
Preconditions.checkState(partitionGroupIdsToCommitStr == null || segmentsToCommitStr == null,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(minor) Suggest moving this check to the first line of forceCommit() and change it to checkArgument()

Copy link
Contributor

@mcvsubbu mcvsubbu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for making the changes

@sajjad-moradi sajjad-moradi merged commit 9d939a0 into apache:master Dec 21, 2023
18 of 19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation feature release-notes Referenced by PRs that need attention when compiling the next release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants