-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhance SegmentPartitionMetadataManager to handle new segment #11585
Enhance SegmentPartitionMetadataManager to handle new segment #11585
Conversation
Codecov Report
@@ Coverage Diff @@
## master #11585 +/- ##
============================================
- Coverage 63.04% 63.04% -0.01%
Complexity 1108 1108
============================================
Files 2326 2326
Lines 124882 124926 +44
Branches 19065 19075 +10
============================================
+ Hits 78731 78757 +26
- Misses 40544 40564 +20
+ Partials 5607 5605 -2
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 8 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
// If the new segment is not the first segment of a partition, add it only if it won't reduce the fully | ||
// replicated servers. It is common that a new segment (newly pushed, or a new consuming segment) doesn't have | ||
// all the replicas available yet, and we want to exclude it from the partition info until all the replicas | ||
// are available. | ||
//noinspection SlowListContainsAll |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this means any request with partitioned computation will miss a segment until it is fully replicated? what if a commit time of a consuming segment is on average 1hr and in that case:
- we wont be able to access consuming segment data until 1hr later when it commits + fully replicated to all replicas
- we will be able to access consuming segment as long as all replica servers starts consuming (but results will be inconsistent depending which partition replica we dispatch the request to
which of the above is true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Segment is count as new for up to 5 minutes after pushed/created, so committing segment won't be counted as new segment. New consuming segment is created after the previous consuming segment is committed.
if (partitionId == INVALID_PARTITION_ID) { | ||
segmentsWithInvalidPartition.add(segment); | ||
continue; | ||
} | ||
// Process new segments in the end | ||
if (InstanceSelector.isNewSegment(segmentInfo._pushTimeMs, currentTimeMs)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible that a new segment is with invalid partitionID? should we move this above the invalid id check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New segment should have valid partition id. Segment ZK metadata won't be changed for new segment within the life cycle
Similar to #10466, we need to be more graceful for new segment (newly pushed or new consuming segment) because it is common that they don't have all replicas available in external view.
This PR adds the new segment into the partition info only when all the replicas are available