Work around the problem of Helix sending 2 transitions for CONSUMING -> DROPPED #12351
Conversation
LGTM
Submitted a Helix issue to verify the observation: apache/helix#2751
Codecov Report

```diff
@@             Coverage Diff              @@
##             master   #12351      +/-   ##
============================================
+ Coverage     61.69%   61.74%    +0.04%
  Complexity      207      207
============================================
  Files          2424     2424
  Lines        132340   132512      +172
  Branches      20436    20481       +45
============================================
+ Hits          81651    81820      +169
- Misses        44693    44694        +1
- Partials       5996     5998        +2
```
LGTM, thank you for catching this.
// Helix may send CONSUMING -> DROPPED as two separate transitions: CONSUMING -> OFFLINE followed
// by OFFLINE -> DROPPED. Use this cache to track the segments that just went through CONSUMING -> OFFLINE transition
// to detect CONSUMING -> DROPPED transition.
// TODO: Check how Helix handles CONSUMING -> DROPPED transition and remove this cache if it's not needed.
private final Cache<Pair<String, String>, Boolean> _recentlyOffloadedConsumingSegments =
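For readers following along, here is a minimal, self-contained sketch of the workaround pattern this declaration supports. It is not the actual Pinot code: the class name, method names, and the 1-minute expiry are illustrative assumptions, but it uses Guava's `Cache` the same way the field above does.

```java
import java.util.concurrent.TimeUnit;

import org.apache.commons.lang3.tuple.Pair;

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

// Sketch of the workaround: record CONSUMING -> OFFLINE offloads, then treat a
// quick OFFLINE -> DROPPED on the same segment as a CONSUMING -> DROPPED.
public class ConsumingOffloadTracker {
  // Key: (tableNameWithType, segmentName). The expiry is illustrative; entries
  // only need to survive the short gap between the two Helix transitions.
  private final Cache<Pair<String, String>, Boolean> _recentlyOffloadedConsumingSegments =
      CacheBuilder.newBuilder().expireAfterWrite(1, TimeUnit.MINUTES).build();

  // Invoked from the CONSUMING -> OFFLINE handler.
  public void recordConsumingToOffline(String tableNameWithType, String segmentName) {
    _recentlyOffloadedConsumingSegments.put(Pair.of(tableNameWithType, segmentName), Boolean.TRUE);
  }

  // Invoked from the OFFLINE -> DROPPED handler. Returns true when the drop is
  // really the second half of a CONSUMING -> DROPPED, consuming the cache entry
  // so the callback fires at most once.
  public boolean consumeIfRecentlyOffloaded(String tableNameWithType, String segmentName) {
    Pair<String, String> tableSegmentPair = Pair.of(tableNameWithType, segmentName);
    if (_recentlyOffloadedConsumingSegments.getIfPresent(tableSegmentPair) != null) {
      _recentlyOffloadedConsumingSegments.invalidate(tableSegmentPair);
      return true;
    }
    return false;
  }
}
```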
How about moving the cache into the tableDataMgr, considering the key of the map is a pair<tableName, segmentName> anyway?
I prefer keeping it here since it is just a workaround for Helix state transition handling. Once Helix fixes the problem, we should be able to remove it.
if (_recentlyOffloadedConsumingSegments.getIfPresent(tableSegmentPair) != null) {
  _recentlyOffloadedConsumingSegments.invalidate(tableSegmentPair);
  onConsumingToDropped(tableNameWithType, segmentName);
}
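One design note on this snippet: because the cache entry is invalidated before onConsumingToDropped() is invoked, the callback can fire at most once per recorded CONSUMING -> OFFLINE offload, even if duplicate DROPPED transitions were ever delivered for the same segment.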
This is not an actual problem, but it is a bit confusing. _recentlyOffloadedConsumingSegments is only populated for realtime tables, so why do we need to check if (TableNameBuilder.isRealtimeTableResource(tableNameWithType))? Is the reason to avoid creating the pair?
Yes, and to save a cache lookup. An OFFLINE table doesn't really have a CONSUMING state.
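To make the exchange above concrete, a hedged sketch of how the guard might wrap the lookup follows; the enclosing handler is assumed, not taken verbatim from the PR:

```java
// Sketch only: the surrounding method is assumed. The guard skips both the Pair
// allocation and the cache lookup for OFFLINE tables, which never have a
// CONSUMING state, so the cache can only contain realtime-table entries.
if (TableNameBuilder.isRealtimeTableResource(tableNameWithType)) {
  Pair<String, String> tableSegmentPair = Pair.of(tableNameWithType, segmentName);
  if (_recentlyOffloadedConsumingSegments.getIfPresent(tableSegmentPair) != null) {
    _recentlyOffloadedConsumingSegments.invalidate(tableSegmentPair);
    onConsumingToDropped(tableNameWithType, segmentName);
  }
}
```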
Helix might send 2 state transitions when a segment is dropped from a server. This behavior breaks the onConsumingToDropped() callback because it is only invoked when processing the CONSUMING -> DROPPED state transition. This PR adds a cache for segments that just went through the CONSUMING -> OFFLINE state transition to work around this problem.
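For readers skimming the thread, the two delivery patterns being distinguished are, illustratively:

```
Expected:  CONSUMING -> DROPPED                        (onConsumingToDropped() is invoked)
Observed:  CONSUMING -> OFFLINE, then OFFLINE -> DROPPED  (callback skipped without the workaround)
```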