-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Periodically delete Tmp Segment file brought by the SplitCommit End phase #10815
Periodically delete Tmp Segment file brought by the SplitCommit End phase #10815
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ankitsultana Please check my self-review. Wanted to solicit your idea.
...r/src/main/java/org/apache/pinot/controller/validation/RealtimeSegmentValidationManager.java
Outdated
Show resolved
Hide resolved
@@ -74,7 +73,7 @@ public URI uploadSegment(File segmentFile, LLCSegmentName segmentName, int timeo | |||
final String rawTableName = TableNameBuilder.extractRawTableName(segmentName.getTableName()); | |||
Callable<URI> uploadTask = () -> { | |||
URI destUri = new URI(StringUtil.join(File.separator, _segmentStoreUriStr, segmentName.getTableName(), | |||
segmentName.getSegmentName() + UUID.randomUUID().toString())); | |||
SegmentCompletionUtils.generateSegmentFileName(segmentName.getSegmentName()))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Server2ControllerSegmentUploader and PinotFSSegmentUploader has different naming pattern for tmp segments before.
Server2ControllerSegmentUploader: segment_name.tmp.
PinotFSSegmentUploader: segment_name.
...t-core/src/main/java/org/apache/pinot/core/data/manager/realtime/SegmentCompletionUtils.java
Outdated
Show resolved
Hide resolved
@wirybeaver Let's follow-up offline |
Do we need to clean up tmp files periodically, or just when the controller starts? Will controller create extra tmp files during the regular life cycle without shut down? |
Hi Jackie, there are two kind of segment uploader during SegmentUpload phase for split commit.
In the CommitEnd phase, the deep storage can have outage due to flaky network when controller attempt to rename the tmp file. If peer downloading is enabled, controller will continue to commit the segment in ZK after timeout. The tmp file named with unique |
To add to @wirybeaver 's response, we will need to add a periodic job for this clean-up. This PR is corresponding to this doc we had written a while back: https://docs.google.com/document/u/1/d/18PqeN7eUP9Yc2MqwihSrGMmadJ3nuDsrcWf6F_ks_hQ/edit In our internal clusters, we found that 50% of all deep-store data for realtime tables was just leaked data. |
Codecov Report
@@ Coverage Diff @@
## master #10815 +/- ##
============================================
+ Coverage 61.52% 63.07% +1.55%
+ Complexity 6513 1106 -5407
============================================
Files 2233 2326 +93
Lines 120083 125049 +4966
Branches 18223 19094 +871
============================================
+ Hits 73878 78873 +4995
+ Misses 40798 40563 -235
- Partials 5407 5613 +206
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 523 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
...ain/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java
Outdated
Show resolved
Hide resolved
...troller/src/main/java/org/apache/pinot/controller/validation/RealtimeTempSegmentCleaner.java
Outdated
Show resolved
Hide resolved
pinot-common/src/main/java/org/apache/pinot/common/metrics/ValidationMetrics.java
Outdated
Show resolved
Hide resolved
pinot-common/src/main/java/org/apache/pinot/common/metrics/ValidationMetrics.java
Outdated
Show resolved
Hide resolved
pinot-controller/src/main/java/org/apache/pinot/controller/ControllerConf.java
Outdated
Show resolved
Hide resolved
pinot-controller/src/main/java/org/apache/pinot/controller/BaseControllerStarter.java
Outdated
Show resolved
Hide resolved
pinot-controller/src/main/java/org/apache/pinot/controller/ControllerConf.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some high level questions:
- Should we always rely on the async cleanup and remove the sync cleanup (assuming listing files is expensive)? I can see we can integrate more clean up logic into this periodic task in the future. Also, let's make the periodic task name more generic for future prove
- Do we create temp file for non-split-commit scenario?
For question 1, I already ensured the sync cleanup will be skipped if async deletion is enabled.
Rename it as TmpSegmentCleaner? From my point of view, have a dedicated class for a specific job is more modular and gets better parallel. Put multiple tasks into the same Thread.run() code will result a lot of if/else, try/catch logic. For example, the failure of task1 should not block task2. For question 2, I need to inspect the source code. |
What I meant is to always enable this periodic task and remove the sync cleanup logic. |
I tried to set the default value to true but a bunch of unit test failure. directly set to true seems to be incompatible.
In the initial version, I append the clean up task at the end of validation manager. I am good to rollback. |
Have an manual integration test with Uber internal pinotFS and the log shows that the tmp file was deleted successfully.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing all the feedback.
List<SegmentZKMetadata> segmentsZKMetadata = _pinotHelixResourceManager.getSegmentsZKMetadata(realtimeTableName); | ||
|
||
// Delete tmp segments | ||
try { | ||
long numDeleteTmpSegments = _llcRealtimeSegmentManager.deleteTmpSegments(realtimeTableName, segmentsZKMetadata); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: one alternative for this may be to check isTmpSegmentAsyncDeletionEnabled
outside deleteTmpSegments
and avoid calling deleteTmpSegments
altogether if the feature is not enabled. Also deleteTmpSegments
could add a Precondition(isTmpSegmentAsyncDeletionEnabled(), "called even though not enabled")
. Though this also works so feel free to ignore.
...ain/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java
Show resolved
Hide resolved
...ain/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java
Show resolved
Hide resolved
...ain/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java
Show resolved
Hide resolved
...t-core/src/main/java/org/apache/pinot/core/data/manager/realtime/SegmentCompletionUtils.java
Outdated
Show resolved
Hide resolved
pinot-controller/src/main/java/org/apache/pinot/controller/ControllerConf.java
Outdated
Show resolved
Hide resolved
pinot-controller/src/main/java/org/apache/pinot/controller/ControllerConf.java
Outdated
Show resolved
Hide resolved
...r/src/main/java/org/apache/pinot/controller/validation/RealtimeSegmentValidationManager.java
Show resolved
Hide resolved
...er/src/main/java/org/apache/pinot/controller/api/resources/LLCSegmentCompletionHandlers.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM otherwise
@@ -239,6 +239,14 @@ public void cleanupSegmentCountGauge(final String resource) { | |||
removeGauge(resource, ValidationMetricName.SEGMENT_COUNT); | |||
} | |||
|
|||
public void updateTmpSegmentCountGauge(final String resource, final long tmpSegmentCount) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we define it as a gauge? Should it be a meter instead?
Let's match the name of the method and the metric
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public static <T> PinotGauge<T> makeGauge(PinotMetricsRegistry registry, PinotMetricName name, PinotGauge<T> gauge) {
return registry.newGauge(name, gauge);
}
public static PinotMeter makePinotMeter(PinotMetricsRegistry registry, PinotMetricName name, String eventType,
TimeUnit unit) {
return registry.newMeter(name, eventType, unit);
}
Meter metrics would output .m1, .m5, .m15 I am not whether we need those statics. Existing count related metrics of ValidationMetrics also use gauge
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gauge means refreshing the value, so that can be used to for total docs, total segments etc, but not for incremental values such as segments deleted in this round
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Appreciate for sharing the knowledge. I already replaced gauge with meter and add removeTableMeter method into AbstractMetrics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Jackie-Jiang Could you have another review?
pinot-controller/src/main/java/org/apache/pinot/controller/ControllerConf.java
Outdated
Show resolved
Hide resolved
pinot-controller/src/main/java/org/apache/pinot/controller/ControllerConf.java
Outdated
Show resolved
Hide resolved
...r/src/main/java/org/apache/pinot/controller/validation/RealtimeSegmentValidationManager.java
Outdated
Show resolved
Hide resolved
...r/src/main/java/org/apache/pinot/controller/validation/RealtimeSegmentValidationManager.java
Outdated
Show resolved
Hide resolved
Co-authored-by: Xiaotian (Jackie) Jiang <17555551+Jackie-Jiang@users.noreply.github.com>
5ff96a7
to
cef991f
Compare
&& getCurrentTimeMs() - lastModified > _controllerConf.getTmpSegmentRetentionInSeconds() * 1000L; | ||
} | ||
|
||
private boolean isLowLevelConsumer(String tableNameWithType, TableConfig tableConfig) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this isLowLevelConsumer
is deprecated.
issue #10243
Unify the tmp file naming format
In the 2nd phase (Segment Upload) of segment split commit, Server2ControllerSegmentUploader and PinotFSSegmentUploader diverges on the naming pattern of temporary segment file, which are
<segmentName>.tmp.<UUID>
and<segmentName><UUID>
respectively. In the 3rd phase (CommitEnd), the controller moves the temporary file to permanent file and then synchronously delete other potential temporary files that were incidentally generated due to the flaky network. However, the deletion currently only filters tmp files matches Server2ControllerSegmentUploader's pattern. Thus, we need to unify the tmp file pattern for both uploaders.Periodic File Deletion
Add a clusters level knob to turn on async tmp file deletion. If it's turned on, the time consuming FS.listFiles() won't be invoked and thus increase the throughput of realtime consuming.
Deletion files only if the following conditions are meet:
(1) the name ends with
.tmp.<UUID>
(2) out of the configurable retention period, e.g. 10 min. Don't want to make the retention too short or zero. Because it's likely that the current consuming segment is being committed by the lead server and the lead server will generate an intermediate tmp segment, which is a valid case.
(3) the file path is not in the set of segment metadata's download url
New config
controller.realtime.segment.tmpFileAsyncDeletionEnabled
(defaultfalse
)controller.realtime.segment.tmpFileRetentionInSeconds
(default3600
)