Add an option for ingestion task to drop (mark unused) all existing segments that are contained by interval in the ingestionSpec #11025
@@ -67,6 +67,7 @@
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

@@ -481,6 +482,28 @@ public static Function<Set<DataSegment>, Set<DataSegment>> compactionStateAnnota
    }
  }

  public static Set<DataSegment> getUsedSegmentsWithinInterval(
      TaskToolbox toolbox,
      String dataSource,
      List<Interval> intervals
  ) throws IOException
  {
    Set<DataSegment> segmentsFoundForDrop = new HashSet<>();
    List<Interval> condensedIntervals = JodaUtils.condenseIntervals(intervals);
    if (!intervals.isEmpty()) {
      Collection<DataSegment> usedSegment = toolbox.getTaskActionClient().submit(new RetrieveUsedSegmentsAction(dataSource, null, condensedIntervals, Segments.ONLY_VISIBLE));
      for (DataSegment segment : usedSegment) {
        for (Interval interval : condensedIntervals) {
          if (interval.contains(segment.getInterval())) {
            segmentsFoundForDrop.add(segment);
Could add a
Good idea. Btw I think it should be a
Ah yes, you're right, should be a
            break;
          }
        }
      }
    }
    return segmentsFoundForDrop;
  }

  @Nullable
  static Granularity findGranularityFromSegments(List<DataSegment> segments)
  {
I would think that the segments returned from the RetrieveUsedSegmentsAction are guaranteed to be within the interval specified. Is that not the case? If so, do we need to check again that the segment is in the interval?
That is not the case. For example, suppose you have a segment with interval 2000-01-01/2001-01-01 and the interval specified is 2000-04-28/2000-04-29. If that segment has data for the day 2000-04-28/2000-04-29, it would be returned by RetrieveUsedSegmentsAction. However, we cannot drop this segment, since the interval specified is only 2000-04-28/2000-04-29. We can only drop segments that start and end within the interval specified.
Actually, let me make the doc a little clearer that we are only dropping segments that start and end within the interval specified in granularitySpec.
oh I see ok, makes sense.
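For readers following the thread, the difference between "overlaps the interval" (what the retrieval step can return) and "is contained by the interval" (what is safe to drop) can be sketched roughly as below. This is only an illustration, not the PR's code: `Span` is a hypothetical stand-in for Joda's `Interval` built on `java.time.LocalDate`, its `contains` is a simplified approximation of Joda's semantics, and `droppable` is a made-up helper name.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

public class IntervalContainmentSketch {
    /** Hypothetical half-open date range [start, end); a stand-in for a Joda Interval. */
    static final class Span {
        final LocalDate start;
        final LocalDate end;

        Span(String start, String end) {
            this.start = LocalDate.parse(start);
            this.end = LocalDate.parse(end);
        }

        /** True when the two ranges share at least one day. */
        boolean overlaps(Span other) {
            return start.isBefore(other.end) && other.start.isBefore(end);
        }

        /** True when the other range starts and ends within this one (simplified). */
        boolean contains(Span other) {
            return !other.start.isBefore(start) && !other.end.isAfter(end);
        }
    }

    /** Keeps only the segments that lie fully inside one of the requested ranges. */
    static List<Span> droppable(List<Span> requested, List<Span> segments) {
        List<Span> result = new ArrayList<>();
        for (Span segment : segments) {
            for (Span range : requested) {
                if (range.contains(segment)) {
                    result.add(segment);
                    break; // one containing range is enough; move to the next segment
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Span requested = new Span("2000-04-28", "2000-04-29");
        Span yearSegment = new Span("2000-01-01", "2001-01-01");
        Span daySegment = new Span("2000-04-28", "2000-04-29");

        System.out.println(requested.overlaps(yearSegment)); // true: would be retrieved
        System.out.println(requested.contains(yearSegment)); // false: must not be dropped
        System.out.println(droppable(List.of(requested), List.of(yearSegment, daySegment)).size()); // 1
    }
}
```

With the example from the thread, the year-long segment overlaps the requested day but is not contained by it, so only the day-sized segment would be selected.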