Log a small subset of segments to refresh for debugging Coordinator refresh logic #16998

Merged
merged 3 commits into from
Sep 5, 2024

findingrish
Contributor

With CDS enabled, the Coordinator ideally shouldn't be issuing metadata queries to refresh segments.

This change logs a subset of segment ids to be refreshed to enable debugging.

@Override
void logSegmentsToRefresh(String dataSource, Set<SegmentId> ids)
{
  log.info("Refreshing segments [%s] for datasource [%s]", Iterables.limit(ids, 10), dataSource);
}
Contributor

This log line should mention that this is a sample of the overall set of segments to refresh.
Also, this would be per datasource, no? Should we just log 5 segments here?

Contributor Author

Yes, this is per datasource. Sure, I can make it log 5 segment ids.
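
To illustrate the two points agreed on above (a small per-datasource sample, and a message that says it is only a sample), here is a minimal sketch. It is not the merged code: the class `SegmentSampleLog`, the `SAMPLE_SIZE` constant, the message wording, and the use of plain `String` ids in place of `SegmentId` are all assumptions made so the example stays self-contained.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Druid: formats a bounded sample of segment ids for logging.
final class SegmentSampleLog
{
  // Assumed sample size, following the "log 5 segments" suggestion in the review.
  static final int SAMPLE_SIZE = 5;

  // Builds a message that reports the total count and states that only a sample is listed.
  // Plain String ids stand in for SegmentId here (an assumption of this sketch).
  static String format(String dataSource, Set<String> segmentIds)
  {
    List<String> sample = segmentIds.stream().limit(SAMPLE_SIZE).collect(Collectors.toList());
    return String.format(
        "Refreshing [%d] segments for datasource [%s]. A sample of at most [%d] ids: %s",
        segmentIds.size(), dataSource, SAMPLE_SIZE, sample
    );
  }
}
```

Reporting the total alongside the sample keeps the line short while still showing how large the refresh set actually is.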

Contributor

How frequently will this method be called?

Contributor Author

@findingrish findingrish Sep 4, 2024

In the worst case, this would be called every minute. Generally, there aren't segments to refresh in every cycle.
Also, as I mentioned in the description, we ideally expect this set to be empty on the Coordinator.

@@ -805,6 +808,11 @@ public Set<SegmentId> refreshSegmentsForDataSource(final String dataSource, fina
return retVal;
}

void logSegmentsToRefresh(String dataSource, Set<SegmentId> ids)
Contributor

@cryptoe cryptoe Sep 4, 2024

Could you add Javadoc to this method? Also, there should be an abstract method called logRefreshSegments that returns a boolean.
Each impl can then return true or false,
so CSMC can return true whereas the Broker one can return false,
and the log.info call lives in the abstract class itself.

Contributor Author

@findingrish findingrish Sep 4, 2024

What is the benefit of doing that? Also, is there a reason behind making it abstract?

Contributor

@cryptoe cryptoe Sep 4, 2024

So that the main logging logic does not leak out to the impls.
That way a rogue impl can never mess up the logs of the Coordinator/Broker. At least, that is what I was thinking.

Contributor Author

Okay, but is there a reason behind making it abstract and implementing it in both Broker and Coordinator?
The base method could return false and the overridden method in Coordinator could return true?
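
The variant discussed here (logging kept in the base class, gated by a boolean hook that defaults to false and is overridden in the Coordinator) could look roughly like the sketch below. All class names are hypothetical stand-ins for the real cache classes, `String` ids replace `SegmentId`, and an in-memory `logged` list replaces `log.info` so the example has no dependencies; only the hook name `logRefreshSegments` comes from the review comment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical stand-in for the shared base cache class.
abstract class SegmentCacheSketch
{
  // Assumed sample size.
  static final int SAMPLE_SIZE = 5;

  // Captured messages stand in for log.info calls in this sketch.
  final List<String> logged = new ArrayList<>();

  // Hook suggested in the review: the base returns false, implementations may opt in.
  boolean logRefreshSegments()
  {
    return false;
  }

  // The logging logic stays in the base class; it is final so subclasses cannot change it.
  final void logSegmentsToRefresh(String dataSource, Set<String> ids)
  {
    if (!logRefreshSegments()) {
      return;
    }
    String sample = ids.stream().limit(SAMPLE_SIZE).collect(Collectors.joining(", "));
    logged.add(String.format("Refreshing segments (sample) [%s] for datasource [%s]", sample, dataSource));
  }
}

// Hypothetical Coordinator-side cache: opts in to logging.
class CoordinatorCacheSketch extends SegmentCacheSketch
{
  @Override
  boolean logRefreshSegments()
  {
    return true;
  }
}

// Hypothetical Broker-side cache: inherits the default and never logs.
class BrokerCacheSketch extends SegmentCacheSketch
{
}
```

With this shape, an implementation can only switch the logging on or off; the message format and sample size are fixed in one place.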

Contributor Author

I feel that the implementation logic for logging the segments should reside in the child classes. This gives both the Broker and the Coordinator the flexibility to log a different number of segments, blacklist/whitelist datasources, etc.
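
For comparison, the alternative argued for in this comment (each child class owns its logging logic) might be sketched as follows. Everything here is illustrative: the class names, the `ignoredDataSources` filter, the configurable `sampleSize`, and the in-memory `logged` list are assumptions, not Druid APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical base class: the hook does nothing unless a child overrides it.
class OpenCacheSketch
{
  void logSegmentsToRefresh(String dataSource, Set<String> ids)
  {
    // No-op by default.
  }
}

// Hypothetical Coordinator-side cache that decides for itself what and how much to log.
class CoordinatorOwnLoggingSketch extends OpenCacheSketch
{
  // Captured messages stand in for log.info calls in this sketch.
  final List<String> logged = new ArrayList<>();

  // Assumed per-implementation settings: datasources to skip and the sample size.
  private final Set<String> ignoredDataSources;
  private final int sampleSize;

  CoordinatorOwnLoggingSketch(Set<String> ignoredDataSources, int sampleSize)
  {
    this.ignoredDataSources = ignoredDataSources;
    this.sampleSize = sampleSize;
  }

  @Override
  void logSegmentsToRefresh(String dataSource, Set<String> ids)
  {
    if (ignoredDataSources.contains(dataSource)) {
      return;
    }
    List<String> sample = ids.stream().limit(sampleSize).collect(Collectors.toList());
    logged.add(String.format("Refreshing segments (sample) %s for datasource [%s]", sample, dataSource));
  }
}
```

The trade-off is the one raised in the thread: this gives each implementation more freedom, at the cost of letting the log format diverge between implementations.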

Contributor

@cryptoe cryptoe Sep 4, 2024

> The base method could return false and the overridden method in Coordinator could return true?

Sure.

Contributor Author

> The base method could return false and the overridden method in Coordinator could return true?
>
> Sure.

I haven't made this change yet. Let me know what you think about this:

> I feel that the implementation logic for logging the segments should reside in the child classes. This gives both the Broker and the Coordinator the flexibility to log a different number of segments, blacklist/whitelist datasources, etc.

@cryptoe cryptoe merged commit 18a9a75 into apache:master Sep 5, 2024
89 of 90 checks passed
edgar2020 pushed a commit to edgar2020/druid that referenced this pull request Sep 5, 2024
…efresh logic (apache#16998)

* Log a small number of segments to refresh per datasource in the Coordinator

* review comments

* Update log message
@kfaraz kfaraz added this to the 31.0.0 milestone Oct 4, 2024
3 participants