Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bugfix] Run cold schema refresh thread periodically #16873

Merged
merged 10 commits into from
Aug 13, 2024

Conversation

findingrish
Copy link
Contributor

@findingrish findingrish commented Aug 9, 2024

Issue

Parent issue: #14989
Fixes a bug in #16676.
Schema from cold segments wasn't getting reflected in the datasource schema.
The problem turned out to be in the refresh thread, it ran only once instead of running periodically.

This PR also adds some metrics to monitor cold schema refresh process.

Testing

  1. Added unit test
  2. Verified in a Druid cluster that,
    • Completely cold datasource is queryable. Refer to cold datasource in the attached screen shot with 1 segment.

Screenshot 2024-08-09 at 5 40 33 PM

Screenshot 2024-08-09 at 5 39 39 PM

  • Datasource schema for partially cold datasource includes columns from cold segments. Refer to partially_cold datasource in the attached screen shot with 2 segments. The hot segment has columns c3, c4, c5 and cold segment has columns c1, c2, c3.

Screenshot 2024-08-09 at 5 40 05 PM

  • Schema for new cold segments created after enabling the CentralizedDatasourceSchema feature is included in datasource schema.
  • Entirely cold datasources which existed before enabling CentralizedDatasourceSchema will not be queryable until at least 1 segment is loaded after enabling the feature. The datasource remains queryable even if the segment is unloaded.

Release Notes

Added followings metrics to monitor cold schema refresh process.

metadatacache/cold/segment/count -> Number of cold segment per datasource. 
metadatacache/cold/refresh/count -> Number of cold segments with cached schema per datasource. 
metadatacache/cold/process/time -> Time taken in milliseconds to process cold segments. 

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

ServiceMetricEvent.Builder metricBuilder =
new ServiceMetricEvent.Builder().setDimension(DruidMetrics.DATASOURCE, dataSourceName);

emitter.emit(metricBuilder.setMetric("metadatacache/cold/segment/count", coldSegments));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Use a prefix for metadatacache/cold/segment/

@@ -382,6 +382,9 @@ These metrics are emitted by the Druid Coordinator in every run of the correspon
|`metadatacache/finalizedSchemaPayload/count`|Number of finalized segment schema cached.||Depends on the number of distinct schema in the cluster.|
|`metadatacache/temporaryMetadataQueryResults/count`|Number of segments for which schema was fetched by executing segment metadata query.||Eventually it should be 0.|
|`metadatacache/temporaryPublishedMetadataQueryResults/count`|Number of segments for which schema is cached after back filling in the database.||This value gets reset after each database poll. Eventually it should be 0.|
|`metadatacache/cold/segment/count`|Number of cold segments.|`dataSource`||
|`metadatacache/cold/refresh/count`|Number of cold segments with cached schema.|`dataSource`||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"cold segment" is not a term that exists anywhere outside of CoordinatorSegmentMetadataCache.
It would be better to use a term that is more easily relatable.

Also, the metadatacache metrics seems all over the place with their prefixes. It is unavoidable for metrics across multiple features to not have a consistent naming scheme. But metrics within the same feature should be consistent. Since this feature is still nascent, I would advise we revisit all the metric names for this feature and categorize them nicely with proper prefixes. (not necessarily in this PR)

cc: @cryptoe

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to use a term that is more easily relatable.

I don't find much references for such segments, do you have any suggestions?

Since this feature is still nascent, I would advise we revisit all the metric names for this feature and categorize them nicely with proper prefixes.

Are you suggesting to use a different prefix than metadatacache? How about cds?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes lets change the wording from cold to deepstorageonly.

@cryptoe cryptoe merged commit f67ff92 into apache:master Aug 13, 2024
90 checks passed
@kfaraz kfaraz added this to the 31.0.0 milestone Oct 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants