Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding ability to fetch metadata for specific list of segments #11949

Merged
merged 1 commit into from
Nov 9, 2023

Conversation

swaminathanmanish
Copy link
Contributor

@swaminathanmanish swaminathanmanish commented Nov 3, 2023

Problem: Getting metadata for all segments leads to high heap utilization and even controller OOMs (especially when we have large number of segments and small controller heap). Typically we dont need metadata for all segments at once and this can be done in batches.
Solution:
Adding an API to fetch metadata for a list of segments. Also unified all APIs to use the same internal function. Did not modify existing APIs for backwards compatiblity.

Verified that existing functionality/apis work as expected
curl -X 'GET'
'http://localhost:9000/segments/airlineStats/metadata?type=OFFLINE'
-H 'accept: application/json'

@swaminathanmanish
Copy link
Contributor Author

cc @snleee - Please review

@codecov-commenter
Copy link

codecov-commenter commented Nov 3, 2023

Codecov Report

Merging #11949 (4ee5788) into master (53e8033) will increase coverage by 0.14%.
Report is 15 commits behind head on master.
The diff coverage is 0.00%.

@@             Coverage Diff              @@
##             master   #11949      +/-   ##
============================================
+ Coverage     61.30%   61.45%   +0.14%     
+ Complexity     1140      207     -933     
============================================
  Files          2378     2385       +7     
  Lines        128865   129157     +292     
  Branches      19927    19997      +70     
============================================
+ Hits          78998    79370     +372     
+ Misses        44147    44027     -120     
- Partials       5720     5760      +40     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (?)
java-11 61.39% <0.00%> (+61.39%) ⬆️
java-21 61.29% <0.00%> (-0.02%) ⬇️
skip-bytebuffers-false 61.42% <0.00%> (+0.14%) ⬆️
skip-bytebuffers-true 61.27% <0.00%> (+<0.01%) ⬆️
temurin 61.45% <0.00%> (+0.14%) ⬆️
unittests 61.44% <0.00%> (+0.14%) ⬆️
unittests1 46.66% <ø> (+0.19%) ⬆️
unittests2 27.64% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
...che/pinot/controller/util/TableMetadataReader.java 0.00% <0.00%> (ø)

... and 77 files with indirect coverage changes

📣 Codecov offers a browser extension for seamless coverage viewing on GitHub. Try it in Chrome or Firefox today!

Copy link
Contributor

@snleee snleee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@swaminathanmanish swaminathanmanish force-pushed the segmentMetadataBatch branch 2 times, most recently from d03d917 to a6c8a4d Compare November 7, 2023 17:39
@snleee snleee merged commit aac4898 into apache:master Nov 9, 2023
19 checks passed
Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late review. I don't see the rest API change in this PR. Do we plan to add it in a separate PR?

*/
public JsonNode getSegmentsMetadata(String tableNameWithType, List<String> columns, int timeoutMs)
public JsonNode getSegmentsMetadata(String tableNameWithType, List<String> columns, Set<String> segmentsToInclude,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add @Nullable to segmentsToInclude

}

private JsonNode getSegmentsMetadataInternal(String tableNameWithType, List<String> columns,
Set<String> segmentsToInclude, int timeoutMs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add @Nullable to segmentsToInclude

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants