Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Pinot storage metrics for compressed tar.gz and table size w/o replicas #8358

Merged
merged 1 commit into from
Mar 25, 2022

Conversation

timsants
Copy link
Contributor

@timsants timsants commented Mar 16, 2022

Description

New Pinot table/segment storage metrics:

  1. TableCompressedSize: For computing this metric, I added a new sizeInBytes field in SegmentZKMetadata. Aside for being useful for debugging, this allows us to quantify the size of compressed tar.gz files that is stored in deep store. This is different from the segment size metrics that is reported using the segment sizes taken from the Pinot servers.
  2. TableSizePerReplicaOnServer: This metric provides the table size per replica. This is calculated using he table size reported by servers divided by the number of replicas configured for the table.
  3. TableTotalSizeOnServer: This is the table size reported by servers for the table. This metric should be used instead of OfflineTableEstimatedSize. OfflineTableEstimatedSize is now deprecated.

Testing

  • Updated unit tests
  • Verified metrics are appearing on jconsole with expected byte counts

Upgrade Notes

Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion)

  • Yes (Please label as backward-incompat, and complete the section below on Release Notes)

Does this PR fix a zero-downtime upgrade introduced earlier?

  • Yes (Please label this as backward-incompat, and complete the section below on Release Notes)

Does this PR otherwise need attention when creating release notes? Things to consider:

  • New configuration options
  • Deprecation of configurations
  • Signature changes to public methods/interfaces
  • New plugins added or old plugins removed
  • Yes (Please label this PR as release-notes and complete the section on Release Notes)

Release Notes

  • A new field, segment.size.in.byte, was added to SegmentZKMetadata which is the size of the compressed segment tar.gz file.
  • New table size gauge metrics were added to the controller:
    1. TableCompressedSize: The total size in bytes of the compressed segments per table
    2. TableSizePerReplicaOnServer: The table size reported by servers divided by the number of replicas configured for the table.
    3. TableTotalSizeOnServer: The total table size reported by servers. This metric should be used instead of OfflineTableEstimatedSize. OfflineTableEstimatedSize is now deprecated.

@codecov-commenter
Copy link

codecov-commenter commented Mar 16, 2022

Codecov Report

Merging #8358 (737ce64) into master (4bb7e4f) will decrease coverage by 40.36%.
The diff coverage is 76.36%.

@@              Coverage Diff              @@
##             master    #8358       +/-   ##
=============================================
- Coverage     70.88%   30.52%   -40.37%     
=============================================
  Files          1655     1643       -12     
  Lines         86607    86285      -322     
  Branches      13064    13025       -39     
=============================================
- Hits          61389    26335    -35054     
- Misses        20982    57583    +36601     
+ Partials       4236     2367     -1869     
Flag Coverage Δ
integration1 28.70% <76.36%> (-0.01%) ⬇️
integration2 27.26% <67.27%> (-0.18%) ⬇️
unittests1 ?
unittests2 ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...apache/pinot/controller/BaseControllerStarter.java 77.28% <ø> (-5.37%) ⬇️
...va/org/apache/pinot/controller/ControllerConf.java 51.82% <ø> (-7.29%) ⬇️
...va/org/apache/pinot/spi/utils/CommonConstants.java 0.00% <ø> (-23.64%) ⬇️
...ntroller/helix/core/PinotHelixResourceManager.java 41.38% <50.00%> (-26.00%) ⬇️
...ces/PinotSegmentUploadDownloadRestletResource.java 52.28% <60.00%> (-7.81%) ⬇️
...e/pinot/controller/helix/SegmentStatusChecker.java 72.32% <75.00%> (-10.67%) ⬇️
.../apache/pinot/controller/util/TableSizeReader.java 97.16% <92.30%> (+1.01%) ⬆️
...not/common/metadata/segment/SegmentZKMetadata.java 82.31% <100.00%> (-2.41%) ⬇️
...g/apache/pinot/common/metrics/ControllerGauge.java 97.56% <100.00%> (+0.19%) ⬆️
...apache/pinot/controller/api/upload/ZKOperator.java 66.14% <100.00%> (-11.82%) ⬇️
... and 1137 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4bb7e4f...737ce64. Read the comment docs.

@mcvsubbu
Copy link
Contributor

What use case are we trying to solve here?
We need to be a bit careful while increasing zk metadata

@Jackie-Jiang
Copy link
Contributor

@mcvsubbu segment size can be very useful debugging info (we keep total docs in segment metadata for similar purpose). Currently we only have APIs to read segment size on the servers (uncompressed + extra indexes), but no APIs to read the compressed segment size. The overhead of adding one entry into the segment metadata should be minimal

@mcvsubbu
Copy link
Contributor

@mcvsubbu segment size can be very useful debugging info (we keep total docs in segment metadata for similar purpose). Currently we only have APIs to read segment size on the servers (uncompressed + extra indexes), but no APIs to read the compressed segment size. The overhead of adding one entry into the segment metadata should be minimal

I agree that one extra item is minimal in overhead. We add one at a time, though, and we need to be aware that we are expanding it. It is good to know the use case.

If this is only for debugging purpose, we can always query the store to find the (compressed) size of the segment. I don't see the need for adding a metric in that case.

Am I missing anything?

@Jackie-Jiang
Copy link
Contributor

@mcvsubbu Besides the debugging purpose, we are also planning to add a gauge for the total compressed data size for each table using the size info in the segment ZK metadata. Reading the size info for one segment through rest API is fine, but querying the size info for the whole table requires sending lots of requests (one per segment) to the deep store, which can potentially cause problems. The gauge metric will be integrated into an existing periodic task to avoid the overhead of reading all segments ZK metadata.

@timsants timsants force-pushed the new_size_metrics branch 2 times, most recently from 7b0e7ac to 530de35 Compare March 18, 2022 16:41
@timsants timsants marked this pull request as ready for review March 18, 2022 16:45
Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already read the segment ZK metadata in SegmentStatusChecker and emit metrics for table/segment status. Let's add the new logic into that class, and we can remove the config as it does not add extra overhead

@timsants timsants force-pushed the new_size_metrics branch 3 times, most recently from 32f2f64 to 2c891f7 Compare March 24, 2022 23:18
@timsants timsants force-pushed the new_size_metrics branch 2 times, most recently from ca2b6e0 to 66d61a4 Compare March 24, 2022 23:39
@Jackie-Jiang Jackie-Jiang added the release-notes Referenced by PRs that need attention when compiling the next release notes label Mar 25, 2022
@Jackie-Jiang Jackie-Jiang merged commit e02bdda into apache:master Mar 25, 2022
@Jackie-Jiang Jackie-Jiang deleted the new_size_metrics branch March 25, 2022 20:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-notes Referenced by PRs that need attention when compiling the next release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants