Add support for pushFileNamePattern in pushJobSpec #8191


kkrugler
Contributor

Description

Support a new, optional pushFileNamePattern parameter in the pushJobSpec section of the job YAML. When set, only the segments in the output directory whose file names match the pattern are pushed to Pinot.

Upgrade Notes

Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion)

  • Yes (Please label as backward-incompat, and complete the section below on Release Notes)

Does this PR fix a zero-downtime upgrade introduced earlier?

  • Yes (Please label this as backward-incompat, and complete the section below on Release Notes)

Does this PR otherwise need attention when creating release notes? Things to consider:

  • New configuration options
  • Deprecation of configurations
  • Signature changes to public methods/interfaces
  • New plugins added or old plugins removed
  • Yes (Please label this PR as release-notes and complete the section on Release Notes)

Release Notes

Support for name-based filtering of segments being pushed to the Pinot cluster.

Documentation

In the https://docs.pinot.apache.org/configuration-reference/job-specification#push-job-spec section, add:

pushFileNamePattern | Segment name pattern that selects which segments to push. Glob and regex patterns are supported. E.g.
'glob:stats_*' will push all segment files under the outputDirURI whose names start with 'stats_'.
'glob:*2022-01*' will push all segment files under the outputDirURI whose names contain '2022-01'.
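
To make the two example patterns concrete, here is a minimal sketch of how such patterns behave. It assumes the pattern is evaluated with the JDK's java.nio.file.PathMatcher, which is what the 'glob:' and 'regex:' prefixes suggest; this PR description does not state the implementation. The class name, method name, and segment file names are hypothetical and only serve the illustration.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Illustrative sketch only: class, method, and file names are hypothetical, not Pinot APIs.
class PushFileNamePatternDemo {

  // Returns true when fileName matches a "glob:..." or "regex:..." pattern.
  static boolean matches(String pattern, String fileName) {
    PathMatcher matcher = FileSystems.getDefault().getPathMatcher(pattern);
    return matcher.matches(Paths.get(fileName));
  }

  public static void main(String[] args) {
    // Names starting with "stats_" are selected by the first documented pattern.
    System.out.println(matches("glob:stats_*", "stats_2022-01-01.tar.gz"));    // true
    System.out.println(matches("glob:stats_*", "events_2022-01-15.tar.gz"));   // false

    // Names containing "2022-01" are selected by the second documented pattern.
    System.out.println(matches("glob:*2022-01*", "events_2022-01-15.tar.gz")); // true
    System.out.println(matches("glob:*2022-01*", "events_2022-02-01.tar.gz")); // false

    // The same selection written as a regex pattern.
    System.out.println(matches("regex:.*2022-01.*", "events_2022-01-15.tar.gz")); // true
  }
}
```

Note that these checks use bare file names. A pattern applied to a full path can behave differently, because a single '*' in a glob does not match across directory separators.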

@kkrugler
Contributor Author

See also #8141

@Jackie-Jiang added the release-notes label on Feb 14, 2022
@Jackie-Jiang merged commit f12e625 into apache:master on Feb 14, 2022
@kkrugler
Contributor Author

Thanks @Jackie-Jiang !

@kai11

kai11 commented Apr 5, 2023

"glob:*2023-04*" doesn't work; however, "glob:**2023-04*" does.
We use HDFS for deep storage, so segment names there are full paths like hdfs://hostname/pinot/pinot-segments/table/table_2023-04_00.tar.gz, not table_2023-04_00.tar.gz.
PS. This might be HDFS-specific.
PPS. PR to update documentation: pinot-contrib/pinot-docs#160
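
The difference between the two patterns can be reproduced outside Pinot. The sketch below assumes the pattern is matched against the whole path string using java.nio.file.PathMatcher on a Unix-like default file system; that is an assumption consistent with the behaviour reported above, not something confirmed in this thread. In JDK glob syntax, '*' does not cross directory boundaries, while '**' does. The class and method names are hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Illustrative sketch only: class and method names are hypothetical, not Pinot APIs.
// Assumes a Unix-like default file system, where '/' is the name separator.
class FullPathGlobDemo {

  // Returns true when the given path string matches the "glob:..." pattern.
  static boolean matches(String pattern, String path) {
    PathMatcher matcher = FileSystems.getDefault().getPathMatcher(pattern);
    return matcher.matches(Paths.get(path));
  }

  public static void main(String[] args) {
    String fullPath = "hdfs://hostname/pinot/pinot-segments/table/table_2023-04_00.tar.gz";
    String bareName = "table_2023-04_00.tar.gz";

    // '*' stops at directory separators, so it cannot span the leading directories.
    System.out.println(matches("glob:*2023-04*", fullPath));  // false

    // '**' crosses directory separators, so the full path matches.
    System.out.println(matches("glob:**2023-04*", fullPath)); // true

    // Against a bare file name, both patterns match.
    System.out.println(matches("glob:*2023-04*", bareName));  // true
    System.out.println(matches("glob:**2023-04*", bareName)); // true
  }
}
```

Under these assumptions, a pattern starting with '**' works whether the segment name is a bare file name or a full path.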
