Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allowing custom transformers to be passed to SegmentProcessingFramework #11887

Merged
merged 2 commits into from
Oct 31, 2023

Conversation

swaminathanmanish
Copy link
Contributor

@swaminathanmanish swaminathanmanish commented Oct 26, 2023

Problem:
We need the ability to pass in custom RecordTransformers that can be used during map transformation (segment generation). Without this, users have to introduce custom RecordReaders where custom transformations have to be done. With customer RecordReaders, we no longer have control over allocations/deallocations of RecordReaders (eg: users can pass in long list of pre-allocated Parquet recordReaders which can result in holding up large heap allocations during segment generation and even result in OOM). Moreover we also have a well defined RecordTransformer interface just to do this transformation.

This PR allows users to pass custom RecordTransformers to SegmentProcessorFramework.

@swaminathanmanish swaminathanmanish marked this pull request as draft October 26, 2023 23:19
Copy link
Contributor

@snleee snleee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM otherwise

@codecov-commenter
Copy link

codecov-commenter commented Oct 27, 2023

Codecov Report

Merging #11887 (d19c6af) into master (60edf49) will decrease coverage by 0.02%.
Report is 12 commits behind head on master.
The diff coverage is 76.92%.

@@             Coverage Diff              @@
##             master   #11887      +/-   ##
============================================
- Coverage     61.43%   61.42%   -0.02%     
- Complexity     1146     1147       +1     
============================================
  Files          2375     2376       +1     
  Lines        128511   128779     +268     
  Branches      19848    19906      +58     
============================================
+ Hits          78955    79102     +147     
- Misses        43859    43956      +97     
- Partials       5697     5721      +24     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 61.37% <76.92%> (+0.01%) ⬆️
java-21 61.30% <76.92%> (-0.01%) ⬇️
skip-bytebuffers-false 61.40% <76.92%> (-0.02%) ⬇️
skip-bytebuffers-true 61.27% <76.92%> (+<0.01%) ⬆️
temurin 61.42% <76.92%> (-0.02%) ⬇️
unittests 61.41% <76.92%> (-0.02%) ⬇️
unittests1 46.62% <53.84%> (-0.04%) ⬇️
unittests2 27.66% <23.07%> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
...rocessing/framework/SegmentProcessorFramework.java 89.89% <100.00%> (+0.20%) ⬆️
.../core/segment/processing/mapper/SegmentMapper.java 84.14% <100.00%> (+0.19%) ⬆️
...pinot/spi/data/readers/RecordReaderFileConfig.java 100.00% <ø> (ø)
.../local/recordtransformer/CompositeTransformer.java 68.75% <50.00%> (-15.87%) ⬇️

... and 25 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@swaminathanmanish swaminathanmanish marked this pull request as ready for review October 30, 2023 22:30
@swaminathanmanish swaminathanmanish changed the title (WIP) Allowing custom transformers to be passed to SegmentProcessingFramework Allowing custom transformers to be passed to SegmentProcessingFramework Oct 30, 2023
@snleee snleee merged commit 7b0eb29 into apache:master Oct 31, 2023
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants