Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add memory optimized dimension table #9802

Merged
merged 9 commits into from
Dec 2, 2022

Conversation

KKcorps
Copy link
Contributor

@KKcorps KKcorps commented Nov 15, 2022

Currently, we store the whole row in the Dimension table hash map for fast lookups.
However, in some cases, we can trade off the speed for memory efficiency.
In this PR, I am adding a new implementation that only stores the segment reference and docID, and the actual value is read at the time of lookup.

I have also created another design where I am using different DataManager implementations instead of DataTable implementations. It touches a lot of code paths though hence I didn't raise PR for that approach. The code can be seen here
https://github.com/KKcorps/incubator-pinot/pull/23/files

Release Notes

Disable preloading while using Dimension tables to save memory

Documentation

add the following config in tableConfig json to disable preloading

{
      "dimensionTableConfig": {
        "disablePreload": true
      }
}

@KKcorps KKcorps changed the title Add memomry optimized dimension table Add memory optimized dimension table Nov 15, 2022
@codecov-commenter
Copy link

codecov-commenter commented Nov 15, 2022

Codecov Report

Merging #9802 (d157124) into master (cd1a017) will decrease coverage by 48.31%.
The diff coverage is 0.00%.

@@              Coverage Diff              @@
##             master    #9802       +/-   ##
=============================================
- Coverage     64.11%   15.80%   -48.32%     
+ Complexity     5385      175     -5210     
=============================================
  Files          1903     1923       +20     
  Lines        102554   103482      +928     
  Branches      15604    15758      +154     
=============================================
- Hits          65753    16354    -49399     
- Misses        32017    85936    +53919     
+ Partials       4784     1192     -3592     
Flag Coverage Δ
unittests1 ?
unittests2 15.80% <0.00%> (+0.11%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...he/pinot/common/utils/config/TableConfigUtils.java 0.00% <0.00%> (-76.69%) ⬇️
...ata/manager/offline/DimensionTableDataManager.java 0.00% <0.00%> (-77.78%) ⬇️
...data/manager/offline/FastLookupDimensionTable.java 0.00% <0.00%> (ø)
...ore/data/manager/offline/LookupRecordLocation.java 0.00% <0.00%> (ø)
...manager/offline/MemoryOptimizedDimensionTable.java 0.00% <0.00%> (ø)
...e/pinot/spi/config/table/DimensionTableConfig.java 0.00% <0.00%> (ø)
...org/apache/pinot/spi/config/table/TableConfig.java 0.00% <0.00%> (-63.53%) ⬇️
...he/pinot/spi/utils/builder/TableConfigBuilder.java 0.00% <0.00%> (-83.56%) ⬇️
...src/main/java/org/apache/pinot/sql/FilterKind.java 0.00% <0.00%> (-100.00%) ⬇️
...ain/java/org/apache/pinot/core/data/table/Key.java 0.00% <0.00%> (-100.00%) ⬇️
... and 1281 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we add a test for it?

@Jackie-Jiang Jackie-Jiang added feature Configuration Config changes (addition/deletion/change in behavior) release-notes Referenced by PRs that need attention when compiling the next release notes labels Nov 21, 2022
Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Please reformat the changes

@KKcorps KKcorps merged commit 4235420 into apache:master Dec 2, 2022
if (lookupRecordLocation == null) {
return null;
}
return lookupRecordLocation.getRecord(_reuse);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this thread safe? _reuse being shared across .get calls?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to clear the _reuse before passing it to .getRecord.
That's what we do in other places in the code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get can be called concurrently. Think this should be ThreadLocal<GenericRow> _reuse instead

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I think this is a bug

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Configuration Config changes (addition/deletion/change in behavior) feature release-notes Referenced by PRs that need attention when compiling the next release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants