Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add LazyRow abstraction for previously indexed record #11826

Merged
merged 1 commit into from
Oct 26, 2023

Conversation

rohityadav1993
Copy link
Contributor

@rohityadav1993 rohityadav1993 commented Oct 17, 2023

refactor

Based on offline discussion for issue: #11174 and previous draft PR: #11584
Design doc: https://docs.google.com/document/d/1bBTCYZFP2stvzc6xZUOEh-XweVgC9WfD7uiSPbKtaZY/edit

There is a need for abstracting previously presisted row as an object to enable mergers between previous and current row which can return multiple merged column values in a single method.

This is a pre-requisite refactoring is needed PartialUpsertMerger contract to updated as void merge(LazyRow previousRow, GenericRow currentRow, Map<String, Object> mergedValues)

The current change includes:

  1. New class LazyRow wrapping IndexSegment and docId and an internal map to cache column values to avoid re-lookups.
  2. Defining field LazyRow _reusePreviousRow in BasePartitionUpsertMetadataManager to be reused per consuming partition.
  3. Initiliasing _reusePreviousRow with previous row's indexSegment and docId.

Local test setup:

  1. QuickStart partial upsert rsvp example.
  2. UpsertConfig:
"upsertConfig": {
      "hashFunction": "NONE",
      "partialUpsertStrategies": {
        "rsvp_count": "INCREMENT",
        "group_name": "UNION",
        "venue_name": "APPEND"
      },
      "enableSnapshot": false,
      "metadataTTL": 0,
      "enablePreload": false,
      "mode": "PARTIAL",
      "defaultPartialUpsertStrategy": "OVERWRITE"
    }
  1. Results:
Query: select * from upsertPartialMeetupRsvp where event_id = 6 order by $segmentName, $docId asc limit 10 option(skipUpsert=true)

Result:
Pinot_Data Explorer (2).csv
The rows as per the upsert config get merged, result is sorted in ascending order of insertion.

@rohityadav1993 rohityadav1993 changed the title add LazyRow abstraction for previously indexed record [Draft] add LazyRow abstraction for previously indexed record Oct 17, 2023
@rohityadav1993 rohityadav1993 marked this pull request as ready for review October 18, 2023 13:12
@rohityadav1993 rohityadav1993 changed the title [Draft] add LazyRow abstraction for previously indexed record add LazyRow abstraction for previously indexed record Oct 18, 2023
Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with minor comments

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please take a look at the test failures

@codecov-commenter
Copy link

codecov-commenter commented Oct 26, 2023

Codecov Report

Merging #11826 (b36d656) into master (c3f7b6d) will decrease coverage by 0.09%.
The diff coverage is 80.00%.

@@             Coverage Diff              @@
##             master   #11826      +/-   ##
============================================
- Coverage     61.35%   61.27%   -0.09%     
+ Complexity     1146     1145       -1     
============================================
  Files          2373     2374       +1     
  Lines        128276   128297      +21     
  Branches      19803    19805       +2     
============================================
- Hits          78706    78608      -98     
- Misses        43872    44005     +133     
+ Partials       5698     5684      -14     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (ø)
integration <0.01% <0.00%> (ø)
integration1 <0.01% <0.00%> (ø)
integration2 0.00% <0.00%> (ø)
java-11 ?
java-21 61.27% <80.00%> (+33.68%) ⬆️
skip-bytebuffers-false 61.24% <80.00%> (-0.12%) ⬇️
skip-bytebuffers-true 34.69% <0.00%> (+34.69%) ⬆️
temurin 61.27% <80.00%> (-0.09%) ⬇️
unittests 61.26% <80.00%> (-0.09%) ⬇️
unittests1 46.48% <0.00%> (-0.11%) ⬇️
unittests2 27.56% <80.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
...t/ConcurrentMapPartitionUpsertMetadataManager.java 85.41% <100.00%> (+0.20%) ⬆️
...not/segment/local/upsert/PartialUpsertHandler.java 100.00% <100.00%> (+16.21%) ⬆️
...e/pinot/segment/local/segment/readers/LazyRow.java 70.00% <70.00%> (ø)

... and 31 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@rohityadav1993
Copy link
Contributor Author

Please take a look at the test failures
The failures in UT were due to build failure due to missing artifact pinot-flink-connector in artifactory. Couldn't copy the previous run's log.

@Jackie-Jiang Jackie-Jiang merged commit 736829c into apache:master Oct 26, 2023
18 of 19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants