Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade Avro dependency to 1.10.2 #11698

Conversation

sajjad-moradi
Copy link
Contributor

@sajjad-moradi sajjad-moradi commented Sep 27, 2023

Upgraded Avro version from 1.9.2 to 1.10.2. In our installation, the newer version gets pulled transitively. Although major version is the same, and there's no interface changes, but the way OSS Pinot handles Avro fields of type map doesn't work with 1.10.2. In Pinot, for map fields, fieldName__KEYS and fieldName__VALUES are dynamically added to the avro record. Since they are not defined in the schema, in avro 1.10.2, a runtime exception gets thrown when record.get(fieldName) is called. To fix the issue, I simply added a check to see if the schema has that field, and if not, we set value as null.

@codecov-commenter
Copy link

codecov-commenter commented Sep 27, 2023

Codecov Report

Merging #11698 (b1df4af) into master (e686b6a) will decrease coverage by 0.01%.
Report is 11 commits behind head on master.
The diff coverage is 100.00%.

@@             Coverage Diff              @@
##             master   #11698      +/-   ##
============================================
- Coverage     63.04%   63.03%   -0.01%     
+ Complexity     1120     1117       -3     
============================================
  Files          2343     2342       -1     
  Lines        125676   125780     +104     
  Branches      19314    19335      +21     
============================================
+ Hits          79230    79290      +60     
- Misses        40787    40847      +60     
+ Partials       5659     5643      -16     
Flag Coverage Δ
integration <0.01% <0.00%> (ø)
integration1 <0.01% <0.00%> (ø)
integration2 0.00% <0.00%> (ø)
java-11 63.01% <100.00%> (+<0.01%) ⬆️
java-17 62.89% <100.00%> (+62.89%) ⬆️
java-20 62.89% <100.00%> (-0.02%) ⬇️
temurin 63.03% <100.00%> (-0.01%) ⬇️
unittests 63.03% <100.00%> (-0.01%) ⬇️
unittests1 67.18% <0.00%> (-0.01%) ⬇️
unittests2 14.43% <100.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
...t/plugin/inputformat/avro/AvroRecordExtractor.java 90.19% <100.00%> (+1.96%) ⬆️

... and 27 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. The comment is optional

@@ -74,7 +74,7 @@ public GenericRow extract(GenericRecord from, GenericRow to) {
}
} else {
for (String fieldName : _fields) {
Object value = from.get(fieldName);
Object value = from.hasField(fieldName) ? from.get(fieldName) : null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will introduce one extra map lookup per field. If we want to avoid this, we can change the T to GenericData.Record and make a single lookup

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is true that this introduces an extra lookup. In Venice, we have maintained the single lookup like this: https://github.com/linkedin/venice/blob/main/internal/venice-client-common/src/main/java/com/linkedin/venice/compute/ComputeUtils.java#L382

In the context of the Pinot code here, the solution could look like this:

Field field = from.getSchema().getField(fieldName);
Object value = field == null ? null : from.get(field.pos());

This works on GenericRecord. No need to cast or change the type.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@FelixGV I don't see an API in GenericRecord to lookup an index. Do you use an even higher version Avro?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry my bad. It is in the IndexedRecord. Yes this is the preferred solution!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although hashmap lookup is O(1), using field.pos() is definitely faster and avoids that lookup. Updated the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can change the T to GenericData.Record and make a single lookup

@Jackie-Jiang I didn't quite get this. Can you elaborate?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can change the T to GenericData.Record and make a single lookup

@Jackie-Jiang I didn't quite get this. Can you elaborate?

This is no longer relevant. I missed the fact that GenericRecord has an API to lookup an index

@sajjad-moradi sajjad-moradi merged commit 9ead3ff into apache:master Sep 28, 2023
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants