-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upgrade Avro dependency to 1.10.2 #11698
Upgrade Avro dependency to 1.10.2 #11698
Conversation
Codecov Report
@@ Coverage Diff @@
## master #11698 +/- ##
============================================
- Coverage 63.04% 63.03% -0.01%
+ Complexity 1120 1117 -3
============================================
Files 2343 2342 -1
Lines 125676 125780 +104
Branches 19314 19335 +21
============================================
+ Hits 79230 79290 +60
- Misses 40787 40847 +60
+ Partials 5659 5643 -16
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 27 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. The comment is optional
@@ -74,7 +74,7 @@ public GenericRow extract(GenericRecord from, GenericRow to) { | |||
} | |||
} else { | |||
for (String fieldName : _fields) { | |||
Object value = from.get(fieldName); | |||
Object value = from.hasField(fieldName) ? from.get(fieldName) : null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will introduce one extra map lookup per field. If we want to avoid this, we can change the T
to GenericData.Record
and make a single lookup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is true that this introduces an extra lookup. In Venice, we have maintained the single lookup like this: https://github.com/linkedin/venice/blob/main/internal/venice-client-common/src/main/java/com/linkedin/venice/compute/ComputeUtils.java#L382
In the context of the Pinot code here, the solution could look like this:
Field field = from.getSchema().getField(fieldName);
Object value = field == null ? null : from.get(field.pos());
This works on GenericRecord
. No need to cast or change the type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@FelixGV I don't see an API in GenericRecord
to lookup an index. Do you use an even higher version Avro?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry my bad. It is in the IndexedRecord
. Yes this is the preferred solution!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although hashmap lookup is O(1), using field.pos() is definitely faster and avoids that lookup. Updated the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can change the T to GenericData.Record and make a single lookup
@Jackie-Jiang I didn't quite get this. Can you elaborate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can change the T to GenericData.Record and make a single lookup
@Jackie-Jiang I didn't quite get this. Can you elaborate?
This is no longer relevant. I missed the fact that GenericRecord
has an API to lookup an index
Upgraded
Avro
version from1.9.2
to1.10.2
. In our installation, the newer version gets pulled transitively. Although major version is the same, and there's no interface changes, but the way OSS Pinot handles Avro fields of type map doesn't work with 1.10.2. In Pinot, for map fields,fieldName__KEYS
andfieldName__VALUES
are dynamically added to the avro record. Since they are not defined in the schema, in avro 1.10.2, a runtime exception gets thrown when record.get(fieldName) is called. To fix the issue, I simply added a check to see if the schema has that field, and if not, we set value as null.