Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spark Connector, support for TIMESTAMP and BOOLEAN fields #8825

Merged
merged 5 commits into from
Jun 4, 2022

Conversation

cbalci
Copy link
Contributor

@cbalci cbalci commented Jun 3, 2022

Change
Spark Connector doesn't support TIMESTAMP and BOOLEAN field types, which were introduced to Pinot after the connector was added. I'm adding mapping for singular and array variations of these types.

New field type mappings:

Pinot Type Spark Type
TIMESTAMP LongType
TIMESTAMP_ARRAY ArrayType(LongType)
BOOLEAN BooleanType
BOOLEAN_ARRAY ArrayType(BooleanType)

Discussion
Spark also supports a TimestampType which is backed by "Long" and stores milliseconds since epoch as explained here. It could have been a better choice from Pinot TIMESTAMP field, however I had a hard time correctly translating the Pinot value to microseconds for all TIMESTAMP column. I'm open to suggestions here, would like to know if there is an easy way.

Testing

  • Unit tests are updated to cover new fields and their conversion.

Backwards Compatibility
No previous behavior is broken with the introduction of these fields. Previously the connector would throw an exception when it came across these unknown Pinot field types.

bugfix feature

@codecov-commenter
Copy link

codecov-commenter commented Jun 3, 2022

Codecov Report

Merging #8825 (16257ac) into master (8b2b8f5) will decrease coverage by 42.39%.
The diff coverage is 4.95%.

❗ Current head 16257ac differs from pull request most recent head a225165. Consider uploading reports for the commit a225165 to get more accurate results

@@              Coverage Diff              @@
##             master    #8825       +/-   ##
=============================================
- Coverage     68.15%   25.75%   -42.40%     
+ Complexity     4626       45     -4581     
=============================================
  Files          1735     1729        -6     
  Lines         91298    91124      -174     
  Branches      13636    13636               
=============================================
- Hits          62226    23473    -38753     
- Misses        24733    65384    +40651     
+ Partials       4339     2267     -2072     
Flag Coverage Δ
integration2 25.75% <4.95%> (+0.20%) ⬆️
unittests1 ?
unittests2 ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...e/pinot/common/utils/FileUploadDownloadClient.java 38.85% <ø> (-10.29%) ⬇️
...ent/local/dedup/PartitionDedupMetadataManager.java 0.00% <0.00%> (ø)
...segment/local/dedup/TableDedupMetadataManager.java 0.00% <0.00%> (ø)
...l/indexsegment/immutable/ImmutableSegmentImpl.java 0.00% <0.00%> (-68.12%) ⬇️
...local/indexsegment/mutable/MutableSegmentImpl.java 0.00% <0.00%> (-59.29%) ⬇️
...ent/local/realtime/impl/RealtimeSegmentConfig.java 0.00% <0.00%> (-92.37%) ⬇️
...not/segment/local/upsert/PartialUpsertHandler.java 0.00% <0.00%> (-38.71%) ⬇️
...t/local/upsert/PartitionUpsertMetadataManager.java 0.00% <0.00%> (-76.00%) ⬇️
...gment/local/upsert/TableUpsertMetadataManager.java 0.00% <0.00%> (-100.00%) ⬇️
...rg/apache/pinot/segment/local/utils/HashUtils.java 0.00% <0.00%> (-100.00%) ⬇️
... and 1221 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8b2b8f5...a225165. Read the comment docs.

@KKcorps KKcorps added enhancement bugfix release-notes Referenced by PRs that need attention when compiling the next release notes labels Jun 3, 2022
@yupeng9 yupeng9 merged commit 551d53b into apache:master Jun 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix enhancement release-notes Referenced by PRs that need attention when compiling the next release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants