Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the bug of reading decimal value stored in int32 or int64 in ParquetNativeRecordReader #11840

Merged
merged 1 commit into from
Oct 21, 2023

Conversation

xiangfu0
Copy link
Contributor

@xiangfu0 xiangfu0 commented Oct 20, 2023

In Apache Parquet, decimal logical types can be represented by using either the INT32, INT64, or FIXED_LEN_BYTE_ARRAY primitive types, depending on the precision needed for the decimal numbers.

The logical type specification includes two components: precision and scale.
Precision: The total number of digits that the decimal number can have.
Scale: The number of digits that can appear after the decimal point.

Here's how these map:

INT32: Suitable for decimal types that require up to 9 digits of precision.
INT64: Suitable for decimal types that require up to 18 digits of precision.
FIXED_LEN_BYTE_ARRAY: Suitable for decimal types that require more than 18 digits of precision. The length of the byte array would depend on the required precision.

More Ref can be found here: https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal

This PR:

  1. Replace deprecated usage of OriginalType to LogicalTypeAnnotation
  2. Return BigDecimal Value instead for all logical Decimal values

@codecov-commenter
Copy link

codecov-commenter commented Oct 20, 2023

Codecov Report

Merging #11840 (a25701b) into master (229ce86) will increase coverage by 0.02%.
Report is 1 commits behind head on master.
The diff coverage is 86.36%.

@@             Coverage Diff              @@
##             master   #11840      +/-   ##
============================================
+ Coverage     63.03%   63.06%   +0.02%     
  Complexity     1141     1141              
============================================
  Files          2350     2350              
  Lines        127284   127293       +9     
  Branches      19604    19607       +3     
============================================
+ Hits          80236    80279      +43     
+ Misses        41339    41300      -39     
- Partials       5709     5714       +5     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (ø)
integration <0.01% <0.00%> (ø)
integration1 <0.01% <0.00%> (ø)
integration2 0.00% <0.00%> (ø)
java-11 63.03% <86.36%> (+0.03%) ⬆️
java-21 62.91% <86.36%> (+<0.01%) ⬆️
skip-bytebuffers-false 63.05% <86.36%> (+0.03%) ⬆️
skip-bytebuffers-true 62.89% <86.36%> (+<0.01%) ⬆️
temurin 63.06% <86.36%> (+0.02%) ⬆️
unittests 63.06% <86.36%> (+0.02%) ⬆️
unittests1 67.15% <ø> (+0.02%) ⬆️
unittests2 14.51% <86.36%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
...utformat/parquet/ParquetNativeRecordExtractor.java 82.08% <86.36%> (+0.48%) ⬆️

... and 12 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Contributor

@snleee snleee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's easy, can we add the case that would cover to test decimal stored in int?

LGTM otherwise.

Copy link
Contributor

@swaminathanmanish swaminathanmanish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making this change. @xiangfu0 - How are we verifying this change ?

Copy link
Contributor

@snleee snleee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@xiangfu0 xiangfu0 force-pushed the more-parquet-fix branch 3 times, most recently from d8ea5ce to 4e4d152 Compare October 20, 2023 21:52
@xiangfu0
Copy link
Contributor Author

Added a test file to test logical types and just one simple test for decimal data type.
We should support and test more logical types.
cc: @snleee

@xiangfu0 xiangfu0 changed the title Fix the bug of reading decimal value stored in int32 or int64 Fix the bug of reading decimal value stored in int32 or int64 in ParquetNativeRecordReader Oct 20, 2023
@xiangfu0 xiangfu0 merged commit 226087f into apache:master Oct 21, 2023
19 checks passed
@xiangfu0 xiangfu0 deleted the more-parquet-fix branch October 21, 2023 05:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants