[fix] fix bug when partition_id exceeds integer range in spark load #9073

Merged: 1 commit, Apr 20, 2022

@spaces-X commented Apr 18, 2022

Proposed changes

Problem Summary:

When partition_id exceeds the integer range, spark load fails with java.lang.NumberFormatException.
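A minimal sketch of the failure mode, not the actual patch: it assumes the partition id arrives as a decimal string and is parsed with `Integer.parseInt`, which throws for any value above `Integer.MAX_VALUE`. The class and method names (`PartitionIdParsing`, `fitsInInt`, `parsePartitionId`) are hypothetical and do not come from the Doris code base.

```java
public class PartitionIdParsing {
    // Hypothetical helper: true if the string parses as a 32-bit int.
    static boolean fitsInInt(String raw) {
        try {
            Integer.parseInt(raw);
            return true;
        } catch (NumberFormatException e) {
            // Thrown when the value is outside [-2^31, 2^31 - 1].
            return false;
        }
    }

    // Hypothetical helper: parses the id as a 64-bit long instead.
    static long parsePartitionId(String raw) {
        return Long.parseLong(raw);
    }

    public static void main(String[] args) {
        String raw = "3000000000"; // larger than Integer.MAX_VALUE (2147483647)
        System.out.println("fits in int: " + fitsInInt(raw));
        System.out.println("parsed as long: " + parsePartitionId(raw));
    }
}
```

Parsing into a `long` is one way to avoid the exception under these assumptions; the change actually merged in this PR may differ.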

Describe the overview of changes.

Checklist(Required)

  1. Does it affect the original behavior: (No)
  2. Has unit tests been added: (No Need)
  3. Has document been added or modified: (No Need)
  4. Does it need to update dependencies: (No)
  5. Are there any changes that cannot be rolled back: (No)

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@wangbo added the area/spark-load label (Issues or PRs related to the spark load) Apr 19, 2022

@wangbo left a comment


LGTM

@github-actions bot added the approved label (Indicates a PR has been approved by one committer.) Apr 19, 2022
@github-actions

PR approved by at least one committer and no changes requested.

@github-actions

PR approved by anyone and no changes requested.

@morningman merged commit 39c0fec into apache:master Apr 20, 2022
@morningman added the dev/1.0.1-deprecated label (should be merged into dev-1.0.1 branch) Apr 20, 2022
@morningman added the dev/merged-1.0.1-deprecated label (PR has been merged into dev-1.0.1) and removed the dev/1.0.1-deprecated label (should be merged into dev-1.0.1 branch) Apr 20, 2022
weizhengte pushed a commit to weizhengte/incubator-doris that referenced this pull request Apr 22, 2022
zhengshiJ pushed a commit to zhengshiJ/incubator-doris that referenced this pull request Apr 27, 2022
starocean999 pushed a commit to starocean999/incubator-doris that referenced this pull request May 19, 2022
englefly pushed a commit to englefly/incubator-doris that referenced this pull request May 23, 2022
liutang123 pushed a commit to liutang123/doris that referenced this pull request Apr 15, 2024
change list
1.add broker plus manifest.yaml
2.[MT] spark load for mt
3.[MT] Adapt MT internal spark/yarn commands and configurations
4.[MT] add custom properties for spark load etl & del tmp hive table
5.[MT] delete spark repository and archive & improve etl job log
6.[MT] feature(sparkload): support bitmap encode features in spark load
7.[MT] feature(sparkload parquet): disable parquet dictionary
8.feature(sparkload): support bitmap binary data from hive in spark load
9.[MT] feature(sparkload): add tolas-output dependency in SparkDpp
10.fix(spark load): resolve args conflict between skip_null_value and map_side_join
   refactor(spark load): refactor function name

Details of each commit are listed in this branch:
https://dev.sankuai.com/code/repo-detail/data/palo/commit/list?branch=sparkload-14-update-details
or in branch 13:
https://dev.sankuai.com/code/repo-detail/data/palo/commit/list?branch=refs%2Fheads%2F13

Some Spark Load changes from 0.15 to 1.1:
[MT][FIX][SPARKLOAD] fix bug when partition_id exceeds integer range in spark load (apache#9073)
[MT][FIX][SPARKLOAD] fix `getHashValue` of string type always being zero in spark load (apache#9135)
[MT][TMP][SPARKLOAD] support `custom.global.dict.table` in spark load
[MT][FEATURE][SPARKLOAD] support retry-strategy when getting the spark etl job state times out
[MT][SPARKLOAD] hive table name should start with the tmp keyword and its length should be no longer than 128
[MT][TMP][SPARKLOAD] fix min_value becoming a negative number when `maxGlobalDictValue` exceeds integer range (apache#9436)

Detailed commit contents can be found in this branch:
https://dev.sankuai.com/code/repo-detail/data/palo/commit/list?branch=refs%2Fheads%2F14

[MT][TMP][FIX] fix UT in spark load
[MT][FEATURE] feature(spark-dpp version): add version file for spark-dpp
add spark-dpp commit id as version file when build FE

[MT][SPARKLOAD] some fixes from 15 to 1.1 by wangbo36
1 do not connect to hive metastore when creating hive table
2 avoid cast from string to bitmap expr
3 cast bytebuffer to buffer
4 handle exception in ut
Labels
approved: Indicates a PR has been approved by one committer.
area/spark-load: Issues or PRs related to the spark load
dev/merged-1.0.1-deprecated: PR has been merged into dev-1.0.1
reviewed
3 participants