Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX-#4099: Use mangled column names but keep the original when building frames from arrow #4767

Merged
merged 1 commit into from
Aug 17, 2022

Conversation

AndreyPavlenko
Copy link
Collaborator

@AndreyPavlenko AndreyPavlenko commented Aug 3, 2022

What do these changes do?

The method Table.rename_columns() accepts a list of strings, but not numbers, thus, the columns must be converted to strings.

  • commit message follows format outlined here
  • passes flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
  • passes black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
  • signed commit with git commit -s
  • Resolves dataframe.sum() doesn`t work correctly at MODIN_STORAGE_FORMAT=omnisci #4099
  • tests added and passing
  • module layout described at docs/development/architecture.rst is up-to-date
  • added (Issue Number: PR title (PR Number)) and github username to release notes for next major release

@codecov
Copy link

codecov bot commented Aug 4, 2022

Codecov Report

Merging #4767 (3e45f2d) into master (f7fd559) will decrease coverage by 0.53%.
The diff coverage is 88.88%.

@@            Coverage Diff             @@
##           master    #4767      +/-   ##
==========================================
- Coverage   85.37%   84.84%   -0.54%     
==========================================
  Files         265      266       +1     
  Lines       19622    19908     +286     
==========================================
+ Hits        16752    16890     +138     
- Misses       2870     3018     +148     
Impacted Files Coverage Δ
...entations/omnisci_on_native/dataframe/dataframe.py 91.81% <88.88%> (-0.34%) ⬇️
modin/experimental/sklearn/__init__.py 0.00% <0.00%> (-100.00%) ⬇️
modin/experimental/xgboost/test/test_dmatrix.py 0.00% <0.00%> (-100.00%) ⬇️
modin/experimental/xgboost/test/test_xgboost.py 0.00% <0.00%> (-100.00%) ⬇️
modin/experimental/core/execution/ray/__init__.py 0.00% <0.00%> (-100.00%) ⬇️
...n/experimental/sklearn/model_selection/__init__.py 0.00% <0.00%> (-100.00%) ⬇️
...mental/sklearn/model_selection/train_test_split.py 0.00% <0.00%> (-100.00%) ⬇️
...tal/core/execution/ray/implementations/__init__.py 0.00% <0.00%> (-100.00%) ⬇️
...ecution/ray/implementations/pandas_on_ray/io/io.py 0.00% <0.00%> (-100.00%) ⬇️
...tion/ray/implementations/pandas_on_ray/__init__.py 0.00% <0.00%> (-100.00%) ⬇️
... and 61 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@AndreyPavlenko AndreyPavlenko marked this pull request as ready for review August 4, 2022 14:55
@AndreyPavlenko AndreyPavlenko requested a review from a team as a code owner August 4, 2022 14:55
@anmyachev
Copy link
Collaborator

@AndreyPavlenko thanks for contribution! Please also add a test and a release note into docs\release_notes\release_notes-0.16.0.rst.

Copy link
Collaborator

@dchigarev dchigarev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AndreyPavlenko thanks for the changes, it should indeed fix the problem. Although, to be sure about this it's better to add a test that verifies it.

The fix looks general and may even be fixing the more general related issue (#3370), so I would suggest trying to add general testing for integer column names to see if we can close the #3370 with the fix, what do you think?

@AndreyPavlenko
Copy link
Collaborator Author

The fix looks general and may even be fixing the more general related issue (#3370), so I would suggest trying to add general testing for integer column names to see if we can close the #3370 with the fix, what do you think?

Sorry, but looking deeper, now I'm not sure that the fix is quite correct, because it changes the type of the column names.

@AndreyPavlenko AndreyPavlenko force-pushed the issue-4099 branch 2 times, most recently from bb029ac to b080a4b Compare August 11, 2022 19:38
@AndreyPavlenko AndreyPavlenko requested a review from a team as a code owner August 11, 2022 19:38
@AndreyPavlenko AndreyPavlenko changed the title FIX-#4099: Convert columns to list of str FIX-#4099: Use mangled column names but keep the original when building frames from arrow Aug 11, 2022
@AndreyPavlenko AndreyPavlenko force-pushed the issue-4099 branch 3 times, most recently from bd0056d to b6e3617 Compare August 12, 2022 18:51
…l when building from Arrow

 - Add F_ suffix to column names since integer names are not supported.
 - Keep the original column names in the from_arrow() method.

Signed-off-by: Andrey Pavlenko <andrey.a.pavlenko@gmail.com>
Copy link
Collaborator

@ienkovich ienkovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@YarShev YarShev merged commit af86009 into modin-project:master Aug 17, 2022
@AndreyPavlenko
Copy link
Collaborator Author

I think, the issue #3370 is also resolved by this fix. The test test_set_index() covers this case.

@YarShev
Copy link
Collaborator

YarShev commented Aug 17, 2022

@AndreyPavlenko, cool, closing that issue as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

dataframe.sum() doesn`t work correctly at MODIN_STORAGE_FORMAT=omnisci
5 participants