Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PERF-#7130: Preserve partition lengths in apply_full_axis with keep_partitioning=True #7131

Merged
merged 2 commits into from
Mar 27, 2024

Conversation

YarShev
Copy link
Collaborator

@YarShev YarShev commented Mar 27, 2024

What do these changes do?

  • first commit message and PR title follow format outlined here

    NOTE: If you edit the PR title to match this format, you need to add another commit (even if it's empty) or amend your last commit for the CI job that checks the PR title to pick up the new PR title.

  • passes flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
  • passes black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
  • signed commit with git commit -s
  • Resolves Preserve partition lengths in apply_full_axis with keep_partitioning=True #7130
  • tests passing
  • module layout described at docs/development/architecture.rst is up-to-date

…s with keep_partitioning=True

Signed-off-by: Iaroslav Igoshev <iaroslav.igoshev@intel.com>
Signed-off-by: Iaroslav Igoshev <iaroslav.igoshev@intel.com>
if (
kw["column_widths"] is None
and ModinIndex.is_materialized_index(new_columns)
and len(new_partitions.shape) > 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to check this?

Suggested change
and len(new_partitions.shape) > 1

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we have to check this. I faced some errors without this check. I think it is because we use np.array([]) and np.array([[]]) in different places to create an empty frame. We should look into this as part of a separate issue. I will file one for this.

Comment on lines 3467 to 3468
elif len(new_columns) == 1 and new_partitions.shape[1] == 1:
kw["column_widths"] = [1]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just this change?

Suggested change
elif len(new_columns) == 1 and new_partitions.shape[1] == 1:
kw["column_widths"] = [1]
elif new_partitions.shape[1] == 1:
kw["column_widths"] = [len(new_columns)]

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for row_lengths a few lines above.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to happen in #7133.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: it (#7131 (comment)) can be done instead of the following changes:

                if (
                    kw["column_widths"] is None
                    and ModinIndex.is_materialized_index(new_columns)
                    and len(new_partitions.shape) > 1
                    and new_partitions.shape[1] == 1
                ):
                    kw["column_widths"] = [len(new_columns)]

This way the diff will be smaller.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your suggestion relates to the problem #7133 resolves. We should make changes not by diff size but an issue related.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: it (#7131 (comment)) can be done instead of the following changes:

                if (
                    kw["column_widths"] is None
                    and ModinIndex.is_materialized_index(new_columns)
                    and len(new_partitions.shape) > 1
                    and new_partitions.shape[1] == 1
                ):
                    kw["column_widths"] = [len(new_columns)]

This way the diff will be smaller.

This block of code is independent of keep_partitioning parameter, are you planning to get rid of the code duplication in #7133?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am kind of confused. Your suggestion lies under if not keep_partitioning branch. If we have duplication in #7133, we can get rid of it there.

@anmyachev anmyachev merged commit 4c1e448 into modin-project:master Mar 27, 2024
37 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Preserve partition lengths in apply_full_axis with keep_partitioning=True
2 participants