-
Notifications
You must be signed in to change notification settings - Fork 651
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PERF-#7130: Preserve partition lengths in apply_full_axis with keep_partitioning=True #7131
PERF-#7130: Preserve partition lengths in apply_full_axis with keep_partitioning=True #7131
Conversation
…s with keep_partitioning=True Signed-off-by: Iaroslav Igoshev <iaroslav.igoshev@intel.com>
if ( | ||
kw["column_widths"] is None | ||
and ModinIndex.is_materialized_index(new_columns) | ||
and len(new_partitions.shape) > 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to check this?
and len(new_partitions.shape) > 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we have to check this. I faced some errors without this check. I think it is because we use np.array([])
and np.array([[]])
in different places to create an empty frame. We should look into this as part of a separate issue. I will file one for this.
elif len(new_columns) == 1 and new_partitions.shape[1] == 1: | ||
kw["column_widths"] = [1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just this change?
elif len(new_columns) == 1 and new_partitions.shape[1] == 1: | |
kw["column_widths"] = [1] | |
elif new_partitions.shape[1] == 1: | |
kw["column_widths"] = [len(new_columns)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same for row_lengths
a few lines above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to happen in #7133.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: it (#7131 (comment)) can be done instead of the following changes:
if (
kw["column_widths"] is None
and ModinIndex.is_materialized_index(new_columns)
and len(new_partitions.shape) > 1
and new_partitions.shape[1] == 1
):
kw["column_widths"] = [len(new_columns)]
This way the diff will be smaller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your suggestion relates to the problem #7133 resolves. We should make changes not by diff size but an issue related.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: it (#7131 (comment)) can be done instead of the following changes:
if ( kw["column_widths"] is None and ModinIndex.is_materialized_index(new_columns) and len(new_partitions.shape) > 1 and new_partitions.shape[1] == 1 ): kw["column_widths"] = [len(new_columns)]This way the diff will be smaller.
This block of code is independent of keep_partitioning
parameter, are you planning to get rid of the code duplication in #7133?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am kind of confused. Your suggestion lies under if not keep_partitioning
branch. If we have duplication in #7133, we can get rid of it there.
What do these changes do?
flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
git commit -s
docs/development/architecture.rst
is up-to-date