-
Notifications
You must be signed in to change notification settings - Fork 651
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIX-#5829: fix ndarray assignment via loc #5847
Conversation
@@ -821,22 +821,23 @@ def _setitem_with_new_columns(self, row_loc, col_loc, item): | |||
"Must have equal len keys and value when setting with an iterable" | |||
) | |||
else: | |||
if item.shape != (len(self.qc.index, len(col_loc))): | |||
if item.shape != (len(row_loc), len(col_loc)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that we need to compare the lengths of the keys with the object that we insert.
if not all(common_label_loc): | ||
# In this case we have some new cols and some old ones | ||
columns = self.qc.columns | ||
for i in range(len(common_label_loc)): | ||
if not common_label_loc[i]: | ||
columns = columns.insert(len(columns), col_loc[i]) | ||
self.qc = self.qc.reindex(labels=columns, axis=1, fill_value=0) | ||
self.qc = self.qc.reindex(labels=columns, axis=1, fill_value=np.NaN) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently the behavior in pandas has been changed.
modin/pandas/indexing.py
Outdated
if len(item.shape) > 1 | ||
else item[common_label_loc] | ||
) | ||
if not isinstance(item, np.ndarray): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least in the case when the keys are strings, we will get an empty array here, which is not true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you elaborate please? I still don't understand why the check for numpy array is needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The previous comment is not entirely correct.
This part of the code was responsible for not taking those columns from the assigned value that were not in the query compiler, however this behavior is not true for loc
operation, since missed columns will be added in the next few lines via reindex
op.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Considering also that in this code branch there can only be a numpy array, I completely deleted this code.
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
@dchigarev anything else? |
What do these changes do?
flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
git commit -s
docs/development/architecture.rst
is up-to-date