Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TST: dataframe constructor should preserve order with standard Python dicts #38033

Closed
arw2019 opened this issue Nov 24, 2020 · 6 comments · Fixed by #38206
Closed

TST: dataframe constructor should preserve order with standard Python dicts #38033

arw2019 opened this issue Nov 24, 2020 · 6 comments · Fixed by #38206
Labels
Constructors Series/DataFrame/Index/pd.array Constructors good first issue Needs Tests Unit test(s) needed to prevent regressions
Milestone

Comments

@arw2019
Copy link
Member

arw2019 commented Nov 24, 2020

We have a couple of dataframe tests (xref #13304) related to column ordering when constructed from an OrderedDict:

def test_constructor_ordered_dict_preserve_order(self):
# see gh-13304
expected = DataFrame([[2, 1]], columns=["b", "a"])
data = OrderedDict()
data["b"] = [2]
data["a"] = [1]
result = DataFrame(data)
tm.assert_frame_equal(result, expected)
data = OrderedDict()
data["b"] = 2
data["a"] = 1
result = DataFrame([data])
tm.assert_frame_equal(result, expected)
def test_constructor_ordered_dict_conflicting_orders(self):
# the first dict element sets the ordering for the DataFrame,
# even if there are conflicting orders from subsequent ones
row_one = OrderedDict()
row_one["b"] = 2
row_one["a"] = 1
row_two = OrderedDict()
row_two["a"] = 1
row_two["b"] = 2
row_three = {"b": 2, "a": 1}
expected = DataFrame([[2, 1], [2, 1]], columns=["b", "a"])
result = DataFrame([row_one, row_two])
tm.assert_frame_equal(result, expected)
expected = DataFrame([[2, 1], [2, 1], [2, 1]], columns=["b", "a"])
result = DataFrame([row_one, row_two, row_three])
tm.assert_frame_equal(result, expected)

As of PEP 468 standard dictionaries are ordered so these tests should also work with OrderedDict replaced by dict

@arw2019 arw2019 added Bug Needs Triage Issue that has not been reviewed by a pandas team member Constructors Series/DataFrame/Index/pd.array Constructors Needs Tests Unit test(s) needed to prevent regressions good first issue labels Nov 24, 2020
@itsmegarvi
Copy link

would doing dict(orderedDict()) work?

@rhshadrach rhshadrach removed the Needs Triage Issue that has not been reviewed by a pandas team member label Nov 24, 2020
@rhshadrach
Copy link
Member

Is the current behavior correct and this just needs tests, or is there a bug when using dict instead of OrderedDict?

@arw2019
Copy link
Member Author

arw2019 commented Nov 24, 2020

Is the current behavior correct and this just needs tests, or is there a bug when using dict instead of OrderedDict?

I think the current behavior is correct so just need tests. I didn't find open issues in the tracker since the OrderedDict ticket was closed four years ago.

would doing dict(orderedDict()) work?

I was thinking of parametrizing the tests on [dict, OrderedDict]. Something like

@pytest.mark.parametrize("dictionary, (dict, OrderedDict)")
def test_foo(dictionary):
    ...

@rhshadrach rhshadrach removed the Bug label Nov 24, 2020
@itsmegarvi
Copy link

wouldnt @parametrized("dictionary, (dict, OrderedDict)") do the same thing

@arw2019
Copy link
Member Author

arw2019 commented Nov 24, 2020

wouldnt @parametrized("dictionary, (dict, OrderedDict)") do the same thing

However you wanna do it yeah

@chethanreddy123
Copy link

I don't know where to start contributing, can anyone guide me on how to contribute

@jreback jreback added this to the 1.2 milestone Dec 2, 2020
jreback pushed a commit that referenced this issue Dec 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Constructors Series/DataFrame/Index/pd.array Constructors good first issue Needs Tests Unit test(s) needed to prevent regressions
Projects
None yet
5 participants