Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: DataFrame.plot.scatter argument c now accepts a column of strings, where rows with the same string are colored identically #59239

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

michaelmannino
Copy link

@michaelmannino michaelmannino commented Jul 12, 2024

@michaelmannino
Copy link
Author

import pandas as pd
import random as rand

df = pd.DataFrame({
    'dataX': [rand.randint(0, 100) for _ in range(100)],
    'dataY': [rand.randint(0, 100) for _ in range(100)],
    'fav_fruit': [rand.choice(['Apples', 'Bananas', 'Grapes', 'Peaches']) for _ in range(100)],
})
df.plot.scatter('dataX', 'dataY', c='fav_fruit')

image

Michael Vincent Mannino added 2 commits July 12, 2024 23:28
@michaelmannino michaelmannino changed the title ENH: Update plot.scatter color to define colors on iterables of non-numerics and non-mpl colors ENH: DataFrame.plot.scatter argument c now accepts a column of strings, where rows with the same string are colored identically Jul 13, 2024
@michaelmannino michaelmannino marked this pull request as ready for review July 13, 2024 04:18
Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Dr-Irv in case you have any feedback as well

@@ -1390,6 +1407,38 @@ def _get_c_values(self, color, color_by_categorical: bool, c_is_column: bool):
c_values = c
return c_values

def _are_valid_colors(self, c_values: np.ndarray | list):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what instances is c_values a list? Might be misreading but would be better if we only worked with a pd.Series and could call .unique on that, instead of checking every single value in a loop

except (TypeError, ValueError) as _:
return False

def _uniquely_color_strs(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am no matplotlib expert but I think we need to defer to that somehow to get the desired colors, instead of trying to write this out ourselves

@WillAyd WillAyd added the Visualization plotting label Jul 26, 2024
Comment on lines +220 to +221
with tm.assert_produces_warning(None):
ax = df.plot.scatter(x=0, y=1, c="state")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure that the test here should be in the tm.assert_produces_warning(None) context. My intuition is that should be removed.

There should also be tests where the column state contains values that are colors, as well as a mix of colors and non-colors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Scatter plot with colour_by and size_by variables Meaning of color argument in DataFrame.plot.scatter()
3 participants