-
-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: DataFrame.plot.scatter argument c
now accepts a column of strings, where rows with the same string are colored identically
#59239
base: main
Are you sure you want to change the base?
Conversation
import pandas as pd
import random as rand
df = pd.DataFrame({
'dataX': [rand.randint(0, 100) for _ in range(100)],
'dataY': [rand.randint(0, 100) for _ in range(100)],
'fav_fruit': [rand.choice(['Apples', 'Bananas', 'Grapes', 'Peaches']) for _ in range(100)],
})
df.plot.scatter('dataX', 'dataY', c='fav_fruit') |
c
now accepts a column of strings, where rows with the same string are colored identically
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Dr-Irv in case you have any feedback as well
@@ -1390,6 +1407,38 @@ def _get_c_values(self, color, color_by_categorical: bool, c_is_column: bool): | |||
c_values = c | |||
return c_values | |||
|
|||
def _are_valid_colors(self, c_values: np.ndarray | list): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In what instances is c_values
a list? Might be misreading but would be better if we only worked with a pd.Series and could call .unique on that, instead of checking every single value in a loop
except (TypeError, ValueError) as _: | ||
return False | ||
|
||
def _uniquely_color_strs( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am no matplotlib expert but I think we need to defer to that somehow to get the desired colors, instead of trying to write this out ourselves
with tm.assert_produces_warning(None): | ||
ax = df.plot.scatter(x=0, y=1, c="state") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% sure that the test here should be in the tm.assert_produces_warning(None)
context. My intuition is that should be removed.
There should also be tests where the column state
contains values that are colors, as well as a mix of colors and non-colors.
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.