Allow passing pandas indexes in addition to lists and arrays #2808

Closed
joelostblom opened this issue Jan 6, 2023 · 2 comments · Fixed by #3501

Comments

@joelostblom
Contributor

There are some places where Altair currently requires a list or an array and doesn't accept, e.g., a pandas index, even though an index can easily be converted to a list or an array. The resulting errors can be confusing, so I suggest that we be more lenient and at least allow passing pandas indexes by converting them to lists automatically. Optionally, we could convert any data structure that has a `tolist()` method.
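A minimal sketch of the two options, assuming the conversion would be applied to user-supplied arguments before schema validation. Both helper names (`to_list_if_possible`, `index_to_list`) are hypothetical and not part of Altair: the first implements the broader "anything with a `tolist()` method" option, the second the narrower pandas-index-only option.

```python
from typing import Any

import pandas as pd


def to_list_if_possible(obj: Any) -> Any:
    """Hypothetical helper (not Altair API): duck-typed conversion.

    Anything exposing a callable ``tolist()`` (pandas Index, Series and
    Categorical, NumPy arrays) is converted to a plain list; all other
    values, including lists, dicts and strings, are returned unchanged.
    """
    tolist = getattr(obj, "tolist", None)
    if callable(tolist):
        return tolist()
    return obj


def index_to_list(obj: Any) -> Any:
    """Hypothetical helper (not Altair API): only pandas indexes are converted."""
    return obj.tolist() if isinstance(obj, pd.Index) else obj


print(to_list_if_possible(pd.Index(["a", "b"])))  # ['a', 'b']
print(to_list_if_possible("a"))                   # a
```

The duck-typed version also covers a Series returned by `drop_duplicates()`, whereas the `isinstance` check is stricter and leaves a Series untouched.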

In this example the error is quite helpful, although still a bit confusing, since pandas Series, indexes and arrays are often used interchangeably when working with pandas directly:

```python
import altair as alt
from vega_datasets import data

source = data.cars().melt(id_vars=['Origin', 'Name', 'Year', 'Horsepower', 'Cylinders'])
dropdown_options = source['variable'].drop_duplicates()  # This line needs explicit conversion, e.g. .tolist()

dropdown = alt.binding_select(
    options=dropdown_options,
    name='X-axis column '
)
selection = alt.selection_point(
    fields=['variable'],
    value=[{'variable': dropdown_options[0]}],
    bind=dropdown
)

alt.Chart(source).mark_circle().encode(
    x=alt.X('value:Q', title=''),
    y='Horsepower',
    color='Origin',
).add_params(
    selection
).transform_filter(
    selection
)
```
```
SchemaValidationError: Invalid specification

        altair.vegalite.v5.schema.core.BindRadioSelect->options, validating 'type'

        {0: 'Miles_per_Gallon', 406: 'Displacement', 812: 'Weight_in_lbs', 1218: 'Acceleration'} is not of type 'array'
```
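The keys in that error message are the row labels that `drop_duplicates()` keeps, because it returns a Series rather than a list. The pandas-only sketch below reproduces this on a small hypothetical stand-in for the melted cars data (only the `variable` column, with made-up repetition counts) and shows the `tolist()` call that currently has to be added by hand.

```python
import pandas as pd

# Hypothetical stand-in for the melted data: each variable name repeats once
# per car, so the first "Displacement" row has label 3 rather than 1.
source = pd.DataFrame({
    "variable": ["Miles_per_Gallon"] * 3 + ["Displacement"] * 3,
})

unique_vars = source["variable"].drop_duplicates()
# drop_duplicates() returns a Series that keeps the original row labels,
# so as a mapping it is keyed by 0 and 3, like the dict in the error above.
print(type(unique_vars).__name__)  # Series
print(unique_vars.to_dict())

# The manual workaround: convert to a plain list before passing options=.
dropdown_options = unique_vars.tolist()
print(dropdown_options)  # ['Miles_per_Gallon', 'Displacement']
```

In the original snippet, the corresponding one-line change is `source['variable'].drop_duplicates().tolist()`.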

In other cases, such as the one below, the traceback is huge and the error message is not helpful at all:

```python
import altair as alt
import pandas as pd
from vega_datasets import data

barley = data.barley()

barley['variety'] = pd.Categorical(
    barley['variety'],
    ordered=True,
    categories=[
        'Manchuria',
        'No. 457',
        'No. 462',
        'No. 475',
        'Glabron',
        'Svansota',
        'Velvet',
        'Trebi',
        'Wisconsin No. 38',
        'Peatland'
    ]
)

alt.Chart(barley).mark_bar().encode(
    x=alt.X('variety', sort=barley['variety'].cat.categories),  # This line needs manual conversion
    y=alt.Y('sum(yield)'),
    color='site:N'
)
```
```
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
```
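The issue doesn't pinpoint where inside Altair this is raised. As a hedged illustration only, the pandas-only sketch below shows one common way that exact NumPy message arises: comparing a pandas Index with a scalar is element-wise and returns a boolean array, so using the result in an `if` fails, while the same comparison on a list is a single `bool`. The scalar `"ascending"` is an arbitrary example value, not necessarily what Altair compares against.

```python
import pandas as pd

categories = pd.Index(["Manchuria", "No. 457", "No. 462"])

# Index == scalar is element-wise and yields a NumPy boolean array;
# list == scalar yields a single bool.
index_result = categories == "ascending"
list_result = categories.tolist() == "ascending"

message = None
try:
    # An `if` calls bool() on the three-element array, which NumPy refuses.
    if index_result:
        pass
except ValueError as err:
    message = str(err)

print(list_result)  # False
print(message)
```

For the barley chart, the manual workaround is `sort=barley['variety'].cat.categories.tolist()`.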
@ChristopherDavisUCI
Contributor

ChristopherDavisUCI commented Jan 6, 2023

This sounds great to me @joelostblom. I come across this fairly frequently. (I just looked up an example, and in my case, I was getting the index from something like df.groupby(...).median().sort_values().index.)
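A sketch of that pattern using pandas only, with hypothetical column names and values: the group order comes out as a pandas Index and currently has to be converted with `tolist()` before being passed as an explicit sort order, e.g. `alt.X('group:N', sort=order)`.

```python
import pandas as pd

# Hypothetical data: one numeric column measured for several groups.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "c", "c"],
    "value": [5, 7, 1, 3, 9, 11],
})

# groupby(...).median() is a Series indexed by group; sorting it and taking
# .index yields the groups ordered by median, as a pandas Index.
order_index = df.groupby("group")["value"].median().sort_values().index

# Convert to a plain list so it can be used as an explicit sort order.
order = order_index.tolist()
print(order)  # ['b', 'a', 'c']
```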

@joelostblom
Contributor Author

Ah yes, that definitely happens to me as well when sorting boxplots, since they don't support the usual sort options. Feel free to work on this if you want; I haven't looked into it, so I don't have a good idea of where to start, and I have some other commitments for the next few days.


2 participants