Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't call len on DataFrame interchange protocol object #3111

Merged
merged 2 commits into from
Jul 17, 2023

Conversation

jonmmease
Copy link
Contributor

See #3110.

We can't call len(data) when data is a DataFrame interchange protocol object. It happens to work for PyArrow and Polars, which is why we didn't catch it before, but that's not part of the standard.

@jcrist
Copy link

jcrist commented Jul 17, 2023

Thanks Jon! We're getting closer. Running on this PR I now see no error with to_dict, but do see an error with chart.save:

import altair as alt
import ibis
import polars as pl

data = ibis.memtable(
    {'x': ['A', 'B', 'C', 'D', 'E'],
     'y': [5, 3, 6, 7, 2]}
)

chart = alt.Chart(data).mark_bar().encode(
    x='x:O',
    y='y:Q',
)

chart.save("chart.html")

Output:

Traceback (most recent call last):
  File "/home/jcristharif/Code/altair/test.py", line 15, in <module>
    chart.save("chart.html")
  File "/home/jcristharif/Code/altair/altair/vegalite/v5/api.py", line 1066, in save
    result = save(**kwds)
             ^^^^^^^^^^^^
  File "/home/jcristharif/Code/altair/altair/utils/save.py", line 189, in save
    perform_save()
  File "/home/jcristharif/Code/altair/altair/utils/save.py", line 127, in perform_save
    spec = chart.to_dict(context={"pre_transform": False})
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jcristharif/Code/altair/altair/vegalite/v5/api.py", line 2677, in to_dict
    return super().to_dict(
           ^^^^^^^^^^^^^^^^
  File "/home/jcristharif/Code/altair/altair/vegalite/v5/api.py", line 892, in to_dict
    copy.data = _prepare_data(original_data, context)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jcristharif/Code/altair/altair/vegalite/v5/api.py", line 114, in _prepare_data
    data = _pipe(data, data_transformers.get())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jcristharif/miniconda3/envs/altair/lib/python3.11/site-packages/toolz/functoolz.py", line 628, in pipe
    data = func(data)
           ^^^^^^^^^^
  File "/home/jcristharif/miniconda3/envs/altair/lib/python3.11/site-packages/toolz/functoolz.py", line 304, in __call__
    return self._partial(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jcristharif/Code/altair/altair/vegalite/data.py", line 23, in default_data_transformer
    return curried.pipe(data, limit_rows(max_rows=max_rows), to_values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jcristharif/miniconda3/envs/altair/lib/python3.11/site-packages/toolz/functoolz.py", line 628, in pipe
    data = func(data)
           ^^^^^^^^^^
  File "/home/jcristharif/miniconda3/envs/altair/lib/python3.11/site-packages/toolz/functoolz.py", line 304, in __call__
    return self._partial(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jcristharif/Code/altair/altair/utils/data.py", line 111, in limit_rows
    if pa_table.num_rows > max_rows:
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: '>' not supported between instances of 'int' and 'NoneType'

max_rows is None here for some reason.

@mattijn
Copy link
Contributor

mattijn commented Jul 17, 2023

Tested as follows:

import altair as alt
import pyarrow as pa
import polars as pl
import pandas as pd
import vaex
import ibis


def chart(source, title):
    return (
        alt.Chart(source, height=20, title=title)
        .mark_bar()
        .encode(x="x:O", y="y:Q")
    )


data = {"x": [1, 2, 3], "y": [4, 5, 6]}
pa_table = pa.table(data)
df_polars = pl.DataFrame(data)
df_pandas = pd.DataFrame(data)
df_vaex = vaex.from_pandas(df_pandas)
df_ibis = ibis.memtable(data)

dataframes = {
    "pyarrow": pa_table,
    "polars": df_polars,
    "pandas": df_pandas,
    "vaex": df_vaex,
    "ibis": df_ibis
}


chart = alt.hconcat(*[chart(dataframes[df], df) for df in dataframes])

# raise MaxRowsError
with alt.data_transformers.enable("default", max_rows=2):
    try:
        chart.to_dict()
    except alt.MaxRowsError as e:
        print(e)
The number of rows in your dataset is greater than the maximum allowed (2).

Try enabling the VegaFusion data transformer which raises this limit by pre-evaluating data
transformations in Python.
    >> import altair as alt
    >> alt.data_transformers.enable("vegafusion")

Or, see https://altair-viz.github.io/user_guide/large_datasets.html for additional information
on how to plot large datasets.
# render chart with standard `max_rows` (5000) from different sources
chart

image

@mattijn
Copy link
Contributor

mattijn commented Jul 17, 2023

Good catch @jcrist! Something is still missing

@mattijn
Copy link
Contributor

mattijn commented Jul 17, 2023

Related vega/vl-convert#31 (comment) and #2724

@jonmmease
Copy link
Contributor Author

max_rows is None here for some reason.

Thanks for checking, I missed a None check. Fixed in 3615dae.

For context, when saving to a file we disable the max_rows check since the worst that happens in that case is you end up with a large file. The main purpose of max_rows is to protect against crashing the browser.

@mattijn
Copy link
Contributor

mattijn commented Jul 17, 2023

Checked again and tests from above work correctly. Thanks @jonmmease for this PR, thanks @jcrist for the review!

@mattijn mattijn merged commit 139c86a into master Jul 17, 2023
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants