Clarify differences between pandas and other dataframe packages #2986

Merged (1 commit), Mar 26, 2023
18 changes: 11 additions & 7 deletions doc/user_guide/data.rst
@@ -16,17 +16,17 @@ and :class:`FacetChart`) accepts a dataset as its first argument.
There are many different ways of specifying a dataset:

- as a `Pandas DataFrame <http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html>`_
- as a DataFrame that supports the DataFrame Interchange Protocol (contains a ``__dataframe__`` attribute), e.g. polars and pyarrow. This is experimental.
- as a :class:`Data` or related object (i.e. :class:`UrlData`, :class:`InlineData`, :class:`NamedData`)
- as a url string pointing to a ``json`` or ``csv`` formatted text file
- as a `geopandas GeoDataFrame <http://geopandas.org/data_structures.html#geodataframe>`_, `Shapely Geometries <https://shapely.readthedocs.io/en/latest/manual.html#geometric-objects>`_, `GeoJSON Objects <https://github.com/jazzband/geojson#geojson-objects>`_ or other objects that support the ``__geo_interface__``
- as a generated dataset such as numerical sequences or geographic reference elements
- as a DataFrame that supports the DataFrame Interchange Protocol (contains a ``__dataframe__`` attribute). This is experimental.

When data is specified as a DataFrame, the encoding is quite simple, as Altair
When data is specified as a pandas DataFrame, Altair
uses the data type information provided by pandas to automatically determine
the data types required in the encoding. For example, here we specify data via a pandas DataFrame
and Altair automatically detects that the y-column should be visualized on a quantitative scale
and that the x-column should be visualized on a categorical scale:
and that the x-column should be visualized on a categorical (nominal) scale:

.. altair-plot::

@@ -40,7 +40,10 @@ and that the y-column should be visualized on a categorical scale:
y='y',
)

By comparison, here we create the same chart using a :class:`Data` object,
By comparison,
all other ways of specifying the data (including non-pandas DataFrames)
require encoding types to be declared explicitly.
Here we create the same chart as above using a :class:`Data` object,
with the data specified as a JSON-style list of records:

.. altair-plot::
@@ -53,13 +56,13 @@ with the data specified as a JSON-style list of records:
{'x': 'D', 'y': 7},
{'x': 'E', 'y': 2}])
alt.Chart(data).mark_bar().encode(
x='x:O', # specify ordinal data
x='x:N', # specify nominal data
y='y:Q', # specify quantitative data
)

Notice the extra markup required in the encoding; because Altair cannot infer
the types within a :class:`Data` object, we must specify them manually
(here we use :ref:`shorthand-description` to specify *ordinal* (``O``) for ``x``
(here we use :ref:`shorthand-description` to specify *nominal* (``N``) for ``x``
and *quantitative* (``Q``) for ``y``; see :ref:`encoding-data-types`).

Similarly, we must also specify the data type when referencing data by URL:
@@ -75,7 +78,8 @@ Similarly, we must also specify the data type when referencing data by URL:
y='Miles_per_Gallon:Q'
)

We will further discuss encodings and associated types in :ref:`user-guide-encoding`, next.
Encodings and their associated types are further discussed in :ref:`user-guide-encoding`.
Below we go into more detail about the different ways of specifying data in an Altair chart.

Pandas DataFrame
~~~~~~~~~~~~~~~~