Merge remote-tracking branch 'upstream/master' into bug/categorical-indexing-1row-df

* upstream/master: (109 commits)
  stronger typing in libreduction (pandas-dev#29502)
  API: rename labels to codes (pandas-dev#29509)
  CLN: remove unnecessary type checks (pandas-dev#29517)
  implement _BaseGrouper (pandas-dev#29520)
  CLN: F-string formatting in pandas/_libs/*.pyx (pandas-dev#29527)
  Fixed more SS03 errors (pandas-dev#29540)
  consolidate dim checks (pandas-dev#29536)
  REF: separate out _get_cython_func_and_vals (pandas-dev#29537)
  remove unnecessary exception (pandas-dev#29538)
  TST:Add test to check single category col returns series with single row slice (pandas-dev#29521)
  Make color validation more forgiving (pandas-dev#29122)
  DOC: update bottleneck repo and documentation urls (pandas-dev#29516)
  TST: add test for df construction from dict with tuples (pandas-dev#29497)
  add test for pd.melt dtypes preservation (pandas-dev#29510)
  updated DataFrame.equals docstring (pandas-dev#29496)
  Resolved merge conflicts (pandas-dev#29506)
  DOC: Improved pandas/compact/__init__.py (pandas-dev#29507)
  DOC: Update performance comparison section of io docs (pandas-dev#28890)
  TST: add test for df.where() with category dtype (pandas-dev#29454)
  DOC: Fix docs on merging categoricals. (pandas-dev#28185)
  ...
keechongtan committed Nov 11, 2019
2 parents 2dfa594 + 07efdd4 commit 3e847e9
Showing 206 changed files with 4,540 additions and 4,113 deletions.
2 changes: 1 addition & 1 deletion .travis.yml
@@ -32,7 +32,7 @@ matrix:
include:
- dist: bionic
# 18.04
python: 3.8-dev
python: 3.8.0
env:
- JOB="3.8-dev" PATTERN="(not slow and not network)"

2 changes: 1 addition & 1 deletion README.md
@@ -190,7 +190,7 @@ or for installing in [development mode](https://pip.pypa.io/en/latest/reference/


```sh
python -m pip install --no-build-isolation -e .
python -m pip install -e . --no-build-isolation --no-use-pep517
```

If you have `make`, you can also use `make develop` to run the same command.
30 changes: 16 additions & 14 deletions ci/azure/posix.yml
@@ -9,17 +9,16 @@ jobs:
strategy:
matrix:
${{ if eq(parameters.name, 'macOS') }}:
py35_macos:
ENV_FILE: ci/deps/azure-macos-35.yaml
CONDA_PY: "35"
py36_macos:
ENV_FILE: ci/deps/azure-macos-36.yaml
CONDA_PY: "36"
PATTERN: "not slow and not network"

${{ if eq(parameters.name, 'Linux') }}:
py35_compat:
ENV_FILE: ci/deps/azure-35-compat.yaml
CONDA_PY: "35"
py36_minimum_versions:
ENV_FILE: ci/deps/azure-36-minimum_versions.yaml
CONDA_PY: "36"
PATTERN: "not slow and not network"

py36_locale_slow_old_np:
ENV_FILE: ci/deps/azure-36-locale.yaml
CONDA_PY: "36"
@@ -45,13 +44,16 @@ jobs:
PATTERN: "not slow and not network"
LOCALE_OVERRIDE: "zh_CN.UTF-8"

py37_np_dev:
ENV_FILE: ci/deps/azure-37-numpydev.yaml
CONDA_PY: "37"
PATTERN: "not slow and not network"
TEST_ARGS: "-W error"
PANDAS_TESTING_MODE: "deprecate"
EXTRA_APT: "xsel"
# https://github.com/pandas-dev/pandas/issues/29432
# py37_np_dev:
# ENV_FILE: ci/deps/azure-37-numpydev.yaml
# CONDA_PY: "37"
# PATTERN: "not slow and not network"
# TEST_ARGS: "-W error"
# PANDAS_TESTING_MODE: "deprecate"
# EXTRA_APT: "xsel"
# # TODO:
# continueOnError: true

steps:
- script: |
@@ -5,26 +5,23 @@ channels:
dependencies:
- beautifulsoup4=4.6.0
- bottleneck=1.2.1
- cython>=0.29.13
- jinja2=2.8
- numexpr=2.6.2
- numpy=1.13.3
- openpyxl=2.4.8
- pytables=3.4.2
- python-dateutil=2.6.1
- python=3.5.3
- python=3.6.1
- pytz=2017.2
- scipy=0.19.0
- xlrd=1.1.0
- xlsxwriter=0.9.8
- xlwt=1.2.0
# universal
- html5lib=1.0.1
- hypothesis>=3.58.0
- pytest=4.5.0
- pytest-xdist
- pytest-mock
- pytest-azurepipelines
- pip
- pip:
# for python 3.5, pytest>=4.0.2, cython>=0.29.13 is not available in conda
- cython>=0.29.13
- pytest==4.5.0
- html5lib==1.0b2
@@ -14,7 +14,7 @@ dependencies:
- openpyxl
- pyarrow
- pytables
- python=3.5.*
- python=3.6.*
- python-dateutil==2.6.1
- pytz
- xarray
31 changes: 4 additions & 27 deletions doc/source/development/contributing.rst
@@ -208,7 +208,7 @@ We'll now kick off a three-step process:
# Build and install pandas
python setup.py build_ext --inplace -j 4
python -m pip install -e . --no-build-isolation
python -m pip install -e . --no-build-isolation --no-use-pep517
At this point you should be able to import pandas from your locally built version::

@@ -236,7 +236,7 @@ Creating a Python environment (pip)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you aren't using conda for your development environment, follow these instructions.
You'll need to have at least python3.5 installed on your system.
You'll need to have at least Python 3.6.1 installed on your system.

**Unix**/**Mac OS**

@@ -255,7 +255,7 @@ You'll need to have at least python3.5 installed on your system.
# Build and install pandas
python setup.py build_ext --inplace -j 0
python -m pip install -e . --no-build-isolation
python -m pip install -e . --no-build-isolation --no-use-pep517
**Windows**

@@ -847,29 +847,6 @@ The limitation here is that while a human can reasonably understand that ``is_nu
With custom types and inference this is not always possible so exceptions are made, but every effort should be exhausted to avoid ``cast`` before going down such paths.

Syntax Requirements
~~~~~~~~~~~~~~~~~~~

Because *pandas* still supports Python 3.5, :pep:`526` does not apply and variables **must** be annotated with type comments. Specifically, this is a valid annotation within pandas:

.. code-block:: python
primes = [] # type: List[int]
Whereas this is **NOT** allowed:

.. code-block:: python
primes: List[int] = [] # not supported in Python 3.5!
Note that function signatures can always be annotated per :pep:`3107`:

.. code-block:: python
def sum_of_primes(primes: List[int] = []) -> int:
...
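
Review note: the section deleted in this hunk required type comments because :pep:`526` syntax is unavailable on Python 3.5. With 3.6.1 as the new minimum, the annotated form that section prohibited should presumably be usable. A minimal sketch of that form follows; the names are taken from the removed text, and the function body is an assumption added so the example runs:

```python
from typing import List

# PEP 526 variable annotation; a syntax error on Python 3.5, valid on 3.6+.
primes: List[int] = []


def sum_of_primes(primes: List[int]) -> int:
    # Body is illustrative only; the removed docs elided it with "...".
    return sum(primes)


primes.extend([2, 3, 5])
total = sum_of_primes(primes)
```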
Pandas-specific Types
~~~~~~~~~~~~~~~~~~~~~

@@ -1296,7 +1273,7 @@ environment by::

or, to use a specific Python interpreter,::

asv run -e -E existing:python3.5
asv run -e -E existing:python3.6

This will display stderr from the benchmarks, and use your local
``python`` that comes from your ``$PATH``.
2 changes: 1 addition & 1 deletion doc/source/development/policies.rst
@@ -51,7 +51,7 @@ Pandas may change the behavior of experimental features at any time.
Python Support
~~~~~~~~~~~~~~

Pandas will only drop support for specific Python versions (e.g. 3.5.x, 3.6.x) in
Pandas will only drop support for specific Python versions (e.g. 3.6.x, 3.7.x) in
pandas **major** releases.

.. _SemVer: https://semver.org
45 changes: 34 additions & 11 deletions doc/source/getting_started/basics.rst
@@ -753,28 +753,51 @@ on an entire ``DataFrame`` or ``Series``, row- or column-wise, or elementwise.
Tablewise function application
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``DataFrames`` and ``Series`` can of course just be passed into functions.
``DataFrames`` and ``Series`` can be passed into functions.
However, if the function needs to be called in a chain, consider using the :meth:`~DataFrame.pipe` method.
Compare the following

.. code-block:: python
First some setup:

.. ipython:: python
# f, g, and h are functions taking and returning ``DataFrames``
>>> f(g(h(df), arg1=1), arg2=2, arg3=3)
def extract_city_name(df):
"""
Chicago, IL -> Chicago for city_name column
"""
df['city_name'] = df['city_and_code'].str.split(",").str.get(0)
return df
with the equivalent
def add_country_name(df, country_name=None):
"""
Chicago -> Chicago-US for city_name column
"""
col = 'city_name'
df['city_and_country'] = df[col] + country_name
return df
.. code-block:: python
df_p = pd.DataFrame({'city_and_code': ['Chicago, IL']})
``extract_city_name`` and ``add_country_name`` are functions taking and returning ``DataFrames``.

Now compare the following:

.. ipython:: python
add_country_name(extract_city_name(df_p), country_name='US')
Is equivalent to:

.. ipython:: python
>>> (df.pipe(h)
... .pipe(g, arg1=1)
... .pipe(f, arg2=2, arg3=3))
(df_p.pipe(extract_city_name)
.pipe(add_country_name, country_name="US"))
Pandas encourages the second style, which is known as method chaining.
``pipe`` makes it easy to use your own or another library's functions
in method chains, alongside pandas' methods.

In the example above, the functions ``f``, ``g``, and ``h`` each expected the ``DataFrame`` as the first positional argument.
In the example above, the functions ``extract_city_name`` and ``add_country_name`` each expected a ``DataFrame`` as the first positional argument.
What if the function you wish to apply takes its data as, say, the second argument?
In this case, provide ``pipe`` with a tuple of ``(callable, data_keyword)``.
``.pipe`` will route the ``DataFrame`` to the argument specified in the tuple.
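
Review note: the tuple form described in the last three lines could be illustrated as below. This is a sketch only. ``add_suffix_to_city`` and its ``frame`` parameter are hypothetical names that do not appear in the docs; the example relies only on ``DataFrame.pipe`` accepting a ``(callable, data_keyword)`` tuple as stated above.

```python
import pandas as pd


def add_suffix_to_city(suffix, frame):
    """Hypothetical helper that takes its data as the SECOND argument."""
    out = frame.copy()  # avoid mutating the caller's DataFrame
    out["city_name"] = out["city_name"] + suffix
    return out


df_p = pd.DataFrame({"city_name": ["Chicago"]})

# The tuple tells ``pipe`` to pass the DataFrame as the ``frame`` keyword;
# "-US" is forwarded as the first positional argument (``suffix``).
result = df_p.pipe((add_suffix_to_city, "frame"), "-US")
```

Copying inside the helper keeps ``df_p`` unchanged. The setup functions in this hunk modify their input in place, which the sketch deliberately avoids.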
47 changes: 0 additions & 47 deletions doc/source/getting_started/dsintro.rst
@@ -564,53 +564,6 @@ to a column created earlier in the same :meth:`~DataFrame.assign`.
In the second expression, ``x['C']`` will refer to the newly created column,
that's equal to ``dfa['A'] + dfa['B']``.

To write code compatible with all versions of Python, split the assignment in two.

.. ipython:: python
dependent = pd.DataFrame({"A": [1, 1, 1]})
(dependent.assign(A=lambda x: x['A'] + 1)
.assign(B=lambda x: x['A'] + 2))
.. warning::

Dependent assignment may subtly change the behavior of your code between
Python 3.6 and older versions of Python.

If you wish to write code that supports versions of python before and after 3.6,
you'll need to take care when passing ``assign`` expressions that

* Update an existing column
* Refer to the newly updated column in the same ``assign``

For example, we'll update column "A" and then refer to it when creating "B".

.. code-block:: python
>>> dependent = pd.DataFrame({"A": [1, 1, 1]})
>>> dependent.assign(A=lambda x: x["A"] + 1, B=lambda x: x["A"] + 2)
For Python 3.5 and earlier the expression creating ``B`` refers to the
"old" value of ``A``, ``[1, 1, 1]``. The output is then

.. code-block:: console
A B
0 2 3
1 2 3
2 2 3
For Python 3.6 and later, the expression creating ``A`` refers to the
"new" value of ``A``, ``[2, 2, 2]``, which results in

.. code-block:: console
A B
0 2 4
1 2 4
2 2 4
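
Review note: the version difference described in the removed warning comes from keyword-argument ordering. Since Python 3.6 (:pep:`468`), ``**kwargs`` preserves the order in which keywords were passed, so a later expression can reliably see an earlier one's result. The sketch below is a toy stand-in built on plain dicts, not pandas' actual ``assign`` implementation, and ``assign_like`` is a hypothetical name:

```python
def assign_like(frame, **kwargs):
    """Apply keyword 'columns' in call order; callables see earlier updates."""
    result = dict(frame)  # shallow copy so the input mapping is untouched
    for name, value in kwargs.items():  # call order is guaranteed on 3.6+
        result[name] = value(result) if callable(value) else value
    return result


dependent = {"A": [1, 1, 1]}
out = assign_like(
    dependent,
    A=lambda x: [v + 1 for v in x["A"]],  # updates A first
    B=lambda x: [v + 2 for v in x["A"]],  # sees the updated A
)
```

Here ``B`` is computed from the already-updated ``A``, matching the "Python 3.6 and later" output quoted in the removed text.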

Indexing / selection
~~~~~~~~~~~~~~~~~~~~
6 changes: 3 additions & 3 deletions doc/source/getting_started/install.rst
@@ -18,7 +18,7 @@ Instructions for installing from source,
Python version support
----------------------

Officially Python 3.5.3 and above, 3.6, 3.7, and 3.8.
Officially Python 3.6.1 and above, 3.7, and 3.8.

Installing pandas
-----------------
@@ -140,7 +140,7 @@ Installing with ActivePython
Installation instructions for
`ActivePython <https://www.activestate.com/activepython>`__ can be found
`here <https://www.activestate.com/activepython/downloads>`__. Versions
2.7 and 3.5 include pandas.
2.7, 3.5 and 3.6 include pandas.

Installing using your Linux distribution's package manager.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -218,7 +218,7 @@ Recommended dependencies
``numexpr`` uses multiple cores as well as smart chunking and caching to achieve large speedups.
If installed, must be Version 2.6.2 or higher.

* `bottleneck <https://github.com/kwgoodman/bottleneck>`__: for accelerating certain types of ``nan``
* `bottleneck <https://github.com/pydata/bottleneck>`__: for accelerating certain types of ``nan``
evaluations. ``bottleneck`` uses specialized cython routines to achieve large speedups. If installed,
must be Version 1.2.1 or higher.

1 change: 0 additions & 1 deletion doc/source/reference/indexing.rst
@@ -93,7 +93,6 @@ Compatibility with MultiIndex
:toctree: api/

Index.set_names
Index.is_lexsorted_for_tuple
Index.droplevel

Missing values
2 changes: 2 additions & 0 deletions doc/source/reference/window.rst
@@ -34,6 +34,8 @@ Standard moving window functions
Rolling.quantile
Window.mean
Window.sum
Window.var
Window.std

.. _api.functions_expanding:
