Merge remote-tracking branch 'upstream/master' into to_html-to_string
* upstream/master:
  BUG: to_html misses truncation indicators (...) when index=False (pandas-dev#22786)
  API/DEPR: replace "raise_conflict" with "errors" for df.update (pandas-dev#23657)
  BUG: Append DataFrame to Series with dateutil timezone (pandas-dev#23685)
  CLN/CI: Catch that stderr-warning! (pandas-dev#23706)
  ENH: Allow for join between two multi-index dataframe instances (pandas-dev#20356)
  Ensure Index._data is an ndarray (pandas-dev#23628)
  DOC: flake8-per-pr for windows users (pandas-dev#23707)
  DOC: Handle exceptions when computing contributors. (pandas-dev#23714)
  DOC: Validate space before colon docstring parameters pandas-dev#23483 (pandas-dev#23506)
  BUG-22984 Fix truncation of DataFrame representations (pandas-dev#22987)
thoo committed Nov 15, 2018
2 parents 2bb90d4 + 8af7637 commit ddf42c7
Showing 24 changed files with 1,193 additions and 662 deletions.
19 changes: 6 additions & 13 deletions doc/source/contributing.rst
Original file line number Diff line number Diff line change
@@ -591,21 +591,14 @@ run this slightly modified command::

git diff master --name-only -- "*.py" | grep "pandas/" | xargs flake8

Note that on Windows, these commands are unfortunately not possible because
commands like ``grep`` and ``xargs`` are not available natively. To imitate the
behavior with the commands above, you should run::
Windows does not support the ``grep`` and ``xargs`` commands (unless installed
for example via the `MinGW <http://www.mingw.org/>`__ toolchain), but one can
imitate the behaviour as follows::

git diff master --name-only -- "*.py"
for /f %i in ('git diff upstream/master --name-only ^| findstr pandas/') do flake8 %i

This will list all of the Python files that have been modified. The only ones
that matter during linting are any whose directory filepath begins with "pandas."
For each filepath, copy and paste it after the ``flake8`` command as shown below:

flake8 <python-filepath>

Alternatively, you can install the ``grep`` and ``xargs`` commands via the
`MinGW <http://www.mingw.org/>`__ toolchain, and it will allow you to run the
commands above.
This will also get all the files being changed by the PR (and within the
``pandas/`` folder), and run ``flake8`` on them one after the other.

.. _contributing.import-formatting:

47 changes: 46 additions & 1 deletion doc/source/whatsnew/v0.24.0.rst
@@ -184,6 +184,47 @@ array, but rather an ``ExtensionArray``:
This is the same behavior as ``Series.values`` for categorical data. See
:ref:`whatsnew_0240.api_breaking.interval_values` for more.

.. _whatsnew_0240.enhancements.join_with_two_multiindexes:

Joining with two multi-indexes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:func:`DataFrame.merge` and :func:`DataFrame.join` can now be used to join multi-indexed ``DataFrame`` instances on the overlapping index levels (:issue:`6360`)

See the :ref:`Merge, join, and concatenate
<merging.Join_with_two_multi_indexes>` documentation section.

.. ipython:: python

   index_left = pd.MultiIndex.from_tuples([('K0', 'X0'), ('K0', 'X1'),
                                           ('K1', 'X2')],
                                          names=['key', 'X'])
   left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                        'B': ['B0', 'B1', 'B2']},
                       index=index_left)
   index_right = pd.MultiIndex.from_tuples([('K0', 'Y0'), ('K1', 'Y1'),
                                            ('K2', 'Y2'), ('K2', 'Y3')],
                                           names=['key', 'Y'])
   right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
                         'D': ['D0', 'D1', 'D2', 'D3']},
                        index=index_right)
   left.join(right)

For earlier versions this can be done using the following.

.. ipython:: python

   pd.merge(left.reset_index(), right.reset_index(),
            on=['key'], how='inner').set_index(['key', 'X', 'Y'])

.. _whatsnew_0240.enhancements.rename_axis:

Renaming names in a MultiIndex
@@ -983,6 +1024,7 @@ Deprecations
- The ``fastpath`` keyword of the different Index constructors is deprecated (:issue:`23110`).
- :meth:`Timestamp.tz_localize`, :meth:`DatetimeIndex.tz_localize`, and :meth:`Series.tz_localize` have deprecated the ``errors`` argument in favor of the ``nonexistent`` argument (:issue:`8917`)
- The class ``FrozenNDArray`` has been deprecated. When unpickling, ``FrozenNDArray`` will be unpickled to ``np.ndarray`` once this class is removed (:issue:`9031`)
- The methods :meth:`DataFrame.update` and :meth:`Panel.update` have deprecated the ``raise_conflict=False|True`` keyword in favor of ``errors='ignore'|'raise'`` (:issue:`23585`)
- Deprecated the ``nthreads`` keyword of :func:`pandas.read_feather` in favor of
  ``use_threads`` to reflect the changes in pyarrow 0.11.0. (:issue:`23053`)
- :func:`pandas.read_excel` has deprecated accepting ``usecols`` as an integer. Please pass in a list of ints from 0 to ``usecols`` inclusive instead (:issue:`23527`)
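
The ``raise_conflict`` → ``errors`` change for :meth:`DataFrame.update` can be illustrated with a small sketch (hypothetical frames, not taken from the pandas test suite):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.0, np.nan]})
other = pd.DataFrame({'A': [9.0, 2.0]})

# New spelling: errors='raise' (previously raise_conflict=True) raises
# when both frames hold non-NA data at the same position.
try:
    df.update(other, errors='raise')
except ValueError:
    pass  # 'A'[0] is non-NA in both frames

# The default errors='ignore' simply overwrites in place.
df.update(other)
# df['A'] is now [9.0, 2.0]
```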
@@ -1321,7 +1363,9 @@ Notice how we now instead output ``np.nan`` itself instead of a stringified form
- :func:`read_sas()` will correctly parse sas7bdat files with many columns (:issue:`22628`)
- :func:`read_sas()` will correctly parse sas7bdat files with data page types having also bit 7 set (so page type is 128 + 256 = 384) (:issue:`16615`)
- Bug in :meth:`detect_client_encoding` where potential ``IOError`` goes unhandled when importing in a mod_wsgi process due to restricted access to stdout. (:issue:`21552`)
- Bug in :func:`to_string()` that broke column alignment when ``index=False`` and width of first column's values is greater than the width of first column's header (:issue:`16839`, :issue:`13032`)
- Bug in :func:`to_html()` with ``index=False`` that omitted truncation indicators (...) on a truncated :class:`DataFrame` (:issue:`15019`, :issue:`22783`)
- Bug in :func:`DataFrame.to_string()` that broke column alignment when ``index=False`` and width of first column's values is greater than the width of first column's header (:issue:`16839`, :issue:`13032`)
- Bug in :func:`DataFrame.to_string()` that caused representations of :class:`DataFrame` to not take up the whole window (:issue:`22984`)
- Bug in :func:`DataFrame.to_csv` where a single level MultiIndex incorrectly wrote a tuple. Now just the value of the index is written (:issue:`19589`).
- Bug in :meth:`HDFStore.append` when appending a :class:`DataFrame` with an empty string column and ``min_itemsize`` < 8 (:issue:`12242`)
- Bug in :meth:`read_csv()` in which :class:`MultiIndex` index names were being improperly handled in the cases when they were not provided (:issue:`23484`)
@@ -1374,6 +1418,7 @@ Reshaping
- Bug in :func:`pandas.concat` when concatenating a multicolumn DataFrame with tz-aware data against a DataFrame with a different number of columns (:issue:`22796`)
- Bug in :func:`merge_asof` where confusing error message raised when attempting to merge with missing values (:issue:`23189`)
- Bug in :meth:`DataFrame.nsmallest` and :meth:`DataFrame.nlargest` for dataframes that have a :class:`MultiIndex` for columns (:issue:`23033`).
- Bug in :meth:`DataFrame.append` with a :class:`Series` with a dateutil timezone would raise a ``TypeError`` (:issue:`23682`)
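
The append fix above can be exercised with a minimal sketch (hypothetical data; since ``DataFrame.append`` was removed in pandas 2.0, the ``pd.concat`` equivalent of the originally fixed call is shown):

```python
import pandas as pd
from dateutil import tz

eastern = tz.gettz('US/Eastern')
df = pd.DataFrame({'date': [pd.Timestamp('2018-11-14 10:00', tz=eastern)]})
row = pd.Series([pd.Timestamp('2018-11-15 10:00', tz=eastern)],
                index=['date'], name=1)

# df.append(row) formerly raised TypeError when the timezone came from
# dateutil; concatenating the transposed one-row frame is equivalent.
out = pd.concat([df, row.to_frame().T])
```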

.. _whatsnew_0240.bug_fixes.sparse:

31 changes: 20 additions & 11 deletions doc/sphinxext/contributors.py
@@ -10,6 +10,7 @@
"""
from docutils import nodes
from docutils.parsers.rst import Directive
import git

from announce import build_components

@@ -19,17 +20,25 @@ class ContributorsDirective(Directive):
name = 'contributors'

def run(self):
components = build_components(self.arguments[0])

message = nodes.paragraph()
message += nodes.Text(components['author_message'])

listnode = nodes.bullet_list()

for author in components['authors']:
para = nodes.paragraph()
para += nodes.Text(author)
listnode += nodes.list_item('', para)
range_ = self.arguments[0]
try:
components = build_components(range_)
except git.GitCommandError:
return [
self.state.document.reporter.warning(
"Cannot find contributors for range '{}'".format(range_),
line=self.lineno)
]
else:
message = nodes.paragraph()
message += nodes.Text(components['author_message'])

listnode = nodes.bullet_list()

for author in components['authors']:
para = nodes.paragraph()
para += nodes.Text(author)
listnode += nodes.list_item('', para)

return [message, listnode]

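
The fallback pattern above — degrade to a docs-build warning instead of crashing when the revision range is bad — can be sketched outside Sphinx; ``contributors_for_range`` is a hypothetical helper that uses ``subprocess`` rather than GitPython:

```python
import subprocess

def contributors_for_range(range_):
    """Return the sorted, de-duplicated author names for a git revision
    range, or None when the range (or git itself) is unavailable."""
    try:
        out = subprocess.run(
            ['git', 'log', '--format=%aN', range_],
            capture_output=True, text=True, check=True)
    except (subprocess.CalledProcessError, FileNotFoundError):
        return None  # mirrors catching git.GitCommandError above
    return sorted(set(out.stdout.splitlines()))
```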
23 changes: 12 additions & 11 deletions pandas/_libs/lib.pyx
@@ -48,8 +48,7 @@ cdef extern from "src/parse_helper.h":
int floatify(object, float64_t *result, int *maybe_int) except -1

cimport util
from util cimport (is_nan,
UINT8_MAX, UINT64_MAX, INT64_MAX, INT64_MIN)
from util cimport is_nan, UINT64_MAX, INT64_MAX, INT64_MIN

from tslib import array_to_datetime
from tslibs.nattype cimport NPY_NAT
@@ -1642,20 +1641,22 @@ def is_datetime_with_singletz_array(values: ndarray) -> bool:

if n == 0:
return False

# Get a reference timezone to compare with the rest of the tzs in the array
for i in range(n):
base_val = values[i]
if base_val is not NaT:
base_tz = get_timezone(getattr(base_val, 'tzinfo', None))

for j in range(i, n):
val = values[j]
if val is not NaT:
tz = getattr(val, 'tzinfo', None)
if not tz_compare(base_tz, tz):
return False
break

for j in range(i, n):
# Compare val's timezone with the reference timezone
# NaT can coexist with tz-aware datetimes, so skip if encountered
val = values[j]
if val is not NaT:
tz = getattr(val, 'tzinfo', None)
if not tz_compare(base_tz, tz):
return False

return True
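
The restructured loop can be mirrored in plain Python (a simplified sketch; the Cython version uses ``tz_compare``, which also treats equivalent pytz/dateutil zones as equal):

```python
from pandas import NaT

def datetimes_share_single_tz(values):
    """Sketch of the single-timezone check: find the first non-NaT value,
    take its tzinfo as the reference, then require every later non-NaT
    value to match it.  NaT may coexist with tz-aware datetimes."""
    n = len(values)
    if n == 0:
        return False
    for i in range(n):
        base = values[i]
        if base is not NaT:
            base_tz = getattr(base, 'tzinfo', None)
            for j in range(i, n):
                val = values[j]
                # Simplified comparison; tz_compare handles zone aliasing.
                if val is not NaT and getattr(val, 'tzinfo', None) != base_tz:
                    return False
            break
    return True
```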


@@ -2045,7 +2046,7 @@ def maybe_convert_objects(ndarray[object] objects, bint try_float=0,

# we try to coerce datetime w/tz but must all have the same tz
if seen.datetimetz_:
if len({getattr(val, 'tzinfo', None) for val in objects}) == 1:
if is_datetime_with_singletz_array(objects):
from pandas import DatetimeIndex
return DatetimeIndex(objects)
seen.object_ = 1
28 changes: 22 additions & 6 deletions pandas/core/frame.py
@@ -5203,8 +5203,10 @@ def combiner(x, y):

return self.combine(other, combiner, overwrite=False)

@deprecate_kwarg(old_arg_name='raise_conflict', new_arg_name='errors',
mapping={False: 'ignore', True: 'raise'})
def update(self, other, join='left', overwrite=True, filter_func=None,
raise_conflict=False):
errors='ignore'):
"""
Modify in place using non-NA values from another DataFrame.
@@ -5228,17 +5230,28 @@ def update(self, other, join='left', overwrite=True, filter_func=None,
* False: only update values that are NA in
the original DataFrame.
filter_func : callable(1d-array) -> boolean 1d-array, optional
filter_func : callable(1d-array) -> bool 1d-array, optional
Can choose to replace values other than NA. Return True for values
that should be updated.
raise_conflict : bool, default False
If True, will raise a ValueError if the DataFrame and `other`
errors : {'raise', 'ignore'}, default 'ignore'
If 'raise', will raise a ValueError if the DataFrame and `other`
both contain non-NA data in the same place.
.. versionchanged:: 0.24.0
   Changed from `raise_conflict=False|True`
   to `errors='ignore'|'raise'`.
Returns
-------
None : method directly changes calling object
Raises
------
ValueError
When `raise_conflict` is True and there's overlapping non-NA data.
* When `errors='raise'` and there's overlapping non-NA data.
* When `errors` is neither `'ignore'` nor `'raise'`
NotImplementedError
* If `join != 'left'`
See Also
--------
@@ -5309,6 +5322,9 @@ def update(self, other, join='left', overwrite=True, filter_func=None,
# TODO: Support other joins
if join != 'left': # pragma: no cover
raise NotImplementedError("Only left join is supported")
if errors not in ['ignore', 'raise']:
raise ValueError("The parameter errors must be either "
"'ignore' or 'raise'")

if not isinstance(other, DataFrame):
other = DataFrame(other)
@@ -5322,7 +5338,7 @@ def update(self, other, join='left', overwrite=True, filter_func=None,
with np.errstate(all='ignore'):
mask = ~filter_func(this) | isna(that)
else:
if raise_conflict:
if errors == 'raise':
mask_this = notna(that)
mask_that = notna(this)
if any(mask_this & mask_that):
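
Alongside the ``errors`` handling above, the ``filter_func`` argument documented in this method is easy to miss; a small hypothetical example:

```python
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0, 3.0]})
other = pd.DataFrame({'A': [10.0, 20.0, 30.0]})

# Only values for which filter_func returns True are updated:
# here, just entries below 2.
df.update(other, filter_func=lambda x: x < 2)
# df['A'] is now [10.0, 2.0, 3.0]
```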
