Merge remote-tracking branch 'upstream/master' into to_html-to_string
* upstream/master:
  BUG: to_html misses truncation indicators (...) when index=False (pandas-dev#22786)
  API/DEPR: replace "raise_conflict" with "errors" for df.update (pandas-dev#23657)
  BUG: Append DataFrame to Series with dateutil timezone (pandas-dev#23685)
  CLN/CI: Catch that stderr-warning! (pandas-dev#23706)
  ENH: Allow for join between two multi-index dataframe instances (pandas-dev#20356)
  Ensure Index._data is an ndarray (pandas-dev#23628)
  DOC: flake8-per-pr for windows users (pandas-dev#23707)
  DOC: Handle exceptions when computing contributors. (pandas-dev#23714)
  DOC: Validate space before colon docstring parameters pandas-dev#23483 (pandas-dev#23506)
  BUG-22984 Fix truncation of DataFrame representations (pandas-dev#22987)
thoo committed Nov 15, 2018
2 parents 2bb90d4 + 8af7637 commit ddf42c7
Showing 24 changed files with 1,193 additions and 662 deletions.
19 changes: 6 additions & 13 deletions doc/source/contributing.rst
Original file line number Diff line number Diff line change
@@ -591,21 +591,14 @@ run this slightly modified command::

git diff master --name-only -- "*.py" | grep "pandas/" | xargs flake8

Note that on Windows, these commands are unfortunately not possible because
commands like ``grep`` and ``xargs`` are not available natively. To imitate the
behavior with the commands above, you should run::
Windows does not support the ``grep`` and ``xargs`` commands (unless installed
for example via the `MinGW <http://www.mingw.org/>`__ toolchain), but one can
imitate the behaviour as follows::

git diff master --name-only -- "*.py"
for /f %i in ('git diff upstream/master --name-only ^| findstr pandas/') do flake8 %i

This will list all of the Python files that have been modified. The only ones
that matter during linting are any whose directory filepath begins with "pandas."
For each filepath, copy and paste it after the ``flake8`` command as shown below:

flake8 <python-filepath>

Alternatively, you can install the ``grep`` and ``xargs`` commands via the
`MinGW <http://www.mingw.org/>`__ toolchain, and it will allow you to run the
commands above.
This will also get all the files being changed by the PR (and within the
``pandas/`` folder), and run ``flake8`` on them one after the other.

.. _contributing.import-formatting:

47 changes: 46 additions & 1 deletion doc/source/whatsnew/v0.24.0.rst
@@ -184,6 +184,47 @@ array, but rather an ``ExtensionArray``:
This is the same behavior as ``Series.values`` for categorical data. See
:ref:`whatsnew_0240.api_breaking.interval_values` for more.

.. _whatsnew_0240.enhancements.join_with_two_multiindexes:

Joining with two multi-indexes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:func:`DataFrame.merge` and :func:`DataFrame.join` can now be used to join multi-indexed ``DataFrame`` instances on the overlapping index levels (:issue:`6360`)

See the :ref:`Merge, join, and concatenate
<merging.Join_with_two_multi_indexes>` documentation section.

.. ipython:: python

   index_left = pd.MultiIndex.from_tuples([('K0', 'X0'), ('K0', 'X1'),
                                           ('K1', 'X2')],
                                          names=['key', 'X'])
   left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                        'B': ['B0', 'B1', 'B2']},
                       index=index_left)
   index_right = pd.MultiIndex.from_tuples([('K0', 'Y0'), ('K1', 'Y1'),
                                            ('K2', 'Y2'), ('K2', 'Y3')],
                                           names=['key', 'Y'])
   right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
                         'D': ['D0', 'D1', 'D2', 'D3']},
                        index=index_right)
   left.join(right)

For earlier versions this can be done using the following.

.. ipython:: python

   pd.merge(left.reset_index(), right.reset_index(),
            on=['key'], how='inner').set_index(['key', 'X', 'Y'])

.. _whatsnew_0240.enhancements.rename_axis:

Renaming names in a MultiIndex
@@ -983,6 +1024,7 @@ Deprecations
- The ``fastpath`` keyword of the different Index constructors is deprecated (:issue:`23110`).
- :meth:`Timestamp.tz_localize`, :meth:`DatetimeIndex.tz_localize`, and :meth:`Series.tz_localize` have deprecated the ``errors`` argument in favor of the ``nonexistent`` argument (:issue:`8917`)
- The class ``FrozenNDArray`` has been deprecated. When unpickling, ``FrozenNDArray`` will be unpickled to ``np.ndarray`` once this class is removed (:issue:`9031`)
- The methods :meth:`DataFrame.update` and :meth:`Panel.update` have deprecated the ``raise_conflict=False|True`` keyword in favor of ``errors='ignore'|'raise'`` (:issue:`23585`)
- Deprecated the ``nthreads`` keyword of :func:`pandas.read_feather` in favor of
  ``use_threads`` to reflect the changes in pyarrow 0.11.0. (:issue:`23053`)
- :func:`pandas.read_excel` has deprecated accepting ``usecols`` as an integer. Please pass in a list of ints from 0 to ``usecols`` inclusive instead (:issue:`23527`)
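
The ``raise_conflict`` → ``errors`` change for :meth:`DataFrame.update` can be illustrated with a small sketch (hypothetical frames, not taken from the pandas test suite):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.0, np.nan]})
other = pd.DataFrame({'A': [9.0, 2.0]})

# New spelling: errors='raise' (previously raise_conflict=True) raises
# when both frames hold non-NA data at the same position.
try:
    df.update(other, errors='raise')
except ValueError:
    pass  # 'A'[0] is non-NA in both frames

# The default errors='ignore' simply overwrites in place.
df.update(other)
# df['A'] is now [9.0, 2.0]
```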
@@ -1321,7 +1363,9 @@ Notice how we now instead output ``np.nan`` itself instead of a stringified form
- :func:`read_sas()` will correctly parse sas7bdat files with many columns (:issue:`22628`)
- :func:`read_sas()` will correctly parse sas7bdat files with data page types having also bit 7 set (so page type is 128 + 256 = 384) (:issue:`16615`)
- Bug in :meth:`detect_client_encoding` where potential ``IOError`` goes unhandled when importing in a mod_wsgi process due to restricted access to stdout. (:issue:`21552`)
- Bug in :func:`to_string()` that broke column alignment when ``index=False`` and width of first column's values is greater than the width of first column's header (:issue:`16839`, :issue:`13032`)
- Bug in :func:`to_html()` with ``index=False`` that omitted truncation indicators (...) on a truncated :class:`DataFrame` (:issue:`15019`, :issue:`22783`)
- Bug in :func:`DataFrame.to_string()` that broke column alignment when ``index=False`` and width of first column's values is greater than the width of first column's header (:issue:`16839`, :issue:`13032`)
- Bug in :func:`DataFrame.to_string()` that caused representations of :class:`DataFrame` to not take up the whole window (:issue:`22984`)
- Bug in :func:`DataFrame.to_csv` where a single level MultiIndex incorrectly wrote a tuple. Now just the value of the index is written (:issue:`19589`).
- Bug in :meth:`HDFStore.append` when appending a :class:`DataFrame` with an empty string column and ``min_itemsize`` < 8 (:issue:`12242`)
- Bug in :meth:`read_csv()` in which :class:`MultiIndex` index names were being improperly handled in the cases when they were not provided (:issue:`23484`)
@@ -1374,6 +1418,7 @@ Reshaping
- Bug in :func:`pandas.concat` when concatenating a multicolumn DataFrame with tz-aware data against a DataFrame with a different number of columns (:issue:`22796`)
- Bug in :func:`merge_asof` where confusing error message raised when attempting to merge with missing values (:issue:`23189`)
- Bug in :meth:`DataFrame.nsmallest` and :meth:`DataFrame.nlargest` for dataframes that have a :class:`MultiIndex` for columns (:issue:`23033`).
- Bug in :meth:`DataFrame.append` with a :class:`Series` with a dateutil timezone would raise a ``TypeError`` (:issue:`23682`)
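
The append fix above can be exercised with a minimal sketch (hypothetical data; since ``DataFrame.append`` was removed in pandas 2.0, the ``pd.concat`` equivalent of the originally fixed call is shown):

```python
import pandas as pd
from dateutil import tz

eastern = tz.gettz('US/Eastern')
df = pd.DataFrame({'date': [pd.Timestamp('2018-11-14 10:00', tz=eastern)]})
row = pd.Series([pd.Timestamp('2018-11-15 10:00', tz=eastern)],
                index=['date'], name=1)

# df.append(row) formerly raised TypeError when the timezone came from
# dateutil; concatenating the transposed one-row frame is equivalent.
out = pd.concat([df, row.to_frame().T])
```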

.. _whatsnew_0240.bug_fixes.sparse:

31 changes: 20 additions & 11 deletions doc/sphinxext/contributors.py
@@ -10,6 +10,7 @@
"""
from docutils import nodes
from docutils.parsers.rst import Directive
import git

from announce import build_components

@@ -19,17 +20,25 @@ class ContributorsDirective(Directive):
name = 'contributors'

def run(self):
components = build_components(self.arguments[0])

message = nodes.paragraph()
message += nodes.Text(components['author_message'])

listnode = nodes.bullet_list()

for author in components['authors']:
para = nodes.paragraph()
para += nodes.Text(author)
listnode += nodes.list_item('', para)
range_ = self.arguments[0]
try:
components = build_components(range_)
except git.GitCommandError:
return [
self.state.document.reporter.warning(
"Cannot find contributors for range '{}'".format(range_),
line=self.lineno)
]
else:
message = nodes.paragraph()
message += nodes.Text(components['author_message'])

listnode = nodes.bullet_list()

for author in components['authors']:
para = nodes.paragraph()
para += nodes.Text(author)
listnode += nodes.list_item('', para)

return [message, listnode]

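
The fallback pattern above — degrade to a docs-build warning instead of crashing when the revision range is bad — can be sketched outside Sphinx; ``contributors_for_range`` is a hypothetical helper that uses ``subprocess`` rather than GitPython:

```python
import subprocess

def contributors_for_range(range_):
    """Return the sorted, de-duplicated author names for a git revision
    range, or None when the range (or git itself) is unavailable."""
    try:
        out = subprocess.run(
            ['git', 'log', '--format=%aN', range_],
            capture_output=True, text=True, check=True)
    except (subprocess.CalledProcessError, FileNotFoundError):
        return None  # mirrors catching git.GitCommandError above
    return sorted(set(out.stdout.splitlines()))
```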
23 changes: 12 additions & 11 deletions pandas/_libs/lib.pyx
@@ -48,8 +48,7 @@ cdef extern from "src/parse_helper.h":
int floatify(object, float64_t *result, int *maybe_int) except -1

cimport util
from util cimport (is_nan,
UINT8_MAX, UINT64_MAX, INT64_MAX, INT64_MIN)
from util cimport is_nan, UINT64_MAX, INT64_MAX, INT64_MIN

from tslib import array_to_datetime
from tslibs.nattype cimport NPY_NAT
@@ -1642,20 +1641,22 @@ def is_datetime_with_singletz_array(values: ndarray) -> bool:

if n == 0:
return False

# Get a reference timezone to compare with the rest of the tzs in the array
for i in range(n):
base_val = values[i]
if base_val is not NaT:
base_tz = get_timezone(getattr(base_val, 'tzinfo', None))

for j in range(i, n):
val = values[j]
if val is not NaT:
tz = getattr(val, 'tzinfo', None)
if not tz_compare(base_tz, tz):
return False
break

for j in range(i, n):
# Compare val's timezone with the reference timezone
# NaT can coexist with tz-aware datetimes, so skip if encountered
val = values[j]
if val is not NaT:
tz = getattr(val, 'tzinfo', None)
if not tz_compare(base_tz, tz):
return False

return True
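
The restructured loop can be mirrored in plain Python (a simplified sketch; the Cython version uses ``tz_compare``, which also treats equivalent pytz/dateutil zones as equal):

```python
from pandas import NaT

def datetimes_share_single_tz(values):
    """Sketch of the single-timezone check: find the first non-NaT value,
    take its tzinfo as the reference, then require every later non-NaT
    value to match it.  NaT may coexist with tz-aware datetimes."""
    n = len(values)
    if n == 0:
        return False
    for i in range(n):
        base = values[i]
        if base is not NaT:
            base_tz = getattr(base, 'tzinfo', None)
            for j in range(i, n):
                val = values[j]
                # Simplified comparison; tz_compare handles zone aliasing.
                if val is not NaT and getattr(val, 'tzinfo', None) != base_tz:
                    return False
            break
    return True
```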


@@ -2045,7 +2046,7 @@ def maybe_convert_objects(ndarray[object] objects, bint try_float=0,

# we try to coerce datetime w/tz but must all have the same tz
if seen.datetimetz_:
if len({getattr(val, 'tzinfo', None) for val in objects}) == 1:
if is_datetime_with_singletz_array(objects):
from pandas import DatetimeIndex
return DatetimeIndex(objects)
seen.object_ = 1
28 changes: 22 additions & 6 deletions pandas/core/frame.py
@@ -5203,8 +5203,10 @@ def combiner(x, y):

return self.combine(other, combiner, overwrite=False)

@deprecate_kwarg(old_arg_name='raise_conflict', new_arg_name='errors',
mapping={False: 'ignore', True: 'raise'})
def update(self, other, join='left', overwrite=True, filter_func=None,
raise_conflict=False):
errors='ignore'):
"""
Modify in place using non-NA values from another DataFrame.
@@ -5228,17 +5230,28 @@ def update(self, other, join='left', overwrite=True, filter_func=None,
* False: only update values that are NA in
the original DataFrame.
filter_func : callable(1d-array) -> boolean 1d-array, optional
filter_func : callable(1d-array) -> bool 1d-array, optional
Can choose to replace values other than NA. Return True for values
that should be updated.
raise_conflict : bool, default False
If True, will raise a ValueError if the DataFrame and `other`
errors : {'raise', 'ignore'}, default 'ignore'
If 'raise', will raise a ValueError if the DataFrame and `other`
both contain non-NA data in the same place.
.. versionchanged:: 0.24.0
   Changed from `raise_conflict=False|True`
   to `errors='ignore'|'raise'`.
Returns
-------
None : method directly changes calling object
Raises
------
ValueError
When `raise_conflict` is True and there's overlapping non-NA data.
* When `errors='raise'` and there's overlapping non-NA data.
* When `errors` is neither `'ignore'` nor `'raise'`
NotImplementedError
* If `join != 'left'`
See Also
--------
@@ -5309,6 +5322,9 @@ def update(self, other, join='left', overwrite=True, filter_func=None,
# TODO: Support other joins
if join != 'left': # pragma: no cover
raise NotImplementedError("Only left join is supported")
if errors not in ['ignore', 'raise']:
raise ValueError("The parameter errors must be either "
"'ignore' or 'raise'")

if not isinstance(other, DataFrame):
other = DataFrame(other)
@@ -5322,7 +5338,7 @@ def update(self, other, join='left', overwrite=True, filter_func=None,
with np.errstate(all='ignore'):
mask = ~filter_func(this) | isna(that)
else:
if raise_conflict:
if errors == 'raise':
mask_this = notna(that)
mask_that = notna(this)
if any(mask_this & mask_that):
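
Alongside the ``errors`` handling above, the ``filter_func`` argument documented in this method is easy to miss; a small hypothetical example:

```python
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0, 3.0]})
other = pd.DataFrame({'A': [10.0, 20.0, 30.0]})

# Only values for which filter_func returns True are updated:
# here, just entries below 2.
df.update(other, filter_func=lambda x: x < 2)
# df['A'] is now [10.0, 2.0, 3.0]
```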
