
BUG: DataFrame constructor raises error if specify tz dtype dtype='datetime64[ns, UTC]' #12513

Closed · BranYang opened this issue Mar 2, 2016 · 13 comments · Fixed by #30507

Labels: Bug, Constructors, Dtype Conversions, Reshaping, Timezones

Comments

BranYang (Contributor) commented Mar 2, 2016

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np
array_dim2 = np.arange(10).reshape((5, 2))
df = pd.DataFrame(array_dim2 , dtype='datetime64[ns, UTC]') # doesn't work

The error:

TypeError                                 Traceback (most recent call last)
<ipython-input-4-7101cf798aa3> in <module>()
----> 1 df = pd.DataFrame(array_dim2 , dtype='datetime64[ns, UTC]')

C:\D\Projects\Github\pandas\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
    252             else:
    253                 mgr = self._init_ndarray(data, index, columns, dtype=dtype,
--> 254                                          copy=copy)
    255         elif isinstance(data, (list, types.GeneratorType)):
    256             if isinstance(data, types.GeneratorType):

C:\D\Projects\Github\pandas\pandas\core\frame.py in _init_ndarray(self, values, index, columns, dtype, copy)
    412
    413         if dtype is not None:
--> 414             if values.dtype != dtype:
    415                 try:
    416                     values = values.astype(dtype)

TypeError: data type not understood

Expected Output

In [5]: df = pd.DataFrame(array_dim2 , dtype='datetime64[ns, UTC]')

In [6]: df
Out[6]:
                              0                                           1
0 1970-01-01 00:00:00.000000000+00:00 1970-01-01 00:00:00.000000001+00:00
1 1970-01-01 00:00:00.000000002+00:00 1970-01-01 00:00:00.000000003+00:00
2 1970-01-01 00:00:00.000000004+00:00 1970-01-01 00:00:00.000000005+00:00
3 1970-01-01 00:00:00.000000006+00:00 1970-01-01 00:00:00.000000007+00:00
4 1970-01-01 00:00:00.000000008+00:00 1970-01-01 00:00:00.000000009+00:00
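
For pandas versions where the constructor call above still raises, a minimal workaround sketch that reproduces the expected output (assuming the integers are meant as nanoseconds since the epoch):

import numpy as np
import pandas as pd

array_dim2 = np.arange(10).reshape((5, 2))

# Cast the integers to naive datetime64[ns] first, then localize each column to UTC
df = pd.DataFrame(array_dim2.astype('datetime64[ns]')).apply(
    lambda col: col.dt.tz_localize('UTC')
)

Each column then carries the datetime64[ns, UTC] dtype shown in the expected output.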

output of pd.show_versions()

python: 3.5.1.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.0rc1+66.gce3ac93
nose: 1.3.7
pip: 8.0.2
setuptools: 19.2
Cython: 0.23.4
numpy: 1.10.1
scipy: None
statsmodels: None
xarray: None
IPython: 4.0.2
sphinx: 1.3.1
patsy: None
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.1
openpyxl: 2.3.3
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: None
lxml: 3.5.0
bs4: 4.4.1
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
BranYang (Contributor, Author) commented Mar 2, 2016

Trying to look into this.

@jreback added the Bug, Reshaping, Dtype Conversions, and Timezones labels on Mar 3, 2016
@jreback added this to the 0.18.1 milestone on Mar 3, 2016
@jreback modified the milestones: 0.18.1, 0.18.2 on Apr 25, 2016
@jorisvandenbossche modified the milestones: Next Major Release, 0.19.0 on Aug 21, 2016
John-Boik commented
Is there any workaround for the `dtype='datetime64[ns, UTC]'` problem? Any suggestions?

jreback (Contributor) commented Oct 5, 2017

what are you trying to do?

jreback (Contributor) commented Oct 5, 2017

This is a reasonable way to deal with this:

In [15]: array_dim2 = np.arange(10).reshape((5, 2))
    ...: df = pd.DataFrame(array_dim2)

In [16]: df
Out[16]: 
   0  1
0  0  1
1  2  3
2  4  5
3  6  7
4  8  9

In [17]: df.apply(lambda x: pd.to_datetime(x, unit='D').dt.tz_localize('UTC'))
Out[17]: 
                          0                         1
0 1970-01-01 00:00:00+00:00 1970-01-02 00:00:00+00:00
1 1970-01-03 00:00:00+00:00 1970-01-04 00:00:00+00:00
2 1970-01-05 00:00:00+00:00 1970-01-06 00:00:00+00:00
3 1970-01-07 00:00:00+00:00 1970-01-08 00:00:00+00:00
4 1970-01-09 00:00:00+00:00 1970-01-10 00:00:00+00:00

In [18]: df.apply(lambda x: pd.to_datetime(x, unit='D').dt.tz_localize('UTC')).dtypes
Out[18]: 
0    datetime64[ns, UTC]
1    datetime64[ns, UTC]
dtype: object

John-Boik commented Oct 5, 2017

Thanks. I see the error when using the Ibis framework, when I query a table that has null values in a timestamp-with-timezone field. I did use something like that as a fix, but it was very slow on queries against large tables.

jreback (Contributor) commented Oct 5, 2017

@John-Boik that doesn't make sense: the apply is only iterating over the columns, so it should not be slow unless you have millions of columns (which would be completely non-performant anyhow).

John-Boik commented
The error occurs within Ibis, which calls pandas, which raises the error in ~lib/python3.5/site-packages/pandas/core/internals.py, near line 573: `dtype = np.dtype(dtype)`. The error is something like "dtype not understood". If I change the database field to timestamp without timezone, the error is not raised. Nor is it raised if the values are non-null. I am now using an older version of Ibis, where the error is not raised.

John-Boik commented
My crude fix was:

            # Substitute a plain numpy datetime64[ns] dtype for the tz-aware string;
            # the reshape/astype dance is just a roundabout np.dtype('datetime64[ns]')
            if dtype == 'datetime64[ns, UTC]':
                dtype = np.arange(2).reshape((1, 2)).astype('datetime64[ns]').dtype
            else:
                assert False

But as I said, it was too slow to work for big tables.
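
For context, a minimal sketch of the dtype-parsing failure that both the traceback above and this crude fix are working around: NumPy's dtype constructor cannot parse the tz-aware string, while pandas' own resolver, pd.api.types.pandas_dtype, returns the DatetimeTZDtype extension dtype.

import numpy as np
import pandas as pd

# numpy has no notion of a timezone-aware datetime dtype, so parsing the string fails
try:
    np.dtype('datetime64[ns, UTC]')
except (TypeError, ValueError) as exc:  # "data type not understood" on the versions above
    print('numpy:', exc)

# pandas' resolver understands the string and returns DatetimeTZDtype('ns', 'UTC')
print('pandas:', pd.api.types.pandas_dtype('datetime64[ns, UTC]'))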

@TomAugspurger changed the title from "BUG: Construct DataFrame raise error if specify dtype='datetime64[ns, UTC]'" to "BUG: DataFrame constructor raises error if specify tz dtype dtype='datetime64[ns, UTC]'" on Apr 27, 2018
zhuoqiang commented Jan 2, 2019

Pandas also fails on view() with a tz dtype:

import pandas as pd

df = pd.DataFrame({'a': pd.date_range('2018-01-01', '2018-01-03', tz='Asia/Shanghai')})
da = df['a'].view('int64')
da.view(df['a'].dtype)

will generate TypeError: data type not understood

Traceback (most recent call last)
<ipython-input-62-58aa88ef59a7> in <module>
      3 df = pd.DataFrame({'a': pd.date_range('2018-01-01', '2018-01-03', tz='Asia/Shanghai')})
      4 da = df['a'].view('int64')
----> 5 da.view(df['a'].dtype)

~/python3.7/site-packages/pandas/core/series.py in view(self, dtype)
    632         dtype: int8
    633         """
--> 634         return self._constructor(self._values.view(dtype),
    635                                  index=self.index).__finalize__(self)
    636 

TypeError: data type not understood

I have to use the following view_as() to make it work:

def view_as(s, dtype):
    try:
        return s.view(dtype)
    except TypeError as e:
        if isinstance(dtype, str):
            dtype = pd.core.dtypes.dtypes.DatetimeTZDtype.construct_from_string(dtype)
        if isinstance(dtype, pd.core.dtypes.dtypes.DatetimeTZDtype):
            s = s.view(f'datetime64[{dtype.unit}]')
            if dtype.tz:
                s = s.dt.tz_localize('utc').dt.tz_convert(dtype.tz)
            return s
        raise e

Actually, the full version of view_as() can also handle categorical data:

def view_as(s, dtype):
    try:
        if isinstance(s.dtype, pd.core.dtypes.dtypes.CategoricalDtype):
            s = s.cat.codes.values
        if isinstance(dtype, pd.core.dtypes.dtypes.CategoricalDtype):
            return pd.Categorical.from_codes(s, dtype.categories)
        else:
            return s.view(dtype)
    except TypeError as e:
        if isinstance(dtype, str):
            dtype = pd.core.dtypes.dtypes.DatetimeTZDtype.construct_from_string(dtype)
        if isinstance(dtype, pd.core.dtypes.dtypes.DatetimeTZDtype):
            s = s.view(f'datetime64[{dtype.unit}]')
            if dtype.tz:
                s = s.dt.tz_localize('utc').dt.tz_convert(dtype.tz)
            return s
        raise e
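
A hypothetical usage sketch of the view_as() helper above, round-tripping the int64 view from the earlier snippet back to a tz-aware series (assuming a pandas version contemporary with this comment, where Series.view is still available):

df = pd.DataFrame({'a': pd.date_range('2018-01-01', '2018-01-03', tz='Asia/Shanghai')})
ints = df['a'].view('int64')             # tz-aware -> int64 nanoseconds works fine
restored = view_as(ints, df['a'].dtype)  # instead of ints.view(df['a'].dtype), which raises
print(restored.dtype)                    # datetime64[ns, Asia/Shanghai]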

tswast (Contributor) commented Feb 22, 2019

While working on googleapis/python-bigquery-pandas#247, I'm able to construct a DataFrame (and Series) with dtype="datetime64[ns, UTC]" in the latest packages on pip, but it fails with the pre-release wheels with the following:

_ TestReadGBQIntegration.test_should_properly_handle_timestamp_unix_epoch[env] _

self = <tests.system.test_gbq.TestReadGBQIntegration object at 0x7f3225515588>
project_id = 'pandas-gbq-tests'

    def test_should_properly_handle_timestamp_unix_epoch(self, project_id):
        query = 'SELECT TIMESTAMP("1970-01-01 00:00:00") AS unix_epoch'
        df = gbq.read_gbq(
            query,
            project_id=project_id,
            credentials=self.credentials,
>           dialect="legacy",
        )

tests/system/test_gbq.py:310: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pandas_gbq/gbq.py:842: in read_gbq
    final_df = connector.run_query(query, configuration=configuration)
pandas_gbq/gbq.py:486: in run_query
    df = rows_iter.to_dataframe(dtypes=nullsafe_dtypes)
/opt/conda/envs/test-environment/lib/python3.6/site-packages/google/cloud/bigquery/table.py:1429: in to_dataframe
    return self._to_dataframe_tabledata_list(dtypes)
/opt/conda/envs/test-environment/lib/python3.6/site-packages/google/cloud/bigquery/table.py:1333: in _to_dataframe_tabledata_list
    frames.append(self._to_dataframe_dtypes(page, column_names, dtypes))
/opt/conda/envs/test-environment/lib/python3.6/site-packages/google/cloud/bigquery/table.py:1325: in _to_dataframe_dtypes
    columns[column] = pandas.Series(columns[column], dtype=dtypes[column])
/opt/conda/envs/test-environment/lib/python3.6/site-packages/pandas/core/series.py:248: in __init__
    raise_cast_failure=True)
/opt/conda/envs/test-environment/lib/python3.6/site-packages/pandas/core/series.py:2967: in _sanitize_array
    subarr = _try_cast(data, False)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

arr = [datetime.datetime(1970, 1, 1, 0, 0, tzinfo=<UTC>)]
take_fast_path = False

    def _try_cast(arr, take_fast_path):
    
        # perf shortcut as this is the most common case
        if take_fast_path:
            if maybe_castable(arr) and not copy and dtype is None:
                return arr
    
        try:
            subarr = maybe_cast_to_datetime(arr, dtype)
            if not is_extension_type(subarr):
>               subarr = np.array(subarr, dtype=dtype, copy=copy)
E               TypeError: data type not understood

I'm not sure what the pip packages are doing differently from the latest pre-release wheel. In the meantime, I'll use timezone-naive datetimes in pandas-gbq.

tswast (Contributor) commented Mar 23, 2019

Update: passing a timezone as part of the dtype string was officially deprecated in #23990; construct a DatetimeTZDtype instead.

I believe this issue can be closed.
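
A minimal sketch of that recommendation, assuming a pandas version where pd.DatetimeTZDtype is public API: build the dtype object explicitly instead of embedding the timezone in the dtype string.

import pandas as pd

# Explicit extension dtype rather than the 'datetime64[ns, UTC]' string
utc = pd.DatetimeTZDtype(unit='ns', tz='UTC')

# date_range with tz= already produces data of exactly this dtype
idx = pd.date_range('1970-01-01', periods=4, freq='D', tz='UTC')
df = pd.DataFrame({'ts': idx})

assert df['ts'].dtype == utc
print(df.dtypes)  # ts    datetime64[ns, UTC]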

@jbrockmendel added the Constructors label on Jul 23, 2019
jbrockmendel added a commit to jbrockmendel/pandas that referenced this issue on Dec 27, 2019
@jreback removed this from the Contributions Welcome milestone on Jan 1, 2020
JoshZastrow commented May 17, 2021

Is this issue closed? I'm getting a new error related to how numpy handles datetime64[ns, UTC] types:

>>> schema_file = '{"date": "datetime64[ns, UTC]"}'
>>> import json
>>> schema = json.loads(schema_file)

Working Example:

>>> working_df = pd.DataFrame(columns=schema).astype(schema)
>>> working_df.info()
<class 'pandas.core.frame.DataFrame'>
Index: 0 entries
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype              
---  ------  --------------  -----              
 0   date    0 non-null      datetime64[ns, UTC]
dtypes: datetime64[ns, UTC](1)
memory usage: 0.0+ bytes

Not Working Example # 1

>>> no_work_df = pd.DataFrame(columns=schema, dtype=schema)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/joshua.zastrow/.pyenv/versions/3.7.9/envs/dynamic_pricing/lib/python3.7/site-packages/pandas/core/frame.py", line 513, in __init__
    dtype = self._validate_dtype(dtype)
  File "/Users/joshua.zastrow/.pyenv/versions/3.7.9/envs/dynamic_pricing/lib/python3.7/site-packages/pandas/core/generic.py", line 345, in _validate_dtype
    dtype = pandas_dtype(dtype)
  File "/Users/joshua.zastrow/.pyenv/versions/3.7.9/envs/dynamic_pricing/lib/python3.7/site-packages/pandas/core/dtypes/common.py", line 1799, in pandas_dtype
    npdtype = np.dtype(dtype)
  File "/Users/joshua.zastrow/.pyenv/versions/3.7.9/envs/dynamic_pricing/lib/python3.7/site-packages/numpy/core/_internal.py", line 61, in _usefields
    names, formats, offsets, titles = _makenames_list(adict, align)
  File "/Users/joshua.zastrow/.pyenv/versions/3.7.9/envs/dynamic_pricing/lib/python3.7/site-packages/numpy/core/_internal.py", line 31, in _makenames_list
    raise ValueError("entry not a 2- or 3- tuple")
ValueError: entry not a 2- or 3- tuple

Not Working Example # 2

>>> no_work_df_2 = pd.DataFrame(columns=schema, dtype=working_df.dtypes.to_dict())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/joshua.zastrow/.pyenv/versions/3.7.9/envs/dynamic_pricing/lib/python3.7/site-packages/pandas/core/frame.py", line 513, in __init__
    dtype = self._validate_dtype(dtype)
  File "/Users/joshua.zastrow/.pyenv/versions/3.7.9/envs/dynamic_pricing/lib/python3.7/site-packages/pandas/core/generic.py", line 345, in _validate_dtype
    dtype = pandas_dtype(dtype)
  File "/Users/joshua.zastrow/.pyenv/versions/3.7.9/envs/dynamic_pricing/lib/python3.7/site-packages/pandas/core/dtypes/common.py", line 1799, in pandas_dtype
    npdtype = np.dtype(dtype)
  File "/Users/joshua.zastrow/.pyenv/versions/3.7.9/envs/dynamic_pricing/lib/python3.7/site-packages/numpy/core/_internal.py", line 61, in _usefields
    names, formats, offsets, titles = _makenames_list(adict, align)
  File "/Users/joshua.zastrow/.pyenv/versions/3.7.9/envs/dynamic_pricing/lib/python3.7/site-packages/numpy/core/_internal.py", line 29, in _makenames_list
    n = len(obj)
TypeError: object of type 'DatetimeTZDtype' has no len()
INSTALLED VERSIONS
------------------
commit           : f2c8480af2f25efdbd803218b9d87980f416563e
python           : 3.7.9.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 19.6.0
Version          : Darwin Kernel Version 19.6.0: Mon Apr 12 20:57:45 PDT 2021; root:xnu-6153.141.28.1~1/RELEASE_X86_64
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.2.3
numpy            : 1.20.1
pytz             : 2021.1
dateutil         : 2.8.1
pip              : 21.1.1
setuptools       : 47.1.0
Cython           : None
pytest           : 6.2.1
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : 2.8.6 (dt dec pq3 ext lo64)
jinja2           : 2.11.3
IPython          : 7.21.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fsspec           : 0.9.0
fastparquet      : None
gcsfs            : 0.8.0
matplotlib       : 3.3.3
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : 0.14.1
pyarrow          : 3.0.0
pyxlsb           : None
s3fs             : None
scipy            : 1.6.0
sqlalchemy       : 1.3.20
tables           : None
tabulate         : 0.8.9
xarray           : None
xlrd             : None
xlwt             : None
numba            : 0.52.0

jreback (Contributor) commented May 17, 2021

You would have to try on master, and if it still fails you can open a new issue.
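
For reference, a minimal sketch of the distinction the two failing examples above run into: the DataFrame constructor's dtype argument accepts only a single dtype for the whole frame, so a dict (or a .dtypes mapping) is rejected, while .astype() does accept a per-column mapping, which is why the first pattern works.

import json
import pandas as pd

schema = json.loads('{"date": "datetime64[ns, UTC]"}')

# Per-column dtypes go through .astype(); dtype= in the constructor must be a single dtype
df = pd.DataFrame(columns=schema).astype(schema)
print(df.dtypes)  # date    datetime64[ns, UTC]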
