BUG: Assignment of Timestamp Scalar uses microsecond precision, Series uses nano #55487

Closed
3 tasks done
Tracked by #55564
WillAyd opened this issue Oct 11, 2023 · 6 comments · Fixed by #55901
Labels
Bug Non-Nano datetime64/timedelta64 with non-nanosecond resolution

Comments

WillAyd (Member) commented Oct 11, 2023

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

```python
import pandas as pd
ts = pd.Timestamp.now()
df = pd.DataFrame({"a": [1]})
df["direct_assignment"] = ts
df["series_assignment"] = pd.Series(ts)
df.dtypes
```


yields

```
a                             int64
direct_assignment    datetime64[us]
series_assignment    datetime64[ns]
dtype: object
```

Issue Description

I was surprised to see the dtype mismatch here.

Expected Behavior

At least for backwards compatibility, we might want to make the scalar assignment still yield nanosecond resolution.
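Until the underlying behaviour is settled, one stopgap (not part of the original report, just a sketch assuming pandas 2.x) is to cast the affected columns to a common resolution explicitly after assignment:

```python
import pandas as pd

ts = pd.Timestamp.now()
df = pd.DataFrame({"a": [1]})
df["direct_assignment"] = ts
df["series_assignment"] = pd.Series(ts)

# Stopgap: force both columns to one resolution so the dtypes agree,
# regardless of which unit each assignment path inferred.
for col in ("direct_assignment", "series_assignment"):
    df[col] = df[col].astype("datetime64[ns]")

print(df.dtypes)
```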

Installed Versions

INSTALLED VERSIONS

commit : c2cd90a
python : 3.10.12.final.0
python-bits : 64
OS : Linux
OS-release : 6.2.0-33-generic
Version : #33-Ubuntu SMP PREEMPT_DYNAMIC Tue Sep 5 14:49:19 UTC 2023
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.2.0dev0+341.gc2cd90ac54
numpy : 1.24.4
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 68.0.0
pip : 23.2.1
Cython : 0.29.33
pytest : 7.4.2
hypothesis : 6.87.1
sphinx : 7.2.6
blosc : None
feather : None
xlsxwriter : 3.1.6
lxml.etree : 4.9.3
html5lib : 1.1
pymysql : 1.4.6
psycopg2 : 2.9.7
jinja2 : 3.1.2
IPython : 8.16.1
pandas_datareader : None
bs4 : 4.12.2
bottleneck : 1.3.7
dataframe-api-compat: None
fastparquet : 2023.8.0
fsspec : 2023.9.2
gcsfs : 2023.9.2
matplotlib : 3.7.3
numba : 0.57.1
numexpr : 2.8.7
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : 13.0.0
pyreadstat : 1.2.3
python-calamine : None
pyxlsb : 1.0.10
s3fs : 2023.9.2
scipy : 1.11.3
sqlalchemy : 2.0.21
tables : 3.8.0
tabulate : 0.9.0
xarray : 2023.9.0
xlrd : 2.0.1
zstandard : 0.21.0
tzdata : 2023.3
qtpy : None
pyqt5 : None

@WillAyd WillAyd added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Oct 11, 2023
@jbrockmendel jbrockmendel added the Non-Nano datetime64/timedelta64 with non-nanosecond resolution label Oct 24, 2023
behrenhoff (Contributor) commented Oct 25, 2023

I just ran into a similar issue:

```python
df1 = pd.DataFrame(data={"x": [1], "d": [pd.Timestamp("2020-01-01")]})
df2 = pd.DataFrame(data={"x": [1], "d": pd.Timestamp("2020-01-01")})
```

Question: which resulting dtypes do df1 and df2 have?

Answer:

```python
>>> df1.dtypes
x             int64
d    datetime64[ns]
dtype: object

>>> df2.dtypes
x            int64
d    datetime64[s]
dtype: object
```

...and the resulting DataFrames are incompatible: they cannot be concatenated because of the mismatched dtypes!

I think it boils down to `pd.Timestamp("2020-01-01")` deciding on an internal resolution automatically. The `unit` argument does nothing here (it is used for interpreting the input value, not for the resulting internal dtype), and there seems to be no parameter to switch the automatic inference off. So I think `Timestamp` should always be "ns" unless you specify something like `Timestamp(..., resolution="s")` explicitly; otherwise we get different, incompatible dtypes depending on the input string (which might come from external sources). The only current workaround seems to be `Timestamp("2020-01-01").as_unit("ns")`. Then my example from above works.
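The automatic inference and the `as_unit` workaround described above can be observed directly on a Timestamp's `unit` attribute (a sketch assuming pandas >= 2.0, where non-nanosecond resolutions exist):

```python
import pandas as pd

# The string "2020-01-01" needs only second resolution, so the
# constructor infers unit "s" rather than "ns".
ts = pd.Timestamp("2020-01-01")
print(ts.unit)  # "s"

# Normalising to "ns" up front makes the two construction paths agree.
ts_ns = ts.as_unit("ns")
df1 = pd.DataFrame(data={"x": [1], "d": [ts_ns]})
df2 = pd.DataFrame(data={"x": [1], "d": ts_ns})
print(df1.dtypes["d"] == df2.dtypes["d"])  # True
```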

ziadk (Contributor) commented Oct 26, 2023

Hello @WillAyd,

I would love to work on this.

I have found that the direct assignment passes through the `infer_dtype_from_scalar()` function in the `cast.py` file. Inside this function, the cast `val = val.to_datetime64()` is what gives the `us` precision.

To recap, the direct assignment follows this call trace to the problem: `DataFrame.__setitem__()` -> `DataFrame._set_item()` -> `DataFrame._sanitize_column()` -> `construction.sanitize_array()` -> `dtypes.cast.construct_1d_arraylike_from_scalar()` -> `dtypes.cast.infer_dtype_from_scalar()`. Inside this method, these lines of code are the source of our problem:

```python
elif isinstance(val, (np.datetime64, dt.datetime)):
    ...
    if val is NaT or val.tz is None:
        val = val.to_datetime64()
        dtype = val.dtype
```
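The effect of that `to_datetime64()` call can be seen in isolation: the returned `np.datetime64` keeps whatever unit the `Timestamp` itself carries, so the inferred dtype follows the scalar's unit (a sketch assuming pandas >= 2.0):

```python
import pandas as pd

# A Timestamp parsed from a date-only string carries unit "s";
# to_datetime64() preserves that unit in the resulting np.datetime64.
ts_s = pd.Timestamp("2020-01-01")
print(ts_s.to_datetime64().dtype)   # datetime64[s]

# After as_unit("ns"), the same call yields a nanosecond dtype.
ts_ns = ts_s.as_unit("ns")
print(ts_ns.to_datetime64().dtype)  # datetime64[ns]
```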

I am just beginning with this kind of open source work, so please do not hesitate to give me any guidance. I would also be happy to follow any specific guidelines you have for working on the problem.

Thank you

@jbrockmendel jbrockmendel removed the Needs Triage Issue that has not been reviewed by a pandas team member label Nov 1, 2023
davetapley (Contributor):
ValueRaider:
@davetapley How is pd.Timestamp.now() a Python datetime?

davetapley (Contributor):
@ValueRaider I'm not sure I follow?

It is literally a datetime in the sense that:

```python
class Timestamp(datetime):
```

i.e.:

```python
>>> import pandas as pd
>>> from datetime import datetime

>>> isinstance(pd.Timestamp.now(), datetime)
True
```

Re: my specific linking of #55014 as a possible dupe: the issues are linked because they both have the same symptom, as identified in #55014 (comment):

  • for scalars, the resolution is preserved (so for stdlib datetime, it becomes 'us', because that's the resolution of the python stdlib)
  • for a list, the resolution is 'ns' by default

ValueRaider commented Dec 30, 2023

@davetapley It does appear similar, but my concern is that that thread is handling the bug as a low-priority edge case. I think a conversation is needed regarding the expected behaviour in pandas 2 when instantiating a DataFrame with columns of type dt.datetime.

That this happens using pure Pandas API should raise the urgency.
