Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add more precise but flexible typing for statistics, time, and datetime. #12175

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

Flameeyes
Copy link
Contributor

No description provided.

Most of the functions in the module include fairly straightforward arithmetic
operations that can be executed on any type of Real number, not just floats or
fractions.

Decimal still needs to be included explicitly, as it is not a subtype of Real.
…ent it.

While the C standard does not define the type for time_t, all of the APIs
using it can only provide resolution of one second.

While `SupportsInt` should describe this correctly, passing a Decimal object
does appear to cause a `DeprecationWarning` right now:

    >>> time.localtime(decimal.Decimal(1))
    <stdin>:1: DeprecationWarning: an integer is required (got type decimal.Decimal).  Implicit conversion to integers using __int__ is deprecated, and may be removed in a future version of Python.
    time.struct_time(tm_year=1969, tm_mon=12, tm_mday=31, tm_hour=16, tm_min=0, tm_sec=1, tm_wday=2, tm_yday=365, tm_isdst=0)
The `fromtimestamp()` API differs in behaviour between `date` and `datetime`.

In the case of `date`, the parameter is passed directly to `time.localtime()`
or `time.gmtime()`, which only accept integral values, so it is obvious for
it to take `SupportsInt` (despite the `DeprecationWarning` when passing, for
instance, `Decimal` numbers.) Due to the nature of the type, sub-second
precision is not necessary.

In the case of `datetime`, sub-second precision *is* necessary, so at least
up to Python 3.12, it's this implementation that extracts the decimal part
of the parameter. But it then take the remainder integral part and passes it
to the same `time.localtime()`, which requires a conversion to integer.

This means that while the function takes `float` parameters just fine, it
requires them to also support `int` conversion — so it will not accept e.g.
`Fraction` parameters.

    >>> datetime.date.fromtimestamp(fractions.Fraction(1, 1))
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    TypeError: an integer is required (got type Fraction)
    >>> datetime.datetime.fromtimestamp(decimal.Decimal(3))
    datetime.datetime(1969, 12, 31, 16, 0, 3)

It also doesn't quite always preserve the sub-second precision for non-float
types:

    >>> t = time.time()
    >>> t
    1718884427.5039804
    >>> d = decimal.Decimal(t)
    >>> d
    Decimal('1718884427.5039803981781005859375')
    >>> datetime.datetime.fromtimestamp(t)
    datetime.datetime(2024, 6, 20, 4, 53, 47, 503980)
    >>> datetime.datetime.fromtimestamp(d)
    datetime.datetime(2024, 6, 20, 4, 53, 47)
@@ -32,8 +32,8 @@ if sys.version_info >= (3, 13):
__all__ += ["kde", "kde_random"]

# Most functions in this module accept homogeneous collections of one of these types
_Number: TypeAlias = float | Decimal | Fraction
_NumberT = TypeVar("_NumberT", float, Decimal, Fraction)
_Number: TypeAlias = Real | Decimal
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid type checkers don't understand float to be a subtype of numbers.Real

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit scared now — I know pyre does, and ISTR mypy did as well?

Is there an easy-to-use playground to run the same typing information across all the supported checkers?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mypy has never properly supported the ABCs in the numbers module, and I don't believe pyright has either. There's no cross-typechecker playground, but MyPy's playground is at https://mypy-play.net/?mypy=latest&python=3.12 and pyright's is at https://pyright-play.net/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can see evidence of this in the new openlibrary hit reported by mypy_primer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay this is worse than I expected :(

Not only mypy doesn't support it (I might be confusing it with pytype, but it's been years, I'm lost), but neither of them seem to even think that a * 2 maintains the type.

Do you have a suggestion on how to handle this? Just add Real without removing float ? Replace it with SupportsFloat even if it's not quite the right type?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The numbers ABCs are sort-of fundamentally broken when it comes to static typing, I'm afraid. It's impossible to add proper annotations to many of the methods for those ABCs in numbers.pyi here in typeshed, and so as a result we deliberately don't have the builtin numeric classes inherit from them in builtins.pyi; this is why type checkers do not understand them.

I'll take a look in a bit and see if I can think of what to do here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I ask what the problem you're trying to solve is here? I feel like our current versions of _Number and _NumberT here are going to work well for the vast majority of uses of the statistics functions. Is there a specific false positive mypy emitted on some code that you're trying to fix?

Copy link
Contributor

Diff from mypy_primer, showing the effect of this PR on open source code:

pandas (https://github.com/pandas-dev/pandas)
+ pandas/_libs/tslibs/timestamps.pyi:88: error: Signature of "fromtimestamp" incompatible with supertype "date"  [override]
+ pandas/_libs/tslibs/timestamps.pyi:88: note:      Superclass:
+ pandas/_libs/tslibs/timestamps.pyi:88: note:          @classmethod
+ pandas/_libs/tslibs/timestamps.pyi:88: note:          def fromtimestamp(cls, SupportsInt, /) -> Timestamp
+ pandas/_libs/tslibs/timestamps.pyi:88: note:      Subclass:
+ pandas/_libs/tslibs/timestamps.pyi:88: note:          @classmethod
+ pandas/_libs/tslibs/timestamps.pyi:88: note:          def fromtimestamp(cls, ts: float, tz: str | tzinfo | int | None = ...) -> Timestamp
+ pandas/_libs/tslibs/timestamps.pyi:88: error: Argument 1 of "fromtimestamp" is incompatible with supertype "datetime"; supertype defines the argument type as "SupportsInt"  [override]
+ pandas/_libs/tslibs/timestamps.pyi:88: note: This violates the Liskov substitution principle
+ pandas/_libs/tslibs/timestamps.pyi:88: note: See https://mypy.readthedocs.io/en/stable/common_issues.html#incompatible-overrides
+ pandas/_libs/tslibs/timestamps.pyi:90: error: Argument 1 of "utcfromtimestamp" is incompatible with supertype "datetime"; supertype defines the argument type as "SupportsInt"  [override]
+ pandas/_libs/tslibs/timestamps.pyi:90: note: This violates the Liskov substitution principle
+ pandas/_libs/tslibs/timestamps.pyi:90: note: See https://mypy.readthedocs.io/en/stable/common_issues.html#incompatible-overrides

openlibrary (https://github.com/internetarchive/openlibrary)
+ openlibrary/solr/updater/work.py: note: In member "number_of_pages_median" of class "WorkSolrBuilder":
+ openlibrary/solr/updater/work.py:377: error: Value of type variable "_NumberT" of "median" cannot be "int"  [type-var]

pandas-stubs (https://github.com/pandas-dev/pandas-stubs)
+ pandas-stubs/_libs/tslibs/timestamps.pyi:107: error: Signature of "fromtimestamp" incompatible with supertype "date"  [override]
+ pandas-stubs/_libs/tslibs/timestamps.pyi:107: note:      Superclass:
+ pandas-stubs/_libs/tslibs/timestamps.pyi:107: note:          @classmethod
+ pandas-stubs/_libs/tslibs/timestamps.pyi:107: note:          def fromtimestamp(cls, SupportsInt, /) -> Timestamp
+ pandas-stubs/_libs/tslibs/timestamps.pyi:107: note:      Subclass:
+ pandas-stubs/_libs/tslibs/timestamps.pyi:107: note:          @classmethod
+ pandas-stubs/_libs/tslibs/timestamps.pyi:107: note:          def fromtimestamp(cls, t: float, tz: tzinfo | str | None = ...) -> Timestamp
+ pandas-stubs/_libs/tslibs/timestamps.pyi:107: error: Argument 1 of "fromtimestamp" is incompatible with supertype "datetime"; supertype defines the argument type as "SupportsInt"  [override]
+ pandas-stubs/_libs/tslibs/timestamps.pyi:107: note: This violates the Liskov substitution principle
+ pandas-stubs/_libs/tslibs/timestamps.pyi:107: note: See https://mypy.readthedocs.io/en/stable/common_issues.html#incompatible-overrides
+ pandas-stubs/_libs/tslibs/timestamps.pyi:115: error: Argument 1 of "utcfromtimestamp" is incompatible with supertype "datetime"; supertype defines the argument type as "SupportsInt"  [override]
+ pandas-stubs/_libs/tslibs/timestamps.pyi:115: note: This violates the Liskov substitution principle
+ pandas-stubs/_libs/tslibs/timestamps.pyi:115: note: See https://mypy.readthedocs.io/en/stable/common_issues.html#incompatible-overrides

Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes to use SupportsInt don't look correct either, unfortunately. Possibly SupportsIndex would be a better fit than SupportsInt:

@@ -51,7 +51,7 @@ class date:
resolution: ClassVar[timedelta]
def __new__(cls, year: SupportsIndex, month: SupportsIndex, day: SupportsIndex) -> Self: ...
@classmethod
def fromtimestamp(cls, timestamp: float, /) -> Self: ...
def fromtimestamp(cls, timestamp: SupportsInt, /) -> Self: ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't correct:

>>> class HasInt:
...     def __int__(self): return 42
... 
>>> import datetime as dt
>>> dt.date.fromtimestamp(HasInt())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'HasInt' object cannot be interpreted as an integer

Copy link
Member

@AlexWaygood AlexWaygood Jun 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you meant typing.SupportsIndex?

>>> class HasIndex:
...     def __index__(self): return 42
... 
>>> dt.date.fromtimestamp(HasIndex())
datetime.date(1970, 1, 1)

@@ -259,14 +259,14 @@ class datetime(date):
# meaning it is only *safe* to pass it as a keyword argument on 3.12+
if sys.version_info >= (3, 12):
@classmethod
def fromtimestamp(cls, timestamp: float, tz: _TzInfo | None = ...) -> Self: ...
def fromtimestamp(cls, timestamp: SupportsInt, tz: _TzInfo | None = ...) -> Self: ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nor is this:

>>> dt.datetime.fromtimestamp(HasInt())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'HasInt' object cannot be interpreted as an integer


@classmethod
@deprecated("Use timezone-aware objects to represent datetimes in UTC; e.g. by calling .fromtimestamp(datetime.UTC)")
def utcfromtimestamp(cls, t: float, /) -> Self: ...
def utcfromtimestamp(cls, t: SupportsInt, /) -> Self: ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nor is this:

>>> dt.datetime.utcfromtimestamp(HasInt())
<stdin>:1: DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC).
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'HasInt' object cannot be interpreted as an integer

Comment on lines +71 to +73
def ctime(seconds: SupportsInt | None = None, /) -> str: ...
def gmtime(seconds: SupportsInt | None = None, /) -> struct_time: ...
def localtime(seconds: SupportsInt | None = None, /) -> struct_time: ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are all incorrect as well:

>>> import time
>>> class HasInt:
...     def __int__(self): return 42
...
>>> time.ctime(HasInt())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'HasInt' object cannot be interpreted as an integer
>>> time.gmtime(HasInt())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'HasInt' object cannot be interpreted as an integer
>>> time.localtime(HasInt())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'HasInt' object cannot be interpreted as an integer

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants