[WIP] Typehint np.random #11

alanhdu · 2018-03-01T01:28:17Z

No description provided.

alanhdu · 2018-03-01T01:33:05Z

This isn't ready yet, but I wanted to get feedback about whether this was the right approach (lots of @overloads for type-hinting). The problems I have with it are:

It's verbose and irritating (I actually have a jinja2 template that generates this file), although this is a one-time cost so I'm not too concerned.
overload semantics are a little in flux -- because of python/mypy#4020 ("Overloaded function signatures 1 and 2 overlap with incompatible return types"), MyPy counts some of the overloads as overlapping and reports an error (see the last commit).
The type signatures aren't strictly correct -- if you pass a scalar ndarray into one of these functions, you get a scalar non-ndarray out. I'm not sure how to deal with this until we can distinguish scalar ndarrays and non-scalar ndarrays.

I'm not sure what the alternative approaches are -- I thought about trying to create a special _Distribution that subclasses Callable and somehow does all of the broadcasting for us, but I couldn't actually figure out how to get that to work.

shoyer · 2018-03-01T02:08:04Z

numpy/random/mtrand.pyi

+    @overload
+    def beta(self, a: float, b: float) -> float: ...
+    @overload
+    def beta(self, a: float, b: float, size: _Size) -> float: ...


If size is specified, you get an ndarray, not a float.

shoyer · 2018-03-01T02:12:07Z

numpy/random/mtrand.pyi

+    @overload
+    def beta(self, a: float, b: float, size: _Size) -> float: ...
+    @overload
+    def beta(self, a: ndarray, b: float) -> ndarray: ...


Unfortunately this isn't quite right -- if you pass in a zero-dimensional ndarray, you get a float back, not an ndarray:

In [17]: rs.beta(np.array(2), 2)
Out[17]: 0.3340049666412606

This sort of behavior (zero dimensional arrays -> scalars) is unfortunately quite common with NumPy.

EDIT: I see you already noted this in your comment :)

shoyer · 2018-03-01T02:14:18Z

numpy/random/mtrand.pyi

+    @overload
+    def choice(self, a: int, *,
+               replace: bool = ..., p: Optional[Sequence[float]] = ...
+               ) -> int: ...


This is np.int64, not int (at least not on Python 3)

shoyer · 2018-03-01T02:16:30Z

numpy/random/mtrand.pyi

+    @overload
+    def choice(self, a: Sequence[_T], *,
+               replace: bool = ..., p: Optional[Sequence[float]] = ...
+               ) -> _T: ...


a gets cast to an ndarray, so the return value is the corresponding NumPy scalar type.

shoyer · 2018-03-01T02:18:38Z

numpy/random/mtrand.pyi

+_T = TypeVar('_T')
+
+class RandomState:
+    def __init__(self, state: Optional[Union[int, ndarray]] = ...) -> None: ...


The constructor argument is called seed, not state:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.RandomState.html

Also, it can accept sequences of integers ("array_like") not just ndarray instances.
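
A minimal sketch of how the corrected signature might look, using a stand-in class rather than the real stub. The parameter name `seed` and the array-like input come from the comment above; the `_SeedLike` alias, the stand-in `ndarray`, and the method body are assumptions for illustration only.

```python
from typing import Optional, Sequence, Union


class ndarray:
    """Stand-in for numpy.ndarray so the sketch runs without NumPy."""


# Hypothetical alias: an int, a sequence of ints, or an ndarray.
_SeedLike = Union[int, Sequence[int], ndarray]


class RandomState:
    """Illustrative stand-in, not the real numpy.random.RandomState."""

    def __init__(self, seed: Optional[_SeedLike] = None) -> None:
        # A real stub would write `= ...` for the default and have no body.
        self.seed = seed
```

With this annotation, `RandomState(42)` and `RandomState(seed=[1, 2, 3])` should both type-check.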

shoyer · 2018-03-01T02:19:48Z

numpy/random/mtrand.pyi

+    @overload
+    def beta(self, a: float, b: ndarray) -> ndarray: ...
+    @overload
+    def beta(self, a: ndarray, b: ndarray) -> ndarray: ...


This method can also accept arbitrary "array-like" types which it coerces to ndarrays, e.g., lists of numbers.

Hm... I'm not sure how to represent "array-like" types. A naive approach would be something like:

_ArrayLike = Union[Sequence[_T], ndarray]

But nested sequences are also _ArrayLike and MyPy doesn't have support for recursive types (which we'd need for arbitrarily nested sequences).
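
One possible workaround, not proposed in this thread, is to unroll the nesting by hand to a fixed depth. The sketch below uses a stand-in `ndarray` class and hypothetical alias names, and it only covers up to three levels of nesting, so it approximates a recursive type rather than replacing one.

```python
from typing import Sequence, TypeVar, Union

_T = TypeVar("_T")


class ndarray:
    """Stand-in for numpy.ndarray so the sketch runs without NumPy."""


# Hypothetical aliases: the nesting is spelled out level by level
# because the alias cannot refer to itself.
_Seq1 = Sequence[_T]
_Seq2 = Sequence[Sequence[_T]]
_Seq3 = Sequence[Sequence[Sequence[_T]]]
_ArrayLike = Union[ndarray, _T, _Seq1[_T], _Seq2[_T], _Seq3[_T]]


def nesting_depth(values: _ArrayLike[float]) -> int:
    """Count how many list/tuple levels wrap the first scalar (0 for a bare scalar)."""
    current: object = values
    depth = 0
    while isinstance(current, (list, tuple)):
        depth += 1
        if not current:  # an empty sequence has nothing further to inspect
            break
        current = current[0]
    return depth
```

Anything nested deeper than the unrolled aliases would still be rejected by a type checker, which is the limitation described above.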

shoyer · 2018-03-01T02:34:28Z

> It's verbose and irritating (I actually have a jinja2 template that generates this file), although this is a one-time cost so I'm not too concerned.

There are going to be quite a few functions like these (e.g., all NumPy ufuncs), so it may make sense to incorporate some sort of template generation in the build process for this library.

I guess ufuncs are objects, so we might be able to specify the type-casting rules once (for each number of function arguments) on the base class. But there are still lots of NumPy functions/methods that aren't ufuncs...

> overload semantics are a little in flux -- because of python/mypy#4020, some of the overloads are counted as overlapping from MyPy and return an error (see that last commit)

This is more concerning to me -- it basically blocks anyone from using our annotations with MyPy. I think we need to ensure that mypy can use our annotations without errors.

> The type signatures aren't strictly correct -- if you pass in a scalar ndarray into one of these functions, you get a scalar non-ndarray out. I'm not sure how to deal with this until we can distinguish scalar ndarrays and non-scalar ndarrays.

Yes, this sort of behavior is very unfortunate for us, and it is also quite prevalent in NumPy. My inclination is that our type stubs should prioritize correctness over catching all possible errors. Zero-dimensional arrays are somewhat unusual to see in NumPy, but they do come up. Without typing support for array dimensionality, this would mean that many of these return values would have to be unsatisfying Any types.

alanhdu · 2018-03-02T02:05:31Z

I believe I've figured out how to hack around the @overload problem -- by explicitly annotating size: None, MyPy no longer thinks that they're conflicting types.
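
A minimal sketch of that workaround, with a stand-in implementation built on the standard library so it runs without NumPy. The `_Size` alias, the list return type standing in for `ndarray`, and the function body are assumptions for illustration; only the explicit `size: None` annotation on the scalar overload reflects the change described above.

```python
import random
from typing import List, Optional, Tuple, Union, overload

# Hypothetical alias for the shape argument.
_Size = Union[int, Tuple[int, ...]]


# Scalar case: `size` is explicitly annotated as None, so this signature
# no longer overlaps with the one that takes a real size.
@overload
def beta(a: float, b: float, size: None = ...) -> float: ...
@overload
def beta(a: float, b: float, size: _Size) -> List[float]: ...


def beta(a: float, b: float, size: Optional[_Size] = None) -> Union[float, List[float]]:
    """Stand-in implementation; a flat list plays the role of ndarray here."""
    if size is None:
        return random.betavariate(a, b)
    dims = (size,) if isinstance(size, int) else size
    count = 1
    for dim in dims:
        count *= dim
    return [random.betavariate(a, b) for _ in range(count)]
```

A checker should then infer `float` for `beta(2.0, 2.0)` and the array-like type for `beta(2.0, 2.0, size=3)`.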

> There are going to be quite a few functions like these (e.g., all NumPy ufuncs), so it may make sense to incorporate some sort of template generation in the build process for this library.

Yeah, I think some template generation "build" makes a lot of sense. My approach with Jinja2 is pretty ugly (because of how it handles whitespace) and still fails flake8 despite my best efforts. Do you know of a good way to template at the AST level (or something else that's not whitespace-sensitive)? For prettifying, I've thought about using yapf or some other autoformatter (although we might have to tweak them for .pyi files).

> Yes, this sort of behavior is very unfortunate for us, and unfortunately is also quite prevalent in NumPy. My inclination is that our type-stubs should prioritize correctness more than catching all possible errors. Zero-dimensional arrays are somewhat unusual to see in NumPy, but they do come up. Without typing support for array dimensionality, this would mean that many of these return values should be unsatisfying Any types.

I'm willing to defer to your judgement here, but I'd personally lean the other way. IME MyPy is still limited enough that you almost never get "seamless" integration with any reasonably large codebase. Given that you have to adapt your coding style (to write "type-friendly" code) and do integration work anyway, I think it's reasonable to prioritize "correctness" (as long as the burden's not too much, like having to add some int and float casts for numpy scalars).

In any case, I think Any is an ok return type -- we could also return a Union[int, ndarray] or whatever, although that's arguably the worst of both worlds (since you'd almost certainly need to do a type assertion on the output anyways).
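
To illustrate the trade-off, here is a small sketch with two hypothetical stand-in functions, one annotated with a `Union` return and one with `Any`. The function names and the list standing in for `ndarray` are assumptions for illustration.

```python
from typing import Any, List, Optional, Union


def sample_union(size: Optional[int] = None) -> Union[float, List[float]]:
    """Hypothetical function whose return type depends on `size`."""
    return 0.5 if size is None else [0.5] * size


def sample_any(size: Optional[int] = None) -> Any:
    """Same behaviour, but the checker is told nothing about the result."""
    return 0.5 if size is None else [0.5] * size


# With Union, a type checker expects the caller to narrow before arithmetic.
value = sample_union()
assert isinstance(value, float)
doubled_union = value * 2

# With Any, the same arithmetic is accepted without narrowing (and without checking).
doubled_any = sample_any() * 2
```

The `Union` version forces the assertion at every call site, while the `Any` version silently accepts whatever the caller does with the result.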


alanhdu · 2018-07-06T23:27:56Z

@shoyer Sorry for the (very long) delay!

I've pushed a new revision which (I believe) fully types out np.random (although I need to go through and double-check and write some tests). I decided to start over with a custom stub generator (in generate/random.py and generate/utils.py), which I found to be much smoother and easier to read than trying to use Jinja2 (significant whitespace meant that Jinja2 was finicky).
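
As a rough illustration of the idea (not the code in generate/random.py, which is not shown in this thread), a light-weight generator can be little more than string formatting. The function name, the `_Size` alias, and the choice of overloads below are hypothetical.

```python
from typing import List


def generate_overloads(name: str, params: List[str]) -> str:
    """Emit a scalar and an array overload for one distribution method as stub text."""
    base = ["self"] + [f"{p}: float" for p in params]
    scalar_args = ", ".join(base + ["size: None = ..."])
    array_args = ", ".join(base + ["size: _Size"])
    lines = [
        "    @overload",
        f"    def {name}({scalar_args}) -> float: ...",
        "    @overload",
        f"    def {name}({array_args}) -> ndarray: ...",
    ]
    return "\n".join(lines)


stub = generate_overloads("beta", ["a", "b"])
print(stub)
```

Because the output is plain text, an autoformatter can be run over it afterwards instead of making the generator responsible for layout.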

To avoid having to think too much about formatting, I also used https://github.com/ambv/black as a (manual) post-processing step of the generated stub files.

I think the main unresolved questions I have are:

Does this light-weight stub generator look good, or does it need more structure?
What's the best way to integrate a stub generator and the stubs? Do we check both of them into the repo? Do we automatically run the generator in CI?
Should black just autoformat everything in the repo? (I think the answer is yes)
I don't think we resolved the behavior with 0-D ndarrays. I still believe that the current behavior is the "best" compromise, although I'm willing to defer to whatever you think's best.

WDYT?

The most significant benefit is changes to `@overload` around what counts as overlapping signatures that could unlock some functionality; see e.g. - numpy#44 (comment) - numpy#11 (comment)


person142 · 2020-06-09T15:37:57Z

Closing; but feel free to make a new PR against the NumPy main repo!

shoyer reviewed Mar 1, 2018


shoyer mentioned this pull request Mar 5, 2018

Balancing correctness vs. type sanity #12

Closed

Ignore F8111 flake8 error

f32978e

It conflicts with `@overload`s in type stubs

alanhdu force-pushed the random branch 2 times, most recently from b38cc2b to 813f0b9 Compare July 6, 2018 23:02

alanhdu added 3 commits July 6, 2018 19:05

Add generate framework

20ddd20

Generate random typestubs

0bf5f84

Add a bit of a test suite

44c9946

alanhdu force-pushed the random branch from 813f0b9 to 44c9946 Compare July 6, 2018 23:05

Bump mypy version

4bb8a46

Better overloading rules

alanhdu force-pushed the random branch from 19f75a6 to 4bb8a46 Compare July 6, 2018 23:16

alanhdu mentioned this pull request Nov 20, 2018

Bumpy mypy versions #28

Merged

person142 mentioned this pull request Mar 26, 2020

MAINT: upgrade mypy to 0.770 (the latest) #45

Merged

alanhdu mentioned this pull request Jun 9, 2020

Use a stub-generator numpy/numpy#16548

Closed

person142 closed this Jun 9, 2020
