-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Represent Model._coords
values as 1-dim numpy arrays
#6589
base: main
Are you sure you want to change the base?
Conversation
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## main #6589 +/- ##
===========================================
- Coverage 92.03% 80.23% -11.80%
===========================================
Files 92 92
Lines 15535 15551 +16
===========================================
- Hits 14297 12478 -1819
- Misses 1238 3073 +1835
|
Model._coords
as a 1-dim numpy arrayModel._coords
values as 1-dim numpy arrays
CI tests found a bug in the original PR; now corrected! I also added an initial regression test for #6496. |
}, | ||
) | ||
assert isinstance(converter.coords["city"], pd.MultiIndex) | ||
# TODO: we now ensure that everything passed to arviz is converted to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Flagging this for discussion. This PR currently updates InferenceDataConverter to always convert coordinate values to ndarray. (Previously they were only converted if the sequence was a tuple.) Is there a reason we would want to maintain the original behavior...?
This test was originally made for #5043
The latest run revealed one other case in the test suite that is incompatible with this PR as it is. One test (link) for pm.Model(coords={"sample": np.ones((2, 5))}) Then when creating a Also looks like the pre-commit run failed due to a different commit touching mcmc.py. ... |
I didn't have time to read on this issue about the internal representation. That Bound test is incorrect, not a design choice, feel free to fix it. |
What is this PR about?
Seeks to address #6496: buggy behavior could occur in InferenceData creation (and perhaps elsewhere, see diff) when a user provides a coordinate value sequence that numpy interprets as
ndim>1
.We prevent this by changing the internal representation of
Model._coords
to beDict[str, Optional[np.ndarray]]
rather thanDict[str, Optional[Tuple]]
, and we ensure that the internal ndarray is 1dim.Currently this PR does not change the publicly-named
Model.coords
property. For backwards-compatibility, that property still returns a dict of tuples. Instead we expose a new model propertyModel.coords_typed
that gives the model coord values in the ndarray format.As it turns out, nearly all of the times that pymc library would access
Model.coords
, it would seek to convert the (tupled) coordinate values to numpy array. This PR instead updates the whole library (not currently tests) to use the newcoords_typed
property. Ultimately this is how #6496 is resolved.This PR is currently a draft to request discussion on if this is the best approach. One alternative would be to update the
Model.coords
property itself—this would prevent cluttering Model with another property but would be less backwards-compatible. Another question is ifnp.ndarray
is in fact the best way to represent coordinate values. E.g. one existing test case in the library gives an example of passingpd.MultiIndex
to coords when converting a trace to InferenceData, and allowing pandas indexes has been discussed in #6081.Once an approach is given the 👍 I'll add some tests (especially regression test).
Checklist
Major / Breaking Changes
New features
Model.coords_typed
property ensures that coordinate values are 1dim (if not none)Bugfixes
coords
properties return coordinates/coordinate values as copiesDocumentation
Maintenance