Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Option to create non-shared pm.Data #5295

Merged
merged 4 commits into from
Jan 4, 2022

Conversation

michaelosthege
Copy link
Member

@michaelosthege michaelosthege commented Dec 30, 2021

Until now pm.Data always created a SharedVariable, even though most uses of pm.Data are not because it's mutable, but because it registers the variable in the model.

With pm.Data(..., mutable=False/True), this PR adds the option to create non-shared data variables (TensorConstant) that possibly improve performance (via optimizations available for TensorConstant) and can also be used in places where a SharedVariable is not supported.

I also refactored pm.Data from a class into a function. I'm not 100 % sure why it was a class in the first place, but if I understand it correctly our use of __new__ actually violates its intended signature, since we're not returning an instance of the correct type(?).

Depending on what your PR does, here are a few things you might want to address in the description:

  • what are the (breaking) changes that this PR makes? The pm.Data.get_coord staticmethod was refactored to pm.data.determine_coords, but this is just internal API.
  • important background, or details about the implementation. See above.
  • are the changes—especially new features—covered by tests and docstrings? Yes.
  • linting/style checks have been run
  • consider adding/updating relevant example notebooks 👉 Do we want to add a deprecation warning when mutable is not specified? We could also make mutable=None by default and warn that the default will change in the future?
  • right before it's ready to merge, mention the PR in the RELEASE-NOTES.md

Closes #5105

By passing `pm.Data(mutable=False)` one can create a `TensorConstant` instead of a `SharedVariable`.
Data variables with known, fixed shape can enhance performance and compatibility in some situations.
`pm.ConstantData` or `pm.MutableData` wrappers are provided as alternative syntax.

This is the basis for solving pymc-devs#4441.
@michaelosthege
Copy link
Member Author

@ricardoV94 these changes could help for #4441 - I didn't test it, but with a (float) pm.ConstantData/TensorConstant having NaN values one could get at least the dims/coords/InferenceData.constant_data functionality also for imputation.

@codecov
Copy link

codecov bot commented Dec 30, 2021

Codecov Report

Merging #5295 (ea28e41) into main (69d3114) will increase coverage by 0.06%.
The diff coverage is 94.52%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #5295      +/-   ##
==========================================
+ Coverage   78.99%   79.06%   +0.06%     
==========================================
  Files          87       87              
  Lines       14366    14394      +28     
==========================================
+ Hits        11349    11380      +31     
+ Misses       3017     3014       -3     
Impacted Files Coverage Δ
pymc/backends/arviz.py 90.00% <ø> (+0.83%) ⬆️
pymc/model.py 83.55% <ø> (ø)
pymc/data.py 80.67% <93.84%> (+1.11%) ⬆️
pymc/model_graph.py 85.51% <100.00%> (+0.41%) ⬆️
pymc/sampling.py 86.25% <100.00%> (+0.01%) ⬆️
pymc/distributions/discrete.py 98.72% <0.00%> (ø)
pymc/distributions/continuous.py 97.38% <0.00%> (+0.01%) ⬆️
pymc/distributions/multivariate.py 73.76% <0.00%> (+0.22%) ⬆️
... and 1 more

@michaelosthege michaelosthege marked this pull request as ready for review December 30, 2021 17:05
@ricardoV94
Copy link
Member

ricardoV94 commented Dec 30, 2021

@ricardoV94 these changes could help for #4441 - I didn't test it, but with a (float) pm.ConstantData/TensorConstant having NaN values one could get at least the dims/coords/InferenceData.constant_data functionality also for imputation.

Yes it should be possible, just need to consider the TensorConstant case in there

@twiecki
Copy link
Member

twiecki commented Jan 3, 2022

In pm.set_data() can we check if it was unmutable and then give an informative error?

Also, should the default really be mutable=True?

@michaelosthege
Copy link
Member Author

In pm.set_data() can we check if it was unmutable and then give an informative error?

I'll make the error message more instructive 👍

Also, should the default really be mutable=True?

No it shouldn't, but we can't flip that switch at the same time as we're introducing it. That's why I'd first default to showing pm.ConstantData/pm.MutableDatain model graphs (see Slack).
Next step would be a FutureWarning.

Should I add the FutureWarning already? (When mutable is None..)

@twiecki
Copy link
Member

twiecki commented Jan 3, 2022 via email

Copy link
Member

@twiecki twiecki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we still need the check in pm.set_data() if it was immutable.

@michaelosthege
Copy link
Member Author

Then we still need the check in pm.set_data() if it was immutable.

It was there already, line 1132 in model.py

@twiecki twiecki self-requested a review January 4, 2022 12:40
@twiecki twiecki merged commit 600fe90 into pymc-devs:main Jan 4, 2022
@twiecki
Copy link
Member

twiecki commented Jan 4, 2022

👍

@michaelosthege michaelosthege deleted the data-refactor branch January 4, 2022 15:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tweak pm.Data to create or auto-replace by TensorConstant
3 participants