Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement several RandomVariables as SymbolicRandomVariables #7239

Merged
merged 3 commits into from
Apr 14, 2024

Conversation

ricardoV94
Copy link
Member

@ricardoV94 ricardoV94 commented Apr 5, 2024

Description

Implement several RandomVariables as SymbolicRandomVariables

This allows sampling from multiple backends without having to dispatch for each one

Also:

  • Allow signature to handle rng and size arguments explicitly.
  • Move rv_op method to the SymbolicRandomVariable class and get rid of dummy inputs logic (it was needed in previous versions of PyTensor)
  • Allow dispatch methods without filtering of inputs for SymbolicRandomVariable distributions

Related Issue

  • Closes #
  • Related to #

Checklist

Type of change

  • New feature / enhancement
  • Bug fix
  • Documentation
  • Maintenance
  • Other (please specify):

📚 Documentation preview 📚: https://pymc--7239.org.readthedocs.build/en/7239/

@ricardoV94 ricardoV94 force-pushed the simplify_dirichlet_multinomial branch 2 times, most recently from bdc958d to f81b494 Compare April 9, 2024 18:17
@ricardoV94 ricardoV94 changed the title Implement DirichletMultinomial as OpFromGraph Implement several RandomVariables as SymbolicRandomVariables Apr 9, 2024
@ricardoV94 ricardoV94 marked this pull request as ready for review April 9, 2024 18:19
@ricardoV94 ricardoV94 force-pushed the simplify_dirichlet_multinomial branch 6 times, most recently from 800b896 to 43f7a91 Compare April 9, 2024 19:04
Copy link
Member

@jessegrabowski jessegrabowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this is mostly shuffling around and simplifying stuff that already exists, so if tests pass it looks great. I'm a bit unclear on how the signatures for RVs work, was that changed in this PR or another one? The square brackets in particular throw me off. There's also some opportunities for refactoring duplicated code.

pymc/distributions/continuous.py Outdated Show resolved Hide resolved
pymc/distributions/distribution.py Show resolved Hide resolved
pymc/distributions/distribution.py Show resolved Hide resolved
pymc/distributions/timeseries.py Show resolved Hide resolved
pymc/distributions/truncated.py Show resolved Hide resolved
pymc/distributions/distribution.py Outdated Show resolved Hide resolved
pymc/distributions/timeseries.py Show resolved Hide resolved
pymc/distributions/truncated.py Show resolved Hide resolved
@ricardoV94
Copy link
Member Author

ricardoV94 commented Apr 10, 2024

Seems like this is mostly shuffling around and simplifying stuff that already exists, so if tests pass it looks great.

There are two commits in this PR. The first one shuffles and refactors things so that the second commit (converting existing RandomVariables to SymbolicRandomVariables) is done mostly without hassle.

I'm a bit unclear on how the signatures for RVs work, was that changed in this PR or another one? The square brackets in particular throw me off.

I think this is the kind of signature we should use in PyTensor, so I was testing it here. The numpy vectorize signature doesn't really handle stuff like rng, size, axis, so I think we should take the initative here. In a previous PR I pretend those were scalars, so stuff looked like (),(),(),()->(),(), which is both fake and I think less readable than [rng],[size],(),()->[rng],(). Wdyt? This would ultimately replace the ndim_supp and ndims_params which is very strict and only works because we assume all RVs have the same non-numerical signature. This is not useful for SymbolicRandomVariables that can have multiple (or zero) RNGs and only optionally size.

There's also some opportunities for refactoring duplicated code.

I'm weary of some helpers, but I can optimize for a bit less DRY

@ricardoV94 ricardoV94 force-pushed the simplify_dirichlet_multinomial branch from 43f7a91 to 9be037c Compare April 10, 2024 10:42
@ricardoV94
Copy link
Member Author

@jessegrabowski I simplified the logic needed to build standard SymbolicRandomVariables, what do you think?

@ricardoV94 ricardoV94 force-pushed the simplify_dirichlet_multinomial branch from 9be037c to 2e6c2c4 Compare April 10, 2024 11:07
Comment on lines 278 to 289
class _class_or_instancemethod(classmethod):
"""Allow a method to be called both as a classmethod and an instancemethod,
giving priority to the instance method.

This is used to allow extracting information from the signature of a SymbolicRandomVariable
which may be provided either as a class attribute or as an instance attribute.

Adapted from https://stackoverflow.com/a/28238047
"""

def __get__(self, instance, type_):
descr_get = super().__get__ if instance is None else self.__func__.__get__
return descr_get(instance, type_)


Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some bit of blackmagic to allow dynamic properties from the signature, to work both when signatures are class attributes or only instance attributes. Let me know if you see a better/ less magical solution.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it needed though?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sometimes we have the signature fixed in the class (DM), sometimes only known at runtime (Mixture). This allows us to parse ndims_params, ndim_supp, default_outputs from the signature regardless of when it's defined.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to have a class_signature (always defined) and a instance_signature (default None), and drop back to the class_signature if instance_signature is None?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A class signature cannot always be defined. The point is I want to extract ndim_supp from either a class signature (without having to initiate said class) or, if it cannot be known in advance, from an instance signature. I need a static class method or instance method that can read either the class or instance signature, this achieves exactly that.

Copy link
Member Author

@ricardoV94 ricardoV94 Apr 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I really need this? Not so much. Alternatives are:

  1. Hard code ndim_supp/ndims_params/default_output when they can be known in advance. It's just redundant/error prone because all info exists in the signature.
  2. Don't provide any of these until the Op is defined. Drawback? I can't use these lines as straightfoward:

We have to create dummy Ops with empty size just just to find the ndim_supp:

# SymbolicRVs don't have `ndim_supp` until they are created
ndim_supp = getattr(cls.rv_op, "ndim_supp", None)
if ndim_supp is None:
ndim_supp = cls.rv_op(*dist_params, **kwargs).owner.op.ndim_supp

I can't write classmethods of the new SymbolicRandomVariables like this:

        if rv_size_is_none(size):
            size = implicit_size_from_params(b, kappa, mu, ndims_params=cls.ndims_params)

Because cls.ndims_params doesn't exist until we create the Op below.

I could obviously parse ndims_params/ndim_supp on the spot from the signature. It's just a small convenience. But OTOH it's also just 5 lines of Python

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jessegrabowski My last commit reverts the magic solution for the cleanest alternative I can think of. Which one do you prefer?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the non-magic commit, for reference it's here: 20e6e3b

@ricardoV94 ricardoV94 force-pushed the simplify_dirichlet_multinomial branch from 2e6c2c4 to e2f1185 Compare April 10, 2024 11:35
This is not only not used anywhere, but could also try to initiate an RV with incompatible shapes and size
@ricardoV94 ricardoV94 force-pushed the simplify_dirichlet_multinomial branch 4 times, most recently from 6de9efa to e23a785 Compare April 10, 2024 13:41
@ricardoV94 ricardoV94 force-pushed the simplify_dirichlet_multinomial branch 2 times, most recently from f822af4 to 0487e92 Compare April 10, 2024 14:42
@ricardoV94 ricardoV94 force-pushed the simplify_dirichlet_multinomial branch 3 times, most recently from a550758 to 20e6e3b Compare April 11, 2024 08:16
* Allow signature to handle rng and size arguments explicitly.
* Parse ndim_supp and ndims_params from class signature
* Move rv_op method to the SymbolicRandomVariable class and get rid of dummy inputs logic (it was needed in previous versions of PyTensor)
* Fix errors in automatic signature of CustomDist
* Allow dispatch methods without filtering of inputs for SymbolicRandomVariable distributions
This allows sampling from multiple backends without having to dispatch for each one
@ricardoV94 ricardoV94 force-pushed the simplify_dirichlet_multinomial branch from 20e6e3b to 597e195 Compare April 12, 2024 07:25
@ricardoV94 ricardoV94 merged commit f3c3086 into pymc-devs:main Apr 14, 2024
18 of 21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants