Option for a functional API? #84

Closed
adamboche opened this issue Feb 20, 2019 · 15 comments
Labels
discussion Open for input

Comments

@adamboche

Hello! I'm excited to see all the cool ideas going on in the new PyMC, and I'm looking forward to using it for real. I've been following the development a little, and had an idea I wanted to run by you. I'm still new to PyMC, so please correct me if I get anything wrong.

One of the distinctive features of PyMC is its usage of context managers for building models, like this:

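import pymc3 as pm  # assumption: the snippet follows the PyMC3 API

# J, sigma, and y (the observed data) are assumed to be defined elsewhere.
with pm.Model() as model:
    eta = pm.Normal("eta", 0, 1, shape=J)
    mu = pm.Normal("mu", 0, sd=1e6)
    tau = pm.HalfCauchy("tau", 5)
    theta = pm.Deterministic("theta", mu + tau * eta)
    obs = pm.Normal("obs", theta, sd=sigma, observed=y)
    trace_h = pm.sample(1000)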

plot_summary(model)

This kind of API is powerful in that it allows users to transparently access the sampling backend without extra work, and it makes common workflows really quick and easy. The decorator-based @pm.model API has similar advantages. The developer guide explains the power and flexibility that comes out of this design.

The design also has some side effects:

  • It relies on hidden global mutable state to manage the context, which can be hard for some users to understand. It's not always clear what must be done inside versus outside the context manager, or what state is attached to which objects.
  • It couples the model to the data -- there's no concept of a model in the absence of its observed data.
  • It requires passing the name of each variable to the variable's constructor. This could be avoided by hacking the AST, but that would be rather less robust, and the Python AST is documented as unstable: "The abstract syntax itself might change with each Python release".
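
To make the first point concrete, here is a minimal toy sketch of how a context-manager API can route variables to a model through a hidden stack. It is not PyMC's actual implementation; the `Model` and `Normal` classes below are hypothetical stand-ins, and the error message is made up for illustration.

```python
import threading


class Model:
    """Toy model that collects variables created inside its `with` block."""

    _contexts = threading.local()  # hidden per-thread stack of active models

    def __init__(self):
        self.variables = {}

    @classmethod
    def _stack(cls):
        if not hasattr(cls._contexts, "stack"):
            cls._contexts.stack = []
        return cls._contexts.stack

    def __enter__(self):
        self._stack().append(self)
        return self

    def __exit__(self, exc_type, exc, tb):
        self._stack().pop()
        return False  # do not swallow exceptions

    @classmethod
    def current(cls):
        stack = cls._stack()
        if not stack:
            raise TypeError("No model on context stack")
        return stack[-1]


class Normal:
    """Toy distribution that registers itself with whichever model is active."""

    def __init__(self, name, mu, sd):
        self.name, self.mu, self.sd = name, mu, sd
        # The model is never passed in; it is looked up from hidden state.
        Model.current().variables[name] = self


with Model() as model:
    mu = Normal("mu", 0.0, 1.0)

print(sorted(model.variables))

# The same constructor call fails outside the block, because the hidden
# stack is empty there.
try:
    Normal("orphan", 0.0, 1.0)
except TypeError as err:
    print("outside the context:", err)
```

Nothing in the signature of `Normal` hints that it depends on an enclosing `with` block, which is the part that can be hard to explain to newcomers.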

I've been wondering about some possible API designs. Some of them may have been discussed and rejected already; please forgive me if I'm being redundant.

One idea that might be familiar to Python developers is using a class per model, something like this:

@model
class MyModel:
    J = ConstantInteger()
    eta = Normal(0, 1, shape=J)
    mu = Normal(0, sd=1e6)
    tau = HalfCauchy(5)
    theta = Deterministic(mu + tau * eta)


# Any of these functions could be methods instead.
model = MyModel()
observed = observe(model, data)
trace = sample(observed)
plot_summary(trace)

I'm not 100% sure that it can do everything PyMC needs, but, from my (possibly naive) perspective, having an option like this might have some benefits:

  • All the necessary state can live on the model instance, rather than in a global context or on the distribution objects. Simple functions (or methods) connect the objects of the API, making it composable and easy to use in a library.
  • The model can exist independent of any observed data.
  • No AST hacking is necessary to give each distribution a name. The setup can be done in a class decorator, as in the popular attrs library, or in the attribute initialization through the descriptor protocol, each of which produces plain ol' python objects without hidden state.
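
As a rough sketch of the naming point, the descriptor protocol's `__set_name__` hook lets each attribute learn its own name when the class is created, and a class decorator can then collect the declared variables. The `Distribution`, `Normal`, `HalfCauchy`, and `model` names below are hypothetical placeholders, not an existing PyMC API.

```python
class Distribution:
    """Hypothetical base class that records the attribute name it is bound to."""

    def __init__(self, *params, **kwargs):
        self.params = params
        self.kwargs = kwargs
        self.name = None

    def __set_name__(self, owner, name):
        # Python calls this automatically when the owning class is created
        # (Python 3.6+), so no name string has to be passed by the user.
        self.name = name


class Normal(Distribution):
    pass


class HalfCauchy(Distribution):
    pass


def model(cls):
    """Hypothetical class decorator: gather the distributions declared on cls."""
    cls.variables = {
        name: value
        for name, value in vars(cls).items()
        if isinstance(value, Distribution)
    }
    return cls


@model
class MyModel:
    mu = Normal(0, sd=1e6)
    tau = HalfCauchy(5)


print(MyModel.mu.name)
print(list(MyModel.variables))
```

Both mechanisms are plain Python features, so the resulting class carries its variables as ordinary attributes with no global registry involved.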

There's a lot to explore in this design space. If this seems interesting to people, I'm happy to discuss or try out some implementation ideas, to see if something like this could be possible, and if it'd be nice. I'd love to hear your thoughts! 🙂

@twiecki
Member

twiecki commented Feb 20, 2019

I like it. One key question is whether the model class can be instantiated twice, as we currently do with the model function. We currently have to call it twice in different contexts: once to create the RVs and gather the tensors, and then once to create the logp tensor with inputs from step 1.
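
One way to picture "calling the model twice in different contexts" is the toy sketch below. It assumes a generator-style model function and uses plain floats instead of tensors; `Normal`, `run_forward`, and `run_logp` are hypothetical names, and this is not how PyMC4 is implemented.

```python
import math
import random


class Normal:
    """Toy scalar Normal distribution."""

    def __init__(self, name, mu, sd):
        self.name, self.mu, self.sd = name, mu, sd

    def sample(self, rng):
        return rng.gauss(self.mu, self.sd)

    def logp(self, x):
        z = (x - self.mu) / self.sd
        return -0.5 * z * z - math.log(self.sd) - 0.5 * math.log(2 * math.pi)


def model():
    # Each `yield` hands a distribution to whoever is driving the model
    # and receives a concrete value back.
    mu = yield Normal("mu", 0.0, 1.0)
    yield Normal("y", mu, 0.5)


def run_forward(model_fn, rng):
    """Pass 1: create the variables and collect a value for each one."""
    values = {}
    gen = model_fn()
    try:
        dist = next(gen)
        while True:
            values[dist.name] = dist.sample(rng)
            dist = gen.send(values[dist.name])
    except StopIteration:
        pass
    return values


def run_logp(model_fn, values):
    """Pass 2: re-run the model with the values from pass 1 and sum the logp terms."""
    total = 0.0
    gen = model_fn()
    try:
        dist = next(gen)
        while True:
            x = values[dist.name]
            total += dist.logp(x)
            dist = gen.send(x)
    except StopIteration:
        pass
    return total


values = run_forward(model, random.Random(0))
print(sorted(values))
print(run_logp(model, values))
```

The same model definition is executed twice, and only the driver changes. A class-based design would need an equivalent way to be evaluated under both drivers.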

@twiecki
Member

twiecki commented Mar 2, 2019

@adamboche Any more thoughts on this?

@adamboche
Author

@twiecki Sorry for the delay; I'm hoping for a moment to experiment with it this week.

@jt-lab

jt-lab commented Mar 12, 2019

I like the idea of separating model and data in this way. Would make applying the model to batches of datasets (e.g. to simulations for power estimations) much more convenient.
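
A minimal sketch of that batch workflow, assuming a hypothetical `observe` function that pairs an immutable, data-free model with one dataset at a time (none of these names exist in PyMC):

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class NormalModel:
    """Hypothetical data-free model: a Normal likelihood with fixed parameters."""

    mu: float
    sd: float

    def logp(self, x: float) -> float:
        z = (x - self.mu) / self.sd
        return -0.5 * z * z - math.log(self.sd) - 0.5 * math.log(2 * math.pi)


@dataclass(frozen=True)
class Observed:
    """A model paired with one dataset; the model itself is left untouched."""

    model: NormalModel
    data: Tuple[float, ...]

    def log_likelihood(self) -> float:
        return sum(self.model.logp(x) for x in self.data)


def observe(model: NormalModel, data) -> Observed:
    return Observed(model, tuple(data))


model = NormalModel(mu=0.0, sd=1.0)
datasets = [[0.0], [0.0, 1.0], [-1.0, 1.0, 2.0]]  # e.g. simulated batches

# One model definition, reused across every dataset.
batch = [observe(model, data) for data in datasets]
for obs in batch:
    print(len(obs.data), obs.log_likelihood())
```

Because `observe` returns a new object instead of mutating the model, the same definition can be reused across simulations without being rebuilt.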

@adamboche
Author

I started trying out some basic ideas but it'll require more thought and experimentation -- nothing I have is usable and I wouldn't recommend implementing their current incarnation. In case anyone wants to read along, my code is available.

Some things I like so far:

  • A model is defined declaratively on a class which becomes a plain python class immediately
  • The model produces instances which are very basic
  • Defining the model happens separately from combining it with data

On the other hand, there is still a bit more magic involved than I'd like, and there's an issue of how to represent variables that depend on other variables.

@twiecki
Member

twiecki commented Mar 17, 2019

That looks quite interesting. Is this functional or just pseudocode?

@twiecki
Member

twiecki commented Mar 18, 2019

Just looked a bit over the code base, looks really nice. Unless I missed it, the key missing piece is construction of a tensor-in-tensor-out logp function.

junpenglao added the "discussion" (Open for input) label on Mar 18, 2019
@twiecki
Member

twiecki commented Sep 5, 2019

Closing due to inactivity.

twiecki closed this as completed on Sep 5, 2019
@Padarn

Padarn commented Mar 31, 2023

Curious if this ever went anywhere? I see no linked issues, but I wouldn't be surprised if it was picked up elsewhere.

@twiecki
Member

twiecki commented Apr 6, 2023

@Padarn pymc4 is no more, check out pymc 5: https://github.com/pymc-devs/pymc

@Padarn

Padarn commented Apr 8, 2023

Thanks @twiecki. I had seen that but couldn't find discussion on this topic there. Would you suggest opening an issue to discuss in the pymc repo?

@twiecki
Member

twiecki commented Apr 9, 2023

I'd be curious what use-case you're after.

@Padarn

Padarn commented Apr 10, 2023

Sure. Nothing very specific, I just thought the proposal here was quite nice and made the API easier to understand. Teaching people about the context manager API has been a hurdle in getting people to work on pymc code in my team.

@twiecki
Member

twiecki commented Apr 12, 2023

Curious, it hasn't been a problem in my experience, but I also don't go into detail of what it does. In any case, we probably won't provide an alternative API in PyMC.

@Padarn

Padarn commented Apr 12, 2023

Got it, totally reasonable. Thanks for your responses.
