Partially observed multivariate distributions not implemented in V4 #5260

ricardoV94 · 2021-12-13T10:00:51Z

I added an explicit NotImplementedError and future test in #5245 - 022cefa

Partial observed variables now rely on Aesara's local_subtensor_rv_lift, to separate the observed and unobserved portions of a RV into two separate RVs.

https://github.com/aesara-devs/aesara/blob/4e1077210721deb007b4f556f08702cec5a74dfb/aesara/tensor/random/opt.py#L259

This rewrite needs to be updated to work for Multivariate distributions, there are a couple of comments in its code suggesting how to start approaching such task. If it does not already exist, we should open an issue in Aesara as those code changes would go there.

Importantly, even after this rewrite, we won't have the same flexibility we had in V3, where we might be missing some portions of a the base multidimensional value. Let's call these "sub-values", such as in pm.MvNormal("x", np.ones(2), np.eye(2), observed=np.array([[1, 1], [1, np.nan]]), because the missing portion of the distribution does not correspond to a MvNormal distribution. In contrast, pm.MvNormal("x", np.ones(2), np.eye(2), observed=np.array([[1, np.nan], [1, np.nan]]) should work just fine, as we can split that into two MvNormals.

On the bright side, we would retain the ability to transform imputed values which was not the case with V3, and which would not be trivial if we were to somehow allow for "sub-value" imputation, because we don't know what are the constraints on those "sub-values".

One different issue, concerns partial observations for aeppl "derived" distributions such as in #5169. These won't work out of the box, because local_subtensor_rv_lift is meant to work only with pure RandomVariables. In theory we could make such rewrite (or a copycat one) work with these "derived distributions", as long as we can provide the value and parameters support/mapping information that the rewrite requires, as well as the ability to re-create a node on the spot. In #5169 I had to implement most of these features anyway, but kept them accessible only via the cls object. These can easily be made more accessible via class properties or dispatching.

CC @brandonwillard

The text was updated successfully, but these errors were encountered:

brandonwillard · 2021-12-13T21:26:15Z

See incsubtensor_rv_replace.

Here's a small example illustrating how AePPL's support for set_subtensor can be used to construct a log-probability function for a multivariate distributions with missing observations:

import aesara.tensor as at
import numpy as np
import scipy.stats
from aeppl.joint_logprob import joint_logprob


data = np.array([[-11, 0, np.nan], [np.nan, 1, 5]])
missing_idx = np.isnan(data)
size = data.shape[0]

mu = np.r_[-10, 0, 10]
sigma = np.eye(3)
Y_base_rv = at.random.multivariate_normal(mu, sigma, size=size, name="Y")

missing_vals_var = at.vector("y_missing")
Y_rv = at.set_subtensor(Y_base_rv[missing_idx], missing_vals_var)

Y_rv_logp = joint_logprob({Y_rv: at.as_tensor(data)}, sum=False)

missing_vals = np.r_[-2, 2]
Y_rv_logp.eval({missing_vals_var: missing_vals})
# array([-75.2568156, -87.7568156])

full_obs = data.copy()
full_obs[missing_idx] = missing_vals

np.fromiter(
    (scipy.stats.multivariate_normal(mu, sigma).logpdf(obs) for obs in full_obs),
    dtype=np.float64,
)
# array([-75.2568156, -87.7568156])

ricardoV94 · 2021-12-22T18:55:34Z

That seems perfect for sub-partial observed values.

On the other hand it is nice to have transforms working for imputed variables like we do now in V4. Perhaps we can do a forward transform pass on the observed data, so that all the input values can be on the transformed scale?

Not sure if this would work already on the aeppl side but it would be the best of both worlds.

The idea, if defensible, would be to allow:

with pm.Model() as m:
  x = pm.Dirichlet("x", a=np.ones(3), observed=[0.1, np.nan, np.nan])

While keeping the sampler happy by proposing the missing value(s) on the Simplex space

ricardoV94 · 2022-12-19T19:33:33Z

Here is a snippet for how to do it manually until we provide a proper API:

https://discourse.pymc.io/t/automatic-imputation-of-multivariate-models/11029/3?u=ricardov94

ricardoV94 added the v4 label Dec 13, 2021

ricardoV94 mentioned this issue Dec 13, 2021

Finalize release notes for 4.0.0-beta1 #5202

Merged

ricardoV94 added the request discussion label Dec 13, 2021

ricardoV94 added logprob pytensor labels Jan 27, 2022

ricardoV94 modified the milestones: v4.0.0b3, v4.0.0 Jan 30, 2022

ricardoV94 modified the milestones: v4.0.0, v4.1.0 Apr 1, 2022

michaelosthege modified the milestones: v4.1.0, v4.2.0 Jul 2, 2022

ricardoV94 modified the milestones: v4.2.0, v4.3.0 Sep 14, 2022

ricardoV94 modified the milestones: v4.3.0, v4.4.0 Oct 27, 2022

ricardoV94 removed this from the v4.4.0 milestone Nov 9, 2022

ricardoV94 mentioned this issue Nov 29, 2022

Extend lift rewrites to multivariate RVs pymc-devs/pytensor#49

Closed

ricardoV94 mentioned this issue Jun 27, 2023

Support automatic imputation for multivariate and symbolic distributions #6797

Merged

ricardoV94 closed this as completed in #6797 Jun 30, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Partially observed multivariate distributions not implemented in V4 #5260

Partially observed multivariate distributions not implemented in V4 #5260

ricardoV94 commented Dec 13, 2021 •

edited

Loading

brandonwillard commented Dec 13, 2021 •

edited

Loading

ricardoV94 commented Dec 22, 2021 •

edited

Loading

ricardoV94 commented Dec 19, 2022

Partially observed multivariate distributions not implemented in V4 #5260

Partially observed multivariate distributions not implemented in V4 #5260

Comments

ricardoV94 commented Dec 13, 2021 • edited Loading

brandonwillard commented Dec 13, 2021 • edited Loading

ricardoV94 commented Dec 22, 2021 • edited Loading

ricardoV94 commented Dec 19, 2022

ricardoV94 commented Dec 13, 2021 •

edited

Loading

brandonwillard commented Dec 13, 2021 •

edited

Loading

ricardoV94 commented Dec 22, 2021 •

edited

Loading