
sample_posterior_predictive flattens chains and draws #4004

Closed
kyleabeauchamp opened this issue Jul 6, 2020 · 13 comments · Fixed by #4006

@kyleabeauchamp (Contributor)

I noticed that the results of pm.sample_posterior_predictive are flattened over (chain, draw) dimensions. AFAIK, this is a lossy transform because the results are numpy objects that don't track the source. I wonder if there should be an option to preserve the (chain, draw) shape and output the results using an arviz InferenceData object. See also arviz-devs/arviz#1282

@rpgoldman (Contributor)

I believe that there are already options to do this. Have you tried using the keep_size argument to sample_posterior_predictive? Also, ArviZ has a function that translates a PyMC3 set of predictive samples into an InferenceData. Do either of those do what you want?

@kyleabeauchamp (Contributor, Author)

In my test case, I had 2 chains and 500 draws per chain. With keep_size=False, the PPC output had 1000 samples; with keep_size=True, its shape was (1000, 1).

I'll look at from_pymc3_predictions(), as that looks like exactly what I needed!
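For reference, a minimal sketch of that kind of conversion, assuming ArviZ's from_pymc3 converter and a plain MultiTrace (not a fix for this issue, just one way to recover the (chain, draw) layout):

import numpy as np
import pymc3 as pm
import arviz as az

y = np.random.normal(size=25) + 5.0

with pm.Model() as model:
    mu = pm.Normal("mu", 0.0, sd=50.0)
    pm.Normal("obs", mu, observed=y)
    trace = pm.sample(500, chains=2)              # plain MultiTrace
    ppc = pm.sample_posterior_predictive(trace)   # flat dict: obs has shape (1000, 25)

# ArviZ pairs the flat PPC dict with the trace and restores (chain, draw, ...)
idata = az.from_pymc3(trace=trace, posterior_predictive=ppc, model=model)
print(idata.posterior_predictive["obs"].shape)    # (2, 500, 25)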

@kyleabeauchamp (Contributor, Author)

Here is my minimal reproducing example:

import numpy as np
import pymc3 as pm

chains = 2
n_samples = 1000
n_things = 25

y = np.random.normal(size=n_things) + 5.0

coords = {"things": range(n_things)}

with pm.Model(coords=coords) as model:
    y = pm.Data('y', y, dims="things")
    mu = pm.Normal('mu', 0, sd=50.0)
    obs = pm.Normal('obs', mu, observed=y, dims="things")
    tr = pm.sample(n_samples, chains=chains, return_inferencedata=True)

ppc_true = pm.sample_posterior_predictive(tr.posterior, keep_size=True, model=model)
ppc_false = pm.sample_posterior_predictive(tr.posterior, keep_size=False, model=model)

print(tr.posterior.coords)
print(ppc_true["obs"].shape)
print(ppc_false["obs"].shape)


Coordinates:
  * chain    (chain) int64 0 1
  * draw     (draw) int64 0 1 2 3 4 5 6 7 8 ... 992 993 994 995 996 997 998 999
(1, 2000, 25)
(2000, 25)

@OriolAbril (Member)

Here are the relevant lines in sample_posterior_predictive; I'm not sure if the same issue happens in fast_sample_posterior_predictive.

https://github.com/pymc-devs/pymc3/blob/master/pymc3/sampling.py#L1598-L1605

The chain information is only retrieved when a trace is passed, not in the dataset case, so keep_size does not work if the input is a dataset. In the dataset case, doing nchain = trace.dims["chain"] should work.
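Roughly this, for example (chains_and_draws is a hypothetical helper to illustrate the idea, not the code at the linked lines):

import xarray as xr
from pymc3.backends.base import MultiTrace

def chains_and_draws(trace):
    """Hypothetical helper: recover (nchain, ndraw) for either input type."""
    if isinstance(trace, xr.Dataset):
        # the posterior Dataset keeps both sizes in its dims mapping
        return trace.dims["chain"], trace.dims["draw"]
    if isinstance(trace, MultiTrace):
        # len(MultiTrace) is the number of draws per chain
        return trace.nchains, len(trace)
    raise TypeError("expected an xarray Dataset or a MultiTrace")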

@rpgoldman (Contributor)

@kyleabeauchamp Thanks for checking this: it looks like keep_size does not work in sample_posterior_predictive. Would you mind checking to see if it is also broken in fast_sample_posterior_predictive?

I can add these to the tests, so that we can fix this and make sure that keep_size continues to work correctly.
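Something along these lines, perhaps (a sketch of a regression test, not the one that will actually land):

import numpy as np
import pymc3 as pm

def test_keep_size_preserves_chain_and_draw_dims():
    # regression sketch: keep_size=True should return (chain, draw, *shape)
    y = np.random.normal(size=25) + 5.0
    with pm.Model():
        mu = pm.Normal("mu", 0.0, sd=50.0)
        pm.Normal("obs", mu, observed=y)
        trace = pm.sample(500, chains=2)
        ppc = pm.sample_posterior_predictive(trace, keep_size=True)
    assert ppc["obs"].shape == (2, 500, 25)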

@kyleabeauchamp (Contributor, Author)

For the case of an arviz trace, fast_sample_posterior_predictive doesn't run at all due to some strong type checking:

lib/python3.7/site-packages/pymc3/distributions/posterior_predictive.py in fast_sample_posterior_predictive(trace, samples, model, var_names, keep_size, random_seed)
    182         if keep_size and not isinstance(trace, MultiTrace):
    183             # arguably this should be just a warning.
--> 184             raise IncorrectArgumentsError("keep_size argument only applies when sampling from MultiTrace.")
    185 
    186         if isinstance(trace, list) and all((isinstance(x, dict) for x in trace)):

IncorrectArgumentsError: keep_size argument only applies when sampling from MultiTrace.

For the case of a pymc3 trace, fast_sample_posterior_predictive does seem to respect the (chain, draw) shape:

import numpy as np
import pymc3 as pm

chains = 2
n_samples = 1000
n_things = 25

y = np.random.normal(size=n_things) + 5.0

coords = {"things": range(n_things)}

with pm.Model(coords=coords) as model:
    y = pm.Data('y', y, dims="things")
    mu = pm.Normal('mu', 0, sd=50.0)
    obs = pm.Normal('obs', mu, observed=y, dims="things")
    tr = pm.sample(n_samples, chains=chains)

ppc_true = pm.fast_sample_posterior_predictive(tr, keep_size=True, model=model)
ppc_false = pm.fast_sample_posterior_predictive(tr, keep_size=False, model=model)

print(ppc_true["obs"].shape)
print(ppc_false["obs"].shape)

(2, 1000, 25)
(2000, 25)


@OriolAbril (Member)

sample_posterior_predictive also respects keep_size when used with a pm.MultiTrace; I remember adding tests for this to both PyMC3 and ArviZ.

@rpgoldman (Contributor) commented Jul 7, 2020

Probably we didn't get the handling of shape right when we added InferenceData input to fast_sample_posterior_predictive. I'll see about fixing this.

I would imagine that if we take an InferenceData as input, we should probably return one as output.

@rpgoldman (Contributor)

@michaelosthege while I was debugging this, I was surprised to see that the posterior predictive sampling functions now accept an xarray dataset, but not an InferenceData object. Shouldn't we also permit an InferenceData as input, and then extract the posterior group ourselves in the default case?
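The extraction itself would be small; a sketch of the idea (_as_posterior is a hypothetical helper, not the shipped code):

import arviz as az

def _as_posterior(trace):
    """Hypothetical helper: let callers pass an InferenceData directly."""
    if isinstance(trace, az.InferenceData):
        # default to the posterior group; other input types pass through unchanged
        return trace.posterior
    return trace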

@kyleabeauchamp (Contributor, Author)

I also experienced this confusion: in the PyMC3 code path you pass the output of tr = pm.sample(), but in the ArviZ code path you must pass the tr.posterior attribute of the object...

@rpgoldman (Contributor)

@kyleabeauchamp I'm going to address this in my fix for this issue; I agree with you -- it seems odd to require that extra step if using InferenceData.

rpgoldman self-assigned this Jul 9, 2020
@rpgoldman (Contributor)

Just debugging a solution to this now...

@rpgoldman (Contributor)

I have a WIP solution -- waiting to see if it passes CI.

rpgoldman linked a pull request Jul 9, 2020 that will close this issue