DOC: format of the initial values for the sample_smc function #7283

LongPham7 · 2024-04-27T18:56:05Z

Issue with current documentation:

The PyMC documentation for the sample_smc function (Sequential Monte Carlo, SMC) doesn't describe the expected format or shape of the start parameter, which specifies the initial values for SMC. As a result, users have to work out the correct format and shape on their own. Additionally, the existing unit test for the start parameter covers only a single chain; it doesn't consider the case of multiple chains.

I originally posted a question about the correct format/shape of the start parameter on the PyMC Discourse. @ricardoV94 then noticed that the unit test only covers the single-chain case and suggested that I report the issue on GitHub.

To be more concrete, let us consider the following code for Bayesian linear regression using SMC (this code is from my question posted on Discourse):

import pymc as pm
import numpy as np

def basic_model(observed_data):
    array_sizes = np.array([size for (size, _) in observed_data])
    array_costs = np.array([cost for (_, cost) in observed_data])
    coefficient_sigma = 5

    with pm.Model() as model:
        coefficient0 = pm.HalfNormal(
            "coefficient0", sigma=coefficient_sigma)
        coefficient1 = pm.HalfNormal(
            "coefficient1", sigma=coefficient_sigma)

        predicted_bounds = coefficient0 + coefficient1 * array_sizes
        observed_costs = pm.Normal("observed_costs", mu=predicted_bounds,
                                   sigma=10, observed=array_costs)
    return model

observed_data = [[1, 1], [2, 2], [4, 3], [8, 4], [16, 7], [32, 10], [64, 13], [128, 17], [256, 18]]

num_draws = 1000
num_chains = 4

init_smc = {"coefficient0_log__": np.full((num_draws, num_chains), 10),
            "coefficient1_log__": np.full((num_draws, num_chains), 10)}

with basic_model(observed_data):
    idata = pm.sample_smc(num_draws, start=init_smc,
                          chains=num_chains, random_seed=42)

Here, inside the dictionary init_smc for SMC's initial values, the latent variables (i.e., coefficient0_log__ and coefficient1_log__) are each mapped to a numpy array of shape (num_draws, num_chains). If I used a different shape, such as (num_chains, num_draws), the code would crash. The PyMC documentation doesn't clarify what the correct shape of the numpy array should be.
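To make the shape convention above concrete, here is a minimal sketch using only NumPy. The helpers `build_smc_start` and `check_smc_start` are hypothetical and not part of PyMC, and the `(num_draws, num_chains)` layout is only what worked in the example above; it is an observation, not documented behavior.

```python
import numpy as np


def build_smc_start(var_names, num_draws, num_chains, value):
    """Hypothetical helper: build one (num_draws, num_chains) array per variable."""
    # Assumption: the layout is (num_draws, num_chains), as observed in this issue.
    return {
        name: np.full((num_draws, num_chains), value, dtype=float)
        for name in var_names
    }


def check_smc_start(start, num_draws, num_chains):
    """Hypothetical helper: raise ValueError if any array has an unexpected shape."""
    expected = (num_draws, num_chains)
    for name, values in start.items():
        shape = np.shape(values)
        if shape != expected:
            raise ValueError(f"{name}: expected shape {expected}, got {shape}")


# Same variable names and sizes as in the example above.
init_smc = build_smc_start(
    ["coefficient0_log__", "coefficient1_log__"],
    num_draws=1000,
    num_chains=4,
    value=10.0,
)
check_smc_start(init_smc, num_draws=1000, num_chains=4)
```

A check like this, run before calling pm.sample_smc, would report a transposed array such as one of shape (num_chains, num_draws) with a readable message. It cannot detect a transposition when num_draws equals num_chains.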

Idea or request for content:

I would be grateful if someone could update the documentation of the sample_smc function's start parameter and also add a unit test that exercises the start parameter with multiple chains.


welcome · 2024-04-27T18:56:07Z

🎉 Welcome to PyMC! 🎉 We're really excited to have your input into the project! 💖

If you haven't done so already, please make sure you check out our Contributing Guidelines and Code of Conduct.

LongPham7 added the docs label Apr 27, 2024
