
BUG: LKJCorr breaks when used as covariance with MvNormal #7101

Open
velochy opened this issue Jan 14, 2024 · 10 comments · May be fixed by #7380
Labels: bug, enhancements, hackathon (Suitable for hackathon)

Comments

@velochy (Contributor) commented Jan 14, 2024

Describe the issue:

LKJCorr backward sampling is currently broken: it does not constrain values to a positive semidefinite matrix early enough. This leads to sampling randomly failing at the initial point check when the result is provided as a covariance matrix to MvNormal and other similar multivariate distributions. The failure is inherently random, but it becomes increasingly likely with larger n: it occurs almost always for n >= 10, and can in theory be seen with n as low as 3.

The error message is informative: it is easy to verify that the matrix reported in the error has negative eigenvalues, proving it is not PSD.

This only affects backward sampling (i.e. logp computations); .eval() and predictive sampling work as they should.
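The negative-eigenvalue check is easy to do with plain numpy. The matrix below is illustrative (not taken from the error output): every pairwise entry is a valid correlation in [-1, 1], yet the matrix as a whole is not PSD.

```python
import numpy as np

def is_psd(mat, tol=1e-10):
    # A symmetric matrix is PSD iff all of its eigenvalues are non-negative
    return bool(np.all(np.linalg.eigvalsh(mat) >= -tol))

# Entrywise-valid "correlations" that do not form a valid correlation matrix
bad = np.array([[ 1.0,  0.9, -0.9],
                [ 0.9,  1.0,  0.9],
                [-0.9,  0.9,  1.0]])

print(is_psd(bad))        # False
print(is_psd(np.eye(3)))  # True
```

This is exactly why the pairwise interval transform is insufficient: constraining each off-diagonal entry to (-1, 1) does not constrain the joint matrix to the PSD cone.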

Reproducible code example:

import pymc as pm
import pytensor.tensor as pt
import numpy as np

n, eta = 10, 1

# Turn LKJCorr output to a proper square corr matrix
def corr_to_mat(corr_values, n):
    corr = pt.zeros((n, n))
    corr = pt.set_subtensor(corr[np.triu_indices(n, k=1)], corr_values)
    return corr + corr.T + pt.identity_like(corr)

# Generate Data
corr = corr_to_mat(pm.LKJCorr.dist(n=n, eta=eta),n)
y = pm.draw(pm.MvNormal.dist(mu=0, cov=corr), 100)

# PyMC Model
with pm.Model() as m:
    corr_values = pm.LKJCorr('corr_values', n=n, eta=eta)
    corr = corr_to_mat(corr_values,n)
    y_hat = pm.MvNormal('y_hat', mu=0, cov=corr, observed=y)
    idata = pm.sample()

Error message:

Traceback (most recent call last):
  File "/home/velochy/pymc/test3.py", line 22, in <module>
    idata = pm.sample()
            ^^^^^^^^^^^
  File "/home/velochy/pymc/pymc/sampling/mcmc.py", line 744, in sample
    model.check_start_vals(ip)
  File "/home/velochy/pymc/pymc/model/core.py", line 1660, in check_start_vals
    raise SamplingError(
pymc.exceptions.SamplingError: Initial evaluation of model at starting point failed!
Starting values:
{'corr_values_interval__': array([ 0.96546568,  0.55627918, -0.52092415, -0.79551073, -0.04416735,
       -0.61208579, -0.04889871, -0.62181685,  0.13621748,  0.32908589,
       -0.38471896, -0.89088229, -0.1326561 , -0.18828583,  0.26747631,
       -0.47067211,  0.51480876, -0.56323033, -0.3513053 ,  0.30480191,
       -0.36865109,  0.1479414 , -0.66890036,  0.98484518, -0.85993115,
        0.48094929, -0.04547197, -0.07743477, -0.3381161 ,  0.12565052,
       -0.06450642, -0.69698417,  0.80246982, -0.67958155, -0.29175309,
       -0.02445876, -0.7858127 ,  0.17738997, -0.87779485, -0.27196011,
       -0.52231265,  0.46271978,  0.0810375 ,  0.84458683, -0.72285369])}

Logp initial evaluation results:
{'corr_values': -inf, 'y_hat': -inf}
You can call `model.debug()` for more details.

PyMC version information:

pymc 5.10.3

Context for the issue:

This arose in discussion with @ricardoV94 and @jessegrabowski in https://discourse.pymc.io/t/using-lkjcorr-together-with-mvnormal/13606/29. The thread has more useful context.

@ricardoV94 (Member) commented Jan 14, 2024

The initial point problem happens because of jittering and the insufficient IntervalTransform that is being used.

I assume model.point_logps() still comes out finite for every variable? Otherwise it would indicate we have a problem with the moment function that is used to provide initial valid values for sampling from a distribution.

@ricardoV94 (Member)

See also discussion in pymc-devs/pymc-experimental#13

@jessegrabowski (Member)

@velochy We just merged pymc-devs/pytensor#614, which now lets us use pt.linalg.norm inside of PyMC models. We can now fix this bug by defining a new transformation for LKJCorr: basically we just need to copy the tensorflow-probability bijector here, then set the newly defined transformation as the default transformation for LKJCorr. Is that something you'd be interested in helping with?
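For intuition, the forward map of that bijector (TFP's `CorrelationCholesky`) can be sketched in plain numpy; the function name below is illustrative, not PyMC or TFP API:

```python
import numpy as np

def unconstrained_to_corr_chol(x, n):
    # Fill the strictly lower triangle with the unconstrained values,
    # put 1 on the diagonal, then normalize each row to unit L2 norm.
    L = np.zeros((n, n))
    L[np.tril_indices(n, k=-1)] = x
    L[np.diag_indices(n)] = 1.0
    L /= np.linalg.norm(L, axis=1, keepdims=True)
    return L

x = np.array([0.5, -1.2, 0.8])    # n=3 -> 3 strictly-lower entries
L = unconstrained_to_corr_chol(x, 3)
C = L @ L.T                       # Gram matrix of unit-norm rows
print(np.diag(C))                 # all ones: a unit diagonal
print(np.linalg.eigvalsh(C))      # all non-negative: PSD by construction
```

Because C is a Gram matrix of unit-norm vectors, it is PSD with unit diagonal for any unconstrained input, which is exactly the property the current interval transform fails to guarantee.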

@velochy (Contributor, Author) commented May 3, 2024

Depends on how urgent it is :) I'm currently pretty swamped, but I should be able to get to it within a month or so.

@fonnesbeck added the hackathon label Jun 14, 2024
@johncant

Hi - I'm at PyData London and will take a look at this.

@johncant

For anyone in the pymc hack, I'm in the back middle of the room

@jessegrabowski (Member)

Hi @johncant , let me know if anything isn't clear about how to proceed

@johncant

@jessegrabowski thanks a lot! Nothing is clear and I am out of my depth. However, it's great learning and I still intend to fix this issue.

@twiecki and @fonnesbeck were there and helped me understand more about pymc than I would have learned from days of hacking on it myself.

Here's what we found:

  • Issue can be replicated with an even smaller example [1]
  • LKJCholeskyCov does not have this problem [2]
  • LKJCholeskyCov did appear to have this problem when I passed sd_dist=pm.Uniform.dist(1.8, 2.2). Chris indicated that this might cause convergence problems, and we changed it to pm.HalfNormal.dist(5, shape=10)
  • Return type section of LKJCorr is missing from the documentation
  • Typo in the code where Cholesky is spelled Choleksy
  • LKJ paper is paywalled

I think it's obvious to you that the problem is the default transformation for LKJCorr and it needs to be replaced with this. For me to do this, the next steps are:

  • Figure out how to draw a single sample directly from jupyter that replicates this problem
  • Figure out how to get the transform from an LKJCorr object
  • Understand the tensorflow bijector, why it should work, and why the current impl does not work
  • Read the LKJ paper
  • Port the tensorflow bijector (the easy part!)
  • Test
  • Update docs
  • Submit PR

[1] Replicates issue as long as n is high enough

import pymc as pm
import pytensor.tensor as pt

n, eta = 10, 3

# PyMC Model
with pm.Model() as m:
    corr_values = pm.LKJCorr('corr_values', n=n, eta=eta, return_matrix=True)
    idata = pm.sample()

[2] Replacement with LKJCholeskyCov - it samples without error (but with divergences)

import pymc as pm
import pytensor.tensor as pt
import numpy as np

n, eta = 10, 1

# Turn LKJCorr output to a proper square corr matrix
def corr_to_mat(corr_values, n):
    corr = pt.zeros((n, n))
    corr = pt.set_subtensor(corr[np.triu_indices(n, k=1)], corr_values)
    return corr + corr.T + pt.identity_like(corr)

# Generate Data
corr = corr_to_mat(pm.LKJCorr.dist(n=n, eta=eta),n)
y = pm.draw(pm.MvNormal.dist(mu=0, cov=corr), 100)

# PyMC Model
with pm.Model() as m:
    #sigma = pm.Uniform('sigma', 1.9, 2.1)
    #corr_values = pm.LKJCorr('corr_values', n=n, eta=eta)
    # LKJCholeskyCov (with the default compute_corr=True) returns a tuple
    # (chol, corr, stds); the Cholesky factor is the first element
    chol, corr_mat, stds = pm.LKJCholeskyCov('cov_values', n=n, eta=eta, sd_dist=pm.HalfNormal.dist(5, shape=10))
    #corr = corr_to_mat(corr_values,n)
    y_hat = pm.MvNormal('y_hat', mu=0, chol=chol, observed=y)
    idata = pm.sample()

@johncant commented Jun 18, 2024

Progress tracking comment

  • Figure out how to draw a single sample directly from jupyter that replicates this problem
  • Figure out how to get the transform from an LKJCorr object
  • Understand the tensorflow bijector, why it should work, and why the current impl does not work
  • Read the LKJ paper
  • Port the tensorflow bijector (the easy part!)
  • Test
  • Update docs
  • Submit PR

@johncant

I decided to post my progress from yesterday as a Jupyter notebook: https://colab.research.google.com/drive/1UNkgxEuy6j_Eww0aMFnsfzQfQmlxG2-d?usp=sharing . Basically, I now also understand why the transform on LKJCorr is the problem, and can start understanding and porting the TF transform.

@johncant linked a pull request Jun 21, 2024 that will close this issue