Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skewed Student-T distribution #7252

Merged
merged 12 commits into from
May 4, 2024
Merged

Conversation

fonnesbeck
Copy link
Member

@fonnesbeck fonnesbeck commented Apr 13, 2024

Description

Added skewed Student T distribution (Jones and Faddy implementation).

Checklist

Type of change

  • New feature / enhancement
  • Bug fix
  • Documentation
  • Maintenance
  • Other (please specify):

📚 Documentation preview 📚: https://pymc--7252.org.readthedocs.build/en/7252/

@ricardoV94
Copy link
Member

Add in pymc-experimental first?

@jessegrabowski
Copy link
Member

jessegrabowski commented Apr 14, 2024

If something like this goes in experimental, there should be a clear roadmap to go from there back to the main repo. This is a straight-forward distribution that (I guess?) can't be easily implemented as a generative graph using only existing distributions via CustomDist. If it's in pmx, what's the criteria to move it over? It's not something like statespace or marginalmodel that proposes a huge new functionality, or something like r2d2m2 that implements a distribution that doesn't fit neatly into a typical pymc model as we present them in books/tutorials.

@ricardoV94
Copy link
Member

I don't think this one has a straightforward generative graph

@ricardoV94
Copy link
Member

I assumed something "according to x and y" was a bit experimental, hence my suggestion

@fonnesbeck
Copy link
Member Author

fonnesbeck commented Apr 14, 2024

The "Following Jones and Faddy 2003" was to distinguish it from the range of available skewed Student T implementations (there are 3 or 4). I chose this one because it is available in SciPy and relatively straightforward to implement.

I think it will be a useful distribution because it allows one to specify a likelihood that is both skewed and overdispersed and converges to normal when a and b are both large and equal.

@fonnesbeck
Copy link
Member Author

Jax failure does not seem to be related to this PR

@ricardoV94
Copy link
Member

Jax failure does not seem to be related to this PR

It's not. Has been failing since the last scipy

@ricardoV94
Copy link
Member

But this one is: https://github.com/pymc-devs/pymc/actions/runs/8682226852/job/23806440256?pr=7252#step:7:616

You should try running those logp/logcdf/icdf with n_samples=-1 or whatever it is, to test all combinations locally instead of only a random subset

@ricardoV94
Copy link
Member

You're missing the tests for the RV itself. That's also where we cover alternative parametrizations. Something like these

class TestBeta(BaseTestDistributionRandom):
pymc_dist = pm.Beta
pymc_dist_params = {"alpha": 2.0, "beta": 5.0}
expected_rv_op_params = {"alpha": 2.0, "beta": 5.0}
reference_dist_params = {"a": 2.0, "b": 5.0}
size = 15
reference_dist = lambda self: ft.partial(clipped_beta_rvs, random_state=self.get_random_state()) # noqa E731
checks_to_run = [
"check_pymc_params_match_rv_op",
"check_pymc_draws_match_reference",
"check_rv_size",
]
class TestBetaMuSigma(BaseTestDistributionRandom):
pymc_dist = pm.Beta
pymc_dist_params = {"mu": 0.5, "sigma": 0.25}
expected_alpha, expected_beta = pm.Beta.get_alpha_beta(
mu=pymc_dist_params["mu"], sigma=pymc_dist_params["sigma"]
)
expected_rv_op_params = {"alpha": expected_alpha, "beta": expected_beta}
checks_to_run = ["check_pymc_params_match_rv_op"]

@codecov-commenter
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.35%. Comparing base (60a6314) to head (bcb1b5d).
Report is 4 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #7252      +/-   ##
==========================================
+ Coverage   91.67%   92.35%   +0.68%     
==========================================
  Files         102      102              
  Lines       17017    17069      +52     
==========================================
+ Hits        15600    15764     +164     
+ Misses       1417     1305     -112     
Files Coverage Δ
pymc/distributions/__init__.py 100.00% <ø> (ø)
pymc/distributions/continuous.py 97.93% <100.00%> (+0.09%) ⬆️

... and 4 files with indirect coverage changes


@classmethod
def rng_fn(cls, rng, a, b, mu, sigma, size=None) -> np.ndarray:
return np.asarray(stats.jf_skew_t.rvs(a=a, b=b, size=size, random_state=rng)) * sigma + mu
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't sigma and mu be passed as arguments to the rvs?

The way you wrote it has a similar bug that was addressed by this recent bugfix, when there are batched sigma and mu and size is None: #7288

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You marked as resolved but I still see the old code?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I had forgotten to push.

@ricardoV94 ricardoV94 merged commit 606d4ff into pymc-devs:main May 4, 2024
20 checks passed
@fonnesbeck fonnesbeck deleted the skewed_t branch May 4, 2024 22:31
@michaelosthege
Copy link
Member

Looks like test_skewstudentt_logp is failing on main, or is at least flaky?

@ricardoV94
Copy link
Member

ricardoV94 commented May 6, 2024

These tests run a random subset of 100 combinations. We should set n_samples=-1 locally and see if every combination passes (both float64 and float32), I guess it does not.

CC @fonnesbeck

@michaelosthege
Copy link
Member

Running float64 this gives:

E           AssertionError: 
E           Arrays are not almost equal to 6 decimals
E           {'a': array(0.01), 'b': array(100.), 'mu': array(-2.1), 'sigma': array(0.01), 'value': array(-0.01)}
E           x and y -inf location mismatch:
E            x: array(-751.339449)
E            y: array(-inf)

@ricardoV94
Copy link
Member

Which underflows? PyMC or Scipy?

@michaelosthege
Copy link
Member

Which underflows? PyMC or Scipy?

SciPy produces the -inf.

What's the fix? I have it open and can push a branch real quick (if it's simple)

@ricardoV94
Copy link
Member

Which underflows? PyMC or Scipy?

SciPy produces the -inf.

What's the fix? I have it open and can push a branch real quick (if it's simple)

Try slightly less extreme parameter domains

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants