Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Derivation of the logcdf function #9

Open
TomScheffers opened this issue Feb 2, 2022 · 1 comment
Open

Derivation of the logcdf function #9

TomScheffers opened this issue Feb 2, 2022 · 1 comment

Comments

@TomScheffers
Copy link

@thequackdaddy Do you have any sources which you can share to elaborate on the logcdf_1to2 implementation? I am trying to validate the correctness of this code.

def logcdf_1to2(x, mu, phi, p):
    # I couldn't find a paper on this, so gonna be a little hacky until I
    # have a better idea. The strategy is to create a (n, 1) matrix where
    # n is the number of observations and the first column represents where
    # there are 0 occurences. We'll add an additional column for 1 occurence,
    # and test for whether the difference between the added's column value
    # and the max value is greater than 37. If not, add another column
    # until that's the case. Then, sum the columns to give a vector of length
    # n which *should* be the CDF. (I think).

    # For very high rates, this funciton might not run well as it will
    # create lots of (potentially meaningless) columns.
    rate = est_kappa(mu, p) / phi
    scale = est_gamma(phi, p, mu)
    shape = -est_alpha(p)
    W = -rate.reshape(-1, 1)

    i = 0
    while True:
        i += 1
        trial = i * np.log(rate) - rate - gammaln(i + 1)
        # trial += gamma(a=i * shape, scale=scale).logcdf(x)
        trial += np.log(gammainc(i * shape, x / scale))
        W = np.hstack((W, trial.reshape(-1, 1)))

        if (np.all(W[:, :-1].max(axis=1) - W[:, -1] > 37) &
                np.all(W[:, -2] > W[:, -1])):
            break
    logcdf = np.log(np.exp(W).sum(axis=1))
    return logcdf
@thequackdaddy
Copy link
Owner

Haha not really. That said, I believe I was able, to validate the results against R’s package. So it’s roughly accurate. At low rates, it should be fine. Only in high rates could I see a problem.

This is pretty basic as I remember it. Essentially, it’s calculating log( CDF at zero events + CDF at one event + … + CDF at n events). That’s a sum of series with an infinite number of terms.

You only need to add the terms which are computationally meaningful. The rest are zero, so you can ignore them.

I highly suspect the way it’s done follows the strategy in this paper (though this is pdf, not CDF)

https://eprints.usq.edu.au/2335/1/Dunn_Smyth_Stats_and_Comp_v15n4.pdf

I never really put in the time to engineer a good solution for CDF.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants