Derivation of the logcdf function #9

TomScheffers · 2022-02-02T10:40:04Z

@thequackdaddy Do you have any sources which you can share to elaborate on the logcdf_1to2 implementation? I am trying to validate the correctness of this code.

def logcdf_1to2(x, mu, phi, p):
    # I couldn't find a paper on this, so gonna be a little hacky until I
    # have a better idea. The strategy is to create a (n, 1) matrix where
    # n is the number of observations and the first column represents where
    # there are 0 occurences. We'll add an additional column for 1 occurence,
    # and test for whether the difference between the added's column value
    # and the max value is greater than 37. If not, add another column
    # until that's the case. Then, sum the columns to give a vector of length
    # n which *should* be the CDF. (I think).

    # For very high rates, this funciton might not run well as it will
    # create lots of (potentially meaningless) columns.
    rate = est_kappa(mu, p) / phi
    scale = est_gamma(phi, p, mu)
    shape = -est_alpha(p)
    W = -rate.reshape(-1, 1)

    i = 0
    while True:
        i += 1
        trial = i * np.log(rate) - rate - gammaln(i + 1)
        # trial += gamma(a=i * shape, scale=scale).logcdf(x)
        trial += np.log(gammainc(i * shape, x / scale))
        W = np.hstack((W, trial.reshape(-1, 1)))

        if (np.all(W[:, :-1].max(axis=1) - W[:, -1] > 37) &
                np.all(W[:, -2] > W[:, -1])):
            break
    logcdf = np.log(np.exp(W).sum(axis=1))
    return logcdf

thequackdaddy · 2022-02-02T14:14:59Z

Haha not really. That said, I believe I was able, to validate the results against R’s package. So it’s roughly accurate. At low rates, it should be fine. Only in high rates could I see a problem.

This is pretty basic as I remember it. Essentially, it’s calculating log( CDF at zero events + CDF at one event + … + CDF at n events). That’s a sum of series with an infinite number of terms.

You only need to add the terms which are computationally meaningful. The rest are zero, so you can ignore them.

I highly suspect the way it’s done follows the strategy in this paper (though this is pdf, not CDF)

https://eprints.usq.edu.au/2335/1/Dunn_Smyth_Stats_and_Comp_v15n4.pdf

I never really put in the time to engineer a good solution for CDF.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Derivation of the logcdf function #9

Derivation of the logcdf function #9

TomScheffers commented Feb 2, 2022

thequackdaddy commented Feb 2, 2022

Derivation of the logcdf function #9

Derivation of the logcdf function #9

Comments

TomScheffers commented Feb 2, 2022

thequackdaddy commented Feb 2, 2022