Posterior probability does not integrate to 1 #3

Open
monga opened this issue Jan 6, 2022 · 2 comments

Comments

@monga

monga commented Jan 6, 2022

The last line of slide 61 (https://speakerdeck.com/rmcelreath/statistical-rethinking-2022-lecture-02?slide=61), like R code 3.2 (and R code 2.3) in the book, uses a standardization rule different from the one used for the prior probability.

As explained in the Overthinking box on page 35 of the book, prior is an array of ones, since the important property is that it integrates to one over p_grid. The sum of the values of prior is therefore much greater than 1 (20 in code 2.3, 1000 in code 3.2).

The standardization used for posterior instead guarantees that sum(posterior) == 1, so its integral over p_grid is much less than one (roughly 1/length(p_grid)).

This is not relevant for the shape of the posterior curve, but the asymmetry bothers me. I believe the right statement to use in 3.2 is

posterior <- (posterior / sum(posterior))*length(posterior)

Then sum(posterior) == sum(prior), and the integrals of both over p_grid should be approximately 1.
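For readers following along in Python, here is a minimal sketch comparing the two standardizations discussed above. It assumes the globe-tossing setup (6 water in 9 tosses) on a 20-point grid; the data values and the names `posterior_mass` and `posterior_scaled` are illustrative assumptions, not taken from the slides or the book's code.

```python
import numpy as np

# Assumed setup: 20-point grid, 6 "water" observations in 9 tosses.
n_grid = 20
p_grid = np.linspace(0.0, 1.0, n_grid)
prior = np.ones(n_grid)                       # array of ones, as in the book

# Binomial kernel; the binomial coefficient is dropped because it
# cancels in the standardization step.
likelihood = p_grid**6 * (1.0 - p_grid)**3
unstd_posterior = likelihood * prior

# Rule used in the book: the values sum to 1 (a probability mass).
posterior_mass = unstd_posterior / unstd_posterior.sum()

# Rule proposed in this issue: the values sum to len(p_grid), like prior.
posterior_scaled = posterior_mass * n_grid

# Riemann-sum approximation of the integral over p_grid.
step = p_grid[1] - p_grid[0]                  # equals 1 / (n_grid - 1)
print("sums:     ", prior.sum(), posterior_mass.sum(), posterior_scaled.sum())
print("integrals:", (prior * step).sum(),
      (posterior_mass * step).sum(),
      (posterior_scaled * step).sum())
```

With this simple Riemann sum, the prior and the rescaled posterior both integrate to n_grid / (n_grid - 1), which is close to 1 and approaches it as the grid gets finer, while the sum-to-one posterior integrates to 1 / (n_grid - 1).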

@rmcelreath
Owner

rmcelreath commented Jan 7, 2022

Imagine a rectangle with width 1 and height 1. The area is 1x1=1. That is the uniform density from p=0 to p=1.

When you do the grid approximation, you turn the continuous density into a discrete probability mass distribution. That is why you are finding the normalization step necessary to get it to sum to 1. But it is still true that Pr(p)=1 for all values of p for p ~ uniform(0,1).
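A small Python sketch of this point, under assumed data (6 successes in 9 trials) and with a hypothetical helper `grid_posterior`: whether the uniform prior is stored as a density (all ones) or as a discrete mass (all 1/n), the final normalization absorbs the difference and the posterior comes out the same.

```python
import numpy as np

n_grid = 1000
p_grid = np.linspace(0.0, 1.0, n_grid)

# Assumed data for illustration: 6 successes in 9 trials (kernel only).
likelihood = p_grid**6 * (1.0 - p_grid)**3

# The same uniform prior, written two ways.
prior_density = np.ones(n_grid)                    # height 1 on [0, 1]
prior_mass = prior_density / prior_density.sum()   # 1/n_grid per grid point


def grid_posterior(prior):
    """Hypothetical helper: multiply by the likelihood, then sum-normalize."""
    unstd = likelihood * prior
    return unstd / unstd.sum()


post_from_density = grid_posterior(prior_density)
post_from_mass = grid_posterior(prior_mass)

# The constant scale of the prior cancels in the normalization.
print(np.allclose(post_from_density, post_from_mass))
```

Any constant multiple of the prior gives the same result, which is why the array of ones is enough for the grid approximation.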

@monga
Author

monga commented Jan 7, 2022

That's fine, thank you. What I want to say, however, is that the code uses two different conventions for the prior and posterior arrays. For posterior arrays we can rely on the invariant sum(posterior) == 1, but this does not hold for prior arrays.
The invariant is even required by some sampling functions. In Python, for example, you could write:

numpy.random.choice(p_grid, size=int(1e4), replace=True, p=posterior)

but

numpy.random.choice(p_grid, size=int(1e4), replace=True, p=prior)

would raise an exception, since sum(prior) != 1. (The R sample function seems more tolerant.)
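A self-contained sketch of that behaviour, assuming a 20-point grid and a prior of ones as in R code 2.3 (the names `raised` and `samples` are illustrative): passing the unnormalized prior as `p` raises a `ValueError`, and dividing by its sum first avoids it.

```python
import numpy as np

n_grid = 20
p_grid = np.linspace(0.0, 1.0, n_grid)
prior = np.ones(n_grid)          # sums to n_grid, not to 1

# Passing weights that do not sum to 1 is rejected by NumPy.
try:
    np.random.choice(p_grid, size=10, replace=True, p=prior)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Normalizing the weights first makes the call valid.
rng = np.random.default_rng(0)   # seeded generator, for reproducibility
samples = rng.choice(p_grid, size=int(1e4), replace=True, p=prior / prior.sum())
print(samples[:5])
```

Normalizing at the call site keeps the prior array itself unchanged, so the rest of a grid-approximation script can keep using the array of ones.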
