Edge case where the confidence interval converges to theta_null #5

Open
mthulin opened this issue Apr 1, 2022 · 0 comments
Comments

@mthulin
Owner

mthulin commented Apr 1, 2022

Issue submitted via e-mail:

I may have found an edge case for which p-values inconsistent with the confidence interval are returned. Here is a minimal reproducible example along with a guess as to what the cause is.

```r
library(boot)       # boot(), boot.ci()
library(boot.pval)  # boot.pval()

test_df <- data.frame(x = c(rep(-1, 3),
                            rep(0, 14),
                            rep(1, 3)))
statistic <- function(data, indices) mean(data$x[indices])

set.seed(24601)
boot_res <- boot(test_df, statistic, R = 1000)

table(boot_res$t)
```

```
  -0.4 -0.35  -0.3 -0.25  -0.2 -0.15  -0.1 -0.05     0  0.05   0.1  0.15   0.2
     1     5     5    19    46    66   109   166   169   155    96    78    60
  0.25   0.3  0.35   0.5
    16     6     2     1
```

```r
boot.ci(boot_res, type = "perc")
```

```
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1000 bootstrap replicates

CALL :
boot.ci(boot.out = boot_res, type = "perc")

Intervals :
Level     Percentile
95%   (-0.2500,  0.2487 )
Calculations and Intervals on Original Scale
```

```r
boot.pval(boot_res, type = "perc")
```

```
[1] 0.001
```

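The 95% percentile interval (-0.2500, 0.2487) contains zero, but the p-value comes back as 0.001, i.e. highly significant. Since 0 lies inside the 95% interval, a p-value for theta_null = 0 that is consistent with it should be at least 0.05.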
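The inconsistency can be checked without re-running the bootstrap. Below is a minimal base-R sketch, assuming the 1000 replicates can be rebuilt from the counts printed by `table(boot_res$t)` and using `quantile()` as a stand-in for the percentile endpoints of `boot.ci()` (which interpolates between order statistics differently, so the endpoints are only approximate). The names `t_star`, `alpha_grid` and `covers_zero` are illustrative. It shows that the bootstrap median is exactly 0, so no percentile interval in the grid excludes `theta_null = 0`.

```r
# Rebuild the bootstrap replicates from the printed table (assumption: the
# printed labels are the exact replicate values).
values <- c(-0.4, -0.35, -0.3, -0.25, -0.2, -0.15, -0.1, -0.05, 0,
            0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.5)
counts <- c(1, 5, 5, 19, 46, 66, 109, 166, 169,
            155, 96, 78, 60, 16, 6, 2, 1)
t_star <- rep(values, times = counts)
stopifnot(length(t_star) == 1000)

median(t_star)  # 0: the bootstrap median equals theta_null

# Approximate two-sided percentile intervals for alpha = 0.001, ..., 0.999
# using quantile() (default type 7) instead of boot.ci().
alpha_grid <- seq(0.001, 0.999, by = 0.001)
lower <- quantile(t_star, alpha_grid / 2, names = FALSE)
upper <- quantile(t_star, 1 - alpha_grid / 2, names = FALSE)

# TRUE wherever the interval contains 0
covers_zero <- lower <= 0 & upper >= 0
all(covers_zero)  # TRUE: every interval, even the narrowest, contains 0
```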

```r
# Reproduce the p-value search inside boot.pval() step by step
pval_precision <- NULL
type <- "perc"
theta_null <- 0

if (is.null(pval_precision)) {
  pval_precision <- 1 / boot_res$R
}
alpha_seq <- seq(1e-16, 1 - 1e-16, pval_precision)
ci <- boot::boot.ci(boot_res,
                    conf = 1 - alpha_seq, type = type)
#> Warning in norm.inter(t, alpha): extreme order statistics used as endpoints

# Lower and upper interval bounds for every confidence level
bounds <-
  switch(
    type,
    norm  = ci$normal[, 2:3],
    basic = ci$basic[, 4:5],
    stud  = ci$student[, 4:5],
    perc  = ci$percent[, 4:5],
    bca   = ci$bca[, 4:5]
  )

# Intended: the first alpha whose interval excludes theta_null
alpha <- alpha_seq[which.min(theta_null >= bounds[, 1] &
                               theta_null <= bounds[, 2])]
```
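The problem is `which.min(theta_null >= bounds[, 1] & theta_null <= bounds[, 2])`. I believe the intention is to find the first `FALSE` value, i.e. the first alpha whose interval excludes `theta_null`. In this case, however, there are no `FALSE` values: the bootstrap median is exactly 0 (417 replicates lie below 0 and 169 at 0), so every interval in the sequence, down to the narrowest, contains `theta_null`. `which.min()` then returns the index of the first non-NA value, so `alpha` ends up as one of the smallest values in `alpha_seq`, falsely communicating high significance.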
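A minimal base-R sketch of the `which.min()` behaviour described above, followed by one possible guard. The helper `first_excluding_alpha()` is hypothetical and not part of boot.pval, and returning 1 when `theta_null` is never excluded is an assumption about the desired behaviour, not the package's actual fix.

```r
# which.min() returns the index of the FIRST minimum and discards NAs.
# For a logical vector FALSE (0) < TRUE (1), so it finds the first FALSE...
which.min(c(TRUE, TRUE, FALSE, FALSE))  # 3: first FALSE
# ...but when there is no FALSE, the first TRUE is the minimum:
which.min(c(TRUE, TRUE, TRUE, TRUE))    # 1
# NAs are skipped, so the first non-NA element is picked:
which.min(c(NA, TRUE, TRUE))            # 2

# Hypothetical guard: look for exclusions explicitly and fall back to
# alpha = 1 (assumption) when theta_null lies inside every interval.
first_excluding_alpha <- function(alpha_seq, covered) {
  excluded <- which(!covered)  # which() treats NA as FALSE, so NAs are dropped
  if (length(excluded) == 0) {
    return(1)
  }
  alpha_seq[excluded[1]]
}

alpha_seq <- seq(0.001, 0.999, by = 0.001)
first_excluding_alpha(alpha_seq, rep(TRUE, length(alpha_seq)))         # 1
first_excluding_alpha(alpha_seq, c(rep(TRUE, 49), rep(FALSE, 950)))  # 0.05
```

Using `which(!covered)` makes the "no exclusion" case explicit instead of letting `which.min()` silently fall back to the first element.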
