Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Weights are modifying the observation SD not variance #25

Closed
seananderson opened this issue Oct 17, 2023 · 6 comments
Closed

Weights are modifying the observation SD not variance #25

seananderson opened this issue Oct 17, 2023 · 6 comments

Comments

@seananderson
Copy link
Member

SD/weight in the code:

if(obs_model == 1) {for(i in 1:n_pos) log_lik[i] = normal_lpdf(y[i] | offset[i] + pred[row_indx_pos[i],col_indx_pos[i]] + obs_cov_offset[i], exp(log(sigma_vec[row_indx_pos[i]]) - log_weights_vec[i]));}

Variance/weight in the vignette:

Just as the `weights` argument can be ueed with `glm`, `lmer` or `glmmTMB`, we allow weights to be used in DFA models. Weights are currently only used for Gaussian reponses and when data is in long format. Specifically, the weights are included by modifying each variance to be $$\sigma^2 / w_i$$. As a concrete example, we'll simulate a dataset, add some examples of standard errors on the survey indices, and then perform the DFA.

I suppose adjusting the SD is fine and maybe the docs just need to be fixed? Or would scaling by inverse variance be more classical?

?fit_dfa alludes to the implementation:

bayesdfa/R/fit_dfa.R

Lines 82 to 84 in 4d91db1

#' @param weights Optional name of "weights" argument in data frame. This is only implemented when data
#' are in long format. If not entered, defaults to weights = 1 for all observations. The implementation of weights
#' varies slightly by family: Gaussian family models use -log(w_i) in the dispersion formula

@ericward-noaa
Copy link
Contributor

Thanks for catching this. It's an error, because I meant to have the weights affecting the variance, not SD. Before making any changes, probably worth thinking about if this is the approach we want to use, or whether we should adopt the glmmTMB or sdmTMB style, which involves using the weights as a multiplier on the likelihood (see here for a good discussion)

I can really go either way. I was thinking of this classical way initially because surveys brought into a DFA each might have an associated CV, etc. Using those as weights works with the classical implementation when likelihood is normal/lognormal, but the glmmTMB approach would be more general probably across families.

@seananderson
Copy link
Member Author

The current implementation has been useful so far for @ecophilina 's work modelling body condition indices. It sounds like brms takes the likelihood multiplier route too. Perhaps weights should multiply the likelihood but what you currently have could be something like an input observation variance? I.e., give it a different argument name. The current implementation seems more akin to what goes into meta-analysis, which seems like a common application. If changing, I guess there would have to be a warning for a while... although given it's already not implemented exactly how it's described, a change needs to be made regardless.

@ericward-noaa
Copy link
Contributor

@ecophilina -- see the changes I was thinking about here: #27 . Let me know if that's not clear or if there's another approach you were thinking of

@seananderson
Copy link
Member Author

Looking at the implementation, it seems these are modifying the SD not the variance, right? I.e., if so, the argument should change name to inverse SD or the implementation should change to modify the variance not the SD.
https://github.com/fate-ewi/bayesdfa/blob/main/inst/stan/dfa.stan#L622

@ericward-noaa ericward-noaa reopened this Jan 30, 2024
@ericward-noaa
Copy link
Contributor

ericward-noaa commented Jan 30, 2024

I'm not sure if this is just a naming issue, or something with the code that needs changing. If it's naming, we need to have 2 vectors -- one for optional likelihood weights, and one for optional inverse variance weights. Happy to change the arg name to something else.

If it's a coding issue, I think where we want to get to is that if someone has standard errors of a survey (se), these become the inverse variance weights $(w_{i} = \frac{1}{{se_{i}^2}})$ and the variance of observations becomes $(\frac{\sigma^2}{{w_i}})$, or equivalently $(\sigma^2 \cdot (se_{i})^2)$.

The workflow that's currently implemented is that

  1. user creates weights themselves, based on the se values, e.g. in the vignette
    df$weights <- (1 / df$se)^2
    . I did this because this is the approach in lm()
  2. inv_var_weights_vec is manipulated internally to pass the se to Stan,
    inv_var_weights_vec = sqrt(1.0/inv_var_weights_vec),
  3. The inv_var_weights_vec is used in the likelihood in Stan to have a standard deviation = $\sigma \cdot se_{i}$, I think the same as above:
    if(obs_model == 1) {for(i in 1:n_pos) log_lik[i] = normal_lpdf(y[i] | input_offset[i] + pred[row_indx_pos[i],col_indx_pos[i]] + obs_cov_offset[i], sigma_vec[row_indx_pos[i]] * inv_var_weights_vec[i]);}

@seananderson
Copy link
Member Author

seananderson commented Jan 30, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants