Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zero Truncated Poisson, also need param_estimate and stats_tbl #471

Closed
Tracked by #467
spsanderson opened this issue May 3, 2024 · 0 comments
Closed
Tracked by #467

Zero Truncated Poisson, also need param_estimate and stats_tbl #471

spsanderson opened this issue May 3, 2024 · 0 comments
Assignees
Labels
enhancement New feature or request

Comments

@spsanderson
Copy link
Owner

spsanderson commented May 3, 2024

Param Estimate

Function:

#' Estimate Zero Truncated Poisson Parameters
#'
#' @family Parameter Estimation
#' @family Poisson
#'
#' @author Steven P. Sanderson II, MPH
#'
#' @details
#'
#' This function estimates the parameter lambda of a Zero-Truncated Poisson distribution
#' based on a vector of non-negative integer values `.x`. The Zero-Truncated Poisson
#' distribution is a discrete probability distribution that models the number of events
#' occurring in a fixed interval of time, given that at least one event has occurred.
#'
#' The estimation is performed by minimizing the negative log-likelihood of the observed
#' data `.x` under the Zero-Truncated Poisson model. The negative log-likelihood function
#' used for optimization is defined as:
#'
#' \deqn{-\sum_{i=1}^{n} \log(P(X_i = x_i \mid X_i > 0, \lambda))}{,}
#'
#' where \( X_i \) are the observed values in `.x` and \( \lambda \) is the parameter
#' of the Zero-Truncated Poisson distribution.
#'
#' The optimization process uses the `optim` function to find the value of \( \lambda \)
#' that minimizes this negative log-likelihood. The chosen optimization method is Brent's
#' method (`method = "Brent"`) within a specified interval `[0, max(.x)]`.
#'
#' If `.auto_gen_empirical` is set to `TRUE`, the function will generate empirical data
#' statistics using `tidy_empirical()` for the input data `.x` and then combine this
#' empirical data with the estimated Zero-Truncated Poisson distribution using
#' `tidy_combine_distributions()`. This combined data can be accessed via the
#' `$combined_data_tbl` element of the function output.
#'
#' The function returns a tibble containing the estimated parameter \( \lambda \) along
#' with other summary statistics of the input data (sample size, minimum, maximum).
#'
#' @description This function will attempt to estimate the Zero Truncated Poisson
#' lambda parameter given some vector of values `.x`. The function will return a
#' tibble output, and if the parameter `.auto_gen_empirical` is set to `TRUE`
#' then the empirical data given to the parameter `.x` will be run through the
#' `tidy_empirical()` function and combined with the estimated Zero Truncated
#' Poisson data.
#'
#' @param .x The vector of data to be passed to the function. Must be non-negative
#' integers.
#' @param .auto_gen_empirical This is a boolean value of TRUE/FALSE with default
#' set to TRUE. This will automatically create the `tidy_empirical()` output
#' for the `.x` parameter and use the `tidy_combine_distributions()`. The user
#' can then plot out the data using `$combined_data_tbl` from the function output.
#'
#' @examples
#' library(dplyr)
#' library(ggplot2)
#'
#' tc <- tidy_zero_truncated_poisson() |> pull(y)
#' output <- util_zero_truncated_poissson_param_estimate(tc)
#'
#' output$parameter_tbl
#'
#' output$combined_data_tbl |>
#'   tidy_combined_autoplot()
#'
#' @return
#' A tibble/list
#'
#' @name util_zero_truncated_poissson_param_estimate
NULL

#' @export
#' @rdname util_zero_truncated_poissson_param_estimate

util_zero_truncated_poissson_param_estimate <- function(.x, .auto_gen_empirical = TRUE) {

  # Tidyeval ----
  x_term <- as.numeric(.x)
  minx <- min(x_term)
  maxx <- max(x_term)
  n <- length(x_term)

  # Define negative log-likelihood function
  neg_loglik <- function(lambda, data) {
    -sum(log(actuar::dztpois(x_term, lambda = lambda)))
  }

  # Optimize to find lambda that minimizes negative log-likelihood
  optim_result <- stats::optim(par = 1, fn = neg_loglik, data = x_term,
                               method = "Brent",
                               lower = 0, upper = max(x))

  # Extract estimated lambda
  lambda_est <- optim_result$par

  # Return Tibble ----
  if (.auto_gen_empirical) {
    te <- tidy_empirical(.x = x_term)
    td <- tidy_zero_truncated_poisson(.n = n, .lambda = round(lambda_est, 3))
    combined_tbl <- tidy_combine_distributions(te, td)
  }

  # Return Tibble
  ret <- dplyr::tibble(
    dist_type = "Zero Truncated Poisson",
    samp_size = n,
    min = minx,
    max = maxx,
    lambda = lambda_est
  )

  # Return ----
  attr(ret, "tibble_type") <- "parameter_estimation"
  attr(ret, "family") <- "zero truncated poisson"
  attr(ret, "x_term") <- .x
  attr(ret, "n") <- n

  if (.auto_gen_empirical) {
    output <- list(
      combined_data_tbl = combined_tbl,
      parameter_tbl     = ret
    )
  } else {
    output <- list(
      parameter_tbl = ret
    )
  }

  return(output)
}

Example:

> tc <- tidy_zero_truncated_poisson() |> pull(y)
> output <- util_zero_truncated_poissson_param_estimate(tc)
> 
> output$parameter_tbl
# A tibble: 1 × 5
  dist_type              samp_size   min   max lambda
  <chr>                      <int> <dbl> <dbl>  <dbl>
1 Zero Truncated Poisson        50     1     3  0.584
> 
> output$combined_data_tbl |>
+   tidy_combined_autoplot()

image

AIC

Function:

#' Calculate Akaike Information Criterion (AIC) for zero-truncated poisson Distribution
#'
#' This function calculates the Akaike Information Criterion (AIC) for a zero-truncated poisson distribution fitted to the provided data.
#'
#' @family Utility
#' @author Steven P. Sanderson II, MPH
#'
#' @description
#' This function estimates the parameters of a zero-truncated poisson distribution from the provided data using maximum likelihood estimation,
#' and then calculates the AIC value based on the fitted distribution.
#'
#' @param .x A numeric vector containing the data to be fitted to a zero-truncated poisson distribution.
#'
#' @examples
#' library(actuar)
#'
#' # Example 1: Calculate AIC for a sample dataset
#' set.seed(123)
#' x <- rztpois(30, lambda = 3)
#' util_zero_truncated_poisson_aic(x)
#'
#' @return
#' The AIC value calculated based on the fitted zero-truncated poisson distribution to the provided data.
#'
#' @name util_zero_truncated_poisson_aic
NULL

#' @export
#' @rdname util_zero_truncated_poisson_aic

util_zero_truncated_poisson_aic <- function(.x) {
  # Validate input
  if (!is.numeric(.x) || any(!is.na(.x) & .x != as.integer(.x)) || any(.x < 0)) {
    stop("Input data (.x) must be a numeric vector of non-negative integers.")
  }
  
  x <- as.numeric(.x)
  
  # Get initial parameter estimates using TidyDensity package (if available)
  pe <- TidyDensity::util_zero_truncated_poisson_param_estimate(x)$parameter_tbl
  
  # Negative log-likelihood function for zero-truncated poisson distribution
  nll <- function(par, data) {
    lambda <- par[1]
    -sum(actuar::dztpois(data, lambda = lambda, log = TRUE))
  }
  
  # Fit zero-truncated poisson distribution to sample data (optimization)
  fit_ztp<- stats::optim(
    pe$lambda,
    nll,
    data = x,
    method = "Brent",
    lower = 0,
    upper = 1000
  )
  
  # Extract log-likelihood and number of parameters
  logLik_ztp<- -fit_ztp$value
  k_ztp <- length(pe)  # Number of parameters for zero-truncated poisson distribution (degrees of freedom and ncp)
  
  # Calculate AIC
  AIC_ztp <- 2 * k_ztp - 2 * logLik_ztp
  
  # Return AIC value
  return(AIC_ztp)
}

Example:

> set.seed(123)
> x <- rztpois(30, lambda = 3)
> util_ztp_aic(x)
[1] 123.8552

Stats Tibble

Function:

#' Distribution Statistics
#'
#' @family Poisson
#' @family Zero Truncated
#' @family Distribution Statistics
#'
#' @author Steven P. Sanderson II, MPH
#'
#' @details This function will take in a tibble and returns the statistics
#' of the given type of `tidy_` distribution. It is required that data be
#' passed from a `tidy_` distribution function.
#'
#' @description Returns distribution statistics in a tibble.
#'
#' @param .data The data being passed from a `tidy_` distribution function.
#'
#' @examples
#' library(dplyr)
#'
#' tidy_zero_truncated_poisson() |>
#'   util_zero_truncated_poisson_stats_tbl() |>
#'   glimpse()
#'
#' @return
#' A tibble
#'
#' @name util_zero_truncated_poisson_stats_tbl
NULL

#' @export
#' @rdname util_zero_truncated_poisson_stats_tbl

util_zero_truncated_poisson_stats_tbl <- function(.data) {
  
  # Immediate check for tidy_ distribution function
  if (!"tibble_type" %in% names(attributes(.data))) {
    rlang::abort(
      message = "You must pass data from the 'tidy_dist' function.",
      use_cli_format = TRUE
    )
  }
  
  if (attributes(.data)$tibble_type != "tidy_zero_truncated_poisson") {
    rlang::abort(
      message = "You must use 'tidy_zero_truncated_poisson()'",
      use_cli_format = TRUE
    )
  }
  
  # Data
  data_tbl <- dplyr::as_tibble(.data)
  
  atb <- attributes(data_tbl)
  l <- atb$.lambda
  
  stat_mean <- l
  stat_mode <- floor(l)
  stat_sd <- sqrt(l)
  stat_skewness <- 1 / sqrt(l)
  stat_kurtosis <- 3 + (1 / l)
  stat_coef_var <- 1 / sqrt(l)
  
  # Data Tibble
  ret <- dplyr::tibble(
    tidy_function = atb$tibble_type,
    function_call = atb$dist_with_params,
    distribution = dist_type_extractor(atb$tibble_type),
    distribution_type = atb$distribution_family_type,
    points = atb$.n,
    simulations = atb$.num_sims,
    mean = stat_mean,
    mode = stat_mode,
    range = paste0("1 to Inf"),
    std_dv = stat_sd,
    coeff_var = stat_coef_var,
    skewness = stat_skewness,
    kurtosis = stat_kurtosis,
    computed_std_skew = tidy_skewness_vec(data_tbl$y),
    computed_std_kurt = tidy_kurtosis_vec(data_tbl$y),
    ci_lo = ci_lo(data_tbl$y),
    ci_hi = ci_hi(data_tbl$y)
  )
  
  # Return
  return(ret)
}

Example:

> tidy_zero_truncated_poisson() |>
+   util_zero_truncated_poisson_stats_tbl() |>
+   glimpse()
Rows: 1
Columns: 17
$ tidy_function     <chr> "tidy_zero_truncated_poisson"
$ function_call     <chr> "Zero Truncated Poisson c(1)"
$ distribution      <chr> "Zero Truncated Poisson"
$ distribution_type <chr> "discrete"
$ points            <dbl> 50
$ simulations       <dbl> 1
$ mean              <dbl> 1
$ mode              <dbl> 1
$ range             <chr> "1 to Inf"
$ std_dv            <dbl> 1
$ coeff_var         <dbl> 1
$ skewness          <dbl> 1
$ kurtosis          <dbl> 4
$ computed_std_skew <dbl> 0.4927686
$ computed_std_kurt <dbl> 1.893151
$ ci_lo             <dbl> 1
$ ci_hi             <dbl> 3
@spsanderson spsanderson self-assigned this May 4, 2024
@spsanderson spsanderson added the enhancement New feature or request label May 4, 2024
@spsanderson spsanderson added this to the TidyDensity 1.4.1 milestone May 4, 2024
@spsanderson spsanderson changed the title Zero Truncated Poisson Zero Truncated Poisson, also need param_estimate and stats_tbl May 4, 2024
spsanderson added a commit that referenced this issue May 4, 2024
spsanderson added a commit that referenced this issue May 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Development

No branches or pull requests

1 participant