Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

quantile_normalization function #405

Closed
spsanderson opened this issue Mar 28, 2024 · 1 comment
Closed

quantile_normalization function #405

spsanderson opened this issue Mar 28, 2024 · 1 comment
Assignees
Labels
enhancement New feature or request

Comments

@spsanderson
Copy link
Owner

Make a quantile_normalization function

# Perform quantile normalization on a numeric matrix 'data_matrix'
quantile_normalize <- function(data_matrix) {
  # Step 1: Sort each column
  sorted_data <- apply(data_matrix, 2, sort)
  
  # Step 2: Calculate the mean of each row across sorted columns
  row_means <- rowMeans(sorted_data)
  
  # Step 3: Replace each column's sorted values with the row means
  sorted_data <- matrix(row_means, nrow = nrow(sorted_data), ncol = ncol(sorted_data), byrow = TRUE)
  
  # Step 4: Unsort the columns to their original order
  rank_indices <- apply(data_matrix, 2, order)
  normalized_data <- matrix(nrow = nrow(data_matrix), ncol = ncol(data_matrix))
  for (i in 1:ncol(data_matrix)) {
    normalized_data[, i] <- sorted_data[rank_indices[, i], i]
  }
  
  return(normalized_data)
}

Example here:
https://www.spsanderson.com/steveondata/posts/2024-03-28/

@spsanderson spsanderson added the enhancement New feature or request label Mar 28, 2024
@spsanderson spsanderson added this to the TidyDensity 1.3.1 milestone Mar 28, 2024
@spsanderson spsanderson self-assigned this Mar 28, 2024
@spsanderson
Copy link
Owner Author

spsanderson commented Apr 3, 2024

Function:

#' Perform quantile normalization on a numeric matrix/data.frame
#'
#' @family Utility
#'
#' @author Steven P. Sanderson II, MPH
#'
#' @description This function will perform quantile normalization on two or more
#' distributions of equal length. Quantile normalization is a technique used to make the distribution of values across different samples
#' more similar. It ensures that the distributions of values for each sample have the same quantiles.
#' This function takes a numeric matrix as input and returns a quantile-normalized matrix.
#'
#' @param .data A numeric matrix where each column represents a sample.
#'
#' @return A list object that has the following:
#' \enumerate{
#'  \item A numeric matrix that has been quantile normalized.
#'  \item The row means of the quantile normalized matrix.
#'  \item The sorted data
#'  \item The ranked indices
#' }
#'
#' @details
#' This function performs quantile normalization on a numeric matrix by following these steps:
#' \enumerate{
#'   \item Sort each column of the input matrix.
#'   \item Calculate the mean of each row across the sorted columns.
#'   \item Replace each column's sorted values with the row means.
#'   \item Unsort the columns to their original order.
#' }
#'
#' @examples
#' # Create a sample numeric matrix
#' data <- matrix(rnorm(20), ncol = 4)
#' # Perform quantile normalization
#' normalized_data <- quantile_normalize(data)
#' normalized_data
#'
#' @seealso
#' \code{\link{rowMeans}}: Calculate row means.
#'
#' \code{\link{apply}}: Apply a function over the margins of an array.
#'
#' \code{\link{order}}: Order the elements of a vector.
#'
#' @name quantile_normalize
NULL

#' @export
#' @rdname quantile_normalize
# Perform quantile normalization on a numeric matrix 'data_matrix'
quantile_normalize <- function(.data) {

  # Checks ----
  if (!is.matrix(.data) & !is.data.frame(.data)) {
    rlang::abort(
      message = "The input data must be a numeric matrix or data.frame.",
      use_cli_format = TRUE
    )
  }

  if (!all(sapply(data, is.numeric))){
    rlang::abort(
      message = "The input data must be a numeric matrix or data.frame.",
      use_cli_format = TRUE
    )
  }

  # Data ----
  data_matrix <- as.matrix(.data)

  # Step 1: Sort each column
  sorted_data <- apply(data_matrix, 2, sort)

  # Step 2: Calculate the mean of each row across sorted columns
  row_means <- rowMeans(sorted_data)

  # Step 3: Replace each column's sorted values with the row means
  sorted_data <- matrix(
    row_means,
    nrow = nrow(sorted_data),
    ncol = ncol(sorted_data),
    byrow = TRUE
    )

  # Step 4: Unsort the columns to their original order
  rank_indices <- apply(data_matrix, 2, order)
  normalized_data <- matrix(nrow = nrow(data_matrix), ncol = ncol(data_matrix))
  for (i in 1:ncol(data_matrix)) {
    normalized_data[, i] <- sorted_data[rank_indices[, i], i]
  }

  return(list(
    normalized_data = normalized_data,
    row_means = row_means,
    sorted_data = sorted_data,
    rank_indices = rank_indices
  ))
}

Example:

> data <- matrix(rnorm(20), ncol = 4)
> # Perform quantile normalization
> normalized_data <- quantile_normalize(data)
> normalized_data
$normalized_data
            [,1]        [,2]        [,3]        [,4]
[1,] -0.50206621  0.05395216 -1.34467548 -0.50206621
[2,]  0.28707442  0.28707442  0.05395216  0.05395216
[3,]  0.05395216 -1.34467548  1.45839000 -1.34467548
[4,] -1.34467548  1.45839000 -0.50206621  0.28707442
[5,]  1.45839000 -0.50206621  0.28707442  1.45839000

$row_means
[1] -1.34467548 -0.50206621  0.05395216  0.28707442  1.45839000

$sorted_data
            [,1]        [,2]        [,3]        [,4]
[1,] -1.34467548 -0.50206621  0.05395216  0.28707442
[2,]  1.45839000 -1.34467548 -0.50206621  0.05395216
[3,]  0.28707442  1.45839000 -1.34467548 -0.50206621
[4,]  0.05395216  0.28707442  1.45839000 -1.34467548
[5,] -0.50206621  0.05395216  0.28707442  1.45839000

$rank_indices
     [,1] [,2] [,3] [,4]
[1,]    5    5    3    3
[2,]    3    4    1    2
[3,]    4    2    4    4
[4,]    1    3    2    1
[5,]    2    1    5    5

> as.data.frame(normalized_data$normalized_data) |>
+     sapply(function(x) quantile(x, probs = seq(0,1,1/4)))
              V1          V2          V3          V4
0%   -1.34467548 -1.34467548 -1.34467548 -1.34467548
25%  -0.50206621 -0.50206621 -0.50206621 -0.50206621
50%   0.05395216  0.05395216  0.05395216  0.05395216
75%   0.28707442  0.28707442  0.28707442  0.28707442
100%  1.45839000  1.45839000  1.45839000  1.45839000

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Development

No branches or pull requests

1 participant