quantile_normalization function #405

spsanderson · 2024-03-28T12:39:47Z

Make a quantile_normalization function

# Perform quantile normalization on a numeric matrix 'data_matrix'
quantile_normalize <- function(data_matrix) {
  # Step 1: Sort each column
  sorted_data <- apply(data_matrix, 2, sort)
  
  # Step 2: Calculate the mean of each row across sorted columns
  row_means <- rowMeans(sorted_data)
  
  # Step 3: Replace each column's sorted values with the row means
  sorted_data <- matrix(row_means, nrow = nrow(sorted_data), ncol = ncol(sorted_data), byrow = TRUE)
  
  # Step 4: Unsort the columns to their original order
  rank_indices <- apply(data_matrix, 2, order)
  normalized_data <- matrix(nrow = nrow(data_matrix), ncol = ncol(data_matrix))
  for (i in 1:ncol(data_matrix)) {
    normalized_data[, i] <- sorted_data[rank_indices[, i], i]
  }
  
  return(normalized_data)
}

Example here:
https://www.spsanderson.com/steveondata/posts/2024-03-28/

spsanderson · 2024-04-03T02:00:55Z

Function:

#' Perform quantile normalization on a numeric matrix/data.frame
#'
#' @family Utility
#'
#' @author Steven P. Sanderson II, MPH
#'
#' @description This function will perform quantile normalization on two or more
#' distributions of equal length. Quantile normalization is a technique used to make the distribution of values across different samples
#' more similar. It ensures that the distributions of values for each sample have the same quantiles.
#' This function takes a numeric matrix as input and returns a quantile-normalized matrix.
#'
#' @param .data A numeric matrix where each column represents a sample.
#'
#' @return A list object that has the following:
#' \enumerate{
#'  \item A numeric matrix that has been quantile normalized.
#'  \item The row means of the quantile normalized matrix.
#'  \item The sorted data
#'  \item The ranked indices
#' }
#'
#' @details
#' This function performs quantile normalization on a numeric matrix by following these steps:
#' \enumerate{
#'   \item Sort each column of the input matrix.
#'   \item Calculate the mean of each row across the sorted columns.
#'   \item Replace each column's sorted values with the row means.
#'   \item Unsort the columns to their original order.
#' }
#'
#' @examples
#' # Create a sample numeric matrix
#' data <- matrix(rnorm(20), ncol = 4)
#' # Perform quantile normalization
#' normalized_data <- quantile_normalize(data)
#' normalized_data
#'
#' @seealso
#' \code{\link{rowMeans}}: Calculate row means.
#'
#' \code{\link{apply}}: Apply a function over the margins of an array.
#'
#' \code{\link{order}}: Order the elements of a vector.
#'
#' @name quantile_normalize
NULL

#' @export
#' @rdname quantile_normalize
# Perform quantile normalization on a numeric matrix 'data_matrix'
quantile_normalize <- function(.data) {

  # Checks ----
  if (!is.matrix(.data) & !is.data.frame(.data)) {
    rlang::abort(
      message = "The input data must be a numeric matrix or data.frame.",
      use_cli_format = TRUE
    )
  }

  if (!all(sapply(data, is.numeric))){
    rlang::abort(
      message = "The input data must be a numeric matrix or data.frame.",
      use_cli_format = TRUE
    )
  }

  # Data ----
  data_matrix <- as.matrix(.data)

  # Step 1: Sort each column
  sorted_data <- apply(data_matrix, 2, sort)

  # Step 2: Calculate the mean of each row across sorted columns
  row_means <- rowMeans(sorted_data)

  # Step 3: Replace each column's sorted values with the row means
  sorted_data <- matrix(
    row_means,
    nrow = nrow(sorted_data),
    ncol = ncol(sorted_data),
    byrow = TRUE
    )

  # Step 4: Unsort the columns to their original order
  rank_indices <- apply(data_matrix, 2, order)
  normalized_data <- matrix(nrow = nrow(data_matrix), ncol = ncol(data_matrix))
  for (i in 1:ncol(data_matrix)) {
    normalized_data[, i] <- sorted_data[rank_indices[, i], i]
  }

  return(list(
    normalized_data = normalized_data,
    row_means = row_means,
    sorted_data = sorted_data,
    rank_indices = rank_indices
  ))
}

Example:

> data <- matrix(rnorm(20), ncol = 4)
> # Perform quantile normalization
> normalized_data <- quantile_normalize(data)
> normalized_data
$normalized_data
            [,1]        [,2]        [,3]        [,4]
[1,] -0.50206621  0.05395216 -1.34467548 -0.50206621
[2,]  0.28707442  0.28707442  0.05395216  0.05395216
[3,]  0.05395216 -1.34467548  1.45839000 -1.34467548
[4,] -1.34467548  1.45839000 -0.50206621  0.28707442
[5,]  1.45839000 -0.50206621  0.28707442  1.45839000

$row_means
[1] -1.34467548 -0.50206621  0.05395216  0.28707442  1.45839000

$sorted_data
            [,1]        [,2]        [,3]        [,4]
[1,] -1.34467548 -0.50206621  0.05395216  0.28707442
[2,]  1.45839000 -1.34467548 -0.50206621  0.05395216
[3,]  0.28707442  1.45839000 -1.34467548 -0.50206621
[4,]  0.05395216  0.28707442  1.45839000 -1.34467548
[5,] -0.50206621  0.05395216  0.28707442  1.45839000

$rank_indices
     [,1] [,2] [,3] [,4]
[1,]    5    5    3    3
[2,]    3    4    1    2
[3,]    4    2    4    4
[4,]    1    3    2    1
[5,]    2    1    5    5

> as.data.frame(normalized_data$normalized_data) |>
+     sapply(function(x) quantile(x, probs = seq(0,1,1/4)))
              V1          V2          V3          V4
0%   -1.34467548 -1.34467548 -1.34467548 -1.34467548
25%  -0.50206621 -0.50206621 -0.50206621 -0.50206621
50%   0.05395216  0.05395216  0.05395216  0.05395216
75%   0.28707442  0.28707442  0.28707442  0.28707442
100%  1.45839000  1.45839000  1.45839000  1.45839000

Fixes #405

spsanderson added the enhancement New feature or request label Mar 28, 2024

spsanderson added this to the TidyDensity 1.3.1 milestone Mar 28, 2024

spsanderson self-assigned this Mar 28, 2024

spsanderson closed this as completed in 9d55e22 Apr 3, 2024

spsanderson added a commit that referenced this issue Apr 3, 2024

Merge pull request #408 from spsanderson/development

2cd2b2d

Fixes #405

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

quantile_normalization function #405

quantile_normalization function #405

spsanderson commented Mar 28, 2024

spsanderson commented Apr 3, 2024 •

edited

Loading

quantile_normalization function #405

quantile_normalization function #405

Comments

spsanderson commented Mar 28, 2024

spsanderson commented Apr 3, 2024 • edited Loading

spsanderson commented Apr 3, 2024 •

edited

Loading