-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
quantile_normalization function #405
Comments
Function: #' Perform quantile normalization on a numeric matrix/data.frame
#'
#' @family Utility
#'
#' @author Steven P. Sanderson II, MPH
#'
#' @description This function will perform quantile normalization on two or more
#' distributions of equal length. Quantile normalization is a technique used to make the distribution of values across different samples
#' more similar. It ensures that the distributions of values for each sample have the same quantiles.
#' This function takes a numeric matrix as input and returns a quantile-normalized matrix.
#'
#' @param .data A numeric matrix where each column represents a sample.
#'
#' @return A list object that has the following:
#' \enumerate{
#' \item A numeric matrix that has been quantile normalized.
#' \item The row means of the quantile normalized matrix.
#' \item The sorted data
#' \item The ranked indices
#' }
#'
#' @details
#' This function performs quantile normalization on a numeric matrix by following these steps:
#' \enumerate{
#' \item Sort each column of the input matrix.
#' \item Calculate the mean of each row across the sorted columns.
#' \item Replace each column's sorted values with the row means.
#' \item Unsort the columns to their original order.
#' }
#'
#' @examples
#' # Create a sample numeric matrix
#' data <- matrix(rnorm(20), ncol = 4)
#' # Perform quantile normalization
#' normalized_data <- quantile_normalize(data)
#' normalized_data
#'
#' @seealso
#' \code{\link{rowMeans}}: Calculate row means.
#'
#' \code{\link{apply}}: Apply a function over the margins of an array.
#'
#' \code{\link{order}}: Order the elements of a vector.
#'
#' @name quantile_normalize
NULL
#' @export
#' @rdname quantile_normalize
# Perform quantile normalization on a numeric matrix 'data_matrix'
quantile_normalize <- function(.data) {
# Checks ----
if (!is.matrix(.data) & !is.data.frame(.data)) {
rlang::abort(
message = "The input data must be a numeric matrix or data.frame.",
use_cli_format = TRUE
)
}
if (!all(sapply(data, is.numeric))){
rlang::abort(
message = "The input data must be a numeric matrix or data.frame.",
use_cli_format = TRUE
)
}
# Data ----
data_matrix <- as.matrix(.data)
# Step 1: Sort each column
sorted_data <- apply(data_matrix, 2, sort)
# Step 2: Calculate the mean of each row across sorted columns
row_means <- rowMeans(sorted_data)
# Step 3: Replace each column's sorted values with the row means
sorted_data <- matrix(
row_means,
nrow = nrow(sorted_data),
ncol = ncol(sorted_data),
byrow = TRUE
)
# Step 4: Unsort the columns to their original order
rank_indices <- apply(data_matrix, 2, order)
normalized_data <- matrix(nrow = nrow(data_matrix), ncol = ncol(data_matrix))
for (i in 1:ncol(data_matrix)) {
normalized_data[, i] <- sorted_data[rank_indices[, i], i]
}
return(list(
normalized_data = normalized_data,
row_means = row_means,
sorted_data = sorted_data,
rank_indices = rank_indices
))
} Example: > data <- matrix(rnorm(20), ncol = 4)
> # Perform quantile normalization
> normalized_data <- quantile_normalize(data)
> normalized_data
$normalized_data
[,1] [,2] [,3] [,4]
[1,] -0.50206621 0.05395216 -1.34467548 -0.50206621
[2,] 0.28707442 0.28707442 0.05395216 0.05395216
[3,] 0.05395216 -1.34467548 1.45839000 -1.34467548
[4,] -1.34467548 1.45839000 -0.50206621 0.28707442
[5,] 1.45839000 -0.50206621 0.28707442 1.45839000
$row_means
[1] -1.34467548 -0.50206621 0.05395216 0.28707442 1.45839000
$sorted_data
[,1] [,2] [,3] [,4]
[1,] -1.34467548 -0.50206621 0.05395216 0.28707442
[2,] 1.45839000 -1.34467548 -0.50206621 0.05395216
[3,] 0.28707442 1.45839000 -1.34467548 -0.50206621
[4,] 0.05395216 0.28707442 1.45839000 -1.34467548
[5,] -0.50206621 0.05395216 0.28707442 1.45839000
$rank_indices
[,1] [,2] [,3] [,4]
[1,] 5 5 3 3
[2,] 3 4 1 2
[3,] 4 2 4 4
[4,] 1 3 2 1
[5,] 2 1 5 5
> as.data.frame(normalized_data$normalized_data) |>
+ sapply(function(x) quantile(x, probs = seq(0,1,1/4)))
V1 V2 V3 V4
0% -1.34467548 -1.34467548 -1.34467548 -1.34467548
25% -0.50206621 -0.50206621 -0.50206621 -0.50206621
50% 0.05395216 0.05395216 0.05395216 0.05395216
75% 0.28707442 0.28707442 0.28707442 0.28707442
100% 1.45839000 1.45839000 1.45839000 1.45839000 |
spsanderson
added a commit
that referenced
this issue
Apr 3, 2024
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Make a quantile_normalization function
Example here:
https://www.spsanderson.com/steveondata/posts/2024-03-28/
The text was updated successfully, but these errors were encountered: