kaiju2anvio.R Execution halted when rows have some missing column values. #2

Open
Mayurk619 opened this issue May 18, 2024 · 1 comment

Mayurk619 commented May 18, 2024

When I run this command:

Rscript kaiju2anvio.R gene_calls_nr.names gene_calls_nr-fixed.names

I get the following error in the terminal and can't work out what it means. Kindly help.

Loading required package: parallel
Error in cbind(as.matrix(kaiju.names[, 2]), mat) :
number of rows of matrices must match (see arg 2)
Calls: kaiju2mat -> cbind
In addition: Warning message:
In matrix(unlist(mclapply(1:nrow(kaiju.names), FUN = function(i) { :
data length [2193721] is not a sub-multiple or multiple of the number of rows [313389]
Execution halted
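The warning and the error point at the same problem: the matrix built inside kaiju2mat ends up with a different number of rows than kaiju.names. The original kaiju2anvio.R is not shown in this thread, so the following is only a minimal R sketch of one way this can happen, using made-up four-line input rather than the reporter's data and assuming each row's taxonomy string is split on ";" without padding. Rows with fewer than seven names make the flattened vector's length no longer a multiple of 7, so `matrix(..., ncol = 7)` warns, recycles values, and rounds the row count up; `cbind()` then fails exactly as in the log above.

```r
# Hypothetical taxonomy column (kaiju.names[, 8]) for four genes:
# rows 2 and 3 carry only two of the seven ranks
taxa <- c("A;B;C;D;E;F;G", "A;B", "A;B", "A;B;C;D;E;F;G")
ids  <- c("1", "2", "3", "4")  # stand-in for kaiju.names[, 2]

# Splitting without padding yields 7 + 2 + 2 + 7 = 18 values
values <- unlist(strsplit(taxa, split = ";"))

# 18 is not a multiple of 7: matrix() warns, recycles values to fill the
# last row, and ends up with ceiling(18 / 7) = 3 rows instead of 4
bad <- suppressWarnings(matrix(values, ncol = 7, byrow = TRUE))
print(dim(bad))  # [1] 3 7

# cbind() then refuses to join a 4-row column with a 3-row matrix
err <- tryCatch(cbind(as.matrix(ids), bad),
                error = function(e) conditionMessage(e))
print(err)  # [1] "number of rows of matrices must match (see arg 2)"
```

Even when the rounded row count happens to match, the ranks of neighbouring genes get shifted into the wrong rows, so padding every row to a fixed length is the safer fix.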

Mayurk619 commented May 20, 2024

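I solved it by changing the script. In this version each row's taxonomy column is split into exactly seven ranks (padded with NA when ranks are missing, or seven NAs when the column is empty), so the matrix always has one row per input line: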

#!/usr/bin/env Rscript
args = commandArgs(trailingOnly=TRUE)
parallel=TRUE

# Input control
if (length(args) == 0) {
  stop("At least one argument must be supplied (input kaiju file).\n", call.=FALSE)
} else if (length(args) == 1) {
  # default output file
  args[2] = "kaiju2Anvio-fixed.names"
} else if (length(args) == 3) {
  parallel = args[3]
}

# 'parallel' ships with base R, so it only needs to be loaded, not installed
library(parallel)

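# Split the taxonomy column (column 8) of a kaiju table into a 7-rank matrix,
# prefixed with the gene caller id from column 2
kaiju2mat <- function(kaiju.names, parallel) {
  require(parallel)
  if (isTRUE(parallel)) {
    # leave one core free, but never request fewer than one
    cores <- max(1L, detectCores() - 1L)
  } else {
    # a core count passed on the command line arrives as a string
    cores <- as.integer(parallel)
  }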
  
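  # Split column 8 (';'-separated taxon names) into exactly 7 fields per row,
  # so every input line contributes one full row to the matrix
  mat <- matrix(unlist(mclapply(seq_len(nrow(kaiju.names)), FUN = function(i) {
    taxa <- as.character(kaiju.names[i, 8])
    if (!is.na(taxa) && taxa != "") {
      # trimws() drops any whitespace around the ';' separators
      x.tmp <- trimws(unlist(strsplit(taxa, split = ";")))
      length(x.tmp) <- 7  # pad missing ranks with NA (or truncate extras)
      return(x.tmp)
    } else {
      # empty or NA taxonomy column: 7 NA ranks
      return(rep(NA_character_, 7))
    }
  }, mc.cores = cores)), ncol = 7, byrow = TRUE)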
  
  if (nrow(mat) != nrow(kaiju.names)) {
    stop(paste("Mismatch in the number of rows between 'mat' (", nrow(mat), ") and 'kaiju.names' (", nrow(kaiju.names), ").\n", sep = ""))
  }
  
  mat <- cbind(as.matrix(kaiju.names[, 2]), mat)
  colnames(mat) <- c("gene_callers_id", "t_domain", "t_phylum", "t_class", "t_order", "t_family", "t_genus", "t_species")
  return(mat)
}

# __MAIN__
# fill = TRUE adds blank fields to short rows; quote = "" disables quote handling
kaiju.names <- read.table(file = args[1], sep = "\t", fill = TRUE, row.names = NULL, header = FALSE, quote = "")
print("kaiju.names:")
print(head(kaiju.names))

# Check if the expected number of columns is present
if (ncol(kaiju.names) < 8) {
  stop("Input file does not have the expected number of columns.\n", call.=FALSE)
}

kaijumat <- kaiju2mat(kaiju.names = kaiju.names, parallel = parallel)
print("kaijumat:")
print(head(kaijumat))

# Write the output file with tab delimiters
write.table(kaijumat, file = args[2], quote = FALSE, col.names = TRUE, row.names = FALSE, sep = "\t")
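To check the padding rule without a real kaiju file, here is a small sketch on a hypothetical four-line table. The helper name `split_taxa` and the toy data are illustrative assumptions, and plain `lapply()` stands in for `mclapply()` so it runs sequentially on any platform. It applies the same per-row rule as the fixed script and confirms that short and empty rows no longer change the row count.

```r
# Hypothetical 4-line kaiju table: column 2 = gene caller id,
# column 8 = ';'-separated taxonomy (may be short or empty)
kaiju.names <- data.frame(
  V1 = c("C", "C", "U", "C"), V2 = c("1", "2", "3", "4"),
  V3 = "", V4 = "", V5 = "", V6 = "", V7 = "",
  V8 = c("A;B;C;D;E;F;G", "A;B", "", "A;B;C;D;E;F;G"),
  stringsAsFactors = FALSE
)

# Hypothetical helper mirroring the fixed script's per-row rule:
# pad/truncate to 7 ranks, or return 7 NAs when the column is empty
split_taxa <- function(s) {
  if (!is.na(s) && s != "") {
    x <- trimws(unlist(strsplit(s, split = ";")))
    length(x) <- 7  # short rows are padded with NA
    x
  } else {
    rep(NA_character_, 7)
  }
}

rows <- lapply(kaiju.names[, 8], split_taxa)
mat  <- matrix(unlist(rows), ncol = 7, byrow = TRUE)
mat  <- cbind(as.matrix(kaiju.names[, 2]), mat)
print(dim(mat))  # [1] 4 8
```

Every input line now maps to exactly one output row: the short row keeps its two known ranks followed by NAs, and the empty row becomes seven NAs next to its gene caller id.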
