Pipeline stutters with crew: completes a few tasks, then fails [help] #1207
-
It turned out that one step was failing (wrong index of a variable). However, with crew.cluster the pipeline crashed badly instead of throwing the expected error message. Producing a reprex for my specific case is very hard, so I am wondering whether this kind of crash on an otherwise benign error is commonly observed with crew.cluster.
-
The hanging might go away if you upgrade.

```r
# _targets.R file:
library(targets)
library(crew.cluster)
tar_option_set(
  controller = crew_controller_sge(
    script_lines = "module load R/4.2.2",
    workers = 100
  )
)
list(
  tar_target(x, rep(2, 1000)),
  tar_target(y, Sys.sleep(x), pattern = map(x))
)
```

The above pipeline completed in about 36 seconds on my company's SGE cluster. If the equivalent SLURM pipeline hangs on your end, the cause could be package versions or something in the SLURM setup.
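For comparison, a SLURM version of the same test pipeline might look like the sketch below. It reuses the controller arguments from the `crew_controller_slurm()` call in the original post (argument names as in crew.cluster 0.2.0; newer releases may organize cluster options differently) and assumes R is already available on the worker nodes, so no `script_lines` are set.

```r
# _targets.R: hypothetical SLURM counterpart of the SGE test pipeline.
# Controller arguments are copied from the original post's
# crew_controller_slurm() call; assumes R is on the workers' PATH.
library(targets)
library(crew.cluster)
tar_option_set(
  controller = crew_controller_slurm(
    name = "slurm",
    slurm_memory_gigabytes_per_cpu = 50,
    slurm_cpus_per_task = 1,
    workers = 100,
    verbose = TRUE
  )
)
list(
  tar_target(x, rep(2, 1000)),
  # One branch per element of x; each branch sleeps 2 seconds.
  tar_target(y, Sys.sleep(x), pattern = map(x))
)
```

If this minimal pipeline also hangs while the SGE one does not, that points toward the SLURM setup or package versions rather than the real pipeline's code.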
-
@susansjy22 is also having the same problem. @susansjy22, please follow the suggestion above.
-
Your test script works locally (~1 minute):

```r
library(targets)
tar_script({
  library(targets)
  library(crew.cluster)
  library(crew)
  tar_option_set(
    controller = crew_controller_local(
      workers = 30
    )
  )
  list(
    tar_target(x, rep(2, 1000)),
    tar_target(y, Sys.sleep(x), pattern = map(x))
  )
}, script = "temp")
```

However, on the real pipeline, if I run it with `crew_controller_local(name = "local_30", workers = 30)`, I get:

```
▶ dispatched (pending) branch pseudobulk_df_1ef64f65
▶ dispatched (pending) branch pseudobulk_df_05b48053
▶ dispatched (pending) branch pseudobulk_df_83d48b69
▶ dispatched (pending) branch pseudobulk_df_92d32f30
▶ dispatched (pending) branch pseudobulk_df_0cdfdc5c
✔ skipped pipeline [3.788 minutes]
Warning messages:
1: <anonymous>: ..1 may be used in an incorrect context
2: <anonymous>: ..2 may be used in an incorrect context
3: <anonymous>: ..3 may be used in an incorrect context
4: <anonymous>: ..4 may be used in an incorrect context
5: <anonymous>: ..1 may be used in an incorrect context
6: <anonymous>: ..1 may be used in an incorrect context
7: <anonymous>: ..2 may be used in an incorrect context
8: <anonymous>: ..3 may be used in an incorrect context
Error:
! Error running targets::tar_make()
Error messages: targets::tar_meta(fields = error, complete_only = TRUE)
Debugging guide: https://books.ropensci.org/targets/debugging.html
How to ask for help: https://books.ropensci.org/targets/help.html
Last error message:
all(is.numeric(.)) && all(length(.) == 1L) && all(!anyNA(.)) && all(. >= 0) is not true on . = seconds_timeout
Last error traceback:
tryCatch(withCallingHandlers({ NULL saveRDS(do.call(do.call, c(readRDS("...
tryCatchList(expr, classes, parentenv, handlers)
tryCatchOne(tryCatchList(expr, names[-nh], parentenv, handlers[-nh]), na...
doTryCatch(return(expr), name, parentenv, handler)
tryCatchList(expr, names[-nh], parentenv, handlers[-nh])
tryCatchOne(expr, names, parentenv, handlers[[1L]])
doTryCatch(return(expr), name, parentenv, handler)
withCallingHandlers({ NULL saveRDS(do.call(do.call, c(readRDS("/tmp/Rtmp...
saveRDS(do.call(do.call, c(readRDS("/tmp/RtmpmITIX1/callr-fun-9a18699a17...
do.call(do.call, c(readRDS("/tmp/RtmpmITIX1/callr-fun-9a18699a17ef"), li...
(function (what, args, quote = FALSE, envir = parent.frame()) { if (!is....
(function (targets_function, targets_arguments, options, envir = NULL, s...
tryCatch(out <- withCallingHandlers(targets::tar_callr_inner_try(targets...
tryCatchList(expr, classes, parentenv, handlers)
tryCatchOne(expr, names, parentenv, handlers[[1L]])
doTryCatch(return(expr), name, parentenv, handler)
withCallingHandlers(targets::tar_callr_inner_try(targets_function = targ...
targets::tar_callr_inner_try(targets_function = targets_function, target...
do.call(targets_function, targets_arguments)
(function (pipeline, path_store, names_quosure, shortcut, reporter, seco...
crew_init(pipeline = pipeline, meta = meta_init(path_store = path_store)...
self$run_crew()
self$iterate()
if_any(queue$should_dequeue(), self$process_target(queue$dequeue()), sel...
self$controller$wait(mode = "one", seconds_interval = interval, seconds_...
if_any(identical(mode, "one"), private$.wait_one(controllers = control, ...
private$.wait_one(controllers = control, seconds_interval = seconds_inte...
crew_retry(fun = ~{ if (scale) { walk(controllers, ~.x$scale(throttle = ...
crew_assert(seconds_timeout, is.numeric(.), length(.) == 1L, !anyNA(.), ...
crew_error(message %|||% out)
crew_stop(message = message, class = c("crew_error", "crew"))
rlang::abort(message = message, class = class, call = emptyenv())
signal_abort(cnd, .file)
```
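The last error message quotes the check that failed: crew requires `seconds_timeout` to be a single, non-missing, non-negative number. As a sketch, the same condition can be restated in base R with a hypothetical helper, `is_valid_timeout()` (not part of crew or targets), to show which kinds of values would trip it:

```r
# Hypothetical helper: restates in base R the condition quoted in the
# "Last error message" above. It is NOT crew's internal crew_assert().
is_valid_timeout <- function(x) {
  is.numeric(x) &&      # must be a number ...
    length(x) == 1L &&  # ... exactly one ...
    !anyNA(x) &&        # ... not missing ...
    all(x >= 0)         # ... and non-negative
}

is_valid_timeout(60)        # TRUE
is_valid_timeout(Inf)       # TRUE: the check has no upper bound
is_valid_timeout(NULL)      # FALSE: NULL is not numeric
is_valid_timeout(NA_real_)  # FALSE: missing value
is_valid_timeout(-1)        # FALSE: negative
is_valid_timeout(c(1, 2))   # FALSE: length 2
```

Which value `seconds_timeout` actually held in this run is not visible here, because the traceback truncates the `self$controller$wait(...)` call.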
Thanks!
```r
> sessionInfo()
R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS: /stornext/System/data/apps/R/R-4.3.0/lib64/R/lib/libRblas.so
LAPACK: /stornext/System/data/apps/R/R-4.3.0/lib64/R/lib/libRlapack.so; LAPACK version 3.11.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: UTC
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] crew_0.8.0 crew.cluster_0.2.0 glue_1.7.0 targets_1.4.1 lubridate_1.9.3
[6] forcats_1.0.0 stringr_1.5.1 dplyr_1.1.4 purrr_1.0.2 readr_2.1.5
[11] tidyr_1.3.0 tibble_3.2.1 ggplot2_3.4.4 tidyverse_2.0.0 HPCell_0.1.0
[16] bigmemory_4.6.1 Biobase_2.62.0 BiocGenerics_0.48.1
loaded via a namespace (and not attached):
[1] R.methodsS3_1.8.2 IRanges_2.36.0 progress_1.2.3
[4] tidybulk_1.15.4 goftest_1.2-3 Biostrings_2.70.1
[7] HDF5Array_1.30.0 rstan_2.32.5 vctrs_0.6.5
[10] spatstat.random_3.2-2 digest_0.6.34 png_0.1-8
[13] shape_1.4.6 registry_0.5-1 ggrepel_0.9.5
[16] deldir_2.0-2 parallelly_1.36.0 MASS_7.3-60
[19] reshape2_1.4.4 httpuv_1.6.12 foreach_1.5.2
[22] withr_3.0.0 xfun_0.41 EnsDb.Hsapiens.v86_2.99.0
[25] ggpubr_0.6.0 ellipsis_0.3.2 survival_3.5-7
[28] memoise_2.0.1 ggbeeswarm_0.7.2 systemfonts_1.0.5
[31] zoo_1.8-12 GlobalOptions_0.1.2 V8_4.4.0
[34] pbapply_1.7-2 R.oo_1.25.0 prettyunits_1.2.0
[37] KEGGREST_1.42.0 promises_1.2.1 httr_1.4.7
[40] rstatix_0.7.2 restfulr_0.0.15 globals_0.16.2
[43] fitdistrplus_1.1-11 rhdf5filters_1.14.0 nanonext_0.12.0
[46] rhdf5_2.46.0 ps_1.7.6 rstudioapi_0.15.0
[49] miniUI_0.1.1.1 generics_0.1.3 ggalluvial_0.12.5
[52] processx_3.8.3 curl_5.2.0 S4Vectors_0.40.2
[55] zlibbioc_1.48.0 ScaledMatrix_1.10.0 polyclip_1.10-6
[58] ExperimentHub_2.10.0 GenomeInfoDbData_1.2.11 SparseArray_1.2.3
[61] interactiveDisplayBase_1.40.0 xtable_1.8-4 doParallel_1.0.17
[64] S4Arrays_1.2.0 preprocessCore_1.64.0 BiocFileCache_2.10.1
[67] hms_1.1.3 ttservice_0.4.0 GenomicRanges_1.54.1
[70] irlba_2.3.5.1 filelock_1.0.3 colorspace_2.1-0
[73] ggnetwork_0.5.12 ROCR_1.0-11 reticulate_1.34.0
[76] spatstat.data_3.0-3 magrittr_2.0.3 lmtest_0.9-40
[79] glmGamPoi_1.14.0 later_1.3.1 viridis_0.6.4
[82] lattice_0.22-5 spatstat.geom_3.2-7 NMF_0.26
[85] future.apply_1.11.1 scattermore_1.2 XML_3.99-0.13
[88] scuttle_1.12.0 cowplot_1.1.1 matrixStats_1.2.0
[91] RcppAnnoy_0.0.21 pillar_1.9.0 StanHeaders_2.32.5
[94] tidySummarizedExperiment_1.11.6 nlme_3.1-164 iterators_1.0.14
[97] sna_2.7-1 gridBase_0.4-7 compiler_4.3.0
[100] beachmat_2.18.0 RSpectra_0.16-1 stringi_1.8.3
[103] minqa_1.2.6 tensor_1.5 SummarizedExperiment_1.32.0
[106] GenomicAlignments_1.38.0 plyr_1.8.9 crayon_1.5.2
[109] abind_1.4-5 BiocIO_1.12.0 scater_1.30.0
[112] tidyseurat_0.8.0 locfit_1.5-9.8 sp_2.1-2
[115] bit_4.0.5 codetools_0.2-19 BiocSingular_1.18.0
[118] QuickJSR_1.0.9 GetoptLong_1.0.5 plotly_4.10.3
[121] mime_0.12 splines_4.3.0 circlize_0.4.15
[124] Rcpp_1.0.12 fastDummies_1.7.3 dbplyr_2.4.0
[127] sparseMatrixStats_1.14.0 knitr_1.45 blob_1.2.4
[130] utf8_1.2.4 BiocVersion_3.18.0 clue_0.3-65
[133] lme4_1.1-35.1 AnnotationFilter_1.26.0 fs_1.6.3
[136] listenv_0.9.0 DelayedMatrixStats_1.24.0 getip_0.1-3
[139] pkgbuild_1.4.3 tidySingleCellExperiment_1.11.8 ggsignif_0.6.4
[142] Matrix_1.6-4 callr_3.7.3 statmod_1.5.0
[145] tzdb_0.4.0 svglite_2.1.2 pkgconfig_2.0.3
[148] network_1.18.1 tools_4.3.0 cachem_1.0.8
[151] RSQLite_2.3.3 viridisLite_0.4.2 DBI_1.1.3
[154] celldex_1.12.0 tarchetypes_0.7.9 scDblFinder_1.16.0
[157] fastmap_1.1.1 scales_1.3.0 grid_4.3.0
[160] ica_1.0-3 Seurat_5.0.1 Rsamtools_2.18.0
[163] AnnotationHub_3.10.0 broom_1.0.5 patchwork_1.2.0
[166] coda_0.19-4 FNN_1.1.3.2 BiocManager_1.30.22
[169] dotCall64_1.1-1 carData_3.0-5 SingleR_2.4.0
[172] RANN_2.6.1 yaml_2.3.7 MatrixGenerics_1.14.0
[175] rtracklayer_1.62.0 cli_3.6.2 stats4_4.3.0
[178] leiden_0.4.3.1 lifecycle_1.0.4 uwot_0.1.16
[181] bluster_1.12.0 backports_1.4.1 mirai_0.11.3
[184] DropletUtils_1.22.0 BiocParallel_1.36.0 timechange_0.2.0
[187] gtable_0.3.4 rjson_0.2.21 ggridges_0.5.4
[190] progressr_0.14.0 parallel_4.3.0 limma_3.58.0
[193] jsonlite_1.8.8 edgeR_4.0.1 RcppHNSW_0.5.0
[196] bitops_1.0-7 bigmemory.sri_0.1.6 bit64_4.0.5
[199] xgboost_1.7.5.1 Rtsne_0.16 spatstat.utils_3.0-4
[202] BiocNeighbors_1.20.0 SeuratObject_5.0.1 RcppParallel_5.1.7
[205] metapod_1.10.0 dqrng_0.3.2 loo_2.6.0
[208] R.utils_2.12.3 lazyeval_0.2.2 shiny_1.8.0
[211] htmltools_0.5.7 sctransform_0.4.1 rappdirs_0.3.3
[214] ensembldb_2.26.0 spam_2.10-0 XVector_0.42.0
[217] RCurl_1.98-1.12 scran_1.30.0 gridExtra_2.3
[220] boot_1.3-28.1 igraph_1.5.0.1 R6_2.5.1
[223] SingleCellExperiment_1.24.0 GenomicFeatures_1.54.1 cluster_2.1.6
[226] rngtools_1.5.2 Rhdf5lib_1.24.0 CellChat_1.6.1
[229] GenomeInfoDb_1.38.5 nloptr_2.0.3 statnet.common_4.9.0
[232] ProtGenerics_1.34.0 DelayedArray_0.28.0 tidyselect_1.2.0
[235] vipor_0.4.5 xml2_1.3.3 inline_0.3.19
[238] car_3.1-2 AnnotationDbi_1.64.0 future_1.33.1
[241] rsvd_1.0.5 munsell_0.5.0 KernSmooth_2.23-22
[244] data.table_1.14.8 htmlwidgets_1.6.3 ComplexHeatmap_2.18.0
[247] RColorBrewer_1.1-3 biomaRt_2.58.0 rlang_1.1.3
[250] spatstat.sparse_3.0-3 spatstat.explore_3.2-5 uuid_1.1-1
[253] fansi_1.0.6 base64url_1.4 beeswarm_0.4.0
```
-
Help
Description
I think that after one of the recent updates to the ecosystem, targets started stuttering: it crashes with the error below, but if I restart it over and over again, it eventually completes the pipeline a few tasks at a time.
This is the resource configuration I use:

```r
crew.cluster::crew_controller_slurm(
  name = "slurm",
  slurm_memory_gigabytes_per_cpu = 50,
  slurm_cpus_per_task = 1,
  workers = 100,
  verbose = TRUE
)
```
For completeness, the pipeline is as follows.
This seems to be pipeline-independent, as other pipelines have also started to show this behaviour.