data.table::transpose applied to lists changes element types #5639

Closed
MLopez-Ibanez opened this issue May 19, 2023 · 1 comment · Fixed by #5805
Comments

@MLopez-Ibanez
Contributor

Applying data.table::transpose to a data.table that contains both numeric and string columns converts every value to a string. By comparison, purrr::transpose preserves the original types by creating a list of lists. Note that converting a list of lists to a list of vectors (that is, getting the output of data.table::transpose from the output of purrr::transpose) is very easy, but going the other way is very hard.
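To illustrate the "easy" direction, here is a minimal base R sketch. The `rows` object is a hand-written stand-in for the kind of list of lists that purrr::transpose returns (an assumption made so the snippet runs without any packages). Flattening each row with `unlist` coerces mixed types to character, which matches the shape data.table::transpose produces.

```r
# Hypothetical stand-in for a list of lists (one inner list per row).
rows <- list(
  list(num = 0.5, char = "a"),
  list(num = 1.5, char = "b")
)

# Collapse each row into an atomic vector. Because a vector can hold only
# one type, the numeric value is coerced to character alongside the string.
vecs <- lapply(rows, function(r) unlist(r, use.names = FALSE))

str(vecs)
```

The reverse is not possible from `vecs` alone, since the original types are lost once everything has become a string.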

# Minimal reproducible example; please be sure to set verbose=TRUE where possible!

library(data.table)
dt <- data.table(num=runif(3), char = letters[1:3])
str(data.table::transpose(dt)) # I want a data.table, so fine.
# Classes ‘data.table’ and 'data.frame':	2 obs. of  3 variables:
#  $ V1: chr  "0.21412930265069" "a"
#  $ V2: chr  "0.0449177662376314" "b"
#  $ V3: chr  "0.299894759198651" "c"
#  - attr(*, ".internal.selfref")=<externalptr> 
str(data.table::transpose(as.list(dt))) # I want a list, so it should be a list of lists.
# List of 3
#  $ : chr [1:2] "0.21412930265069" "a"
#  $ : chr [1:2] "0.0449177662376314" "b"
#  $ : chr [1:2] "0.299894759198651" "c"
library(purrr)
str(purrr::transpose(dt)) # Exactly right!
# List of 3
#  $ :List of 2
#   ..$ num : num 0.214
#   ..$ char: chr "a"
#  $ :List of 2
#   ..$ num : num 0.0449
#   ..$ char: chr "b"
#  $ :List of 2
#   ..$ num : num 0.3
#   ..$ char: chr "c"

This is a very frequent operation, which is currently very hard to do with data.table. See

https://stackoverflow.com/questions/3492379/data-frame-rows-to-a-list
https://stackoverflow.com/questions/27784076/r-data-table-to-list-of-rows-better-ways
https://stackoverflow.com/questions/41005943/turn-all-data-table-rows-into-a-list-r
https://stackoverflow.com/questions/56901903/convert-a-data-table-to-list-of-rows-that-are-data-tables-and-apply-a-function-t

(and more)
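For reference, one base R idiom sometimes used to get a type-preserving list of rows is sketched below. It is a workaround under stated assumptions, not a data.table feature: the example uses a plain data.frame with fixed values so it runs with base R alone, and it assumes that a data.table, being a list of equal-length columns, would behave the same way.

```r
# Base R only; a data.frame stands in for the data.table (assumption).
df <- data.frame(num = c(0.5, 1.5, 2.5), char = letters[1:3],
                 stringsAsFactors = FALSE)

# Map() walks the columns in parallel and calls list() once per row,
# so each row becomes a list whose elements keep their column types.
# unname() drops any names Map() might derive from the first column.
rows <- unname(do.call(Map, c(f = list, df)))

str(rows[[1]])
```

Note that this breaks if a column happens to be named `f`, because that name would collide with the function argument of `Map`.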

@MLopez-Ibanez
Contributor Author

The solution often recommended on Stack Overflow (and used by data.table:::split.data.table):

lapply(seq_len(nrow(dt)), function(ind) as.list(dt[ind]))

is even slower than the same operation on a data.frame (around 5 times slower on my computer).
