Fix transform slowness #5493

Merged on Jan 6, 2024 (7 commits)
Changes from 5 commits
3 changes: 3 additions & 0 deletions NEWS.md
@@ -550,6 +550,9 @@

 53. `as.data.frame(DT, row.names=)` no longer silently ignores `row.names`, [#5319](https://github.com/Rdatatable/data.table/issues/5319). Thanks to @dereckdemezquita for the fix and PR, and @ben-schwen for guidance.

+
+54. `transform` was extremely slow when creating new columns. Thanks to @OfekShilon for the report and PR. The implemented solution was proposed by @ColeMiller1.
+
 ## NOTES

 1. New feature 29 in v1.12.4 (Oct 2019) introduced zero-copy coercion. Our thinking is that requiring you to get the type right in the case of `0` (type double) vs `0L` (type integer) is too inconvenient for you the user. So such coercions happen in `data.table` automatically without warning. Thanks to zero-copy coercion there is no speed penalty, even when calling `set()` many times in a loop, so there's no speed penalty to warn you about either. However, we believe that assigning a character value such as `"2"` into an integer column is more likely to be a user mistake that you would like to be warned about. The type difference (character vs integer) may be the only clue that you have selected the wrong column, or typed the wrong variable to be assigned to that column. For this reason we view character to numeric-like coercion differently and will warn about it. If it is correct, then the warning is intended to nudge you to wrap the RHS with `as.<type>()` so that it is clear to readers of your code that a coercion from character to that type is intended. For example:
23 changes: 4 additions & 19 deletions R/data.table.R
@@ -2291,25 +2291,10 @@ transform.data.table = function (`_data`, ...)
 # basically transform.data.frame with data.table instead of data.frame, and retains key
 {
   if (!cedta()) return(NextMethod()) # nocov
-  e = eval(substitute(list(...)), `_data`, parent.frame())
-  tags = names(e)
-  inx = chmatch(tags, names(`_data`))
-  matched = !is.na(inx)
-  if (any(matched)) {
-    .Call(C_unlock, `_data`) # fix for #1641, now covered by test 104.2
-    `_data`[,inx[matched]] = e[matched]
-    `_data` = as.data.table(`_data`)
-  }
-  if (!all(matched)) {
-    ans = do.call("data.table", c(list(`_data`), e[!matched]))
-  } else {
-    ans = `_data`
-  }
-  key.cols = key(`_data`)
-  if (!any(tags %chin% key.cols)) {
-    setattr(ans, "sorted", key.cols)
-  }
-  ans
+  `_data` = copy(`_data`)
+  e = eval(substitute(list(...)), `_data`, parent.frame())
+  set(`_data`, ,names(e), e)
+  `_data`
 }

 subset.data.table = function (x, subset, select, ...)
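
The new `transform.data.table` body copies the input once and then assigns every evaluated expression in a single `set()` call. A minimal sketch of that pattern outside the package, assuming `data.table` is installed; `transform_sketch` is a hypothetical name used only for illustration, and the `cedta()` check is left out:

```r
library(data.table)

# Hypothetical stand-in for the new method body; not the package's own function.
transform_sketch = function(dt, ...) {
  dt = copy(dt)  # work on a copy so the caller's table is left untouched
  # Evaluate every argument with the table's columns in scope.
  e = eval(substitute(list(...)), dt, parent.frame())
  # Overwrite existing columns and add new ones in one call, by reference on the copy.
  set(dt, j = names(e), value = e)
  dt
}

DT = data.table(x = 1:3, y = 4:6)
out = transform_sketch(DT, z = x + y, x = x * 2L)
```

All expressions are evaluated before any assignment, so `z` here is computed from the original `x`, and `DT` itself keeps only its original columns.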
2 changes: 1 addition & 1 deletion R/utils.R
@@ -97,7 +97,7 @@ name_dots = function(...) {
   if (any(notnamed)) {
     syms = vapply_1b(dot_sub, is.symbol) # save the deparse() in most cases of plain symbol
     for (i in which(notnamed)) {
-      tmp = if (syms[i]) as.character(dot_sub[[i]]) else deparse(dot_sub[[i]])[1L]
+      tmp = if (syms[i]) as.character(dot_sub[[i]]) else deparse(dot_sub[[i]], nlines=1)[1L]
       if (tmp == make.names(tmp)) vnames[i]=tmp
     }
   }
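
The `nlines=1` argument asks `deparse()` to stop after the first output line, rather than deparsing the whole expression and discarding everything but element `[1L]`. A small base R sketch of the difference, using a made-up long call named `long_call`:

```r
# Build a call long enough that deparse() has to wrap it over many lines.
long_call = as.call(c(list(as.name("c")), as.list(seq_len(1000L))))

full  = deparse(long_call)               # deparses the entire call
first = deparse(long_call, nlines = 1L)  # stops once one line has been produced

# The three checks below compare the two results.
length(full) > 1L
length(first) == 1L
identical(full[1L], first)
```

Only the first line is ever used by `name_dots`, so capping the output avoids work that grows with the size of the unnamed argument.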
4 changes: 1 addition & 3 deletions man/transform.data.table.Rd
@@ -7,9 +7,7 @@
 \description{
 Utilities for \code{data.table} transformation.

-\strong{\code{transform} by group is particularly slow. Please use \code{:=} by group instead.}
-
-\code{within}, \code{transform} and other similar functions in \code{data.table} are not just provided for users who expect them to work, but for non-data.table-aware packages to retain keys, for example. Hopefully the (much) faster and more convenient \code{data.table} syntax will be used in time. See examples.
+\code{within}, \code{transform} and other similar functions in \code{data.table} are not just provided for users who expect them to work, but for non-data.table-aware packages to retain keys, for example. Hopefully the faster and more convenient \code{data.table} syntax will be used in time. See examples.
 }
 \usage{
 \method{transform}{data.table}(`_data`, \ldots)
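
The description above points to `data.table` syntax as the faster alternative to `transform`. As a hedged illustration (the table `DT` and its columns `g` and `v` are invented for this sketch, and `data.table` is assumed to be installed), the calls below add the same column first with `transform` and then with `:=`, followed by `:=` by group:

```r
library(data.table)

DT = data.table(g = c("a", "a", "b"), v = c(1, 2, 3))

# transform() returns a modified copy; DT itself is left unchanged.
DT2 = transform(DT, w = v * 2)

# := adds the column to DT by reference instead of returning a copy.
DT[, w := v * 2]

# := by group: each row receives the sum of v within its group g.
DT[, total := sum(v), by = g]
```

After these calls `DT2` and `DT` hold the same `w` column, and `DT` additionally carries the per-group `total`.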