Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

terra::predict() fails with NAs in input, and multiple returns #998

Closed
lucas-johnson opened this issue Feb 3, 2023 · 10 comments
Closed

Comments

@lucas-johnson
Copy link

As always, thanks for such a great tool. I use it almost every day.

This is pretty odd, and seems to be related toNAs in the input data and a predict function that would result in multiple output bands.

library(terra)
#> Warning: package 'terra' was built under R version 4.1.2
#> terra 1.7.3

a <- rast(ncol=1000, nrow=1000, xmin=0, names="a")
values(a) <- c(10, rep(NA, 999998), 100)
b <- rast(ncol=1000, nrow=1000, xmin=0, names="b")
values(b) <- c(1, rep(NA, 999998), 2)
test <- rast(list(a, b))
dummy_model <- list()


terra::predict(
  test, 
  dummy_model, 
  fun = function(object, newdata) { 
    return(data.frame(x = newdata |> rowSums() |> as.vector(), 
                      y = newdata |> rowSums() |> as.vector() + 5,
                      z = newdata |> rowSums() |> as.vector() * 5))
  }, 
  na.rm = FALSE,
  filename = paste0(tempfile(), '.tiff')
)
#> class       : SpatRaster 
#> dimensions  : 1000, 1000, 3  (nrow, ncol, nlyr)
#> resolution  : 0.18, 0.18  (x, y)
#> extent      : 0, 180, -90, 90  (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 
#> source      : file10ccdf3c3993.tiff 
#> names       :   x,   y,   z 
#> min values  :  11,  16,  55 
#> max values  : 102, 107, 510

terra::predict(
  test, 
  dummy_model, 
  fun = function(object, newdata) { 
    return(data.frame(x = newdata |> rowSums() |> as.vector()))
  }, 
  na.rm = TRUE,
  filename = paste0(tempfile(), '.tiff')
)
#> Warning: [predict] Cannot determine the number of output variables. Assuming 1.
#> Use argument 'index' to set it manually
#> class       : SpatRaster 
#> dimensions  : 1000, 1000, 1  (nrow, ncol, nlyr)
#> resolution  : 0.18, 0.18  (x, y)
#> extent      : 0, 180, -90, 90  (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 
#> source      : file10ccd3ae7e9b5.tiff 
#> name        : lyr1 
#> min value   :   11 
#> max value   :  102

terra::predict(
  test, 
  dummy_model, 
  fun = function(object, newdata) { 
    return(data.frame(x = newdata |> rowSums() |> as.vector(), 
                      z = newdata |> rowSums() |> as.vector() * 5))
  }, 
  na.rm = TRUE,
  filename = paste0(tempfile(), '.tiff')
)
#> Warning: [predict] Cannot determine the number of output variables. Assuming 1.
#> Use argument 'index' to set it manually
#> Error: [predict] the number of values returned by the predict function does not match the input.

Created on 2023-02-02 with reprex v2.0.2

@lucas-johnson lucas-johnson changed the title Terra predict fails with NAs in input, and multiple returns terra::predict() fails with NAs in input, and multiple returns Feb 3, 2023
rhijmans added a commit that referenced this issue Feb 3, 2023
@rhijmans
Copy link
Member

rhijmans commented Feb 3, 2023

Thank you. What happens it that terra needs to find out the shape of the output data. It does that with a sample. In an extreme case like this, the sample will only have NAs and then with na.rm=TRUE there is nothing to test with. random numbers are now used in that case. That will probably work for most models.

You should have been able to use index = 1:2 (for a two layer output), but that had a bug that is now also fixed.

The below works now:

library(terra)
a <- rast(ncol=1000, nrow=1000, xmin=0, names="a")
values(a) <- c(10, rep(NA, 999998), 100)
b <- rast(ncol=1000, nrow=1000, xmin=0, names="b")
values(b) <- c(1, rep(NA, 999998), 2)
test <- rast(list(a, b))
dummy_model <- list()

pfun1 <- function(object, newdata) { data.frame(x=rowSums(newdata), y=rowSums(newdata)+5, z=rowSums(newdata)*5) }
pfun2 <- function(object, newdata) { data.frame(x=rowSums(newdata), y=rowSums(newdata)+5) } 
pfun3 <- function(object, newdata) { data.frame(x=rowSums(newdata)) }
  
p1 <- terra::predict(test, dummy_model, fun=pfun1, na.rm=TRUE)
p2 <- terra::predict(test, dummy_model, fun=pfun2, na.rm=TRUE)
p3 <- terra::predict(test, dummy_model, fun=pfun3, na.rm=TRUE)

p1 <- terra::predict(test, dummy_model, fun=pfun1, na.rm=FALSE)
p2 <- terra::predict(test, dummy_model, fun=pfun2, na.rm=FALSE)
p3 <- terra::predict(test, dummy_model, fun=pfun3, na.rm=FALSE)

@lucas-johnson
Copy link
Author

Thanks for the quick fix!

@lucas-johnson
Copy link
Author

reprex.zip

@lucas-johnson
Copy link
Author

I'm hitting this issue (or similar) again with some odd shaped data in my predict pipeline. There are quite a few layers in the raster here, and I couldn't quite pin down exactly what is strange about this raster to produce a synthetic reprex so I just supplied the data as is. Hopefully it's not impossible for you to track down what's going on here. Thank you!

library(terra)
#> Warning: package 'terra' was built under R version 4.1.3
#> terra 1.7.6
x <- tempfile()
download.file('https://github.com/rspatial/terra/files/10580924/reprex.zip', x)
x <- unzip(x)
x <- rast(x)
as.data.frame(x) |> na.omit() |> nrow()
#> [1] 4680

p_fun <- function(object, newdata, ...) {
  return(data.frame(x = rep(1, nrow(newdata)), y = rep(0, nrow(newdata))))
}

terra::predict(x, list(), fun = p_fun, na.rm = TRUE)
#> Error: [predict] the number of values returned by the predict function does not match the input.

Created on 2023-02-03 with reprex v2.0.2

rhijmans added a commit that referenced this issue Feb 3, 2023
@rhijmans
Copy link
Member

rhijmans commented Feb 3, 2023

Thanks very much for reporting that and your patience. This happened because the check for NA values was did not consider variation between layers. In this case the ecozones layer has many NAs where other layers do not. Fixed now.

terra::predict(x, list(), fun = p_fun, na.rm = TRUE)
#class       : SpatRaster 
#dimensions  : 333, 333, 2  (nrow, ncol, nlyr)
#resolution  : 30, 30  (x, y)
#extent      : 1816965, 1826955, 2147295, 2157285  (xmin, xmax, ymin, ymax)
#coord. ref. : +proj=aea +lat_0=23 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs 
#source(s)   : memory
#names       : x, y 
#min values  : 1, 0 
#max values  : 1, 0 

With this example you could of course have used na.rm=FALSE, so I assume that the model you actually use cannot handle NAs.

@lucas-johnson
Copy link
Author

Awesome, thank you! Yes, I'm using a few models but I think the ranger implementation of random forest in particular doesn't like NAs. So na.rm=TRUE is indeed necessary.

@rhijmans
Copy link
Member

rhijmans commented Feb 3, 2023

That is correct. "ranger" has a major design flaw. Instead of NA it returns nothing for records that have NAs.

@lucas-johnson
Copy link
Author

reprex_2.zip

@lucas-johnson
Copy link
Author

lucas-johnson commented Feb 4, 2023

Here's another example (I promise I'm not trying to find these!) where most layers in the raster stack are fully populated but a few are very sparse. Thanks so much for sorting all of this out – it makes my life much much easier when terra handles NAs and these oddly shaped raster stacks.

library(terra)
#> Warning: package 'terra' was built under R version 4.1.3
#> terra 1.7.6

x <- tempfile()
download.file('https://github.com/rspatial/terra/files/10608770/reprex_2.zip', 
              x)
x <- unzip(x)
x <- rast(x)

as.data.frame(x) |> na.omit() |> nrow()
#> [1] 155
terra::ncell(x)
#> [1] 111222

p_fun_bad <- function(object, newdata, ...) {
  return(data.frame(x = rep(1, nrow(newdata)), y = rep(0, nrow(newdata))))
}

p_fun_good <- function(object, newdata, ...) {
  return(data.frame(x = rep(1, nrow(newdata))))
}


terra::predict(x, list(), fun = p_fun_good, na.rm = TRUE)
#> class       : SpatRaster 
#> dimensions  : 333, 334, 1  (nrow, ncol, nlyr)
#> resolution  : 30, 30  (x, y)
#> extent      : 1866945, 1876965, 2217285, 2227275  (xmin, xmax, ymin, ymax)
#> coord. ref. : +proj=aea +lat_0=23 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs 
#> source(s)   : memory
#> name        : lyr1 
#> min value   :    1 
#> max value   :    1
terra::predict(x, list(), fun = p_fun_bad, na.rm = FALSE)
#> class       : SpatRaster 
#> dimensions  : 333, 334, 2  (nrow, ncol, nlyr)
#> resolution  : 30, 30  (x, y)
#> extent      : 1866945, 1876965, 2217285, 2227275  (xmin, xmax, ymin, ymax)
#> coord. ref. : +proj=aea +lat_0=23 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs 
#> source(s)   : memory
#> names       : x, y 
#> min values  : 1, 0 
#> max values  : 1, 0
terra::predict(x, list(), fun = p_fun_bad, na.rm = TRUE)
#> Error in m[i, ] <- r: number of items to replace is not a multiple of replacement length

Created on 2023-02-04 with reprex v2.0.2

rhijmans added a commit that referenced this issue Feb 4, 2023
@rhijmans
Copy link
Member

rhijmans commented Feb 4, 2023

Thanks, fixed now. This was the corner case of having just one cell without NAs in the test data.

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants