Merge pull request #148 from birdflow-science/preprocess-2022
* Add support for ebirdst 3.2022.0
* Change format of data frame returned by get_dates()
* Support ebird 2022 date scheme when using 2022 models.
ethanplunkett committed Dec 11, 2023
2 parents 43beefa + 152f824 commit 97dfdf6
Showing 52 changed files with 1,233 additions and 277 deletions.
4 changes: 3 additions & 1 deletion .lintr
Original file line number Diff line number Diff line change
@@ -1,4 +1,6 @@
linters: linters_with_defaults(cyclocomp_linter = NULL) # drop cyclocomp linter
linters: linters_with_defaults(
cyclocomp_linter = NULL,
indentation_linter = NULL)
encoding: "UTF-8"
exclusions: list( # dropping rendered version of vignettes, .Rmd still checked
"vignettes/BirdFlowR.R",
94 changes: 90 additions & 4 deletions NEWS.md
@@ -1,8 +1,94 @@
# BirdFlowR 0.1.0.9039
2023-11-21*
# BirdFlowR 0.1.0.9040
2023-12-05

## Support for **ebirdst** 3.2022.0 added.
**BirdFlowR** can now fit models based on eBird 2022 data or 2021 data and
will preprocess using whichever version of **ebirdst** is loaded.
Both types of fitted models can be used with BirdFlowR.
Most testing will run with either version of **ebirdst** installed but CRAN
checks will only pass with the new version due to references to objects that
don't exist in the old version.
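
As a rough illustration of the version switch described above, the sketch below compares a version string against the 3.2022.0 threshold. The helper name `uses_2022_api()` is hypothetical and not part of **BirdFlowR**; the `NA` guard is an added assumption for the case where **ebirdst** is not installed.

```r
# Hypothetical helper (not part of BirdFlowR): TRUE when the supplied
# ebirdst version is new enough to use the 2022 API.
uses_2022_api <- function(ver) {
  # Assumption: fail early on a missing version (NA) instead of letting
  # a later `if (NA)` stop with a less helpful message.
  if (length(ver) != 1 || is.na(ver))
    stop("ebirdst does not appear to be installed.")
  # package_version objects can be compared directly with strings
  package_version(ver) >= "3.2022.0"
}

uses_2022_api("2.2021.3")
uses_2022_api("3.2022.0")
```

In practice the argument would come from something like `utils::packageVersion("ebirdst")`, which already returns a comparable version object.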

## Breaking
* `get_dates()`
* Models fit using **ebirdst** 2.2021 have `$dates` columns:
"interval", "date", "midpoint", "start", "end", "doy", and "week".
* Models fit with **ebirdst** 3.2022 have `$dates` columns:
"timestep", "date", "label", "julian", "week".
* Regardless of the `$dates` format in the model object `get_dates()`
returns the newer columns: "timestep", "date", "label", "julian", "week".
Previously it returned the older columns.
* Replacing `bf$dates` with `get_dates()` and adapting to the new column
names is recommended.

* Inconsistent weeks. **eBird** changed the way dates are assigned to weeks
in the 2022 version. See notes in `get_dates()` for details.
**BirdFlowR** honors the date scheme used
in the eBird data each model was built with, and thus some dates will be
assigned to a different week with a 2021 model than they are with
a 2022 model. This will affect `lookup_timestep()`,
`lookup_timestep_sequence()`, and the many functions that rely on them.

* Importing (old) BirdFlow models fit without a dynamic mask is no longer
supported. Predicting with them is. Use BirdFlowR 0.1.0.39 if you want to
import an old hdf5 file which could then be saved with `saveRDS()`.
Dynamic masks were added in 0.1.0.9001 (April 2023) so only models fit
(but not imported) before then will be affected.
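
The week-assignment change noted under "Inconsistent weeks" can be seen with a small standalone sketch. The two functions below are adapted from the internal `date_to_week()` added in this commit; the names `week_2021()` and `week_2022()` are illustrative only and are not **BirdFlowR** functions.

```r
# Standalone copies of the two eBird week schemes, adapted from the
# internal date_to_week(); illustrative names, not package functions.

# 2021 and earlier: the year is split into 52 equal fractions of 366 days
week_2021 <- function(dates) {
  breaks <- seq(from = 0, to = 1, length.out = 52 + 1)
  days <- (as.POSIXlt(as.Date(dates))$yday + 0.5) / 366
  findInterval(days, breaks)
}

# 2022 and later: fixed 7 day weeks, with week 52 absorbing the remainder
week_2022 <- function(dates) {
  breaks <- c(-Inf, seq(7.5, 357.5, 7), Inf)
  julian <- as.POSIXlt(as.Date(dates))$yday + 1
  findInterval(julian, breaks)
}

d <- c("2023-01-01", "2023-04-09", "2023-12-31")
data.frame(date = d, week_2021 = week_2021(d), week_2022 = week_2022(d))
```

With these definitions 2023-04-09 falls in week 14 under the 2021 scheme and week 15 under the 2022 scheme, while 1 January and 31 December map to weeks 1 and 52 under both.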

## New

* `preprocess_species()` works with both **ebirdst** 3.2022.0 and
2.2021.3. It will use whichever version is loaded and models fit with
either eBird version year can be used with BirdFlow.

* New metadata items
* `birdflowr_preprocess_version`: the version of **BirdFlowR** used for
preprocessing.
* `ebirdst_version`: the **ebirdst** version used while preprocessing.

## Updates

* A number of internal changes were made to `preprocess_species()`
to work with **ebirdst** v. 3.2022
* `species` can be set to either `"example_data"` or `"yebsap-example"` to
trigger use of the **ebirdst** example data. **BirdFlowR** will silently
switch between the two to accommodate **ebirdst**.
* If **ebirdst** version >= 3.2022 the quality of the species model is
checked using the `<x>_season_quality` values instead of the dropped
`<x>_range_modeled` information and an error is thrown if any value is
less than the new `min_season_quality` argument which defaults to `3`.
With **ebirdst** 2.2021 quality is still checked with `<x>_range_modeled`.
* With **ebirdst** >= 3.2022 all the new trends columns are dropped from
`ebirdst_runs` when creating the species data. The data available via
`species_info()` (and `$species`) is unchanged.
* A new `dates` format is now used with 2022 models.

* Date lookup code was overhauled throughout the package.
* Most use of `$dates` was dropped in favor of `get_dates()` to handle the
two date formats in use.

* `preprocess_species()` snapshot tests were updated to use eBird 2022 derived
snapshots and are skipped if older versions of **ebirdst** are loaded, but
most `preprocess_species()` tests are still run.

* Several internal functions documented in `ebirdst-compatability` help
insulate **BirdFlowR** from the changes in the **ebirdst** API and facilitate
working with both versions.

* Updated tests for new `amewoo` model in **BirdFlowModels** (R package)* 0.0.2.9002
* Now depends on
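
The resolution-label translation handled by the compatibility helpers can be sketched without **ebirdst** installed. This is an illustrative re-implementation of the crosswalk in `res_label()`; the function name `convert_res_label()` and its `new_api` argument are assumptions that stand in for the installed-version lookup.

```r
# Illustrative re-implementation of the label crosswalk. `new_api` is an
# assumed argument replacing the check of the installed ebirdst version.
convert_res_label <- function(res, new_api = TRUE) {
  crosswalk <- data.frame(v2021 = c("lr", "mr", "hr"),
                          v2022 = c("27km", "9km", "3km"),
                          stringsAsFactors = FALSE)
  valid <- if (new_api) "v2022" else "v2021"
  other <- if (new_api) "v2021" else "v2022"

  # Translate any labels written in the other scheme
  swap <- res %in% crosswalk[[other]]
  res[swap] <- crosswalk[[valid]][match(res[swap], crosswalk[[other]])]

  # Anything still unrecognized is an error
  if (!all(res %in% crosswalk[[valid]]))
    stop("res should be one of ",
         paste(unlist(crosswalk), collapse = ", "))
  res
}

convert_res_label(c("lr", "9km"))           # labels for the 2022 API
convert_res_label("27km", new_api = FALSE)  # label for the 2021 API
```

Accepting both spellings on input lets calling code keep using the older two-letter labels regardless of which API is in play.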
# BirdFlowR 0.1.0.9039
2023-11-21

Updated tests to work with new example data and ebirdst 3.2022.0

* Updated tests for new `amewoo` model in **BirdFlowModels**
(R package) v. 0.0.2.9002
* Added formal dependency on BirdFlowModels >= 0.0.2.9002.
* Updates to pass CRAN checks but not preprocess with ebirdst 3.2022.0
* Added ebird/ebirdst to remotes (to force installing dev version)
Revert this after changes in ebirdst 47bbdfc87 are on CRAN
* Add skip_if_unsupported_ebirdst_version() to preprocess species tests
as it currently does for ebirdst 3.2022.0.
* Add copy of `ebirdst_weeks` to BirdFlowR as internal data.

# BirdFlowR 0.1.0.9038
2023-11-16
2 changes: 1 addition & 1 deletion R/animate_distr.R
@@ -66,7 +66,7 @@
#' }
animate_distr <- function(distr, bf, title = species(bf), ...) {

p <- plot_distr(distr, bf, ...)
p <- plot_distr(distr, bf, ...)

# Drop faceting and add animation
a <- p +
2 changes: 1 addition & 1 deletion R/as_transitions.R
@@ -9,5 +9,5 @@
as_transitions <- function(timesteps, bf) {
return(paste0("T_", pad_timestep(timesteps[-length(timesteps)], bf),
"-",
pad_timestep(timesteps[-1], bf)))
pad_timestep(timesteps[-1], bf)))
}
2 changes: 1 addition & 1 deletion R/birdflow_crs.R
@@ -20,7 +20,7 @@
#' crs(birdflow_crs, proj = TRUE)
#'
birdflow_crs <-
'PROJCRS["Western Mollweide",
'PROJCRS["Western Mollweide",
BASEGEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
25 changes: 17 additions & 8 deletions R/determine_resolution.R
@@ -37,9 +37,11 @@
#' The last step is to round up (reducing parameters) to a cleaner number.
#'
#' @inheritParams preprocess_species
#' @param sp_path The species path used with \pkg{ebirdst} to download and load data
#' @param download_species The species code used with \pkg{ebirdst} this might be
#' "example_data" but otherwise will be a real species code.
#' @param sp_path The species path used with \pkg{ebirdst} to download and load
#' data
#' @param download_species The species code used with \pkg{ebirdst} this might
#' be "example_data" or "yebsap-example" but otherwise will be a real
#' species code.
#' @param project_method This is the method used to reproject it is a local
#' variable set within `preprocess_species`.
#'
@@ -57,6 +59,7 @@ determine_resolution <- function(sp_path,
project_method) {



verbose <- birdflow_options("verbose")
max_param_per_gb <- birdflow_options("max_param_per_gpu_gb")

@@ -79,8 +82,17 @@ determine_resolution <- function(sp_path,
cat("Calculating resolution\n")
# Load low res abundance data and calculate total areas birds occupy at any
# time (active_sq_m)
if (ebirdst_pkg_ver() < "3.2022.0") {
abunds <- ebirdst::load_raster("abundance",
path = sp_path, resolution = "lr")
path = sp_path,
resolution = res_label("lr"))

} else {
abunds <- ebirdst::load_raster(species = download_species,
product = "abundance",
resolution = res_label("lr"))

}

# Treat NA values as zeros - this better reflects what they actually are
v <- terra::values(abunds)
@@ -95,9 +107,6 @@ determine_resolution <- function(sp_path,
mask[is.na(mask)] <- FALSE
abunds <- terra::mask(abunds, clip2)




if (verbose) {
# Calculate percent of density lost
# will print after printing the resolved resolution
@@ -197,7 +206,7 @@ determine_resolution <- function(sp_path,
}

# With example data, force resolution to be at least 30
if (download_species == "example_data" && res < 30) {
if (download_species %in% c("example_data", "yebsap-example") && res < 30) {
if (verbose)
cat("Resolution forced to 30 for example data,",
"which only has low resolution images\n")
108 changes: 108 additions & 0 deletions R/ebirdst_compatibility.R
@@ -0,0 +1,108 @@
#' @title ebirdst version compatibility functions:
#'
#' @description
#' Internal functions to facilitate working with both the 2021 and 2022
#' versions of \pkg{ebirdst} despite the significant changes to the API.
#'
#' @name ebirdst-compatability
NULL

#' @section ebirdst_pkg_ver:
#' `ebirdst_pkg_ver()` Look up the version of the currently installed
#' \pkg{ebirdst}.
#' @return `ebirdst_pkg_ver()`: The installed \pkg{ebirdst} package version or
#' `NA` if none.
#' @keywords internal
#' @rdname ebirdst-compatability
ebirdst_pkg_ver <- function() {
res <- tryCatch(utils::packageVersion("ebirdst"), error = identity)
if (inherits(res, "error"))
return(NA)
res
}


#' @section res_label():
#' Convert resolution labels so they are appropriate for the
#' installed \pkg{ebirdst}
#'
#' \pkg{ebirdst} 3.2022.0 switched from "lr", "mr", and "hr" to
#' "27km", "9km", and "3km" to indicate low, medium, and high resolution
#' versions of the raster data in function arguments.
#' [preprocess_species()] uses the older two letter
#' versions but runs them through this function before calling \pkg{ebirdst}
#' functions.
#'
#' @param res A resolution label. One of "lr", "mr", "hr", "27km", "9km", or
#' "3km".
#'
#' @return `res_label()`: resolution labels appropriate for installed version
#' of \pkg{ebirdst}.
#' @rdname ebirdst-compatability
#' @keywords internal
res_label <- function(res) {
crosswalk <- data.frame("v2021" = c("lr", "mr", "hr"),
"v2022" = c("27km", "9km", "3km"))

if (ebirdst_pkg_ver() < "3.2022.0") {
alt_col <- "v2022"
valid_col <- "v2021"
} else {
alt_col <- "v2021"
valid_col <- "v2022"
}
sv <- res %in% crosswalk[[alt_col]]
res[sv] <- crosswalk[[valid_col]][match(res[sv], crosswalk[[alt_col]])]
if (!all(res %in% crosswalk[[valid_col]]))
stop("res should be one of ", paste(c(crosswalk[[alt_col]],
crosswalk[[valid_col]]),
collapse = ", "))
res
}

#' @section date_to_week():
#'
#' This is a slightly modified copy of `ebirdst::date_to_st_week()` that
#' allows calculating weeks from dates without depending on \pkg{ebirdst}.
#'
#' @param dates a vector of dates that can be processed by `as.POSIXlt()`
#' @param version A numeric (year) version of the eBird date scheme to use.
#' 2021 for the older or 2022 for the newer; other values will be snapped to
#' the closest of those two. The output of
#' `ebirdst::ebirdst_version()$version_year` or
#' `get_metadata(bf, "ebird_version_year")` is appropriate.
#' @rdname ebirdst-compatability
#' @return `date_to_week()`: A vector of week numbers associated with `dates`
#' @keywords internal
date_to_week <- function(dates, version = 2022) {
stopifnot(is.numeric(version),
length(version) == 1,
!is.na(version))

# Old scheme 2021 and earlier ebirdst_version_years
if (version <= 2021.5) {
dv <- seq(from = 0, to = 1, length.out = 52 + 1)
days <- (as.POSIXlt(dates)$yday + 0.5) / 366
return(findInterval(days, dv))
}

# New ebirdst v 3.2022.0 scheme, 2022 and later ebirdst_version_years
breaks <- c(-Inf, seq(7.5, 357.5, 7), Inf)
jd <- as.POSIXlt(dates)$yday + 1
return(findInterval(jd, breaks))
}

#' @section ebirdst_example_species():
#' Lookup the example species name that is appropriate for the
#' installed \pkg{ebirdst}. The example species changed
#' from "example_data" to "yebsap-example" in version 3.2022.0.
#' @return `ebirdst_example_species()`: The example species name for
#' \pkg{ebirdst}
#' @keywords internal
#' @rdname ebirdst-compatability
ebirdst_example_species <- function(){
ifelse(ebirdst_pkg_ver() < "3.2022.0",
"example_data",
"yebsap-example")
}

3 changes: 2 additions & 1 deletion R/evaluate_performance.R
@@ -4,7 +4,8 @@
#' DEPRECATED FUNCTION. Please use [distribution_performance()] instead.
#'
#' Calculate the correlation between projected distributions and
#' the eBird Status and Trends (S&T) distributions used to train the BirdFlow model.
#' the eBird Status and Trends (S&T) distributions used to train the BirdFlow
#' model.
#'
#' @details "Training distribution" is used to describe the eBird S&T
#' distributions used to train the BirdFlow models. "Marginal distribution"
2 changes: 2 additions & 0 deletions R/find_dead_ends.R
@@ -1,3 +1,4 @@
# nolint start: line_length_linter
#' @rdname fix_dead_ends
#' @title find and fix inconsistencies in sparse BirdFlow models
#'
@@ -48,6 +49,7 @@
#' [sparsify()] calls `fix_dead_ends()`, which in turn calls
#' `find_dead_ends()` and [fix_current_dead_ends()].
#' @keywords internal
# nolint end
find_dead_ends <- function(x) {
if (! has_marginals(x)) {
stop("x lacks marginals can't find dead ends.")