diff --git a/DESCRIPTION b/DESCRIPTION index 70ba0d1..8df9eea 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -37,7 +37,7 @@ Suggests: asnipe, markdown SystemRequirements: GEOS (>= 3.2.0) -RoxygenNote: 7.2.1 +RoxygenNote: 7.2.2 VignetteBuilder: knitr Roxygen: list(markdown = TRUE) BugReports: https://github.com/ropensci/spatsoc/issues diff --git a/R/edge_dist.R b/R/edge_dist.R index 6411dd5..daf956d 100644 --- a/R/edge_dist.R +++ b/R/edge_dist.R @@ -13,20 +13,25 @@ #' \code{data.frame}, you can convert it by reference using #' \code{\link[data.table:setDT]{data.table::setDT}}. #' -#' The \code{id}, \code{coords} (and optional \code{timegroup} and -#' \code{splitBy}) arguments expect the names of a column in \code{DT} which -#' correspond to the individual identifier, X and Y coordinates, timegroup -#' (generated by \code{group_times}) and additional grouping columns. +#' The \code{id}, \code{coords}, \code{timegroup} (and optional \code{splitBy}) +#' arguments expect the names of a column in \code{DT} which correspond to the +#' individual identifier, X and Y coordinates, timegroup (generated by +#' \code{group_times}) and additional grouping columns. #' #' If provided, the \code{threshold} must be provided in the units of the coordinates and must be larger than 0. #' If the \code{threshold} is NULL, the distance to all other individuals will be returned. The coordinates must be planar #' coordinates (e.g.: UTM). In the case of UTM, a \code{threshold} = 50 would #' indicate a 50m distance threshold. #' -#' The \code{timegroup} argument is optional, but recommended to pair with -#' \code{\link{group_times}}. The intended framework is to group rows temporally -#' with \code{\link{group_times}} then spatially with \code{edge_dist} (or -#' grouping functions). +#' The \code{timegroup} argument is required to define the temporal groups +#' within which edges are calculated. 
The intended framework is to group rows +#' temporally with \code{\link{group_times}} then spatially with \code{edge_dist}. +#' If you have already calculated temporal groups without +#' \code{\link{group_times}}, you can pass this column to the \code{timegroup} +#' argument. Note that the expectation is that each individual will be observed +#' only once per timegroup. Caution that accidentally including huge numbers of +#' rows within timegroups can overload your machine since all pairwise distances +#' are calculated within each timegroup. #' #' The \code{splitBy} argument offers further control over grouping. If within #' your \code{DT}, you have multiple populations, subgroups or other distinct diff --git a/R/edge_nn.R b/R/edge_nn.R index 9ef928d..e2fc7fe 100644 --- a/R/edge_nn.R +++ b/R/edge_nn.R @@ -13,20 +13,25 @@ #' \code{data.frame}, you can convert it by reference using #' \code{\link[data.table:setDT]{data.table::setDT}}. #' -#' The \code{id}, \code{coords} (and optional \code{timegroup} and -#' \code{splitBy}) arguments expect the names of a column in \code{DT} which -#' correspond to the individual identifier, X and Y coordinates, timegroup -#' (generated by \code{group_times}) and additional grouping columns. +#' The \code{id}, \code{coords}, \code{timegroup} (and optional \code{splitBy}) +#' arguments expect the names of a column in \code{DT} which correspond to the +#' individual identifier, X and Y coordinates, timegroup (generated by +#' \code{group_times}) and additional grouping columns. #' #' The \code{threshold} must be provided in the units of the coordinates. The #' \code{threshold} must be larger than 0. The coordinates must be planar #' coordinates (e.g.: UTM). In the case of UTM, a \code{threshold} = 50 would #' indicate a 50m distance threshold. #' -#' The \code{timegroup} argument is optional, but recommended to pair with -#' \code{\link{group_times}}. 
The intended framework is to group rows temporally -#' with \code{\link{group_times}} then spatially with \code{edge_nn} (or -#' grouping functions). +#' The \code{timegroup} argument is required to define the temporal groups +#' within which edge nearest neighbours are calculated. The intended framework +#' is to group rows temporally with \code{\link{group_times}} then spatially +#' with \code{edge_nn}. If you have already calculated temporal groups without +#' \code{\link{group_times}}, you can pass this column to the \code{timegroup} +#' argument. Note that the expectation is that each individual will be observed +#' only once per timegroup. Caution that accidentally including huge numbers of +#' rows within timegroups can overload your machine since all pairwise distances +#' are calculated within each timegroup. #' #' The \code{splitBy} argument offers further control over grouping. If within #' your \code{DT}, you have multiple populations, subgroups or other distinct diff --git a/R/group_pts.R b/R/group_pts.R index 6741d1b..419062c 100644 --- a/R/group_pts.R +++ b/R/group_pts.R @@ -10,20 +10,27 @@ #' \code{data.frame}, you can convert it by reference using #' \code{\link[data.table:setDT]{data.table::setDT}}. #' -#' The \code{id}, \code{coords} (and optional \code{timegroup} and -#' \code{splitBy}) arguments expect the names of a column in \code{DT} which -#' correspond to the individual identifier, X and Y coordinates, timegroup -#' (generated by \code{group_times}) and additional grouping columns. +#' The \code{id}, \code{coords}, \code{timegroup} (and optional \code{splitBy}) +#' arguments expect the names of a column in \code{DT} which correspond to the +#' individual identifier, X and Y coordinates, timegroup (typically generated by +#' \code{group_times}) and additional grouping columns. #' #' The \code{threshold} must be provided in the units of the coordinates. The #' \code{threshold} must be larger than 0. 
The coordinates must be planar #' coordinates (e.g.: UTM). In the case of UTM, a \code{threshold} = 50 would #' indicate a 50m distance threshold. #' -#' The \code{timegroup} argument is optional, but recommended to pair with -#' \code{\link{group_times}}. The intended framework is to group rows temporally -#' with \code{\link{group_times}} then spatially with \code{group_pts} (or -#' \code{\link{group_lines}}, \code{\link{group_polys}}). +#' The \code{timegroup} argument is required to define the temporal groups +#' within which spatial groups are calculated. The intended framework is to +#' group rows temporally with \code{\link{group_times}} then spatially with +#' \code{group_pts} (or \code{\link{group_lines}}, \code{\link{group_polys}}). +#' If you have already calculated temporal groups without +#' \code{\link{group_times}}, you can pass this column to the \code{timegroup} +#' argument. Note that the expectation is that each individual will be observed +#' only once per timegroup. Caution that accidentally including huge numbers of +#' rows within timegroups can overload your machine since all pairwise distances +#' are calculated within each timegroup. +#' #' #' The \code{splitBy} argument offers further control over grouping. If within #' your \code{DT}, you have multiple populations, subgroups or other distinct @@ -34,12 +41,11 @@ #' @return \code{group_pts} returns the input \code{DT} appended with a #' \code{group} column. #' -#' This column represents the spatial (and if \code{timegroup} was provided - -#' spatiotemporal) group. As with the other grouping functions, the actual -#' value of \code{group} is arbitrary and represents the identity of a given -#' group where 1 or more individuals are assigned to a group. If the data was -#' reordered, the \code{group} may change, but the contents of each group -#' would not. +#' This column represents the spatiotemporal group. 
As with the other +#' grouping functions, the actual value of \code{group} is arbitrary and +#' represents the identity of a given group where 1 or more individuals are +#' assigned to a group. If the data was reordered, the \code{group} may +#' change, but the contents of each group would not. #' #' A message is returned when a column named \code{group} already exists in #' the input \code{DT}, because it will be overwritten. @@ -50,7 +56,7 @@ #' coordinates #' @param id Character string of ID column name #' @param coords Character vector of X coordinate and Y coordinate column names -#' @param timegroup timegroup field in the DT upon which the grouping will be +#' @param timegroup timegroup field in the DT within which the grouping will be #' calculated #' @param splitBy (optional) character string or vector of grouping column #' name(s) upon which the grouping will be calculated diff --git a/man/edge_dist.Rd b/man/edge_dist.Rd index acf6df4..aa63036 100644 --- a/man/edge_dist.Rd +++ b/man/edge_dist.Rd @@ -25,7 +25,7 @@ coordinates} \item{coords}{Character vector of X coordinate and Y coordinate column names} -\item{timegroup}{timegroup field in the DT upon which the grouping will be +\item{timegroup}{timegroup field in the DT within which the grouping will be calculated} \item{splitBy}{(optional) character string or vector of grouping column @@ -63,20 +63,25 @@ The \code{DT} must be a \code{data.table}. If your data is a \code{data.frame}, you can convert it by reference using \code{\link[data.table:setDT]{data.table::setDT}}. -The \code{id}, \code{coords} (and optional \code{timegroup} and -\code{splitBy}) arguments expect the names of a column in \code{DT} which -correspond to the individual identifier, X and Y coordinates, timegroup -(generated by \code{group_times}) and additional grouping columns. 
+The \code{id}, \code{coords}, \code{timegroup} (and optional \code{splitBy}) +arguments expect the names of a column in \code{DT} which correspond to the +individual identifier, X and Y coordinates, timegroup (generated by +\code{group_times}) and additional grouping columns. If provided, the \code{threshold} must be provided in the units of the coordinates and must be larger than 0. If the \code{threshold} is NULL, the distance to all other individuals will be returned. The coordinates must be planar coordinates (e.g.: UTM). In the case of UTM, a \code{threshold} = 50 would indicate a 50m distance threshold. -The \code{timegroup} argument is optional, but recommended to pair with -\code{\link{group_times}}. The intended framework is to group rows temporally -with \code{\link{group_times}} then spatially with \code{edge_dist} (or -grouping functions). +The \code{timegroup} argument is required to define the temporal groups +within which edges are calculated. The intended framework is to group rows +temporally with \code{\link{group_times}} then spatially with \code{edge_dist}. +If you have already calculated temporal groups without +\code{\link{group_times}}, you can pass this column to the \code{timegroup} +argument. Note that the expectation is that each individual will be observed +only once per timegroup. Caution that accidentally including huge numbers of +rows within timegroups can overload your machine since all pairwise distances +are calculated within each timegroup. The \code{splitBy} argument offers further control over grouping. 
If within your \code{DT}, you have multiple populations, subgroups or other distinct diff --git a/man/edge_nn.Rd b/man/edge_nn.Rd index b19c3c9..4c4d10a 100644 --- a/man/edge_nn.Rd +++ b/man/edge_nn.Rd @@ -21,7 +21,7 @@ edge_nn( \item{coords}{Character vector of X coordinate and Y coordinate column names} -\item{timegroup}{timegroup field in the DT upon which the grouping will be +\item{timegroup}{timegroup field in the DT within which the grouping will be calculated} \item{splitBy}{(optional) character string or vector of grouping column @@ -60,20 +60,25 @@ The \code{DT} must be a \code{data.table}. If your data is a \code{data.frame}, you can convert it by reference using \code{\link[data.table:setDT]{data.table::setDT}}. -The \code{id}, \code{coords} (and optional \code{timegroup} and -\code{splitBy}) arguments expect the names of a column in \code{DT} which -correspond to the individual identifier, X and Y coordinates, timegroup -(generated by \code{group_times}) and additional grouping columns. +The \code{id}, \code{coords}, \code{timegroup} (and optional \code{splitBy}) +arguments expect the names of a column in \code{DT} which correspond to the +individual identifier, X and Y coordinates, timegroup (generated by +\code{group_times}) and additional grouping columns. The \code{threshold} must be provided in the units of the coordinates. The \code{threshold} must be larger than 0. The coordinates must be planar coordinates (e.g.: UTM). In the case of UTM, a \code{threshold} = 50 would indicate a 50m distance threshold. -The \code{timegroup} argument is optional, but recommended to pair with -\code{\link{group_times}}. The intended framework is to group rows temporally -with \code{\link{group_times}} then spatially with \code{edge_nn} (or -grouping functions). +The \code{timegroup} argument is required to define the temporal groups +within which edge nearest neighbours are calculated. 
The intended framework +is to group rows temporally with \code{\link{group_times}} then spatially +with \code{edge_nn}. If you have already calculated temporal groups without +\code{\link{group_times}}, you can pass this column to the \code{timegroup} +argument. Note that the expectation is that each individual will be observed +only once per timegroup. Caution that accidentally including huge numbers of +rows within timegroups can overload your machine since all pairwise distances +are calculated within each timegroup. The \code{splitBy} argument offers further control over grouping. If within your \code{DT}, you have multiple populations, subgroups or other distinct diff --git a/man/group_lines.Rd b/man/group_lines.Rd index c3e9f6a..3ee168b 100644 --- a/man/group_lines.Rd +++ b/man/group_lines.Rd @@ -30,7 +30,7 @@ the projection argument is 'EPSG:32736'. See details.} \item{coords}{Character vector of X coordinate and Y coordinate column names} -\item{timegroup}{timegroup field in the DT upon which the grouping will be +\item{timegroup}{timegroup field in the DT within which the grouping will be calculated} \item{sortBy}{Character string of date time column(s) to sort rows by. Must diff --git a/man/group_pts.Rd b/man/group_pts.Rd index b20b6b7..d38a628 100644 --- a/man/group_pts.Rd +++ b/man/group_pts.Rd @@ -23,7 +23,7 @@ coordinates} \item{coords}{Character vector of X coordinate and Y coordinate column names} -\item{timegroup}{timegroup field in the DT upon which the grouping will be +\item{timegroup}{timegroup field in the DT within which the grouping will be calculated} \item{splitBy}{(optional) character string or vector of grouping column @@ -33,12 +33,11 @@ name(s) upon which the grouping will be calculated} \code{group_pts} returns the input \code{DT} appended with a \code{group} column. -This column represents the spatial (and if \code{timegroup} was provided - -spatiotemporal) group. 
As with the other grouping functions, the actual -value of \code{group} is arbitrary and represents the identity of a given -group where 1 or more individuals are assigned to a group. If the data was -reordered, the \code{group} may change, but the contents of each group -would not. +This column represents the spatiotemporal group. As with the other +grouping functions, the actual value of \code{group} is arbitrary and +represents the identity of a given group where 1 or more individuals are +assigned to a group. If the data was reordered, the \code{group} may +change, but the contents of each group would not. A message is returned when a column named \code{group} already exists in the input \code{DT}, because it will be overwritten. @@ -55,20 +54,26 @@ The \code{DT} must be a \code{data.table}. If your data is a \code{data.frame}, you can convert it by reference using \code{\link[data.table:setDT]{data.table::setDT}}. -The \code{id}, \code{coords} (and optional \code{timegroup} and -\code{splitBy}) arguments expect the names of a column in \code{DT} which -correspond to the individual identifier, X and Y coordinates, timegroup -(generated by \code{group_times}) and additional grouping columns. +The \code{id}, \code{coords}, \code{timegroup} (and optional \code{splitBy}) +arguments expect the names of a column in \code{DT} which correspond to the +individual identifier, X and Y coordinates, timegroup (typically generated by +\code{group_times}) and additional grouping columns. The \code{threshold} must be provided in the units of the coordinates. The \code{threshold} must be larger than 0. The coordinates must be planar coordinates (e.g.: UTM). In the case of UTM, a \code{threshold} = 50 would indicate a 50m distance threshold. -The \code{timegroup} argument is optional, but recommended to pair with -\code{\link{group_times}}. 
The intended framework is to group rows temporally -with \code{\link{group_times}} then spatially with \code{group_pts} (or -\code{\link{group_lines}}, \code{\link{group_polys}}). +The \code{timegroup} argument is required to define the temporal groups +within which spatial groups are calculated. The intended framework is to +group rows temporally with \code{\link{group_times}} then spatially with +\code{group_pts} (or \code{\link{group_lines}}, \code{\link{group_polys}}). +If you have already calculated temporal groups without +\code{\link{group_times}}, you can pass this column to the \code{timegroup} +argument. Note that the expectation is that each individual will be observed +only once per timegroup. Caution that accidentally including huge numbers of +rows within timegroups can overload your machine since all pairwise distances +are calculated within each timegroup. The \code{splitBy} argument offers further control over grouping. If within your \code{DT}, you have multiple populations, subgroups or other distinct diff --git a/vignettes/faq.Rmd b/vignettes/faq.Rmd index 7ba0ef8..ae52837 100644 --- a/vignettes/faq.Rmd +++ b/vignettes/faq.Rmd @@ -164,7 +164,7 @@ This warning is returned to the user when the `threshold` with unit days does no * `threshold`: threshold for grouping * `id`: column name of IDs in `DT` * `coords`: column names of x and y coordinates in `DT` -* `timegroup`: (optional) column name of time group +* `timegroup`: column name of time group * `splitBy`: (optional) column names of extra variables to group on ### DT @@ -217,7 +217,7 @@ This warning is explicitly verbose, to ensure we are considering the updated use The `sortBy` argument expects a date time formatted column name, which is used to order the rows for each individual (and `splitBy`). 
## group_polys -`group_polys(DT, area, hrType, hrParams, projection, id, coords, timegroup, splitBy, spLines)` +`group_polys(DT, area, hrType, hrParams, projection, id, coords, splitBy, spPolys)` * `DT`: input `data.table` * `area`: boolean argument if proportional area should be returned @@ -226,7 +226,6 @@ The `sortBy` argument expects a date time formatted column name, which is used t * `projection`: projection of coordinates in `DT` * `id`: column name of IDs in `DT` * `coords`: column names of x and y coordinates in `DT` -* `timegroup`: (optional) column name of time group * `splitBy`: (optional) column names of extra variables to group on * `spPolys`: alternatively, provide solely a `SpatialPolygons` object @@ -280,7 +279,7 @@ group_polys( * `threshold`: threshold for grouping * `id`: column name of IDs in `DT` * `coords`: column names of x and y coordinates in `DT` -* `timegroup`: (optional) column name of time group +* `timegroup`: column name of time group * `splitBy`: (optional) column names of extra variables to group on * `fillNA`: boolean indicating if NAs should be returned for individuals that were not within the threshold distance of any other. If TRUE, NAs are returned. If FALSE, only edges between individuals within the threshold distance are returned. @@ -294,7 +293,7 @@ This is the non-chain rule implementation similar to `group_pts`. Edges are defi * `DT`: input `data.table` * `id`: column name of IDs in `DT` * `coords`: column names of x and y coordinates in `DT` -* `timegroup`: (optional) column name of time group +* `timegroup`: column name of time group * `splitBy`: (optional) column names of extra variables to group on * `threshold`: (optional) spatial distance threshold to set maximum distance between an individual and their neighbour.