Merge pull request #14 from ropensci/vignette/edge-lists

add vignette describing using edge list and dyad id functions
ropensci · Mar 25, 2020 · aebf335
2 parents c580b5c + e962130

Showing 6 changed files with 203 additions and 15 deletions.
diff --git a/R/edge_nn.R b/R/edge_nn.R
@@ -71,11 +71,11 @@
 #' group_times(DT, datetime = 'datetime', threshold = '20 minutes')
 #'
 #' # Edge list generation
-#' edge_nn(DT, id = 'ID', coords = c('X', 'Y'),
+#' edges <- edge_nn(DT, id = 'ID', coords = c('X', 'Y'),
 #'         timegroup = 'timegroup')
 #'
 #' # Edge list generation using maximum distance threshold
-#' edge_nn(DT, id = 'ID', coords = c('X', 'Y'),
+#' edges <- edge_nn(DT, id = 'ID', coords = c('X', 'Y'),
 #'         timegroup = 'timegroup', threshold = 100)
 #'
 #' # Edge list generation, returning distance between nearest neighbours

diff --git a/_pkgdown.yml b/_pkgdown.yml
@@ -5,6 +5,7 @@ articles:
   intro-spatsoc: intro-spatsoc.html
   faq: faq.html
   using-in-sna: using-in-sna.html
+  using-edge-and-dyad: using-edge-and-dyad.html
 
 
 destination: ../spatsoc.gitlab.io/public
@@ -69,6 +70,8 @@ navbar:
         href: articles/faq.html
       - text: Using spatsoc in social network analysis
         href: articles/using-in-sna.html
+      - text: Using edge list and dyad id functions
+        href: articles/using-edge-and-dyad.html
     news:
       text: "News"
       href: news/index.html

diff --git a/man/edge_nn.Rd b/man/edge_nn.Rd
diff --git a/vignettes/faq.Rmd b/vignettes/faq.Rmd
@@ -19,7 +19,7 @@ knitr::opts_chunk$set(message = FALSE,
                       echo = TRUE)
 ```
 
-spatsoc is an R package for detecting spatial and temporal groups in GPS relocations. It can be used to build proximity-based social networks using gambit-of-the-group format and edge-lists. . In addition, the randomization function provides data-stream randomization methods suitable for GPS data.
+spatsoc is an R package for detecting spatial and temporal groups in GPS relocations. It can be used to build proximity-based social networks using gambit-of-the-group format and edge-lists. In addition, the randomization function provides data-stream randomization methods suitable for GPS data.
 
 # Usage
 `spatsoc` leverages `data.table` to modify by reference and iteratively work on subsets of the input data. The first input for all functions in `spatsoc` is `DT`, an input `data.table`. If your data is a `data.frame`, you can convert it by reference using `setDT(DF)`. 
@@ -58,7 +58,7 @@ knitr::kable(DT[order(group, timegroup)][1:5, .(ID, X, Y, datetime, timegroup, g
 
 
 ## Social network analysis
-See the vignette about [using spatsoc in social network analysis](http://spatsoc.gitlab.io/articles/using-in-sna.html).
+See the vignette about [using spatsoc in social network analysis](http://spatsoc.robitalec.ca/articles/using-in-sna.html).
 
 
 # Installation
@@ -256,6 +256,8 @@ group_polys(
 
 This is the non-chain rule implementation similar to `group_pts`. Edges are defined by the distance threshold and NAs are returned for individuals within each timegroup if they are not within the threshold distance of any other individual (if `fillNA` is TRUE). 
 
+**See the vignette [Using spatsoc in social network analysis - edge list generating functions](http://spatsoc.robitalec.ca/articles/using-edge-and-dyad.html) for details about the `edge_dist` function.**
+
 ## edge_nn
 `edge_nn(DT = NULL, id = NULL, coords = NULL, timegroup = NULL, splitBy = NULL, threshold = NULL)`
 
@@ -269,6 +271,8 @@ This is the non-chain rule implementation similar to `group_pts`. Edges are defi
 This function can be used to generate edge lists defined either by nearest neighbour or nearest neighbour with a maximum distance. NAs are returned for nearest neighbour if an individual was alone in a timegroup (and/or splitBy) or if the distance between an individual and its nearest neighbour is greater than the threshold. 
 
 
+**See the vignette [Using spatsoc in social network analysis - edge list generating functions](http://spatsoc.robitalec.ca/articles/using-edge-and-dyad.html) for details about the `edge_nn` function.**
+
 ## randomizations
 `randomizations(DT, type, id, datetime, splitBy, iterations)`
 
@@ -280,27 +284,34 @@ This function can be used to generate edge lists defined either by nearest neigh
 * `iterations`: The number of iterations to randomize
 
 
-**See the vignette [Using spatsoc in social network analysis](http://spatsoc.gitlab.io/articles/using-in-sna.html) for details about the `randomizations` function (specifically the section 'Data stream randomization')**
+**See the vignette [Using spatsoc in social network analysis](http://spatsoc.robitalec.ca/articles/using-in-sna.html) for details about the `randomizations` function (specifically the section 'Data stream randomization')**
 
 
 # Package design
 ## Don't I need to reassign to save the output?
 
-(Almost) all functions in `spatsoc` use data.table's modify-by-reference to reduce recopying large datasets and improve performance. The exceptions are `group_polys(area = TRUE)` and `randomizations`.
+(Almost) all functions in `spatsoc` use data.table's modify-by-reference to reduce recopying large datasets and improve performance. The exceptions are `group_polys(area = TRUE)`, `randomizations` and the edge list generating functions `edge_dist` and `edge_nn`.
 
 
 ## Why does a function print the result, but columns aren't added to my DT?
 
-Check that your `data.table` has columns allocated (with `data.table::truelength`) and if not, use `data.table::setDT`. This can happen if you are reading your data from `RDS` or `RData` files.  [See here.](https://cran.r-project.org/package=data.table/vignettes/datatable-faq.html#reading-data.table-from-rds-or-rdata-file)
+Check that your `data.table` has columns allocated (with `data.table::truelength`) and if not, use `data.table::setDT` or `data.table::alloc.col`. This can happen if you are reading your data from `RDS` or `RData` files.  [See here.](https://cran.r-project.org/package=data.table/vignettes/datatable-faq.html#reading-data.table-from-rds-or-rdata-file)
 
-```{r alloc}
+```{r setdt}
 if (truelength(DT) == 0) {
   setDT(DT)
 }
 # then go to spatsoc
 group_times(DT, datetime = 'datetime', threshold = '5 minutes')
 ```
 
+or simply:
+
+```{r alloc}
+DT <- readRDS('path/to/data.Rds')
+alloc.col(DT)
+```
+
 
 
 

diff --git a/vignettes/using-edge-and-dyad.Rmd b/vignettes/using-edge-and-dyad.Rmd
@@ -0,0 +1,171 @@
+---
+title: "Using edge list generating functions and dyad_id"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Using edge list generating functions and dyad_id}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+```{r, include = FALSE}
+knitr::opts_chunk$set(
+  collapse = TRUE,
+  eval = FALSE,
+  echo = TRUE,
+  comment = "#>"
+)
+```
+
+
+
+`spatsoc` can be used in social network analysis to generate edge lists from GPS relocation data. 
+
+
+Edge lists are generated using either the `edge_dist` or the `edge_nn` function. 
+
+
+**Note**: The grouping functions and their application in social network analysis are further described in the vignette [Using spatsoc in social network analysis - grouping functions](http://spatsoc.robitalec.ca/articles/using-in-sna.html). 
+
+
+## Generate edge lists
+spatsoc provides users with one temporal (`group_times`) and two edge list generating functions (`edge_dist`, `edge_nn`) to generate edge lists from GPS relocations. Users can consider edges defined by either the spatial proximity between individuals (with `edge_dist`), by nearest neighbour (with `edge_nn`) or by nearest neighbour with a maximum distance (with `edge_nn`). The edge lists can be used directly by the animal social network package `asnipe` to generate networks. 
+
+### 1. Load packages and prepare data
+`spatsoc` expects a `data.table` for all `DT` arguments and date time columns to be formatted as `POSIXct`. 
+
+```{r, message = FALSE, warning = FALSE, eval = TRUE}
+## Load packages
+library(spatsoc)
+library(data.table)
+
+## Read data as a data.table
+DT <- fread(system.file("extdata", "DT.csv", package = "spatsoc"))
+
+## Cast datetime column to POSIXct
+DT[, datetime := as.POSIXct(datetime)]
+```
+
+
+Next, we will group relocations temporally with `group_times` and generate edge lists with one of `edge_dist` or `edge_nn`. Note: these are mutually exclusive; select only one edge list generating function at a time. 
+
+### 2. a) `edge_dist` 
+
+Distance-based edge lists, where relocations in each timegroup are considered edges if they are within the spatial distance defined by the user with the `threshold` argument. Relevant temporal and spatial distance thresholds depend on the species and study system. In this case, relocations within 5 minutes and 100 meters are considered edges. 
+
+This is the non-chain rule implementation similar to `group_pts`. Edges are defined by the distance threshold and NAs are returned for individuals within each timegroup if they are not within the threshold distance of any other individual (if `fillNA` is TRUE). 
+
+Optionally, `edge_dist` can return the distances between individuals (less than the threshold) in a column named 'distance' with argument `returnDist = TRUE`. 
+
+```{r, eval = TRUE}
+# Temporal groups
+group_times(DT, datetime = 'datetime', threshold = '5 minutes')
+
+# Edge list generation
+edges <- edge_dist(
+  DT,
+  threshold = 100,
+  id = 'ID',
+  coords = c('X', 'Y'),
+  timegroup = 'timegroup',
+  returnDist = TRUE,
+  fillNA = TRUE
+)
+```
+
+### 2. b) `edge_nn`
+
+Nearest neighbour-based edge lists, where each individual is connected to its nearest neighbour. `edge_nn` can be used to generate edge lists defined either by nearest neighbour or by nearest neighbour with a maximum distance. As with the grouping functions and `edge_dist`, temporal and spatial thresholds depend on the species and study system. 
+
+NAs are returned for nearest neighbour if an individual was alone in a timegroup (and/or splitBy) or if the distance between an individual and its nearest neighbour is greater than the threshold. 
+
+Optionally, `edge_nn` can return the distance between each individual and its nearest neighbour in a column named 'distance' with argument `returnDist = TRUE`. 
+
+```{r, eval = FALSE}
+# Temporal groups
+group_times(DT, datetime = 'datetime', threshold = '5 minutes')
+
+# Edge list generation
+edges <- edge_nn(
+  DT,
+  id = 'ID',
+  coords = c('X', 'Y'),
+  timegroup = 'timegroup'
+)
+
+# Edge list generation using maximum distance threshold
+edges <- edge_nn(
+  DT, 
+  id = 'ID', 
+  coords = c('X', 'Y'),
+  timegroup = 'timegroup', 
+  threshold = 100
+)
+
+# Edge list generation using maximum distance threshold, returning distances
+edges <- edge_nn(
+  DT, 
+  id = 'ID', 
+  coords = c('X', 'Y'),
+  timegroup = 'timegroup', 
+  threshold = 100,
+  returnDist = TRUE
+)
+
+```
+
+
+## Dyads
+
+### 3. `dyad_id`
+
+The function `dyad_id` can be used to generate a unique, undirected dyad identifier for edge lists. 
+
+```{r, eval = TRUE}
+# In this case, using the edges generated in 2. a) edge_dist
+dyad_id(edges, id1 = 'ID1', id2 = 'ID2')
+```
+
+
+Once we have generated dyad IDs, we can measure consecutive relocations, the start and end relocations of each run, etc. **Note:** since each edge is duplicated (A-B and B-A), you will need to use the unique combinations of timegroup and dyadID or divide counts by 2. 
+
+
+### 4. Dyad stats
+
+```{r, eval = TRUE}
+# Get the unique dyads by timegroup
+dyads <- unique(edges, by = c('timegroup', 'dyadID'))
+
+# Set the order of the rows
+setorder(dyads, timegroup)
+
+## Count number of timegroups dyads are observed together
+dyads[, nObs := .N, by = .(dyadID)]
+
+## Count consecutive relocations together
+# Shift the timegroup within dyadIDs
+dyads[, shifttimegrp := shift(timegroup, 1), by = dyadID]
+
+# Difference between consecutive timegroups for each dyadID
+# where difftimegrp == 1, the dyads remained together in consecutive timegroups
+dyads[, difftimegrp := timegroup - shifttimegrp]
+
+
+# Run id of diff timegroups
+dyads[, runid := rleid(difftimegrp), by = dyadID]
+
+# N consecutive observations of dyadIDs
+dyads[, runCount := fifelse(difftimegrp == 1, .N, NA_integer_), by = .(runid, dyadID)]
+
+## Start and end of consecutive relocations for each dyad
+# Don't consider rows that aren't part of a run (NA runCount)
+dyads[!is.na(runCount), start := fifelse(timegroup == min(timegroup), TRUE, NA), by = .(runid, dyadID)]
+
+dyads[!is.na(runCount), end := fifelse(timegroup == max(timegroup), TRUE, NA), by = .(runid, dyadID)]
+
+## Example output
+dyads[dyadID == 'B-H', 
+      .(timegroup, nObs, shifttimegrp, difftimegrp, runid, runCount, start, end)]
+```
+
+<!-- mean xy, todo -->
+
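The dyad steps in the vignette above (undirected dyad IDs, dropping the duplicated A-B/B-A direction, and counting consecutive timegroups) can be illustrated without spatsoc or data.table. The following is a minimal base R sketch on a toy, hypothetical edge list with columns `timegroup`, `ID1` and `ID2`. It mirrors the idea behind `dyad_id` and the dyad stats chunk, but it is an assumption-labelled illustration, not necessarily how spatsoc implements them.

```r
# Toy, hypothetical edge list: every edge appears twice (A-B and B-A),
# as described for the edge list generating functions above
edges <- data.frame(
  timegroup = c(1, 1, 2, 2, 3, 3, 5, 5),
  ID1 = c("A", "B", "A", "B", "B", "A", "A", "B"),
  ID2 = c("B", "A", "B", "A", "A", "B", "B", "A"),
  stringsAsFactors = FALSE
)

# Undirected dyad identifier: sort each pair so A-B and B-A give the same ID
edges$dyadID <- paste(pmin(edges$ID1, edges$ID2),
                      pmax(edges$ID1, edges$ID2),
                      sep = "-")

# Keep one row per timegroup * dyadID (drops the duplicated direction)
dyads <- edges[!duplicated(edges[, c("timegroup", "dyadID")]), ]

# Number of timegroups each dyad is observed together
nObs <- table(dyads$dyadID)

# Consecutive timegroups for dyad A-B: a difference of 1 means the
# dyad remained together from one timegroup to the next
tg <- sort(dyads$timegroup[dyads$dyadID == "A-B"])
consecutive <- diff(tg) == 1

# Run-length encoding groups consecutive TRUE/FALSE values into runs
runs <- rle(consecutive)

# Longest run, counted as the number of consecutive timegroup steps
longestRun <- max(runs$lengths[runs$values])
```

Here dyad A-B is observed in timegroups 1, 2, 3 and 5, so `nObs` is 4 and the longest run is 2 consecutive steps (timegroups 1 to 3), which matches counting rows where `difftimegrp == 1` within a run.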
diff --git a/vignettes/using-in-sna.Rmd b/vignettes/using-in-sna.Rmd
@@ -1,8 +1,8 @@
 ---
-title: "Using spatsoc in social network analysis"
+title: "Using spatsoc in social network analysis - grouping functions"
 author: "Alec Robitaille, Quinn Webber and Eric Vander Wal"
 date: "`r Sys.Date()`"
-output: 
+output:
   rmarkdown::html_vignette:
     number_sections: false
     toc: false
@@ -33,6 +33,9 @@ Data stream randomization is performed using the `randomizations` function.
 
 Group by individual matrices are generated using the `get_gbi` function. 
 
+
+**Note**: edge list generating functions are also available and are described in the vignette [Using spatsoc in social network analysis - edge list generating functions](http://spatsoc.robitalec.ca/articles/using-edge.html). 
+
 # Generate gambit of the group data
 spatsoc provides users with one temporal (`group_times`) and three spatial (`group_pts`, `group_lines`, `group_polys`) functions to generate gambit of the group data from GPS relocations. Users can consider spatial grouping at three different scales combined with an appropriate temporal grouping threshold. The gambit of the group data is then used to generate a group by individual matrix and build the network. 
 
@@ -138,7 +141,7 @@ areaDT <- group_polys(
 ```
 
 # Build observed network 
-Once we've created groups using `group_times` and one of the spatial grouping functions, we can generate a group by individual matrix. 
+Once we've generated groups using `group_times` and one of the spatial grouping functions, we can generate a group by individual matrix. 
 
 The following code chunk showing `get_gbi` can be used for outputs from any of `group_pts`, `group_lines` or `group_polys(area = FALSE)`. For the purpose of this vignette however, we will consider the outputs from `group_pts` ([2. a)](#a-group_pts)) for the following code chunk.
 
@@ -183,7 +186,7 @@ Note: the `coords` argument is only required for trajectory type randomization,
 
 
 ## 5. a) `type = 'step'`
-`'step'` randomizes identities of relocations between individuals within each time step. The `datetime` argument expects an integer group created by `group_times`. The `group` argument expects the column name of the group generated from the spatial grouping functions. 
+`'step'` randomizes identities of relocations between individuals within each time step. The `datetime` argument expects an integer group generated by `group_times`. The `group` argument expects the column name of the group generated from the spatial grouping functions. 
 
 Four columns are returned when `type = 'step'` along with `id`, `datetime` and `splitBy` columns:
 
@@ -409,7 +412,7 @@ observed <- data.table(
 ## 8. Calculate random network metrics
 With the list of random networks from [6.](#build-random-network), we can generate a list of graphs with `igraph::graph.adjacency` (for example) and calculate random network metrics. 
 
-This example uses the `netLs` created by [6. a)](#a-type-step-1) which was split by year and iteration. 
+This example uses the `netLs` generated by [6. a)](#a-type-step-1) which was split by year and iteration. 
 
 ```{r}
 ## Generate graph and calculate network metrics