Merge pull request #14 from ropensci/vignette/edge-lists
add vignette describing using edge list and dyad id functions
robitalec committed Mar 25, 2020
2 parents c580b5c + e962130 commit aebf335
Showing 6 changed files with 203 additions and 15 deletions.
4 changes: 2 additions & 2 deletions R/edge_nn.R
Expand Up @@ -71,11 +71,11 @@
#' group_times(DT, datetime = 'datetime', threshold = '20 minutes')
#'
#' # Edge list generation
#' edge_nn(DT, id = 'ID', coords = c('X', 'Y'),
#' edges <- edge_nn(DT, id = 'ID', coords = c('X', 'Y'),
#' timegroup = 'timegroup')
#'
#' # Edge list generation using maximum distance threshold
#' edge_nn(DT, id = 'ID', coords = c('X', 'Y'),
#' edges <- edge_nn(DT, id = 'ID', coords = c('X', 'Y'),
#' timegroup = 'timegroup', threshold = 100)
#'
#' # Edge list generation, returning distance between nearest neighbours
Expand Down
3 changes: 3 additions & 0 deletions _pkgdown.yml
Expand Up @@ -5,6 +5,7 @@ articles:
intro-spatsoc: intro-spatsoc.html
faq: faq.html
using-in-sna: using-in-sna.html
using-edge-and-dyad: using-edge-and-dyad.html


destination: ../spatsoc.gitlab.io/public
Expand Down Expand Up @@ -69,6 +70,8 @@ navbar:
href: articles/faq.html
- text: Using spatsoc in social network analysis
href: articles/using-in-sna.html
- text: Using edge list and dyad id functions
href: articles/using-edge-and-dyad.html
news:
text: "News"
href: news/index.html
Expand Down
4 changes: 2 additions & 2 deletions man/edge_nn.Rd

23 changes: 17 additions & 6 deletions vignettes/faq.Rmd
Expand Up @@ -19,7 +19,7 @@ knitr::opts_chunk$set(message = FALSE,
echo = TRUE)
```

spatsoc is an R package for detecting spatial and temporal groups in GPS relocations. It can be used to build proximity-based social networks using gambit-of-the-group format and edge-lists. . In addition, the randomization function provides data-stream randomization methods suitable for GPS data.
spatsoc is an R package for detecting spatial and temporal groups in GPS relocations. It can be used to build proximity-based social networks using gambit-of-the-group format and edge-lists. In addition, the randomization function provides data-stream randomization methods suitable for GPS data.

# Usage
`spatsoc` leverages `data.table` to modify by reference and iteratively work on subsets of the input data. The first input for all functions in `spatsoc` is `DT`, an input `data.table`. If your data is a `data.frame`, you can convert it by reference using `setDT(DF)`.
Expand Down Expand Up @@ -58,7 +58,7 @@ knitr::kable(DT[order(group, timegroup)][1:5, .(ID, X, Y, datetime, timegroup, g


## Social network analysis
See the vignette about [using spatsoc in social network analysis](http://spatsoc.gitlab.io/articles/using-in-sna.html).
See the vignette about [using spatsoc in social network analysis](http://spatsoc.robitalec.ca/articles/using-in-sna.html).


# Installation
Expand Down Expand Up @@ -256,6 +256,8 @@ group_polys(

This is the non-chain rule implementation similar to `group_pts`. Edges are defined by the distance threshold and NAs are returned for individuals within each timegroup if they are not within the threshold distance of any other individual (if `fillNA` is TRUE).

**See the vignette [Using edge list generating functions and dyad_id](http://spatsoc.robitalec.ca/articles/using-edge-and-dyad.html) for details about the `edge_dist` function.**

## edge_nn
`edge_nn(DT = NULL, id = NULL, coords = NULL, timegroup = NULL, splitBy = NULL, threshold = NULL)`

Expand All @@ -269,6 +271,8 @@ This is the non-chain rule implementation similar to `group_pts`. Edges are defi
This function can be used to generate edge lists defined either by nearest neighbour or nearest neighbour with a maximum distance. NAs are returned for nearest neighbour if an individual was alone in a timegroup (and/or splitBy) or if the distance between an individual and its nearest neighbour is greater than the threshold.


**See the vignette [Using edge list generating functions and dyad_id](http://spatsoc.robitalec.ca/articles/using-edge-and-dyad.html) for details about the `edge_nn` function.**

## randomizations
`randomizations(DT, type, id, datetime, splitBy, iterations)`

Expand All @@ -280,27 +284,34 @@ This function can be used to generate edge lists defined either by nearest neigh
* `iterations`: The number of iterations to randomize


**See the vignette [Using spatsoc in social network analysis](http://spatsoc.gitlab.io/articles/using-in-sna.html) for details about the `randomizations` function (specifically the section 'Data stream randomization')**
**See the vignette [Using spatsoc in social network analysis](http://spatsoc.robitalec.ca/articles/using-in-sna.html) for details about the `randomizations` function (specifically the section 'Data stream randomization')**


# Package design
## Don't I need to reassign to save the output?

(Almost) all functions in `spatsoc` use data.table's modify-by-reference to reduce recopying large datasets and improve performance. The exceptions are `group_polys(area = TRUE)` and `randomizations`.
(Almost) all functions in `spatsoc` use data.table's modify-by-reference to reduce recopying large datasets and improve performance. The exceptions are `group_polys(area = TRUE)`, `randomizations` and the edge list generating functions `edge_dist` and `edge_nn`.
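
The difference between the two behaviours can be sketched with plain `data.table`, without spatsoc. In this minimal example, `:=` adds a column to the existing object, while a function that builds a new table only takes effect if its result is assigned. The helper `make_pairs` and the toy columns are hypothetical, invented for this illustration.

```r
library(data.table)

DT <- data.table(ID = c('A', 'B', 'C'), X = c(0, 3, 10))

# Modify by reference: the column is added to DT itself, no reassignment needed
DT[, X2 := X * 2]

# Hypothetical helper that returns a new table instead of modifying DT
make_pairs <- function(DT) {
  CJ(ID1 = DT$ID, ID2 = DT$ID)[ID1 != ID2]
}

# The result only exists if it is assigned
pairs <- make_pairs(DT)
```

The same pattern applies to the exceptions listed above: their output should be assigned to a new object, as in `edges <- edge_nn(...)`.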


## Why does a function print the result, but columns aren't added to my DT?

Check that your `data.table` has columns allocated (with `data.table::truelength`) and if not, use `data.table::setDT`. This can happen if you are reading your data from `RDS` or `RData` files. [See here.](https://cran.r-project.org/package=data.table/vignettes/datatable-faq.html#reading-data.table-from-rds-or-rdata-file)
Check that your `data.table` has columns allocated (with `data.table::truelength`) and if not, use `data.table::setDT` or `data.table::alloc.col`. This can happen if you are reading your data from `RDS` or `RData` files. [See here.](https://cran.r-project.org/package=data.table/vignettes/datatable-faq.html#reading-data.table-from-rds-or-rdata-file)

```{r alloc}
```{r setdt}
if (truelength(DT) == 0) {
setDT(DT)
}
# then go to spatsoc
group_times(DT, datetime = 'datetime', threshold = '5 minutes')
```

or simply:

```{r alloc}
DT <- readRDS('path/to/data.Rds')
alloc.col(DT)
```




Expand Down
171 changes: 171 additions & 0 deletions vignettes/using-edge-and-dyad.Rmd
@@ -0,0 +1,171 @@
---
title: "Using edge list generating functions and dyad_id"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Using edge list generating functions and dyad_id}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
eval = FALSE,
echo = TRUE,
comment = "#>"
)
```



`spatsoc` can be used in social network analysis to generate edge lists from GPS relocation data.


Edge lists are generated using either the `edge_dist` or the `edge_nn` function.


**Note**: The grouping functions and their application in social network analysis are further described in the vignette [Using spatsoc in social network analysis - grouping functions](http://spatsoc.robitalec.ca/articles/using-in-sna.html).


## Generate edge lists
spatsoc provides users with one temporal (`group_times`) and two edge list generating functions (`edge_dist`, `edge_nn`) to generate edge lists from GPS relocations. Users can consider edges defined by either the spatial proximity between individuals (with `edge_dist`), by nearest neighbour (with `edge_nn`) or by nearest neighbour with a maximum distance (with `edge_nn`). The edge lists can be used directly by the animal social network package `asnipe` to generate networks.

### 1. Load packages and prepare data
`spatsoc` expects a `data.table` for all `DT` arguments and datetime columns to be formatted as `POSIXct`.

```{r, message = FALSE, warning = FALSE, eval = TRUE}
## Load packages
library(spatsoc)
library(data.table)
## Read data as a data.table
DT <- fread(system.file("extdata", "DT.csv", package = "spatsoc"))
## Cast datetime column to POSIXct
DT[, datetime := as.POSIXct(datetime)]
```
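
If your datetimes are stored as character strings, a quick check of the column class before calling `group_times` can save some confusion. The following is a minimal sketch on a toy table. The values are invented and the UTC time zone is an assumption, so substitute the time zone of your own data.

```r
library(data.table)

# Toy relocations with datetimes stored as character strings
toy <- data.table(
  ID = c('A', 'B'),
  datetime = c('2020-03-25 10:00:00', '2020-03-25 10:04:00')
)

# Cast to POSIXct, stating the time zone explicitly (UTC is an assumption)
toy[, datetime := as.POSIXct(datetime, tz = 'UTC')]

# Confirm the class before grouping temporally
inherits(toy$datetime, 'POSIXct')
```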


Next, we will group relocations temporally with `group_times` and generate edge lists with one of `edge_dist` or `edge_nn`. Note: these are mutually exclusive, only select one edge list generating function at a time.

### 2. a) `edge_dist`

`edge_dist` generates distance based edge lists, where relocations in each timegroup are considered edges if they are within the spatial distance defined by the user with the `threshold` argument. The relevant temporal and spatial distance thresholds depend on the species and study system. In this case, relocations within 5 minutes and 100 meters are considered edges.

This is the non-chain rule implementation similar to `group_pts`. Edges are defined by the distance threshold and NAs are returned for individuals within each timegroup if they are not within the threshold distance of any other individual (if `fillNA` is TRUE).

Optionally, `edge_dist` can return the distances between individuals (less than the threshold) in a column named 'distance' with argument `returnDist = TRUE`.

```{r, eval = TRUE}
# Temporal groups
group_times(DT, datetime = 'datetime', threshold = '5 minutes')
# Edge list generation
edges <- edge_dist(
DT,
threshold = 100,
id = 'ID',
coords = c('X', 'Y'),
timegroup = 'timegroup',
returnDist = TRUE,
fillNA = TRUE
)
```
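
To make the distance rule concrete, here is a minimal base R sketch for a single timegroup. It is not how `edge_dist` is implemented. The coordinates are invented, and whether the comparison is `<` or `<=` is an assumption of this example.

```r
# Toy coordinates for one timegroup (values invented for illustration)
pts <- data.frame(
  ID = c('A', 'B', 'C'),
  X = c(0, 30, 500),
  Y = c(0, 40, 500)
)
threshold <- 100

# Pairwise Euclidean distances between all individuals
d <- as.matrix(dist(pts[, c('X', 'Y')]))
dimnames(d) <- list(pts$ID, pts$ID)

# Keep pairs within the threshold, excluding self-pairs
idx <- which(d <= threshold & row(d) != col(d), arr.ind = TRUE)
toy_edges <- data.frame(
  ID1 = pts$ID[idx[, 'row']],
  ID2 = pts$ID[idx[, 'col']],
  distance = d[idx]
)
```

Here A and B are 50 units apart, so the pair appears twice (A-B and B-A). C is not within the threshold of any other individual and is absent from this toy edge list.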

### 2. b) `edge_nn`

`edge_nn` generates nearest neighbour based edge lists, where each individual is connected to its nearest neighbour. Edges can be defined either by nearest neighbour or by nearest neighbour with a maximum distance. As with the grouping functions and `edge_dist`, the temporal and spatial thresholds depend on the species and study system.

NAs are returned for nearest neighbour if an individual was alone in a timegroup (and/or splitBy) or if the distance between an individual and its nearest neighbour is greater than the threshold.

Optionally, `edge_nn` can return the distances between individuals (less than the threshold) in a column named 'distance' with argument `returnDist = TRUE`.

```{r, eval = FALSE}
# Temporal groups
group_times(DT, datetime = 'datetime', threshold = '5 minutes')
# Edge list generation
edges <- edge_nn(
DT,
id = 'ID',
coords = c('X', 'Y'),
timegroup = 'timegroup'
)
# Edge list generation using maximum distance threshold
edges <- edge_nn(
DT,
id = 'ID',
coords = c('X', 'Y'),
timegroup = 'timegroup',
threshold = 100
)
# Edge list generation using maximum distance threshold, returning distances
edges <- edge_nn(
DT,
id = 'ID',
coords = c('X', 'Y'),
timegroup = 'timegroup',
threshold = 100,
returnDist = TRUE
)
```
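
The nearest neighbour rule, with and without a maximum distance, can be sketched in base R for a single timegroup. This is an illustration only, not the `edge_nn` implementation. The coordinates and the column names of the toy output are assumptions.

```r
# Toy coordinates for one timegroup (values invented for illustration)
pts <- data.frame(
  ID = c('A', 'B', 'C'),
  X = c(0, 30, 500),
  Y = c(0, 40, 500)
)

d <- as.matrix(dist(pts[, c('X', 'Y')]))
# An individual cannot be its own nearest neighbour
diag(d) <- Inf

# Index and distance of each individual's nearest neighbour
nn_idx <- apply(d, 1, which.min)
nn_dist <- apply(d, 1, min)

toy_nn <- data.frame(ID = pts$ID, NN = pts$ID[nn_idx], distance = nn_dist)

# Optional maximum distance: neighbours beyond it become NA
threshold <- 100
toy_nn$NN[toy_nn$distance > threshold] <- NA
```

A and B are each other's nearest neighbour. C's nearest neighbour is B, but it is farther away than the maximum distance, so C's neighbour is set to NA.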


## Dyads

### 3. `dyad_id`

The function `dyad_id` can be used to generate a unique, undirected dyad identifier for edge lists.

```{r, eval = TRUE}
# In this case, using the edges generated in 2. a) edge_dist
dyad_id(edges, id1 = 'ID1', id2 = 'ID2')
```
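
The idea behind an undirected identifier can be sketched in base R: order the two IDs within each row so that A-B and B-A map to the same label. This is an assumption about the concept rather than the internals of `dyad_id`, and the toy edge list is invented.

```r
# Toy directed edge list with both directions of each pair present
toy_edges <- data.frame(
  ID1 = c('A', 'B', 'H', 'B'),
  ID2 = c('B', 'A', 'B', 'H'),
  stringsAsFactors = FALSE
)

# Order each pair alphabetically so that direction is ignored
first <- pmin(toy_edges$ID1, toy_edges$ID2)
second <- pmax(toy_edges$ID1, toy_edges$ID2)
toy_edges$dyadID <- paste(first, second, sep = '-')
```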


Once we have generated dyad ids, we can measure consecutive relocations, start and end relocation, etc. **Note:** since each edge appears twice (A-B and B-A), you will need to either keep the unique combinations of timegroup and dyadID or divide counts by 2.
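
Both ways of handling the duplicated edges can be compared on a toy edge list. This is a minimal sketch with invented values. Halving the counts only gives the right answer when every edge really is present in both directions.

```r
library(data.table)

# Toy edge list: dyad A-B seen in timegroups 1 and 2, listed in both directions
toy <- data.table(
  timegroup = c(1L, 1L, 2L, 2L),
  ID1 = c('A', 'B', 'A', 'B'),
  ID2 = c('B', 'A', 'B', 'A'),
  dyadID = 'A-B'
)

# Option 1: keep one row per timegroup and dyad, then count
n_unique <- unique(toy, by = c('timegroup', 'dyadID'))[, .N, by = dyadID]

# Option 2: count all rows, then halve
n_halved <- toy[, .(N = .N / 2), by = dyadID]
```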


### 4. Dyad stats

```{r, eval = TRUE}
# Get the unique dyads by timegroup
dyads <- unique(edges, by = c('timegroup', 'dyadID'))
# Set the order of the rows
setorder(dyads, timegroup)
## Count number of timegroups dyads are observed together
dyads[, nObs := .N, by = .(dyadID)]
## Count consecutive relocations together
# Shift the timegroup within dyadIDs
dyads[, shifttimegrp := shift(timegroup, 1), by = dyadID]
# Difference between consecutive timegroups for each dyadID
# where difftimegrp == 1, the dyads remained together in consecutive timegroups
dyads[, difftimegrp := timegroup - shifttimegrp]
# Run id of diff timegroups
dyads[, runid := rleid(difftimegrp), by = dyadID]
# N consecutive observations of dyadIDs
dyads[, runCount := fifelse(difftimegrp == 1, .N, NA_integer_), by = .(runid, dyadID)]
## Start and end of consecutive relocations for each dyad
# Don't consider rows that aren't part of a run (NA for runCount)
dyads[!is.na(runCount), start := fifelse(timegroup == min(timegroup), TRUE, NA), by = .(runid, dyadID)]
dyads[!is.na(runCount), end := fifelse(timegroup == max(timegroup), TRUE, NA), by = .(runid, dyadID)]
## Example output
dyads[dyadID == 'B-H',
.(timegroup, nObs, shifttimegrp, difftimegrp, runid, runCount, start, end)]
```
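
To see what the run columns contain, the same steps can be applied to a single toy dyad. The dyad and its timegroups are invented for illustration.

```r
library(data.table)

# One toy dyad observed in timegroups 1 to 3, 7 to 8 and 12
toy <- data.table(dyadID = 'A-B', timegroup = c(1L, 2L, 3L, 7L, 8L, 12L))

toy[, shifttimegrp := shift(timegroup, 1), by = dyadID]
toy[, difftimegrp := timegroup - shifttimegrp]
toy[, runid := rleid(difftimegrp), by = dyadID]
toy[, runCount := fifelse(difftimegrp == 1, .N, NA_integer_), by = .(runid, dyadID)]

toy[, .(timegroup, difftimegrp, runid, runCount)]
```

In this toy table, `difftimegrp` is NA, 1, 1, 4, 1, 4 and `runCount` is NA, 2, 2, NA, 1, NA. The first row of each run has no preceding consecutive timegroup, so `runCount` counts the consecutive steps within a run rather than the number of observations in it.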

<!-- mean xy, todo -->

13 changes: 8 additions & 5 deletions vignettes/using-in-sna.Rmd
@@ -1,8 +1,8 @@
---
title: "Using spatsoc in social network analysis"
title: "Using spatsoc in social network analysis - grouping functions"
author: "Alec Robitaille, Quinn Webber and Eric Vander Wal"
date: "`r Sys.Date()`"
output:
output:
rmarkdown::html_vignette:
number_sections: false
toc: false
Expand Down Expand Up @@ -33,6 +33,9 @@ Data stream randomization is performed using the `randomizations` function.

Group by individual matrices are generated using the `get_gbi` function.


**Note**: edge list generating functions are also available and are described in the vignette [Using edge list generating functions and dyad_id](http://spatsoc.robitalec.ca/articles/using-edge-and-dyad.html).

# Generate gambit of the group data
spatsoc provides users with one temporal (`group_times`) and three spatial (`group_pts`, `group_lines`, `group_polys`) functions to generate gambit of the group data from GPS relocations. Users can consider spatial grouping at three different scales combined with an appropriate temporal grouping threshold. The gambit of the group data is then used to generate a group by individual matrix and build the network.

Expand Down Expand Up @@ -138,7 +141,7 @@ areaDT <- group_polys(
```

# Build observed network
Once we've created groups using `group_times` and one of the spatial grouping functions, we can generate a group by individual matrix.
Once we've generated groups using `group_times` and one of the spatial grouping functions, we can generate a group by individual matrix.

The following code chunk showing `get_gbi` can be used for outputs from any of `group_pts`, `group_lines` or `group_polys(area = FALSE)`. For the purpose of this vignette however, we will consider the outputs from `group_pts` ([2. a)](#a-group_pts)) for the following code chunk.

Expand Down Expand Up @@ -183,7 +186,7 @@ Note: the `coords` argument is only required for trajectory type randomization,


## 5. a) `type = 'step'`
`'step'` randomizes identities of relocations between individuals within each time step. The `datetime` argument expects an integer group created by `group_times`. The `group` argument expects the column name of the group generated from the spatial grouping functions.
`'step'` randomizes identities of relocations between individuals within each time step. The `datetime` argument expects an integer group generated by `group_times`. The `group` argument expects the column name of the group generated from the spatial grouping functions.

Four columns are returned when `type = 'step'` along with `id`, `datetime` and `splitBy` columns:

Expand Down Expand Up @@ -409,7 +412,7 @@ observed <- data.table(
## 8. Calculate random network metrics
With the list of random networks from [6.](#build-random-network), we can generate a list of graphs with `igraph::graph.adjacency` (for example) and calculate random network metrics.

This example uses the `netLs` created by [6. a)](#a-type-step-1) which was split by year and iteration.
This example uses the `netLs` generated by [6. a)](#a-type-step-1) which was split by year and iteration.

```{r}
## Generate graph and calculate network metrics
Expand Down
