add vignette describing using edge list and dyad id functions #14

Merged
merged 35 commits into from
Mar 25, 2020
a3407be
update original using-in-sna vignette's title
robitalec Apr 27, 2019
1a75076
fst using edge list functions vignette
robitalec Apr 27, 2019
10a0354
add link to edge vignette in using in vignette
robitalec Apr 27, 2019
5617270
add links to using edge in faq
robitalec Apr 27, 2019
bd6149d
fix urls
robitalec Apr 27, 2019
cb2ebeb
add assignment to edge list examples
robitalec Apr 27, 2019
f0cce91
add edge intro
robitalec Apr 27, 2019
0d87ae2
add edge prep
robitalec Apr 27, 2019
bde3459
add edge_dist usage
robitalec Apr 27, 2019
68dad49
add edge_nn usage
robitalec Apr 27, 2019
0e1c39c
add igraph example
robitalec Apr 27, 2019
007e9db
add returnDist arg to vignette
robitalec Jun 6, 2019
478e56e
add returnDist arg to edge_nn in vignette
robitalec Jul 10, 2019
1d1798d
fix chunk opts
robitalec Jul 10, 2019
512e9af
add dyad stats chunk to add
robitalec Mar 25, 2020
b2a35e1
update title
robitalec Mar 25, 2020
05c4b67
rm dyad stats for now
robitalec Mar 25, 2020
bd8fefc
drop extra packages
robitalec Mar 25, 2020
4f9c19f
minor edits
robitalec Mar 25, 2020
b2a0971
fst dyad section
robitalec Mar 25, 2020
59191cd
rename file
robitalec Mar 25, 2020
0b9ad17
minor faq adjustments
robitalec Mar 25, 2020
76cce33
add back dyad stats
robitalec Mar 25, 2020
9f81d8c
fix unique
robitalec Mar 25, 2020
e97f940
calc n observations of each dyad
robitalec Mar 25, 2020
39a17ba
calc consecutive relocations
robitalec Mar 25, 2020
46db4fb
set knitr opts
robitalec Mar 25, 2020
7d6b2d4
rm scrap dyad stats
robitalec Mar 25, 2020
1673746
fix header levels and chunk options
robitalec Mar 25, 2020
5147989
next steps
robitalec Mar 25, 2020
4f85e06
flag start and end for each run
robitalec Mar 25, 2020
2048278
leave mean xy as todo
robitalec Mar 25, 2020
c1b4f71
update _pkgdown.yml
robitalec Mar 25, 2020
aa87f5b
print an example output
robitalec Mar 25, 2020
e962130
Merge branch 'master' into vignette/edge-lists
robitalec Mar 25, 2020
4 changes: 2 additions & 2 deletions R/edge_nn.R
@@ -71,11 +71,11 @@
#' group_times(DT, datetime = 'datetime', threshold = '20 minutes')
#'
#' # Edge list generation
#' edge_nn(DT, id = 'ID', coords = c('X', 'Y'),
#' edges <- edge_nn(DT, id = 'ID', coords = c('X', 'Y'),
#' timegroup = 'timegroup')
#'
#' # Edge list generation using maximum distance threshold
#' edge_nn(DT, id = 'ID', coords = c('X', 'Y'),
#' edges <- edge_nn(DT, id = 'ID', coords = c('X', 'Y'),
#' timegroup = 'timegroup', threshold = 100)
#'
#' # Edge list generation, returning distance between nearest neighbours
3 changes: 3 additions & 0 deletions _pkgdown.yml
@@ -5,6 +5,7 @@ articles:
intro-spatsoc: intro-spatsoc.html
faq: faq.html
using-in-sna: using-in-sna.html
using-edge-and-dyad: using-edge-and-dyad.html


destination: ../spatsoc.gitlab.io/public
@@ -69,6 +70,8 @@ navbar:
href: articles/faq.html
- text: Using spatsoc in social network analysis
href: articles/using-in-sna.html
- text: Using edge list and dyad id functions
href: articles/using-edge-and-dyad.html
news:
text: "News"
href: news/index.html
4 changes: 2 additions & 2 deletions man/edge_nn.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

23 changes: 17 additions & 6 deletions vignettes/faq.Rmd
@@ -19,7 +19,7 @@ knitr::opts_chunk$set(message = FALSE,
echo = TRUE)
```

spatsoc is an R package for detecting spatial and temporal groups in GPS relocations. It can be used to build proximity-based social networks using gambit-of-the-group format and edge-lists. . In addition, the randomization function provides data-stream randomization methods suitable for GPS data.
spatsoc is an R package for detecting spatial and temporal groups in GPS relocations. It can be used to build proximity-based social networks using gambit-of-the-group format and edge-lists. In addition, the randomization function provides data-stream randomization methods suitable for GPS data.

# Usage
`spatsoc` leverages `data.table` to modify by reference and iteratively work on subsets of the input data. The first input for all functions in `spatsoc` is `DT`, an input `data.table`. If your data is a `data.frame`, you can convert it by reference using `setDT(DF)`.
@@ -58,7 +58,7 @@ knitr::kable(DT[order(group, timegroup)][1:5, .(ID, X, Y, datetime, timegroup, g


## Social network analysis
See the vignette about [using spatsoc in social network analysis](http://spatsoc.gitlab.io/articles/using-in-sna.html).
See the vignette about [using spatsoc in social network analysis](http://spatsoc.robitalec.ca/articles/using-in-sna.html).


# Installation
@@ -256,6 +256,8 @@ group_polys(

This is the non-chain rule implementation similar to `group_pts`. Edges are defined by the distance threshold and NAs are returned for individuals within each timegroup if they are not within the threshold distance of any other individual (if `fillNA` is TRUE).

**See the vignette [Using edge list generating functions and dyad_id](http://spatsoc.robitalec.ca/articles/using-edge-and-dyad.html) for details about the `edge_dist` function.**

## edge_nn
`edge_nn(DT = NULL, id = NULL, coords = NULL, timegroup = NULL, splitBy = NULL, threshold = NULL)`

@@ -269,6 +271,8 @@ This is the non-chain rule implementation similar to `group_pts`. Edges are defi
This function can be used to generate edge lists defined either by nearest neighbour or by nearest neighbour with a maximum distance. NAs are returned for the nearest neighbour if an individual was alone in a timegroup (and/or splitBy) or if the distance between an individual and its nearest neighbour is greater than the threshold.


**See the vignette [Using edge list generating functions and dyad_id](http://spatsoc.robitalec.ca/articles/using-edge-and-dyad.html) for details about the `edge_nn` function.**

## randomizations
`randomizations(DT, type, id, datetime, splitBy, iterations)`

@@ -280,27 +284,34 @@ This function can be used to generate edge lists defined either by nearest neigh
* `iterations`: The number of iterations to randomize


**See the vignette [Using spatsoc in social network analysis](http://spatsoc.gitlab.io/articles/using-in-sna.html) for details about the `randomizations` function (specifically the section 'Data stream randomization')**
**See the vignette [Using spatsoc in social network analysis](http://spatsoc.robitalec.ca/articles/using-in-sna.html) for details about the `randomizations` function (specifically the section 'Data stream randomization')**


# Package design
## Don't I need to reassign to save the output?

(Almost) all functions in `spatsoc` use data.table's modify-by-reference to reduce recopying large datasets and improve performance. The exceptions are `group_polys(area = TRUE)` and `randomizations`.
(Almost) all functions in `spatsoc` use data.table's modify-by-reference to reduce recopying large datasets and improve performance. The exceptions are `group_polys(area = TRUE)`, `randomizations` and the edge list generating functions `edge_dist` and `edge_nn`.
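
A minimal sketch of the difference, using only `data.table` (the toy table and its column names are assumptions for illustration, not spatsoc data):

```r
library(data.table)

# Toy data; the column names are hypothetical
toyDT <- data.table(ID = c('A', 'B', 'C'), X = c(1, 2, 3))

# Modify by reference: toyDT itself gains a column, no reassignment needed
toyDT[, doubled := X * 2]

# An operation that returns a new object must be assigned to keep the result
subsetDT <- toyDT[X > 1]
```

The same pattern applies to the exceptions listed above: their output only persists if it is assigned, for example `edges <- edge_dist(...)`.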


## Why does a function print the result, but columns aren't added to my DT?

Check that your `data.table` has columns allocated (with `data.table::truelength`) and if not, use `data.table::setDT`. This can happen if you are reading your data from `RDS` or `RData` files. [See here.](https://cran.r-project.org/package=data.table/vignettes/datatable-faq.html#reading-data.table-from-rds-or-rdata-file)
Check that your `data.table` has columns allocated (with `data.table::truelength`) and if not, use `data.table::setDT` or `data.table::alloc.col`. This can happen if you are reading your data from `RDS` or `RData` files. [See here.](https://cran.r-project.org/package=data.table/vignettes/datatable-faq.html#reading-data.table-from-rds-or-rdata-file)

```{r alloc}
```{r setdt}
if (truelength(DT) == 0) {
setDT(DT)
}
# then go to spatsoc
group_times(DT, datetime = 'datetime', threshold = '5 minutes')
```

or simply:

```{r alloc}
DT <- readRDS('path/to/data.Rds')
alloc.col(DT)
```




171 changes: 171 additions & 0 deletions vignettes/using-edge-and-dyad.Rmd
@@ -0,0 +1,171 @@
---
title: "Using edge list generating functions and dyad_id"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Using edge list generating functions and dyad_id}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
eval = FALSE,
echo = TRUE,
comment = "#>"
)
```



`spatsoc` can be used in social network analysis to generate edge lists from GPS relocation data.


Edge lists are generated using either the `edge_dist` or the `edge_nn` function.


**Note**: The grouping functions and their application in social network analysis are further described in the vignette [Using spatsoc in social network analysis - grouping functions](http://spatsoc.robitalec.ca/articles/using-in-sna.html).


## Generate edge lists
spatsoc provides users with one temporal (`group_times`) and two edge list generating functions (`edge_dist`, `edge_nn`) to build edge lists from GPS relocations. Edges can be defined by the spatial proximity between individuals (with `edge_dist`), by nearest neighbour (with `edge_nn`) or by nearest neighbour with a maximum distance (with `edge_nn` and its `threshold` argument). The edge lists can be used directly by the animal social network package `asnipe` to generate networks.

### 1. Load packages and prepare data
`spatsoc` expects a `data.table` for all `DT` arguments and date time columns to be formatted `POSIXct`.

```{r, message = FALSE, warning = FALSE, eval = TRUE}
## Load packages
library(spatsoc)
library(data.table)

## Read data as a data.table
DT <- fread(system.file("extdata", "DT.csv", package = "spatsoc"))

## Cast datetime column to POSIXct
DT[, datetime := as.POSIXct(datetime)]
```


Next, we will group relocations temporally with `group_times` and generate edge lists with one of `edge_dist` or `edge_nn`. Note: these are mutually exclusive, so only select one edge list generating function at a time.

### 2. a) `edge_dist`

`edge_dist` generates distance-based edge lists, where relocations in each timegroup are considered edges if they are within the spatial distance defined by the user with the `threshold` argument. Relevant temporal and spatial distance thresholds depend on the species and study system. In this case, relocations within 5 minutes and 100 meters are considered edges.

This is the non-chain rule implementation similar to `group_pts`. Edges are defined by the distance threshold and NAs are returned for individuals within each timegroup if they are not within the threshold distance of any other individual (if `fillNA` is TRUE).

Optionally, `edge_dist` can return the distances between individuals (less than the threshold) in a column named 'distance' with argument `returnDist = TRUE`.

```{r, eval = TRUE}
# Temporal groups
group_times(DT, datetime = 'datetime', threshold = '5 minutes')

# Edge list generation
edges <- edge_dist(
DT,
threshold = 100,
id = 'ID',
coords = c('X', 'Y'),
timegroup = 'timegroup',
returnDist = TRUE,
fillNA = TRUE
)
```
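
To illustrate the idea of a distance threshold, here is a minimal sketch for a single timegroup using only base R and `data.table`. It is not spatsoc's implementation: the toy coordinates, the object names and the strict `<` comparison are all assumptions made for this example.

```r
library(data.table)

# Toy relocations for a single timegroup; values are hypothetical
pts <- data.table(ID = c('A', 'B', 'C'),
                  X = c(0, 30, 500),
                  Y = c(0, 40, 500))

# Pairwise Euclidean distances between all individuals
distMat <- as.matrix(dist(pts[, .(X, Y)]))

# Keep ordered pairs within the threshold, excluding self-pairs
threshold <- 100
idx <- which(distMat < threshold & row(distMat) != col(distMat),
             arr.ind = TRUE)

toyEdges <- data.table(ID1 = pts$ID[idx[, 'row']],
                       ID2 = pts$ID[idx[, 'col']],
                       distance = distMat[idx])
```

Here A and B are 50 units apart, so the sketch yields two rows (B-A and A-B), while C is farther than the threshold from both and has no edges.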

### 2. b) `edge_nn`

`edge_nn` generates nearest neighbour based edge lists, where each individual is connected to their nearest neighbour. It can be used to generate edge lists defined either by nearest neighbour or by nearest neighbour with a maximum distance. As with the grouping functions and `edge_dist`, temporal and spatial thresholds depend on species and study system.

NAs are returned for the nearest neighbour if an individual was alone in a timegroup (and/or splitBy) or if the distance between an individual and its nearest neighbour is greater than the threshold.

Optionally, `edge_nn` can return the distances between individuals (less than the threshold) in a column named 'distance' with argument `returnDist = TRUE`.

```{r, eval = FALSE}
# Temporal groups
group_times(DT, datetime = 'datetime', threshold = '5 minutes')

# Edge list generation
edges <- edge_nn(
DT,
id = 'ID',
coords = c('X', 'Y'),
timegroup = 'timegroup'
)

# Edge list generation using maximum distance threshold
edges <- edge_nn(
DT,
id = 'ID',
coords = c('X', 'Y'),
timegroup = 'timegroup',
threshold = 100
)

# Edge list generation using maximum distance threshold, returning distances
edges <- edge_nn(
DT,
id = 'ID',
coords = c('X', 'Y'),
timegroup = 'timegroup',
threshold = 100,
returnDist = TRUE
)

```
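
The nearest neighbour rule with a maximum distance can be sketched for a single timegroup with base R and `data.table`. This is an illustration only, not how `edge_nn` is implemented; the toy coordinates, the object names and the strict `<` comparison are assumptions.

```r
library(data.table)

# Toy relocations for a single timegroup; values are hypothetical
pts <- data.table(ID = c('A', 'B', 'C'),
                  X = c(0, 30, 500),
                  Y = c(0, 40, 500))

distMat <- as.matrix(dist(pts[, .(X, Y)]))

# Ignore self-distances when searching for the nearest neighbour
diag(distMat) <- NA

# Index of, and distance to, each individual's nearest neighbour
nnIdx <- unname(apply(distMat, 1, which.min))
nnDist <- unname(apply(distMat, 1, min, na.rm = TRUE))

# Nearest neighbours farther than the threshold are set to NA
threshold <- 100
toyNN <- data.table(ID = pts$ID,
                    NN = fifelse(nnDist < threshold,
                                 pts$ID[nnIdx],
                                 NA_character_),
                    distance = nnDist)
```

In this toy case A and B are each other's nearest neighbour, while C's nearest neighbour is beyond the threshold, so its `NN` is NA.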


## Dyads

### 3. `dyad_id`

The function `dyad_id` can be used to generate a unique, undirected dyad identifier for edge lists.

```{r, eval = TRUE}
# In this case, using the edges generated in 2. a) edge_dist
dyad_id(edges, id1 = 'ID1', id2 = 'ID2')
```


Once we have generated dyad ids, we can measure consecutive relocations, start and end relocation, etc. **Note:** since each edge appears twice (A-B and B-A), you will need to either keep only the unique combinations of timegroup and dyadID or divide counts by 2.
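
As a rough sketch of what an undirected identifier means, the two ids of each edge can be sorted before pasting them together. This is not necessarily how `dyad_id` works internally; the toy edge list and the `toyDyad` column name are assumptions for illustration.

```r
library(data.table)

# Toy edge list with the A-B edge listed in both directions
toyEdges <- data.table(ID1 = c('A', 'B', 'C'),
                       ID2 = c('B', 'A', 'A'))

# Order each pair alphabetically so that A-B and B-A share one identifier
toyEdges[, toyDyad := paste(pmin(ID1, ID2), pmax(ID1, ID2), sep = '-')]

# Dropping the duplicated direction leaves one row per dyad
toyUnique <- unique(toyEdges, by = 'toyDyad')
```

Both A-B rows receive the same identifier, so `unique` keeps one of them.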


### 4. Dyad stats

```{r, eval = TRUE}
# Get the unique dyads by timegroup
dyads <- unique(edges, by = c('timegroup', 'dyadID'))

# Set the order of the rows
setorder(dyads, timegroup)

## Count number of timegroups dyads are observed together
dyads[, nObs := .N, by = .(dyadID)]

## Count consecutive relocations together
# Shift the timegroup within dyadIDs
dyads[, shifttimegrp := shift(timegroup, 1), by = dyadID]

# Difference between consecutive timegroups for each dyadID
# where difftimegrp == 1, the dyads remained together in consecutive timegroups
dyads[, difftimegrp := timegroup - shifttimegrp]


# Run id of diff timegroups
dyads[, runid := rleid(difftimegrp), by = dyadID]

# N consecutive observations of dyadIDs
dyads[, runCount := fifelse(difftimegrp == 1, .N, NA_integer_), by = .(runid, dyadID)]

## Start and end of consecutive relocations for each dyad
# Don't consider rows that aren't part of a run (NA for runCount)
dyads[!is.na(runCount), start := fifelse(timegroup == min(timegroup), TRUE, NA), by = .(runid, dyadID)]

dyads[!is.na(runCount), end := fifelse(timegroup == max(timegroup), TRUE, NA), by = .(runid, dyadID)]

## Example output
dyads[dyadID == 'B-H',
.(timegroup, nObs, shifttimegrp, difftimegrp, runid, runCount, start, end)]
```
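
To see how the run logic behaves, the same steps can be applied to a small toy table. The timegroups below are hypothetical: one dyad observed together in timegroups 1, 2, 3, 7 and 8.

```r
library(data.table)

# Hypothetical timegroups in which one dyad was observed together
toy <- data.table(dyadID = 'A-B', timegroup = c(1L, 2L, 3L, 7L, 8L))

# Difference between consecutive timegroups: NA, 1, 1, 4, 1
toy[, difftimegrp := timegroup - shift(timegroup, 1), by = dyadID]

# Run id of the differences: 1, 2, 2, 3, 4
toy[, runid := rleid(difftimegrp), by = dyadID]

# Count rows in each run where the difference is 1: NA, 2, 2, NA, 1
toy[, runCount := fifelse(difftimegrp == 1, .N, NA_integer_),
    by = .(runid, dyadID)]
```

Note that the first row of each stretch (timegroups 1 and 7 here) has a difference that is NA or greater than 1, so it is not flagged as part of the run. With this approach `runCount` counts consecutive steps rather than observations: the stretch 1, 2, 3 gives a `runCount` of 2.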

<!-- mean xy, todo -->

13 changes: 8 additions & 5 deletions vignettes/using-in-sna.Rmd
@@ -1,8 +1,8 @@
---
title: "Using spatsoc in social network analysis"
title: "Using spatsoc in social network analysis - grouping functions"
author: "Alec Robitaille, Quinn Webber and Eric Vander Wal"
date: "`r Sys.Date()`"
output:
output:
rmarkdown::html_vignette:
number_sections: false
toc: false
@@ -33,6 +33,9 @@ Data stream randomization is performed using the `randomizations` function.

Group by individual matrices are generated using the `get_gbi` function.


**Note**: edge list generating functions are also available and are described in the vignette [Using edge list generating functions and dyad_id](http://spatsoc.robitalec.ca/articles/using-edge-and-dyad.html).

# Generate gambit of the group data
spatsoc provides users with one temporal (`group_times`) and three spatial (`group_pts`, `group_lines`, `group_polys`) functions to generate gambit of the group data from GPS relocations. Users can consider spatial grouping at three different scales combined with an appropriate temporal grouping threshold. The gambit of the group data is then used to generate a group by individual matrix and build the network.

@@ -138,7 +141,7 @@ areaDT <- group_polys(
```

# Build observed network
Once we've created groups using `group_times` and one of the spatial grouping functions, we can generate a group by individual matrix.
Once we've generated groups using `group_times` and one of the spatial grouping functions, we can generate a group by individual matrix.

The following code chunk showing `get_gbi` can be used for outputs from any of `group_pts`, `group_lines` or `group_polys(area = FALSE)`. For the purpose of this vignette however, we will consider the outputs from `group_pts` ([2. a)](#a-group_pts)) for the following code chunk.

@@ -183,7 +186,7 @@ Note: the `coords` argument is only required for trajectory type randomization,


## 5. a) `type = 'step'`
`'step'` randomizes identities of relocations between individuals within each time step. The `datetime` argument expects an integer group created by `group_times`. The `group` argument expects the column name of the group generated from the spatial grouping functions.
`'step'` randomizes identities of relocations between individuals within each time step. The `datetime` argument expects an integer group generated by `group_times`. The `group` argument expects the column name of the group generated from the spatial grouping functions.

Four columns are returned when `type = 'step'` along with `id`, `datetime` and `splitBy` columns:

@@ -409,7 +412,7 @@ observed <- data.table(
## 8. Calculate random network metrics
With the list of random networks from [6.](#build-random-network), we can generate a list of graphs with `igraph::graph.adjacency` (for example) and calculate random network metrics.

This example uses the `netLs` created by [6. a)](#a-type-step-1) which was split by year and iteration.
This example uses the `netLs` generated by [6. a)](#a-type-step-1) which was split by year and iteration.

```{r}
## Generate graph and calculate network metrics