
Accuracy / confidence interval? #15

Closed
rrmn opened this issue May 21, 2019 · 8 comments
@rrmn

rrmn commented May 21, 2019

Hey guys, fantastic package. Love the use of data.table.

One question: I am working with semi-processed GPS data that has an accuracy variable attached. This accuracy variable indicates the 95% confidence interval (in meters radius) around the unprojected lon / lat coordinates.
For example, with coordinates of c(5, 42) and acc = 15, the subject was somewhere within a 15-meter radius of the specified coordinates. This accuracy can also change with every observation of every subject. Essentially, this can be thought of as extending the threshold for one subject-observation to thresh <- thresh + 2 * acc.

Is there a way to deal with this complication in your package?
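For a pair of observations, that adjustment generalizes to summing the two accuracies. A minimal sketch (the accuracy and threshold values here are made up for illustration):

```r
thresh <- 50   # base spatial threshold in meters
acc1 <- 15     # accuracy radius of observation 1, in meters
acc2 <- 20     # accuracy radius of observation 2, in meters

# Pair-specific threshold: in the worst case, the two points could be
# up to acc1 + acc2 meters closer than their measured distance suggests
adjThresh <- thresh + acc1 + acc2
```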

@rrmn rrmn changed the title Accuracy parameters? Accuracy / confidence interval? May 21, 2019
@rrmn
Author

rrmn commented May 29, 2019

A (crude and somewhat contrived) example could be:

set.seed(1702)

# Toy coordinates (note: for real lon / lat data, distances should be
# computed in meters, e.g. on projected coordinates, not in degrees)
x <- runif(10, -2, 2)
y <- runif(10, 42, 52)

# Accuracy in meters radius
accu <- runif(10, 3, 50)

# Pairwise distances between points
distMatrix <- as.matrix(stats::dist(cbind(x, y)))

# Pairwise sums of accuracies, with a zero diagonal (a point has no
# accuracy adjustment relative to itself)
m <- outer(accu, accu, "+")
diag(m) <- 0

# Accuracy-adjusted distances
distMatrix <- distMatrix + m

# Edges where the adjusted distance is within the threshold
graphAdj <- igraph::graph_from_adjacency_matrix(distMatrix <= 50)

igraph::plot.igraph(graphAdj)


@robitalec
Member

Hey @RomanAbashin!
Thanks, glad to hear you like it.

Interesting, so you'd hope to combine the accuracy of each pair of individuals and add that to the distance matrix before applying the threshold? This is a ± accuracy, right?

What kind of GPS data is this?

@robitalec
Member

robitalec commented May 31, 2019

I have another idea, what about combining your accuracy columns with edge_dist?

I'm hoping to (optionally) return the distances between individuals with edge_dist soon (#17).

timegroup ID1 ID2 Distance
1 G B 19.2
1 H E 2.3
1 B G 11.4
1 E H 50.3
2 H E 82.1

Then you could merge your accuracy values, e.g.:

ID accuracy
B 9.6
E 12.3
G 21.1
H 30.2

generating:

timegroup ID1 ID2 Distance accuracy1 accuracy2
1 G B 19.2 21.1 9.6
1 H E 2.3 30.2 12.3
1 B G 11.4 9.6 21.1
1 E H 50.3 12.3 30.2
2 H E 82.1 30.2 12.3

Finally, you could sum the two accuracies and check whether the distance is within the accuracy-adjusted threshold. To avoid missing any candidate pairs, you could set the threshold in the edge_dist call to the maximum pairwise accuracy (51.3 for G and H above) plus the actual threshold (say 100 m), for a total of 151.3 m. This way you'd catch all potential edges.

Does that make sense?

@robitalec robitalec added type: enhancement new features, improvements type: support labels May 31, 2019
@rrmn
Author

rrmn commented Jun 5, 2019

Hi @robitalec ,
sorry for the late reply, had to wrap my head around edge_dist first.

The data I'm working with is basically dumped smartphone location data. It looks roughly like this:

   user   datetime            accuracy   lon   lat
   <chr>  <dttm>                 <dbl> <dbl> <dbl>
 1 User_A 2018-07-29 12:17:26     15.0  4.13  45.4
 2 User_A 2018-07-29 12:17:26     15.0  4.13  45.4
 3 User_A 2018-07-29 12:17:27     15.0  4.13  45.4
 4 User_A 2018-07-29 12:17:28     16.1  4.13  45.4
 5 User_A 2018-07-29 12:17:29     15.0  4.13  45.4
 6 User_A 2018-07-29 12:17:30     13.9  4.13  45.4
 7 User_A 2018-07-29 12:17:31     13.9  4.13  45.4
 8 User_A 2018-07-29 12:17:32     12.9  4.13  45.4
 9 User_A 2018-07-29 12:17:33     12.9  4.13  45.4
10 User_A 2018-07-29 12:17:34     12.9  4.13  45.4

Let's take row 4: it means that, at 12:17:28, User A was somewhere in a radius of 16.1 meters around the lon / lat coordinates 4.13 / 45.4.

As you see, every observation of each user's position has its own accuracy (= radius). Therefore, the method mentioned above (with a fixed accuracy parameter) would not work, right?

You are, however, right on the money with the end goal: I would like to build dyadic groups of users — but the varying accuracy of observations (which is inevitable due to different environmental and technological factors) is a pain in the behind.

@robitalec
Member

@RomanAbashin, if every row has a different accuracy measurement, then you could do the same join I describe above, making sure to include the timegroup in the join.

To update you, I just merged the optional distance return for edge_dist. (Update with devtools)

e.g.:

library(spatsoc)
library(data.table)

# Read package example data
DT <- fread(system.file("extdata", "DT.csv", package = "spatsoc"))

# Cast the character column to POSIXct
DT[, datetime := as.POSIXct(datetime, tz = 'UTC')]

# Temporal grouping
group_times(DT, datetime = 'datetime', threshold = '20 minutes')


# !! --- Simulate an accuracy column that differs for each row (individual and fix) ---
DT[, accuracy := runif(.N, 3, 50)]

# Edge list generation
edges <- edge_dist(
  DT,
  threshold = 100,
  id = 'ID',
  coords = c('X', 'Y'),
  timegroup = 'timegroup',
  returnDist = TRUE,
  fillNA = TRUE
)

# !! --- Merge ---
m1 <- merge(
  edges,
  DT[, .(ID, timegroup, accuracy1 = accuracy)],
  by.x = c('ID1', 'timegroup'),
  by.y = c('ID', 'timegroup')
)

m2 <- merge(
  m1,
  DT[, .(ID, timegroup, accuracy2 = accuracy)],
  by.x = c('ID2', 'timegroup'),
  by.y = c('ID', 'timegroup')
)
generating:

ID2 timegroup ID1 distance accuracy1 accuracy2
B 1 G 5.783 41.32 39.95
E 1 H 65.062 43.15 25.62
G 1 B 5.783 39.95 41.32
H 1 E 65.062 25.62 43.15

Then you could combine your accuracy measurements with the distance between individuals. This is where selecting a threshold for your case comes in: set it to the maximum possible pairwise sum of accuracies plus the intended threshold, then subset by the accuracy-adjusted distances afterwards.
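That final subsetting step might be sketched as follows (assuming data.table, with the merged table rebuilt by hand from the output above and a hypothetical intended threshold of 100 m):

```r
library(data.table)

# Rebuild the merged edge list shown above
m2 <- data.table(
  ID2       = c('B', 'E', 'G', 'H'),
  timegroup = 1,
  ID1       = c('G', 'H', 'B', 'E'),
  distance  = c(5.783, 65.062, 5.783, 65.062),
  accuracy1 = c(41.32, 43.15, 39.95, 25.62),
  accuracy2 = c(39.95, 25.62, 41.32, 43.15)
)

thresh <- 100  # intended spatial threshold in meters

# Accuracy-adjusted distance: in the worst case, the pair could have
# been accuracy1 + accuracy2 meters closer than measured
m2[, adjusted := distance - (accuracy1 + accuracy2)]

# Keep edges that could plausibly fall within the intended threshold
edges <- m2[adjusted <= thresh]
```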

Let me know how that sounds!

@rrmn
Author

rrmn commented Jun 7, 2019

@robitalec — Ha, this is fantastic! This is pretty much what I'm doing by hand with left joins and one run of an adapted distGeo() right now. Thank you a lot.

@rrmn
Author

rrmn commented Jun 7, 2019

With that in mind, would something like a time_dist make sense for #18?

@robitalec
Member

You're welcome.
Closing this, thanks for the great example for edge_dist!

(I'll respond there)
