
Accuracy / confidence interval? #15

Closed
rrmn opened this issue May 21, 2019 · 8 comments
@rrmn

rrmn commented May 21, 2019

Hey guys, fantastic package. Love the use of data.table.

One question: I am working with semi-processed GPS data that has an accuracy variable attached. This accuracy variable indicates the 95% confidence interval (in meters radius) around the unprojected lon / lat coordinates.
For example, with coordinates of c(5, 42) and acc = 15, the subject was somewhere within a 15-meter radius of the specified coordinates. This accuracy can also change with every observation of every subject. Essentially, this can be thought of as extending the threshold for one subject-observation to thresh <- thresh + 2 * acc.

Is there a way to deal with this complication in your package?
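For a pair of observations, that adjustment generalizes to summing the two accuracies. A minimal sketch (the accuracy and threshold values here are made up for illustration):

```r
thresh <- 50   # base spatial threshold in meters
acc1 <- 15     # accuracy radius of observation 1, in meters
acc2 <- 20     # accuracy radius of observation 2, in meters

# Pair-specific threshold: in the worst case, the two points could be
# up to acc1 + acc2 meters closer than their measured distance suggests
adjThresh <- thresh + acc1 + acc2
```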

@rrmn rrmn changed the title Accuracy parameters? Accuracy / confidence interval? May 21, 2019
@rrmn
Author

rrmn commented May 29, 2019

A (crude and somewhat contrived) example could be:

set.seed(1702)

# Toy coordinates (note: for real lon / lat data, distances should be
# computed in meters, e.g. on projected coordinates, not in degrees)
x <- runif(10, -2, 2)
y <- runif(10, 42, 52)

# Accuracy in meters radius
accu <- runif(10, 3, 50)

# Pairwise distances between points
distMatrix <- as.matrix(stats::dist(cbind(x, y)))

# Pairwise sums of accuracies, with a zero diagonal (a point has no
# accuracy adjustment relative to itself)
m <- outer(accu, accu, "+")
diag(m) <- 0

# Accuracy-adjusted distances
distMatrix <- distMatrix + m

# Edges where the adjusted distance is within the threshold
graphAdj <- igraph::graph_from_adjacency_matrix(distMatrix <= 50)

igraph::plot.igraph(graphAdj)


@robitalec
Member

Hey @RomanAbashin!
Thanks, glad to hear you like it.

Interesting, so you'd hope to combine the accuracy of each pair of individuals and add that to the distance matrix before applying the threshold? This is a ± accuracy, right?

What kind of GPS data is this?

@robitalec
Member

robitalec commented May 31, 2019

I have another idea, what about combining your accuracy columns with edge_dist?

I'm hoping to (optionally) return the distances between individuals with edge_dist soon (#17).

timegroup ID1 ID2 Distance
1 G B 19.2
1 H E 2.3
1 B G 11.4
1 E H 50.3
2 H E 82.1

Then you could merge your accuracy values, e.g.:

ID accuracy
B 9.6
E 12.3
G 21.1
H 30.2

generating:

timegroup ID1 ID2 Distance accuracy1 accuracy2
1 G B 19.2 21.1 9.6
1 H E 2.3 30.2 12.3
1 B G 11.4 9.6 21.1
1 E H 50.3 12.3 30.2
2 H E 82.1 30.2 12.3

Finally, you could sum the two accuracies and check whether the distance is within the accuracy-adjusted threshold. To avoid missing any candidate pairs, you could set the threshold in the edge_dist call to the maximum pairwise accuracy (51.3 for G and H above) plus the actual threshold (say 100 m), for a total of 151.3 m. This way you'd catch all potential edges.

Does that make sense?

@robitalec robitalec added type: enhancement new features, improvements type: support labels May 31, 2019
@rrmn
Author

rrmn commented Jun 5, 2019

Hi @robitalec ,
sorry for the late reply, had to wrap my head around edge_dist first.

The data I'm working with is basically dumped smartphone location data. It looks roughly like this:

   user   datetime            accuracy   lon   lat
   <chr>  <dttm>                 <dbl> <dbl> <dbl>
 1 User_A 2018-07-29 12:17:26     15.0  4.13  45.4
 2 User_A 2018-07-29 12:17:26     15.0  4.13  45.4
 3 User_A 2018-07-29 12:17:27     15.0  4.13  45.4
 4 User_A 2018-07-29 12:17:28     16.1  4.13  45.4
 5 User_A 2018-07-29 12:17:29     15.0  4.13  45.4
 6 User_A 2018-07-29 12:17:30     13.9  4.13  45.4
 7 User_A 2018-07-29 12:17:31     13.9  4.13  45.4
 8 User_A 2018-07-29 12:17:32     12.9  4.13  45.4
 9 User_A 2018-07-29 12:17:33     12.9  4.13  45.4
10 User_A 2018-07-29 12:17:34     12.9  4.13  45.4

Let's take row 4: it means that, at 12:17:28, User A was somewhere in a radius of 16.1 meters around the lon / lat coordinates 4.13 / 45.4.

As you see, every observation of each user's position has its own accuracy (= radius). Therefore, the method mentioned above (with a fixed accuracy parameter) would not work, right?

You are, however, right on the money with the end goal: I would like to build dyadic groups of users — but the varying accuracy of observations (which is inevitable due to different environmental and technological factors) is a pain in the behind.

@robitalec
Member

@RomanAbashin, if every row has a different accuracy measurement, then you could do the same join I describe above, making sure to include the timegroup in the join.

To update you, I just merged the optional distance return for edge_dist. (Update with devtools)

e.g.:

library(spatsoc)
library(data.table)

# Read package example data
DT <- fread(system.file("extdata", "DT.csv", package = "spatsoc"))

# Cast the character column to POSIXct
DT[, datetime := as.POSIXct(datetime, tz = 'UTC')]

# Temporal grouping
group_times(DT, datetime = 'datetime', threshold = '20 minutes')


# !! --- Simulate an accuracy column that differs for each row (individual and fix) ---
DT[, accuracy := runif(.N, 3, 50)]

# Edge list generation
edges <- edge_dist(
  DT,
  threshold = 100,
  id = 'ID',
  coords = c('X', 'Y'),
  timegroup = 'timegroup',
  returnDist = TRUE,
  fillNA = TRUE
)

# !! --- Merge ---
m1 <- merge(
  edges,
  DT[, .(ID, timegroup, accuracy1 = accuracy)],
  by.x = c('ID1', 'timegroup'),
  by.y = c('ID', 'timegroup')
)

m2 <- merge(
  m1,
  DT[, .(ID, timegroup, accuracy2 = accuracy)],
  by.x = c('ID2', 'timegroup'),
  by.y = c('ID', 'timegroup')
)
generating:

ID2 timegroup ID1 distance accuracy1 accuracy2
B 1 G 5.783 41.32 39.95
E 1 H 65.062 43.15 25.62
G 1 B 5.783 39.95 41.32
H 1 E 65.062 25.62 43.15

Then you could combine your accuracy measurements with the distance between individuals. This is where selecting a threshold for your case comes in: set it to the maximum possible pairwise sum of accuracies plus the intended threshold, then subset by the accuracy-adjusted distances afterwards.
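That final subsetting step might be sketched as follows (assuming data.table, with the merged table rebuilt by hand from the output above and a hypothetical intended threshold of 100 m):

```r
library(data.table)

# Rebuild the merged edge list shown above
m2 <- data.table(
  ID2       = c('B', 'E', 'G', 'H'),
  timegroup = 1,
  ID1       = c('G', 'H', 'B', 'E'),
  distance  = c(5.783, 65.062, 5.783, 65.062),
  accuracy1 = c(41.32, 43.15, 39.95, 25.62),
  accuracy2 = c(39.95, 25.62, 41.32, 43.15)
)

thresh <- 100  # intended spatial threshold in meters

# Accuracy-adjusted distance: in the worst case, the pair could have
# been accuracy1 + accuracy2 meters closer than measured
m2[, adjusted := distance - (accuracy1 + accuracy2)]

# Keep edges that could plausibly fall within the intended threshold
edges <- m2[adjusted <= thresh]
```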

Let me know how that sounds!

@rrmn
Author

rrmn commented Jun 7, 2019

@robitalec — Ha, this is fantastic! This is pretty much what I'm doing by hand with left joins and one run of an adapted distGeo() right now. Thank you a lot.

@rrmn
Author

rrmn commented Jun 7, 2019

With that in mind, would something like a time_dist make sense for #18?

@robitalec
Member

You're welcome.
Closing this, thanks for the great example for edge_dist!

(I'll respond there)
