Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use spatial indices for sharedPaths() #960

Merged
merged 1 commit into from
Jan 5, 2023

Conversation

jeffreyhanson
Copy link
Contributor

@jeffreyhanson jeffreyhanson commented Dec 26, 2022

Hi @rhijmans,

Please ignore this PR until you're back to work after the holiday break. I'm just posting this now, so I don't forget stuff while it's all fresh in my mind.

I've been playing around with the terra::sharedPaths() function and it looks extremely useful - thanks so much for adding it! I was wondering if it would be possible to increase its performance, and I noticed that the C++ code doesn't use spatial indices. So, I had a go at updating the C++ code to use spatial indices for the terra::sharedPaths() calcations. I think I've got it working correctly and it seems that it can now run much faster too, so I thought I'd submit the changes as a PR.

Here's a brief description of the changes.

  1. I've updated the C++ SpatVector::shared_paths() methods (defined in geos_methods.cpp and spat_vector.h) to have an additional parameter (bool index). This parameter indicates whether spatial indices should be used or not during processing. This was done to ensure consistency with other functions in the codebase (e.g., SpatVector::relate(), https://github.com/rspatial/terra/blob/master/src/geos_methods.cpp#L1319) which have an index parameter to indicate if spatial indices should be used.
  2. I've updated the C++ SpatVector::shared_paths() methods (defined in geos_methods.cpp) to use spatial indices (via GEOSSTRtree_query_r) when the new index parameter is true. The implementation is largely based on SpatVector::relate() (defined in geos_methods.cpp). If the index parameter is false, then the spatial indices are not used for processing and it basically uses the same C++ code as the current version of the package.
  3. I've moved the callbck function to the top of the geos_method.cpp file. This is so that the SpatVector::shared_paths() methods can use it. I was getting compilation errors before I moved this function.
  4. I've updated the R sharedPaths() function (defined in geom.R) to use the spatial indices by default.

Also, to help verify code correctness, I've written a short script (see below). This script compares results when using the spatial indices for processing to results when not using the spatial indices. These tests are based on examples for sharedPaths(). Also, I've added a test to verify that the changes improve performance -- otherwise the PR wouldn't really be worth merging -- using a larger spatial dataset. On my computer, the run time decreases from 6 seconds to 0.5 seconds when using the spatial indices.

Please let me know if there's any further changes I can make to help get this PR accepted? I'm also happy to explain any of the changes if anythings not clear?


Test script

# load package
## load developmental version of terra
devtools::load_all()
## load package with example data
library(prioritizrdata)
## load packages for comparing geometries
library(sf)
library(testthat)

# define function for running sharedPaths with index = FALSE,
# this corresponds to current implementation of terra::sharedPaths()
sharedPaths_old <- function(x, y=NULL) {
	if (is.null(y)) {
		x@ptr <- x@ptr$shared_paths(FALSE)
	} else {
		x@ptr <- x@ptr$shared_paths2(y@ptr, FALSE)
	}
	messages(x, "sharedPaths")
}

# load test data
## built in terra dataset
f <- system.file("ex/lux.shp", package = "terra")
v1 <- vect(f)
## larger spatial dataset with lots of shared paths
data(tas_pu, package = "prioritizrdata")
v2 <- terra::vect(tas_pu)

# perform tests
## note that we convert the vect objects to SpatialPolygonsDataFrame
## objects so we can test for differences between outputs

# Test 1: ensure that sharedLength(x) yields same result
## N.B. this is from the terra documentation
test1_x <- sharedPaths_old(v1)
test1_y <- sharedPaths(v1)

print('test1_x')
print(test1_x)
print('test1_y')
print(test1_y)

expect_equal(
  sf::as_Spatial(sf::st_as_sf(test1_x)),
  sf::as_Spatial(sf::st_as_sf(test1_y))
)

# Test 2: ensure that sharedLength(x, y) yields same result
test2_x <- sharedPaths_old(v1[1:3, ], v1[10:14, ])
test2_y <- sharedPaths(v1[1:3, ], v1[10:14, ])

print('test2_x')
print(test2_x)
print('test2_y')
print(test2_y)

expect_equal(
  sf::as_Spatial(sf::st_as_sf(test2_x)),
  sf::as_Spatial(sf::st_as_sf(test2_y))
)

# Test 3: ensure that we actually see a speed up for large data
test3_x_time <- system.time(test3_x <- sharedPaths_old(v2))
test3_y_time <- system.time(test3_y <- sharedPaths(v2))

print('test3_x')
print(test3_x)
print('test3_y')
print(test3_y)

expect_equal(
  sf::as_Spatial(sf::st_as_sf(test3_x)),
  sf::as_Spatial(sf::st_as_sf(test3_y))
)

message("run time (seconds) without spatial index: ", test3_x_time[[3]])
message("run time (seconds) with spatial index: ", test3_y_time[[3]])

# print success
message("Success!")

@rhijmans rhijmans merged commit 234342f into rspatial:master Jan 5, 2023
@rhijmans
Copy link
Member

rhijmans commented Jan 5, 2023

Thank you very much!

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants