---
title: "Workflow_Confirm"
author: "ChenZiYing (Sophie)"
date: "2023-04-28"
output:
  html_document:
    css: tutorial.css
    fig_caption: yes
    highlight: textmate
    theme: flatly
    toc: yes
    toc_float: yes
---

```{r setup, include=FALSE,warning=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```


1. Load all packages we need. Please refer to **Statistical Packages**.

```{r}
# Load required packages
library(sp)
library(dplyr)
library(sf)
library(spatstat)
library(maptools)
library(rgdal)
```

2. Data processing: 

```{r}
# Load Oreoscoptes data
df <- read.csv("BC_data.csv", stringsAsFactors = FALSE)

# Generate a dataframe with longitude and latitude in BC_data
coord <- df[,c('decimalLatitude','decimalLongitude')]

# Clean the data by removing the observations with NA
coord <- na.omit(coord)

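# Visualize the data (longitude on the x-axis, latitude on the y-axis)
plot(coord$decimalLongitude, coord$decimalLatitude)

# Convert to a spatial object, declare the CRS (WGS84), and reproject to BC Albers
coordinates(coord) <- ~decimalLongitude+decimalLatitude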
    
proj4string(coord) <- CRS("+proj=longlat +datum=WGS84")
    
coord_conv <- spTransform(coord,CRS("+proj=aea +lat_0=45 +lon_0=-126 +lat_1=50 +lat_2=58.5 +x_0=1000000 +y_0=0 +datum=NAD83 +units=m +no_defs"))

# Load the covariates data
load("BC_Covariates.Rda")

# Define the observation window
win <- as.owin(DATA$Window)

# Create a ppp object from the projected coordinates and the window
ppp_birds <- ppp(coord_conv@coords[,1], 
                 coord_conv@coords[,2], 
                 window=win)
```
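
The `sp`/`rgdal` route above can also be written with `sf` alone. The chunk below is a minimal sketch on made-up coordinates (the `toy_` objects are hypothetical and not part of `BC_data.csv`), assuming that EPSG:3005 (NAD83 / BC Albers) describes the same projection as the proj string used above.

```{r}
library(sf)

# Hypothetical example points in BC (longitude/latitude, WGS84)
toy_df <- data.frame(decimalLongitude = c(-123.1, -119.5, -120.3),
                     decimalLatitude  = c(49.3, 49.9, 50.7))

# Build an sf object and declare the CRS as WGS84 (EPSG:4326)
toy_sf <- st_as_sf(toy_df,
                   coords = c("decimalLongitude", "decimalLatitude"),
                   crs = 4326)

# Reproject to BC Albers (assumed here to be EPSG:3005)
toy_albers <- st_transform(toy_sf, crs = 3005)

# Extract the projected coordinates as a matrix with columns X and Y
toy_xy <- st_coordinates(toy_albers)
toy_xy
```

The matrix `toy_xy` holds projected x and y values in metres and could be passed to `ppp()` in the same way as `coord_conv@coords`.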

3. Inspecting and Exploring data

```{r, warning=FALSE}
# Check the spread of the bird `ppp_birds`
plot(ppp_birds)

# Confirm inhomogeneity
Q <- quadratcount(ppp_birds, nx = 10, ny = 10)
quadrat.test(Q)

# Hot spot analysis to identify areas of elevated intensity
## Estimate R
R <- bw.ppl(ppp_birds)

## Calculate test statistic
LR <- scanLRTS(ppp_birds, r = R)

## Plot the output 
plot(LR)

# Identify the class of each potential covariate we have
sapply(DATA, class)

# Visualise each potential covariate using methods appropriate to each object class.
potential <- c("Window", "Elevation", "Forest", "HFI", "Dist_Water")

for (i in potential){
  plot(DATA[[i]], main = i)
  points(ppp_birds$x, ppp_birds$y, pch = 19, col = "white", cex = 0.6)
  points(ppp_birds$x, ppp_birds$y, pch = 19, col = "black", cex = 0.4)
}

# Estimate the intensity as a function of each potential covariate via kernel estimation (rhohat)
rho_elev <- rhohat(ppp_birds, DATA$Elevation)
rho_fore <- rhohat(ppp_birds, DATA$Forest)
rho_hfi <- rhohat(ppp_birds, DATA$HFI)
rho_dist_water <- rhohat(ppp_birds, DATA$Dist_Water)

plot(rho_elev, xlim = c(0, max(DATA$Elevation)))
plot(rho_fore, xlim = c(0, max(DATA$Forest)))
plot(rho_hfi, xlim = c(0, max(DATA$HFI)))
plot(rho_dist_water, xlim = c(0, max(DATA$Dist_Water)))
```
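
To see what these diagnostics look like when the answer is known, the chunk below is a small sketch on a simulated pattern. The `toy_` objects are hypothetical: the points are drawn on the unit square with an intensity that increases with x, so the quadrat test should usually reject homogeneity and `rhohat()` should show an increasing curve.

```{r}
library(spatstat)
set.seed(1)

# Simulate an inhomogeneous Poisson pattern: intensity 300 * x on the unit square
toy_pp <- rpoispp(function(x, y) 300 * x)

# Quadrat test of homogeneity on a 4 x 4 grid
toy_qt <- quadrat.test(toy_pp, nx = 4, ny = 4)
toy_qt$p.value

# Covariate image equal to the x coordinate, then intensity as a function of it
toy_cov <- as.im(function(x, y) x, W = Window(toy_pp))
toy_rho <- rhohat(toy_pp, toy_cov)
plot(toy_rho, main = "Toy example: intensity vs covariate")
```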

4. Model Building - Regression

```{r}
# Scale the covariates by their mean and standard deviation
mu_elev <- mean(DATA$Elevation)
stdev_elev <- sd(DATA$Elevation)
DATA$Elevation_scaled <- eval.im((Elevation - mu_elev)/stdev_elev, DATA)

mu_water <- mean(DATA$Dist_Water)
stdev_water <- sd(DATA$Dist_Water)
DATA$Dist_Water_scaled <- eval.im((Dist_Water - mu_water)/stdev_water, DATA)
  
# Establish formula1:
formula1 <- ppp_birds ~ Elevation_scaled + I(Elevation_scaled^2) + Dist_Water_scaled + I(Dist_Water_scaled^2)
  
# Fit the model with formula1
fit1 <- ppm(formula1, data = DATA)

# Because the model did not converge, we simplified our formula by excluding I(Elevation_scaled^2).
formula2 <- ppp_birds ~ Elevation_scaled + Dist_Water_scaled + I(Dist_Water_scaled^2)
  
# Fit the model with formula2
fit2 <- ppm(formula2, data = DATA)

# Test the statistical significance of the coefficients by looking at the Ztest column in the `fit2` output.
fit2

# Extract the fitted coefficients of the model
co <- coef(fit2)
```
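
A Poisson `ppm()` fit is log-linear in its covariates, so the coefficients are on the log-intensity scale. The chunk below is a minimal sketch of the same scale-then-fit steps on simulated data where the true coefficients are known. All `toy_` objects and the covariate name `elev` are hypothetical, and the standard deviation is taken from the pixel values of the image.

```{r}
library(spatstat)
set.seed(2)

# Hypothetical "elevation" surface on the unit square
toy_win  <- owin(c(0, 1), c(0, 1))
toy_elev <- as.im(function(x, y) 1000 * x, W = toy_win)

# Standardise the covariate by its mean and standard deviation
toy_mu <- mean(toy_elev)
toy_sd <- sd(as.vector(as.matrix(toy_elev)))
toy_elev_scaled <- eval.im((toy_elev - toy_mu) / toy_sd)

# Simulate points with log-intensity 5 + 0.5 * scaled elevation
toy_lambda <- eval.im(exp(5 + 0.5 * toy_elev_scaled))
toy_pts <- rpoispp(toy_lambda)

# Fit the log-linear Poisson model and inspect the coefficients
toy_fit <- ppm(toy_pts ~ elev, data = list(elev = toy_elev_scaled))
coef(toy_fit)
```

If the fit behaves well, the estimates should land reasonably close to the simulated values (intercept 5, slope 0.5), which gives a feel for how to read `coef(fit2)`.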

5. Model Validation

```{r}
# Model visualisation
plot(fit2,
     se = FALSE,
     superimpose = FALSE)

#Overlay the bird locations
plot(ppp_birds,
     pch = 16,
     cex = 0.7,
     cols = "white",
     add = TRUE)
plot(ppp_birds,
     pch = 16,
     cex = 0.5,
     cols = "black",
     add = TRUE)

#Mean disw
#E_disw <- mean(DATA$Dist_Water_scaled) # just 0

#Elevation effect on lambda at mean distance to water (scaled value 0)
elev_effect <- effectfun(fit2, "Elevation_scaled", Dist_Water_scaled = 0, se.fit = TRUE)

#Distance-to-water effect on lambda at mean elevation (scaled value 0)
disw_effect <- effectfun(fit2, "Dist_Water_scaled", Elevation_scaled = 0, se.fit = TRUE)


#Side by side plotting
par(mfrow = c(1,2))

#Plot the elevation and distance-to-water effects
plot(elev_effect,
     legend = FALSE,
     main = "Elevation effect at mean distance to water")
plot(disw_effect,
     legend = FALSE,
     main = "Distance to water effect at mean elevation")

```



## Model Validation 
### Quadrat counting
```{r}
#Run the quadrat test
quadrat.test(fit2, nx = 2, ny = 4)
```
The small p-value tells us that the observed quadrat counts deviate significantly from our model’s predictions. Let us look at how the model could be improved.


### Partial residual plots
```{r}
#Calculate the partial residuals as a function of elevation
par_res_elev <- parres(fit2, "Elevation_scaled")

#Calculate the partial residuals as a function of distance to water
par_res_disw <- parres(fit2, "Dist_Water_scaled")

#Side by side plotting
par(mfrow = c(1,2))
plot(par_res_elev,
     legend = FALSE,
     lwd = 2,
     main = "",
     xlab = "Elevation_scaled")
plot(par_res_disw,
     legend = FALSE,
     lwd = 2,
     main = "",
     xlab = "Dist_Water_scaled")
```

From these figures we can see that the polynomial terms in `fit2` are not capturing the patterns in our data particularly well. As an improvement, we could try adding higher-order polynomials, but polynomials can be unstable. In this situation, it may be worth switching from a linear modelling framework to an additive modelling framework (i.e., GAMs).

### GAM (not valid: 'df' was still too small)
```{r}
library(splines)

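#Fit the point process model with B-spline terms
#Note: bs() is cubic by default and needs df >= 3, so df = 2 triggers the "'df' was too small" warning
fit_smooth <- ppm(ppp_birds ~ bs(Elevation_scaled,2) + bs(Dist_Water_scaled, 3), data = DATA, use.gam = TRUE)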

fit_smooth
```
```{r}
#Calculate the partial residuals as a function of elevation
par_res_elev <- parres(fit_smooth, "Elevation_scaled")

#Calculate the partial residuals as a function of distance to water
par_res_disw <- parres(fit_smooth, "Dist_Water_scaled")

#Side by side plotting
par(mfrow = c(1,2))
plot(par_res_elev,
     legend = FALSE,
     lwd = 2,
     main = "",
     xlab = "Elevation_scaled")
plot(par_res_disw,
     legend = FALSE,
     lwd = 2,
     main = "",
     xlab = "Dist_Water_scaled")
```

```{r}
#Plot the model predictions
plot(fit_smooth,
     se = FALSE,
     superimpose = FALSE)

#Overlay the bird locations
plot(ppp_birds,
     pch = 16,
     cex = 0.7,
     cols = "white",
     add = TRUE)
plot(ppp_birds,
     pch = 16,
     cex = 0.5,
     cols = "black",
     add = TRUE)
```

- For the `GAM` model, it looks like we do not have enough data to fit it.
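
The "'df' was too small" message may also be worth checking against the spline specification itself, independently of the amount of data. The chunk below is a small sketch on a made-up vector (`toy_x` is hypothetical): with the default cubic degree, `bs()` warns when `df` is below 3 and silently uses 3 instead.

```{r}
library(splines)

# Hypothetical covariate values
toy_x <- seq(-2, 2, length.out = 50)

# Capture the warning raised by a cubic B-spline basis with df = 2
toy_warn <- tryCatch({
  bs(toy_x, df = 2)
  "no warning"
}, warning = function(w) conditionMessage(w))
toy_warn

# With df = 3 the basis is built without a warning and has 3 columns
toy_basis <- bs(toy_x, df = 3)
ncol(toy_basis)
```

Under this reading, raising `df` to at least 3 for both terms (or lowering `degree`) would be one thing to try before concluding that a smoother model cannot be fitted.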