04-using-tmb.Rmd

# Using `TMB`

We are now ready to present the main `TMB` code for calculating the negative log-likelihood of an HMM.

## Likelihood function
The function is defined in *[code/poi_hmm.cpp](#poi_hmm.cpp)* and imports the utility functions defined in *[functions/utils.cpp](#utils.cpp)*.
```{Rcpp 4poi_hmm.cpp, code=readLines("code/poi_hmm.cpp"), eval=FALSE}
```

Note that we use similar names as to the TMB side.

## Optimization
Before we can fit the model, we need to load some necessary packages and data files. We also need to compile the `C++` code and load the functions into our working environment in `R`.

(i) We start by compiling the `C++` code for computing the likelihood and its gradients. Once it is compiled, we can load the TMB likelihood object into `R` -- making it available from `R`. 

```{r 4init-example1, message = FALSE, warning = FALSE, results = FALSE}
# Load TMB and optimization packages
library(TMB)
# Run the C++ file containing the TMB code
TMB::compile("code/poi_hmm.cpp")
# Load it
dyn.load(dynlib("code/poi_hmm"))
```
Next, we need to load the necessary packages and the utility `R` functions.
```{r 4init-example2, message = FALSE, warning = FALSE, results = FALSE}
# Load the parameter transformation function
source("functions/utils.R")
```

(ii) The data are part of a large data set collected with the "Track Your Tinnitus" (TYT) mobile application, a detailed description of which is presented in @pryss and @pryssa.
We analyze 87 successive days of the "arousal" variable, which is measured on a discrete scale.
Higher values correspond to a higher degree of excitement, lower values to a more calm emotional state [for details, see @probst].
The data can be loaded by a simple call.
```{r 4load-example}
load("data/tinnitus.RData")
```

Figure \@ref(fig:4tinnitus-fig) presents the raw data in the [Table](#table:tinnitus_data) below, which are also available for download at *[data/tinnitus.RData](#tinnitus.RData)*.

```{r 4tinnitus-table, results = 'asis', echo = FALSE}
library(xtable)
temp <- paste(tinn_data, collapse = " ")
temp <- as.data.frame(temp)
temp <- xtable(temp,
               align = "lp{15cm}",
               caption = "TYT data. Observations collected by the TYT app on 87 successive days (from left to right).",
               label = "table:tinnitus_data")
print(temp, include.rownames = FALSE, include.colnames = FALSE,
      hline.after = c(0, nrow(temp)),
      type = "html")
```

```{r 4tinnitus-fig, echo = FALSE, warning = FALSE, fig.cap = "Plot of observations from TYT app data for 87 succesive days."}
plot(tinn_data, ylab = "Arousal", xlab = "Day (#)")
```

(iii) Initialization of the number of states and starting (or initial) values for the optimization.

First, the number of states needs to be determined. As explained by @pohlea, @pohle, and @zucchini[, Section 6] (to name only a few), usually one would first fit models with a different number of states. Then, these models are evaluated e.g. by means of model selection criteria [as carried out by @leroux] or prediction performance [@celeux].
As shown in [the appendix](#appendix), the model selection procedure shows that both AIC and BIC prefer a two-state model over a Poisson regression and over a model with three or four states. Consequently, we focus on the two-state model in the following.\
The list object `TMB_data` contains the data and the number of states.

```{r 4param-example1}
# Model with 2 states
m <- 2
TMB_data <- list(x = tinn_data, m = m)
```
Secondly, initial values for the optimization procedure need to be defined.
Although we will apply unconstrained optimization, we initialize the natural parameters, because this is much more intuitive and practical than handling the working parameters. 

```{r 4param-example2}
# Generate initial set of parameters for optimization
lambda <- c(1, 3)
gamma <- matrix(c(0.8, 0.2,
                  0.2, 0.8), byrow = TRUE, nrow = m)
```

(iv) Transformation from natural to working parameters

The previously created initial values are transformed and stored in the list `parameters` for the optimization procedure.

```{r 4pn2pw-example}
# Turn them into working parameters
parameters <- pois.HMM.pn2pw(m, lambda, gamma)
```

(v) Creation of the `TMB` negative log-likelihood function with its derivatives

This object, stored as `obj_tmb` requires the data, the initial values, and the previously created DLL as input.
Setting argument `silent = TRUE` disables tracing information and is only used here to avoid excessive output.

```{r 4makeadfun-example}
obj_tmb <- MakeADFun(TMB_data, parameters,
                     DLL = "poi_hmm", silent = TRUE)
```

This object also contains the previously defined initial values as a vector (`par`) rather than a list.
The negative log-likelihood (`fn`), its gradient (`gr`), and Hessian (`he`) are functions of the parameters (in vector form) while the data are considered fixed.
These functions are available to the user and can be evaluated for instance at the initial parameter values: 

```{r 4makeadfun-example2}
obj_tmb$par
obj_tmb$fn(obj_tmb$par)
obj_tmb$gr(obj_tmb$par)
obj_tmb$he(obj_tmb$par)
```

(vi) Execution of the optimization

For this step we rely again on the optimizer implemented in the `nlminb` function.
The arguments, i.e.~ initial values for the parameters and the function to be optimized, are extracted from the previously created TMB object. 

```{r 4optimizing-example}
mod_tmb <- nlminb(start = obj_tmb$par, objective = obj_tmb$fn)
# Check that it converged successfully
mod_tmb$convergence == 0
```

It is noteworthy that various alternatives to `nlminb` exist.
Nevertheless, we focus on this established optimization routine because of its high speed of convergence.

(vii) Obtaining ML estimates

Obtaining the ML estimates of the natural parameters together with their standard errors is possible by using the previously introduced command `sdreport`.

Recall that this requires the parameters of interest to be treated by the `ADREPORT` statement in the `C++` part. It should be noted that the presentation of the set of parameters `gamma` below results from a column-wise representation of the TPM.

```{r 4summary-sdreport-example1}
summary(sdreport(obj_tmb, par.fixed = mod_tmb$par), "report")
```

Note that the table above also contains estimation results for $\bs{\delta}$ and accompanying standard errors, although $\bs{\delta}$ is not estimated, but derived from $\bs{\Gamma}$.
We provide further details on this aspect in [Confidence intervals].\
The value of the nll function in the minimum found by the optimizer can also be extracted directly from the object `mod_tmb` by accessing the list element `objective`:

```{r 4summary-sdreport-example2}
mod_tmb$objective
```

(viii) Use exact gradient and Hessian

In the optimization above we already benefited from an increased speed due to the evaluation of the nll in `C++` compared to the forward algorithm being executed entirely in `R`.
However, since `TMB` computes the gradient and hessian of the likelihood, we can provide this information to the optimizer
This is in general advisable, because `TMB` provides an exact value of both gradient and Hessian up to machine precision, which is superior to approximations used by the optimizing procedure.
Similar to the nll, both quantities can be extracted directly from the `TMB` object `obj_tmb`:

```{r 4gradient-hessian-example}
# The negative log-likelihood is accessed by the objective
# attribute of the optimized object
mod_tmb <- nlminb(start = obj_tmb$par, objective = obj_tmb$fn,
                  gradient = obj_tmb$gr, hessian = obj_tmb$he)
mod_tmb$objective
```
Note that passing the exact gradient and Hessian as provided by `TMB` to `nlminb` leads to the same minimum, i.e. value of the nll function, here.

Is it noteworthy that inconsistencies can happen in the estimates due to computer approximations.

The stationary distribution is a vector of probabilities and should sum to 1.
However, it doesn't always behave as expected.
```{r 4_1-diff-1}
adrep <- summary(sdreport(obj_tmb), "report")
estimate_delta <- adrep[rownames(adrep) == "delta", "Estimate"]
sum(estimate_delta)
sum(estimate_delta) == 1 # The sum is displayed as 1 but is not 1
```

As noted on @zucchini[, pp. 159-160], ``the row sums of $\bs{\Gamma}$ will only approximately equal 1, and the components of the vector $\bs{\delta}$ will only approximately total 1. This can be remedied by scaling the vector $\bs{\delta}$ and each row of $\bs{\Gamma}$ to total 1''.
We do not remedy this issue because it provides no benefit to us, but this may lead to surprising results when checking equality.

This is likely due to machine approximations when numbers far apart from each other interact together.
In R, a small number is not 0 but is treated as 0 when added to a much larger number.

This can result in incoherent findings when checking equality between 2 numbers.
```{r 4_0-diff-0}
1e-100 == 0 # Small numbers are 0
(1 + 1e-100) == 1
```

## Basic nested model<br>specification
### Principle
As a first step in building a nested model, we arbitrarily fix $\lambda_1$ to the value 1.
```{r 4nested_params}
# Get the previous values, and fix some
fixed_par_lambda <- lambda
fixed_par_lambda[1] <- 1
# Transform them into working parameters
new_parameters <- pois.HMM.pn2pw(m = m,
                                 lambda = fixed_par_lambda,
                                 gamma = gamma)
```

Then, we instruct `TMB` to read these custom parameters.
We indicate fixed values by mapping them to NA values, whereas the variable values need to be mapped to different factor variables.
Lastly, we specify this mapping with the `map` argument when making the `TMB` object.
```{r 4nested_fix}
map <- list(tlambda = as.factor(c(NA, 1)),
            tgamma = as.factor(c(2, 3)))
fixed_par_obj_tmb <- MakeADFun(TMB_data, new_parameters,
                               DLL = "poi_hmm",
                               silent = TRUE,
                               map = map)
```

Estimation of the model and displaying the results is performed as usual.
```{r 4nested_nlminb}
fixed_par_mod_tmb <- nlminb(start = fixed_par_obj_tmb$par,
                            objective = fixed_par_obj_tmb$fn,
                            gradient = fixed_par_obj_tmb$gr,
                            hessian = fixed_par_obj_tmb$he)
summary(sdreport(fixed_par_obj_tmb), "report")
```
Note that the standard error of $\lambda_1$ is zero, because it is no longer considered a variable parameter and does not enter the optimization procedure.

### Limits
This method cannot work in general for working parameters which are not linked to a single natural parameter.
This is because only working parameters can be fixed with this method, but the working parameters of the TPM are not each linked to a single natural parameter.
As an example, fixing the natural parameter $\gamma_{11}$ is not equivalent to fixing any working parameter $\tau_{ij}$.
Hence, the TPM cannot in general be fixed, except perhaps via constrained optimization.

However, if conditions on the natural parameters can be translated to conditions on the working parameters, then there should not be any issue. We refer to [the next section](#param-eq-constraints) for more details.

### Parameter equality<br>constraints {#param-eq-constraints}
More complex constraints are also possible.
For example, imposing equality constraints (such as $\gamma_{11} = \gamma_{22}$) requires the corresponding factor level to be identical for the concerned entries.

As a reminder, we defined the working parameters via
\[
\tau_{ij} = \log(\frac{\gamma_{ij}}{1 - \sum_{k \neq i} \gamma_{ik}}) = \log(\gamma_{ij}/\gamma_{ii}), \text{ for } i \neq j
\]

With a two-state Poisson HMM, the constraint $\gamma_{11} = \gamma_{22}$ is equivalent to $\gamma_{12} = \gamma_{21}$.
Thus, the constraint can be transformed into $\tau_{12} = \log(\gamma_{12}/\gamma_{11}) = \log(\gamma_{21}/\gamma_{22}) = \tau_{21}$.

The mapping parameters must be set to a common factor to be forced equal.
```{r 4nested_multiple_param}
map <- list(tlambda = as.factor(c(1, 2)),
            tgamma = as.factor(c(3, 3)))
fixed_par_obj_tmb <- MakeADFun(TMB_data, parameters,
                               DLL = "poi_hmm",
                               silent = TRUE,
                               map = map)
fixed_par_mod_tmb <- nlminb(start = fixed_par_obj_tmb$par,
                            objective = fixed_par_obj_tmb$fn,
                            gradient = fixed_par_obj_tmb$gr,
                            hessian = fixed_par_obj_tmb$he)
# Results + check that the constraint is respected
results <- summary(sdreport(fixed_par_obj_tmb), "report")
tpm <- matrix(results[rownames(results) == "gamma", "Estimate"],
              nrow = m,
              ncol = m,
              byrow = FALSE) # Transformations are column-wise by default, be careful!
tpm
tpm[1, 1] == tpm[2, 2]
```

Similar complex constraints on the TPM can also be setup for HMMs with three or more states.
However, it appears this can only be achieved in general when the constraint involves natural parameters of the same row (with the exception of a two-state model).
We have not found a way to easily implement equality constraints between natural TPM parameters of different rows.
A solution might be constrained optimization.

As an example, we will look at a three-state HMM with the constraint $\gamma_{12} = \gamma_{13}$.

```{r 4nested_multiple_param_example}
# Model with 2 states
m <- 3
TMB_data <- list(x = tinn_data, m = m)

# Initial set of parameters
lambda <- c(1, 3, 5)
gamma <- matrix(c(0.8, 0.1, 0.1,
                  0.1, 0.8, 0.1,
                  0.1, 0.1, 0.8), byrow = TRUE, nrow = m)

# Turn them into working parameters
parameters <- pois.HMM.pn2pw(m, lambda, gamma)

# Build the TMB object
obj_tmb <- MakeADFun(TMB_data, parameters,
                     DLL = "poi_hmm", silent = TRUE)

# Optimize
mod_tmb <- nlminb(start = obj_tmb$par,
                  objective = obj_tmb$fn,
                  gradient = obj_tmb$gr,
                  hessian = obj_tmb$he)

# Check convergence
mod_tmb$convergence == 0

# Results
summary(sdreport(obj_tmb), "report")
```

The transformed constraint becomes $\tau_{12} = \log(\gamma_{12}/\gamma_{11}) = \log(\gamma_{13}/\gamma_{11}) = \tau_{13}$.

We need to be careful how we specify the constraint, because the vector `tgamma` will be converted into a matrix column-wise since this is R's default way to handle matrix-vector conversions.

The `tgamma` matrix looks naturally like
$$\begin{pmatrix}
         &\tau_{12}&\tau_{13}\\
\tau_{21}&         &\tau_{23}\\
\tau_{31}&\tau_{32}&
\end{pmatrix}$$

As a vector in `R`, it becomes $\left(\tau_{21}, \tau_{31}, \tau_{12}, \tau_{32}, \tau_{13}, \tau_{23}\right)$.
Therefore, the constraint needs to be placed on the $3^{rd}$ and $5^{th}$ vector parameter in the same way as we did when the Poisson mean $\lambda_1$ was fixed.

```{r 4nested_multiple_param_example_nlminb}
map <- list(tlambda = as.factor(c(1, 2, 3)),
            tgamma = as.factor(c(4, 5, 6, 7, 6, 8)))
fixed_par_obj_tmb <- MakeADFun(TMB_data, parameters,
                               DLL = "poi_hmm",
                               silent = TRUE,
                               map = map)
fixed_par_mod_tmb <- nlminb(start = fixed_par_obj_tmb$par,
                            objective = fixed_par_obj_tmb$fn,
                            gradient = fixed_par_obj_tmb$gr,
                            hessian = fixed_par_obj_tmb$he)
# Results + check that the constraint is respected
results <- summary(sdreport(fixed_par_obj_tmb), "report")
tpm <- matrix(results[rownames(results) == "gamma", "Estimate"],
              nrow = m,
              ncol = m,
              byrow = FALSE) # Transformations are column-wise by default, be careful!
tpm
tpm[1, 2] == tpm[1, 3]
```

Equality constraints involving a diagonal member of the TPM are simpler to specify: the constraint $\gamma_{i,j} = \gamma_{i,i}$ becomes transformed to $\tau_{i,j} = 1$ and this can be specified in the same way the Poisson mean $\lambda_1$ was fixed.


## State inference and forecasting {#state-inference}
The following code requires executing the code presented in the Section [Forward algorithm and backward algorithm](#forward-backward), as the log-forward and log-backward probabilities are needed.
We reuse the TYT data set from above along with the HMM that was estimated on it.

```{r 4prepare-state-inference, ref.label=c('3init-decoding', '3log-forward', '3log-backward')}
```

### Local decoding {#local-decoding}
The smoothing probabilities are defined in @zucchini[p.87 and p.336] as
\[
\text{P}(C_t = i \vert X^{(n)} = x^{(n)}) = \frac{\alpha_t(i) \beta_t(i)}{L_n}
\]


```{r 4smoothing}
# Compute conditional state probabilities, smoothing probabilities
stateprobs <- matrix(NA, ncol = length(tinn_data), nrow = m)
llk <- - mllk
for(i in 1:n) {
  stateprobs[, i] <- exp(lalpha[, i] + lbeta[, i] - llk)
}
# stateprobs contains n=87 columns, so we only display 4 for readability
# Each row corresponds to a state.
# For example, stateprobs[2, 3] is the probability that the third data
# is associated with state 2
stateprobs[, 1:4]
```

Local decoding is a straightforward maximum of the smoothing probabilities.

```{r 4localdecoding}
# Most probable states (local decoding)
ldecode <- rep(NA, n)
for (i in 1:n) {
  ldecode[i] <- which.max(stateprobs[, i])
}
ldecode
```

### Forecast {#forecast}
The forecast distribution or h-step-ahead-probabilities as well as its
implementation in R is detailed in @zucchini[p.83 and p.337]

Then,
\[
\text{P}(X_{n+h} = x \vert X^{(n)} = x^{(n)}) = \frac{\bs{\alpha}_n \bs{\Gamma}^h \textbf{P}(x) \bs{1}'}{\bs{\alpha}_n \bs{1}'} = \bs{\Phi}_n \bs{\Gamma}^h \textbf{P}(x) \bs{1}'
\]
An implementation of this using a scaling scheme is

```{r 4forecast}
# Number of steps
h <- 1
# Values for which we want the forecast probabilities
xf <- 0:50

nxf <- length(xf)
dxf <- matrix(0, nrow = h, ncol = nxf)
foo <- delta * emission_probs[1, ]
sumfoo <- sum(foo)
lscale <- log(sumfoo)
foo <- foo / sumfoo
for (i in 2:n) {
  foo <- foo %*% gamma * emission_probs[i, ]
  sumfoo <- sum( foo)
  lscale <- lscale + log(sumfoo)
  foo <- foo / sumfoo
}
emission_probs_xf <- get.emission.probs(xf, lambda)
for (i in 1:h) {
  foo <- foo %*% gamma
  for (j in 1:m) {
    dxf[i, ] <- dxf[i, ] + foo[j] * emission_probs_xf[, j]
  }
}
# dxf contains n=87 columns, so we only display 4 for readability
dxf[, 1:4]
```

### Global decoding {#global-decoding}
The Viterbi algorithm is detailed in @zucchini[p.88 and p.334].
It calculates the sequence of states $(i_1^*, \ldots, i_n^*)$ which
maximizes the conditional probability of all states simultaneously, i.e.
\[
(i_1^*, \ldots, i_n^*) = \argmax_{i_1, \ldots, i_n \in \{1, \ldots, m \}} \text{P}(C_1 = i_1, \ldots, C_n = i_n \vert X^{(n)} = x^{(n)}).
\]
An implementation of it is

```{r 4global}
xi <- matrix(0, n, m)
foo <- delta * emission_probs[1, ]
xi[1, ] <- foo / sum(foo)
for (i in 2:n) {
  foo <- apply(xi[i - 1, ] * gamma, 2, max) * emission_probs[i, ]
  xi[i, ] <- foo / sum(foo)
}
iv <- numeric(n)
iv[n] <- which.max(xi[n, ])
for (i in (n - 1):1){
  iv[i] <- which.max(gamma[, iv[i + 1]] * xi[i, ])
}
iv
```


## Appendix {#appendix}
Model selection tinnitus via AIC and BIC (calculation defined in @zucchini).

* AIC and BIC of a Poisson regression
```{r 4model-selection-tinnitus1}
m <- 1
TMB_data <- list(x = tinn_data, m = m)
lambda <- 2
gamma <- 1
parameters <- pois.HMM.pn2pw(m, lambda, gamma)
obj_tmb <- MakeADFun(TMB_data, parameters,
                     DLL = "poi_hmm", silent = TRUE)
mod_tmb <- nlminb(start = obj_tmb$par, objective = obj_tmb$fn)
mllk <- mod_tmb$objective
np <- length(unlist(parameters))
AIC_1 <- 2 * (mllk + np)
n <- sum(!is.na(TMB_data$x))
BIC_1 <- 2 * mllk + np * log(n)
mod_tmb$convergence == 0
AIC_1
BIC_1
```
* AIC and BIC of a two-state Poisson HMM
```{r 4model-selection-tinnitus2}
m <- 2
TMB_data <- list(x = tinn_data, m = m)
lambda <- seq(from = 1, to = 3, length.out = m)
# 0.8 on the diagonal, and 0.2 split along the rest of each line, size m
gamma <- matrix(0.2 / (m - 1),
                nrow = m,
                ncol = m)
diag(gamma) <- 0.8
parameters <- pois.HMM.pn2pw(m, lambda, gamma)
obj_tmb <- MakeADFun(TMB_data, parameters,
                     DLL = "poi_hmm", silent = TRUE)
mod_tmb <- nlminb(start = obj_tmb$par, objective = obj_tmb$fn)
mllk <- mod_tmb$objective
np <- length(unlist(parameters))
AIC_2 <- 2 * (mllk + np)
n <- sum(!is.na(TMB_data$x))
BIC_2 <- 2 * mllk + np * log(n)
mod_tmb$convergence == 0
AIC_2
BIC_2
```
* AIC and BIC of a three-state Poisson HMM
```{r 4model-selection-tinnitus3}
m <- 3
TMB_data <- list(x = tinn_data, m = m)
lambda <- seq(from = 1, to = 3, length.out = m)
# 0.8 on the diagonal, and 0.2 split along the rest of each line, size m
gamma <- matrix(0.2 / (m - 1),
                nrow = m,
                ncol = m)
diag(gamma) <- 0.8
parameters <- pois.HMM.pn2pw(m, lambda, gamma)
obj_tmb <- MakeADFun(TMB_data, parameters,
                     DLL = "poi_hmm", silent = TRUE)
mod_tmb <- nlminb(start = obj_tmb$par, objective = obj_tmb$fn)
mllk <- mod_tmb$objective
np <- length(unlist(parameters))
AIC_3 <- 2 * (mllk + np)
n <- sum(!is.na(TMB_data$x))
BIC_3 <- 2 * mllk + np * log(n)
mod_tmb$convergence == 0
AIC_3
BIC_3
```
* AIC and BIC of a four-state Poisson HMM
```{r 4model-selection-tinnitus4}
m <- 4
TMB_data <- list(x = tinn_data, m = m)
lambda <- seq(from = 1, to = 3, length.out = m)
# 0.8 on the diagonal, and 0.2 split along the rest of each line, size m
gamma <- matrix(0.2 / (m - 1),
                nrow = m,
                ncol = m)
diag(gamma) <- 0.8
parameters <- pois.HMM.pn2pw(m, lambda, gamma)
obj_tmb <- MakeADFun(TMB_data, parameters,
                     DLL = "poi_hmm", silent = TRUE)
mod_tmb <- nlminb(start = obj_tmb$par, objective = obj_tmb$fn)
mllk <- mod_tmb$objective
np <- length(unlist(parameters))
AIC_4 <- 2 * (mllk + np)
n <- sum(!is.na(TMB_data$x))
BIC_4 <- 2 * mllk + np * log(n)
mod_tmb$convergence == 0
AIC_4
BIC_4
```
* Summary

Models                    AIC       | BIC
----------------------- | --------- | ---------
Poisson regression      | `r AIC_1` | `r BIC_1`
Two-state Poisson HMM   | `r AIC_2` | `r BIC_2`
Three-state Poisson HMM | `r AIC_3` | `r BIC_3`
Four-state Poisson HMM  | `r AIC_4` | `r BIC_4`

AIC and BIC prefer a two-state model over a Poisson regression and over three and four-state HMMs.