
Uncertain Parameters

robj411 edited this page Nov 11, 2019 · 3 revisions

ithim_uncertainty is a wrapper that samples from the ITHIM output in a loop, running the model once per parameter sample. The number of samples is set at the initiation step, along with the parametric distributions from which that number of samples is drawn. For each sample, the wrapper first sets the parameters in the environment. It then recalculates any distance-related objects: set_vehicle_inventory and get_synthetic_from_trips if any raw distances change, and get_all_distances if, for example, the walk-to-bus time has changed. Finally, it calls the basic run_ithim function.
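The control flow of such a wrapper can be sketched as follows. This is a minimal Python illustration, not the ITHIM-R code: the function names, parameter names, distributions, and the toy model are all hypothetical stand-ins chosen only to show the order of operations (draw all samples up front, then set parameters, recompute distance objects, and run the model for each sample).

```python
import random

def draw_parameter_samples(n_samples, seed=0):
    """Draw every parameter sample up front (hypothetical names and distributions)."""
    rng = random.Random(seed)
    return {
        "cycling_mmet": [rng.lognormvariate(1.5, 0.2) for _ in range(n_samples)],
        "bus_walk_time": [rng.lognormvariate(1.6, 0.3) for _ in range(n_samples)],
    }

def recompute_distances(params):
    """Toy stand-in for the distance-related recalculation step."""
    return {"walk_to_bus_hours": params["bus_walk_time"] / 60.0}

def run_model(params, distances):
    """Toy stand-in for a single deterministic model run."""
    return params["cycling_mmet"] * distances["walk_to_bus_hours"]

def run_with_uncertainty(n_samples, seed=0):
    samples = draw_parameter_samples(n_samples, seed)
    outcomes = []
    for i in range(n_samples):
        # Set the parameters for the current sample.
        params = {name: values[i] for name, values in samples.items()}
        # Recompute anything that depends on distance-related parameters.
        distances = recompute_distances(params)
        # Run the basic model with this parameter set.
        outcomes.append(run_model(params, distances))
    return samples, outcomes
```

Keeping the sampled parameters alongside the outcomes is what makes a later sensitivity analysis possible, since each outcome can be traced back to the parameter values that produced it.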

Running ITHIM with uncertain parameters allows us to assess their impact on the outcome (i.e. sensitivity analysis). We use the expected value of partial perfect information (EVPPI) to calculate the expected reduction in uncertainty in the outcome were we to learn a parameter perfectly. This means we can implement models that are basic in their parametrisation, and find out at the end which parameters would be worth dedicated effort to learn better.
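One way to read "expected reduction in uncertainty" is as the share of outcome variance that disappears once a parameter is known. The Python sketch below estimates that share by approximating the conditional mean of the outcome with bin averages. This is an assumption made for illustration: the function name evppi_percent and the binning estimator are hypothetical, and the estimator ITHIM actually uses is not described on this page.

```python
from statistics import mean, pvariance

def evppi_percent(x, y, n_bins=10):
    """Percentage of Var(y) removed by knowing x, using bin means as a crude E[y | x]."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    fitted = [0.0] * n
    for b in range(n_bins):
        # Equal-count bins over the sorted parameter values.
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        if not idx:
            continue
        bin_mean = mean(y[i] for i in idx)
        for i in idx:
            fitted[i] = bin_mean
    residual_var = mean((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    total_var = pvariance(y)
    return 100.0 * (total_var - residual_var) / total_var
```

A parameter that drives the outcome scores close to 100, while one with little influence scores close to 0, which gives a ranking of the parameters most worth learning better.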

Parametric distributions for uncertain variables

Cycling and walking MMETs are the number of MMETs per hour when undertaking cycling and walking, and also determine the ventilation rates. Motorcycle distance is the total distance travelled by motorcycles relative to the total distance travelled by cars in the baseline scenario. Non-travel PA, injury reporting rate and NCD burden all act as scalar multipliers for the relevant datasets. Note that the non-travel PA scalar does not affect the ~40% of the population whose non-travel PA is 0.

Dose--response relationships

For the dose--response relationships between physical activity (PA) and disease, and between air pollution (AP) and disease, we assume that there is uncertainty, but no variability, in the relationship. That is, we sample one relationship from the distribution of relationships and apply exactly that relationship to all individuals. As a consequence, given fixed doses, responses between individuals will be perfectly correlated.

We achieve this by use of the probability integral transform: we sample a random variable uniformly distributed on the interval (0,1) and map it, via the inverse of a cumulative distribution function (the quantile function), to the distribution describing the dose--response relationship.
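The mapping from a single uniform draw to a dose--response value can be sketched in Python as below. The two normal distributions and their parameters are hypothetical placeholders for "the relationship at a given dose"; the point is only that one shared uniform value picks the same quantile of every distribution, which is what makes responses perfectly correlated.

```python
import random
from statistics import NormalDist

def draw_uniform(rng):
    """Draw u strictly inside (0, 1); the quantile function is undefined at 0."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u

def inverse_transform(u, dist):
    """Map a uniform value to the matching quantile of dist."""
    return dist.inv_cdf(u)

rng = random.Random(42)
u = draw_uniform(rng)  # one draw, shared by every individual in this sample

# Hypothetical relative-risk distributions at two different doses.
low_dose = NormalDist(mu=0.95, sigma=0.02)
high_dose = NormalDist(mu=0.80, sigma=0.05)

rr_low = inverse_transform(u, low_dose)
rr_high = inverse_transform(u, high_dose)
```

Because both values come from the same u, they sit at the same quantile of their respective distributions.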

Physical activity

Each disease's PA dose--response relationship is defined by a truncated normal distribution. For each dose, there is a mean value, an upper bound, and a lower bound. For each person's dose, we get the response by mapping the uniform random variable onto the truncated normal distribution defined by the mean and bounds for that dose.
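A minimal Python sketch of that mapping follows. The dose--response table is invented for illustration, and the standard deviation is derived by reading the bounds as a 95% interval; that derivation is an assumption, since this page does not say how the spread of the truncated normal is set.

```python
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()

def truncated_normal_quantile(u, mean, sd, lower, upper):
    """Quantile of a normal(mean, sd) truncated to [lower, upper], for u in (0, 1)."""
    p_lower = _STANDARD_NORMAL.cdf((lower - mean) / sd)
    p_upper = _STANDARD_NORMAL.cdf((upper - mean) / sd)
    # Rescale u into the probability mass that lies between the bounds.
    p = p_lower + u * (p_upper - p_lower)
    return mean + sd * _STANDARD_NORMAL.inv_cdf(p)

# Hypothetical table: dose -> (mean, lower, upper) relative risk.
DOSE_RESPONSE = {
    0: (1.00, 0.98, 1.02),
    5: (0.90, 0.85, 0.95),
    10: (0.85, 0.78, 0.92),
}

def response_for_dose(dose, u):
    mean, lower, upper = DOSE_RESPONSE[dose]
    # Assumption: the bounds are treated as a 95% interval to obtain sd.
    sd = (upper - lower) / (2 * 1.96)
    return truncated_normal_quantile(u, mean, sd, lower, upper)
```

Calling response_for_dose with the same u for every person applies one sampled relationship to the whole population, with each person's response read off at their own dose.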

Air pollution

For the AP relationship, there are four parameters per disease. We sample the first from an empirical distribution using the probability integral transform. We sample the second via the same method, conditioned on the value of the first, constructing their joint density with e.g. kde2d. The third parameter is sampled conditioned on the first and second, constructing their joint density using a GAM. The final parameter is sampled conditioned on the first, second, and third, constructing their joint density using a GAM.

As before, there is perfect correlation between individuals, i.e. if person A's dose is greater than person B's, then person A's response is strictly greater than person B's response.
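The sequential scheme (sample the first parameter, then each further parameter conditioned on those already drawn) can be illustrated for two parameters in Python. Note the substitution: instead of building a joint density with kde2d or a GAM, this sketch conditions by taking the nearest neighbours of the first sampled value among the empirical draws. That is a simpler stand-in chosen to keep the example short, and the function names and the neighbourhood size k are hypothetical.

```python
def empirical_quantile(values, u):
    """Quantile of an empirical sample by linear interpolation, for u in [0, 1]."""
    ordered = sorted(values)
    pos = u * (len(ordered) - 1)
    i = int(pos)
    if i + 1 >= len(ordered):
        return ordered[-1]
    frac = pos - i
    return ordered[i] + frac * (ordered[i + 1] - ordered[i])

def sample_pair(draws, u1, u2, k=50):
    """Sample (first, second) from empirical draws given as (first, second) tuples.

    The first parameter comes from its marginal empirical distribution. The
    second comes from the empirical distribution of the k draws whose first
    parameter is closest to the sampled value.
    """
    first = empirical_quantile([a for a, _ in draws], u1)
    neighbours = sorted(draws, key=lambda d: abs(d[0] - first))[:k]
    second = empirical_quantile([b for _, b in neighbours], u2)
    return first, second
```

Extending this to a third and fourth parameter would repeat the conditioning step on all previously sampled values, which is where a regression model becomes more practical than a neighbourhood lookup.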

The empirical distributions come from Burnett (2014). There are four parameters for each of four diseases: IHD, lung cancer, COPD, and stroke. In addition, for stroke and IHD, there is a separate set of four parameters for each age group from 25 to 95 in five-year increments. As well as assuming perfect correlation between individuals within a disease, we assume perfect correlation between age groups within a disease, i.e. the same four quantiles per disease are applied to all age groups.

Confidences

We use confidences to parametrise the proportion of the population who do no work or leisure physical activity, and the fractional contributions of modes towards background AP. Confidences are values between 0 and 1, where 0 represents complete uncertainty and 1 represents complete certainty.

Physical activity

We learn from the survey the fraction of the population that does no non-travel physical activity, and convert that to a beta distribution whose spread is governed by our confidence value. We sample from this distribution, so that the more confident we are about the survey, the narrower the distribution will be. Then we resample non-zero PA values for the fraction of the population who, in that sample, do undertake non-travel PA.
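One possible way to turn a fraction and a confidence into a beta distribution is sketched below in Python. The mapping from confidence to concentration (10 at confidence 0, 1000 at confidence 1) is purely hypothetical, as are all function names; this page does not specify the mapping ITHIM uses. The sketch only demonstrates the stated property that higher confidence gives a narrower distribution around the surveyed fraction.

```python
import random

def concentration_from_confidence(confidence):
    """Hypothetical mapping: concentration 10 at confidence 0, 1000 at confidence 1."""
    return 10 ** (1 + 2 * confidence)

def beta_parameters(zero_fraction, confidence):
    """Beta parameters with mean zero_fraction and confidence-dependent spread."""
    k = concentration_from_confidence(confidence)
    return zero_fraction * k, (1 - zero_fraction) * k

def beta_variance(alpha, beta):
    total = alpha + beta
    return alpha * beta / (total ** 2 * (total + 1))

def sample_zero_fraction(zero_fraction, confidence, rng):
    alpha, beta = beta_parameters(zero_fraction, confidence)
    return rng.betavariate(alpha, beta)

def resample_non_travel_pa(observed_pa, zero_fraction, rng):
    """Assign zero PA to the sampled fraction and resample observed non-zero values for the rest."""
    non_zero = [v for v in observed_pa if v > 0]
    n_zero = round(zero_fraction * len(observed_pa))
    n_active = len(observed_pa) - n_zero
    return [0.0] * n_zero + [rng.choice(non_zero) for _ in range(n_active)]
```

For example, with a surveyed zero fraction of 0.4, beta_parameters(0.4, 0.9) gives a much smaller variance than beta_parameters(0.4, 0.2), while both keep the mean at 0.4.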

Emission inventory

We apply a similar method to the emissions inventory, describing the raw input as a Dirichlet distribution whose spread is governed by our confidence value.
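The Dirichlet analogue can be sketched in Python by normalising independent gamma draws, which is a standard way to sample a Dirichlet distribution. As in the beta case, the confidence-to-concentration mapping, the function names, and the example mode shares are hypothetical and are not taken from ITHIM.

```python
import random

def concentration_from_confidence(confidence):
    """Hypothetical mapping: concentration 10 at confidence 0, 1000 at confidence 1."""
    return 10 ** (1 + 2 * confidence)

def sample_dirichlet(shares, confidence, rng):
    """Sample mode shares from a Dirichlet centred on the raw inventory shares."""
    k = concentration_from_confidence(confidence)
    total = sum(shares.values())
    # A Dirichlet draw is a set of independent gamma draws divided by their sum.
    gammas = {
        mode: rng.gammavariate(k * share / total, 1.0)
        for mode, share in shares.items()
    }
    norm = sum(gammas.values())
    return {mode: g / norm for mode, g in gammas.items()}

# Hypothetical raw inventory: fractional contribution of each mode.
inventory = {"car": 0.5, "bus": 0.2, "motorcycle": 0.3}
sampled = sample_dirichlet(inventory, confidence=0.8, rng=random.Random(1))
```

Each sampled inventory still sums to 1, and higher confidence keeps the sampled shares closer to the raw input.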
