
# Linked Micromap Plots via the **micromap** R Package {#Ch2}


\chapterauthor{J{\"u}rgen Symanzik, Marcus W. Beck, Michael G. McManus}


The **micromap**\index{R Packages!micromap} R package [@PaOl2024],
accessible at https://cran.r-project.org/web/packages/micromap/index.html,
will be introduced in this chapter. The reader will learn how to make use of the
four main steps that are required to create a 
basic linked micromap plot\index{Linked micromap plot} via this R package. 
Details will be provided on how to optimize and fine-tune such a basic plot 
into a publication-worthy final linked micromap plot.\index{Linked micromap plot}
Example linked micromap plots\index{Linked micromap plot} are created
for education and poverty data for the 50 states (and Washington, D.C.) of the
United States (U.S.) and for watersheds in West Virginia (one of the 50 U.S. states).


## Introduction {#Ch2-Introduction}


As discussed in Chapter \@ref(Ch1), linked micromap plots\index{Linked micromap plot}
were originally presented at the Joint Statistical Meetings (JSM) 
in Chicago, Illinois, in 1996 [@OCCP1996]. They quickly gained popularity among
researchers at United States (U.S.) Federal Agencies such as
the U.S. Department of Agriculture – National Agricultural Statistics Service (USDA–NASS),
various branches of the U.S. Environmental Protection Agency (USEPA),
the National Cancer Institute (NCI),
the U.S. Census Bureau, and
the U.S. Bureau of Labor Statistics (BLS).
Early main applications of linked micromap plots\index{Linked micromap plot}
can be found in the environmental field [@COPC1998;@COPC2000]
and the medical field [@CCBPZ2002;@CBPZL2003].

While early linked micromap plots\index{Linked micromap plot} were created via
S-Plus\index{S-Plus} and Java, later ones were created in R (see @SC2013 for an overview).
However, even with the availability of R code that was provided in support
of @CP2010, creating linked micromap plots\index{Linked micromap plot} 
was challenging, which considerably limited their use.
In fact, @PWMO2012 observed that
"Producing LMplots [...] has
typically been somewhat difficult, and therefore LMplots
have seen limited use." However, linked micromap plots\index{Linked micromap plot}
continued to play an important role at the USEPA. Eventually, a team of 
researchers including Anthony R. Olsen, Quinn C. Payton, Michael G. McManus,
Marc H. Weber, and Thomas M. Kincaid, all originally with the USEPA
in Corvallis, Oregon, started to develop an R package for 
linked micromap plots.\index{Linked micromap plot}
First uses of this package can be seen in poster presentations in
May 2012 [@PWMO2012] and April 2013 [@PWMKO2013].
At about the same time, in December 2012, the first publicly available 
version of the **micromap**\index{R Packages!micromap} R package (version 1.5)
was released to CRAN [@PaOl2012].
Eventually, Marcus W. Beck, then also with the USEPA in Gulf Breeze, Florida,
joined the team of the original developers with the release of version 1.9.3
of this R package in February 2018 [@PaOl2018] and has also served as the maintainer of
this R package since then.

As frequently happens in an open software environment such as R,
the **micromap**\index{R Packages!micromap} R package is not the only 
R package for linked micromap plots.\index{Linked micromap plot}
Independently, but motivated by similar past uses and resources,
the **micromapST**\index{R Packages!micromapST} R package
was developed in parallel, and its first version (version 1.0)
was released to CRAN in June 2013 [@CP2013CRAN], i.e., 
only a few months after the first release of the
**micromap**\index{R Packages!micromap} R package.
The **micromapST**\index{R Packages!micromapST} R package will be extensively discussed in 
Chapter \@ref(Ch3).

From a user's perspective, there are few differences in the
appearance and quality of the final linked micromap plots\index{Linked micromap plot}
that can be created by these two R packages. 
There are, however, differences in details in how linked micromap plots\index{Linked micromap plot}
are created in each package that can be important for users.
Perhaps the two biggest differences between the two packages are that
the **micromap**\index{R Packages!micromap} R package makes it easy to bring
in one's own boundary files, in particular external shapefiles,\index{Shapefile}
while the **micromapST**\index{R Packages!micromapST} R package
initially supports a larger number of glyph types.
However, even these differences are relatively minor as users can 
create their own plot types for use in the **micromap**\index{R Packages!micromap} R package
as discussed in Chapter \@ref(Ch5) and
incorporate external shapefiles\index{Shapefile} into the 
**micromapST**\index{R Packages!micromapST} R package as discussed in Chapter \@ref(Ch4b).
Ultimately, the decision is up to the analyst on which of these two R packages
to use for the construction of linked micromap plots.\index{Linked micromap plot}

The remainder of this chapter is organized as follows:
Section \@ref(Ch2-Steps) will introduce the
four main steps that are required to create a 
basic linked micromap plot\index{Linked micromap plot} via the **micromap**\index{R Packages!micromap} R package. 
Examples in Section \@ref(Ch2-Example1) and Section \@ref(Ch2-Example2) will outline how to apply these steps
to data for the 50 U.S. states and Washington, D.C., and 
to data for watersheds in West Virginia, respectively.
This chapter concludes with a summary and suggestions for further reading
in Section \@ref(Ch2-SummaryFurtherReading).


## Steps to Create a Linked Micromap Plot with the **micromap**\index{R Packages!micromap} R Package {#Ch2-Steps}


```{r Ch2-flowchart, fig.cap = 'Workflow to create a linked micromap plot with the **micromap**\\index{R Packages!micromap} R package (Diagram created with the **DiagrammeR**\\index{R Packages!DiagrammeR} R package [@Iannone2022]).', fig.width = 5, fig.height = 5, echo = FALSE}
library(DiagrammeR)

DiagrammeR::grViz(
  diagram = "digraph flowchart {
  node [fontname = arial, shape = oval, color = grey, style = filled]
  tab1 [label = '@@1']
  tab2 [label = '@@2']
  tab3 [label = '@@3']
  tab4 [label = '@@4']

  tab1 -> tab2 -> tab3 -> tab4;
}

  [1]: '1. Identifying and Geoprocessing of Spatial Boundary Data'
  [2]: '2. Linking Spatial Boundary Data and Statistical Data'
  [3]: '3. Creating a Draft Linked Micromap Plot'
  [4]: '4. Refining the Linked Micromap Plot'
"
)
```


Four main steps, shown in Figure \@ref(fig:Ch2-flowchart), are needed to create a 
linked micromap plot\index{Linked micromap plot} 
with the **micromap**\index{R Packages!micromap} R package:

1. **Identifying and Geoprocessing of Spatial Boundary Data** 
(see Sections \@ref(Ch2-Identifying) and \@ref(Ch2-IdentifyingWV) for details): 
In addition to a data frame that contains the statistical data,
the user must identify a data structure that contains the spatial boundary data for the region and subregions that are 
colored in the maps. This spatial boundary data typically comes in the form of a
SpatialPolygonsDataFrame\index{SpatialPolygonsDataFrame}. In this chapter, we will work with ready-to-use
boundary files. In Chapter \@ref(Ch4), we will discuss how to make use of boundary files that are
provided as external shapefiles\index{Shapefile}. In particular, we will see in that chapter how
to simplify complex boundaries, enlarge small subregions in the maps, and move subregions that are far
from the main area of the map closer to it.

2. **Linking Spatial Boundary Data and Statistical Data** 
(see Sections \@ref(Ch2-Linking) and \@ref(Ch2-LinkingWV) for details): 
To link spatial boundary data and statistical data, 
the SpatialPolygonsDataFrame\index{SpatialPolygonsDataFrame} first has to be transformed into a regular data frame
via the `create_map_table()` function.
Next, we have to identify one variable from the statistical data frame
and one variable from the newly created data frame with the boundary information that 
allow us to link statistical data and boundary data for each subregion.
This step is not required when the spatial boundary data and statistical data are adequately
stored in a simple features\index{Simple features} (sf) format [@OGC2022] instead of a
SpatialPolygonsDataFrame.\index{SpatialPolygonsDataFrame}

3. **Creating a Draft Linked Micromap Plot** 
(see Sections \@ref(Ch2-Creating) and \@ref(Ch2-CreatingWV) for details): 
While not necessary, it is always a good idea to first create a minimal
linked micromap plot\index{Linked micromap plot} to ensure that the statistical data and boundary data
are matching and a correct draft linked micromap plot\index{Linked micromap plot} is created. Skipping this
step and trying to create a complex linked micromap plot\index{Linked micromap plot} immediately may complicate
debugging the R code.

4. **Refining the Linked Micromap Plot** 
(see Sections \@ref(Ch2-Refining) and \@ref(Ch2-RefiningWV) for details): 
Once a draft linked micromap plot\index{Linked micromap plot} has been
created, this plot usually needs fine-tuning of its appearance and plot aesthetics, e.g., modification of colors,
change of the layout and of perceptual groups, addition of labels and legends, and possibly the inclusion of additional
statistical variables or changes to different graph types for some of the variables.


## Example 1: A Linked Micromap Plot for the 50 U.S. States and Washington, D.C. {#Ch2-Example1}


In this first linked micromap plot\index{Linked micromap plot}
example created with the **micromap**\index{R Packages!micromap} R package, we work with the
_USstates_\index{Datasets!USstates} and _edPov_\index{Datasets!edPov} datasets
from the **micromap**\index{R Packages!micromap} R package.
We follow the four steps outlined in Section \@ref(Ch2-Steps).


### Identifying and Geoprocessing of Spatial Boundary Data {#Ch2-Identifying}


_USstates_\index{Datasets!USstates} is a SpatialPolygonsDataFrame\index{SpatialPolygonsDataFrame} for 
the 50 U.S. states (and Washington, D.C.) 
that was created for use with linked micromap plots\index{Linked micromap plot}. Notably, the boundaries
of many of the states have been simplified, Alaska and Hawaii have been moved closer to the
contiguous 48 states (and also have been resized), and Washington, D.C. has been pulled out of the
main map, placed further to the east, and also has been enlarged
as shown in Figure \@ref(fig:Ch2-USstates). In the following R code,
we first load the **micromap**\index{R Packages!micromap} R package and the
_USstates_\index{Datasets!USstates} dataset, verify that this object indeed is a
SpatialPolygonsDataFrame\index{SpatialPolygonsDataFrame}, then look
at some of the data in the `data` component of this object, and finally plot it.
For figures that only contain maps, it is often helpful to remove all 
margin space on all four sides of the plot 
via `par(mar = c(0, 0, 0, 0))`
to maximize the plot, i.e., the actual map.
When loading the **micromap**\index{R Packages!micromap} R package,
the reader will notice that it depends on the 
**RColorBrewer**\index{R Packages!RColorBrewer} [@Neuwirth2022], 
**sp**\index{R Packages!sp} [@PeBi2022], 
and **sf**\index{R Packages!sf} [@Pebesma2022] R packages.
These dependencies are installed automatically along with
**micromap**\index{R Packages!micromap}.

```{r Ch2-USstates, fig.cap = 'Map representation of the _USstates_\\index{Datasets!USstates} spatial boundary dataset for the United States that frequently is used as the basis for linked micromap plots\\index{Linked micromap plot} that are created with the **micromap**\\index{R Packages!micromap} R package.', fig.width = 7, fig.height = 4}
library(micromap)

data(USstates)

class(USstates)
head(USstates@data)

par(mar = c(0, 0, 0, 0))
plot(USstates)
```
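
As a small defensive variant of the chunk above, the following sketch checks the class programmatically, confirms that the `ST` column is present, and restores the graphics margins after plotting. The use of `inherits()` and `stopifnot()` and the object name `old_par` are our own additions for illustration; they are not required by the **micromap** R package.

```r
library(micromap)

data(USstates)

# Fail early if the object is not of the expected sp class
stopifnot(inherits(USstates, "SpatialPolygonsDataFrame"))

# The ID column used later for linking should be part of the attribute table
stopifnot("ST" %in% names(USstates@data))

# par() returns the previous settings, so they can be restored afterwards
old_par <- par(mar = c(0, 0, 0, 0))
plot(USstates)
par(old_par)
```

Restoring the margins is mainly useful in interactive sessions where further plots with axes and titles follow the map.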


### Linking Spatial Boundary Data and Statistical Data {#Ch2-Linking}


In this step, the SpatialPolygonsDataFrame\index{SpatialPolygonsDataFrame} first is transformed
into a regular data frame for use in a linked micromap plot\index{Linked micromap plot}
via the `create_map_table()` function. We have to indicate a variable from the `data`
component of the SpatialPolygonsDataFrame\index{SpatialPolygonsDataFrame} object that 
can be used as an ID column. A matching ID column must also be identified in the statistical 
dataset for linking with the spatial boundary dataset. For the _USstates_\index{Datasets!USstates} 
dataset, this is usually the `ST` variable that contains the 51 abbreviations for the 50 U.S. 
states (and for Washington, D.C.).

```{r Ch2-linkingdataframesUSstates}
head(USstates@data$ST)

state_polys_table <- create_map_table(
  tmp.map = USstates,
  IDcolumn = "ST"
)

class(state_polys_table)
dim(state_polys_table)
names(state_polys_table)
```


The resulting `state_polys_table` is a regular data frame that will be used for creating a
linked micromap plot\index{Linked micromap plot} in the next step.
`state_polys_table` consists of `r dim(state_polys_table)[1]` rows. This implies that the map is based
on `r dim(state_polys_table)[1]` line segments.
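
To get a feeling for how these rows are distributed across the subregions, one possible sketch is to tabulate the `ID` column. The object name `rows_per_id` is hypothetical, and the code assumes only that `create_map_table()` returns one `ID` value per row, as used elsewhere in this section.

```r
library(micromap)

data(USstates)
state_polys_table <- create_map_table(
  tmp.map = USstates,
  IDcolumn = "ST"
)

# Number of rows of boundary information per subregion
rows_per_id <- table(state_polys_table$ID)

# Subregions with the most detailed boundaries come first
head(sort(rows_per_id, decreasing = TRUE))

# The per-ID counts should add up to the total number of rows
sum(rows_per_id) == nrow(state_polys_table)
```

Subregions with long coastlines or many islands would be expected to contribute more rows than subregions with nearly rectangular boundaries.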

_edPov_\index{Datasets!edPov} is a data frame that contains
education and poverty level data for the 50 U.S. states (and Washington, D.C.).
This data frame has 51 rows, one for each of the 50 U.S. states (and one for Washington, D.C.).
We have to identify one variable from this data frame that can be used for linking the
two datasets. Here, this is the `StateAb` variable.

The last expression in the following R code verifies that there is indeed at least one matching ID 
in the `state_polys_table` data frame for each ID in the _edPov_\index{Datasets!edPov} dataset.
Missing IDs in the statistical data frame may prevent data from appearing on the 
maps. Similarly, mismatching identifiers in the statistical data frame may also prevent data from 
being shown on the maps, e.g., if the statistical data frame uses `D.C.` as ID while
the spatial data frame uses `DC`. Here, everything is matching.


```{r Ch2-linkingdataframesedPov}
data(edPov)
dim(edPov)

head(edPov)
head(edPov$StateAb)

all(sort(edPov$StateAb) == sort(unique(state_polys_table$ID)))
```
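
When the identifiers do not match, it helps to see which ones are affected. The following sketch uses small made-up vectors (`stat_ids` and `map_ids` are hypothetical and not taken from the datasets) to show how `setdiff()` and `%in%` report mismatches such as `D.C.` versus `DC` without relying on equal vector lengths.

```r
# Toy identifiers (hypothetical), mimicking the D.C. versus DC mismatch
stat_ids <- c("AL", "AK", "D.C.", "WY")
map_ids <- c("AK", "AL", "DC", "WY", "WY") # map tables repeat each ID

# IDs in the statistical data without a partner in the map table
missing_in_map <- setdiff(stat_ids, unique(map_ids))

# IDs in the map table without statistical data
missing_in_stat <- setdiff(unique(map_ids), stat_ids)

missing_in_map
# "D.C."
missing_in_stat
# "DC"

# A single logical summary that does not depend on sorting
all(stat_ids %in% map_ids)
# FALSE
```

For the real data, `stat_ids` would be replaced by `edPov$StateAb` and `map_ids` by `state_polys_table$ID`; both `setdiff()` calls should then return empty vectors.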


### Creating a Draft Linked Micromap Plot {#Ch2-Creating}


Now we can create a minimal draft linked micromap plot\index{Linked micromap plot} using the
`mmplot()` function, based on the previously created 
`state_polys_table` data frame and the _edPov_\index{Datasets!edPov} dataset.
In our function call, seven arguments are required, none of which have a default value.
For all other arguments of this function, the default settings will be used here. 
The statistical data (_edPov_\index{Datasets!edPov}) is assigned to the `stat.data` argument
and the spatial boundary data (in the `state_polys_table` data frame) is assigned to the `map.data` argument.

The `map.link` argument is needed to link the statistical data
and the spatial boundary data. A vector with the names of the two linking variables 
identified in the previous step is needed for this argument.
The first variable name (`StateAb`) must come from the statistical data (here _edPov_\index{Datasets!edPov})
and the second variable name (`ID`) must come from the `state_polys_table` data frame that
contains the spatial boundary information.
Changing the order of these two variable names typically results in an error.

The `panel.types` and `panel.data` arguments are closely related. 
As the name suggests, `panel.types` is a vector
that specifies the layout of the columns of panels\index{Panel}
in the linked micromap plot.\index{Linked micromap plot}
Here, we have panels\index{Panel} with a `dot_legend` in the first column, `labels` in the second column,
the statistical data represented as dotplots\index{Dotplot} (`dot`) in the third and fourth columns,
and the micromaps (`map`) in the fifth, i.e., final, column.
This is matched with a list of data that is used for each of the five columns of panels.\index{Panel}
A list is necessary for this argument as some of the data itself can be lists as we will
see in the R code for Figures \@ref(fig:Ch2-refining1WV) 
and \@ref(fig:Ch2-refining2WV) later on.

The `dot_legend` simply shows a plotting symbol in a certain color that represents that row in
the linked micromap plot\index{Linked micromap plot}. Thus, no further data is needed and `NA`
is assigned. `labels` requires some text argument, typically some identifiers of the subregions
in the maps. Here `state` from _edPov_ is used. 
The variables `pov` and `ed` from _edPov_ are used for the statistical displays in columns three and four.
It should be noted that all data specified in `panel.data` by default is taken from the
data frame specified in the `stat.data` argument.
The fifth and final column contains the micromaps. Their boundaries are obtained from
the `map.data` data frame. Thus, no further data has to be specified here and `NA` 
is assigned instead.

Two more arguments have to be specified: `ord.by` specifies the sorting variable of the 
rows in the linked micromap plot\index{Linked micromap plot}. Here, `pov` 
from the `stat.data` argument is used. The sorting of the rows goes from smallest (at the top) to
largest (at the bottom). Finally, `grouping` specifies the number of rows
in each of the perceptual groups. If a single integer value is provided, 
that value is used for all perceptual groups. If a vector of integer values is provided,
each perceptual group may have a different number of rows as 
shown in the R code for Figure \@ref(fig:Ch2-refining3). 
Here, `grouping` is set to `5`, meaning there are five rows of data in each perceptual group.

The resulting draft linked micromap plot\index{Linked micromap plot} is shown in
Figure \@ref(fig:Ch2-creatingdraft). It is noteworthy that there are eleven perceptual groups
overall --- ten with five subregions and one with just one subregion. This is because 
there are 51 subregions overall: The 50 U.S. states and Washington, D.C.
This results in only one row of data and one subregion highlighted in the final (bottom) perceptual
group. There is no automatic balancing of the number of rows in each perceptual group.
A layout such as the default one shown in Figure \@ref(fig:Ch2-creatingdraft) 
should be avoided in general. Table \@ref(tab:Ch1-PartitioningTable) in Chapter \@ref(Ch1) provides 
suggestions on how to group data for various numbers of subregions.
We will also address this as one of the refinement steps in the next section.
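
The arithmetic behind the eleven perceptual groups can be reproduced in a few lines of base R. The helper `group_sizes()` below is hypothetical and only mirrors the behavior described above; it is not a function of the **micromap** R package and makes no claim about how `mmplot()` computes the groups internally.

```r
# Sizes of the perceptual groups when a single integer is used for all groups
# (hypothetical helper, not part of micromap)
group_sizes <- function(n_rows, rows_per_group) {
  full_groups <- n_rows %/% rows_per_group
  remainder <- n_rows %% rows_per_group
  c(rep(rows_per_group, full_groups), if (remainder > 0) remainder)
}

group_sizes(51, 5)
# 5 5 5 5 5 5 5 5 5 5 1

length(group_sizes(51, 5))
# 11

# With one row fewer, the groups are balanced: ten groups of five rows
group_sizes(50, 5)
# 5 5 5 5 5 5 5 5 5 5
```

A quick check of this kind before plotting shows whether a chosen value for `grouping` leaves a very small final group.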

We abstain from interpreting this figure at this stage, but we will provide some
helpful interpretation once we have created the last refined version of this figure
at the end of the next section. However, we encourage the reader to examine the draft maps 
at this time and determine whether any spatial patterns may be visible and to assess the 
statistical relationship between the two variables shown in the third and fourth columns of the plot.


```{r Ch2-creatingdraft, fig.cap = 'Draft linked micromap plot\\index{Linked micromap plot}, based on the _edPov_\\index{Datasets!edPov} dataset.', fig.width = 7, fig.height = 9}
mmplot(
  stat.data = edPov,
  map.data = state_polys_table,
  map.link = c("StateAb", "ID"),
  panel.types = c("dot_legend", "labels", "dot", "dot", "map"),
  panel.data = list(NA, "state", "pov", "ed", NA),
  ord.by = "pov",
  grouping = 5
)
```


### Refining the Linked Micromap Plot {#Ch2-Refining}


We continue with the draft linked micromap plot\index{Linked micromap plot} from the previous
section and refine it in multiple small steps. This refinement process should only be started once
functional R code has been obtained in the previous step and an initial 
linked micromap plot\index{Linked micromap plot} has been created.

First, we remove the eleventh perceptual group (with just one subregion) and introduce
a median row via `median.row = TRUE` instead. A median row is often a good solution if the 
number of subregions is odd such as for the 50 U.S. states (and Washington, D.C.).
Here, Wyoming is the state shown in the median row. It has the 
26$^{th}$ highest (or lowest) value for the sorting variable, i.e., `pov`.
It does not appear in a map by itself, but rather is added to the perceptual
groups above and below the median row in a neutral color, 
thus increasing the number of subregions shown in each of these two maps by one (i.e., to six here).
Also, we reverse the sorting order via `rev.ord = TRUE`. 
Now the sorting of the rows goes from largest (at the top) to
smallest (at the bottom).
The resulting linked micromap plot\index{Linked micromap plot} is shown in
Figure \@ref(fig:Ch2-refining1).
While we keep `pov` as the sorting variable in this first refined version,
the reader is encouraged to use `ed` as the sorting variable to see how
the spatial patterns highlighted in the maps change. This can be done
for the original sorting order or for the reversed sorting order.
The rows can even be sorted by `region` or alphabetically 
by `state` or `StateAb` even though such an alphabetical sorting
in most cases is not very meaningful.


```{r Ch2-refining1, fig.cap = 'First refined linked micromap plot\\index{Linked micromap plot}, based on the _edPov_\\index{Datasets!edPov} dataset. Main changes are the introduction of a median row instead of the eleventh perceptual group and the reverse ordering of the `pov` data in the first statistical graphics column.', fig.width = 7, fig.height = 9}
mmplot(
  stat.data = edPov,
  map.data = state_polys_table,
  map.link = c("StateAb", "ID"),
  panel.types = c("dot_legend", "labels", "dot", "dot", "map"),
  panel.data = list(NA, "state", "pov", "ed", NA),
  ord.by = "pov",
  rev.ord = TRUE,
  grouping = 5,
  median.row = TRUE
)
```
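
The subregion that ends up in the median row can also be looked up directly from the data. The sketch below sorts `edPov` by `pov` and extracts the middle row; the object names `median_pos` and `sorted` are our own. If several states share the same `pov` value near the middle, `order()` may resolve such ties differently than `mmplot()` does, so the result should be read as a plausibility check only.

```r
library(micromap)

data(edPov)

n <- nrow(edPov) # an odd number of rows, so a single middle row exists
median_pos <- (n + 1) / 2 # position of the middle row after sorting

# Sort by the ordering variable and pick the middle row
sorted <- edPov[order(edPov$pov), ]
sorted[median_pos, c("state", "pov")]
```

Reversing the sorting order via `rev.ord = TRUE` does not change which row is in the middle, as the middle position is the same when counted from either end.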


We continue with modifications to the first refined linked micromap plot\index{Linked micromap plot}. 
Next, we change the order of the two statistical graphics columns,
i.e., we place `ed` to the left of `pov` and use `ed` as the sorting variable (in reverse order).
While any column of the linked micromap plot\index{Linked micromap plot} or even
variables not shown can be used as sorting variables, in most cases,
the first (leftmost) statistical graphics column is used for sorting.
Moreover, we place the maps on the left side of the plot. As previously stated,
the `panel.types` and `panel.data` arguments of the `mmplot()` function
are closely related. Thus, if we change the order of one, the order of the
other one has to be changed accordingly. 
The resulting linked micromap plot\index{Linked micromap plot} is shown in
Figure \@ref(fig:Ch2-refining2).
As stated in Section \@ref(Ch1-LinkedMicromapPlots),
there is no strong recommendation where the column with the maps should be placed.


```{r Ch2-refining2, fig.cap = 'Second refined linked micromap plot\\index{Linked micromap plot}, based on the _edPov_\\index{Datasets!edPov} dataset. Main changes are related to the order of the five columns in the plot. Most notably, the map column is shown on the left here.', fig.width = 7, fig.height = 9}
mmplot(
  stat.data = edPov,
  map.data = state_polys_table,
  map.link = c("StateAb", "ID"),
  panel.types = c("map", "dot_legend", "labels", "dot", "dot"),
  panel.data = list(NA, NA, "state", "ed", "pov"),
  ord.by = "ed",
  rev.ord = TRUE,
  grouping = 5,
  median.row = TRUE
)
```
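
One way to keep `panel.types` and `panel.data` synchronized when rearranging columns is to apply the same index vector to both. The following base R sketch starts from the layout of the draft plot and reproduces the layout used above; the object names (`panel_types`, `panel_data`, `new_order`) are hypothetical helpers and not arguments of `mmplot()`.

```r
# Column layout of the draft plot: types and data are kept side by side
panel_types <- c("dot_legend", "labels", "dot", "dot", "map")
panel_data <- list(NA, "state", "pov", "ed", NA)

# New left-to-right order, expressed as positions in the old layout:
# map, dot_legend, labels, dot (ed), dot (pov)
new_order <- c(5, 1, 2, 4, 3)

# Applying the same index to both objects keeps them in sync
panel_types_new <- panel_types[new_order]
panel_data_new <- panel_data[new_order]

stopifnot(length(panel_types_new) == length(panel_data_new))

panel_types_new
# "map" "dot_legend" "labels" "dot" "dot"
str(panel_data_new)
```

The two reordered objects could then be passed to the `panel.types` and `panel.data` arguments, which avoids editing the two arguments by hand.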


We continue making changes to the second refined linked micromap plot\index{Linked micromap plot}.
We change the grouping to nine perceptual groups overall (and no median row)
via `grouping = c(6, 6, 6, 6, 3, 6, 6, 6, 6)` and `median.row = FALSE` (which is the default and could be omitted)
and vertically align the rows in each perceptual group via `vertical.align = "center"`.
This grouping is not a recommended partitioning from Table \@ref(tab:Ch1-PartitioningTable) in Chapter \@ref(Ch1)
and is mostly done for experimental purposes here.
Finally, we start making changes to individual columns of the plot via
the `panel.att` argument. Here, the third column that shows the `labels` is aligned on the left.
The resulting linked micromap plot\index{Linked micromap plot} is shown in
Figure \@ref(fig:Ch2-refining3).


```{r Ch2-refining3, fig.cap = 'Third refined linked micromap plot\\index{Linked micromap plot}, based on the _edPov_\\index{Datasets!edPov} dataset. Main changes are related to perceptual groups with different numbers of subregions and the vertical alignment of rows in the middle of the plot. Also, `labels` are aligned on the left.', fig.width = 7, fig.height = 9}
mmplot(
  stat.data = edPov,
  map.data = state_polys_table,
  map.link = c("StateAb", "ID"),
  panel.types = c("map", "dot_legend", "labels", "dot", "dot"),
  panel.data = list(NA, NA, "state", "ed", "pov"),
  ord.by = "ed",
  rev.ord = TRUE,
  grouping = c(6, 6, 6, 6, 3, 6, 6, 6, 6),
  median.row = FALSE,
  vertical.align = "center",
  panel.att = list(list(3, align = "left"))
)
```


After this experiment with a different grouping, we revert to the more traditional grouping
of ten perceptual groups with five rows each and a median row. This grouping is frequently used
for the 50 U.S. states (and Washington, D.C.) 
and is suggested as Partitioning 1 in Table \@ref(tab:Ch1-PartitioningTable) in Chapter \@ref(Ch1).
Next, the choice of colors
for the subregions in each map (and thus for the `dot_legend` appearance in the
`panel.types` argument as well) is discussed. The default setting for the `colors` argument
makes use of `max(grouping)` different colors from a spectral color scheme.
Thus, when `grouping = 5` as in Figures \@ref(fig:Ch2-creatingdraft)-\@ref(fig:Ch2-refining2),
a five-class spectral color scheme\index{Color scheme!Spectral} is selected, 
whereas a six-class spectral color scheme\index{Color scheme!Spectral} is
selected when `grouping = c(6, 6, 6, 6, 3, 6, 6, 6, 6)` as in Figure \@ref(fig:Ch2-refining3).
Historically, many linked micromap plots,\index{Linked micromap plot} 
e.g., in @COCPC1998 and @WCCBP2002,
made use of rainbow colors\index{Color scheme!Rainbow colors} that could be obtained 
via the `colors = c("red", "orange", "green", "blue", "purple")` setting of the
`colors` argument. While these colors work well for readers with normal color vision,
they may not work well for readers with certain types of color vision deficiencies.
Color schemes that are colorblind safe are better suited for such readers.
Options are single-hue or multi-hue sequential color schemes\index{Color scheme!Sequential} 
or selected divergent color schemes\index{Color scheme!Divergent}.
Such color schemes can be obtained from the 
**RColorBrewer**\index{R Packages!RColorBrewer} R package [@Neuwirth2022].
The reader is encouraged to read more about the theoretical background
of these color schemes in @BHH2003 and @HaBr2003 and experiment with different settings
at the supporting web page at https://colorbrewer2.org/.
@SDWPM2014 used a five-class greyscale sequential color scheme\index{Color scheme!Sequential}
from **RColorBrewer**\index{R Packages!RColorBrewer} in reverse sorting (where
the darkest grey color comes first) for publication in a greyscale publication,\index{Colors!Greyscale publication}
obtained via `colors = RColorBrewer::brewer.pal(n = 5, name = "Greys")[5:1]`.
@SBDSS2016 used a five-class divergent red-yellow-blue (RdYlBu) color scheme\index{Color scheme!Divergent}
from **RColorBrewer**\index{R Packages!RColorBrewer}
that is colorblind safe\index{Colors!Colorblind safe} and print friendly,\index{Colors!Print friendly}
obtained via `colors = RColorBrewer::brewer.pal(n = 5, name = "RdYlBu")`.
In the linked micromap plot\index{Linked micromap plot} shown in
Figure \@ref(fig:Ch2-refining4), we use a
five-class divergent brown-blue-green (BrBG) color scheme\index{Color scheme!Divergent}
from **RColorBrewer**\index{R Packages!RColorBrewer}
that is also colorblind safe\index{Colors!Colorblind safe} and print friendly,\index{Colors!Print friendly}
obtained via `colors = RColorBrewer::brewer.pal(n = 5, name = "BrBG")`.
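
Before committing to a color scheme, it can be useful to inspect the candidate palettes. The sketch below retrieves the three schemes mentioned above and looks up the palette metadata that ships with **RColorBrewer**. Note that the `colorblind` flag in `brewer.pal.info` is recorded per palette, whereas the web page evaluates a scheme for a specific number of classes, so the two sources should be read together. The object names are our own.

```r
library(RColorBrewer)

# Hex codes of the five-class schemes mentioned in the text
greys_dark_first <- rev(brewer.pal(n = 5, name = "Greys")) # same as [5:1]
rdylbu <- brewer.pal(n = 5, name = "RdYlBu")
brbg <- brewer.pal(n = 5, name = "BrBG")

# Metadata for each palette: maximum number of classes, scheme type,
# and a colorblind flag
brewer.pal.info[c("Greys", "RdYlBu", "BrBG"), ]

# Visual preview of a scheme before using it in a plot
display.brewer.pal(n = 5, name = "BrBG")
```

Any of the three vectors can be passed to the `colors` argument of `mmplot()` as long as its length is at least the largest number of rows in a perceptual group.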


```{r Ch2-refining4, fig.cap = 'Fourth refined linked micromap plot\\index{Linked micromap plot}, based on the _edPov_\\index{Datasets!edPov} dataset. Main changes are the use of a divergent brown-blue-green color scheme\\index{Color scheme!Divergent} and the conversion back to the traditional grouping for 51 subregions.', fig.width = 7, fig.height = 9}
mmplot(
  stat.data = edPov,
  map.data = state_polys_table,
  map.link = c("StateAb", "ID"),
  panel.types = c("map", "dot_legend", "labels", "dot", "dot"),
  panel.data = list(NA, NA, "state", "ed", "pov"),
  ord.by = "ed",
  rev.ord = TRUE,
  grouping = 5,
  median.row = TRUE,
  colors = RColorBrewer::brewer.pal(n = 5, name = "BrBG"),
  panel.att = list(list(3, align = "left"))
)
```


So far, labels for the columns and titles for the statistical graphics columns are not shown.
Tic marks and tic mark labels, in particular in the second statistical graphics column, could be improved,
background colors in the statistical graphics columns and maps could be modified, 
font and symbol sizes could be modified, and the widths of the columns could be adjusted.
All of this is done via the `panel.att` argument that controls the panel specific attributes of
each column in the linked micromap plot.\index{Linked micromap plot}
The content of this argument typically is a list of lists where the attributes
for each column of the linked micromap plot\index{Linked micromap plot} are modified via a separate list.
These inner lists are numbered from 1 to the number of elements in the `panel.types` vector
where `1` is related to `map`, `2` to `dot_legend`, `3` to `labels`, and `4` and `5` to `dot`, i.e.,
the dotplots\index{Dotplot} in the two statistical graphics columns,
in the linked micromap plot\index{Linked micromap plot} shown in
Figure \@ref(fig:Ch2-refining5).
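
The list-of-lists structure can be inspected with base R before it is passed to `mmplot()`. In the following sketch, the object `panel_att` and the header text `"Maps"` are made up for illustration; the point is only that each inner list starts with an unnamed column number that is followed by named attributes.

```r
# Each inner list starts with an unnamed column number, followed by
# named attributes for that column
panel_att <- list(
  list(1, header = "Maps"),
  list(3, header = "States", align = "left")
)

# Which columns are being modified?
target_columns <- vapply(panel_att, function(x) x[[1]], numeric(1))
target_columns
# 1 3

# Which attributes are set for column 3?
names(panel_att[[2]])[-1]
# "header" "align"
```

Columns without an inner list, here columns `2`, `4`, and `5`, simply keep their default attributes.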

We leave it to the reader to further experiment with the different elements in these lists.
If the purpose of a certain element is not immediately obvious, it is useful
to considerably increase or decrease the numeric value of that element or change the color to
`red` or `yellow` to highlight that element. 

Here, we only want to explain the purpose of the `fill.regions` element with the matching `header` element
in the list for column `1`, i.e., the `map` column:
The setting `fill.regions = "aggregate"` (which in fact is the default setting)
fills in the subregions from all previous perceptual groups in the
subsequent perceptual groups. This filling proceeds from
the top perceptual group to the bottom perceptual group by sequentially
filling the subregions that have already been displayed.
Thus, in the map for the final perceptual group at the bottom, all subregions have been filled.
The text in the `header` is used to communicate this information to the reader.
Alternatively, the setting `fill.regions = "with data"` only fills those subregions
in a map that actually show data in that perceptual group. 
No additional subregions are filled in any of the maps.
Another setting for `fill.regions` is discussed in the next refinement step.
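
For comparison, a reduced call that switches only the map column to `fill.regions = "with data"` might look as follows. This is a minimal sketch that reuses the arguments introduced so far; the header text is our own suggestion, and all other panel attributes are left at their defaults.

```r
library(micromap)

data(USstates)
data(edPov)

state_polys_table <- create_map_table(
  tmp.map = USstates,
  IDcolumn = "ST"
)

mmplot(
  stat.data = edPov,
  map.data = state_polys_table,
  map.link = c("StateAb", "ID"),
  panel.types = c("map", "dot_legend", "labels", "dot", "dot"),
  panel.data = list(NA, NA, "state", "ed", "pov"),
  ord.by = "ed",
  rev.ord = TRUE,
  grouping = 5,
  median.row = TRUE,
  panel.att = list(
    list(
      1,
      header = "Only States in\nThis Group Are Filled", # hypothetical header
      fill.regions = "with data"
    )
  )
)
```

Placing the two versions side by side makes it easier to judge which fill setting conveys the spatial pattern more clearly for a given dataset.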


```{r Ch2-refining5, fig.cap = 'Fifth refined linked micromap plot\\index{Linked micromap plot}, based on the _edPov_\\index{Datasets!edPov} dataset. Main changes are related to the panel specific attributes.', fig.width = 7, fig.height = 9}
mmplot(
  stat.data = edPov,
  map.data = state_polys_table,
  map.link = c("StateAb", "ID"),
  panel.types = c("map", "dot_legend", "labels", "dot", "dot"),
  panel.data = list(NA, NA, "state", "ed", "pov"),
  ord.by = "ed",
  rev.ord = TRUE,
  grouping = 5,
  median.row = TRUE,
  colors = RColorBrewer::brewer.pal(n = 5, name = "BrBG"),
  panel.att = list(
    list(
      1,
      header = "Light Gray Means\nPreviously Displayed",
      map.all = TRUE,
      fill.regions = "aggregate",
      active.border.color = "black",
      active.border.size = 1.2,
      inactive.border.color = gray(0.7),
      inactive.border.size = 1,
      panel.width = 0.85
    ),
    list(
      2,
      point.type = 20,
      point.border = TRUE,
      point.size = 2,
      panel.width = 1.0
    ),
    list(
      3,
      header = "States",
      align = "left",
      text.size = 0.9,
      panel.width = 0.75
    ),
    list(
      4,
      header = "Percent Adults With\n4+ Years of College",
      graph.bgcolor = "lightgray",
      point.size = 1.5,
      xaxis.ticks = list(10, 20, 30, 40),
      xaxis.labels = list(10, 20, 30, 40),
      xaxis.title = "Percent"
    ),
    list(
      5,
      header = "Percent Living Below\nPoverty Level",
      graph.bgcolor = "lightgray",
      point.size = 1.5,
      xaxis.ticks = list(5, 10, 15, 20),
      xaxis.labels = list(5, 10, 15, 20),
      xaxis.title = "Percent"
    )
  )
)
```


In the final refined linked micromap plot\index{Linked micromap plot} shown in
Figure \@ref(fig:Ch2-refining6), we make three more types of changes.
First, we make use of the `labeling::extended()` function of 
the **labeling**\index{R Packages!labeling} R package [@Talbot2020].  This package is not 
included with **micromap**\index{R Packages!micromap} and it must be installed separately. 
Given the minimum and maximum values of a variable and a target number of tick marks,
this function returns a vector of near-optimal tick positions, which we use for both
the tick marks and the tick mark labels. The argument `m` serves as a guideline for the number of axis labels,
but the actual number of near-optimal axis labels may differ slightly. The reader is encouraged to experiment with
`m = 2` to `m = 6` in the R code below.

Second, we use the setting `fill.regions = "two ended"`. This setting makes most sense when
`median.row = TRUE` (as is the case here) and the focus of the maps is to 
indicate which subregions are above or below the median value
of the variable specified in the `ord.by` argument (here `ed`).
Similar to the setting `fill.regions = "aggregate"`,
the subregions from all previous perceptual groups are filled in the
subsequent perceptual groups. This filling proceeds from
the top perceptual group to the median row 
and from the bottom perceptual group to the median row
by sequentially
filling the subregions that have already been displayed on the more extreme ends.
The text in the `header` has been updated to communicate this information to the reader.

Third, we reduce the margin space between some of the columns via the
`right.margin` and `left.margin` settings. Negative values are allowed.
Fine-tuning the margin spacing and the widths of the columns can require
a few iterations. The reader should always check carefully that no identifiers
are truncated or overprinted, in particular that no letters
from the longest identifier (here `Washington D.C.`) are cut off.


```{r Ch2-refining6, fig.cap = 'Sixth (and final) refined linked micromap plot\\index{Linked micromap plot}, based on the _edPov_\\index{Datasets!edPov} dataset. Main changes are related to labeling, the coloring of perceptual groups above and below the median row, and the column spacing.', fig.width = 7, fig.height = 9}
library(labeling)

mmplot(
  stat.data = edPov,
  map.data = state_polys_table,
  map.link = c("StateAb", "ID"),
  panel.types = c("map", "dot_legend", "labels", "dot", "dot"),
  panel.data = list(NA, NA, "state", "ed", "pov"),
  ord.by = "ed",
  rev.ord = TRUE,
  grouping = 5,
  median.row = TRUE,
  colors = RColorBrewer::brewer.pal(n = 5, name = "BrBG"),
  panel.att = list(
    list(
      1,
      header = "Two-ended\nCumulative Maps",
      map.all = TRUE,
      fill.regions = "two ended",
      active.border.color = "black",
      active.border.size = 1.2,
      inactive.border.color = gray(0.7),
      inactive.border.size = 1,
      panel.width = 0.85
    ),
    list(
      2,
      point.type = 20,
      point.border = TRUE,
      point.size = 2,
      panel.width = 1.0
    ),
    list(
      3,
      header = "States",
      align = "left",
      right.margin = 0,
      left.margin = -1,
      text.size = 0.9,
      panel.width = 0.75
    ),
    list(
      4,
      header = "Percent Adults With\n4+ Years of College",
      graph.bgcolor = "lightgray",
      right.margin = 0,
      left.margin = -0.6,
      point.size = 1.5,
      xaxis.ticks = as.list(labeling::extended(
        dmin = min(edPov$ed),
        dmax = max(edPov$ed),
        m = 5
      )),
      xaxis.labels = as.list(labeling::extended(
        dmin = min(edPov$ed),
        dmax = max(edPov$ed),
        m = 5
      )),
      xaxis.title = "Percent"
    ),
    list(
      5,
      header = "Percent Living Below\nPoverty Level",
      graph.bgcolor = "lightgray",
      right.margin = 0.25,
      left.margin = -0.6,
      point.size = 1.5,
      xaxis.ticks = as.list(labeling::extended(
        dmin = min(edPov$pov),
        dmax = max(edPov$pov),
        m = 4
      )),
      xaxis.labels = as.list(labeling::extended(
        dmin = min(edPov$pov),
        dmax = max(edPov$pov),
        m = 4
      )),
      xaxis.title = "Percent"
    )
  )
)
```
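
The repeated calls to `labeling::extended()` in the R code above could be factored into a small helper function, so that the tick positions and the tick labels are guaranteed to agree. The following is only a sketch: the helper name `axis_breaks()` and the numeric vector `x` are hypothetical and not part of the **micromap** or **labeling** R packages.

```r
# Hypothetical helper: near-optimal axis breaks for a numeric vector x,
# returned as a list so that the result can be passed to both the
# xaxis.ticks and the xaxis.labels settings.
axis_breaks <- function(x, m) {
  as.list(labeling::extended(dmin = min(x), dmax = max(x), m = m))
}

# Made-up values that only mimic a range of percentages.
x <- c(15.3, 22.8, 27.1, 39.4)

# The number of breaks that is returned may differ slightly from m.
for (m in 2:6) {
  print(unlist(axis_breaks(x, m = m)))
}

x_breaks <- axis_breaks(x, m = 5)
```

With such a helper, `xaxis.ticks = x_breaks` and `xaxis.labels = x_breaks` could be used within a list of the `panel.att` list, with `x` replaced by `edPov$ed` or `edPov$pov`.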


What remains to be done is a summary and interpretation of the 
final refined linked micromap plot\index{Linked micromap plot} shown in Figure \@ref(fig:Ch2-refining6).
This plot shows dotplots of two statistical variables, the percentage of adults with four or more years of college
(in the first statistical graphics column which is the fourth column overall) 
and the percentage living below poverty level (in the second statistical graphics column 
which is the fifth column overall) in the 50 U.S. states and Washington, D.C.

The percentage of adults with four or more years of college is used as the sorting variable
for the rows in the plot -- with highest percentages shown at the top and lowest percentages
shown at the bottom.
The map panel in the first column shows some noticeable, but not very strong spatial patterns.
Highest percentages of adults with four or more years of college can be found in the
northeastern states. In fact, eight of the top-10 states are located in the northeast,
with Washington, D.C., having the highest percentage with almost 40%. When looking at the other
maps above the median state (here, Arizona), additional eastern states, but also several western
states can be seen.
When looking at the two maps at the bottom of the plot, mostly southern states can be seen.
West Virginia is the state with the lowest percentage of only about 15%
of adults with four or more years of college. Overall,
primarily southern and central states can be seen in the maps below the median state.

The percentage living below poverty level (in the fifth column overall) shows a different pattern.
States with a high percentage of adults with four or more years of college have
a low percentage living below poverty level. 
Visually, the dots in the fourth and fifth column diverge, forming some crude caret shape
(resembling an upside down V-shape).
Washington, D.C., is a major outlier as it has the highest 
percentage of adults with four or more years of college (almost 40%)
but also the highest percentage living below poverty level (more than 20%).
Another interesting state is New Mexico with 
an above median percentage of adults with four or more years of college,
but with the fourth highest percentage living below poverty level (about 18%).
As expected, the overall correlation between these two variables is negative,
but given these two major outliers and several minor outliers
(such as Indiana and Nevada that both have a relatively low percentage living below poverty level
despite being among the bottom-10 states with respect to the percentage of
adults with four or more years of college), the (negative) correlation is relatively weak.
The correlation coefficient $r$ is
only `r round(cor(edPov$ed, edPov$pov), digits = 2)`.
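
To see how strongly the two major outliers influence this result, the correlation coefficient can be recalculated without them. This is a minimal sketch only. It assumes that the _edPov_ dataset can be loaded from the **micromap** R package and that Washington, D.C., and New Mexico are coded as `DC` and `NM` in the `StateAb` variable.

```r
# Load the dataset (assumption: it is shipped with the micromap R package).
data("edPov", package = "micromap")

# Correlation based on all 51 subregions.
r_all <- cor(edPov$ed, edPov$pov)

# Correlation without the two major outliers discussed in the text
# (assumption: they are coded as "DC" and "NM").
keep <- !(edPov$StateAb %in% c("DC", "NM"))
r_trimmed <- cor(edPov$ed[keep], edPov$pov[keep])

round(c(all = r_all, without_DC_NM = r_trimmed), digits = 2)
```

Removing observations in this way is meant as a sensitivity check for the interpretation, not as a recommendation to drop these subregions from the linked micromap plot.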


## Example 2: A Linked Micromap Plot for Watersheds in West Virginia {#Ch2-Example2}


In this second linked micromap plot\index{Linked micromap plot}
example created with the **micromap**\index{R Packages!micromap} R package, we work with the
_WV_Watershed_\index{Datasets!WV\_Watershed} dataset
from the **micromapExtra**\index{R Packages!micromapExtra} R package.
We follow the four general steps outlined in Section \@ref(Ch2-Steps) again.
However, there are several differences compared to the first example in Section \@ref(Ch2-Example1).

First, we work with external shapefiles\index{Shapefiles} [@ESRI1998]
that contain both the boundary for watersheds in West Virginia and the statistical data
used in the following linked micromap plots\index{Linked micromap plot}, rather than having
a separate dataset for the statistical data. 
In general, shapefiles\index{Shapefiles} are a collection of related files 
with the same prefix that contain the geography and attributes (i.e., data) 
of geographically referenced spatial features.
Shapefiles\index{Shapefiles} consist of at least three files: 
a main file that stores the feature geometry (with suffix `.shp`), 
an index file that stores the index of the feature geometry (with suffix `.shx`), 
and a dBASE table that contains the attribute information of the spatial features (with suffix `.dbf`). 
Additional files may be included in a shapefile\index{Shapefiles}. 
Here, for the _WV_Watershed_\index{Datasets!WV\_Watershed} dataset, four additional files are provided:
a file with suffix `.prj` that contains the spatial coordinate system information (i.e., the projection), 
two files with suffixes `.sbn` and `.sbx` that store the spatial index of the features,
and a file with suffix `.xml` that contains metadata for the shapefile.
Additional suffixes may be used for other shapefiles\index{Shapefiles} as outlined in @ESRI2016.
For use in linked micromap plots\index{Linked micromap plot}
that are created with the **micromap**\index{R Packages!micromap} R package,
external shapefiles\index{Shapefiles} can be handled in two ways.
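
Before reading in a shapefile, it can be useful to check which of its component files are actually present. The helper below is a sketch under our own assumptions: the function name `shapefile_parts()` is hypothetical, and only the suffixes mentioned above are checked (the metadata file with suffix `.xml` is left out as its exact name may vary).

```r
# Hypothetical helper: report which component files exist for a shapefile,
# given the path to its main file (with suffix .shp).
shapefile_parts <- function(shp_path) {
  base <- tools::file_path_sans_ext(shp_path)
  required <- c("shp", "shx", "dbf")
  optional <- c("prj", "sbn", "sbx")
  ext <- c(required, optional)
  data.frame(
    suffix = paste0(".", ext),
    required = ext %in% required,
    found = file.exists(paste0(base, ".", ext)),
    stringsAsFactors = FALSE
  )
}

parts <- shapefile_parts(
  "data/WV_Watershed/RandomWatershed2_stats_smooth.shp"
)
parts
```

If any row with `required = TRUE` shows `found = FALSE`, the shapefile is incomplete and reading it into R is likely to fail.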

Option (i) is to read in the external shapefile\index{Shapefiles} as a
SpatialPolygonsDataFrame,\index{SpatialPolygonsDataFrame}
and then split it into the geographic information and the statistical data component,
use a modified statistical data component, or use a statistical data component from 
a different source, in particular if the shapefile\index{Shapefiles} does not contain any
statistical data.
Numerous R packages, such as 
**raster**\index{R Packages!raster} [@Hijmans2022raster],
**sf**\index{R Packages!sf} [@Pebesma2022],
**shapefiles**\index{R Packages!shapefiles} [@Stabler2022], 
and **terra**\index{R Packages!terra} [@Hijmans2022terra],
support the use of shapefiles\index{Shapefiles} in R.
Of these R packages, only the **raster**\index{R Packages!raster}
R package directly creates a SpatialPolygonsDataFrame.\index{SpatialPolygonsDataFrame}
For functions from the other R packages, an additional transformation step would be needed.
Therefore, we use the `raster::shapefile()` function here to read in the shapefile.\index{Shapefiles}
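
For illustration, one possible form of such an additional transformation step is sketched below for the **sf** R package, using its `sf::as_Spatial()` function. This is not the approach used in this chapter. The sketch assumes that the **sf** and **sp** R packages are installed, and it uses the North Carolina shapefile that is shipped with **sf** as a stand-in for an external shapefile.

```r
# Stand-in shapefile shipped with the sf R package (an assumption for
# illustration; replace with the path to your own .shp file).
shp_path <- system.file("shape/nc.shp", package = "sf")

# Read the shapefile in the simple features format.
shp_sf <- sf::st_read(dsn = shp_path, quiet = TRUE)

# Additional transformation step: convert the sf object into a
# Spatial*DataFrame object (this requires the sp R package).
shp_sp <- sf::as_Spatial(shp_sf)

class(shp_sf)
class(shp_sp)
```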

Option (ii) is to read in the external shapefile\index{Shapefiles} in a
simple features\index{Simple features} format [@OGC2022] 
via the `sf::st_read()` function 
from the **sf**\index{R Packages!sf} R package [@Pebesma2022],
instead of creating a
SpatialPolygonsDataFrame\index{SpatialPolygonsDataFrame}.
It is worthwhile to mention that geography and attributes that are
stored in a simple features\index{Simple features} format 
no longer have to be split when used in linked micromap plots.\index{Linked micromap plot} 
Rather, the **micromap**\index{R Packages!micromap} R package
can handle them directly from the simple features\index{Simple features} object.

Overall, options (i) and (ii) both will eventually result in the same final
linked micromap plot.\index{Linked micromap plot} 
Option (i) may be preferred if the external shapefile does not contain the statistical data
or if some considerable modifications have to be made to the statistical data prior to
creating the linked micromap plot.\index{Linked micromap plot}
Not all R packages can handle simple features\index{Simple features} objects so that
a split into a geographic component and a statistical data component may be necessary anyway
for some advanced processing of the statistical data.
Option (ii) may be preferred if the statistical data from the external shapefile\index{Shapefiles}
can be used almost as provided in the external shapefile.\index{Shapefiles}

Moreover, in this second example we introduce two new statistical displays
for the statistical graphics columns of the linked micromap plot\index{Linked micromap plot}:
a boxplot\index{Boxplot} and a dotplot with confidence bounds\index{Dotplot with confidence bounds}. 
Different arguments are used to fine-tune this linked micromap plot\index{Linked micromap plot}.
Finally, we will demonstrate how to add an overall statistics (or criteria) line to the 
statistical graphics columns of the linked micromap plot\index{Linked micromap plot}.


### Identifying and Geoprocessing of Spatial Boundary Data {#Ch2-IdentifyingWV}


The _WV\_Watershed_\index{Datasets!WV\_Watershed} dataset
from the **micromapExtra**\index{R Packages!micromapExtra} R package
is stored in external shapefiles\index{Shapefiles} that can be
read in as a SpatialPolygonsDataFrame\index{SpatialPolygonsDataFrame}
via the `raster::shapefile()` function from the
**raster**\index{R Packages!raster} R package [@Hijmans2022raster] 
or via the `sf::st_read()` function from the 
**sf**\index{R Packages!sf} R package [@Pebesma2022]
as discussed in the previous section. 

The shapefile contains
25 aggregated watersheds and subbasins in West Virginia in the United States. 
These watersheds were introduced and discussed in more detail in @MPRG2016.
In addition to the geographic information, this dataset also contains
the statistical information for the linked micromap plots\index{Linked micromap plot} 
created in this section.

We first demonstrate the steps necessary for option (i).
Similar to the first example in Section \@ref(Ch2-Identifying), 
we verify that this object (once read into R) indeed is a
SpatialPolygonsDataFrame\index{SpatialPolygonsDataFrame}, then view 
some of the data in the `data` component of this object, and finally plot it,
as shown in Figure \@ref(fig:Ch2-WV).
The last step is the extraction of the statistical data from this
SpatialPolygonsDataFrame\index{SpatialPolygonsDataFrame} into a regular data frame.
This can be done by accessing the data from the `@data` slot in the 
SpatialPolygonsDataFrame\index{SpatialPolygonsDataFrame}.

There are 41 variables in the resulting data frame.
In contrast to @MPRG2016, we will focus on the specific conductance variables
in the following linked micromap plots.\index{Linked micromap plot}

Variable names beginning with an uppercase letter, e.g., `Cond_med`, `Cond_LCB95`,
`Cond_UCB95`, are the population estimates, representing the median,
the lower 95% confidence bound, and the upper 95% confidence bound of a variable
(here, specific conductance) in each of the 25 watersheds, respectively.
These will be used for the construction of 
dotplots with confidence bounds\index{Dotplot with confidence bounds}.

Variable names starting with a lowercase letter, e.g., 
`cond_min`, `condq1`, `cond_med1`, `condq3`, `cond_max`,
are the descriptive statistics, representing the minimum, first quartile, median,
third quartile, and maximum of a variable (here, specific conductance) in each of the 25 watersheds, respectively.
These will be used for the construction of boxplots\index{Boxplot}.
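
This naming convention makes it easy to select the two groups of variables by name. The sketch below uses a toy data frame with made-up values as a stand-in for `wv_data`. Only the variable names are taken from the dataset, the object names are our own.

```r
# Toy stand-in for wv_data: the values are made up, only the variable
# names follow the naming convention described in the text.
toy_data <- data.frame(
  Random_Wat = c("Elk", "Coal"),
  Cond_med = c(120, 480), Cond_LCB95 = c(100, 400), Cond_UCB95 = c(150, 560),
  cond_min = c(40, 90), condq1 = c(80, 300), cond_med1 = c(115, 470),
  condq3 = c(160, 700), cond_max = c(400, 2100)
)

# grep() is case-sensitive by default, so the first letter separates
# the population estimates from the descriptive statistics.
estimate_vars <- grep("^Cond", names(toy_data), value = TRUE)
descriptive_vars <- grep("^cond", names(toy_data), value = TRUE)

estimate_vars
descriptive_vars
```

The same two calls to `grep()` could be applied to `names(wv_data)` to list the population estimates and descriptive statistics of all variables with this prefix.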


(ref:Ch2-WV-cap) Map representation of the _WV_Watershed_\index{Datasets!WV\_Watershed} spatial boundary dataset for the 25 watersheds in West Virginia that is used as the basis for the linked micromap plots\index{Linked micromap plot} for option (i).


```{r Ch2-WV, fig.cap = '(ref:Ch2-WV-cap)', fig.width = 5, fig.height = 4}
wv_watershed <- raster::shapefile(
  x = "data/WV_Watershed/RandomWatershed2_stats_smooth.shp",
  verbose = FALSE
)

class(wv_watershed)

head(wv_watershed@data, n = 2)

par(mar = c(0, 0, 0, 0))
plot(wv_watershed)

wv_data <- wv_watershed@data

names(wv_data)
dim(wv_data)
```

Alternatively, we demonstrate the steps necessary for option (ii).
Similar to option (i),
we verify that this object (once read into R) indeed is in the
simple features\index{Simple features} format, then look
at some of the data of this sf object, and finally plot it,
as shown in Figure \@ref(fig:Ch2-WV-sf).


(ref:Ch2-WV-cap-sf) Map representation of the _WV_Watershed_\index{Datasets!WV\_Watershed} spatial boundary dataset for the 25 watersheds in West Virginia that is used as the basis for the linked micromap plots\index{Linked micromap plot} for option (ii). It is identical to the one for option (i) shown in Figure \@ref(fig:Ch2-WV).


```{r Ch2-WV-sf, fig.cap = '(ref:Ch2-WV-cap-sf)', fig.width = 5, fig.height = 4}
wv_watershed_sf <- sf::st_read(
  dsn = "data/WV_Watershed/RandomWatershed2_stats_smooth.shp",
  quiet = TRUE
)

class(wv_watershed_sf)

head(wv_watershed_sf, n = 2)

par(mar = c(0, 0, 0, 0))
plot(sf::st_geometry(wv_watershed_sf))
```


### Linking Spatial Boundary Data and Statistical Data {#Ch2-LinkingWV}


This step is only required for option (i) and can be skipped entirely for option (ii).
Similar to Section \@ref(Ch2-Linking),
the SpatialPolygonsDataFrame\index{SpatialPolygonsDataFrame} first is transformed
into a regular data frame for use in a linked micromap plot\index{Linked micromap plot}
via the `create_map_table()` function. The `Random_Wat` variable from the `data`
component of the SpatialPolygonsDataFrame\index{SpatialPolygonsDataFrame} object 
can be used as an ID column here. 


```{r Ch2-linkingdataframesWV}
wv_polys_table <- create_map_table(
  tmp.map = wv_watershed,
  IDcolumn = "Random_Wat"
)

head(wv_polys_table)
class(wv_polys_table)
dim(wv_polys_table)
names(wv_polys_table)
```


The resulting `wv_polys_table` is a regular data frame that will be used for creating a
linked micromap plot\index{Linked micromap plot} in the next step.
`wv_polys_table` consists of `r dim(wv_polys_table)[1]` rows. This implies that the map is based
on `r dim(wv_polys_table)[1]` line segments.

We have already extracted the statistical data into the `wv_data` data frame.
As for the spatial component, 
the `Random_Wat` variable serves as the linking variable between the two data frames.
As in Section \@ref(Ch2-Linking), we want to verify that there is indeed at least one matching ID 
in the `wv_polys_table` data frame for each ID in the `wv_data` data frame.
In fact, everything is matching here.


```{r Ch2-linkingdataframesWV2}
all(sort(wv_data$Random_Wat) == sort(unique(wv_polys_table$ID)))
```
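
If the check above returns `FALSE`, it does not reveal which IDs are the problem, and the element-wise comparison is difficult to interpret when the two vectors differ in length. A sketch of a more informative check, based on `setdiff()`, is shown below. The two ID vectors are hypothetical toy data that stand in for `wv_data$Random_Wat` and `wv_polys_table$ID`.

```r
# Toy stand-ins: one ID per subregion in the statistical data and
# one ID per boundary point in the spatial boundary data.
stat_ids <- c("Elk", "Coal", "Gauley")
map_ids <- c("Elk", "Elk", "Coal", "Gauley", "Gauley")

# IDs that appear in one data frame, but not in the other.
missing_in_map <- setdiff(stat_ids, map_ids)
missing_in_stat <- setdiff(unique(map_ids), stat_ids)

missing_in_map
missing_in_stat

# Everything is matching if both vectors are empty.
length(missing_in_map) == 0 && length(missing_in_stat) == 0
```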


### Creating a Draft Linked Micromap Plot {#Ch2-CreatingWV}


We can now create a minimal draft linked micromap plot\index{Linked micromap plot} using the
`mmplot()` function, based on the previously created 
`wv_polys_table` and `wv_data` data frames for option (i).
Similar to our first example in Section \@ref(Ch2-Creating),
we only provide seven required arguments in our function call.
None of these arguments have a default setting.
For all other arguments of this function, the default settings will be used here. 
The statistical data (`wv_data`) is assigned to the `stat.data` argument
and the spatial boundary data (in the `wv_polys_table` data frame) is assigned to the `map.data` argument.

The `map.link` argument is needed to tell the function how to link the statistical data
and the spatial boundary data. A vector with the names of the two linking variables 
identified in the previous step is needed for this argument.
As explained in Section \@ref(Ch2-Creating),
the first variable name (`Random_Wat`) must come from the statistical data frame (`wv_data`)
and the second variable name (`ID`) must come from the data frame that
contains the spatial boundary information (`wv_polys_table`).
Changing the order of these two variable names typically results in an error.

As discussed in Section \@ref(Ch2-Creating),
the `panel.types` and `panel.data` arguments are closely related. 
Even though we want to ultimately display boxplots\index{Boxplot} 
and dotplots with confidence bounds\index{Dotplot with confidence bounds} in the
statistical graphics columns of the linked micromap plot,\index{Linked micromap plot}
it often is prudent to start with simple dotplots\index{Dotplot}.
Therefore, we start with these initial settings for the 
`panel.types` vector: We have the `dot_legend` in the first column, `labels` in the second column,
two columns with statistical data represented as dotplots\index{Dotplot} (`dot`)
in the third and fourth columns,
and the micromaps (`map`) in the final fifth column.
This is matched with a list of data that is used for each of the five columns.
Here, we use `Random_Wat` for the second column and `cond_med1` and `Cond_med`
for the third and fourth columns. No further data has to be used for the
first and fifth columns, so `NA` is assigned here.

Two more arguments have to be specified: `ord.by` specifies the sorting variable of the 
rows in the linked micromap plot\index{Linked micromap plot}. Here, `cond_med1` 
from the `wv_data` data frame is used. The sorting of the rows goes from smallest (at the top) to
largest (at the bottom) median observed specific conductance in a subbasin. 
Finally, `grouping` specifies the number of rows
in each of the perceptual groups (here `5` rows). This implies that the 25 subbasins
will be split into five perceptual groups showing five subbasins each.
This is the recommended Partitioning 1 from Table \@ref(tab:Ch1-PartitioningTable) in Chapter \@ref(Ch1).

The resulting draft linked micromap plot\index{Linked micromap plot} is shown in
Figure \@ref(fig:Ch2-creatingdraftWVdot).
We abstain from interpreting this figure at this stage, but we will provide some
helpful interpretation once we have created the final refined version of this figure
in the next step.


```{r Ch2-creatingdraftWVdot, fig.cap = 'Draft linked micromap plot\\index{Linked micromap plot} for option (i), based on the previously created `wv_data` and `wv_polys_table` data frames for the West Virginia watershed dataset.', fig.width = 7, fig.height = 7}
mmplot(
  stat.data = wv_data,
  map.data = wv_polys_table,
  map.link = c("Random_Wat", "ID"),
  panel.types = c("dot_legend", "labels", "dot", "dot", "map"),
  panel.data = list(NA, "Random_Wat", "cond_med1", "Cond_med", NA),
  ord.by = "cond_med1",
  grouping = 5
)
```


For option (ii), the same draft linked micromap plot\index{Linked micromap plot}
can be created directly from the `wv_watershed_sf` object 
(see Figure \@ref(fig:Ch2-creatingdraftWVdotsf)).
It is not necessary to call the `create_map_table()` function first.
Also, the `stat.data` and `map.link` arguments are not needed as the `wv_watershed_sf` object
contains all relevant geographic and statistical data.
The linked micromap plots\index{Linked micromap plot}
in Figure \@ref(fig:Ch2-creatingdraftWVdot)
and Figure \@ref(fig:Ch2-creatingdraftWVdotsf) are identical.
Both draft linked micromap plots\index{Linked micromap plot}
can be refined in a similar way as shown in the next section.


```{r Ch2-creatingdraftWVdotsf, fig.cap = 'Draft linked micromap plot\\index{Linked micromap plot} for option (ii), based on the West Virginia watershed dataset. In contrast to the previous figure, this figure has been created directly from the external shapefiles without using the `create_map_table()` function first.', fig.width = 7, fig.height = 7}
mmplot(
  map.data = wv_watershed_sf,
  panel.types = c("dot_legend", "labels", "dot", "dot", "map"),
  panel.data = list(NA, "Random_Wat", "cond_med1", "Cond_med", NA),
  ord.by = "cond_med1",
  grouping = 5
)
```


### Refining the Linked Micromap Plot {#Ch2-RefiningWV}


We continue with the draft linked micromap plot\index{Linked micromap plot} from the previous
step and refine it in two main steps. As mentioned in Section \@ref(Ch2-Refining),
this refinement should only be started once
functional R code has been obtained in the previous step.

A problem that did not occur in the first example in Section \@ref(Ch2-Refining) is the
length of many subregion names, which considerably exceeds the space allocated for this column,
as can be seen in Figure \@ref(fig:Ch2-creatingdraftWVdot). Instead of using full terms,
we shorten the names of the subbasins, in particular those seven names that combine smaller
hydrologic unit codes\index{Hydrologic unit code}
(HUC) with larger neighboring HUCs (here HUC 8 at the subbasin level) in the West Virginia sampling frame.
Overall, the following abbreviations are introduced and added as part of the new `short_name` variable
to the `wv_data` data frame: 
L. (Lower), M. (Middle), U. (Upper), N. (North), S. (South),
Br. (Branch), Dunk. (Dunkard), Mononga. (Monongahela), Shen. (Shenandoah), and Youg. (Youghiogheny).
Shortening the subregion names often is recommended to prevent the names column from
occupying too much space in the final linked micromap plot\index{Linked micromap plot}.


```{r Ch2-shortennamesWV}
wv_data$short_name <- as.factor(wv_data$Random_Wat)

wv_watershed_short <- list(
  "S. Br. Potomac" = "South Branch Potomac",
  "N. Br. Potomac" = "North Branch Potomac",
  "Cacapon/Shen." = "Cacapon/Shenandoah Hardy",
  "Potomac/Shen." = "Potomac Direct Drains/Shenandoah Jefferson",
  "U. New/James" = "Upper New/James",
  "Tygart Valley" = "Tygart Valley",
  "West Fork" = "West Fork",
  "Mononga./Dunk." = "Monongahela/Dunkard",
  "Cheat/Youg." = "Cheat/Youghiogheny",
  "U. Ohio N./S." = "Upper Ohio North/Upper Ohio South",
  "M. Ohio N." = "Middle Ohio North",
  "M. Ohio S." = "Middle Ohio South",
  "Little Kanawha" = "Little Kanawha",
  "Greenbrier" = "Greenbrier",
  "L. New" = "Lower New",
  "Gauley" = "Gauley",
  "U. Kanawha" = "Upper Kanawha",
  "Elk" = "Elk",
  "L. Kanawha" = "Lower Kanawha",
  "Coal" = "Coal",
  "U. Guyandotte" = "Upper Guyandotte",
  "L. Guyandotte" = "Lower Guyandotte",
  "Tug Fork" = "Tug Fork",
  "Big Sandy/L. Ohio" = "Big Sandy/Lower Ohio",
  "Twelvepole" = "Twelvepole"
)

levels(wv_data$short_name) <- wv_watershed_short
```
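
Assigning a named list to the levels of a factor renames each old level (the list value) to its new level (the list name). Old levels that have no entry in the list are turned into `NA`, so it is worthwhile to check for them first. The sketch below illustrates this with a toy factor. The object names and the subset of subbasin names are chosen for illustration only.

```r
# Toy factor standing in for wv_data$Random_Wat.
full_name <- factor(c("Upper Kanawha", "Elk", "Lower New", "Elk"))

# Same structure as wv_watershed_short: "new level" = "old level".
short_lookup <- list(
  "U. Kanawha" = "Upper Kanawha",
  "Elk" = "Elk",
  "L. New" = "Lower New"
)

# Old levels without an entry in the lookup list would become NA,
# so we stop if there are any.
unmatched <- setdiff(levels(full_name), unlist(short_lookup))
stopifnot(length(unmatched) == 0)

short_name <- full_name
levels(short_name) <- short_lookup

data.frame(full_name, short_name)
```

For the West Virginia data, the same check could be run via `setdiff(levels(wv_data$short_name), unlist(wv_watershed_short))` before the levels are replaced.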


In the first refined linked micromap plot\index{Linked micromap plot} for option (i), shown in 
Figure \@ref(fig:Ch2-refining1WV), we make use of these abbreviations 
via the newly added `short_name` variable of the `wv_data` data frame.
Moreover, we introduce initial boxplots\index{Boxplot} and 
dotplots with confidence bounds\index{Dotplot with confidence bounds} 
in the two statistical graphics columns.
This is done via `box_summary` and `dot_cl` for the `panel.types` argument.
A boxplot\index{Boxplot} requires five components 
(minimum, first quartile, median, third quartile, and maximum)
and a dotplot with confidence bounds\index{Dotplot with confidence bounds}
requires three components
(median, lower 95% confidence bound, and upper 95% confidence bound)
for each subregion. These have to be passed on as lists to
the `panel.data` argument list. Here, we make use of these
components from the specific conductance variable, specifically using
`list("cond_min", "condq1", "cond_med1", "condq3", "cond_max")`
for the boxplots\index{Boxplot} and
`list("Cond_med", "Cond_LCB95", "Cond_UCB95")`
for the dotplots with confidence bounds.\index{Dotplot with confidence bounds}
It should be noted that these eight components have been precalculated
for each subregion and are variables of the `wv_data` data frame. 
The `mmplot()` function does not calculate these summary
statistics by itself from a provided data frame.
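
If only raw measurements were available, the five boxplot components would have to be calculated for each subregion before calling `mmplot()`. The sketch below shows one way to do this in base R. The raw data are made up, and the object names `raw` and `box_stats` are hypothetical. This is not how the precalculated variables in `wv_data` were obtained.

```r
# Made-up raw measurements for two hypothetical subregions.
raw <- data.frame(
  watershed = rep(c("A", "B"), each = 5),
  cond = c(10, 20, 30, 40, 50, 100, 200, 300, 400, 500)
)

# One row of summary statistics per subregion, using the same
# variable names as in the panel.data list for the boxplots.
box_stats <- do.call(rbind, lapply(
  split(raw$cond, raw$watershed),
  function(x) {
    q <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1), names = FALSE)
    data.frame(
      cond_min = q[1], condq1 = q[2], cond_med1 = q[3],
      condq3 = q[4], cond_max = q[5]
    )
  }
))
box_stats$watershed <- rownames(box_stats)

box_stats
```

Note that `quantile()` offers several definitions of the quartiles via its `type` argument, so the results may differ slightly from those obtained with other software.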


```{r Ch2-refining1WV, fig.cap = 'First refined linked micromap plot\\index{Linked micromap plot} for option (i), based on the `wv_data` data frame. Main changes are the introduction of abbreviated subregion names and initial boxplots\\index{Boxplot} and dotplots with confidence bounds\\index{Dotplot with confidence bounds} in the two statistical graphics columns.', fig.width = 7, fig.height = 7}
mmplot(
  stat.data = wv_data,
  map.data = wv_polys_table,
  map.link = c("Random_Wat", "ID"),
  panel.types = c("dot_legend", "labels", "box_summary", "dot_cl", "map"),
  panel.data = list(
    NA,
    "short_name",
    list("cond_min", "condq1", "cond_med1", "condq3", "cond_max"),
    list("Cond_med", "Cond_LCB95", "Cond_UCB95"),
    NA
  ),
  ord.by = "cond_med1",
  grouping = 5
)
```


In the second (and final) refined linked micromap plot\index{Linked micromap plot} for option (i), shown in 
Figure \@ref(fig:Ch2-refining2WV), we make changes similar to those for the third through sixth
refined linked micromap plots\index{Linked micromap plot} in Section \@ref(Ch2-Refining).
This includes a reverse sorting (`rev.ord = TRUE`) that places 
the largest observed median specific conductance at the top, 
a specific color scheme (`colors = RColorBrewer::brewer.pal(n = 5, name = "YlGnBu")[5:1]`),
numerous changes to the sizes and spacing of the perceptual groups and columns,
and the addition of titles, scales, and axis labels.
We keep the maps on the right side in the fifth column. 
A square symbol (instead of a circular one) is used for the color-coded legend in the first column 
via `point.type = 15` in the first list of the `panel.att` list.
The thickness of the boxes in the third column (i.e., the first statistical graphics column)
is reduced via `graph.bar.size = 0.7` in the third list of the `panel.att` list.
The dashed black line in the fourth column (i.e., the second statistical graphics column)
is created via the settings `add.line = 129, add.line.col = "black", add.line.typ = "dashed"`
in the fourth list of the `panel.att` list.
The value `129` microSiemens per cm ($\mu$S/cm) represents the estimated median specific conductance
across all 4th order streams (or less). The Greek letter $\mu$ in the two axis titles is 
produced via its Unicode escape sequence, i.e., `\u03BC`.


```{r Ch2-refining2WV, fig.cap = 'Second (and final) refined linked micromap plot\\index{Linked micromap plot} for option (i), based on the `wv_data` data frame. Main changes are related to sorting, coloring, spacing, and labeling. An overall statistics (or criteria) line has been added to the second statistical graphics column.', fig.width = 7, fig.height = 7}
mmplot(
  stat.data = wv_data,
  map.data = wv_polys_table,
  map.link = c("Random_Wat", "ID"),
  panel.types = c("dot_legend", "labels", "box_summary", "dot_cl", "map"),
  panel.data = list(
    NA,
    "short_name",
    list("cond_min", "condq1", "cond_med1", "condq3", "cond_max"),
    list("Cond_med", "Cond_LCB95", "Cond_UCB95"),
    NA
  ),
  ord.by = "cond_med1",
  rev.ord = TRUE,
  grouping = 5,
  plot.pGrp.spacing = 1.2,
  colors = RColorBrewer::brewer.pal(n = 5, name = "YlGnBu")[5:1],
  panel.att = list(
    list(
      1,
      panel.width = 1.6,
      point.type = 15,
      point.size = 0.85,
      point.border = TRUE
    ),
    list(
      2,
      header = "WVDEP\nSubbasins",
      panel.width = 1.3,
      align = "left",
      text.size = 0.75
    ),
    list(
      3,
      header = "Observed specific conductance\n4th Order Streams or Less",
      graph.bgcolor = "lightgray",
      graph.bar.size = 0.7,
      xaxis.ticks = c(0, 1000, 2000, 3000, 4000, 5000),
      xaxis.labels = c(0, 1, 2, 3, 4, 5),
      xaxis.labels.size = 1.25,
      xaxis.title = "[1,000 \u03BCS/cm]",
      xaxis.title.size = 1.25,
      panel.width = 2.1,
      right.margin = 0,
      left.margin = -0.5
    ),
    list(
      4,
      header = "Estimated specific conductance\n4th Order Streams or Less",
      graph.bgcolor = "lightgray",
      xaxis.ticks = list(0, 150, 300, 450, 600, 750),
      xaxis.labels = list(0, 150, 300, 450, 600, 750),
      xaxis.labels.size = 1.25,
      xaxis.title = "[\u03BCS/cm]",
      add.line = 129,
      add.line.col = "black",
      add.line.typ = "dashed",
      xaxis.title.size = 1.25,
      panel.width = 2.1,
      right.margin = 0.1,
      left.margin = -0.5
    ),
    list(
      5,
      header = "Light Gray Means\nHighlighted Above",
      inactive.border.color = gray(0.7),
      inactive.border.size = 1.5,
      panel.width = 1.5
    )
  )
)
```


As with the modifications for option (i), 
we change the subregion names of the `wv_watershed_sf` object 
(using the previously created list of shortened subregion names in `wv_watershed_short`)
and immediately create the final refined linked micromap plot\index{Linked micromap plot}
(shown in Figure \@ref(fig:Ch2-refining2WVsf)) for option (ii),
rather than splitting this into two steps.
Again, it is not necessary to call the `create_map_table()` function first.
Also, the `stat.data` and `map.link` arguments are not needed as the `wv_watershed_sf` object
contains all relevant geographic and statistical data.
All other arguments for the refinement remain the same as those used for Figure \@ref(fig:Ch2-refining2WV).


(ref:Ch2-refining2WVsf-cap) Second (and final) refined linked micromap plot\index{Linked micromap plot} for option (ii), based on the West Virginia watershed dataset. In contrast to the previous figure, this figure has been created directly from the external shapefiles without using the `create_map_table()` function first. It is identical to the one for option (i) shown in Figure \@ref(fig:Ch2-refining2WV).


```{r Ch2-refining2WVsf, fig.cap = '(ref:Ch2-refining2WVsf-cap)', fig.width = 7, fig.height = 7}
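# Copy the subbasin names into a factor so that its levels can be relabeled.
wv_watershed_sf$short_name <- as.factor(wv_watershed_sf$Random_Wat)

# Overwrite the levels with the shortened names; this assumes that
# wv_watershed_short is ordered to match levels(wv_watershed_sf$short_name).
levels(wv_watershed_sf$short_name) <- wv_watershed_short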

mmplot(
  map.data = wv_watershed_sf,
  panel.types = c("dot_legend", "labels", "box_summary", "dot_cl", "map"),
  panel.data = list(
    NA,
    "short_name",
    list("cond_min", "condq1", "cond_med1", "condq3", "cond_max"),
    list("Cond_med", "Cond_LCB95", "Cond_UCB95"),
    NA
  ),
  ord.by = "cond_med1",
  rev.ord = TRUE,
  grouping = 5,
  plot.pGrp.spacing = 1.2,
  colors = RColorBrewer::brewer.pal(n = 5, name = "YlGnBu")[5:1],
  panel.att = list(
    list(
      1,
      panel.width = 1.6,
      point.type = 15,
      point.size = 0.85,
      point.border = TRUE
    ),
    list(
      2,
      header = "WVDEP\nSubbasins",
      panel.width = 1.3,
      align = "left",
      text.size = 0.75
    ),
    list(
      3,
      header = "Observed specific conductance\n4th Order Streams or Less",
      graph.bgcolor = "lightgray",
      graph.bar.size = 0.7,
      xaxis.ticks = c(0, 1000, 2000, 3000, 4000, 5000),
      xaxis.labels = c(0, 1, 2, 3, 4, 5),
      xaxis.labels.size = 1.25,
      xaxis.title = "[1,000 \u03BCS/cm]",
      xaxis.title.size = 1.25,
      panel.width = 2.1,
      right.margin = 0,
      left.margin = -0.5
    ),
    list(
      4,
      header = "Estimated specific conductance\n4th Order Streams or Less",
      graph.bgcolor = "lightgray",
      xaxis.ticks = list(0, 150, 300, 450, 600, 750),
      xaxis.labels = list(0, 150, 300, 450, 600, 750),
      xaxis.labels.size = 1.25,
      xaxis.title = "[\u03BCS/cm]",
      add.line = 129,
      add.line.col = "black",
      add.line.typ = "dashed",
      xaxis.title.size = 1.25,
      panel.width = 2.1,
      right.margin = 0.1,
      left.margin = -0.5
    ),
    list(
      5,
      header = "Light Gray Means\nHighlighted Above",
      inactive.border.color = gray(0.7),
      inactive.border.size = 1.5,
      panel.width = 1.5
    )
  )
)
```
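
The relabeling step at the start of the chunk above replaces factor levels by position, so the vector of short names has to follow the order of the factor levels. The sketch below illustrates this on hypothetical toy data (the names and the objects `long_names`, `short_by_level`, and `lookup` are made up for illustration) and compares it with a named lookup vector, one possible alternative that does not depend on the level order.

```{r Ch2-sketch-relabel}
# Hypothetical toy data, not taken from the West Virginia dataset.
long_names <- c("Upper Kanawha", "Cheat", "Lower Kanawha", "Cheat")

# Approach used above: convert to a factor and overwrite its levels.
f <- as.factor(long_names)
levels(f)  # levels of a character vector are sorted, not in data order

# The replacement names must be given in the order of levels(f).
short_by_level <- c("Cheat", "L. Kanawha", "U. Kanawha")
levels(f) <- short_by_level
as.character(f)

# Alternative sketch: a named lookup vector, matched by name.
lookup <- c(
  "Upper Kanawha" = "U. Kanawha",
  "Cheat"         = "Cheat",
  "Lower Kanawha" = "L. Kanawha"
)
short_by_name <- unname(lookup[long_names])

# Both approaches give the same result for these toy data.
identical(as.character(f), short_by_name)
```

Checking `levels()` before overwriting them is a cheap safeguard against silently attaching a short name to the wrong subregion.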


What remains to be done is a summary and interpretation of the 
final refined linked micromap plot\index{Linked micromap plot} shown 
in Figures \@ref(fig:Ch2-refining2WV) and \@ref(fig:Ch2-refining2WVsf).
This plot displays specific conductance variables for 25 aggregated watersheds and 
subbasins of 4th order streams (or less) in West Virginia in the United States. 

The first statistical graphics column shows boxplots\index{Boxplot} for the 
observed specific conductance for each of these watersheds. The second statistical graphics column 
shows dotplots with confidence bounds\index{Dotplot with confidence bounds} 
that are the population estimates of the specific conductance,
representing the median, the lower 95% confidence bound, 
and the upper 95% confidence bound for each of these watersheds.
The rows in the plot are sorted by the median observed specific conductance of each subbasin,
from highest (at the top) to lowest (at the bottom).
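
The row order and the perceptual groups described above can be mimicked in base R. The following sketch uses hypothetical toy data (ten made-up subbasins with made-up medians) and is only meant to illustrate the effect of sorting in decreasing order and forming groups of five rows; it is not the internal code of the **micromap** R package.

```{r Ch2-sketch-sorting}
# Hypothetical toy data: ten made-up subbasins with made-up medians.
toy <- data.frame(
  subbasin = paste0("S", 1:10),
  cond_med1 = c(120, 310, 95, 480, 150, 205, 60, 275, 330, 180)
)

# Sort from highest to lowest median (the row order described above).
toy_sorted <- toy[order(toy$cond_med1, decreasing = TRUE), ]

# Assign perceptual groups of five consecutive rows each.
toy_sorted$pgroup <- ceiling(seq_len(nrow(toy_sorted)) / 5)

toy_sorted
```

In this toy example, the five largest medians end up in the first perceptual group and the five smallest in the second.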

In each map, subregions that were already highlighted in the map(s) above are shown in light gray. 
There is a weak spatial pattern with the highest median observed specific conductance in the northern and 
western subbasins of West Virginia. There is a notable spatial cluster in the 
central-eastern subbasins of West Virginia with the five lowest median observed specific conductance
values shown in the perceptual group at the bottom of the plot.

The second statistical graphics column reveals a notable relationship between 
the estimated specific conductance and the distance between its lower and upper confidence bounds. 
In general, the higher the estimated specific conductance, the farther apart the two bounds are. 
The value 129 $\mu$S/cm, representing the estimated median specific conductance 
across all 4th order streams (or less), has been overlaid on this column as a dashed line.
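
The visual impression that the confidence bounds spread out as the estimate increases could be checked numerically. The sketch below defines a hypothetical helper, `ci_width_summary()`, that computes the distance between the upper and lower bounds and its rank correlation with the estimate. The default column names match those used in the `mmplot()` calls above, but the toy data are made up and the results say nothing about the West Virginia data.

```{r Ch2-sketch-ciwidth}
# Hypothetical helper: distance between the confidence bounds and its
# Spearman rank correlation with the point estimate.
ci_width_summary <- function(dat,
                             est = "Cond_med",
                             lcb = "Cond_LCB95",
                             ucb = "Cond_UCB95") {
  width <- dat[[ucb]] - dat[[lcb]]
  list(
    width = width,
    spearman = cor(dat[[est]], width, method = "spearman")
  )
}

# Made-up toy data in which the bounds widen as the estimate grows.
toy_ci <- data.frame(
  Cond_med   = c(80, 120, 200, 350, 600),
  Cond_LCB95 = c(70, 100, 160, 270, 450),
  Cond_UCB95 = c(95, 150, 260, 460, 780)
)

res <- ci_width_summary(toy_ci)
res$width
res$spearman
```

A rank correlation is used here because it is not affected by a few very large values; calling `ci_width_summary(wv_data)` would apply the same check to the data behind the plot, assuming the column names shown above.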

No such pattern is noticeable in the boxplots\index{Boxplot} in the first statistical graphics column. 
Here, some extreme outliers dominate, e.g., observed specific conductance of about 5,000 $\mu$S/cm
in the Little Kanawha subbasin and of about 4,000 $\mu$S/cm in the Lower (L.) Kanawha subbasin.

While this is hard to visually detect due to these extreme outliers, there is a 
considerable positive correlation between the observed median specific conductance and 
the estimated median specific conductance. Not surprisingly, the correlation coefficient $r$ is 
`r round(cor(wv_data$cond_med1, wv_data$Cond_med), digits = 2)`.


## Summary and Further Reading {#Ch2-SummaryFurtherReading}


In this chapter, we have discussed the four main steps that are required 
to create linked micromap plots\index{Linked micromap plot} with 
the **micromap**\index{R Packages!micromap} R package [@PaOl2024].
This R package is one of two major R packages for the construction of 
linked micromap plots.\index{Linked micromap plot}
We will encounter it again in several of the following chapters.
When designing linked micromap plots\index{Linked micromap plot} in those chapters,
the authors have typically followed the same main steps. However,
in most cases, only the final refined linked micromap plot\index{Linked micromap plot}
will be shown in the following chapters.
While most of those examples will follow option (i) and make use of the `create_map_table()` 
function as outlined in Section \@ref(Ch2-Example2),
the example shown in Figure \@ref(fig:Ch11-micromapWVDEP) 
in Chapter \@ref(Ch11) follows option (ii)
and directly creates a linked micromap plot\index{Linked micromap plot}
from a simple features\index{Simple features} object.

In Chapter \@ref(Ch4), we will see how to modify external shapefiles\index{Shapefiles}
that contain too many edges and/or contain subregions that are too large, too small,
or located too far away from the main geographic area --- and then construct
linked micromap plots\index{Linked micromap plot} that are based on the modified 
shapefiles.\index{Shapefiles}
In Chapter \@ref(Ch5), we will discuss how to display different types
of statistical graphics in the statistical graphics columns
in addition to dotplots,\index{Dotplot}
boxplots,\index{Boxplot} and
dotplots with confidence bounds\index{Dotplot with confidence bounds}
that were introduced in this chapter.

Chapter \@ref(Ch6) discusses linked micromap plots\index{Linked micromap plot}
that are based on point locations, rather than on areal data as in this chapter.
In Chapter \@ref(Ch7), we will see how to construct linked micromap plots\index{Linked micromap plot}
in a web-based environment. Finally, in Chapter \@ref(Ch11), we will
find applications of linked micromap plots\index{Linked micromap plot}
in the environmental field. We will see the example with the
watersheds in West Virginia, first introduced in Section \@ref(Ch2-Example2),
again in that chapter. These three chapters primarily make 
use of the **micromap**\index{R Packages!micromap} R package
although the **micromapST**\index{R Packages!micromapST} R package [@PeCa2024CRAN303]
could be used in a similar way.

It is worth remembering that a user of linked micromap plots\index{Linked micromap plot}
has a choice of which R package to use for their creation. 
Chapter \@ref(Ch3) introduces the **micromapST**\index{R Packages!micromapST} R package [@PeCa2024CRAN303].
Chapters \@ref(Ch4b) and \@ref(Ch5b) resemble Chapters \@ref(Ch4) and \@ref(Ch5)
and discuss how to process external shapefiles\index{Shapefiles}
and display different types of statistical graphics with the
**micromapST**\index{R Packages!micromapST} R package. 
Each of the two R packages for linked micromap plots\index{Linked micromap plot}
has its own strengths and weaknesses.
The **micromapST**\index{R Packages!micromapST} R package has a larger number of
built-in types of statistical graphics for the statistical graphics columns
while the **micromap**\index{R Packages!micromap} R package allows the user (at least in theory)
to add any meaningful type of statistical graphics that is not currently part of this R package.
Similarly, the **micromapST**\index{R Packages!micromapST} R package supports a larger number of
built-in geographic regions
while the **micromap**\index{R Packages!micromap} R package
makes it easy to quickly construct linked micromap plots\index{Linked micromap plot}
whenever external shapefiles\index{Shapefiles} are available.
Such an option also exists for the **micromapST**\index{R Packages!micromapST} R package,
but it takes additional work to process external shapefiles.\index{Shapefiles}
Ultimately, it is up to the user of linked micromap plots\index{Linked micromap plot}
to decide which of these two R packages to use.

We want to finish this chapter with some practical recommendations.
When working with **R Markdown**\index{R Markdown} [@XAG2018;@XDR2021] and/or 
the **bookdown**\index{R Packages!bookdown} R package [@Xie2022bookdown],
sizing the figures with the linked micromap plots\index{Linked micromap plot} for the United States
as `fig.width = 7, fig.height = 9` works well.
For linked micromap plots\index{Linked micromap plot} used in journal publications,
using the arguments `plot.width = 7, plot.height = 9` often works well.
For linked micromap plots\index{Linked micromap plot} used in posters and presentation slides,
using the arguments `plot.width = 9, plot.height = 6` usually works better.
Ultimately, the width depends to some extent on the number of statistical graphics columns.
A linked micromap plot\index{Linked micromap plot} with only one statistical graphics column
should be narrower than one with three or four statistical graphics columns.
Similarly, the height depends to some extent on the number of perceptual groups.
A linked micromap plot\index{Linked micromap plot} with only two or three perceptual groups
should be shorter than one with nine or ten perceptual groups.
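
The sizing arguments mentioned above are passed directly to `mmplot()`. The following sketch is not evaluated here and rests on assumptions: it uses the `USstates` and `edPov` example datasets that ship with the **micromap** R package and a basic call modeled on the package documentation, with the landscape dimensions suggested for slides added at the end. The object name `state_polys` is illustrative.

```{r Ch2-sketch-sizing, eval = FALSE}
library(micromap)

# Example data shipped with the micromap package.
data("USstates")
data("edPov")

# Flatten the state polygons into the table format expected by mmplot().
state_polys <- create_map_table(USstates, "ST")

mmplot(
  stat.data = edPov,
  map.data = state_polys,
  map.link = c("StateAb", "ID"),
  panel.types = c("labels", "dot", "dot", "map"),
  panel.data = list("state", "pov", "ed", NA),
  ord.by = "pov",
  grouping = 5,
  median.row = TRUE,
  plot.width = 9,   # landscape layout suggested above for slides
  plot.height = 6
)
```

Inside an R Markdown document, the `fig.width` and `fig.height` chunk options control the size of the embedded figure, so the two pairs of settings are best chosen with the same aspect ratio in mind.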

Some additional published examples of linked micromap plots\index{Linked micromap plot}
created with the **micromap**\index{R Packages!micromap} R package are related
to pesticides in 
Florida Department of Environmental Protection (FDEP) drainage basins [@SWSCSS2018],
to variables from the Virginia Stream Condition Index (VSCI) and the 
Virginia Coastal Plain Macroinvertebrate Index (VCPMI) [@VirginiaDEQ2020],
and to housing and immigration variables from the subboroughs in New York City [@Medri2021] --- all 
within the United States.
International examples are related
to population data in selected countries from South America [@SDWPM2014],
to suicide data by prefecture in Japan [@KIT2015],
and to the spatial distributions of religions in China [@SBDSS2016].
Some of these examples needed additional modifications of
external shapefiles\index{Shapefiles}
as discussed in Chapter \@ref(Ch4).


\printbibliography[segment=\therefsegment,heading=subbibliography]