nathan-lindstedt/randomization_tests

randomization_tests

THE SIGNIFICANCE OF PERMUTATION TESTS FOR PROGRAM ASSESSMENT WITH OBSERVATIONAL DATA: ADDRESSING THE ISSUE OF STATISTICAL INFERENCE WITH NON-PROBABILITY SAMPLES

In the book Randomization Tests, Edgington (1980) opens with a trenchant critique of the twinned myth of experimental design and statistical inference:

Experimental design books and others on the application of statistical tests to experimental data perpetuate the long-standing fiction of random sampling in experimental research. Statistical inferences are said to require random sampling and to concern population parameters. In experimentation, however, random sampling is very infrequent; consequently, statistical inferences about populations are usually irrelevant. Thus there is no logical connection between the random sampling model and its application to data from the typical experiment. The artificiality of the random sampling assumption has undoubtedly contributed to the skepticism of some experimenters regarding the value of statistical tests. What is a more important consequence of failure to recognize the prevalence of nonrandom sampling in experimentation, however, is overlooking the need for special statistical procedures that are appropriate for nonrandom samples. As a result, the development and application of randomization tests have suffered.

Randomization tests are statistical tests in which the data are repeatedly divided, a test statistic (e.g., t or F) is computed for each data division, and the proportion of the data divisions with as large a test statistic value as the value for the obtained results determines the significance of the results. For testing hypotheses about experimental treatment effects, random assignment but not random sampling is required. In the absence of random sampling the statistical inferences are restricted to the subjects actually used in the experiment, and generalization to other subjects must be justified by non-statistical argument.

Random assignment is the only random element necessary for determining the significance of experimental results by the randomization test procedure; therefore assumptions regarding random sampling and those regarding normality, homogeneity of variance, and other characteristics of randomly sampled populations, are unnecessary. Thus, any statistical test, no matter how simple or complex, is transformed into a distribution-free test when significance is determined by the randomization test procedure. For any experiment with random assignment, the experimenter can guarantee the validity of any test [they want] to use by determining significance by the randomization test procedure. Chapter 1 summarizes various advantages of the randomization test procedure, including its potential for developing statistical tests to meet the special requirements of a particular experiment, and its usefulness in providing for the valid use of statistical tests on experimental data from a single subject.

A great deal of computation is involved in performing a randomization test and, for that reason, such a means of determining significance was impractical until recent years, when computers became accessible to experimenters. As the use of computers is essential for the practical application of randomization tests, computer programs for randomization tests accompany discussions throughout the book. The programs will be useful for a number of practical applications of randomization tests, but their main purpose is to show how programs for randomization tests are written.

Inasmuch as the determination of significance by the randomization test procedure makes any of the hundreds (perhaps thousands) of published statistical tests into randomization tests, the discussion of application of randomization tests in this book cannot be exhaustive. Applications in the book have been selected to illustrate different facets of randomization tests so that the experimenter will have a good basis for generalizing to other applications. (P. v-vii)

He then continues by sketching the outline of a solution, describing the intuition behind a simple but expensive test that leverages the notions of permutation and random assignment to address the issue of non-probability (or non-random) samples:

A randomization test is a permutation test based on randomization (random assignment), where the test is carried out in the following manner. A test statistic is computed for the experimental data, then the data are permuted (divided or rearranged) repeatedly in a manner consistent with the random assignment procedure, and the test statistic is computed for each of the resulting data permutations. These data permutations, including the one representing the obtained results, constitute the reference set for determining significance. The proportion of data permutations in the reference set that have test statistic values greater than or equal to (or, for certain test statistics, less than or equal to) the value for the experimentally obtained results is the P-value (significance or probability value). If, for example, the proportion is 0.02, the P-value is 0.02, and the results are significant at the 0.05 but not the 0.01 level of significance. Determining significance on the basis of a distribution of test statistics generated by permuting the data is characteristic of all permutation tests; it is when the basis for permuting the data is random assignment that a permutation test is called a randomization test. (P. 1)
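The procedure Edgington describes can be sketched in a few lines of Python. This is a minimal illustration rather than code from this repository: the function name `exact_randomization_test`, the choice of a one-sided difference in means as the test statistic, and the six data values are all assumptions made for the example. It enumerates every division of the pooled data into groups of the original sizes, so the reference set includes the obtained division itself.

```python
from itertools import combinations
from statistics import mean


def exact_randomization_test(treated, control):
    """One-sided exact randomization test on the difference in means.

    Hypothetical helper for illustration: enumerates every way of dividing
    the pooled data into groups of the original sizes. The obtained division
    is one member of that reference set.
    """
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    observed = mean(treated) - mean(control)

    count = 0  # divisions whose statistic is >= the observed statistic
    total = 0  # size of the reference set
    for idx in combinations(range(len(pooled)), n_treated):
        chosen = set(idx)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        stat = mean(group_a) - mean(group_b)
        if stat >= observed - 1e-12:  # small tolerance for floating-point ties
            count += 1
        total += 1
    return observed, count / total


# Made-up outcomes for 3 treated and 3 control subjects.
observed, p_value = exact_randomization_test([12, 14, 15], [8, 9, 10])
print(observed, p_value)
```

With these made-up values there are 20 possible divisions, and only the obtained one places the three largest outcomes in the treated group, so the P-value is 1/20 = 0.05. Full enumeration is only feasible for small samples, which is why the number of divisions is usually sampled at random in practice.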

Given the language of "experimentation" used throughout these passages, it is perhaps unsurprising that the application of randomization tests or permutation tests to experimental data is more familiar to researchers within the behavioral sciences (e.g., Mewhort, Johns, and Kelly 2010) and the medical sciences (e.g., Rigdon and Hudgens 2015), where they serve as a corrective for frequently less than ideal sampling conditions. Less well known, at least at first glance, is their relevance for observational data within the social sciences (yet see Taylor 2020). It should be noted, however, that randomization tests and permutation tests have an established history within social network analysis, where they underlie the QAP (Hubert and Schultz 1976) and MRQAP (Krackhardt 1988) techniques. That said, the idea that randomization tests or permutation tests can be applied to observational data and not just experimental data is well founded (see Box and Andersen 1954; Chung and Fraser 1958; Rubin 1974).

Part of the reason for their unfamiliarity in the context of observational data is that such tests are limited to within-sample conclusions. Fortunately, common research questions in program assessment under non-experimental settings rarely require generalizing to a larger hypothetical population, so this limitation constrains the analysis no more than the question itself does. For example, if an analyst wishes to assess whether adherents to a program experienced a significantly changed outcome using a non-probability sample, all that analyst cares about is the outcome for that sample and whether it was statistically significant. There is no need to generalize to a larger hypothetical population to complete that assessment. Furthermore, analyst concerns regarding "self-selection" in experimental designs can be assuaged by reframing the hypotheses being tested in terms of more limited observational assessments, where data are collected by recording events as they "naturally" take place without manipulation. Because program participants are not assigned to intervention and control groups but are instead observed to take part in some behavior, the analyst can only offer evidence that those engaging in that behavior differed significantly from their counterparts, and does so under the assumption of exchangeability: the belief that, if there is actually no difference between participants and their counterparts, the null hypothesis of no difference should hold for those in the sample. Here, random assignment is induced by permuting the data.
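For the observational setting, one way to picture "random assignment induced via permuting" is a Monte Carlo permutation test that shuffles the participation labels while holding the outcomes fixed. The sketch below is illustrative only and is not taken from this repository: the function name `permutation_p_value`, the two-sided difference in means, the number of permutations, the seed, and the ten outcome values are all assumptions. It relies on exchangeability, meaning that under the null hypothesis any relabeling of who participated is treated as equally plausible.

```python
import random
from statistics import mean


def permutation_p_value(outcomes, participated, n_permutations=9999, seed=0):
    """Two-sided Monte Carlo permutation test for a difference in means.

    Hypothetical helper for illustration: participation labels are shuffled
    while outcomes stay fixed, which preserves the two group sizes.
    """
    rng = random.Random(seed)
    labels = list(participated)

    def mean_difference(current_labels):
        in_program = [y for y, flag in zip(outcomes, current_labels) if flag]
        not_in_program = [y for y, flag in zip(outcomes, current_labels) if not flag]
        return mean(in_program) - mean(not_in_program)

    observed = mean_difference(labels)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)  # induce a random "assignment" of the labels
        if abs(mean_difference(labels)) >= abs(observed) - 1e-12:
            extreme += 1
    # The observed arrangement counts as one member of the reference set.
    return observed, (extreme + 1) / (n_permutations + 1)


# Made-up outcomes: the first five subjects took part in the program.
outcomes = [71, 68, 75, 80, 77, 62, 60, 65, 66, 59]
participated = [True] * 5 + [False] * 5
observed_diff, p_value = permutation_p_value(outcomes, participated)
print(observed_diff, p_value)
```

The resulting P-value speaks only to these ten subjects. It says how unusual the observed difference is among relabelings of this sample, not whether the difference would hold in some larger population.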

Enter the randomization test or permutation test: a non-parametric method free from distributional assumptions. A succinct overview of permutation methods is given in an article by Berry, Johnston, and Mielke (2011).

References

Berry, K. J., Johnston, J. E., and P. W. Mielke. 2011. "Permutation methods." Wiley Interdisciplinary Reviews: Computational Statistics, 3(6):527–542.

Box, G. E. and S. L. Andersen. 1954. "Robust tests for variances and effect of non-normality and variance heterogeneity on standard tests." Technical Report, North Carolina State University Institute of Statistics Mimeo Series.

Chung, J. H. and D. A. S. Fraser. 1958. "Randomization tests for a multivariate two-sample problem." Journal of the American Statistical Association, 53(283):729–735.

Edgington, E. S. 1980. Randomization tests. 2nd Ed. New York, NY: Marcel Dekker, Inc.

Hubert, L. J. and J. Schultz. 1976. "Quadratic assignment as a general data analysis strategy." British Journal of Mathematical and Statistical Psychology, 29:190–241.

Krackhardt, D. 1988. "Predicting with networks: nonparametric multiple regression analysis of dyadic data." Social Networks, 10:359–381.

Mewhort, D. J. K., Johns, B. T., and M. A. Kelly. 2010. "Applying the permutation test to factorial designs." Behavior Research Methods, 42:366–372.

Rigdon, J. and M. G. Hudgens. 2015. "Randomization inference for treatment effects on a binary outcome." Statistics in Medicine, 34(6):924–935.

Rubin, D. B. 1974. "Estimating causal effects of treatments in randomized and nonrandomized studies." Journal of Educational Psychology, 66:688–701.

Taylor, M. A. 2020. "Visualization strategies for regression estimates with randomization inference." The Stata Journal, 20(2):309–335.
