Comparing type 1 and type 2 error rates of different tests for heterogeneous treatment effects.

Nestler, S.; Salditt, M.

Research article (journal) | Peer reviewed

Abstract

Psychologists are increasingly interested in whether treatment effects vary in randomized controlled trials. A number of tests have been proposed in the causal inference literature to test for such heterogeneity. These tests differ in the sample statistic they use (the variances of the experimental and control groups, their empirical distribution functions, or specific quantiles) and in whether they make distributional assumptions or are based on a Fisher randomization procedure. In this manuscript, we present the results of a simulation study in which we examine the performance of the different tests while varying the amount of treatment effect heterogeneity, the type of underlying distribution, the sample size, and whether an additional covariate is considered. Altogether, our results suggest that researchers should use a randomization test to optimally control for type 1 errors. Furthermore, all tests studied are associated with low power in small and moderate samples, even when the heterogeneity of the treatment effect is substantial. This suggests that current tests for treatment effect heterogeneity require much larger samples than those collected in current research.
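
To illustrate the kind of test the abstract describes, the following is a minimal sketch of a randomization (permutation) test that compares the outcome variances of the treated and control groups. It is not the authors' implementation. The function name, the choice of the absolute log variance ratio as test statistic, and the step of centering each group at its own mean (to remove an estimated constant treatment effect before permuting) are assumptions made for illustration only.

```python
import numpy as np


def variance_ratio_permutation_test(treated, control, n_perm=2000, seed=0):
    """Permutation test for unequal outcome variances in two groups.

    Hypothetical helper for illustration; returns (statistic, p_value).
    """
    rng = np.random.default_rng(seed)
    treated = np.asarray(treated, dtype=float)
    control = np.asarray(control, dtype=float)

    # Assumption: under a constant treatment effect the groups differ only
    # by a location shift, so each group is centered at its own mean.
    t_centered = treated - treated.mean()
    c_centered = control - control.mean()

    def stat(a, b):
        # Absolute log ratio of sample variances; 0 means equal variances.
        return abs(np.log(a.var(ddof=1) / b.var(ddof=1)))

    observed = stat(t_centered, c_centered)
    pooled = np.concatenate([t_centered, c_centered])
    n_t = t_centered.size

    # Re-assign group labels at random and count statistics at least as
    # extreme as the observed one.
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        if stat(perm[:n_t], perm[n_t:]) >= observed:
            exceed += 1

    # Add-one correction keeps the p-value strictly positive.
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    control = rng.normal(0.0, 1.0, size=200)
    treated = rng.normal(0.5, 3.0, size=200)  # heterogeneous effect inflates variance
    print(variance_ratio_permutation_test(treated, control))
```

A small p-value indicates that the two group variances differ by more than random assignment alone would explain, which is one symptom of treatment effect heterogeneity. Centering at estimated means makes the permutation test approximate rather than exact.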

Details about the publication

Journal: Behavior Research Methods
Volume: 56
Page range: 6582-6597
Status: Published
Release year: 2024
DOI: 10.3758/s13428-024-02371-x
Link to the full text: https://doi.org/10.3758/s13428-024-02371-x
Keywords: Causality; Heterogeneous treatment effects; Randomization tests; Heterogeneous regression

Authors from the University of Münster

Nestler, Steffen
Professorship for statistics and research methods in psychology
Salditt, Marie Babette
Professorship for statistics and research methods in psychology