Examining the Internal Validity and Statistical Precision of the Comparative Interrupted Time Series Design by Comparison With a Randomized Experiment
Although evaluators often use an interrupted time series (ITS) design to test hypotheses about program effects, there are few empirical tests of the design's validity. We take a randomized experiment on an educational topic and compare its impact estimates to those from a comparative ITS (CITS) design that uses the same treatment group as the experiment but a nonequivalent comparison group assessed at six time points before treatment. We estimate program effects with and without matching of the comparison schools, and we also systematically vary the number of pretest time points in the analysis. CITS designs produce impact estimates that are extremely close to the experimental benchmarks and, as implemented here, do so equally well with and without matching. Adding time points provides an advantage so long as the pretest trend differences between the treatment and comparison groups are correctly modeled. Otherwise, more time points can increase bias.
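The core CITS logic the abstract describes can be sketched as follows: fit each group's pretest trend, project it to the posttest point, and take the treatment group's deviation from its projection net of the comparison group's deviation. This is a minimal illustration only; all numbers and variable names below are invented and do not come from the study.

```python
import numpy as np

# Hypothetical CITS impact estimate (illustrative data, not from the study).
# Six pretest time points per group, one posttest observation.
t_pre = np.arange(6)   # pretest time points 0..5
t_post = 6             # posttest time point

# Simulated group mean outcomes (e.g., school-level test scores)
treat_pre = np.array([50.0, 51.0, 52.1, 53.0, 53.9, 55.0])
comp_pre  = np.array([48.0, 48.9, 50.0, 51.1, 52.0, 52.9])
treat_post = 59.5
comp_post  = 53.8

def linear_projection(y, t_fit, t_target):
    """Fit a linear pretest trend and project it to the posttest point."""
    slope, intercept = np.polyfit(t_fit, y, 1)
    return intercept + slope * t_target

# Deviation of each group's posttest outcome from its own pretest trend
treat_dev = treat_post - linear_projection(treat_pre, t_pre, t_post)
comp_dev  = comp_post - linear_projection(comp_pre, t_pre, t_post)

# CITS impact: treatment deviation net of the comparison deviation
impact = treat_dev - comp_dev
print(round(impact, 2))
```

Note that this sketch assumes linear pretest trends; as the abstract cautions, if the trend-difference model is misspecified, adding more pretest time points can increase rather than decrease bias.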