Reconsidering the Impact of High-stakes Testing


  • Henry Braun, Educational Testing Service



Over the last fifteen years, many states have implemented high-stakes tests as part of an effort to strengthen accountability for schools, teachers, and students. Predictably, there has been vigorous disagreement regarding the contributions of such policies to increasing test scores and, more importantly, to improving student learning. A recent study by Amrein and Berliner (2002a) has received a great deal of media attention. Employing various databases covering the period 1990-2000, the authors conclude that there is no evidence that states that implemented high-stakes tests demonstrated improved student achievement on various external measures such as performance on the SAT, ACT, AP, or NAEP. In a subsequent study in which they conducted a more extensive analysis of state policies (Amrein & Berliner, 2002b), they reach a similar conclusion. However, both their methodology and their findings have been challenged by a number of authors. In this article, we undertake an extended reanalysis of one component of Amrein and Berliner (2002a). We focus on the performance of states, over the period 1992 to 2000, on the NAEP mathematics assessments for grades 4 and 8. In particular, we compare the performance of the high-stakes testing states, as designated by Amrein and Berliner, with the performance of the remaining states (conditioning, of course, on a state’s participation in the relevant NAEP assessments). For each grade, when we examine the relative gains of states over the period, we find that the comparisons strongly favor the high-stakes testing states. Moreover, the results cannot be accounted for by differences between the two groups of states with respect to changes in the percentage of students excluded from NAEP over the same period.
On the other hand, when we follow a particular cohort (grade 4 in 1992 to grade 8 in 1996, or grade 4 in 1996 to grade 8 in 2000), we find that the comparisons slightly favor the low-stakes testing states, although the discrepancy can be partially accounted for by changes in the sets of states contributing to each comparison. In addition, we conduct a number of ancillary analyses to establish the robustness of our results, while acknowledging the tentative nature of any conclusions drawn from highly aggregated, observational data.



Author Biography

Henry Braun, Educational Testing Service

Henry Braun earned a B.Sc. (Hon.) in Mathematics from McGill University and a PhD in Mathematical Statistics from Stanford University. After serving as an assistant professor of statistics at Princeton University, he joined Educational Testing Service in 1979. He was vice-president for research management from 1990 to 1999. He was elected a fellow of the American Statistical Association in 1991. He is a corecipient of the 1986 Palmer O. Johnson Award of the American Educational Research Association and a corecipient of the National Council for Measurement in Education's 1999 Award for Outstanding Technical Contribution to the Field of Educational Measurement. He now holds the title of distinguished presidential appointee in the Research & Development Division at the Educational Testing Service in Princeton, NJ.




How to Cite

Braun, H. (2004). Reconsidering the Impact of High-stakes Testing. Education Policy Analysis Archives, 12, 1.