Achievement at Whose Expense? A Literature Review of Test-Based Grade Retention Policies in U.S. Schools

The author uses Maxwell's method of literature reviews for educational research to focus on literature relevant to test-based grade retention policies and to make the following argument: although some studies have documented average gains in academic achievement through test-based grade retention, there is increasing evidence that these gains have occurred by limiting the educational opportunities of the most vulnerable students. The author begins by briefly synthesizing research on high-stakes testing policies and teacher-based retention in general and then examines studies that have evaluated specific test-based retention policies in Chicago, Florida, New York City, Georgia, Texas, Wisconsin, and Louisiana. Drawing on Bourdieu and Passeron's concept of reproduction in education, the author shows how testing policies have contributed to class selection and exclusion in U.S. schools. Short-term gains produced by test-based retention policies fade over time: students fall behind again, but now with a greater likelihood of dropping out of school. These unintended consequences are most prevalent among ethnic minority and impoverished students. The author concludes by proposing alternatives for ending social promotion that do not involve grade retention, as well as directions for further research on the role such policies play in perpetuating class inequities.

Education Policy Analysis Archives, Vol. 22, No. 18


Purpose
Test-based grade retention policies have elicited great debate, both in education circles and among the general public. Proponents of retention (e.g., Owen & Ranick, 1977; Winters & Greene, 2006) have argued that retention is necessary to ensure that students who are behind master the skills needed to succeed at the next grade level. Opponents (e.g., Shepard & Smith, 1989), however, have claimed that retention unfairly targets the most vulnerable students, rarely results in academic improvement, and increases the likelihood that students will drop out of school. What, then, do research findings suggest about the impact of test-based retention policies, especially on low-income and ethnic minority students? Some researchers (e.g., Boote & Beile, 2005) claim that the key attributes of quality literature reviews are thoroughness and comprehensiveness. Maxwell (2006), however, argued that rather than comprehensively summarizing or synthesizing research on a specific topic, effective literature reviews for educational research should focus on the studies most relevant to a specific argument made evident in the literature, thereby demonstrating new scholarly insights and identifying areas needing further research. Following Maxwell (2006), rather than providing a comprehensive review of the research on high-stakes testing and grade retention, I focus on studies relevant to test-based retention to make the following argument: although some studies have documented average gains in academic achievement through test-based grade retention, there is increasing evidence that these gains have occurred by limiting educational opportunities for the most vulnerable students.
Other researchers have made similar arguments when reviewing the literature on retention (e.g., Shepard & Smith, 1989). However, these reviews have addressed only teacher-based retention, not test-based retention. Researchers of test-based retention policies (e.g., Allensworth & Nagaoka, 2010; Greene & Winters, 2007) have argued that although the two are similar, they differ in how the retention decision is made and thus warrant separate study.
I begin the review by introducing Bourdieu and Passeron's (1970/1990) theoretical concept of reproduction in education. The concept of reproduction suggests that testing policies produce social inequality, bolstering certain types of students while hindering others, and it provides a useful lens for analyzing the effects of test-based retention.
Because test-based retention policies combine high-stakes testing and grade retention, I briefly review research on high-stakes testing policies in general. This includes research conducted on testing policies under No Child Left Behind (NCLB) as well as research on minimum-competency and state testing policies prior to NCLB that has important implications for test-based retention policies. Although NCLB does not require test-based grade retention, some researchers have argued that the assessments resulting from NCLB provide a mechanism for using test results in retention decisions, thus indirectly proliferating test-based retention policies (Penfield, 2010). No studies have specifically examined the effects of NCLB on test-based retention; however, research conducted prior to NCLB does provide some relevant findings. For example, Heilig and Darling-Hammond (2008) found that some Texas schools preemptively retained ninth graders to provide an extra year of instruction before the tenth-grade high school graduation exams. Others preemptively promoted students from ninth to eleventh grade to avoid tenth-grade testing altogether.
Next, because teacher-based and test-based retention are closely related and have produced similar findings, I provide an overview of research on teacher-based grade retention. I briefly discuss the findings of major meta-analyses and literature reviews, highlighting the new contributions made by each. Several literature reviews have synthesized the research on high-stakes testing and teacher-based grade retention. However, no comprehensive reviews have been written on retention policies dictated by standardized test scores. To address this need, I conclude by discussing in depth the research on test-based grade retention policies, noting the similarities between its findings and those on teacher-based retention.

Methods and Definitions
A variety of search methods were used to locate the sources for this review. I first searched for relevant books, articles, and research reports using numerous databases, such as the Educational Resources Information Center (ERIC), PsycINFO, Web of Science, Google Scholar, and the GIL Universal Catalog of the University System of Georgia. I used search terms such as "standardized test*," "high-stakes test*," "grade repetition," and "social promotion." I then reviewed the reference lists of each of those sources.
Throughout this review, I use terms such as social promotion, test-based retention, promotional gates, standardized testing, and high-stakes testing. For clarity, I provide the following descriptions. The U.S. Department of Education (1999) has defined social promotion as "allowing students who have failed to meet performance standards and academic requirements to pass on to the next grade with their peers instead of completing or satisfying requirements" (p. 5).
Numerous states and large cities (e.g., Texas, Georgia, New York City, Chicago) have developed test-based grade retention policies in an effort to eliminate social promotion in schools (Marsh et al., 2009). These policies require that test scores be used, at least in part, to determine which students are promoted and which are retained. Rather than affecting all grades, these policies frequently contain promotional gates, the specific grades at which test-based retention applies. For example, in Georgia, the test-based retention policy applies in grades 3, 5, and 8 (Georgia State Board of Education, 2001).
Most often, the tests involved in these policies are standardized tests, usually criterion-referenced, that are administered using standardized procedures for administration, completion, and scoring (Haney, 1984). What makes these tests "high-stakes" is that their results (in this case, promotion or retention) are used to make important decisions that immediately and directly affect students, teachers, and schools (Madaus, 1988, p. 87).
Researchers of test-based retention policies use a variety of terms to describe students who are at risk of retention as well as the social barriers they face (e.g., low-performing, low-achieving, low-scoring, low socioeconomic status (SES), impoverished, low-income, ethnic minority, marginalized, class inequities). When available, I provide the specific definitions researchers use for these terms. Drawing on Bourdieu and Passeron (1970/1990), as I discuss below, I also use the term vulnerable to describe the students most at risk of the negative consequences of test-based retention. According to Bourdieu and Passeron (1970/1990), individuals are most vulnerable when they lack the capital to compete successfully for the resources in a given field.

Bourdieu and Passeron on Education and Testing
Teachers have often recognized that students with certain backgrounds tend to flourish in school while others do not. Researchers have frequently attributed such achievement gaps to opportunity gaps or unequal childhoods that occur along class lines in the U.S. (Lareau, 2003). French sociologists Bourdieu and Passeron (1970/1990) researched economic, social, and cultural class domination and, in so doing, developed a theory of reproduction in education that explains how social class inequities play out in academic achievement in schools. Bourdieu and Passeron (1970/1990) explained the process of reproduction with the theoretical concepts of field, capital, and habitus. Individuals are socialized by a variety of fields, what Lareau (2003) has described as "institutional arrangements" (p. 275). This socialization greatly influences what individuals recognize as comfortable and natural and thus largely dictates their habitus, that is, how they respond in specific situations. These background experiences also provide individuals with specific amounts and types of capital, which are resources they then use to compete for additional capital within the field.
According to Bourdieu and Passeron (1970/1990), education is a field with its own rules for allocating and accruing resources (e.g., grades, promotions, diplomas) that ultimately determine winners and losers. Schools reward certain types of knowledge, resources, and ways of speaking more than others. Students whose family backgrounds provide them with these skills do well in school, while the rest often do not.
What terms such as "at-risk students," "ethnic minorities," "social inequalities," and "class inequities" have in common is the recognition that, within a given field, non-dominant groups lack the capital recognized as valuable within that setting. For Bourdieu and Passeron (1970/1990), it is not that students from these families lack knowledge, skills, and language. In fact, such students often bring a rich variety of resources to the classroom. What they do lack, however, are the resources valued within an educational system (field) that is built upon middle-class principles such as "standard" English.
Educational testing policies, Bourdieu and Passeron (1970/1990) argued, play a key role in keeping the rules of the field intact. Tests provide "objective" evidence that those who fail are not cut out for academics, and they give those who pass proof of their merit and giftedness. Reproduction, as Bourdieu and Passeron (1970/1990) called it, occurs when non-dominant groups respond by accepting their failure as a natural, taken-for-granted part of life and withdraw from school experiences. In so doing, they unknowingly participate in their own oppression, what Bourdieu and Passeron (1970/1990) called misrecognition, ensuring that inequities will continue. Bourdieu and Passeron (1970/1990) criticized much of the research on schooling and examinations because they believed it often helped hide the inequities these structures reproduce. Research, they argued, must look beyond student outcomes (e.g., achievement gains) to determine what the examinations themselves conceal. As I review the following research, in addition to discussing outcomes in educational achievement, I specifically examine the adverse impact such policies are producing on non-dominant students.

Research on High-Stakes Testing
Over the past twenty years, a significant amount of research has been conducted on the impact of tests used for accountability purposes. This research consists of a mixture of large-scale quantitative studies, surveys, and case studies of testing policies both prior to and under NCLB. This literature is relevant because test-based retention policies are themselves a form of high-stakes testing. Moreover, as with test-based retention policies, there is evidence that the academic gains made under high-stakes testing occurred by limiting the educational opportunities of the most vulnerable students.

Beneficial Outcomes of High-Stakes Testing
A few researchers have noted that some high-stakes testing policies have produced academic outcomes their supporters consider beneficial. For example, Koretz, Stecher, Klein, and McCaffrey (1994) showed that teachers in Vermont spent more time teaching newer curriculum elements, such as problem-solving and mathematical representations, to prepare their students for the state's portfolio-based, high-stakes assessment. Stecher (2002) argued that some teachers have found tests useful for identifying students' strengths and weaknesses and for obtaining additional resources for students. School districts have revised curriculum and testing programs to match state curricula and have provided after-school and Saturday-school tutoring for low-performing students. Hamilton et al. (2007) found that in California, Georgia, and Pennsylvania, under NCLB, schools were aligning curricula with state standards and assessments, using data for decision making, and providing extra support to low-performing students.
Researchers have also suggested that high-stakes tests play a role in teacher motivation. Hamilton et al. (2007) found that high-stakes testing has encouraged teachers in California, Georgia, and Pennsylvania to improve their own practice. Finnigan and Gross (2007) studied ten elementary schools in Chicago that had been placed on probation for low test scores to determine whether accountability sanctions influenced teacher motivation. Drawing on expectancy theory, they defined motivation as a function of a person's valuation of a goal and expectation that the goal could be achieved. Indeed, the teachers were motivated to work harder, try new teaching approaches, and participate in professional development. However, Finnigan and Gross (2007) also noted that the teachers appeared to be motivated to raise test scores more by their professional status and individual goals for students than by external threats. Moreover, the longer schools remained on probation, the more likely teacher morale was to decline, reversing any gains achieved through increased effort.

Unintended Consequences of High-Stakes Testing
Although some researchers have found positive effects of high-stakes testing, the research documenting unintended and negative effects is extensive. First, there is little evidence that these policies have actually resulted in academic achievement gains. Hout and Elliott (2011) recently conducted an extensive review of the research on high-stakes testing policies under NCLB. They found that small increases in test scores have occurred, but when similar low-stakes tests were administered, the academic gains were effectively zero for most programs. Hout and Elliott (2011) attributed the gains in test scores to score inflation, in which scores increase because of teaching to the test rather than actual gains in academic achievement.
Other researchers have documented such score inflation as well. Amrein and Berliner (2002, 2003) examined the test scores of 18 states that required students to pass a high-stakes test to graduate from high school during the 1990s. Although high-stakes test scores increased in the 18 states, no apparent gains were made on the SAT, ACT, Advanced Placement (AP), or National Assessment of Educational Progress (NAEP) exams, suggesting that score inflation had occurred. Klein, Hamilton, McCaffrey, and Stecher (2000) found evidence of score inflation in Texas when they compared Texas Assessment of Academic Skills (TAAS) scores to NAEP scores from 1994 to 1998. They found that the gains on NAEP were much smaller than those on TAAS and were absent on the eighth-grade math and reading tests.
In addition to score inflation, researchers have documented curriculum reallocation, in which teachers devote more instruction to tested content areas and standards than to untested ones (Au, 2007; Hamilton et al., 2007; Hargrove et al., 2000; Jones et al., 1999; Smith, 1991; Smith & Rottenberg, 1991; Stecher, 2002; K. W. White & Rosenbaum, 2008). Studies have also indicated that teachers adjust their teaching and assessment styles to match those found on high-stakes tests. In many cases, this has involved increased use of multiple-choice questions (Au, 2007; Hamilton et al., 2007; Smith, 1991; Smith & Rottenberg, 1991; K. W. White & Rosenbaum, 2008). Stecher (2002) described negative coaching, in which teachers spend large amounts of time coaching students on test-taking strategies and practice passages in lieu of teaching content. Other researchers have likewise documented the pressure teachers feel to teach to the test (Herman & Golan, 1993; Hillocks Jr. & Wallace, 2002). Additional negative effects cited in the literature include cheating (e.g., teachers giving hints, changing answers) (Amrein-Beardsley, Berliner, & Rideau, 2010; Hoffman, Assaf, & Paris, 2001; Nichols & Berliner, 2007; Wilson, Bowers, & Hyde, 2011), emotional stress (Hargrove et al., 2000; Herman & Golan, 1993; Sheldon & Biddle, 1998; Smith, 1991; Smith & Rottenberg, 1991; Triplett & Barksdale, 2005), and educational triage practices (Booher-Jennings, 2005; Neal & Schanzenbach, 2010; Reback, 2008), in which teachers focus on students near the passing threshold while devoting less instructional time to the lowest-performing students.

Adverse Impact in High-Stakes Testing
Numerous researchers have also found that these unintended consequences are most prevalent in schools serving low-income and ethnic minority students, suggesting that any academic benefits of these policies likely occurred at the expense of the most vulnerable learners (W.-P. Hong & Young, 2008). For example, Herman and Golan (1993) showed that schools serving low socioeconomic status (SES) students spent more time teaching to the test than schools serving higher-SES students.
Similarly, Hillocks and Wallace (2002) found increased teaching to the test in schools serving low-SES students. They conducted a case study contrasting an affluent school and a poor school in Texas preparing students for the TAAS. The affluent school was a suburban school with only 5% economically disadvantaged students, while the poor school was an urban school in which 96% of students were classified as economically disadvantaged. Unlike teachers at the affluent school, who received training in progressive writing instruction through the National Writing Project, teachers at the low-SES school received training in test preparation, spent more time teaching to the test, and even postponed instruction in non-tested subjects.
Likewise, Diamond and Spillane (2004) compared instructional practices at four schools under a high-stakes testing policy in Chicago. Two of the schools were on probation for producing low test scores and two were not. The student populations of the two probation schools were over 97% low-income and 100% African American. The two schools not on probation had smaller percentages of low-income students (69% and 85%), and one of them was majority White. The researchers found that the probation schools targeted instruction in certain subjects and grades based on which subjects and which students were tested, whereas the non-probation schools focused on all subjects equally and emphasized improvement for all students in every grade. Probation schools adopted interventions only for specific subcategories of students to raise key test scores, whereas non-probation schools adopted interventions for all students. In addition, probation schools focused on strategic ways to raise overall test scores, while non-probation schools used data to inform instruction. Diamond and Spillane (2004) argued that a lack of resources and the extra pressure placed on probation schools produced these instructional differences. Such studies suggest that the positive gains in aggregate scores produced by high-stakes testing policies in districts and states occur most often in White, middle-class schools. However, those positive outcomes are eclipsed by the unintended, negative consequences that occur in low-income, ethnic minority schools.

Research on Teacher-Based Grade Retention
Teacher-based grade retention has been heavily studied over the last 60 years and has produced some of the most consistent findings in the research literature (House, 1989). Additionally, numerous meta-analyses (e.g., Holmes, 1989; Holmes & Matthews, 1984; Jimerson, 2001) and literature reviews (e.g., Jimerson, Anderson, & Whipple, 2002; Shepard & Smith, 1989; Xia & Kirby, 2009) have synthesized this research. Below, I discuss these meta-analyses and literature reviews chronologically, focusing specifically on areas of agreement and disagreement and highlighting the contributions each makes.

Retention Meta-Analyses and Reviews
One of the first meta-analyses conducted on teacher-based retention was Holmes and Matthews (1984). See Table I for further information about these reviews. Reviewing the effects of retention on academic achievement, student attitudes toward school, and personal adjustment, Holmes and Matthews (1984) found that promoted students achieved at higher levels than retained students in language arts, reading, mathematics, work-study skills, social studies, and grade-point averages. Retained students had less favorable attitudes toward school than promoted students, and they also scored lower than promoted students on personal adjustment measures in three subareas: social adjustment, emotional adjustment, and behavior. Holmes (1989) extended the Holmes and Matthews (1984) meta-analysis to include 63 retention studies and found negative effects of retention in 54 of them. Retained students had lower achievement in language arts, reading, math, and social studies than promoted students. Retained students also scored lower on personal adjustment measures than promoted students, although the differences in the subcategories of social adjustment, emotional adjustment, and behavior were not statistically significant. In the nine studies that did show positive effects, the retention policies also included early identification and special help for retained students through individual education plans, continuous evaluation, and low student-teacher ratios. These positive studies also included an unusually high proportion of White, middle-class retainees with high IQs and did not follow students past the first year.
Shepard and Smith (1989) reviewed several studies on the effects of retention and on school policies and practices regarding retention. Their review was among the first to address a variety of retention issues in addition to academic achievement (e.g., the relationship between retention and dropping out of high school, transition programs, and teachers', parents', and administrators' beliefs about retention). Shepard and Smith (1989) drew the following conclusions: (a) retention in grade does not benefit students academically or in personal adjustment in any way; (b) retention increases the likelihood of dropping out of school by 20-30%, even when controlling for achievement, socioeconomic status, and gender; (c) retaining students in kindergarten, even in a transition program, does not boost academic achievement or solve school readiness problems; (d) from the students' perspectives, retention is harmful; and (e) despite these research findings, teachers, parents, and school administrators often believe that retention is quite beneficial.
Jimerson (2001) conducted a meta-analysis examining the effects of retention on students' academic, socioemotional, and behavioral outcomes. Like the reviews mentioned above, Jimerson (2001) found that of the 20 studies he examined, 16 concluded that retention was not an effective strategy for boosting students' academic achievement and socioemotional adjustment. Consequently, Jimerson (2001) argued that because both social promotion and retention are ineffective strategies for helping low-performing students, more research should be devoted to identifying effective intervention strategies for these students. Jimerson, Anderson, and Whipple (2002) conducted a systematic literature review focusing specifically on retention as a predictor of dropping out of high school. Consistent with Shepard and Smith (1989), the authors found that retention was one of the most powerful predictors of dropping out and that students retained more than once are at considerably greater risk of dropping out.
By far the largest and most comprehensive systematic literature review of the effects of retention on students' academic and socioemotional outcomes was recently conducted by the RAND Corporation (Xia & Kirby, 2009) in preparation for an evaluation of a test-based grade retention policy in New York City. Xia and Kirby (2009) examined 91 studies of retention published between 1980 and 2008 and produced findings that both complemented and challenged the conclusions of earlier studies. Like previous meta-analyses, Xia and Kirby (2009) found that retention alone is ineffective for increasing students' academic achievement. Although retained students may make significant gains during the retention year, these gains are usually not large enough to bring them up to the level of promoted students. Xia and Kirby (2009) found that the vast majority of studies that demonstrated immediate gains from retention also showed that those effects began to dissipate two to three years after retention and disappeared completely after several years, with retained students falling behind again. Lorence and Dworkin's (2006) and Lorence, Dworkin, Toenjes, and Hill's (2002) studies in Texas and Alexander, Entwisle, and Dauber's (1994) longitudinal study in Baltimore found much longer-lasting positive effects of retention, but these effects also decreased over time. Xia and Kirby (2009) did note that in some of the studies in which students showed longer-lasting gains (e.g., Lorence & Dworkin, 2006), an intervention was also provided; however, researchers were unable to determine whether the improvement was linked to retention or to the intervention.
Unlike the previous meta-analyses, Xia and Kirby (2009) included newer studies suggesting that the social, emotional, attitudinal, and behavioral effects on retained students were mixed rather than solely negative. However, retention was associated with a higher likelihood of dropping out of school and working in low-paying jobs, and a lower likelihood of pursuing postsecondary education. Retention was also found to occur most frequently among the most vulnerable students (e.g., those who are male, ethnic minority, low-SES, youngest in their grade level, or living in single-parent households, and those with the most school transfers). These latter findings were consistent with previous reviews.
Beyond the inclusion of newer research, Xia and Kirby's (2009) finding of fewer negative consequences from retention than in earlier reviews could be linked to the quality of the research selected. Xia and Kirby (2009) limited their review to empirical studies that used credible comparison groups or statistical methods to control for selection bias. They noted that a limitation of the meta-analyses by Holmes and Matthews (1984) and Holmes (1989) was that many of the included studies were dated dissertations and master's theses that lacked methodological rigor. Similarly, Jimerson's (2001) meta-analysis included several studies with samples of fewer than 100 retained or promoted students. Only three studies had samples of 1,000 students or more in the retained and comparison groups, and Jimerson (2001) did not weight study effects by sample size.
The most recent meta-analysis of the retention literature focused on the quality of the research designs of the studies included in the sample. Allen, Chen, Willson, and Hughes (2009) used multilevel modeling to examine 207 effect sizes from 22 studies, published between 1990 and 2007, on the effect of retention on academic outcomes. They found that the use of more rigorous statistical controls was associated with fewer negative effects, challenging research suggesting that retention has a negative effect on achievement (e.g., G. Hong & Yu, 2007). Although they did not find negative effects of retention, they did not find positive effects either, and they concluded that there is little justification for claims that retention is beneficial. The Allen et al. (2009) meta-analysis is consistent with other reviews in finding no benefits of retention. However, it differs by showing that higher design quality is associated with fewer negative effects, suggesting that retention may not be as harmful to students as previously thought.

Adverse Impact in Teacher-Based Retention
As with high-stakes testing, the research on teacher-based retention points to a similar conclusion: although there is some evidence that retained students experience academic gains from teacher-based retention, those gains ultimately come at the expense of the most vulnerable students. Low-income, ethnic minority students are most often targeted for retention (Xia & Kirby, 2009). Even when these students do receive an academic boost from repeating a grade, those gains fade over time. The children eventually fall behind and are at a much higher risk of dropping out of school.

Research on Test-Based Grade Retention Policies
Most researchers who have studied test-based retention policies have attempted to assess whether policies that combine retention with intervention improve student achievement and help low-performing students catch up academically with their similarly aged peers. Despite a substantial body of research finding negative consequences of teacher-based grade retention, test-based retention policies have continued to grow in popularity. Moreover, researchers have argued that the outcomes of test-based grade retention need to be studied separately because it is qualitatively different from teacher-based retention (Allensworth & Nagaoka, 2010; Greene & Winters, 2007). Test-based grade retention provides a different context and basis for retention decisions and thus produces different experiences of retention. Test-based retention policies also have a potential spillover effect on students who are not retained (Allensworth & Nagaoka, 2010; Greene & Winters, 2007). The threat of retention itself, along with extra supports such as after-school tutoring and summer school, is likely to motivate promoted students to work harder as well, especially students who place a high value on passing such tests and believe a passing score is within their reach (Roderick & Engel, 2001).
Although research on teacher-based retention is more conclusive about its negative effects, the findings on test-based retention are mixed and based on a more limited pool of studies. Before discussing the findings, I describe the test-based retention policies that have been researched to date and provide an overview of the methods used to understand their outcomes.

Test-Based Retention Policies
The bulk of the research on test-based retention policies has been conducted in Chicago, Florida, and New York City, with a few studies in Texas, Georgia, Wisconsin, and Louisiana. See Table II for a brief description of these policies, their implementation year(s), requirements, and exceptions as they are described in the research literature. As Table II shows, most of these policies were implemented within a few years of one another and are quite similar in the requirements and supports they provide. Many of them were initiated by high-profile politicians. Mayor Richard M. Daley started the Chicago policy in 1996 when he was granted power by the Illinois legislature to take over the Chicago Public Schools. Similarly, in 2002, the New York state legislature granted Mayor Michael Bloomberg control of the New York City school system, and he implemented the Children First Initiative, a series of new programs that included a test-based retention policy. Some politicians acknowledged that they were inspired by other states' legislation when deciding to implement test-based retention. Georgia Governor Roy Barnes, for example, referenced the Texas policy promoted by then-Governor George W. Bush when he urged the Georgia legislature to end social promotion in his 2001 State of the State address (Barnes, 2001). Governor Jeb Bush implemented test-based retention in Florida soon thereafter (Winters & Greene, 2012).
All of these policies are similar in that they require students to pass a standardized test for promotion. Likewise, they all combine retention with components such as intervention, after-school tutoring, Saturday school, and/or summer school. However, they differ somewhat in their specific requirements for promotion. For example, Texas¹, Georgia (Livingston & Livingston, 2002), Louisiana (Valencia & Villarreal, 2005), and Chicago (Russo, 2005) have all required a passing score on a single standardized test at certain gateway grades, whereas New York City (McCombs, Kirby, & Mariano, 2009), Wisconsin (Brown, 2007), and Florida (Winters & Greene, 2006) have allowed additional indicators such as an assessment portfolio or an alternative standardized test. These policies have also differed in the numbers of students retained. Chicago initially retained between 7,000 and 10,000 students per year, roughly 20% of third graders and 10% of eighth graders (Roderick & Nagaoka, 2005; Russo, 2005). However, after receiving a civil rights complaint about the policy, Chicago schools softened the promotion requirements in 2000-2001 and retained a much smaller percentage of students thereafter. In Florida, retention numbers have been high: 12-14% of third graders were retained during the initial years of the policy (Winters & Greene, 2012). In New York City, by contrast, only 2 to 3% of fifth graders were retained in the first two policy cohorts and only 1% in the third cohort (McCombs et al., 2009). Similarly, in Georgia, 61 to 68% of the third graders who failed both administrations of the Criterion-Referenced Competency Tests (CRCT) in 2003-2004 were "placed" in the next grade through an appeals procedure (Henry, Rickman, Fortner, & Henrick, 2005; Mordica, 2006). These variations in implementation and retention rates may explain some of the different outcomes these studies have documented (Greene & Winters, 2007).

Methodological Overview
Table II lists the research that has been conducted on the test-based retention policies in Chicago, Florida, New York City, Georgia, Texas, Wisconsin, and Louisiana. A wide range of methodologies has been used to understand these policies and assess their effects. In Georgia, for example, Henry et al. (2005) and Mordica (2006) each evaluated the policy's first year (2003-2004) using statistical data provided by the state. Two studies (Livingston & Livingston, 2002; Valencia & Villarreal, 2005) addressed the issue of adverse impact in Georgia, Texas, and Louisiana. When the Georgia legislature passed its test-based retention policy, Livingston and Livingston (2002) compared Georgia CRCT scores with student demographics to predict whether adverse impact might occur among African American and impoverished students. Similarly, in Texas, Valencia and Villarreal (2005) correlated failing the Texas Assessment of Knowledge and Skills (TAKS) with being African American or Mexican American (or other Latino) to assess whether the policy had an adverse impact. In Louisiana, Valencia and Villarreal were able to confirm adverse impact among African Americans by comparing retention rates before and after policy implementation.
¹ See http://www.tea.state.tx.us/index3.aspx?id=3230&menu_id3=793
Two qualitative studies have been conducted as well, one in Texas and one in Wisconsin. Booher-Jennings (2008) drew on the theoretical concepts of hidden curriculum, achievement ideology, and gender codes to understand how Texas students were responding to the retention policy, while Brown (2007) used a multiple streams approach to agenda setting to explain how key actors understood the need to construct and implement the policy in Wisconsin to improve students' academic achievement.
The majority of the research in Chicago has been conducted by the Consortium on Chicago School Research (Roderick, Nagaoka, & Allensworth, 2005). Its evaluation of test-based retention covered the policy's first years, from its implementation in 1996 to 2001, and used three primary sources of data. First, the researchers used longitudinal school data that included pre- and post-policy test scores, student demographics, school attendance, and an indicator of dropping out, among other variables. Students in third and sixth grades were followed from 1993-1994 to 1998-1999 (Jacob & Lefgren, 2004). Second, the researchers surveyed teachers, principals, and sixth- and eighth-grade students about their experiences with the policy. Finally, they collected qualitative data, including observations of summer school and interviews with teachers.
The research on the New York City test-based retention policy was conducted by the RAND Corporation at the request of the school system. McCombs et al. (2009) analyzed demographic and achievement data for four cohorts of fifth graders, each with about 60,000 students, from 2003 to 2007. In addition, they conducted numerous case studies in which they interviewed and surveyed administrators, teachers, and students about their experiences with the policy. In Florida, Winters and Greene (Greene & Winters, 2007; Winters & Greene, 2006, 2012) have focused primarily on the demographic and achievement data of five cohorts of third graders, beginning in the 2002-03 academic year.
Although broadly consistent with the retention literature at large, the findings on the effects of the policies in Chicago, New York City, and Florida on students' academic achievement differ to some degree. Roderick and Nagaoka (2005) and Allensworth and Nagaoka (2010) have argued that researchers' findings on the short-term effects of retention tend to diverge for three reasons: (a) the comparability of test scores across grades, (b) the ability of researchers to construct adequate comparison groups of retained and promoted students, and (c) the point at which researchers estimate achievement effects. The research teams in Chicago, New York City, and Florida approached each of these issues differently. With respect to the comparability of test scores, researchers in Chicago who estimated achievement effects using the Iowa Test of Basic Skills (ITBS) had to equate scores across grades and across forms and levels of the test. To do this, they converted ITBS scores to a logit metric using Rasch models (Roderick & Nagaoka, 2005). The process was much simpler in New York City and Florida. In Florida, scores on the Florida Comprehensive Assessment Test (FCAT) are reported as developmental scale scores, which the Florida Department of Education designed to have the same meaning for proficiency across grades and years (Greene & Winters, 2007). Similarly, in New York City, the New York State English language arts and mathematics tests were vertically scaled through 2005, allowing comparisons across grades and years.
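The idea behind a Rasch rescaling can be illustrated with a minimal sketch. Under the Rasch model, the probability of a correct answer depends only on the gap between a student's ability and an item's difficulty, both expressed in logits, so raw scores from forms of different difficulty can be placed on one shared scale. The sketch assumes the item difficulties are already known; the functions `rasch_prob` and `estimate_ability` and the difficulty values are hypothetical illustrations, not the Consortium's actual ITBS equating procedure.

```python
import math


def rasch_prob(theta, b):
    """Rasch model: probability that a student with ability theta answers an
    item of difficulty b correctly (both expressed in logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_ability(raw_score, difficulties, tol=1e-10, max_iter=100):
    """Maximum-likelihood ability (in logits) for a raw score on a form whose
    item difficulties are known, found by Newton-Raphson on the Rasch score
    equation sum_i P_i(theta) = raw_score."""
    n_items = len(difficulties)
    if not 0 < raw_score < n_items:
        # The estimate runs off to -inf / +inf for zero or perfect scores.
        raise ValueError("ability is not finite for zero or perfect raw scores")
    # Start from the logit of the proportion correct.
    theta = math.log(raw_score / (n_items - raw_score))
    for _ in range(max_iter):
        probs = [rasch_prob(theta, b) for b in difficulties]
        expected_score = sum(probs)
        information = sum(p * (1.0 - p) for p in probs)  # test information at theta
        step = (raw_score - expected_score) / information
        theta += step
        if abs(step) < tol:
            break
    return theta


# Usage: the same raw score (2 of 4 items) on an easier and a harder
# hypothetical form maps to different points on the common logit scale.
easy_form = [-2.0, -1.0, 0.0, 1.0]
hard_form = [-1.0, 0.0, 1.0, 2.0]
print(round(estimate_ability(2, easy_form), 3), round(estimate_ability(2, hard_form), 3))  # -0.5 0.5
```

Because every item on the harder hypothetical form is one logit more difficult, the same raw score corresponds to an ability one logit higher; this is the sense in which a logit metric makes scores comparable across forms and levels.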
To construct comparison groups, researchers in Chicago (Jacob & Lefgren, 2004; Roderick & Nagaoka, 2005), New York City (McCombs et al., 2009), and Florida (Greene & Winters, 2007; Winters & Greene, 2012) used a regression discontinuity design, comparing the performance of students who scored just below the test-score cutoff for retention (most of whom were retained) with that of students who scored just above it (most of whom were promoted). Unlike even sophisticated matching strategies, which can adjust only for observed characteristics, regression discontinuity also accounts for unobserved characteristics, providing causal estimates for students near the cutoff. McCombs et al. (2009) combined a regression discontinuity design with propensity scores and doubly robust regression to estimate the effects of test-based retention on student achievement.
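A minimal sketch of the regression discontinuity logic follows, under simplifying assumptions: the scores and outcomes are simulated rather than drawn from any of these studies, retention follows a sharp rule (every student below the cutoff is retained), and the effect is estimated with separate linear fits on each side of the cutoff within a fixed bandwidth. The actual policies produced a fuzzy discontinuity (most, but not all, students below the cutoff were retained), which the cited studies handle with more elaborate models. The function names, cutoff, bandwidth, and effect size are all hypothetical.

```python
import random


def ols(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rd_estimate(scores, outcomes, cutoff, bandwidth):
    """Sharp regression discontinuity estimate: fit a separate line on each
    side of the cutoff (within the bandwidth), with scores centered at the
    cutoff, and return the gap between the two intercepts. A positive value
    means students just below the cutoff (retained) end up higher."""
    below = [(s - cutoff, y) for s, y in zip(scores, outcomes)
             if cutoff - bandwidth <= s < cutoff]
    above = [(s - cutoff, y) for s, y in zip(scores, outcomes)
             if cutoff <= s <= cutoff + bandwidth]
    intercept_below, _ = ols(*zip(*below))
    intercept_above, _ = ols(*zip(*above))
    return intercept_below - intercept_above


def simulate(n=20000, cutoff=50.0, true_effect=3.0, seed=1):
    """Hypothetical data: a later test score that rises smoothly with the
    gateway score, plus a fixed boost for every retained student."""
    rng = random.Random(seed)
    scores, outcomes = [], []
    for _ in range(n):
        score = rng.uniform(0, 100)
        retained = score < cutoff  # sharp rule: everyone below the cutoff repeats
        later = 10 + 0.5 * score + (true_effect if retained else 0.0) + rng.gauss(0, 2)
        scores.append(score)
        outcomes.append(later)
    return scores, outcomes


scores, outcomes = simulate()
estimate = rd_estimate(scores, outcomes, cutoff=50.0, bandwidth=10.0)
print(round(estimate, 2))  # should land close to the simulated effect of 3.0
```

Because the estimate comes from students on either side of the cutoff, it describes the effect for students near that threshold; students who scored far below it may respond differently.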
One drawback of comparing students who barely pass and are promoted with those who barely fail and are retained is the risk of mistaking regression to the mean for a positive effect of retention (Allensworth & Nagaoka, 2010). Students who are retained after a particularly bad year will often perform better the next year, and students who are promoted after a better-than-average year will often perform more poorly. Roderick and Nagaoka (2005) addressed this problem by using growth curve modeling to estimate achievement effects on the basis of each student's entire prior test-score history, thus correcting for regression to the mean.
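Regression to the mean can appear even when retention has no effect at all. In the hypothetical simulation below (all parameters are assumptions chosen for illustration), each student's yearly score is a stable ability plus independent year-to-year noise, and no intervention is applied. Students who scored in a low band in year 1 nonetheless tend to score higher in year 2, and high scorers tend to slip, because part of an unusually low or high score is noise that does not recur.

```python
import random
from statistics import mean


def simulate_two_years(n=50000, noise_sd=1.0, seed=7):
    """Each hypothetical student has a stable ability; each year's score is that
    ability plus independent year-specific noise. No retention effect exists."""
    rng = random.Random(seed)
    students = []
    for _ in range(n):
        ability = rng.gauss(0, 1)
        year1 = ability + rng.gauss(0, noise_sd)
        year2 = ability + rng.gauss(0, noise_sd)
        students.append((year1, year2))
    return students


def mean_gain(students, low, high):
    """Average year-2 minus year-1 change for students whose year-1 score
    falls in [low, high)."""
    return mean(y2 - y1 for y1, y2 in students if low <= y1 < high)


students = simulate_two_years()
below_cutoff = mean_gain(students, -1.5, -1.0)   # a bad year 1: scores tend to rise
high_scorers = mean_gain(students, 1.0, 1.5)     # a good year 1: scores tend to fall
everyone = mean_gain(students, float("-inf"), float("inf"))  # no change on average
print(round(below_cutoff, 2), round(high_scorers, 2), round(everyone, 2))
```

Under these assumptions, ability and noise have equal variance, so about half of a student's year-1 deviation from the mean is expected to disappear in year 2: the low-scoring band gains roughly 0.6 on average and the high-scoring band loses roughly 0.6, while the population as a whole shows no average change. A naive comparison of year-1 and year-2 scores for the low band would misread that gain as a treatment effect.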
Another possible problem in constructing comparison groups arises with pre-policy comparisons, such as those of Greene and Winters (2007), who compared the gains of students retained during the first year of the Florida policy with those of students who had the same low test scores but were not retained because they had entered third grade the year before and thus were not subject to retention. Roderick and Nagaoka (2005) have argued that some students with low pre-policy test scores likely would have improved had they been exposed to the incentives of the policy. To address this concern, they made cross-cohort comparisons. When the Chicago policy was altered in 2000-2001 in response to a civil rights complaint, most students who failed the ITBS were promoted, whereas in 1998 and 1999 students with comparable failing scores had been retained. This allowed the Chicago research team to compare students with similar scores across cohorts, all of whom had been exposed to the same incentives and instructional supports.
Finally, the point at which researchers estimate achievement effects can influence their findings on the short-term effects of retention. In Chicago, Roderick and Nagaoka (2005) compared the test scores of students at the end of their retained year with those of comparable promoted students of the same age and in the same cohort. Essentially, Roderick and Nagaoka sought to determine whether two years of learning in the same grade produced greater academic growth than two years of learning in successive grades. They argued that only same-age comparisons should be used to study the effects of retention under high-stakes testing: "If the primary objective is to evaluate the effectiveness of having a student repeat a grade versus moving on to the next grade, then the evaluation should focus on estimating the counterfactual: what would have been the achievement of retained students in the absence of retention" (Roderick & Nagaoka, 2005, p. 311).
In New York City and Florida, McCombs et al. (2009) and Winters and Greene (2012) both used same-grade comparisons, in which they compared retained students' test scores after two years in a given grade with comparable students' scores after one year in that grade. McCombs et al. (2009) explained that same-age comparisons were not feasible in their study because the New York City exams ceased to be vertically scaled in 2006. However, scores in each grade were equated across years and thus supported same-grade comparisons. McCombs et al. (2009) also argued that same-grade comparisons fit the theory of action behind the retention treatment: they focus on the within-grade improvement that an extra year of instruction provides, and thus speak to the possibility that retained students may be better prepared at the end of their academic careers. Similarly, Winters and Greene (2012) used same-grade comparisons, arguing that Florida schools were primarily interested in comparing students' performance with that of same-grade peers. As I discuss below, the choice between same-age and same-grade comparisons can greatly influence findings on academic achievement. In the next section, I provide an overview of key findings from the studies described above.

Adverse Impact in Test-Based Retention
Studies in Georgia, Texas, Louisiana, and Florida have all examined the characteristics of the students most likely to be retained under test-based retention policies. As in the studies on high-stakes testing and teacher-based retention, researchers have specifically examined whether such policies have an adverse impact on ethnic minority and impoverished students. For example, Livingston and Livingston (2002) conducted a study after the Georgia law was passed but before its implementation. They examined CRCT scores from the State of Georgia's Office of Education Accountability and demographic data compiled by the University of Georgia Department of Housing and Consumer Economics for 39 southern counties with high numbers of African American and impoverished students. The demographic data consisted of the percentage of African Americans, per capita income, the number of children living in poverty, the number of African Americans living in poverty, the number of female-headed families living in poverty, the number of unwed births, and the percentage of the population without a high school diploma. They found that African American children and poor students were much more likely to fail the CRCT and consequently be retained. Livingston and Livingston (2002) argued that implementing test-based retention would have an adverse impact on these students and increase their likelihood of dropping out of school.
Likewise, Valencia and Villarreal (2005) examined the initial TAKS scores of Texas third graders in 2003. Although they were unable to analyze the scores from the second and third retakes², they predicted on the basis of the initial scores that more Mexican American (or other Latino) students would fail and thus be retained, increasing their likelihood of dropping out.
Valencia and Villarreal (2005) also compared retention rates for Louisiana students over a four-year period from 1997 to 2001. The state's test-based retention policy was implemented in 2000-2001, when students began taking the Louisiana Educational Assessment Program for the 21st Century (LEAP 21) for promotion in grades 4 and 8. They found that prior to the policy, 1 in 15 African American fourth graders was retained, compared to 1 in 29 Caucasian fourth graders. After the policy was implemented, 1 in 4 African American fourth graders was retained, compared to 1 in 13 Caucasian fourth graders. An even larger share of African American students was retained in eighth grade: 1 in 13 African Americans and 1 in 21 Caucasians were retained prior to the policy, and 1 in 3 African Americans and 1 in 10 Caucasians afterward. Valencia and Villarreal (2005) argued that this shift in retention rates provides evidence that such policies have a disproportionate impact on African American students.
Greene and Winters (2009) went a step further and examined the decisions of state-approved Grade Placement Committee meetings when retentions were appealed. They found that Florida educators discriminated against African American and Latino students when deciding whether to promote or retain students in these appeal meetings: even when controlling for academic achievement, African American and Latino students were 4% and 9% more likely, respectively, to be retained than Caucasian students.
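The Louisiana figures reported by Valencia and Villarreal (2005) can be restated as retention rates and as a relative risk, the ratio of the African American retention rate to the Caucasian rate. The relative-risk framing is an added illustration rather than a statistic the authors report; the sketch below only does the arithmetic on the "1 in N" figures quoted above, and the names `rates` and `relative_risk` are hypothetical.

```python
from fractions import Fraction

# Louisiana retention rates reported by Valencia and Villarreal (2005),
# written as "1 in N" fractions: (African American rate, Caucasian rate).
rates = {
    ("grade 4", "before"): (Fraction(1, 15), Fraction(1, 29)),
    ("grade 4", "after"): (Fraction(1, 4), Fraction(1, 13)),
    ("grade 8", "before"): (Fraction(1, 13), Fraction(1, 21)),
    ("grade 8", "after"): (Fraction(1, 3), Fraction(1, 10)),
}


def relative_risk(group_rate, reference_rate):
    """How many times more likely the group is to be retained than the reference group."""
    return group_rate / reference_rate


for (grade, period), (african_american, caucasian) in rates.items():
    rr = relative_risk(african_american, caucasian)
    print(f"{grade}, {period} the policy: {float(african_american):.1%} vs "
          f"{float(caucasian):.1%} retained, relative risk {float(rr):.2f}")
```

Computed this way, the relative risk for fourth graders rises from about 1.9 before the policy to about 3.3 after it, and for eighth graders from about 1.6 to about 3.3: the gap between groups widened even as retention rose for everyone.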

Short-Term Academic Achievement
The studies conducted in Chicago, New York City, and Florida focused largely on academic achievement and, at least initially, reported positive effects for the first cohorts they investigated. For example, Roderick and Engel (2001) interviewed 102 low-achieving African American and Latino students in Chicago about their experiences with test-based retention. Students were considered low-achieving if, based on their prior test-score trajectories, they needed average to above-average learning gains to meet the test-score cutoff. Drawing on motivation theory, Roderick and Engel (2001) examined whether students would be motivated to work harder academically if they valued passing the retention exams and believed that passing them was possible. They found that 53% reported that the threat of retention motivated them to work harder and pay greater attention in class. The high stakes also appeared to increase the support these students received from teachers and the time they spent studying outside of school.
Roderick, Jacob, and Bryk (2002) found that test scores in Chicago's gateway grades increased substantially when the high-stakes testing policies began, although the effects were larger for the sixth and eighth grades than for the third. Sixth- and eighth-grade gains were approximately one third to one half of a year's learning gain; results were less conclusive in third grade, where policy effects were smaller in reading and less consistent across years. Similarly, in Florida (Winters & Greene, 2006) and New York City (McCombs et al., 2009), students subject to the retention policy earned higher test scores than those who were not. Third graders in Florida scored 0.06 of a standard deviation higher in reading on both the FCAT and the Stanford-9, and 0.14-0.15 of a standard deviation higher in math, than equally performing third graders who were not subject to the policy the previous year. Fifth graders in New York City scored 0.10-0.20 of a standard deviation higher in reading and less than 0.10 of a standard deviation higher in math than earlier cohorts not subject to the policy.
In Florida (Winters & Greene, 2006), New York City (McCombs et al., 2009), and Chicago (Roderick & Nagaoka, 2005), students who were retained under these policies received an academic boost, at least in the short term. In Florida, students retained in 2003 scored 0.11-0.13 of a standard deviation (3.45-4.1 percentile points) higher on the FCAT and Stanford-9 reading tests than comparable promoted students, and 0.28-0.30 of a standard deviation (9.3-10.0 percentile points) higher on the FCAT and Stanford-9 math tests (Winters & Greene, 2006). In New York City, students who were retained in fifth grade scored moderately higher on seventh-grade assessments (0.40-0.60 of a standard deviation) than comparable students who were not retained, and these gains persisted two years after retention (McCombs et al., 2009). In Florida, retained students' gains increased in the second year, so that they had about 0.40 of a standard deviation higher academic proficiency than comparable promoted students (Greene & Winters, 2007). In Chicago, by contrast, Roderick and Nagaoka (2005) found that retained third graders received only a small academic boost the year after retention and no substantial positive effects.
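Effect sizes such as these are stated in standard deviation units, and translating them into percentile points requires assuming a score distribution and a starting point. The sketch below assumes normally distributed scores; the function `percentile_gain` and the starting percentiles are hypothetical choices, and it does not attempt to reproduce the percentile figures reported by Winters and Greene (2006), whose conversion assumptions are not stated here. It only shows how the same shift can translate into different percentile gains.

```python
from statistics import NormalDist


def percentile_gain(effect_sd, start_percentile):
    """Percentile-point change for a student who starts at start_percentile
    (0-100) and moves up by effect_sd standard deviations, assuming test
    scores are normally distributed."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    start_z = standard_normal.inv_cdf(start_percentile / 100)
    end_share = standard_normal.cdf(start_z + effect_sd)
    return 100 * end_share - start_percentile


# Compare the reading-size (0.11 SD) and math-size (0.28 SD) shifts
# from three hypothetical starting points.
for start in (10, 25, 50):
    print(start, round(percentile_gain(0.11, start), 2), round(percentile_gain(0.28, start), 2))
```

At the median, a 0.11 standard deviation shift corresponds to roughly 4.4 percentile points, while the same shift from the 10th or 25th percentile yields fewer points, because fewer students sit in the tails of a normal distribution than near its center.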
Research in Chicago, New York City, and Georgia also emphasized the effectiveness of the interventions implemented in tandem with test-based retention policies. In Chicago, Jacob and Lefgren (2004) found that summer school increased third graders' reading and math achievement two years later by about 12% of the average annual learning gain, although for sixth graders the effects were about half as large. In New York City (McCombs et al., 2009) and Georgia (Henry et al., 2005), students who attended summer school scored higher on subsequent tests than those who did not. In New York City, students who attended summer school in fifth grade under the test-based retention policy later scored 0.10-0.15 of a standard deviation higher on both sixth- and seventh-grade ELA and math tests than comparable students who did not attend. Similarly, in Georgia, students who attended summer school scored on average over four points higher than non-enrolled students on the summer 2004 CRCT reading retake, and were more likely to pass it than comparable students who did not attend summer school.
Stone and Jacob (2005) found that the test-based retention policy was well liked by teachers, principals, and students in Chicago schools. Teachers' time spent on test preparation increased substantially immediately after the policy was implemented but declined somewhat in later years. Teachers devoted more time to grade-level material and to the reading and math skills relevant to the test, and sixth and eighth graders received greater instructional support and reported being more academically engaged than they had been prior to the policy. Finally, McCombs et al. (2009) reported that retained students in New York City did not exhibit negative emotional effects. Student surveys showed that retention did not harm their confidence in reading or math, and even three years later retained students reported a greater sense of connectedness to school than both promoted at-risk students and students who were not at risk.

Long-Term Academic Achievement
Although the short-term gains in academic achievement described above look promising, several studies suggest that these positive outcomes may have come at the expense of the most vulnerable students and that they fade over time. For example, Roderick and Engel (2001) found that 53% of the low-achieving African American and Latino students they interviewed in Chicago reported working harder and paying greater attention in school than they had before the retention policy was implemented. However, 34% of the students reported that they were not motivated by the high-stakes test and consequently did poorly. This latter group included the most vulnerable students, who faced significantly larger skill gaps and barriers to learning. Roderick and Engel (2001) noted that test-based retention policies may benefit certain students while making "sacrificial lambs" out of those unable or unwilling to pass the required exams (p. 221). In other words, the increased motivation that most of these students experienced may have been produced by sacrificing the educational opportunities of the still sizeable group of students who failed.
Additionally, as with teacher-based retention, the positive effects of test-based retention on academic achievement fade over time as cohorts of students are followed longitudinally. In Chicago, for example, Roderick and Nagaoka (2005) found that the small academic boost third graders received from retention had dissipated to zero two years later, and retained sixth graders actually showed slower academic growth. Retention in sixth grade was associated with negative effects on achievement growth one year after the retention year, and these negative effects remained two years later. Roderick and Nagaoka (2005) estimated that retained sixth graders experienced 6%-21% lower achievement growth than comparable non-retained students. The results also revealed that retained students in Chicago were much more likely to be placed in special education, and thus exempted from testing, during their retained years. Teachers were given little guidance in working with retained students and thus usually gave them a second dose of the previous year's interventions, a finding also documented by Stone and Engel (2007). The intervention provided during summer school was a scripted test-preparation program focused on the skills needed to pass the ITBS (Roderick, Jacob, & Bryk, 2004).
In Florida, Greene and Winters (2007) initially found that retained students' academic gains persisted two years out and even increased in the second year. However, an analysis conducted five years after retention indicated that although retained students were still benefiting from retention and intervention, the positive effects were dissipating (Winters & Greene, 2012). Winters and Greene (2012) found that the academic boost students retained in third grade received from retention decreased from a high of 0.40 of a standard deviation to 0.183 of a standard deviation in reading and 0.174 of a standard deviation in math by the time they entered sixth grade. The McCombs et al. (2009) study in New York City investigated only two years beyond retention and found positive effects throughout those two years.
One possible explanation for the different findings on the long-term effects of test-based retention on achievement is that the Chicago researchers used same-age comparisons while the researchers in Florida and New York City used same-grade comparisons, and the choice of comparison group can shape a study's findings. Same-age comparisons typically produce negative effects in the short term that plateau or become more positive with time, while same-grade comparisons produce effects that are initially positive but decrease over time (Allen et al., 2009; Allensworth & Nagaoka, 2010). Another possible explanation for the difference between the short-lived gains found in Chicago and the longer-persisting gains in Florida lies in what was being measured. In Chicago, the researchers were able to measure the impact of retention itself by disentangling it from the effects of summer school. Under the policy, Chicago students could take a second administration of the ITBS at the end of summer school and avoid retention if they passed (Jacob & Lefgren, 2004). This enabled the researchers to compare the future growth of students who passed the retake exam and were promoted with that of students who failed the retake and were retained. In Florida, however, retesting is less uniform, so the researchers were unable to separate the effects of retention from the effects of the intervention implemented as part of the retention policy (Greene & Winters, 2007).

Increased Likelihood of Dropping Out
Thus far, only two studies of the Chicago retention policy have examined the long-term effects of test-based retention on school dropout. Allensworth (2005) linked retention in Chicago to dropout rates. She compared eighth-grade cohorts of students before and after implementation of the Chicago policy and showed that retention based on test scores did increase low-achieving students' likelihood of dropping out of school, although the increase was smaller than that associated with traditional teacher-based retention. Interestingly, Allensworth (2005) also found that small decreases in dropout rates among students who were not retained counterbalanced the higher number of dropouts among those retained, so that the overall dropout rate slightly decreased. This finding suggests that although the overall dropout rate slightly decreased under the policy, it may have done so while increasing the dropout rate of those who failed the gateway exams and were retained, further evidence that the policy's positive effects came at the expense of the most vulnerable students. Jacob and Lefgren (2009) also found a link between Chicago's test-based retention policy and high school dropout rates. They compared eighth graders with sixth graders during the first three years of the program (1997 to 1999) and found that retaining low-achieving eighth-grade students substantially increased the likelihood that they would drop out of high school.

Promotes and Demotes: Moral Boundary Work
Anagnostopoulos (2006) examined Chicago's test-based retention policy at the high school level, where ninth graders who failed end-of-year standardized math and reading tests were demoted rather than retained. Demotion differed from retention in that demoted ninth graders were required to attend a homeroom class designated for demoted students and to enroll in remedial ninth-grade math and reading courses, but they could still take other tenth-grade courses. Anagnostopoulos (2006) found that high school students and teachers used test-based retention to create social boundaries that distinguished promoted students from demoted ones. Using a cultural sociological perspective, she showed that instead of encouraging teachers and ninth-grade students to achieve academically, the policy promoted a kind of moral boundary work in which teachers justified not providing demoted students, whom they considered undeserving, with enriching learning opportunities. Success or failure on the test provided fodder for student identity constructions and social exclusion.

Masking Social Inequities
Finally, a few studies have suggested that test-based retention policies reinforce the ideology that success on high-stakes tests is solely the result of effort, while masking the connection between educational achievement and social inequities within the U.S. Drawing on Bourdieu (1982/1991) and Bourdieu and Passeron (1970/1990), Anagnostopoulos (2006) showed that at the high school level, Chicago's test-based retention policy enacted symbolic violence on low-income, ethnic minority demoted students by obscuring the connection between test scores and class inequities while imposing the belief that educational achievement rests largely on moral choices such as good behavior in school, self-discipline, and perseverance.
Similarly, Booher-Jennings (2008) found that under Texas' test-based retention policy, teachers exposed students to the hidden curriculum of achievement ideology. Through their day-to-day words and actions, teachers communicated to students that success on the state test was based on hard work and individual effort. However, Booher-Jennings (2008) also noted that teachers conveyed this message differently to boys and girls. Teachers blamed boys who failed for their poor behavior and bad attitudes, whereas they tended to tell girls that they just needed more self-esteem to pass the test. Of the 37 students Booher-Jennings (2008) interviewed, the vast majority believed that it was fair to promote students to the next grade based on their scores on a standardized test. Most boys accepted the teachers' explanations for their failure, and girls who failed worked hard to show others they were not like the boys who just did not try. Only three students, all boys, questioned the fairness of test-based retention and expressed doubt that working hard in school would benefit their futures.

Discussion: Achievement at Whose Expense?
It is evident that high-stakes testing and test-based retention have produced some of their intended benefits. For example, high-stakes tests can improve alignment between curriculum and instruction (Hamilton et al., 2007; Koretz et al., 1994; Stecher, 2002). Testing has been shown to help teachers identify students' needs and to motivate teachers and students to work harder (Finnigan & Gross, 2007). In both teacher-based and test-based retention programs that incorporate interventions, students have been shown to make short-term academic gains (Greene & Winters, 2007; Lorence et al., 2002; McCombs et al., 2009; Roderick et al., 2002; Winters & Greene, 2006; Xia & Kirby, 2009). These programs appear to be popular and to motivate the majority of at-risk students to work harder (Roderick & Engel, 2001; Stone & Jacob, 2005).
On the other hand, these policies often produce negative, unintended consequences that adversely affect the most vulnerable students. High-stakes testing policies have consistently resulted in negative curriculum reallocation, teachers adapting their teaching styles to test formats, negative coaching, cheating, and educational triage practices (Booher-Jennings, 2005; Heilig & Darling-Hammond, 2008; McNeil, Coppola, Radigan, & Heilig, 2008). All of these practices produce score inflation (Koretz, 2008) and appear to be most prevalent in probationary schools with large numbers of low-income, ethnic minority students (W.-P. Hong & Young, 2008). These negative, unintended consequences are evident when retention is tied to high-stakes testing as well.
Researchers have consistently found that the academic boosts produced by teacher-based retention are short-lived, with retained students falling behind again in later years (Xia & Kirby, 2009). Those retained are often the most vulnerable students, and retention increases the likelihood that these students will later drop out of school (Xia & Kirby, 2009). Although many teachers, administrators, and members of the public (Bulla & Gooden, 2003; Byrnes, 1989; Smith, 1989; Tomchin & Impara, 1992) assume that teacher-based retention will help low-performing students, it instead greatly increases the likelihood that many of these students will retreat from educational experiences.
With test-based retention policies, although the majority of students may be motivated to work harder, a significant number of low-achieving students appear unaffected by these policies (Roderick & Engel, 2001). In some cases, the curriculum and teaching methods that students experience during their retained year are not much different from those used in their classes prior to retention (Stone & Engel, 2007), and retained students are at an increased risk of dropping out (Allensworth, 2005; Jacob & Lefgren, 2009). As with teacher-based retention, academic gains from test-based retention fade over the long run (Winters & Greene, 2012); in Chicago, for example, gains faded in the second year (Roderick & Nagaoka, 2005).
As seen above, a growing body of evidence suggests that the academic gains that appear to result from test-based grade retention policies are likely occurring at the expense of the most vulnerable students. Although these gains accrue to some but not all students and are only short-term, Linn (2000) has argued that for politicians, short-term gains may be all that is needed. In a case study of Wisconsin's test-based grade retention policy, Brown (2007) argued that Wisconsin policymakers implemented their policy "not to hold students back but rather to instill accountability into the educational system" (p. 4). Legislators were being pressed to raise achievement statewide, and they saw the retention policy as a means to boost student achievement through increased accountability. Overall achievement, not the fate of those retained, was their main concern. Retaining students was an unfortunate necessity to help foster the public perception that schools were maintaining high standards and that the majority of students were being encouraged to do better. Policymakers viewed retention as a byproduct of improving academic performance and not as an intervention in itself. The harmful effects of retention were not a problem for Wisconsin policymakers unless they affected large numbers of students. Such findings echo the claims of proponents of test-based retention policies such as Russo (2005), who argued that "…student retention policies are not really about the students who are retained as much as they are about the way the rest of the school system operates when it knows there is not social promotion" (p. 47).

Implications for Policy Makers and Researchers
Several professional organizations have issued statements urging policymakers to abandon retention practices based on single, high-stakes test scores (AERA, APA, & NCME, 1999; American Educational Research Association, 2000; Dennis et al., 2012; Heubert & Hauser, 1999; National Association of School Psychologists, 2003). A standardized test score is only an estimate, subject to measurement error and based on a small sample of questions in a given domain, and it should not be treated as an exact measure of student knowledge. Penfield (2010) assessed whether test-based grade retention is aligned with the National Research Council's (Heubert & Hauser, 1999) standards for fair and appropriate test use. He found that test-based grade retention violates standards related to attribution of cause and effectiveness of treatment. First, Penfield (2010) cited evidence that the test scores of nondominant groups could be attributed to poor instruction or to the linguistic and cultural content of the assessment rather than to students' knowledge and skills. Second, research suggests that retention is a potentially harmful placement; if retention harms students' academic performance or increases the likelihood that they will drop out, it could violate the standards for fair and appropriate test use. These consistent ethical concerns raised by professional educational organizations, along with the growing research documenting the harmful effects of test-based retention policies, provide ample evidence that policymakers should strongly consider discontinuing these policies. However, ending test-based retention should not imply that social promotion is a beneficial alternative. Researchers have argued that simply retaining students without providing different instruction places the blame for low academic achievement solely on the student and offers little hope for improvement. Yet simply promoting students to the next grade without additional support is a failed strategy as well (Darling-Hammond, 1998; Owings & Kaplan, 2001).
Nevertheless, retention and social promotion are not the only options available. Researchers have suggested numerous alternatives, including classroom assessments that better inform teaching and more effective implementation of differentiated and small-group instruction (Dennis et al., 2012). Two practical alternatives suggested by researchers (e.g., Darling-Hammond, 1998; Smink, 2001; Smith & Shepard, 1989) and made evident by the studies examined in this review are increasing instructional effectiveness and increasing instructional time.
Darling-Hammond (1998) has advocated improving teaching skill as an alternative to retention, a point emphasized by the Chicago researchers in this review as well. Allensworth and Nagaoka (2010) noted that retention, not staff development, was the focus of the Chicago policy. Few structures were established to improve teaching quality, and thus retained students often received a second dose of the same instruction. Interestingly, Roderick et al. (2005) found that although the city's summer school program (which was heavily scripted) did improve student achievement, students whose teachers intentionally altered the script to meet their needs performed better than those whose teachers simply followed the script, leading the researchers to believe that teacher expertise made a difference.
A second alternative to retention underscored by the findings of this review is increasing instructional time. The same-grade comparisons used to assess the New York City and Florida policies suggest that students who are given extra time to master material in a specific grade perform better than those with less time to master the same material. Retention, however, is just one way of adding instructional time. The studies in Chicago, Florida, and New York City all found that when students were provided additional instruction after school and in summer school, academic achievement increased. Additional instructional time has also been shown to be productive in the form of universal pre-kindergarten (Lazarus & Ortega, 2007) or multi-grade instruction (Darling-Hammond, 1998; Smith & Shepard, 1989). Smith and Shepard (1989) described various approaches for reconceptualizing school organization to increase instructional time. One consists of ungraded instruction in the primary grades. Another involves allowing a student who is behind in reading to go to a younger grade for instruction in that subject only. In schools where numerous students move among grades, students experience less stigma related to being older than their peers. Finally, teachers can promote students who are still behind academically and then work with the students' teachers in subsequent years to develop individualized intervention plans.
In terms of implications for researchers, additional attention needs to be given to the short- and long-term effects of test-based retention on students' motivational processes. Studies on test-based retention that address motivation tend simply to look for evidence that students are working harder, based on the degree to which they value passing the assessment and the extent to which they believe passing is actually possible (e.g., Finnigan & Gross, 2007; Roderick & Engel, 2001). More attention needs to be paid to the development of students' motivational attitudes toward school and learning.
Second, the vast majority of studies on test-based retention have been large-scale, quantitative studies seeking to determine whether these policies improve students' academic achievement on standardized tests (e.g., McCombs et al., 2009; Roderick et al., 2002; Winters & Greene, 2006). Only a few qualitative case studies (e.g., Anagnostopoulos, 2006; Booher-Jennings, 2008; Brown, 2007) have attempted to understand how these policies are negotiated by students, teachers, administrators, and policymakers. Few studies have followed students through these policies to learn how the policies are actually implemented in schools and what their micro-level effects on students are.
This gap in the research needs to be addressed. As noted earlier, Bourdieu and Passeron (1970/1990) have argued that large-scale, quantitative studies focusing solely on achievement gains as measured by test scores mask the social inequities that produce such scores and the role that schools and examinations play in these processes. Further exploration is needed to examine what these tests may be concealing and to flesh out the processes by which these policies obscure the connection between achievement scores and class inequities.

Table 1
Meta-Analyses and Literature Reviews on Teacher-Based Retention

Table 2
Overview of Researched Policies

Table 2 (cont'd)
Overview of Researched Policies