Value-Added Model (VAM) Research for Educational Policy: Framing the Issue

Education Policy Analysis Archives, Vol. 21, No. 4 (Special Issue)

In this manuscript, the guest editors of the EPAA Special Issue on “Value-Added Model (VAM) Research for Educational Policy” (1) introduce the background and policy context surrounding the increased use of VAMs for teacher evaluation and accountability purposes across the United States; (2) summarize the five research papers and one research-based commentary that were peer-reviewed and selected for inclusion in this special issue; and (3) discuss the relevance of the papers both individually and collectively. Their importance is discussed in terms of each paper’s contribution to the general research on this topic and each paper’s potential to inform educational policy. In addition, the papers reflect our shared thinking about VAMs, VAM output, and the inference-based decisions for which VAMs are increasingly being used.


Introduction
Historically, throughout the United States, public education agencies have used localized approaches for evaluating teachers and making determinations about teacher effectiveness. With few exceptions (e.g., the state of Tennessee), teacher evaluation efforts have traditionally been developed and governed by school districts under the guise of district control. Accordingly, school districts have not been encouraged or incentivized to conform to any particular teacher accountability framework. Now, however, evaluating teachers using value-added models (VAMs) has become a matter of federal and state education policy, as well as a matter of federal and state educational urgency and resolve (Corcoran, 2010; Stumbo & McWalters, 2011; U.S. Department of Education, 2009a).
Encouraged by over $350 million in federal funds through President Obama's Race to the Top (RttT) competition, states are exploring methods to capture the value a teacher adds to student learning from one year to the next (i.e., a teacher's value-added). To date, 18 states, the District of Columbia (D.C.), and 16 school districts across the country have won RttT funding to support these efforts (U.S. Department of Education [USDOE], 2012a, 2012b). As a result, education agencies are increasingly developing teacher accountability systems based in large part on measures of academic growth that can be attributed to teachers' effectiveness (USDOE, 2009b). Additionally, 44 states and D.C. have applied for No Child Left Behind (NCLB) waivers (Philips, 2012), which would excuse them from NCLB's goal that 100% of the students in their public schools be academically proficient by 2014. In exchange for these waivers, these states have agreed to adopt stronger teacher accountability mechanisms, again based in large part on student growth as measured via VAMs.
By definition, VAMs are designed to isolate and measure teachers' contributions to student learning and achievement on large-scale standardized tests as groups of students move from one grade level to the next. Statisticians measure value-added by calculating the "value" a teacher "adds to" or "detracts from" student achievement scores over time, relative to teachers with "similar" students. Purportedly, VAMs allow for richer analyses of achievement data by tracking students' learning trajectories from the time they enter a classroom to the time they leave. In addition, VAMs have arguably improved upon the educational measurement systems previously used for test-based accountability (see, for example, Capitol Hill Briefing, 2011; Harris, 2011).
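The text describes value-added only in general terms and does not commit to a particular model. As a hypothetical illustration of the basic idea, the Python sketch below uses the simplest covariate-adjustment approach: it regresses students' current scores on their prior scores across all classrooms, then credits each teacher with the average amount by which their students scored above or below that prediction. The function names, the single prior-score covariate, the record format, and the data are assumptions for illustration only; operational VAMs add further covariates, multiple prior years, and statistical adjustments that this sketch omits.

```python
from collections import defaultdict
from statistics import mean


def fit_prior_score_line(records):
    """Pooled ordinary least squares of current score on prior score.

    records: list of (teacher_id, prior_score, current_score) tuples
    (a hypothetical format chosen for this sketch). Returns (intercept, slope).
    """
    xs = [prior for _, prior, _ in records]
    ys = [current for _, _, current in records]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def value_added(records):
    """Hypothetical value-added estimate: each teacher's mean residual.

    A positive value means the teacher's students scored higher than the pooled
    model predicts for students with the same prior scores ("adds to"); a
    negative value means they scored lower ("detracts from").
    """
    intercept, slope = fit_prior_score_line(records)
    residuals = defaultdict(list)
    for teacher, prior, current in records:
        predicted = intercept + slope * prior
        residuals[teacher].append(current - predicted)
    return {teacher: mean(res) for teacher, res in residuals.items()}


if __name__ == "__main__":
    # Made-up scores: two classrooms with identical prior-score distributions.
    records = [
        ("A", 50, 58), ("A", 60, 68), ("A", 70, 77),
        ("B", 50, 52), ("B", 60, 61), ("B", 70, 71),
    ]
    for teacher, estimate in sorted(value_added(records).items()):
        print(teacher, round(estimate, 2))
```

With these made-up data, teacher A's estimate is about +3.17 points and teacher B's about -3.17. The two estimates sum to zero here only because the classes are the same size and pooled least-squares residuals always sum to zero; each estimate is therefore relative to the other teachers in the pool, not an absolute measure.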
On its face, then, it makes sense that VAM output be used as an integral component of contemporary teacher accountability policies, and states are increasingly incorporating VAM components within their teacher evaluation frameworks. But just because it makes sense to do this does not mean it works. It is therefore important to examine whether integrating VAM output into teacher accountability policies works in the ways theorized (e.g., by those securing state and federal contracts to conduct this work, and by the federal government via initiatives such as RttT and NCLB waivers). Furthermore, it is critical to examine whether VAMs work well enough to support highly consequential decisions about teachers (e.g., publishing teachers' names and their VAM scores, or using VAM output as a significant factor in decisions about teacher tenure, merit pay, or continued employment).
Calls for all types of research on VAMs are urgent and pertinent, especially as policy development continues to focus on VAMs with unwarranted levels of certainty and conviction (see, for example, Schafer, Lissitz, Zhu, Zhang, Hou, & Li, 2012). Accordingly, five manuscripts and one commentary are featured in this special issue of Education Policy Analysis Archives (EPAA). Collectively, the authors present evidence-based arguments about VAMs and their use in local, state, and national policy contexts. Each in their own way, and with distinct methods of inquiry, the authors advance our thinking about the proper use and role of VAMs in educational policy.

Special Issue Summaries
Today, EPAA features Diana Pullin's (Boston College) research paper, Legal Issues in the Use of Student Test Scores and Value-Added Models (VAM) to Determine Educational Quality. Pullin addresses the changing legal landscape associated with policy-based, high-stakes teacher evaluation systems that use VAMs. She argues that when policy-based evaluation systems explicitly require the use of score-based measures, the data quality standards of those measures may be subjected to critical examination. In addition, given the conflicting perspectives currently held by researchers regarding VAMs, plaintiffs in legal proceedings have begun to argue that such methods may be insufficient or otherwise inadequate to discriminate among levels of professional quality in support of high-stakes employment decisions. As a result, courts may burden education agencies by requiring them to provide substantive evidence of reliability and validity. Technical and inferential limitations of VAM-based evaluation ratings may also raise legal considerations pertaining to substantive due process and equal protection, infringement on individual civil rights, and commercial liability (e.g., for test design).
In a related article, also published today, EPAA features The Legal Consequences of Mandating High Stakes Decisions Based on Low Quality Information: Teacher Evaluation in the Race-to-the-Top Era, authored by Bruce Baker (Rutgers, The State University of New Jersey), Joseph Oluwole (Montclair State University), and Preston Green (The Pennsylvania State University). Contextualizing their analysis in a review of district and state policies related to teacher evaluation and general human resource decision-making, Baker, Oluwole, and Green provide many reasons to question the usability of student growth measures and VAMs for such purposes. Like Pullin, they also identify potential legal ramifications and other consequences of these policies and practices if such considerations are not taken seriously.
Tomorrow, EPAA will feature an in-depth analysis conducted by Nicole Kersting and Mei-Kuang Chen (University of Arizona) and James Stigler (University of California, Los Angeles) titled Value-Added Teacher Estimates as Part of Teacher Evaluations: Exploring the Effects of Data and Model Specifications on the Stability of Teacher Value-Added Scores. Kersting, Chen, and Stigler investigate the extent to which three VAM specifications affect the overall stability of value-added output (with implications for validity). The three specifications they examine are methodological variations in (1) accounting for students' academic and other background variables, (2) using single or multiple cohorts of students to measure teacher value-added, and (3) sample sizes and their related standard errors, all of which ultimately affect VAM output and teacher-level classifications. Like others, they find issues with stability (Koedel & Betts, 2005; Schochet & Chiang, 2010), sample sizes (Lockwood, McCaffrey, & Sass, 2008; McCaffrey, Lockwood, Koretz, & Hamilton, 2003; Nelson, 2011), and other data and model specifications (Harris, Sass, & Semykina, 2012; Papay, 2011). The paper adds important evidence to support the argument that before VAM data can be used for consequential purposes, such issues must be addressed and taken much more seriously than they currently are.
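To see why sample size matters for teacher-level classifications, consider a deliberately simplified, hypothetical case. If a teacher's estimate is the mean of n student residuals that are independent and share a common standard deviation, its standard error is that standard deviation divided by the square root of n. The Python sketch below shows how the same point estimate can be labeled differently depending on how many students stand behind it. The function names, the 1.96 multiplier for an approximate 95% interval, and the three-way labeling rule are assumptions for illustration, not a method taken from any of the papers.

```python
import math


def standard_error(residual_sd, n_students):
    """SE of a mean of n independent residuals sharing one SD (simplifying assumption)."""
    return residual_sd / math.sqrt(n_students)


def approx_interval(estimate, residual_sd, n_students, z=1.96):
    """Approximate 95% interval for a teacher's estimate (normal approximation)."""
    se = standard_error(residual_sd, n_students)
    return estimate - z * se, estimate + z * se


def classify(estimate, residual_sd, n_students):
    """Hypothetical labeling rule: call a teacher above or below average only
    when the whole interval lies on one side of zero."""
    low, high = approx_interval(estimate, residual_sd, n_students)
    if low > 0:
        return "above average"
    if high < 0:
        return "below average"
    return "indistinguishable from average"


if __name__ == "__main__":
    # Same estimate (+3 points) and residual SD (10 points), different numbers of students.
    for n in (25, 100):
        low, high = approx_interval(3.0, 10.0, n)
        print(n, round(low, 2), round(high, 2), classify(3.0, 10.0, n))
```

Under these assumptions, a single class of 25 students yields an interval of roughly -0.92 to 6.92 points, which spans zero, whereas pooling four such cohorts (100 students) narrows it to roughly 1.04 to 4.96, and the label changes. Real VAM standard errors also depend on the model specification and on correlations among students in the same classroom, both of which this sketch ignores.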
On Wednesday, EPAA will feature Ecologies of Education Quality, authored by Elizabeth Graue, Katherine Delaney, and Anne Karch (University of Wisconsin, Madison). In this qualitative piece, Graue, Delaney, and Karch confront the challenging task of dissecting the influence of school and community contexts on outcome measures of educational quality in general. Using value-added and observational data together, including both teacher- and school-level data from four school sites in one metropolitan area, they contrast how these evaluative measures are operationalized within the contexts in which the measurements are constructed, interpreted, and used. The authors provide readers with an in-depth review of these complexities, noting specifically the inadequacy of the outcome measures used to represent all aspects of teacher quality. They ultimately argue that even with the most sophisticated technical controls, VAMs cannot adequately capture or control for all of the contextual variables involved.
On Thursday, EPAA will feature Sentinels Guarding the Grail: Value-Added Measurement and the Quest for Education Reform, authored by Rachael Gabriel (University of Connecticut) and Jessica Nina Lester (Washington State University). Gabriel and Lester present another qualitative analysis, this time of a series of state-level policy discussions surrounding the use of value-added data, specifically data derived from the Tennessee Value-Added Assessment System (TVAAS). Using discourse analysis methods, the authors demonstrate how value-added measurement is presented to, and received by, both the public and policymakers as scientific, objective, accurate, and efficient. In addition, Gabriel and Lester highlight the numerous concerns and cautions issued by educational researchers and critics of VAMs (e.g., lack of reliability and validity, bias, errors, and inherent problems with, and overreliance on, achievement tests), and they show how these issues are often overlooked and dismissed at multiple policy levels.
On Friday, EPAA will close the special issue with a commentary authored by Moshe Adler (Columbia University) titled "Findings vs. Interpretation in 'The Long-Term Impacts of Teachers' by Chetty et al." Here, Adler critiques the acclaimed, overly publicized, and hotly contested study of New York City teachers' value-added and long-term impacts conducted by Chetty, Friedman, and Rockoff (2011). Specifically, Adler highlights the omissions, fallacies, and misrepresentations made by the study's authors, all of which resulted in unwarranted national and international attention and misled policymakers and the public alike, especially regarding the value and (exaggerated) power of VAMs. Adler also makes the case that had Chetty et al.'s work been peer reviewed prior to its public release (see also Lowrey, 2012), it likely would not have had the impact it did, given the serious methodological and other shortcomings highlighted here and elsewhere (Ballou, 2012; Ravitch, 2012; Winerip, 2012).
In sum, the papers featured in this EPAA special issue highlight the ways in which education agencies across the United States are increasingly aligning their policy interests with, and becoming dependent on, VAMs; that is, they increasingly use VAMs to assess teacher quality and effectiveness and to hold teachers accountable. Yet although a growing volume of published research has revealed substantive methodological and inferential concerns about evaluating teacher effectiveness using VAM approaches (see, for example, Amrein-Beardsley, 2008; Corcoran, 2010; Darling-Hammond, Amrein-Beardsley, Haertel, & Rothstein, 2012; Muijs, 2006; McCaffrey, 2003; Papay, 2012; Raudenbush, 2004; Rivkin, 2007; Rothstein, 2009; Rubin, Stuart, & Zanutto, 2004; Scherrer, 2011; Schochet & Chiang, 2010; Zeis, Waronska, & Fuller, 2009), this dialogue is unfortunately taking place among academic researchers rather than between academic researchers and policymakers. The translation of this research into practice and policy development has yet to be realized in significant and impactful ways. Much more effort needs to be placed on sharing these results with policymakers (see, for example, Capitol Hill Briefing, 2011).
Indeed, the goal of this EPAA collection is to present a series of research-based studies that are peer-reviewed but also freely and openly accessible to all, including those at multiple policy levels. In addition, because EPAA provides policymakers with direct access to research and research-based recommendations, in this case about a widely (and wildly) popular educational policy, we hope this too will help to further inform our collective thinking about VAMs. Policymakers and others outside of academia must begin to better understand the methods, model specifications, and assumptions being made, as well as the appropriate interpretations and inference-based uses of VAM output.

Framing the Issue
With this in mind, we next present the major and minor themes prevalent across the papers. We do this so that policymakers and others might be better equipped to delve further into the very real issues and concerns that come along with the adoption and implementation of VAMs.
First, all of the papers trace their lineage back to the 2009 federal RttT initiative, which represented a critical transition point in public education policy. RttT popularized VAMs as policy tools for strengthening accountability and for advancing radical educational reform. The widespread and rapid acceptance of VAMs is an underlying theme across the papers presented in this special issue.
Second, all of the papers featured in this special issue are relevant to educational policymakers, particularly as all of the contributing researchers address the same topic in their own diverse ways. While each uses a distinct method of inquiry, the underlying purpose is to help others better understand and make sense of VAMs' intended and unintended effects. As well, the papers, individually and collectively, should help others better ascertain whether VAMs indeed work in the ways theorized. Distinctive contributions here come from the qualitative pieces offered by Graue et al. and Gabriel and Lester, which are unusual because qualitative research on this topic is rarely published. Also distinctive are the law-based pieces put forth by Pullin and Baker et al. These matter because we are only just beginning to understand the legal ramifications that might come along with VAM use in general and, more importantly, with VAM use for highly consequential decision-making. In some cases, we are already beginning to witness the impact that VAM use (and abuse) can have on the personal and professional lives of public school teachers (Amrein-Beardsley & Collins, 2012).
Also of great importance is Kersting et al.'s scholarly contribution. Through their examination of the methodological issues often associated with VAM-based ratings, they add richer evidence in support of similar concerns about model stability, the substantial errors that are inhibiting VAM practicality, and how model specifications can compromise overall levels of validity. As well, Adler adds to our thinking about how technical details matter, especially in a case in which VAM-based findings may have resulted in erroneous conclusions about the long-term instructional effects of teachers. Adler highlights the potential for such judgments to become improperly lodged within the prevailing policy ethos, especially if research-based findings are publicized before being peer-reviewed by the researchers best equipped to judge their merits.
A third perspective positions the papers within a larger conceptual validation study. That is, the findings from each combine to form a larger conceptual framework for assessing the suitability of VAMs as tools for making consequential decisions about teachers and their effectiveness. In this regard, the stability of VAM ratings is examined by Kersting et al., while Baker et al. question the wisdom of assigning pre-specified weights to VAM components. In addition, Graue et al. look at resource allocation and cultural coherence as unmeasured, construct-relevant factors affecting student learning, and Gabriel and Lester highlight the absence of critical reflection on the part of policymakers regarding VAM-based measures. Pullin examines the legal consequences emanating from the methodological problems associated with VAMs, and Adler's examination of Chetty et al.'s (2011) study reveals that threats to validity may be internal, arising from the improper application and interpretation of methodological approaches. Collectively, these papers provide a more nuanced, multi-faceted view of VAMs as they are currently situated within the larger policy context of stronger accountability, and as they are currently positioned as America's ideal mechanisms for promoting meaningful educational reform.

Conclusion
A close read of these articles reveals the tension between the policy and research communities regarding VAMs and the evaluation of teacher effectiveness and educational quality. On one hand, policymakers throughout the country are increasingly embedding score-based (VAM) approaches within educational evaluation and accountability systems. On the other hand, social science researchers are increasingly questioning the methodological, technical, and inferential attributes of these same VAM approaches.
Gabriel and Lester use the phrase "sentinel of trust" to reflect the degree to which policymakers have come to accept VAMs as objective, reliable, and valid measures of teacher quality. At the same time, they note how this same audience ignores the technical and methodological issues examined by some of the papers here and elsewhere. This neglect stems from what Graue et al. call policymakers' "insatiable appetite[s]" (p. 2) for quality, objective indicators in education.
It is in this context that these research-based papers are presented to readers, individually and collectively, as they stand to "add value" to the literature on educational policy, high-stakes accountability, and teacher evaluation in general. Specifically, these papers stand to "add value" by giving policymakers, their affiliates, and others easy access to a series of diverse, research-based contributions about how we might proceed more wisely in thinking about VAMs and the use of VAM output.

About the Guest Editors
Audrey Amrein-Beardsley (Editor)
Affiliation: Associate Professor, Arizona State University, Mary Lou Fulton Teachers College
Email: audrey.beardsley@asu.edu
Audrey Amrein-Beardsley, Ph.D., is currently an Associate Professor in the Mary Lou Fulton Teachers College at Arizona State University. Her research interests include educational policy, research methods, and, more specifically, high-stakes tests and value-added measurements and systems. She was also recently named one of the top 121 edu-scholars in the nation, an honor for university-based academics who are contributing most substantially to public debates about the nation's educational system. She is also the creator and host of a show titled Inside the Academy, in which she interviews some of the top educational researchers in the academy. For more information, please see: http://insidetheacademy.asu.edu.

Clarin Collins (Assistant Editor)
Affiliation: Graduate, Arizona State University, Mary Lou Fulton Teachers College
Email: clarin.collins@asu.edu
Clarin Collins, Ph.D., recently graduated from the Educational Policy and Evaluation program in the Mary Lou Fulton Teachers College at Arizona State University. For her dissertation study, she analyzed teachers' understanding of and experiences with the SAS Educational Value-Added Assessment System (SAS® EVAAS®) in the Houston Independent School District, where SAS® EVAAS® is currently used to evaluate teachers with high-stakes consequences. Her research interests include national and local policy implementation at the classroom level, teacher influences on policymaking and implementation, and education evaluation and accountability systems.

Sarah A. Polasky (Assistant Editor)
Affiliation: Assistant Research Professor, Arizona State University, Mary Lou Fulton Teachers College
Email: sarah.polasky@asu.edu
Sarah A. Polasky, Ph.D., works as the Value-Added Specialist on the Arizona Ready-for-Rigor Project, a federal Teacher Incentive Fund grant awarded to the Mary Lou Fulton Teachers College at Arizona State University in 2010. She works to support the implementation of value-added evaluation systems in partnering districts and schools, as well as the improvement of their evaluation and data systems (e.g., by introducing new and unique measures into such systems). Her related research interests include assessment systems in early childhood education, the use of alternative achievement tests (e.g., district benchmarks, formative assessments) and non-achievement (i.e., developmental) data in value-added systems in general, and the impact of young children's socioemotional and neurological development on their academic achievement and growth over time.
Edward F. Sloat (Assistant Editor)
Affiliation: Director of Research and Accountability, Dysart Unified School District, Surprise,