Policies and Practices of Promise in Teacher Evaluation

Audrey Amrein-Beardsley
Arizona State University
United States

Abstract: This introduction to the special issue on “Policies and Practices of Promise in Teacher Evaluation” (1) presents the background and policy context surrounding the ongoing changes in U.S. states’ teacher evaluation systems (e.g., the decreased use of value-added models (VAMs) for teacher accountability purposes); (2) summarizes the two commentaries and seven research papers that were peer-reviewed and ultimately selected for inclusion in this special issue; and (3) discusses the relevance of these pieces in terms of each paper’s contribution to the general research on this topic and potential to inform educational policy, for the better, after the federal government’s passage of the Every Student Succeeds Act (ESSA, 2016).

Políticas y prácticas de promesa en la evaluación docente

Resumen: Esta introducción a la número especial sobre “Políticas y prácticas de promesa en la evaluación docente” (1) presenta los antecedentes y el contexto político que rodea los cambios en curso en los sistemas de evaluación docente de los estados de EE. UU. (e.g., la disminución del uso de valor agregado modelos [VAMs] para propósitos de rendición de cuentas del maestro); (2) resume los dos comentarios y siete documentos de investigación que fueron revisados por pares y finalmente seleccionados para su inclusión en este número especial; y (3) analizar la relevancia de estas piezas en términos de la contribución de cada artículo a la investigación general sobre este tema y el potencial para informar la política educativa, para mejor, después de la aprobación del gobierno federal de la ley, Every Student Succeeds Act (ESSA, 2016).

Políticas e práticas de promessa na avaliação de professores

Resumo: Esta introdução à dossier sobre “Políticas e práticas de promessa na avaliação de professores” (1) apresenta o contexto e o contexto político em torno das mudanças em andamento nos sistemas de prestação de contas de professores dos estados dos EUA (por exemplo, a diminuição do uso de valor agregado modelos [VAMs] para propósitos de responsabilização de professores); (2) resume os dois comentários e sete trabalhos de pesquisa que foram revisados por pares e, finalmente, selecionados para inclusão nesta dossier; e (3) discutir a relevância dessas peças em termos da contribuição de cada artigo para a pesquisa geral sobre esse tópico e o potencial de informar as políticas educacionais, para melhor, após a aprovação pelo governo federal da lei, Every Student Succeeds Act (ESSA, 2016).

In January of 2016, former U.S. President Obama signed into law the Every Student Succeeds Act (ESSA). The primary intent of ESSA (2016) was to restore local control to states, reduce the federal government’s regulation over states, and reset the federal government’s relationship with the nation’s 100,000 public schools, its nearly 50 million public school students, and its approximately 3.4 million public school teachers. ESSA (2016) was also to replace the current national accountability policy scheme, based primarily on high-stakes tests, with state-led accountability systems, while returning to states the responsibility for measuring student, teacher, and school performance. States are still required to test students annually in mathematics and reading in grades three through eight and once in high school, as per the provisions written into No Child Left Behind (NCLB, 2001), and to report on these indicators by race, income, ethnicity, disability, etc. Specific to this special issue, however, ESSA (2016) allowed states to decide whether and how to evaluate teachers with or without using or accounting for teachers’ purportedly causal effects on students’ standardized test scores over time, for example, via the use of student growth models (SGMs), more generally, and value-added models (VAMs), more specifically.

While ESSA (2016) is not without controversy (e.g., states’ uses of SGMs and VAMs are still strongly encouraged by the federal government as written into ESSA [2016]), pertinent to this special issue on “Policies and Practices of Promise in Teacher Evaluation,” is that states can now decide how and to what extent states might (or might not) value or explicitly weight students’ test scores as components of their revised teacher evaluation policies and systems. In addition, states now have more freedom to implement teacher evaluation systems that might involve other evaluation indicators and measures (e.g., student surveys), might serve more formative (i.e., developmental) versus summative (i.e., outcomes-based) ends, and might permit more innovation, for example, when developing new or working to improve longstanding evaluation measures (i.e., observational systems) formerly dismissed as being (too) subjective (see, for example, Weisberg, Sexton, Mulhern, & Keeling, 2009; see also Kraft & Gilmour, 2017).

Hence, and now four years post-ESSA (2016), perhaps unsurprisingly, states’ educational policies, systems, and practices surrounding teacher evaluation are changing, or are beginning to change (Close, Amrein-Beardsley, & Collins, 2020; Ross & Walsh, 2019). Again, this is occurring because ESSA (2016) has allowed states to recover power and authorities over these areas. Accordingly, it is the purpose of this special issue to capture how the teacher evaluation situation is, indeed, changing and ideally changing for the better post-ESSA. Changing for the better is defined herein as aligning with the theoretical and empirical research that is currently available in the literature base surrounding contemporary teacher evaluation systems, as well as the theoretical and empirical research that is presented in this special issue.

Correspondingly, via this special issue I have brought together scholars researching, implementing, and assessing such changes and innovations in teacher evaluation policies, systems, and practices, all of whom are doing this important work while drawing upon diverse theoretical, methodological, and conceptual perspectives. Again, contributors to this special issue are beginning to shed light on policies and practices of promise, throughout the US but also with implications for nations beyond in which leaders might also be grappling with similar issues related to evaluating their nations’ teachers (see, for example, Araujo, Carneiro, Cruz-Aguayo, & Schady, 2016; Sørensen, 2016).

Accordingly, included in this special issue is a set of two peer-reviewed theoretical commentaries and seven empirical articles, via which authors present or discuss teacher evaluation policies and practices that may help us move (hopefully, well) beyond high-stakes teacher evaluation systems, especially as solely or primarily based on teachers’ impacts on growing their students’ standardized test scores over time (e.g., via the use of SGMs or VAMs). In this introduction, as such, I review each of these pieces in order to capture its essence, so that readers interested in these issues might better understand what is included and in store.

Special Issue Summaries

First is a commentary authored by Jessica Holloway, Deakin University, Australia, titled “Teacher Accountability, Datafication and Evaluation: A Case for Reimagining Schooling.” In this commentary, Holloway discusses how contemporary teacher accountability systems, throughout the US but also globally, have become rooted in testing, evaluation, and dis/incentivization as means for shaping school reform. In the name of equity, she details how global competitiveness and high-stakes accountability practices have steadily weakened teacher expertise, authority, and professionalism by constraining the capacity for teachers to exercise professional discretion. She argues that this continues despite the passage of ESSA (2016). Consequently, she provides a lens for thinking about the role of education and how to radically disrupt the “norms” we have come to accept as necessary features of modern schooling. More specifically, she draws on the growing subfield of “datafication” (e.g., the use of big data, statistical analyses of big data, and advanced technologies to solve highly complex social issues or problems; see, for example, Kitchin, 2014; Lupton, 2018; Williamson, 2017) to illustrate how evolving and emerging data-related techniques and technologies are dramatically undermining teacher expertise and authority. Subsequently, she makes two major assertions. The first is that the “datafication” turn marks a distinct impact on the professional teacher, as digital data techniques proliferate our reliance on, access to, and ability to capture more data about teachers and their practice than before. As our fundamental understanding of individual people (in this case, teachers) becomes entangled with these data pictures, teachers’ data profiles begin to supersede teachers themselves. 
Second, while many features of the “datafied” classroom might seem rather innocuous, she argues it is important to consider how such conditions pave the way for new, more insidious, forms of data surveillance and control. In sum, she argues that the prevalence of numbers, metrics, and data within education is consistent with modern, Westernized views of “what counts” more broadly. Thus, there is an epistemological and ontological view that our problems and solutions of the world can be understood through statistical calculations. We must, consequently, question how this limited way of thinking constrains our capacity to imagine alternative versions of schooling—versions that might help confront the global challenges of our time. The present time urgently demands a radical re-thinking of education, not only because of the dangers associated with excessive datafication, but also because of pressing social and political challenges that require collective action. We must engage in thought experiments to provide some space for imagining new possibilities and thinking “outside of” the traditional accountability “box.”

Second is another commentary authored by Kelley King, University of North Texas, and Noelle Paufler, Clemson University. This commentary titled “Excavating Theory in Teacher Evaluation: Evaluation Frameworks as Wengerian Boundary Objects” is about how educational policymakers, also intent on assessing and evaluating teacher quality, have (or have not) focused on ensuring teacher competence and provided experiences for professional learning. Since the passage of ESSA (2016), some state policymakers have continued to prescribe a standard view of teacher quality across their public schools, as well as their educator preparation programs. Such evaluation frameworks are theory laden; however, they vary in terms of how explicitly they denote the theoretical underpinnings of “quality teaching” and professional learning as embedded within their states’ teacher evaluation models. The purpose of this commentary, accordingly, was to excavate said unstated theoretical underpinnings in order to better consider how contemporary teacher evaluation systems might better intersect theoretically with social learning theory (Wenger, 1998, 2000). Why? Social learning theory and the empirical research associated with social learning theory support the idea of professional learning broadly, and as participation within and across the boundaries of social communities (e.g., Communities of Practice [CoPs]). Hence, King and Paufler make the case for research examining potential connections in theory and practice. They argue that research is needed that examines and critiques the ability and desirability of current teacher evaluation systems to function as boundary objects around which CoPs can or might help to build practitioner identities and common understandings of teacher practice and what good teacher practice might mean. 
Theorizing evaluation through research that maps structures and processes for social learning, in other words, would effectively contribute to efforts to substantively increase teacher quality. Research conceptualizing teacher learning in social and organizational contexts, as well, would help to begin to build better and more nuanced understandings about the role of teacher evaluation in social learning, especially when making recommendations for organizational efforts to develop and support teacher learning that might better and more substantively contribute to the field.

The third contribution is the first of the set of seven empirical pieces included in this special issue, positioned first here given it provides a national view of what states are actually doing with their teacher evaluation systems post-ESSA (2016). Coauthored by myself, one of my current doctoral students, Kevin Close, and one of my former doctoral students, Clarin Collins, all at Arizona State University, this piece titled “Putting Teacher Evaluation Systems on the Map: An Overview of States’ Teacher Evaluation Systems Post–Every Student Succeeds Act” is about whether the U.S.’s reauthorization of ESSA (2016) categorically marked a “notable inflection point” in education policy (Ross & Walsh, 2019, p. 3). Via this study we collected nationally representative survey and state website data to investigate how and to what extent states actually changed their teacher evaluation systems post-ESSA (2016). We also gathered key state personnel’s insights to capture their perceptions of the strengths and weaknesses of their states’ teacher evaluation systems once changed. While state-by-state results can be found in the full paper, we found that VAM use substantially decreased, the number of states that explicitly do not use or encourage VAM use substantially increased, there has been a substantial shift toward more local control, and states have taken more holistic views of and approaches towards their teacher evaluation systems post-ESSA (2016). State department personnel expressed concerns that there is now, perhaps, too much variety across districts’ teacher evaluation systems within their states, and not enough capacity to support districts’ teacher evaluation needs given this increase in variety. That states’ post-ESSA (2016) teacher evaluation systems are also more focused on formative (i.e., developmental) versus summative (i.e., outcomes-based) functions and needs, however, was viewed as a positive post-ESSA (2016) trend.
In short, ESSA has impacted the ways that states’ policymakers are thinking about and enacting or endorsing teacher evaluation systems, which do look different now than they did prior to the passage of ESSA (2016) and especially after Race to the Top (2011). This reversal of trends, as we and many others would argue, constitutes a step in the right direction.

Fourth is an empirical piece authored by Alisha Braun, University of South Florida, and Peter Youngs, University of Virginia, titled “How Middle School Special and General Educators Make Sense of and Respond to Changes in Teacher Evaluation Policy.” In this piece Braun and Youngs review the contemporary policy landscape of accountability and teacher evaluation reform, as per the use of classroom observation tools and student growth measures (SGMs, akin to VAMs), and as per the perspectives and experiences of special educators. While numerous scholars have written about the strengths and limitations of these measures for special education teacher use (Johnson & Semmelroth, 2014; Jones & Brownell, 2014; Jones, Buzick, & Turkan, 2013), few have compared the experiences of special and general education teachers. To address this gap in the literature, Braun and Youngs compared the perceptions and experiences of middle school special and general educators given a “new” teacher evaluation system in Virginia, even though that system was at the time of their study still relying on these two measures as its primary teacher evaluation indicators. What they found were considerable differences between the perceptions and experiences of special and general educators. In comparison to general educators, more specifically, special educators felt that the use of SGMs to assess teacher performance failed to evaluate a significant component of their jobs, namely their roles as case managers. Special educators also experienced conflict between the main elements of the teacher evaluation policy and their beliefs about effective teaching for students with disabilities. This conflict left the special educators studied very critical of the appropriateness of the state’s evaluation system.
Ultimately, findings from this study illustrate the importance of acknowledging differences in special and general educators’ roles and responsibilities and encourage policymakers to seriously reconsider developing and implementing uniform teacher evaluation policies of the past.

Fifth is a research piece authored by Jake Malloy, University of Wisconsin-Madison, titled “Entangled Educator Evaluation Apparatuses: Contextual Influences on New Policies.” Malloy notes that along with the other states in which leaders are embracing, or at least considering, redesigns of their teacher evaluation systems, Wisconsin educational leaders are also attempting to move beyond a high-stakes, VAM-based, teacher evaluation model, so as to “inspire and empower” (WI DPI, 2017 July). Put differently, the goal is to help Wisconsin teachers teach well and focus more on professional development; although, as Malloy illuminates, doing this also presents its own set of challenges, given the residual “baggage” with which Wisconsin leaders must grapple as they move away from the state’s former, post-Race to the Top (2011) teacher evaluation system. Likewise, Malloy explains how and why such desirable changes may not quickly be enacted, in Wisconsin, and likely elsewhere. Drawing on Actor-Network Theory (ANT) perspectives (Latour, 1986, 2005) that conceptualize evaluation as an entangled material-discursive apparatus, Malloy more specifically explores why Wisconsin leaders have struggled to elicit full engagement from educators, despite most educators favoring the switch from a punitive accountability-based logic. Moreover, Malloy found that Wisconsin’s change in theory was not matched by a radical restructuring towards improvement, as also constrained by the state’s teacher evaluation apparatuses (see, for example, Anderson, 2017; Foucault, Davidson, & Burchell, 2008), and that the changes that were made were often not read as authentic because of the broader context in which Wisconsin educators continued to find themselves.
Taking into account decades-long struggles for legitimacy by teachers and the general deprofessionalization of teachers through these and other federal and state policies, it is necessary, then, to also understand educators’ approaches to evaluation. To their credit, Wisconsin seems aware of this and has aggressively conveyed their support of the value of teachers and their improvement through professional growth and development.

Sixth, Brady Ridge and Alyson Lavigne, both at Utah State University, offer another empirical piece titled “Improving Instructional Practice through Peer Observation and Feedback.” In this piece they explore one of the unanticipated costs of prior teacher evaluation reforms: increased pressure on school administrators to observe and provide teachers with feedback more often and in more rigorous and systematic ways. They also note that despite these efforts, only half of teachers have apparently found the feedback they have received from their principals useful (Cherasaro, Brodersen, Reale, & Yanoski, 2016). Subsequently, this problem has led many school leaders to look for alternative forms of support for their teachers. One such strategy is utilizing peers to observe and provide teachers feedback, a practice that is utilized more frequently around the globe, underutilized in the US (OECD, 2014a, 2014b), and not yet well understood, although it does show some promise (see, for example, Ackland, 1991; Lu, 2010). Hence, the purpose of this study was to conduct a systematic literature review to determine what the extant literature currently indicates about the efficacy of such an approach, in order to inform further discourse on whether peer observation and feedback might actually be a practice of promise. They evidenced that, indeed, this is an alternative observational practice of promise (alternative to the, perhaps, overreliance on administrators to do this work), but they also evidenced that this practice still lacks sufficient evidence to warrant blanket, as opposed to informed and careful, adoption. The most salient benefit noted was increased teacher collaboration, whereby teachers purportedly benefited from the opportunity to work more closely with their peers; however, the authors of very few studies have actually observed meaningful changes in teachers’ instructional practice as a result.
Ridge and Lavigne conclude that future research needs to be conducted, especially if states adopt such approaches, in order to truly measure the effect of peer observation and feedback on teachers’ instructional practice, as well as student learning.

Seventh, and related, Sean Kelly, University of Pittsburgh, Robert Bringe, University of North Carolina-Chapel Hill, Esteban Aucejo, Arizona State University, and Jane Fruehwirth, University of North Carolina-Chapel Hill, contributed a research piece titled “Using Global Observation Protocols to Inform Research on Teaching Effectiveness and School Improvement: Strengths and Emerging Limitations.” In this piece they critique the teacher observation protocols often used to evaluate teachers, used perhaps most notably during the well-known Measures of Effective Teaching (MET) Study (Bill & Melinda Gates Foundation, 2013; see also Kane & Staiger, 2012), and used to inform instructional improvement, many of which take a “global” approach to observing and measuring teacher pedagogy and instruction in practice. Indeed, this set of scholars interrogates the set of limitations of said global protocols via this study, which may represent the most comprehensive, multi-faceted critique of such protocols to date. In contrast, they argue for the use of more newly developed, fine-grained, teacher observational systems that can be used to record and more carefully analyze the individual particulars related to effective teacher practice (e.g., utterances, questions, turns at talk, etc.). These systems, Kelly, Bringe, Aucejo, and Fruehwirth argue, seem to offer states’ teacher evaluation systems, and the policies and policy-based consequences surrounding such systems, more promise and potential. Ultimately, they argue, using global observation protocols in some cases can be interpreted as positive; for example, when principals report relying on such data when making hiring decisions. Yet, and especially from a purely measurement standpoint, the limitations surrounding these global protocols that they outline in this study are severe and multifaceted.
Hence, genuine alternatives to global protocols, including methods also relying on the latest technology in automated methods of observation, should be pursued.

Eighth, Timothy Ford, University of Oklahoma, and Kimberly Kappler Hewitt, University of North Carolina-Greensboro, offer a piece, “Better Integrating Summative and Formative Goals in the Design of Next Generation Teacher Evaluation Systems.” In this article, they explore how the two main purposes of teacher evaluation, professional growth/improvement (formative) and accountability/goal accomplishment (summative), are often at odds with one another. Hence, they argue that the challenge of the next generation of teacher evaluation systems will be to better integrate these two purposes in policy and practice. Correspondingly, they integrate frameworks of self-determination theory (SDT; see, for example, Ford, 2018; Ryan & Brown, 2005) and Stronge’s Improvement-Oriented Model for Performance Evaluation (Stronge, 1995) to critically examine teacher evaluation policy in Hawaii and Washington, DC, two distinctly different approaches to teacher evaluation, to identify a set of policy recommendations for improving the design and implementation of teacher evaluation policies moving forward. What they found, among multiple other findings, were inequitable power relationships at the levels of both policy and practice that influence how evaluation feedback is received and used. What they called “lop-sided power dynamics” seem to stifle two-way, meaningful communication and change; hence, one primary goal surrounding both policy and practice should be to work to reduce power inequities and re-center teachers as key actors in any teacher evaluation system. This, Ford and Kappler Hewitt argue, will help to ensure that feedback gets used, not just for effectiveness judgments, but also for actually improving teaching, especially if coupled with peer support, intensive coaching, and successful modeling.
Structured autonomy, clarity of expectations, and self-determined action within evaluation systems, they also argue, promote use of feedback for growth (see also a set of six, more specific recommendations for policy and practice in the full paper). While Ford and Kappler Hewitt recognize and make explicit that there is considerable tension between making evaluation personally meaningful while maintaining systems that also allow for at least some inter- or intra-teacher comparisons, they acknowledge that such comparisons should not be normative, but rather criterion-based as per sets of high professional standards.

Ninth, and finally, Mark Paige, University of Massachusetts-Dartmouth, offers his legal perspective in “Moving Forward While Looking Back: How Can VAM Lawsuits Guide Teacher Evaluation Policy in the Age of ESSA?” He notes that immediately following Race to the Top (2011), many states and districts rushed to adopt VAMs for purposes of teacher evaluation and high-stakes employment decisions, which subsequently landed a good number of states and districts (e.g., n ≈ 15; see, for example, Education Week, 2015) in court. Drawing upon what we as a nation might learn as a result of these lawsuits, Paige provides the most important lessons for states and school districts that continue to use VAMs, or are contemplating their use, so that they might use them in much wiser, more informed, and more defensible ways. Likewise, understanding these lessons is even more important for states and districts no longer under federal mandates post-ESSA (2016), so that they might avoid lawsuits themselves, especially if their policies are prone to attaching high-stakes consequences to VAM-based teacher evaluation output. In addition, even though evidence suggests that the use (and abuse) of VAMs is declining across states (see, for example, Close, Amrein-Beardsley, & Collins, 2020; Ross & Walsh, 2019), several states do still require or permit them, making their continued assessments, especially in terms of the law, relevant. For example, across the cases reviewed in this piece, Paige notes that plaintiffs were generally unsuccessful on theories arising under the substantive due process clause of the Fourteenth Amendment of the U.S. Constitution. However, in at least one federal case, Houston Federation of Teachers v. Houston Independent School District (2017), plaintiffs succeeded in their challenges to VAMs as based on the procedural due process clause of the Fourteenth Amendment.
Likewise, a court upheld a challenge to the use of VAMs based on state law. Notwithstanding, Paige concludes this paper with several recommendations that caution against the use of VAMs, again, especially for high-stakes decision-making purposes. While other factors must enter such deliberations, including potential vulnerability to claims under procedural due process, state law, or collective bargaining, quite apart from assessing the legal liability associated with using VAMs, districts must consider the “costs” of continued use of VAMs that include some of the following: the acrimony created by the use of VAMs, a district’s capacity to effectively implement and provide actionable feedback based on VAMs, and the costs of defending (in court) their continued use, if needed.


A close read of these nine articles reveals the tensions still ongoing, really regardless of the passage of ESSA (2016), primarily between policy and research communities, surrounding the evaluation of teacher effectiveness and quality. This is notably evidenced in the commentary authored by Holloway, who notes that these tensions are fundamentally and firmly rooted in epistemological and ontological views that our nation’s (and other nations’) problems and solutions can be understood through “datafication.” For starters, given the freedom we have been afforded by ESSA (2016), Holloway argues, we must consistently question how such limited ways of thinking actually constrain our capacity to imagine new and more innovative solutions to said problems. King and Paufler offer in their commentary one such solution; that is, to excavate the theoretical underpinnings surrounding current teacher evaluation systems in order to better consider how they might better intersect with social learning theory, so as to better support the ideas of professional learning more broadly, also via stakeholder perspectives and participation within and across social and professional boundaries (e.g., via CoPs).

Similar, albeit more pragmatic, tensions are evidenced in the pieces by Braun and Youngs and Ford and Kappler Hewitt. Braun and Youngs evidenced how special educators experienced conflict and dissonance, in comparison to their general education peers, when being evaluated using a teacher evaluation system initially developed to be uniform across teachers. Policy implications here include but are not limited to the development and implementation of teacher evaluation policies that acknowledge how teacher roles differ not only by subject area, but also by and as situated within various classroom-, school-, district-, and community-based contexts. The purported need for uniformity may not, in fact, be all that necessary, especially when the goals of a teacher evaluation system might be to support teachers, in order to better support all types of students in all types of learning. Related, Ford and Kappler Hewitt make explicit that there is considerable tension between making evaluation personally meaningful, especially as situated within the two main purposes of teacher evaluation, professional growth/improvement (formative) and accountability/goal accomplishment (summative), both of which are often at odds with each other. Hence, they argue that the challenge of the next generation of teacher evaluation systems will be to better integrate these two purposes into both policy and practice, with emphases on offering solutions to ensure that teachers are not pitted against one another, and that they also receive better feedback that can be more easily accessed, understood, internalized, and then used in order to actually improve teaching.

At a larger scale, tensions are noted in the empirical piece authored by Close, Collins, and myself, in terms of how the state-level changes observed post-ESSA (2016) might be interpreted as progressive; although, some states are still very much grappling with adopting and applying such changes. This is true, we argue, likely given the substantial financial and human resources invested in states’ post-Race to the Top (2011) teacher evaluation systems and the residual effects of these systems. Put differently, even though the US is four years past the passage of ESSA (2016), the sweeping reforms called for and incentivized via Race to the Top (2011) are not shifting as rapidly as one might have thought, especially given the enthusiasm that followed after ESSA (2016) was passed (see, for example, Strauss, 2016). Notwithstanding, change is obvious, as is another set of tensions arising as changes take place (e.g., state leaders facing difficulties when trying to support states’ districts’ now more varied teacher evaluation systems). Associated, in his contribution, Malloy explains how and why such changes may not quickly (or as quickly as possibly anticipated) be enacted, with a case in point coming from Wisconsin. Malloy, more specifically, explores why Wisconsin leaders have struggled to elicit full engagement from educators, despite most educators favoring the switch from their state’s former and relatively punitive accountability-based logic. He ultimately argues that delays can be attributed to the fact that Wisconsin’s change in theory was not matched by the radical restructuring for which said theory called. Again, and to their credit, however, Wisconsin seems aware of this and is continuing to move forward with a new and improved teacher evaluation system.

One practice of promise that states like Wisconsin might consider is presented in the systematic literature review offered by Ridge and Lavigne. In sum, they evidenced that developing and implementing peer observation and feedback systems, all the while studying them, may offer a sound alternative to the more traditional teacher observational practices of the past, whereby administrators do this work and, apparently and in general, do not do it very well. Conversely, it is becoming increasingly apparent that teachers engaged with peer observation and feedback systems purportedly benefit from the increased opportunities to work more closely with one another that such observational approaches offer, although few studies have thus far documented significant changes in teachers’ actual instructional practices as a result. Hence, while Ridge and Lavigne do offer a practice of promise in this piece, they note that states might move forward with care and concern about the intended (and unintended) effects that might result if peer observation and feedback systems are developed and implemented. Relatedly, Kelly, Bringe, Aucejo, and Fruehwirth, after offering another thorough and thoughtful critique of traditional observation systems (or “global” observational protocols), argue for the use of more newly developed and fine-grained teacher observational systems that can be used to record and more carefully analyze the nuanced and individual particulars of effective teacher practice, and that rely on the latest technologies in automated methods of observation. These systems, they posit, will also offer states’ teacher evaluation systems more promise and potential in terms of actually supporting teachers with better feedback, which would likely lead to more internalization and effective use.

While not necessarily a practice of promise, a set of policy recommendations comes from the final piece in this special issue. This piece, authored by Paige, I would interpret as perhaps most important for states still using or contemplating using VAMs in their post-ESSA (2016) teacher evaluation policies, systems, and plans. Drawing from the approximately 15 lawsuits that came about as a result of states’ adoptions and implementations of high-stakes teacher evaluation policies, as primarily (or solely) based on VAM-based teacher evaluation output (Education Week, 2015), Paige provides us with the most important lessons for states and school districts to use to move their teacher evaluation systems forward not only in wise and informed ways, but also in more legally defensible ways, especially so as to keep them out of court. In terms of policy implications, in other words, I would interpret this set of law-based recommendations as critical.

Otherwise, it is in this context that these theoretical and empirical papers are presented to readers, individually and collectively, as these papers stand to “add” much “value” to our current thought, with implications for both practice and policy, in and of themselves. These pieces not only contribute to the literature regarding teacher evaluation systems and the federal and state educational policies that surround them; they also contribute to our collective thought about how policymakers, their affiliates, and others might think in more forward-looking and innovative ways when moving (or attempting to move) their teacher evaluation systems and measures forward so as to, ultimately, help teachers improve upon their practice and help students learn and achieve more, and more in terms of what actually matters.


Audrey Amrein-Beardsley
Arizona State University

