Beyond “Best Practices”: Centering Equity in Teacher Preparation Evaluation

: Over the last decade, there have been multiple recommendations for evaluating, assessing


Centering Equity in Teacher Preparation Evaluation
Teacher education has been a highly scrutinized and contested enterprise since its emergence in the mid-19th century, and there have been continuous cycles of critique and reform for decades. However, during the last decades of the previous century and the early decades of the current one, the deregulation of teacher preparation providers (Brewer & Lubienski, 2019), the push for teacher preparation practice based on scientific research (National Research Council, 2001), the emphasis on accountability (Cochran-Smith et al., 2018), and unprecedented attention to teacher quality internationally (OECD, 2005; World Bank, 2010) have brought many new demands and new critiques regarding teacher preparation. Influenced by this context and as a result of a congressionally mandated study, the National Research Council (NRC) published Preparing Teachers: Building Evidence for Sound Policy in 2010, a report that was widely disseminated and influential in the policy world, prompting many proposals from various organizations about how teacher preparation evaluation should be carried out. The NRC report was intended to respond to policymakers' demands to know the extent to which the characteristics, practices, and policies that typified teacher preparation in the United States were or were not consistent with scientific evidence. The report reached three key conclusions: there was enormous variation both between and within differing pathways into teaching, rather than one clearly superior route; causal evidence linking characteristics of teacher candidates and/or preparation programs with student achievement or other outcomes was complex and very difficult to develop; and there was a need for a comprehensive data collection system in the United States that would support quality control and accountability in teacher preparation.
Since 2010, there have been multiple reports and other documents that make policy recommendations and/or specify the "best ways" to evaluate, assess, and hold teacher preparation accountable and/or stipulate how evaluation information should be used to improve teacher preparation. The original version of this article was part of the Evaluating and Improving Teacher Preparation Programs project launched by the National Academy of Education with the support of a major grant from the Gates Foundation. As the authors of this article, we were charged with preparing an analysis of recent work regarding "best practices for evaluating teacher preparation programs" by synthesizing and critiquing major policy reports and policy proposals explicitly focused on teacher preparation evaluation. To fulfill this charge, we reviewed 19 major policy reports or proposals about teacher preparation evaluation, assessment, or accountability, published between 2010 and 2020. Our analysis revealed that the primary goal of the majority of existing reports was identifying strengths and weaknesses of evaluation metrics based on rigorous criteria for accuracy and utility. Our analysis also revealed that the majority of reports did not position equity as a central goal of evaluation and actually said very little about equity explicitly, although some assumed that equity was a by-product of rigorous evaluation systems.
Building on our analysis of the 19 reports and in light of continuing inequities in educational resources, outcomes, and experiences for minoritized groups, we call for an equity-centered approach to teacher preparation evaluation that acknowledges the serious inequities in educational opportunity and attainment across groups in the United States as well as the important role teacher preparation evaluation can play as part of larger efforts to overcome disparities in opportunity and attainment. We argue that strong equity, which we elaborate below, should be established as an explicit goal and a desired outcome of teacher preparation evaluation, and that it should be central to the design, interpretation, uses, and consequences of evaluation.

Teacher Preparation Evaluation: A Complex Landscape
In the United States, the teacher preparation evaluation context is particularly complex. Lack of consensus about the value of teacher preparation coupled with market-based responses to the perceived pressures of the global economy (Ambrosio, 2013; Scott, 2016) have combined with other forces over the last three decades to produce a crowded, rapidly changing, and fragmented teacher education field (Lincove et al., 2015) characterized by competing reform agendas (Cochran-Smith, 2001; Zeichner, 2017). Within this larger context, by the early 2000s, there was widespread attention to teacher preparation evaluation and accountability from both within and outside the field. In fact, accountability was regarded by many policy actors as a key mechanism for "fixing" teacher preparation, which was characterized as a "broken" system (Duncan, 2009; U.S. Department of Education, 2002). Understanding the landscape of teacher preparation evaluation involves sorting out the roles of state and federal agencies, philanthropies, independent advocacy organizations, and professional organizations.

Federal and State Roles in Teacher Preparation Evaluation
In 2009, federal Race to the Top legislation was passed in the United States, which was intended to promote innovation and reform in K-12 education by adopting standards and assessments related to college and career readiness, building state data systems to measure students' achievement and improve instruction, providing effective teachers and leaders where most needed, and turning around low-achieving schools. Related to this effort, the Obama administration issued a bold new blueprint for the reform of teacher preparation, titled Our Future, Our Teachers (U.S. Department of Education, 2011). This report was consistent with the general shift away from the treatment of teacher preparation as a primarily local- and state-level concern and toward the treatment of teacher preparation as a federal- and state-level "policy problem" (Cochran-Smith, 2005), a shift that had been occurring since the mid-1980s (Bales, 2006). Our Future, Our Teachers was completely consistent with the Obama administration's education reform agenda, which, building on the efforts of the previous administration, relied on market competition to elevate "good" programs and drive out "bad" programs (Au, 2016; Scott, 2016; Taubman, 2009). The blueprint aimed to tie federal resources to the achievement of the school students taught by graduates of particular teacher preparation pathways, thus connecting federal, state, and institutional policy levels. Although the 2011 blueprint later died in committee, many of its policies reappeared in the Title II Higher Education Act (HEA) regulations proposed in 2014 (U.S. Department of Education, 2014), which stipulated that annual reporting regulations for all teacher preparation programs in the nation would focus on outcomes, including students' achievement, graduates' job placement and retention data, and graduates' and principals' program satisfaction.
These Title II proposed regulations prompted unprecedented public and professional opposition, particularly to the idea of tying teacher preparation program evaluation to the test scores of the students of program graduates. Although the opposition extended over almost two years, the new regulations were nevertheless approved at the end of the Obama administration in late 2016, and then almost immediately rescinded at the beginning of the Trump administration in early 2017.
At the state level, over the course of the 2010s, policy makers and education agencies continued efforts to improve state approval requirements for teacher preparation, with a similar shift in many states toward outcomes-based accountability. States that were awarded Race to the Top grants were required to develop data systems linking preparation programs to K-12 student achievement, a trend followed by some other states (Von Hippel & Bellows, 2018). Additionally, in 2012, the Council of Chief State School Officers (CCSSO) created a multi-state, multi-year reform effort, the Network for Transforming Educator Preparation (NTEP), to leverage state authority over preparation program approval and licensure, with data systems a key policy lever (CCSSO, 2018).
Despite these developments, as the 2010s went on, there were challenges to state-level data systems, the withdrawal of broader federal policy levers, and growing evidence questioning the validity, reliability, stability, and utility of inferences based on value-added measures and growth modeling for the purpose of evaluating individual teachers and/or teacher preparation programs (e.g., AERA Council, 2015; Darling-Hammond, 2020; Haertel, 2013; Noell et al., 2019). At the same time, many states adopted nationally available or state-initiated performance assessments (e.g., edTPA, ETS NOTE, Massachusetts CAP) as a requirement for teacher certification and/or program approval.

Philanthropic and Advocacy Group Involvement in Teacher Preparation Evaluation
During this same time, there was continued philanthropic interest in teacher preparation evaluation. Hess (2005) called these efforts "muscular philanthropy," that is, large gifts funded by a small group of donors (e.g., Bechtel Foundation, Bill and Melinda Gates Foundation, Broad Foundation, Charles and Lynn Schusterman Foundation, New Schools Venture Fund, Walton Foundation) and tied to expectations about innovation and accountability (Colvin, 2005). For example, multiple philanthropies funded private advocacy organizations such as the controversial National Council on Teacher Quality (NCTQ), which critics excoriated because of its highly politicized report cards for teacher preparation programs (Cochran-Smith et al., 2013; Fuller, 2013; Paulson & Marchant, 2012). Another example of the increased role of philanthropies is the Bill and Melinda Gates Foundation's Teacher Preparation Transformation Centers, which funded TPI-US, an independent inspectorate, to review teacher preparation programs to highlight practices for expansion (TPI-US, 2020). The National Academy of Education project on Evaluating and Improving Teacher Preparation, noted above, was also funded by the Gates Foundation, and many of the reports analyzed in this article were funded by the philanthropies listed above. Zeichner and Peña-Sandoval (2015) have suggested that these efforts represent an "outsized role" of private interests in teacher preparation policy.

Professional Involvement in Teacher Preparation Evaluation
The 2010s also brought major shifts in national teacher preparation accreditation, reflecting a lack of consensus within the profession. In 2013, the two existing accreditors, the National Council for the Accreditation of Teacher Education (NCATE) and the Teacher Education Accreditation Council (TEAC), merged to form the Council for the Accreditation of Educator Preparation (CAEP) with the goal of presenting a unified voice and elevating the profession. Building on federal HEA Title II reporting regulations, CAEP's standards required preparation programs to demonstrate candidates' and graduates' impact on K-12 student learning. There was enormous controversy surrounding these outcome standards and the candidate selectivity standards, which threatened CAEP's credibility within and outside the profession. In 2017, a new national accrediting body, the Association for Advancing Quality in Educator Preparation (AAQEP), was founded, in part, in response to critiques of CAEP. AAQEP (2020) tied accreditation to innovation, quality, and responsiveness to program context, explicitly stating that it was grounded in trust of the profession and that its standards were developed collaboratively with stakeholders.
While there was disagreement within the profession about accreditation, there was general convergence about the importance of clinical practice (AACTE, 2018; NCATE Blue Ribbon Commission, 2010). Many preparation programs implemented some version of practice-based teacher education, such as clinically rich preparation, teacher residency programs, and/or emphasis on "core" practices. Along these lines, there was increased attention to the development of measures linking clinical experience and teaching practice (e.g., performance assessments), the quality of program and K-12 school partnerships, and the "effectiveness" of cooperating teachers and field-based teacher educators (e.g., Darling-Hammond, 2020; Goldhaber, 2019; Ronfeldt et al., 2018).

The Role of Equity Agendas in Teacher Preparation Evaluation
During the decade of the 2010s, there were also many efforts by preparation programs and professional collaborations to make equity and justice more central in teacher preparation programs. There were also excoriating critiques of racial injustice within teacher education itself and of its general failure to acknowledge and respond to its own history of White supremacy (e.g., Anderson, 2019; Brown, 2013; Daniels & Varghese, 2020; Milner et al., 2013; Philip et al., 2018; Salazar, 2013; Sleeter, 2017). These criticisms built on a long history of critique by scholars who had advocated over many years for teacher education to address head-on issues of culture, race, social justice, equity, and the values of minoritized groups in curriculum, fieldwork, policy, and practice (e.g., Cochran-Smith, 1995; Grant, 2008; King, 2008; Ladson-Billings, 1999; Nieto, 2010; Sleeter, 2001, 2009; Villegas & Lucas, 2004; Zeichner, 2003, 2009). Despite historical and contemporary critiques, however, as we elaborate in later sections of this article, during teacher education's "era of accountability" from roughly 1998-2018 (Cochran-Smith et al., 2018), there was little explicit attention to equity as a goal of major evaluation systems.

Methods and Analytic Framework
In this article, we analyze and critique major policy recommendations, reports, and critiques published between 2010 and 2020, whose explicit topic is the nature, characteristics, and/or strengths and limitations of teacher education evaluation/assessment/accountability systems, tools, or initiatives in the United States. It is important to note here that the analysis we offer in this article is not a comprehensive, systematic review of the literature about all aspects of teacher preparation evaluation, broadly construed. Rather, the focus is limited to widely disseminated reports, policy briefs, books, or other policy documents published by professional organizations, governmental bodies, policy centers, or major academic publishers and explicitly focused on teacher preparation evaluation policy in the United States. All of these publications include proposals, critiques, recommendations, or analyses of the strengths and weaknesses of differing approaches to teacher preparation evaluation and/or of the components that evaluation policies or systems should include. To constitute this body of literature, we used the search terms "teacher education" (or "teacher preparation" or "teacher quality") and "evaluation" (or "assessment" or "accountability") to locate reports, books, policy briefs, and other documents published by relevant professional organizations or by peer-reviewed academic presses. We also received recommendations for reports from the senior scholars connected to the National Academy of Education's Evaluating and Improving Teacher Preparation Programs project. Based on this search process, we identified 19 reports and other documents, which are included in the reference list in the Appendix.
It is also worth mentioning here what we did not include in our analysis. We did not include reports focused on teacher evaluation rather than teacher preparation evaluation, reports about educational evaluation in general, or studies of international teacher quality systems. We also did not search specifically for literature about equity and teacher preparation, although this is a rich and burgeoning literature that has evolved for more than 50 years. There are indeed many articles by individual authors or groups that discuss teacher preparation programs' efforts to include equity in evaluation or present new tools or assessments that may be used in teacher preparation to focus on equity. In short, the initial aim of this article was not to focus on equity and teacher education or even equity and evaluation; rather, the equity focus (or, more accurately, the lack of attention to equity) emerged from our critical analysis of the emphases and omissions of major policy proposals regarding teacher preparation evaluation/assessment/accountability. Finally, we want to emphasize that the point of this review was not to search out a range of voices, perspectives, and sources, but rather to identify the major policy reports, as described above, and to unpack the perspectives, purposes, and aims explicit or implicit in these. In our recommendations at the conclusion of this article, we acknowledge and call for a wide range of voices and perspectives in teacher preparation evaluation tools, approaches, and systems. Below we discuss the analytic framework that guided our review of the reports and our positionality as the authors.

Analytic Framework: Theories of Evaluation
Over the last two decades, many professional organizations, philanthropies, consultants, advocacy organizations, and academic groups have made recommendations about how evaluation should be done and how accountability systems should operate in teacher preparation. These organizations work from different assumptions about the purposes of teacher preparation and evaluation. They also disagree about the best measures to use, who should be included, and what the roles and relationships of stakeholders should be.
We organized the 19 reports according to their underlying theories or models of evaluation, drawing on well-known frameworks for describing the history and landscape of the cross-disciplinary field of program evaluation. In seminal work in this area, Alkin and Christie (2004) identified three approaches or models of evaluation, which they labeled methods, use, and valuing. Building on Alkin and Christie's work, Mertens and Wilson (2012, 2019) suggested four paradigms of evaluation that roughly mapped onto, but also extended, Alkin and Christie's models, which they labeled postpositivist, pragmatic, constructivist, and transformative. We drew on these frameworks to identify three approaches to teacher preparation evaluation. It is important to note here that these categories were developed to facilitate analysis across dozens of evaluation theorists' espoused prescriptions of how evaluation should be practiced. Particular evaluations and evaluators may, and often do, incorporate assumptions from multiple models, blending and adapting approaches in practice rather than adhering to a particular approach. However, when considering trends across evaluations, as we do in this article, this framework is a useful organizing heuristic.

Postpositivist, Methods-focused Approaches to Evaluation
According to Alkin's (2004) seminal volume, the major purpose of evaluation is to assess the degree to which programs are accountable for their actions and use of resources, coupled with the public desire for systematic methods of accountability consistent with the conventions of social inquiry. As Alkin and Christie point out, Cook and Campbell (1979) were central in defining this perspective on evaluation as research, which depends on the application of rigorous research methods to produce generalizable findings. This perspective is consistent with Mertens and Wilson's (2012, 2019) "postpositivist paradigm" in evaluation, which recognizes that although knowledge is not infallible, it is possible to produce warranted generalizations about human organizations by applying the norms of scientific research (Phillips & Burbules, 2000). Postpositivist, methods-focused approaches to evaluation assume an objective relationship between researchers and those being researched and assume that valid scientific methods should be used to produce justifiable conclusions.

Pragmatic, Use-oriented Approaches to Evaluation
Alkin and Christie's (2004) second approach focuses on use rather than methods, that is, how the knowledge produced through evaluation can be used by stakeholders in program decisions. This approach was prompted by dissatisfaction with methods-focused evaluation research that failed to impact policymaking or practice (Weiss, 1998). Patton (2008) characterized this approach as part of the "utilization turn" in evaluation with an emphasis on "intended uses by intended users." Here, the goal is to design evaluations that produce knowledge to inform the decisions practitioners and others must make. This model of evaluation is consistent with Mertens and Wilson's (2012, 2019) "pragmatic" paradigm in evaluation, which values the impact of evidence as much as the scientific rigor through which evidence was developed. The use-oriented approach works from a utilitarian stance, assuming that the worth of an evaluation lies not simply in the rigor of its methods, but rather in its consequences, that is, whether an evaluation "works" to support certain kinds of improvements. With pragmatic, use-oriented approaches, no particular research method is necessarily privileged; rather, methods are designed to match purpose and use, and evaluators make choices about what to study based on their knowledge and relationships with stakeholders.

Transformative, Equity-centered Approaches to Evaluation
A third approach within the field of evaluation emphasizes evaluation as a constructivist process (Mertens & Wilson, 2012, 2019) wherein evaluators make judgments by valuing particular goals intended to serve the public interest (Alkin & Christie, 2004; Christie & Alkin, 2008). Building on this work, evaluation theorists make a distinction between general values-centered approaches to evaluation (Alkin & Christie, 2004; Christie & Alkin, 2008) and explicitly justice- or equity-centered approaches, which are "transformative" (Mertens & Wilson, 2012; Mertens & Zimmerman, 2015; Thomas & Campbell, 2020). Transformative approaches often utilize dialogic qualitative methods, ethically centered in cultural respect, human rights, and reciprocity. Here, the idea is that evaluation is inherently a valuing (and political) activity with the potential for political influence and that evaluators should guard against power imbalances by considering whose interests are served and whose voices are included (Greene, 2006; House & Howe, 2000). Along related lines, "culturally responsive evaluation" centers evaluation in culture and cultural competence, rejecting the idea that evaluation is culture-free (Haugen & Chouinard, 2019; Hood et al., 2015). Culturally responsive evaluation seeks "to bring balance and equity into the evaluation process" (Hood et al., 2015, p. 283) by recognizing unequal resources and drawing on the lived experiences of marginalized groups (Thomas & Campbell, 2020).
Making issues of equity front and center in evaluation is a growing agenda in program evaluation, a position supported by some funders and philanthropies (e.g., Farrow & Morrison, 2019; Wiggins & Sileo, 2020). Along these lines, a framing paper from the Center for Evaluation Innovation (2017) argues that evaluation itself should be "conceptualized, implemented, and utilized in a manner that promotes equity." As a number of evaluation researchers (Andrews et al., 2019; Gates, 2017; Schwandt & Gates, 2016) have suggested, equity-centered evaluation raises normative questions about objectivity, methods of evaluation, rigor, evaluators as agents of change, and professional responsibility. These questions are definitely not settled in the field of evaluation. To the contrary, these questions and their entanglement with highly politicized issues related to racism and racial justice are currently a point of contention within the field (Hall, 2018).

Researchers' Perspectives/Positionality
As co-authors of this article, we have substantial histories in the field of teacher education. The first author is a university-based teacher education scholar and practitioner who has written extensively about justice and equity and who has studied accountability in teacher education over the last 20 years. The second author has worked on social justice-oriented policy and practice in teacher education for the past decade. Like some of the scholars reviewed above, we work from the assumption that no approach to teacher preparation evaluation is objective, apolitical, or innocent of questions about whose interests are served, whose perspectives are represented, and whose voices are included in evaluation.
Fully recognizing that values are inherent in any approach to teacher preparation evaluation, however, we do not take a relativist stance in this article, simply describing variations in recommendations. Rather, we aim to take a stand, following Greene (2006) and others (Farrow & Morrison, 2019; Gates, 2020; Haugen & Chouinard, 2019; House & Howe, 2000; Mertens & Zimmerman, 2015; Schwandt & Gates, 2016; Thomas & Campbell, 2020), who have argued both that the most defensible values in evaluation are justice, equity, and empowerment and that it is critical to understand how power is taken up in the practice of evaluation. In particular, in this review, we raise questions about the presence, absence, and meanings of equity and justice in teacher preparation evaluation. Our analysis is grounded in the premise that the work and lives of students, teachers, community members, and evaluators are mediated by long-standing, intersecting systems of inequality.

Analyzing "Best Practices" for Teacher Preparation Evaluation
In addition to variations in their conclusions and recommendations, the 19 reports we reviewed differed in format, length, organization, scope, audience, sponsoring agencies, and the larger policy or political agendas to which they were attached. To synthesize and critique these reports, we first grouped them into the three categories introduced above, based on their underlying assumptions and theories of action related to evaluation. In Figure 1, within each of three categories, the reports are organized chronologically and by organization or lead author.

Postpositivist, Methods-focused Approaches to Teacher Preparation Evaluation
As noted above, the 2010 NRC report on teacher preparation and scientific research was seminal. Many of the reports over the next decade, particularly those in our first and second categories, were responses to the NRC call and to the broader policy and political milieu out of which it emerged. Figure 2 summarizes the reports in the first group.
In many countries over the last two decades, accountability has come to be regarded as a powerful policy tool for improving teacher preparation. In the United States, the logic of the accountability approach, as reflected in the reports in this first group, is captured in this string of claims: holding teacher education accountable boosts the quality of preparation; boosting the quality of preparation increases the level of teacher quality, especially in students' achievement; and, higher levels of achievement ensure the economic health of individuals and the nation. The key accountability assumption here is that teacher education can be "fixed" through rigorous public evaluation of the inputs, processes, and outcomes of preparation programs.

Purpose and Values
Most of the reports in this first category were produced in the midst of contentious debates about how teacher preparation should be held accountable to the public and the profession. These reports aim to make evidence-based recommendations for improving or overhauling the state, federal, and professional evaluation systems that govern teacher preparation. By labeling the theory of evaluation underlying the reports in this group "postpositivist, methods-focused approaches to teacher education evaluation," we emphasize that these reports zeroed in on the preferred metrics of evaluation systems. The core principles of this approach are objectivity and rigor, along with the belief that policymakers and practitioners are responsible for making evidence-based decisions. Here, the assumption is that teacher preparation quality will be improved when programs are held accountable for outcomes, with severe sanctions for failing to achieve them. It is important to note that although the reports in the postpositivist category adhere to the principles of rigor and objectivity, they are not devoid of values. Their key underlying value is that teacher quality, defined as a uniformly effective work force, must be provided to all students in all schools.
In the early 2010s, the Center for American Progress published a trio of reports by Crowe (2010, 2011a), an independent advisor on teacher quality for public and private agencies, which outlined a federal model for creating "real" accountability in teacher preparation. Crowe asserted that every state evaluation system should have four assessments: measures of teacher effectiveness (e.g., value-added assessments linking preparation data to teacher and student achievement data), feedback from graduates and employers, tests of teacher knowledge and skill, and measures of teacher retention. Two years later, the Education Trust, a national non-profit organization promoting student achievement, released a report (Almy et al., 2013) recommending that, under threat of removal of federal fund eligibility, the Higher Education Act should require all states to hold preparation programs accountable for the performance of teachers using statewide measures of impact along with employment, retention, and program selectivity data.

Figure 2. Post-positivist, Methods-focused Approaches to Teacher Preparation Evaluation, Assessment, and Accountability (arranged chronologically and by author/group)
In 2014, Teacher Preparation Analytics (TPA), a company aimed at developing high-leverage strategies to strengthen teacher preparation, released a framework for teacher preparation assessment (Allen et al., 2014). The TPA report, commissioned by CAEP, proposed a set of "key effectiveness indicators" regarding candidate selection, teaching knowledge and skills, classroom performance, and alignment with state needs, to be in place by 2020. Along related but different lines, the report of the American Psychological Association (APA) Task Force on Assessing and Evaluating Teacher Preparation Programs (Worrell et al., 2014), composed primarily of education school-affiliated psychologists, provided empirical support for several of CAEP's controversial standards. The APA report asserted that all program assessments should be valid and reliable, thus allowing users to make comparisons on an "even playing field." The 2016 report of Deans for Impact, an organization of education school deans supporting the teacher effectiveness agenda, advocated state-level evaluation systems that produced "actionable data" (p. 2). The deans' group argued that it was precisely the lack of "valid, reliable, timely, and comparable data about the effectiveness of the teachers and school leaders they prepare" (p. 2) that had plagued preparation programs for years and prevented them from moving from "chaos" to "data coherency" (p. 3).
Finally, Carinci et al. (2020), education researchers interested in accountability practice and policy, edited a volume in Information Age Publishing's series on assessment in educator preparation. The volume focused on data-driven accountability linking program design to outcomes. This volume was produced after the controversies about Title II HEA regulations and thus was not intended to influence those debates. Nevertheless, the volume was intended to drive continuous improvements in practice by using empirical research to open the "black box" between teacher preparation and outcomes.

"Best Practice"
The reports in this first category assumed that the validity of assessments along with their uniform implementation in state and professional evaluation systems were key to improving teacher preparation programs and the teachers they produced. The reports conceptualized evaluation "best practice" in two ways: (1) endorsement of particular methods or statistical approaches based on scientific evidence; and, (2) identification of exemplary state evaluation systems.
With the exception of the Carinci volume, the reports in the first group emphasized that the most important aspect of teacher preparation evaluation (arguably, the "best" of the "best practices") was state-wide use of valid measures of new teachers' impact on student learning linked to information about the programs that prepared those teachers. Although the reports issued by Education Trust (2013) and Deans for Impact (2016) did not specify particular measures of teacher effectiveness, Crowe (2010, 2011a) and both the TPA (Allen et al., 2014) and the APA (Worrell et al., 2014) reports recommended value-added methods, suggesting that problems involved in using these for teacher and program evaluation could be overcome. This conclusion has not been supported by measurement experts (AERA Council, 2015; Braun, 2005; Easton, 2008; Economic Policy Institute, 2010; Haertel, 2013). In addition, researchers have concluded that these systems generally do not provide information about how programs might improve (Goldhaber, 2013; Plecki et al., 2012).
Other "best" measures recommended by the reports in this first category include standardized protocols for classroom observations and interactions with a direct impact on student learning. Here, the APA report (Worrell et al., 2014) specified the CLASS observation instrument (Pianta & Hamre, 2009) and observation protocols identified by the MET project (2012).
Some of the reports made sweeping recommendations regarding "best practices" for state evaluation systems. As noted above, Crowe (2010, 2011a, 2011b) proclaimed that all states should adopt new accountability systems to improve outcomes, pointing to Tennessee and Delaware (first round Race to the Top fund recipients) and Florida and Louisiana (second and third round recipients) as promising. Based on data available in their members' states, Deans for Impact (2016) called for a policy agenda to provide teacher effectiveness data in all states and create a new outcomes-focused certification process that elevated effectiveness-centered programs. The Education Trust report (Almy et al., 2013) called for redesigned HEA state reporting requirements with performance measures tied to federal funding. The TPA report (Allen et al., 2014) reviewed 15 states according to their proposed effectiveness indicators, concluding that implementing these indicators was beyond the current efforts of the states.
Although different, the reports in this first group were remarkably consistent in purpose and specific recommendations regarding teacher preparation evaluation. They emphasized externally driven, outcomes-based evaluation systems featuring valid assessments of the impact of new teachers on student learning, standardized classroom observation protocols, satisfaction surveys, and employment information. They assumed that implementing comprehensive external evaluation systems and making data public would dramatically improve teacher preparation by identifying strong programs and forcing weak programs to improve or exit the field.

Pragmatic, Use-oriented Approaches to Teacher Preparation Evaluation
The reports in the pragmatic, use-oriented category draw on many of the same purposes, values, and assumptions as the reports analyzed in the previous post-positivist, methods-focused group. However, the reports in the pragmatic category prioritize usability of evaluation findings by intended users and decision makers (Alkin & Christie, 2004; Mertens & Wilson, 2012), as well as alignment among evaluation purposes, use, selection of tools, and audience. Furthermore, the reports in the pragmatic category emphasize the production of trustworthy evidence of interest to specific audiences (e.g., policy makers, professional organizations, preparation programs). Figure 3 lists the reports in this second group.
Almost all the reports in this category were published in the first half of the 2010s in response to debates about federal and state regulations, professional standards, and the growing number of philanthropies and independent organizations involved in teacher preparation evaluation. Some of these reports responded explicitly to the Obama administration's proposed reform plan (Our Future, Our Teachers) or to proposed revisions to Title II regulations or Race to the Top funding requirements.

Purposes and Values
Like reports in the first category, the reports in the pragmatic, use-oriented category offered evidence-based recommendations for "new and better" systems of teacher preparation evaluation, accountability, and assessment. Exploring multiple measures and sources of evidence, these reports described how stakeholders (including federal or state policy makers and representatives of accrediting agencies, independent non-governmental organizations, and preparation programs) could utilize evaluation to improve teacher preparation, teacher quality, and K-12 student learning. These reports also zeroed in on alignment across intended purpose, use, values, audience, measures, and stakeholders as a key aspect of effective evaluation.
For example, in a policy brief released by the National Comprehensive Center for Teacher Quality (NCCTQ, not to be confused with NCTQ), a collaborative effort between the Education Commission of the States, ETS, Learning Points Associates, and Vanderbilt University, Coggshall et al. (2012) called for "rethinking" teacher preparation accountability through a more "results-oriented approach" (p. 2) at the state level. The NCCTQ report called for additional "research and capacity building…to bridge the divide between current data and evaluation capacity, and what is needed for accountability, program improvement, and equity" (p. 34). Despite the fact that equity was mentioned as a purpose of evaluation, NCCTQ's attention to equity was limited to identifying programs that prepared high quality teachers for "high-need schools" and "traditionally underserved populations," with the assumption that redistribution of teachers would address disparities in schooling and society, a point to which we return below.
At about the same time, a report from the Council of Chief State School Officers (CCSSO) Task Force on Educator Preparation and Entry into the Profession (2012) offered guidance to state education agencies and policy makers for transforming the profession through state policy levers, including teacher licensure, program approval, and data reporting. The CCSSO Task Force, composed of state education leaders and policy makers, aimed to support program accountability through rigorous and transparent standards and rating systems to ensure that "learner-centered" teachers helped K-12 students meet college- and career-ready standards. The Task Force recommended multiple measures for meeting standards. Toward the end of the decade, CCSSO (2017, 2018) published two follow-up reports, highlighting "leading state efforts" to transform educator preparation through NTEP. The follow-up reports described lessons from states' efforts to transform teacher preparation, primarily through collaboration among state policy makers, agencies, districts, schools, and teacher educators.
Addressing a research, policy, and teacher education practitioner audience, the NAEd report, Evaluation of teacher preparation programs: Purposes, methods, and policy options (Feuer et al., 2013) aimed to clarify the many variations in teacher preparation evaluation systems, assuming that evaluation was a "necessary ingredient" to improving teaching and learning. The NAEd report analyzed the multiple purposes of evaluation, including accountability, consumer information, and programmatic improvement, and it sorted out the many entities involved in evaluation. The report analyzed the strengths and limitations of various sources of evidence in evaluation systems, arguing that any system, set of measures, or source of evidence should be based on principles of validity that lead to defensible conclusions, so that various entities could use the results to make sound decisions.

"Best Practice"
Like the reports in the post-positivist category, the reports in this pragmatic category featured validity as the primary criterion for assessing preparation program evaluation systems. However, the pragmatic reports explicitly attended to the alignment of program evaluation systems and multiple purposes. The NCCTQ (Coggshall et al., 2012) and CCSSO reports (CCSSO Task Force, 2012; CCSSO, 2017, 2018) called for state-level evaluation systems as key levers for improving teacher preparation, teaching, and learning, while recognizing multiple stakeholders. The NAEd report (Feuer et al., 2013) analyzed purposes, methods, and policy options in program evaluation involving multiple organizations and agencies. Overall, these reports conceptualized "best practice" in teacher preparation program evaluation in terms of: (1) alignment across evaluation purposes, measures, and use; and (2) engagement and use by multiple stakeholders and groups.
In terms of alignment, the NAEd report proposed a list of guiding questions to identify the purpose of the evaluation, articulate aspects of teacher preparation that "matter most," determine sources of accurate evidence and useful information, and monitor intended and unintended consequences. Along similar lines, the NCCTQ report recommended that "states and other organizations, in collaboration with stakeholder groups, should consider the strengths and the weaknesses of the available measures and select those that will best fit the context of the evaluation" (p. 34).
Unlike the post-positivist reports, the pragmatic reports did not advocate for specific measures. Rather, these reports analyzed the strengths and weaknesses of multiple measures and cautioned against any single measure as the sole or primary source of evidence in an evaluation system. Specifically, the reports analyzed input/process measures including teacher candidate selection criteria (Coggshall et al., 2012; CCSSO, 2012, 2017, 2018; Feuer et al., 2013), course syllabi (Coggshall et al., 2012; Feuer et al., 2013), and faculty qualifications (Feuer et al., 2013). They also identified common measures tied to clinical experience, such as quality or number of hours (Coggshall et al., 2012; CCSSO, 2012; Feuer et al., 2013).
Grounded in an outcomes-based approach to evaluation, the reports in the pragmatic category analyzed many of the same output/outcomes measures as did the reports in the first group, including: licensure tests and performance assessments of teacher candidate knowledge and skills (CCSSO, 2012, 2017; Coggshall et al., 2012; Feuer et al., 2013); K-12 student achievement, including growth modeling and value-added models (CCSSO, 2017, 2018; Coggshall et al., 2012; Feuer et al., 2013); teacher evaluation and classroom observations (CCSSO, 2017, 2018; Coggshall et al., 2012; Feuer et al., 2013); employer surveys (Coggshall et al., 2012; Feuer et al., 2013); program graduate surveys (Coggshall et al., 2012; Feuer et al., 2013); hiring and placement data (Coggshall et al., 2012); and retention data (Coggshall et al., 2012). Some of these reports outlined limitations of specific measures, especially value-added approaches and growth modeling as a method for evaluating preparation programs, citing the challenges associated with these measures and calling for further research on their use (Coggshall et al., 2012; Feuer et al., 2013).
Together, the reports in the pragmatic category sorted out many actors and organizations involved in preparation program evaluation in terms of their purposes: accountability, consumer protection, and/or programmatic improvement. These included federal and state education agencies (Coggshall et al., 2012; CCSSO, 2012, 2017; Feuer et al., 2013) along with non-governmental independent organizations, media, national accreditors, and preparation programs. For example, in the CCSSO (2017) follow-up report, state initiatives deemed exemplars featured strong collaboration and engagement with state education agencies, district superintendents, school administrators and teachers, and university preparation programs. Overall, the reports in this category recognized the multiple purposes and stakeholders involved in teacher preparation evaluation and aimed to enhance the usefulness of evaluation systems based on the purposes and values of the stakeholders and audiences who would use the results of evaluation.

Transformative, Equity-centered Approaches to Teacher Preparation Evaluation
The reports we placed in the transformative, equity-centered category were published in the latter half of the 2010s, during and immediately following the highly contentious debates about federal and state regulations, national accreditation standards, report cards published by advocacy agencies, and specific measures aimed at teacher preparation evaluation and accountability. As we describe below, the reports in this category differ significantly from those in the previous two categories in terms of purposes, values, and assumptions. They fit within a transformative paradigm of evaluation, prioritizing evaluations intended to serve public purposes, such as democracy, equity, and justice, and they explicitly address issues of power and privilege (Mertens & Wilson, 2019). Figure 4 summarizes these reports.

Purposes and Values
The reports in the transformative, equity-centered category were written by teacher education researchers, practitioners, and leaders of schools of education with commitments to justice and equity. Underlying these reports were three key assumptions. First, the reports assumed that teacher preparation evaluation is fundamentally value-laden, inherently political, and attached to broader agendas. Thus the reports did not aim to be "objective" in terms of the approaches, measures, purposes, or consequences of preparation program evaluation. Second, teacher preparation and teacher quality were regarded as part of larger policy and political systems, not independent factors in educational success. Third, teacher preparation program evaluation was considered in relation to broader equity and democratic projects wherein education was viewed as a public enterprise for the common good, with the aims of facilitating deliberative and critical discourse and democratizing knowledge and participation. From this lens, a key purpose of education is challenging inequities for students, families, and communities.
Together, the reports in this category critiqued the major teacher preparation evaluation and accountability initiatives of the 2010s, unpacking underlying assumptions and assessing evidentiary support. In addition, these reports sought to reframe the "commonsense" and market-based discourses and approaches to evaluation that had become dominant since the late 1990s. In a policy brief published by the National Education Policy Center (NEPC), Kumashiro (2015), a former education dean and founding member of Education Deans for Justice and Equity (EDJE), reviewed the highly controversial federal teacher preparation reporting regulations under Title II of the HEA, explicating specific concerns and critiquing the lack of inclusive, democratic decision-making around the regulations. At about the same time, NEPC also released a brief by Cochran-Smith et al. (2016), which analyzed four teacher education accountability initiatives: the HEA Title II reporting requirements approved in 2016, CAEP accreditation standards, NCTQ teacher preparation report cards, and edTPA. Through an analysis of policy claims and evidence, the brief concluded that these accountability initiatives were based on "thin evidence" (that is, limited evidentiary support that the policies actually had the capacity to work as levers for teacher preparation improvement) and on "thin equity" (that is, they failed to account for the multiple, complex in- and out-of-school factors in addition to teacher quality that perpetuate inequity for students, families, and minoritized communities). Building on this policy brief, Cochran-Smith et al. (2018) wrote the book, Reclaiming Accountability in Teacher Education, which analyzed the emergence of teacher preparation's accountability era, proposed an 8-dimensional framework for understanding competing accountability policies, and critiqued major accountability initiatives in terms of these dimensions.
The book called for "reclaiming" accountability in teacher education based on "strong equity" and "intelligent professional responsibility," concepts to which we return below. Building on the work of Kumashiro (2015) and Cochran-Smith et al. (2016, 2018), Education Deans for Justice and Equity (EDJE, 2019a, 2019b), a nationwide alliance of 300+ current and former leaders of colleges, schools, and departments of education, released a policy brief outlining problematic trends in teacher preparation, including widespread teacher education external accountability mechanisms. The EDJE report (2019a) critiqued these reform efforts for obscuring "legacies of systemic injustices" and "focusing narrowly on student achievement, teacher accountability, rewards, and punishments" (p. 3). To address these critiques, EDJE (2019a,b) developed a comprehensive "Framework for Assessment and Transformation" to guide the work of schools of education. EDJE (2019a, 2019b) called for evaluation systems that recognized teacher preparation as part of broader systems that include governance and finance, faculty and staff, teaching and learning, and partnerships and public impact, and that lead to genuine improvement in teacher preparation.

"Best Practice"
In contrast to the reports in the first two categories, the reports in the transformative category positioned equity and democracy at the center of evaluation. These reports rejected the concept of "best practice" in teacher preparation evaluation, instead proposing accountability and evaluation frameworks that: (1) recognize power inequities across multiple individual, institutional, and ideological systems; (2) balance external and internal accountability through democratic processes; and (3) provide feedback that leads to improvement consistent with democratic processes and advancing the aims of equity.
The reports in the transformative category call for teacher preparation program evaluation, assessment, and/or accountability systems that acknowledge the broader systems and structures that perpetuate inequities and injustices. The EDJE (2019a) report argued that, "teacher education should be guided by a deep understanding of the roles of schools and universities within a larger society" (p. 6). EDJE offered a framework with thirteen priority work areas grouped in four categories to identify power structures and systems that perpetuate inequity in order to dismantle them. Along similar lines, in their framework for democratic accountability in teacher preparation, Cochran-Smith et al. (2018) called for accountability and evaluation systems that recognize and challenge systems, structures, and processes that perpetuate inequities as they relate to teaching and teacher education. This involves reframing and expanding measures, approaches, and processes that are part of teacher preparation program evaluation systems.
The reports in the transformative category also argued for systems of teacher preparation evaluation that balance external and internal accountability. Cochran-Smith et al. (2018) called for "intelligent professional responsibility" in teacher preparation, linking the concept of "intelligent accountability" (O'Neill, 2013) with the argument that external accountability structures should create the conditions for capacity-building and collaboration among multiple stakeholders, leading to strong internal accountability and professional trust through joint decision-making and participation (Fullan et al., 2015). Along similar lines, the EDJE reports (2019a,b) called for teacher preparation program assessment that is accountable to, and works in solidarity with, families and communities.
The reports in the transformative, equity-centered category differ from both the reports in the post-positivist, methods-focused category, which call for explicit or single measures, and from the reports in the pragmatic, use-oriented category, which do not prioritize particular aims of evaluation, but rather acknowledge multiple purposes depending on users. In contrast, the reports in the transformative category work from the explicit goal of centering equity, using multiple measures tailored to local contexts. As Cochran-Smith et al. (2018) argue, this kind of evaluation and accountability "does not assume that all teacher education programs would meet the same goals or use the same assessments, but it does assume that all teacher preparation programs would be responsible for preparing teachers to identify and challenge inequities in school and society and prepare their students to live and work in a democratic society" (p. 169). For the reports in the transformative category, the trustworthiness of evaluation measures is determined by multiple factors, including the extent to which the measures address issues of power and privilege and the extent to which they authentically represent the voices of those with a genuine stake in teacher preparation as part of the evaluation process. Together, the reports in the transformative, equity-centered category aim to reframe teacher preparation evaluation away from a market-oriented, external accountability paradigm and toward an equity-centered, democratic system based on strong equity and intelligent professional responsibility.

Cross-cutting Comments: Teacher Education Evaluation/Accountability Reports
There is certainly no lack of interest in teacher education evaluation and accountability, and, as the above discussion indicates, many groups have weighed in on this topic. Below we highlight some cross-cutting similarities and differences.

"Best Practice" and the Logic of Accountability
The reports in the postpositivist and pragmatic categories share the general premise that teacher preparation evaluation has three purposes: accountability, consumer/public information, and program improvement. For most of the reports in the postpositivist and pragmatic categories, the key to all of these is the use of assessments that yield valid inferences. For the reports in the postpositivist category, assessments are intended to be coupled with state-level data systems linking program data with data on student achievement, teacher performance, program satisfaction, and/or retention. Thus, for this category, "best practice" in evaluation is defined as the widespread implementation of validated, standardized, and uniform measures that assess programs' and graduates' effectiveness consistent with top-down approaches to reform. Although the pragmatic reports also identify validity as the key to selecting measures, they do not define "best practice" in terms of particular assessment tools, and their recommendations do not necessarily coincide with top-down policy approaches. Rather, these reports emphasize evaluation use by various audiences, which depends on alignment among purposes, measures, and usability.
The logic of the reports in the transformative category diverges from postpositivist logic and, in a different way, from pragmatic logic. The transformative reports explicitly reject the postpositivist assumption that high-stakes, externally-driven accountability systems, rooted in market logic, will produce substantive change unless and until the staggering economic, social, and political inequities in the nation are also addressed. Further, the reports in the transformative category are informed by the conclusion that top-down, high-stakes evaluation systems and standardized practices in teacher preparation tend to foster superficial compliance, deprofessionalization, and uniformity rather than genuine transformation and attention to local problems (Kornfeld et al., 2007; Valli & Rennert-Ariev, 2002). In contrast, the reports in the transformative category are similar to those in the pragmatic category in that they assume that evaluation should be tied to the interests and values of relevant users rather than predetermined by the designation of particular assessments by state- or federal-level reforms. However, in contrast to the reports in the pragmatic category, the reports in the transformative category go beyond recognition of multiple stakeholders. They also assert that evaluation should be reclaimed and reinvented by the profession in collaboration with the groups most affected by inequities and should be guided by principles related to democratic education, justice, and strong equity.

Teacher Preparation Evaluation and Equity
The terms "equity" and "justice" appear very few times in the reports in the postpositivist and pragmatic categories, while the terms "accountability," "effectiveness," "data systems," and "validity" appear repeatedly. It would be incorrect, however, to conclude that there is not an equity aspect to some of these reports. The reports in the postpositivist category assume that one goal of holding teacher education accountable is ensuring that "all" the nation's children and "all" the nation's schools have access to quality teachers. This perspective on equity rests on two premises: first, that teacher quality/teacher effectiveness is the most important school factor in students' achievement, and second, that schools with large numbers of minoritized students and/or students living in poverty are the least likely to have access to effective teachers. The Education Trust (2008) refers to the combination of these two premises as "the teacher quality gap," which is presumed to exacerbate the "achievement gap." The concept of equity implicit in nearly all of the reports in our first category is that lack of access to teacher quality is a primary cause of educational and societal inequity and thus, that redistribution of access to teacher quality is a primary cure for inequity. In other words, it is assumed that the redistribution of educational resources, especially teacher quality, has the power to close the "gaps" that separate under-served students from their economically, politically, and socially advantaged peers. This assumption is consistent with the larger notion, prominent in social policy since Lyndon Johnson's era, that poverty and income inequality are problems that are "susceptible to correction" through education (Kantor & Lowe, 2013).
Along these lines, the reports in the postpositivist category assume that equity is more or less a by-product of a system wherein all school students have teachers whose preparation programs (and candidates) have been held accountable to rigorous, evidence-based, and valid metrics regarding performance, impact, and career trajectories. As noted above, Cochran-Smith et al. (2016, 2018) referred to this perspective as thin equity because it assumes that students' equal access to teacher quality, achieved through redistribution, can fix inequity without addressing the systems and structures of power and privilege that produce and reproduce inequity in the first place. Evaluation conceptualized from a thin equity lens does not account for the need to: redistribute education resources, including teacher quality, as well as resources well beyond education; recognize the knowledge and experiential resources of minoritized communities; and authentically represent the voices of marginalized groups as stakeholders in deliberations. Evaluation from a thin equity perspective thus tends to mask the structural and systemic barriers that perpetuate inequality, including its racialized nature (Au, 2016).
The reports in the pragmatic category concentrate on the use of validated tools and systems aligned with the purposes of intended users and audiences, which means that these reports vary in attention to equity. Most are intended to support state-level policy makers working to redesign their evaluation systems so that they focus on outcomes (CCSSO, 2012, 2017; Coggshall et al., 2012); thus their assumptions related to equity are similar to those of the reports in the postpositivist category. The NAEd report (Feuer et al., 2013), however, acknowledges multiple audiences with varying evaluation purposes. In this sense, the NAEd report is not wedded to a particular view of equity since this depends on the values and intentions of users.
Unlike the reports in the first and second categories, the reports in the third transformative category work from the perspective of strong equity, arguing that issues of justice and equity should be front and center in all aspects of evaluation and accountability. Applying to teacher preparation the social justice theories of political philosopher Nancy Fraser (2003, 2009)⁴ and others, Cochran-Smith et al. have defined strong equity in teacher education evaluation in terms of four dimensions: redistribution (a socioeconomic dimension), recognition (a cultural dimension), representation (a political dimension), and reframing (a discursive dimension). We return later to these dimensions, which are elaborated elsewhere (Cochran-Smith, 2010; Cochran-Smith & Keefe, in press; Cochran-Smith et al., 2016, 2018).

Beyond "Best Practices" for Evaluating Teacher Preparation: Recommendations for Centering Equity
Many researchers, policymakers, and practitioners want to know definitively what the "best practices" are for evaluating teacher preparation. To consider this question, we turn once again to ideas from evaluation theory.
Evaluation scholars Schwandt and Gates (2016) argue that in "social, political, and cultural environments indelibly marked by significant inequalities, power differentials, uncertainty, ambiguity, and interpretability" (p. 67), evaluation should provide a kind of social conscience. They state:
Our goal is to push the practice of evaluation further into the domain of a normative undertaking that tackles the questions, 'Are we doing the right thing?' and 'What makes this the right thing to do?' as opposed to being content with remaining a positive practice largely concerned only with the question of 'Are we doing this right?' (pp. 67-68)
Here, we use Schwandt and Gates' (2016, 2021) distinction between "doing things right" and "doing the right thing" to raise questions about "best practices" in teacher preparation evaluation. In this article, each time we use the phrase "best practices," we enclose it within scare quotes to signal that we are problematizing the term.
In teacher preparation evaluation, the term "best practices" has the same valence as Schwandt and Gates' question, "Are we doing this right?" In other words, "best practices" is related to the kinds of instrumental questions that animate many of the reports we reviewed, such as: "Can we overcome the difficulties to develop a value-added measure to assess preparation program quality in terms of graduates' impact on achievement?" "Do the observation protocols used by preparation programs provide information for making valid inferences about teacher performance?" "Which evaluation tools align with an organization's or state's evaluation purpose?" "Is there adequate empirical evidence to stipulate teacher candidate admissions criteria as part of program evaluation?" Notably, all of these queries focus on technical, methodological, and/or instrumental aspects of getting evaluation "right," and none is related to the normative question, "Are we doing the right thing?" The current status of teacher preparation evaluation mirrors the status quo of the field of evaluation more generally, as charged by the Equitable Evaluation Project (2017): "evaluation seems to be among the last organizational functions to be examined and revamped through an equity lens."
In the preceding sections of this article, we identified the "best practices" recommended for teacher preparation evaluation in the postpositivist and pragmatic reports, and we suggested that the concept of "best practices" is not conceptually consistent with the transformative reports. Consistent with our analysis, we ourselves do not propose "best practices" for teacher preparation evaluation in concluding this article. Rather, consistent with the normative question, "Are we doing the right thing?" we call for teacher preparation evaluation that draws on "guiding principles" rather than "best practices." The distinction we are making here is conceptual in that the phrase "guiding principles" is intended to signal that teacher preparation evaluation, which we argue should be strong equity-centered, must be understood as normative, critical, and context-specific. In contrast, the term "best practices" signals "proven" methods in the sense used by the Department of Education's What Works Clearinghouse, which suggests that teacher preparation evaluation can be objective, uniform, and decontextualized.
⁴ … development of three dimensions in theorizing accountability (Cochran-Smith et al., 2016, 2018). Detailed elaboration of Fraser's ideas, related concepts and literature, and their application to teacher education accountability/evaluation is included elsewhere (Cochran-Smith et al., 2016, 2018; Cochran-Smith & Keefe, in press).
In problematizing the notion of "best practices," we are not asserting that none of the approaches considered as "best practice" in the reports we reviewed or the practices currently in place in state evaluation systems could be part of strong equity-centered evaluation systems. To the contrary, for example, a particular classroom observation protocol or a particular system for tracking the placements and retention of program graduates might indeed be utilized in a strong equity-centered evaluation system. However, this would depend on whether these evaluation tools were part of a larger strong equity-centered evaluation approach that involved the authentic representation of minoritized families and community members, recognized the cultural values of non-dominant groups, included redistribution of resources, and worked at the level of structures and systems. When teacher preparation evaluation is strong equity-centered, as we call for here, mechanisms, processes, and content are jointly determined by relevant professional organizations, participants in preparation programs, and members of local communities, schools, and families, through a codesigned process (Ishimaru et al., 2018, 2019). This means that evaluation tools cannot be completely pre-determined but, rather, emerge from the "knowledge, priorities, and agendas" of students, families, and communities (Ishimaru, 2019, p. 8), the goals of programs and participants, and appropriate notions of trustworthiness and validity. This means expanding "what counts" in preparation program evaluation by changing evaluation metrics, policies, and practices to draw on the cultural values and shared knowledge and experience of students, families, and communities.
In light of the issues we raise above, we call for approaches and models of teacher preparation evaluation with strong equity at the center. To support this task, we recommend a set of principles and guidelines (see Figure 5) organized according to the four dimensions of strong equity and consistent with recent discussions of equity in the evaluation field.

Guiding Principles for a Strong Equity-Centered Approach to Teacher Preparation Evaluation
Dimension 1: Reframing Evaluation
1. Establish strong equity as an explicit goal and desired outcome of teacher preparation evaluation, not a presumed by-product.
2. Build attention to equity into the entire process of teacher preparation evaluation, including establishing the purposes of the evaluation, deciding how evidence will be generated and used and how validity will be defined, determining who will have an authentic voice in establishing the purposes and values that drive evaluation (including members of non-dominant communities served by the schools that partner with teacher preparation programs), and determining the composition and diversity of the groups that are involved in evaluation policies and practices.
3. Utilize evaluation tools and instruments related to teacher preparation structures, program components, processes, curricula, and assessments that have the capacity to provide usable information for ongoing programmatic self-examination and improvement with attention to issues of equity in all areas.
4. Draw on the expertise and experience of an interdisciplinary task force that includes all relevant stakeholders in teacher preparation, including teacher education practitioners and members of non-dominant communities served by the schools that partner with preparation programs, to establish a set of key teacher preparation equity indicators to be used across teacher preparation evaluations.

Dimension 2: Redistribution of educational opportunities, access, and resources, and resources beyond education
5. Work at a structural/systems level in teacher preparation evaluation that recognizes and addresses the multiple systemic and structural barriers, in addition to teacher quality, that produce and reproduce inequality in students' achievement and other school outcomes, such as: poverty; inequities in school funding, school organization and support, family and community resources; institutionalized racism; and social policies and practices that maintain or exacerbate inequities related to health care, housing, transportation, jobs, law enforcement, and early childhood services.
6. Consider problems and unequal outcomes and opportunities the responsibility of "the system," not simply of individuals, such as teacher educators, preparation programs, teachers, teacher candidates, and school-based teachers and leaders.

Dimension 3: Representation of multiple stakeholders
7. Ensure that all those with a genuine stake in teacher preparation evaluation, including teacher educators, school-based educators, the families and community members served by the schools (including those from minoritized communities most affected by the inequities that exist), and members of professional organizations are authentically represented in purpose-setting, decision-making, evidence generation and interpretation, and determinations of consequences regarding teacher preparation evaluation.
8. Acknowledge and address power issues in evaluation and incorporate, as appropriate, approaches that are intentionally designed to share power and address power imbalances between external evaluators and those being evaluated, on one hand, and between teacher preparation programs and the communities they serve, on the other hand.
9. Focus on intelligent professional responsibility rather than imposed external accountability; external accountability agencies should be charged with supporting the capacity for strong internal accountability, supporting local innovation, and supporting the democratization of knowledge for teaching and teacher education.

Dimension 4: Recognition of cultural values that are not part of dominant institutionalized hierarchies
10. Recognize and draw on the perspectives, knowledge sources, and experiences of those most affected by the root causes of inequity, especially parents, families, and community members from minoritized communities.
11. Where appropriate, include evaluation models such as participatory evaluation, empowerment evaluation, Indigenous evaluation, and culturally responsive evaluation, all of which are intended to recognize and build on the cultural values of minoritized groups.
Ideally this article would conclude with examples of teacher preparation evaluation systems that are in keeping with the guiding principles outlined above. To our knowledge, however, evaluation systems of this kind do not exist in the United States, and although there are some evaluation systems in other developed countries consistent with some aspects of what we are calling for here, policy borrowing at the level of systems does not seem feasible. We also recognize the power differential across U.S. teacher education programs, in that not all teacher education programs have access to the same resources within/across institutional settings (Labaree, 2008).
It should be noted, however, that although most evaluation systems do not make equity the centerpiece, many local programs, which vary in resources, size, and access to power, are designed intentionally to address equity issues and are involved in efforts to assess their work along these lines. For example, to address the disconnect between teacher candidates and the communities they serve, which is related to the strong equity dimensions of representation and recognition, a number of preparation programs have established equitable relationships with community members affected by inequities, working with them as co-teacher educators to make decisions about teacher preparation curricula, fieldwork, teacher candidate evaluation, and program assessment. 5 Additionally, a number of programs have endeavored to establish and sustain equitable partnerships between schools and universities (e.g., Burroughs et al., 2020; McDonald et al., 2014). Although these programs are clearly not teacher preparation evaluation systems, they are in keeping with the principles we have proposed and can be informative. In addition, there are some evaluation tools and assessments that center equity; these are often used at the individual program level, but some have broader reach. 6 Finally, the recent consensus report of the National Academies of Sciences, Engineering, and Medicine (NASEM), Monitoring Educational Equity (2019), and the notion of "key equity indicators" are also highly relevant to our recommendations here. Although the NASEM project focuses on K-12 education and not teacher preparation, it provides insights into how a group of scholars, researchers, and practitioners can reframe evaluation with equity at the center. The NASEM committee of experts in law, behavioral and social sciences, and measurement and statistics developed sixteen key equity indicators focused on K-12 student outcomes as well as resources and opportunities, recommending a system of indicators to examine disparities across racial, ethnic, linguistic, identified disability, and socioeconomic groups. As committee chair and legal scholar Christopher Edley Jr. noted, "We think that the equity issue is so important and salient at this moment in time that a focus on educational equity deserves its own space, not simply as a piece of an existing set of data instruments." In concluding this article, we echo Edley's sentiments. We believe that equity issues are so important in teacher preparation at this time that it is essential to make strong equity the center of models and systems for evaluating and improving teacher preparation.
5 See descriptions and analyses of community-based preparation programs (Murrell, 2000), such as: the Schools Within the Context of Community teacher preparation program at Ball State University (Ball State University, 2017; Zygmunt & Clark, 2015); University of Washington's community-centered preparation programs wherein community-based educators share co-equal status as teacher educators (Guillen & Zeichner, 2018; Kretchmar & Zeichner, 2016); Loyola University Chicago's field-based teacher preparation program created in partnership with the Kateri urban Indigenous community organization (Lees, 2016); Lesley University's program that prepares educators to teach autistic students by positioning them as equals in the community (Keefe, 2015, 2016); and UCLA's Center X long-standing preparation program that evaluates teacher candidates on their activist skills in working with immigrant families and other minoritized students, families, and communities (Quartz, 2003).
We thus conclude this article with a call to action to constitute a task force modeled after the NASEM project with the explicit project of developing key equity indicators for strong equity-centered teacher preparation evaluation, which could be informative for multiple teacher preparation constituencies, including teacher educators, their communities, families, school partners, professional organizations, state and federal policymakers, and advocacy organizations.