Principal Evaluation in the United States: A National Review of State Statutes and Regulations

The growing recognition of how much principals matter for student learning, and of how they make a difference, has fueled the need to ensure that effective principals lead every school. One way to achieve this is through principal evaluation, which has changed significantly in the last decade. We conducted a national exploratory study of all 50 states to document trends in, and describe the current state of, states' principal evaluation policies and practices. Using literature-based themes, our analysis of state statutes and regulations revealed that a majority of states have policies requiring at least one literature-based element. Only four states (8%) had statutes and/or regulations addressing all elements of principal evaluation noted in the literature. Student achievement measures were the most common component, required in 66% of states. In addition, most states required principal evaluators to be trained and principals to be evaluated annually. We propose that future research focus on the validity and reliability of the measures and models used for principal evaluation (two aspects rarely addressed in principal evaluation policies) to ensure that principal performance meets the needs of students, teachers, and schools.


Introduction
Research demonstrates that principals hold an essential position within schools (Babo & Ramaswami, 2011; Fullan, 2014; Leithwood et al., 2004; Wallace Foundation, 2013). Their responsibilities span all facets of the school, from student achievement to lunch rooms and from teacher effectiveness to building maintenance (Ch et al., 2017). Ingrained in every fiber of the school, principals can have a great impact on school outcomes, results, and reputation (Garza et al., 2014; Leithwood et al., 2004) when they attract, retain, and support effective teachers and create working conditions that foster teacher success (Boyd et al., 2011; Ladd, 2009; see Marzano et al., 2005, for a comprehensive review). As such, research has identified principals as second only to teachers among the in-school variables that matter most for student learning (Leithwood et al., 2004). Effective principals are especially important in schools identified as struggling, such as turnaround schools (Leithwood et al., 2004).
The growing awareness of how much principals matter underscores the importance of having qualified and effective principals in every school. One way that districts and states seek to ensure that effective principals lead their schools is through principal evaluation: "Principal evaluation has been variously described as (a) holding principals accountable for the performance of students and faculty, as well as their own performance, (b) determining the effectiveness of principal practice, or (c) distinguishing between effective and ineffective principals" (Amsterdam et al., 2003, p. 222). Principal evaluation has also been identified as a tool for professional growth, especially when it includes effective feedback, facilitates adult and group learning, and emphasizes continuous improvement (Micheaux & Parvin, 2018). Thus, the purpose of principal evaluation has been described as both summative and formative (Pashiardis & Brauckmann, 2008).
With the above considerations in mind, we conduct an exploratory study to examine what evaluation policies states have put in place to hold principals accountable and determine their effectiveness (Amsterdam et al., 2003). We first provide a brief history of principal evaluation in the United States, including a review of principal evaluation items (hereafter referred to as principal evaluation elements) in empirical studies. We then collected state statutory and regulatory policies regarding principal or administrative evaluations from state legislatures and state school boards, and examined these policies to determine whether, and to what degree, the principal evaluation elements identified in the literature were included. The results of this analysis are given, followed by a discussion of our findings that focuses on the implications of the absence or presence of various principal evaluation elements for principals, school districts, and policymakers.
[...] principal practice (National Conference of State Legislatures [NCSL], 2013). Broadly, Race to the Top (RTTT) encouraged evaluations that were based, in significant part, on student achievement. Principal evaluation, thus, was likely a function of the ripple effect of holding teachers accountable for student learning and of new statistical procedures that purported to isolate the value that an individual teacher adds to their students' learning. Furthermore, the push for more rigorous principal evaluations coincided with research findings suggesting that principal evaluations were loosely aligned, or not aligned at all, with standards and evidence (Goldring et al., 2009), such as the Professional Standards for Educational Leaders by the National Policy Board for Educational Administration (2020), and with professional development or school improvement goals (Portin et al., 2006).
Research also suggested that principals perceived their evaluations as perfunctory (Portin et al., 2006), inconsistent (Thomas et al., 2000), and not psychometrically rigorous (Condon & Clifford, 2010; Goldring et al., 2009; Heck & Marcoulides, 1996).
Treating this challenge as an opportunity, in 2009 the Joint Committee on Standards for Educational Evaluation released its Personnel Evaluation Standards to support schools in the broad task of assessing personnel (Gullickson & Howard, 2009). Then, in 2010, the National Comprehensive Center for Teacher Quality (NCCTQ, 2010) released a report on evaluating school principals that recommended the following strategies: 1) establish a clear set of expectations and goals for the assessments (e.g., what, who, how, frequency), 2) use valid and reliable measures that help principals improve, 3) link assessments to research-based standards, and 4) use multiple forms of assessment to gather a holistic view of principal performance. In 2012, a joint report commissioned by the National Association of Elementary School Principals (NAESP) and the National Association of Secondary School Principals (NASSP), and released by their Principal Evaluation Committee, issued a call to action for a new framework for principal evaluation. Recognizing the problems facing principal evaluation, the Committee presented a framework that outlined six domains of principal leadership for evaluation: professional growth and learning, student growth and achievement, school planning and progress, school culture, professional qualities and instructional leadership, and stakeholder support and engagement (NAESP & NASSP, 2012). In the same year, the American Institutes for Research released a report synthesizing research on principal effects, specifically to inform the design of principal evaluation systems. The report highlighted two emerging perspectives typically present in United States principal evaluation policy: one focused on practice (what principals do) and one focused on impact (what outcomes principals achieve). By 2014, 36 states had adopted laws requiring principals to receive regular evaluations of their performance (Superville, 2014).
Moreover, with notable increases in expectations for principals, specifically in instructional and equity-oriented leadership (Farley et al., 2019), attention to principals' role expectations, and to the evaluation of those expectations, has continued in the United States and across the globe (e.g., Díaz-Delgado & Garcia-Martinez, 2019; Fuller et al., 2015; Lambert & Bouchamma, 2019; Larochelle-Audet et al., 2019; Liu et al., 2017; Rivero et al., 2019; Williams, 2015).

Current Policy Climate
Today, in order for states to receive federal funding under the Every Student Succeeds Act (ESSA) of 2015, indicators of principal effectiveness must be submitted as part of the application process (ESSA, 2018). In particular, ESSA contains various provisions that provide greater flexibility and funding for ensuring high-quality principals. For example, under Title II, Part A (supporting effective instruction), approximately $2.3 billion per year has been allocated to states to improve the quality of teachers and school leaders, with 3% that can be allocated specifically to the principal pipeline (e.g., recruitment, retention, professional development). Under Title II, Part B, nearly $489 million per year has been authorized for human-capital management systems, including performance incentives (e.g., bonuses) for principals based on student achievement outcomes. In addition, the criteria states use to measure principal effectiveness must be made public and be evidence-based. These provisions reflect an effort to ensure both that principals are evaluated (according to the National Center for Education Statistics [2019], 79% of traditional public school principals and 69% of charter school principals reported having been evaluated within the last year) and that evaluations have consequences (such as remediation programs or dismissal).
In short, today's principal evaluation systems are an extension of the last two decades of educational reform in the United States, characterized by increased accountability and by external pressure on, and oversight of, principals (West et al., 2010). As such, principal evaluation is highly contested. On the one hand, there has been a significant push to hold principals accountable for student learning. On the other hand, others have warned policymakers to put principal effects in perspective. For instance, Williams (2015) notes that:
Although it is true that principals and teachers are responsible with what happens in schools and classrooms, they alone are not solely responsible for how much and at what rate a student learns...As research has shown, the isolated effects of the principal are difficult to ascertain; therefore, it is a stretch to place "blame" on the principals for students' lack of achievement. (p. 224)
Apart from research-based recommendations (e.g., based on research on principal effects) from various organizations (NAESP & NASSP, 2012; NCCTQ, 2010), little research exists on the principal evaluation models currently in practice and on the validity and reliability of those evaluations (Amsterdam et al., 2003; Anderson & Turnball, 2016; Babo & Ramaswami, 2011; Babo & Villaverde, 2013; Clifford & Ross, 2012). As others have noted (Catano & Stronge, 2007), this body of literature remains particularly sparse in light of the considerable expansion of research on teacher evaluation (see Lavigne & Good, 2019).² Comprehensive reviews of principal evaluation were conducted in 2009, 2015, and 2019; although useful, they illustrate the need for continued or different review approaches.
In Goldring and colleagues' 2009 comprehensive examination of how states evaluate principals, the authors found that the evaluation of principals lacked "justification and documentation in terms of the utility, psychometric properties, and accuracy of the instruments" at the district level (Goldring et al., 2009, p. 19).
Since then, Fuller et al. (2015) examined survey data from the Center on Great Teachers and Leaders. The analysis was based on state personnel surveys: survey data from 2011 covered 35 state policies (including DC), and a follow-up survey in 2012 added five more states. The remaining 11 states' data were obtained "through extensive Internet searches for original source materials on state principal evaluation efforts located on SEA website and secondary information from other websites identified through Google searches" (p. 168). This 50-state document review examined the purpose of evaluation and the use of its results, as well as the measures included within evaluations, such as student growth, and evaluation system quality assurances.
In 2019, the National Council on Teacher Quality (NCTQ) published a report examining changes in principal evaluation policies between 2015 and 2019 (Ross & Walsh, 2019). Specifically, it compared the number of states with principal evaluation policies regarding measures of student growth, principal observations or site visits, annual evaluations, and survey data, as well as requirements for improvement plans for underperforming principals. Since the data collection for Goldring et al. (2009) and Fuller et al. (2015), many laws have changed at the national level, including the implementation of ESSA. While the NCTQ report (Ross & Walsh, 2019) helps us understand the national scope of, and trends in, the abovementioned evaluation elements, our study extends it by providing additional detail on individual state requirements and by examining a few evaluation elements we identified in our review of the literature.
To drive our exploratory study, we asked: What elements of principal evaluation have been identified in the empirical literature as potentially important? Have states incorporated these elements into their policies? Notably, the number and detail of requirements in state statutes and regulations varied widely across the nation. While we do not argue that more or fewer requirements are appropriate, or that particular requirements are more or less appropriate, this variation suggests that the field has not reached consensus on what a principal should and should not be held responsible for. This research offers researchers, policymakers, and educators a view of principal evaluation requirements across the United States as they endeavor to improve their practices, determine appropriate and effective principal evaluation practices, and continue to research the impact of principals on schools.

Background
To begin the literature search, we used the Education Source and ERIC databases to find peer-reviewed, empirical literature. Search keywords combined evaluation with various arrangements of the following terms: instructional leader, principal, school, school leader, and school manager. Articles not directly linked to principal evaluation (such as those on teacher evaluation) were eliminated, leaving 28 articles. Despite the scarcity of peer-reviewed, empirical literature found in our search, a few common themes emerged regarding practices for principal evaluation: student achievement, personal goals, input from multiple sources, and the timeliness of evaluations.

Student Achievement
Although primarily through indirect pathways, principal leadership has been found to account for approximately a quarter of the total school-related effects on student achievement (Hattie, 2009 [Cohen's d = .32]; Leithwood et al., 2004). Thus, student achievement has become an increased focus within principal evaluation. Superville (2014) reported that some states required a percentage of principal evaluation scores to be attributed to student test scores, such as 20% in Delaware and 50% in Ohio. This finding was similar to the Wallace Foundation's Principal Pipeline Initiative (PPI) (Anderson & Turnball, 2016), where, in 2015, five of the six³ participating districts weighted student growth at 40% or more of the principal's evaluation, in alignment with their states' requirements. However, in some cases the percentage of a principal's evaluation attributed to student growth varied widely. For example, in Gwinnett County, Georgia, student growth accounted for 70% of a principal's evaluation in 2015,⁴ compared with the 40% assigned to student growth in all other participating districts. In the literature, however, student growth is not limited to growth on state standardized tests; it can also include local assessments, growth of the lowest-performing students on standardized tests, and/or growth on school-level student learning objectives (Anderson & Turnball, 2016).
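The weighting figures above (20%, 40%, 50%, 70%) are easier to compare with a small worked example. The sketch below is a minimal illustration, assuming a composite rating computed as a weighted average of two components on a common 1-4 scale; the component names and ratings are hypothetical, and actual state and district formulas may define, scale, or combine components differently.

```python
from typing import Dict


def composite_score(components: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of component ratings that share a common scale.

    Hypothetical illustration: real evaluation systems may convert or cap
    component scores before combining them.
    """
    if set(components) != set(weights):
        raise ValueError("components and weights must name the same measures")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(components[name] * weights[name] for name in weights)


# Hypothetical principal: strong practice rating, weak growth rating (1-4 scale).
ratings = {"student_growth": 2.0, "professional_practice": 4.0}

# Vary the share of the evaluation assigned to student growth.
for growth_weight in (0.2, 0.4, 0.5, 0.7):
    weights = {"student_growth": growth_weight,
               "professional_practice": 1.0 - growth_weight}
    print(f"growth weight {growth_weight:.0%}: "
          f"composite = {composite_score(ratings, weights):.2f}")
```

For this hypothetical principal, the composite falls from 3.60 at a 20% growth weight to 2.60 at 70%, which shows how much the share assigned to student growth can move an overall rating when components diverge.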
Despite its use in practice,⁵ the inclusion of student achievement data in principal evaluations is not without its challenges. For example, Tienken et al. (2017) found that student achievement outcomes are most heavily predicted by the income and education levels of the community, and concluded that evaluation matrices dependent upon student test scores are "fatally flawed" (p. 11). In addition, some studies have had difficulty linking student achievement to the direct and indirect influence of the principal (Fuller & Hollingworth, 2014; Leithwood et al., 2010; Williams et al., 2008). As a result, principal evaluations rarely correspond with achievement data (McMahon et al., 2014). Despite the lack of corroborating research, student achievement continues to be used as part of principal evaluations in order to align with the National Association of Secondary School Principals' and the National Association of Elementary School Principals' frameworks of leadership responsibility (Clifford & Ross, 2013).

Multi-sourced Evaluation
Research has shown that expanded expectations for school leaders have had a ripple effect on evaluation and, under ESSA, have led to the inclusion of non-academic measures such as chronic absenteeism and discipline and suspension rates (Kostyo et al., 2018). Likewise, researchers have suggested that principal evaluation take into account information from several sources to provide a holistic measure of effectiveness, rather than relying solely on the superintendent's view or on student achievement (Clifford & Ross, 2011). According to researchers, multiple forms or sources of evaluation are needed to create a robust evaluation because principal effects are complex and indirect (Sanders et al., 2012). Additionally, Fuller and Hollingworth (2014) examined ten strategies that focused principal evaluation on student test scores and concluded that using student test scores as the only means of measuring principal effectiveness was too simplistic. Instead, student achievement data might be one of multiple data points used in principal evaluation, for instance as part of a portfolio that also includes artifacts evidencing principal effectiveness (Babo & Villaverde, 2013). For example, portfolios might include principal observations; school improvement plans; school board meeting agendas, minutes, and presentations; department, faculty, and staff meeting agendas and minutes; school newsletters; information about or evidence of community partnerships; crisis and emergency plans; school audits; the school website; and/or communication logs (NCSL, 2013).
⁴ Notably, whereas teacher evaluation models have changed over the course of educator evaluation reform to place less weight on student achievement growth relative to observational data, some principal evaluation models demonstrate the opposite trend, with equal and in some cases more weight attributed to student growth relative to supervisors' ratings and other measures of professional practice, such as teacher and parent ratings and human resource management (Anderson & Turnball, 2016).
⁵ Although our themes emerged from the literature, examples from practice of using student achievement to hold schools (principals) and districts accountable can be found around the nation. Notably, Tennessee, which is known for its use of value-added measures, has both an indicator for growth (as measured by value-added) and a second for achievement (for which either absolute performance or achievement proficiency targets are chosen), with slightly greater weight for each at the elementary and primary level than at the secondary level (Tennessee Department of Education, 2019).
In addition to drawing on multiple sources of data, research shows that evaluations should be created with input and feedback from multiple individuals, such as stakeholders. One study explored how principals reacted to feedback from multiple sources, particularly feedback from teachers, and found that while teacher feedback did not guarantee improved performance, it did create the potential for improvement in the principal's performance. Taken together, this literature suggests that principal evaluation should include data from multiple individuals and data sources.⁶

Goal Setting
Hvidston, Range, and McKim (2015) found that when principals create goals to be included in future evaluations, they have an opportunity to continue focusing on school goals while staying aligned with district and state goals and mandates. Research suggests that self-reflection may be a useful tool for creating good goals: in Hvidston et al.'s study, principals reported that evaluations were more valuable to them when they were able to self-reflect. Principals also considered it helpful to reflect on the feedback they received after meeting with the superintendent about their evaluations.
Research also links goal setting to learner-centered leadership (Sun & Youngs, 2009). For example, in a quantitative study of leadership in 13 Michigan school districts (n = 19 principal evaluators, n = 138 principals), Sun and Youngs found that principal evaluation models focused on instructional leadership included a focus on school goal setting, which, in turn, directly supported principals' effects on teaching and learning. In addition, one study found that individual goals led to greater commitment to, and a greater likelihood of achieving, the goal (Sinnema & Robinson, 2012). This finding is particularly interesting because principals were sometimes asked to create school-level goals in place of, or in addition to, individual-level goals (Anderson & Turnball, 2016). Although principals were more likely to set teaching and learning goals than any other kind, they tended to set vague, performance-type goals, leading to vague evaluations of achievement (Sinnema & Robinson, 2012). One way principals may improve their goals is by creating them in conjunction with, and receiving feedback on them from, trained evaluators and/or supervisors, which may increase the extent to which goals are specific and have designated measurable outcomes.

Trained Supervisors
If principals are to improve through the feedback on their evaluations, the individual providing the feedback must be trained to give it (Clifford & Ross, 2011). Principals in multiple studies have noted that helpful, individual, and specific feedback requires superintendents or supervisors who are trained to provide it (Hvidston et al., 2015; McMahon et al., 2014). There is also evidence that evaluating principals is difficult (Zepeda et al., 2014). In a qualitative study, a superintendent noted tensions around balancing what should and should not be included in principal evaluation, as well as discrepancies in the data used within evaluations (Zepeda et al., 2014). Although the authors did not indicate whether this superintendent had been trained to carry out principal evaluations, it is reasonable to assume that a lack of training increases superintendents' tensions regarding how to evaluate principals and, if an evaluation system is already in place, how to rate and rank principal effectiveness within it.
Research also shows that principals desire honest and open communication from supervisors about their evaluations, in the form of positive reinforcement and constructive criticism on areas for improvement (Hvidston et al., 2018). That study concluded that such reinforcement and criticism require a superintendent who is "prepared and knowledgeable" (p. 220). Additional research found that some districts work with principal supervisors to establish calibration so that supervisors rate principals consistently across a district, engage principal supervisors in evaluation simulations (such as rating videos and providing feedback), and provide ongoing support to help supervisors better distinguish between artifacts and evidence and among rating categories such as effective, highly effective, and needs improvement (Anderson & Turnball, 2016).

Timeliness and Frequency
A trained supervisor or evaluator is imperative for improved performance, but training will not help principals who hope to improve if evaluations and formative feedback are given during the summer or after the principal has been moved to a different school. Feedback must be timely (NCSL, 2013). Notably, one study showed that some principals did not receive feedback from their evaluations until three to six months later (Hvidston et al., 2015). This could mean that principals do not receive feedback from mid-year evaluations until the school year has ended. In addition, research shows that principals desire more consistent observations and opportunities for feedback; they want to be observed more and to get more feedback so they know they are "on the right track" (Hvidston et al., 2018, p. 221). Unfortunately, research also shows that principals rarely receive regular feedback, leaving them without much direction. For example, as part of a larger study, Sun et al. (2012) sampled 88 Michigan principals and found that 40% had been evaluated once every three years, 26.25% had been evaluated once a year, and 6% had never been evaluated. Furthermore, because supervision and evaluation serve distinct purposes yet are intricately intertwined, the ability of evaluation to drive school improvement is diminished when supervisors do not frequently provide feedback informed by the evaluation rubric (Anderson & Turnball, 2016). In essence, end-of-year feedback does little to help a school leader make improvements throughout the year.
In sum, extant literature points to the inclusion of student achievement, personal goals, and multiple measures in principal evaluations. These evaluations should be conducted in a timely manner, with frequent observations and feedback from supervisors who are trained to provide effective and knowledgeable feedback. We use these themes to inform the current study.

Methodology
Our descriptive 50-state policy review consisted of the following steps: 1) examine the literature for themes surrounding principal evaluation, specifically what should or should not be included in the evaluation, 2) identify the themes that emerged from the literature, 3) locate statutes and regulations from each state, and 4) use the themes from the literature to guide data collection and analysis of each state's statutes and regulations (whether each element is included and, if so, in how much detail).
Given the relatively limited empirical literature on principal evaluation, we intentionally chose to employ a descriptive and exploratory methodology to establish a better understanding of the state of principal evaluation and to "uncover patterns and inform and improve decision-making" (Loeb et al., 2017, p. 3). Thus, our full analysis is guided by the following research questions:
1. What elements are present in the principal evaluation policies in the United States?
2. Do states' principal evaluation policies identify particular stakeholder groups?

Procedures
To determine which elements are present in state statutes and regulations, we first had to obtain the statutes and regulations themselves. The first author accessed the state government and board of education websites of all 50 states between June and August 2019 and downloaded the latest available statutes and regulations. When documentation of statutes and regulations was unavailable, state representatives were contacted, on multiple occasions where necessary, to determine whether such policies existed. Statutes and regulations without mention of principal evaluations were discarded. Between board regulations, educational codes (which in many states are created by state educational boards and are hereafter included as board regulations), and legislative statutes, all 50 states had some form of statute or regulation regarding principal evaluations from which we could collect data. Although it is difficult to ascertain the number of documents accessed over the entire review, for reasons such as broken links, incorrect documentation, and state-to-state differences in how these policies are formatted, we estimate that about 40 state regulations and 30 state statutes included principal evaluation policies and were included in this analysis. State statutes and board regulations were then analyzed to determine what each state required regarding principal evaluation with respect to student achievement (growth and raw scores), goal setting (individual and/or with a supervisor), stakeholders (e.g., parents, students, teachers, staff), measures used (input sources such as supervisors and teachers, student assessments, weighting), and feedback (from trained individuals, timeliness). This information was then recorded (see Appendix 1). The second author checked and verified sources that were ambiguous, unclear, or could not be located.
To verify the accuracy of data collection, the second author used a random number generator to select 10% of the sample (in this case, the population of 50 states; n = 5) for a reliability check (Kaid & Wadsworth, 1989). Reliability on the selected sub-sample (Alabama, Arkansas, Idaho, Illinois, and Michigan) was 100% on all variables, exceeding the recommended minimum intercoder agreement of 85% (Miles, Huberman, & Saldaña, 2013) for this randomly selected initial subset. Therefore, the inter-rater reliability check was deemed complete.
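The verification step can be sketched as follows: randomly draw 10% of the 50 states for double coding, then compute simple percent agreement between two coders and compare it with the 85% threshold. This is a minimal sketch, assuming binary present/absent codes; the placeholder state labels, element names, seed, and codes are hypothetical, and simple percent agreement does not correct for chance agreement the way Cohen's kappa does.

```python
import random
from typing import Dict, List, Optional

# Hypothetical coding scheme: state -> element -> 1 (present) or 0 (absent).
Codes = Dict[str, Dict[str, int]]

MIN_AGREEMENT = 0.85  # minimum intercoder agreement cited in the text


def draw_subsample(states: List[str], fraction: float = 0.10,
                   seed: Optional[int] = None) -> List[str]:
    """Randomly choose a fraction of states (at least one) for double coding."""
    k = max(1, round(len(states) * fraction))
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    return sorted(rng.sample(states, k))


def percent_agreement(coder_a: Codes, coder_b: Codes, states: List[str]) -> float:
    """Share of (state, element) codes on which the two coders match."""
    agree = total = 0
    for state in states:
        for element, code in coder_a[state].items():
            total += 1
            agree += int(code == coder_b[state][element])
    return agree / total


states = [f"State {i:02d}" for i in range(1, 51)]  # placeholder names, not real states
sub = draw_subsample(states, seed=2019)
print(len(sub))  # 5

# Hypothetical codes: the two coders disagree on a single element for one state.
coder_a = {s: {"goal_setting": 1, "student_growth": 0} for s in sub}
coder_b = {s: dict(codes) for s, codes in coder_a.items()}
coder_b[sub[0]]["student_growth"] = 1
rate = percent_agreement(coder_a, coder_b, sub)
print(f"{rate:.0%}", rate >= MIN_AGREEMENT)  # 90% True
```

With 5 states and 2 elements there are 10 codes, so one disagreement yields 90% agreement, still above the 85% threshold.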

RQ 1. What elements are present in the principal evaluation policies in the United States?
Of the 50 states, only four (8%) had statutes and/or regulations regarding all elements noted in the literature review: Connecticut, New Jersey, Pennsylvania, and West Virginia. A full list of states and evaluation policy elements regarding goal setting, stakeholder input, student achievement, other sources of data, and weighting is given in Appendix 1, and evaluator training and feedback elements are displayed in Appendix 2; a synopsis of each is provided here.
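Counts such as "four states (8%)" follow from coding each state's policies for the presence or absence of each element. The sketch below is a minimal illustration, assuming each state's coded elements are stored as a set: a state is counted only if its set contains every element, and counts are expressed as a share of the 50 states. The element labels and the placeholder states are hypothetical and do not reproduce Appendix 1.

```python
from typing import Dict, List, Set

# Hypothetical element labels loosely following the literature-based themes;
# they are not the authors' codebook.
ELEMENTS: Set[str] = {
    "goal_setting", "stakeholder_input", "student_achievement",
    "other_data_sources", "evaluator_training", "timely_feedback",
}


def states_with_all(coded: Dict[str, Set[str]], required: Set[str] = ELEMENTS) -> List[str]:
    """Return the states whose coded policy elements include every required element."""
    return sorted(state for state, found in coded.items() if required <= found)


def share_of_states(count: int, total: int = 50) -> str:
    """Express a count of states as a whole-number percentage of all states."""
    return f"{count / total:.0%}"


# Placeholder states with made-up codes, for illustration only.
coded = {
    "State A": set(ELEMENTS),
    "State B": {"student_achievement", "evaluator_training"},
    "State C": ELEMENTS - {"goal_setting"},
}
print(states_with_all(coded))  # ['State A']
print(share_of_states(4), share_of_states(33))  # 8% 66%
```

Using set inclusion keeps the "all elements" test strict: a state missing even one element (like the placeholder State C) is not counted.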
Seventeen states (34%) mentioned goal setting in their principal evaluation policies, either requiring it or suggesting its optional inclusion. We defined collaborative goal setting as creating goals in conjunction with, or under the approval of, a supervisor. Thirteen states (26%) required collaborative goal setting (see Figure 1). Some states, such as Delaware, require goal-setting conferences multiple times per year (Del. Code tit. 14, §108). Others, such as Louisiana, maintain that goals used for evaluation are based only on "student learning targets" (La. Code tit. 28 §301). West Virginia regulation states, "The school leader and the evaluator with [sic] mutually establish annual written goals for the administrator's performance evaluation on or before November 1" (W. Va. Code, 126 §142). The remaining four states (8%), namely Colorado, Iowa, Pennsylvania, and South Dakota, required principals to create and achieve goals but did not require them to collaborate in the creation process.

Figure 1
States that Include Collaborative Goal Setting in Principal Evaluation Policies
Note: Shaded states require collaborative goal setting.
Student achievement was described in many different ways across policies, including growth, achievement, learning, performance, and student data. Overall, 33 states (66%) required some type of student achievement data to be used in principal evaluations, while Alaska, California, and Ohio (6% of states) described achievement as an optional measure. Growth data were required by 25 states (50%; see Figure 2), and 18 states (36%) required other forms of student achievement data. Indiana required student achievement and/or growth data (Ind. Admin. Code, 511 §10, 2016). Arizona, Florida, Louisiana, Michigan, Pennsylvania, and Texas included value-added models to assess student achievement among their required or optional principal evaluation elements. The 36 states (72%) that mentioned student achievement data in their principal evaluation policies described such data in a variety of ways. Connecticut requires "multiple student learning indicators" (Connecticut State Department of Education, 2017), including graduation rates and other "indicators… relevant to the student population" (Connecticut State Department of Education, 2017, Section 3). Pennsylvania requires the use of district and national standardized tests in its evaluations, but also requires the use of student projects and portfolios (Pa. Code, 022 §19.2, 2013). A total of 12 states (24%) required both growth scores and other student achievement data to be used in principal evaluations.

Figure 2
States that Require Student Achievement Growth in Principal Evaluation Policies
Note: Shaded states require student achievement growth.
Stakeholder input was required by 17 (34%) states, and three additional states listed stakeholder input as an optional evaluation element. Stakeholders specifically mentioned included teachers, community members, peers, and supervisors, among other groups. Further information regarding the stakeholders mentioned within the policies is given in the findings for research question 2.
Beyond the elements listed above, many states required multi-sourced principal evaluations, the most common additional sources being observations and standards. Seventeen (34%) states required observation(s) or some form of site visit(s). The frequency and number of observations differed from state to state. For example, Utah requires "multiple supervisor observations at appropriate intervals" (Utah Admin. Code, R277-533, 2018), while other states, such as West Virginia, mention observation only by noting that evaluators should use "the online observation form" (W. Va. Code, 126 §142), leaving it unclear whether principals are to be observed.
Standards of conduct, ethics, job descriptions, leadership, performance, and/or similar criteria were required by 19 (38%) states. In Illinois, this included not only aligning the evaluation "with research-based standards" (ILCS 105 §24A, 2019), but also "consider[ing] the principal's specific duties, responsibilities, management, and competence as a principal" (ILCS 105 §24A, 2019). Wyoming's requirements illustrate how some states give individual districts more latitude in the use of standards (Wyo. Admin. Rules 29 §3). The evaluation system is to be based on either the Wyoming standards for district and school leaders or "standards prescribed by the board of trustees, so long as standard 1 of the Wyoming standards for district and school leaders… is included in the board's standard" (Wyo. Admin. Rules 29 §3).
Additional requirements included the administrator's "effectiveness in addressing school safety and enforcing student discipline" (Va. Code 15 §22.1), efficiency and ability (HI Rev. Stat. §302A), and an "assessment of the educator's effectiveness in supporting every student in meeting rigorous learning goals through the performance of the educator's duties" (Mont. Admin. Rules §10.55.701). Overall, 41 (82%) states have requirements for multiple sources of data (including goal setting, stakeholder input, and student achievement).
The weighting, or a range of weights, for elements of principal evaluations was prescribed at the state level by 19 (38%) states (see Figure 3). As with the other elements, weightings differed across states. Arizona requires "quantitative data on student academic progress [to account] for between 33% and 50% of the evaluation outcomes" (ARS §15-203), but does not give weightings for any other elements of the evaluation. Colorado's weighting of 50% principal professional practice and 50% student learning is typical of several states (1 CCR 301-87-5.01). One of the more complex and variable weighting systems comes from New Jersey, where the evaluation must include 10-40% growth, 10-20% teacher growth, and 10-40% administrator goals, while principal practice must comprise less than 50% (NJAC 6A §10).

Figure 3
States that Prescribe Weights for Principal Evaluation Measures
Note: Shaded states prescribe evaluation measure weights.
Training for evaluators in connection with principal evaluations was mentioned in 31 (62%) state policies (see Figure 4). One state's regulation, for example, requires the training to include the state's effectiveness evaluation framework (OAC 210 §20). Many states have similar requirements. However, other states have far more detailed policies regarding evaluators. New York has a subpart of its regulations detailing evaluator training, including how to use the evaluation assessments, what research-based observation should look like, and "considerations in evaluating teachers and principals of English language learners and students with disabilities" (NY CRR tit. 8 § 30).

Figure 4
States that Require that Principal Evaluators be Trained
Note: Shaded states require principal evaluators to be trained.
Feedback, meaning the timeliness of feedback after evaluation and the frequency of evaluations, was discussed in 43 (86%) state policies. Fifteen states described how promptly feedback was to be given to principals. Requirements ranged from a set number of days after the evaluation, such as within 10 days (Fla. Rule 6A, 2018; Ga. Code 160 §5), to a specific date, such as before September 15 (4 AAC 19.055), to as soon as possible (NMAC 6 §69). Likewise, the required frequency of evaluations varied widely. While 27 (54%) states required annual evaluations of all principals, the frequency varied based on licensure or experience in 10 (20%) states. For example, Washington requires annual evaluations for the first three years and then a comprehensive evaluation every six years thereafter, with focused evaluations taking place annually between comprehensive evaluations (RCW 28A §405). Kansas conducts biannual evaluations for the first two years, annual evaluations for the third and fourth years, and evaluations every three years thereafter (Kan. Stat. tit. 281 §72). Oregon and Rhode Island used cyclical variations.

RQ 2. Do states' principal evaluation policies identify particular stakeholder groups?
Nineteen (38%) states require or give the option for stakeholder input to be used in principal evaluations (see Figure 5). Within these 19 states, 16 (32%) specified stakeholders who were required to be included and 5 (10%) suggested optional stakeholders. Teachers and staff were most likely to be mentioned, with eight (16%) states requiring their input, while Colorado and Idaho list them as optional data sources. Parents or guardians were required by Alaska, Rhode Island, Utah, and West Virginia and optional for Colorado, Florida, Idaho, North Carolina, and Pennsylvania. Other groups mentioned, as required or optional, included students, administrators/peers, and community members. Supervisors were specifically mentioned by Colorado, New Jersey, North Carolina, and Rhode Island. Connecticut and Oklahoma specifically required stakeholder input but did not specify who those stakeholders were.

Figure 5
States with Required or Optional Stakeholder Input Policies
Note: Shaded states have optional or required stakeholder input policies.
While many states did not expand upon how stakeholder groups should or could provide their input, Colorado gave many examples. Colorado suggests the use of "school newsletters,… evidence of community partnerships, parent engagement and participation rates, '360 degree' survey tools designed to solicit feedback from multiple stakeholder perspectives,… [and] teacher retention data" (1 CCR 301) as ways to obtain and measure teacher, student, parent/guardian, and/or other administrators' input. The next section discusses the implications of these findings and their limitations.

Discussion
The purpose of this exploratory study was to examine what evaluation elements were included in state statutes and regulations across the United States. South Carolina's General Assembly asserts "that the leadership of the principal is key to the success of a school, and support for ongoing, integrated professional development is integral to better schools and to the improvement of the actual work of teachers and school staff" (SCCL 24 §59). As school success is dependent upon the leadership of the principal (Hattie, 2009), evaluations of principals are essential to determine whether the principal is indeed a positive influence on the school. Although limited, the current literature has focused on goal setting (Anderson & Turnball, 2016; Hvidston et al., 2015; Sinnema & Robinson, 2012; Sun & Youngs, 2009), multi-sourced evaluations, including stakeholder input (Clifford & Ross, 2011; Fuller & Hollingworth, 2014), student achievement (Anderson & Turnball, 2016; Fuller & Hollingworth, 2014; Leithwood et al., 2010; McMahon et al., 2014; Williams et al., 2008), and evaluator training and feedback (Anderson & Turnball, 2016; Hvidston et al., 2015; Hvidston et al., 2018; McMahon et al., 2014; Sun et al., 2012; Zepeda et al., 2014) as elements of principal evaluations.
In short, using literature-based themes, our analysis of state statutes and regulations revealed that a majority of states required at least one literature-based element. Only four (8%) states had statutes and/or regulations addressing all elements of principal evaluation noted in the literature. Although the most controversial element, student achievement measures were also the most common, required in 66% of states. In addition, most states required that principal evaluators be trained and that principals be evaluated annually.

Limitations
First, it should be noted that this analysis, like the work of state educators and lawmakers, was limited by the lack of research on principal evaluation. This limitation is important to keep in mind: although we can report what state policies require or suggest, we are unable to comment on the effectiveness or validity of any measures required or suggested for principal evaluations within state statutes and regulations, an issue noted a decade ago by Goldring et al. (2009) that continues to be problematic in principal evaluation.
Further, we did not examine principal evaluation in practice, nor the evaluation models or rubrics that states recommended or prescribed. Therefore, we are unable to comment on how or to what extent statutes and regulations affected actual practice. However, the literature on educational policy demonstrates that policies are often modified to fit local contexts. Honig and Hatch (2004) describe this as "crafting coherence," as districts make sense of the relationship between internal goals and external demands. Thus, we would anticipate variation and deviation from statutes and regulations, especially across districts. It is possible, as in the case of standards-based reform (see Coburn et al., 2016, for a review of this literature), that policy has relatively superficial or limited impact on practice, making it important to extend this work to the study of the "why," not just the "what" and "how," of principal evaluation, taking into consideration important variables such as capacity (Honig, 2003) or network strength (Daly & Finnegan, 2011).

Implications
Yet even with these limitations in mind, we have provided important foundational work to guide subsequent research. Whereas 36 states had adopted laws requiring principals to receive regular evaluations of their performance by 2014 (Superville, 2014), our search of state statutes and regulations indicated that as of 2019 all 50 states had policies regarding principal evaluations, though they vary greatly in their requirements and details. It is unsurprising that all states have some form of policy regarding principal evaluation, as ESSA funds require principal effectiveness indicators to be created (Elementary and Secondary Education Act of 1965, 2018). Beyond some emerging consensus among states on possible measures of principal effectiveness, the weights applied to and importance placed upon such measures varied widely, and there appears to have been little progress on this, at least within the last three years (Anderson & Turnball, 2016) or even within the last decade (Goldring et al., 2009). Hence, a principal engaging in a given set of practices and achieving a given set of outcomes may be designated as effective in one state but ineffective in another. This inconsistency across states is concerning, as it suggests a possible gap between lawmakers' and researchers' understanding of expectations for principals and of the influence principals have in improving student outcomes and teacher effectiveness. This deficiency leads to confusion regarding the roles and responsibilities of the school principal. Our calls for greater transparency align with those of Goldring et al. (2009), who argued nearly a decade ago for better documentation of the psychometric properties of instruments utilized by districts to evaluate principals. Sinnema and Robinson (2012) suggested that goal setting is an important part of principal evaluation because "goals set a work and development agenda for the subsequent year" (p. 137).
Furthermore, districts' focus on principals' enactment of ambitious learning goals in principal evaluation is related to principals' learner-centered leadership (Sun & Youngs, 2009). One promising finding is that 17 (34%) states require goal setting in their principal evaluations. However, if setting goals is important for school improvement, principals are likely setting goals even where goal setting is not part of their evaluations. While adding goal setting as a required evaluation element in policy may not change the practices of principals who already set goals, it is worth considering whether more states should adopt similar policies, particularly regarding collaborative goal setting.
While the use and weight of student achievement data in evaluating principals will likely continue to be debated, most states have policies in place requiring the use of such data. Beyond standardized test scores, many policies were unclear as to what type of student achievement data was to be used. For instance, Wisconsin statute referred to "student performance" (Wis. Statute 115 §415), while Pennsylvania regulations require more than testing data to be used in their evaluations (Pennsylvania Code, 022 §19.2, 2013). The use of student data other than standardized test results, such as suspension rates or absenteeism, is one way to include multiple measures in principal evaluations (Kostyo et al., 2018; Sanders et al., 2012). However, without clarity as to what these performance measures are, we are unable to determine whether their inclusion in these evaluations is valid. In addition, a lack of clarity as to what measures are used may lead to differing interpretations among districts, meaning a principal labeled 'proficient' under one district's evaluation measures might be labeled 'needs improvement' under another district's measures.
Growth scores from standardized tests were the most common source of student achievement data. This may contribute to greater reliability in principal evaluation measurements, particularly when principals remain at the same school for many years. However, growth scores seem unlikely to be a valid evaluation element when principals change schools often. In 2016-2017, the national average tenure of principals at their current school was four years, and nearly 35% of principals had been at their school for two years or less. Each year, nearly 18% of principals leave their school, and in high-poverty schools this percentage is even higher, at 21% (Levin & Bradley, 2019). Yet the effects of principals' improvement efforts are not recognizable in student achievement outcomes for 5 to 7 years (Borman et al., 2003; Fullan, 2001; Gross et al., 2009). Taken together, these findings suggest that despite states' wide use of student achievement data, principals are unlikely to actually be held accountable for results that can be attributed to their own leadership actions and initiatives. Thus, the use of student achievement data in principal evaluations continues to be problematic and warrants additional study. Furthermore, it may be particularly problematic in settings that experience higher levels of turnover, such as schools that serve a larger proportion of minoritized students (Gates et al., 2006). Future research would help shed light on these issues, particularly on what measures of student achievement are actually used in principal evaluation.
In addition, extraneous variables, such as teacher attrition, community tragedies, and even fire alarms going off during testing, may have great influence on both growth and achievement scores. Perhaps it is because of the unknown reliability of holding principals directly responsible for student achievement, including growth scores, that all states requiring the use of growth scores, with the exception of Michigan, also required at least one other assessment measure.
It was interesting to find that all 19 states with a weighting scale also required student achievement measures in their evaluations. We wonder whether this was done to ensure that student achievement measures were used in (and not eliminated from) principal evaluations, or to ensure that student achievement measures were not the only source of information used in evaluations. Further research could show whether either of these rationales applied or whether other reasoning lay behind these weighting decisions.
Many states require principals to be evaluated annually. Of the states where evaluation cycles are determined by experience or certification (18%), Arkansas is the only state that does not require evaluation of the principal for the first three years. This is interesting, particularly in comparison to Kansas and North Dakota, which require biannual evaluations for the first two and three years, respectively. It would be interesting to know whether this has to do with tenure or licensure processes, the type of evaluation done in each state, or, in the case of Kansas and North Dakota, a desire to ensure beginning principals have access to additional assistance through more frequent evaluations. Future research might explore to what extent states encourage or consider developmental supervision and evaluation practices and processes (see Glickman et al., 2017, for a developmental approach to teacher supervision), honoring what we know about career stages of the principalship (Day & Bakioğlu, 1996; Oplatka, 2012) and the time it takes to see results from school improvement efforts, as noted earlier.
In addition, it was interesting to note that Washington evaluates its principals comprehensively only every six years, while using shorter, focused evaluations annually to assess performance in "one of the eight criteria selected for a performance rating plus professional growth activities specifically linked to the selected criteria" (RCW 295 §303). This differs greatly from the 27 states that simply require principals to be evaluated annually. More research could shed light on which of these practices produces better results, including in student achievement, teacher retention, goal achievement, and financial cost.
In examining our overall analysis, we have many questions and thoughts regarding future research. Future research could explore how states developed their principal evaluation policies and how these policies were enacted in practice, following in the footsteps of the more robust literature on teacher evaluation. In the absence of more research on principal evaluation, we wonder how states arrived at 1) the measures included in their principal evaluation models and 2) the weights, if any, assigned to such measures, a worthy topic for future research. In cases where policymakers do point to research, we wonder whether they are using such findings in appropriate ways to inform policy (Lubienski et al., 2016). In alignment with extant literature (Hvidston et al., 2015, 2018; McMahon et al., 2014), we also wonder: How were superintendents trained to evaluate principals, and how rigorous were such trainings (e.g., requiring a passing score on a test, multiple trainings throughout the year, establishing interrater reliability, and attending to coder drift)? How prepared are superintendents to evaluate principals? What is the effectiveness of the feedback they provide to principals? What are the effects of principal evaluation systems on principals' growth and development, practices, and effectiveness? Notably, many of these questions were raised nearly a decade ago (see Clifford & Ross, 2011), but significant gaps in research remain.
We also wonder what principals' experiences of evaluation are in each state. Pashiardis and Brauckmann (2008) point to large gaps between the stated and perceived purposes of principal evaluation. We wonder, then: Do principals' daily experiences align with their evaluation measures? Does this alignment include professional development opportunities tailored to the evaluative feedback they receive (McMahon et al., 2014)? Additionally, it would be interesting to learn whether principals' engagement in the development of the evaluation system shaped their perceptions of the system's effectiveness.
Principals struggle to find enough time to accommodate the existing and increasing expectations placed upon them (Lavigne & Good, 2019). Evaluations are one tool to help principals prioritize these demands, as they allow districts to communicate which of the principal's responsibilities they deem most important. The purpose of this exploratory study was to examine what evaluation elements were included in state statutes and regulations across the United States. Ultimately, the great variation we observed in some aspects of principal evaluation across the states underscores the need for more and better research to inform principal evaluation policy and practice in ways that help supervisors effectively assess and improve principal effectiveness in the United States, and perhaps even across the globe.

Readers are free to copy, display, distribute, and adapt this article, as long as the work is attributed to the author(s) and Education Policy Analysis Archives, the changes are identified, and the same license applies to the derivative work. More details of this Creative Commons license are available at https://creativecommons.