education policy analysis

Abstract: Formal teacher leader programs that develop, position, and reward teachers to work with peers to improve instruction are a growing reform effort in the United States, yet there are few published studies of their efficacy. In this paper, we examine the impacts of one district’s teacher leadership program on students’ annual state test performance. The program placed full-time instructional coaches and partly released English language arts (ELA) and mathematics content specialists in each of 11 district schools. To assess the program’s impact, we examined five years of student-level state test data: two years before the adoption of the intervention and three years afterwards. Using an interrupted time series design, we examined trends in performance before and after the adoption of the intervention. Overall, there were no significant effects in ELA and a small negative effect in mathematics. By contrast, in the stable sub-sample of students who were in the district for the five years examined in the study, there was a large significant positive effect in mathematics and a large but non-significant positive effect in ELA. We conclude with a discussion of the implications of these findings for research and policy.


The Impact of a Formal Teacher Leadership Program on Student Performance
Formal teacher leader programs that develop, position, and reward teachers to work with their colleagues to improve instruction are a growing reform effort in the United States (Berg et al., 2019), yet there are relatively few published studies of their efficacy. In this paper, we examine the impacts of one district's formal teacher leadership program on students' annual state test performance. The teacher leadership program of interest placed full-time instructional coaches and supported partly released English language arts (ELA) and mathematics content area specialists in each of 11 district schools.
The study data consisted of five years of student-level state test data, including two years before the adoption of the teacher leadership intervention and three years afterwards. The analyses focused on the elementary and middle school grades where the tests were uniform. Using an interrupted time series (ITS) design, we examined trends in performance before and after the adoption of the intervention via three analyses. The first analysis used the overall sample, the second used the sub-sample of students for whom we had five years of data, and the third explored heterogeneous effects on student sub-populations (gender, ethnicity, economic disadvantage, and English learner status).
The results showed important differences across the three analyses. In the full sample, there was no significant effect in ELA and a small, statistically significant negative effect in mathematics. By contrast, in the stable sample, there was a large but non-significant positive effect in ELA and a large, significant positive effect in mathematics. These patterns carried through to the disaggregated analyses. In the full sample, there were no significant differences in ELA performance within subgroups; in mathematics, the only significant positive effect of the treatment was among Black students, while all other subgroups performed significantly worse after the intervention than before. Disaggregating the stable sample by student demographic characteristics showed no effects in ELA and significant positive effects for boys and for White and Black students. Clues to the possible differences between the results for the full and stable samples can be found in differences in the compositions of the two that may be due to mobility: relative to the full sample, the stable sample had a greater proportion of White students, a smaller proportion of Black students, and was less economically disadvantaged.
The article proceeds as follows. First, we provide a brief overview of the literature on teacher leadership, focusing on research on the impacts of formal teacher leadership programs and initiatives. Next, we describe the context where the study took place and the structure and focus of the teacher leadership program that we investigated. This is followed by an enumeration of the research questions and an overview of the study design. We then detail the study methods, including the data, study samples, and specification of the analytic model. Afterwards, we describe the study results, including the impacts on the different samples and the findings for different subgroups of students. After discussing the limitations of the study, we consider the implications of the findings for research and policy.

Literature Review
The diversity of definitions of teacher leadership in education reflects the idea's multitude of purposes and goals, which makes assessing its impacts particularly challenging. In a review of the research literature on teacher leadership, Wenner & Campbell (2017) defined the concept broadly as "teachers who maintain K-12 classroom-based teaching responsibilities, while also taking on leadership responsibilities outside of the classroom" (p. 5). Similarly, there have been multiple attempts to document the varied and boundary-spanning nature of teacher leadership in schools. These include studies of teachers' work in curriculum development (Hunzicker, 2018), enhancing school culture (Cansoy & Parlar, 2017), working with parents and the community (Vranješević & Frost, 2016), mentoring of colleagues (Gul et al., 2019), peer coaching (Charteris & Smardon, 2014), instructional modeling (Supovitz & Comstock, 2021), and providing workshops and other professional development opportunities (Alexandrou & Swaffield, 2012; Poekert, 2012). Further, the framing of teacher leadership within theories of distributed leadership encapsulates how teachers can become a valuable resource for their schools based upon their individual willingness, opportunity, and context (Fairman & Mackenzie, 2015; Spillane et al., 2001). Thus, teacher leadership can encompass a wide range of educational activities at different system levels and in a variety of both formal and informal capacities.
To distinguish between different types of teacher leadership, Supovitz (2017) delineated four paradigms of teacher leadership that focus on improving teaching and learning. The first concept, organic teacher leadership, occurs naturally in schools with a sense of collective responsibility where teachers work together on improvements in instruction and student performance. The second conception, improvised teacher leadership, is where teachers are encouraged to take on leadership roles in support of their schools and provided with some training and resources, but there are no changes in the structure of the school within which they are operating to support their work. The third paradigm, quasi-formal teacher leadership, describes efforts to incorporate teacher leaders into the organizational structure of schools, by providing them with titles and/or positional status but stops short of providing them with formal authority to influence the behavior and practices of their peers. The fourth conception, formal teacher leadership, is an explicit model whereby positional teacher leaders are charged with improving the practices of teachers and outcomes of students.
From a policy perspective, the conceptualization of teacher leadership is crucial. At the crux of the distinctions amongst the paradigms is whether teacher leadership should be viewed as a largely naturally occurring phenomenon or as a designed intervention. For program developers and policy makers who seek to understand the impacts of programmatic interventions or policy mechanisms to organize and influence the varied ways that teacher leadership occurs in schools, the ability to describe and assess the impacts of formal teacher leadership programs is essential.
There has also recently been a proliferation of teacher leadership programs in the United States. In 2019, Berg et al. conducted a scan of the teacher leadership programs in the United States and found over 280 active programs that prepared, positioned, and/or rewarded teachers for providing leadership in their contexts. For these reasons, this paper focuses on the fourth paradigm: formal teacher leadership, where teachers are trained, provided with specific roles, responsibilities, and authority to work with their peers on instructional improvement, and compensated for their work.
To assess the strength of the evidence base of the impacts of formal teacher leadership programs on student outcomes, we conducted a review of the literature. Our search approach was a snowball strategy that began with a Google Scholar search of terms such as "teacher leadership and student achievement" and "impacts of teacher leadership." We then read the relevant articles and searched their reference lists for additional applicable studies. We found multiple studies of the impacts of organic or improvised teacher leadership on student outcomes (e.g., Ahmed & Qazi, 2011; Ingersoll et al., 2017; Sebastian et al., 2017), but there were far fewer studies that assessed the impacts of formal teacher leadership programs on student outcomes.
One study by Li & Liu (2022) used structural equation modelling to examine the relationships amongst principal transformational leadership, teacher leadership, teacher self-efficacy, and student performance in a sample of Chinese schools that included some that had formal teacher leadership initiatives. Using survey data, the researchers developed a scale of teacher leadership that included such formal teacher leadership practices as setting the school's direction, involvement in teacher hiring, developing people, designing the organization, and managing instruction. They found that principal transformational leadership was significantly and positively related to teacher leadership and that teacher leadership was also significantly and positively associated with student performance.
Yost, Vogel & Liang (2009) conducted a mixed-method study of the impact of a teacher leadership program called Project Achieve on teachers and students in a large urban middle school with 42 teachers and over 1,100 students in the eastern United States. Project Achieve trained and provided ongoing support to a cadre of six full-time teacher leaders to work with teachers at the school. The teacher leaders devoted 100% of their time to coaching, mentoring, modelling lessons and providing professional development to the literacy, mathematics and special education teachers at the middle school. To measure impacts of the program on students' state test results, the researchers compared the treated school in the second year of the intervention to another middle school in the same district. They found that student test scores for the school with teacher leadership were significantly higher in both reading and math across grade levels.
Shen et al. (2020) conducted a meta-analysis of research on the impacts of teacher leadership on student achievement. The focus of their study was fairly broad: to examine the impacts on student outcomes of different activities of teacher leaders, including promoting a shared vision of student learning, coordinating and managing beyond the classroom, facilitating improvements in curriculum and instruction, and fostering a collaborative culture in schools. Their inclusion criteria required quantitative studies that focused on teacher leadership where they could produce effect sizes for comparability. Even with this range of teacher leader activities, their search yielded only 21 studies. Of these, 13 were dissertations and eight were from the peer-reviewed literature. When the 21 studies are categorized by Supovitz's (2017) four paradigms, 20 were assessments of organic or improvised teacher leadership.
The one study reported by Shen et al. (2020) of a formal teacher leadership program was the evaluation of the impact of a teacher leader model on student achievement conducted by Iarussi & Larwin (2015). The teacher leader model focused on the student impacts of a formal coaching program for teacher leaders in 13 Ohio school districts. The researchers conducted a quasi-experiment using individual-level student data from four districts in the same region not adopting the model as a comparison group. The authors conducted simple t-tests by grade level comparing the treatment and comparison group students' value-added scores on the Ohio State Test, but did not adjust for observable differences between the schools and districts in the treatment and control groups, the nested nature of the data, or the repeated hypothesis tests. Given the large sample size (13,291 students) and unsophisticated design, it is perhaps not surprising that the authors found significant differences, favoring the treatment group, in both English language arts and mathematics. The Shen et al. (2020) meta-analysis is indicative of the early state of evidence of the impact of formal teacher leadership programs.
In further review of the literature, we located only two other studies of the impacts of formal teacher leadership programs. The first of the two was an evaluation of three cohorts of Iowa's Teacher Leadership and Compensation Program, conducted by American Institutes for Research evaluators in 2016 (Citkowicz et al., 2017). Iowa's program, launched in 2014, provided state support for districts to create structured systems for teacher leader roles and support aligned with state standards implementation. The impact components of the evaluation examined teacher retention and student achievement impacts.
To assess impacts on student achievement, the researchers used a multi-level interrupted time series analysis that compared trends before and after implementation. The models used district and grade level fixed effects to compare results within district and within grade cohorts, and controlled for non-varying student indicators (i.e., gender, race, ELL status, special education status, and lunch assistance). The overall results showed no impacts in reading or mathematics, except for a negative impact in math in year 3. There were some differences across cohorts, grades, and district sizes: only cohort 2 showed positive effects in years one and two, small districts fared better than larger districts, and there were negative effects in the lower grades and null effects in the middle and high school grades. Students receiving lunch assistance and those on individualized education plans fared worse after the intervention in mathematics.
The second study was an evaluation by a teacher leadership development program called Leading Educators, which conducted analyses of its teacher leadership program in Louisiana and four Michigan districts (Leading Educators, 2019). In both states, Leading Educators worked with the districts to identify school-based teacher leaders and provided them with summer professional development and ongoing coaching and support to work with teachers in their schools. With Leading Educators' support, the participating schools created systems for teacher leaders to facilitate weekly curriculum-aligned professional learning for teachers, to work with teachers on instructional planning, and to lead subject-specific teams to collaboratively set instructional goals and monitor student progress.
The evaluation, which reported only a summary of the analyses rather than detailed results, used both propensity score weights to match students and a difference-in-differences design over three school years to look at the differences between Leading Educators schools and control schools both before and after the program began. The authors reported significant impacts of the Leading Educators teacher leadership program at multiple grade levels in mathematics in both Louisiana and Michigan and in reading in Michigan.
Finally, we note a recent meta-analysis of coaching initiatives that found significant causal evidence of the impacts of coaching on instruction and student achievement (Kraft et al., 2018). These authors defined coaching broadly as "all in-service PD programs where coaches or peers observe teachers' instruction and provide feedback to help them improve" (p. 548). Instructional coaching overlaps in many ways with quasi-formal and formal teacher leadership in that its core activities are structured to support more individualized, sustained, subject and context specific engagements; yet coaching can also have distinct characteristics, including the possible position of coaches outside of the schools or across schools rather than school-embedded teacher leadership.
In this article, we seek to add to the limited research base on the impact of formal teacher leadership programs on student outcomes by examining the results of a teacher leadership program in a moderately sized American suburban school district.

Study Context
The study district is located in the northeastern United States. At the time of the study, the district had 11 total schools: eight elementary schools (Pre-K to Grade 5), one middle school for Grades 6-7, one middle school for Grades 8-9, and one high school for Grades 10-12. In 2018-19 the district had approximately 8,800 students (National Center for Education Statistics, 2019).
The district's Teacher Leadership Program developed and positioned three types of teacher leaders in each school. First, full-time instructional coaches were teachers recruited from the district who were trained centrally and who provided one-on-one instructional support for teachers in their school. Second, each school had both mathematics and English language arts specialists who worked for half the day to coach teachers on content-specific issues and worked with students in small intervention/basic skills groups for the remainder of their day. Third, technology coaches were non-released teachers who assisted their peers with educational technology in their classrooms. All three types of teacher leaders were charged with fostering a collaborative culture of educator support and development in their schools, promoting the use of data to inform instruction, and facilitating teachers' improvements of instruction and student learning. In addition, they sat on their school's leadership team, participating in school-level decision-making. The teacher leaders were also expected to meet monthly with their school's principal for strategic planning.
The district's Teacher Leader Program grew out of a grant that supported teachers to develop learning modules and roll them out with supportive coaching. This helped the district to establish a structure for building teachers' capacity to support their peers' learning. When the grant ended, assistant principal positions in the elementary and middle schools were eliminated to make room in the district's budget to continue the teacher leader roles. Instructional coaches received a salary increase while they were in the role; content specialists received a stipend, and technology coaches also received a small stipend and counted their technology support work as one of their administrative duty periods.
Teacher leaders were prepared to perform these roles with support from the Program's director, who provided a combination of learning activities including summer retreats, monthly role-alike professional development sessions, school walk-throughs, and book clubs. In addition, the teacher leaders were encouraged to participate in the district's Teacher Leadership Academy, a yearlong training program open to all district teachers who want to improve their teaching, leadership, and communication to improve student learning. In 2019, the district's Program was used as a model by the state's Department of Education for its new Teacher Leader Endorsement. Teacher leaders in the district reported to the Program's director, who managed the program as part of a portfolio of district human resource duties for staff development and evaluation.

Research Questions and Study Design
In this study we used an interrupted time series (ITS) design to address three research questions about the effects of the Teacher Leadership Program.
1. What were the impacts of the Teacher Leadership Program on student achievement?
2. How, if at all, did the impacts differ for the subset of students who were in the schools for the entire five years for which we had data?
3. Did the impacts differ by student characteristics and educational classifications?
The program under study was introduced to all the schools in the district in the same year (2016-2017). We considered several designs to develop a counterfactual for the intervention, including identifying one or more comparison districts. Ultimately, we decided that, using an ITS approach, we could estimate the counterfactual by comparing student performance before and after the intervention period. A significant difference in the trend in the post-period theoretically indicates a causal effect of the intervention. This approach rests on the assumption that in the absence of substantial change, past performance can predict future performance (Bloom, 2003).
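To make the counterfactual logic concrete, the following is a minimal sketch, not the analysis reported in this paper. It fits a line to pre-intervention yearly means, projects that line into the post-intervention years, and compares the projection with the observed means. All scores are hypothetical values chosen for illustration, and the year coding (0 for the intervention year) is an assumption of the sketch.

```python
# Hypothetical yearly mean scale scores keyed by years relative to the
# intervention (0 = first intervention year). These are NOT the study's data.
pre = {-2: 740.0, -1: 742.0}
post = {0: 744.5, 1: 747.5, 2: 750.5}


def fit_line(points):
    """Return the least-squares (slope, intercept) for a {x: y} mapping."""
    xs = list(points)
    ys = [points[x] for x in xs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


pre_slope, pre_intercept = fit_line(pre)
post_slope, _ = fit_line(post)

# Counterfactual: what the pre-intervention trend predicts for later years.
counterfactual = {t: pre_intercept + pre_slope * t for t in post}
# Gap between observed and projected means in each post-intervention year.
gaps = {t: post[t] - counterfactual[t] for t in post}
# Change in trend, the quantity an ITS design asks about.
slope_change = post_slope - pre_slope

print(counterfactual, gaps, slope_change)
```

With only two pre-intervention points the fitted line passes through both exactly, which illustrates why a short baseline leaves the projected counterfactual sensitive to any unusual pre-intervention year.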
Importantly, an ITS design also contains several threats to internal validity (Campbell & Stanley, 1963; Wong et al., 2015). Most notably, there is potential for other factors coincident with the treatment (such as simultaneous initiatives, state policy changes, or dramatic changes to the student population) to confound an explanation of the difference in performance between the pre- and post-intervention periods, which threatens the validity of the ITS design. In our discussions with district administrators, we did not identify any major simultaneous state or local policy actions that might confound the results.

Methods
Here we describe the data used for the study, the samples used to address the different research questions, and the specification of the analytic model.

Data
This study used five years of longitudinal, student-level administrative data from all elementary and middle schools in the district. The data were requested directly from the district as part of our ongoing study of teacher leadership in the district. The district provided the five years of data for all students from the 2014-15 through the 2018-19 school year. The district initiated its teacher leadership program in the 2016-17 school year. Thus, our longitudinal dataset contained two years of pre-treatment data and three years of post-treatment data. The dataset also contained information on student race/ethnicity, English learner (EL) status, and economic disadvantage. The district also provided data separately for mathematics and English language arts (ELA) achievement, which served as the outcomes of interest for this study. The demographic and achievement data were linked using unique student identifiers.
Per federal law, state accountability policy required students in Grades 3-8 to take annual assessments in mathematics and ELA. Students were also required to take one additional standardized assessment in each subject in high school; however, depending on course enrollment, there are multiple subject tests (e.g., in math, these can be Algebra I, Algebra II, or Geometry) and these assessments are not required at a particular grade level. Therefore, we excluded the high school data from our analyses. Assessment scores ranged from 650 to 850 for all grades and were scaled horizontally, such that scores can be compared across grades (i.e., a score of 700 in one grade is considered equivalent to a score of 700 in any other grade). For the ELA analysis, we focused on Grades 3-8, in which students took the state standardized summative ELA assessment. For the mathematics analysis, we focused on Grades 3-7, in which students took the state standardized summative mathematics assessment before they branched into the subject tests. By doing this, we excluded 367 seventh-grade students across the five years who took the Algebra I exam in lieu of the mathematics assessment, and we excluded all eighth-grade scores due to a substantial portion of the eighth-grade population who took the Algebra I or Geometry assessments instead of the mathematics assessment (93% of students in eighth grade).
Given the exclusion of students in eighth grade and above in mathematics, the full mathematics sample represented students from 9 of the 11 district schools (excluding one middle school and the high school). The full ELA sample represented students from 10 schools (excluding the high school).

Study Samples
In this study, we examined effects of the Teacher Leadership Program on two samples of students. The first sample was the full set of 7,348 unique students in the grades of interest for each subject. We refer to this as the 'full analytic sample.' Importantly, however, there was a small but notable amount of student mobility from the full analytic sample each year (i.e., students leaving the sample), as shown in Table 1. The average full analytic sample mobility rate was 5% for ELA and 12% for mathematics. The mobility rate was higher in mathematics due to students in seventh grade who took the Algebra I exam in lieu of the mathematics exam and thus were removed from the full analytic sample. However, as shown in Table 1, this switching of tests does not fully account for annual student mobility. The mobility rates in grades 3-5 ranged from 3%-7% per year, excluding the Grade 6 mobility rate in mathematics that was substantially higher due to students taking the Algebra exam. These small but cumulative non-test-related mobility rates could introduce selection bias into the results if students enter or leave the district for a reason related to quality of instruction or other intervention opportunities (e.g., to receive certain services). Given potential validity threats due to mobility, we also estimated the effects of the Program on the sample of 1,022 unique students who were present in the data for all five consecutive years of the study, which we refer to as the 'stable analytic sample.' Descriptive statistics for the full and stable analytic samples are presented in Table 2. In the full analytic samples, there was a significantly higher percentage of White students (in math), Black students, and economically disadvantaged students (i.e., those who qualified for free or reduced-price lunch) in math in the years after the intervention was implemented. There was also a significantly lower percentage of English learners.
There was also a significantly lower percentage of other/mixed-race students in both subjects after the intervention was initiated. One assumption of ITS is that "the characteristics of the populations remain unchanged throughout the study period" (Kontopantelis et al., 2015, p. 2). The differences in pre- and post-intervention demographics suggest that the district population saw some demographic changes throughout the study period, which may threaten internal validity.
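The distinction between the full and stable analytic samples can be illustrated with a short sketch. The records, student identifiers, and the year labels (spring of each school year, so 2015 stands for 2014-15) are hypothetical, and the exit-rate function is one plausible way to operationalize students leaving the sample rather than the exact calculation behind Table 1.

```python
from collections import defaultdict

# Hypothetical (student_id, spring_year) test records; NOT the study's data.
records = [
    ("s1", 2015), ("s1", 2016), ("s1", 2017), ("s1", 2018), ("s1", 2019),
    ("s2", 2015), ("s2", 2016), ("s2", 2017),
    ("s3", 2017), ("s3", 2018), ("s3", 2019),
    ("s4", 2015), ("s4", 2016), ("s4", 2017), ("s4", 2018), ("s4", 2019),
]
STUDY_YEARS = set(range(2015, 2020))  # five study years, 2014-15 to 2018-19

# Collect the set of years in which each student has a test record.
years_by_student = defaultdict(set)
for student_id, year in records:
    years_by_student[student_id].add(year)

# Full sample: every student observed at least once.
full_sample = set(years_by_student)
# Stable sample: students observed in all five study years.
stable_sample = {
    sid for sid, years in years_by_student.items() if STUDY_YEARS <= years
}


def exit_rate(year):
    """Share of students tested in `year` with no record in `year + 1`."""
    present = {sid for sid, years in years_by_student.items() if year in years}
    left = {sid for sid in present if year + 1 not in years_by_student[sid]}
    return len(left) / len(present)


print(sorted(full_sample), sorted(stable_sample), exit_rate(2017))
```

Because the stable sample is defined by presence in every year, its composition cannot change over time, whereas the full sample gains and loses students between years.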
By definition, the stable analytic sample offers a consistent sample across the full study time period. Though race, as a socially defined construct, is a time-varying characteristic, the stable analytic sample shows race as fairly consistent over time. The exception is a marginally significant difference in the percentage of Black students in math from pre- to post-intervention, which may be due to how individuals identify their race in district documentation. Notably, however, English learner (EL) status changed significantly from pre- to post-intervention observations in both ELA and math.
There were significantly fewer students in the stable analytic sample with EL status (a change from 14% to 0.1% in ELA and from 24% to 0.5% in math), suggesting that the vast majority of students identified as EL prior to the 2015-16 school year exited EL status after the teacher leadership initiative began. Upon further examination, this drop in ELs occurred the year before the intervention began, not coincident with the intervention. But this represented a potential confounding variable in our estimation, as overlapping EL initiatives or policy changes could account for differences we saw in student achievement.
Also notable is the comparison between those students in the stable analytic sample versus those in the full analytic sample. In other words, how were those students in the stable sample similar to or different from those students who entered or left the district during the five-year study period? Table A1 in the Appendix provides descriptive statistics that compare students in the stable analytic sample versus students who were in the full analytic sample. Notably, the stable samples for both subjects had a significantly higher percentage of White students and English learners compared to the students in the full sample. The stable sample in ELA also had a significantly lower percentage of Black students and economically disadvantaged students, and the stable sample in math had a significantly lower percentage of mixed-race students. In other words, Black students and students who were economically disadvantaged were more likely to not be in the ELA sample for all five years, and mixed-race students were more likely to not be in the math sample for all five years. These patterns align with studies of student mobility, which show that students of color and low-income students are more likely to change schools than their White and more affluent peers (Ashby, 2010).
Notes. Mean (sd). Achievement values are on a scale of 650 to 850. All other variables are proportions. T-tests assess differences between pre- and post-intervention samples. Asterisks indicate a significant difference between pre- and post-intervention samples in the subject indicated. Coefficients statistically significant at *p < 0.10, **p < 0.05, and ***p < 0.01.

Model Specification
In order to make claims about the impact of teacher leadership on student performance, we must consider factors that may confound the relationship between teacher leadership and student performance, such as characteristics of the school or student-level characteristics. The basic regression equation we modeled is as follows:
$$Y_{ist} = \beta_0 + \beta_1 \text{Time}_t + \beta_2 \text{TeacherLeadership}_t + \beta_3 (\text{Time}_t \times \text{TeacherLeadership}_t) + X_{ist}\Gamma + S_{st}\Delta + \varepsilon_{ist}$$
where $Y_{ist}$ equals student achievement on either ELA or math for student $i$ in school $s$ in year $t$. "Time" is a variable taking on the values of -2 to 3, indicating the number of years pre or post the start of the intervention (for the intervention year, Time = 0). "TeacherLeadership" is a time-invariant indicator that equals 1 in the post-intervention years and 0 for the pre-intervention years. $\beta_3$ is the ITS estimator and the parameter of interest: it indicates the change in the slope of the outcome variable in the post-intervention period relative to the pre-intervention period. In this model, we also controlled for a series of student- and school-level characteristics directly.
$X_{ist}$ is a vector of time-invariant and time-varying student characteristics (race/ethnicity, EL status, and economic disadvantage), and $S_{st}$ is a vector of school-level covariates, including percentages of student composition by race/ethnicity, ELs, and economic disadvantage. In each model, standard errors are clustered at the school level.
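As an illustration of how the ITS estimator is obtained, the sketch below regresses a simulated outcome on an intercept, Time, the post-intervention indicator, and their interaction, using ordinary least squares written out in plain Python. It is a simplified stand-in for the model described above: it omits the student and school covariates and the clustered standard errors, it assumes five annual observations coded -2 through 2 with 0 at the intervention year, and the data and "true" coefficients are invented for the example.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def design_row(time):
    """Columns: intercept, Time, TeacherLeadership, Time x TeacherLeadership."""
    post = 1.0 if time >= 0 else 0.0
    return [1.0, float(time), post, time * post]


# Hypothetical coefficients used to simulate noise-free scores (illustration only).
TRUE_BETAS = [740.0, 2.0, 1.0, 1.5]
times = [-2, -1, 0, 1, 2]
rows = [design_row(t) for t in times for _ in range(3)]  # 3 students per year
scores = [sum(b * x for b, x in zip(TRUE_BETAS, row)) for row in rows]

b0, b1, b2, b3 = ols(rows, scores)
print(round(b1, 6), round(b3, 6))  # pre-period slope and change in slope
```

The coefficient on the interaction column plays the role of the ITS estimator: it is the difference between the post-intervention slope and the pre-intervention slope.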
Though we controlled for student- and school-level factors that may confound the impact of teacher leadership on student achievement, there is potential for unobserved confounding factors associated with schools. Thus, we incorporated school fixed effects to isolate the effects of teacher leadership. School fixed effects account for other potential time-invariant characteristics at the school level that may be associated with student achievement, such as school-level policies related to ELs, curriculum and instruction, and school leadership. Finally, after running the ITS models for the full and stable analytic samples, we estimated whether there were heterogeneous treatment effects for subgroups in each sample.
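One common way to see what school fixed effects do is the within-school (demeaning) transformation, sketched below with a single predictor. This is an illustrative device under stated assumptions, not the estimation routine used in the study: the two schools, their baselines, and the scores are hypothetical, and the example is constructed so that the school with the higher baseline also has higher predictor values.

```python
from collections import defaultdict


def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def demean_by_group(values, groups):
    """Subtract each group's mean from its members (within transformation)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for value, group in zip(values, groups):
        totals[group] += value
        counts[group] += 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


# Hypothetical data: both schools share a true slope of 2, but school B has a
# baseline 20 points higher and also higher predictor values.
schools = ["A", "A", "A", "B", "B", "B"]
x = [0.0, 1.0, 2.0, 2.0, 3.0, 4.0]
y = [700.0, 702.0, 704.0, 724.0, 726.0, 728.0]

pooled = slope(x, y)  # confounded by the between-school baseline gap
within = slope(demean_by_group(x, schools), demean_by_group(y, schools))

print(pooled, within)
```

The pooled slope absorbs the baseline gap between schools, while the within-school slope recovers the common slope, which is the sense in which fixed effects remove time-invariant school characteristics.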

Descriptive Results
The Achievement row at the bottom of Table 2 provides descriptive results on mean outcome measures for both the full and stable analytic samples. These descriptives indicate that average ELA achievement significantly increased after the start of the intervention in both the full and stable analytic samples. In mathematics, the mean achievement was significantly higher post-intervention in the full analytic sample but not the stable analytic sample.
Prior to running our models, we also looked at descriptive trends in outcome measures over time (Figures 1 and 2). We draw attention to several important features of the descriptive trends. First, while the mean achievement was higher in ELA, the slopes for both the full and stable analytic samples pre- and post-intervention appear consistent. Second, the full analytic sample in math shows a fairly steep increase in the pre-intervention years, followed by a flat slope across the three post-intervention years. Third, in each graph, there appears to be a slight increase in mean scores the year before the intervention. Collectively, these results suggest that there may be some instability in the pre- and post-intervention trends.

Figure 1
Mean ELA Achievement by Year

Treatment Effect on Student Achievement
The analyses that address the first research question about the effects of the teacher leadership program on student achievement are shown in Table 3. As represented by the interaction of the intervention and time in Table 3, the results from the ITS models indicate that the teacher leadership initiative had a positive effect in ELA in both samples, but these results were not statistically significant because the estimates were imprecise (the standard errors were large relative to the point estimates). In mathematics, there was a significantly negative effect on students in the full analytic sample and a large and significantly positive effect on students in the stable analytic sample. These findings may indicate that the Teacher Leadership Program had a positive effect on math performance for non-mobile students, i.e., those students who received consistent teacher leadership supports.

Subgroup Analyses
Analyses of subgroups for the full analytic sample and stable analytic sample of students are shown in Tables 4 and 5, respectively. For the full analytic sample (Table 4), the subgroup results in ELA indicate that there were no positive effects for any subgroup of students. In mathematics, there was a significant negative effect for all populations except Black students, for whom there was a significant positive effect. In the stable analytic sample (Table 5), the results showed that in ELA there were no positive effects for any subgroup of students. In mathematics, there was a significant positive effect for male, White, and Black students. It appears that the overall positive effects in the stable analytic sample in mathematics were driven by Black student achievement. However, we interpret this estimate with some caution, given that Black students only represented about 5% of the sample (see Table 2).

Notes. Each column is a separate regression. Standard errors clustered at the school level. Intervention is an indicator variable that equals 1 for years after the intervention was implemented (2016-17 to 2018-19) and 0 otherwise. Time is a categorical variable indicating the number of years before or after the intervention year (2016-17); during the 2016-17 academic year, Time is zero. All regressions include school-level demographic controls (student characteristics aggregated at the school-by-year level) and school-level fixed effects. Observations represent total observations in the dataset across years, not individual students; in some cases, the number of observations is not divisible by 5 due to slight fluctuations in reporting of student characteristics across years. Heterogeneous effects for ELs for the stable analytic sample could not be calculated due to small sample size. Coefficients statistically significant at *0.10, **0.05, and ***0.01.

Study Limitations
This study has several limitations. First, the treatment had no comparison group, which makes it difficult to rule out competing coincident explanations for any significant findings (Wong et al., 2015). Thus, we caution against interpreting these findings causally. Rather, we offer these results as a quasi-experimental analysis of the effects of teacher leadership on student achievement. Second, the small samples of Black and Hispanic students (about 6-7% of the students) may make the estimated impacts for these groups less reliable. Replicating the impacts for minority students is an important next step to verify these impacts of teacher leadership. Third, and similarly, the large drop in the share of English learners from the full sample (about 8%) to the stable sample (about 4%) could reflect mobility, reclassification, or both, and may also have influenced the results. Finally, like most impact studies, these global effects do little to disentangle which elements of the teacher leadership program are more or less impactful.

Discussion
Programs that develop and place teachers in formal leadership roles in schools to work part or full time with their colleagues on instructional improvement are increasingly widespread (Berg et al., 2019). However, much of the literature on the impact of teacher leadership suffers from either general notions of teacher leadership or descriptive designs (e.g., Ingersoll et al., 2017; Li & Liu, 2022; Sebastian et al., 2017). From a policy perspective, it is important to accrue more rigorous evidence of the impact of formal teacher leadership programs to test the efficacy of this growing phenomenon.
In this study, we contribute to the literature on the impact of formal teacher leadership programs by examining a district-wide program in a small district (11 schools). The program provided professional development and support for fully released instructional coaches and part-time ELA and mathematics content specialists to work with teachers on instructional improvement in each school.
To assess the impacts of the program, we used an interrupted time series design, which has strong internal validity except for its Achilles' heel: a coincident event that may be confounded with the initiation of the treatment. While there was a decline in students designated as English Learners around the time of the intervention, this process appeared to precede the intervention, and we know of no other major similarly timed reforms.
Nonetheless, the results of this study are complex and point to the differences in impacts as a consequence of both student mobility and decomposition by subgroups. The results using the full analytic sample showed no impacts on student ELA performance and a small negative effect of the treatment on mathematics performance (4 points on a test where the average performance was 760). However, when we analyzed the stable analytic sample of students for whom we had data for the full five years, the mathematics result was a robust, statistically significant, and positive 15-point improvement associated with the treatment. The impacts in ELA were even larger (21 points), although not statistically significant. While we only have demographic data on students to speculate about the reason for the difference in the results between the full analytic sample and stable analytic sample in mathematics, it is notable that, relative to the full sample, the stable sample has a greater proportion of White students, fewer Black students, and a smaller proportion of economically disadvantaged students (as shown in Table 1).
The patterns we saw in the full and stable analytic samples generally held in the analyses decomposing by subgroups: in the full analytic sample there were no differences in ELA for any subgroup, and in mathematics there were negative effects for economically disadvantaged students, boys, White students, and Hispanic students, although there was a marginally significant positive effect for Black students. Yet many of these results flipped in the analyses of the stable analytic sample, where boys performed significantly better from pre- to post-intervention by 19 test points, as did White (10 points) and Black (38 points) students. However, we interpret these minority student effects with caution due to their small sample sizes.
Even so, there are several possible interpretations for these results. One is a dosage effect: the stable sample of students who spent more time in teacher leader supported schools had detectable impacts, while those who had more fleeting experiences did not. Another possible interpretation can be found in the different compositions of the full and stable analytic samples, where more advantaged students (regardless of race) may have disproportionately benefitted from the district's teacher leader initiative. Finally, we note the subject matter differences in the results between mathematics and ELA, and hypothesize that in mathematics, where teachers generally have less expertise relative to ELA, students may have benefited from the extra coaching and support (Campbell & Malkus, 2011; Harris & Muijs, 2004).
These findings contribute to the need for more impact evidence about increasingly popular formal teacher leadership interventions being adopted in districts and schools across the United States. As a single study, which inevitably raises as many questions as it addresses, these mixed findings suggest that teacher leadership is not a panacea, but that the reform has some promise to enhance the instructional experiences that contribute to student learning.
Yost, D. S., Vogel, R., & Liang, L. L. (2009). Embedded teacher leadership: Support for a site-based model of professional development. International Journal of Leadership in Education, 12(4), 409-433.

Notes. Each column is a separate regression. Stable analytic sample is an indicator variable that equals 1 for students who are in the stable analytic sample (i.e., have 5 years' worth of data) and 0 otherwise. Coefficients statistically significant at *0.10, **0.05, and ***0.01. Standard errors in parentheses.