Math and Science Outcomes for Students of Teachers from Standard and Alternative Pathways in Texas

We assess the impact of teachers from different preparation pathways on Algebra I and Biology learning outcomes in Texas. Data come from the state of Texas for academic years 2010-2011 and 2011-2012. We examine both novice and experienced teachers. We make three sets of comparisons, ranging from broad pathways to specific programs. First, we compare teachers from all standard university programs to all teachers from alternative certification programs. Second, we select teachers from a collection of leading universities and compare them with alternatively certified novice teachers for the same subjects in the same schools. Third, we repeat this process for teachers from UTeach, a STEM-specific university-based program with 8 sites in Texas. Students whose teachers came from a standard pathway gain around one more Month of Schooling than those whose teachers followed an alternative pathway. Effects are larger for mathematics than for science. For some subgroups, including Economically Disadvantaged and Gifted students, the advantage of having teachers from standard programs may be as large as 6 to 9 Months of Schooling.


Introduction
The United States faces a growing shortage of teachers, with recent estimates of 100,000 more new teachers needed each year than are available (Sutcher, Darling-Hammond, & Carver-Thomas, 2019). Conversations about the complex issue of teacher shortages often turn to the certification pathways that are available to prospective teachers and, complicating the issue, to how pathways that increase the quantity of teachers may not provide the quality of training required for them to be successful. Alternative certification programs grew out of the desire to develop market-based (Hess, 2002), disruptive (Christensen, 1997) competitors to traditional university programs, affording individuals who have already completed a bachelor's degree an expedited route into the classroom. Given the complementary features and goals of these certification programs, it is clear that one cannot consider the issues of teacher shortages and preparation programs independently. Texas provides a unique opportunity to explore these related issues since it prepares more teachers and has also proceeded further towards obtaining teachers from non-standard pathways than any other state.
The central aim of this study is to inform decisions on whether to allow and how to regulate alternative certification programs leading to rapid certification. Underlying the issue of teacher quality and teacher shortages is a complex system of interconnected costs and benefits. Regulation of any certification pathway should be fully informed by an understanding of the implications for both the number and adequate preparation of teachers. We focus on STEM teachers, examining teacher retention and student test scores in Algebra I and Biology. In this area of teaching, because of the severe shortages we document, any effort to increase the supply of teachers must not be dismissed casually. We will show, particularly in Algebra I, that students learn more in classes of teachers from standard programs, and that since 2008, alternatively certified teachers have not persisted in teaching as long as those from standard programs. Yet the sizes of these effects are not large enough to justify precipitous actions, and thus our conclusions and policy recommendations focus on how alternative certification pathways might fit into the ecosystem of STEM teaching in a way that disrupts neither this complex system nor students' opportunities to learn.
Some of the most acute shortages of secondary teachers arise in the STEM fields, as indicated by surveys of employers (American Association for Employment in Education, 2016) or by the data in Table 1. Nearly 40% of mathematics teachers either lack full teaching certification or lack a major or minor in mathematics. In the physical sciences, over 60% of teachers lack one or the other of these qualifications. If one uses Advanced Placement Computer Science (CS) exams (College Board, 2016) as the basis for an estimate, fewer than 20% of U.S. high schools even offer computer science. Such shortages are poised to become greater because the number of teachers prepared in the highest-producing states has been falling (Figure 1). The decline in production of teachers is particularly striking in New York and California. Texas, which now produces far more teachers than any other state despite having a smaller population than California, has also seen a drop, but it is less severe overall, and the trend in production has been upward since 2011-2012. The main reason for this is that Texas now obtains most of its teachers from alternative certification programs, and these are dominated by a few companies that specialize in rapid online teacher certification. Because alternatively certified teachers are growing in number and alternative certification programs are being used to ameliorate teacher shortages, it is of interest to determine whether and in what way preparation pathways affect teacher quality.
We focus on STEM teachers in particular, for which Figure 2 shows production has been declining as well. The difficulty of recruiting STEM teachers has been a subject of concern at least since the Gathering Storm report (Augustine, 2006). The largest effort at the Federal level to increase the number of STEM teachers is NSF's Noyce Scholarship program (National Science Foundation, 2016). In 2012, the most recent year for which data are available, around 1250 unique individuals obtained a first year of scholarship or stipend support through Noyce (Bobronnikov et al., 2014). This is less than 1% of the national need. Thus the institutions that have traditionally supplied the United States with teachers are not supplying enough in STEM shortage areas, and programs intended to rectify these problems are doing so at a smaller scale than would be needed to solve the problem.
Not only have teacher shortages persisted for decades (National Commission on Excellence in Education, 1983), but the quality of preparation programs in colleges of education has also been subject to criticism. Standard preparation programs stand accused on the one hand of being an "industry of mediocrity" (Greenberg, McKee, & Walsh, 2013) and on the other hand of surviving despite low quality because they are a "cash cow" for universities (Duncan, 2010). Some reformers have advocated allowing almost anyone with a content degree to enter teaching, and sorting out who is qualified to continue by assessing their performance on the job (Hess, 2002). Value-added models have specifically been recommended as the way to identify effective teachers (Gordon, Kane, & Staiger, 2006). The combination of skepticism about colleges of education and concern over teacher shortages has motivated policy-makers across the country to allow alternative pathways to teaching. Alternative certification programs now exist in all states, but there is great variation in the regulations that control what they are able or not able to do.
Some alternative certification programs live within universities and differ only slightly from the standard programs at the same institutions. Others are web-based and come close to implementing the recommendation that almost anyone with a content degree should be allowed into teaching as rapidly as possible.
In the general context of market-based education reform, national policies affecting educator preparation are developing to incorporate two new ingredients: accountability from value-added models, and parallel preparation systems.
Accountability systems can use student test scores not only to evaluate the effectiveness of educators, but to evaluate the effectiveness of educator preparation programs. This nearly became mandatory across the country in 2016. In the fall of that year, the US Department of Education released guidance that directed every state to develop ratings of each Teacher Preparation Program (TPP). The guidance required states to "make meaningful differentiations in teacher preparation program performance" (US Department of Education, 2016, p. 670). To accomplish this goal, "For each year and each teacher preparation program in the State, a State must calculate the aggregate student learning outcomes of all students taught by novice teachers," where a novice teacher is a "teacher of record in the first three years of teaching" (US Department of Education, 2016, p. 656). Poor performance according to these measures could lead to consequences as severe as program closure. The rules spurred considerable opposition (Tatto et al., 2016), and they were eventually rescinded by Congress in the spring of 2017. Even so, this idea has not gone away at the state level. For example, current rule holds Texas TPPs accountable for "achievement, including improvement in achievement, of students taught by beginning teachers for the first three years following certification, to the extent practicable" (Texas Education Agency, 2019). Technical challenges have so far prevented implementation.
The second development is the expansion of parallel teacher preparation systems that transform teacher development from a preservice to an in-service activity. Parallel systems have mainly drawn attention in connection with Teach for America and New York City Teaching Fellows (Shiva Mungal, 2015; Shiva Mungal, Trujillo, & Scott, 2016). We will report here on the intersection of parallel preparation systems with for-profit certification, which provides the possibility of rapid growth to large scale.
Much of the research addressing these reforms has focused on the use of student test scores to hold teacher preparation programs accountable. The main finding, as we will review in the next section, is that it is very difficult to discern differences between programs based on the learning gains of the students of their graduates. These research findings may have been published with the expectation of warding off use of value-added models to evaluate educator preparation programs. However, a consequence, intentional or not, has been to provide a research base that supports expansion of for-profit alternative teacher preparation pathways. For if there is no measurable difference in student test scores between teachers from standard and alternative programs, then there is no reason states should not change their rules to enable alternative pathways.

Background
The study is set in Texas. This is for the simple reason that, as shown in Figure 1, Texas stands out from the rest of the country in the number and fraction of teachers coming from alternative certification programs. More than two decades ago Texas created a parallel certification structure that operated without much attention and now provides more than half of the state's new teachers each year. The Texas experience should be of interest to the rest of the country because the Texas experience may be coming to the rest of the country. The two largest Texas companies providing alternative certification are expanding to other states. In the period between 2016 and the end of 2018 these companies secured permission to operate in Florida, Nevada, Utah, Indiana, South Carolina, North Carolina, Hawaii, Arizona, Michigan, Louisiana, and the District of Columbia (iTeach, 2019; Teachers of Tomorrow, 2018). Additional states are likely to follow suit. Thus there should be interest in acquiring evidence about the effects on students of this growing force in education.

Alternative Certification in Texas
Alternative certification of teachers was first permitted in Virginia in 1982, soon followed by California, Texas, and New Jersey (Suell & Piotrowski, 2007). Alternative certification is difficult to define precisely and can encompass a wide range of programs. In broad terms, alternative certification refers to "pathways designed to attract a wider range of candidates into teaching generally by reducing or eliminating pre-service education coursework and speeding paid entry into the classroom" (Grimmett & Young, 2012, p. 34).
The first alternative certification program in Texas was offered by the Houston Independent School District in 1985, with other school districts following suit. In 2000, the first community college offered an alternative certification program, and the first for-profit company began certifying teachers in 2002 (Etheredge, 2015). Figure 3 shows the numbers of mathematics and science teachers prepared through university-based programs and non-university alternative certification programs in Texas since 2004. From 2004 until 2011, the numbers of alternatively certified teachers rose steadily. In the fall of 2011, production from alternative certification programs suddenly dropped by half, due to widely publicized budget cuts that led tens of thousands of teachers to face dismissal. In such an environment, a drop in the number of people seeking to change careers and enter teaching through an alternative certification program is to be expected. Production eventually recovered, and by 2017 over half of Texas' new STEM teachers were certified in alternative programs. Figure 4 shows the total number of STEM teachers prepared since 2004. As this figure shows, the number of STEM teachers prepared in 2016-2017 was essentially unchanged from a decade before. Therefore, introducing alternative certification programs in Texas has offset the decline in the production of STEM teachers from standard certification pathways illustrated in Figure 3, but has not led even to increases one might expect just on the basis of population growth.
The National Research Council describes one particular difficulty in assessing the effectiveness of teacher preparation programs (NRC, 2010): "there is more variation within categories such as 'traditional' and 'alternative' -- and even within the category of master's degree programs -- than there is between the categories" (p. 2). This worry is less applicable to Texas than it may be to other jurisdictions. In practice, the standard and alternative pathways are so different as to form parallel certification systems (Shiva Mungal, 2015).
Alternative certification emerged from the philosophy that barriers to teaching should be removed. The candidates, all of whom have already finished a first Bachelor's degree, usually have a few weeks of instruction and observation, after which they enter the classroom working full time and complete their pedagogical coursework during an internship year under a Provisional (or Internship) certificate. Thus, for alternative certification candidates, teacher preparation is an in-service program.
By contrast, standard university programs provide coursework, often but not necessarily as part of a degree, as well as fieldwork prior to a student teaching semester. The candidates then begin teaching full time with a standard certificate. Thus, in the standard model, teacher preparation is a preservice program. This distinction is not apparent from the state rules, since universities are permitted to place students in schools with Provisional certificates, while alternative certification programs are permitted to offer their candidates a student teaching experience. Nevertheless, when we describe our study sample, we will show that the standard and alternative programs are largely distinct when it comes to specific preparation practices.
There is an additional respect in which one might wonder if standard and alternative pathways differ, and that is the undergraduate major of the students. However, from 1987 until 2019, secondary teachers were forbidden to major in education in Texas. Therefore, for all pathways, the teachers have majored in one of the subjects they will teach, typically mathematics for mathematics teachers, and biology for biology teachers.

The Use of Value-Added Modeling to Evaluate Teachers and their Preparation Programs
Value-added modeling arose from work of Hanushek (1971) and Sanders & Rivers (1996). Value-added models use multilevel linear regressions to estimate the expected scores of students in a classroom, based on prior test scores and demographic characteristics. Deviations between this estimate and actual classroom performance can be attributed to the skill of the teacher.
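The idea in the preceding paragraph can be illustrated with a deliberately simplified sketch. It is not the model used in this study: it replaces the multilevel regression with a single ordinary least-squares fit on prior score alone, omits demographic controls, and uses hypothetical toy data. The function names `fit_line` and `teacher_value_added` are illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def teacher_value_added(records):
    """records: list of (teacher_id, prior_score, current_score).

    Returns {teacher_id: mean residual of that teacher's students},
    i.e. how far the classroom sits above or below the expected score.
    """
    prior = [r[1] for r in records]
    current = [r[2] for r in records]
    a, b = fit_line(prior, current)
    residuals = defaultdict(list)
    for teacher, p, c in records:
        residuals[teacher].append(c - (a + b * p))
    return {t: mean(v) for t, v in residuals.items()}

# Hypothetical toy data: students of teacher "A" score 2 points above
# the overall trend, students of teacher "B" score 2 points below it.
records = [
    ("A", 40, 47), ("A", 50, 57), ("A", 60, 67),
    ("B", 40, 43), ("B", 50, 53), ("B", 60, 63),
]
value_added = teacher_value_added(records)
```

In this toy case the classroom-level mean residuals recover the built-in offsets. With real data the residuals also absorb noise and any omitted student or school characteristics, which is why the attribution to teacher skill is an assumption rather than a guarantee.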
Value-added models are particularly controversial when they are used to make high-stakes decisions about individual teachers (Amrein-Beardsley, 2009; Darling-Hammond, Amrein-Beardsley, Haertel, & Rothstein, 2012). One of the largest problems is that the estimates are noisy. It violates basic fairness to make promotion and dismissal decisions about individual teachers based on a volatile stochastic process, even if the estimates used for the decisions are shown to be right on average.
In addition to large margins of uncertainty, value-added models are prone to systematic bias, and small technical changes can have the effect of raising or lowering the apparent value added by whole groups of teachers. This problem should be possible to address with technical improvements, and the conventions that guide practitioners are evolving (Koedel, Mihaly, & Rockoff, 2015; Rivkin, Hanushek, & Kain, 2005). For example, it was once common to see models where students' prior test scores entered to linear order, and this created a systematic bias in favor of teachers whose students' average prior scores were at the high or the low end of the possible range (Marder, 2012). In the last five years it has become common (e.g., Backes, Goldhaber, Cade, Sullivan, & Dodson, 2018) to include students' prior scores to quadratic or cubic order, as a result of which this particular source of bias goes away.
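The functional-form bias described above can be reproduced in a small sketch. The data are hypothetical: the relation between prior and current score is assumed to be slightly curved and to contain no teacher effect at all. The helper names `polyfit` and `predict` are illustrative, and the fit uses plain normal equations rather than any particular statistics package.

```python
def polyfit(x, y, degree):
    """Least-squares polynomial coefficients [c0, c1, ...] via normal equations."""
    n = degree + 1
    # Augmented matrix of the normal equations (X^T X | X^T y).
    A = [[sum(xi ** (i + j) for xi in x) for j in range(n)]
         + [sum(yi * xi ** i for xi, yi in zip(x, y))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))

# Hypothetical standardized prior scores and an assumed curved relation.
prior = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
current = [0.8 * p + 0.1 * p ** 2 for p in prior]

linear = polyfit(prior, current, 1)
quadratic = polyfit(prior, current, 2)

resid_linear = [c - predict(linear, p) for p, c in zip(prior, current)]
resid_quadratic = [c - predict(quadratic, p) for p, c in zip(prior, current)]
```

With the linear specification, students at both ends of the prior-score range have positive residuals and students in the middle have negative ones, so a teacher whose class sits at either extreme would appear to add value that is really an artifact of the model. The sign of the artifact depends on the curvature assumed here. Once the quadratic term is included, the residuals vanish to within rounding.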
Value-added models can be used in three separate ways in relation to teacher preparation. First, they can be used to evaluate student learning gains due to teachers from specific programs. Examples here are evaluations of Teach for America (Clark et al., 2013; Decker, Mayer, & Glazerman, 2004; Turner, Goodman, Adachi, Brite, & Decker, 2012) and of UTeach (Backes et al., 2018). According to Clark et al. (2013) the difference in value-added effectiveness between TFA graduates and those of comparison programs is 0.06 standard deviations, and the value added by UTeach graduates is on the same order. These findings set the scale for the largest value-added differences one might expect to find for individual programs.
Second, value-added models can be used to compare the individual preparation programs within a state, and here previous Texas results are of particular relevance for us. Mellor et al. (2008) studied student test scores in classrooms of novice teacher graduates from University of Texas System campuses, with data from 2003 through 2007. Their primary goal was "to determine how student achievement in the classroom might be used as an indicator of the success of teacher preparation programs" (p. 8). At the time there was no state-wide data system in place and they spent years obtaining data from over 400 districts. They carried out a variety of comparisons with multilevel models, but almost none of the effects they found was large enough to rule out having been caused by sampling uncertainty. They sum up by saying, "Our most significant finding was that limitations of most state data and assessment systems, including the one in Texas where our study was conducted, make this kind of research difficult" (p. 24). Six years later, the problem of evaluating learning gains due to Teacher Preparation Programs (TPPs) was revisited by von Hippel et al. (2016), now with the advantage of a statewide data set. They could not detect program differences and conclude, "The potential benefits of TPP accountability may be too small to balance the risk that noisy TPP estimates will encourage needless, disruptive, and ineffective policy actions" (p. 2). These conclusions are similar to the findings of Koedel, Parsons, et al. (2015) in Missouri, and have since been extended to several other states (von Hippel & Bellows, 2018).
Third, value-added models can be used to examine the efficacy of broad classes or types of preparation pathways (Harris & Sass, 2011). The present study of teacher preparation pathways in Texas falls into this third class. Some studies (Gordon et al., 2006) conclude that factors such as preparation routes and advanced degrees have almost no measurable effects on student outcomes. Boyd et al. (2012), analyzing some of the same New York City data used in earlier work, conclude that differences in teacher background can be detected; the difference between the studies lies in how the models were constructed. The models of Boyd et al. (2012) pay more attention to grouping teachers with similar characteristics and from similar programs. The largest single effect in the base model of Boyd et al. (2012) is that of a teacher having five years of experience, which corresponds to a value-added gain of 0.1 standard deviations in student test scores for middle school mathematics. The largest program differences, which are for Teach for America corps members, are around 0.05 standard deviations, while for College Recommended teachers the effect is around 0.02 standard deviations.
The net result of studies comparing different types of teacher preparation programs has been sustained uncertainty. The National Research Council determined that "Because the information about teacher preparation and its effectiveness is so limited, high-stakes policy debates about the most effective ways to recruit, train, and retain a high-quality teacher workforce remain muddled" (NRC, 2010, p. 5). Grossman & Loeb (2008, p. 185) similarly conclude that "[t]he available research does not paint a complete picture of either optimal recruiting and selection criteria nor optimal preparation opportunities." In the absence of convincing results about pathways, it is not surprising that the US Department of Education decided that "effectiveness of graduates is not associated with any particular type of preparation program, [so] the only way to determine which programs are producing more effective teachers is to link information on the performance of teachers in the classroom back to their teacher preparation programs" (US Department of Education, 2016, p. 566).
Although some scholars argue that noisy value-added model estimates make it difficult to reliably determine the effectiveness of individual TPPs (von Hippel & Bellows, 2018), there is some experimental evidence that the estimates can be accurate when averaged over a sufficiently large number of teachers. As such, using value-added models to evaluate teacher preparation pathways can provide reliable estimates of their efficacy in improving student test scores (Constantine, Player, Silva, Grider, & Deke, 2009; Guarino, Santibanez, & Daley, 2006; Wayne & Youngs, 2003). Rather than determine the efficacy of individual TPPs, we seek to estimate the effects of alternative and standard teacher preparation pathways on student achievement using statewide data available in Texas. Given the large number of teachers we include in our study, the estimates we provide for the efficacy of teacher preparation pathways are reliable.

Theoretical Framework
We make use of value-added models, which are a standard tool in addressing questions such as those in this paper. A difficulty of these models, which is common to studies using administrative data rather than random assignment, is that they permit confounding associations.
There are no experimental studies that address directly the questions we pose. Clark et al. (2013) and others switched teachers randomly between classrooms in a school. This randomization would be useful if teachers from some pathway were preferentially assigned to certain sorts of students within each school. We examine this possibility later and find no evidence for it. The main confounding association we actually find is that schools with higher percentages of disadvantaged students are more likely to have alternatively certified teachers.
An experimental study to eliminate this confounding association would need to allow teachers to be hired into schools by the normal process and then, once they had been given their teaching assignments, randomly switch them between schools. The practical and ethical barriers to such a study are so considerable it is unlikely ever to be carried out.
The modern theory of causality provides tools to permit thinking about such problems while designing and analyzing data. Two important figures are Rubin and Pearl (Imbens & Rubin, 2015; Pearl, 2009; Rosenbaum & Rubin, 1983; Rubin, 1974, 2005). Texts and review articles summarize their methods for application in the social sciences (Holland, 1986; Morgan & Winship, 2015; Vanderweele, 2015).
A strong recommendation from this literature is for studies such as this one to present relations between variables in graphical form with diagrams where arrows indicate the direction of causality, and absence of an arrow indicates absence of a causal relation. We provide such a diagram for our system in Figure 5. When we construct hierarchical linear models, we will provide four primary variants. The assumptions in these variants are indicated through three causal links that are only present in some of the models as indicated. In two of the models, student score depends on classroom averages of demographic variables, and in one of the models, teacher quality depends upon years of experience independent of preparation pathway.
The most consequential causal assumption is whether Campus Quality impacts Student Score independent of Teacher Quality. If one assumes that excellent campuses are excellent mainly because of the quality of the teachers, then there is no direct causal link from Campus Quality to Student Score in year Y. This is the formal way that the confounding association between teacher and school quality shows up. We have chosen to deal with it by constructing models that correspond to the two different causal assumptions. For Algebra I, results do not vary much when we do this, and this gives us confidence in the conclusions. For Biology the results vary much more, which is why we end up more uncertain about the effect of teacher preparation pathways in Biology.
Note in all models the absence of an arrow from Student Demographics to Preparation Pathway. This represents a claim we will substantiate in the next section that within a school, the characteristics of students in a classroom are uncorrelated with the preparation pathway of the teacher.
Two additional concepts are the importance of thinking in concrete terms about interventions, and the need to think through the causal relations between known variables. An example pertinent to the current case may help explain. A robust finding of the literature on teacher quality is that teachers get better with experience, at least up through their first five years (Boyd et al., 2012). We will show that teachers from standard programs are more likely to remain in teaching than those from alternative programs. Therefore, one of the reasons that students of teachers from standard programs might learn more is that teachers from standard programs on average are more experienced. Now imagine some policy intervention that has the effect of increasing the fraction of teachers in a state that comes from standard programs. How should one estimate the effect on students of this intervention?
Causal reasoning says that if staying longer in teaching follows from attending a standard program, and if attending a standard program follows from the policy intervention, then it is fine to attribute increases in student test scores from their experienced teachers to the policy. This means that in constructing a value-added model for teachers where the population is in steady state, one should not control for years of experience. In Figure 5, Years of Experience depends on Preparation Pathway for three of the models, and therefore these models do not include a control for Years of Experience. Formally, controlling for the causal descendant of a factor introduces new confounding associations (Morgan & Winship, 2015, pp. 101-109). This argument is only fully convincing if the age distribution of teachers from different pathways has reached a steady state. That is not the case, and for this reason we also present a model that includes Years of Experience.
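A toy calculation may make the argument concrete. The numbers below are assumptions chosen for illustration, not estimates from this study: the standard pathway is given a direct effect of 0.02, being experienced adds 0.10, and 60% of standard-pathway teachers are experienced against 40% of alternative-pathway teachers. Comparing pathways within experience strata (the analogue of controlling for experience) recovers only the direct effect, while the unadjusted comparison also includes the part of the effect that flows through experience.

```python
from statistics import mean

DIRECT_EFFECT = 0.02      # assumed direct gain from the standard pathway
EXPERIENCE_EFFECT = 0.10  # assumed gain from being an experienced teacher

def make_teachers(pathway, n_experienced, n_novice):
    """Return (pathway, experienced, score) tuples for one pathway."""
    standard = 1 if pathway == "standard" else 0
    rows = []
    for experienced, count in ((1, n_experienced), (0, n_novice)):
        score = DIRECT_EFFECT * standard + EXPERIENCE_EFFECT * experienced
        rows += [(pathway, experienced, score)] * count
    return rows

# Pathway influences experience: standard teachers stay longer (assumed).
teachers = make_teachers("standard", 6, 4) + make_teachers("alternative", 4, 6)

def gap(rows):
    """Mean score of standard-pathway minus alternative-pathway teachers."""
    std = [s for p, _, s in rows if p == "standard"]
    alt = [s for p, _, s in rows if p == "alternative"]
    return mean(std) - mean(alt)

total_effect = gap(teachers)                                  # pathway + retention
within_novice = gap([r for r in teachers if r[1] == 0])       # experience held fixed
within_experienced = gap([r for r in teachers if r[1] == 1])  # experience held fixed
```

Under these assumptions the unadjusted gap is 0.04 and the within-stratum gap is 0.02, so conditioning on experience would understate by half what a policy shifting teachers toward the standard pathway would deliver in steady state.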
Given the competing causal explanations that attach to models with different terms, we have run many different forms of the models, including many variants we do not have space to present. We regard our results as most reliable when they are robust against all our variations.

Data Source
We obtained our data from the Texas Educational Research Center (ERC). This is a data repository populated by a number of different Texas agencies, including the Texas Education Agency and the Texas Higher Education Coordinating Board. We first obtained access in 2014, with a request to study student outcomes in high school STEM disciplines as a function of teacher preparation pathway. The right to access and publish results depends upon staying within the confines of the original request.
The data sources are longitudinal, meaning every teacher and every student has a unique anonymous identifier that follows them through the years. For the purpose of this study, we needed to be able to link students and teachers, so that we would know the preparation background of the teachers and associate them with their students. Although the ERC has test score data going back to as far as 1994, links between students in a class and their teacher only became available for the year 2012. Thus when we first applied for data access we had available one year's worth of data, while at this point we have seven years to work with. The only high school STEM exams offered over this time period are Algebra I and Biology, and these are the exams we study. Essentially all Texas students take these exams; however, tens of thousands of students take Algebra I and its associated exam at the end of middle school in eighth grade. The population of eighth graders taking Algebra I is quite different from the population taking it in ninth grade or later. The eighth-grade students have much higher prescores on average and much lower incidence of free and reduced lunch. In this study we restrict our attention to the ninth grade Algebra I population only. Almost all students take Biology in ninth grade, so the question of restricting to that population does not arise.

Descriptive Statistics

Teacher Retention
We begin with some straightforward characterizations of teachers coming from different pathways and the schools in which they work. Our first finding is that in recent years STEM teachers coming from standard programs have been more likely to stay in teaching than those from alternative programs. We investigated this by examining teachers appearing in the Texas dataset, excluding those coming in from out of state. Once they appeared on the teaching roster somewhere in the state, we checked if they were still teaching n years later and computed the fraction remaining in teaching as a function of years in service and program pathway. The result appears in Figure 6. We see that for the period 2003-2007, STEM teachers from alternative and standard pathways stayed in teaching at approximately the same rate for any number of years of service. However, for STEM teachers entering the profession between 2008 and 2012, at the five-year mark 5% more from standard pathways were still in teaching, while for those entering after 2013, 10% more were still in teaching at the five-year mark. These findings for STEM teachers are compatible with the results for teacher retention by pathway for all teachers (Ramsay, 2017b). Thus, 15 years ago when the fraction of teachers from alternative pathways was small (Figure 3), the teachers entering in this fashion were just as committed to continuing as those from the standard routes. More recently, as teachers from alternative pathways have become dominant in the state, their likelihood of staying in teaching has dropped further and further behind that of the teachers from standard pathways.
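The retention calculation described above can be sketched as follows. This is an illustration under stated assumptions, not the code used for Figure 6: the record layout (`pathway`, `first_year`, `years_taught`), the function name `retention_curve`, and the toy roster are all hypothetical. Teachers who entered too recently to be observed n years later are left out of the denominator for that n, which is one reasonable way to handle the end of the data window.

```python
from collections import defaultdict

def retention_curve(teachers, max_years, last_year):
    """Fraction of teachers still on the roster n years after entry, by pathway.

    teachers: list of dicts with 'pathway', 'first_year', and 'years_taught'
              (the set of years in which the teacher appears on a roster).
    last_year: final year covered by the data.
    Returns {pathway: {n: fraction}} for n = 1..max_years.
    """
    still = defaultdict(lambda: [0] * (max_years + 1))
    eligible = defaultdict(lambda: [0] * (max_years + 1))
    for t in teachers:
        for n in range(1, max_years + 1):
            if t["first_year"] + n > last_year:
                break  # not yet observable n years after entry
            eligible[t["pathway"]][n] += 1
            if t["first_year"] + n in t["years_taught"]:
                still[t["pathway"]][n] += 1
    return {
        p: {n: still[p][n] / eligible[p][n]
            for n in range(1, max_years + 1) if eligible[p][n]}
        for p in eligible
    }

# Hypothetical roster for four teachers who all entered in 2008.
teachers = [
    {"pathway": "standard", "first_year": 2008, "years_taught": {2008, 2009, 2010, 2011}},
    {"pathway": "standard", "first_year": 2008, "years_taught": {2008, 2009}},
    {"pathway": "alternative", "first_year": 2008, "years_taught": {2008, 2009}},
    {"pathway": "alternative", "first_year": 2008, "years_taught": {2008}},
]
curve = retention_curve(teachers, max_years=3, last_year=2011)
```

Splitting the input by entry cohort (for example 2003-2007, 2008-2012, and 2013 onward) before calling the function would give one curve per cohort and pathway, which is the comparison the paragraph describes.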

Teacher Population
The specific population of teachers we study in detail is Algebra I and Biology teachers from the years 2011-2012 through 2017-2018. We do not include all the Algebra I and Biology teachers in the state. Our dataset has those in public schools, regular and charter, but no private schools. For the purposes of this study, we exclude teachers who were certified out of state. Furthermore, because the period in which alternative certification programs existed in Texas extends back approximately two decades, we consider teachers with up to 20 years of experience. A cutoff is appropriate because more than 20 years ago the teacher preparation environment in Texas was so different from today's that it does not make sense to use data from teachers prepared then as a guide to the future.
Another sub-population of teachers described by years of service merits special attention. Federal teacher preparation regulations from 2016 specified that teacher preparation programs were to be assessed on the performance of their novice teachers, defined as those with less than four years of experience (US Department of Education, 2016, p. 68). These regulations were rescinded, but Texas and other states are moving to put in place such evaluation systems anyway. Thus, we decided to use two teacher experience groups: novice teachers, defined as those with less than four years of experience, and all teachers with up to twenty years of experience. Table 2 displays the total numbers of teachers in our sample for each academic year, and for these two levels of experience. The number of teachers prepared in Texas teaching either Algebra I or Biology started at around 3000 and increased by around 20% over the seven years we examined. However, the number of novice teachers assigned to these courses dropped by nearly half. We will return to this point later.
Now we look further into teachers assigned to Algebra I and Biology courses each year, and examine the pathways that led them into teaching. Three variables describing them can appear in principle in any combination. Teachers can come from a standard or alternative program, they can come from an Institution of Higher Education (IHE) or not, and they can enter teaching on a Standard or Provisional Certificate. The Standard certificate means they have had a student teaching experience; otherwise they enter classrooms as full-time paid teachers on Provisional (or Intern) certificates without having had student teaching. Table 3 shows the likelihood of combinations of these variables for the subjects, years of experience, and academic years we are considering. Matters are not quite as simple as a binary distinction between standard and alternative programs, but cases that muddle the boundaries are not common. For example, fewer than 10% of teachers come from university post-baccalaureate programs with Provisional certificates. The entities offering alternative certification programs are varied. They include school districts, state-supported education service centers, and universities. However, the largest providers by far are companies that advertise low cost and provide many services online ("iTeach," 2019; Teachers of Tomorrow, 2018). Therefore, rather than studying the eight possible pathway scenarios, we reduced them to two. We define standard programs to include teachers prepared by Institutions of Higher Education (IHE) enrolled in standard or post-baccalaureate programs and obtaining Standard first certificates. Everyone else, including some of the students from universities, we attribute to an alternative program. We ran variants of the analysis, for example including graduates of university-based alternative programs who began teaching with a standard certificate in the standard group. However, as the numbers of teachers in such classifications were small, and none of the conclusions in our analysis were affected, we report only results from the comparison groups we have just described. The great majority of teachers come either from standard university programs with student teaching or from alternative programs without it.
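The reduction from eight pathway scenarios to two groups amounts to a single rule. The sketch below restates that rule; the argument names and the string labels for program type and certificate are hypothetical and do not come from the Texas dataset.

```python
def pathway_group(from_ihe, program_type, first_certificate):
    """Collapse the three pathway variables into 'standard' or 'alternative'.

    from_ihe: True if prepared by an Institution of Higher Education
    program_type: 'standard', 'post-baccalaureate', or 'alternative'
    first_certificate: 'Standard' or 'Provisional' (Intern treated as Provisional)
    All labels are illustrative assumptions.
    """
    is_standard = (
        from_ihe
        and program_type in ("standard", "post-baccalaureate")
        and first_certificate == "Standard"
    )
    # Everyone not meeting all three conditions is attributed to 'alternative'.
    return "standard" if is_standard else "alternative"

examples = [
    pathway_group(True, "standard", "Standard"),
    pathway_group(True, "post-baccalaureate", "Provisional"),
    pathway_group(False, "alternative", "Provisional"),
]
```

Under this rule a university post-baccalaureate candidate who starts on a Provisional certificate falls in the alternative group, which matches the statement that some university students are attributed to alternative programs.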
For teachers with up to 20 years of experience in our sample, around 40% come from standard programs and 60% from alternative programs. For novice teachers, 30% or less come from standard programs, and 70% or more from alternative programs. These percentages illustrate the scale to which alternative certification has grown in Texas. However, please note that the detailed results in this section describe the teachers in our sample assigned to teach particular high-stakes courses, not teachers in the state overall.

Student Population
Next we turn attention to the characteristics of students, and the associations between teacher pathway and student demographics. Table 4 provides descriptions of students appearing in our sample, comparing the classrooms of teachers from standard and alternative pathways. We aggregate all years together, since changes over time were not worth remarking. Standard and alternatively certified teachers have significantly different student populations. The columns labeled "Standard" and "Alternative" report the percentage of teachers' students with a certain characteristic. For example, 3.4% of the students of teachers from standard pathways were designated as Gifted. The alternatively certified teachers have a higher fraction of their students who are eligible for free and reduced lunch and who are Black and Hispanic. In general, for any factor that tends to lead to lower student outcomes, alternatively certified teachers have more of these students. This makes it important to control for these student characteristics in the analysis.
We investigated whether the difference in student populations of the teachers from the two pathways is mainly within schools or between schools. To do this we constructed hierarchical linear models (Bolker et al., 2016; Gelman & Hill, 2007) in which $X_i$ is a demographic characteristic of student $i$, $C_i$ is their campus, $\mathrm{StdCert}_i$ is the certification pathway of their teacher, $\epsilon$ here and elsewhere is a random term making $X_i$ normally distributed, and $N$ is the normal distribution. The results appear in the final column of Table 4. They show that after one controls for the campus, the difference between student populations of teachers from standard and alternative pathways becomes much smaller, often insignificant, and may reverse sign. The difference in student populations is mainly due to the schools in which teachers from standard and alternative pathways are likely to work. Within a given school the teachers from standard and alternative programs are not preferentially assigned to one sort of student or another. The final row of the table describes students who are tracked into Algebra I in eighth grade.
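A form consistent with the definitions given above for these models is shown here as a sketch under assumptions, not as the exact specification used. The intercept $\alpha$, the coefficient $\beta$, and the variance symbols are illustrative names.

```latex
X_i = \alpha + \beta\,\mathrm{StdCert}_i + C_i + \epsilon_i,
\qquad C \sim N(0, \sigma_C^2),
\qquad \epsilon_i \sim N(0, \sigma^2)
```

In a model of this shape, $\beta$ measures the difference in the characteristic $X$ between students of teachers from the two pathways after the campus intercept has absorbed between-school differences.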

Multilevel Models for Student Test Scores
We studied changes in student test scores through multilevel models in which students are nested within classrooms, classrooms within teachers, and teachers within campuses. We control for each student's pre-score and an array of demographic information about student and campus, and estimate the effect of teacher pathway on student test scores. Thus, to the extent possible, the models compare teachers with other teachers teaching the same subject in the same school and attempt to compensate for differences in school and classroom populations.
We start with data from the 2011-2012 academic year using pre-scores from 2010-2011 and proceed through 2017-2018. The 2011-2012 academic year was the first year that student-teacher links became available in the Texas statewide dataset, and the 2017-2018 academic year provides the most recent data available.
During our study period, Texas was transitioning between sets of high-stakes standardized exams, from TAKS to STAAR. The only high school STEM exams offered during this entire period were STAAR Algebra I and STAAR Biology. In 2011-2012, pre-scores came from TAKS eighth- We kept only cases where the student had a valid ninth-grade score in year Y and a valid eighth-grade score in the previous year. There were tens of thousands of students who took Algebra I in eighth grade, mainly high-achieving students in suburban middle schools. We decided not to include an analysis of this population. There were several accommodations available to students, including provisions for English-language learners, vision-impaired students, and a modified exam for students with learning disabilities. We could not simply group these students in with other students because most of them were taking a substantially different exam. Thus, in most of our analyses, we exclude all students who received any of these accommodations. However, we included a large subset of them in the following way. In our report on student sub-populations, we create a multilevel model for all students who took the alternate exam in year Y -1 and also took the alternate exam in year Y.
Test score effect sizes are customarily obtained by dividing exam scores by the standard deviation. For ninth graders who took Algebra I in 2011-2012, the standard deviation was 0.17. For these same students the standard deviation of their mathematics scores the year before in eighth grade was 0.15, and other years are similar. We express all model results in units of the exam standard deviation, which is around 0.16.
Any given student test score result could end up in our data set from one to six times. The test scores appeared multiple times when the student took classes with separate identification numbers in separate semesters, when more than one teacher was associated with the class section, or if the student changed schools in Texas. We weighted every student record inversely with the number of times they appeared, so if a student was taught by several teachers during the year, each of them shared equally, and so that student did not contribute more to the final results than a student who appeared only once.
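The inverse weighting described above can be illustrated as follows. This is a minimal sketch assuming each record is a (student, teacher) pair; the identifiers and record layout are hypothetical.

```python
from collections import Counter

def inverse_appearance_weights(records):
    """Weight each record by 1 / (number of records for the same student).

    `records` is a list of (student_id, teacher_id) pairs (hypothetical layout).
    """
    counts = Counter(student_id for student_id, _ in records)
    return [1.0 / counts[student_id] for student_id, _ in records]

# Toy records: s1 appears twice, s2 once, s3 three times.
records = [
    ("s1", "tA"), ("s1", "tB"),
    ("s2", "tA"),
    ("s3", "tC"), ("s3", "tC"), ("s3", "tD"),
]
weights = inverse_appearance_weights(records)
```

With these weights every student contributes a total weight of one, however many times they appear, and that weight is shared equally among their records.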

Model Specifications
We explored many different multilevel models, using lmer in R (Bolker et al., 2016). In our first collection of models, which are variants of Eq. (1), all data from 2011-12 through 2017-18 were included at once. At the top level, $S_i$ is the score of student $i$ in some year and $S_{i,Y-1}$ is the same student's score on the exam in the same subject the previous year, entering as a cubic polynomial. Teacher $j$ of student $i$ contributes through the random intercept $T_{j[i]}$. By modeling the teacher in this way, each teacher should contribute equally to the estimate of the effect of their pathway to teaching (Koedel, Parsons, et al., 2015). The campus contributes random intercept $C_{k[i]}$, as does the class through $\mathrm{Class}_{n[i]}$, meaning that student $i$ is in class $n$. Coefficients for student-level demographic factors $X$ range over Gifted, racial and ethnic groups, Limited English Proficiency (LEP), Free/Reduced Lunch Eligibility (EcoDis), and Special Education. Here $g[i]$ is the value of assigned group affiliation for student $i$.
We modeled the influence of tracking, as recommended by Jackson (2014). The most important form of student tracking in Texas is placing students in Algebra I in eighth grade. To control for this, we removed students enrolled in Algebra I during their eighth-grade year from our study of mathematics and created a flag for them when modeling Biology. Some variants of Eq. (1) control for classroom-level averages of the demographic variables through $\gamma_X \bar{X}_{n[i]}$, and others include a variable to control for teacher years of experience, using binned values of 0-4, 4-10, and 10-20 years of experience.
The main item of interest, certification pathway $\mathrm{StdCert}_m$ for teacher $j$ of student $i$ out of program $m$, enters as a fixed effect. Finally, the second level of the model has random intercepts for teacher $T$, campus $C$, and class section $\mathrm{Class}$. See Gelman & Hill (2007, Chapter 12.5) for notation.
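Collecting the terms described above gives a model of roughly the following shape. This is a sketch assembled from the verbal definitions, not a reproduction of Eq. (1): the coefficient names $\beta_p$, $\beta_X$, and $\beta_{\mathrm{Std}}$ are assumptions, the classroom-average term appears only in some variants, and the optional years-of-experience and tracking terms are omitted.

```latex
S_i = \sum_{p=1}^{3} \beta_p \, S_{i,Y-1}^{\,p}
    + \sum_{X} \beta_X X_{g[i]}
    + \sum_{X} \gamma_X \bar{X}_{n[i]}
    + \beta_{\mathrm{Std}} \, \mathrm{StdCert}_{m}
    + T_{j[i]} + C_{k[i]} + \mathrm{Class}_{n[i]} + \epsilon_i

T_j \sim N(\mu_T, \sigma_T^2), \qquad
C_k \sim N(\mu_C, \sigma_C^2), \qquad
\mathrm{Class}_n \sim N(\mu_L, \sigma_L^2)
```

The first line is the student-level equation and the second gives the distributions of the three random intercepts at the second level.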
Tables 5 and 6 provide results from these models. We present results from four different variants of this general form so as to display the effect upon the variable of interest, StdCert, of progressively adding terms. The causal assumptions of these four models were illustrated in Figure 5. Model (1a) lacks a campus intercept and also lacks averages of classroom demographics. The causal assumption in this model is that high quality campuses with students who perform well are high quality mainly because their teachers are of high quality. Model (1b) adds the campus intercept as a random effect, allowing high quality campuses to improve student scores through means other than teacher quality. (1c) adds classroom averages of demographic variables, and (1d) adds a control for years of service, which is justified by assuming it does not result from teacher pathway. Model (1c) is probably the most persuasive of the model specifications, since classroom averages of demographic variables do turn out to affect the results. While controlling for years of service in (1d) is debatable on causal grounds, adding it or not does not make much of a difference.
We also created estimates in which each year was treated separately. Model 2 is similar to model (1c), with random intercepts for campus, class, and teacher, and controls for student demographics, classroom averages of student demographics, and tracking. Its random intercepts are distributed as $T_j \sim N(\mu_T, \sigma_T^2)$; $C_k \sim N(\mu_C, \sigma_C^2)$; $\mathrm{Class}_n \sim N(\mu_L, \sigma_L^2)$.
Model 3 has random intercepts $T_j \sim N(\mu_T, \sigma_T^2)$; $\mathrm{Class}_n \sim N(\mu_L, \sigma_L^2)$. This is the same as the previous model, except that campus is treated as a fixed effect at the top level, rather than being modeled as a random effect at the second level. This model is less appropriate for finding the contribution of teacher pathway because in cases where a campus has teachers from only a single pathway, the campus fixed effect subtracts them off rather than comparing them with teachers in similar campuses as the campus random effect model does.
Model 4 has random intercepts $T_j \sim N(\mu_T, \sigma_T^2)$; $\mathrm{Class}_n \sim N(\mu_L, \sigma_L^2)$. In this case there is no campus intercept. One would use this model if one adopts the view that the difference between campus performance is mainly due to the teachers and not to other non-student factors.
We had a fifth model which we applied to subgroups of students.
This model was applied to a demographic subset of our sample, for example to the subgroup of economically disadvantaged students, gifted students, or Special Needs students taking an alternate test. This model allowed us to focus on the effect of teacher pathway on a specific group of students, while also controlling for the broader demographic compositions of their classes.
We begin our discussion of results with Model (1). The random and fixed effects are given in Table 5 for Algebra I and Table 6 for Biology. The coefficients related to student subgroups for the models of Algebra I and Biology are quite similar to each other.

Multi-level Models
Among the random effects, the largest is the difference between campuses, with a standard deviation of 0.26 for Algebra I and 0.28 in Biology. The standard deviation of the difference between teachers is around 0.25 in Algebra I and 0.2 in Biology. That is, we find slightly larger differences between campuses than within them. The standard deviation of classes taught by the same teacher is between 0.13 and 0.17 in both subjects.
We report the teacher pathway effect (StdCert) as positive when students of teachers from standard programs get higher scores than those from alternative programs. As shown in Table 5, students with teachers from standard pathways gain around 0.03-0.05 in Algebra I.  This result is significant in all the model specifications chosen, although the effect is nearly twice as large in the models with fewer controls. That is not surprising, since we know from Table 4 that teachers from standard programs find themselves overall in classrooms with fewer economically disadvantaged students, and more white students. In Biology, as shown in Table 6, results are significant in models 1a and 1b, but not in models 1c and 1d that include averages of classroom demographics and an intercept for each campus.
Thus the case for improved student learning in classrooms with teachers from standard programs is strong for Algebra I, and robust against all variations of the models we examined. The case is weaker in Biology, where we measure a significant advantage for teachers from standard pathways only in models that attribute most of the student learning gains in schools that are high-performing overall to the teachers rather than to other factors.
The difference between models 1c and 1d is that 1d includes a control for years of experience, and 1c does not. We include the control for teacher years of experience despite the evidence that years of experience depends on pathway because it has been conventional in previous studies. One sees that whether it is included or not turns out not to make much of a difference.
We now turn to a more detailed analysis year by year and for various subgroups. Table 7 presents results for the three different models that vary the way campus effects are treated. The fixed effect model (Eq. 2, Fixed), which maximizes how much of an effect is attributed to the campus, tends to give the smallest results, and the model without campus effects (Eq. 3, None) tends to give the largest results, showing that strong Algebra I teachers are associated with strong campuses. In every year and for each of the models, students gain about 0.03-0.05 in classes of Algebra I teachers from standard programs with up to 20 years of experience. The point estimates in classrooms of novice Algebra I teachers with up to four years of experience are similar, but the sample sizes are much smaller, and few of the differences are statistically significant. For Biology classrooms, there are fewer statistically significant results, and the point estimates are scattered between positive and negative. If one removes the classroom averages of demographic variables from models (2)-(5), there are many cases where students of Biology teachers from standard programs have significantly higher learning gains, but these models are harder to defend than the ones we have used, and we do not report their results.
Table 7 Estimates from Models (2)
Tables 8 and 9 provide results for each combination of teacher pathway, subject, and a variety of student subgroups, using the model in Eq. 5 for teachers with up to 20 years of experience and novice teachers, respectively. Almost all of the estimates in Algebra I by subgroup indicate that students of teachers with standard certification gain between 0.03 and 0.08 in standard deviation units more than their counterparts with alternatively certified teachers. The largest differences are for students flagged as gifted, but the results from students eligible for free and reduced lunch (FRL) and those of limited English proficiency (LEP) are also noteworthy. The effects are stronger in Algebra I for teachers with up to 20 years of experience than they are for novice teachers. In Biology there are few statistically significant results, except for novice teachers in 2011-2012 and experienced teachers in 2016-2017. For both Algebra I and Biology the majority of the point estimates favor teachers from standard programs, and there are only two statistically significant results favoring teachers from alternative pathways, which are for LEP and Hispanic students of novice Biology teachers in 2017-2018.
The Algebra I results from this section are summarized in graphical form in Figure 7. The results for all students come from the model in Eq. (2), and the rest from Eq. (5). Only results for teachers with up to 20 years of experience are shown in the graph.

Teacher Assignment
We considered whether our results might be affected by the way teachers were selected to teach courses with high-stakes exams. Because of the very high stakes for schools and their personnel associated with these exams, one could expect principals to monitor past results carefully, and assign teachers with a good track record for raising student test scores to Algebra I and Biology (Dieterle, Guarino, Reckase, & Wooldridge, 2015).
We find evidence for such assignment bias, and it shows up in several ways. We constructed a two-stage model for the probability of being assigned to teach. The first stage of the model is Equation 1, which computes a value-added coefficient for every teacher. The second stage is a binomial logistic regression model that computes the probability a teacher was assigned to teach Algebra I or Biology as a function of the value-added score in the course in the same school the year before. Thus, the second stage gives the probability of being assigned to a course as a logistic function of the value-added score T in the previous year and the certification pathway StdCert (Eq. 5). Here T is the value-added score we compute for each teacher, normalized by the standard deviation of value-added scores, and StdCert=1 corresponds to a teacher who came from a standard program. The coefficients of this model for Algebra I and Biology appear in Table 10.
The results are significant every year. For example, if an Algebra I teacher from a standard program in 2011-2012 had a value-added score 1.5 standard deviations above the mean, they had a 60% chance of returning to teach the course the next year, as opposed to a 41% chance if their value-added score was 1.5 standard deviations below the mean. We also find that teachers from standard pathways were more likely to be reassigned to teach than those from alternative pathways, after controlling for their value-added scores. Perhaps this is because department heads and principals saw something of value in their practice that the scores did not capture. School principals and department heads did not, of course, have access to the specific value-added scores we have computed, but they had in their possession all the raw data about student test scores that go into making them up and appear to have acted accordingly (Dieterle et al., 2015; Grissom, Loeb, & Nakashima, 2014).
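The two probabilities quoted above are enough to illustrate how a logistic model of this kind behaves. The sketch below assumes the log-odds are linear in the value-added score T for a fixed pathway and year. The slope and intercept it derives are back-of-envelope values implied by the 60% and 41% figures, not the coefficients reported in Table 10.

```python
import math

def logit(p):
    """Log-odds of a probability p."""
    return math.log(p / (1.0 - p))

def inv_logit(x):
    """Inverse of logit: maps log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Probabilities quoted in the text for an Algebra I teacher from a standard
# program in 2011-2012, at value-added scores of +1.5 and -1.5 SD.
p_high, p_low = 0.60, 0.41
t_high, t_low = 1.5, -1.5

# Two points determine a line on the log-odds scale (assumed linear in T).
# The intercept here absorbs the StdCert = 1 term for this group.
slope = (logit(p_high) - logit(p_low)) / (t_high - t_low)
intercept = logit(p_high) - slope * t_high

def p_assigned(t):
    """Implied reassignment probability at value-added score t (SD units)."""
    return inv_logit(intercept + slope * t)
```

Evaluating `p_assigned(0.0)` gives the implied probability for a teacher of this group with an average value-added score, which lies between the two quoted figures.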
While value-added score was the strongest single predictor we found of whether a teacher was assigned twice in a row to teach Algebra I or Biology, many characteristics of the teacher population changed. It is worth noting that the STAAR exams we use as a post-test were employed for the first time in the spring of 2012; at that time high school students were expected to take 15 exams in order to graduate. In the spring of 2013, the Texas legislature reduced the required exams from 15 to 5 and abolished all the STEM exams but Algebra I and Biology. One result was a dramatic shift in the experience distribution of teachers assigned to Algebra I and Biology. In 2011-2012, 35% of all Texas mathematics teachers had 0-5 years of experience (Ramsay, 2017a), and 39% of the Algebra I teachers had 0-5 years of experience. But by 2014-2015, when the percentage of Texas mathematics teachers with 0-5 years of experience was essentially unchanged at 37%, the percentage of Algebra I teachers with 0-5 years of experience dropped to 17%. As shown in Figure 8, the drop in novice Algebra I teachers was accompanied by a rise in teachers with 10-20 years of experience. Placement of Biology teachers was equally well predicted by value-added scores, and the distribution of Biology teachers changed in a very similar fashion, from a distribution characteristic of science teachers overall to a distribution greatly weighted towards teachers with 10 to 20 years of experience.

Discussion
Some previous studies have concluded that characteristics of teacher education are too small to detect or too small to matter in raising student test scores (Aaronson, Barrow, & Sander, 2007;Gordon et al., 2006;Harris & Sass, 2011;Rivkin et al., 2005;Staiger & Rockoff, 2010;von Hippel & Bellows, 2018). We found significant effects on the order of 0.03 to 0.05 standard deviations for ninth-grade students of Algebra I in favor of teachers with standard certification. Effects in Biology are weaker in the models with the strongest controls, although the occasional significant differences favor teachers with standard preparation. The pathway effects we find in Tables 5 through 9 are small compared to the typical deviations between teachers and schools, although they are consistent with teacher pathway effects found in other studies (Boyd et al., 2009).
Whether gaining 0.03 in standard deviation units is an important educational difference merits additional discussion. It corresponds to a 25% greater chance of getting one more problem right on a 50-question exam, since the standard deviation on the exam is 0.16 × 50 = 8 questions, and 3% of that is about a quarter of a question. This may seem too small to matter. However, if sustained over time, the magnitude of this effect is comparable to that of living in poverty. For example, in Tables 5 and 6, the coefficient for EcoDis, free and reduced lunch eligibility, is around −0.07. As an additional illustration, Figure 9, employing the methods of Bendinelli and Marder (2012), shows that if one groups students according to their mathematics scores and free/reduced lunch status in fourth grade, and then follows the students through 11th grade, the difference between the well-off and low-income students develops to around 0.06 in standard deviation units and takes around three years to develop. That is, the difference in test score results due to having a math teacher from a standard program for a year is of the same order as the effect on test results over a year associated with living in poverty. One could conclude that the standardized tests are not very sensitive either to instruction (Popham, 2007; Stroup, 2009) or to poverty, but this does not mean the tests are completely incapable of detecting them. Because the tests are not very sensitive to what we wish to measure, it takes a large number of teachers and students to arrive at reliable values. As seen in Tables 5 and 6, for a single teacher teaching multiple sections of the same class at the same time, the typical variation from one section of the class to another is around 0.18 standard deviations. We estimate that to find the effect of any particular type of teacher preparation pathway with uncertainty less than 0.03 standard deviations, one must average results from around 1000 teachers. 
We accomplished this by aggregating together preparation programs in groups with similar practices rather than focusing on effects at the level of a single program. Such grouping also is a feature of the study of National Board certification in Cowan and Goldhaber (2016), and of pathways in New York City by Boyd et al. (2009).
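The conversion from standard deviation units to exam questions used in the discussion above, and the per-year comparison with the poverty gap, can be checked with a few lines of arithmetic. All inputs are the figures quoted in the text. The variable names are illustrative, and the per-year figure simply divides the quoted gap evenly over three years.

```python
questions = 50          # length of the exam in the example
sd_fraction = 0.16      # exam standard deviation as a fraction of the questions
effect_sd = 0.03        # pathway effect in standard deviation units

# Standard deviation expressed as a number of questions.
sd_in_questions = sd_fraction * questions

# Pathway effect expressed as a number of questions.
effect_in_questions = effect_sd * sd_in_questions

# Gap between well-off and low-income students: about 0.06 SD
# developing over about three years, spread evenly per year.
poverty_gap_per_year = 0.06 / 3
```

The pathway effect of 0.03 standard deviations per year and the poverty-associated gap of roughly 0.02 per year are of the same order, which is the comparison the paragraph draws.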
Overall Algebra I teachers from standard certification pathways improved student test scores by 0.03 to 0.05 standard deviations. For subgroups including gifted students, students eligible for free and reduced lunch, and Black and Hispanic students, students of standard teachers gained 0.03 to 0.08 standard deviations. In Biology the evidence for positive effects for teachers from standard programs overall is not robust, although there are scattered positive results on the order of 0.03 standard deviations in Table 8 for teachers with up to 20 years of experience, and mainly positive but also some negative results in Table 9 for novice teachers. The columns in Table 6 for Models 1a and 1b also find significant test score gains for Biology students of teachers from standard programs. These are the models based on the assumption that the main reason students in some schools have higher scores is that the teachers are better.
We also find, as often found before, that there is more variation of student outcomes within teacher preparation pathways than between. This finding has been used in support of policies that reduce barriers for new people to enter teaching, but make it difficult for them to continue unless they can demonstrate favorable student outcomes (Gordon et al., 2006). While such policies might make sense in cases where there are more people wishing to become teachers than there are positions available, they are less justifiable for shortage areas such as secondary STEM. It is hard to imagine that young people or career changers will be attracted to secondary teaching by the prospect of having tenure and merit decisions made by value-added models. Newspaper accounts such as those of Bonner (2016) conclude that use of value-added scores for merit and promotion has been exacerbating teacher shortages. Teacher shortages may not directly impact high-stakes subjects such as Algebra I and Biology; schools have to staff them or face severe penalties. Shortages show up for subjects such as computer science where there are no high-stakes assessments, and where only a small fraction of high schools even offer a course (Guzdial, n.d.). It is tempting to consider policies that make it difficult for teachers with low value-added scores to continue teaching altogether, in hopes of capturing some of the 0.18 standard deviations advantage for the best teachers in Tables 5  and 6. However, reducing the stability of teaching careers will impact which individuals decide to enter teaching or settle instead on other careers. There is no assurance that secondary students will benefit in the end.
We remark in passing that our model estimates in Tables 5 and 6 contain interesting results beyond those directly pertaining to teacher certification pathway. For example, in both Algebra I and Biology, a student identified as Economically Disadvantaged gains about 0.07 standard deviations less per year than one who is not. However, the effect of poverty concentration is larger; in Algebra I, a class where none of the students is Economically Disadvantaged gains 0.09 standard deviations more than a class where all the students are disadvantaged, and in Biology concentrated poverty can be responsible for a drop as large as 0.17 standard deviations. This observation should be taken into account in connection with studies of school choice involving lotteries (Angrist, Hull, Pathak, & Walters, 2017) because it estimates the effect students have on each other. If charter schools concentrate students with supportive families independent of ethnicity and economic need, then effects attributed to school organization and pedagogy might instead be due to the influence of students on each other. Comparing charter school students to students denied admission by lottery back in the regular public schools does not address this problem.
The change we found in teacher assignment over time provides reason to worry about how high-stakes tests are being used. The tests were designed to measure student mastery of academic material. They are now being used to allow students to advance academically, to judge the performance of individual teachers, to judge the performance of schools and school administrators, and finally to judge the programs that prepare the teachers. For test results to provide an unbiased estimate of preparation programs, principals would have to ignore teachers' track record when assigning them to classes with high stakes assessments, even when the future of both students and the administrators are at risk, and when administrators are constantly impressed with the importance of data-driven decision making (Houston, 2013). This is not realistic.
It has frequently been stated that variance between classrooms within schools is larger than variance between schools (Nye, Konstantopoulos, & Hedges, 2004). Staiger & Rockoff (2010, p. 103) conclude that "School leaders have very little ability to select effective teachers during the initial hiring process" and present as evidence "the fact that most of the variation in teacher effects occurs among teachers hired into the same school." Our results are different. In Tables 5 and 6, variation between schools is the largest random effect, followed by variation between teachers in schools, then followed by different classrooms of the same teacher. Thus, if we take into account both the variance between schools and the way we found teachers to be assigned, evidence indicates that principals hire teachers and assign them to classes based on information about their effectiveness.
This point may be particularly significant. The parallel teacher preparation system in Texas, spreading now to other states, represents an evolution of systems that enabled the growth of New York City Teaching Fellows and Teach for America (Darling-Hammond, Holtzman, Gatlin, & Heilig, 2005; Laczko-Kerr & Berliner, 2002). These programs were intended to demonstrate that smart students carefully chosen and rapidly prepared could obtain better student results than conventionally prepared teachers (Clark et al., 2013; Decker et al., 2004). The scale at which this could be carried out has had limits because it depends upon philanthropic contributions of hundreds of millions of dollars per year (Education Week, 2016). Texas-style alternative certification does not have this limit because companies have established a viable business model in which in-service candidates pay the program costs. The quality control provided by Teach for America selection criteria appears to be missing, but that impression is almost certainly mistaken. It is instructive to examine accountability reports for the educator preparation programs (Texas Education Agency, 2018). In 2016-2017 the two largest alternative programs had over 30,000 applicants, admitted about 14,000 of them, and listed over 55,000 program participants who had not yet completed the program, of whom around 7,000 held positions in schools. This indicates that district human resource departments and school principals are quite selective in whom they take from the alternative programs. The selection is based upon interviews and upon a more extensive examination of a candidate's vita than is apparent in any state dataset. The hiring process plays a critical role in mediating between preparation programs and student outcomes.

Conclusions
Our results lead to nuanced policy prescriptions, and in particular we must distinguish between implications in Texas and implications for other states. In Texas, the differences we found for students in Algebra I and Biology because of their teachers' pathway do not justify the disruptions that would come from an abrupt policy change impacting alternative certification programs. As alternative certification developed steadily over the last 20 years to become a major contributor to Texas's teacher supply, the schools learned how to hire and assign teachers to increase student test-score gains. In Algebra I there are many subgroups of students for whom the advantages of having a teacher from a standard university program are significant, but in Biology the effects are weaker. Given the shortage of STEM teachers, one must keep in mind that it is difficult to justify reducing the supply of teachers from any pathway. Slightly increasing the scores of low-income students on Algebra I exams but reducing the number able to take Physics or Chemistry at all would almost certainly be a very poor trade.
On the other hand, our results do not provide a strong incentive for other states to follow Texas's lead in establishing a large for-profit alternative certification sector. As shown in Figures 3 and 4, the growth of alternative certification in Texas since the early 2000s has not led in the end to an increase in the production of STEM teachers. At best it has stemmed the decline. And if alternative certification, acting as a disruptive innovation (Christensen, 1997), were to grow much more rapidly in other states than it has in Texas, building on the for-profit companies' experience but without a corresponding ability in schools to select alternatively certified teachers wisely, the results might be quite different.
Caution is merited on all sides. Texas should be cautious about damaging the ecosystem that developed over the past two decades to prepare teachers. Other states should be cautious about rapidly introducing a parallel for-profit teacher preparation system that competes with universities. University faculty engaged in teacher preparation should be cautious about ignoring or dismissing alternative pathways with the capacity to prepare huge numbers of teachers in new ways, and be attentive to the ways that school hiring practices impact quality.
Around 700,000 undergraduates obtain STEM degrees from U.S. universities each year. This is an enormous pool; persuading just 1.5% more to obtain a teaching certificate along with their degree each year would add over 10,000 new STEM teachers. On balance, our evidence says the standard way of preparing teachers at universities in person is still the best. Teachers from standard pathways stay in teaching longer and their students learn more. If alternative certification does expand to ameliorate shortages, let this happen slowly and carefully. In view of the substantial extra time students spend preparing in standard pathways before full-time teaching, universities should consider what lessons can be learned from the alternative programs. At the same time, we urge renewed support for the preparation of STEM teachers through standard university pathways as the most efficient, scalable, and high-quality way to address the critical need for improved STEM education and to address the shortage of STEM teachers.