Education Policy Analysis Archives

Volume 10 Number 36

September 6, 2002

ISSN 1068-2341


A peer-reviewed scholarly journal
Editor: Gene V Glass
College of Education
Arizona State University

Copyright 2002, the EDUCATION POLICY ANALYSIS ARCHIVES.
Permission is hereby granted to copy any article if EPAA is credited and copies are not sold. EPAA is a project of the Education Policy Studies Laboratory.

Articles appearing in EPAA are abstracted in the Current Index to Journals in Education by the ERIC Clearinghouse on Assessment and Evaluation and are permanently archived in Resources in Education.


Research and Rhetoric on Teacher Certification:
A Response to "Teacher Certification Reconsidered"

Linda Darling-Hammond
Stanford University (Note 1)

Citation: Darling-Hammond, Linda. (2002, September 6). Research and rhetoric on teacher certification: A response to "Teacher Certification Reconsidered," Education Policy Analysis Archives, 10(36). Retrieved [date] from http://epaa.asu.edu/epaa/v10n36.html.

Abstract
In October, 2001, the Baltimore-based Abell Foundation issued a report purporting to prove that there is "no credible research that supports the use of teacher certification as a regulatory barrier to teaching" and urging the discontinuation of certification in Maryland. The report argued that large inequities in access to certified teachers for poor and minority students are not a problem because research linking teacher education to student achievement is flawed. In July, 2002, the U.S. Secretary of Education cited the Abell Foundation paper in his Annual Report on Teacher Quality as the sole source for concluding that teacher education does not contribute to teacher effectiveness. The Secretary's report then recommended that requirements for education coursework be eliminated from certification standards, and attendance at schools of education and student teaching be made optional. This article documents the many inaccuracies in the Abell Foundation paper and describes the actual findings of many of the studies it purports to review, as well as the findings of other studies it ignores. It details misrepresentations of a number of studies, including inaccurate statements about their methods and findings, false claims about their authors' views, and distortions of their data and conclusions. The article addresses methodological issues regarding the validity and interpretation of research. Finally, the article presents data challenging the Abell Foundation's unfounded claims that uncertified teachers are as effective as certified teachers, that teacher education makes no difference to teacher effectiveness, that verbal ability is the most important determinant of teaching effectiveness, that private schools staffed by uncertified teachers are more effective than public schools, and that untrained teachers are more qualified than prepared teachers. 
It concludes with a discussion of the policy issues that need to be addressed if all students are to be provided with highly qualified teachers.

 

In October, 2001, the Baltimore-based Abell Foundation issued a report purporting to prove that there is "no credible research that supports the use of teacher certification as a regulatory barrier to teaching" (Walsh, 2001, p. 5). (Note 2) The Abell Foundation paper argued against Maryland's efforts to strengthen teacher preparation requirements and defended the continuation of a local short-term alternative route into teaching that had come under criticism. Suggesting that "educators, policymakers, the media, and the public mistakenly equate teacher quality with teacher certification" (p. 1), Kate Walsh, the author of the paper, complained that efforts to improve education for poor and minority children in Baltimore by the state and local superintendents of schools and by local advocacy organizations foolishly sought to secure more fully certified teachers for their schools. She cited as wrong-headed newspaper articles raising concerns, for example, that: "Least prepared teachers are at worst city schools: One-third lack basic credentials for certification," (p. 1). Calling misguided the efforts of a Baltimore community group that released a study which "bemoaned the fact that more uncertified teachers were teaching in the city's high-poverty, predominantly African-American schools than the city's whiter, more affluent schools" (p. 2), the paper sought to demonstrate that these inequalities in access to certified teachers are not problematic if certification can be discounted as a determinant of achievement.

The Abell Foundation proposed that Maryland should 1) "eliminate the coursework requirements for teacher certification" and require only a bachelor's degree and a passing score on an appropriate teacher's exam; 2) "report the average verbal ability score of teachers in each school district and of teacher candidates graduating from the State's schools of education;" and 3) "devolve its responsibility for teacher qualification and selection to its 24 public school districts," delegating all hiring authority to individual school principals (pp. vii-viii).

Although these ideas might seem indefensible to those who are engaged in research regarding teacher preparation and recruitment, the U.S. Secretary of Education echoed these recommendations in his Annual Report on Teacher Quality (USDOE, 2002), a report on the national state of teacher quality required under the 1998 reauthorization of Title II of the Higher Education Act. In this report, the Secretary argued that teacher certification systems are "broken," imposing "burdensome requirements" for education coursework that make up "the bulk of current teacher certification regimes" (p. 8). The report argues that certification should be redefined to emphasize higher standards for verbal ability and content knowledge and to de-emphasize requirements for education coursework, making attendance at schools of education and student teaching optional and eliminating "other bureaucratic hurdles" (p. 19).

The report suggests that its recommendations are based on "solid research." However, only one reference among the report's 44 footnotes is to a peer-reviewed journal article (which is misquoted in the report); most are to newspaper articles or to documents published by advocacy organizations, some of these known for their vigorous opposition to teacher education. (Note 3) For the recommendation that education preparation be eliminated or made optional, the Secretary's report relies exclusively on the Abell Foundation's paper. Though written as a local rejoinder to Maryland's efforts to strengthen teacher preparation and certification, it appears to have become a foundation for federal policy.

This article includes the response I wrote to Walsh's paper (Note 4) when it was first issued, with some additions that respond to a reply she issued with Michael Podgursky (Note 5) and a briefer version of her report recently printed in Education Next, a magazine put out by the Hoover Institution (Walsh, 2002).

In order to make a case for her agenda, Walsh attacks all research that has found relationships between teachers' preparation and their measured effectiveness, including students' achievement. She characterizes much of the education research as "flawed, sloppy, aged and sometimes academically dishonest" (p. 13), a characterization that more aptly describes her own paper, which consistently misrepresents the statements of researchers, the findings of studies, and the evidence base for her claims. She claims to have reviewed all of the studies ever cited by proponents of teacher education. In fact, a large number of the references in the paper and appendix are not directly on the topic of teacher education, and many studies of teacher education effects are not included in the report. Furthermore, her paper does not actually review most of the studies it mentions. The appendix of the original report listed 175 studies in July, 2001; by the version released in October, 2001, the list had shrunk to fourteen, selected according to no obvious criteria and omitting many of the most prominent studies on the topic. (Note 6) The "reviews" in a now separate appendix published on the foundation's website are generally not careful assessments of research methods or findings but a list of complaints and random observations (sometimes accurate but often not) about various aspects of the studies or how they have been cited by others. (A number of examples are included below.)

All studies have limitations, and some are too problematic to be relied upon, including a number that Walsh relies upon for her own assertions. However, Walsh's paper, which is littered with inaccuracies, misstatements, and misrepresentations, sheds little light on the research or its implications for teacher education and certification. In what follows I discuss the inaccuracies in Walsh's account, the actual findings of many of the studies she purports to review, and the findings of other studies she chooses to ignore, as well as the implications of her proposals for teachers, their knowledge, and the students they teach.

In the course of the paper, I review some of the studies that have found influences of teacher education and certification on student achievement at the levels of the individual teacher (e.g., Goldhaber & Brewer, 2000; Hawk, Coble, & Swanson, 1985; Monk, 1994); the school (Betts, Rueben, & Danenberg, 2000; Fetler, 1999); the school district (Ferguson, 1991; Strauss & Sawyer, 1986); and the state (Darling-Hammond, 2000c). The convergence of findings in analyses using different units of analysis reinforces the strength of the inferences that might be drawn from any single study.

What are the Arguments?

The Abell Foundation report admits that teacher qualifications make a difference but it also tries to make a case that "the backgrounds and attributes characterizing effective teachers are more likely to be found outside the domain of schools of education. The teacher attribute found consistently to be most related to raising student achievement is verbal ability.... usually measured by short vocabulary tests..." (p. v). Later in the report, Walsh suggests that subject matter knowledge may be an additional criterion for hiring secondary teachers, but not for elementary teachers. Walsh objects to the state requirements regarding content coursework in each of the core academic areas for elementary teachers, since many who want to enter through the alternative Resident Teacher program in Maryland have had trouble meeting these requirements.

Walsh then tries to dismiss all studies that find evidence that knowledge about teaching also makes a difference for teacher performance. She claims that studies finding positive effects of teacher education or certification are too old, too small, too highly aggregated, or dependent on evidence about teacher performance other than student achievement, or that they are not really about certification after all, even if their authors say they are. She often does this by misrepresenting the studies' actual methods and findings, as I detail below.

While there are legitimate concerns to be raised about various studies in the literature, on all sides of the question, the Abell Foundation report does not shed much light on them. A thorough review of the quality and accurately portrayed findings of the several bodies of research that bear on this question would be a service to this field. Unfortunately, the report's inaccuracies and misinterpretations make it of little use in this regard.

In what follows, I address five major issues regarding the Abell report and the research base on teaching and teacher education:

  1. Evidence Ignored. Evidence about student learning in reading and other areas documents the need for teachers to have professional knowledge that includes and extends beyond subject matter knowledge. The Abell Foundation report does not consider this evidence or answer the question of how teachers are to acquire this knowledge if they are not professionally prepared.

  2. Unfounded Claims. No evidence supports Walsh's claim that either verbal ability or subject matter knowledge alone makes teachers effective. She lacks supporting evidence—and fails to consider contradictory evidence—for her claims about the relative effectiveness of certified and uncertified teachers, the outcomes of teacher education, the primacy of verbal ability as the most important measure of teaching, the effectiveness of private and public schools and the preparation of their teachers, and the attributes of individuals who enter teaching without certification.

  3. Misrepresentations of Research. Walsh's claim that she has reviewed 100 to 200 studies cited in support of teacher education and found that "none of them holds up to scrutiny" is not true. In fact, she is unable to discount a number of important studies that support teacher education or certification. In addition, a large number of the studies relevant to the question of teacher education effects are not reviewed at all in Walsh's paper. Most of the studies she mentions do not concern teacher education or certification directly: at most 80 of the nearly 200 studies listed in the study or appendix are focused on teacher education or certification. A number of those reviewed are badly misrepresented, including inaccurate statements about their methods and findings, false claims about their authors' views, and distortions of their data and conclusions. Many are not reviewed for their methods and findings, but are dismissed because of their sample size, age, dependent variable, or publication venue—unless Walsh likes one of the findings, in which case she uses the study, sometimes after already having dismissed it. Even the studies that Walsh says she reviewed are missing from the appendix of the report, where she refers readers for evidence. (Note 7)

  4. Methodological Issues and Double Standards in Using Research. Walsh misunderstands some fundamental research design issues, including the difference between experimental and correlational studies and the interpretation of research conducted at different levels of aggregation. In her effort to make the evidence base about teacher education disappear, Walsh eliminates from consideration studies that have been cited regarding the contributions of various measures of teacher qualifications to teacher effectiveness if they have small sample sizes, if they were published more than 20 years ago, or if they were published as dissertations, technical reports, or conference papers rather than in peer-reviewed journals. She also eliminates all studies that use measures of teacher effectiveness other than student achievement (e.g. supervisors' ratings of performance, researchers' observation-based measures of teacher practice). There are legitimate issues associated with the sample size, age, quality assurance, and measurement that warrant discussion (see below). However, as a blanket means of eliminating evidence from consideration, this strategy is problematic, as Walsh's frequent citations of studies that fail to meet her own criteria suggest.

  5. Illogical Policy Conclusions. While it is clear that teacher certification systems are not perfect and there are many weak teacher education programs, points that I have frequently made in my own research, it does not follow that the response to these problems should be to eliminate expectations for teachers to acquire the knowledge they need to teach students effectively. The more appropriate policy response is to improve the quality of teacher education—a process that has been underway with important results in a number of states, and one that rests on the processes of accreditation and certification that provide policymakers with levers for change and improvement.

Evidence Ignored

While the Abell Foundation report claims that teachers do not need professional knowledge in order to teach, the field has been moving rapidly to codify the ways in which teaching knowledge makes a difference in student learning. For example, the National Reading Panel of the National Institute of Child Health and Human Development last year published a major review of carefully controlled research which found that children's reading achievement is improved by systematic teaching of phonemic awareness, guided repeated oral reading, direct and indirect vocabulary instruction with careful attention to readers' needs, and a combination of reading comprehension techniques that include metacognitive strategies.

The report notes that teacher education is critical to the success of reading instruction with respect to both instruction in phonemic awareness and more complex comprehension skills:

Knowing that all phonics programs are not the same brings with it the implication that teachers must themselves be educated about how to evaluate different programs to determine which ones are based on strong evidence and how they can most effectively use these programs in their own classrooms. It is therefore important that teachers be provided with evidence-based preservice training and ongoing inservice training to select (or develop) and implement the most appropriate phonics instruction effectively. (p. 11)

Teaching reading comprehension strategies to students at all grade levels is complex. Teachers not only must have a firm grasp of the content presented in the text, but also must have substantial knowledge of the strategies themselves, of which strategies are most effective for different students and types of content and of how best to teach and model strategy use.... (Data from the studies reviewed on teacher training) indicated clearly that in order for teachers to use strategies effectively, extensive formal instruction in reading comprehension is necessary, preferably beginning as early as pre-service (National Reading Panel, 2000, pp. 15-16).

Studies have documented that professional training can be effective in providing teachers with the strategies that enable them to teach these complex comprehension skills, and teachers who receive such training significantly improve students' reading outcomes (e.g., Duffy, Roehler, Sivan et al., 1987; Duffy & Roehler, 1989, regarding explicit strategy instruction; Palincsar & Brown, 1989, regarding reciprocal teaching).

Similar insights in our understanding of how to develop student proficiency in mathematics and science, and how to develop teachers' skills for doing so, have recently emerged. For example, recent analyses of the National Assessment of Educational Progress (NAEP) which control for student characteristics and a number of measures of school inputs have found that students whose teachers have majored in mathematics or mathematics education, who have had more pre- or in-service training in how to work with diverse student populations and more training in how to develop higher-order thinking skills, and who engage in more hands-on learning do better on the NAEP mathematics assessments. Similarly, students whose teachers have majored in science or science education and who have had more pre- or in-service training in how to develop laboratory skills and who engage in more hands-on learning do better on the NAEP science assessments (Wenglinsky, 2000). (Note 8)

A recent review commissioned by the Department of Education, which was carefully vetted by a panel of researchers, disagreed with the Abell Foundation's conclusions. This review, which analyzed 57 studies that met specific research criteria and were published after 1980 in peer-reviewed journals, concluded that the available evidence demonstrates a relationship between teacher education and teacher effectiveness (Wilson, Floden, & Ferrini-Mundy, 2001). The review shows that empirical relationships between teacher qualifications and student achievement have been found across studies using different units of analysis and different measures of preparation and in studies that employ controls for students' socioeconomic status and prior academic performance.

It is ironic that just as the field is learning more about how to prepare teachers to teach children effectively, the Abell Foundation suggests that we truncate teacher education and end the certification policies that would encourage and enable teachers to acquire this knowledge—or at least that we do so for the children of the poor, who also attend school in districts with minimal resources for professional development. The unanswered question is, How are teachers to learn what is known about how to teach well if there are no expectations, incentives, or supports for them to do so?

Unfounded Claims

While ignoring these serious questions, Walsh makes a number of claims that are not supported either by the research she presents or by other evidence in the field. These include the following:

  • New teachers who are certified do not produce greater student gains than new teachers who are not certified.

  • There is little evidence that the content and skills taught in preservice education coursework is (sic) either retained or effective.

  • Verbal ability and subject matter alone are sufficient to produce effective teachers.

  • Private schools do not hire certified teachers and they are more effective than public schools.

  • Individuals with higher academic ability will be recruited to teaching if certification standards are eliminated.

The Effectiveness of Certified and Uncertified Teachers

For her proposition that "new teachers who are certified do not produce greater student gains than new teachers who are not certified," Walsh cites seven studies, none of which provides support for this proposition, and five of which actually provide evidence that contradicts her claim. Three of the studies (Bliss, 1992; Stoddart, 1992; Lutz & Hutton, 1989) include no data on student achievement at all, although Walsh elsewhere dismisses all other studies that do not use student achievement data as the dependent variable. (In a reply to my response, Walsh and Podgursky (2001) note that these studies have been deleted in a newly printed version, along with some studies Walsh cited that were not peer reviewed, "so that the report ... does not appear to convey a double standard" (p. 15)).

Six of the studies Walsh cites actually deal with alternatively certified rather than uncertified teachers, that is, teachers who had undertaken teacher education at the post-baccalaureate level in university- or school district-based programs that rearrange the way teacher education is delivered. The findings across the studies are mixed, but none of them shows that uncertified teachers do as well as certified teachers, and one of them shows that this is clearly not true. Several of the studies point instead to the value of teacher education: the more positive results appear for the alternatives that provide more complete preparation.

  1. Bliss (1992) wrote about the Connecticut alternative certification program, a two-year training model which the author notes features "a significantly longer period of training than in any other alternate route program" in existence at that time (p. 52). This report does not examine uncertified teachers, nor does it meet Walsh's criteria for inclusion in a review of literature, because it includes no data about teacher effectiveness as gauged by student achievement measures. Bliss notes that most recruits reported their initial training to be helpful, and she briefly mentions results from another researcher's survey of recruits' supervisors which suggested mixed reviews of their performance: 33 percent of supervisors said that the alternate route teachers were weaker than others in classroom management (presumably, then, 67 percent said they were not weaker than others in this area), while 38 percent said they were stronger than others in teaching skills (and 62 percent presumably said they were not stronger than others in this area).

  2. Stoddart (1992) reports on the subject matter qualifications and attrition rates of recruits to the Los Angeles Teacher Trainee Program, also a two-year training model. She found that content qualifications were comparable to those of traditionally trained recruits, except for math recruits, who had lower GPAs than traditionally trained mathematics teachers, and that attrition rates for those who entered were relatively low in the first two years but higher than national rates after 5 years. (Note 9) Results cited by Stoddart from other studies about the observed practices of these teachers in comparison with university-trained teachers produced mixed results: university-trained English teachers appeared more skillful than alternate route teachers, but the levels of skill appeared lower for mathematics teachers from both groups.

  3. Lutz and Hutton (1989) compared the demographic characteristics, attitudes, certification test scores, and opinions of Dallas Public Schools' alternative certification (AC) recruits with other first year teachers in the district. Like the other studies noted above, this study did not examine student achievement gains of the recruits' students. The program provides summer training to recruits and then places them in mentored internships during the school year while they are completing other coursework. The study found many similarities but some differences between AC recruits and other first year teachers, including significantly lower rates of expected long-term continuation in teaching for the AC recruits (40% vs. 72% for other first year teachers). They also examined supervisors' perceptions of recruits—a measure that Walsh argues should eliminate other studies from consideration. These were positive for the 54% of the pool (59 out of 110) defined as "successful" interns in the study—those who completed the intern year without dropping out (10%) or being held back for another year or more due to 'deficiencies' in various areas of performance (36%). The study also reported data from another evaluation of the program by the Texas Education Agency (Mitchell, 1987), which surveyed principals, finding that:

    The principals rated the [traditionally-prepared] beginning teachers as more knowledgeable than the AC interns on the eight program variables: reading, discipline management, classroom organization, planning, essential elements, ESL methodology, instructional techniques, and instructional models. The ratings of the AC interns on nine other areas of knowledge typically included in teacher preparation programs were slightly below average in seven areas compared with those of beginning teachers. It might therefore be assumed that pre-service teacher education programs are doing something right! (p. 250).

    In the paragraph cited above, Lutz and Hutton wax enthusiastic about preservice teacher education programs that seemed in these data to outperform the alternative route. Later they wax enthusiastic about the alternative route, given results from another survey of principals, most of whom felt that alternative credential candidates who eventually made it through the program were comparable to other beginning teachers. At the end of the piece, they note that the high attrition rates and difficulty maintaining the program suggest the alternate route will not likely be a long-term solution to teacher supply problems. Although Walsh cites Lutz and Hutton's enthusiastic feelings about the AC program, she does not accurately report the complete data from the study, including the low rates of successful program completion, the low rates of planned retention in teaching, and the mixed reviews of their performance. In her appendix, she includes this study with the following "review:" "Darling-Hammond ignores the unqualified authors' (sic) endorsement of the merits of alternative route to teaching...." One presumes that she means to reference the authors' "unqualified endorsement" rather than to call the authors themselves unqualified. Yet as the above excerpts make clear, the study does not provide an unqualified endorsement of the program.

    Walsh repeats this mistake in the appendix when she critiques a review of alternate certification programs (Darling-Hammond, 1992). She states that, "Darling-Hammond cites the findings from many studies that looked at alternative programs; but she does not include findings that show alternatively trained teachers are at least as effective at raising academic achievement as those who graduate from traditional programs," (p. A-3), citing Lutz and Hutton (1989), despite the fact that their study presented no empirical data on academic achievement of students and presented mixed evidence about the rated performance and retention rates of these recruits.

    Two other studies Walsh cites do include student achievement data, but they do not, as she states, compare certified with uncertified teachers. Both deal with alternatively certified teachers who receive a substantial amount of education coursework while they are undertaking mentored teaching supervised by both university supervisors and classroom mentors.

  4. Miller, McKenna, & McKenna (1998) is a matched comparison group study of what the study's authors call a "carefully constructed" university-based alternate route program for middle school teachers. Reflecting the characteristics of alternative routes endorsed by the National Commission on Teaching and America's Future (1996), this program offered 15 to 25 credit hours of coursework before interns entered classrooms where they were intensively supervised and assisted by both university supervisors and school-based mentors while they completed additional coursework needed to meet full standard state certification requirements. Forty-one of these teachers were compared to a group of 41 traditionally certified teachers matched for years of experience, using ratings of their teaching conducted by trained observers. Then student test score data were collected for 18 of these teachers. Although the sample size is too small to meet Walsh's criteria (Note 10) for studies worth considering (a point she seems to have forgotten here), and data are not provided on student pre-test scores, the study appears reasonably well-conducted.

    The traditionally trained teachers in this study felt somewhat more confident in their practice and scored slightly higher on the two sub-scales of an observation instrument used by trained observers to rate their teaching. However, these differences were not significant, and the authors report, without including the actual data analyses, that there were no significant differences in the student achievement of 18 teachers from the two groups by the 3rd year of practice after both had completed all of their education coursework. (The authors did not control for prior achievement levels of students; however, they stated that the initial differences in student achievement across groups were not significant.)

    Because the design of this program was so different from many quick-entry alternative routes, Miller, McKenna, and McKenna note that their studies "provide no solace for those who believe that anyone with a bachelor's degree can be placed in a classroom and expect to be equally successful as those having completed traditional education programs.... The three studies reported here support carefully constructed AC programs with extensive mentoring components, post-graduation training, regular in-service classes, and ongoing university supervision" (p. 174). This finding does not support Walsh's contentions throughout her paper that only general intelligence and subject matter knowledge make a difference for teacher effectiveness, her statement that uncertified teachers do as well as certified teachers, or her claim that there is no evidence which supports teacher education and certification.

  5. The other study on alternative certification cited favorably by Walsh (Bradshaw & Hawk, 1996) was not published as a peer-reviewed article or research report, which is one of Walsh's criteria for rejecting the results of other reports. It is actually not an empirical study but a literature review that, like other reviews Walsh criticizes, is based on a mixture of unpublished papers and studies that, for the most part, do not examine student achievement. Some of the papers cited do not include empirical evidence at all. Walsh characterizes the report's findings as providing "mixed, inconclusive" evidence. This is certainly true. Studies examining measures of knowledge, teacher beliefs and attitudes, teacher ratings, and student views report no differences on some measures and differences, typically favoring traditionally prepared teachers, on others, especially measures of professional knowledge and performance.

    With respect to student achievement, Bradshaw and Hawk list five papers that discuss outcomes for differently trained teachers. The first, an unpublished paper by Barnes, Salmon, and Wale (1989), does not present any empirical data or discussion of specific studies, but it includes a statement that two districts in Texas reportedly found equivalent outcomes for alternative and traditional program teachers. While it does not mention what programs might have been compared, it does include a table listing teacher education programs designated as alternatives. This list includes one- and two-year university-based master's programs (which are called "alternative" in Texas because they are not undergraduate models) along with district alternative programs that generally offer only a few weeks of summer training before teachers are assigned to classrooms. Thus, the "alternative" group included programs providing extensive graduate level training of the sort that many states would call "traditional," along with programs that provide little formal preparation. Aside from the unanswered question of what analyses some unnamed parties might have done to support assertions about relative effects, the wide range of program models included as "alternative" precludes any inferences about the effects of preparation on teacher effectiveness.

    A second study, by Denton & Peters (1988), provides another example of the definitional problems associated with the terms "alternative" and "traditional". This paper actually studied two versions of a university's college-based teacher education program. The one called "alternative" in their paper was in fact an expansion of the regular teacher education program, rather than a reduction in coursework. Graduates of this more extensive curriculum had students who had stronger performance in earth and physical sciences, while scores in mathematics were stronger for students of the regular teacher education program.

    Of the remaining studies, two found that student achievement gains were higher for the students of traditionally prepared teachers in language arts (Gomez & Grobe, 1990, in a comparison with alternatively certified teachers) and mathematics (Hawk, Coble, & Swanson, 1985, in a comparison with uncertified mathematics teachers). The last (Stafford & Barrow, 1994) did not present original research but referenced studies reporting differences associated primarily with teaching experience between the performance of alternative program teachers, other first-year teachers, and experienced teachers.

    In combination, these studies do not provide any support for the statement that uncertified teachers are as effective as certified teachers. In addition to its other inaccuracies, Walsh's review confuses alternative certification—a strategy that provides candidates with preparation that is differently packaged from what various states deem "traditional" training (usually the difference is that training is post-baccalaureate rather than undergraduate and is streamlined into about a year rather than spread across four years of college)—with lack of certification—which generally indicates a lack of preparation. Having already missed this critical distinction, Walsh does not begin to attempt to sort out the effects of the differences in preparation experiences and outcomes associated with different models of teacher education. Thus, she does not note that program designs that include a comprehensive and coherent program of coursework and intensive mentoring (e.g. Miller, McKenna, & McKenna, 1998) have been found to produce more positive evaluations of candidate performance than models that forego most of this coursework and supervised support.

    For example, a comparative study of more than 200 alternative certification candidates in New Hampshire, who are certified via three years of on-the-job training in lieu of formal preparation, found they were rated by their principals significantly lower than university-prepared teachers on instructional skills and instructional planning, and they rated their own preparation significantly lower than did the university candidates (Jelmberg, 1995). To understand the outcomes of different approaches, studies of alternatives need to acknowledge the differences in program models.

    Finally, Walsh cites two additional studies that include uncertified teachers, but she gets the findings wrong. Neither study shows that uncertified teachers do as well as certified teachers. One shows that the reverse is true.

  6. In one study (Goldhaber & Brewer, 2000), the authors found that high school students who had a certified teacher in mathematics did significantly better, after controlling for initial achievement and student demographic factors, than those who had uncertified teachers. The same trends were true in science, but the influences were somewhat smaller. The effects of certification on achievement were larger than—and in addition to—the effects of a subject matter degree. In this sample, students of a small number of science teachers who held emergency or temporary certification (24 out of the 3,469 teachers in the overall sample) did no worse than the students of certified teachers, although they, too, did better than the students of uncertified teachers. Another analysis of these data (Darling-Hammond, Berry, & Thoreson, 2001) showed that in this sample most of the teachers on temporary / emergency certificates were experienced and most had education training comparable to that of the certified teachers. Most appeared to be already licensed teachers from out-of-state who were in the transition period to securing a new state license or experienced teachers teaching out of their main field. Only a third were new entrants whose characteristics may have suggested a content background with little education training. The students of this sub-sample of teachers had lower achievement gains in an analysis of co-variance that controlled for pre-test scores, content degrees, and experience than those of the more experienced and traditionally trained teachers.

  7. Finally, Walsh cites a recently released study of Teach for America (TFA) by Raymond et al. (2001). This study is relevant to Walsh's discussion of the Resident Teacher Program through which she notes that many TFA recruits enter teaching in Maryland. However, the study did not compare certified to uncertified teachers, as Walsh claims. Although they had the data to do so, the authors chose not to examine how TFA teachers performed in comparison to trained or certified teachers. The study examined the influences of TFA teachers on student achievement scores, using regression methods that controlled for teacher experience and school demographics; thus, the comparison was between TFA recruits and other inexperienced teachers in high-minority schools in Houston, where most underqualified teachers are placed. Since about 50% of Houston's new hires are uncertified and about 35% were found to lack a bachelor's degree in the most recent year of the study, TFA recruits were compared to an extraordinarily underprepared set of teachers. In this comparison, students of TFA teachers did about as well as those of other inexperienced, largely untrained teachers, many of them without bachelor's degrees. (Reviewers of this report have noted that the report should have compared TFA recruits to other BA holders and to prepared or certified teachers; based on the statistics shown, it is not clear that the results of these comparisons would be favorable to TFA.) (Note 11) Another study that compared TFA teachers to certified teachers found significantly higher scores for the students of certified teachers (Laczko-Kerr and Berliner, 2002). The Raymond et al. report also indicated that minority students in Houston, who are disproportionately taught by these underprepared teachers, lose ground academically each year. In addition, only about 50% of African American and Latino 9th graders in Houston graduate from high school four years later (Haney, 2000; NCES, 2000). It would be hard to argue that the assignment of so many underprepared teachers to these students has nothing to do with their lack of success.

    The TFA study found that students of experienced teachers performed significantly better than students of inexperienced teachers, including TFA recruits. Along with the report's finding that, over a three-year period, between 60% and 100% of TFA candidates had left after their second year of teaching, this finding raises additional questions about Teach for America's contribution to the education of Houston students, since TFA recruits do not stay long enough to gain the experience that could support student achievement. Earlier data from the Maryland Department of Education showed that TFA recruits in Baltimore had similar attrition rates, with 62% gone by the third year of teaching (Darling-Hammond, 2000b).

    These high attrition rates resemble those found in some other studies of short-term alternative routes (Darling-Hammond, 2000c) and suggest another important outcome of teacher preparation policies. Both the Houston study and Walsh's own review indicate that experienced teachers are more effective than inexperienced teachers (Walsh, pp. 5-6), yet many short-term alternative program recruits leave quickly. Other research indicates that those who complete 5-year teacher education programs enter and stay in teaching at much higher rates than 4-year teacher education graduates, who stay in teaching at higher rates than teachers hired through alternatives offering only short-term summer training before full-time teaching (Andrew & Schwab, 1995; Darling-Hammond, 2000b). One reason for this might be the fact that 5-year program graduates typically have both a disciplinary major and a full year of student teaching tightly integrated with education coursework.

    Student teaching appears to make a strong difference in teacher retention. In a longitudinal study of recent college graduates who entered teaching in 1993, a recent NCES report notes that recruits without student teaching—most common among untrained recruits or those who enter through shorter-term alternative routes—leave teaching at rates nearly twice as high as those who have had this kind of clinical training (Henke, Chen, & Geis, 2000). The authors noted:

    In comparison with new teachers who had less training in pedagogy, those with more training were less likely to have left teaching without returning by 1997. Fifteen percent of those who had student taught had left the profession and not returned by 1997, compared with 29 percent of those who had not student taught. Whereas 14 percent of certified teachers had left by 1997, 49 percent of those without certification had done so (p. 49).

    Findings about the high attrition rates of those hired without full preparation for teaching raise questions about the cost-effectiveness of a recruitment strategy that relies on teachers with little preparation who are likely to leave the profession before they can learn to become effective with children. Meanwhile, the children they have taught—almost always the most disadvantaged students in the most disadvantaged schools—have not had the benefit of a teacher with either professional knowledge or experience—two sources of greater teaching skill.

    A recent study in Texas showed that teacher attrition costs school systems at least $8,000 for each recruit who leaves in the first few years of teaching (Texas Center for Educational Research, 2000). It estimated that the high attrition of beginning teachers in Texas, a growing number of whom enter with little or no preparation and receive few supports in learning to teach, costs the state more than $200 million per year (p. 16). This and other studies of teacher attrition suggest that policymakers should consider both teaching effects and retention patterns when they think about how to recruit and prepare teachers.

    Walsh chooses to ignore other studies showing that certified teachers do better than uncertified teachers.

  8. One of these, by Hawk, Coble, & Swanson (1985), entitled "Certification: It Does Matter," found, in contradiction to Walsh's statement cited above, that teachers' certification in mathematics has a large and statistically significant effect on student achievement gains in both general mathematics and, to an even greater extent, in algebra. It compared the pre- and post-test scores of students whose teachers were certified in mathematics with those of students whose teachers had similar levels of experience but were uncertified in mathematics. This study is dismissed in one part of Walsh's review as too small (p. 34), so that its findings can be discounted with respect to certification. However, the size of the study does not appear to matter to Walsh when she chooses to cite it as a basis for arguing that only subject matter makes a difference to teaching effectiveness (p. 65). This double standard about the use of research permeates the report. A study is declared inadequate when it finds any contribution of teacher education or certification to any measure of teacher effectiveness, but a study of comparable size or methodology (often the same study) is embraced elsewhere and used to support a different argument.

    While the study does have a small sample size (it examined 36 teachers, paired by school, course, and ability level of students being taught and the 826 students they taught), it is a reasonably well-controlled matched comparison design. The study does support the idea that subject matter knowledge matters to teaching. However, Walsh misrepresents the study as suggesting that only subject matter knowledge matters. The study did not directly examine the isolated effects of subject matter knowledge but the combined effects of subject matter knowledge and educational knowledge—including methods courses in the teaching of the content area—that are part of the certification requirements for an in-field credential. Authors Hawk, Coble, and Swanson concluded:

    The results of this study lend support to maintaining certification requirements as a mechanism to assure the public of qualified classroom teachers... (p. 15). (Note 12)

    As this and other studies reviewed here suggest, content knowledge in combination with content pedagogical knowledge—that is, knowledge about how to teach the content, which, together with student teaching, constitute the major components of certification—appear to make contributions to student learning that exceed the contributions of either component individually. An important policy point from this and other studies of certification is the fact that teachers would not have been guided or encouraged to acquire the content knowledge and content pedagogical knowledge represented by in-field certification unless there were certification requirements. While Walsh and the Fordham Foundation manifesto she endorses would turn all hiring decisions over to principals, it was principals in these schools—and in many others across the country—who hired and assigned out-of-field teachers to teach mathematics as well as other subjects (Ingersoll, 1998). In a policy world that eliminates teacher certification, there would be no barrier to that practice occurring on an even more widespread basis.

  9. Another, much larger study resulted in similar findings about teacher certification in California. Fetler (1999) examined the relationship between school scores on the state's mathematics test and teachers' average experience levels and certification status in 795 high schools, after controlling for student poverty rates and test participation rates. It found that the percent of teachers on emergency credentials exerted a strong and highly significant negative influence on student achievement. The author concluded that, "After factoring out the effects of poverty, teacher experience and preparation are significantly related to achievement" (p. 13).

    This study is cited but never discussed in Walsh's revised report. In her original appendix, Walsh applauded the study's methods but then sought to dismiss its findings with two inaccurate assertions. First, she suggested, incorrectly, that the study's results pertained to subject matter knowledge alone, not to the combination of subject matter and teaching knowledge represented by certification. She misread both the study and the requirements of California's credentialing system to make this claim, appearing to believe that individuals who have passed only the subject matter requirement of a content test are granted full credentials in California (they are not), that individuals who are certified through internship programs (California's alternative route) do not have to complete pedagogical requirements (this is false), and that individuals are hired on emergency permits solely if they lack content knowledge (this is also false). (Note 13) Walsh also suggested, incorrectly, that the study "may have some basic methodology problems, by reaching conclusions using aggregated state-wide data." However, all of the study's data are aggregated to the school level, not the state level. (See the author's confirmation of this statement, below.) In the original appendix, (Note 14) Walsh stated:

    The article would be only be of interest if someone tried to assert that a teacher who knows no math could be a good math teacher. Any attempt to use this study as evidence against the practice of hiring alternatively trained teachers, as appears to be Darling-Hammond's implies (sic) and as Wilson et al. interpret it, loses all of its impact after reading Fetler.... In fact the author.... is primarily advocating ensuring that math teachers take more subject matter coursework, and is clearly disinterested in any effect that may be had from coursework in "professional knowledge."

    The author, Mark Fetler, took strong issue with this interpretation of his findings. When I shared Walsh's statement with Fetler, he wrote in reply:

    I am surprised that Kate Walsh makes those statements. I had a brief telephone conversation with her, but she was not forthcoming about her intent. Meeting the subject matter requirement involves both knowing the topic, e.g., Algebra, and the specific procedures needed to teach it in the classroom. Someone who knows how to solve quadratic equations, but does not know how to convey that information to children in a classroom, is a poor teacher. Both math subject knowledge and math pedagogy are essential. I believe that my study is consistent with these statements.... I would be surprised to hear of any research that demonstrated successful teaching that lacked either of those elements. My study supports the importance of appropriate credentials. Supposing that you could find people who know math to teach, if they lack the ability to communicate effectively with children, they will not succeed in the classroom and will create dissatisfied students, parents, colleagues, administrators, and board members. It will be a mess. Higher standards, not lower, are the solution.

    Fetler also noted that, "the unit of analysis in my paper is the school. It is not based on statewide aggregated data."

    Two other recent school-level studies in California have found significant negative relationships between average student scores on the state examinations and the percentage of teachers on emergency permits, after controlling for student socioeconomic status and other school characteristics (Betts, Rueben, & Dannenberg, 2000; Goe, forthcoming). Like Fetler's study, these studies also found smaller positive relationships between student scores and teacher experience levels, with negative effects on student achievement associated with the proportion of beginning teachers.

    California's experience is a good example of what happens when pressures and supports for hiring credentialed teachers are relaxed. After nearly a decade of inadequate and unequal salaries, easy access to emergency permits and waivers, and few incentives for the training and equitable distribution of qualified teachers for high-need fields and locations, California, now one of the lowest-achieving states in the nation, found itself with more than 40,000 teachers teaching on emergency permits or waivers by 1999-2000. The vast majority of these teachers were teaching in a small number of urban school systems in schools with the highest proportions of low-income students and students of color. High-minority schools were nearly seven times as likely to have uncredentialed teachers as low-minority schools. Low-achieving schools were nearly five times as likely to have uncredentialed teachers as high-achieving schools (Note 15) (Shields et al., 2000, pp. 41-43).

    These results mirror those already noted in Baltimore, Houston, and other cities. The pattern appears across the country. For example, a recent series in the Chicago "Sun Times" (Note 16) documented that "children in the state's lowest-scoring, highest-minority and highest-poverty schools were roughly five times more likely to have teachers who had flunked at least one certification test" and were least likely to have teachers who were "correctly certified." The burden should be on those who argue against efforts to ensure minimally qualified teachers for all students to prove that the confluence of race, poverty, and low achievement with the presence of untrained and uncertified teachers does not further disadvantage our nation's most vulnerable students.

Evidence about Preservice Teacher Education

For the proposition that "there is little evidence that the content and skills taught in preservice education coursework is (sic) either retained or effective" (p. 7), Walsh cites two articles (Murnane, 1983; Veenman, 1984) from among the many dozens of studies of teacher education that could have been retrieved from the peer-reviewed literature, had she done a search. Both of these are very old pieces, published long before recent reforms in teacher education. Neither of them makes any statement in support of Walsh's claim.

  1. Veenman (1984) describes the problems most frequently cited by novice teachers. These included concerns about topics ranging from classroom management to teaching loads and class sizes. Nowhere in the article does he suggest that what teachers learned in preservice education was not retained or effective. In fact, he notes that researchers should look more to the conditions of schooling than to teacher education for explanations for many of the problems beginning teachers cite. Veenman notes that the outcomes of teacher education may vary by characteristics of programs, citing studies finding that those who had had more intense student teaching, more competency-oriented teacher education coursework, or who were more satisfied with their teacher education experiences reported fewer problems in the classroom.

  2. Murnane's (1983) article is not an empirical study but a brief commentary on the work of another author who proposed the development of doctoral degrees for teacher leaders. While he questions the value of doctoral education for developing pedagogical skills (as would I), Murnane is careful to point out that there are forms of teacher education that may be helpful, and that lack of evidence in large data sets about the effects of preservice education may be related to the lack of data collected on the topic at that time, nearly 20 years ago. (See additional discussion of this point under "Evidence about Verbal Ability" below.)

  3. Walsh ignores the findings of other studies on this topic, including some she has cited for other propositions. She criticizes Evertson, Hawley, and Zlotnik (1985) for their interpretation of the findings of Edward Begle (1979), "a respected mathematician," regarding teachers' subject matter preparation (p. 34). In one of the few early data sets providing evidence about teacher preparation (a mammoth study of 112,000 students conducted through the National Longitudinal Study of Mathematical Abilities), Begle (reported in Begle & Geeslin, 1972 and, with additional data, in Begle, 1979) found that measures of teacher subject matter knowledge did not exert strong influences on student achievement. He also found that coursework in mathematics methods had a stronger effect on student achievement than higher-level coursework in the subject matter (discussed in Begle, 1979). On the lack of influence of subject matter knowledge in his earlier study (Begle & Geeslin, 1972), Begle noted, and Walsh reports, that the teachers in the study may have had stronger content knowledge than the norm, since they had all been accepted to a National Science Foundation Summer Institute. This is an appropriate point.

    However, Walsh chooses to ignore Begle's findings about the value of education coursework. She does not explain why. Walsh cites Begle's work at several points in her text, and refers readers to her appendix for a review of his work that is no longer there. In her separately published appendix, Walsh admits of Begle (1979) that, "this is a scholarly work, employing defensible analyses at the time it was written for examining the data." She then nonetheless sought to dismiss it with a vague statement about possible aggregation bias (although achievement data were aggregated only to the classroom level), "too many variables" in the data set, and "much greater variance in the number of subject matter courses teachers took than the number of methodology courses they took." This last complaint is particularly odd. The implications of greater variability in subject matter courses contradict the point she makes above about the possibly high levels of subject matter knowledge among sample members (in re: Begle & Geeslin, 1972). In fact, wider variability would generally make it easier to find effects, if they are there to be found, rather than harder. In another instance (regarding Byrne, 1983), Walsh notes, correctly, that the limited variability in subject matter coursework levels may have made effects more difficult to find. Walsh seems confused about the research findings and their implications but clear about her goal of discrediting any results that support the value of teachers learning about how to teach their content to others.

  4. Monk (1994) offers similar findings on this question from a more recent data set that incorporates more fine-grained variables about teacher education. Using data on 2,829 students from the Longitudinal Study of American Youth, Monk (1994) found that teachers' content preparation, as measured by coursework in the subject field, is positively related to student achievement in mathematics and science, but he notes that the relationship is curvilinear, with diminishing returns to student achievement of teachers' subject matter courses above a threshold level (e.g., five courses in mathematics). In addition, teacher education coursework (e.g. methods courses in the content area) had a positive effect on student learning in mathematics, exhibiting "more powerful effects than additional preparation in the content area" (p. 142). Monk concluded that "a good grasp of one's subject area is a necessary but not a sufficient condition for effective teaching" (p. 142).

    Monk told me that when Walsh first shared her brief appendix review of his work with him, he was surprised that she had used his work to emphasize the importance of subject matter knowledge without acknowledging his findings on the value of education courses. He noted in an email to me that he had communicated to Walsh that:

    My study of relationships between teacher course taking experiences and subsequent student gains in performance showed that the number of both content courses and content-specific pedagogy courses in a teacher's background is positively related to pupil test score gains in the relevant content area. It is misleading to report the positive results for the content courses and to not acknowledge the positive results for the pedagogy courses.

    After Monk communicated with Walsh, she did acknowledge in her appendix that Monk's study provides support for the contention that education coursework has a positive effect on teaching performance; however, she did not incorporate this admission in her claims that "not one" of the studies ever cited on this topic provides such support.

  5. In addition to newer databases that allow some large-scale examinations of the influences of teacher education variables on student achievement, recent studies have begun to look at the outcomes of different teacher education program designs. For example, studies of 5-year teacher education programs—programs that include a bachelor's degree in the discipline plus an additional year of education study and extended student teaching—have found graduates to be more confident and better rated than graduates of 4-year programs in the same institutions and as effective as more senior teachers, as well as more likely to enter and remain in teaching (Andrew & Schwab, 1995; Denton & Peters, 1988). Walsh does not review or cite any of these studies, even those that were available for her information from previous research she claims to have scrutinized.

The Influence of Verbal Ability on Teacher Effectiveness

There is little disagreement about the fact that verbal ability and subject matter knowledge influence teacher effectiveness, although Walsh tries to set up a straw man by suggesting, inaccurately, that some researchers, including myself, have argued otherwise. (See the section on "Misrepresentations of Research" below.) There are two areas of real disagreement, however. One is whether verbal ability alone is the only or best measure of teacher effectiveness. The other is how to evaluate the size of relative contributions of various kinds of knowledge to teacher effectiveness.

As examples cited earlier illustrate, the literature on teacher characteristics and their effects on teacher performance has been a captive of the measures most likely to be available in large data sets at any moment in time. While there are many studies evaluating the influences of teachers' standardized test scores, especially measures of verbal or general academic ability, because these variables have been readily available in large-scale data sets since the 1960s, data on teachers' course-taking backgrounds or teacher education experiences have been included in large data sets only since the early 1990s. Thus, there are more studies finding influences of variables that have most often been measured.

Finally, most of the studies that have included measures of verbal ability or content knowledge have not included measures of teacher education or certification. In a recent review, Wayne and Youngs (in press) found five studies that observed relationships between measures of teachers' verbal or general academic ability and student achievement and that met the standard of having controlled for students' socioeconomic status and prior achievement. Four of these studies employed data sets from the 1960s and 1970s and none of the five included measures of teacher education or certification. Looking across studies in these different eras, in many cases, the relative effect sizes of verbal ability measures are no larger than those of teacher education and certification measures in the studies that use these instead.

  1. Walsh uses an article by Murnane (1983) written nearly 20 years ago to argue for the primacy of verbal ability as a correlate of teacher effectiveness. She states, illogically, that, "to concede this relationship would mean acknowledging that formal teacher preparation is not as critical to student achievement as some would advocate" (p. 41). However, Murnane pointed out in his article that evidence about the influence of verbal ability was partly a function of the fact that teachers' standardized test scores were one of the few variables about teachers available in large-scale databases at that time, which did not include good measures of teacher education. In discussing the results on verbal ability, he diverges from Walsh's interpretation, stating:

    Clearly one should not interpret these results as indicating that intellectual ability should be the sole criterion used in recruiting teachers or that formal teacher training cannot make a difference. In fact, the lack of evidence supporting formal preservice training as a source of competence may be to some extent a result of limitations in the available data. For example, all databases suitable for examining the correlates of teaching effectiveness as measured by student achievement gains pertain to a single school district. Since there is less variation in training among teachers within a district than among teachers in the country at large, these databases do not permit the most powerful possible tests of the efficacy of alternative teacher training programs (p. 565).

  2. Walsh tries to use another article by Greenwald, Hedges, and Laine (1996) as evidence that verbal ability is the only critical variable influencing teacher effectiveness, and misrepresents a communication she had with Larry Hedges, one of the study's authors, regarding the appropriate interpretation of his findings. Characterizing Greenwald, Hedges, and Laine's article as "a sound review of 60 studies," she then criticizes a direct reference to its findings in a report by the National Commission on Teaching and America's Future (Walsh, p. 17). Her criticism first alludes, incorrectly, to a chart in the Commission's report that in fact referred to another study (Note 17), and then criticizes the interpretation of the chart. The correct chart in the Commission's report (Figure 5, entitled "Effects of Educational Investments" in Darling-Hammond, 1997, p. 9) was reproduced directly from Greenwald, Hedges, and Laine's table 7, column 1 (p. 379) with the same variable labels and statistics as presented in the original source. It describes the size of increase in student achievement for every $500 spent on several different kinds of investments. Here is a reproduction of the table from Greenwald et al.'s study:

    Table 7
    The effect of $500 [a] per student on achievement [b]

                                        Sample
    Input Variable           Full Analysis    Publication bias robustness
    Per pupil expenditure        0.15                  0.15
    Teacher education            0.22                  0.20
    Teacher experience           0.18                  0.17
    Teacher salary               0.16                  0.08
    Teacher/pupil ratio          0.04                  0.04

    [a] 1993-94 dollars
    [b] All achievement outcomes are in standard deviation units.

    In explaining the table, study authors noted that

    The magnitudes (of the effects) for teacher education and teacher experience are higher than, but of the same magnitude, as PPE (per pupil expenditures). That is, one would expect comparable and substantial increases in achievement if resources were targeted to selecting (or retaining) more educated or more experienced teachers. (p. 380)

    The Commission used this finding, as Greenwald, Hedges, and Laine had done, as an indicator that investments in teacher education showed stronger influences on pupil achievement gains than investments in other resources, like reduced teacher/pupil ratios. We noted in discussing their overall study that the authors had found evidence of the influences of teacher ability and experience, along with teacher education. However, Walsh criticizes the Commission's two-sentence characterization of the research (which she calls a discussion "in considerable detail") for failing to note that Greenwald, Hedges, and Laine found more studies supporting the influences of teacher verbal ability on achievement than what they labeled "teacher education" (measured in their study as masters degrees because this was the most widely used measure in large data sets). She suggests that Hedges disagrees with the Commission's characterization, a suggestion that Hedges told me was inaccurate when I spoke to him. He indicated that Walsh had not revealed her interpretation of his findings when she contacted him, and wrote the following to explain his own view of the proper interpretation of his findings:

    It is true that the relationship between teacher verbal ability and student achievement is relatively large and consistent across the few studies that have examined it. However this does not imply that investing in teacher ability (among possibly poorly qualified teachers) is a cost effective way to enhance student achievement. There are two reasons. First, teacher ability (among qualified teachers) may be more expensive than other resources that could be purchased to improve achievement. That is, there could be a strong relationship but high cost. Second, and more important, the relations found in the studies Greenwald, Hedges, and Laine (1996) reviewed were studies of practicing teachers. There is no reason to expect that the same relation holds among those who are not part of the teaching workforce.

    The point here, similar to that made by Murnane (above), is not that verbal ability is not important, but that the evidence does not prove it is the only important contributor or the most efficient way to achieve teacher effectiveness. In fact, most current certification systems combine tests of basic skills and general academic ability, subject matter, and teaching knowledge with evidence of successful supervised clinical experience and coursework focused on teaching knowledge and skills to help candidates assemble many sources of expertise in a more coherent way than would otherwise be the case.

    In pursuit of her argument that only verbal ability makes a difference, Walsh seeks to discount other studies that have found strong influences of teacher certification test scores on teacher effectiveness as being relevant only to the measurement of verbal ability and irrelevant to the broader question of teacher certification. These studies are also misrepresented.

  3. In her discussion of Schalock (1979) in the appendix (B13), Walsh seeks to dismiss his review's findings about the limited evidence regarding the relationships between teachers' measured intelligence and other indicators of effectiveness because the review is "old, old!!" and because, she argues, "More recent research such as Summers and Wolfe, 1977; Ferguson, 1991; Ferguson & Womack, 1996 (sic); Murnane, 1983; Hanushek, 1971; Strauss and Sawyer, 1986 suggest that intelligence (measured by SAT, verbal ability tests and college selectivity) are indeed substantially important."

    Aside from the facts that two of these "more recent" studies pre-date the review she dismisses as "old, old!" and one (Murnane, 1983) is not a study at all, Walsh here cites two studies that she dismisses elsewhere for "aggregation bias" (Ferguson, 1991 and Strauss & Sawyer, 1986, see Walsh, p. 27) and another (Ferguson & Womack, 1993) that she dismisses without stating a reason (see discussion of Wilson et al., in Appendix B). (Note 18) Walsh's readers are referred to Appendix B for reviews of these issues, but the studies are not included there.

  4. Walsh cites Ferguson (1991) for a number of her propositions, including the fact that teacher quality matters (p. 5), that teacher race does not matter (p. 6), and that verbal ability matters (p. 6). Later, she claims—when she wants to dismiss the study for its findings about teacher education and certification—that the study suffers from aggregation bias, a concern I address in the next section on methodological issues. Ferguson's analysis of nearly 900 Texas school districts controlled for student background and district characteristics; he found that combined measures of teachers' expertise—scores on a state teacher licensing examination, master's degrees, and experience—accounted for more of the inter-district variation in students' reading and mathematics achievement (and achievement gains) in grades 1 through 11 than student socioeconomic status. An additional, smaller contribution to student achievement was made by lower pupil-teacher ratios and smaller schools in the elementary grades. The effects were so strong, and the variations in teacher expertise so great, that after controlling for socioeconomic status, the large disparities in achievement between black and white students were almost entirely accounted for by differences in the qualifications of their teachers.

    As I noted in an earlier review of this study (Darling-Hammond, 2000c), of the teacher qualifications variables, the strongest relationship was found for scores on the TECAT, a state licensing examination described by the test developer as a test that measures basic skills and professional knowledge. The Texas Education Agency's published outline of the test content shows that it seeks to measure verbal ability, logical thinking, research skills, and a set of items on professional knowledge. Walsh takes issue with this description of the test and argues that the study does not support the value of teacher certification because the test should be considered primarily a basic literacy test. In Walsh's view, this makes it irrelevant to the question of teacher certification—even though it is required for teachers to maintain their certification. She also argues that the relatively smaller influence of master's degrees in Ferguson's study (which accounted for about 5% of the explained variance) means that teacher education is unimportant, and she criticizes the fact that I discuss the three variables associated with teacher quality (TECAT scores, experience, and masters degrees) in combination, although this is also the way in which Ferguson discusses them at several points in his analysis.

    Walsh's arguments are illogical in several ways. First, while it is true the TECAT measures basic skills, it also measures other academic abilities and professional knowledge, as confirmed by the test maker's documentation and administering agency's descriptions. There is no basis for making judgments contrary to the claims of the developers. In addition, the test would not exist at all if there were not a state certification system requiring it. Like all of the other variables one can evaluate in studies of this kind, the test scores are a rough proxy for many aspects of teacher capacity that may matter for their performance. In a regression equation of this sort where one variable stands in for others for which data are not available, it undoubtedly captures the effects of other unmeasured factors. Even if it were true that the test was a weak measure of professional knowledge, this would not mean that professional knowledge is unimportant or that verbal ability is the only important variable for predicting teaching ability. Only a better measure of professional knowledge (coursework or a more in-depth test of teaching knowledge) would allow a test of this question. Finally, as Hedges notes above, since the Ferguson study was based on practicing teachers, its findings do not shed light on the relative effectiveness of non-teachers who might score differently on the tests.

    Masters degrees and experience are other very partial measures of teacher knowledge and skill that show a modest effect in this study and a larger effect in Ferguson and Ladd's (1996) similar study in Alabama that included a weaker test measure of pre-college general skills (the ACT), which is not designed to capture knowledge relevant to teaching. However, masters degrees are also a very crude proxy for teacher education, given the wide variability in the content of masters degrees pursued by teachers, many of which have been pointed at jobs outside of teaching, such as administration, counseling, measurement and evaluation. In fact, aside from MAT preparation programs in a small number of institutions and specialist programs for reading and special education, there were few masters degree programs for the study of teaching until the recent advent of 5-year teacher education programs and masters degrees developed around the National Board for Professional Teaching Standards that focus on content pedagogy. Thus, there is reason to expect that some masters degree studies would affect teaching ability, but not much reason to expect the effect of masters degrees as an undifferentiated variable to be uniform or large in the aggregate, a point I have made in earlier commentary (Darling-Hammond, 2000a). Goldhaber and Brewer (1998, 2000) have made the same point and have completed research that documents the greater influence of both bachelors and masters degrees in the content area taught (e.g. mathematics or mathematics education) as compared to undifferentiated degrees.

    It makes more sense to consider these variables together as proxies for expertise than to treat them as mythically precise measures of totally unrelated constructs. As I have argued elsewhere, research on teaching suggests a view of expertise that includes general knowledge and ability, verbal ability, and subject matter knowledge as foundations; abilities to plan, organize, and implement complex tasks as additional factors; knowledge of teaching, learning, and children as critical for translating ideas into useful learning experiences; and experience as a basis for aggregating and applying knowledge in nonroutine situations (Darling-Hammond, 2000a). David Berliner's studies of expertise in teaching, for example, include experience along with several other traits as a critical aspect of expertise (see e.g. Berliner, 1986). All of these factors combine to make teachers effective; furthermore, one cannot fully partial out the effects of one factor as opposed to another as many are highly correlated.

  5. Walsh also cites Strauss and Sawyer (1986) for her proposition that verbal ability matters (p. 6), but fails to report the study's actual findings and seems unconcerned that it might suffer from "aggregation bias." In a study of 145 school districts in North Carolina, these researchers found that teachers' average scores on the National Teacher Examinations (NTE) had a strong influence on average school district test performance. Although the authors did not specify which portion(s) of the NTE were used as measures, the Weighted Common Examinations Test (WCET) was required in North Carolina at that time. The WCET included separate subtests measuring general knowledge and professional knowledge about teaching. Walsh apparently wants to count this as a test of verbal ability, but does not acknowledge the Professional Knowledge Examination portion of the test.

    The authors found that, taking into account per-capita income, student race, district capital assets, student plans to attend college, and pupil/teacher ratios, teachers' certification test scores had a strikingly large effect on students' failure rates on the state competency examinations: a 1% increase in teacher quality (as measured by NTE scores) was associated with a 3 to 5% decline in the percentage of students failing the exam. The authors' conclusion is similar to Ferguson's (1991):

    Of the inputs which are potentially policy-controllable (teacher quality, teacher numbers via the pupil-teacher ratio and capital stock), our analysis indicates quite clearly that improving the quality of teachers in the classroom will do more for students who are most educationally at risk, those prone to fail, than reducing the class size or improving the capital stock by any reasonable margin which would be available to policy makers (p. 47).

    The same illogic applies to Walsh's dismissal of this study as to her dismissal of the previous one.

    In addition to questions about the content of tests used in various studies, the measures that appear in large data sets are always relatively crude proxies for the constructs under study, so it is impossible to know with great precision exactly what trait is being represented when a variable shows an effect. For example, scores on tests of academic ability like the SAT have generally been strongly correlated with scores on ETS subject matter and professional knowledge tests (Gitomer, Latham, and Zimek, 1999); in eras when higher degrees were less common (e.g. pre-1980), verbal ability scores were also strongly correlated with masters degrees. Where certification tests are in place, test scores correlate with certification status. And both certification status and masters degrees typically correlate with teacher experience, since most states require teachers to obtain certification in order to remain in the workforce and most teachers have traditionally secured masters degrees by taking courses over time while teaching. (This is changing to some extent where beginning teachers are being trained in post-baccalaureate or 5-year programs and sometimes enter the workforce with a masters degree).

    These interrelationships do not invalidate studies that have used one or more of these variables, but they are one reason why it is difficult to say with certainty which of these measures—or other unmeasured variables that are related to them—are associated with measured effects. The correlational studies that Walsh relies on almost exclusively do not establish causation; they point to possible relationships for further, more fine-grained exploration. However, Walsh often dismisses other large studies and the more fine-grained studies from consideration, at least when the findings do not suit her predilections.

  6. Walsh also cites Ferguson & Womack (1993) for her proposition that verbal ability matters most, although the reason for this is unclear. This study of more than 250 candidates from a single teacher education program examined the influences on 13 dimensions of teaching performance of education and subject matter coursework, NTE subject matter test scores, and GPA in the student's major. The ratings of performance were based on detailed descriptors of teaching on 107 items evaluated by subject matter specialists and education supervisors. The authors found that the amount of education coursework completed by teachers explained more than four times as much of the variance in teacher performance as did measures of content knowledge (NTE specialty scores and GPA in the major). It is possible that Walsh cites this study as support for verbal ability influences because she has confused the NTE specialty tests of subject matter knowledge with other components of the NTE battery measuring general academic ability. In any event, the strength of the relationship was very small. Given her willingness to cite the study for a very weak finding about verbal ability, it is interesting that she does not cite it for its much stronger finding that education coursework mattered for teaching performance.

    In her separately-published appendix, Walsh seeks to dismiss the Ferguson & Womack study because it is limited to a single institution (Note 19) and uses "supervisor's evaluations" as the measure of performance. As noted earlier, she is willing to use studies based on such measures for her own claims, despite her assertions that they should not be included. More important, in this study the ratings are not the global ratings from school principals that have often been found to be relatively low in reliability. They are lower-inference ratings based on a detailed protocol used by subject matter specialists and university supervisors, which are typically more reliable. In addition, the limitations on generalizability created by the use of a single institution are not fatal to consideration of the findings. They require that the study be considered in the context of other studies on similar questions using different samples. Such studies have been conducted.

  7. In a similar study which compared relative influences of different kinds of knowledge on 12 dimensions of teacher performance for more than 270 teachers, Guyton and Farokhi (1987) found consistent strong, positive relationships between teacher education coursework performance and teacher performance in the classroom as measured through a standardized observation instrument (the Georgia Teacher Performance Assessment Instrument), while relationships between classroom performance and subject matter test scores were positive but insignificant and relationships between classroom performance and basic academic skill scores were almost nonexistent. (The two measures of basic academic skills were the Georgia Regents' test, a required examination for public university students, for which the researchers used reading and essay scores, and the state's Teacher Competency Test.)

    The researchers noted that extensive reliability studies had been conducted to support the reliability of the TPAI performance measure, which was used statewide as an assessment for certification. Walsh eliminates this study from consideration because it is a single institution study and refers the reader to Appendix B for her review (p. 25). In her appendix, Walsh criticizes the study for its reliance on supervisors' ratings, again failing to distinguish the research on principals' general teacher evaluation ratings from the research on the reliability of the TPAI as an observational instrument. She also apparently failed to read the study carefully, questioning why the numbers of teachers differ for various comparisons, not having noted the authors' explanation that all correlations depended upon the number of teachers for whom data on both variables were available (p. B11).

    Whereas Walsh tries to paint an unambiguous picture about the value of such measures as verbal ability (suggesting, for example, that these scores be reported statewide as a primary measure of accountability) and the lack of value of teacher education, the real picture is decidedly more complex. Her evidence for her claims confuses measures of verbal ability with measures of professional knowledge and subject matter knowledge, and often includes studies that actually show influences of these other kinds of knowledge that are at least as strong as measures of verbal ability. The world is just not as simple as Walsh would like to make it appear. Even strong advocates of the notion that academic ability matters are not willing to make the kinds of over-assertions Walsh urges. For example, Hanushek (1992), whom Walsh cites repeatedly in her defense of verbal ability as a key measure, concludes:

    The closest thing to a consistent finding among the studies is that "smarter" teachers who perform well on verbal ability tests do better in the classroom. Even for that the evidence is not very strong (p. 116).

    While it would be ridiculous to argue that verbal ability and subject matter knowledge do not matter for teaching, it is equally ridiculous to argue that knowledge of teaching and learning and the opportunity to learn to teach under the close supervision of a master teacher through student teaching and other guided experiences do not matter at all. The literature just does not support this reading or the policy implications that Walsh would draw.
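The proxy problem discussed in items 4 and 5 above can be made concrete with a small simulation. What follows is only an illustrative sketch with invented numbers; it is not drawn from Ferguson, Strauss and Sawyer, or any other study discussed here, and the variable names and effect sizes are hypothetical assumptions. It assumes a measured teacher test score that is correlated with an unmeasured kind of professional knowledge, and shows that a regression on the test score alone credits the test score with part of the effect of the unmeasured variable.

```python
import random


def simple_slope(x, y):
    """Ordinary least squares slope of y regressed on a single predictor x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x


def simulate(n=20000, test_effect=0.10, knowledge_effect=0.30,
             overlap=0.6, seed=0):
    """Return the slope on the test score when knowledge is left unmeasured.

    All parameters are hypothetical. `overlap` controls how strongly the
    unmeasured professional knowledge tracks the measured test score.
    """
    rng = random.Random(seed)
    test_scores, achievement = [], []
    for _ in range(n):
        score = rng.gauss(0, 1)
        # Unmeasured knowledge, partly correlated with the test score.
        knowledge = overlap * score + rng.gauss(0, 1)
        outcome = (test_effect * score
                   + knowledge_effect * knowledge
                   + rng.gauss(0, 1))
        test_scores.append(score)
        achievement.append(outcome)
    return simple_slope(test_scores, achievement)


if __name__ == "__main__":
    # In expectation the slope is test_effect + knowledge_effect * overlap,
    # so the test score appears to matter more than its assumed direct effect.
    print("knowledge correlated with score:", round(simulate(), 3))
    print("knowledge unrelated to score:  ", round(simulate(overlap=0.0), 3))
```

Under these assumptions the estimated coefficient on the test score is inflated whenever the unmeasured knowledge is correlated with it, which is why a strong test-score effect in such studies cannot by itself show that only verbal ability matters.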

The Academic Ability of Teachers who Lack Certification

Another argument made by those who would eliminate certification is that an unconstrained market would allow the recruitment of individuals with higher verbal or general academic ability who do not now enter teaching. While it is probable that some individuals would choose to teach if they did not have to prepare, it is not clear that most of these entrants would be more academically able, that they would be better teachers, or that they would stay long in teaching. It is also unlikely that given current wages, individuals who are now preparing for much higher-paying careers in medicine, the law, engineering, and other professions that require much more onerous preparation and licensing processes would choose teaching as a career simply because they did not have to be certified.

Labor market contexts are relevant to this question. The qualifications of individuals preparing for teaching improved noticeably between the early 1980s and the early 1990s in terms of both academic attainment and ability measures, in part because of the changes in admissions requirements to teacher education adopted by states and universities but also likely because of the substantial increases in real wages for teachers that occurred during the 1980s. Whereas prospective teachers were disproportionately drawn from the bottom quartile of college students in the early 1980s (Lanier & Little, 1986), both grades and test scores improved for teacher candidates by the 1990s.

The Recent College Graduates Survey, which tracks college graduates into the labor market, found that the grade point averages of newly qualified teachers in 1990 were higher than those of the average college graduate, with 51% earning a GPA of 3.25 or better as compared to 40% of all graduates (Grey et al., 1993). However, average GPAs were significantly lower for the 15% of college graduates entering teaching who were neither certified nor eligible for certification. Most of the uncertified entrants (57%) had grade point averages below 3.25, and 20% had GPAs below 2.25. Attrition was also high for the untrained candidates. By the time of the survey (one year later), only one-third of the uncertified entrants were still engaged in teaching as their primary jobs (Grey et al., 1993).

In addition, the Educational Testing Service found that among 270,000 test-takers in 1995 through 1997, college admissions test scores were highly correlated with initial teacher licensing scores (Praxis I and Praxis II), and the lowest average scores on both kinds of tests were those held by individuals who entered teaching without preparation (Gitomer, Latham, and Zimek, 1999). (Walsh describes this 14% of the sample as an "error" in the study since the individuals had not enrolled in a teacher education program; she misunderstands the fact that these Praxis test-takers were the entrants to teaching who used emergency or alternative routes. (Note 20)) Prepared teachers scored much higher than unprepared teachers.

While students who prepare to enter fields other than teaching have higher average test scores on measures like the SAT than do those preparing to enter elementary school teaching, there is no significant difference for prospective secondary teachers, most of whom earn a disciplinary degree along with their teaching certificate. The narrowing of this gap between prospective teachers and others is likely a function of the more rigorous admissions requirements for teacher education enacted in most states and the growth in wages between the early 1980s and the mid-1990s.

Finally, the study found that graduates of NCATE-accredited colleges of education passed the Praxis subject matter tests for teacher licensing at a significantly higher rate than did graduates of unaccredited programs, boosting their chances of passing the examination by nearly 10 percent (Gitomer, Latham, and Zimek, 1999). Walsh suggests that this higher Praxis pass rate might simply reflect the fact that NCATE schools could be located in states with low cutoff scores. However, additional analyses of the data by ETS and another independent study (Note 21) indicate that this is not the case. A more likely explanation is that NCATE's requirements that colleges demonstrate how they screen applicants for general ability and that they ensure strong content backgrounds translate into somewhat greater attention to these matters in institutions that are accredited. These data suggest that standards may increase the general as well as specialized qualifications of prospective teachers. They do not suggest that removal of certification requirements brings higher ability individuals into teaching or keeps them there.

It is important to recognize that labor market incentives operate among individuals actually entering teaching. For example, several studies of alternative certification programs found that the academic records of recruits varied substantially by teaching field, with alternatively-certified candidates in high demand shortage fields, such as mathematics and science, having much poorer academic records than candidates in other fields and than candidates from traditional teacher education programs in those same fields (see Natriello & Zumwalt, 1992, re: New Jersey; Lutz and Hutton, 1989 re: Dallas; Stoddart, 1992, re: Los Angeles). It is unlikely that eliminating requirements for training would increase the career attractions to teaching for academically able candidates as much as increased wages would. Meanwhile, eliminating training requirements could result in a less well-qualified teaching force, especially if the elimination of certification standards not only reduced the knowledge of entrants but also reduced pressures for competitive wages.

The Private School Argument

Finally, a claim sometimes made by opponents of teacher certification, including Walsh, is that private schools are more effective than public schools, and that this is because of, or at least is not impeded by, the fact that private school teachers are not certified. There are two major problems with the private school "proof": First, there are conflicting findings about the relative effectiveness of public and private schools, with credible evidence on both sides of the question. Second, most private school teachers are certified and an even larger majority have specific preparation for teaching, even when they have not sought certification.

On the effectiveness of private schools, Walsh cites Coleman, Hoffer, & Kilgore (1982), who examined data from the first wave of High School and Beyond surveys, conducted in 1980, and found evidence of higher performance for comparable students in Catholic and other private schools as compared to public schools. The researchers attributed their findings primarily to differences in student behavior across school sectors, measured by variables like lower rates of absenteeism, cutting class, and fighting, along with factors like more time spent on homework and higher individual student attendance. They also found that achievement was actually higher for comparable students who were in public schools that had these characteristics. Subsequent studies have produced findings that favor both public and private schools after controlling for student characteristics and school organization (Bryk & Lee, 1992; Lee & Bryk, 1988; Lee, Dedrick, & Smith, 1991). Most studies have pointed to variables like school and class size, school organization, and curriculum differentiation as critical variables in determining both public and private school effectiveness. When these factors are controlled, public school students often do as well as or better than private school students in schools with similar features.

Furthermore, differences in the preparation of public and private school personnel are not as large as many people assume. More than 30 states certify private school personnel (Feistritzer, 1984), and, when Coleman did his analysis, more than 85% of private and parochial school teachers were certified, as compared to about 95% of public school teachers (NCES, 1985). This has changed only slightly in the years since. Although certification is not required for private school teachers in all states, only 34% of private school teachers in 1993-94 (the most recent year for which national data are available) were not certified in their primary assignment field. Some of these teachers were certified in fields other than their primary assignment field. Many undertook teacher preparation, even though they did not apply for or maintain a state license or certificate. In 1993-94, public and private school teachers were almost equally likely to have received an undergraduate degree in education (68.9% for public vs. 61.5% for private elementary teachers and 19.8% for public vs. 19.3% for private secondary teachers) (NCES, 1997, p. 25). The education degree as an indicator of preparation is quite partial, since the education degree has waned as certification increasingly requires a content degree with an education minor or credential. The percentage of 1992-93 bachelor's degree recipients who had taken education courses was 87.1% for public school teachers and 71.6% for private school teachers, (Note 22) and the average number of education credits earned was 37.4 for public school teachers as compared to 35.2 for private school teachers (NCES, 1997, table A-51). (Note 23)

Public school teachers were also more likely to have taken subject matter degrees in their teaching fields than private school teachers. For example, 66% of public school mathematics teachers held a major or minor in the field, as compared to 58% of those in private schools. (Goldhaber and Brewer, 2000, reported a similar finding.) The same differentials hold in other fields to a somewhat lesser extent. The greater content preparation of public school teachers is likely a function of the fact that certification has required increasing amounts of subject matter coursework in the field to be taught, thus leveraging stronger content preparation for public school teachers in states where private school teachers are not required to hold certification. Almost all states now require certified teachers to hold at least a minor in the field to be taught, and many require a major in the field.

Finally, even if it were true that untrained teachers were unusually effective in some private schools for students of comparable initial achievement levels (a point about which there is no published evidence), it would be a large leap of faith to assume that such teachers would be equally effective in schools where many students have much greater educational needs and students are not pre-selected for their academic ability, their positive school attendance and behavior, and their parents' income and interest in education. There are very large differences in the populations of students attending public and private schools in the United States, (Note 24) which have important implications for teachers' knowledge and skills. It is one thing for a teacher to offer information in whatever manner comes instinctively to students who are academically able, have learned to learn independently, and are well-supported at home by educated parents, tutors, and other supports for their learning. It is quite another thing to teach by the seat of the pants when students do not have these learning supports at home and may present a variety of language and learning differences. Being effective with students who need substantial support for their learning requires greater diagnostic ability and knowledge of how to present information and structure experiences in ways that help them become successful. Systematic knowledge about how to organize curriculum and reach students with special learning needs is most needed in the schools that serve the most students with these needs.

Other Misrepresentations of Research Findings

The remainder of Walsh's review continues the kind of misrepresentations documented above, appearing to rely on the belief that readers will read its accusations, but will not read or understand the research itself. Although she prepared a draft appendix covering 192 studies, which sought to critique many of the studies she dismisses (often inaccurately), it was not published with the report. Appendix B, to which the reader is repeatedly referred for reviews, includes only 14 studies. Throughout the report, the reader is referred to this appendix for critiques of studies that do not appear there. The selection of research included in the published version of the report's appendix is very strange. Many strong studies, including some of the key citations in the field, are omitted, along with the flawed rationales for dismissing them that now appear in a separately published appendix. Some much less important and less well-designed studies are included, with the apparent goal of critiquing their size or designs as though they represented the dozens of studies not mentioned or excluded. Thus, the paper does not include information regarding most of the studies Walsh claims she has reviewed and does not provide evidence for her claim that, of all the studies cited in support of teacher education and certification, "none bear up to scrutiny."

Here are just a few additional examples of major misrepresentations.

  1. Goldhaber & Brewer (2000). In a string of citations, Walsh lists a study by Goldhaber and Brewer (2000) for its finding that teachers with a degree in their subject matter are more effective than those without such degrees. This study fits all of Walsh's desiderata: It is large (using a data set that includes more than 3,000 teachers), recent, and published in a peer-reviewed journal. However, Walsh does not cite the authors' findings that certification status has an even greater influence on teachers' effectiveness than a degree in the subject area. Later, Walsh states, "...most research indicates that the most distinct problem in schools serving poor children is the number of teachers who are teaching subjects in which they have no expertise (Goldhaber & Brewer, 2000; ... Hawk, Coble, & Swanson, 1985). These studies do not show that certification status, as an isolated variable, has any significant effect on the achievement level of children who are poor or minority." (p. A6). Neither study examined the subject matter expertise of teachers in low-income schools, and both found strong effects of certification on student achievement. In fact, Goldhaber and Brewer wrote:

    Turning to an examination of the effect of teacher certification, we find that the type (standard, emergency, etc.) of certification a teacher holds is an important determinant of student outcomes. In mathematics, we find the students of teachers who are either not certified in their subject (in these data we cannot distinguish between no certification and certification out of subject area) or hold a private school certification do less well than students whose teachers hold a standard, probationary, or emergency certification in math. Roughly speaking, having a teacher with a standard certification in mathematics rather than a private school certification or a certification out of subject results in at least a 1.3 point increase in the mathematics test. This is equivalent to about 10% of the standard deviation on the 12th grade test, a little more than the impact of having a teacher with a BA and MA in mathematics. Though the effects are not as strong in magnitude or statistical significance, the pattern of results in science mimics that in mathematics. Teachers who hold private school certification or are not certified in their subject area have a negative (though not statistically significant) impact on science test scores (p. 139).

    The authors note that the effect size of "having a teacher with a standard certification in mathematics rather than a private school certification or a certification out of subject" is "a little more than the impact of having a teacher with a BA and MA in mathematics." Of course, the certification itself includes requirements for subject matter knowledge as well as for knowledge of teaching and learning. In fact, certified mathematics teachers are more likely to have a degree in the field than non-certified teachers. The fact that the study found a significant effect of certification status even after controlling for whether teachers had a degree in their field and after controlling for experience suggests that whatever is represented by the certification variable has an influence above and beyond the influence of content knowledge and classroom experience.

  2. Druva & Anderson (1983). This meta-analysis of 65 studies examined relationships between science teacher characteristics and teaching behaviors, student achievement in science, or both, using meta-analytic techniques to translate results from a wide range of studies into Pearson correlation coefficients in order to compare them. It found that ratings of teaching effectiveness by principals and students were most strongly correlated with the number of education courses taken, followed by student teaching grades, and teaching experience. On a teacher "effectiveness" scale composed of many teaching behaviors associated in process-product research with student achievement, both science training (examined in 28 studies) and education coursework and performance (examined in 47 studies) were related to effectiveness, as were teacher attitudes, values, and temperament. Associations with cognitive and affective student outcome measures were found for both science training and, to a somewhat smaller extent, for education coursework and performance, based on 34 studies for each of these sets of variables. The authors concluded that:

    Student outcomes are positively associated with the preparation of the teacher, especially science training, but also preparation in education and academic work generally.... While the hiring official seeking a new science teacher certainly must look beyond information on the teacher characteristics considered in this study, information on some of these characteristics certainly is worthy of inclusion in the decision-making process.... In general, the hiring official would be well advised to employ teachers with thorough preparation in both professional education and the sciences being taught. There is a relationship between teacher preparation programs and what their graduates do as teachers (p. 477).

    Walsh seeks to dismiss the results of this study in part by misreporting them. She states the study "did not show the benefit of education coursework on student achievement" (p. 19), and that education coursework is not significantly related to student outcomes, although significance statistics were not reported in the study. This assertion is not supported by the authors' reported findings that both science coursework and education training showed a relationship to teacher effectiveness as defined by student outcomes (in both cases, though to a greater extent for science coursework) (Note 25) as well as teaching behaviors and ratings (reported in the case of education coursework only).

  3. Darling-Hammond (2000). Walsh criticizes and misquotes a study that this author conducted, which both reviewed the literature on teacher characteristics and student achievement and conducted a regression analysis of state-level data from the National Assessment of Educational Progress and the Schools and Staffing Surveys (Darling-Hammond, 2000). The study found that measures of teacher preparation and certification were by far the strongest correlates of student achievement in reading and mathematics, both before and after controlling for student poverty and language status. The conclusion discussed a number of potential reasons for these large effects:

    The strength of the "well-qualified teacher" variable may be partly due to the fact that it is a proxy for both strong disciplinary knowledge (a major in the field taught) and substantial knowledge of education (full certification). If the two kinds of knowledge are interdependent as suggested in much of the literature, it makes sense that this variable would be more powerful than either subject matter knowledge or teaching knowledge alone. It is also possible that this variable captures other features of the state policy environment including general investments in, and commitment to, education, as well as aspects of the regulatory system for education, such as the extent to which standards are rigorous and the extent to which they are enforced.... Finally, there may be unmeasured correlations between the extent to which states enact and enforce high standards for teachers and the extent to which they have enacted other policies that are supportive of public schools. Although it does not appear that teaching standards are strongly related to investments regarding class sizes or to overall education spending, it is possible that there are other factors influencing student achievement which generally co-exist with teacher quality and which were unmeasured in these estimates.

    Walsh seeks to invalidate these findings by raising two complaints, one of which is inaccurate and the other of which is a matter of legitimate discussion in the field. She states, incorrectly, that "Darling-Hammond did not control for class size differences among the states" (p. 26). State-level differences in average class size were in fact included in the analyses, and the variable had a very small, insignificant effect. Walsh also complains that the state-level analyses suffer from aggregation bias because they used average student test scores, a critique she also levels against other studies she cited approvingly for their findings in other parts of the paper (see, e.g., Ferguson, 1991; Strauss & Sawyer, 1986; Coleman, 1966). (Note 26) There are legitimate debates in the field on this point, and I addressed this question in the study itself, as I do again below in the section on "Methodological Issues." For purposes of tracking broad policy trends at the state level, analyses of state-level data offer one useful lens. This perspective was shared by the nine reviewers who recommended this paper's publication in a peer-reviewed journal and a peer-reviewed research report series.

    Finally, the literature review contained in this study is repeatedly mischaracterized throughout Walsh's paper and her appendix as minimizing or ignoring the influences of verbal ability and subject matter preparation for teaching.

    On the relationship between academic ability and teacher effectiveness, Walsh states:

    Darling-Hammond (1999, p. 6) claims there is "little or no relationship between teachers' measured intelligence and their students' achievement." She supports this statement with two studies by Soar, Medley and Cocker (sic) (1983) and Schalock (1979). These two studies simply recycle research from the 1940s and earlier, none of which is retrievable for scrutiny (p. 21).

    Walsh misrepresents this analysis by quoting a portion of a sentence out of context and citing the reviews that summarized research on IQ tests as an example of the inappropriate use of older studies. Here is what I actually said:

    While studies as long ago as the 1940s have found positive correlations between teaching performance and measures of teachers' intelligence (usually measured by IQ) or general academic ability (Hellfritsch, 1945; LaDuke, 1945; Rostker, 1945; Skinner, 1947), most relationships are small and statistically insignificant. Two reviews of such studies concluded that there is little or no relationship between teachers' measured intelligence and their students' achievement (Schalock, 1979; Soar, Medley, & Coker, 1983). Explanations for the lack of strong relationship between measures of IQ and teacher effectiveness have included the lack of variability among teachers in this measure and its tenuous relationship to actual performance (Vernon, 1965; Murnane, 1985). However, other studies have suggested that teachers' verbal ability is related to student achievement (e.g., Bowles & Levin, 1968; Coleman et al., 1966; Hanushek, 1971), and that this relationship may be differentially strong for teachers of different types of students (Summers & Wolfe, 1975). Verbal ability, it is hypothesized, may be a more sensitive measure of teachers' abilities to convey ideas in clear and convincing ways (Murnane, 1985).

    Walsh's attempt to distort the text misses two critical points. First, studies of the relationship between IQ and teaching effectiveness (which I noted had found positive though small relationships) were primarily conducted before the 1960s, because IQ tests came into question as measures of ability at that time and were no longer often available in large data sets thereafter. Second, measures of verbal ability became more popular and widely available in data sets in the 1960s and following, and showed somewhat stronger relationships with teacher outcomes, as I reported in my summary. The studies I cited include many of the same ones that Walsh cites for this proposition, a point she does not acknowledge as she tries to suggest, inaccurately, that I minimize the value of measures of academic ability for teachers. (Note 27)

    On the topic of subject matter knowledge, Walsh also suggests on numerous occasions that I seek to minimize the importance of teachers' knowledge of content. She offers my work as an example of her sweeping statement that "certification advocates ... offer evidence that knowledge of subject matter has little effect on teaching performance" (p. 19). Here is what I actually said in my brief summary of the literature, offering an analysis that clearly acknowledges the importance of subject matter knowledge for teaching and interprets the mixed results of studies in terms of what teachers may need to know in order to teach different things.

    Byrne (1983) summarized the results of thirty studies relating teachers' subject matter knowledge to student achievement. The teacher knowledge measures were either a subject knowledge test (standardized or researcher-constructed) or number of college courses taken within the subject area. The results of these studies were mixed, with 17 showing a positive relationship and 14 showing no relationship. However, many of the "no relationship" studies, Byrne noted, had so little variability in the teacher knowledge measure that insignificant findings were almost inevitable. Ashton and Crocker (1987) found only 5 of 14 studies they reviewed exhibited a positive relationship between measures of subject matter knowledge and teacher performance.

    It may be that these results are mixed because subject matter knowledge is a positive influence up to some level of basic competence in the subject but is less important thereafter. For example, a controlled study of middle school mathematics teachers, matched by years of experience and school setting, found that students of fully certified mathematics teachers experienced significantly larger gains in achievement than those taught by teachers not certified in mathematics. The differences in student gains were greater for algebra classes than general mathematics (Hawk, Coble, & Swanson, 1985). However, Begle and Geeslin (1972) found in a review of mathematics teaching that the absolute number of course credits in mathematics was not linearly related to teacher performance.

    It makes sense that knowledge of the material to be taught is essential to good teaching, but also that returns to subject matter expertise would grow smaller beyond some minimal essential level which exceeds the demands of the curriculum being taught. This interpretation is supported by Monk's (1994) more recent study of mathematics and science achievement. Using data on 2,829 students from the Longitudinal Study of American Youth, Monk (1994) found that teachers' content preparation, as measured by coursework in the subject field, is positively related to student achievement in mathematics and science but that the relationship is curvilinear, with diminishing returns to student achievement of teachers' subject matter courses above a threshold level (e.g., five courses in mathematics).

    It may also be that the measure of subject matter knowledge makes a difference in the findings. Measures of course-taking in a subject area have more frequently been found to be related to teacher performance than have scores on tests of subject matter knowledge. This might be because tests necessarily capture a narrower slice of any domain. Furthermore, in the United States, most teacher tests have used multiple-choice measures that are not very useful for assessing teachers' ability to analyze and apply knowledge. More authentic measures may capture more of the influence of subject matter knowledge on student learning. For example, a test of French language teachers' speaking skill was found to have significant correlation to students' achievement in speaking and listening (Carroll, 1975).

    It seems logical that teachers' abilities to handle the complex tasks of teaching for higher-level learning are likely to be associated, to varying extents, with each of the variables reviewed above: verbal ability, adaptability and creativity, subject matter knowledge, understanding of teaching and learning, specific teaching skills, and experience in the classroom, as well as interactions among these variables. In addition, considerations of fit between the teaching assignment and the teacher's knowledge and experience are likely to influence teachers' effectiveness (Little, 1999), as are conditions that support teachers' individual teaching and the additive effect of teaching across classrooms, such as class sizes and pupil loads, planning time, opportunities to plan and problem solve with colleagues, and curricular supports including appropriate materials and equipment (Darling-Hammond, 1997).

    Finally, Walsh suggests in several places that I have characterized the research as indicating a "negative relationship between student outcomes and the NTE subject matter tests" (p. 19). In fact, I stated that "Studies of teachers' scores on the subject matter tests of the National Teacher Examinations (NTE) have found no consistent relationship between this measure of subject matter knowledge and teacher performance as measured by student outcomes or supervisory ratings. Most studies show small, statistically insignificant relationships, both positive and negative (Andrews, Blackmon & Mackey, 1980; Ayers & Qualls, 1979; Haney, Madaus, & Kreitzer, 1986; Quirk, Witten, & Weinberg, 1973; Summers & Wolfe, 1975)." (Note 28) Walsh misrepresents this statement numerous times.
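
The effect-size arithmetic in the Goldhaber and Brewer passage quoted in item 1 above can be made explicit. The following is a minimal sketch, not a computation taken from their study: the function names are hypothetical, and the inputs are only the two rounded figures in the quotation (a gain of "at least" 1.3 points, described as "about 10%" of the standard deviation). The standard deviation it backs out is therefore an approximation implied by those figures, not a value the authors report.

```python
def standardized_effect(raw_gain: float, test_sd: float) -> float:
    """Express a raw test-score gain as a fraction of the test's standard deviation."""
    if test_sd <= 0:
        raise ValueError("test_sd must be positive")
    return raw_gain / test_sd


def implied_sd(raw_gain: float, effect_in_sd_units: float) -> float:
    """Back out the standard deviation implied by a raw gain and its size in SD units."""
    if effect_in_sd_units <= 0:
        raise ValueError("effect_in_sd_units must be positive")
    return raw_gain / effect_in_sd_units


# Rounded figures from the quoted passage (assumed inputs, not reported statistics).
RAW_GAIN_POINTS = 1.3   # "at least a 1.3 point increase"
EFFECT_IN_SD = 0.10     # "about 10% of the standard deviation"

sd = implied_sd(RAW_GAIN_POINTS, EFFECT_IN_SD)
print(f"Implied test SD: about {sd:.0f} points")
print(f"Effect size: {standardized_effect(RAW_GAIN_POINTS, sd):.2f} SD")
```

Stating the gain in standard deviation units is what allows the certification effect to be set beside the effect of holding a BA and MA in mathematics, as the quoted authors do.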

Methodological Issues

One of the ways that Walsh seeks to make much of the research on teacher education disappear is by suggesting that it is inappropriate to cite studies that are older, smaller, use measures of performance other than student achievement scores, are aggregated at a level above the classroom, or are published in venues other than peer-reviewed journals.

As noted above, Walsh uses a double standard in selecting research to reject when it finds evidence of the influence of teacher education on student learning and research to cite for her own purposes. While she discounts the findings of many dissertation studies and technical reports because they were not published in peer-reviewed journals, in making her own claims, she cites at least 15 studies that were not published in peer-reviewed journals or technical report series and at least 20 that were published before 1980, including some that she elsewhere dismissed from consideration because she did not like specific findings. For findings she likes, she also cites several that use supervisory ratings as the only measures of teacher effectiveness and others that she later dismisses for aggregation bias. Sometimes she represents the studies' findings accurately; sometimes not. Many of the studies she cites for various propositions do not contain the findings for which they are cited—or, in several cases, any data on the question at all.

I would not argue, as Walsh does, that none of these studies have value as contributions to the literature. However, the double standard she applies in using studies of different eras, sizes, aggregation levels, dependent variables, and publication statuses perhaps proves the point that to evaluate the weight of evidence in a field it is often necessary to triangulate findings that used different methods, over different time periods, and at different levels of aggregation to see where there is an accrual of evidence over time and across methods. Of course it is important to do this with appropriate attention to the methodological strengths and weaknesses of various studies and lines of research. Unfortunately, Walsh often does this poorly, appearing to misunderstand critical research design issues. Below, I discuss the issues of study size and design, level of aggregation, choice of dependent variable (including the use of supervisory ratings of teacher performance), age, and venue of publication.

Study Size and Design

In one part of her review, Walsh bemoans the lack of experimental research. She then rejects the results of studies with experimental designs because of their smaller sample sizes and cites almost exclusively non-experimental correlational studies, which—though larger—lack direct controls for the variables of interest and must rely on statistical manipulations of data to account, indirectly, for these other influences. This kind of correlational research is, of course, legitimate for staking out broad possibilities in relationships among variables, but it has its own limitations. Many of the more carefully controlled experimental designs can in fact offer more solid evidence about effects, because the "treatment" they are studying is known and the samples can be better controlled than is true for large correlational studies that use proxies and sta