In October, 2001, the Baltimore-based
Abell Foundation issued a report purporting to prove that there
is "no credible research that supports the use of teacher
certification as a regulatory barrier to teaching" (Walsh,
2001, p. 5). (Note 2) The
Abell Foundation paper argued against Maryland's efforts to
strengthen teacher preparation requirements and defended the
continuation of a local short-term alternative route into
teaching that had come under criticism. Suggesting that
"educators, policymakers, the media, and the public
mistakenly equate teacher quality with teacher
certification" (p. 1), Kate Walsh, the author of the paper,
complained that efforts to improve education for poor and
minority children in Baltimore by the state and local
superintendents of schools and by local advocacy organizations
foolishly sought to secure more fully certified teachers for
their schools. She cited as wrong-headed newspaper articles
raising concerns, for example, that: "Least prepared
teachers are at worst city schools: One-third lack basic
credentials for certification," (p. 1). Calling misguided
the efforts of a Baltimore community group that released a study
which "bemoaned the fact that more uncertified teachers
were teaching in the city's high-poverty, predominantly
African-American schools than the city's whiter, more
affluent schools" (p. 2), the paper sought to demonstrate
that these inequalities in access to certified teachers are not
problematic if certification can be discounted as a determinant
of achievement.
The Abell Foundation proposed that
Maryland should 1) "eliminate the coursework requirements
for teacher certification" and require only a
bachelor's degree and a passing score on an appropriate
teacher's exam; 2) "report the average verbal ability
score of teachers in each school district and of teacher
candidates graduating from the State's schools of
education;" and 3) "devolve its responsibility for
teacher qualification and selection to its 24 public school
districts," delegating all hiring authority to individual
school principals (pp. vii-viii).
Although these ideas might seem indefensible to
those who are engaged in research regarding teacher preparation
and recruitment, the U.S. Secretary of Education echoed these
recommendations in his Annual Report on Teacher Quality (USDOE,
2002), a report on the national state of teacher quality required
under the 1998 reauthorization of Title II of the Higher
Education Act. In this report, the Secretary argued that teacher
certification systems are "broken," imposing
"burdensome requirements" for education coursework
that make up "the bulk of current teacher certification
regimes" (p. 8). The report argues that certification
should be redefined to emphasize higher standards for verbal
ability and content knowledge and to de-emphasize requirements
for education coursework, making attendance at schools of
education and student teaching optional and eliminating
"other bureaucratic hurdles" (p. 19).
The report suggests that its recommendations are
based on "solid research." However, only one
reference among the report's 44 footnotes is to a
peer-reviewed journal article (which is misquoted in the report); most are
to newspaper articles or to documents published by advocacy
organizations, some of these known for their vigorous opposition
to teacher education. (Note 3) For the recommendation that education preparation be
eliminated or made optional, the Secretary's report relies
exclusively on the Abell Foundation's paper. Though
written as a local rejoinder to Maryland's efforts to
strengthen teacher preparation and certification, it appears to
have become a foundation for federal policy.
This article includes the response I wrote to
Walsh's paper (Note 4)
when it was first issued, with some additions that respond to a
reply she issued with Michael Podgursky (Note 5) and a briefer version of her report
recently printed in Education Next, a magazine put out by the
Hoover Institution (Walsh, 2002).
In order to make a case for her agenda, Walsh attacks all
research that has found relationships between teachers'
preparation and their measured effectiveness, including
students' achievement. She characterizes much of the
education research as "flawed, sloppy, aged and sometimes
academically dishonest" (p. 13), a characterization that
more aptly describes her own paper, which consistently
misrepresents the statements of researchers, the findings of
studies, and the evidence base for her claims. She claims to
have reviewed all of the studies ever cited by proponents of
teacher education. In fact, a large number of the references in
the paper and appendix are not directly on the topic of teacher
education, and many studies of teacher education effects are not
included in the report. Furthermore, her paper does not actually
review most of the studies it mentions. An original report
appendix listing studies shrank from 175 in July, 2001 to
fourteen in the version of the report released in October, 2001,
selected according to no obvious criteria and omitting many of
the most prominent studies on the topic. (Note 6) The "reviews" in a now separate
appendix published on the foundation's website are generally
not careful assessments of research
methods or findings but a list of complaints and random
observations (sometimes accurate but often not) about
various aspects of the studies or how they have been cited
by others. (A number of examples are included below.)
All studies have limitations, and some are too problematic
to be relied upon, including a number that Walsh relies upon for
her own assertions. However, Walsh's paper, which is
littered with inaccuracies, misstatements, and
misrepresentations, sheds little light on the research or its
implications for teacher education and certification. In what
follows I discuss the inaccuracies in Walsh's account, the
actual findings of many of the studies she purports to review,
and the findings of other studies she chooses to ignore, as well
as the implications of her proposals for teachers, their
knowledge, and the students they teach.
In the course of the paper, I review some of the studies that
have found influences of teacher education and certification on
student achievement at the levels of the individual teacher (e.g.
Goldhaber & Brewer, 2000; Hawk, Coble, & Swanson, 1985;
Monk, 1994); the school (Betts, Rueben, & Danenberg, 2000;
Fetler, 1999); the school district (Ferguson, 1991; Strauss &
Sawyer, 1986); and state (Darling-Hammond, 2000c). The
convergence of findings in analyses using different units of
analysis reinforces the strength of the inferences that might be
drawn from any single study.
What are the Arguments?
The Abell Foundation report admits that teacher qualifications
make a difference but it also tries to make a case that
"the backgrounds and attributes characterizing effective
teachers are more likely to be found outside the domain of
schools of education. The teacher attribute found consistently
to be most related to raising student achievement is verbal
ability.... usually measured by short vocabulary
tests..." (p. v). Later in the report, Walsh
suggests that subject matter knowledge may be an additional
criterion for hiring secondary teachers, but not for elementary
teachers. Walsh objects to the state requirements regarding
content coursework in each of the core academic areas for
elementary teachers, since many who want to enter through the
alternative Resident Teacher program in Maryland have had trouble
meeting these requirements.
Walsh then tries to dismiss all studies that find evidence
that knowledge about teaching also makes a difference for teacher
performance, or to claim that studies finding positive effects of
teacher education or certification are either too old, too small,
too highly aggregated, or dependent on evidence about teacher
performance other than student achievement or are not really
about certification after all, even if their authors say they
are. She often does this by misrepresenting the studies'
actual methods and findings, as I detail below.
While there are legitimate concerns to be raised about various
studies in the literature, on all sides of the question, this article
does not shed much light on them. A thorough
review of the quality and accurately portrayed findings of the
several bodies of research that bear on this question would be a
service to this field. Unfortunately, this document's
inaccuracies and misinterpretations make it of little use in this
regard.
In what follows, I address five major issues regarding the
Abell report and the research base on teaching and teacher
education:
- Evidence Ignored. Evidence about student
learning in reading and other areas documents the need for
teachers to have professional knowledge that includes and extends
beyond subject matter knowledge. The Abell Foundation report
does not consider this evidence or answer the question of how
teachers are to acquire this knowledge if they are not
professionally prepared.
- Unfounded Claims. No evidence supports
Walsh's claim that either verbal ability or subject matter
knowledge alone makes teachers effective. She lacks supporting
evidence (and fails to consider contradictory evidence) for
her claims about the relative effectiveness of
certified and uncertified teachers, the outcomes of teacher
education, the primacy of verbal ability as the most important
measure of teaching, the effectiveness of private and public
schools and the preparation of their teachers, and the attributes
of individuals who enter teaching without certification.
- Misrepresentations of Research. Walsh's
claim that she has reviewed 100 to 200 studies cited in support
of teacher education and found that "none of them holds up
to scrutiny" is not true. In fact, she is unable to
discount a number of important studies that support teacher
education or certification. In addition, a large number of the
studies relevant to the question of teacher education effects are
not reviewed at all in Walsh's paper. Most of the studies
she mentions do not concern teacher education or certification
directly: at most 80 of the nearly 200 studies listed in the
study or appendix are focused on teacher education or
certification. A number of those reviewed are badly
misrepresented, including inaccurate statements about their
methods and findings, false claims about their authors'
views, and distortions of their data and conclusions. Many are
not reviewed for their methods and findings, but are dismissed
because of their sample size, age, dependent variable, or
publication venue, unless Walsh likes one of the findings,
in which case she uses the study, sometimes after already having
dismissed it. Even the studies that Walsh says she reviewed are
missing from the appendix of the report, where she refers readers
for evidence. (Note 7)
- Methodological Issues and Double Standards in Using
Research. Walsh misunderstands some fundamental research
design issues, including the difference between experimental and
correlational studies and the interpretation of research
conducted at different levels of aggregation. In her effort to
make the evidence base about teacher education disappear, Walsh
eliminates from consideration studies that have been cited
regarding the contributions of various measures of teacher
qualifications to teacher effectiveness if they have small sample
sizes, if they were published more than 20 years ago, or if they
were published as dissertations, technical reports, or conference
papers rather than in peer-reviewed journals. She also
eliminates all studies that use measures of teacher effectiveness
other than student achievement (e.g. supervisors' ratings
of performance, researchers' observation-based measures of
teacher practice). There are legitimate issues associated with
the sample size, age, quality assurance, and measurement that
warrant discussion (see below). However, as a blanket means of
eliminating evidence from consideration, this strategy is
problematic, as Walsh's frequent citations of studies that
fail to meet her own criteria suggest.
- Illogical Policy Conclusions. While it is
clear that teacher certification systems are not perfect and
there are many weak teacher education programs, points that I
have frequently made in my own research, it does not follow that
the response to these problems should be to eliminate
expectations for teachers to acquire the knowledge they need to
teach students effectively. The more appropriate policy response
is to improve the quality of teacher education, a process
that has been underway with important results in a number of
states, and one that rests on the processes of accreditation and
certification that provide policymakers with levers for change
and improvement.
Evidence Ignored
While the Abell Foundation report claims that
teachers do not need professional knowledge in order to teach,
the field has been moving rapidly to codify the ways in which
teaching knowledge makes a difference in student learning. For
example, the National Reading Panel of the National Institute of
Child Health and Human Development last year published a major
review of carefully controlled research which found that
children's reading achievement is improved by systematic
teaching of phonemic awareness, guided repeated oral reading,
direct and indirect vocabulary instruction with careful attention
to readers' needs, and a combination of reading
comprehension techniques that include metacognitive
strategies.
The report notes that teacher education is critical to the
success of reading instruction with respect to both instruction
in phonemic awareness and more complex comprehension skills:
Knowing that all phonics programs are not the same brings with
it the implication that teachers must themselves be educated
about how to evaluate different programs to determine which ones
are based on strong evidence and how they can most effectively
use these programs in their own classrooms. It is therefore
important that teachers be provided with evidence-based
preservice training and ongoing inservice training to select (or
develop) and implement the most appropriate phonics instruction
effectively. (p. 11)
Teaching reading comprehension strategies to students at all
grade levels is complex. Teachers not only must have a firm
grasp of the content presented in the text, but also must have
substantial knowledge of the strategies themselves, of which
strategies are most effective for different students and types of
content and of how best to teach and model strategy use....
(Data from the studies reviewed on teacher training) indicated
clearly that in order for teachers to use strategies effectively,
extensive formal instruction in reading comprehension is
necessary, preferably beginning as early as pre-service (National
Reading Panel, 2000, pp. 15-16).
Studies have documented that professional training can be
effective in providing teachers with the strategies that enable
them to teach these complex comprehension skills, and teachers
who receive such training significantly improve students'
reading outcomes (e.g., Duffy, Roehler, Sivan et al., 1987; Duffy
& Roehler, 1989, regarding explicit strategy instruction;
Palincsar & Brown, 1989, regarding reciprocal teaching).
Similar insights in our understanding of how to develop
student proficiency in mathematics and science, and how to
develop teachers' skills for doing so, have recently
emerged. For example, recent analyses of the National Assessment
of Educational Progress (NAEP) which control for student
characteristics and a number of measures of school inputs have
found that students whose teachers have majored in mathematics or
mathematics education, who have had more pre- or in-service
training in how to work with diverse student populations and more
training in how to develop higher-order thinking skills, and who
engage in more hands-on learning do better on the NAEP
mathematics assessments. Similarly, students whose teachers have
majored in science or science education and who have had more
pre- or in-service training in how to develop laboratory skills
and who engage in more hands-on learning do better on the NAEP
science assessments (Wenglinsky, 2000). (Note 8)
A recent review commissioned by the Department of Education,
which was carefully vetted by a panel of researchers, disagreed
with the Abell Foundation's conclusions. This review,
which analyzed 57 studies that met specific research criteria and
were published after 1980 in peer-reviewed journals, concluded
that the available evidence demonstrates a relationship between
teacher education and teacher effectiveness (Wilson, Floden,
& Ferrini-Mundy, 2001). The review shows that empirical
relationships between teacher qualifications and student
achievement have been found across studies using different units
of analysis and different measures of preparation and in studies
that employ controls for students' socioeconomic status and
prior academic performance.
It is ironic that just as the field is learning
more about how to prepare teachers to teach children effectively,
the Abell Foundation suggests that we truncate teacher education
and end the certification policies that would encourage and
enable teachers to acquire this knowledge, or at least
that we do so for the children of the poor, who also attend
school in districts with minimal resources for professional
development. The unanswered question is, How are teachers to
learn what is known about how to teach well if there are no
expectations, incentives, or supports for them to do so?
Unfounded Claims
While ignoring these serious questions, Walsh makes a number
of claims that are not supported either by the research she
presents or by other evidence in the field. These include the
following:
- New teachers who are certified do not produce greater student
gains than new teachers who are not certified.
- There is little evidence that the content and skills taught in
preservice education coursework is (sic) either retained
or effective.
- Verbal ability and subject matter alone are sufficient to
produce effective teachers.
- Private schools do not hire certified teachers and they are
more effective than public schools.
- Individuals with higher academic ability will be recruited to
teaching if certification standards are eliminated.
The Effectiveness of Certified and Uncertified
Teachers
For her proposition that "new teachers who are certified
do not produce greater student gains than new teachers who are
not certified," Walsh cites seven studies, none of which
provides support for this proposition, and five of which actually
provide evidence that contradicts her claim. Three of the
studies (Bliss, 1992; Stoddart, 1992; Lutz & Hutton, 1989)
include no data on student achievement at all, although Walsh
elsewhere dismisses all other studies that do not use student
achievement data as the dependent variable. (In a reply to my
response, Walsh and Podgursky (2001) note that these studies have
been deleted in a newly printed version, along with some studies
Walsh cited that were not peer reviewed, "so that the
report ... does not appear to convey a double
standard" (p. 15)).
Six of the studies Walsh cites actually deal with
alternatively certified rather than uncertified teachers, that
is, teachers who had undertaken teacher education at the
post-baccalaureate level in university- or school district-based
programs that rearrange the way teacher education is delivered.
The findings across the studies are mixed, but none of them shows
that uncertified teachers do as well as certified teachers, and
one of them shows that this is clearly not true. Several of the
studies point instead to the value of teacher education: The more
positive findings are found for the alternatives that provide
more complete preparation.
- Bliss (1992) wrote about the
Connecticut alternative certification program, a two-year
training model which the author notes features "a
significantly longer period of training than in any other
alternate route program" in existence at that time (p.
52). This report does not examine uncertified teachers, nor
does it meet Walsh's criteria for inclusion in a review of
literature, because it includes no data about teacher
effectiveness as gauged by student achievement measures. Bliss
notes that most recruits reported their initial training to be
helpful, and she briefly mentions results from another
researcher's survey of recruits' supervisors which
suggested mixed reviews of their performance: 33 percent of
supervisors said that the alternate route teachers were weaker
than others in classroom management (presumably, then, 67 percent
said they were not weaker than others in this area), while
38 percent said they were stronger than others in teaching skills
(and 62 percent presumably said they were not stronger
than others in this area).
- Stoddart (1992) reports on the subject matter
qualifications and attrition rates of recruits to the Los Angeles
Teacher Trainee Program, also a two-year training model. She
found that content qualifications were comparable to those of
traditionally trained recruits, except for math recruits, who had
lower GPAs than traditionally trained mathematics teachers, and
that attrition rates for those who entered were relatively low in
the first two years but higher than national rates after 5
years. (Note 9) Results
cited by Stoddart from other studies about the observed practices
of these teachers in comparison with university-trained teachers
produced mixed results: university-trained English teachers
appeared more skillful than alternate route teachers, but the
levels of skill appeared lower for mathematics teachers from both
groups.
- Lutz and Hutton (1989) compared the demographic
characteristics, attitudes, certification test scores, and
opinions of Dallas Public Schools' alternative
certification (AC) recruits with other first year teachers in the
district. Like the other studies noted above, this study did not
examine student achievement gains of the recruits'
students. The program provides summer training to recruits and
then places them in mentored internships during the school year
while they are completing other coursework. The study found many
similarities but some differences between AC recruits and other
first year teachers, including significantly lower rates of
expected long-term continuation in teaching for the AC recruits
(40% vs. 72% for other first year teachers). They also examined
supervisors' perceptions of recruits, a measure that
Walsh argues should eliminate other studies from consideration.
These were positive for the 54% of the pool (59 out of 110)
defined as "successful" interns in the study: those who
completed the intern year without dropping out (10%) or
being held back for another year or more due to
'deficiencies' in various areas of performance
(36%). The study also reported data from another evaluation of
the program by the Texas Education Agency (Mitchell, 1987), which
surveyed principals, finding that:
The principals rated the [traditionally-prepared] beginning
teachers as more knowledgeable than the AC interns on the eight
program variables: reading, discipline management, classroom
organization, planning, essential elements, ESL methodology,
instructional techniques, and instructional models. The ratings
of the AC interns on nine other areas of knowledge typically
included in teacher preparation programs were slightly below
average in seven areas compared with those of beginning
teachers. It might therefore be assumed that pre-service teacher
education programs are doing something right! (p. 250).
In the paragraph cited above, Lutz and Hutton wax enthusiastic
about preservice teacher education programs that seemed in these
data to outperform the alternative route. Later they wax
enthusiastic about the alternative route, given results from
another survey of principals, most of whom felt that alternative
credential candidates who eventually made it through the program
were comparable to other beginning teachers. At the end of the
piece, they note that the high attrition rates and difficulty
maintaining the program suggest the alternate route will not
likely be a long-term solution to teacher supply problems.
Although Walsh cites Lutz and Hutton's enthusiastic
feelings about the AC program, she does not accurately report the
complete data from the study, including the low rates of
successful program completion, the low rates of planned retention
in teaching, and the mixed reviews of their performance. In her
appendix, she includes this study with the following
"review:" "Darling-Hammond ignores the
unqualified authors' (sic) endorsement of the merits
of alternative route to teaching...." One presumes
that she means to reference the authors' "unqualified
endorsement" rather than to call the authors themselves
unqualified. Yet as the above excerpts make clear, the study
does not provide an unqualified endorsement of the program.
Walsh repeats this mistake in the appendix when
she critiques a review of alternate certification programs
(Darling-Hammond, 1992). She states that, "Darling-Hammond
cites the findings from many studies that looked at alternative
programs; but she does not include findings that show
alternatively trained teachers are at least as effective at
raising academic achievement as those who graduate from
traditional programs," (p. A-3), citing Lutz and Hutton
(1989), despite the fact that their study presented no empirical
data on academic achievement of students and presented mixed
evidence about the rated performance and retention rates of these
recruits.
Two other studies Walsh cites do include student achievement
data, but they do not, as she states, compare certified with
uncertified teachers. Both deal with alternatively certified
teachers who receive a substantial amount of education coursework
while they are undertaking mentored teaching supervised by both
university supervisors and classroom mentors.
- Miller, McKenna, & McKenna (1998) is a matched
comparison group study of what the study's authors call a
"carefully constructed" university-based alternate
route program for middle school teachers. Reflecting the
characteristics of alternative routes endorsed by the National
Commission on Teaching and America's Future (1996), this
program offered 15 to 25
credit hours of coursework before interns entered classrooms
where they were intensively supervised and assisted by both
university supervisors and school-based mentors while they
completed additional coursework needed to meet full standard
state certification requirements. Forty-one of these teachers
were compared to a group of 41 traditionally certified teachers
matched for years of experience, using ratings of their teaching
conducted by trained observers. Then student test score data
were collected for 18 of these teachers. Although the sample
size is too small to meet Walsh's criteria (Note 10) for studies worth considering (a
point she seems to have forgotten here), and data are not
provided on student pre-test scores, the study appears reasonably
well-conducted.
The traditionally trained teachers in this study felt somewhat
more confident in their practice and scored slightly higher on
the two sub-scales of an observation instrument used by trained
observers to rate their teaching. However, these differences
were not significant, and the authors report, without including
the actual data analyses, that there were no significant
differences in the student achievement of 18 teachers from the two groups
by the 3rd year of practice after
both had completed all of their education coursework. (The
authors did not control for prior achievement levels of students;
however, they stated that the initial differences in student
achievement across groups were not significant.)
Because the design of this program was so different from many
quick-entry alternative routes, Miller, McKenna, and McKenna note
that their studies "provide no solace for those who believe
that anyone with a bachelor's degree can be placed in a
classroom and expect to be equally successful as those having
completed traditional education programs.... The three
studies reported here support carefully constructed AC programs
with extensive mentoring components, post-graduation training,
regular in-service classes, and ongoing university
supervision" (p. 174). This finding does not support
Walsh's contentions throughout her paper that only general
intelligence and subject matter knowledge make a difference for
teacher effectiveness, her statement that uncertified teachers do
as well as certified teachers, or her claim that there is no
evidence which supports teacher education and certification.
- The other study on alternative certification cited
favorably by Walsh (Bradshaw & Hawk, 1996) was not
published as a peer-reviewed article or research report, one of
Walsh's criteria for rejecting the results of other
reports. It is actually not an empirical study but a literature
review that, like other reviews Walsh criticizes, is based on a
mixture of unpublished papers and on studies that, for the most
part, do not examine student achievement. Some of the papers
cited do not include empirical evidence at all. Walsh
characterizes the report's findings as providing
"mixed, inconclusive" evidence. This is certainly
true. Studies examining measures of knowledge, teacher beliefs
and attitudes, teacher ratings, and student views report no
differences on some measures and differences, typically favoring
traditionally prepared teachers, on others, especially measures
of professional knowledge and performance.
With respect to student achievement, Bradshaw and Hawk list
five papers that discuss outcomes for differently trained
teachers. The first, an unpublished paper by Barnes, Salmon, and
Wale (1989), does not present any empirical data or discussion of
specific studies, but it includes a statement that two districts
in Texas reportedly found equivalent outcomes for alternative and
traditional program teachers. While it does not mention what
programs might have been compared, it does include a table
listing teacher education programs designated as alternatives.
This list includes one- and two-year university-based
master's programs (which are called
"alternative" in Texas because they are not
undergraduate models) along with district alternative programs
that generally offer only a few weeks of summer training before
teachers are assigned to classrooms. Thus, the
"alternative" group included programs providing
extensive graduate level training of the sort that many states
would call "traditional," along with programs that
provide little formal preparation. Aside from the unanswered
question of what analyses some unnamed parties might have
done to support assertions about relative effects,
the wide range of program models
included as "alternative" precludes any inferences about
the effects of preparation on teacher
effectiveness.
A second study, by Denton & Peters (1988), provides another
example of the definitional problems associated with the terms
"alternative" and "traditional". This
paper actually studied two versions of a university's
college-based teacher education program. The one called
"alternative" in their paper was in fact an expansion
of the regular teacher education program, rather than a reduction
in coursework. Graduates of this more extensive curriculum had
students who had stronger performance in earth and physical
sciences, while scores in mathematics were stronger for students
of the regular teacher education program.
Of the remaining studies, two found that student achievement
gains were higher for the students of traditionally prepared
teachers in language arts (Gomez & Grobe, 1990, in a
comparison with alternatively certified teachers) and mathematics
(Hawk, Coble, & Swanson, 1985, in a comparison with
uncertified mathematics teachers). The last (Stafford &
Barrow, 1994) did not present original research
but referenced studies reporting differences associated primarily with
teaching experience between the performance of alternative
program teachers, other first-year teachers, and experienced
teachers.
In combination, these studies do not provide any support for
the statement that uncertified teachers are as effective as
certified teachers. In addition to its other inaccuracies,
Walsh's review confuses alternative certification, a
strategy that provides candidates with preparation that is
differently packaged from what various states deem
"traditional" training (usually the difference is
that training is post-baccalaureate rather than undergraduate and
is streamlined into about a year rather than spread across four
years of college), with lack of certification, which
generally indicates a lack of preparation. Having already missed
this critical distinction, Walsh does not begin to attempt to
sort out the effects of the differences in preparation
experiences and outcomes associated with different models of
teacher education. Thus, she does not note that program designs
that include a comprehensive and coherent program of coursework
and intensive mentoring (e.g. Miller, McKenna, & McKenna,
1998) have been found to produce more positive evaluations of
candidate performance than models that forgo most of this
coursework and supervised support.
For example, a comparative study of more than 200 alternative
certification candidates in New Hampshire, who are certified via
three years of on-the-job training in lieu of formal preparation,
found they were rated by their principals significantly lower
than university-prepared teachers on instructional skills and
instructional planning, and they rated their own preparation
significantly lower than did the university candidates (Jelmberg,
1995). To understand the outcomes of different approaches,
studies of alternatives need to acknowledge the differences in
program models.
Finally, Walsh cites two additional studies that include
uncertified teachers, but she gets the findings wrong. Neither
study shows that uncertified teachers do as well as certified
teachers. One shows that the reverse is true.
- In one study (Goldhaber & Brewer, 2000), the
authors found that high school students who had a certified
teacher in mathematics did significantly better,
after controlling for initial achievement and student demographic
factors, than those who had uncertified teachers. The same
trends were true in science, but the influences were somewhat
smaller. The effects of certification on achievement were larger
than, and in addition to, the effects of a subject
matter degree. In this sample, students of a small number of
science teachers who held emergency or temporary certification
(24 out of the 3,469 teachers in the overall sample) did no worse
than the students of certified teachers, although they, too, did
better than the students of uncertified teachers. Another
analysis of these data (Darling-Hammond, Berry, & Thoreson,
2001) showed that in this sample most of the teachers on
temporary / emergency certificates were experienced and most had
education training comparable to that of the certified teachers.
Most appeared to be already licensed teachers from out-of-state
who were in the transition period to securing a new state license
or experienced teachers teaching out of their main field. Only a
third were new entrants whose characteristics may have suggested
a content background with little education training. The
students of this sub-sample of teachers had lower achievement
gains in an analysis of co-variance that controlled for pre-test
scores, content degrees, and experience than those of the more
experienced and traditionally trained teachers.
- Finally, Walsh cites a recently released study of Teach
for America (TFA) by Raymond et al. (2001). This study is
relevant to Walsh's discussion of the Resident Teacher
Program through which she notes that many TFA recruits enter
teaching in Maryland. However, the study did not compare
certified to uncertified teachers, as Walsh claims. Although
they had the data to do so, the authors chose not to examine how
TFA teachers performed in comparison to trained or certified
teachers. The study examined the influences of TFA teachers on
student achievement scores, using regression methods that
controlled for teacher experience and school demographics; thus,
the comparison was between TFA recruits and other inexperienced
teachers in high-minority schools in Houston, where most
underqualified teachers are placed. Since about 50% of
Houston's new hires are uncertified and about 35% were found to lack
a bachelor's degree in the most recent year of the study, TFA
recruits were compared to an extraordinarily underprepared set of
teachers. In this comparison, students of TFA teachers did
about as well as those of other inexperienced, largely untrained
teachers, many of them without bachelor's degrees. (Reviewers of
this report have noted that the report should have compared TFA
recruits to other BA holders and to prepared or certified
teachers; based on the statistics shown, it is not clear that
the results of these comparisons would be favorable to TFA.) (Note 11)
Another study that compared TFA teachers to certified teachers
found significantly higher scores for the students of certified
teachers (Laczko-Kerr and Berliner, 2002).
The Raymond et
al. report also indicated that minority students in Houston, who
are disproportionately taught by these underprepared teachers,
lose ground academically each year. In addition, only about 50%
of African American and Latino 9th graders in Houston
graduate from high school four years later (Haney, 2000; NCES,
2000). It would be hard to argue that the assignment of so many
underprepared teachers to these students has nothing to do with
their lack of success.
The TFA study found that students of experienced teachers
performed significantly better than students of inexperienced
teachers, including TFA recruits. Along with the report's
finding that, over a three year period, between 60% and 100% of
TFA candidates had left after their second year of teaching, this
finding raises additional questions about Teach for
America's contribution to the education of Houston
students, since they do not stay long enough to gain the
experience that could support student achievement. Earlier data
from the Maryland Department of Education showed that TFA
recruits in Baltimore had similar attrition rates, with 62% gone
by the third year of teaching (Darling-Hammond, 2000b).
These high attrition rates resemble those found in some other
studies of short-term alternative routes (Darling-Hammond, 2000c)
and suggest another important outcome of teacher preparation
policies. Both the Houston study and Walsh's own review
indicate that experienced teachers are more effective than
inexperienced teachers (Walsh, pp. 5-6), yet many short-term
alternative program recruits leave quickly. Other research
indicates that those who complete 5-year teacher education
programs enter and stay in teaching at much higher rates than
4-year teacher education graduates, who stay in teaching at
higher rates than teachers hired through alternatives offering
only short-term summer training before full-time teaching (Andrew
& Schwab, 1995; Darling-Hammond, 2000b). One reason for this
might be the fact that 5-year program graduates typically have
both a disciplinary major and a full year of student teaching
tightly integrated with education coursework.
Student teaching appears to make a strong difference in
teacher retention. In a longitudinal study of recent college
graduates who entered teaching in 1993, a recent NCES report
notes that recruits without student teaching (most common
among untrained recruits or those who enter through shorter-term
alternative routes) leave teaching at rates nearly twice
as high as those who have had this kind of clinical training
(Henke, Chen, & Geis, 2000). The authors noted:
In comparison with new teachers who had less training in
pedagogy, those with more training were less likely to have left
teaching without returning by 1997. Fifteen percent of those who
had student taught had left the profession and not returned by
1997, compared with 29 percent of those who had not student
taught. Whereas 14 percent of certified teachers had left by
1997, 49 percent of those without certification had done so
(p. 49).
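The "nearly twice as high" characterization can be checked directly against the student-teaching figures quoted above. The short sketch below uses only those two percentages; the function and variable names are mine, introduced for illustration, and are not taken from the NCES report.

```python
# Attrition figures quoted from Henke, Chen, & Geis (2000): percent of 1993
# entrants who had left teaching and not returned by 1997.
left_with_student_teaching = 15     # percent
left_without_student_teaching = 29  # percent


def attrition_ratio(without_pct: float, with_pct: float) -> float:
    """Return how many times higher attrition is for the group without training."""
    return without_pct / with_pct


# Illustrative names only; the ratio is simple arithmetic on the quoted figures.
ratio = attrition_ratio(left_without_student_teaching, left_with_student_teaching)
print(f"Attrition without student teaching is {ratio:.2f} times the rate with it")
```

The ratio falls just short of 2, which is consistent with the description of attrition as nearly twice as high for recruits who had not student taught.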
Findings about the high attrition rates of those hired without
full preparation for teaching raise questions about the
cost-effectiveness of a recruitment strategy that relies on
teachers with little preparation who are likely to leave the
profession before they can learn to become effective with
children. Meanwhile, the children they have taught, almost
always the most disadvantaged students in the most
disadvantaged schools, have not had the benefit of a
teacher with either professional knowledge or experience, two
sources of greater teaching skill.
A recent study in Texas showed that teacher attrition costs
school systems at least $8,000 for each recruit who leaves in the
first few years of teaching (Texas Center for Educational
Research, 2000). It estimated that the high attrition of
beginning teachers in Texas, a growing number of whom enter with
little or no preparation and receive few supports in learning to
teach, costs the state more than $200 million per year (p. 16).
This and other studies of teacher attrition suggest that
policymakers should consider both teaching effects and retention
patterns when they think about how to recruit and prepare
teachers.
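To give a rough sense of scale for the two Texas figures, the sketch below simply divides one by the other. Both figures are bounds in the report ("at least" and "more than"), and the report may have used different per-teacher costs in its statewide estimate, so the result is only an order-of-magnitude illustration of my own and not a number taken from the report. The variable names are hypothetical.

```python
# Figures cited from the Texas Center for Educational Research (2000).
cost_per_leaver = 8_000        # dollars; the report says "at least" this much
statewide_cost = 200_000_000   # dollars per year; the report says "more than" this

# Assumption for illustration: every leaver costs exactly the minimum figure.
# The quotient is a back-of-envelope scale check, not a reported statistic.
implied_leavers = statewide_cost / cost_per_leaver
print(f"Roughly {implied_leavers:,.0f} departures per year at the minimum unit cost")
```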
Walsh chooses to ignore other studies showing that certified
teachers do better than uncertified teachers.
- One of these, by Hawk, Coble, & Swanson
(1985), entitled "Certification: It Does
Matter," found, in contradiction to
Walsh's statement cited above, that teachers'
certification in mathematics has a large and statistically
significant effect on student achievement gains in both general
mathematics and, to an even greater extent, in algebra. It compared
the pre- and post-test scores of students whose teachers were
certified in mathematics with those of students whose teachers had
similar levels of experience but were uncertified in
mathematics. This study is dismissed in one part of
Walsh's review as too small (p. 34), so that its findings
can be discounted with respect to certification. However, the
size of the study does not appear to matter to Walsh when she
chooses to cite it as a basis for arguing that only subject
matter makes a difference to teaching effectiveness (p. 65).
This double standard about the use of research permeates the
report. A study is declared inadequate when it finds any
contribution of teacher education or certification to any measure
of teacher effectiveness but a study of comparable size or
methodology (often the same study) is embraced
elsewhere and used to support a different argument.
While the study does have a small sample size (it examined 36
teachers, paired by school, course, and ability level of students
being taught and the 826 students they taught), it is a
reasonably well-controlled matched comparison design. The study
does support the idea that subject matter knowledge matters to
teaching. However, Walsh misrepresents the study as suggesting
that only subject matter knowledge matters. The study did
not directly examine the isolated effects of subject matter
knowledge but the combined effects of subject matter knowledge
and educational knowledge (including methods courses in
the teaching of the content area) that are part of the
certification requirements for an in-field credential. Authors
Hawk, Coble, and Swanson concluded:
The results of this study lend support to maintaining
certification requirements as a mechanism to assure the public of
qualified classroom teachers... (p. 15). (Note 12)
As this and other studies reviewed here suggest, content
knowledge in combination with content pedagogical knowledge (that
is, knowledge about how to teach the content), which, together
with student teaching, constitute the major
components of certification, appear to make contributions to
student learning that exceed the contributions of either
component individually. An important policy point from this and
other studies of certification is the fact that teachers would
not have been guided or encouraged to acquire the content
knowledge and content pedagogical knowledge represented by
in-field certification unless there were certification requirements.
While Walsh and the Fordham Foundation manifesto she
endorses would turn all hiring decisions over to principals, it
was principals in these schools, and in many others across
the country, who hired and assigned out-of-field teachers
to teach mathematics as well as other subjects (Ingersoll, 1998).
In a policy
world that eliminates teacher certification, there would be no
barrier to that practice occurring on an even more widespread
basis.
- Another, much larger study resulted in similar
findings about teacher certification in California. Fetler
(1999) examined the relationship between school scores on the
state's mathematics test and teachers' average
experience levels and certification status in 795 high schools,
after controlling for student poverty rates and test
participation rates. It found that the percent of teachers on
emergency credentials exerted a strong and highly significant
negative influence on student achievement. The author
concluded that, "After factoring out the effects of
poverty, teacher experience and preparation are significantly
related to achievement" (p. 13).
This study is cited but never discussed in Walsh's
revised report. In her original appendix, Walsh applauded the
study's methods but then sought to dismiss its findings
with two inaccurate assertions. First, she suggested,
incorrectly, that the study's results pertained to subject
matter knowledge alone, not to the combination of subject matter
and teaching knowledge represented by certification. She misread
both the study and the requirements of California's
credentialing system to make this claim, appearing to believe
that individuals who have passed only the subject matter
requirement of a content test are granted full credentials in
California (they are not), that individuals who are certified
through internship programs (California's alternative
route) do not have to complete pedagogical requirements (this is false),
and that
individuals are hired on emergency permits solely if they lack
content knowledge (this is also false).
(Note 13) Walsh also suggested, incorrectly, that the study
"may have some basic methodology problems, by reaching
conclusions using aggregated state-wide data." However,
all of the study's data are aggregated to the school level,
not the state level. (See the author's confirmation of
this statement, below.) In the original appendix, (Note 14) Walsh stated:
The article would be only be of interest if someone tried to
assert that a teacher who knows no math could be a good math
teacher. Any attempt to use this study as evidence against the
practice of hiring alternatively trained teachers, as appears to
be Darling-Hammond's implies (sic) and as Wilson et al.
interpret it, loses all of its impact after reading
Fetler.... In fact the author.... is primarily
advocating ensuring that math teachers take more subject matter
coursework, and is clearly disinterested in any effect that may
be had from coursework in "professional
knowledge."
The author, Mark Fetler, took strong issue with this
interpretation of his findings. When I shared Walsh's
statement with Fetler, he wrote in reply:
I am surprised that Kate Walsh makes those statements. I had a
brief telephone conversation with her, but she was not
forthcoming about her intent. Meeting the subject matter
requirement involves both knowing the topic, e.g., Algebra, and
the specific procedures needed to teach it in the classroom.
Someone who knows how to solve quadratic equations, but does not
know how to convey that information to children in a classroom,
is a poor teacher. Both math subject knowledge and math pedagogy
are essential. I believe that my study is consistent with these
statements.... I would be surprised to hear of any research
that demonstrated successful teaching that lacked either of those
elements. My study supports the importance of appropriate
credentials. Supposing that you could find people who know math
to teach, if they lack the ability to communicate effectively
with children, they will not succeed in the classroom and will
create dissatisfied students, parents, colleagues,
administrators, and board members. It will be a mess. Higher
standards, not lower, are the solution.
Fetler also noted that, "the unit of analysis in my
paper is the school. It is not based on statewide aggregated
data."
Two other recent school-level studies in California have found
significant negative relationships between average student scores
on the state examinations and the percentage of teachers on
emergency permits, after controlling for student socioeconomic
status and other school characteristics (Betts, Rueben, &
Dannenberg, 2000; Goe, forthcoming). Like Fetler's study,
these studies also found smaller positive relationships between
student scores and teacher experience levels, with negative
effects on student achievement associated with the proportion of
beginning teachers.
California's experience is a good example of what
happens when pressures and supports for hiring credentialed
teachers are relaxed. After nearly a decade of inadequate and
unequal salaries, easy access to emergency permits and waivers,
and few incentives for the training and equitable distribution of
qualified teachers for high-need fields and locations,
California, now one of the lowest-achieving states in the nation,
found itself with more than 40,000 teachers teaching on emergency
permits or waivers by 1999-2000. The vast majority of these
teachers were teaching in a small number of urban school systems
in schools with the highest proportions of low-income students
and students of color. High-minority schools were nearly seven
times as likely to have uncredentialed teachers as low-minority
schools. Low-achieving schools were nearly five times as likely
to have uncredentialed teachers as high-achieving schools (Note 15) (Shields
et al., 2000, pp. 41-43).
These results mirror those already noted in Baltimore,
Houston, and other cities. The pattern appears across the
country. For example, a recent series in the Chicago "Sun
Times" (Note 16)
documented that "children in the state's
lowest-scoring, highest-minority and highest-poverty schools were
roughly five times more likely to have teachers who had flunked
at least one certification test" and were least likely to
have teachers who were "correctly certified." The
burden should be on those who argue against efforts to ensure
minimally qualified teachers for all students to prove that the
confluence of race, poverty, and low achievement with the
presence of untrained and uncertified teachers does not further
disadvantage our nation's most vulnerable students.
Evidence about Preservice Teacher Education
For the proposition that "there is little evidence that
the content and skills taught in preservice education coursework
is (sic) either retained or effective" (p. 7), Walsh cites
two articles (Murnane, 1983; Veenman, 1984) from among the many
dozens of studies of teacher education that could have been
retrieved from the peer-reviewed literature, had she done a
search. Both of these are very old pieces, published long
before recent reforms in teacher education. Neither of them
makes any statement in support of Walsh's claim.
- Veenman (1984) describes the problems most frequently
cited by novice teachers. These included concerns about
topics ranging from classroom management to teaching loads and
class sizes. Nowhere in the article does he suggest that what
teachers learned in preservice education was not retained or
effective. In fact, he notes that researchers should look more to
the conditions of schooling than to teacher education for
explanations for many of the problems beginning teachers cite.
Veenman notes that the outcomes of teacher education may vary by
characteristics of programs, citing studies finding that those
who had had more intense student teaching, more
competency-oriented teacher education coursework, or who were
more satisfied with their teacher education experiences reported
fewer problems in the classroom.
- Murnane's (1983) article is not an empirical
study but a brief commentary on the work of another author who
proposed the development of doctoral degrees for teacher
leaders. While he questions the value of doctoral education for
developing pedagogical skills (as would I), Murnane is careful to
point out that there are forms of teacher education that may be
helpful, and that lack of evidence in large data sets about the
effects of preservice education may be related to the lack of
data collected on the topic at that time, nearly 20 years ago.
(See additional discussion of this point under "Evidence
about Verbal Ability" below.)
- Walsh ignores the findings of other studies on this topic,
including some she has cited for other propositions. She
criticizes Evertson, Hawley, and Zlotnik (1985) for their
interpretation of the findings of Edward Begle (1979),
"a respected mathematician" regarding his findings
about teachers' subject matter preparation (p. 34). In one
of the few early data sets providing evidence about teacher
preparation (a mammoth study of 112,000 students conducted
through the National Longitudinal Study of Mathematical Abilities), Begle
(reported in Begle & Geeslin, 1972 and, with
additional data, in Begle, 1979) found that measures of teacher
subject matter knowledge did not exert strong influences on
student achievement. He also found that coursework in
mathematics methods had a stronger effect on student achievement
than higher-level coursework in the subject matter (discussed in
Begle, 1979). On the lack of influence of subject matter
knowledge in his earlier study (Begle & Geeslin, 1972) Begle
noted, and Walsh reports, that the teachers in the study may have
had stronger content knowledge than the norm, since they had all
been accepted to a National Science Foundation Summer
Institute. This is an appropriate point.
However, Walsh chooses to ignore Begle's findings about
the value of education coursework. She does not explain why.
Walsh cites Begle's work at several points in her text, and
refers readers to her appendix for a review of his work that is
no longer there. In her separately published appendix, Walsh
admits of Begle (1979) that, "this is a scholarly work,
employing defensible analyses at the time it was written for
examining the data." She then nonetheless sought to
dismiss it with a vague statement about possible aggregation bias
(although achievement data were aggregated only to the classroom
level), "too many variables" in the data set, and
"much greater variance in the number of subject matter
courses teachers took than the number of methodology courses they
took." This last complaint is particularly odd. The
implication of greater variability in subject matter courses
contradicts the point she makes above about the possibly high
levels of subject matter knowledge among sample members (in re:
Begle & Geeslin, 1972). In fact, wider variability would generally
make it easier to find effects, if they are there to be found,
rather than harder. In another instance (regarding Byrne, 1983),
Walsh notes, correctly, that the limited variability in subject
matter coursework levels may have made effects more difficult to
find. Walsh seems confused about the research findings and
their implications but clear about her goal of discrediting any
results that support the value of teachers learning about how to
teach their content to others.
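The statistical point about variability can be illustrated with a small simulation. The sketch below is not drawn from Begle's data or from any study discussed here: the slope, sample size, and cutoff are hypothetical values chosen only to show that, when a real relationship exists, restricting the range of the predictor shrinks the observed correlation, so wider variability makes an effect easier, not harder, to detect.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid version-specific helpers."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


rng = random.Random(0)   # fixed seed so the sketch is repeatable
TRUE_SLOPE = 0.3         # hypothetical effect of coursework on gains

# Simulated (not real) standardized coursework levels and student gains.
courses = [rng.gauss(0, 1) for _ in range(20_000)]
gains = [TRUE_SLOPE * c + rng.gauss(0, 1) for c in courses]

# Correlation when teachers vary widely in coursework.
r_full = pearson(courses, gains)

# Correlation when only teachers with similar coursework levels are sampled.
narrow = [(c, g) for c, g in zip(courses, gains) if abs(c) < 0.5]
r_narrow = pearson([c for c, _ in narrow], [g for _, g in narrow])

print(f"wide range: r = {r_full:.3f}; restricted range: r = {r_narrow:.3f}")
```

The underlying relationship is identical in both samples; only the spread of the predictor differs. This is why limited variability (as in the Byrne, 1983, case) can obscure an effect, while greater variability cannot be a reason to discount one.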
- Monk (1994) offers similar findings on this question
from a more recent data set that incorporates more fine-grained
variables about teacher education. Using data on 2,829 students
from the Longitudinal Study of American Youth, Monk (1994) found
that teachers' content preparation, as measured by
coursework in the subject field, is positively related to student
achievement in mathematics and science, but he notes that the
relationship is curvilinear, with diminishing returns to student
achievement of teachers' subject matter courses above a
threshold level (e.g., five courses in mathematics). In
addition, teacher education coursework (e.g. methods courses in
the content area) had a positive effect on student learning in
mathematics, exhibiting "more powerful effects than
additional preparation in the content area" (p. 142). Monk
concluded that "a good grasp of one's subject area is
a necessary but not a sufficient condition for effective
teaching" (p. 142).
Monk told me that when Walsh first shared her brief appendix
review of his work with him, he was surprised that she had used
his work to emphasize the importance of subject matter knowledge
without acknowledging his findings on the value of education
courses. He noted in an email to me that he had communicated to
Walsh that:
My study of relationships between teacher course taking
experiences and subsequent student gains in performance showed
that the number of both content courses and content-specific
pedagogy courses in a teacher's background is positively
related to pupil test score gains in the relevant content area.
It is misleading to report the positive results for the content
courses and to not acknowledge the positive results for the
pedagogy courses.
After Monk communicated with Walsh, she did acknowledge in
her appendix that Monk's study provides support for the
contention that education coursework has a positive effect on
teaching performance; however, she did not incorporate this
admission in her claims that "not one" of the studies
ever cited on this topic provides such support.
- In addition to newer databases that allow some large-scale
examinations of the influences of teacher education variables on
student achievement, recent studies have begun to look at the
outcomes of different teacher education program designs. For
example, studies of 5-year teacher education
programs (programs that include a bachelor's degree in
the discipline plus an additional year of education study and
extended student teaching) have found graduates to be more
confident and better rated than graduates of 4-year programs in
the same institutions and as effective as more senior teachers,
as well as more likely to enter and remain in teaching (Andrew
& Schwab, 1995; Denton & Peters, 1988). Walsh does
not review or cite any of these studies, even those that were
available for her information from previous research she claims
to have scrutinized.
The Influence of Verbal Ability on Teacher
Effectiveness
There is little disagreement about the fact that verbal
ability and subject matter knowledge influence teacher
effectiveness, although Walsh tries to set up a straw man by
suggesting, inaccurately, that some researchers, including
myself, have argued otherwise. (See the section on
"Misrepresentations of Research" below.) There are
two areas of real disagreement, however. One is whether verbal
ability alone is the only or best measure of teacher
effectiveness. The other is how to evaluate the size of
relative contributions of various kinds of knowledge to teacher
effectiveness.
As examples cited earlier illustrate, the literature on
teacher characteristics and their effects on teacher performance
has been a captive of the measures most likely to be available in
large data sets at any moment in time. While there are many
studies evaluating the influences of teachers' standardized
test scores, especially measures of verbal or general academic
ability, because these variables have been readily available in
large-scale data sets since the 1960s, data on teachers'
course-taking backgrounds or teacher education experiences have
been included in large data sets only since the early 1990s.
Thus, there are more studies finding influences of variables that
have most often been measured.
Finally, most of the studies that have included measures of
verbal ability or content knowledge have not included measures of
teacher education or certification. In a recent review, Wayne
and Youngs (in press) found five studies that observed
relationships between measures of teachers' verbal or
general academic ability and student achievement and that met the
standard of having controlled for students' socioeconomic
status and prior achievement. Four of these studies employed
data sets from the 1960s and 1970s and none of the five included
measures of teacher education or certification. Looking across
studies in these different eras, in many cases, the relative
effect sizes of verbal ability measures are no larger than those
of teacher education and certification measures in the studies
that use these instead.
- Walsh uses an article by Murnane (1983) written
nearly 20 years ago to argue for the primacy of verbal ability as
a correlate of teacher effectiveness. She states, illogically,
that, "to concede this relationship would mean
acknowledging that formal teacher preparation is not as critical
to student achievement as some would advocate" (p. 41).
However, Murnane pointed out in his article that evidence about
the influence of verbal ability was partly a function of the fact
that teachers' standardized test scores were one of the few variables about
teachers available in large-scale databases at that time, which
did not include good measures of teacher education. In discussing
the results on verbal ability, he diverges from Walsh's
interpretation, stating:
Clearly one should not interpret these results as indicating
that intellectual ability should be the sole criterion used in
recruiting teachers or that formal teacher training cannot make a
difference. In fact, the lack of evidence supporting formal
preservice training as a source of competence may be to some
extent a result of limitations in the available data. For
example, all databases suitable for examining the correlates of
teaching effectiveness as measured by student achievement gains
pertain to a single school district. Since there is less
variation in training among teachers within a district than among
teachers in the country at large, these databases do not permit
the most powerful possible tests of the efficacy of alternative
teacher training programs (p. 565).
- Walsh tries to use another article, by Greenwald,
Hedges, and Laine (1996), as evidence that verbal ability is
the only critical variable influencing teacher effectiveness, and
misrepresents a communication she had with Larry Hedges, one of
the study's authors, regarding the appropriate
interpretation of his findings. Characterizing Greenwald,
Hedges, and Laine's article as "a sound review of 60
studies," she then criticizes a direct reference to its
findings in a report by the National Commission on Teaching and
America's Future (Walsh, p. 17). Her criticism first
alludes, incorrectly, to a chart in the Commission's report
(which in fact referred to another study, (Note 17)) then she criticizes the interpretation
of the chart. The correct chart in the Commission's report
(Figure 5, entitled "Effects of Educational
Investments" in Darling-Hammond, 1997, p. 9) was reproduced
directly from Greenwald, Hedges, and Laine's table 7,
column 1 (p. 379) with the same variable labels and statistics as
presented in the original source. It describes the size of
increase in student achievement for every $500 spent on several
different kinds of investments. Here is a reproduction of the
table from Greenwald et al.'s study:
Table 7. The effect of $500 (a) per student on achievement (b)

| Input Variable | Sample: Full Analysis | Sample: Publication bias robustness |
| Per pupil expenditure | 0.15 | 0.15 |
| Teacher education | 0.22 | 0.20 |
| Teacher experience | 0.18 | 0.17 |
| Teacher salary | 0.16 | 0.08 |
| Teacher/pupil ratio | 0.04 | 0.04 |

(a) 1993-94 dollars
(b) All achievement outcomes are in standard deviation units.
In explaining the table, the study's authors noted that
The magnitudes (of the effects) for teacher education and
teacher experience are higher than, but of the same magnitude, as
PPE (per pupil expenditures). That is, one would expect
comparable and substantial increases in achievement if resources
were targeted to selecting (or retaining) more educated or more
experienced teachers. (p. 380)
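The comparison the authors describe can be made concrete with a little arithmetic. The sketch below is only an illustration, not part of Greenwald, Hedges, and Laine's analysis: it copies the "Full Analysis" column of Table 7 and expresses each effect as a multiple of the teacher/pupil ratio effect. The dictionary name `effects_per_500` and the helper `ratio_to_baseline` are hypothetical names introduced here, and the ratios are a descriptive convenience rather than a statistic the authors reported.

```python
# Effect sizes in standard deviation units of achievement per $500 per
# student (1993-94 dollars), copied from the "Full Analysis" column of
# Table 7. The variable and function names are illustrative assumptions.
effects_per_500 = {
    "Per pupil expenditure": 0.15,
    "Teacher education": 0.22,
    "Teacher experience": 0.18,
    "Teacher salary": 0.16,
    "Teacher/pupil ratio": 0.04,
}


def ratio_to_baseline(effects, baseline):
    """Return each effect divided by the effect of the baseline input."""
    base = effects[baseline]
    return {name: value / base for name, value in effects.items()}


# Rank the inputs from largest to smallest effect per $500.
ranked = sorted(effects_per_500.items(), key=lambda item: item[1], reverse=True)
ratios = ratio_to_baseline(effects_per_500, "Teacher/pupil ratio")

for name, value in ranked:
    print(f"{name:<24} {value:.2f} SD  ({ratios[name]:.1f}x teacher/pupil ratio)")
```

On these figures, teacher education ranks first, at roughly five and a half times the teacher/pupil ratio effect per $500. The ratios compare point estimates only and say nothing about the uncertainty around them.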
The Commission used this finding, as Greenwald, Hedges, and
Laine had done, as an indicator that investments in teacher
education showed stronger influences on pupil achievement gains
than investments in other resources, like reduced teacher/pupil
ratios. We noted in discussing their overall study that the authors had
found evidence of the influences of teacher ability and
experience, along with teacher education. However, Walsh
criticizes the Commission's two-sentence characterization
of the research (which she calls a discussion "in
considerable detail") for failing to note that Greenwald,
Hedges, and Laine found more studies supporting the influences of
teacher verbal ability on achievement than of what they labeled
"teacher education" (measured in their study as
masters degrees because this was the most widely used measure in
large data sets). She suggests that Hedges disagrees with the
Commission's characterization, a view that Hedges clarified
was inaccurate when I spoke to him. He indicated that Walsh had
not revealed her interpretation of his findings when she
contacted him, and wrote the following to explain his own view of
the proper interpretation of his findings:
It is true that the relationship between teacher verbal
ability and student achievement is relatively large and
consistent across the few studies that have examined it. However
this does not imply that investing in teacher ability (among
possibly poorly qualified teachers) is a cost effective way to
enhance student achievement. There are two reasons. First,
teacher ability (among qualified teachers) may be more expensive
than other resources that could be purchased to improve
achievement. That is, there could be a strong relationship but
high cost. Second, and more important, the relations found in
the studies Greenwald, Hedges, and Laine (1996) reviewed were
studies of practicing teachers. There is no reason to expect
that the same relation holds among those who are not part of the
teaching workforce.
The point here, similar to that made by Murnane
(above), is not that verbal ability is not important, but that
the evidence does not prove it is the only important contributor
or the most efficient way to achieve teacher effectiveness. In
fact, most current certification systems
combine tests of basic skills and general academic ability,
subject matter, and teaching knowledge with evidence of
successful supervised clinical experience and coursework focused
on teaching knowledge and skills to help candidates
assemble many sources of expertise in a
more coherent way than would otherwise be the case.
In pursuit of her argument that only verbal
ability makes a difference, Walsh seeks to discount other studies
that have found strong influences of teacher certification test
scores on teacher effectiveness as being relevant only to the
measurement of verbal ability and irrelevant to the broader
question of teacher certification. These studies are also
misrepresented.
- In her discussion of Schalock (1979) in the appendix
(B13), Walsh seeks to dismiss his review's findings about
the limited evidence regarding the relationships between
teachers' measured intelligence and other indicators of
effectiveness because the review is "old, old!!" and
because, she argues,
"More recent research such as Summers and Wolfe, 1977;
Ferguson, 1991; Ferguson & Womack, 1996 (sic);
Murnane, 1983; Hanushek, 1971; Strauss and Sawyer, 1986 suggest
that intelligence (measured by SAT, verbal ability tests and
college selectivity) are indeed substantially important."
Aside from the facts that two of these "more
recent" studies pre-date the review she dismisses as
"old, old!" and one (Murnane, 1983) is not a study at
all, Walsh here cites two studies that she dismisses elsewhere
for "aggregation bias" (Ferguson, 1991 and Strauss
& Sawyer, 1986, see Walsh, p. 27) and another (Ferguson &
Womack, 1993) that she dismisses without stating a reason (see
discussion of Wilson et al., in Appendix B). (Note 18) Walsh's readers are
referred to
Appendix B for reviews of these issues, but the studies are not
included there.
- Walsh cites Ferguson (1991) for a number of
her propositions, including the fact that teacher quality matters
(p. 5), that teacher race does not matter (p. 6), and that verbal
ability matters (p. 6). Later, when she wants to dismiss the study
for its findings about teacher education and certification, she
claims that the study suffers from
aggregation bias, a concern I address in the next section on
methodological issues. Ferguson's
analysis of nearly 900 Texas school districts controlled for
student background and district characteristics; he found that combined
measures of teachers' expertise (scores on a state teacher
licensing examination, master's degrees, and
experience) accounted for more of the inter-district
variation in students' reading and mathematics achievement (and
achievement gains) in grades 1 through 11 than student
socioeconomic status. An additional, smaller contribution to
student achievement was made by lower pupil-teacher ratios and
smaller schools in the elementary grades. The effects were so
strong, and the variations in teacher expertise so great, that
after controlling for socioeconomic status, the large disparities
in achievement between black and white students were almost
entirely accounted for by differences in the qualifications of
their teachers.
As I noted in an earlier review of this study
(Darling-Hammond, 2000c), of the teacher qualifications variables,
the strongest relationship was found for scores on the TECAT, a
state licensing examination described by the test developer as a
test that measures basic skills and professional knowledge. The
Texas Education Agency's published outline of the test
content shows that it seeks to measure verbal ability, logical
thinking, research skills, and a set of items on professional
knowledge. Walsh takes issue with this description of the test
and argues that the study does not support the value of teacher
certification because the test should be considered primarily a
basic literacy test. In Walsh's view, this makes it
irrelevant to the question of teacher certification, even
though it is required for teachers to maintain their
certification. She also argues that the relatively smaller
influence of master's degrees in Ferguson's study
(which accounted for about 5% of the explained variance) means
that teacher education is unimportant, and she criticizes the
fact that I discuss the three variables associated with teacher
quality (TECAT scores, experience, and masters degrees) in
combination, although this is also the way in which Ferguson
discusses them at several points in his analysis.
Walsh's arguments are illogical in several ways. First,
while it is true the TECAT measures basic skills, it also
measures other academic abilities and professional knowledge, as
confirmed by the test maker's documentation and the
administering agency's descriptions. There is no basis for
making judgments contrary to the claims of the developers. In
addition, the test would not exist at all if there were not a
state certification system requiring it. Like all of the other
variables one can evaluate in studies of this kind, the test
scores are a rough proxy for many aspects of teacher capacity
that may matter for their performance. In a regression equation
of this sort where one variable stands in for others for which
data are not available, it undoubtedly captures the effects of
other unmeasured factors. Even if it were true that the test was
a weak measure of professional knowledge, this would not mean
that professional knowledge is unimportant or that verbal ability
is the only important variable for predicting teaching ability.
Only a better measure of professional knowledge (coursework or a
more in-depth test of teaching knowledge) would allow a test of
this question. Finally, as Hedges notes above, since the
Ferguson study was based on practicing teachers, its findings do
not shed light on the relative effectiveness of non-teachers who
might score differently on the tests.
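The point about proxy variables can be illustrated with a small simulation. The sketch below uses entirely synthetic data and hypothetical parameters (the correlation `rho` and the coefficients `b_verbal` and `b_prof` are assumptions chosen for illustration, not estimates from Ferguson's or any other study). It generates two correlated teacher traits that both contribute to student gains, then regresses gains on only one of them, as happens when the second trait is unmeasured.

```python
import random


def simulate(n=20000, rho=0.7, b_verbal=0.3, b_prof=0.3, seed=0):
    """Generate synthetic teacher traits and student gains.

    verbal and prof are standard normal with correlation rho; gain depends
    on both traits plus noise. All parameter values are illustrative.
    """
    rng = random.Random(seed)
    verbal, prof, gain = [], [], []
    for _ in range(n):
        v = rng.gauss(0, 1)
        p = rho * v + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
        verbal.append(v)
        prof.append(p)
        gain.append(b_verbal * v + b_prof * p + rng.gauss(0, 1))
    return verbal, prof, gain


def ols_slope(x, y):
    """Slope of the least-squares line of y on a single predictor x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


verbal, prof, gain = simulate()
slope_verbal_only = ols_slope(verbal, gain)
print(f"true verbal coefficient: 0.30, estimated with prof omitted: {slope_verbal_only:.2f}")
```

With the second trait omitted, the estimated slope for the measured trait is expected to approach `b_verbal + rho * b_prof` rather than `b_verbal` alone, so the measured variable absorbs part of the effect of the unmeasured one. This is the sense in which a test score in such a regression cannot be read as a clean measure of a single construct.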
Masters degrees and experience are other very partial measures
of teacher knowledge and skill that show a modest effect in this
study and a larger effect in Ferguson and Ladd's (1996)
similar study in Alabama that included a weaker test measure of
pre-college general skills (the ACT), which is not designed to capture
knowledge relevant to teaching. However, masters degrees
are also a very crude proxy for teacher education, given the wide
variability in the content of masters degrees pursued by
teachers, many of which have been pointed at jobs outside of
teaching, such as administration, counseling, measurement and
evaluation. In fact, aside from MAT preparation
programs in a small number of institutions and specialist
programs for reading and special education, there were few
masters degree programs for the study of teaching until the
recent advent of 5-year teacher education programs and masters
degrees developed around the National Board for Professional
Teaching Standards that focus on content pedagogy. Thus, there
is reason to expect that some masters degree studies would affect
teaching ability, but not much reason to expect the effect of
masters degrees as an undifferentiated variable to be uniform or
large in the aggregate, a point I have made in earlier commentary
(Darling-Hammond, 2000a). Goldhaber and Brewer (1998, 2000) have
made the same point and have completed research that documents
the greater influence of both bachelors and masters degrees in
the content area taught (e.g. mathematics or mathematics
education) as compared to undifferentiated degrees.
It makes more sense to consider these variables together as
proxies for expertise than to treat them as mythically precise
measures of totally unrelated constructs. As I have argued
elsewhere, research on teaching suggests a view of expertise that
includes general knowledge and ability, verbal ability, and
subject matter knowledge as foundations; abilities to plan,
organize, and implement complex tasks as additional factors;
knowledge of teaching, learning, and children as critical for
translating ideas into useful learning experiences; and
experience as a basis for aggregating and applying knowledge in
nonroutine situations (Darling-Hammond, 2000a). David
Berliner's studies of expertise in teaching, for example,
include experience along with several other traits as a critical
aspect of expertise (see e.g. Berliner, 1986). All of these
factors combine to make teachers effective; furthermore, one
cannot fully partial out the effects of one factor as opposed to
another as many are highly correlated.
- Walsh also cites Strauss and Sawyer (1986)
for her proposition that verbal ability matters (p. 6), but fails
to report the study's actual findings and seems
unconcerned that it might suffer from "aggregation bias."
In a study of 145
school districts in North Carolina, these researchers found that
teachers' average scores on the National Teacher
Examinations (NTE) had a strong influence on average school
district test performance. Although the authors did not specify
which portion(s) of the NTE were used as measures, the Weighted
Common Examinations Test (WCET) was required in North Carolina at
that time. The WCET included separate subtests measuring general
knowledge and professional knowledge about teaching. Walsh
apparently wants to count this as a test of verbal ability, but
does not acknowledge the Professional Knowledge Examination
portion of the test.
The authors found that, taking into account per-capita income,
student race, district capital assets, student plans to attend
college, and pupil/teacher ratios, teachers' certification
test scores had a strikingly large effect on students'
failure rates on the state competency examinations: a 1% increase
in teacher quality (as measured by NTE scores) was associated
with a 3 to 5% decline in the percentage of students failing the
exam. The authors' conclusion is similar to
Ferguson's (1991):
Of the inputs which are potentially policy-controllable
(teacher quality, teacher numbers via the pupil-teacher ratio and
capital stock), our analysis indicates quite clearly that
improving the quality of teachers in the classroom will do more
for students who are most educationally at risk, those prone to
fail, than reducing the class size or improving the capital stock
by any reasonable margin which would be available to policy
makers (p. 47).
The same illogic applies to Walsh's dismissal of this study as to
her dismissal of the previous one.
In addition to questions about the content of tests used in
various studies, the measures that appear in large data sets are
always relatively crude proxies for the constructs under study,
so it is impossible to know with great precision exactly what
trait is being represented when a variable shows an effect. For
example, scores on tests of academic ability like the SAT have
generally been strongly correlated with scores on ETS subject
matter and professional knowledge tests (Gitomer, Latham, and
Zimek, 1999); in eras when higher degrees were less common (e.g.
pre-1980), verbal ability scores were also strongly correlated
with masters degrees. Where certification tests are in place,
test scores correlate with certification status. And both
certification status and masters degrees typically correlate with
teacher experience, since most states require teachers to obtain
certification in order to remain in the workforce and most
teachers have traditionally secured masters degrees by taking
courses over time while teaching. (This is changing to some
extent where beginning teachers are being trained in
post-baccalaureate or 5-year programs and sometimes enter the
workforce with a masters degree).
These interrelationships do not invalidate studies that have
used one or more of these variables, but they are one reason why
it is difficult to say with certainty which of these measures (or
other unmeasured variables that are related to them) are associated
with measured effects. The
correlational studies that Walsh relies on almost exclusively do
not establish causation; they point to possible relationships for
further, more fine-grained exploration. However, Walsh often
dismisses other large studies and the more fine-grained
studies from consideration, at least when the findings do not
suit her predilections.
- Walsh also cites Ferguson & Womack (1993) for
her proposition that verbal ability matters most, although the
reason for this is unclear. This study of more than 250
candidates from a single teacher education program examined the
influences on 13 dimensions of teaching performance of education
and subject matter coursework, NTE subject matter test scores,
and GPA in the student's major. The ratings of performance
were based on detailed descriptors of teaching on 107 items
evaluated by subject matter specialists and education
supervisors. The authors found that the amount of education
coursework completed by teachers explained more than four times
as much of the variance in teacher performance as did measures of content
knowledge (NTE specialty scores and GPA in the major). It is
possible that Walsh cites this study as support for verbal
ability influences because she has confused the NTE specialty
tests of subject matter knowledge with other components of the
NTE battery measuring general academic ability. In any event, the
strength of the relationship was very small. Given her
willingness to cite the study for a very weak finding about
verbal ability, it is interesting that she does not cite it for
its much stronger finding that education coursework mattered for
teaching performance.
In her separately-published appendix, Walsh seeks
to dismiss the Ferguson & Womack study because it is limited
to a single institution (Note 19) and uses "supervisor's evaluations" as
the measure of performance. As noted earlier, she is willing to
use studies based on such measures for her own claims, despite
her assertions that they should not be included. More important,
in this study the ratings are not the global ratings from school
principals that have often been found to be relatively low in
reliability. They are lower-inference ratings based on a
detailed protocol used by subject matter specialists and
university supervisors, which are typically more reliable. In
addition, the limitations on generalizability created by the use
of a single institution are not fatal to consideration of the
findings. They require that the study be considered in the
context of other studies on similar questions using different
samples. Such studies have been conducted.
- In a similar study which compared relative influences of
different kinds of knowledge on 12 dimensions of teacher
performance for more than 270 teachers, Guyton and Farokhi
(1987) found consistent strong, positive relationships
between teacher education coursework performance and teacher
performance in the classroom as measured through a standardized
observation instrument (the Georgia Teacher Performance
Assessment Instrument), while relationships between classroom
performance and subject matter test scores were positive but
insignificant and relationships between classroom performance and
basic academic skill scores were almost nonexistent. (The two
measures of basic academic skills were the Georgia Regents'
test, a required examination for public university students, for
which the researchers used reading and essay scores, and the
state's Teacher Competency Test.)
The researchers noted that extensive reliability studies had
been conducted to support the reliability of the TPAI performance
measure, which was used statewide as an assessment for
certification. Walsh eliminates this study from consideration
because it is a single institution study and refers the reader to
Appendix B for her review (p. 25). In her appendix, Walsh
criticizes the study for its reliance on supervisors'
ratings, again failing to distinguish the research on
principals' general teacher evaluation ratings from the
research on the reliability of the TPAI as an observational
instrument. She also apparently failed to read the study
carefully, questioning why the numbers of teachers differ for
various comparisons, not having noted the authors'
explanation that all correlations depended upon the number of
teachers for whom data on both variables were available (p. B11).
Whereas Walsh tries to paint an unambiguous picture about the
value of such measures as verbal ability (suggesting, for
example, that these scores be reported statewide as a primary
measure of accountability) and the lack of value of teacher
education, the real picture is decidedly more complex. Her
evidence for her claims confuses measures of verbal ability with
measures of professional knowledge and subject matter knowledge,
and often includes studies that actually show influences of these
other kinds of knowledge that are at least as strong as measures
of verbal ability. The world is just not as simple as Walsh
would like to make it appear. Even strong advocates of the
notion that academic ability matters are not willing to make the
kinds of over-assertions Walsh urges. For example, Hanushek
(1992), whom Walsh cites repeatedly for her defense of verbal
ability as a key measure, concludes:
The closest thing to a consistent finding among the studies is
that "smarter" teachers who perform well on verbal
ability tests do better in the classroom. Even for that the
evidence is not very strong (p. 116).
While it would be ridiculous to argue that verbal ability and
subject matter knowledge do not matter for teaching, it is
equally ridiculous to argue that knowledge of teaching and
learning and the opportunity to learn to teach under the close
supervision of a master teacher through student teaching and
other guided experiences do not matter at all. The literature
just does not support this reading or the policy implications
that Walsh would draw.
The Academic Ability of Teachers who Lack Certification
Another argument made by those who would eliminate
certification is that an unconstrained market would allow the
recruitment of individuals with higher verbal or general academic
ability who do not now enter teaching. While it is probable
that some individuals would choose to teach if they did not have
to prepare, it is not clear that most of these entrants would be
more academically able, that they would be better teachers, or
that they would stay long in teaching. It is also unlikely that,
given current wages, individuals who are now preparing for much
higher-paying careers in medicine, the law, engineering, and
other professions that require much more onerous preparation and
licensing processes would choose teaching as a career simply
because they did not have to be certified.
Labor market contexts are relevant to this question. The
qualifications of individuals preparing for teaching improved
noticeably between the early 1980s and the early 1990s in terms
of both academic attainment and ability measures, in part because
of the changes in admissions requirements to teacher education
adopted by states and universities but also likely because of the
substantial increases in real wages for teachers that occurred
during the 1980s. Whereas prospective teachers were
disproportionately drawn from the bottom quartile of college
students in the early 1980s (Lanier & Little, 1986), both
grades and test scores improved for teacher candidates by the
1990s.
The Recent College Graduates Survey, which tracks college
graduates into the labor market, found that the grade point
averages of newly qualified teachers in 1990 were higher than
those of the average college graduate, with 51% earning a GPA of
3.25 or better as compared to 40% of all graduates (Grey et al.,
1993). However, average GPAs were significantly
lower for the 15% of college graduates entering
teaching who were neither certified nor eligible for
certification. Most of the uncertified entrants (57%) had grade
point averages below 3.25, and 20% had GPAs below 2.25.
Attrition was also high for the untrained candidates. By the
time of the survey (one year later), only one-third of the
uncertified entrants were still engaged in teaching as their
primary jobs (Grey et al., 1993).
In addition, the Educational Testing Service found that among
270,000 test-takers in 1995 through 1997, college admissions test
scores were highly correlated with initial teacher licensing
scores (Praxis I and Praxis II), and the lowest average scores on
both kinds of tests were those held by individuals who entered
teaching without preparation (Gitomer, Latham, and Zimek, 1999).
(Walsh describes this 14% of the
sample as an "error" in the study since the
individuals had not enrolled in a teacher education program; she
misunderstands the fact that these Praxis test-takers were the
entrants to teaching who used emergency or alternative routes.) (Note 20)
Prepared teachers scored much higher than unprepared
teachers.
While students who prepare to enter fields other than
teaching have higher average test scores on measures like the SAT
than do those preparing to enter elementary school teaching,
there is no significant difference for prospective secondary
teachers, most of whom earn a disciplinary degree along with
their teaching certificate.
The narrowing of this gap between prospective teachers and others
is likely a function of the more rigorous admissions requirements
for teacher education enacted in most states and the growth in
wages between the early 1980s and the mid-1990s.
Finally, the study found that graduates of NCATE-accredited
colleges of education passed the Praxis subject matter tests for
teacher licensing at a significantly higher rate than did
graduates of unaccredited programs, boosting their chances of
passing the examination by nearly 10 percent (Gitomer, Latham,
and Zimek, 1999). Walsh suggests that this higher Praxis pass
rate might simply reflect the fact that NCATE schools could be
located in states with low cutoff scores. However, additional
analyses of the data by ETS and another independent study (Note 21) indicate that this is
not the case. A more likely explanation is that NCATE's
requirements that colleges demonstrate how they screen applicants
for general ability and that they ensure strong content
backgrounds translate into somewhat greater attention to these
matters in institutions that are accredited. These data suggest
that standards may increase the general as well as specialized
qualifications of prospective teachers. They do not suggest that
removal of certification requirements brings higher ability
individuals into teaching or keeps them there.
It is important to recognize that labor market
incentives operate among individuals actually entering teaching.
For example, several studies of alternative certification
programs found that the academic records of recruits varied
substantially by teaching field, with alternatively-certified
candidates in high demand shortage fields, such as mathematics
and science, having much poorer academic records than candidates
in other fields and than candidates from traditional teacher
education programs in those same fields (see Natriello & Zumwalt, 1992,
re: New Jersey; Lutz and Hutton, 1989, re: Dallas; Stoddart, 1992,
re: Los Angeles). It is unlikely that eliminating
requirements for training would increase the career attractions
to teaching for academically able candidates as much as increased
wages would. Meanwhile, eliminating training requirements could
result in a less well-qualified teaching force, especially if the
elimination of certification standards not only reduced the
knowledge of entrants but also reduced pressures for competitive
wages.
The Private School Argument
Finally, a claim sometimes made by opponents of teacher
certification, including Walsh, is that private schools are more
effective than public schools, and that this is because of (or
at least is not impeded by) the fact that private school
teachers are not certified. There are two major problems with
the private school "proof": First, there are
conflicting findings about the relative effectiveness of public
and private schools, with credible evidence on both sides of the
question. Second, most private school teachers are certified and
an even larger majority have specific preparation for teaching,
even when they have not sought certification.
On the effectiveness of private schools, Walsh cites Coleman,
Hoffer, & Kilgore (1982), who examined data from the first
wave of High School and Beyond surveys, conducted in 1980, and
found evidence of higher performance for comparable students in
Catholic and other private schools as compared to public
schools. The researchers attributed their findings primarily to
differences in student behavior across school sectors, measured
by variables like lower rates of absenteeism, cutting class, and
fighting, along with factors like more time spent on homework and
higher individual student attendance. They also found that
achievement was actually higher for comparable
students who were in public schools that had these
characteristics. Subsequent studies have produced findings that
favor both public and private schools after controlling for
student characteristics and school organization (Bryk & Lee,
1992; Lee & Bryk, 1988; Lee, Dedrick, & Smith, 1991).
Most studies have pointed to variables like school and class
size, school organization, and curriculum differentiation as
critical variables in determining both public and private school
effectiveness. When these factors are controlled, public school
students often do as well as or better than private school students
in schools with similar features.
Furthermore, differences in the preparation of
public and private school personnel are not as large as many
people assume. More than 30 states certify private school
personnel (Feistritzer, 1984), and, when Coleman did his
analysis, more than 85% of private and parochial school teachers
were certified, as compared to about 95% of public school
teachers (NCES, 1985). This has changed only slightly in the
years since. Although certification is not required for private
school teachers in all states, only 34% of private school
teachers in 1993-94 (the most recent year for which national data
are available) were not certified in their primary assignment
field. Some of these teachers were certified in fields other
than their primary assignment field. Many undertook teacher
preparation, even though they did not apply for or maintain a state
license or certificate. In 1993-94, public and private school
teachers were almost equally likely to have received an
undergraduate degree in education (68.9% for public vs. 61.5% for
private elementary teachers and 19.8% for public vs. 19.3% for
private secondary teachers) (NCES, 1997, p. 25). The education
degree as an indicator of preparation is quite partial, since the
education degree has waned as certification increasingly requires
a content degree with an education minor or credential. The
percentage of 1992-93 bachelor's degree recipients who had
taken education courses was 87.1% for public school teachers and
71.6% for private school teachers, (Note 22) and the average number of education
credits earned was 37.4 for public school teachers as compared to
35.2 for private school teachers (NCES, 1997, table A-51). (Note 23)
Public school teachers were also more likely to have taken
subject matter degrees in their teaching fields than private
school teachers. For example, 66% of public school mathematics
teachers held a major or minor in the field, as compared to 58%
of those in private school. (Goldhaber and Brewer, 2000 reported
a similar finding.) The same differentials hold in other fields
to somewhat lesser extents. The greater content preparation of
public school teachers is likely a function of the fact that
certification has required increasing amounts of subject matter
coursework in the field to be taught, thus leveraging stronger
content preparation for public school teachers in states where
private school teachers are not required to hold certification.
Almost all states now require certified teachers to hold at least
a minor in the field to be taught, and many require a major in
the field.
Finally, even if it were true that untrained teachers were
unusually effective in some private schools for students of
comparable initial achievement levels (a point about which
there is no published evidence), it would be a large leap
of faith to assume that such teachers would be equally effective
in schools where many students have much greater educational
needs and students are not pre-selected for their academic
ability, their positive school attendance and behavior, and their
parents' income and interest in education. There are very
large differences in the populations of students attending public
and private schools in the United States, (Note 24) which have important implications for
teachers' knowledge and skills. It is one thing for a
teacher to offer information in whatever manner comes
instinctively to students who are academically able, have learned
to learn independently, and are well-supported at home by
educated parents, tutors, and other supports for their learning.
It is quite another thing to teach by the seat of the pants when
students do not have these learning supports at home and may
present a variety of language and learning differences. Being
effective with students who need substantial support for their
learning requires greater diagnostic ability and knowledge of how
to present information and structure experiences in ways that
help them become successful. Systematic knowledge about how to
organize curriculum and reach students with special learning
needs is most needed in the schools that serve the most students with
these needs.
Other Misrepresentations of Research Findings
The remainder of Walsh's review continues the kind of
misrepresentations documented above, appearing to rely on the
belief that readers will read its accusations, but will not read
or understand the research itself. Although she prepared a draft
appendix covering 192 studies, which sought to critique many of the
studies she dismisses (often inaccurately), it was not published
with the report. Appendix B, to which
the reader is repeatedly referred for reviews, includes only 14
studies. Throughout the report, the reader is referred to
this appendix for critiques of studies that do not appear
there. The selection of research included in the published
version of the report's appendix is very strange. Many
strong studies (some of the key citations in the field) are
omitted, along with the flawed rationales for dismissing them
that now appear in a separately-published appendix. Some much
less important and less well-designed studies are included, with
the apparent goal of critiquing their size or designs as though
they represented the dozens of studies not mentioned or
excluded. Thus, the paper does not include information
regarding most of the studies Walsh claims she has reviewed and
does not provide evidence for her claim that, of all the studies
cited in support of teacher education and certification,
"none bear up to scrutiny."
Here are just a few additional examples of major
misrepresentations.
- Goldhaber & Brewer (2000). In a string
of citations, Walsh lists a study by Goldhaber and Brewer (2000),
for its finding that teachers with a degree in their subject
matter are more effective than those without such degrees. This
study fits all of Walsh's desiderata: It is large (using a
data set that includes more than 3,000 teachers), recent, and
published in a peer-reviewed journal. However, Walsh does not
cite the authors' findings that certification status has an
even greater influence on teachers' effectiveness than a
degree in the subject area. Later, Walsh states,
"...most research indicates that the most distinct
problem in schools serving poor children is the number of
teachers who are teaching subjects in which they have no
expertise (Goldhaber & Brewer, 2000; ... Hawk, Coble,
& Swanson, 1985). These studies do not show that
certification status, as an isolated variable, has any
significant effect on the achievement level of children who are
poor or minority." (p. A6).
Neither study examined the subject matter expertise of teachers
in low-income schools, and both found strong effects of
certification on student achievement.
In fact, Goldhaber and Brewer
wrote:
Turning to an examination of the effect of teacher
certification, we find that the type (standard, emergency, etc.)
of certification a teacher holds is an important determinant of
student outcomes. In mathematics, we find the students of
teachers who are either not certified in their subject (in these
data we cannot distinguish between no certification and
certification out of subject area) or hold a private school
certification do less well than students whose teachers hold a
standard, probationary, or emergency certification in math.
Roughly speaking, having a teacher with a standard certification
in mathematics rather than a private school certification or a
certification out of subject results in at least a 1.3 point
increase in the mathematics test. This is equivalent to about
10% of the standard deviation on the 12th grade test,
a little more than the impact of having a teacher with a BA and
MA in mathematics. Though the effects are not as strong in
magnitude or statistical significance, the pattern of results in
science mimics that in mathematics. Teachers who hold private
school certification or are not certified in their subject area
have a negative (though not statistically significant) impact on
science test scores (p. 139).
The authors note that the effect size of "having a
teacher with a standard certification in mathematics rather than
a private school certification or a certification out of
subject" is "a little more than the impact of having
a teacher with a BA and MA in mathematics." Of
course, the certification itself includes requirements for
subject matter knowledge as well as for knowledge of teaching and
learning. In fact, certified mathematics teachers are more likely
to have a degree in the field than non-certified teachers. The
fact that the study found a significant effect of certification
status even after controlling for whether teachers had a degree
in their field and after controlling for experience suggests that
whatever is represented by the certification variable has an
influence above and beyond the influence of content knowledge and
classroom experience.
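As a rough check on the magnitudes quoted above, the two figures Goldhaber and Brewer report (a gain of at least 1.3 points, described as about 10% of a standard deviation) together imply a standard deviation of roughly 13 points on the 12th grade mathematics test. The symbols below are illustrative labels introduced here, not notation from the study, and the implied standard deviation is an inference from the rounded figures in the passage rather than a value the authors report.

```latex
\[
d \;=\; \frac{\Delta}{\sigma} \;\approx\; 0.10,
\qquad \Delta \approx 1.3 \text{ points}
\quad\Longrightarrow\quad
\sigma \;\approx\; \frac{1.3}{0.10} \;=\; 13 \text{ points}.
\]
```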
- Druva & Anderson (1983). This meta-analysis of
65 studies examined relationships between science teacher
characteristics and teaching behaviors, student achievement in
science, or both, using meta-analytic techniques to translate
results from a wide range of studies into Pearson correlation
coefficients in order to compare them. It found that ratings of
teaching effectiveness by principals and students were most
strongly correlated with the number of education courses taken,
followed by student teaching grades, and teaching experience. On
a teacher "effectiveness" scale composed of many
teaching behaviors associated in process-product research with
student achievement, both science training (examined in 28
studies) and education coursework and performance (examined in 47
studies) were related to effectiveness, as were teacher
attitudes, values, and temperament. Associations with cognitive
and affective student outcome measures were found for both
science training and, to a somewhat smaller extent, for education
coursework and performance, based on 34 studies for each of these
sets of variables. The authors concluded that:
Student outcomes are positively associated with the
preparation of the teacher, especially science training, but also
preparation in education and academic work generally....
While the hiring official seeking a new science teacher certainly
must look beyond information on the teacher characteristics
considered in this study, information on some of these
characteristics certainly is worthy of inclusion in the
decision-making process.... In general, the hiring official
would be well advised to employ teachers with thorough
preparation in both professional education and the sciences being
taught. There is a relationship between teacher preparation
programs and what their graduates do as teachers (p. 477).
Walsh seeks to dismiss the results of this study in part by
misreporting them. She states the study "did not show the
benefit of education coursework on student achievement" (p.
19), and that education coursework is not significantly related
to student outcomes, although significance statistics were not
reported in the study. This assertion is not supported by the
authors' reported findings that both science coursework and
education training showed a relationship to teacher effectiveness
as defined by student outcomes (in both cases, though to a
greater extent for science coursework) (Note 25) as well as teaching behaviors and
ratings (reported in the case of education coursework
only).
- Darling-Hammond (2000). Walsh criticizes and
misquotes a study that this author conducted, which examined both
the literature on teacher characteristics and student achievement
and conducted a regression analysis of state-level data from the
National Assessment of Educational Progress and the Schools and
Staffing Surveys (Darling-Hammond, 2000). The study found that
measures of teacher preparation and certification were by far the
strongest correlates of student achievement in reading and
mathematics, both before and after controlling for student
poverty and language status. The conclusion
discussed a number of potential reasons for these large
effects:
The strength of the "well-qualified teacher" variable may be
partly due to the fact that it is a proxy for both strong
disciplinary knowledge (a major in the field taught) and
substantial knowledge of education (full certification). If the
two kinds of knowledge are interdependent as suggested in much of
the literature, it makes sense that this variable would be more
powerful than either subject matter knowledge or teaching
knowledge alone. It is also possible that this variable captures
other features of the state policy environment including general
investments in, and commitment to, education, as well as aspects
of the regulatory system for education, such as the extent to
which standards are rigorous and the extent to which they are
enforced.... Finally, there may be unmeasured correlations
between the extent to which states enact and enforce high
standards for teachers and the extent to which they have enacted
other policies that are supportive of public schools. Although it
does not appear that teaching standards are strongly related to
investments regarding class sizes or to overall education
spending, it is possible that there are other factors influencing
student achievement which generally co-exist with teacher quality
and which were unmeasured in these estimates.
Walsh seeks to invalidate these findings by raising two
complaints, one of which is inaccurate and the other of which is
a matter of legitimate discussion in the field. She states,
incorrectly, that, "Darling-Hammond did not control for
class size differences among the states" (p. 26).
State-level differences in average class size were in fact
included in the analyses, and the variable had a very small,
insignificant effect. Walsh also complains that the state-level
analyses suffer from aggregation bias because they used average
student test scores, a critique she also levels against
other studies she cited approvingly for their findings in other
parts of the paper (see, e.g., Ferguson, 1991; Strauss &
Sawyer, 1986; Coleman, 1966). (Note 26) There are legitimate debates in the field on this
point, and I addressed this question in the study itself, as I do
again below in the section on "Methodological
Issues." For purposes of tracking broad policy trends at
the state level, analyses of state level data offer one useful
lens. This perspective was shared by the nine reviewers who
recommended this paper's publication in a peer-reviewed
journal and a peer-reviewed research report series.
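The aggregation question can be made concrete with a small constructed example. The sketch below uses entirely hypothetical numbers (three invented "states" with three observations each; they are not data from any study discussed here) to show that a correlation computed on group averages can differ from the correlation computed on the pooled individual observations. It illustrates why the level of aggregation is debated; it does not indicate which level is appropriate for a given policy question.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: (teacher measure, student score) pairs for invented states.
states = {
    "A": [(0, 12), (1, 10), (2, 8)],
    "B": [(1, 22), (2, 20), (3, 18)],
    "C": [(2, 32), (3, 30), (4, 28)],
}

# Individual-level analysis: pool every observation, ignoring state.
pooled = [pair for pairs in states.values() for pair in pairs]
r_pooled = pearson([x for x, _ in pooled], [y for _, y in pooled])

# State-level analysis: one (mean x, mean y) point per state.
state_means = [
    (sum(x for x, _ in pairs) / len(pairs), sum(y for _, y in pairs) / len(pairs))
    for pairs in states.values()
]
r_means = pearson([x for x, _ in state_means], [y for _, y in state_means])

print(f"pooled individual-level r = {r_pooled:.2f}")
print(f"state-average r           = {r_means:.2f}")
```

In this constructed case the state averages fall on a straight line, so the state-level correlation is much stronger than the pooled one. Real data could show the opposite pattern or none at all, which is why results at different levels of aggregation are best read alongside one another.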
Finally, the literature review contained in this study is
repeatedly mischaracterized throughout Walsh's paper and
her appendix as minimizing or ignoring the influences of verbal
ability and subject matter preparation for teaching.
On the relationship between academic ability and teacher
effectiveness, Walsh states:
Darling-Hammond (1999, p. 6) claims there is "little or
no relationship between teachers' measured intelligence and
their students' achievement." She supports this
statement with two studies by Soar, Medley and Cocker (sic)
(1983) and Schalock (1979). These two studies simply recycle
research from the 1940s and earlier, none of which is retrievable
for scrutiny (p. 21).
Walsh misrepresents this analysis by quoting a portion of a
sentence out of context and citing the reviews that summarized
research on IQ tests as an example of the inappropriate use of
older studies. Here is what I actually said:
While studies as long ago as the 1940s have found positive
correlations between teaching performance and measures of
teachers' intelligence (usually measured by IQ) or general
academic ability (Hellfritsch, 1945; LaDuke, 1945; Rostker, 1945;
Skinner, 1947), most relationships are small and statistically
insignificant. Two reviews of such studies concluded that there
is little or no relationship between teachers' measured
intelligence and their students' achievement (Schalock,
1979; Soar, Medley, & Coker, 1983). Explanations for the
lack of strong relationship between measures of IQ and teacher
effectiveness have included the lack of variability among
teachers in this measure and its tenuous relationship to actual
performance (Vernon, 1965; Murnane, 1985). However, other
studies have suggested that teachers' verbal ability is
related to student achievement (e.g., Bowles & Levin, 1968;
Coleman et al., 1966; Hanushek, 1971), and that this relationship
may be differentially strong for teachers of different types of
students (Summers & Wolfe, 1975). Verbal ability, it is
hypothesized, may be a more sensitive measure of teachers'
abilities to convey ideas in clear and convincing ways (Murnane,
1985).
Walsh's attempt to distort the text misses two critical
points: First, studies of the relationship between IQ and
teaching effectiveness (which I noted had found positive though
small relationships) were primarily conducted before the 1960s,
because IQ tests came into question as measures of ability at
that time and were no longer often available in large data sets
thereafter. Second, measures of verbal ability became more popular and
widely available in data sets from the 1960s onward, and
showed somewhat stronger relationships with teacher outcomes, as
I reported in my summary. The studies I cited include many of
the same ones that Walsh cites for this proposition, a
point she does not acknowledge as she tries to suggest,
inaccurately, that I minimize the value of measures of academic
ability for teachers. (Note 27)
On the topic of subject matter knowledge, Walsh also suggests
on numerous occasions that I seek to minimize the importance of
teachers' knowledge of content. She offers my work as an
example of her sweeping statement that "certification
advocates ... offer evidence that knowledge of subject
matter has little effect on teaching performance" (p. 19).
Here is what I actually said in my brief summary of the
literature, offering an analysis that clearly acknowledges the
importance of subject matter knowledge for teaching and
interprets the mixed results of studies in terms of what teachers
may need to know in order to teach different things.
Byrne (1983) summarized the results of thirty studies relating
teachers' subject matter knowledge to student achievement.
The teacher knowledge measures were either a subject knowledge
test (standardized or researcher-constructed) or number of
college courses taken within the subject area. The results of
these studies were mixed, with 17 showing a positive relationship
and 14 showing no relationship. However, many of the "no
relationship" studies, Byrne noted, had so little
variability in the teacher knowledge measure that insignificant
findings were almost inevitable. Ashton and Crocker (1987) found
only 5 of 14 studies they reviewed exhibited a positive
relationship between measures of subject matter knowledge and
teacher performance.
It may be that these results are mixed because subject matter
knowledge is a positive influence up to some level of basic
competence in the subject but is less important thereafter. For
example, a controlled study of middle school mathematics
teachers, matched by years of experience and school setting,
found that students of fully certified mathematics teachers
experienced significantly larger gains in achievement than those
taught by teachers not certified in mathematics. The differences
in student gains were greater for algebra classes than general
mathematics (Hawk, Coble, & Swanson, 1985). However, Begle
and Geeslin (1972) found in a review of mathematics teaching that
the absolute number of course credits in mathematics was not
linearly related to teacher performance.
It makes sense that knowledge of the material to be taught is
essential to good teaching, but also that returns to subject
matter expertise would grow smaller beyond some minimal essential
level which exceeds the demands of the curriculum being taught.
This interpretation is supported by Monk's (1994) more
recent study of mathematics and science achievement. Using data
on 2,829 students from the Longitudinal Study of American Youth,
Monk (1994) found that teachers' content preparation, as
measured by coursework in the subject field, is positively
related to student achievement in mathematics and science but
that the relationship is curvilinear, with diminishing returns to
student achievement of teachers' subject matter courses
above a threshold level (e.g., five courses in mathematics).
It may also be that the measure of subject matter knowledge
makes a difference in the findings. Measures of course-taking in
a subject area have more frequently been found to be related to
teacher performance than have scores on tests of subject matter
knowledge. This might be because tests necessarily capture a
narrower slice of any domain. Furthermore, in the United States,
most teacher tests have used multiple-choice measures that are
not very useful for assessing teachers' ability to analyze
and apply knowledge. More authentic measures may capture more of
the influence of subject matter knowledge on student learning.
For example, a test of French language teachers' speaking
skill was found to have significant correlation to
students' achievement in speaking and listening (Carroll,
1975).
It seems logical that teachers' abilities to handle the
complex tasks of teaching for higher-level learning are likely to
be associated, to varying extents, with each of the variables
reviewed above: verbal ability, adaptability and creativity,
subject matter knowledge, understanding of teaching and learning,
specific teaching skills, and experience in the classroom, as
well as interactions among these variables. In addition,
considerations of fit between the teaching assignment and the
teacher's knowledge and experience are likely to influence
teachers' effectiveness (Little, 1999), as are conditions
that support teachers' individual teaching and the additive
effect of teaching across classrooms, such as class sizes and
pupil loads, planning time, opportunities to plan and problem
solve with colleagues, and curricular supports including
appropriate materials and equipment (Darling-Hammond, 1997).
Finally, Walsh suggests in several places that I have
characterized the research as indicating a "negative
relationship between student outcomes and the NTE subject matter
tests" (p. 19). In fact, I stated that "Studies of
teachers' scores on the subject matter tests of the
National Teacher Examinations (NTE) have found no
consistent relationship between this measure of subject
matter knowledge and teacher performance as measured by student
outcomes or supervisory ratings. Most studies show small,
statistically insignificant relationships, both positive and
negative (Andrews, Blackmon & Mackey, 1980; Ayers &
Qualls, 1979; Haney, Madaus, & Kreitzer, 1986; Quirk, Witten,
& Weinberg, 1973; Summers & Wolfe, 1975)." (Note 28)
Walsh misrepresents this statement numerous times.
Methodological Issues
One of the ways that Walsh seeks to make much of the research
on teacher education disappear is by suggesting that it is
inappropriate to cite studies that are older, smaller, use
measures of performance other than student achievement scores,
are aggregated at a level above the classroom, or
are published in venues other than peer-reviewed journals.
As noted above, Walsh uses a double standard in selecting
research to reject when it finds evidence of the influence of
teacher education on student learning and research to cite for
her own purposes. While she discounts the findings of many
dissertation studies and technical reports because they were not
published in peer-reviewed journals, in making her own claims,
she cites at least 15 studies that were not published in
peer-reviewed journals or technical report series and at least 20
that were published before 1980, including some that she
elsewhere dismissed from consideration because she did not like
specific findings. For findings she likes, she also cites
several that use supervisory ratings as the only measures of
teacher effectiveness and others that she later dismisses for
aggregation bias. Sometimes she represents the studies'
findings accurately; sometimes not. Many of the studies she
cites for various propositions do not contain the findings for
which they are cited or, in several cases, any data on
the question at all.
I would not argue, as Walsh does, that none of these studies
have value as contributions to the literature. However, the
double standard she applies in using studies of different eras,
sizes, aggregation levels, dependent variables, and publication
statuses perhaps proves the point that to evaluate the weight of
evidence in a field it is often necessary to triangulate findings
that used different methods, over different time periods, and at
different levels of aggregation to see where there is an accrual
of evidence over time and across methods. Of course it is
important to do this with appropriate attention to the
methodological strengths and weaknesses of various studies and
lines of research. Unfortunately, Walsh often does this poorly,
appearing to misunderstand critical research design issues.
Below, I discuss the issues of study size and design, level of
aggregation, choice of dependent variable (including the use of
supervisory ratings of teacher performance), age, and venue of
publication.
Study Size and Design
In one part of her review, Walsh bemoans the lack of
experimental research. She then rejects the results of studies
with experimental designs because of their smaller sample sizes
and cites almost exclusively non-experimental correlational
studies, which, though larger, lack direct controls
for the variables of interest and must rely on statistical
manipulations of data to account, indirectly, for these other
influences. This kind of correlational research is, of course,
legitimate for staking out broad possibilities in relationships
among variables, but it has its own limitations. Many of the
more carefully controlled experimental designs can in fact offer
more solid evidence about effects, because the
"treatment" they are studying is known and the
samples can be better controlled than is true for large
correlational studies that use proxies and sta |