Education Policy
Analysis Archives
3. Evidence and Boosters of the Myth
Given the consequences attached to performance on TAAS,
it is not surprising that this test has had major impact on
education in Texas. At first glance, this impact appears to
have been largely positive; and it is evidence of the
apparent positive impact of TAAS, and the Texas system of
school accountability, that has helped give rise to the
"miracle" story of education reform in Texas over the
last decade.
Four kinds of evidence seem to have been most widely cited
as indicative of major improvements in education in Texas,
namely: 1) sharp increases in the overall pass rates on TAAS
during the 1990s; 2) apparent decreases in the achievement
gap between White and minority students in Texas (again
based on TAAS scores); 3) seemingly decreasing rates of
students dropping out of school before high school
graduation; and, 4) apparent confirmation of TAAS gains by
results on the National Assessment of Educational Progress
(NAEP).
3.1 Improved results on TAAS
The main evidence contributing to the perception of dramatic
educational gains in Texas during the 1990s (what the March
21, 2000 USA Today editorial called "widespread
improvement in student achievement") seems to have been
sharp increases in passing rates on the TAAS. TAAS was
introduced in Texas in 1990-91, and, as recounted
previously, was administered at somewhat varied grades (and
seasons) during the early 1990s. In several publications,
the TEA has presented TAAS pass rates aggregated across
different grades. Inasmuch as this sort of aggregation may
obscure as much as it reveals, here I present results mainly
for grade 10 TAAS testing.
Table 3.1 (and corresponding Figure 3.1) shows the results
on the grade 10 TAAS test from 1994 to 1998.
Table 3.1 TAAS Grade 10 Percent Passing 1994-1998
All Students Not in Special Education (Does Not Include Year-Round Education Results)

|                | 1994 | 1995 | 1996 | 1997 | 1998 |
|----------------|------|------|------|------|------|
| TAAS Reading   | 76%  | 76%  | 81%  | 86%  | 88%  |
| TAAS Math      | 57%  | 59%  | 65%  | 72%  | 78%  |
| TAAS Writing   | 81%  | 86%  | 85%  | 88%  | 89%  |
| TAAS All Tests | 52%  | 54%  | 60%  | 67%  | 72%  |

Source: Selected State AEIS Data: A Multi-Year History
(www.tea.state.tx.us/student.assessment/swresult/gd10sp98.htm)
As can be seen from these data, grade 10 TAAS results show a
pattern of steady improvement from 1994 through 1998, with
the percentage of students passing the TAAS reading test
rising from 76% to 88%; the percentage passing the TAAS math
test rising from 57% to 78%; and the percentage passing the
TAAS writing test rising from 81% to 89%. The
percentage of grade 10 students passing all three tests
increased from 52% in 1994 to 72% in 1998.
3.2 Decrease in Race Gap in Test Scores
Even as test scores were improving overall, the gaps in
achievement between White and nonwhite students
(specifically Black and Hispanic students) appeared to have
been narrowing. The USA Today editorial (3/21/2000)
reported that "Texas is one of the few states that has
narrowed its racial learning gap." Figure 3.2 and
Table 3.2 show how the "racial learning gap"
appears to have narrowed on the grade 10 TAAS tests (for
economy of presentation here, I do not show results
separately for the reading, writing, and math tests, but
only the percentages of grade 10 students passing all three
tests).
Table 3.2 TAAS Grade 10 Percent Passing All Tests by Race 1994-1998
All Students Not in Special Education (Does Not Include Year-Round Education Results)

|          | 1994 | 1995 | 1996 | 1997 | 1998 |
|----------|------|------|------|------|------|
| Black    | 29%  | 32%  | 38%  | 48%  | 55%  |
| Hispanic | 35%  | 37%  | 44%  | 52%  | 59%  |
| White    | 67%  | 70%  | 74%  | 81%  | 85%  |

Source: Selected State AEIS Data: A Multi-Year History:
www.tea.state.tx.us/student.assessment/swresult/gd10sp98.htm
As can be seen, in 1994 there was a huge disparity in the
grade 10 pass rates for Black and Hispanic students as
compared with White students. The 1994 White pass rate of
67% was 38 points higher than the Black pass rate of 29%;
and 32 points more than the Hispanic rate of 35%. In other
words, in 1994, White students were passing the grade 10
TAAS tests at about double the rate of Black and Hispanic
students. This gap was just about what might have been
predicted based on the 1990 field test results (see Table
2.1). By 1998, the White grade 10 pass rate had climbed 18
points to 85%. But the Black and Hispanic pass rates had
climbed even more, 26 and 24 points respectively. So between
1994 and 1998, the race gaps had been reduced
from 38 to 30 percentage points for Whites and Blacks and
from 32 to 26 for Whites compared with Hispanic tenth grade
students. In other words, minorities had increased
their rate of passing grade 10 TAAS tests from about half
of the White pass rate to about two-thirds of the White pass
rate in just four years.
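The gap arithmetic in this paragraph can be checked directly against Table 3.2. The short Python sketch below is an illustration only: the figures are the 1994 and 1998 "all tests" pass rates from the table, while the dictionary and function names are my own and are not drawn from any TEA source.

```python
# Percent of grade 10 students passing all TAAS tests (Table 3.2).
PASS_RATES = {
    "Black":    {1994: 29, 1998: 55},
    "Hispanic": {1994: 35, 1998: 59},
    "White":    {1994: 67, 1998: 85},
}

def gap_with_white(group, year):
    """Percentage-point gap between the White pass rate and a group's pass rate."""
    return PASS_RATES["White"][year] - PASS_RATES[group][year]

def share_of_white_rate(group, year):
    """A group's pass rate expressed as a fraction of the White pass rate."""
    return PASS_RATES[group][year] / PASS_RATES["White"][year]

for group in ("Black", "Hispanic"):
    gain = PASS_RATES[group][1998] - PASS_RATES[group][1994]
    print(
        f"{group}: gain {gain} points; "
        f"gap {gap_with_white(group, 1994)} -> {gap_with_white(group, 1998)} points; "
        f"share of White rate {share_of_white_rate(group, 1994):.0%} -> "
        f"{share_of_white_rate(group, 1998):.0%}"
    )
```

The ratios work out to roughly 43% and 52% of the White pass rate in 1994 for Black and Hispanic students respectively, and roughly 65% and 69% in 1998.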
3.3 Decreases in Dropout Rates
If the dramatic gains in grade 10 pass rates overall and
substantial decreases in the "racial learning gap"
were not sufficiently remarkable, official TEA statistics
indicated that over the same interval high school dropout
rates were also declining.
Table 3.3 Texas Annual Dropout Rate, Grades 7-12, 1994-1998

|              | 1994 | 1995 | 1996 | 1997 | 1998 |
|--------------|------|------|------|------|------|
| All Students | 2.8% | 2.6% | 1.8% | 1.8% | 1.6% |
| Black        | 3.6% | 3.2% | 2.3% | 2.3% | 2.0% |
| Hispanic     | 4.2% | 3.9% | 2.7% | 2.5% | 2.3% |
| White        | 1.7% | 1.5% | 1.2% | 1.1% | 1.0% |

Source: Selected State AEIS Data: Five Year History
www.tea.state.tx.us/perfreport/aeis/hist/state.html
As shown in Table 3.3, TEA data indicated that between 1994
and 1998, even as pass rates on the TAAS were increasing
among grade 10 students, dropout rates were decreasing not
just among secondary students overall, but also for each of
the three race groups for which data were disaggregated. In
short, what appeared to be happening in Texas schools in the
1990s truly did seem to be a miracle.
As Peter Schrag has recently written: "Some of Texas's
claims are so striking they border on the incredible. The
state's official numbers show that even as TAAS scores were
going up, dropout rates were cut from an annual 6.1 percent
in 1989-90 to 1.6 percent last year. If ever there was a
case of something being too good to be true, this is
it" (Schrag, 2000). But before reviewing the doubts of
Schrag and others, let me recap one additional source of
evidence that seemed to confirm the miracle story.
3.4 NAEP Results for Texas
Anyone even remotely familiar with recent education
history of the United States must view with some skepticism
the meaningfulness of the almost inevitable increases in
performance that follow introduction of a new testing
program. When a new testing program is
introduced, students and teachers have little familiarity
with the specifics of the new tests. But after a few years,
they become familiar with the style and format of the tests and
students can be coached specifically for the test in
question. Hence, performance (or at least average
test scores) almost inevitably increases.
That students can be successfully coached for
particular tests has been well known among education
researchers for decades. As far back as 1927, Gilmore, for
example, reported that students could be coached on Otis
group intelligence tests "to the point of increasing
their standing and score in intelligence tests even in the
case of the material used in coaching being only similar and
not identical with that of the basic test" (Gilmore,
1927, p. 321). Indeed, what happens when students are
coached for a specific test has come to be called the "saw
tooth" phenomenon because of the regular pattern in
which scores steadily rise following introduction of a new
testing program, only to fall dramatically when a different
test is introduced (Linn, 2000, p. 7).
The phenomenon of falsely inflated test scores was brought
to wide public attention in the late 1980s and early 1990s
because of publicity for what came to be known as the "Lake
Wobegon" phenomenon in test results. Lake Wobegon is the
mythical town in Minnesota popularized by Garrison Keillor
in his National Public Radio program "A Prairie Home
Companion." It is the town where "all the women
are strong, all the men are good looking, and all the
children are above average." In the late 1980s it was
discovered that Lake Wobegon seemed to have invaded the
nation's schools: according to a 1987 report by John Cannell, the vast
majority of school districts and all states were scoring
above average on nationally normed standardized tests
(Cannell, 1987). Since it is logically impossible for all
of any population to be above average on a single measure,
it was clear that something was amiss, that something about
nationally normed standardized tests or their use had been
leading to false inferences about the status of learning in
the nation's schools.
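The logic here, together with one of the explanations later debated (outdated norms), can be made concrete with a small sketch. The scores and the norm value below are invented purely for illustration; they do not come from Cannell's report or from any real test.

```python
from statistics import mean

def share_above(scores, benchmark):
    """Fraction of scores strictly greater than the benchmark."""
    return sum(score > benchmark for score in scores) / len(scores)

# Hypothetical current-year scores for ten test takers (not real data).
current_scores = [48, 52, 55, 57, 60, 61, 63, 66, 70, 74]

# Against the group's own mean, some scores must fall at or below average.
own_mean = mean(current_scores)
print(share_above(current_scores, own_mean))

# Hypothetical national norm fixed in an earlier year, when scores were lower.
old_norm_average = 50
print(share_above(current_scores, old_norm_average))
```

Measured against its own mean, a group can never be entirely above average; measured against a stale or unrepresentative norm, nearly everyone can be.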
Cannell was a physician by training and not a specialist in
education or education research. His original (1987)
report was published by "Friends for Education,"
the foundation he established to promote accountability in
education. A revised version of Cannell's report was
published in the Summer 1988 issue of Educational
Measurement: Issues and Practice (Cannell, 1988)
together with responses and commentary from representatives
of major test publishers and officials of the U.S. Department
of Education (Phillips and Finn, 1988; Drahozal and Frisbie,
1988; Lenke and Keene, 1988; Williams, 1988; Qualls-Payne,
1988; Stonehill, 1988). Cannell's charges regarding
misleading test results were hotly debated in this and other
forums. Some people doubted whether the Lake Wobegon
phenomenon was real (that is, whether large majorities of
states, schools and districts were in fact scoring above
average on the national norms of the tests), while most observers accepted
the reality of the phenomenon but disputed what caused it.
Among the causes suggested and debated were problems in the
original norming of the tests, outdated norms, lack of test
security, manipulation of populations of students tested,
artificial statistical manipulation of test results, and
teachers and schools teaching to the tests, either purposely
or inadvertently.
The publicity surrounding the Lake Wobegon phenomenon was
sufficiently widespread that the U.S. Department of
Education funded researchers at the Center for Research on
Evaluation, Standards and Student Testing (CRESST) to
investigate. On the basis of a survey of state directors of
testing, Shepard (1989) concluded that the conditions for
inflated test results (such as high stakes being
pinned on test results, efforts to align curricula to the
tests, and direct teaching to the tests) existed
in virtually all of the states. And on the basis of an
analysis of up to three years of test results from 35 states
from which they were available, Linn, Graue and Sanders
(1989) essentially confirmed Cannell's basic finding that
test results across the nation were implausibly
inflated: Lake Wobegon had invaded the nation's
schools. For instance, they found that "for grades 1
through 6, the percentage of students scoring above the
national median in mathematics ranges from a low of 58% in
grade 4 for the 1985 school year to a high of 71% in grade 2
for the 1987-88 school year . . . " (p. 8). Linn,
Graue and Sanders concluded that the use of old norms was
one cause of the abundance of "above average
scores" (p. 23), but also pointed out that in
situations in which the same form of a test is used year
after year, "increased familiarity with a particular
form of a test" (p. 24) likely contributed to inflated
scores.
The practice of using a single form of a test year after
year poses a logical threat to making inferences about the
larger domain of achievement. Scores may be raised by
focusing narrowly on the test objectives without improving
achievement across the broader domain that the test
objectives are intended to represent. Worse still, practice
on nearly identical or even the actual items that appear on
a test may be given. But as Dyer aptly noted some years
ago, "if you use the test exercises as an instrument of
teaching you destroy the usefulness of the test as an
instrument for measuring the effects of teaching (Dyer,
1973, p. 89)." (Linn, Graue and Sanders, 1989, p. 25).
The problem was illustrated even more clearly in a
subsequent study reported by Koretz, Linn, Dunbar & Shepard
(1991), which compared test results on one "high-
stakes" test, used for several years in a large urban
school district, with those on a comparable test that had
not been used in that district for several years. They
found that performance on the regularly used high-stakes
test did not generalize to other tests for which students
had not been specifically coached, and again commented that
"students in this district are prepared for high-stakes
testing in ways that boost scores . . . substantially more
than actual achievement in domains that the tests are
intended to measure" (p. 2). To put the matter bluntly,
teaching to a
particular test undermines the validity of test results as
measures of more general learning.
While education researchers were essentially confirming
Cannell's initial charges, the intrepid physician was
continuing his own investigations. In late summer 1989,
Cannell released a new report entitled The "Lake
Wobegon" Report: How Public Educators Cheat on
Standardized Achievement Tests. This time Cannell
presented new instances of the Lake Wobegon
phenomenon and a variety of evidence of outright fraud in
school testing programs, including a sampling of testimony
from teachers concerned about cheating on tests. After
presenting results of his own survey of test security in the
50 states (concluding that security is generally so lax as
to invite cheating), Cannell outlined methods to help people
detect whether cheating is going on in their school
districts, and "inexpensive steps" to help prevent
it.
More recently Koretz and Barron (1998; RAND, 1999) of the RAND
Corporation investigated the validity of dramatic gains on
Kentucky's high stakes statewide tests. Like Texas,
Kentucky had adopted policies to hold schools and teachers
accountable for student performance on statewide tests.
During the first four years of the program, Kentucky
students showed dramatic improvements on the state tests.
What Koretz and Barron sought to assess was the validity of
the Kentucky test gains by comparing them with
Kentucky student performance on comparable tests,
specifically the National Assessment of Educational Progress
(NAEP) and the American College Testing Program (ACT)
college admissions tests. What they found was that the
dramatic gains on the Kentucky test between 1992 and 1996
were simply not reflected in NAEP and ACT scores.
They concluded that the Kentucky test scores
"have been inflated and are therefore not a meaningful
indicator of increased learning" (RAND, 1999).
Even before the release of the report showing inflated test
scores in Kentucky, anyone familiar with the Lake Wobegon
phenomenon, widely publicized in the late 1980s and early
1990s, had to view the dramatic gains reported on TAAS in
Texas in the 1990s with considerable skepticism. Were the
gains on TAAS indicative of real gains in student
learning, or just another instance of artificially inflated
test scores?
In 1997, results from the 1996 National Assessment of
Educational Progress (NAEP) in mathematics were released.
The 1996 NAEP results showed that among the states
participating in the state-level portion of the math
assessment, Texas showed the greatest gains in percentages
of fourth graders scoring at the proficient or advanced
levels. Between 1992 and 1996, the percentage of Texas
fourth graders scoring at these levels had increased from 15%
to 25%. The same NAEP results also showed North Carolina to
have posted unusually large gains at the grade 8 level, with
the percentages of eighth graders in North Carolina scoring
at the proficient or advanced levels improving from 9% in
1990 to 20% in 1996 (Reese et al., 1997).
Putting aside for the moment that the 1996 NAEP
results also showed that math achievement in these two
states was no better (and in some cases worse) than the
national average, these findings led to considerable
publicity for the apparent success of education reform in
these two states. The apparent gains in math, for example,
led the National Education Goals Panel in 1997 to identify
Texas and North Carolina as having made unusual progress in
achieving the National Education Goals.
3.5 Plaudits for the Texas Miracle
In Spring 1998, Tyce Palmaffy published an article titled
"The Gold Star State: How Texas jumped to the head of
the class in elementary school achievement." Citing
both 1996 NAEP results and TAAS score increases, Palmaffy
praised Texas for being in the vanguard of "an accountability
movement sweeping the states" (not surprisingly he also
mentioned North Carolina and Kentucky). Regarding TAAS,
Palmaffy reported "In 1994, barely half of Texas
students passed the TAAS math exam. By last year, the
proportion had climbed to 80 percent. What's more, the
share of black and Hispanic children who passed the test
doubled during that time to 64 percent and 72 percent
respectively." Palmaffy's article, published in a
Heritage Foundation journal, also included testimonials for
the Texas success story from divergent vantage points. Kati
Haycock, "director of the Education Trust, a Washington
D.C.-based organization devoted to improving educational
opportunities for low-income children" was quoted as
touting Texas as "a real model for other states to
follow." The article also referred to "researcher Heidi
Glidden of the American Federation of Teachers union"
as praising the sort of education accountability system
used in Texas.
Meanwhile, the National Education Goals Panel had
"commissioned Dr. David Grissmer, an education
researcher with the RAND Corporation, to conduct an analysis
of education reforms in both states [Texas and North
Carolina] to determine that the improvements were indeed
significant and to seek to identify the factors that could
and could not account for their progress" (Grissmer &
Flanagan, 1998, p. i). The National Education Goals Panel
released the Grissmer/Flanagan report in November 1998.
Without trying to recap or critique the Grissmer/Flanagan
report here, let me simply summarize how it was conveyed to
the outside world. The report was released November 5, 1998
with a press release titled "North Carolina and Texas
Recognized as Models for Boosting Student Achievement."
The first paragraph of the press release read:
(WASHINGTON, D.C.) A new study that both belies conventional
wisdom about problems in K-12 education and illuminates some
approaches for solving them points to the extraordinarily
successful policies of two states, North Carolina and Texas,
as models for reform throughout the nation. (NEGP,
11/5/98)
After quotes from North Carolina Governor Jim Hunt and
Texas Governor George W. Bush, the press release went on to
summarize the Grissmer/Flanagan findings. The researchers
found that "several factors commonly associated with
student achievement, such as real per pupil spending,
teacher pupil ratios, teachers with advanced degrees, and
experience level of teachers, are not adequate for
explaining the test score gains" (National Education
Goals Panel, November 5, 1998, p. 1). The press release
explained that, instead, Grissmer and Flanagan attributed
the achievement gains in Texas and North Carolina to three
broad factors common to the two states (business leadership,
political leadership, consistent reform agendas) and seven
educational policies (adopting statewide standards by grade
for clear teaching, holding all students to the same
standards, linking statewide assessments to academic
standards, creating accountability systems with benefits and
consequences for results, increasing local control and
flexibility for administrators and teachers, providing test
scores and feedback via computer for continuous improvement,
and shifting resources to schools with more disadvantaged
students).
Grissmer and Flanagan (1998) did not explain how they had
determined that these were the factors behind the apparent
achievement gains in Texas and North Carolina; but whatever
the case, this 1998 report from the National Education Goals
Panel, coupled with the sort of diverse support for the
Texas model education accountability system cited by
Palmaffy, seemed to certify the apparent miracle of
education reform in Texas. The success of education reform
in Texas was being heralded by observers as diverse as
Palmaffy (of the Heritage Foundation), Haycock (head of an
organization dedicated to improving the educational
opportunities of low-income children), and Glidden (a
researcher with one of the nation's largest teachers
unions). The Grissmer/Flanagan report seemed to be the
clincher. Here was a report from a bipartisan national
group (the National Education Goals Panel), prepared by a
Ph.D. researcher from a prestigious research organization,
the RAND Corporation, that straight out said, "The
analysis confirms that gains in academic achievement in both
states are significant and sustained. North Carolina and
Texas posted the largest average gains in student scores on
tests of the National Assessment of Educational Progress
(NAEP) administered between 1990 and 1997. These results
are mirrored in state assessments during the same period,
and there is evidence of the scores of disadvantaged
students improving more rapidly than those of advantaged
students" (Grissmer & Flanagan, 1998, p. i). Few
people seemed to notice that the Grissmer & Flanagan report
was not actually published by RAND.
Nonetheless, the report from the National Education Goals
Panel seemed to certify the apparent miracle of education
reform in Texas. Subsequently, the story of the Texas
miracle has been circulated far and wide. Without trying to
document all of the stories on the Texas miracle I have
seen, let me mention here just two examples. On June 10,
1999, the Boston Globe ran a front-page story headlined
"Embarrassed into success: Texas school experience may
hold lessons for Massachusetts" (Daley, 1999). And on
March 21, 2000, in the editorial cited at the start of this
article, the editors of USA Today, in urging the U.S. Senate
to adopt a Texas-style school accountability system for the $8
billion Title I program providing federal aid to poor
schools, cited "Texas-size school
success" in the Lone Star state. In an apparent
reference to 1996 NAEP results, the editorial cited the
Education Trust as the source of evidence about gains in
Texas on 1996 math tests administered nationally.