Education Policy
Analysis Archives
4. Problems with TAAS
Two years ago when I agreed to help MALDEF on the TAAS
case, I had no way of foreseeing
the extent to which education reform in Texas would come to
be touted as a model to be emulated elsewhere. Nonetheless,
as I studied what had been happening with TAAS in Texas, I
quickly came to think otherwise. Before summarizing what I
think is wrong with TAAS and how it is being misused in
Texas, I should mention that some of what I recount in the
remainder of this article is based on two unpublished reports
that I prepared in connection with the TAAS case: a
preliminary report in December 1998 and a supplementary
report in July 1999 (Haney, 1998; 1999). However, it also
draws on additional evidence acquired and analyses
undertaken since completion of the supplementary report in
summer 1999.
The problems with TAAS and the way it is being used in Texas
may be summarized under five sub-headings: 1) the TAAS is
having a continuing adverse impact on Black and Hispanic students;
2) the use of the TAAS test in isolation to control award of
high school diplomas is contrary to professional standards
concerning test use; 3) the passing score on TAAS is
arbitrary and discriminatory; 4) a variety of evidence casts
doubt on the validity of TAAS scores; and 5) more
appropriate use of test results would have more validity and
less adverse impact.
4.1 Adverse impact
In previous research and law, three standards have been
recognized for determining whether observed differences
constitute discriminatory disparate impact: 1) the 80
percent (or four-fifths) rule; 2) tests of the statistical
significance of observed differences; and 3) evaluation of the
practical significance of differences. The "80
percent" or four-fifths rule refers to a provision of
the 1978 Uniform Guidelines on Employee Selection Procedures
(43 F.R. No. 166, 38290-38296, 1978) which reads:
Sec. 6D. Adverse impact and the "four-fifths
rule." A selection rate for any race, sex or ethnic
group which is less than four-fifths (or eighty percent) of
the rate for the group with the highest rate will be
generally regarded by Federal enforcement agencies as
evidence of adverse impact, while a greater than four-fifths
rate will generally not be regarded by Federal enforcement
agencies as evidence of adverse impact. (As quoted in
Fienberg, 1989, p. 91).
As a result of its standing in federal regulations, the 80
percent rule as a test of adverse or disparate impact has
been widely recognized. Nonetheless, simple differences in
percentage rates have some undesirable properties. The
simple difference, for example "is inevitably small
when the two percentages are close to zero" (David H.
Kaye and David A. Freedman, Reference guide on statistics,
Federal Judicial Center, 1994). Hence, most observers and
considerable case law now hold that in assessing disparate
impact, it is important to apply not just the 80% or four-
fifths rule but also to consider the practical and
statistical significance of differences in selection or pass
rates (Fienberg, 1989; Kaye & Freedman, 1994; see also,
Office of Civil Rights, 1999). In previous reports
regarding the TAAS case (Haney, 1998; 1999), I applied these
three tests of adverse impact to a variety of TAAS results.
However, for economy of presentation here, I provide only
illustrative results.
Eighty Percent or Four-Fifths Rule.
To apply this test of adverse impact, we simply
multiply the pass rates on TAAS for White students by 80%
and check to see whether the pass rates for Blacks and
Hispanics fall below these levels. Table 4.1 presents the
application of the 80% rule to the TAAS results previously
presented in Table 3.2 above. As can be seen, even though
grade 10 pass rates for all three TAAS tests for Black and
Hispanic students improved between 1994 and 1998, these pass
rates still fall below 80% of the White pass rates.
According to this standard of adverse impact, the TAAS grade
10 tests continue to show adverse impact on Black and
Hispanic students. (Note 5)
Table 4.1 Eighty Percent Rule and TAAS Grade 10 Pass Rates:
Percent Passing All Tests by Race 1994-1998, All Students
Not in Special Education (Does Not Include Year-Round
Education Results)
| | 1994 | 1995 | 1996 | 1997 | 1998 |
| White | 67% | 70% | 74% | 81% | 85% |
| White*80% | 53.6% | 56.0% | 59.2% | 64.8% | 68.0% |
| Black | 29% | 32% | 38% | 48% | 55% |
| Hispanic | 35% | 37% | 44% | 52% | 59% |
Source: Selected State AEIS Data: A Multi-Year History
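The four-fifths check described above is simple enough to sketch in a few lines of Python. This is an illustrative sketch only, not part of the original analysis: the pass rates are copied from Table 4.1, and the names `PASS_RATES`, `four_fifths_threshold`, `shows_adverse_impact`, and `flag_years` are hypothetical helpers introduced here.

```python
# Percent passing all tests, grade 10, by year (figures copied from Table 4.1).
PASS_RATES = {
    "White":    {1994: 67, 1995: 70, 1996: 74, 1997: 81, 1998: 85},
    "Black":    {1994: 29, 1995: 32, 1996: 38, 1997: 48, 1998: 55},
    "Hispanic": {1994: 35, 1995: 37, 1996: 44, 1997: 52, 1998: 59},
}


def four_fifths_threshold(reference_rate):
    """Return 80% of the reference group's pass rate."""
    return 0.8 * reference_rate


def shows_adverse_impact(group_rate, reference_rate):
    """True when the group's rate is below four-fifths of the reference rate."""
    return group_rate < four_fifths_threshold(reference_rate)


def flag_years(group, reference="White"):
    """Map each year to whether the group falls below the four-fifths threshold."""
    return {
        year: shows_adverse_impact(PASS_RATES[group][year], reference_rate)
        for year, reference_rate in PASS_RATES[reference].items()
    }


if __name__ == "__main__":
    for group in ("Black", "Hispanic"):
        print(group, flag_years(group))
```

With the Table 4.1 figures, every year from 1994 through 1998 is flagged for both groups, which agrees with the reading of the table given above.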
Statistical Significance of Differences in Pass Rates.
As mentioned, comparisons of simple percentages passing
have some weaknesses from a statistical point of view. For
example, differences in pass rates, particularly if small
numbers of examinees are involved, may result from random
variation in the particular sample of candidates who take an
examination in a particular year. To check against this
possibility, a second kind of standard for evaluating
discriminatory disparate impact is generally employed;
namely, a test of the statistical significance of observed
differences. A test of statistical significance is used to
assess the probability that a particular outcome (such as
differences in proportions passing a test) might have
occurred simply by chance or random sampling.
The obvious statistical significance test to apply in a case
such as that of proportions of candidates passing the TAAS
is the test of the difference in proportions of two
populations. As explained in most statistics textbooks,
such as Paul Hoel's Introduction to mathematical
statistics (1971, pp. 134-137), if $p_1$ and $p_2$
refer to the proportions of successes in two samples,
$q_1$ and $q_2$ refer to the proportions of failures in the two
samples, and $n_1$ and $n_2$ refer to the sizes of
the samples, the standard error of the difference in
proportions is calculated as follows:
$$SE_{diff} = \left( \frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2} \right)^{1/2}$$
Using this formula we may calculate the standard error
of the difference in proportions for each comparison we wish
to make and then divide the observed difference by its
standard error to express that difference as a number of
standard errors. Table 4.2 shows the results of such
calculations for the Spring 1998 TAAS results.
Table 4.2 Statistical Significance of Differences in 1998
Grade 10 Pass Rates
| | TAAS Reading | | TAAS Math | | TAAS Writing | |
| | No. Tested | % Pass | No. Tested | % Pass | No. Tested | % Pass |
| Black | 26790 | 81% | 27434 | 61% | 26717 | 84% |
| Hispanic | 70666 | 79% | 71747 | 67% | 70481 | 82% |
| White | 108887 | 95% | 109595 | 88% | 108935 | 96% |
Source: TAAS Summary Report - Test Performance, All Students
Not In Special Ed., Grade 10 - Exit Level, Report Date April
98, Date of Testing: March 1998
(www.tea.state.tx.us/student.assessment/results/summary/sum98/gxen98.htm)
| | TAAS Reading | TAAS Math | TAAS Writing |
| White-Black Differences | | | |
| SE of difference | 0.0025 | 0.0031 | 0.0023 |
| Obs'd Difference | 14% | 27% | 12% |
| Obs'd Diff/SE | 56.312 | 86.982 | 51.721 |
| White-Hispanic Differences | | | |
| SE of difference | 0.0017 | 0.0020 | 0.0016 |
| Obs'd Difference | 16% | 21% | 14% |
| Obs'd Diff/SE | 95.894 | 104.41 | 89.503 |
As can be seen from Table 4.2, the differences in pass rates
for both White-Black and White-Hispanic comparisons are
easily statistically significant, with observed differences
equivalent to some fifty to over 100 standard errors.
(Other statistical tests on TAAS results also yield results
of this magnitude; see Haney, 1998; 1999).
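For readers who wish to check the Table 4.2 figures, the calculation can be sketched as follows. This is a minimal illustration, assuming the rounded pass rates and counts printed in Table 4.2; the names `RESULTS`, `se_of_difference`, and `difference_in_se_units` are hypothetical and introduced only for this example.

```python
from math import sqrt

# (number tested, proportion passing) for the 1998 grade 10 tests, from Table 4.2.
RESULTS = {
    "Reading": {"White": (108887, 0.95), "Black": (26790, 0.81), "Hispanic": (70666, 0.79)},
    "Math":    {"White": (109595, 0.88), "Black": (27434, 0.61), "Hispanic": (71747, 0.67)},
    "Writing": {"White": (108935, 0.96), "Black": (26717, 0.84), "Hispanic": (70481, 0.82)},
}


def se_of_difference(p1, n1, p2, n2):
    """Standard error of the difference between two independent proportions."""
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


def difference_in_se_units(test, group, reference="White"):
    """Observed difference in pass rates divided by its standard error."""
    n_ref, p_ref = RESULTS[test][reference]
    n_grp, p_grp = RESULTS[test][group]
    return (p_ref - p_grp) / se_of_difference(p_ref, n_ref, p_grp, n_grp)


if __name__ == "__main__":
    for test in RESULTS:
        for group in ("Black", "Hispanic"):
            print(test, group, round(difference_in_se_units(test, group), 3))
```

Run on the Table 4.2 inputs, the sketch gives ratios that agree with the tabled values to within rounding (for example, roughly 56.3 for the White-Black reading comparison).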
Practical Significance of Observed Differences.
What of the practical significance of the observed
differences in the 1998 grade 10 TAAS pass rates? Later in
this report, I discuss the apparent consequences of the TAAS
for grade retention and dropping out of school, but for the
moment let us simply examine the numbers of students
involved in the differential pass rates.
On the TAAS writing test in 1998, 96% of White students
passed, compared with 84% of Black students and 82% of
Hispanic students. While these pass rates do not fall below
the 80% threshold (96%*0.80 = 76.8%), let us consider the
numbers of students involved.
Specifically we may consider the numbers of Black and
Hispanic students who would have passed the 1998 grade 10
writing test had the passing rates for Black and Hispanic
students been the same as that for White students. These
numbers are approximately 3,200 Black students and 9,900
Hispanic students, for a total of about 13,000 (comparable
calculations show that on the TAAS math for 1998, about
22,000 more Black and Hispanic students would have passed
had their pass rates been the same as for White students).
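The arithmetic behind these counts can be sketched as below. This is an illustration under stated assumptions: it uses the rounded percentages and numbers tested from Table 4.2, so the results are approximate, and the names `WRITING`, `MATH`, and `additional_passers` are hypothetical.

```python
# (number tested, percent passing) for the 1998 grade 10 tests, from Table 4.2.
WRITING = {"White": (108935, 96), "Black": (26717, 84), "Hispanic": (70481, 82)}
MATH = {"White": (109595, 88), "Black": (27434, 61), "Hispanic": (71747, 67)}


def additional_passers(results, group, reference="White"):
    """Students in `group` who would have passed at the reference group's rate,
    beyond those who actually passed."""
    n_tested, pct_pass = results[group]
    _, reference_pct = results[reference]
    return n_tested * (reference_pct - pct_pass) / 100


if __name__ == "__main__":
    for label, results in (("Writing", WRITING), ("Math", MATH)):
        black = additional_passers(results, "Black")
        hispanic = additional_passers(results, "Hispanic")
        print(label, round(black), round(hispanic), round(black + hispanic))
```

On these inputs the writing test yields about 3,200 Black and 9,900 Hispanic students (roughly 13,000 in all), and the math test yields roughly 22,000, in line with the figures cited above.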
Do the differential results on the 1998 grade 10 TAAS
writing test, on which approximately 13,000 more Black and
Hispanic students failed than would have been the case had
the Black and Hispanic pass rates been the same as that of
White students, constitute practical adverse impact? Do the
differential results on all of the 1998 grade 10 TAAS tests,
on which close to 34,000 more Black and Hispanic students
failed (10,700 Black and 23,200 Hispanic students) than
would have been the case had the Black and Hispanic pass
rates been the same as that for White students constitute
practical adverse impact? The answer, especially when
results are also suspect under both the 80% rule and tests
of statistical significance, seems clear, at least to me. A
test on which tens of thousands more minority students fail
than would have failed had their pass rates equaled those of
non-minority students surely has practical adverse impact. Hence,
the validity and educational necessity of such a test
deserve close scrutiny.
Before turning to those issues, however, I should mention
that in his opinion in the TAAS case on January 7, 2000,
Judge Prado ruled that "Plaintiffs have made a prima
facie showing of significant adverse impact" (p. 23,
though it should be added that the opinion has a discussion
of disparate impact in two places, pp. 15-17 and 20-23).
4.2 TAAS Use in Isolation Violates Professional
Standards
The use of TAAS scores in isolation to
control award of high school diplomas (or for that matter
use of any test results alone to make high stakes decisions
about individuals or institutions) is contrary both to
professional standards regarding testing and to sound
professional practice.
The standards to which I refer are the Standards for
Educational and Psychological Testing published by the
American Educational Research Association (AERA), the
American Psychological Association (APA) and the National
Council on Measurement in Education (NCME). These standards
have been in existence for nearly 50 years (in current and
previous editions; AERA, APA & NCME, 1985; 1999), and have
been relied upon in numerous legal proceedings concerning
testing in state and federal courts. (Note 6) One specific
provision of these standards reads as follows:
Standard 13.7 In educational settings, a decision or
characterization that will have a major impact on a student
should not be made on the basis of a single test score.
Other relevant information should be taken into account if
it will enhance the overall validity of the decision.
. . . It is important that in addition to test scores,
other relevant information (e.g., school record, classroom
observation, parent report) is taken into account by the
professionals responsible for making the decision.
(AERA, APA & NCME, 1999, pp. 146-47) (Note 7)
It seems clear that
the practice in Texas of controlling award of high school
diplomas on the basis of TAAS test scores in isolation
without weighing other relevant information such as
students' grades in high school (HSGPA) is contrary to this
provision of the 1999 Standards for Educational and
Psychological Testing (and the corresponding provision
of the 1985 Standards).
Witnesses for the state of Texas during the TAAS trial
(Susan Phillips and William Mehrens) disputed my
interpretation of this standard. Here is how Judge Prado
summarized and resolved the dispute in his decision:
There was little dispute at trial over whether this standard
exists and applies to the TAAS exit-level examination. What
was disputed was whether the TAAS test is actually the sole
criterion for graduation. As the TEA points out, in addition
to passing the TAAS test, Texas students must also pass each
required course by 70 percent. See Texas Admin. Code
§ 74.26(c). Graduation in Texas, in fact, hinges on
three separate and independent criteria: the two
objective criteria of attendance and success on the TAAS
examination, and the arguably objective/subjective criterion
of course success. However, as the Plaintiffs note, these
factors are not weighed with and against each other; rather,
failure to meet any single criterion results in failure to
graduate. Thus, the failure to pass the exit-level exam does
serve as a bar to graduation, and the exam is properly
called a "high-stakes" test.
On the other hand, students are given at least eight
opportunities to pass the examination prior to their
scheduled graduation date. In this regard, a single TAAS
score does not serve as the sole criterion for
graduation. The TEA presented persuasive evidence that the
number of testing opportunities severely limits the
possibility of "false negative" results and
actually increases the possibility of "false
positives," a fact that arguably advantages all
students whose scores hover near the borderline between
passing and failing. (Prado 2000, pp. 14-15)
Nonetheless, I believe that my interpretation of this
standard is more in keeping with the preponderance of
professional opinion than are the narrow interpretations
offered by the witnesses for the state of Texas. This may
be illustrated by reference to the 1999 report from the
Board on Testing and Assessment of the Commission on
Behavioral and Social Sciences of the National Research Council.
As a result of increasing controversy over high stakes
testing, the U.S. Congress passed legislation in 1997
requesting that the National Academy of Sciences undertake a
study and make recommendations regarding the appropriate use
of tests for student grade promotion, tracking and
graduation (Heubert & Hauser, 1999, p. 1). The resulting
report High Stakes: Testing for Tracking, Promotion,
and Graduation specifically cites Standard 8.12 of
the 1985 joint standards and clearly points out that a
compensatory or sliding scale approach to using test scores
in combination with grades would be "more compatible
with current professional standards" than using an
absolute cut-off score on a test to control high school
graduation (Heubert & Hauser, 1999, pp. 165-66). More
generally, this National Research Council report
recommends:
High stakes decisions such as tracking, promotion, and
graduation should not automatically be made on the basis of
a single test score but should be buttressed by other
relevant information about students' knowledge and skills
such as grades, teacher recommendations and extenuating
circumstances. (Heubert & Hauser, 1999, p. 279) (Note
8)
Ironically enough, reliance on TAAS scores in isolation to
control award of high school diplomas in Texas is even
contrary to the following passage from the TEA's own
Texas Student Assessment Program Technical Digest:
All test result uses regarding individual students or groups
should incorporate as much data as possible. . . . Student
test scores should also be used in conjunction with other
performance indicators to assist in making placement
decisions, such as whether a student should take a reading
improvement course, be placed in a gifted and talented
program or exit a bilingual program. (pp. 2-3)
In sum, the state of Texas's use of TAAS scores in
isolation, without regard to students' high school grades,
to control award of high school diplomas, is contrary not
only to both professional standards regarding test use and
the advice of the recent NRC report, but also to the TEA's
own advice on the need to use test results in conjunction
with other performance indicators.
4.3 Passing scores on TAAS Arbitrary and
Discriminatory
The problem of using TAAS scores in isolation to control
award of high school diplomas is exacerbated by the fact
that the passing scores set for TAAS are arbitrary and
discriminatory. This is important because when a pass or
cut score is set on a test, the validity of the test depends
not just on test content, administration and scoring, but
also on the manner in which the passing score is set.
The 1999 Standards for Educational and Psychological
Testing state:
Standard 4.19 When proposed score interpretations involve
one or more cut scores, the rationale and procedures used
for establishing cut scores should be clearly documented.
(AERA, APA & NCME, 1999, p. 59)
Also, Standard 2.14 says that "Where cut scores are
specified for selection or classification, the standard
errors of measurement should be reported in the vicinity of
each cut score" (AERA, APA & NCME, 1999, p. 35). (Note
9)
Considerable technical and professional
literature has been published on alternative methods for
setting passing scores on tests.
Glass (1978) wrote an early critique of methods of
setting passing scores that questioned the very advisability
of even attempting to make this use of tests.
In 1986,
Ronald Berk published "A consumer's guide to setting
performance standards on criterion-referenced tests"
(Review of Educational Research, 56:1, 137-172) in
which he reviewed 38 different methods for setting standards
(or pass or cut-scores) on standardized tests. (Note 10)
I sought to learn exactly how the passing scores were
set on the TAAS in 1990 and to obtain copies of any data
that were used in the process of setting passing scores on
the TAAS exit test. The most complete account of the
process by which the passing scores were set
is provided in Appendix 9 of the Texas
Student Assessment Program Technical Digest for the Academic
Year 1996-1997, (TEA, 1997, pp. 337-354).
Specifically contained in this appendix are 1) a memo dated
July 14, 1990, from Texas Education Commissioner Kirby to
members of the state Board of Education (including a summary
of results from a field test of the TAAS) and 2) Minutes of
the State Board of Education meeting in July 1990 at which
the passing scores on the grade 10 TAAS were
established.
In his memo, Commissioner Kirby recommended a passing score
of 70% correct for the exit level of TAAS, but also
recommended that this standard be phased in over a period of
three years, with the passing score of 60% proposed for
Fall 1990. After considerable discussion, the State
Board voted unanimously to adopt the recommendations of the
commissioner regarding the Texas Assessment of Academic
Skills, specifically that: "For the Academic Skills
Level, a minimum standard of 70% of the test items must be
answered correctly."
Following a statement by a Dr. Crawford about the importance
of giving "notice regarding the standard required for
graduation from high school . . . to those students who will
be taking the exit level test" (p. 6/353), the Board
also voted 11 to 3 in favor of an amendment to the original
proposal to "give notice that the 1991-92 standard will
be 70" (p. 7/354).
What struck me about this record of how the passing score on
the TAAS exit test was set was the following:
- The process was not based on any of the professionally
recognized methods for setting passing standards on
tests;
- It appears to have failed completely to take the
standard error of measurement into account; and,
- As I explain below, the process yielded a passing score
that effectively maximized the adverse impact of the TAAS
exit test on Black and Hispanic students.
Before I elaborate on the latter point, let me emphasize
that from the available record I have done my utmost to
understand the rationale that motivated the Board to set the
passing score where it did, namely at 70% correct. As best
I can tell from the record, the main reasons for setting the
passing score at 70% correct appear to have been that this
is where the passing score had been set on TEAMS and this
level was suggested by the Texas Education Code. The
minutes of the Board meeting report that "the
Commissioner cited the portion of the Texas Education code
that requires 70 percent as passing (Attachment A),
explaining that there is a rationale for aiming at 70
percent of test items as the mastery standard" (p.
1/348).
In my view this is simply not a reasonable or professionally
sound basis for setting a passing standard on an important
test such as the TAAS exit test. Indeed from the available
record it is not even clear that the Texas code cited by the
Commissioner was actually referring to anything more
than the passing standard for course grades. Moreover, the
minutes to the July 12, 1990, meeting also report the
following remarks by Dr. Crawford: "Testing is driving
a curricular program, which means that the curriculum is not
at the place where you want it to be when you start
out." She commented that "70 only has whatever
value that is given to it, and in testing 70 is not the
automatic passing standard on every test" (p.
4/351).
In sum, the process used in setting the passing scores on
the TAAS exit test in 1990 did not adhere to prevailing
professional standards regarding the setting of passing
scores on standardized tests. For example, from the record
available, it is clear that the process used to set the
passing score on the TAAS exit test in 1990 failed to meet
all six criteria of "technical adequacy" described
in Berk's (1986) review of criteria for setting performance
standards on criterion-referenced tests, a review
published in a prominent education research journal, and
of which TEA officials surely should have been aware
in 1990.
TAAS cut score study.
To understand more fully the process by which the
TAAS passing scores were set in 1990, I requested a copy of
the TAAS field test data that were presented to the Board of
Education in the meeting at which it set the passing score
on the TAAS-X. Using these data, I undertook a study (with
the assistance of Boston College doctoral student Cathy
Horn) which came to be called our "TAAS cut score
study." In this study, we asked individuals reviewing
the data available to the Texas Board of Education in July
1990 to select the passing scores (or cut scores) students
would need to attain in order to pass the TAAS reading
and math tests. For both the reading and math tests, each
research subject was presented with a graph showing the
percentage of students, separately for White, Hispanic and
Black ethnic groups, passing each number of percent correct
answers on the field or pilot test of the TAAS exit test in
1990. These graphs are shown in Figures 4.1 and 4.2
below.
Each person in the cut score study was then presented with
the following instructions:
The following graph presents the percentage of students
passing the reading / math section of the Texas Assessment
of Academic Skills (TAAS) at each number of questions
answered correctly. Choose the number of questions correct
that most clearly differentiates White students (represented
by a black line) from Black and Hispanic students.
Respondents could then ask clarifying questions before
selecting a response. After a pilot test of the cut score
study in 1998, Ms. Horn (a native of Texas and secondary
school teacher there before she came to Boston College for
graduate studies) extended the cut score study to nine
Texans. The exercise was administered, by phone or in
person, to 9 individuals residing in the state of Texas.
(Those individuals who were interviewed by phone had paper
copies of the Figure 4.1 and 4.2 graphs and the prompt for
the exercise in front of them when they selected cut
points.) The professions of the nine respondents are listed
below.
Respondents (all currently living in Texas):
2 teachers
3 engineers
2 college students
1 financial analyst
1 director of communications
The cut or passing scores selected by these nine individuals
as most clearly differentiating between White students and
Black/Hispanic students are shown in Table 4.3 below.
Table 4.3 Results of Cut Score Study with Nine Texans
| | Reading | Math |
| Person 1 | 34 | 34 |
| Person 2 | 35 | 37 |
| Person 3 | 35 | 38 |
| Person 4 | 34 | 37 |
| Person 5 | 36 | 40 |
| Person 6 | 33 | 40 |
| Person 7 | 34 | 37 |
| Person 8 | 36 | 43 |
| Person 9 | 44 | 44 |
| Summary | | |
| Minimum | 33 | 34 |
| Maximum | 44 | 44 |
| Mean | 35.7 | 38.9 |
| Median | 35 | 38 |
As shown, respondents selected passing scores ranging from
33 to 44 on the reading test and from 34 to 44 for the math
test. The median value across all nine respondents was 35
for the reading test and 38 for the math test.
The passing scores of 70% correct for the TAAS exit test
recommended by Commissioner Kirby and accepted by the Board
of Education in July 1990 were 34 for the reading test and
42 for the math test. The results of our cut score study
show that if the intent in setting passing scores based on
the TAAS field test results in July 1990 had been
discriminatory, i.e., to set the passing scores so that
they would most clearly differentiate between White students
and Black/Hispanic students, then the passing scores would
have been set just about where the Board of Education did in
fact set them.
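The summary rows of Table 4.3, and the comparison with the cut scores the Board adopted, can be reproduced with a short sketch. It is illustrative only: the responses are copied from Table 4.3, the adopted scores of 34 and 42 are those stated above, and the names `READING_CUTS`, `MATH_CUTS`, `ADOPTED`, `summarize`, and `within_respondent_range` are hypothetical.

```python
from statistics import mean, median

# Cut scores chosen by the nine respondents (Table 4.3).
READING_CUTS = [34, 35, 35, 34, 36, 33, 34, 36, 44]
MATH_CUTS = [34, 37, 38, 37, 40, 40, 37, 43, 44]

# Passing scores (70% correct) adopted by the Board in July 1990.
ADOPTED = {"Reading": 34, "Math": 42}


def summarize(cuts):
    """Minimum, maximum, mean (one decimal place), and median of the selected cut scores."""
    return {
        "min": min(cuts),
        "max": max(cuts),
        "mean": round(mean(cuts), 1),
        "median": median(cuts),
    }


def within_respondent_range(cuts, adopted):
    """True when the adopted score lies between the lowest and highest selected scores."""
    return min(cuts) <= adopted <= max(cuts)


if __name__ == "__main__":
    for test, cuts in (("Reading", READING_CUTS), ("Math", MATH_CUTS)):
        print(test, summarize(cuts), within_respondent_range(cuts, ADOPTED[test]))
```

The sketch only restates the table; it says nothing about intent, which the following paragraph addresses.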
At the same time, there is no evidence of which I know, in
the record of the process of setting passing scores on the
TAAS in 1990, that the explicit intent of either
Commissioner Kirby or the Board was discriminatory.
However, the available record shows no indication that
Commissioner Kirby, the TEA or the Board relied on any
professionally recognized method for setting passing scores
on the test, and the passing scores set were indeed
consistent with those that would have been set, based on the
TAAS field test results, if the intent had been
discriminatory.
Use of measurement error in setting passing
scores.
The reason the setting of passing scores on a high stakes
test such as the TAAS is so important is that the passing
score divides a continuum of scores into just two
categories, pass and fail. Doing so is hazardous because
all standardized test scores contain some degree of
measurement error. Hence, the 1985 Standards for
Educational and Psychological Testing and other
professional literature clearly indicate the
importance of considering measurement error and consequent
classification errors in the process of setting passing
scores on tests.
Before discussing this topic further, two introductory
explanations may be helpful. First, from the available
record of the July 1990 meeting of the Board of Education,
there is no indication that consideration of measurement
error entered into the Board's deliberations. Second, the
issue of measurement and classification errors regarding
TAAS was addressed, as far as I know, at least in the 1993-94
and 1996-97 editions of Texas Student Assessment Program
Technical Digest. Unfortunately there are two serious
errors in the manner in which these issues are addressed.
Before explaining the nature of these errors,
let me first summarize what the 1996-97 edition of Texas
Student Assessment Program Technical Digest says about
test reliability, standard error of measurement and
classification errors.
Chapter 8 of the 1996-97 Technical Digest, entitled
"reliability," provides a brief discussion
of internal consistency estimates and formulas for
calculating internal consistency reliability estimates
(p. 41). This is followed (p. 42) by a discussion of (and
formulas for) calculating standard errors of measurement
from reliability estimates. These discussions provide
references to appendix 7 which shows data to indicate that
for the Spring 1997 administration of TAAS at grade 10
(administered to 214,000 students) the internal consistency
estimates for the TAAS math, reading and writing sub-tests
were 0.934, 0.878 and 0.838, respectively; and the
corresponding standard errors of measurement were 2.876,
2.352 and 2.195.
This represents the first serious error in the technical
report's handling of measurement and classification error.
Specifically, while the technical report bases the
calculation of standard error of measurement on internal
consistency reliability estimates, it clearly should have
been based on test-retest or alternate-forms reliability
estimates. Test-retest reliability refers to the
consistency of scores on two administrations of a test.
Alternate-forms reliability refers to the consistency of
scores on two different forms or versions of the same test.
Since the purpose of TAAS testing is not simply to estimate
students' performance on one version of the TAAS
test, but to estimate their competence in reading, math and
writing, in general, as might be measured by any
version of the relevant TAAS tests, alternate-forms
reliability is more appropriate for assessing reliability
than is internal consistency reliability. As Thorndike and
Hagen (1977, p. 79) point out in their textbook on
measurement and evaluation, "evidence based on
equivalent test forms should usually be given the most
weight in evaluating the reliability of a test."
In general, alternate forms test reliability tends to be
lower than internal consistency reliability. Hence, it seems
clear to me that the figures reported in the 1996-97
Technical Digest overestimate the relevant
reliability of grade 10 TAAS test scores and underestimate
the standard error of measurement associated with TAAS
scores.
I have attempted to estimate the alternate-forms
reliability of TAAS test scores using two independent
sources of data. First I employed the cross-tabulations reported by
Linton & Debeic (1992) of test-retest data on students in
several large Texas districts who took the TAAS exit level
test in October 1990 and again in April 1991. Using the
Linton & Debeic cross tabular results, I calculated the
following test-retest correlations: TAAS-Reading 0.536;
TAAS-Math 0.643; and TAAS-Writing 0.555.
Second, as part of the background work for the TAAS case,
Mark Fassold developed a remarkable longitudinal database of
all 1995 sophomore students in Texas and their TAAS scores
on up to ten different administrations of TAAS:
1 March 1995
2 May 1995
3 July 1995
4 October 1995
5 March 1996
6 May 1996
7 July 1996
8 October 1996
9 February 1997
10 April 1997
At my request Mr. Fassold ran an analysis of all test-retest
correlations on this cohort of students (total N of about
230,000). Correlations were calculated separately by ethnic
group and for TAAS Reading and Math tests. Given 16
different test-retest possibilities this yielded 214
different coefficients (2 x 16 x 6 ethnic groups).
Results varied widely (in part because in some comparisons
sample sizes were very small). Overall, however, the
observed test-retest correlations tended to cluster in the
0.30 to 0.50 range.
These test-retest correlations based on both the Linton-Debeic
and Fassold data are, however, attenuated in that in
both data sets only students who failed a TAAS test took it
again. There are methods for correcting observed test-retest
correlations for such attenuation (see Haney, Fowler
and Wheelock, 1999, for an example), but as a more
conservative approach here, let me simply discuss what
previously published literature suggests about the
relationships between test-retest and internal consistency
reliability.
As mentioned above, the 1996-97 Technical Digest
cites internal consistency reliability estimates for the
three grade 10 TAAS sub-tests of 0.934, 0.878 and 0.838, and
standard errors of measurement of 2.876, 2.352 and 2.195.
It is common for tests which show internal consistency
reliability of about 0.90 to show alternate forms
reliability of 0.85 or 0.80 (see for example, Thorndike &
Hagen, 1977, p. 92). On page 42 of the 1996-97 Technical
Digest, an example is shown in which a test with an
internal consistency reliability of 0.90 (and a standard
deviation of 6.3) is estimated to have a standard error of
measurement of 2.0. However, if instead of an internal
consistency reliability of 0.90, we were to use in these
calculations an alternate forms reliability of 0.85 or 0.80,
the resulting standard errors of measurement would be 2.44
and 2.82. This suggests that the appropriate standard
errors of measurement for the TAAS tests may be on the order
of 20 to 40% greater than the estimates reported in the TAAS
1996-97 Technical Digest.
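The arithmetic behind these figures can be checked with a short sketch. It assumes the Technical Digest example uses the standard classical test theory formula, SEM = SD x sqrt(1 - reliability), which reproduces the 2.0 figure cited above; the function name is my own and purely illustrative.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical test theory formula: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


SD = 6.3  # standard deviation used in the Technical Digest example

# SEM under the internal consistency estimate (0.90) and under the two
# alternate forms reliabilities considered in the text (0.85 and 0.80).
sems = {r: standard_error_of_measurement(SD, r) for r in (0.90, 0.85, 0.80)}

for r, sem in sems.items():
    increase = sem / sems[0.90] - 1.0  # relative to the 0.90 baseline
    print(f"reliability {r:.2f}: SEM = {sem:.2f} ({increase:+.0%} vs. 0.90)")
```

Under these assumptions the relative increases come out at roughly 22% and 41%, consistent with the "20 to 40%" range stated above.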
The second serious error in the technical report's handling
of measurement and classification error occurs on pages 30 and
31 in a section labeled "Exit level testing standards
and the standard error of measurement." Here the
authors of the 1996-97 Technical Digest point out
that a student with a "true achievement level at the
passing standard would be likely to pass on the first
attempt only 50% of the time" (p. 31). This passage
then goes on to assert that "if such a student has
attempted that test eight times, the student's passing is
almost assured (probability of passing is 99.6%)" (p.
31). In other words, the chances of a minimally qualified
student failing the TAAS eight times and being misclassified
as not having the requisite skills is only 0.4% (0.50 to
the 8th power is 0.0039).
This calculation strikes me as erroneous, or at
least potentially badly misleading, because the authors
have presented absolutely no evidence to show the
probability that a student who fails the TAAS will continue
to take the test seven more times. As I explain later,
available evidence suggests that students who fail the TAAS
grade 10 test more than once or twice are likely to be held
back in grade and to drop out of school long before they
reach grade 12 by which time they would have had a chance to
take the TAAS exit test eight times. Since 0.50 to the
second power is 0.25; and to the third power is 0.125, this
indicates that a student with a "true achievement level
at the passing standard" who takes the TAAS twice or
three times, before becoming discouraged and not taking the
test again, has a 25% or 12.5% chance of being misclassified
as failing.
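These probabilities follow from treating each attempt as an independent trial with a 50% chance of passing, which is the assumption implicit in the Technical Digest's own 99.6% figure. A minimal sketch (the function name is illustrative, not from any source):

```python
def prob_fail_all(attempts: int, p_pass: float = 0.5) -> float:
    """Probability of failing every one of `attempts` independent tries.

    Assumes each attempt is independent with the same pass probability,
    as the Technical Digest calculation implicitly does.
    """
    return (1.0 - p_pass) ** attempts


for k in (1, 2, 3, 8):
    print(f"{k} attempt(s): probability of never passing = {prob_fail_all(k):.4f}")
```

The point of the comparison is that the misclassification risk depends heavily on how many attempts a student actually makes, not on how many are nominally available.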
Before proceeding to present evidence bearing on this
point, let me discuss how the standard error of measurement
might usefully have been taken into account in adjusting
passing scores. Because of the error of measurement in test
scores, when scores are used to make pass-fail decisions
about students, two kinds of classification errors can
occur. A truly unqualified student can pass the test (a
false pass) or a truly qualified student can fail the test
(a false failure). How one thinks about the balance of
these two misclassification errors depends on the risks (or
benefits and costs) associated with each type of
misclassification. If one were confident that a student
failing TAAS would receive special attention and support
educationally, one might be inclined to weigh false passes
as more serious than false failures. If on the other hand,
one thought that students failing TAAS were unlikely to
receive effective instruction, and instead merely to be
retained in grade 10 and to be stigmatized as failures, then
one would probably feel that false failures would be more
harmful than false passes.
Here is how Berk (1986) discussed this point:
Assessing the relative seriousness of these consequences, is
a judgmental process. It is possible to assign plusses
(benefits) and minuses (costs or losses) to the consequences
so that the cutoff scores can be set in favor of a specific
error reduction rate. A loss ratio (benefits: losses) can
be specified for each decision application with the cutoff
score adjusted accordingly. (Berk, 1986, p. 139).
To study the relative risks associated with the two
kinds of classification errors associated with a high school
graduation test, with the assistance of Kelly Shasby (a
doctoral student in the Educational Research, Measurement and
Evaluation program at Boston College), I undertook what came
to be known as our "risk analysis" study.
The survey form used in the risk analysis study was entitled
"Survey of risk associated with classification
decisions" and opened with the following
introduction:
When classifying large numbers of individuals using
standardized exams, two different kinds of mistakes are
made. Some people will be falsely classified as
"qualified" or "passing" while others
will be falsely classified as "unqualified" or
"failing." There is a degree of risk associated
with mistakes of this kind, both for the individual who is
incorrectly classified and for the society in which that
individual lives. We would like your help in assessing the
severity of the risk, or possible harm, caused to
individuals and to society when mistakes are made on a
number of different types of standardized tests.
The purpose of this survey is to assess the public's
perception of misclassifications of individuals. These
misclassifications can have an impact on the individual and
on the society in which that individual lives. This impact
has the potential to be harmful, and we are interested in
determining how harmful the public thinks different
misclassifications can be.
On a scale from 1 to 10, 1 being "minimum harm"
and 10 being "maximum harm," rate each scenario
with respect to the degree of harm it would cause that
individual and then the degree of harm it would cause
society. Then circle the number, which corresponds,
to the rating you chose.
After this introduction, respondents were asked to rate the
risk, on a 1 to 10 scale of harm, associated with 16
different misclassifications that might result from
classifying people pass-fail based on standardized test
results. Respondents were asked to rate separately the harm
to individuals and to society. To give credit where it
is due, this distinction, a clear improvement over the
initial version of our survey, was suggested by Ms. Shasby.
Specifically, survey respondents were asked to rate the
degree of harm, separately for individuals and society,
associated with the following kinds of misclassification:
- A kindergartner who is ready to enter school is denied
entrance.
- A kindergartner who is not ready to enter school is
granted entrance.
- An airline pilot who is not qualified is given a license
to fly.
- An airline pilot who is qualified is denied a license to
fly.
- A qualified high school student is denied a diploma.
- An unqualified high school student is granted a
diploma.
- A qualified accountant is denied certification.
- An unqualified accountant is granted certification.
- A qualified student is denied promotion from grade eight
to grade nine.
- An unqualified student is granted promotion from grade
eight to grade nine.
- A qualified doctor is denied a license to practice.
- An unqualified doctor is granted a license to
practice.
- A qualified candidate is denied admission into
college.
- An unqualified candidate is granted admission into
college.
- A qualified teacher is denied certification.
- An unqualified teacher is granted certification.
The risk survey form was sent to a random sample of 500
secondary teachers in Texas (specifically only math and
English/Language Arts teachers) on May 23, 1999. As of
June 30, 1999, we had received 66 responses (representing a
response rate of 13.2%). (Note 11)
Table 4.4 below summarizes the results of the risk analysis
survey.
Table 4.4 Results of Risk Analysis Survey with Secondary Teachers in Texas

| Misclassification scenario | For individual: Mean | For individual: SD | For society: Mean | For society: SD |
|---|---|---|---|---|
| 1. A kindergartner who is ready to enter school is denied entrance. | 6.45 | 2.67 | 3.94 | 2.64 |
| 2. A kindergartner who is not ready to enter school is granted entrance. | 7.20 | 2.23 | 5.06 | 2.71 |
| 3. An airline pilot who is not qualified is given a license to fly. | 8.36 | 2.32 | 9.55 | 1.00 |
| 4. An airline pilot who is qualified is denied a license to fly. | 7.74 | 2.37 | 4.39 | 2.99 |
| 5. A qualified high school student is denied a diploma. | 9.11 | 1.69 | 6.39 | 2.58 |
| 6. An unqualified high school student is granted a diploma. | 6.85 | 2.72 | 7.74 | 2.26 |
| 7. A qualified accountant is denied certification. | 8.65 | 1.50 | 5.32 | 2.62 |
| 8. An unqualified accountant is granted certification. | 8.65 | 1.50 | 5.32 | 2.62 |
| 9. A qualified student is denied promotion from grade eight to grade nine. | 8.89 | 1.52 | 6.15 | 2.39 |
| 10. An unqualified student is granted promotion from grade eight to grade nine. | 8.15 | 2.01 | 7.80 | 2.12 |
| 11. A qualified doctor is denied a license to practice. | 8.80 | 1.68 | 7.32 | 2.64 |
| 12. An unqualified doctor is granted a license to practice. | 7.15 | 2.87 | 9.37 | 1.72 |
| 13. A qualified candidate is denied admission into college. | 8.83 | 1.73 | 6.30 | 2.43 |
| 14. An unqualified candidate is granted admission into college. | 6.08 | 2.66 | 6.08 | 2.66 |
| 15. A qualified teacher is denied certification. | 8.64 | 1.76 | 8.38 | 2.13 |
| 16. An unqualified teacher is granted certification. | 6.62 | 2.84 | 9.15 | 1.60 |
As this table shows, the risk associated with denying a high
school diploma to a qualified student is for individuals the
most severe risk associated with any of the
misclassification scenarios we asked respondents to rate.
The only scenarios showing higher average risks are the
risks for society associated with licensing an unqualified
pilot (mean = 9.55), licensing an unqualified doctor (9.37)
and licensing an unqualified teacher (9.15).
Particularly germane to our discussion of the setting of
passing scores on the TAAS graduation test are the relative
risks associated with denying a diploma to a qualified high
school student (mean = 9.11) and granting a diploma to an
unqualified student (6.85). These results indicate that
the risk of denying a diploma to a qualified student is much
more severe than granting a diploma to an unqualified
student (the difference, by the way, is statistically
significant).
These results indicate that even if a rational passing score had
been established on the TAAS exit test, the passing or
cutoff scores should have been adjusted downward in order to
minimize overall risk. A common practice in setting
passing scores on important tests is to reduce an
empirically established passing score by one or two standard
errors of measurement. While I want to stress that
the passing scores of
70% correct on the TAAS are arbitrary, unjustified and
discriminatory, we can see from Figures 4.1 and 4.2 what the
consequences would be for Black and Hispanic pass rates (on
the TAAS field test) if the passing scores of 70% had been
corrected for error of measurement. Recall that the passing
scores set by the Board on the field test administration of
the TAAS were 34 items correct on the reading test and 42 on
the math test. Recall also that the standard errors for the
reading and math tests reported in the Technical
Digest were in the range of 2.5 to 3.0 raw score points.
Suppose that to take error of measurement into account, the
initially selected passing scores of 34 and 42 were lowered
5 points, to 29 and 37 on the reading and math tests,
respectively. What can be easily seen from Figures 4.1 and
4.2 is that these adjustments would have increased the
passing rates for Black and Hispanic students about 12% on
the math test and 20% on the reading test.
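The 5-point adjustment can be read as roughly two standard errors of measurement at the low end of the range cited above. The sketch below makes that reading explicit; the 2.5-point SEM and the two-SEM multiplier are my assumptions for illustration, and the helper name is hypothetical.

```python
def adjusted_cut(cut: int, sem: float, n_sems: float) -> int:
    """Lower a raw-score cut by n_sems standard errors, rounded to whole items."""
    return cut - round(n_sems * sem)


SEM_ASSUMED = 2.5  # assumption: low end of the 2.5 to 3.0 range cited above
N_SEMS = 2         # assumption: a two-SEM adjustment

# Passing scores set on the field test administration (items correct).
cuts = {"reading": 34, "math": 42}

adjusted = {test: adjusted_cut(c, SEM_ASSUMED, N_SEMS) for test, c in cuts.items()}
print(adjusted)
```

With a larger SEM or a different multiplier the adjusted cuts would differ, so the 29 and 37 used in the text should be treated as one plausible adjustment rather than the only defensible one.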
The foregoing results were presented in a written report
before the TAAS trial (Haney, 1999) and also discussed
during testimony at trial. Judge Prado (2000) apparently
did not find these points persuasive for he commented merely
that in setting the passing score on the TAAS tests,
"the State Board of Education looked at the passing
standard for the TEAMS test, which was also 70 percent, and
also considered input from educator committees" (p.
11). Regarding the disparate impact of the passing score,
he commented simply, "The TEA understood the
consequences of setting the cut score at 70 percent"
(p. 11).
4.4 Doubtful Validity of TAAS Scores
The Technical Digest on TAAS (TEA, 1997) contains
an extremely short section (pp. 45-47) discussing test
validity. Though this three-page passage mentions content,
construct and criterion-related validity, it maintains that
"the primary evidence for the validity of the TAAS and
end-of-course tests lies in the content validity of the
test" (TEA, 1997, p. 47). This discussion, it seems to
me, is woefully inadequate because test validation should
never rest primarily on test content. Test validation refers
to the interpretation and meaning of test scores and
these depend not just on test content, but also on a host of
other factors, such as the conditions under which tests are
administered, and how results are scored and interpreted
(e.g., in terms of a passing score, as discussed in the
previous section).
Nonetheless, the TEA has previously undertaken a number of
studies examining the relationship between TAAS scores and
course grades. In one study, for example, it was reported
that in one large urban district, 50% of the students who
had received a grade of B in their math courses failed the
TAAS math test (TEA, 1996 Comprehensive Report on Texas
Public Schools, pp. 14-15). Another summary finding was
that when "TEA correlated exit level students' TAAS
mathematics scores with the same students' course grades for
several different mathematics courses in the 1992-93 school
year . . . the correlation between TAAS scale scores and
students' end-of-year grades was only moderately positive
(0.32). . . " (TEA, 1997, Technical Digest, p. 47).
Inasmuch as this correlation is remarkably low in light of
previous research that has generally shown test scores to
correlate with high school grades in the range of 0.45 to
0.60 (see Haney, 1993, p. 58), as part of work on the TAAS
case I sought to acquire the actual data set on which this
TEA finding was based.
The data set in question contains records for 3,281 students
in three districts that TEA documentation describes as
"large urban district," "mid-sized suburban
district," and "small rural district." The
TEA has previously reported analyses of these data in
"Section V: A study of correlation of course grades
with Exit Level TAAS Reading and Writing Tests"
pp. 189-197 in Student Performance Results 1994-95, Texas
Student Assessment Program, TAAS and End-of-Course
Examinations and Other Studies (Texas Education Agency,
Austin, Texas, ND, but presumably 1995).
After opening the file and verifying its structure, I sought
to confirm that the results reported by the TEA could be
replicated. This was impossible to do precisely because TEA
did not report results with great precision. Nonetheless,
initial results corresponded reasonably well with what TEA
reported. Also, it should be noted that while the data
file included records on a number of grade 11 students, I
restricted most analyses to grade 10 students pooled across
the three districts, though the bulk of this sample (>
2,400 cases out of 3,300) comes from the one large urban
district.
I then calculated basic descriptive statistics on variables
of interest, in particular scores for the TAAS reading and
writing test administered in March 1995 and grades for the
English II courses completed in May 1995 (these data were
provided by the districts to the Student Assessment Division
of TEA). Next I calculated relationships between
variables. Table 4.5 shows the intercorrelations among
the three TAAS test scores (writing, reading and math) and
English II course grades. Given the size of this sample
(>3,000) all of these correlation coefficients are
statistically significant at the 0.01 level.
Table 4.5 Correlations between TAAS Scores (Standard scores) and English II Grades

| | Write SS | Read SS | Math SS | Grade |
|---|---|---|---|---|
| Write SS | 1.00 | | | |
| Read SS | 0.50 | 1.00 | | |
| Math SS | 0.51 | 0.69 | 1.00 | |
| Grade | 0.32 | 0.34 | 0.37 | 1.00 |
Note the magnitudes of the correlations between English II
course grades and TAAS scores. They are all in the range of
0.32 to 0.37. As indicated above, previous studies have
generally shown test scores to correlate with high school
grades in the range of 0.45 to 0.60. Contrary to
expectations, English II grades correlate more highly with
TAAS math scores (0.37) than with writing (0.32) or reading
(0.34) scores. Note also the odd intercorrelations among
TAAS scores. The TAAS math scores correlate at the level of
0.69 with the TAAS reading scores, while the TAAS reading
scores correlate at the level of 0.50 with the TAAS writing
scores. This is contrary to the expectation that scores of
two verbal measures (of reading and writing) should
correlate more highly with one another than with a measure
of quantitative skills. These results cast doubt on the
validity and the reliability of TAAS scores.
People unfamiliar with social science research doubtless
find it hard to make sense of correlation coefficients in
the range of 0.32 to 0.37. Hence to provide a visual
representation, Figure 4.3 shows a scatterplot of the
relationship between TAAS reading scores and English II
grades. As can be seen from this figure, the relationship
between these two variables is quite weak. Students with
grades in the 70 to 100 range have TAAS reading scores from
well below 40 to well over 80. Conversely, students with
TAAS reading scores in the 80 to 100 range have English II
grades from well below 40 to well over 80.
Figure 4.3 Scatterplot of TAAS Reading Scores and
English II Grades
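Another way to gauge correlations of this size, offered here as a supplementary illustration rather than as part of the original analysis, is to square them. Under a linear model, the squared correlation gives the proportion of variance in one variable that is associated with the other.

```python
# Correlations between English II grades and TAAS scores, from Table 4.5.
correlations = {"writing": 0.32, "reading": 0.34, "math": 0.37}

# Squared correlation: share of variance in grades associated with each
# TAAS score, assuming a linear relationship.
shared_variance = {test: r ** 2 for test, r in correlations.items()}

for test, r2 in shared_variance.items():
    print(f"TAAS {test}: r = {correlations[test]:.2f}, r^2 = {r2:.3f}")
```

On this reading, each TAAS score is associated with only about 10 to 14 percent of the variance in English II grades.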
I next examined whether there were differences in the
relationships between TAAS scores and English II grades
across ethnic groups. Table 4.6 provides an example of the
relationship between passing and failing TAAS and passing or
failing in terms of English II course grades for Hispanics,
Blacks and Whites. As can be seen from this table, of those
students who passed their English II courses in the spring
of 1995, 27-29% of Black and Hispanic students failed the
TAAS reading test taken the same semester as their English
courses compared with 10% of White students. In other words,
of grade 10 students in these three districts who are
passing their English II courses, the rate of failure on the
TAAS reading test for Black and Hispanic students is close
to triple that of White students. A similar, but slightly
smaller, disparity is apparent on the TAAS writing sub-test.
Table 4.6 Rates of Passing and Failing TAAS and English II Course

TAAS-Exit Test Results: Reading

| English II Course | Black students: Failed | Black students: Passed | Hispanic students: Failed | Hispanic students: Passed | White students: Failed | White students: Passed |
|---|---|---|---|---|---|---|
| Failed N | 39 | 23 | 242 | 189 | 17 | 34 |
| (%) | 10.1% | 5.9% | 11.0% | 8.6% | 3.1% | 6.3% |
| Passed N | 111 | 214 | 596 | 1181 | 55 | 436 |
| (%) | 28.7% | 55.3% | 27.0% | 53.5% | 10.1% | 80.4% |

TAAS-Exit Test Results: Writing

| English II Course | Black students: Failed | Black students: Passed | Hispanic students: Failed | Hispanic students: Passed | White students: Failed | White students: Passed |
|---|---|---|---|---|---|---|
| Failed N | 33 | 29 | 173 | 258 | 20 | 31 |
| (%) | 8.5% | 7.5% | 7.8% | 11.7% | 3.7% | 5.7% |
| Passed N | 69 | 256 | 366 | 1411 | 50 | 441 |
| (%) | 17.8% | 66.1% | 16.6% | 63.9% | 9.2% | 81.4% |
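The percentages in Table 4.6 are shares of all students in each ethnic group. As a check, the sketch below recomputes them from the reading counts and also computes the failure rate conditional on passing the course, a figure the table does not report directly. The variable and function names are illustrative.

```python
# Counts from the Reading panel of Table 4.6, per group, in the order:
# (failed course & failed TAAS, failed course & passed TAAS,
#  passed course & failed TAAS, passed course & passed TAAS)
reading_counts = {
    "Black": (39, 23, 111, 214),
    "Hispanic": (242, 189, 596, 1181),
    "White": (17, 34, 55, 436),
}


def rates(counts):
    """Return (share of all students, share of course passers) who
    passed English II but failed the TAAS reading test."""
    ff, fp, pf, pp = counts
    total = ff + fp + pf + pp
    return pf / total, pf / (pf + pp)


results = {group: rates(c) for group, c in reading_counts.items()}

for group, (of_all, of_passers) in results.items():
    print(f"{group}: {of_all:.1%} of all students, "
          f"{of_passers:.1%} of those passing English II")
```

Either way of computing the rate leaves the Black and Hispanic figures at close to three times the White figure.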
Such a disparity can result from one of two causes. First, if
the TAAS reading test is in fact a valid and unbiased test
of reading skills, the fact that close to 30% of Black and
Hispanic students who are passing their sophomore
English courses failed the TAAS reading test, as
compared with only 10% of White students must indicate that
minority students in these three districts are simply not
receiving the same quality of education as their White
counterparts, especially when one realizes, as I will
show in Part 5 of this article, that by 1995 Black and Hispanic
students in Texas statewide were being retained in grade 9
at much higher rates than White students. The only other
explanation for the sharp disparity is that the TAAS tests
and the manner in which they are being used (with a passing
score of 70% correct) are simply less valid and fair
measures of what Black and Hispanic students have had an
opportunity to learn, as compared with White students.
These analyses were reported in the July 1999 report (Haney,
1999) and discussed in direct testimony and cross-examination
during the TAAS trial in September 1999. Here
is how Judge Prado interpreted these findings in his January
7 ruling:
The Plaintiffs provided evidence that, in many cases,
success or failure in relevant subject-matter classes does
not predict success or failure in that same area on the TAAS
test. See Supplemental Report of Dr. Walter Haney,
Plaintiff's expert, at 29-32. In other words, a student may
perform reasonably well in a ninth-grade English
class, for example, and still fail the English portion of
the exit-level TAAS exam. The evidence suggests that the
disparities are sharper for ethnic minorities. Id. at
33. However, the TEA has argued that a student's classroom
grade cannot be equated to TAAS performance, as grades can
measure a variety of factors, ranging from effort and
improvement to objective mastery. The TAAS test is a solely
objective measurement of mastery. The Court finds that,
based on the evidence presented at trial, the test
accomplishes what it sets out to accomplish, which is to
provide an objective assessment of whether students have
mastered a discrete set of skills and knowledge. (Prado,
2000, p. 24)
With due respect to Judge Prado, I believe there are
two flaws in this reasoning. First, Judge
Prado interprets the disparities in the rates at which,
among students who pass their English II courses, minorities
fail the "English portion" of TAAS far more
frequently than White students, as evidence of the need for
"objective assessment" of student skills. Though
he did not explicitly say so, his reasoning seems to be that
an objective test is necessary because the grades of
minority students are inflated. This interpretation,
however, takes one specific finding out of the context in
which I presented it, both in the Supplementary
report (Haney, 1999, pp. 29-33) and in testimony at
trial. In both cases, and as described above, it was shown
that even if one ignores the question of possibly inflated
grades, the intercorrelations among TAAS scores themselves
(i.e., that reading and math scores correlate more
highly than reading and writing scores) raise serious doubts
about their validity.
Second, even if we assume the validity of TAAS tests and
accept Judge Prado's reasoning that the lack of
correspondence between English grades and TAAS reading and
writing scores demonstrates the need for objective
assessment of student mastery, the fact that "the
disparities are sharper for ethnic minorities,"
represents prima facie evidence of inequality in opportunity
to learn. Even if Black and Hispanic students' teachers are
covering the same academic content as White students'
teachers, that 27-29% of Black and Hispanic
students who passed their English II course failed the TAAS
reading test (as compared with 10% of White students)
obviously must indicate that their teachers are not holding
them to the same academic standards as the teachers of White
students.
4.5 More appropriate use possible
This discussion leads naturally to a simple solution for
avoiding reliance on test scores in isolation to make high
stakes decisions about students. As previously mentioned,
the recent High Stakes report of the National
Research Council (Heubert & Hauser, 1999) states clearly
that using a sliding scale or compensatory model combining
test scores and grades would be "more compatible with
current professional testing standards" than
relying on a single arbitrary passing score on a test
(Heubert & Hauser, 1999, pp. 165-66). Moreover this is
exactly how test scores are typically used in informing
college admissions decisions, such that students with higher
high school grade point averages (GPA) need lower test
scores to be eligible for admission, and conversely students
with lower GPA need higher test scores. Ironically enough
this is indeed exactly how institutions of higher education
in Texas use admissions test scores in combination with GPA.
For example, in 1998, the University of Houston required
that in order to be eligible for admissions, high school
students who had a grade point average of 3.15 or better
needed to have SAT I total scores of at least 820, but if
their high school GPA was only 2.50, they needed to have
SAT I total scores of 1080 (University of Houston, 1998).
Literally decades of research on the validity of college
admissions test scores show that such an approach, using
test scores and grades in sliding scale combination produces
more valid results than relying on either GPA or admissions
test scores alone (Linn, 1982; Willingham, Lewis, Morgan &
Ramist, 1990). Moreover, such a sliding scale approach
generally has been shown to have less disparate impact on
ethnic minorities (and women) than relying on test scores
alone (Haney, 1993).
The tendency for a sliding scale approach to have smaller
adverse impact on minorities can be illustrated with the
data on TAAS scores and English II grades discussed in the
last section. Texas now effectively uses a double-cut or
conjunctive model of decision-making, whereby students
currently must have a grade of 70 in their academic courses
(such as English II) and a score of 70 on TAAS
to graduate from high school. These requirements are
illustrated in Figure 4.4 (which is the same as Figure 4.3
except that a vertical line has been added to represent the
70-grade requirement and a horizontal line has been added to
represent the TAAS 70-score requirement).
Figure 4.4 Scatterplot of TAAS Reading Scores and
English II Grades with 70 Minima Shown
Note also that the data shown in Figure 4.4 are the same as
those summarized in the top portion of Table 4.6. As
indicated there, 80.4% of White students in this sample
passed both the English II course and the TAAS reading test,
while only 10.1% of White students passed English II and
failed the TAAS reading test. In contrast, 53-55% of Black
and Hispanic students passed both the course and the test,
but 27-29% of Black and Hispanic students passed English II,
but failed the TAAS test.
Suppose now that instead of applying a double cut rule so
that students have to have scores of 70 in both the course
and the test to pass, they need to have a minimum of 140
combined. This circumstance is illustrated in Figure 4.5,
below.
Figure 4.5 Scatterplot of TAAS Reading Scores and
English II Grades with Sliding Scale Shown
As can be seen, under such a sliding scale approach, higher
grades can compensate for lower test scores and vice versa
(that is why the sliding scale approach is sometimes called
a compensatory model). Under this approach, the number of
Black and Hispanic students passing would increase from 1,395
to 1,765, a 27% increase. Under a sliding scale
approach, the number of White students passing would also
increase slightly (from 436 to 487), but since the latter
increase is smaller proportionately, the disparate impact on
Black and Hispanic students would be reduced.
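The difference between the two decision rules, and the proportional increases just cited, can be sketched as follows. The student records are hypothetical and the function names are my own; the pass counts are those reported in the text. The combined threshold is treated as a minimum (at least 140), which is an assumption about how ties at exactly 140 are handled.

```python
def passes_conjunctive(grade: float, taas: float, minimum: float = 70) -> bool:
    """Double-cut rule: both the course grade and the TAAS score must meet the minimum."""
    return grade >= minimum and taas >= minimum


def passes_compensatory(grade: float, taas: float, combined_minimum: float = 140) -> bool:
    """Sliding scale rule: a high grade can offset a low score, and vice versa."""
    return grade + taas >= combined_minimum


# Hypothetical students as (English II grade, TAAS reading score).
students = [(85, 62), (70, 70), (60, 85), (65, 65)]
for grade, taas in students:
    print(grade, taas, passes_conjunctive(grade, taas), passes_compensatory(grade, taas))


def pct_increase(before: int, after: int) -> float:
    """Proportional increase from `before` to `after`."""
    return (after - before) / before


# Pass counts under the double-cut rule and under the sliding scale, as reported above.
minority_increase = pct_increase(1395, 1765)
white_increase = pct_increase(436, 487)
print(f"Black and Hispanic: {minority_increase:.1%}; White: {white_increase:.1%}")
```

Any student who passes under the double-cut rule also passes under the sliding scale, so the sliding scale can only add passers; the question is for which groups it adds proportionately more.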
The sliding scale decision rule illustrated here (TAAS-R +
Eng II grade > 140) was chosen merely for illustrative
purposes. As with college admissions tests, in practice such a
sliding scale approach ought to be based on empirical
validation studies. But the example illustrates the way in
which an approach more in accord with professional standards
would significantly reduce adverse impact. The literature
on college admissions testing strongly suggests it would
yield more valid decisions too.