Education Policy Analysis Archives
Volume 8 Number 49
October 26, 2000
ISSN 1068-2341
Editor: Gene V Glass, College of Education Arizona State University
Copyright 2000, the EDUCATION POLICY ANALYSIS ARCHIVES. Articles appearing in EPAA are abstracted in the Current Index to Journals in Education by the ERIC Clearinghouse on Assessment and Evaluation and are permanently archived in Resources in Education.
What Do Test Scores in Texas Tell Us?
Stephen P. Klein
Abstract
We examine the results on the Texas Assessment of Academic Skills (TAAS), the highest-profile state testing program and one that has recorded extraordinary recent gains in math and reading scores. To investigate whether the dramatic math and reading gains on the TAAS represent actual academic progress, we have compared these gains to score changes in Texas on another test, the National Assessment of Educational Progress (NAEP). Texas students did improve significantly more on a fourth-grade NAEP math test than their counterparts nationally. But, the size of this gain was smaller than their gains on TAAS and was not present on the eighth-grade math test. The stark differences between the stories told by NAEP and TAAS are especially striking when it comes to the gap in average scores between whites and students of color. According to the NAEP results, that gap in Texas is not only very large but increasing slightly. According to TAAS scores, the gap is much smaller and decreasing greatly. Many schools are devoting a great deal of class time to highly specific TAAS preparation. While this preparation may improve TAAS scores, it may not help students develop necessary reading and math skills. Schools with relatively large percentages of minority and poor students may be doing this more than other schools. We raise serious questions about the validity of those gains, and caution against the danger of making decisions to sanction or reward students, teachers and schools on the basis of test scores that may be inflated or misleading. Finally, we suggest some steps that states can take to increase the likelihood that their test results merit public confidence and provide a sound basis for educational policy.
Introduction
During the past decade, several states have begun using the results on statewide tests as the basis for rewarding and sanctioning individual students, teachers, and schools. Although testing and accountability are intended to improve achievement and motivate staff and students, concerns have been raised in both the media and the professional literature (e.g., Heubert & Hauser, 1999; Linn, 2000) about possible unintended consequences of these programs.

The high-stakes testing program in Texas has received much of this attention in part because of the extraordinarily large gains the students in this state have made on its statewide achievement tests, the Texas Assessment of Academic Skills (TAAS). In fact, the gains in TAAS reading and math scores for both majority and minority students have been so dramatic that they have been dubbed the "Texas miracle." However, there are concerns that these gains were inflated or biased as an indirect consequence of the rewards and sanctions that are attached to the results. Thus, although there is general agreement that the gains on the TAAS are attributable to Texas' high-stakes accountability system, there is some question about what these gains mean. Specifically, do they reflect a real improvement in student achievement or something else? We conducted several analyses to examine the issue of whether TAAS scores can be trusted to provide an accurate index of student skills and abilities. First, we used scores on the reading and math tests that are administered as part of the National Assessment of Educational Progress (NAEP) to investigate how much students in Texas have improved and whether this improvement is consistent with what has occurred nationwide. NAEP scores are a good benchmark for this purpose because they reflect national content standards and they are not subject to the same external pressures to boost scores as there are on the TAAS.
Next, we assessed whether the gains in TAAS scores between 1994 and 1998 were comparable to those on NAEP. We did this to examine how much confidence can be placed in the TAAS score gains. Similarly, we measured whether the differences in scores between whites and students of color on the TAAS were consistent with the differences between these groups on NAEP. Specifically, is the gap on TAAS credible given the gap on NAEP? And finally, we investigated whether TAAS scores are related to the scores on a set of three other tests that we administered to students in 20 Texas elementary schools. Our findings from this research raise serious questions about the validity of the gains in TAAS scores. More generally, our results illustrate the danger of relying on statewide test scores as the sole measure of student achievement when these scores are used to make high-stakes decisions about teachers and schools as well as students. We anticipate that our findings will be of interest to local, state, and national educational policymakers, legislators, educators, and fellow researchers and measurement specialists. Readers also may be interested in a RAND study by Grissmer et al. (2000) that compared the NAEP scores of different states across the country. Grissmer and his colleagues found that after controlling for various student demographic characteristics and other factors, Texas tended to have higher NAEP scores than other states and there was some speculation as to whether this was due to the accountability system in Texas. Thus, while the Grissmer et al. (2000) report and the research presented in this issue paper both used NAEP scores, these studies differed in the questions they investigated, the data they analyzed, and the methodologies they employed. A forthcoming RAND issue paper will discuss some of the broader policy questions about high-stakes testing in schools.
Background
Scores on achievement tests are increasingly being used to make decisions that have important consequences for examinees and others. Some of these "high-stakes" decisions are for individual students--such as for tracking, promotion, and graduation (Heubert & Hauser, 1999). Some states and school districts also are using test scores to make performance appraisal decisions for teachers and principals (e.g., merit pay and bonuses) and to hold schools and educational programs accountable for the success of their students (Linn, 2000). Although the policymakers who design and implement such systems often believe they lead to improved instruction, there is a growing body of evidence which indicates that high-stakes testing programs can also result in narrowing the curriculum and distorting scores (Koretz & Barron, 1998; Koretz et al., 1991; Linn, 2000; Linn, Graue, & Sanders, 1990; Stecher, Barron, Kaganoff, & Goodwin, 1998). Consequently, questions are being raised about the appropriateness of using test scores alone for making high-stakes decisions (Heubert & Hauser, 1999).

In this issue paper, we examine score gains on one statewide test in an effort to assess the degree to which they provide valid information about student achievement in that state and about improvements in achievement over time. This investigation is the latest in a decade-long series of RAND studies of high-stakes testing (e.g., Koretz & Barron, 1998). We believe that this work will provide lessons to help policymakers understand some of the challenges that arise in the context of high-stakes accountability systems. Our interest in Texas was prompted by an unusual empirical relationship we observed between scores on TAAS and tests we administered to students in a small sample of schools as part of a larger study on teaching practices and student achievement.
Because our set of schools was small and not representative of the state, we decided to explore statewide patterns of achievement on TAAS and on NAEP. In addition, Texas provides an ideal context in which to study high-stakes testing because its accountability system has received attention from the media and from the policy community, and it has been cited as possibly contributing to improved student achievement (e.g., Grissmer & Flanagan, 1998; Grissmer et al., 2000). TAAS scores are a central component of the accountability system. For example, students must pass the TAAS to graduate from high school, and TAAS scores affect performance evaluations (and, in some cases, compensation) for teachers and principals. The TAAS program has been credited not only with improving student performance, but also with reducing differences in average scores among racial and ethnic groups. For example, a recent press release announced a record high passing rate on the TAAS. According to Commissioner of Education Jim Nelson, "Texas has justifiably gained national recognition for the performance gains being made by our students." Nelson also stated that Texas has "been able to close the gap in achievement between our minority youngsters and our majority youngsters, and we've again seen how we're progressing in that regard" (Jim Nelson as quoted by Mabin, 2000). The unprecedented score gains on the TAAS have been referred to as the "Texas miracle." However, some educators and analysts (e.g., Haney, 2000) have raised questions about the validity of these gains and the possible negative consequences of high-stakes accountability systems, particularly for low-income and minority students. For example, the media have reported concerns about excessive teaching to the test, and there is some empirical support for these criticisms (Carnoy, Loeb, & Smith, 2000; McNeil & Valenzuela, 2000; Hoffman et al., in press). 
For instance, teachers in Texas say they are spending especially large amounts of class time on test preparation activities. Because the length of the school day is fixed, the more time that is spent on preparing students to do well on the TAAS often means there is less time to devote to other subjects. There are also concerns that score trends may be biased by a variety of formal and informal policies and practices. For example, policies about student retention in grade may affect score trends (McLaughlin, 2000). States may vary in the extent to which their schools promote students who fail to earn acceptable grades and/or statewide test scores. Eliminating these so-called "social promotions" would most likely raise the average scores at each grade level in subsequent years while lowering it at each age level. This is likely to occur because although the students who are held back may continue to improve, they are likely to do so at a slower rate than comparable students who graduate with their classmates (Heubert & Hauser, 1999). Another concern is inappropriate test preparation practices, including outright cheating. There have been documented cases of cheating across the nation, including in Texas. If widespread, these behaviors could substantially distort inferences from test score gains (Hoff, 2000; Johnston, 1999). The pressure to raise scores may be felt most intensely in the lowest-scoring schools, which typically have large populations of low-income and minority students. Students at these schools may be particularly likely to suffer from overzealous efforts to raise scores. For example, Hoffman et al. (in press) found that teachers in low-performing schools reported greater frequency of test preparation than did teachers in higher-performing schools. This could lead to a superficial appearance that the gap between minority and majority students is narrowing when no change has actually occurred. 
Evidence regarding the validity of score gains on the TAAS can be obtained by investigating the degree to which these gains are also present on other measures of these same general skills. Specifically, do the score trends on the TAAS correspond to those on the highly regarded NAEP? The NAEP tests are generally recognized as the "gold standard" for such comparisons because of the technical quality of the procedures that are used to develop, administer, and score these exams. Of course, NAEP is not a perfect measure. For example, there are no stakes attached to NAEP scores, and therefore student motivation may differ on NAEP and state tests, such as TAAS. However, it is currently the best indicator available. There are several other reasons why score gains on the TAAS are not likely to have a one-to-one match with those on NAEP if these tests assess different skills and knowledge. However, the specifications for the NAEP exams are based on a consensus of a national panel of experts, including educators, about what students should know and be able to do. Hence, NAEP provides an appropriate benchmark for measuring improvement. As Linn (2000) notes, "Divergence of trends does not prove that NAEP is right and the state assessment is misleading, but it does raise important questions about the generalizability of gains reported on a state's own assessment, and hence about the validity of claims regarding student achievement" (p. 14).
Questions for Our Research
Understanding the source and consequences of the impressive score gains on the TAAS would require an extensive independent study. We have not done that. Instead, the analyses described below address the following questions about student achievement in Texas:
Description of the TAAS
TAAS was initiated in 1990 to serve as a criterion-referenced measure of the state's mandated curriculum. It is intended to be comprehensive and to measure higher-order thinking skills and problem-solving ability (Texas Education Agency, 1999). Since the full implementation of the TAAS program in 1994, it has been administered in reading and mathematics in grades 3, 4, 5, 6, 7, 8, and 10. Other subjects are also tested at selected grade levels. Last year, for example, a writing test was given at grades 4, 8, and 10. Science and social studies were tested at grade 8. The TAAS tests consist primarily of multiple-choice items, but the writing test includes questions that require written answers.

Teachers administer the TAAS tests to their own students. Answers are scored by the state. The questions are released to the public after each administration of the exam, and a new set of TAAS tests is administered each year. However, the format and content of the questions in one year are very similar to those used the next year. Each form of the TAAS contains items that are being field-tested for inclusion in the forms to be used in subsequent years. These items are also used to link test scores from one year to the next to help ensure consistent difficulty over time. These experimental items are not used to compute student scores nor are they released to the public. This practice is consistent with that employed in many other large-scale testing programs. The TAAS is administered only in Texas. Thus, there are no national norms or benchmarks against which to compare the performance of Texas students on this test. However, the Texas Education Agency administered the Metropolitan Achievement Tests to a sample of Texas students to determine how well these students performed relative to a national norm group. We discuss this study in a later section of this issue paper.
Description of NAEP
The national portion of NAEP is mandated by Congress and is administered through the National Center for Education Statistics. It is currently the only assessment that provides information on the knowledge and skills of a representative sample of the nation's students. The content of NAEP tests is based on test specifications that were developed by educators and others, and is intended to reflect a consensus about what students should be learning at a given grade level. Hence, the questions are not tied to standards of a single state or district. (Note 1) Like TAAS, NAEP is designed to assess problem-solving skills in addition to content knowledge. A national probability sample of schools is invited to participate in NAEP. Schools that decline are replaced with schools where the student characteristics are similar to those at the schools that refused to participate.

Most states, including Texas, also arrange to have the NAEP exams administered to another (and larger) group of their schools to allow for the generation of reliable state-level results. This state-level testing utilizes the same general procedures as the national NAEP program does; e.g., third-party selection of the participating schools and having a cadre of trained consultants (rather than classroom teachers) administer the tests. However, unlike the national program, these consultants may be local district personnel. In both the national and state-level programs, a given student is asked a sample of all the questions that are used at that student's grade level. This permits a much larger sampling of the content domain in the available testing time than would be feasible if every student had to answer every item. Different item formats (including multiple-choice, short-answer, and essay) are used in most subjects.
The breadth of content and item types, as well as the consensus of a national panel of experts that is reflected in NAEP frameworks, makes NAEP a useful indicator of achievement trends across the country. The validity of NAEP scores is enhanced by the procedures that are used to give the exams and ensure test security (e.g., test administrators do not have a stake in the outcomes). However, the utility of NAEP scores is limited by some of the other features of this testing program. For instance, NAEP is not administered every year, and when it is administered, not every subject is included, only a few grade levels are tested, and individual student, school, and district scores are not available. These features preclude examining year-to-year trends in a particular subject or tracking individual student progress over time. The motivation to do well on the NAEP tests is intrinsic rather than driven by external stakes. However, any reduction in student effort or performance that may stem from NAEP being a relatively low-stakes test should be fairly consistent over time and therefore not bias our measurement of score improvements across years.

How We Report Results
NAEP and TAAS results are typically reported to the public in terms of the percentage of students passing or meeting certain performance levels (or "cut" scores). Although this type of reporting seems easier to understand, it can lead to erroneous conclusions. For example, the difficulty of achieving a passing status or a certain level of performance (such as "proficient") may vary between tests as well as within a testing program over time. Making comparisons based on percentages reaching certain levels also does not account for score changes among students who perform well above or below the cut score.

To avoid these and other problems with percentages, we adopted the research community's convention of reporting results in terms of "effect" sizes.
The effect size is the difference in mean scores (between years or groups) divided by the standard deviation of those scores. In other words, it is the standardized mean difference. The major advantage of using effect sizes is that they provide a common metric across tests. As a frame of reference for readers who are not familiar with this metric, the effect size for the difference in achievement between white and black students has ranged from 0.8 to 1.2 across a variety of large-scale tests (Hedges & Nowell, 1998). The effect size for the difference in third grade student reading scores between large and small classes in Tennessee was approximately 0.25 (Finn & Achilles, 1999). (Note 2)
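As an illustration of the effect size metric described above, the following minimal sketch computes a standardized mean difference for two small sets of scores. The score lists are hypothetical, and dividing by a pooled standard deviation is an assumption made for the sketch; the text does not specify which standard deviation served as the denominator.

```python
import statistics


def effect_size(scores_a, scores_b):
    """Standardized mean difference: (mean of b - mean of a) / pooled SD.

    Using the pooled standard deviation is an assumption of this sketch.
    """
    n_a, n_b = len(scores_a), len(scores_b)
    var_a = statistics.variance(scores_a)  # sample variance
    var_b = statistics.variance(scores_b)
    pooled_sd = (((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)) ** 0.5
    return (statistics.mean(scores_b) - statistics.mean(scores_a)) / pooled_sd


# Hypothetical scale scores for two cohorts tested four years apart.
scores_1994 = [200, 210, 220, 230, 240]
scores_1998 = [205, 215, 225, 235, 245]

gain = effect_size(scores_1994, scores_1998)

# The same scores expressed on a scale ten times larger give the same
# effect size, which is what makes the metric comparable across tests.
gain_rescaled = effect_size([s * 10 for s in scores_1994],
                            [s * 10 for s in scores_1998])

print(round(gain, 2), round(gain_rescaled, 2))
```

Because the result is expressed in standard deviation units, a gain on one test can be set beside a gain on another test even when the two use different score scales.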
Have Reading and Math Skills Improved in Texas?
NAEP data have been cited as evidence of the effectiveness of educational programs in Texas (e.g., Grissmer & Flanagan, 1998). For instance, within a racial or ethnic group, the average performance of the Texas students tends to be about six percentile-points higher than the national average for that group (Grissmer et al., 2000; Reese et al., 1997).

These results are consistent with the findings obtained by the Texas Education Agency in its 1999 Texas National Comparative Data Study, in which a sample of Texas students took the Metropolitan Achievement Tests, Seventh Edition (MAT-7). Texas students at every grade level scored slightly higher than the national norming sample in most subjects (Texas Education Agency, 1999). However, it is difficult to draw conclusions from this study because, according to the sampling plan for this research, each participating school selected the classrooms and students that would take the MAT. Moreover, Texas did not report the mean TAAS scores of the students who took the MAT. Under the circumstances, the TAAS data are vital for determining whether those who took the MAT were truly representative of their school or the state. For example, the interpretation of the MAT findings would no doubt change if it was discovered that the mean TAAS scores of the students who took the MAT were higher than the corresponding state mean TAAS scores. Data from a single year cannot tell us whether achievement has improved over time or whether trends in TAAS scores are reflected in other tests. To answer the question of whether performance improved, we compared the scores of Texas fourth graders in one year with the scores of Texas fourth graders four years later. We did this in both reading and mathematics. We also did this for eighth graders in mathematics (NAEP's testing schedule precluded conducting a similar analysis for eighth graders in reading).
We then contrasted these results with national trends to assess whether the gains in Texas after the full statewide implementation of the TAAS differed from those in other states. Figures 1 through 3 present the results of these analyses. The main finding is that over a four-year period, the average test score gains on the NAEP in Texas exceeded those of the nation in only one of the three comparisons, namely: fourth grade math.
Figure 1 shows that the Texas fourth graders in 1998 had higher NAEP reading scores than did Texas fourth graders in 1994. The size of the increase was .13 standard deviation units for white students and .15 units for students of color. However, these increases were not unique to Texas. The national trend was for all students to improve. In fact, only among white fourth graders was the improvement in Texas greater than improvement nationally, and then only slightly (the difference in the effect sizes between Texas and the United States was .08). We discuss the implications of this difference in score gains between groups when we discuss the question of whether Texas has narrowed the gap in performance among racial and ethnic groups. The TAAS data tell a radically different story (see Figure 1). They indicate there was a very large improvement in TAAS reading scores for all groups (effect sizes ranged from .31 to .49). Figure 1 also shows that on the TAAS, black and Hispanic students improved more than whites. The gains on TAAS were therefore several times larger than they were on NAEP. And, contrary to the NAEP findings, the gains on TAAS were greater for students of color than they were for whites.
Figure 2 shows that fourth graders in Texas in 1996 had substantially higher NAEP math scores than did fourth graders in 1992 (effect sizes ranged from .25 to .43). Moreover, this improvement was substantially greater than the increase nationwide. This was especially true for white students. Nevertheless, the gains on TAAS were much larger than they were on NAEP, especially for students of color. (Note 3) Figure 3 shows that Texas eighth graders in 1996 had higher NAEP scores than did Texas eighth graders in 1992, but these differences were only slightly larger than those observed nationally. Thus, as with fourth grade reading, there was nothing remarkable about the NAEP scores in Texas, and students of color did not gain more than whites. In contrast, there were huge improvements in eighth grade math scores on the TAAS during a similar four-year period, and these increases were much larger for students of color than they were for whites. The same was true for eighth grade TAAS reading scores during this period (effect sizes for whites, blacks, and Hispanics were .28, .45, and .37, respectively).
Table 1
The same radically disparate NAEP and TAAS trends were also present for
the Hispanic-white gap; i.e., the gap got slightly wider on NAEP but
substantially smaller on TAAS over comparable four-year periods (see
Figure 4b). In addition, although fourth grade math was the subject on
which Texas showed the largest gains over time relative to the nation,
the white-Hispanic NAEP gap grew in Texas but not nationally, and the
white-black gap remained constant in Texas but actually shrank
nationally. In short, gap sizes on NAEP were moving in the opposite
direction from those on TAAS.
It is worth noting that even the relatively small NAEP gains we observed
might be somewhat inflated by changes in who takes the test. As
mentioned earlier, Haney (2000) provides evidence that exclusion of
students with disabilities increased in Texas while decreasing in the
nation, and Texas also showed an increase over time in the percentage of
students dropping out of school and being held back. All of these
factors would have the effect of producing a gain in average test scores
that overestimates actual changes in student performance.
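The mechanism described above can be sketched with a toy calculation. The scores below are hypothetical, and the sketch makes the simplifying assumption that the students who are excluded, drop out, or are held back are the lowest scorers; no student's performance changes between the two years.

```python
import statistics


def mean_of_tested(scores, n_excluded):
    """Mean score when the n_excluded lowest scorers are not tested.

    Treating the untested students as the lowest scorers is a
    simplifying assumption of this sketch.
    """
    tested = sorted(scores)[n_excluded:]
    return statistics.mean(tested)


# Hypothetical population whose underlying performance is identical
# in both years.
population = [40, 50, 55, 60, 65, 70, 75, 80, 85, 90]

year_1_mean = mean_of_tested(population, 0)  # everyone is tested
year_2_mean = mean_of_tested(population, 2)  # two lowest scorers untested

apparent_gain = year_2_mean - year_1_mean
print(year_1_mean, year_2_mean, apparent_gain)
```

The average rises even though nothing about student performance has changed, which is why a shift in who takes the test can overstate real improvement.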
Why Do TAAS and NAEP Scores Behave So Differently?
The large discrepancies between TAAS and NAEP results raise serious questions about the validity of the TAAS scores. We do not know the sources of these differences. However, one plausible explanation, and one that is consistent with some of the survey and observation results cited earlier, is that many schools are devoting a great deal of class time to highly specific TAAS preparation. It is also plausible that the schools with relatively large percentages of minority and poor students may be doing this more than other schools.

TAAS questions are released after each administration. Although there is a new version of the exam each year, one version looks a lot like another in terms of the types of questions asked, terminology and graphics used, content areas covered, etc. Thus, giving students instruction and practice on how to answer the specific types of questions that appear on the TAAS could very well improve their scores on this exam. For example, in an effort to improve their TAAS scores, some schools have retained outside contractors to work with teachers, students, or both. If the discrepancies we observed between NAEP and TAAS were due to some type of focused test preparation for the TAAS, then this instruction must have had a fairly narrow scope. With the possible exception of fourth grade math, it certainly did not appear to influence NAEP scores. In short, if TAAS scores were affected by test preparation for the TAAS, then the effects of this preparation did not appear to generalize to the NAEP exams. This explanation also raises questions about the appropriateness of what is being taught to prepare students to take the TAAS. A small but significant percentage of students may have "topped out" on the TAAS. In other words, their TAAS scores may not reflect just how much more proficient they are in reading and math than are other students.
If that happened, it would artificially narrow the gap on the TAAS between whites and students of color (because majority students tend to earn higher scores than minority students). Thus, the reduced gap on the TAAS relative to NAEP may be an artifact of the TAAS being too easy for some students. (Note 4) If so, it also would deflate the gains in TAAS scores over time. In short, were it not for any topping-out, the TAAS gain scores in Figures 1 through 3 would have been even larger, which in turn would further increase the disparity between TAAS and NAEP results.

What Happens on Other Tests?
We collected data on about 2,000 fifth graders from a mix of 20 urban and suburban schools in Texas. This study was part of a much larger project that included administering different types of science and math tests to students who also took their state's exams. The 20 schools were from one part of Texas. They were not selected to be representative of this region let alone of Texas as a whole. Nevertheless, some of the results at these schools also raised questions about the validity of the TAAS as a measure of student achievement.

Test Administration
In the spring of 1997, our Texas students took the English language version of the TAAS in reading and math. A few weeks later, we administered the following three tests to these same students: the Stanford 9 multiple-choice science test, the Stanford 9 open-ended (OE) math test, and a "hands-on" (HO) science test developed by RAND (Stecher & Klein, 1996). The Stanford 9 OE math test asked students to construct their own answers and write them in their test booklets. In the HO science test, students used various materials to conduct experiments. They then wrote their answers to several open-ended questions about these experiments in a simulated laboratory notebook. Table 3 shows the means and standard deviations on each measure.
Some Expected and Unexpected Findings
We analyzed the data in two ways. First, we investigated whether the students who earned high scores on one test tended to earn high scores on the other tests. Next, we examined whether the schools that had a high average score on one test tended to have high average scores on the other tests. We also looked at whether the results were related to type of test used (i.e., multiple-choice or open-ended), subject matter tested (reading, math, or science), and whether a student was in a free or reduced-price school lunch program. The latter variable serves as a rough indicator of a student's socioeconomic status (SES). For the school-level analyses, SES was indicated by the percentage of students at the school who were in the subsidized lunch program.
Table 3
Means and standard deviations on each measure (values not recoverable). Measures: TAAS math; TAAS reading; Stanford 9 science; Stanford 9 OE math; HO science; percentage in lunch program (SES).
Some of our results were consistent with those in previous studies.
Others were not. We begin with what was consistent and then turn to
those that were anomalous.
The first column of Table 4 shows the correlation between various pairs
of measures when the student (N approx. 2,000) is the unit of analysis.
(Note 5) The second column shows the results when
the school (N = 20) is the unit of analysis. The first set of rows shows
that the measures we administered correlated about .55 with each other
when the student was the unit of analysis. These correlations were
substantially higher when the school was the unit. For example, the
correlation between Stanford 9 science and Stanford 9 OE math was .55
when the student was the unit, but it was .78 when the school was the
unit. These results are very consistent with the general findings of
other research on student achievement.
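The difference between the two units of analysis can be sketched as follows. The records are hypothetical and are constructed so that the within-school variation on the two tests is unrelated; the sketch only illustrates why averaging to the school level tends to raise correlations, and it does not reproduce the study's data or analysis.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical records: (school_id, score on test A, score on test B).
# Each school shifts both scores by the same amount; within a school the
# student-level deviations on A and B are unrelated by construction.
records = []
for school_id, school_shift in enumerate([0, 10, 20, 30]):
    for dev_a, dev_b in [(-5, -5), (-5, 5), (5, -5), (5, 5)]:
        records.append((school_id, 50 + school_shift + dev_a,
                        50 + school_shift + dev_b))

# Student as the unit of analysis: correlate individual scores.
student_r = pearson([a for _, a, _ in records], [b for _, _, b in records])

# School as the unit of analysis: correlate school mean scores.
by_school = defaultdict(list)
for school_id, a, b in records:
    by_school[school_id].append((a, b))
school_mean_a = [sum(a for a, _ in rows) / len(rows) for rows in by_school.values()]
school_mean_b = [sum(b for _, b in rows) / len(rows) for rows in by_school.values()]
school_r = pearson(school_mean_a, school_mean_b)

print(round(student_r, 2), round(school_r, 2))
```

Averaging within schools cancels the unrelated student-level variation, so the school-level correlation in this toy data is higher than the student-level one.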
Table 4
| Correlations between: | Students (N approx. 2,000) | Schools (N = 20) |
| Non-TAAS tests | .55 .53 | .78 .71 |
| SES and non-TAAS tests | .10 .18 | .72 .66 |
| SES and TAAS tests | .14 | .21 |
| TAAS and non-TAAS tests | .46 .48 .52 .42 .53 | .02 .03 .10 .21 .13 |
| TAAS math and TAAS reading | | |

Bennett, R. E. (1998). Reinventing assessment. Princeton, NJ: Educational Testing Service.
Carnoy, M., Loeb, S., & Smith, T. L. (2000). Do higher state test scores in Texas make for better high school outcomes? Paper presented at the Annual Meeting of the American Educational Research Association, New Orleans.
Finn, J. D., & Achilles, C. M. (1999). Tennessee's class size study: Findings, implications, misconceptions. Educational Evaluation and Policy Analysis, 21, 97-109.
Grissmer, D., & Flanagan, A. (1998). Exploring rapid achievement gains in North Carolina and Texas. Washington, DC: National Education Goals Panel.
Grissmer, D., Flanagan, A., Kawata, J., & Williamson, S. (2000). Improving student achievement: What state NAEP test scores tell us. Santa Monica, CA: RAND, MR-924-EDU.
Hamilton, L. S., Klein, S. P., & Lorie, W. (2000). Using web-based testing for large-scale assessment. Santa Monica, CA: RAND, IP-196-EDU.
Haney, W. (2000). The myth of the Texas miracle in education. Education Policy Analysis Archives, 8 (41). Available at http://epaa.asu.edu/epaa/v8n41.
Hedges, L. V., & Nowell, A. (1998). Black-white test score convergence since 1965. In Jencks, C., & Phillips, M. (Eds.), The Black-White Test Score Gap (pp. 149-181). Washington, DC: Brookings.
Heubert, J. P., & Hauser, R. M. (Eds.) (1999). High stakes: Testing for tracking, promotion, and graduation. A Report of the National Research Council, Washington, DC: National Academy Press.
Hoff, D. J. (2000). As stakes rise, definition of cheating blurs. Education Week, June 21.
Hoffman, J. V., Assaf, L., Pennington, J., & Paris, S. G. (in press). High stakes testing in reading: Today in Texas, tomorrow? The Reading Teacher.
Johnston, R. C. (1999). Texas presses districts in alleged test-tampering cases. Education Week, March 17.
Koretz, D., & Barron, S. I. (1998). The validity of gains on the Kentucky Instructional Results Information System (KIRIS). Santa Monica, CA: RAND, MR-1014-EDU.
Koretz, D., Linn, R. L., Dunbar, S. B., & Shepard, L. A. (1991). The effects of high-stakes testing: Preliminary evidence about generalization across tests, in R. L. Linn (chair), The Effects of High Stakes Testing, symposium presented at the annual meetings of the American Educational Research Association and the National Council on Measurement in Education, Chicago, April.
Linn, R. L. (2000). Assessments and accountability. Educational Researcher, 29 (2), 4-16.
Linn, R. L., Graue, M. E., & Sanders, N. M. (1990). Comparing state and district test results to national norms: The validity of claims that "everyone is above average." Educational Measurement: Issues and Practice, 9, 5-14.
Mabin, Connie (2000). State's students again improve on TAAS scores. Austin American-Statesman, May 18.
McLaughlin, D. (2000). Protecting state NAEP trends from changes in SD/LEP inclusion rates. Palo Alto, CA: American Institutes for Research.
McNeil, L., & Valenzuela, A. (2000). The harmful impact of the TAAS system of testing in Texas: Beneath the accountability rhetoric. Cambridge, MA: Harvard University Civil Rights Project.
Reese, C. M., Miller, K. E., Mazzeo, J., & Dossey, J. A. (1997). NAEP 1996 report card for the nation and the states. Washington, DC: National Center for Education Statistics.
Stecher, B. M., Barron, S., Kaganoff, T., & Goodwin, J. (1998). The effects of standards-based assessment on classroom practices: Results of the 1996-97 RAND Survey of Kentucky Teachers of Mathematics and Writing (CSE Technical Report 482). Los Angeles: Center for Research on Evaluation, Standards, and Student Testing.
Stecher, B. M., & Klein, S. P. (Eds.) (1996). Performance assessments in science: Hands-on tasks and scoring guides. Santa Monica, CA: RAND, MR-660-NSF.
Texas Education Agency (1999). Texas Student Assessment Program: Technical digest for the academic year 1998-1999. Available at http://www.tea.state.tx.us/student.assessment/techdig.htm.
Texas Education Agency (2000). 1999 Texas National Comparative Data Study. Available at http://www.tea.state.tx.us/student.assessment/researchers.htm.
Texas Education Agency (2000). Texas TAAS passing rates hit seven-year high; four out of every five students pass exam. Press release, May 17.
Dr. Laura Hamilton is an Associate Behavioral Scientist at RAND where she conducts research on educational assessment and the effectiveness of educational reform programs. Her current projects include a study of systemic reforms in math and science, an investigation of the validity of statewide assessments for students with disabilities, and an analysis of the effectiveness of private governance of public schools.
Dr. Daniel F. McCaffrey is a Statistician at RAND where he works on studies of health and educational issues. His work on education includes studies on teaching practices and student achievement, the effects of class size reduction on the test scores of California's students, and the properties of hands-on performance measures of achievement in science.
Dr. Brian Stecher is a Senior Social Scientist in the Education program at RAND. Dr. Stecher's research emphasis is applied educational measurement, including the implementation, quality, and impact of state assessment and accountability systems and the cost, quality, and feasibility of performance-based assessments in mathematics and science.
Copyright 2000 by the Education Policy Analysis Archives

The World Wide Web address for the Education Policy Analysis Archives is epaa.asu.edu

General questions about appropriateness of topics or particular articles may be addressed to the Editor, Gene V Glass, glass@asu.edu, or reach him at College of Education, Arizona State University, Tempe, AZ 85287-0211 (602-965-9644). The Commentary Editor is Casey D. Cobb: casey.cobb@unh.edu.