Conclusions and Recommendations

The Ad Hoc Committee was formed in the summer of 1998 out of concern that important decisions were being based on Massachusetts Teacher Tests (MTT) scores before any reasonable evidence had been produced concerning their reliability and validity. Because the DOE and NES have not made available any documentation on the reliability and validity of the MTT, in clear violation of professional standards concerning testing and despite repeated requests for such documentation, the Ad Hoc Committee set out to study the technical merits of the new tests. Our original idea was to compare individuals' scores on the MTT with their scores on post-collegiate tests (such as the Praxis and the GRE) for which technical documentation is available. Toward this end we invited people to send us score reports on both the MTT and other tests. As of December we had not received sufficient data to undertake a concurrent validity study comparing MTT scores with those on established tests.

In the meantime, however, we examined the reliability of the new tests. Specifically, using data on over 200 individuals who took the MTT in both April and July 1998 (generously provided to us by eight institutions of higher education in the Commonwealth), we studied the test-retest reliability of the MTT. We found the correlations between April and July scores to be extraordinarily low: about 0.30 for both the MTT Reading and Writing tests. Test-retest correlation coefficients for well-developed standardized tests typically range between 0.80 and 0.90. To examine the possibility that the very low correlations were due to restriction of range (only people who scored below 70 on the April tests had to retake them), we corrected for attenuation due to restriction of range and estimated test-retest correlations for the unrestricted population of test-takers.
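As an illustration of the kind of correction described above, the sketch below applies one commonly used formula for direct range restriction (often called Thorndike's Case 2). This summary does not say which formula the Committee actually applied, so the choice of formula is an assumption here. The function name and the standard-deviation ratios in the usage loop are likewise hypothetical; only the observed correlation of about 0.30 comes from the text.

```python
import math


def correct_for_range_restriction(r_restricted: float, sd_ratio: float) -> float:
    """Estimate the correlation in the unrestricted population.

    Uses the standard direct range-restriction correction (Thorndike Case 2):
        r_c = r * k / sqrt(1 - r^2 + r^2 * k^2)
    where r is the correlation observed in the restricted group and
    k = SD(unrestricted) / SD(restricted) on the selection variable.
    Assumption: this is an illustrative formula, not necessarily the
    one used in the report.
    """
    if not -1.0 <= r_restricted <= 1.0:
        raise ValueError("r_restricted must lie in [-1, 1]")
    if sd_ratio <= 0:
        raise ValueError("sd_ratio must be positive")
    r, k = r_restricted, sd_ratio
    return (r * k) / math.sqrt(1.0 - r**2 + (r**2) * (k**2))


if __name__ == "__main__":
    observed_r = 0.30  # approximate April-July correlation reported in the text
    # Hypothetical SD ratios; the report's actual standard deviations are not given here.
    for k in (1.0, 1.5, 2.0, 3.0):
        corrected = correct_for_range_restriction(observed_r, k)
        print(f"SD ratio {k:.1f}: corrected r = {corrected:.2f}")
```

A ratio of 1.0 means no restriction and leaves the correlation unchanged. Larger ratios, meaning the retest group's April scores varied much less than those of all test-takers, push the corrected estimate upward.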
The results indicated test-retest correlations of 0.50 to 0.70, still well below the reliability of well-developed tests. We used these results to estimate the error of measurement in MTT scores. We found that MTT scores contain unusually high levels of measurement error, with an error of measurement on the new tests in the range of 9 to 17 points. We estimate that MTT Reading and Writing test scores contain two to three times as much error as scores on well-developed tests.

Next, we compared pass and failure rates on the April and July administrations to consider the rates of misclassification on the MTT. Using both our test-retest sample and a much larger sample of data reported on the DOE web site, we found that the MTT tests have very high rates of misclassification, as indicated by the fact that among those who "failed" either the MTT Reading or Writing test in April, more than 50% "passed" the test in July. Evidence also suggests that a fair number of people who "passed" the MTT did so simply because of error in the tests.

We also considered the content and construct validity of the MTT tests. At least one portion of the MTT Writing test (the dictation exercise) raises doubts about the content validity of the tests, and specifically about their job-relatedness. Moreover, when we examined the relationship between MTT Reading and Writing test scores, we found correlations of about 0.50, which raise serious doubts about the tests' construct validity. Previous research suggests that scores on tests of two related verbal constructs correlate in the range of 0.65 to 0.80.

Finally, we report on results of interviews with 15 candidates who took the MTT in April, July, or October (7 of whom passed and 8 of whom failed). Since this was a small and self-selected sample, results are merely suggestive.
But they indicate that the unreliability and poor validity of MTT scores may result from the lack of a study guide for the new tests, confusion over whether the April results would "count" towards certification, poor conditions of administration (in at least some test sites), simple fatigue resulting from the 8-hour duration of the tests, and test content. Although all those interviewed supported the idea of certification testing for teachers, as is common with other professions, many compared the MTT unfavorably with other teacher certification tests they had taken (e.g., the Praxis or NTE, and certification tests in other states).

Recommendations
As James Madison wrote in 1787, in the passage candidates were asked to transcribe in the April 1998 version of the MTT, "No man is allowed to be a judge in his own cause because his interest would certainly bias his judgment and, not improbably, corrupt his integrity." So too with organizations; the DOE, having implemented new teacher certification tests of undocumented validity and reliability, should not be allowed to judge its own cause.

Notes