The SAS Education Value-Added Assessment System (SAS® EVAAS®) in the Houston Independent School District (HISD): Intended and Unintended Consequences
Abstract
The SAS Education Value-Added Assessment System (SAS® EVAAS®) is the most widely used value-added system in the country. It is also self-proclaimed as “the most robust and reliable” system available, with its greatest benefit being to help educators improve their teaching practices. This study critically examined the effects of SAS® EVAAS® as experienced by teachers in one of the largest, high-needs urban school districts in the nation, the Houston Independent School District (HISD). Using a multiple methods approach, this study critically analyzed retrospective quantitative and qualitative data to better understand the evidence collected from four teachers whose contracts were not renewed in the summer of 2011, in part given their low SAS® EVAAS® scores. This study also suggests some intended and unintended effects that seem to be occurring as a result of SAS® EVAAS® implementation in HISD. In addition to issues with reliability, bias, teacher attribution, and validity, high-stakes use of SAS® EVAAS® in this district seems to be exacerbating unintended effects.
6 Responses to “The SAS Education Value-Added Assessment System (SAS® EVAAS®) in the Houston Independent School District (HISD): Intended and Unintended Consequences”
This article has been viewed: 14012 times since April 30, 2012
This work is licensed under a Creative Commons Attribution 3.0 License.
education policy analysis archives
1. I teach social studies. Each year students study a different area of social studies. For example, students study U.S. History in 5th grade, World Cultures & Geography in 6th grade, Texas History in 7th grade, and U.S. History in 8th grade. I feel that comparing test scores from one grade to the next in social studies is like comparing apples to oranges. In reading, it is more like comparing apples to apples, which is fair.
2. If students perform very well one year (99%, for example) and the following year students also perform at 99%, the students have performed well, but according to my understanding of EVAAS, the teacher would be seen as not very effective, since there was not much growth. However, 99% in any year, with any students, is excellent. Students getting 99% means the teacher did her job, but EVAAS does not agree with that. Even if the students went from 99% one year to 98% the next year, the students have still performed very well, but EVAAS would show this as bad for the teacher. There are some major flaws in EVAAS and the value-added system.
The EVAAS system discourages high-performing teachers with high-performing students. For example, if last year the students averaged 99%, that may seem awesome, yet if this year the students average 98%, the system records this as no growth or progress. The system does not take into account test-day variables (for example, a student's parent just died, a house burned, a police arrest, custody issues, apathetic students who do not finish any test, or just plain illness). It seems that school administrators ignore this anomaly. It appears to be common practice to include, with prejudice, test scores that are not adjusted for these circumstances, to say nothing of the scientific validity of the program.
The whole standards and standardized testing regime has been debunked by Wilson in “Educational Standards and the Problem of Error,” found at: http://epaa.asu.edu/ojs/article/view/577 . I know of no rebuttal, professional or otherwise, to this study. A shorter takedown of validity, or ‘invalidity’ as Wilson likes to call it, as developed in the bible of educational/psychological testing (AERA, APA, & NCME 2000), can be found in “A Little Less than Valid: An Essay Review” at: http://www.edrev.info/essays/v10n5.pdf .
One of the most troubling aspects, other than the total invalidity of the entire process, is that the AERA, APA & NCME and all the test producers state that to use a testing device for anything other than its stated purpose is UNETHICAL. For example, the EVAAS system uses 5th grade math, reading, etc., test scores to significantly evaluate a teacher. Sorry, but that is dead wrong and unethical.
As R. Ackoff states, “If you’re doing the wrong thing and attempt to do it better you are getting ‘wronger’”. Or as Wilson states in his essay review, “As before, I focus on validity. Why? Because as the good book [AERA, APA, & NCME 2000] says, ‘Validity is, therefore the most fundamental consideration in developing tests (pg 9).’ I concur. If the test event is not valid, if indeed the test is invalid, then all else is VAIN and ILLUSORY (my emphasis).”
This article is a bit of a mess from a social science perspective which, I think, is a fair one against which to evaluate it since it was published in a peer-reviewed journal and leans heavily on the work that will appear in the co-author’s dissertation. If this is an unfair standard against which to judge this publication, then fine. However, taking this position would mean the authors would have to dramatically qualify into meaninglessness any conclusions or implications they generated.
1.) I’m fairly sure the authors chose their cases based on the values of their dependent variable (assuming, and this is difficult to tease out, that the dispositions of the selected teachers are the outcome of interest). This, of course, is a problem for a number of reasons, one of which is that you have no variation on your outcome, another of which is that it makes it easy to select cases that conform to your expected/desired outcomes. You could have chosen four teachers who had not been terminated, but who had similar VA scores, and compared their “lived experiences.” Frankly, this would have been interesting and at least reasonably scientifically defensible, and wouldn’t have exposed you to criticisms of confirmation bias.
2.) The case selection and “analysis” of the data derived from those cases suffer from a number of biases that render the findings fairly useless. Take the “quasi-selected” teachers. What is the principal rival hypothesis that the authors can’t eliminate? That is, what control variables are missing from their model of teacher termination? They have no way of knowing, beyond some tepid and unverifiable assurances from the teachers, that they were not terminated for reasons other than their EVAAS scores. (Again, this is a bit of a problem when your principal control is unfalsifiable, and you can’t compensate for this by making assumptions about the veracity of these statements, since most people are loath to self-criticize, particularly when they’re embroiled in a lawsuit to save their job.)
Their PDAS scores provide some evidence here, but the authors do not consider how these may have affected the decision to terminate the teachers. Without knowing the distribution of PDAS scores in the district (I’m assuming that they are severely skewed and overwhelmingly positive), these scores look quite low. And, interestingly, consistently low (with one exception). What percentile were these scores? How much worse (or better) were these teachers’ scores than other faculty? Were these scores lower or higher than other teachers with similar VAM scores who were not terminated? These are interesting questions.
View a video prepared by the authors at: http://epaa.asu.edu/ojs/blog/?p=1944
I have taught physical education and adapted physical education for over 30 years. No one has ever been able to apply any of these evaluations for merit pay in regard to art, music, physical education, special education, speech and language, physical therapy, occupational therapy, guidance counselors, or any of the specialized teachers so prevalent in the public school system. Not every teacher is in a self-contained classroom with a homogeneous and stable population of students. In some of the schools where I’ve taught, up to a third of the student body turns over during the school year! How does that get applied? These factors are not even touched in any of the studies I’ve read.