EVAAS ® ) in the Houston Independent School District (HISD): Intended and Unintended Consequences

The SAS Educational Value-Added Assessment System (SAS ® EVAAS ® ) is the most widely used value-added system in the country. It is also self-proclaimed as “the most robust and reliable” system available, with its greatest benefit to help educators improve their teaching practices. This study critically examined the effects of SAS ® EVAAS ® as experienced by teachers, in one of the largest, high-needs urban school districts in the nation – the Houston Independent School District (HISD). Using a multiple methods approach, this study critically analyzed retrospective quantitative and qualitative data to better comprehend and understand the evidence collected from four teachers whose contracts were not renewed in the summer of 2011, in part given their low SAS ® EVAAS ® scores. This study also suggests some intended and unintended effects that seem to be occurring as a result of SAS ® EVAAS ® implementation in HISD. In addition to issues with reliability, bias, teacher attribution, and validity, high-stakes use of SAS ® EVAAS ® in this district seems to be exacerbating unintended effects.


Introduction
Since the implementation of No Child Left Behind (NCLB) in 2002, researchers, econometricians, and statisticians have explored different analytical methods to document students' academic progress over time, specifically to replace Adequate Yearly Progress (AYP) measures. More recently, President Obama's Race to the Top competition (2009) encouraged similarly oriented initiatives, contributing over $350 million in federal support (Robelen, 2012) to be allocated to those states that adopt methods to better measure the "value" a teacher "adds" to student learning from year to year.
In theory, value-added models (VAMs) allow for richer analyses of test score data because students are simply followed to assess their learning trajectories from the time they enter a teacher's classroom to the time they leave. In practice, however, these models do not seem to work in many of the ways theorized. For example, it still remains uncertain whether teachers are accurately classified as contributing to differential gains, whether teachers teaching certain types of students (e.g., special education, gifted, and English Language Learners (ELLs)) are fairly assessed, and whether teachers are using value-added output to inform instructional modifications and improvements (Au, 2010; Eckert & Dabrowski, 2010; Haertel, 2011; Hill, Kapitula, & Umland, 2011; Newton, Darling-Hammond, Haertel, & Thomas, 2010; Papay, 2010; Rothstein, 2009). In addition, while the implementation and use of VAMs for high-stakes purposes is increasing across the country, there lingers a paucity of research evidence to support the attachment of significant consequences to value-added output (Braun, 2005; Harris, 2011; Ho, Lewis, & Farris, 2009; Schochet & Chiang, 2010).
The purpose of this study was to contribute to the existing research base by critically examining some intended and unintended effects of the largest and most commonly used VAM, the SAS Education Value-Added Assessment System (SAS ® EVAAS ® ), in the Houston Independent School District (HISD). This district is using value-added data more than any other in the country for high-stakes purposes, expressly for merit awards and to make teacher termination decisions (Corcoran, 2010; Harris, 2011; Mellon, 2010; Otterman, 2010; Papay, 2010). During the summer of 2011, the two researchers examined, in terms of reliability, bias, teacher attribution, and validity, the SAS ® EVAAS ® data of four teachers whose contracts were not renewed. They examined other intended consequences (e.g., value-added use and data-informed change) and unintended consequences (e.g., perverse side effects) as well.

The SAS Education Value-Added Assessment System (SAS ® EVAAS ® ) and the Houston Independent School District (HISD)
HISD is the largest school district in Texas and the seventh largest district in the country. The district consists of 300 schools, over 200,000 students, and approximately 13,000 teachers. In addition, the majority of the students in the district are from high-needs backgrounds, with 63% of students labeled at risk, 92% from racial minority backgrounds, 80% on the federal free-or-reduced lunch program, and 58% classified as ELLs, Limited English Proficiency (LEP), or bilingual. While Tennessee, North Carolina, Pennsylvania, and Ohio use SAS ® EVAAS ® statewide, and other states, districts, and schools are using or have plans to implement this model locally, no other school, district, or state uses SAS ® EVAAS ® for consequential decision-making more than HISD (Harris, 2011; Lowrey, 2012; Sparks, 2011).
In 2007, HISD created the Accelerating Student Progress: Increasing Results & Expectations (ASPIRE) program to recognize and celebrate great teaching as measured by student progress (HISD, 2010). District administrators contracted with the SAS software company to measure this progress via their SAS ® EVAAS ® system, at an approximate cost of $500,000 per year.
In short, the district has two main teacher evaluation and accountability systems: 1) the ASPIRE program, in which the district uses one year of SAS ® EVAAS ® scores to rank order teachers throughout the district, and 2) the Professional Development and Appraisal System (PDAS), in which teacher observation data are collected by certified appraisers and used to evaluate teachers in eight different domains of teacher performance. Given the two systems' different foci, however, the district commonly labels and rewards HISD teachers differently across systems, for example, labeling a teacher below average on the PDAS while rewarding the same teacher with a bonus through the ASPIRE program, and vice versa. The district's oft-conflicting systems cause a fair amount of confusion and mistrust, in particular among HISD teachers (Corcoran, 2010; Harris, 2011; Papay, 2010).
Regardless, with over 20 years of development, the system in use, the SAS ® EVAAS ® , is the largest, most widely implemented, and most widely used VAM in the country. While there are at least eight entities developing such models (Banchero & Kesmodel, 2011), like the VAM developed by the Value Added Research Center (VARC) in Wisconsin and the growth model developed by Dr. Damian Betebenner (the Student Growth Percentiles (SGP) model), SAS ® EVAAS ® is "the most comprehensive reporting package of value-added metrics available in the educational market" (SAS, 2012). It is "the most robust and reliable" system available, better than the "other simplistic models found in the market today" (SAS, 2012). It "provides valuable diagnostic information about [instructional] practices," helps educators become more proactive and make more "sound instructional choices," and helps teachers use "resources more strategically to ensure that every student has the chance to succeed" (SAS, 2012). These claims are not without controversy, however (Amrein-Beardsley, 2008; Sanders & Wright, 2008). Researchers used these assertions to frame this study, in particular in terms of the intended or expected outcomes, as advertised, as well as the unintended outcomes researchers discovered along the way.

Preliminary Evidence
Even though the district reported that the majority of teachers favor the ASPIRE program overall (Harris, 2011), researchers found evidence suggesting that HISD teachers have aversions towards the program's SAS ® EVAAS ® component (Collins, in progress). In terms of reliability, those receiving merit monies attached to their SAS ® EVAAS ® output often compare winning the rewards to "winning the lottery," given the random, "chaotic," year-to-year instabilities they see. Such inconsistencies are also well noted in the literature (Baeder, 2010; Baker, Barton, Darling-Hammond, Haertel, Ladd, Linn et al., 2010; Haertel, 2011; Koedel & Betts, 2007; Papay, 2010). Teachers do not seem to understand why they are rewarded, especially because they profess that they do nothing differently from year to year as their SAS ® EVAAS ® rankings "jump around." Along with the highs come much-appreciated monetary awards, but what teachers did differently from one year to the next remains unknown.
Teachers who do not receive merit monies attribute the lack of rewards to the types of students they teach and how these students might bias their scores (Collins, in progress; see also Hill et al., 2011; Newton et al., 2010; Rothstein, 2009). Teachers who loop or teach back-to-back grade levels report bonuses for the first year and nothing the next as they "max out" on growth the first year with the same students. Teachers of grades in which ELLs are transitioned into mainstreamed English-only classrooms report being the least likely to demonstrate added value and the most likely to be deemed "ineffective." Teachers of inordinate numbers of special education students express similar concerns (Collins, in progress; see also Hill et al., 2011; Newton et al., 2010; Rothstein, 2009).
Ceiling effects are also prevalent, in that teachers of gifted students experience difficulties demonstrating added value (see also Wright et al., 1997).
Almost half (46%) of a sample of HISD teachers who moved to different grade levels reported switching value-added ranks after the move, from "ineffective" to "effective" or vice versa, even when the grade levels were adjacent (Collins, in progress). This is problematic as the SAS ® EVAAS ® system is purported to measure the teacher effectiveness construct consistently and validly. Dr. William L. Sanders, the developer of the SAS ® EVAAS ® , claims that teachers who move from one environment to another, even if radically different, continue to do just as well (LeClaire, 2011).
Furthermore, over half (55%) of the same sample of HISD teachers noted that their SAS ® EVAAS ® reports did not match their supervisors' observational PDAS scores (Collins, in progress). In addition, some suggest that their supervisors are skewing their observational scores to match their SAS ® EVAAS ® scores given external pressures to do so (Collins, in progress). Such practices have been shown to occur elsewhere with the Tennessee Value-Added Assessment System (TVAAS) from which the SAS ® EVAAS ® was derived (Garland, 2012). In New York as well, if teachers have two years of low value-added scores, the teachers are to be rated ineffective overall and terminated, regardless of what other measures (e.g., supervisor evaluation scores) indicate or disclose (Ravitch, 2012). Because these other measures are often perceived as less objective, it seems that measuring teacher effectiveness using value-added output is beginning to trump other indicators capturing what it means to be an effective teacher. This raises major concerns about cogency and power (i.e., evidence of criterion-related validity). Such practices also contradict the field standards developed by the prominent national associations on educational measurement and testing (AERA, APA, & NCME, 2000). These standards note first and foremost that high-stakes decisions "should not be made on the basis of test scores alone. Other relevant information should be taken into account to enhance the overall validity of such decisions" (AERA, 2000).
Ten percent of the same teachers noted substantive concerns about being evaluated for content they were not teaching, or being held accountable while teaching alongside other teachers teaching the same students the same subjects at the same time (Collins, in progress). SAS ® EVAAS ® methodologists state that they can adequately control for this using fractions and proportional contributions, however (Derringer, 2010; Sanders & Horn, 1994).
Numerous teachers, especially science and social studies teachers teaching non-tested subjects in every grade level, also note issues when norm-referenced tests are used with criterion-referenced tests to determine SAS ® EVAAS ® growth from year to year. They note concerns about the pretest scores used to calculate value-added coming from different tests than the post-test scores, and vice versa. Additionally, they note concerns about the norm-referenced tests not being linked to state standards. While norm-referenced and criterion-referenced tests can be normed, and this is somewhat common, this still raises issues with content alignment (i.e., evidence of content-related validity).
In terms of formative uses, because SAS ® EVAAS ® output is often received months after students leave, teachers express that such output makes little sense, and they are learning little about what they did effectively or how they might use SAS ® EVAAS ® data to improve their own instruction (see also Eckert & Dabrowski, 2010; Harris, 2011). Of the same sample of HISD teachers surveyed, the majority (55%) note that they receive their SAS ® EVAAS ® reports in the summer or fall after students leave their classrooms. A plurality (40%) also reported they were unaware of HISD-sponsored professional development trainings about how to better understand or use their SAS ® EVAAS ® data to improve their instruction (Collins, in progress). This is problematic since the principal claimed strength of SAS ® EVAAS ® is to provide a "wealth of positive diagnostic information" for formative purposes (Sanders, Wright, Rivers & Leandro, 2009, p. 9).

HISD, SAS ® EVAAS ® , and Teacher Non-Renewals
In the spring of 2011, HISD did not renew 221 of its teachers' contracts (HISD, 2011). A number of these teachers' contracts were not renewed at least in part due to "a significant lack of student progress attributable to the educator," or "insufficient student academic growth reflected by [SAS ® EVAAS ® ] value-added scores." HISD did not respond, however, to researchers' Open Records Request (submitted September 15, 2011) soliciting the actual number of unnamed teachers whose contracts were not renewed at least in part due to SAS ® EVAAS ® scores in spring of 2011, so it is uncertain how many teachers were actually terminated for these reasons. All that is known is that, according to one of the lead lawyers retained in these teachers' defense (A. Reichek, personal communication, June 8, 2011), a number of HISD teachers' non-renewal letters cited these reasons for termination. According to the Vice President of the Houston Federation of Teachers (HFT), this number was greater than 100, or nearly 50% (Z. Capo, personal communication, April 6, 2012). Researchers are also unaware of how many teachers pursued due process hearings, how many of them followed their due process hearings through to culmination, and how many were actually terminated after their due process hearings concluded. Researchers are, however, aware that attaching such high-stakes decisions to VAM output in general is expected "to lead to a flood of litigation challenging teacher dismissals" as "value added modeling as a basis for high stakes decision making is fraught with problems likely to be vetted in the courts" (Baker, 2012). What researchers examined here are four such cases.

Participants
For four of the terminated teachers, the same lead lawyer invited one of the researchers (the first author, hereafter referred to as the primary researcher) to serve as the expert witness and testify on their behalf. In terms of sampling procedures, the primary researcher did not select the four teachers for any methodological reason or through a representative sampling approach. Rather, the teachers quasi-selected the researcher via their lawyer. The lawyer retained the primary researcher to testify regarding (1) the SAS ® EVAAS ® in general, (2) whether SAS ® EVAAS ® output for each teacher accurately evidenced that the teacher positively or negatively impacted student achievement and growth, and (3) whether the grounds and reasoning on which their contracts were not renewed were justifiable and sound.
The teachers, four female elementary school teachers, were from racial minority backgrounds (three were African American and one was Latina). Their ages ranged from 28 to 51. They collectively averaged 11.8 years of teaching experience and 7.5 years teaching in HISD. Two were certified via a traditional teacher certification program and the other two were certified via HISD's Alternative Teaching Certificate (ATC) program. All teachers taught core subject areas (reading, language arts, math, social studies, and science) in grades 3-7, and they all taught in different schools under different school administrations.
It should be noted, here, that given the sensitive nature of these teachers' experiences and testimonies, the primary researcher also secured the four teachers' signed permissions to use the data collected for the lawsuit also for presentation and publication purposes.The primary researcher consulted with each participant about confidentiality, her general rights, and in particular her right to opt out of the study or pull her data from study inclusion at any time, and if at any time she felt she was at risk or to be placed at risk in the foreseeable future.The primary researcher also gave each participant the contact information for the Chair of the Human Subjects Institutional Review Board, through the Arizona State University Office of Research Integrity and Assurance (IRB Study #11108006705).

Multiple-Methods, Case Study Approach
The researchers conducted a case study (Campbell, 1975; Flyvbjerg, 2011; Ragin & Becker, 2000; Thomas, 2011a) using multiple methods to examine the collective cases of the four units at focus (Gerring, 2004). The cases were similar and separate enough to permit such an analysis, especially as each of the teachers had associated experiences and could serve as comparable instances of the same general phenomenon (Ragin & Becker, 2000). Their practical experiences (Flyvbjerg, 2011) could help others better understand how this value-added system was being used within HISD.
The primary researcher collected retrospective quantitative and qualitative data to better comprehend and understand the four teachers' data that more holistically captured their effectiveness as teachers (Creswell, 2008; Day, Sammons, & Gu, 2008; Greene, Caracelli, & Graham, 1989; Johnson & Onwuegbuzie, 2004). The quantitative documents included each teacher's SAS ® EVAAS ® scores and supervisor observational scores based on the district's PDAS system, and the qualitative information came from the written comments provided on each teacher's PDAS forms for the same years for which SAS ® EVAAS ® data were collected and from the in-depth phone interviews the primary researcher also conducted.
Specifically, the primary researcher collected each teacher's SAS ® EVAAS ® Teacher Value-Added Reports (see Figure 1). The district, in contract with SAS, provides such reports, alongside SAS ® EVAAS ® Reports for Teacher Reflection (see also Figure 1), to teachers yearly through an online portal. These reports include a color chart intended to offer teachers a graphical display of how their different students (low, middle, and high performing) progressed in their classrooms as compared to the district average. The reports also include a table to complement the chart and quantify the colors displayed. Resource guides are available to help consumers understand these reports as well (see, for example, SAS, 2007).
As written, these reports are to be used to evaluate how well individual teachers facilitate student achievement on Texas's Assessment of Knowledge and Skills (TAKS and TAKS Accommodated) and the Stanford/Aprenda achievement tests that are used in non-TAKS grades and subject areas. These reports are used to compare how well teachers influence student progress as compared to similar teachers within the district. While scores include an individual teacher's normal curve equivalent (NCE) gain, a measure of standard error for confidence, and a district reference gain also expressed as an NCE indicating how the district did compared to the state average each year, the score of interest here was the gain score index. This score compares each teacher to other similar teachers across the district, and this is the score that HISD uses for determining ASPIRE awards.
The primary researcher also collected each teacher's PDAS supervisor evaluation scores. She collected their PDAS scores, as they are also valued in HISD's ASPIRE system, for the same years as each teacher's SAS ® EVAAS ® scores to help contextualize and better understand each teacher's SAS ® EVAAS ® data. On the PDAS, both numerical scores (i.e., scores marked for each of the eight domains included on the PDAS instrument and overall) and supervisors' written comments (i.e., by domain and overall) were collected for analysis (see a listing of these domains in Sidebar 1; see also PDAS, 2004). While it is likely that observational scores are often inflated, and this is in large part why more objective measures of teacher effectiveness like VAMs are adopted and implemented (Jacob & Lefgren, 2007; Harris, 2009, 2011; Ravitch, 2012), it was also important to examine whether in fact observational measures correlated with SAS ® EVAAS ® output (see also Milanowski, Kimball, & White, 2004). The primary researcher collected this information here because general evidence is lacking to indicate that this measure of teacher value-added is indeed related to at least one other correlated criterion (i.e., evidence of criterion-related validity). The primary researcher also collected other indicators teachers might have had for the same years of analysis, especially about their effectiveness as teachers and to further examine this type of validity.
Finally, the primary researcher collected qualitative data via extensive phone interviews. The researcher spoke with each of the four teachers by phone in the summer of 2011 for an average of 2.5 hours per interview. She followed up with shorter phone calls for verification purposes on occasion. During each phone interview, she first asked each teacher a set of demographic questions (teaching certification, number of total years teaching and teaching in HISD, age, and racial background). She then asked each teacher to explain her corpus of data for each school year, as aligned with the aforementioned documents. Last, she asked each teacher an additional four open-ended questions to get at any information that might have been missed and to take a preliminary look at data comprehension, use, and levels of professional support for both. The primary researcher asked each teacher the following:
• Is there anything else you can think of in terms of reasons why your contract is not being renewed (e.g., excessive absenteeism, insubordination, other test scores)?
• Do you understand your SAS ® EVAAS ® value-added scores?
• Have you received training on how to understand your SAS ® EVAAS ® reports/scores?
• Have you received professional development as a result of your SAS ® EVAAS ® scores?

Data Analysis
The primary researcher transcribed the interview data and analyzed the transcripts alongside the numerical data, year-by-year, to establish a longitudinal chain of evidence (Yin, 1994). Specifically, the primary researcher analyzed the data by case, and then compared incidents within individuals over time. The researcher developed working assertions across cases, as well, to integrate and develop broader themes (Glaser & Strauss, 1967; Lincoln & Guba, 1985; Patton, 2001).
The teachers involved in the study verified results and findings via a series of member checks (Guba & Lincoln, 1981). The four teachers read the final report and checked it for accuracy and authenticity, clarified misunderstandings and misconceptions, and verified the overall findings. Researchers also resituated the findings within the literature where they added to specific topics about value-added methods and systems.
It is important to highlight that the experiences of these four teachers should not, however, be used to generalize beyond HISD or to all teachers in HISD. Nevertheless, the researchers are confident that their findings still deliver a strong message and may generalize to the other approximately 100-plus teachers whose contracts were not renewed at least in part due to "a significant lack of student progress attributable to the educator," or "insufficient student academic growth reflected by [SAS ® EVAAS ® ] value-added scores." Even with a limited, non-representative sample of four, patterns and overall findings may also help others understand this particular value-added system better, via the lived experiences of these teachers in HISD (Feagin & Orum, 1991; Yin, 1994).

Results
Teacher A
Teacher A, a university-certified teacher, had been an elementary school teacher in HISD since 2000. Illustrated in Table 1 is a summary of Teacher A's SAS ® EVAAS ® and PDAS scores and ASPIRE bonuses since 2007, the first year of HISD's ASPIRE system. Where a score falls within one standard error of the reference gain, the progress Teacher A's class made was not detectably different from the reference gain scores of other teachers across HISD; however, such scores are still reported to both the teachers and their supervisors as they are here.
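The one-standard-error comparison mentioned above can be sketched in a few lines. This is only an illustration of the general idea, not the SAS ® EVAAS ® computation: the record layout, the field names (`nce_gain`, `std_error`, `reference_gain`), the `classify` helper, and the example values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GainRecord:
    """One hypothetical value-added observation (teacher, subject, year)."""
    nce_gain: float        # teacher's mean NCE gain
    std_error: float       # standard error of that gain
    reference_gain: float  # district reference gain, also in NCE units


def classify(record: GainRecord, n_se: float = 1.0) -> str:
    """Label a gain relative to the reference gain using an n_se standard-error band."""
    diff = record.nce_gain - record.reference_gain
    band = n_se * record.std_error
    if diff > band:
        return "above"
    if diff < -band:
        return "below"
    return "not detectably different"


# Illustrative values only; these are NOT scores from the study.
examples = [
    GainRecord(nce_gain=2.0, std_error=1.5, reference_gain=0.0),
    GainRecord(nce_gain=-0.8, std_error=1.2, reference_gain=0.0),
    GainRecord(nce_gain=-3.1, std_error=1.0, reference_gain=0.5),
]
labels = [classify(r) for r in examples]
```

A score labeled "not detectably different" in such a scheme still carries a sign, which is why reporting it unqualified to teachers and supervisors can invite over-interpretation.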
Across all years and subject areas for which Teacher A had SAS ® EVAAS ® data, she added value to her students' learning (relative to all other HISD teachers) 50% of the time (8/16 of SAS ® EVAAS ® observations), and detracted value (relative to all other HISD teachers) the other 50% of the time (8/16 of SAS ® EVAAS ® observations). According to this SAS ® EVAAS ® output, the probability that Teacher A was truly an effective or ineffective teacher was no different than the flip of a coin. Additionally, looking at Teacher A's most recent years of activity, she added more value than she had in previous years, making termination unreasonable and indefensible, especially on the grounds that there was "a significant lack of student progress attributable to the educator" or "insufficient student academic growth reflected by [SAS ® EVAAS ® ] value-added scores."
Analyzing Teacher A's SAS ® EVAAS ® scores alongside her PDAS scores, it is not only visually obvious that there is something peculiar about the relationship between Teacher A's performance on the SAS ® EVAAS ® and her supervisor evaluation scores, it is also statistically evident. The correlations between Teacher A's SAS ® EVAAS ® and PDAS scores across reading (r = -0.51), math (r = -0.83), and language arts (r = -0.11) from 2007-2010 suggest that, beyond no correlation, the better Teacher A did on the SAS ® EVAAS ® the worse she did in the eyes of her supervisor(s), and vice versa. In addition, Teacher A was monetarily rewarded in a way that did not make sense. The worse she did the more money she received (r = -0.42). Until 2010-2011, Teacher A "exceeded expectations" across every PDAS domain, and her colleagues recognized her as both a "Teacher of the Month" and the "Teacher of the Year" in 2010.
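For readers who want to reproduce this kind of comparison, the following is a minimal sketch of a Pearson correlation between yearly value-added scores and supervisor ratings. The `pearson_r` helper and the two short series are hypothetical and chosen only for illustration; they are not the teachers' actual scores, and the study does not state which software was used to compute its coefficients.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical yearly scores for one teacher in one subject (illustration only).
value_added = [1.2, -0.4, 0.9, -1.5]   # e.g., gain score index by year
pdas_rating = [3.0, 4.0, 3.5, 4.5]     # e.g., overall supervisor rating by year

r = pearson_r(value_added, pdas_rating)  # negative: higher value-added, lower rating
```

With only a handful of yearly observations per teacher, such coefficients are very unstable; in the extreme case of two observations, r is always exactly +1 or -1 and carries no information.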
Otherwise, Teacher A was only familiar with SAS ® EVAAS ® due to the score reports distributed each year and because her colleagues and supervisors used to talk about something called "value-added." Nobody ever explained her SAS ® EVAAS ® scores to her, and she never fully understood what the numbers meant, how they could impact or "hurt her," or how she could use her SAS ® EVAAS ® scores to help her improve her own instruction. Additionally, she never received professional development as a result of her value-added scores, although whether she needed professional development to help her improve her value-added scores is questionable.

Teacher B
Teacher B, a career-changer with bachelor's and master's degrees in mathematics, was certified as a math teacher for grades 2-12 via HISD's Alternative Teaching Certificate (ATC) program in 2007. Illustrated in Table 2 is a summary of Teacher B's SAS ® EVAAS ® and PDAS scores and ASPIRE bonuses since 2008. Where a score falls within one standard error of the reference gain, the progress Teacher B's class made was not detectably different from the reference gain scores of other teachers across HISD; however, such scores are still reported to both the teachers and their supervisors as they are here.
Teacher B's relative value-added scores were negative for math for two years, and positive for the most recent year for which she had SAS ® EVAAS ® data. In her most recent position for which she had SAS ® EVAAS ® data she seemed to have added value to her students' learning. She taught alongside another math teacher who taught nearly half of her students math an equal amount of time per week all year. Whether she alone demonstrated "a significant lack of student progress attributable to the educator," or "insufficient student academic growth reflected by [SAS ® EVAAS ® ] value-added scores" is debatable. In addition, value-added researchers agree that at least three years of value-added data are needed to make such judgments (Baker, 2012; Brophy, 1973; Cody, McFarland, Moore, & Preston, 2010; Harris, 2011), and even then a 25% risk of misclassification remains (Au, 2010; CCSSO, 2010; Otterman, 2010; Schochet & Chiang, 2010; Shaw & Bovaird, 2011). She did not have three years of consistent data, and her most recent year was demonstrably her best.
Analyzing Teacher B's SAS ® EVAAS ® scores alongside her PDAS scores, there is a strong relationship between Teacher B's SAS ® EVAAS ® and supervisor evaluation scores (r = 0.91). The better Teacher B did on the SAS ® EVAAS ® the better she did in the eyes of her supervisor(s), and vice versa. This yields the type of correlation coefficient we would expect to see if both indicators reliably and validly measured teacher effectiveness (i.e., criterion-related evidence of validity). In addition, Teacher B was monetarily rewarded in a way that made sense; the better she did the more money she received (r = 0.93).
Otherwise, the knowledge that Teacher B had about the SAS ® EVAAS ® was also sparse. She did not understand how "they" calculated her value-added scores. She would "just see the scores." She also knew that "they" compared her scores "to everybody else's in the district." This teacher did not receive training to understand, or professional development to improve, her value-added scores, although whether her most recent value-added scores were in need of improvement is unclear.

Teacher C
Teacher C graduated with a bachelor's degree in early childhood education in 1999, and received a master's degree in school counseling in 2000. Thereafter, she served as a long-term substitute in HISD until she took a full-time teaching position in HISD, teaching 6th grade in 2003. Illustrated in Table 3 is a summary of Teacher C's SAS ® EVAAS ® and PDAS scores and ASPIRE bonuses. Where a score falls within one standard error of the reference gain, the progress Teacher C's class made was not detectably different from the reference gain scores of other teachers across HISD; however, such scores are still reported to both the teachers and their supervisors as they are here.
Teacher C's overall SAS ® EVAAS ® scores across years and subjects indicate that Teacher C detracted value from her students' learning (relative to all other HISD teachers) 100% of the time across three subject areas. This was likely, however, because Teacher C taught some of the highest-needs students, possibly in the entire district. The ages of the 6th grade students in her remedial classes ranged from 10 (the typical age of a 6th grader) to 15 (the typical age of a high school freshman). Almost half of Teacher C's students over time had been retained in grade one to four times prior.
Analyzing Teacher C's SAS ® EVAAS ® scores alongside her math PDAS scores was not possible as only two SAS ® EVAAS ® scores were available, although her social studies SAS ® EVAAS ® and PDAS scores were mildly related (r = 0.26). Teacher C's monetary bonuses and PDAS scores were also mildly related (r = 0.29). Until 2010-11, she "exceeded expectations" across almost every domain in terms of her supervisor evaluations. She was also given a "Teacher of the Year" award during the 2007-08 school year by her teacher peers.
Otherwise, the knowledge that Teacher C had about the SAS ® EVAAS ® was also limited.She understood that she was being compared to other HISD teachers who taught the same subject areas to students who were "very different than her students."She, like the others, never received training to understand, or professional development to improve, her value-added scores.

Teacher D
Teacher D graduated with a bachelor's degree in business and administration in 2005 and in 2007 was certified as a teacher for grades 4-8 via HISD's Alternative Teaching Certificate (ATC) program. She took a full-time teaching position in HISD in 2006. Illustrated in Table 4 is a summary of Teacher D's SAS ® EVAAS ® and PDAS scores and ASPIRE bonuses since 2007. Scores marked with an asterisk in the table are those not detectibly different (NDD). This means that the progress Teacher D's class made was not detectibly different from the reference gain scores of other teachers across HISD given one standard error; however, the scores are still reported to both the teachers and their supervisors as they are here.
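The "not detectibly different" flag described above can be read as a simple interval check: a class gain counts as NDD when it lies within one standard error of the reference gain. The sketch below is only an illustration of that reading. The function name `is_ndd` and all numeric values are hypothetical, and the actual SAS ® EVAAS ® computation is not described in this study.

```python
def is_ndd(gain: float, reference_gain: float, std_error: float) -> bool:
    """Return True when a gain estimate lies within one standard error
    of the reference gain (an assumed reading of the NDD flag)."""
    if std_error < 0:
        raise ValueError("std_error must be non-negative")
    return abs(gain - reference_gain) <= std_error


# Hypothetical values for illustration only; they are not taken from Table 4.
print(is_ndd(gain=-0.4, reference_gain=0.0, std_error=0.9))  # inside the band
print(is_ndd(gain=-2.1, reference_gain=0.0, std_error=0.9))  # outside the band
```

Under this reading, a negative score flagged NDD carries no detectable signal that the class progressed less than the reference group, even though the score is still reported.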
Up until 2009-2010, Teacher D, like Teacher A, switched back and forth across subject areas, demonstrating added value overall from 2006-2009 50% of the time (3/6 SAS ® EVAAS ® observations) and demonstrating negative value 50% of the time (3/6 SAS ® EVAAS ® observations). According to her SAS ® EVAAS ® output, like Teacher A, the probability that Teacher D was an effective teacher up until 2009-2010 was no different than the flip of a coin. Given Teacher D's most recent year of SAS ® EVAAS ® data (2009-2010), however, she seemingly detracted from student learning across all three subject areas. In 2009-2010 Teacher D was assigned to teach an inordinate number of ELLs who were transitioned into her classroom. This will be discussed in more detail later. Regardless, whether Teacher D demonstrated "a significant lack of student progress attributable to the educator," or "insufficient student academic growth reflected by [SAS ® EVAAS ® ] value-added scores" is still disputable.
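The coin-flip comparison can be made concrete with a small calculation. The sketch below takes the 3 positive and 3 negative observations reported for 2006-2009 and computes the share of positive results along with an exact two-sided binomial test against a fair coin. The ordering of the signs is hypothetical (only the counts come from the text), and the helper names are illustrative; the test is an added illustration, not an analysis the study reports.

```python
from math import comb


def share_positive(signs):
    """Fraction of observations with a positive (value-added) sign."""
    return sum(1 for s in signs if s > 0) / len(signs)


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of all outcomes
    that are no more likely than the observed count k."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    return min(1.0, sum(q for q in pmf if q <= pmf[k] + 1e-12))


# Counts (3 positive, 3 negative) are from the text; the order is hypothetical.
signs = [+1, -1, +1, -1, +1, -1]
positives = sum(1 for s in signs if s > 0)

print(share_positive(signs))                          # 0.5
print(binomial_two_sided_p(positives, len(signs)))    # 1.0
```

A p-value of 1.0 means the observed split is exactly what a fair coin would be most likely to produce, which matches the description in the text.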
In terms of the relationship between Teacher D's performance on the PDAS and her students' SAS ® EVAAS ® scores in reading, there was a mild correlation (r = 0.29). In terms of her performance on the PDAS and her students' SAS ® EVAAS ® scores in language arts, there was a strong correlation (r = 0.92). In addition, the better Teacher D scored on the SAS ® EVAAS ® the more money she received (r = 0.79). Until 2010-11, she "exceeded expectations" or was "proficient" across every domain in terms of her supervisor evaluations.
In terms of Teacher D's knowledge about the SAS ® EVAAS ® , she reported not understanding how "they" could use different tests to evaluate her and whether she added or detracted value from her students' learning. She also did not trust whether "they" could really account for the types of students she had in her classroom, especially in her last year, when she taught a comparatively disproportionate number of ELLs. While she reported having tried to figure SAS ® EVAAS ® out on her own via the district's online resources, she found it very confusing. It "just did not hit home."

Reliability
According to its developers, SAS ® EVAAS ® is meant to "assess and predict student performance with precision and reliability" and it is "the most robust and reliable" value-added system available, more than the "other simplistic models found in the market today" (SAS, 2011). In terms of the data presented here, however, it is clear that inconsistencies were a consistent problem. Across the four cases, issues with reliability were evident. Such issues with reliability are also well documented in the literature (Au, 2010; Baeder, 2010; Baker et al., 2010; CCSSO, 2010; Haertel, 2011; Koedel & Betts, 2007; Papay, 2010; Shaw & Bovaird, 2011; Schochet & Chiang, 2010).
Yet these four teachers were removed from their teaching positions "at least in part" due to SAS ® EVAAS ® data that in three of the four cases researchers evidenced as unreliable (see Tables 1-4). The probability that three of the four teachers added or detracted value from year-to-year was roughly the same as the flip of a coin. This is pragmatically, methodologically, conceptually, and morally concerning. In addition, as researchers suggest that at least three years of value-added data are needed to make such judgments (Brophy, 1973; Cody et al., 2010; Harris, 2011), and even then with a 25% risk of misclassification (Au, 2010; CCSSO, 2010; Otterman, 2010; Schochet & Chiang, 2010; Shaw & Bovaird, 2011), this is also troublesome. Not one of the four teachers had three years of consistent data (that were detectibly different from other similar teachers) to warrant nonrenewal.
Other HISD teachers whom researchers (Collins, in progress)
SAS ® EVAAS ® developers claim they have evidence that teachers who move from one environment to another, even if radically different, continue to do as well and are classified the same in SAS ® EVAAS ® terms over time (LeClaire, 2011). Evidence presented herein should yield caution regarding this assertion.

Bias
Teachers credited such "chaos" to the different students they taught and the different classroom contexts in which they taught year-to-year. For example, while Teacher C's SAS ® EVAAS ® data illustrated that Teacher C consistently detracted value from her students' learning, and did so across subject areas, this was likely because Teacher C taught some of the highest-need students, possibly across the district.
In addition, HISD teachers note that those teaching inordinate numbers of special education students in mainstreamed classrooms are least likely to add value (Collins, in progress). Teachers teaching the same students over consecutive years (e.g., looping) report receiving bonuses for the first year and nothing the next as they are "maxing out" on growth, and actually "competing with themselves." Teachers agree that it is best for them "to get average kids, yes, because the regular kids, you can grow those kids!" Teachers teaching gifted students report finding it very difficult to add value and get merit pay as a result (Collins, in progress; see also Wright, Horn, & Sanders, 1997). Another 4th/5th grade teacher explained, "When they say nobody wants to do 4th grade – nobody wants to do 4th grade! Nobody" (Collins, in progress). This was evidenced in the data collected for Teacher D as well who, like Teacher A, switched back and forth across subject areas until her last year, during which she purportedly detracted value across subject areas. This was the year her supervisor assigned her to teach an ELL transition year, during which an inordinate number of ELLs entered her classroom.
Until SAS ® EVAAS ® developers can evidence that teachers teaching inordinate numbers of ELLs, particularly in transition years, and teachers teaching special education or gifted students are not disparately impacted by the non-random placement of these students into their classrooms (Monk, 1987; Rothstein, 2009), terminating teachers on these grounds is remiss and morally indefensible. Just recently, both SAS ® EVAAS ® and HISD's Chief Human Resources Officer acknowledged via email that ceiling effects adversely impacted some teachers working with gifted students in their capacities to demonstrate value-added (A. Best, personal communication, January 21, 2012).

Teacher Attribution
The aforementioned lack of reliability could also be due to other context-related issues that further complicate the calculation of a teacher's value-added. Teacher B, for example, whose SAS ® EVAAS ® scores were negative for two years and positive for the most recent year for which she had data, taught for the same years alongside a math enrichment teacher who taught almost half of her students at the same time and an equal amount of time per week. Teacher A was not a teacher of record for approximately 50% of her students one of the years for which she was held accountable using the SAS ® EVAAS ® because she was moved from teaching the third to the fourth grade midyear. Another HISD teacher taught alongside a reading specialist four days per week, and then posted the most growth and received the largest bonus she ever had (Collins, in progress). It is uncertain whether the reading specialist received a bonus for her apparent contributions as well.
Nonetheless, these instances raise concerns about the percentage of value teachers under this system add to, or detract from, their students' learning and achievement and whether they can be held responsible for 100% of their students' scores. These issues might also play into why such inconsistencies are evident. Determining what percent of value-added scores can be attributed to teachers is very difficult, if even possible (Campbell & Stanley, 1963; Corcoran, 2010; Ishii & Rivkin, 2009; Kane & Staiger, 2008; Kennedy, 2010; Linn, 2008; Nelson, 2011; Papay, 2010; Rothstein, 2009).
SAS ® EVAAS ® developers claim, though, that through a linking verification process (during which teachers mark for what percent of each student's instruction (s)he should be held accountable) they can partition out different teachers' value-added effects (Derringer, 2010; Sanders & Horn, 1994). However, there is no empirical evidence suggesting that numerically splitting or dividing teacher effects accurately accounts for a teacher's contribution. In addition, not only is such a practice counterintuitive, but breaking up effort across teachers using percentages and proportions is nonsensical given the interaction effects that occur among and between students and teachers (Monk, 1987). Teachers are situated in complex and collaborative learning environments. It is highly unlikely their value-added effects can be fractionalized using simple or even complex mathematics and statistics.

Criterion-Related Evidence of Validity
One way to generate criterion-related evidence of validity is to assess whether teachers who demonstrate added value are also the teachers deemed effective through other, independent measures of teacher quality administered concurrently (see also McCaffrey, Lockwood, Koretz, Louis, & Hamilton, 2004a). In this instance, researchers examined whether the four nonrenewed teachers also seemed to be ineffectual given their PDAS scores, specifically to determine if these teachers' supervisors also observed that these teachers were inadequate.
Analyzing the four teachers' SAS ® EVAAS ® scores over time alongside their PDAS scores, researchers found statistical signals indicating that the two measures were not capturing the teaching effectiveness construct accurately and consistently across teachers. The better Teacher A did on the SAS ® EVAAS ® the worse she did in the eyes of her supervisor(s) (r = -0.51, r = -0.83, r = -0.11). Yet for Teacher B, the better she did on the SAS ® EVAAS ® the better she did on the PDAS (r = 0.91). This yields the type of correlation coefficient we would expect to see if in fact both indicators pointed in the same direction, yielding valid results. Researchers were not able to analyze Teacher C's math SAS ® EVAAS ® scores alongside her PDAS scores, but her social studies SAS ® EVAAS ® and PDAS scores were mildly related (r = 0.26). For Teacher D there were weak to strong results (r = 0.29, r = 0.92). The conclusion here is that there is nothing substantive to evidence that a valid teacher evaluation system, based on SAS ® EVAAS ® and PDAS scores, is in place and in use. This assertion is, however, limited by the small sample size herein.
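For readers who want to see how coefficients like those above are obtained, the sketch below computes a correlation between yearly value-added scores and yearly observation scores. It assumes a Pearson product-moment coefficient, which the text does not specify. The function name `pearson_r` and the paired values are hypothetical and are not taken from Tables 1-4.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical paired yearly scores, for illustration only.
value_added = [-1.2, 0.4, 1.1, -0.3]
observation = [3.5, 3.0, 4.0, 3.5]
print(round(pearson_r(value_added, observation), 2))
```

With only three or four paired observations per teacher, a single year can swing the coefficient widely, which is in keeping with the sample-size caveat stated in the text.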
Additionally, analyzing all four teachers' SAS ® EVAAS ® scores over time alongside their bonuses, researchers found that both of these measures failed to assess the teaching effectiveness construct accurately and consistently across teachers as well. The worse Teacher A did on the SAS ® EVAAS ® the more money she received (r = -0.42), and the better Teacher B did the more money she received (r = 0.93). Teacher C's monetary bonuses and SAS ® EVAAS ® scores were mildly related (r = 0.29), and Teacher D's monetary bonuses and SAS ® EVAAS ® scores were more strongly related (r = 0.79). Again, the small sample size certainly limits generalizability here, although evidence of this occurring elsewhere exists (Collins, in progress; Harris, 2011).
In addition, three of four teachers were honored with teaching awards (e.g., teacher of the year or month awards) during the same years for which they posted SAS ® EVAAS ® data that at least in part led to their contracts not being renewed. Teacher C ironically received a Teacher of the Year award, awarded to her by her peers, at the same time she detracted the most value from her students' learning according to her SAS ® EVAAS ® data. This raises additional concerns about whether these indicators are capturing the teaching effectiveness construct effectively, or validly. See notes above about sample size limitations.

External Pressures
It is also important to note that these teachers felt that they were targeted for termination because of the performance of the schools in which they taught, which were labeled "in-need-of-improvement" under NCLB. According to the four teachers, administrators were under intense district and state pressure, and administrators set out or were forced to "restructure the school" and "start firing teachers." Teachers A, B, C, and D all felt that they were part of "a larger plan." Because they believed their supervisors perceived them to have low, or possibly lower value-added scores than their peers, the teachers felt that they had been put "on a list." It was at this time when they became most vulnerable, and when their PDAS observational scores plummeted. Teacher A, for example, "exceeded expectations" on her yearly PDAS reports until 2010-2011 when a new principal arrived and ranked her "proficient" or "below expectations" across domains. Teacher B's PDAS scores dropped as well, but her supervisor wrote on her PDAS form that she could not have earned higher scores because the state classified the school's scores as "unacceptable." Three different administrators evaluated Teacher C and she consistently "exceeded expectations," but in 2010-2011 when she was evaluated by a short-term administrator, she too was rated as "proficient" or "below expectations" across the board. Similarly, Teacher D's supervisor's actions became perceptibly more aggressive.
Other teachers noted that their supervisors were beginning to skew their observational scores given external pressures to do so (Collins, in progress). A middle school teacher agreed: "Well my evaluations were fine, but of course now they have to make the evaluation match the SAS ® EVAAS ® . We now have to go through that." An 8th grade teacher added:
They're not about to go to bat [for us, although] a few of them will. But most of them are going to go in there, and they're going to create a teacher evaluation that reflects the [SAS ® EVAAS ® ] data because they don't want to have to explain, again and again, why they're giving high classroom observation assessments when the data shows [sic] that the teacher is low performing.
A 4th grade teacher noted, "Our principal pressures us. You bet she pressures. If you don't make [SAS ® EVAAS ® ], then it goes against you in your PDAS. In a roundabout way she finds a way to put that against you." An 8th grade advanced English teacher added:
My boss had to go to the district superintendent and explain why we needed to be kept, when ultimately the data showed that we weren't good teachers… [But] you've got other good teachers who are being thrown under the bus because of this system.
From these teachers' perspectives, it seems that many district administrators are more trusting of SAS ® EVAAS ® and are skewing PDAS data to match. This makes sense in theory, as the SAS ® EVAAS ® is the objective system that the district has purchased, and traditional observational scores are increasingly being dismissed as subjective (Harris, 2011). In Tennessee and New York there is evidence of local policies pushing such practices (Baker, 2012; Garland, 2012; Ravitch, 2012), although again such practices contradict the field standards encouraging the use of multiple measures for decision-making (AERA, APA, & NCME, 2000).

Diagnostics and Formative Uses
Overall, Teachers A, B, C, and D were only superficially familiar with SAS ® EVAAS ® . They understood that they were being compared to other similar teachers within the district, and they understood their scores were available each year via the district's online portal system, but that was about the extent of their knowledge. Nobody had explained their SAS ® EVAAS ® data to them, and none of the four teachers understood what their SAS ® EVAAS ® numbers meant, how they were calculated, how their SAS ® EVAAS ® scores could be "used against [them]," or conversely how they could use their SAS ® EVAAS ® scores to help them improve their instruction. Teacher D took steps to figure out her SAS ® EVAAS ® scores on her own, but her SAS ® EVAAS ® scores still "just did not hit home" (see also Eckert & Dabrowski, 2010).
The four terminated teachers did not receive professional development from HISD or SAS ® EVAAS ® as a result of their value-added scores either. However, given the scores illustrated in Tables 1-4, whether each teacher needed professional development to improve their value-added scores is disputable. Because they were terminated at least in part due to their SAS ® EVAAS ® scores, and because they were reportedly not given professional development to improve their scores, this too is troublesome. Similarly, none of the teachers noted that they used SAS ® EVAAS ® data to inform their instruction, in many ways because they did not understand it.
In short, no data suggest that, for these four teachers in HISD, the SAS ® EVAAS ® system "provides valuable diagnostic information about [instructional] practices," helps educators become more proactive and make more "sound instructional choices," and helps teachers use their "resources more strategically to ensure that every student has the chance to succeed" (SAS, 2011). In addition, 60% of a sample of HISD teachers indicate that they are not using SAS ® EVAAS ® data to inform their instruction either. This is not to say, however, that this is not occurring elsewhere, perhaps in the district for the other 40% (taking into account sampling error) or in other states, districts, and schools using the SAS ® EVAAS ® system (see, for example, marketing testimonials available on the SAS website, SAS, 2012).

Conclusions
In the end, Teachers A, B, and D pursued due process hearings, but they decided not to follow their hearings through to culmination. They ultimately decided to quit teaching in HISD or altogether. Teacher C (the teacher who according to her SAS ® EVAAS ® output had the poorest value-added scores) took her case through her due process hearing. Her hearing officer noted that the types of students Teacher C typically taught most likely biased her capacity to demonstrate value-added and show growth. The hearing officer also noted that Teacher C did not have multiple years of consistent data in the core subject areas she taught to warrant a decision regarding whether she was indeed an effective teacher.
But in sum, and based on the cases of these four teachers, it seems the district is inappropriately using inconsistent data within and across subject areas to make high-stakes decisions about teachers, and in this case teacher termination. This was evidenced through examinations of four teachers' SAS ® EVAAS ® data, how they correlated with other data meant to capture the same teaching effectiveness construct, and teachers' complementary stories, collected to better examine the data and other relevant issues.
The goal of this study was also to examine other intended and unintended effects of the SAS ® EVAAS ® system, in particular given HISD's use of the system for high-stakes decision-making. In terms of intended effects, the four terminated teachers did not seem to understand SAS ® EVAAS ® output well enough to understand or use value-added data to inform or improve their instruction. This happens particularly when district leaders do not provide professional development to promote formative use (see also Eckert & Dabrowski, 2010; Harris, 2011). But this is also particularly problematic in that "when cases challenging dismissal based on VAM make it to court, deliberations will center on [among other things]…whether teachers are able to understand the basis for which they have been dismissed and whether it is assumed that they have had any control over their fate" (Baker, 2012). In general, whether VAMs succeed in their intended objectives will also be contested. Researchers examined these issues here by framing this study around the marketing materials publicized by SAS ® EVAAS ® .
In terms of unintended effects, however, researchers also evidenced specific issues with reliability, bias, teacher attribution, and validity; issues also evident in the growing research literature and also named in the anticipated lawsuits (Baker, 2012). Researchers found that high-stakes use of SAS ® EVAAS ® in this district seems to be exacerbating unintended effects.
Results from the four teachers indicate there are consistent problems with inconsistencies with the SAS ® EVAAS ® data (see also Au, 2010; Baeder, 2010; Baker et al., 2010; Corcoran, 2010; Haertel, 2011; Koedel & Betts, 2007; Papay, 2010). These inconsistencies are likely related to the measurement errors already inherent in standardized tests and the errors intensified when SAS ® EVAAS ® researchers mix norm- and criterion-referenced tests together, use tests that are not appropriately scaled or designed to measure growth upwards, and try to account for or impute missing longitudinal data. SAS ® EVAAS ® researchers also do not seem to be sufficiently controlling for many extraneous variables using even their most sophisticated controls and blocking methods. Such extraneous variables include parental contributions to learning outside of school, after school programming, pullout and intensive programs, tutor effects, prior teachers' residual effects on current year test scores, differential summer learning losses and gains, student motivation factors, peer and teacher interaction effects, and other variables impacting non-traditional classrooms (see also Haertel, 2011; Harris, 2011; Rothstein, 2009; Sanders et al., 2009; Shaw & Bovaird, 2011; Wilson, Hallman, Pecheone, & Moss, 2007).
These inconsistencies are also likely related to the proposition that SAS ® EVAAS ® output is biased by student demographics. This was evidenced in this study, particularly for the teachers who taught ELLs and an inordinate number of students previously retained in grade (see also Newton et al., 2010; Hill et al., 2011; Rothstein, 2009). HISD teachers also mentioned not wanting to teach high numbers of gifted, special education, or ELL students for fear of posting low SAS ® EVAAS ® scores (Collins, in progress). In addition, the issue of SAS ® EVAAS ® bias seems to hold true for teachers of high achieving or gifted students when ceiling effects prevent their students' aggregated scores from yielding significant growth (see also Wright, Horn, & Sanders, 1997). SAS ® EVAAS ® methodologists have recently verified that test ceilings are a concern as well, without yet providing suggestions about how to address this issue. Inversely, researchers have no evidence to date that regression to the mean artificially inflates value-added scores for teachers with large groups of low-scoring students.
Limited evidence also exists to indicate that SAS ® EVAAS ® output is related to at least one other correlated criterion (i.e., evidence of criterion-related validity), in this case in terms of the PDAS (see also Milanowski et al., 2004; Wilson, Hallman, Pecheone, & Moss, 2007). It is methodologically and pedagogically more beneficial that a teacher be classified similarly on at least one other, medium-to-highly correlated, unbiased measure that independently assesses the same construct at the same time before consequences are tied to value-added output (AERA, 2000). And this must happen before anyone can make a solid case that a teacher is effective or ineffective, or should be monetarily rewarded or contractually terminated (Baker et al., 2010; Harris, 2011; Hill, 2009; Hill et al., 2011; Newton et al., 2010; Papay, 2010). The more that multiple indicators converge or correlate (e.g., in terms of inter-indicator consistency; see, for example, Amrein-Beardsley, Haladyna, & Polasky, 2012), and the more years over which the indicators yield the same results, the stronger the accountability system should be, and the more justifiable high-stakes decision(s) surrounding teacher evaluation should become.
Either way, high-stakes decisions should not be made on the basis of value-added scores alone (AERA, APA, & NCME, 2000). The evidence presented here indicates that, at least in the cases of these four teachers, HISD is violating this highly relevant standard calling for multiple indicators, or distorting it as principals seem to be skewing at least some teachers' PDAS scores to match what appear to be the superior scores derived via SAS ® EVAAS ® (see also Baker, 2012; Garland, 2012; Ravitch, 2012).
Whether those at SAS ® EVAAS ® should share in the responsibility to ensure their system is used properly is a debate for another day. Perhaps the focus of such conversations needs to shift towards discussing how such systems are to be used, and who should be held responsible for ensuring they are used correctly and validly. While in this case it would be easy to blame the for-profit institution netting significant returns from model sales, perhaps it is not SAS's responsibility to ensure proper use of the SAS ® EVAAS ® . However, SAS ® EVAAS ® does have the responsibility of identifying effective teachers, schools, and systems, in a precise, unbiased, and reliable manner and providing "valuable diagnostic information about [instructional] practices," helping educators become more proactive and make more "sound instructional choices," and helping teachers use "resources more strategically to ensure that every student has the chance to succeed" (SAS, 2012). These deliverables are advertised in the SAS ® EVAAS ® literature and marketing materials. Yet these claims were countered with empirical, albeit case-based evidence in this study. Researchers further situated these findings in the ever-evolving literature base surrounding VAMs, as well as experiences from other HISD teachers (Collins, in progress).
In theory, VAMs allow for richer analyses of test score data because groups of students are simply followed to assess their learning trajectories from the time they enter a classroom to the time they leave. In practice, however, these models do not seem to work in the ways purported, and in this case, advertised. This was evidenced here, as researchers conducted one of the first studies to examine how this particular value-added system, as marketed, is working in practice. This is also the first study to look at this particular value-added system and its implications from the teacher's perspective.
We ultimately assert that even though results may not generalize far beyond the confines of this study, there is a lot to be learned, given the results presented, about the real impact of the SAS ® EVAAS ® on the very real lives of some teachers in Houston. Perhaps the methodologists pushing, and in this case selling the SAS ® EVAAS ® model for profit, are promising more than their model can and ever will deliver. What they are delivering, however, is also a series of unintended consequences, some of which are being exacerbated in HISD with its highly consequential use of SAS ® EVAAS ® output. These unintended consequences cannot continue to go unrecognized, and whether the unintended consequences outweigh the intended consequences warrants further research and evaluation.
Sidebar 1
The Eight Observational Domains of the Professional Development Appraisal System (PDAS)
I. Active Successful Participation in the Learning Process
II. Learner-Centered Instruction
III. Evaluation and Feedback on Student Progress
IV. Management of Student Discipline, Instructional Strategy, Time, and Materials
V. Professional Communication
VI. Professional Development
VII. Compliance with Policies, Operating Procedures and Requirements
VIII. Improvement of Academic Performance of all Students on the Campus

Table 1
Teacher A's SAS ® EVAAS ® and PDAS scores and ASPIRE bonuses (2007-2010)
Notes: Scores shaded as green indicate that the teacher added value according to SAS ® EVAAS ® data and in comparison to other similar teachers across the district. Scores shaded as red indicate the opposite. (1) Scores with asterisks (*) do not signify statistical significance, but the opposite. They signify that the scores were not detectibly different (NDD).

Table 2
Teacher B's SAS ® EVAAS ® and PDAS scores and ASPIRE bonuses (2008-2010)
Notes: Scores shaded as green indicate that the teacher added value according to SAS ® EVAAS ® data and in comparison to other similar teachers across the district. Scores shaded as red indicate the opposite. (1) Scores with asterisks (*) do not signify statistical significance, but the opposite. They signify that the scores were not detectibly different (NDD).

Table 3
Teacher C's SAS ® EVAAS ® and PDAS scores and ASPIRE bonuses (2007-2010)
Notes: Scores shaded as green indicate that the teacher added value according to SAS ® EVAAS ® data and in comparison to other similar teachers across the district. Scores shaded as red indicate the opposite. (1) Scores with asterisks (*) do not signify statistical significance, but the opposite. They signify that the scores were not detectibly different (NDD).

Table 4
Teacher D's SAS ® EVAAS ® and PDAS scores and ASPIRE bonuses (2007-2010)
Notes: Scores shaded as green indicate that the teacher added value according to SAS ® EVAAS ® data and in comparison to other similar teachers across the district. Scores shaded as red indicate the opposite. (1) Scores with asterisks (*) do not signify statistical significance, but the opposite. They signify that the scores were not detectibly different (NDD).

Other HISD teachers (Collins, in progress) noted concerns about this as well, again comparing the receipt of merit monies based on SAS ® EVAAS ® data to "winning the lottery." One eighth grade advanced English teacher noted:
I do what I do every year. I teach the way I teach every year. [My] first year got me pats on the back. [My] second year got me kicked in the backside. And for year three my scores were off the charts. I got a huge bonus, and now I am in the top quartile of all the English teachers. What did I do differently? I have no clue.
A 7th grade history teacher classified her past three years as "bonus, bonus, disaster." A social studies teacher added:
We had an 8th grade teacher, a very good teacher, the "real science guy," [who was a] very good teacher…[but] every year he showed low [SAS ® ] EVAAS ® growth. My principal flipped him with the 6th grade science teacher who was getting the highest [SAS ® ] EVAAS ® scores on campus. Huge [SAS ® ] EVAAS ® scores. [And] now the 6th grade teacher [is showing] no growth [as an 8th grade teacher], but the 8th grade teacher who was sent down [to the 6th grade] is getting the biggest bonuses on campus.
Teachers teaching gifted students report being able to "only get them up so much!" One teacher working with gifted students noted:
Every year I have the highest test scores, [and] I have fellow teachers that [sic] come up to me when they get their bonuses…One recently came up to me [and] literally cried, 'I'm so sorry.'…I'm like, 'Don't be sorry…It's not your fault.' Here I am…with the highest test scores and I'm getting $0 in bonuses. It makes no sense year-to-year how this works…. How do I, how do I, you know, I don't know what to do. I don't know how to get higher than a 100%.
Another 5th grade teacher working with gifted students explained:
I have students [in a 5th grade gifted reading class] who score at the 6th, 7th, & 8th grade levels in reading. But I'm like please babies, score at the 9th grade level, cause if you don't score at the 9th or 10th grade or higher in 5th grade with me, I'm going to show negative growth. Even though you, you're gifted, and you're talented, and you're high! I can only push you so much higher when you are already so high. I'm scared.
Teachers teaching in grades in which ELLs were transitioned into mainstreamed English-only classrooms also report being the least likely to add value. One 4th grade teacher noted:
I went to a transition classroom, and now there's a red flag next to my name. I guess now I'm an ineffective teacher? I keep getting letters from the district, saying 'You've been recognized as an outstanding teacher'…this, this, and that. But now because I teach English Language Learners who 'transition in,' my scores drop? And I get a flag next to my name for not teaching them well?
One social studies teacher stated:
Here's the problem: No principal wants to be called in by the superintendent or another superior and [asked], 'How come your teachers show negative growth but you have high evaluations on them? Are you doing your job? I don't understand. Your teacher shows no growth but you have [marked them] as exceeding expectations all up and down the chart?' Now it's not just this [sic] data over here that's gonna harm us, it's the principals [who are] adjusting our data over there to match the [SAS ® ] EVAAS ® . So it looks like they're being consistent.