State-Mandated Testing and Teachers' Beliefs and Practice

In this article, I examine the relationship between state-mandated testing and teachers' beliefs and practice. The studies reviewed suggest that state testing matters, influencing what teachers say and do. So, too, do other factors, such as teachers' knowledge of subject matter, their approaches to teaching, their views of learning, and the amalgam of experience and status they possess in the school organization. As a result, whatever influence state-mandated testing has on teachers and teaching would seem to depend on how teachers interpret state testing and use it to guide their actions. Moreover, that influence extends beyond individual perceptions and actions to include the network of constructed meanings and significance extant within particular educational contexts. Consequently, although a relationship between state-mandated testing and teachers' beliefs and practice does exist, testing does not


Introduction
As part of a larger movement to raise academic expectations for all students attending public schools in the United States, state-mandated testing programs reportedly tied to new or revised state curriculum standards are currently being developed and used in a majority of states. Over the past decade, almost all of the 48 states with statewide assessment programs report having revamped, or being in the process of revamping, their state tests to align them more closely with some or all of their state standards (Quality Counts, 2000).
One important assumption underlying some of the recent changes made in state-mandated testing is "that testing drives much of what teachers do, and so curricular and instructional change will occur if and when the tests change" (Grant, 2000, p. 2). If true, this statement suggests policy-makers sense "the potential for big pedagogical changes with a modicum of effort: Change the test and one changes teachers' practices" (p. 2).
A second assumption underlying some of the recent changes made in state-level assessments involves increasing the "power" or "stakes" of these tests through the use of rewards and sanctions (Schwille, Porter, Belli, Floden, Freeman, Knappen, Kuhs, & Schmidt, 1983). According to Heubert and Hauser (1999), "low" and "high-stakes" tests represent two fundamentally different ways of using testing in the service of educational policy goals:
A low-stakes test has no significant, tangible, or direct consequences attached to the results, with information alone assumed to be a sufficient incentive for people to act. The theory behind this policy is that a standardized test can reliably and validly measure student achievement; that politicians, educators, parents, and the public will then act on the information generated by the test; and that actions based on tests results will improve educational quality and student achievement. In contrast, high-stakes policies assume that information alone is insufficient to motivate educators to teach well and students to perform to high standards. Hence, it is assumed, the promise of rewards or the threat of sanctions is needed to ensure change. Rewards in the form of financial bonuses may be allocated to schools or teachers; sanctions may be imposed through external oversight or takeover by higher-level authorities. (pp. 35-36)
State policymakers have, in the past, typically relied upon low-stakes state-mandated testing to address a number of goals. In a profile of state-mandated testing programs in 1992-93, Barton and Coley (1994) report that the most prevalent purposes of state programs were accountability, instructional improvement, and program evaluation. With the arrival of standards-based reform and the explicit setting of standards that define what teachers should teach and students should learn, a greater role for state-mandated testing has emerged.
Some states, for example, use performance on state-mandated tests to make what are presumably "high-stakes" decisions with important consequences for students, such as whether a student is allowed to take a certain course or program, will be promoted to the next grade, or will graduate from high school (Heubert & Hauser, 1999). There can be important consequences for school districts and educators as well. As of 1996, at least 23 states reported attaching explicit consequences at the school level to state test results, such as funding, warnings, assistance from outside experts, loss of accreditation, and, in a few places, the eventual state takeover of schools (Bond, Braskamp, & Roeber, 1996).
Despite the widespread belief that state-mandated testing, standards-based or otherwise, contributes to educational improvement at the local level, evidence to support this claim has yet to be established. As Stake and Rugg (1991) point out:
The validity of increasing the use and importance of (state-mandated) tests in order to improve the schools is a long step further in the unknown. In sixty years of vast international research on school testing, the policy of emphasizing test performance in order to improve education has never been validated. (p. xx)
Zancanella (1992) makes a similar point, indicating that the increased use of state-mandated testing as a lever for change has not been paralleled by significant growth in knowledge about how testing affects teaching, learning, and schools in actual practice. He concludes that the idea that state-mandated testing will somehow lead to better teaching and learning expresses more hope than reality:
This shortage of empirical investigations means that the hopes of policy-makers and the public that more tests will somehow lead to better teaching or more learning rest on largely unvalidated assumptions. (p. 283)
Corbett and Wilson's (1991) study of statewide testing programs in Pennsylvania and Maryland is a case in point. Their research suggests that while statewide testing programs can influence educational activity at the local level, that activity might not be reform. Although state-mandated testing did appear to inspire "changes" at the local level in both states, Corbett and Wilson found those changes to be "merely differences" rather than "changes for the better" or improvements (p. 111).
Going further, Corbett and Wilson (1991) also contend that a formal trigger of consequences need not be built into a testing program for the stakes to be high. Based on their study of "low-stakes" state-mandated testing in Pennsylvania and "high-stakes" state-mandated testing in Maryland, they found that as the pressure to improve scores intensified in both states, educators, in Pennsylvania especially, reported taking the test more seriously, doing so "not because they believed that they were actually improving their instructional program," but for "political reasons" (p. 104). Corbett and Wilson conclude that the level of stakes educators and the public associated with state tests and testing programs was less a characteristic of the test or testing policy itself and more a characteristic of educators' and the public's perceptions of it. Building on the earlier and largely anecdotal work of Madaus (1988), they leave open the possibility that "people may attach a level of stakes to a test that is out of character with the formal consequences associated with it" (p. 26). In so doing, Corbett and Wilson suggest that stakes might be defined more by local perception than by state edict.
Taken together, these two assumptions suggest the need to explore how teachers interpret and engage with state-mandated testing. In this paper, then, I explore what is currently known about the relationship between state testing and teachers' beliefs and practice. By the term teachers' beliefs and practice, I refer to a complicated mix of ideas that includes what teachers believe and perceive about the work of teaching and how these ideas are expressed through language and action.
Based on the studies I review here, the existence of a relationship between state-mandated testing and teacher beliefs and practice is consistently confirmed. State-mandated tests do matter and do influence what teachers say and do in their classrooms. But while there is overall agreement that a relationship between the two does exist, the nature of that relationship is neither simple nor easy and requires further elucidation.

Selection and Analysis of Literature
The search began with an ERIC (Educational Resources Information Center) query using multiple and varied combinations of keywords such as teachers, teaching methods, tests, and testing. Because of my interest in teachers and teaching and in state tests and testing in the United States, I cross-referenced these descriptors with the keywords state and state-mandated testing. Works germane to this body of literature prompted further discovery of related books, journal articles, conference papers, project reports, essays, research studies, and historical materials. From this body of literature, I selected those works that focused specifically on state-mandated testing within the last ten years.
Much of the professional literature I was able to locate was theoretical rather than empirical. Only works that could be identified as qualitative or quantitative research were considered. Excluding "non-empirical" works (including, but not limited to, essays, anecdotal reports, and testimonials) reduced an extensive list of citations to a small body of work.
From the research available, I selected those studies that I believed met the standards for qualitative and quantitative research put forth by Howe and Eisenhart (1990). 1 Much of the research I located focused more on the relationship between state-mandated testing and students (see, for example, Natriello & Pallas, 1998) and, as such, implicated teachers and teaching only obliquely. Many studies lumped classroom teachers, teaching specialists (e.g., reading and special education teachers), counselors, department chairpersons, principals, and administrators together into a single category called "educators." If a study examined a group of educators and specifically addressed teachers as a subset of that group, it was included in this review (e.g., Glasnapp, Poggio, & Miller, 1991). Conversely, if no attention was paid directly to teachers' beliefs or practice, the study was not included (see, e.g., Corbett & Wilson, 1991; Bond & Cohen, 1991). In short, only a handful of studies specifically exploring teachers' perceptions of state-mandated testing and the import such tests hold for their practices remained. These studies are examined in this article.
Analysis began with my reading the target research to see what researchers had to say about the relationship between state-mandated testing and teachers' beliefs and practice. As I read each study, I first looked for points of comparison and connection. Preliminary review revealed two camps of researchers: those who said state-mandated testing has had significant influence on teachers' beliefs and practice, and those who said it has had little influence, if any. Some researchers said teachers perceived the state tests as having a mostly negative impact on their beliefs and practice, whereas others said teachers perceived the tests as a mixed bag. Almost all of the studies suggested that while state-mandated testing did influence what teachers said and did, so, too, did a number of other things, namely teachers' knowledge of subject matter, their approaches to teaching, their views of learning, and the amalgam of experience and status they possessed in the school organization.
Ideas and phrases drawn from the larger literature, such as the use of state-mandated testing as a lever for change (Grant, 2001), change as activity and difference vs. change as improvement (Corbett & Wilson, 1991), and teachers as policy brokers vs. teachers as implementors (Schwille et al., 1983), emerged as key to the second level of analysis. All of these ideas and phrases prompted consideration of two types of policy: teacher policy and external policy. 2 Put simply, the research suggested that just as state policymakers made policy, so, too, did teachers. Furthermore, the interaction and negotiation of the two seemed to occupy more of a middle ground "in the classic sociological contrast between professional autonomy and bureaucratic subordination. It pictures teachers as more or less rational decision makers who take higher-level policies and other pressures into consideration in their calculation of benefits and costs" (Schwille et al., 1988, p. 377).
As I read the research again, I considered these ideas and phrases in relation to other key ideas and phrases found in each account. I then compared the ideas in one account with those found in the other accounts. The elaboration of these concepts informed, but did not necessarily organize, the way I report the findings in the following section.

Findings
A review of the larger literature initially suggested that state-mandated testing influenced teachers' beliefs and practice both positively and negatively. Once I focused only on works that could be identified as research, however, empirical support for the claim that state-mandated testing positively influences teachers' beliefs and practice (e.g., Popham, 1987; Resnick & Resnick, 1985) seemed to vanish.
Consistently confirmed in all of the research I analyzed is the notion that state-mandated tests do matter and do influence what teachers say and do. But while there is overall agreement that a relationship between the two exists, the nature of that relationship is more complicated than clear. Some researchers, for example, contend that teachers' beliefs about state-mandated tests are mostly negative, whereas others say teachers' beliefs are mixed. Furthermore, this body of research suggests that some believe state-mandated testing is having a significant and wide-ranging influence on teachers' curricular and instructional practices, while others believe its influence is overstated and limited. As such, the research analyzed in this paper suggests that the relationship between state-mandated testing and teachers' beliefs and practice is neither simple nor easy and is in need of further clarification and qualification.
Brown (1992, 1993), Romberg, Zarinnia, and Williams (1989), Smith, Edelsky, Draper, Rottenberg, and Cherland (1989), and Smith (1991) contend that state-mandated testing greatly influences teachers and their work, and does so negatively. The drawbacks of state-mandated testing they report are: 1) the narrowing of curriculum and instruction; 2) the fostering of anxiety, confusion, fear, shame, anger, and/or mistrust; 3) the deskilling of teachers and/or a perception of powerlessness; 4) the invalidity and inadequacy of these tests as accurate measures of what is taught and learned; and 5) the loss of instructional time to test preparation and testing. While Haney's (2000) work generally corroborates these findings, he suggests that not all effects of tests on teaching are negative. In the paragraphs that follow, I discuss these ideas further.

The Significant and Wide-Ranging Influence of State-Mandated Testing
From open-ended interviews conducted with 30 fifth- and sixth-grade teachers and 12 principals in states with high-stakes state-mandated testing (i.e., Tennessee, Illinois, and New York), Brown (1992) found that "teachers reported altering the scope and sequence of the curriculum and eliminating concepts that were not covered on state tests" (p. 13). Moreover:
Participants reported a reluctance to use innovative instructional strategies (e.g., whole language approach, cooperative learning, high order thinking activities) and mentioned the use of more traditional instructional methods (e.g., lecture, recitation) due to the belief that these strategies would better prepare students for state tests. (p. 14)
In an earlier study of eighth-grade teachers' perceptions of the influence of mandated testing on mathematics instruction, Romberg et al. (1989) found much the same. The teachers in their study said that, because of the influence of state-mandated testing, they devoted more time to basic skills and less time to creative projects, cooperative learning activities, and computer use.
Drawing on the same data as his 1992 study, Brown (1993) indicates that teachers reported feeling confused about the purposes of state-mandated testing, perceived themselves as powerless in the face of state testing policy, mistrusted state education departments and state legislators, questioned the effectiveness of the tests in evaluating student achievement, expressed concern that test results were overemphasized by those mainly outside the profession (e.g., parents, the media), and did not consider state tests to be an accurate measure of student learning or school accountability. On the basis of these findings, Brown concludes that state-mandated testing negatively affected teachers' practices, and that the ways in which state testing mandates were interpreted and implemented by educators at the local level suggested a number of barriers between local educators and state policymakers. In general, educators reported a growing distrust of, and lack of faith in, decisions mandated "from above" (p. 29).
Smith et al.'s (1989) 15-month study of state-mandated testing effects in two elementary schools in a Phoenix metropolitan district revealed that high-stakes state-mandated testing significantly and negatively influenced teachers' beliefs and practice. In this study, Smith and her colleagues interviewed teachers, students, and administrators and directly observed classroom instruction, meetings, and school life generally. Key findings, as summarized in Rottenberg and Smith (1990), suggest that state-mandated testing encouraged the use of instructional methods and materials that resembled testing. For example, in one school where test scores fell slightly short of a year's growth in language, the principal created a daily review program that required students to answer multiple-choice questions on grammar, usage, punctuation, and capitalization.
A second finding is that teachers neglected material not included on the state tests. Specifically, teachers spent little time on science, social studies, and writing, concentrating instead on the skills tested by the state-mandated Iowa Tests of Basic Skills: reading, word recognition, recognition of errors in spelling, usage, punctuation, and arithmetic operations. Third, Smith and her colleagues conclude that state-mandated testing reduced the time available for other instruction. Based on classroom observations, they estimated that test preparation and testing consumed the equivalent of 100 hours, or approximately three to four weeks of school. In addition to pressure stemming from limited time and the need to improve student performance on these tests, teachers also reported pressure stemming from the publication of test results in newspapers and on television and the subsequent comparison of scores among schools and school districts. As a result, teachers indicated that they altered their instructional strategies and curricular emphases in ways they thought would improve students' scores.
In a follow-up report in 1991, Smith draws on her earlier work (Smith et al., 1989) and that of Haas, Haladyna, and Nolen (1989) and Nolen et al. The latter two studies are based on surveys and interviews of Arizona educators statewide. Smith states that while her work and that of Haas et al. and Nolen et al. were independent in conception, method, and execution, the findings of these studies confirm her own. Hence, she combines the information gained from these studies (although how she does so is unclear), suggesting that the effects of testing on teachers fall into not three but six categories. 3
According to Smith's (1991) follow-up report, one effect of state-mandated testing is that teachers experience negative emotions such as anxiety, shame, embarrassment, guilt, and anger as a result of the publication of test scores. Smith explains: "Many express frustration, feel off balance, out of control, and held to standards that, if the truth were known, are technically impossible for them to meet" (p. 9). She says that teachers believe editorial attention and media coverage of test scores fuel the dominant public perception of Arizona schools as failures, teachers as not particularly hard-working, and the educational bureaucracy as inept. As a result, Smith reports, teachers seek to avoid feelings of anxiety, shame, and pressure "by teaching to the test" so that students' scores improve (p. 9). The second effect relates to teachers' feelings of dissonance and alienation stemming from the conflict between their beliefs about the invalidity of the test and the need to raise scores. In short, because of the mismatch between what was taught and what was tested, teachers perceived the numeric test scores as having little value. Third, teachers felt anxious and guilty about the emotional impact these tests had on young children.
Interestingly, Smith is careful to point out that not every teacher shared these beliefs and that the influence of testing varied across grade levels:
Unlike the teachers of elementary grades, teachers of older pupils are more likely to dismiss negative effects of tests on pupils. Instead, they frequently complain of pupils "blowing off" the test and having no incentive to put in the effort the tests require. (p. 9)
The fourth effect Smith reports is that testing programs reduce the time available for instruction. The time taken up by testing, she summarizes, "significantly reduces the capacity of teachers to adapt to local circumstances and needs of pupils or to exercise any discretion over what to teach and how to teach it" (p. 10). Fifth, Smith notes that teachers' attention to tests "results in a narrowing of possible curriculum and a reduction of teachers' ability to adapt, create, and diverge" (p. 10). Smith states that subjects not covered by the test, such as social studies and health, were crowded out and, in some cases, disappeared altogether. Even instruction in the subjects that were tested (e.g., reading and math) was slighted, in that instructional approaches focused primarily on the ways in which these subjects were tested. The result, Smith says, was the narrowing of the curriculum and the deskilling of teachers:
Faced with the "packed curriculum" (a set of requirements, tests, programs, scope and sequence, extra programs ..., that exceeds the ability and time of any teacher to cover all of them competently) and the restricted number of instructional hours available, some teachers aligned their actions with expectations. They began discarding what was not to be tested and what was not part of the formal agenda and high priorities of the principal and district administrators. (p. 10)
Sixth, Smith concludes with the contention that testing ultimately deskills teachers: "Because multiple-choice testing leads to multiple-choice teaching ... the methods that teachers have in their arsenal become reduced, and teaching work is deskilled" (p. 10).
Although most of what Haney (2000) reports focuses on the effects that high-stakes state testing has had on minority and at-risk students in Texas, of relevance to this paper are three surveys of Texas educators: one conducted by Haney in 1999 and two others undertaken independently by Gordon and Reese (1997) and Hoffman, Pennington, Assaf, and Paris (1999). Prompted by the introduction of a new "criterion-referenced," high-stakes state testing program in the fall of 1990, all three surveys explored teachers' perceptions of the Texas Assessment of Academic Skills (TAAS). These researchers deemed teachers' perceptions of the TAAS important to study not only because the state testing program was new, but because it represented an important shift in the focus of state testing in Texas from "minimum skills to academic skills" and toward testing "high-order thinking skills and problem solving ability" (Texas Education Agency, 1997, p. 1).
Although each of the surveys polled somewhat different samples of Texas educators 4, Haney combines the information gained from his study with that of Gordon and Reese (1997) and Hoffman et al. (1999) and identifies four common themes: 1) teachers appear to be devoting a huge amount of time and energy to preparing students specifically for the TAAS; 2) the emphasis placed on the TAAS is hurting more than helping teaching and learning; 3) the emphasis on the TAAS is particularly harmful to at-risk students; and 4) the emphasis on the TAAS contributes to both retention in grade and students dropping out of school. These themes suggest that Texas educators generally viewed the effects of the TAAS as harmful and, as such, concur with the other findings described in this section. It is important to note, however, that a small minority of respondents in Haney's (2000) and Hoffman et al.'s (1999) studies did report that the TAAS had some positive effects, namely that students were learning more basic and test-taking skills and achieving better test scores as a result. Interestingly, these respondents often qualified their answers in their written comments, stating that such learning did not occur without some sacrifice. Two teachers' written comments illustrate this point:
Students are learning more of the basic skills [the] TAAS tests because teachers are figuring out better ways to teach them. Students are NOT receiving a well-rounded education because Social Studies & Science are being cut to teach TAAS skills. (Haney, 2000, Part 6, p. 10)
Yes, there is increased learning but at a partial price. I have seen more students who can pass the TAAS but cannot apply those skills to anything if it's not in TAAS format. I have students who can do the test but can't look up words in a dictionary and understand the different meanings. They can write a story but have trouble following directions for other types of learning. As for higher quality teaching, I'm not sure that I would call it that. Because of the pressure for passing scores, more and more time is spent practicing the test and putting everything in TAAS format. (Haney, 2000, Part 6, p. 10)
These two comments suggest that an increase in test scores might reflect an increase in the acquisition of basic and test-taking skills and knowledge, but not what some call understanding (Cohen, McLaughlin, & Talbert, 1993; Wiggins & McTighe, 1998; Wiske, 1998). Furthermore, the second comment in particular suggests that an increase in learning might not mean higher quality teaching. Both comments imply that the motivation fueling instructional change at the classroom level via state-mandated testing might be more political than educational. Such comments are consistent with Corbett and Wilson's (1991) findings noted earlier. Recall that in their study of state-mandated testing in Pennsylvania and Maryland, Corbett and Wilson found that as the pressure to improve scores intensified in both states, some educators attended to the tests more seriously "not because they believed that they were actually improving their instructional program," but for "political reasons" (p. 104). This claim is further bolstered by a handful of respondents who indicated that the substantial gains in TAAS scores were due not necessarily to "increased learning and higher quality teaching," but to the TAAS tests "getting easier over time, to schools excluding low scoring students, or to administrators' cheating" (Haney, 2000, Part 6, p. 10).
The studies reviewed in this section indicate a number of things. First, teachers say that they are tailoring their curricula and instruction to the form and content of concepts covered on the state tests. They also suggest that what is tested appears to become what is emphasized in their classrooms, with the content and form measured on the tests occupying most of the instructional time. Other subjects, such as social studies and science, although worthy, appear to get crowded out. Furthermore, these studies intimate how limited, limiting, and deleterious the quest to improve state test scores may be for teaching and learning alike.

The Overstated and Limited Influence of State-Mandated Testing
Some of the research, on the other hand, suggests that the influence of state-mandated testing is overstated and limited (Firestone, Mayrowetz, & Fairman, 1998; Glasnapp, Poggio, & Miller, 1991; Grant, 2000, 2001; Zancanella, 1992). These researchers posit that the influence of state tests and/or testing programs, as motivators of curricular and instructional reform at the local level, may be non-existent or of very low intensity. They also contend that teachers' perceptions of state-mandated testing are mixed and that state-mandated testing is one of many interacting influences (Grant, 2001).
Glasnapp, Poggio, and Miller (1991) surveyed school board members, superintendents, building principals, teachers, parents, and students concurrent with the administration of Kansas's low-stakes minimum competency testing program in 1982, 1983, and 1987. From the state's 325 districts, 1,358 teachers responded to the survey in 1982, 816 in 1983, and 1,244 in 1987. Teachers from grades two to eleven were randomly selected within each district; however, the authors did not report how representative this sample was of the larger teacher population.
Based on the results of their survey, Glasnapp, Poggio, and Miller (1991) report that one action, increasing the emphasis on the state's objectives in the curriculum, was taken by more than half (57%) of the teachers in 1987. This result varied notably across grade levels: teachers at grade levels where state tests occurred (2, 4, 6, 8, and 10) generally reported this action at higher rates than teachers at grade levels with no state tests (3, 5, 7, 9, and 11). The tenth grade was a notable exception.
Fewer than 20% of the teachers reported that they narrowed the curriculum, sacrificed instruction in other skill areas, or changed their instructional methods because of the state tests. They did indicate, however, that the presence of minimum competency testing led to a 40% increase in such activities as drills, coaching, and practice on testing. It is interesting to note that while drills, coaching, and practice on testing may have increased, teachers did not necessarily view such activity as a narrowing of the curriculum, a sacrifice of instruction, or a change in their instructional methods overall. And while teachers in this survey reported feeling that the pressure for a high level of performance consistently increased over time, they did not report, as a group, significant change in their overall practices as a result. Glasnapp, Poggio, and Miller conclude that, as a source of sanctions that could motivate curricular and instructional reform at the local level, Kansas's low-stakes testing program was non-existent or of very low intensity.
Firestone, Mayrowetz, and Fairman (1998) report that low- to moderate-stakes "performance-based" state-mandated testing programs in Maine and high-stakes testing programs in Maryland influence teachers' content decisions, but that the influences are weaker than expected, especially in terms of instruction. Using an embedded case-study design, they report data gained from semi-structured interviews with administrators and teachers as well as from site visits to districts and individual classrooms. In their examination of educators' responses to middle school mathematics state assessments that were decidedly more "performance based" (in that they include sections where students construct answers rather than select from an array of possible answers), Firestone et al. report that they saw considerable activity focused on the particular state test itself, such as aligning the subjects taught with the test as well as what is commonly called "teaching to the test" (e.g., teaching test-taking skills, using materials like those on the test, and focusing primarily on content known to be covered on the test). Interestingly, it was the use of classroom observations, in particular, that led Firestone et al. to conclude that the effects of state-mandated testing on teaching may be overrated by advocates and opponents of state-mandated testing alike. Although three times as many teachers in Maryland as in Maine reported making changes in their teaching to accommodate the state test, classroom observations and the exploration of instructional patterns 5 revealed that the teaching in both states was relatively the same. Firestone et al. therefore conclude that while state assessments, when combined with moderately high stakes and other conditions, may generate activity focused on the test itself and may promote certain observable changes, such as the alignment of subjects taught with the test, they appear less successful in changing how teachers teach those subjects. They argue that policymakers' "fiddling" with the use of assessments as tools of educational reform ultimately misses the major point, which is that "assessment policy will not get around the need to ensure that teachers have a solid foundation in the subjects they teach and clear understanding of how to help children learn those subjects" (p. 112).
Following a line of argument similar to that of Firestone, Mayrowetz, and Fairman (1998), both of Grant's studies (2000, 2001) indicate that while state testing may influence teachers' decisions about what to teach, it does not necessarily influence how teachers teach it. Drawing on a larger multi-year, multidisciplinary study of the relationships between national, state, and local education reform efforts and school and classroom practices in New York State, Grant (2001) reports on the relationship between high-stakes state-level testing and two high school social studies teachers who teach in the same suburban school and prepare students for the same eleventh-grade state test. Based on interviews with these two teachers and observations of them teaching a civil rights unit, he found little direct influence of state-mandated testing on either teacher's content or pedagogical decision making. Grant's findings indicate that state tests are but "one of several interacting influences on teachers' instructional practice, none of which is primary at all times" (p. 422). Grant argues that while testing may be a lever of policy reform, it is an uncertain one. He finds that while the influence of the state test in social studies is apparent, "that influence interacts with a range of other factors, particularly the teachers' view of subject matter and learners" (p. 414).
In another study, based on focus group interviews with elementary and secondary teachers from rural, urban, and suburban districts in New York State, Grant (2000) reports considerable variability in how the consequences of recent changes to the state's testing program are playing out, with as many unintended as intended consequences. Based on these interviews, he contends that teachers see the new state tests, reportedly tied to the new state learning standards, as "a mixed bag":
The prospects of tests which more closely mirror and support thoughtful instruction and closer collaboration with colleagues are mitigated by the problems of, among other things, uncertainty about the rationale for and consequences of the new tests and the unevenness of the opportunities to learn about and respond to changes in the tests. In short, teachers across grade levels and subject matters express an uneasy combination of hope and fear, anticipation and dread. (p. 6)
While most of these teachers "praised state efforts to bring standardized assessments into closer alignment with the kind of ambitious instruction they believe is important" (p. 7), they also expressed concerns "that the new tests could produce undesirable effects," namely reductionistic approaches to learning and teaching and an increased emphasis on remediation (p. 14).
Unsurprisingly, all of the teachers in this study reported being concerned about students' scores on these new tests. What is intriguing, however, is Grant's (2000) observation that suburban teachers seemed even more concerned about their students' performance than their urban peers were. Uncertain how their students would perform on the new state tests, teachers felt anxious about the tests and the potential threat they posed to their schools' standings. Grant suggests that one possible explanation is that "not all suburban districts are equal" (p. 15) and that more is at stake for those suburban districts that have consistently ranked in the top quartile (according to a highly publicized local business magazine):
Top quartile spots on this list have real consequences for real estate values, bragging rights, and the like, and so the scramble to move up can be intense ... . School people in high performing [suburban] schools want to maintain their position; educators in middle and low performing [suburban] schools hope to at least avoid dropping further. (p. 15)
Grant's (2000) findings suggest that teachers also perceive reforms differently across grade levels. Teachers at the high school level, for example, indicated that they felt pressured by their principals to ensure higher scores. While elementary teachers did not report feeling less pressure than their high school peers to improve students' scores, they reported that their principals were "more likely to talk about test scores as part of a bigger picture of how students are progressing" (p. 16). As a result, elementary teachers perceived the building principal as less of a factor.
Grant's (2000) findings also suggest real and important differences in the ways teachers perceive reforms by subject matter. He reports that while state tests of language arts, mathematics, and science at the elementary and secondary levels have undergone substantial transformations, with fewer multiple-choice items and a greater number and range of performance tasks, the changes made to the social studies assessment, in comparison, "are less dramatic" (p. 7). The persistence and heavy presence of generally low-level multiple-choice questions (which account for 50 to 55% of a student's score) on New York State's new and revised high school social studies assessments have led secondary social studies teachers, in particular, to argue that the test has changed little overall.
As in Haney's (2000) report mentioned earlier, however, not all of the consequences of state-mandated testing reported by teachers in Grant's (2000) study were negative. Several teachers, especially elementary and high school math and English teachers, cited greater collaboration with their peers. The development of informal networks and relationships thus emerged as one of the key benefits of the changes made to the state-mandated testing program.
Zancanella (1992) investigated the influence of the language arts segment of the low-stakes, state-mandated Missouri Mastery and Achievement Tests, given in grades 6, 7, 8, and 9, on the thought and action of three tenured middle school/junior high school English-language arts teachers. These case studies revealed that the import the new state tests held for these teachers' thinking and practice in the teaching of literature was related to two factors: 1) the fit between the teacher's preferred approach to teaching literature and the conceptions of literature embodied in the state tests, and 2) the amount of "curricular power" the teacher held, that is, the teacher's place in the curricular decision-making structure of the school. He concludes that teachers' responses to state policy were cast in terms of their prior learning, beliefs, and attitudes.
In other words, the biography of the teacher's own past experiences was an important force in the ways in which these three teachers responded, especially as they related to: (a) the degree to which teachers' conceptions of the subject matches the conception of the subject the tests represent, a version of what is often called "curricular alignment"; and (b) the amount of what might be called "curricular power" the teacher possesses, the amalgam of experience, status, and position in the school organization that determines how much say the teachers has in both formal and informal decisions about which ways of teaching a subject are viewed as legitimate. (p. 292)
Ms. Kelly, for example, though tenured, reported feeling pressure from the school administration to have her students perform well on the tests and a need to change her teaching style to do so. Ms. Martin, who taught in the same school as Ms. Kelly, saw her inductive approach to literature and her nurturing of lifelong readers as being at odds with the state tests, but did not feel threatened that low test scores would damage her reputation or position. As department chair and a veteran teacher with thirteen years of experience, she reported that she felt free to "go beyond quiet resistance to 'hammer away' at her principal" about issues related to the test (p. 228). The third teacher, Mr. Davidson, saw the tests as compatible with his idea of teaching literature, although at times he saw them as intrusive and counterproductive. Despite his misgivings, his way of teaching literature was seen as compatible, if not "aligned," with the new test. Mr. Davidson, therefore, did not see a need to adjust his teaching and believed the new test had little direct consequence for his teaching.
The studies reviewed in this section indicate that teachers' interpretations of state testing are influenced not only by state testing itself, but also by the particular beliefs, knowledge, and experience individual teachers possess. The view that teachers' practices are subject to multiple influences, therefore, emerges as key. According to Grant (1999), such a view fosters a richer understanding of teacher decision-making, in that "[i]t obviates the notion that any single factor ... or set of factors ... substantially influence[s] teachers' practices. Instead, what teachers do in their classrooms is likely to be influenced by a range of factors reflecting a variety of sources" (p. 238).

Discussion
All of the studies reviewed confirm that state-mandated testing does matter and does influence what teachers say and do. But while these studies suggest that the instructional methods teachers employ, the materials they use, and the activities they plan are, to some degree, shaped by the form and content of state-mandated tests and the state objectives that accompany them, there appears to be no clear or consistent pattern of influence. As such, the research currently available presents a picture more complicated than clear and calls for further elucidation.
Some of the studies indicate that the effects of statewide testing vary according to the "stakes" involved. Because high-stakes tests and testing programs are used for important decisions, they are assumed to have more power than low-stakes tests and testing programs to modify local behavior (Heubert & Hauser, 1998; Madaus, 1988). Following this line of argument, high-stakes tests are more likely to affect, if not constrain, teachers' beliefs and practice. Brown (1992, 1993), Smith et al. (1989), and Smith (1991), for example, argue that teachers from states with high-stakes state-mandated testing (e.g., Arizona, Illinois, New York, and Tennessee) reported, and were observed, tailoring their instructional methods, materials, and activities to the type of performance elicited by the state tests. Under these conditions, Brown, Smith et al., and Smith assert that the state tests became the goal of instruction rather than the means to assess it. In addition, these researchers contend that the attention the media, the state education department, and various people at the local level (e.g., administrators, principals, school board members, parents, and community members) pay to test scores may catapult state tests into even higher-stakes status. In short, this camp of researchers argues that high-stakes state-level testing serves to constrain, if not homogenize, instruction.
This, however, does not appear to be the case in Grant's (2001) study of two high school teachers in New York State, a state proud of its high-stakes state tests, known as the Regents Examinations. Grant suggests that while the influence of the state Regents exam was apparent, that influence held no privileged position and interacted with a range of other factors, particularly the teachers' views of subject matter and learners. Both teachers sought to prepare students for the same high-stakes test, and yet their instructional practices were radically different. Muddying the waters further, Firestone, Mayrowetz, and Fairman's (1998) observations led them to conclude that the teaching of math was much the same in Maine and Maryland despite the differences in stakes and state tests. What Firestone, Mayrowetz, and Fairman's work reveals, in particular, is that under some circumstances, state-mandated testing policies and the stakes attached to them can promote specific classroom behaviors and procedures more easily than deeper understandings of subject matter and how to teach it. Taken together, Grant's (2001) and Firestone, Mayrowetz, and Fairman's (1998) studies show that while the state test and the stakes involved may have influenced these teachers' practices in some way, they did not necessarily construct or determine the instruction that was ultimately provided. Without denying that the effects of statewide testing could vary according to the stakes involved, these researchers suggest that the claim that high-stakes testing encourages teachers to teach to the test may be overrated.
Instead, their work prompts important consideration of whether all high-stakes state-mandated testing is necessarily high-stakes for everyone (especially for those who teach in districts with large numbers of students failing state tests), and of how teachers who are significantly influenced by state-mandated testing differ from those who are minimally influenced.
Other factors, in addition to the stakes involved, teachers' subject matter knowledge, and teachers' views of learners and teaching, may also interact with the state test to influence teachers' beliefs and practice. The grade level taught (e.g., elementary vs. secondary; tested vs. untested grade levels) emerged as a potential factor (Glasnapp, Poggio, & Miller, 1991; Grant, 2000; Smith, 1991), as did the amalgam of knowledge, beliefs, experience, status, and position a teacher possesses (Zancanella, 1992). District and building-level expectations, local context and conditions, and state and district policy climate also emerged as possibilities (Brown, 1992, 1993; Firestone, Mayrowetz, & Fairman, 1998; Grant, 2001; Smith, 1991; Smith et al., 1989). So while state-mandated testing may be an influence, it would appear to be one of many. Consequently, although a relationship between state-mandated testing and teachers' beliefs and practice does exist, state-mandated testing does not appear to be an exclusive or primary lever of change.
Given what is currently known about the general relationship between state-level testing and teaching, one caveat must be offered. Research in this area tends to rely heavily on teachers' perceptions gathered through surveys and interviews (e.g., Brown, 1992, 1993; Glasnapp, Poggio, & Miller, 1991; Grant, 2000). While it would be foolish to discount self-reported and interview data, the lack of observational data leaves open the question of how tests influence teaching in actual practice. Even when teachers report change, for instance, little is known about what that change looks like or whether change has occurred at all. Only a few studies currently couple surveys and interviews with classroom observations (Firestone, Mayrowetz, & Fairman, 1998; Grant, 2001; Smith et al., 1989; Smith, 1991; Zancanella, 1992). While in-depth interviews may provide access to teachers' perceptions, coupling interviews with classroom observation allows both thought and action to be put in context. Such coupling appears to provide data not only on teachers' understandings of what and how to teach, but also on how those understandings are operationalized and carried out. That said, the question of why teachers changed or did not change in response to state-mandated testing warrants further exploration. For these reasons, I believe that studies that contextualize teachers' beliefs and practice hold considerable promise for future research. For if school reform via state-level testing is to prove constructive for education, research on how teachers understand and interpret new policy in the context of their knowledge, beliefs, experience, and teaching circumstances is vital.

Conclusion
The studies reviewed suggest that while state testing does matter and does influence what teachers say and do, so, too, do other things, such as teachers' knowledge of subject matter, their approaches to teaching, their views of learning, and the amalgam of experience and status they possess in the school organization. As a result, the influence state-mandated testing has (or not) on teachers and teaching would seem to depend on how teachers interpret state testing and use it to guide their actions. Moreover, the influence state testing may or may not have on teachers and teaching extends beyond individual perceptions and actions to include the network of constructed meanings and significance extant within particular educational contexts. How tests matter, then, is not always clear or simple.
Given the limited number of studies currently available and the limited nature of the data on which many of these findings are based, studies that provide a richer, more in-depth understanding of the relationship between state-mandated testing and teaching in actual school settings not only point toward important directions for continued research in this area, but are greatly needed. For if state-mandated testing continues to be viewed as a viable mechanism of educational reform, it is necessary to understand the ways in which this mechanism is mediated through local contexts and the minds, motives, and actions of teachers.
Notes
1. Howe and Eisenhart (1990) provide five general standards for qualitative and quantitative educational research, specifically, that there be: 1) a fit between research questions and data collection and analysis techniques; 2) the effective application of specific data collection and analysis techniques; 3) an alertness to and coherence of background assumptions; 4) an overall warrant; and 5) an awareness of both external and internal value constraints.
2. Schwille et al. (1983) describe teacher policy "as the definitive allocation of public resources by working-level personnel in education" and external policy "as policy in the usual sense-the laws, regulations, and other directives of boards, legislatures, and executive departments" (p. 376). In this paper, external policy refers to state tests and/or a state-mandated testing policy.
3. I am referring to Smith et al.'s (1989) three key findings stated in the previous paragraph. Smith posits that 1) teachers were encouraged to use instructional methods and materials that resembled state-mandated testing; 2) content areas not included on the state tests were neglected; and 3) time devoted to test preparation and testing reduced time available for other instruction. In her follow-up study, she expands this list from three to six.
4. Haney (2000) surveyed secondary math and English/Language Arts teachers, Hoffman et al. (1999) surveyed reading specialists statewide, and Gordon and Reese (1997) surveyed Texas teachers who were "graduate students in educational administration" (p. 349).