Teacher Evaluation as a Policy Target for Improved Student Learning: A Fifty-State Review of Statute and Regulatory Action Since NCLB

This paper reports on an analysis of state statutes and department of education regulations in the fifty states for changes in teacher evaluation since the passage of the No Child Left Behind Act of 2001. We asked what the policy activity for teacher evaluation is in state statutes and department of education regulations, how these changes in statutes and regulations might affect the practice of teacher evaluation, and what implications for instructional supervision follow from these policy actions. Teacher evaluation statutes and department of education regulations provided the data for this study. Archival records from each state's legislature and education department were placed into a comparison matrix based on criteria developed from the National Governors Association (NGA) goals for school reform (Goldrick, 2002). Data were analyzed deductively in terms of these criteria for underlying theories of action (Malen, 2005), trends, likely effects on teacher evaluation, and implications for supervision. The majority of states adopted many of the NGA strategies, asserted oversight and involvement in local teacher evaluation practices, decreased the frequency of veteran teacher evaluation, and increased the types of data used in evaluation. Whether or not the changes in teacher evaluation will improve student learning in the long run remains to be seen.


Introduction
Throughout its complex history, supervision has long held the promise to improve teachers and their classroom instruction (Hazi & Arredondo Rucinski, 2005, 2006). This is largely due to the fact that supervision is usually understood as teacher evaluation in the schools (see Holland, 2005, among others). With the adoption and implementation of No Child Left Behind (U.S. Department of Education, 2002) and the resultant call by the National Governors Association (NGA) to target teacher evaluation policy as a way to achieve the goal of a highly qualified teacher in every classroom, policy makers focused efforts on this promise to improve student learning (Goldrick, 2002). The NGA identified six policy goals for improving student learning: define teacher quality, focus evaluation policy on improving teaching practices, incorporate student learning into teacher evaluation, create professional accountability through developing career ladders, train evaluators in pre-service programs, and broaden participation in evaluation designs (Goldrick, 2002). While the national scene has shifted away from a direct focus on teacher evaluation in recent months, it seems likely that because of the renewed interest in pay for performance plans, evaluation will soon be a policy target once more.

Purpose and Research Questions
The purpose of this research is to determine the extent to which the identified NGA goals appear in individual state statutes and regulations, and to consider the likely effects on teacher evaluation and the implications for instructional supervision. Thus, we focus on three research questions: First, what is the policy activity for teacher evaluation in state statutes and department of education regulations? Second, how might these changes in statutes and regulations affect the practice of teacher evaluation? Third, what implications for instructional supervision are likely to result from these policy actions?

Background
Teacher evaluation statute and policy have long been a topic of research (e.g., Furtwengler, 1995; Wise, Darling-Hammond, McLaughlin & Bernstein, 1984; Wuhs & Manatt, 1983; Zirkel, 1979-90). Prior to the 1980s, teacher evaluation was left to local discretion (Veir & Dagley, 2002; Zirkel, 1979-80). Since the 1980s, however, policy activity has tended to ebb and flow with various national initiatives. For example, in response to A Nation at Risk (The National Commission on Excellence in Education, 1983), some states targeted teacher evaluation to upgrade teacher quality (Hazi & Garman, 1988). Also, Furtwengler (1995) found that states enacted their first requirements for teacher evaluation; specified criteria, procedures, tenure and instruments; attempted performance evaluation systems; and offered training in evaluation. Furthermore, states in the southeast were more active and detailed in their revisions, while those in the northeast had the least regulation of teacher evaluation.
As a result of No Child Left Behind's demand for highly qualified teachers in every classroom, teacher evaluation became a policy target in the states. The National Governors Association targeted evaluation as "a tool for instructional improvement" (Goldrick, 2002, p. 3). Since the National Governors Association is one of the organizations most influential over educational policy in the United States (Swanson & Bariage, 2006), it is important to see how this organization has influenced teacher evaluation policy in the states during this era of accountability, especially since its practice has been historically a matter of local judgment and discretion. Initially we wondered whether some states would be more prescriptive than others in their approach to teacher evaluation, and whether there was a trend to embed recommended practices from supervision and professional development into state statute and department of education regulations.
Also, while some scholars hold the opinion that traditional forms of evaluating teachers (i.e., supervisory observation) have "served the profession of teaching well for decades," are "by and large unproblematic," and are not the "hot button policy issues in current political debates" (Glass, 2004, p. 1), we believe that evaluation is flawed, contested, and problematic. We believe that existing evaluation statutes and regulations will be changed to try to make teachers more accountable through this highly ritualistic procedure, and in so doing, will further complicate a flawed practice. In addition, there has been much confusion about supervision and evaluation. Researchers tend to view them as separate processes, while practitioners believe them to be synonymous (e.g., Holland, 2005). We attempt to differentiate supervision and evaluation by purpose, i.e., supervision as the helping or teacher professional development function, and evaluation as the personnel function. In fact, much effort was expended during the 1960s and 1970s to refocus evaluation so that it was more democratic and to change the perception of supervision from that of an "evaluative function" to that of a "helping function" through clinical supervision models.
This tension between the dual purposes of evaluation and supervision is not new. According to Glanz (1998), it has been evident in the supervision literature since the work of Hosic (1920). Commenting on the intractability of these purposes, Glanz (p. 64) cited Tanner and Tanner as having argued (in 1987) that the conflicting duality in purpose has presented an "almost insurmountable dilemma for educators" and "is probably the most serious and, up until now, unresolved problem in the field of supervision." Conflicting perspectives about teacher learning may be even older than Hosic's work. In the early 1900s John Dewey developed a theory of instrumental education that advanced the notion that engagement in real world problem solving with intelligent thought and action constituted learning. According to Dewey (1910), authentic learning only occurs when human beings focus their attentions, energies, and abilities on solving dilemmas and complexities while reflecting on their experiences. This view of learning appears especially relevant to both supervisors and teachers, and with Dewey and others advocating authentic learning experiences for students, it seems likely that some supervisors focused time and energy on helping teachers think about their teaching in ways that stimulated their own learning. And, as supervision and evaluation processes became more democratic, emphasis on the goal of professional development of teachers for the purpose of improving classroom instruction became increasingly prevalent.
While some clinical supervision models both implied and intended reflection as professional development, a direct focus on using teacher reflection as a strategy for improving teaching did not appear as a significant part of the supervision literature until the 1970s. According to Pajak (1993), developmental and reflective supervision models first began to appear following the publication of Schön's (1983) book on the reflective practitioner. Garman (1982) was among the first to write about reflective practice for in-service supervision, while Zeichner and Liston (1987), Grimmett and Erickson (1988), and others made reflection popular in pre-service teacher education.
As noted by Glanz (1998), concomitant conflicting trends were evident in the field. During this same time period, much of the effective teaching literature moved toward more technical or didactic models of teaching (see, e.g., Acheson & Gall, 1980; Hunter & Russell, 1977; Joyce & Showers, 1982). Similarly, an emphasis on principals as instructional leaders became more prevalent, along with the increasingly technical/didactic models of supervision (Acheson & Gall, 1980; Hunter, 1986; Pajak, 1993). This was especially true on the West Coast of the United States, as "effective teaching models" and "effective learning environments" were described in considerable detail (Bransford & Vye, 1989; Brophy & Good, 1984; Marzano, Pickering, Arredondo, Blackburn, Brandt, & Moffett, 1992; 1996), and as developmental models of supervision (Glickman, Gordon, & Ross-Gordon, 1985; Sergiovanni & Starratt, 1993), model(s) for cognitive coaching (Costa & Garmston, 1994), and reflection models for mentoring teacher development (Arredondo & Rucinski, 1998; Reiman & Thies-Sprinthall, 1998), among others, were promulgated. Along with these models, arguments about supervision's dual and conflicting purposes (the helping vs. evaluating purposes) continued and further developed. The ongoing tension is currently reflected in views of supervision as instructional leadership. As states have moved to adopt the National Governors Association strategies for defining teaching quality, and adding practices that encourage professional development, the implied theory of action is that increasing professional teacher behaviors through development activities and embedding these into state statute and policy regulations will lead to improved student learning.

Methods
Teacher evaluation statutes and department of education regulations provided the data for this study. These data were accessed through the websites of each state's legislature and education departments and collected in three phases. Both statute and regulation were reviewed in each state since these usually work in tandem. While statute typically provides a minimalist's perspective on such items as evaluation procedure, due process, and grounds for dismissal, it is state regulation that provides the details of its practice. Some practices, however, may have found their way into statute and, thus, become institutionalized (e.g., aspects of clinical supervision in the 1970s such as the preconference). In this research the statute was viewed as the foundation and the state's regulation was considered its details. In the analysis, both were examined to reveal the extent to which procedures and practices have become embedded within statute, and thus, less likely to be amended. Manuals and other documents on a state's website were also included when available.
Various sources were used to construct a comparison matrix to collect and analyze the state statutes and policies. The six NGA policy strategies (Goldrick, 2002) were used to first create criteria for the matrix. In addition to these, the work of such scholars as Furtwengler (1995), Peterson (2004), Pipho (1991), Rossow and Tate (2003), Wise et al. (1984), and Zirkel (1996) was also consulted. As new criteria emerged to account for novel or unanticipated changes in state statute and policy, categories reflecting these criteria were added to the matrix.
Once evaluation statutes and regulations were collected and analyzed, it seemed helpful to categorize the levels of state control over teacher evaluation practices, especially since some states went to great lengths to achieve oversight, while others left much to local discretion. We therefore developed a four-level state control rating. In Level 1, local discretion, the least prescriptive, the state department of education delegates choice and control of the evaluation policy, criteria, and the instrument to the local school district. In Level 2, remote control, the state allows the evaluation policy, criteria, and instrument to be determined locally, but must approve, monitor, and/or inspect it. In Level 3, definitional control, the state is more involved locally by specifying the criteria by which teachers are to be evaluated. In Level 4, procedural control, the state is most involved with local practices by specifying the instrument and/or procedures by which teachers are to be evaluated.
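The four-level rating can be read as a simple decision rule. The following is a minimal sketch in Python, not the coding instrument used in the study: the three boolean flags and the function name are hypothetical, and the rule that a state receives the highest level that applies is our assumption about how overlapping provisions would be resolved.

```python
from enum import IntEnum


class StateControl(IntEnum):
    LOCAL_DISCRETION = 1      # district chooses policy, criteria, and instrument
    REMOTE_CONTROL = 2        # district chooses; state approves, monitors, or inspects
    DEFINITIONAL_CONTROL = 3  # state specifies the evaluation criteria
    PROCEDURAL_CONTROL = 4    # state specifies the instrument and/or procedures


def rate_state_control(specifies_instrument_or_procedure: bool,
                       specifies_criteria: bool,
                       approves_monitors_or_inspects: bool) -> StateControl:
    """Return the highest applicable level (assumed tie-breaking rule)."""
    if specifies_instrument_or_procedure:
        return StateControl.PROCEDURAL_CONTROL
    if specifies_criteria:
        return StateControl.DEFINITIONAL_CONTROL
    if approves_monitors_or_inspects:
        return StateControl.REMOTE_CONTROL
    return StateControl.LOCAL_DISCRETION


# Hypothetical example: a state that only approves locally designed policy.
example_level = rate_state_control(False, False, True)
print(example_level.name)
```

Checking the most prescriptive condition first keeps the levels ordinal, so a state that both approves local policy and mandates an instrument is rated once, at the higher level.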
In the first phase of the study, 20 states were selected based on two criteria: whether each state was centralized or decentralized in educational policy making based on Pipho's (1991) classification, and whether or not recent evaluation policy activity was reported by the Education Commission of the States (Hazi & Arredondo Rucinski, 2005). In phase two, data were collected for ten randomly selected states and added to the study (Hazi & Arredondo Rucinski, 2006). In phase three, data were collected from the remaining twenty states and merged with the total data set. The matrix was revised to display the data, and data were then analyzed through a process of deductive analysis. In deductive analysis of qualitative data, informal hypotheses are formulated and data analyzed to allow researchers to either confirm or reject the specific hypothesis statements. As hypotheses are rejected, new ones are formulated and additional data are collected as needed. This process continued throughout the data collection and analysis phases. Malen (2005) has categorized this type of policy analysis as a "theory of action" strategy in which broad policy initiatives can be examined and assessed based on the underlying "theories of action, or sets of principles and propositions, orientations, and related assumptions" that underpin the policy (p. 196) and are either stated explicitly or can be inferred from written descriptions of the policy.

Policy Activity for Teacher Evaluation in State Statutes and Regulations
Results show that the states engaged in four general types of activity: adopting NGA strategies, asserting more oversight and involvement in local evaluation practices, decreasing the frequency of veteran teacher evaluation, and increasing the data used in evaluation. Each of these is described separately. Table 1 presents selected dimensions of our data that may affect a state's move to change its evaluation statute and regulations: whether it is centralized or decentralized (Pipho, 1991), whether or not it has collective bargaining (Education Commission of the States, 2002), its level of state control (Hazi & Arredondo Rucinski, 2005), and the frequency of evaluation for veteran or tenured teachers whose performance has been judged satisfactory. The reader will note (also in Table 1) that in six states (Georgia, Hawaii, Louisiana, Pennsylvania, Texas, West Virginia) the state control is at the highest level (4) on our rating scale, with schools required to use state department evaluation instruments and/or to follow certain identified procedures. Five of these six states are located in the South or East, which is consistent with a finding in Furtwengler's 1995 analysis, while the other (Hawaii) is located outside of the continental United States. We further noted that those states that were more active, i.e., with early attention to evaluation statute or department of education procedures, tended to be centralized in policy making, and to have higher levels of control (i.e., 2, 3, or 4) over local evaluation policy and practices. On the other hand, those states that were more decentralized in educational policy making and permitted collective bargaining tended to be Level 1 (or sometimes 2, on our rating scale), thus leaving many of the details of teacher evaluation to local discretion with some remote control.
We find that veteran teachers are evaluated less frequently. Table 1 also displays these data. It is interesting that while 21 states require annual evaluations for tenured teachers, 19 states have adopted extended timelines, i.e., with 11 states moving to once in three years, three to once in two years, and five to once in five years. Eight states have undefined timelines and two have adopted nonspecific language such as "periodically" or "regularly" (see Table 1). Further, states which have a mandate to evaluate teachers more frequently (i.e., either annually or once every two years) also tend to be those with collective bargaining.
There is a definite trend across the states to adopt the strategies recommended by the National Governors Association. For example, all but nine states have adopted at least one of the NGA strategies (see Table 2). Training evaluators was one of the most frequently adopted strategies, with Texas requiring 36 hours in instructional leadership and 20 hours in evaluation instrument training. Alabama offers a one-week training with performance demonstration before administrators are certified to evaluate teachers.
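The evaluation-frequency counts reported above for tenured teachers can be checked for internal consistency with a short tally. This is only an arithmetic sketch in Python using the counts stated in the text; the dictionary keys are our own labels, not categories taken from Table 1.

```python
# Counts of states by required evaluation frequency for tenured teachers,
# as reported in the text. Key names are illustrative labels only.
extended_timelines = {
    "once in two years": 3,
    "once in three years": 11,
    "once in five years": 5,
}

frequency_counts = {
    "annual": 21,
    "extended": sum(extended_timelines.values()),
    "undefined": 8,
    "nonspecific language": 2,
}

total_states = sum(frequency_counts.values())

print(frequency_counts["extended"])  # 19 states with extended timelines
print(total_states)                  # 50 states accounted for
```

The three extended-timeline groups sum to the 19 states reported, and the four categories together account for all 50 states.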
Defining teacher quality is also among the most frequently adopted strategies. Most states have taken the approach of listing indicators of effective teaching, identifying standards, attributes, or performance dimensions. Kansas and New Jersey have identified the greatest number of items to define teaching (Kansas at 93 and New Jersey at 91). Broadening participation in evaluation is the next most frequently adopted NGA strategy (in 16 states). States have encouraged the representation of parents (in Florida and Utah), citizens and students (in Colorado, Kentucky, Louisiana, and New York), and teacher associations (in 10 states) on the committees designing teacher evaluation systems.
The three NGA strategies adopted less frequently include using peer review and/or portfolios (12 states), increasing professional accountability through use of career ladders (10 states), and incorporating student achievement data into teacher evaluation ratings (12 states). We view the use of student performance data as a noteworthy addition to teacher evaluation. Ten states (California, Florida, Georgia, Tennessee, Colorado, New Mexico, Virginia, Indiana, Kansas, Maine) note that these data should be used in evaluation in some non-specified way, while two states (Delaware and Texas) calculate the proportion by fraction or percent of the rating scale to be based on student achievement.
Our analysis of teacher evaluation statutes and regulations indicates that state departments of education are adopting a variety of oversight strategies. Table 3 presents an array of oversight from the least invasive (not specified) to the most (evaluating teachers and approving or developing guides for remediation). State strategies that seem to be predictable include: presenting a model of evaluation (1 state), requiring local districts to file their policy (2 states), mandating at least some oversight (1 state), getting reports on the results of evaluations (3 states), and monitoring LEA evaluation policy and practices (7 states). These represent minimal oversight, given the recommendations of the NGA. The most frequently adopted form of state oversight is the approval of local evaluation policy (in 14 states). As shown in Table 3, other state strategies that seem to intervene considerably in local policy and practices include: on-site review of evaluation (California and Kentucky); increasing the frequency of evaluation in low performing schools (Mississippi, New Mexico, and North Carolina); evaluating teachers (Illinois); approving remediation plans (South Carolina); developing guidelines for improvement plans (Delaware); and handling appeals of evaluation (Kentucky).
Finally, changes in teacher evaluation statutes and regulations are increasingly focused on data. Peterson's (2004) review of research on teacher evaluation was instructive in our portrayal of changes that seem to be occurring. Our categories for grouping these data are adding new data, collecting the data, using the data, and conducting evaluations. As shown in Table 4, states place greater emphasis on adding new data, collecting data, and using the data than on conducting evaluations. Thus, while veteran teachers are often evaluated less frequently, we expect stakes in evaluation to be higher, especially when student achievement data are being used. For example, in Delaware and Texas student achievement data are used in calculating teachers' evaluation ratings. Obviously, the stakes would be even higher were the ratings to be connected to salary or merit increases, an action currently in use or under debate. We also identify a related finding regarding data; that is, while teachers are the primary focus of this round of policy activity, administrators are not immune. For example, the evaluation of administrators in five states (Delaware, Florida, Georgia, Tennessee, and Washington) now includes data about teacher evaluation and student gains in their schools. In addition, in two states (Illinois, Texas), the evaluation of administrators now includes incentive pay and the use of independent evaluators from outside a district.

The Practice of Teacher Evaluation
First and foremost, teacher evaluation is likely to be complicated by the changes that states have made. Which and how many of the NGA strategies adopted by a state are likely to determine whether problems are immediate or delayed. For example, if a state only adopts the NGA strategy of training, problems in the state's schools may be delayed. However, if a state adds student progress data to evaluation and subsequently ties it to salary increases, we anticipate more problems and that those problems will surface sooner rather than later. Furthermore, the complications will be mitigated or heightened by teacher-administrator relations in individual locales.
Florida is a case in point regarding both student progress and performance-based pay, two high profile NGA strategies. Florida has a history of dabbling in pay for performance plans. In the 1980s it used the controversial Florida Performance Measurement System that formulaically identified outstanding teachers and took principal judgment out of the equation until a court case dismantled its use (Hazi, 1989). In 2006, it unsuccessfully tried to link annual bonuses with the academic progress of students in what was called E-Comp or "Effectiveness Compensation" (Florida Department of Education, 2006; Pinzur, 2006).
The Merit Award Program is its most recent effort to award teacher bonuses based on student performance on the Florida Comprehensive Assessment Test (FCAT) or other district designed tests for those teachers of subjects not covered by the FCAT (Simmonson, 2007). Only 7 of Florida's 67 school districts participated in the unpopular teacher performance pay plan. The Merit Award Program was the state's fourth plan in six years (Merit pay plan's unintended lesson, 2008). We believe it is only a matter of time, in states such as Florida, before performance pay plans will be challenged in courts where judges might next legally define what constitutes student progress.
Our second concern is that certain practices added to procedure in these states may further complicate evaluation and make it more ritualistic. Those practices added, as shown in Table 4, include a classroom walkthrough, multiple measures, customer service data, use of student achievement data, peer review, portfolios, goal setting, and reflection. While these terms sit largely undefined and ambiguous in state statute and regulation, how they are ultimately defined by those who train and conduct evaluations will largely determine if they are used to help or to control teachers. While each practice is well-intentioned, when introduced into the arena of teacher evaluation as a mandated practice, it can be misused.
For example, consider the supervisory practices of pre-conference and goal-setting. While not unique to this decade of reform, they are examined here for a number of reasons. First, the pre-conference was one of the earliest supervisory practices, from clinical supervision, that found its way into evaluation statute. Second, it became a practice endorsed by teachers and teacher associations as a protection against unannounced observations. Similarly, goal-setting gave teachers a way to participate more fully in evaluation. Both found their way into practice in the 1980s and tended to be popular among teachers.
The finding that states are specifically defining effective teaching in performance objectives is likely to lead to restricted definitions of teaching and learning (Lewis, 2007). Such a view of what teaching and learning entails focuses evaluation processes on the state's specific definition of quality teaching. In addition to being inconsistent with current research on student learning, these restricted definitions of teaching lead to increased checklists, walkthroughs, and increased specificity of procedures and instruments. For example, in one school in Georgia, regular walkthroughs are used to target specific teaching of an identified standard for a specified time period.

Implications for Instructional Supervision
To the extent that supervision and teacher evaluation are viewed as synonymous concepts, the implications described for evaluation will be the same for supervision. In several states, "recommended practices" believed by some educators to "show promise" in supervision are included in state statutes and department of education regulations, and have become institutionalized. For example, practices such as alignment of teacher evaluation to school improvement (in five states: Colorado, Connecticut, Delaware, Kentucky, and New Jersey), peer review (in Florida, Louisiana, and Oklahoma), use of mentors (in South Carolina), use of self-evaluation and reflection (in Louisiana and Texas), and peer cognitive coaching and action research (in Tennessee), if reflecting the leading edge of a trend, are troubling. If states are, indeed, embedding recommended professional development practices into the statute and regulation of teacher evaluation, what appears on the surface to be an effort toward building teacher capacity may simply portend prescribed designs for required teacher learning activities that are inconsistent with adult learning principles. We wonder to what extent these mandates will further reinforce the view of teacher evaluation as "ritual" (Hazi & Arredondo Rucinski, 2006).
Our discovery of a large number of states focused on data and the subsequent development of data cottage industries have potential implications for supervision. D3M, or "data-driven decisionmaking," the new "buzz word for this century," is fueled by NCLB and its need for data about students (Mercurius, 2005). D3M requires "data repositories" to house, maintain, and analyze information to improve teaching and learning, work that once was done manually or with limited software such as Excel. One example of these industries is data warehousing, the storage of demographic and test data in one electronic location. Data warehousing allows schools to store all data and then "mine" it, i.e., call forth and reconfigure specific information with minimal effort (Cohen, 2003). More importantly, it promotes beliefs such as the following about the power of data:
Also, the more data we review, the more confidently we can draw our inferences. For instance, if we see that a particular teacher has average students for three consecutive years who perform below their classmates, we can conclude that the teacher's effectiveness is below average, allowing supervisors to offer assistance where it is most needed. Including large numbers of students in the comparison makes our conclusions even more likely to be accurate. (Cohen, 2003, np)
Other examples include data destruction companies and virtual supervision programs with IP-based videoconferencing equipment delivering video clips of teaching to data files of teaching to other locations (Amodeo & Taylor, 2004).
We are concerned that such technology promotes surveillance, restricts access to data gathered, and perpetuates the illusion of objectivity. First, with student achievement data in the spotlight, some believe that principals will observe less. Indeed, our data show that state policies require that veteran teachers be evaluated less frequently. We wonder whether the achievement test is the new venue for teacher surveillance, replacing the once popular intercom as a listening device. We note that this surveillance of teachers seems consistent with the national trend where there is increased surveillance in the workplace to deter sexual harassment, accidents, theft, violence, sabotage, and goofing off (Kitchen, 2006) and in electronic search engines such as Google to deter on-line pornography (Hafner, 2006). Such acts appear innocuous and to benefit the greater good because they occur under the guise of increased productivity or safety. Second, access to achievement test data and its accompanying jargon (subtests, disaggregated scores, subgroups, test bank items, benchmarking, formative assessment) is now limited not only to those with the most testing knowledge, but also to those with special passwords. We worry that teachers may not be among those with equal access.
Finally, what is most disturbing is the false confidence that accompanies numbers, as if they can replace professional judgments about teaching. Such confidence occurred with the Florida Performance Measurement System (Hazi, 1989), which was believed to be objective, precise, and research-based, and which resulted in a score calculated by a computer, replacing observer judgment as well as feedback. Now, instead of an observation instrument, data warehouses will supply judgments about teaching, as shown in the earlier Cohen (2003) quote, to say with certitude what is happening inside of a classroom, without even stepping inside one.
We may be in a period of what we call distressing practices. When teachers and their professional associations are involved in evaluation policy making, the intent of their involvement has often been protection against distressing practice. One example is the pre-conference. Originating with clinical supervision, a rationale for practice born out of work with student teachers in Harvard's MAT program, the pre-conference was designed to help both supervisor and teacher plan the lesson and prepare for the observation (Cogan, 1973). As its practice was adopted by schools, and sometimes legislated (in the case of California), it became a way for teachers to protect themselves from poor evaluation (Black, 1993). When it became institutionalized in statute and policy and applied to teacher evaluation, the pre-conference became a way for teachers to learn to do a "tap dance" more in tune with the administrator's expectations (Garman & Hazi, 1988). While the pre-conference protected the teacher from poor evaluation, we now believe that peer, team, and mentor involvements may attempt to provide that same protection, and thus, become further complications to the practice of evaluation. It seems likely to us that supervision, forever entangled with evaluation, will most likely continue to be viewed cynically by both teachers and principals.

Conclusion
This research examined teacher evaluation statutes and department of education regulations promulgated since NCLB in the 50 states. Data were collected from state websites and analyzed deductively in terms of the strategies identified by the National Governors Association for underlying theories of action, trends, likely effects on teacher evaluation, and implications for supervision. Identified trends show that the majority of states adopted NGA strategies, asserted more oversight and involvement in local evaluation practices, decreased the frequency of veteran teacher evaluation, and increased the data used in evaluation. While the effects of these policy actions on student learning remain unclear at this point, it is evident that states have moved forward in their adoption of the NGA strategies.
The continued and long-standing tensions between the helping and evaluative functions of supervision, the "renaming" of supervision as instructional leadership, and resulting interpretations of instructional leadership as teacher evaluation, support our earlier predictions that the dual functions, although intertwined throughout supervision history, are likely to remain incompatible (Hazi & Arredondo Rucinski, 2006). It seems unlikely to us that state department involvement viewed as increasingly invasive and controlling will lead to the development of ideal learning conditions aimed at improving teacher capacity. Hence, the implicit expectation from state departments and state legislatures that the policy actions described in this study will lead to improved student learning seems problematic. And, whether or not the changes will "transform and revolutionize" teacher evaluation in the long run remains to be seen.
Research such as this contributes to the education policy literature regarding the standards and accountability initiatives aimed at improving student learning. It catalogues the efforts of the 50 states to address the NGA strategies for school reform. If, as Porter and Chester (2004, p. 2) have argued, "a carefully crafted and continuously refined assessment and accountability program can lead to more effective schools and higher levels of student persistence and achievement," then examining changes in statutes and policy on teacher evaluation may shed light on the assumptions underlying such policies, and illustrate that "theories of action" connecting increased controls of teacher performance may rest on tenuous and uncertain linkages.

As shown in Table 1, most state departments of education have involvement or control over evaluation policy and local practices at Level of State Control 2, 3, or 4. Thus, state departments of education in a combined 29 states (58%) approve, monitor, or inspect local evaluation policy (11 or 22% at Level of State Control 2), specify evaluation criteria (12 or 24% at Level of State Control 3), or require a specific instrument or procedure to evaluate teachers (6 or 12% at Level of State Control 4).
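The percentages reported for Levels 2 through 4 follow directly from the state counts over a base of 50 states. As a brief arithmetic sketch in Python (variable names are ours, and the counts are those stated in the text):

```python
TOTAL_STATES = 50

# Number of states at each level of state control above Level 1,
# as reported in the text.
states_by_level = {2: 11, 3: 12, 4: 6}

# Share of all 50 states at each level, in percent.
percent_by_level = {
    level: 100 * count / TOTAL_STATES
    for level, count in states_by_level.items()
}

combined_states = sum(states_by_level.values())
combined_percent = 100 * combined_states / TOTAL_STATES

print(percent_by_level)                    # {2: 22.0, 3: 24.0, 4: 12.0}
print(combined_states, combined_percent)   # 29 58.0
```

The per-level shares of 22%, 24%, and 12% sum to the combined 58% reported for the 29 states.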

Table 1
State-by-state comparison of the centralization, levels of control, collective bargaining status, and frequency of tenured teacher evaluation
¹ Pipho (1991) classification of state curriculum decision making. ² Based on a satisfactory performance rating. ³ For low-performing schools.

Table 2
* Texas requires 36 hours of Instructional Leadership and 20 hours of instrument training.

Table 3
Types of state involvement or control