Introduction to the Special Issue: Historical and Contemporary Perspectives on Educational Evaluation

Most special issues on evaluation focus on one form or type of evaluation (e.g., program evaluation, personnel evaluation, and, increasingly, educational system evaluation). This special issue is unique in that there are papers on system evaluation, program evaluation, teacher evaluation, and student evaluation. Some papers are primarily conceptual, others are empirical, and still others are a little of each. Some papers are more historical, some contemporary, and some a little of each. The authors represent four countries: Canada, Mexico, South Africa, and the United States, providing an international perspective on key issues. The final paper contains six recommendations concerning the future of educational evaluation based on an analysis of commonalities across the papers.


Slightly more than 20 years ago, Michael Scriven (1996) wrote that "evaluation is a very young discipline - although it is a very old practice" (p. 395). Although there can be no doubt that educational evaluation in some form has been around for a long time, one can question whether evaluation is, in fact, a discipline. The Oxford English Dictionary defines a discipline as "a branch of learning or scholarly instruction." Beyer and Lodahl (1976) have described disciplinary fields as providing the structure of knowledge in which faculty members are trained and socialized; in which they carry out tasks of teaching, research, and administration; and in which they produce research and educational output. Disciplinary worlds are considered separate and distinct cultures that exert varying influence on scholarly behaviors as well as on the structure of higher education.
Rather than evaluation being a discipline, I would suggest that evaluation may be better thought of as a collection of disciplines. Program evaluation, for example, is "separate and distinct" from teacher evaluation, as evidenced by the fact that the references included in program evaluation reports are devoid of any references to teacher evaluation and vice versa. Similar statements can be made for student evaluation (e.g., both in terms of scores on standardized tests and of the assignment of grades or marks) and the evaluation of entire educational systems (e.g., state, nation). To further support the presence of "separate and distinct" disciplines within evaluation, a cursory search of the tables of contents of journals will produce special issues on program evaluation (Ross, 2010), system evaluation (Lenkeit & Caro, 2014), teacher evaluation (Harris & Herrington, 2015), and student evaluation (Erickson, Ysseldyke, Thurlow, & Elliott, 1998).
This special issue is unique in that there are papers on system evaluation, program evaluation, teacher evaluation, and student evaluation. Some papers are primarily conceptual, others are empirical, and still others are a little of each. Some papers are more historical, some contemporary, and some a little of each. The authors represent four countries: Canada, Mexico, South Africa, and the United States, providing an international perspective on key issues.
Each of the papers included in this special issue began as an oral presentation given at a conference held in Mexico City on September 1-2, 2016, sponsored by Mexico's National Institute for the Evaluation of Education (INEE). The theme of the conference was "Key Issues in the Evaluation of Basic Education." At the time of the conference, there was a great deal of upheaval in the educational system throughout Mexico, largely because of the enactment of a national education reform effort that impacted programs, students, and, particularly, teachers. In her paper, Maria de Ibarrola presents a comprehensive overview of the state of affairs at the time of the conference. Several of the other authors (particularly Richard Shavelson and Servaas van der Berg) refer to the Mexican situation in their papers. In a sense, the Mexican situation is the international situation in miniature, in stark relief. As a consequence, the lessons to be learned from the papers are widely applicable, and their relevance extends far beyond the Mexican context for which they were originally written.
In the first paper, which is intended as an introduction to the other papers, D. C. Phillips provides a historical context for contemporary evaluations. Among the issues discussed are the role of the evaluator (that is, whether evaluators are to inform decision makers or make the decisions themselves), the major differences between formative and summative evaluations (and the pros and cons of each), and the importance of examining unintended consequences or effects of programs. He argues that there are many functions of evaluation and that the method used to conduct evaluations should be informed by, and consistent with, the intended function.
In the second paper, William Schubert argues that both curriculum and evaluation deal with matters of worth or value. Furthermore, what is judged to be worthy or valuable depends to a large extent on the curricular orientation of the person (or group) making the judgment. He describes five orientations to curriculum: Intellectual Traditionalist, Social Behaviorist, Experientialist, Critical Reconstructionist, and Postmodern Global Anti-Imperialist. Each offers a perspective worth considering by those who wish to improve curriculum and the ways in which curriculum (and learning) are evaluated.
In the third paper, Richard J. Shavelson presents three general questions that drive evaluation: (1) Descriptive, that is, "What's happening?" (2) Causal, that is, "Is there a systematic effect?" and (3) Process or mechanism, that is, "Why or how is this happening?" Depending on the type of question, formative evaluation, summative evaluation, or some combination may be appropriate. In designing evaluations, it is important to pay attention to politics and measurement models and methods. Measurement matters should be addressed after the key questions have been determined. He includes several concrete examples to show how assumptions and misperceptions can upend or change the outcomes of evaluations.
In the fourth paper, Lorin W. Anderson relies on both historical and contemporary research and writing to address five questions: (1) Why do we grade students? (2) What do grades mean? (3) How reliable are students' grades? (4) How valid are students' grades? and (5) What are the consequences of grading students? The results of his analysis suggest that we grade students for various reasons; the meaning of grades is largely context- and person-specific; individual grades lack reliability whereas cumulative grades are reasonably reliable; the validity of individual grades is difficult to determine, but the validity of cumulative grades is quite strong; and grades have an impact (both positive and negative) on a variety of student affective characteristics (e.g., self-esteem).
In the fifth paper, Servaas van der Berg argues for the importance of international evaluations in improving educational systems, particularly in developing countries. International evaluations have consistently shown that socioeconomic status has a systematic influence on educational outcomes and, also, that social gradients vary across and within countries. The need, therefore, is not for "league tables," but for data that allow countries to judge the appropriateness of their policies and strategies in an international context. Efficient and targeted application of resources and policies to improve education in developing countries requires information on system performance, inequalities, progress and stagnation. International evaluations should be expanded to more countries, should be better anchored and comparable, and should be demystified.
In the sixth paper, Kadriye Ercikan, Mustafa Asil, and Raman Grover discuss the extent and impact of the "digital divide" in education in general and on standardized tests in particular. One social context relevant to learning and assessment in the digital age is the large difference among students in access to and competence in technology. As a consequence, access and competency in relation to technology become critical contexts for evaluations that rely on digitally based assessments. The authors examine the digital divide among students from different segments of society and discuss strategies for minimizing the effects of the digital divide on students' scores on standardized tests as well as on the interpretation of those scores.
In the seventh paper, Sylvia Schmelkes describes several projects conducted under the auspices of the Mexican National Institute for the Evaluation of Education (INEE). These projects address the problems of evaluating indigenous children and teachers as well as the educational policies and programs that pertain to them. Four separate projects are described, including efforts to reduce the cultural and linguistic bias in standardized tests, means of evaluating the indigenous language competence of teaching candidates, and the design of a qualitative instrument for evaluating teacher performance. The projects described are but a starting point toward improving evaluation and resolving conflicts and dilemmas that are likely to be faced in the future.
In the eighth paper, Maria de Ibarrola details the enactment of National Educational Reform legislation and its implementation over time. Her analysis describes the (1) aims and intentions of the Reform, (2) the problems in its implementation, (3) the opposition of a radical wing of the National Union of Educational Workers, and (4) the social turmoil this confrontation has caused. She provides a theoretical basis for a systematic analysis of educational reform, one that includes an understanding of the political nature of the reforms, the distinction between design and implementation, and the use of a variety of constructs to provide a solid analytic framework.
In the ninth paper, David C. Berliner discusses the inadequacies of two common methods of evaluating teachers: standardized achievement test data and classroom observation systems. Both have serious flaws: the former primarily with validity, the latter primarily with reliability. The flaws of each are sufficiently serious that neither should be used as the primary grounds for rewarding, punishing, or firing teachers. He discusses two alternatives that show some promise: duties-based teacher evaluation and performance measures. Although these alternatives have much to recommend them, like all methods of personnel evaluation, reliability and validity issues remain problematic.
The last paper by Lorin W. Anderson is intended to summarize key points made in the previous nine papers. Based on his cross-paper analysis, he offers six recommendations, ranging from the need to be aware of political, societal, cultural, and economic factors affecting evaluation studies, to the importance of flexibility in the implementation of evaluation studies (to take into consideration unintended events or intended alterations), to the importance of ensuring that the results of the evaluation are interpreted properly and well understood.
The recommendations offered in the closing paper bring us back to the question of whether evaluation is, in fact, a discipline. If there are, in fact, "common threads" across the various forms and targets of evaluation, a positive answer to the question would seem justified. On the other hand, if the differences in theory and practice are so great as to make it virtually impossible to find "common ground," then the answer quite likely is negative. We leave it to the reader, after reading the papers in this special issue, to decide.

[...] that shape the relations between education and work; and with the agreement of her Center and the National Union of Educational Workers, for the years 1989-1998 she served as General Director of the Union's Foundation for the improvement of teachers' culture and training. Maria has served as President of the Mexican Council of Educational Research, and as an adviser to UNESCO and various regional and national bodies. She has published more than 50 research papers, 35 book chapters, and 20 books; and she is a Past-President of the International Academy of Education.

D. C. Phillips
Stanford University
d.c.phillips@gmail.com
D. C. Phillips was born, educated, and began his professional life in Australia; he holds a B.Sc., B.Ed., M.Ed., and Ph.D. from the University of Melbourne. After teaching in high schools and at Monash University, he moved to Stanford University in the USA in 1974, where for a period he served as Associate Dean and later as Interim Dean of the School of Education, and where he is currently Professor Emeritus of Education and Philosophy. He is a philosopher of education and of social science, and has taught courses and also has published widely on the philosophers of science Popper, Kuhn and Lakatos; on philosophical issues in educational research and in program evaluation; on John Dewey and William James; and on social and psychological constructivism. For several years at Stanford he directed the Evaluation Training Program, and he also chaired a national Task Force representing eleven prominent Schools of Education that had received Spencer Foundation grants to make innovations to their doctoral-level research training programs. He is a Fellow of the IAE, and a member of the U.S. National Academy of Education, and has been a Fellow at the Center for Advanced Study in the Behavioral Sciences. Among his most recent publications are the Encyclopedia of Educational Theory and Philosophy (Sage; editor) and A Companion to John Dewey's "Democracy and Education" (University of Chicago Press).