Contributed Commentary on
Volume 4 Number 17: Taylor & Nolen Reframing Assessment Concepts



19 December 1996

Jonathan A. Plucker
University of Maine

plucker@maine.maine.edu

I read with considerable interest Taylor and Nolen's (1995) recent article on reconceptualizing assessment concepts. As a faculty member who is responsible for teaching undergraduate and graduate education students these concepts and who is very concerned with the shortcomings of traditional methods for teaching such concepts, I found the description of efforts at the University of Washington quite helpful and thought-provoking. However, the theoretical underpinnings of the course disappointed me -- I find the end agreeable but not the means.
A few caveats are worth mentioning before I share my comments. First, I hesitated to respond to the original article because I am afraid that any commentary will be perceived as a blind defense of traditional psychometrics -- a perception with which I am not comfortable. But I believe that educators and psychologists criticize standardized testing much too harshly and promote the advantages of alternative assessments far too enthusiastically. For example, alternatives to traditional psychometric approaches are often fraught with important problems (Plucker & Renzulli, in press; Ruiz-Primo & Shavelson, 1996), students from certain ethnic groups may be as culturally biased in favor of standardized testing as others are biased against it (Plucker, 1996), and both theoretical (Sternberg, 1994) and empirical evidence (e.g., Bridgeman & Morgan, 1996) suggests that individuals with specific learning styles may prefer standardized testing over alternative assessments. However, these reservations do not prevent me from researching the use of alternative assessments or working with my students to develop nontraditional assessments. Indeed, in two of my major areas of interest -- creativity and gifted education -- teacher checklists, performance-based assessments, and teacher/parent/peer nominations have been used for decades by educators and researchers. My views are heavily influenced by my work in both of these areas, and it is through this lens that my comments should be viewed.
The following five points are meant to be a springboard for future discussion, since the ideas raised by Taylor and Nolen (1995) are certainly timely and very important. First, is the growth of alternative assessment due to "a growing belief that the teacher can be one of the best sources of information about student learning" or is it due to a lack of satisfaction with traditional (i.e., standardized) assessment? Research on teacher accuracy in the assessment of students calls the opening statement into question (Guskin, Peng, & Simon, 1992; Hocevar & Bachelor, 1989; Holland, 1959; Pegnato & Birch, 1959; Plucker, Callahan, & Tomchin, 1996). This is a minor point, but historically an important one. For example, if the purpose of alternative assessment adoption is to create assessments that are less biased toward students from specific ethnic groups, then the bias inherent in alternative assessments becomes a stumbling block and focus of future development efforts.
Second, the overarching issue may be that the techniques we use to teach measurement concepts -- not the content -- need to be improved. Taylor and Nolen note that teachers do not "perceive the information learned in traditional tests and measurement courses to be relevant" to classroom contexts, that "few teacher preparation programs provide adequate training for the wide array of assessment strategies used by teachers", and that "teachers do not believe they have the training needed to meet the demands of classroom assessment." They also discuss the ways in which measurement and assessment texts fail to aid the teaching of measurement concepts. Most individuals responsible for preparing future teachers would agree with the authors' summary. But rather than argue that our efforts and the texts fail due to insufficient theoretical foundations, why not argue that the content and text are merely passive objects that are actively manipulated by teachers to create learning experiences for students? The same argument is used by critics of the way we instruct future teachers to foster creativity, apply knowledge of motivation and behavior management, and even construct a realistic lesson plan. All of these areas are marked by a call for greater curricular application and for the use of principles of situated cognition, but not by a call for a complete revision of content. Why not? Because it may largely be unnecessary.
Third, in the interest of brevity, I will not analyze the authors' reconceptualization of reliability and validity and the resulting description of the courses in detail. But readers should be aware that many of the underlying characteristics of the authors' work are really no different from those addressed by traditional psychometricians. "Validity Dimension 1" is content validity, Dimension 2 is item or task analysis and construct validity, Dimension 3 is content validity again but from the perspective of the assessment, Dimension 4 is the detection of bias and criterion-related validity, and Dimension 5 is a consideration of the social implications of assessment and score interpretation. All of these concepts are certainly worthwhile (I especially like the emphasis on social implications), and most educators could provide dozens of studies that reinforce the inclusion of each dimension. And while the notion of the objective observer has held too much importance in the past, modern conceptions of reliability and validity (such as intra- and inter-rater agreement and criterion-related validity) tacitly acknowledge the fallibility and bias associated with assessment and evaluation. In most of the examples and discussion provided by Taylor and Nolen, familiar psychometric concepts and ideas are merely recast in postmodern terminology.
Fourth, abstract concepts (e.g., standard error of measurement [SEM]) and traditional standards of psychometric quality still need to be taught. Most, if not all, students take standardized tests as they progress through the educational system, and many of our future teachers will administer these tests and/or advise students who are about to take them. A case in point, and one that I use with my undergraduates, is the importance of standard errors of measurement. The students find this topic to be quite dry and lacking in application when I introduce the topic, but they begin to see the importance of SEM after we discuss several high-stakes applications (including school-by-school test score comparisons, which are known to influence parental decision-making in climates of school choice [Maddaus & Marion, 1995]). The question becomes one less of replacing traditional concepts and more of modifying our coursework and course sequences to include additional concepts. If qualitative inquiry has taught us nothing else, it has shown that multiple perspectives can be taught successfully within the framework of a single course.
Fifth, the most important issue may be the distinction between norm-referenced and criterion-referenced measurement. As Taylor and Nolen note, "classroom teachers are less interested in the consistency of student performance across similar measures than they are in whether students learn what [teachers] are teaching (the targeted constructs)." Alternative assessments used for high stakes (i.e., norm-referenced) purposes should be required to meet traditional standards of reliability and validity. As the authors state, "the meanings of assessment in the context of the classroom must be considered carefully when large scale assessment programs decide to use classroom assessments for the purposes of district, state, or national accountability." At the same time, classroom-based assessment and evaluation used for primarily criterion-referenced purposes should be held to slightly different standards. Attention has been focused on the type of assessment and not how it is used (which is the most important aspect of measurement and evaluation).
Finally, given the validity concerns surrounding the use of alternative assessments (e.g., Ruiz-Primo & Shavelson, 1996), educators must avoid the appearance of calling for new conceptions of reliability and validity because they cannot produce high quality alternative assessments as judged by traditional standards. While this was almost certainly not the authors' intent, it is not altogether impossible to understand why critics of alternative assessment infer this logic from our reasoning. Arguing that teachers and future teachers do not learn measurement concepts because of the way in which they are taught is reasonable; arguing that they do not learn measurement concepts because of how they are taught AND because the concepts are not applicable is more of a stretch.
In conclusion, I find much within the Taylor and Nolen article with which to agree. Indeed, if they had simply described their course which was "designed to engage students in tasks relevant to their own work as preservice teachers and demand that they consider assessment in the context of disciplinary structures and instructional practices," I would have filed the article away in a folder that was easily accessible for myself, my colleagues, and my students. But the authors' proposed reconceptualization of psychometric concepts is merely a presentation of the wolf in postmodernism's clothing. Educators need to begin questioning whether we need to replace our conceptualizations of and standards for psychometric quality or expand the conceptualizations and the teaching of them to incorporate fresh perspectives. The latter course is more reasonable and more feasible than the former.


References

Bridgeman, B., & Morgan, R. (1996). Success in college for students with discrepancies between performance on multiple-choice and essay tests. Journal of Educational Psychology, 88, 333-340.

Guskin, S. L., Peng, C.-Y. J., & Simon, M. (1992). Do teachers react to "multiple intelligences"? Effects of teachers' stereotypes on judgments and expectancies for students with diverse patterns of giftedness/talent. Gifted Child Quarterly, 36, 32-37.

Hocevar, D., & Bachelor, P. (1989). A taxonomy and critique of measurements used in the study of creativity. In J. A. Glover, R. R. Ronning, & C. R. Reynolds (Eds.), Handbook of creativity (pp. 53-75). New York: Plenum Press.

Holland, J. L. (1959). Some limitations of teacher ratings as predictors of creativity. Journal of Educational Psychology, 50, 219-223.

Maddaus, J., & Marion, S. F. (1995). Do standardized test scores influence parental choice of high school? Journal of Research in Rural Education, 11, 75-83.

Pegnato, C. W., & Birch, J. W. (1959). Locating gifted children in junior high schools: A comparison of methods. Exceptional Children, 25, 300-304.

Plucker, J. A. (1996). Gifted Asian American students: Curricular and counseling concerns. Journal for the Education of the Gifted, 19, 315-343.

Plucker, J. A., Callahan, C. M., & Tomchin, E. M. (1996). Wherefore art thou, multiple intelligences? Alternative assessments for identifying talent in ethnically diverse and economically disadvantaged students. Gifted Child Quarterly, 40, 81-92.

Plucker, J. A., & Renzulli, J. S. (in press). Psychometric approaches to the study of creativity. In R. J. Sternberg (Ed.), Handbook of human creativity.

Ruiz-Primo, M. A., & Shavelson, R. J. (1996). Rhetoric and reality in science performance assessments: An update. Journal of Research in Science Teaching, 33, 1045-1063.

Sternberg, R. J. (1994, Nov.). Allowing for thinking styles. Educational Leadership, 52(3), 36-40.

Taylor, C. S., & Nolen, S. B. (1995). "What does a psychometrician's classroom look like?": Reframing assessment concepts in the context of learning. Education Policy Analysis Archives, 4(17). (WWW URL: http://olam.ed.asu.edu/epaa)