Should teacher observation systems be used for making high-stakes decisions?
DOI:
https://doi.org/10.14507/epaa.34.9841
Keywords:
teacher observation, high-stakes decisions, inference
Abstract
This study questions the suitability of teaching observation data for making high-stakes decisions that affect teachers. We define suitability (the extent to which intended purposes are advanced without causing undue harm) and argue that it fundamentally depends on the technical properties of the data produced by an observation system, which, in turn, depend on the attributes designed into the system. We conducted an experiment to better understand the relationship between the attributes of teaching observation systems and the suitability of their data. We compared three systems with different attributes, including rubrics that impose varying inference loads on raters. Experienced raters were randomly assigned to a system and properly trained. Then, they evaluated the instruction of advanced teacher candidates by viewing videos of their lessons. We considered three criteria when judging the resulting data: the power to predict a teacher’s contribution to student learning, the correlation of scores across systems, and rater agreement within systems. We found that a system with a low inference load (along with other attributes) outperformed systems with higher inference loads, but it may still be insufficient for making confident, high-stakes decisions. We maintain that few, if any, widely used observation systems are sufficient for that purpose.
License
Copyright (c) 2026 Michael Strong, Jaehoon Lee, John Gargani, Minju Yi, Hyunjin Shim, Hyunchang Moon

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
