Chapter 14: What do tests measure?

Preview

In this chapter I discuss in more detail the question of what it is that a test measures. In what sense can it be said to measure knowledge or ability? To what extent does it perform a ritual task and measure nothing? Or is that the wrong question? Should we rather ask, what do tests produce?

Tests and scales

A measure, or scale, assumes of course that equal intervals anywhere on the measure are in some sense of equal value: that the difference between sixty and seventy percent is in some way equivalent to the difference between twenty and thirty percent. So if a test is a measure, then it must be a measure of something, and we would expect equal differences in score to represent equal differences in that something.

We know that a ruler measures length and the unit is a metre. We know that a clock measures time and the unit is a second. We know that a balance measures mass and the unit is a kilogram. Relative humidity measures how much water vapour the air actually contains, as a fraction of the most it could contain at that temperature. So it is a pure number. Nevertheless, it is a ratio of two quantities that do have units. So what does a test measure? And what is the unit of measurement?

Let's look at the unit issue first. It is clear that there are no units. The measure is a pure number. Unlike relative humidity, however, it is not a ratio of two quantities that themselves carry units. Again, this supports the idea that the numbers are not measures, but ordinal numbers: numbers that represent an ordering of some kind, that describe a position in a series. Numbers, in this case, that assert that some performances, or people, have more of "something" than do others.

At this point it is worth mentioning that the whole paraphernalia of
normalising scores and otherwise fiddling with them has two purposes. One
is to try to magick a linear scale out of an ordinal one by making various
sorts of assumptions about the distribution of the "something" that is
being "measured"; the second is to produce "measures" that are mathematically
pliable, accessible to the manipulations and pleasures of mathematicians,
and that will, in short, turn a horse race into a profession (see Chapter 11).
Cultural differences

Back to the problem of the "something" that is measured by the test. For the most part, Europeans and their colonial converts on the one hand, and the United States and its spheres of influence on the other, have different approaches.

To the Europeans it has never been a problem. Inured by tradition to a religious belief in the Judge, they have generally accepted the proposition that the test or examination measures whatever the Judge says it measures. The acceptance of this "fact" denies the existence of a problem. The Judge says that tests measure student achievement. Pressed further, he or she might say that student achievement is a measure of what has been learnt on the course of study being tested. The test is simply that part of the course where learning is demonstrated. And the Judge, who holds the mystical secret and truth of standards, is able to convert this demonstration into a mark which is the true measure of what is achieved.

As I wrote that last paragraph I was aware of how "right" it sounded. As with all religions, there is a plausibility in its logical circularity that is terribly enticing, a simplicity in its self-evident truth that gives a deep sense of security. Articles of faith are characteristically immune both to the challenges of logic and to the intrusion of empirical data. To paraphrase Horkheimer and Adorno (1972), faith needs knowledge to sustain it, and thus pollutes knowledge in the act of attaining it (p20).

The Americans, whose religious tradition is democratic and competitive rather than monarchic, have little faith in particular Judges or, for that matter, Presidents. Which is not to say that they do not revere all the more, by way of compensation, the institutions of power in which these fallible humans are niched. Regardless, their tests must be free of the Judge's subjective idiosyncrasies, and pay due homage to the competitive individualism that is central to the American dream.
The problem of subjectivity was (mythologically) solved through the medium of the "objective" test. So the "something" that the test measures is measured economically and objectively, but we are still left with the sticky problem of what this "something" is. For when the Judge goes away, this problem raises its (previously covert) head. Over the years, American test gurus have come up with a plethora of things that they claim to be measuring: intelligence, specific ability, attainment, achievement, competence, factors of the mind, specific outcomes, curriculum objectives, minimal competencies, true scores, universe scores, latent traits. An interesting oscillation between physics and metaphysics, between outside behaviours and inside mind-potential, between performance and hypothetical mental structures. Be assured, however, that efficiency has been conserved. In many cases the same test item can be used to measure all of these "things" (Nairn, 1980; Taylor, 1994; Sternberg, 1990).

The simplest conclusion is that multiple-choice tests measure exactly what the people who construct them claim that they measure; the definition of the abstraction they claim to measure is simply the score on the test. Which puts the Americans in a similar position to the Europeans, with the substitution of test agencies for individual Judges, of an elitist junta for the monarchy. One corollary of this conclusion is that the tests really do measure something but no one is sure what it is. In the light of all of the evidence this seems unlikely to me. Contradictions are predictable from the logical type confusions that are inherent in the whole test process. A more plausible corollary is that the tests do not measure anything in particular, nor do they place people in any particular order of anything, except the order that participating in testing events of any sort tends to generate. But they do place them in an order, along a single line of "merit," and that is all they are required to do.
One more point is very significant.
"Ability" or "achievement" tests like the Scholastic Aptitude Test do place
groups of students (not to be confused with individual students) in an
order very closely related to parental income and social class. In this
sense they contribute significantly to the stability of an inequitable social
structure, whilst at the same time producing an ideological smoke screen
by asserting that they are ordering on the basis of individual ability.
And the victim pays for the test. Fantastic! (Nairn, 1980; Friedenberg,
1969, p29).
Social skills

In 1976 I was about to begin a five-year research project looking at social development in school classrooms. At the time there was much educational discourse about teaching social skills, which many thought were in short supply in young people. "Improving social skills" was an objective in courses from grade one Mathematics to grade seven English to grade twelve Economics. As part of the preliminary work I visited schools in Australia, Canada and the United States, and talked to many teachers about the social development of their students.

These teachers were all interested in the social skills of their students. They taught young people from the age of five in infant schools to the age of seventy-five in Ph.D. programs. Yet their descriptions of their students were remarkably similar. They went something like this: "When they first come to me they are pretty bad. Inarticulate really. Stumble over words, tend to answer just yes or no. Can't put two coherent sentences together. Can't listen properly. Can't concentrate. Just don't seem to be able to relate to other people. Bad with their peers, and worse with me. Then as the year goes on and they get more practice in speaking up and their confidence grows they improve tremendously. By the end of the year I've generally been able to produce a class with quite mature social skills."

What particularly struck me about these conversations was that they appeared to be the same regardless of the age of the students. So how could the social skills of five-year-olds be the same as those of twenty-five-year-olds? Then I thought about my own experience over the previous two years as a "leader" of communication workshops: thirty teachers doing residential five-day courses to increase their communication skills. Weren't they exactly the same? At the beginning of the week hesitant, not really listening to other people, insensitive to feelings.
Then by the end of the week attentive and empathic, talking poetry rather than clichés. Had we been asking the wrong question? Did this change have anything to do with learning new skills? Or had we, over the five days, changed the social environment so that it was now appropriate to engage in a different sort of dialogue? Had the group experiences produced enough trust and cohesiveness to allow for some flow in human relationships, to overcome the stultifying role restrictions and mistrust that characterise much of our normal discourse? Were these observed changes simply indications of emotional openness, with a concomitant increase in divergent thinking and spontaneity?

The implications for our research were clear. The question we should address was not "How do we teach better social skills?" but rather "How do we develop the classroom group so that mature social relations and discourse are appropriate?"

How can a social skill belong to one person? At least two people are always involved, and what is appropriate interaction, whether verbal or non-verbal, must always be a function of the relationship between them, of the context of the communication. What appears to be a quality of the person, a skill, turns out to be a production of a particular environment, a particular aspect of a human interaction, a discourse appropriate to a social relation. As with the quality of the bridge,
so with the quality of social behaviour: even if it can be labelled, the
label can't be pinned on any particular object.
Knowledge

What do you mean, I rigged it? You wanted to prove your point about not pinning a label to a person. Then you chose social skills to talk about. And OK, you've got a case there. But what about intelligence? What about intellectual skills? What about cognitive achievement? What about mental ability? That's where the action is. Certainly that's where the money is. Skills are what employers seem to want, and increasingly what education seems to be about. And as you suggest, cognitive skills, facts and knowledge and understanding, are at the high-status end of the skills spectrum. But why are they so different to social skills? Because they surely do belong to a single person. You don't have knowledge in relation to someone. Analytic ability is not a relationship with another person. Reasoning skills are surely inside the person and not in some mystical relationship that characterises an event. So let's look at them in turn in more detail. Let's take knowledge first. If it's knowledge we're talking about, then it's got to be knowledge about something. So choose something. Computers. So how would you know that you had some knowledge about computers? I've used them at work for various things: cataloguing, letters, drafting. So I know what programs to use for particular purposes, and I know how to use them. In other words, you would reflect on particular interactions that you have had with computers, and on the results and feelings associated with those interactions? I suppose so. And you would interpret that recall of those experiences as knowledge? Well, if I hadn't had the knowledge I couldn't have done the work. But you just told me that you only knew that you had the knowledge because you had done the work. Yeah, well that's now. But what about the first time? What about the first time? The first time I must have had the knowledge first or I couldn't have used the computer properly. Tell me about the first time. Did you use the computer properly?
Well, you know. I had to mess around and experiment a bit before I got it right. So the first time you had some knowledge, but not enough to do it properly? Yeah. And how did you know that you had enough knowledge to even make a beginning? Well, that needed a bit of confidence, and a bit of taking a risk. So it required a certain emotional state as well as a little preliminary knowledge? Yeah, that's right. And how did you know, or suspect, that you had that preliminary knowledge? Well, I'd done some other work with computers. And of course there was the instruction manual. In other words, you recalled other experiences with other computers. And you followed the instructions in the manual. So is the knowledge in the instructions? The instructions are meaningless without an event involving an interpreter and a computer. OK. If I had to follow the instructions then I didn't have enough knowledge. Reading the instructions became part of the event and enabled me to proceed. Now they are part of my experience that I can recall for future events. So knowledge, once again, becomes, or at least involves, the process of recalling prior interactions. So you reckon my "knowledge" of computers consists of reflections about real past events, or following instructions to produce an event which I can recall, in which I interact with a particular computer in particular ways. Knowledge appears in this case to be the construction, or the reconstruction, of an interactional event, a relational experience. Knowledge also implies that the emotional tone of that event is positive. Exactly. Knowledge isn't something that you have. It's something that you do. It's something that is reconstructed in the present from memory traces of things that you've done before. You can carry out those reconstructions visually or in language in your own head, or in action with whatever objects are involved.
And so knowledge of a particular field is continually created and recreated in the processes of selecting and applying memories of experience in that field.

Stories

Let's make a slight diversion here to consider how this process of learning occurs developmentally in young children.

Yet there is another trap more subtle still. For not only do we get caught up in our own stories, we also get caught up in the stories of other people, particularly those we admire, or love, or are controlled by. For we do not live alone. We are social animals, and our life stories require other people to bring them into being. Thus our stories about ourselves in the world are constructed out of our experience in the world. And this experience may come to us by direct involvement in the world, or involvement through the incorporated stories told us by others. And once these stories become accepted by us, they become part of our reality, part of our way of living in the world. Then we tend to construct our experience out of our stories. This is not a cause-effect relation, but an ecology of effects; our consciousness of the world, our way of being, involves an intimate interconnection of our experience, and the stories we use to make sense of that experience. Our knowledge of ourselves is just that interconnection.
Knowledge of a field

In just the same way do we construct knowledge in a particular field of study. We create events around the object of study, observe what happens, and then make up a story about what is happening. Or, more likely, we accept someone else's story about what is happening. For any field of study is just such a consensus story, comprising what Foucault calls a "regime of truth." Then we use the story to help us make sense of other events involving the object, or other objects in that field.

This is equally true whether the field of attention is immense, as in mysticism or physics or history or engineering, or small, as in building a table or washing dishes or driving a car. So our knowledge of the field consists of descriptions of events involving a selected set of data constructed out of the relation between story and experience, between hypothesis and interpretation (possibly involving measurement), between conception and perception. As Wolf (1991) expresses it, "sophisticated thought follows a 'zig-zag' course between craft and vision" (p41).

But again, let us be clear on this fundamental point. The data, the knowledge, does not belong to the object of study. It is not a property of the object. Nor is it the name or a measure of a property of the object. It is rather information about the relationship of the object to its environment during a particular event, a particular interaction, suggested by the story in which it has a part to play. Messick (1989a) comes close to this but does not follow it up. Having claimed that tests "do not have reliabilities and validities, only test responses do," he goes on to say "that test responses are a function not only of items, tasks, or stimulus conditions but of the persons responding and the context of measurement" (p14). In my terminology, they are functions of events.

We could generalise. All knowledge is knowledge of the relations that
identify events. And as we are observers at some point in the interaction,
either at the level of direct observation, or at the level of constructing
and interpreting the story that is the basis for the data collection, then
we ourselves are involved in the interaction, and are thus part of the
knowledge. And for the very reason that we are part of the knowledge, we
are not that knowledge, and the knowledge is not part of us.
Human ability

In the light of the above, how are we to make sense of the notion of human ability, of capacity, of intelligence, of cognitive achievement, of some factor of the mind, of a latent trait?

These are normally considered properties of the person, attributes of an isolated mind, functions of an individual human consciousness. Yet our analysis of how we collect information about the other, or even how we obtain knowledge about our self, denies the possibility of such separation, and acknowledges the possibility only of information about relations. I described knowledge of the field as a selected set of data constructed out of the relation between story and experience. Such selection is always in a context of some action, even if the most recent action is talking to oneself. Ability is a redundancy concept that acknowledges the action and then claims responsibility for it. It is an example of the common epistemological error of attributing a cause to the relational balance of an ecological system. Semantically, this is achieved through the simple trick of nominalisation: of changing a verb into a noun, and thus of converting a process into an object. It is very simple. I do something; I am part of an event. Therefore, the causal logic goes, I am able to do the things I do (before I do them); otherwise I wouldn't have been able to do them. Therefore I must have (here comes the nominalisation) an ability, some property located somewhere within me, that allows me to do this thing that I do. This is an example of the dormative principle. Keeney (1983) explains how it works: My running is now explained by a little permanent stable packaged bundle of something inside me called "ability to run." It is fixed and static. As such it is a glue that helps fix me in time and space. It enables me to be compared, labelled and classified in terms of this property. It becomes part of my individuality.

What difference does it make?
It makes a world of difference, and a difference in the world. If the limits to my occupational choice and political power are largely determined by my cultural experience, by my practice in the field in which my interest lies, then most people might legitimately claim the right to such experience. On the other hand, if my ability severely limits my possibilities in that field, then I have no legitimate claim to further practice. My exclusion is legitimised. I cannot become a doctor or engineer or lawyer, not because of a lack of opportunity or experience, but because of a lack of ability. Foucault (1992), in two condensed epigrammatic passages, sums up the essence of this argument:

. . . the disciplines characterize, classify, specialize; they distribute along a scale, around a norm, hierarchize individuals in relation to one another and, if necessary, disqualify and invalidate (p223).

Conclusion

So what does a test measure in our world? It measures what the person with the power to pay for the test says it measures. And the person who sets the test will name the test what the person who pays for the test wants the test to be named.

The person who does the test has already accepted the name of the test and the measure that the test makes by the very act of doing the test; when you enter the raffle you agree to abide by the conditions of the raffle.
So the mark becomes part of the story about yourself and, with sufficient repetition, becomes true: true because those who know, those in authority, say it is true; true because the society in which you live legitimates this authority; true because your cultural habitus makes it difficult for you to perceive, conceive and integrate those aspects of your experience that contradict the story; true because in acting out your story, which now includes the mark and its meaning, the social truth that created it is confirmed; true because if your mark is high you are consistently rewarded, so that your voice becomes a voice of authority in the power-knowledge discourses that reproduce the structure that helped to produce you; true because if your mark is low your voice becomes muted and confirms your lower position in the social hierarchy; true finally because that success or failure confirms the mark that implicitly predicted the now self-evident consequences. And so the circle is complete.