
Education Policy Analysis Archives

Volume 7 Number 20

June 8, 1999

ISSN 1068-2341


A peer-reviewed scholarly electronic journal
Editor: Gene V Glass, College of Education
Arizona State University

Copyright 1999, the EDUCATION POLICY ANALYSIS ARCHIVES.
Permission is hereby granted to copy any article
if EPAA is credited and copies are not sold.

Articles appearing in EPAA are abstracted in the Current Index to Journals in Education by the ERIC Clearinghouse on Assessment and Evaluation and are permanently archived in Resources in Education.

Testing On Computers:
A Follow-up Study Comparing Performance On
Computer and On Paper

Michael Russell
Boston College

Abstract
Russell and Haney (1997) reported that open-ended test items administered on paper may underestimate the achievement of students accustomed to writing on computers. This study builds on Russell and Haney's work by examining the effect of taking open-ended tests on computers and on paper for students with different levels of computer skill. Using items from the Massachusetts Comprehensive Assessment System (MCAS) and the National Assessment of Educational Progress (NAEP), this study focuses on language arts, science and math tests administered to eighth grade students. In addition, information on students' prior computer use and keyboarding speed was collected. Unlike the previous study that found large effects for open-ended writing and science items, this study reports mixed results. For the science test, performance on computers had a positive group effect. For the two language arts tests, an overall group effect was not found. However, for students whose keyboarding speed was at least one-half of a standard deviation (0.5 SD) above the mean, performing the language arts test on computer had a moderate positive effect. Conversely, for students whose keyboarding speed was 0.5 standard deviations below the mean, performing the tests on computer had a substantial negative effect. For the math test, performing the test on computer had an overall negative effect, but this effect became less pronounced as keyboarding speed increased. Implications are discussed in terms of testing policies and future research.

Introduction

Recently, Walt Haney and I (Russell & Haney, 1997) reported that open-ended tests administered on paper to students accustomed to working on computer may seriously underestimate students' achievement. Although previous research on multiple-choice items suggests that the mode of administration, that is paper versus computer administration, does not significantly affect the test taker's performance (Bunderson, Inouye & Olsen, 1989), our study suggests that the mode of administration may have an extraordinarily large effect on students' performance on open-ended items.

Focusing on students participating in a project that placed heavy emphasis on computers, the study indicates that approximately 60% of the students in the Advanced Learning Laboratory (ALL School) were performing adequately on writing tests before the project began. Nearly two years after the program was implemented the same writing tests taken on paper indicated that only 30% of the students were writing adequately, an apparent decline of approximately 30 percentage points. Yet, when the same tests were administered on computer (without student access to word processing tools such as spell checking or grammar checking), nearly 70% of students performed adequately. This significant and startling difference also occurred on National Assessment of Educational Progress (NAEP) reading and science items, which required students to respond to open-ended items (similar to items used in the Third International Math and Science Study, the Massachusetts Comprehensive Assessment System and other state level testing programs). The study concludes that for the students in the ALL School, most of whom are accustomed to working on computers, open-ended test questions administered on paper severely underestimated students' achievement.

Although our findings raise questions about the validity of open-ended test results for students accustomed to working on computer but who completed tests on paper, our study had several shortcomings. As we noted, only one extended writing item was used. Furthermore, no information regarding the extent to which students used computers or the proficiency with which students used computers was available. All of the examinees included in the study were accustomed to working on computers. Thus it was not possible to study the mode of administration effect across varied levels of previous computer use. Finally, beyond scores for a set of open-ended items performed by both groups on paper, no other information about prior academic achievement, such as standardized test scores or grades, was considered.

Despite these shortcomings, the results raise important questions about the extent to which scores for open-ended items administered on paper can be used to make inferences about individual students (or their schools) who are accustomed to working on computers. Moreover, if test scores are used to evaluate the effect increased expenditures for computers have on student achievement, the use of open-ended items administered on paper may also undermine the growing emphasis on educational technology.

In this study, I build on our prior work and overcome the shortcomings of our previous study. Specifically, I improved the study design in five ways. First, the sample of students was broadened to cover a range of prior computer experience. Second, information about students' prior use of computers, preference for writing on computer or on paper, and an indicator of students' keyboarding skill was collected. Third, rather than focusing on one extended writing task, several items were administered. Fourth, rather than focusing specifically on writing, open-ended math, language arts, and science items were examined. And fifth, all items included in this study had been used in state or national testing programs and had been validated previously. Thus, for this study I analyze the effect of computer administration across levels of computer use/proficiency using several open-ended items in the areas of language arts, math, and science. Specifically, the following research questions are addressed:

  1. Does the effect reported by Russell and Haney (1997) hold across levels of computer use/proficiency?
  2. Does the mode of administration effect occur within and across subject areas and if so, is the magnitude of the effect consistent across subject areas?
  3. Do students who prefer to write on paper perform better than predicted on open-ended items administered on paper and do students who prefer to write on computer perform better than predicted on open-ended items administered on computer?

Background

For three decades, educational theorists have proposed many ways in which computers might influence education. Although it was not until the 1970s that computers began having a presence in schools, since then the use of computers in education has increased dramatically (Zandvliet & Farragher, 1997). The National Center for Education Statistics reports that the percentage of students in grades 1 to 8 using computers in school more than doubled from 31.5% in 1984 to 68.9% in 1993 (Snyder & Hoffman, 1990; 1994). Similarly, the availability of computers to students in school increased from one computer for every 125 students in 1983 to one computer for every 9 students in 1995 (Glennan & Melmed, 1996). As the number of computers has increased, theories about how computers might benefit students' writing have proliferated. To a lesser extent, some researchers have carried out formal studies to examine whether writing on computer actually leads to better writing. Many of these studies have reported that writing on computers leads to measurable increases in students' motivation to write, the quantity of their work and the number of revisions made. Some of these studies also indicate that writing on computers improved the quality of writing. In a meta-analysis of 32 computer writing studies, Bangert-Drowns (1993) reports that about two-thirds of the studies indicated improved quality for text produced on computer. However, the extent to which writing on computers leads to higher quality writing seems to be related to the type of students examined. Generally, improvements in the quality of writing produced on a computer are found for learning disabled students, early elementary students, low-achieving students and college-aged students. Differences generally are not found for middle school and high school students.

Learning Disabled, Early Elementary Students and College-Aged Students

Although neither Kerchner and Kistinger (1984) nor Sitko and Crealock (1986) included a comparison group in their studies, both noted significant increases in motivation, quantity and quality of work produced by learning disabled students when they began writing on the computer. After teaching learning disabled students strategies for revising opinion essays, MacArthur and Graham (1987) reported gains in the number of revisions made on computer and the proportion of those revisions that affected the meaning of the passage. They also noted that essays produced on computer were longer and of higher quality. In a separate study, MacArthur again reported that when writing on a computer, learning disabled students tended to write and revise more (1988). At the first grade level, Phoenix and Hannan (1984) report similar differences in the quality of writing produced on computer.

Williamson and Pence (1989) found that the quality of writing produced by college freshmen increased when produced on computer. Also focusing on college age students, Robinson-Stavely and Cooper (1990) report that sentence length and complexity increased when a group of remedial students produced text on the computer. Haas and Hayes (1986a) also found that experienced writers produced papers of greater length and quality on computer as compared to those who created them on paper.

Middle and High School Students

In a study of non-learning disabled middle school students, Daiute (1986) reported that although writing performed on the computer was longer and contained fewer mechanical errors, the overall quality of the writing was not better than that generated on paper. In a similar study, Vacc (1987) found that students who worked on the computer spent more time writing, wrote more and revised more, but that holistic ratings of the quality of their writing did not differ from text produced with paper-and-pencil.

At the middle school level, Grejda (1992) did not find any difference in the quality of text produced on the two mediums. Although Etchison (1989) found that text produced on computer tended to be longer, there was no noticeable difference in quality. Nichols (1996) also found that text produced on computer by sixth graders tended to be longer, but was not any better in quality than text produced on paper. Yet, for a group of eighth grade students, Owston (1991) found that compositions created on computer were rated significantly higher than those produced on paper.

Focusing on high school freshmen, Kurth (1987) reports that there was no significant difference in the length of text produced on computer or on paper. Hawisher (1986) and Hawisher and Fortune (1989) also found no significant differences in the quality of writing produced by teenagers on paper and on computer. Hannafin and Dalton (1987) also found that for high achieving students, writing on computer did not lead to better quality writing. But for low-achieving students, texts produced on the computer were of a higher quality than those produced on paper.

Summary of Studies

The research summarized above suggests many ways in which writing on computer may help students produce better work. Most formal studies report that when students write on computer they tend to produce more text and make more revisions. Studies that compare student work produced on computer with work produced on paper find that for some groups of students, writing on computer also has a positive effect on the quality of student writing. This positive effect is strongest for students with learning disabilities, early elementary-aged students and college-aged students. All of the studies described above focus on student work produced in class under untimed conditions. These studies also focus on work typically produced for English or Language Arts class, such as short stories or essays. However, the study presented here focuses on writing produced under formal timed testing conditions in three subject areas, namely language arts, math and science. Specifically, this study addresses the extent to which producing open-ended responses on computer or on paper affects students' performance, particularly for students with different levels of computer use.

Study Design

To better understand whether open-ended test items administered on paper underestimate the achievement of students accustomed to working on computers, six open-ended math, six science, and six language arts items were converted to a computer format and then administered in two modes, paper and computer. In addition, all students completed a computer use survey and performed a short keyboarding test. Finally, an indicator of prior achievement, namely Grade 7 Stanford Achievement Test version 9 (SAT 9) scores, was collected for each student. As is explained in detail below, the indicator of achievement was used to stratify and randomly assign representative sample groups and is used as a covariate for some analyses.

The study focuses on three subject areas: math, language arts, and science. For each subject area, a total of six open-ended items were administered. To decrease the amount of testing time required for each student, students were divided into four groups. Two of these four groups performed the six science items and three of the language arts items, only. For ease of reference I call these groups of students SL and LS. The remaining two groups of students performed the six math items and the other three language arts items, only. These groups of students are referred to as ML and LM. All students completed the computer use survey and performed the keyboarding test. In addition, an indicator of prior achievement was collected for each student.

The study occurred in three stages. During stage 1, SAT 9 scores were collected for each student. In total, four SAT 9 scores were collected for each student: Comprehensive Normal Curve Equivalent (NCE), Math NCE, Language Arts NCE and Science NCE. Once collected, the Comprehensive NCE was used to stratify and randomly assign four groups of students. Two of these groups formed the SL and LS students while the remaining two groups formed the ML and LM students.

During stage 2, all students completed the computer use survey and performed the keyboarding test. During stage 3, a crossed design was used to administer the open-ended items to each group. In this crossed design, the SL students first performed the science items on computer and then performed three language arts items on paper. The LS students first performed the three language arts items on computer and then performed the science items on paper. Similarly, the ML students first performed the math items on computer and then performed the three language arts items on paper. Finally, the LM students first performed the language arts items on computer and then performed the math items on paper. Below, the instruments, sampling method and scoring method are discussed in greater detail.
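The crossed design described above can be written down as a small lookup table. The following is only an illustrative sketch: the group labels and the order of sessions come from the text, while the dictionary layout and the function name `modes_for_subject` are assumptions introduced here.

```python
# Group labels (SL, LS, ML, LM) and session order follow the study description.
# The data structure and helper name are hypothetical, for illustration only.
CROSSED_DESIGN = {
    "SL": [("science", "computer"), ("language arts", "paper")],
    "LS": [("language arts", "computer"), ("science", "paper")],
    "ML": [("math", "computer"), ("language arts", "paper")],
    "LM": [("language arts", "computer"), ("math", "paper")],
}


def modes_for_subject(subject):
    """Return {group: mode} for every group that takes the given subject."""
    return {
        group: mode
        for group, sessions in CROSSED_DESIGN.items()
        for test, mode in sessions
        if test == subject
    }


if __name__ == "__main__":
    for subject in ("science", "math", "language arts"):
        print(subject, modes_for_subject(subject))
```

Laid out this way, it is easy to see that every subject is taken on computer by one group and on paper by its counterpart, which is what allows the mode of administration to be compared within each subject.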

Instruments

The instruments used in this study fall into three categories: indicators of prior achievement; computer experience; and open-ended tests.

Indicator of Prior Achievement

As described in greater detail below, an indicator of prior achievement was used to assign students to experimental groups and as a covariate during analyses. Since the sample of students was limited to students in grade eight, the students' grade 7 SAT 9 NCE scores were used as the indicator of prior achievement.

Computer Experience

Two instruments were used to estimate students' level of computer experience. First, a survey that focused on prior computer use was administered to all students. Second, all students completed a brief keyboarding test administered on computer.

Student Questionnaire

The survey was designed to collect information about how much experience students had working with computers and, in particular, how they used computers during their writing process. The survey included questions that asked:

  1. how long the student has had a computer in his/her home;
  2. how many years they have used a computer;
  3. how often they currently use a computer in school and at home;
  4. how often they use a computer during different stages of their writing process (e.g., brainstorming, outlining, composing a first draft, editing, writing the final draft); and
  5. whether they prefer to write papers on paper or on computer.

In addition, the survey asked students to draw a picture of their writing process and to then describe what they had drawn. The purpose of the drawing prompt was to collect information about if and when computers enter the student's writing process. Finally, the student questionnaire asked students to indicate their gender and their race/ethnicity.

To code student drawings, the following guide was used:

  • 0 - No computer visible
  • 1 - Computer used for final draft only
  • 2 - Computer used prior to creating the final draft

When coding drawings, both the drawing and the student's description of their drawing were reviewed prior to assigning a score. All drawings were coded by one rater. However, to examine inter-rater reliability, a sample of 20 drawings was coded by a second rater. For these 20 drawings, there was no discrepancy between the two raters' scores.

Keyboarding Test

To measure keyboarding skills, all students performed a computer-based keyboarding test. The keyboarding test contained two passages which students had two minutes apiece to type verbatim into the computer. Words per minute, unadjusted for errors, was averaged across the two passages and was used to estimate students' keyboarding speed. Both keyboarding passages were taken directly from encyclopedia articles to assure that the reading level was not too difficult.

Although there is considerable debate about how to quantify keyboarding ability (see West, 1968, 1983; Russon & Wanous, 1973; Arnold, et al, 1997; and Robinson, et al, 1979), for the purposes of this study, students' average words per minute (WPM) uncorrected for errors was recorded. In each of the scoring guidelines used to rate student responses to the open-ended test items, spelling was not explicitly listed as a criterion raters should consider when scoring student responses. For this reason, students' keyboarding errors did not seem to be directly relevant to this study.
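As a rough illustration of the speed measure, the sketch below averages uncorrected words per minute across two two-minute passages. The text does not say how a "word" was counted, so counting whitespace-separated tokens is an assumption made here, and the function names are hypothetical.

```python
def words_per_minute(typed_text, minutes=2.0):
    """Gross WPM: whitespace-separated words typed, with no error adjustment.

    Counting whitespace-delimited tokens is an assumption; the study does not
    specify its word-counting rule.
    """
    return len(typed_text.split()) / minutes


def keyboarding_speed(first_passage_text, second_passage_text, minutes=2.0):
    """Average the uncorrected WPM across the two typed passages."""
    first = words_per_minute(first_passage_text, minutes)
    second = words_per_minute(second_passage_text, minutes)
    return (first + second) / 2


if __name__ == "__main__":
    print(keyboarding_speed("the quick brown fox", "jumps over the lazy dog today ok yes"))
```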

Open-Ended Tests

This study examines the mode of administration effect on student performance in three subject areas: science, math, and language arts. To restrict testing time to 60 minutes per test, 6 science items, 6 math items and two sets of 3 language arts items were administered. All items included in this study were taken directly from open-ended test instruments used previously. Sources for items include the National Assessment of Educational Progress (NAEP) and the Massachusetts Comprehensive Assessment System (MCAS).

Language Arts Items

In total, six language arts items were used in this study. Three of the language arts items were taken from the 1999 Spring administration of MCAS. Two of the items were taken directly from the 1988 Grade 8 NAEP Writing Assessment. And the final language arts item was taken from the 1992 Grade 8 NAEP Writing Assessment.

The three MCAS language arts items focus on reading comprehension. For each of these items, students read a brief passage and then answer an open-ended question about the passage. The passages include a poem titled "The Caged Bird", a speech titled "Sojourner Truth's Speech From the 1850s", and a short story titled "The Lion's Share".

The three NAEP language arts items focus on writing. The first writing item asks students to create a narrative piece that describes an embarrassing experience they have had. The second writing item prompt focuses on creative writing and asks students to write a good, scary ghost story. The final writing item tests students' expository writing skills and asks students to write about their favorite story, telling why they like it and what it means to them.

When selecting the items, two criteria were used. First, the time required to respond to the item could not exceed 30 minutes. Second, the amount of reading (if any) students had to complete before responding to the item could not exceed 1 page. The reason for this second criterion was to maximize the amount of time students spent actually responding to the item. It should be noted that all three MCAS items required students to read a short body of text before responding to a question while none of the NAEP items required students to read any text. For this reason, the MCAS items can be classified as primarily measuring reading comprehension and the NAEP items can be classified as measuring writing ability.

After the six items were selected, they were placed into one of two booklets. Two MCAS items and the 1992 NAEP item formed the test booklet titled Language Arts 1. The remaining MCAS item and the two 1988 NAEP items formed the second test booklet titled Language Arts 2.

Mathematics

The mathematics test booklet contained six items. Three of the items were taken from the 1998 grade 8 spring MCAS test and three items were taken from the 1996 grade 8 NAEP Assessment. Two of the math items tested fractions and proportions. Two items focused on students' ability to read and interpret a graph. One item tested students' ability to calculate and interpret means and medians. And the final item focused on students' problem solving skills.

When selecting mathematics items, two criteria were applied. First, the item had to require students to generate an extended (a minimum of one sentence) written response. Second, the item could not require students to draw a picture, diagram or graph. The first criterion was used to assure that students had to compose text in order to perform well on the item. The second criterion was used to prevent students working on computer from having to access drawing or graphing programs.

Science

Like the mathematics items, three of the science items came from the 1998 grade 8 spring MCAS test and three items came from the 1996 grade 8 NAEP assessment. Similarly, all of the items required students to generate a substantial amount of text (more than a sentence) in order to succeed and none of the items required students to draw pictures or graphs. Two of the items tested students' understanding of the physical sciences. Two items focused on human biology. One item tested students' understanding of electricity. And the final item tested students' ability to design an experiment.

Scoring Criteria

For all of the items, the scoring criteria developed by MCAS or NAEP were used.

All of the MCAS scoring guidelines used a scale that ranged from 0 to 4. For the MCAS items, a score of 0 indicated that the item was left blank or that the student's response was completely incorrect. Scores of 1 to 4 represented increasingly higher levels of performance.

The scales for the NAEP scoring guidelines varied from 1-3, 1-4, 1-5, and 1-6. A code of 9 was awarded to items that were left blank. For all items, a 1 indicated that the student's response was completely incorrect. Scores of 2 to 6 represented increasingly higher levels of performance. To make the scores for the MCAS and the NAEP items more comparable, all blank responses were re-coded as a zero. The resulting NAEP scales ranged from 0-3, 0-4, 0-5, or 0-6.
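The re-coding step is simple enough to show directly. This is a minimal sketch of the rule stated above (a blank code of 9 becomes 0, all other scores are unchanged); the constant and function names are illustrative, not taken from the study.

```python
BLANK_CODE = 9  # NAEP code for an item left blank, as described in the text


def recode_naep(score):
    """Map the NAEP blank code to 0 and leave every other score as is."""
    return 0 if score == BLANK_CODE else score


if __name__ == "__main__":
    raw_scores = [9, 1, 4, 6]
    print([recode_naep(s) for s in raw_scores])
```

Because the widest NAEP scale used here tops out at 6, the code 9 cannot collide with a legitimate score.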

Converting Paper Versions to Computer Versions

Before the tests could be administered on computer, the paper versions were converted to a computerized format. Several studies suggest that slight changes in the appearance of an item can affect performance on that item. Something as simple as changing the font in which a question is written, the order items are presented, or the order of response options can affect performance on that item (Beaton & Zwick, 1990; Cizek, 1991). Other studies have shown that people become more fatigued when reading text on a computer screen than when they read the same text on paper (Mourant, Lakshmanan & Chantadisai, 1981). One study (Haas & Hayes, 1986b) found that when dealing with passages that covered more than one page, computer administration yielded lower scores than paper-and-pencil administration, apparently due to the difficulty of reading extended text on screen. Clearly, by converting items from paper to computer, the appearance of items is altered.

To minimize such effects, students taking a test on computer were given a hard copy of the test booklet. The only difference between the hard copy of the test booklets received by students taking the test on computer and the original paper version was that the blank lines on which students recorded their responses were replaced by instructions to write answers on the computer.

Prior to beginning a test, students in the computer group launched a computer program that performed four tasks. First, the program prompted students to record their name and identification number. Second, the program presented the same directions that appeared in their hard copy. Third, the program allowed students to navigate between text boxes in which they recorded their responses to the open-ended questions presented in their test booklet. Finally, after a student completed the test, the program presented two questions about taking the test on computer (described more fully below).

To assist students in recording their responses in the proper text box, the program placed the question number and accompanying prompt above each text box. To help avoid confusion, only one text box appeared on the screen at a time. To move between text boxes, two buttons appeared on the bottom of the screen. The button labeled "Next" allowed students to navigate to the text box for the next question and the button labeled "Back" allowed students to move to the previous question. Below the last text box, a button labeled "I'm Finished" appeared. Once students felt they were finished with the test, they clicked on the "I'm Finished" button. To assure that they were in fact finished, students were asked again if they were done. If so, they clicked the "I'm Finished" button again. Otherwise, they clicked the "Back" button to continue working on their responses.

When students were finished taking the test and had selected the "I'm Finished" button twice, they were prompted with two questions about the test. The first asked students: "Do you think you would have done better on this test if you took it on paper. Why?" The second question asked: "Besides not knowing the answer to a question, what problems did you have while taking this test on computer?" Students were required to answer these questions before they could quit the program.
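The navigation behavior described above can be modeled as a small state machine. The sketch below is not the Macromedia Director program used in the study; it is a hypothetical Python model, and it assumes that clicking "Back" while the finish confirmation is pending simply cancels the confirmation and leaves the student on the last text box.

```python
class ExamSession:
    """Hypothetical model of the one-text-box-at-a-time test interface."""

    def __init__(self, n_items):
        self.responses = [""] * n_items
        self.current = 0          # index of the text box on screen
        self.finish_clicks = 0    # "I'm Finished" must be clicked twice
        self.submitted = False

    def write(self, text):
        self.responses[self.current] = text

    def next(self):
        if self.current < len(self.responses) - 1:
            self.current += 1

    def back(self):
        if self.finish_clicks:
            # Assumed behavior: cancel the pending confirmation, stay in place.
            self.finish_clicks = 0
        elif self.current > 0:
            self.current -= 1

    def finish(self):
        """Return True only after the second consecutive click on the last item."""
        if self.current != len(self.responses) - 1:
            return False  # the button appears only below the last text box
        self.finish_clicks += 1
        self.submitted = self.finish_clicks >= 2
        return self.submitted


if __name__ == "__main__":
    session = ExamSession(2)
    session.write("first answer")
    session.next()
    print(session.finish(), session.finish())
```

The two follow-up questions would be asked once `submitted` is true; they are left out here to keep the sketch short.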

To create a computerized version of the test booklets, the following steps were taken:

  1. An appropriate authoring tool, namely Macromedia Director, was selected to create software that would allow students to navigate between questions and to write data to a text file.
  2. A data file was created to store student input, including name, ID number, and responses to each item.
  3. A prototype of each test was created, integrating the text and database into a seamless application. As described earlier, navigational buttons were placed along the lower edge of the screen. In addition, a "cover" page was created in which students entered their name and ID numbers.
  4. The prototype was tested on a class of ninth grade students to assure that all navigational buttons functioned properly, that data was stored accurately, and that items were easy to read.
  5. Finally, the prototype was revised as needed and the final versions of the computer tests were installed on twenty computers in the ALL School and twenty-four computers in the Sullivan Middle School.

For all questions, examinees used a keyboard to type their answers into text boxes that appeared on the screen. To enable students to write as much as they desired, scrolling text boxes were used for all items. Although they could edit using the keyboard and mouse, examinees did not have access to word processing tools such as spell-checker or grammar-checker.

Sampling Method

The sample of students was drawn from two Worcester Public Middle Schools, namely The Advanced Learning Laboratory (ALL School) and the Sullivan Middle School. Since the analyses focus on how the mode of administration effect varies across achievement levels and, more importantly, computer use/proficiency levels, the population of students was pooled across the two schools. Before sampling began, a list of all grade eight students in the ALL School and all grade eight students from one team in the Sullivan Middle School was generated. In total, this yielded 327 students. For each student on the list, an indicator of prior achievement, namely grade 7 SAT 9 scores, was collected. Since some of the students were new to the district or had not taken the SAT 9 the previous year, SAT 9 scores were only available for 287 students. Using a stratified random assignment procedure, students were then assigned to one of two groups. Group 1 was then assigned to the Language Arts 1 and Math tests and group 2 was assigned to the Science and Language Arts 2 tests. For each group, this process was repeated again, this time assigning half of the students in group 1 to take the Language Arts 1 test on computer first and the remaining half to take the Math test on computer first. Similarly, half of the second group of students was assigned to take the Language Arts 2 test on computer first and the remaining half took the Science test on computer first.
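One common way to carry out a stratified random assignment is to rank students on the stratifying score, cut the ranking into small blocks, and randomize group membership within each block. The sketch below does this for four groups in a single pass; the study describes a two-step split (two groups, then halves of each), so this is a simplified variant under stated assumptions, not the exact procedure used. The function name and data layout are hypothetical.

```python
import random


def stratified_assign(scores, groups, seed=None):
    """Assign students to groups within strata of similar prior achievement.

    scores: {student_id: Comprehensive NCE}; groups: list of group labels.
    Students are ranked by score, split into consecutive blocks of
    len(groups), and the labels are shuffled within each block.
    """
    rng = random.Random(seed)
    ranked = sorted(scores, key=scores.get)
    assignment = {}
    for start in range(0, len(ranked), len(groups)):
        stratum = ranked[start:start + len(groups)]
        labels = list(groups)
        rng.shuffle(labels)
        for student, label in zip(stratum, labels):
            assignment[student] = label
    return assignment


if __name__ == "__main__":
    demo_scores = {"s%d" % i: 30 + 5 * i for i in range(8)}  # made-up NCEs
    print(stratified_assign(demo_scores, ["SL", "LS", "ML", "LM"], seed=1))
```

Blocking on the ranked score keeps the groups balanced on prior achievement while leaving the assignment within each block to chance.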

Those students for whom SAT 9 scores were not available were randomly assigned to one of the four groups. Although their scores are not included in the analyses below, their responses were used to train raters prior to scoring the test booklets for students included in the analyses.

Due to absences and refusals to perform one or more instruments, complete data records were available for 229 students. To be clear, a complete data record was defined as one containing a student's SAT 9 scores, their responses to the student questionnaire, the results of the keyboarding test, and results from at least one of the open-ended tests.

Scoring

To reduce the influence handwriting has on raters' scores (Powers, Fowles, Farnum & Ramsey, 1994), all responses to the open-ended items administered on paper were transcribed verbatim into computer text. The transcribed responses were randomly intermixed with the computer responses. All student responses were formatted with the same font, font size, line spacing and line width. In this way, the influence that mode of response might have on the scoring process was eliminated.
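A blind scoring queue of this kind could be assembled along the following lines. This is a hypothetical sketch: it pools transcribed paper responses with computer responses, shuffles them, and keeps the student and mode information in a separate key that raters would not see. The function name and data layout are assumptions.

```python
import random


def blind_scoring_queue(paper_transcripts, computer_responses, seed=None):
    """Pool and shuffle responses so raters cannot tell the mode of origin.

    Each argument maps student_id -> response text. Returns (queue, key):
    queue is a list of (response_number, text) shown to raters, and key maps
    response_number -> (student_id, mode) for use after scoring.
    """
    pooled = [(sid, "paper", text) for sid, text in paper_transcripts.items()]
    pooled += [(sid, "computer", text) for sid, text in computer_responses.items()]
    random.Random(seed).shuffle(pooled)
    queue = [(number, text) for number, (_, _, text) in enumerate(pooled)]
    key = {number: (sid, mode) for number, (sid, mode, _) in enumerate(pooled)}
    return queue, key


if __name__ == "__main__":
    queue, key = blind_scoring_queue({"a": "paper text"}, {"b": "typed text"}, seed=0)
    print(queue)
```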

Scoring guidelines designed for each item were used to score student responses. To increase the accuracy of the resulting scores, all responses were double-scored. When discrepancies between raters' scores arose, an adjudicator awarded the final score. At the conclusion of the scoring process, one score was recorded for each student response.

To estimate inter-rater reliability, the original scores from both raters were used. The resulting scores were compared both via correlation and percent agreement methods. Table 1 shows that for most items the correlation between the two raters' scores was above .8 and for many items the correlation was above .9. For two of the items on the first language arts test, however, correlations were closer to .7. Nonetheless, this represents an adequate level of inter-rater reliability.

Table 1
Inter-rater Reliability for Open-Ended Items

  Correlation % Exact Agreement % Within 1 Point
Language Arts 1    
Item 1 .80 .68 1.00
Item 2 .74 .50 1.00
Item 3 .72 .59 .95
Language Arts 2    
Item 1 .95 .84 1.00
Item 2 .94 .88 1.00
Item 3 .91 .76 1.00
Math    
Item 1 .91 .60 1.00
Item 2 .83 .65 .90
Item 3 .88 .80 .95
Item 4 .94 .80 1.00
Item 5 .84 .90 1.00
Item 6 .70 .75 .95
Science    
Item 1 .80 .64 .95
Item 2 .86 .73 1.00
Item 3 .88 .82 1.00
Item 4 .88 .86 1.00
Item 5 .92 .82 1.00
Item 6 .85 .73 1.00
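The three statistics reported in Table 1 can be computed from paired rater scores as sketched below. This is an illustration only: the two score lists are hypothetical, and the helper names are not drawn from the study.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def agreement(x, y, tolerance=0):
    """Proportion of responses on which two raters differ by at most tolerance."""
    hits = sum(1 for a, b in zip(x, y) if abs(a - b) <= tolerance)
    return hits / len(x)


# Hypothetical scores from two raters on eight responses (0-4 scale).
rater_a = [0, 1, 2, 3, 4, 2, 1, 3]
rater_b = [0, 1, 2, 2, 4, 3, 1, 3]

r = pearson_r(rater_a, rater_b)
exact = agreement(rater_a, rater_b)            # exact agreement
within_one = agreement(rater_a, rater_b, 1)    # agreement within 1 point
```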

To estimate intra-rater reliability, one rater double-scored 20% of the responses. The resulting scores for this rater were compared via correlation and percent agreement methods. Table 2 shows high correlations between the two sets of scores. Moreover, where discrepancies occurred, the difference between the two scores was never more than one point.

Table 2
Intra-rater Reliability for Open-Ended Items

  Correlation % Exact Agreement % Within 1 Point
Language Arts 1     
Item 1 .92 .86 1.00
Item 2 .95 .95 1.00
Item 3 .88 .77 1.00
Language Arts 2     
Item 1 .91 .82 1.00
Item 2 .93 .91 1.00
Item 3 .94 .86 1.00
Math    
Item 1 .92 .77 1.00
Item 2 .97 .91 1.00
Item 3 .98 .95 1.00
Item 4 .96 .82 1.00
Item 5 .88 .91 1.00
Item 6 .84 .82 1.00
Science    
Item 1 .94 .86 1.00
Item 2 .96 .91 1.00
Item 3 .93 .91 1.00
Item 4 .92 .91 1.00
Item 5 .98 .95 1.00
Item 6 .94 .86 1.00

 

Note that the adjudicated scores were produced for all students and that the adjudicated scores were used for all analyses described below.

Results

This study explores the relationships between prior computer use and performance on four open-ended test booklets. To examine this relationship, three types of analyses were performed. First, independent samples t-tests were employed to compare group performance. Second, total group regression analyses were performed to estimate the mode of administration effect controlling for differences in prior achievement. And third, sub-group regression analyses were performed to examine the group effect at different levels of keyboarding speed. However, before the results of these analyses are described, summary statistics are presented.

Summary Statistics

Summary statistics are presented for each of the instruments included in this study. The raw data are also available from this point. These original data are presented by the author for others who may wish to perform secondary data analyses; anyone publishing analyses of these data should cite this article as the original source. For the student questionnaire, keyboarding test, and the SAT 9 scores, summary statistics are based on all 229 students included in the study. For the language arts, math and science open-ended tests, summary statistics are based on the sub-set of students that performed each test. When between group analyses are presented, summary statistics for select variables are presented for each sub-set of students that performed a given test.

Keyboarding Test

The keyboarding test contained two passages. Table 3 shows that the mean number of words typed for passage 1 and passage 2 was 31.2 and 35.0, respectively. As described above, the number of words typed for each passage was summed and divided by 4 to yield the number of words typed per minute for each student. Across all 229 students included in this study, the mean WPM was 16.5. Considering that the minimum WPM required by most employers when hiring a secretary is at least 40, an average of 16.5 WPM suggests that most students included in this study were novice keyboarders.

Table 3
Summary Statistics for the Keyboarding Test

N=229 Mean Std Dev Min Max
Passage 1 31.2 11.2 5 71
Passage 2 35.0 10.6 9 80
WPM 16.5 5.3 4.8 37.8
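The words-per-minute calculation can be written out as a short worked example. It assumes that the divisor of 4 represents four minutes of total typing time across the two passages; the function and parameter names are illustrative.

```python
def words_per_minute(passage1_words, passage2_words, total_minutes=4):
    """Sum the words typed on both passages and divide by total minutes."""
    return (passage1_words + passage2_words) / total_minutes


# Applying the formula to the passage means in Table 3.
mean_wpm = words_per_minute(31.2, 35.0)
```

Because the mean of a sum equals the sum of the means, the passage means in Table 3 imply a mean WPM of about 16.55, in line with the reported value of 16.5.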

 

Student Questionnaire

The student questionnaire contained 11 questions. The maximum score for the Survey was 43 and the minimum score was 10. The scale for each item varied from 0 to 2, 1 to 2, 1 to 3 and 1 to 6. To aid in interpreting the summary statistics presented in Table 4, the scale for each item is also listed. In addition to the Survey total score, summary statistics are presented for the Comp-Writing sub-score.

Although comparative data are not available, Table 4 suggests that on average students included in this study do not have a great deal of experience working with computers. The average student reports using a computer for between two and three years, having had a computer in the home for less than a year, and using a computer in school and in their home less than 1-2 hours a week. Furthermore, most students report that they do not use a computer when brainstorming, creating an outline or writing a first draft. Slightly more students report using a computer to edit the first draft. Most students, however, report using a computer at least sometimes to write the final draft. Similarly, most students indicate that if given the choice, they would prefer to write a paper on computer rather than on paper. Yet, when asked to draw a picture of their writing process, fewer than half the students included a computer in their drawing.

Again, the divergence between students' preference and their reported use of a computer in the writing process may indicate that when recording their preference some students provided a socially desirable response. If students did provide socially desirable responses, estimating the effect preference had on students' performance will be less precise.

Table 4
Summary Statistics for the Student Questionnaire

N=229 Scale Mean Std Dev Min Max
Years having computer at home 1-6 2.81 1.86 1 6
Years using computer 1-6 4.75 1.51 1 6
Use computer in school 1-6 2.74 1.10 1 6
Use computer at home 1-6 2.75 1 6
Brainstorm with computer 1-3 1.35 .55 1 3
Outline with computer 1-3 1.50 .57 1 3
First draft with computer 1-3 1.72 .75 1 3
Edit with computer 1-3 1.85 .73 1 3
Final draft with computer 1-3 2.50 .65 1 3
Preference 1-2 1.80 .40 1 2
Computer in drawing 0-2 .59 .72 0 2
Survey 10-43 24.37 5.68 12 41
Comp-Writing 5-17 9.51 2.77 5 17

Indicator of Prior Achievement

Four indicators of prior achievement were collected prior to the study. Specifically, SAT 9 Composite, Reading, Math and Science NCE scores were collected for each student. Note that the NCE scores provided for this study had been multiplied by 10 before they were supplied by the district office. Thus, the range for the NCE scores was 10 to 990 with a mean of 500 and a standard deviation of approximately 210 (see Crocker and Algina, 1986 for a fuller description of NCE scores). Table 5 displays the mean and standard deviation for each SAT 9 score. The mean score for each subject area and for the composite score for students included in this study is approximately .5 standard deviations below the national average. However, within this sample of students there is a substantial variation.

Table 5
Summary Statistics for Indicators of Prior Achievement

N=229 Mean Std Dev Min Max
Composite NCE 402 150.2 119 888
Reading NCE 391 178 10 990
Math NCE 389 174 10 990
Science NCE 434 172 67 896

Open-Ended Tests

Four open-ended tests were administered in three subject areas: math, science and language arts. As is described more fully above, two versions of the language arts test were administered. Each of the tests was administered to a sample of students. The number of students who performed each test ranged from a high of 117 for Language Arts 1 to a low of 100 for Language Arts 2. Within each sample, approximately half of the students performed the test on computer and half of the students performed the test on paper.

The summary statistics for the total sample of students who performed each test are presented in tables 6 through 9. Since each test contained some items from NAEP and some items from MCAS, it is not possible to directly compare the total test scores to the performance of students in other settings. However, to aid in interpreting the test scores, summary statistics are presented for each item along with the national or state average performance for each item. Note that comparison data for the MCAS items represents the mean score on a 0-4 point scale for all students in the state. Comparison data for the NAEP items represents the percentage of students nationally performing adequately or better on the item.

Table 6 presents the summary statistics for Language Arts 1 and Table 7 presents the summary statistics for Language Arts 2. For all items but one included in the language arts tests, a score below 3 indicates inadequate performance. For item 3 on language arts test 1, a score below 4 indicates inadequate performance. For all items, many students failed to perform adequately. For all items, the mean performance was below 3 and for four items the mean performance was below 2. This low level of performance suggests these items were difficult for these samples of students.

Table 6
Summary Statistics for Language Arts 1

N=117 Scale Mean Std Dev % Adequate* Mean on MCAS % Adequate on NAEP
Item 1 0-4 1.42 1.04 19 1.99 NA
Item 2 0-4 1.50 1.05 16 1.73 NA
Item 3 0-6 2.91 1.41 31 NA 29
Total 0-14 5.84 2.85

* For items 1 and 2, a score of 3 or higher was considered adequate performance.
For item 3, a score of 4 or higher was considered adequate performance.

Table 7
Summary Statistics for Language Arts 2


N=100 Scale Mean Std Dev % Adequate* Mean on MCAS % Adequate on NAEP
Item 1 0-4 1.23 0.98 10 1.74 NA
Item 2 0-4 1.67 1.07 31 NA 51
Item 3 0-4 2.12 0.81 22 NA 25
Total 0-12 5.02 2.29     

* For all three items, a score of 3 or higher was considered adequate performance.

Table 8 displays the summary statistics for the Math test. Again, for all items except number 4, a score below 3 indicates inadequate performance. For item 4, a score below 4 indicates inadequate performance. For all items, the mean performance for this sample of students indicates that on average students performed below the adequate level.

Table 8
Summary Statistics for Math


N=110 Scale Mean Std Dev % Adequate* Mean on MCAS % Adequate on NAEP
Item 1 0-4 1.18 1.17 15 2.05 NA
Item 2 0-4 1.26 1.23 18 1.83 NA
Item 3 0-4 1.64 1.03 22 1.84 NA
Item 4 0-5 2.45 1.28 33 NA 28
Item 5 0-3 1.44 .53 2 NA 11
Item 6 0-4 1.99 .89 30 NA 26
Total 0-24 9.96 4.18

* For items 1, 2, 3 and 6, a score of 3 or higher was considered adequate.
For item 4, a score of 4 or higher was considered adequate.
For item 5, a score of 3 was considered adequate.

Table 9 displays the summary statistics for the Science test. For all items, a score below 3 indicates inadequate performance. For all items, the mean performance was below the adequate level.

Table 9
Summary Statistics for Science


N=102 Scale Mean Std Dev % Adequate* Mean on MCAS % Adequate on NAEP
Item 1 0-4 1.89 1.02 32 1.49 NA
Item 2 0-4 1.71 1.21 26 1.70 NA
Item 3 0-3 1.50 .75 8 NA 19
Item 4 0-3 1.21 .67 3 NA 9
Item 5 0-3 1.78 .90 22 NA 52
Item 6 0-4 1.57 1.09 20 1.81 NA
Total 0-24 9.66 4.14    

* For all items, a score of 3 or higher was considered adequate performance.

Clearly, students had difficulty with all four of these tests. On most MCAS items, this sample of students performed below the level of other students in the state of Massachusetts; the exceptions were science items 1 and 2, where the sample mean matched or exceeded the state mean. On the NAEP items, students performed about as well as or worse than other students in the nation.

Comparing Performance on Computer and on Paper

For each test, approximately one half of the sample of students was randomly assigned to perform the test on computer while the other half performed the test on paper. Tables 10 through 13 present the results of between group comparisons for each test. For each test, an independent samples t-test (assuming equal variances for the two samples and hence using a pooled variance estimate) was performed for the total test score. The null hypothesis for each of these tests was that the mean performance of the computer and the paper groups did not differ from each other. Thus, these analyses test whether performance on computer had a statistically significant effect on students' test scores.
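As an illustration of the pooled-variance t-test described above, the statistic can be recomputed from group summary statistics alone. The sketch below uses the science means, standard deviations and group sizes reported in Table 13; the function name is illustrative, and the subtraction order (paper minus computer) is inferred from the signs in the tables.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent samples t statistic using a pooled variance estimate.

    Returns (t, degrees of freedom), with t computed as mean1 - mean2.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se_diff, df


# Science test, paper group first, then computer group (Table 13).
t_science, df_science = pooled_t(8.55, 3.88, 51, 10.76, 4.14, 51)
```

Recomputing from rounded summary statistics gives a value close to, but not exactly equal to, the tabled t-value of -2.79.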

To examine whether prior achievement, computer use or keyboarding skills differed between the two groups of students who performed each test, independent samples t-tests were also performed for students' SAT 9 Composite score, the corresponding SAT 9 sub-test score, Survey, Comp-Writing and WPM. The results of these tests are also presented in tables 10 through 13.

Table 10 shows that on average students who performed the first language arts test on paper performed the same as students who performed the test on computer. Similarly, differences between the two groups' SAT 9 Composite scores, SAT 9 Reading scores, Survey scores, and WPM were not statistically significant. However, Table 10 shows that the mean Comp-Writing score for students who performed the language arts 1 test on paper was larger than the mean for students who performed the test on computer.

Table 10
Between Group Comparisons for Language Arts 1

Paper N = 57
Computer N = 60

Mean Std Dev SE of Mean t-value Sig.
LA 1      
Paper 5.84 2.65 .35    
Computer 5.83 3.04 .39 .02 .99
       
SAT 9 Comp.      
Paper 379 145 19   
Computer 421 159 21 -1.49 .14
       
SAT 9 Reading       
Paper 360 168 22   
Computer 426 189 24 -1.98 .05
       
Survey       
Paper 24.6 5.6 .75   
Computer 23.9 5.9 .76 .67 .51
       
Comp-Writing       
Paper 10.1 2.8 .37   
Computer 9.0 2.6 .33 2.09 .04*
       
WPM      
Paper 17.5 4.3 .56    
Computer 16.4 5.3 .68 1.17 .24

*Significant at the .05 level.

On the second language arts test, table 11 shows that Comp-Writing was the only measure on which the two groups differed. However, for the second language arts test, students who performed the test on computer had higher Comp-Writing scores on average than did those students who performed the test on paper. For all other instruments, the two groups did not differ significantly.

Table 11
Between Group Comparisons for Language Arts 2

Paper N = 45
Computer N = 55

Mean Std Dev SE of Mean t-value Sig.
LA 2       
Paper 5.07 1.70 .25   
Computer 4.98 2.70 .36 .18 .86
       
SAT 9 Comp.       
Paper 413 138 20   
Computer 393 146 19 .59 .55
       
SAT 9 Reading       
Paper 402 173 26   
Computer 376 172 23 .74 .46
       
Survey       
Paper 24.4 5.8 .87   
Computer 25.1 5.7 .76 -.63 .53
       
Comp-Writing       
Paper 8.9 3.1 .46   
Computer 10.1 2.6 .36 -2.09 .04*
       
WPM       
Paper 15.7 4.9 .73   
Computer 17.2 6.5 .88 -1.23 .22

*Significant at the .05 level.

With a few exceptions, the students who performed the first language arts test also performed the math test. However, those students who performed the first language arts test on computer performed the math test on paper and vice versa. For this reason, table 12 indicates that the mean Comp-Writing score for the computer group was higher than that of the paper group. Again, this difference is statistically significant. For all other instruments, Table 12 indicates that differences between the two groups' scores were not statistically significant.

Table 12
Between Group Comparisons for Math

Paper N = 54
Computer N = 56

Mean Std Dev SE of Mean t-value Sig.
Math
Paper 10.70 4.34 .59
Computer 9.25 3.90 .52 1.84 .07

SAT 9 Comp.
Paper 414 155 21
Computer 407 154 21 .23 .82

SAT 9 Math
Paper 401 179 24
Computer 406 190 25 -.16 .87

Survey
Paper 23.6 5.98 .81
Computer 25.1 5.72 .76 -1.29 .20

Comp-Writing
Paper 9.0 2.66 .36
Computer 10.1 2.65 .35 -2.22 .03*

WPM
Paper 16.0 4.7 .64
Computer 17.9 5.2 .69 -2.01 .05

*Significant at the .05 level.

The open-ended science test was the only test for which there was a statistically significant difference in the two groups' test performance. Table 13 shows that on average the computer group performed better than the paper group. There were no other statistically significant differences between the two groups.

Table 13
Between Group Comparisons for Science

Paper N = 51
Computer N = 51

Mean Std Dev SE of Mean t-value Sig.
Science       
Paper 8.55 3.88 .54   
Computer 10.76 4.14 .58 -2.79 .006*
       
SAT 9 Comp.       
Paper 388 134 19   
Computer 426 152 21 -1.33 .19
       
SAT 9 Science      
Paper 414 154 21    
Computer 466 181 25 -1.57 .12
       
Survey       
Paper 25.4 5.5 .77   
Computer 24.0 5.7 .80 1.22 .23
       
Comp-Writing       
Paper 9.9 2.6 .37   
Computer 9.1 3.0 .43 1.39 .17
       
WPM       
Paper 17.1 6.3 .88   
Computer 15.9 5.1 .72 1.01 .32

*Significant at the .05 level.

Note that statistical significance for the t-tests reported above was not adjusted to account for multiple comparisons. Given that six comparisons were made for each group, there is an increased probability that reported differences occurred by chance. Employing the Dunn approach to multiple comparisons (see Glass & Hopkins, 1984), $\alpha$ for c multiple comparisons, $\alpha_{pc}$, is related to simple $\alpha$ for a single comparison as follows:

$$\alpha_{pc} = 1 - (1 - \alpha)^{1/c}$$

Hence, for six comparisons the adjusted value of a simple 0.05 alpha level becomes 0.009. Analogously, a simple alpha level of 0.01 for a simple comparison becomes 0.001.
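The adjustment can be evaluated directly. The short sketch below computes the per-comparison alpha as one minus the c-th root of one minus the simple alpha; the function name is illustrative.

```python
def per_comparison_alpha(alpha, c):
    """Per-comparison alpha for c comparisons: 1 - (1 - alpha) ** (1 / c)."""
    return 1 - (1 - alpha) ** (1 / c)


# Six comparisons per group at a simple alpha of 0.05.
adjusted_05 = per_comparison_alpha(0.05, 6)
```

Rounded to three decimal places, `adjusted_05` gives the 0.009 threshold used in the text.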

Once the level of significance is adjusted for multiple comparisons, the open-ended science test is the only instrument for which there is a statistically significant group difference. This difference represents an effect size of .57 (Glass's delta effect size was employed). Although this effect size is about half of that reported by Russell and Haney (1997), it suggests that while half of the students in the computer group scored above 10.76, approximately 30% of students performing the test on paper scored above 10.76. The difference between the two groups' open-ended science scores, however, may be due in part to differences in their prior achievement as measured by SAT 9 Science scores.
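The effect size and the percentage interpretation above can be reproduced from the Table 13 summary statistics. The sketch below treats the paper group as the comparison group for Glass's delta and, as an added assumption, treats paper-group scores as approximately normally distributed; the function names are illustrative.

```python
import math


def glass_delta(mean_treatment, mean_control, sd_control):
    """Glass's delta: mean difference scaled by the control group's SD."""
    return (mean_treatment - mean_control) / sd_control


def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# Science test: computer group mean, paper group mean, paper group SD.
delta = glass_delta(10.76, 8.55, 3.88)
# Share of the paper group expected to score above the computer group mean.
share_above = 1 - normal_cdf(delta)
```

Under these assumptions `delta` rounds to .57 and `share_above` is a little under 30%, consistent with the figures reported in the text.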

To control for differences in prior achievement, a multiple regression was performed for each open-ended test. Tables 14 through 17 present the results of each test score regressed on the corresponding SAT 9 score and group membership. For all four regression analyses, the regression coefficient (B) for group membership indicates the effect group membership has on students' performance when the effect of SAT 9 scores is controlled. Group membership was coded 0 for the paper group and 1 for the computer group. A positive regression coefficient indicates that performing the test on computer has a positive effect on students' test performance. A negative regression coefficient suggests that on average students who performed the test on computer scored lower than students who performed the test on paper.
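To make the coding scheme concrete, the following sketch fits an ordinary least squares regression of a test score on a prior achievement score and a 0/1 group indicator. It is written in plain Python for illustration; the data are hypothetical and noise-free, so the fitted coefficients simply recover the values used to build them. It is not the software or data used in the study.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """Least squares fit; returns [intercept, b1, b2, ...] via normal equations."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical data: group is coded 0 for paper and 1 for computer.
sat9 = [250, 300, 350, 400, 450, 500, 550, 600]
group = [0, 1, 0, 1, 0, 1, 0, 1]
score = [2.0 + 0.01 * s + 1.5 * g for s, g in zip(sat9, group)]

intercept, b_sat9, b_group = ols(list(zip(sat9, group)), score)
```

Here `b_group` is the expected difference in test score between a computer-group student and a paper-group student with the same prior achievement score, which is how the B values for Group in Tables 14 through 17 are read.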

Table 14 indicates that SAT 9 Reading scores are a significant predictor of students' scores on the first open-ended language arts test. For each one standard score unit increase in SAT 9 Reading scores, on average students experience a .42 standard score increase in their test score. Table 14 also indicates that after controlling for differences in SAT 9 Reading scores, performing the first language arts test on computer has a negative impact on students' scores. This effect, however, is not statistically significant.*

Table 14
Language Arts 1 Regressed on SAT 9 Reading and Group Membership

  B SE B Beta T Signif.
SAT 9 Reading .007 .001 .42 4.87 <.0001
Group -.443 .492 -.08 -.90 .37

F 11.85 (Signif. <.0001)
N 117
R2 .17
Adjusted R2 .16

The results for the second language arts test are similar to those for the first language arts test. Table 15 shows that a one point standard score increase in SAT 9 Reading score is associated with a .4 point standard score increase in language arts 2 score and that this effect is statistically significant. Controlling for SAT 9 Reading scores, group membership does not have a significant effect on students' test score.

Table 15
Language Arts 2 Regressed on SAT 9 Reading and Group Membership

  B SE B Beta T Signif.
SAT 9 Reading .005 .001 .40 4.3 <.0001
Group .051 .428 .01 .1 .91

F 9.12 (Signif. .0002)
N 100
R2 .16
Adjusted R2 .14

For both the math and science tests, SAT 9 scores and group membership have statistically significant effects on students' scores. The direction of the effect, however, is different for each test. Table 16 indicates that performing the open-ended math test on computer has a negative effect on students' test scores when SAT 9 Math scores are controlled. For science, this effect is reversed. Table 17 shows that after controlling for differences in SAT 9 Science scores, performing the open-ended science test on computer leads to higher scores than performing the same test on paper. For both tests, the effects are equivalent to just less than .2 standard score units.
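The "just less than .2 standard score units" figure can be checked against the unstandardized coefficients using the standard identity Beta = B × (SD of predictor) / (SD of outcome). The sketch below applies it to the Group coefficients in Tables 16 and 17, taking the outcome standard deviations from Tables 8 and 9 and the group sizes from Tables 12 and 13. The function names are illustrative, and the calculation works from rounded published values.

```python
import math


def dummy_sd(n_zero, n_one):
    """Sample standard deviation of a 0/1 variable given the two group sizes."""
    n = n_zero + n_one
    p = n_one / n
    return math.sqrt(p * (1 - p) * n / (n - 1))


def standardized_beta(b, sd_predictor, sd_outcome):
    """Convert an unstandardized coefficient B to a standardized Beta."""
    return b * sd_predictor / sd_outcome


# Math: B = -1.546, 54 paper and 56 computer students, test SD = 4.18.
beta_math = standardized_beta(-1.546, dummy_sd(54, 56), 4.18)
# Science: B = 1.466, 51 paper and 51 computer students, test SD = 4.14.
beta_science = standardized_beta(1.466, dummy_sd(51, 51), 4.14)
```

Rounded to two decimals, the results agree with the Beta values of -.19 and .18 reported in Tables 16 and 17.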

Table 16
Math Regressed on SAT 9 Math and Group Membership

  B SE B Beta T Signif.
SAT 9 Math .016 .001 .72 11.03 <.0001
Group -1.546 .541 -.19 -2.86 .005

F 64.41 (Signif. <.0001)
N 110
R2 .54
Adjusted R2 .54

Table 17
Science Regressed on SAT 9 Science and Group Membership

  B SE B Beta T Signif.
SAT 9 Science 0.014 .002 .59 7.50 <.0001
Group 1.466 .645 .18 2.28 .025

F 34.19 (Signif. <.0001)
N 102
R2 .41
Adjusted R2 .40

Sub-Group Analyses

The regression analyses presented above indicate that mode of administration did not have a significant effect on students' performance on either language arts test. For the science test, performing the test on computer had a positive effect on students' scores. And for the math test, performing the test on computer led to lower performance. In all four of these analyses, the effect was estimated for all students together, regardless of their level of computer use. To test whether the effect of mode of administration varied for students with different levels of computer skill, students' WPM was used to form three sub-groups. The first sub-group contained students whose WPM was at least .5 standard deviations below the mean, or less than 13.8. The second sub-group contained students whose WPM was between .5 standard deviations below the mean and .5 standard deviations above the mean, or between 13.8 and 19.2. The third sub-group contained students whose WPM was at least .5 standard deviations above the mean, or greater than 19.2. For each sub-group, the open-ended test scores were regressed on SAT 9 scores and group membership (paper or computer).
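The sub-group boundaries follow from the keyboarding summary statistics in Table 3 (mean 16.5, standard deviation 5.3). The sketch below shows the classification step only; the function name and labels are illustrative, and the treatment of scores falling exactly on a boundary is an assumption, since the text does not specify it.

```python
def wpm_subgroup(wpm, mean=16.5, sd=5.3, width=0.5):
    """Classify a keyboarding speed relative to the sample mean.

    Returns "below", "middle" or "above" using cut points at
    mean - width * sd and mean + width * sd.
    """
    low = mean - width * sd    # 13.85, reported as 13.8
    high = mean + width * sd   # 19.15, reported as 19.2
    if wpm < low:
        return "below"
    if wpm > high:
        return "above"
    return "middle"


# Hypothetical keyboarding speeds (not data from the study).
labels = [wpm_subgroup(w) for w in (10.0, 16.5, 25.0)]
```

Each test score would then be regressed on the SAT 9 score and the paper/computer indicator separately within each of the three sub-groups.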

Table 18 displays the results of the three separate regressions for the first language arts test. For students whose WPM is .5 standard deviations below the mean and for students whose WPM is within .5 standard deviations of the mean, performing the test on computer has a negative effect on their scores. For these two groups of students, however, neither SAT 9 Reading nor group membership is a statistically significant predictor of language arts 1 score. In contrast, for students whose keyboarding speed is at least one-half of a standard deviation above the mean, or greater than 19.2 words per minute, performing the test on computer has a statistically significant positive effect on their performance. This effect is also about three times stronger than the relationship between their SAT 9 Reading score and their performance on the first language arts test. For the first language arts test, performing the test on computer seems to hurt students whose WPM is near or well below the mean and to help students whose WPM is well above the mean.

Table 18
Language Arts 1 Regressed on SAT 9 Reading and Group
for Three Sub-Groups

WPM <13.8 N=30
  B SE B Beta T Signif.
SAT 9 Reading 0.006 0.003 .36 1.98 .06
Group -1.115 1.001 -.20 -1.12 .27
Adjusted R2 .08

13.8<WPM<19.2 N=54
  B SE B Beta T Signif.
SAT 9 Reading 0.004 .002 .23 1.66 .10
Group -1.330 .694 -.27 -1.91 .06
Adjusted R2 .05

WPM >19.2 N=33
  B SE B Beta T Signif.
SAT 9 Reading .003 .002 .19 1.32 .20
Group 2.946 .764 .56 3.86 .0006
Adjusted R2 .38

The same relationship was found for the second language arts test (Table 19). However, for this test, the negative effect of taking the test on computer was statistically significant for students whose WPM was .5 standard deviations below the mean. For students whose WPM was within .5 standard deviations of the mean, performing the test on computer also had a negative effect on test performance, but this effect was not statistically significant. For students whose WPM was .5 standard deviations above the mean, performing the test on computer had a positive effect of nearly a half standard score on their language arts 2 test scores. This effect was statistically significant.

Table 19
Language Arts 2 Regressed on SAT 9 Reading and Group
for Three Sub-Groups

WPM <13.8 N=35
  B SE B Beta T Signif.
SAT 9 Reading -.0001 .002 -.02 -.10 .92
Group -1.48 .601 -.41 -2.47 .02
Adjusted R2 .11

13.8<WPM<19.2 N=37
  B SE B Beta T Signif.
SAT 9 Reading .002 .002 .17 1.02 .31
Group -.974 .631 -.26 -1.54 .13
Adjusted R2 .08

WPM >19.2 N=28
  B SE B Beta T Signif.
SAT 9 Reading .006 .002 .37 2.32 .03
Group 2.068 .71 .46 2.90 .008
Adjusted R2 .43

For the open-ended math test, performing the test on computer had a negative effect on students' scores at all levels of keyboarding speed (Table 20). However, as keyboarding speed increased, this effect became less pronounced. For students whose WPM was .5 standard deviations below the mean, taking the test on computer had an effect of -.39 standard score units. For students whose WPM was within .5 standard deviations of the mean, this effect was -.24 standard score units. Both of these effects were statistically significant. However, for students whose WPM was .5 standard deviations above the mean, the effect was -.17 standard score units and was not statistically significant.

Table 20
Math Regressed on SAT 9 Math and Group for
Three Sub-Groups

WPM <13.8 N=30
  B SE B Beta T Signif.
SAT 9 Math .015 .003 .68 5.58 <.0001
Group -3.068 .970 -.39 -3.16 .004
Adjusted R2 .57

13.8<WPM<19.2 N=49
  B SE B Beta T Signif.
SAT 9 Math .010 .003 .50 4.09 .0002
Group -1.55 .796 -.24 -1.96 .05
Adjusted R2 .30

WPM >19.2 N=31
  B SE B Beta T Signif.
SAT 9 Math .019 .003 .73 5.96 <.0001
Group -1.499 1.070 -.17 -1.40 .17
Adjusted R2 .56

Conversely, taking the science test on computer had a positive effect on students' scores at all levels of keyboarding speed (Table 21). However, this effect was only statistically significant for students whose WPM was within .5 standard deviations of the mean. For students whose WPM was .5 standard deviation units above the mean, this effect is less pronounced and is not statistically significant.

Table 21
Science Regressed on SAT 9 Science and Group for
Three Sub-Groups

WPM <13.8 N=35
  B SE B Beta T Signif.
SAT 9 Science .011 .003 .50 3.33 .002
Group .909 1.020 .13 .89 .38
Adjusted R2 .24

13.8<WPM<19.2 N=40
  B SE B Beta T Signif.
SAT 9 Science .010 .002 .47 4.00 .0003
Group 3.368 .893 .45 3.77 .0006
Adjusted R2 .48

WPM >19.2 N=27
  B SE B Beta T Signif.
SAT 9 Science .021 .004 .70 4.71 .0001
Group .170 1.204 .02 .14 .89
Adjusted R2 .45

Discussion

The experiment described here extends the work of Russell and Haney (1997) and improves upon their study in five ways. First, this study included students whose prior computer experience varied more broadly. Second, many more open-ended items in the areas of language arts, math and science were administered. Third, all of the open-ended test items included in this study had been used in state or national testing programs and had been validated previously. Fourth, an indicator of academic achievement was collected prior to the study and was used both to randomly assign students to groups and as a covariate during regression analyses. And fifth, information on students' prior computer use and keyboarding speed was collected and used during analyses.

In their study, Russell and Haney (1997) reported large, positive group differences which were consistent for all writing, math and science open-ended items administered on computer. In this study, a significant positive group difference was found only for the open-ended science test. This effect was about half the size reported by Russell and Haney (1997). However, in this study students' level of prior computer use varied more than it did in the previous study. Although Russell and Haney did not collect a formal measure of computer use, the students included in their study were so accustomed to working on computer that when standardized tests were given, the school had difficulty finding enough pencils for all students. Although three years have passed since the previous study, it may be possible to estimate the difference in the level of prior computer use of the students included in both studies.

This study includes students from two schools, one of which was the focus of the previous study. Table 22 compares the WPM and survey scores for students in the ALL School and Sullivan Middle School. For both measures of computer use and for keyboarding speed, students in the ALL School have significantly higher scores. For the ALL School the mean WPM was nearly .5 standard deviations above the mean for the total sample while the mean for students from the Sullivan Middle School was below the total sample mean. By including Sullivan Middle School students in this study, a broader range and lower levels of computer use were represented. Including students with low levels of computer use and poor keyboarding skills seems to have counteracted the effect described in the previous study since these students performed less well on the language arts computer tests than on the paper tests.

Table 22
Comparison of Computer Use across Participating Schools

ALL N = 35
Sullivan N = 194

Mean Std Dev SE of Mean t-value Sig.
WPM
ALL 18.9 5.1 .36
Sullivan 16.1 6.2 1.05 2.94 .004

Survey
ALL 27.5 5.8 .99
Sullivan 23.8 5.5 .39 3.69 <.0001

Comp-Writing
ALL 11.1 2.6 .43
Sullivan 9.2 2.7 .20 3.84 <.0001

To examine the effect that mode of administration had on student performance at different levels of computer use, sub-group analyses were performed. Figure 1 summarizes the effects found for three sub-groups: (a) students whose WPM was at least .5 standard deviations below the mean; (b) students whose WPM was within .5 standard deviations of the mean; and (c) students whose WPM was at least .5 standard deviations above the mean.
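The three-way split by keyboarding speed can be sketched as a classification on standardized WPM scores. This is only an illustration: the function name `wpm_subgroups`, the group labels, the treatment of scores falling exactly on a cut point, and the sample values are all assumptions rather than details taken from the study.

```python
import statistics

def wpm_subgroups(wpm_scores):
    """Label each score 'low', 'middle', or 'high' by its distance from the sample mean."""
    mean = statistics.mean(wpm_scores)
    sd = statistics.stdev(wpm_scores)  # sample standard deviation
    labels = []
    for wpm in wpm_scores:
        z = (wpm - mean) / sd  # standardized keyboarding speed
        if z <= -0.5:
            labels.append("low")      # at least .5 SD below the mean
        elif z >= 0.5:
            labels.append("high")     # at least .5 SD above the mean
        else:
            labels.append("middle")   # within .5 SD of the mean
    return labels

# Hypothetical WPM values for illustration only (not data from the study)
sample = [8, 12, 15, 16, 17, 18, 20, 25, 30]
groups = wpm_subgroups(sample)
print(list(zip(sample, groups)))
```

The effect of test mode would then be estimated separately within each of the three groups.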

Figure 1 Effect of Performing Test on Computer When Prior
Achievement is Controlled

Figure 1 shows that on three of the four tests, performing the test on computer had an adverse effect on the performance of students whose WPM was at least .5 standard deviations below the mean. Conversely, for students whose WPM was at least .5 standard deviations above the mean, performing the language arts tests on computer had a moderate positive effect. While performing the math test on computer had a negative effect for all students, this negative effect became less pronounced as students' keyboarding speed increased. For the science test, performing the test on computer had a positive effect across levels of computer use. However, the effect was much larger for students whose WPM was within .5 standard deviations of the mean than it was for students whose keyboarding speed was at least .5 standard deviations above or below the mean.

Explaining the Effects

To explore the reasons why some students had difficulty working on computers, students were asked to answer the following two questions after they completed the computer version of the test: (1) "Do you think you would have done better on this test if you took it on paper? Why?" and (2) "Besides not knowing the answer to a question, what problems did you have while taking this test on computer?"

Students' responses to these questions were coded in two ways. First, the following numerical code was used:
  • 0 - No, I would not perform better on paper
  • 1 - I would perform the same or it didn't matter
  • 2 - Yes, I would perform better on paper

In addition to these codes, an emergent coding scheme was used to tabulate the reasons students provided for their answers. While coding responses to the post-test questions, it became apparent that when read together, the two questions provided more information about students' experience than reading them separately. Some students would simply write yes or no for the first question, but their reasoning became apparent in their response to the second question. Other students explained the problems they encountered for the first question and wrote little for the second question. For this reason, responses to both questions were read during the emergent coding.

Table 23 presents the numerical codings for the first question. Across all tests, only 10% of students indicated that they would have performed better if they had taken the test on paper. However, over half of those who indicated they would have performed better on paper took the math test on computer. To explore why more students who took the math test on computer felt they would perform better on paper, the full responses to the two follow-up questions were examined.

Table 23
Frequency of Students' Responses to Post-Test Question 1:
Do you think you would have done better on this test if you
took it on paper?

 

| Test | Response | Frequency | Percent |
|---|---|---|---|
| Language Arts 1 | Not better on paper | 38 | 63.3 |
| | Same on paper | 19 | 31.7 |
| | Better on paper | 3 | 5.0 |
| Language Arts 2 | Not better on paper | 36 | 65.5 |
| | Same on paper | 15 | 27.3 |
| | Better on paper | 4 | 7.3 |
| Math | Not better on paper | 20 | 35.7 |
| | Same on paper | 18 | 32.1 |
| | Better on paper | 18 | 32.1 |
| Science | Not better on paper | 32 | 62.7 |
| | Same on paper | 16 | 31.4 |
| | Better on paper | 3 | 5.9 |
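The percentages in Table 23 follow directly from the frequencies, as does the observation that the math test accounts for most of the "better on paper" responses. A minimal sketch of that arithmetic is given here; the variable and function names are illustrative only.

```python
# Frequencies from Table 23, in the order: not better, same, better on paper
counts = {
    "Language Arts 1": (38, 19, 3),
    "Language Arts 2": (36, 15, 4),
    "Math": (20, 18, 18),
    "Science": (32, 16, 3),
}

def percentages(freqs):
    """Convert a tuple of frequencies to percentages rounded to one decimal place."""
    total = sum(freqs)
    return tuple(round(100 * f / total, 1) for f in freqs)

percent_by_test = {test: percentages(f) for test, f in counts.items()}

# Share of all "better on paper" responses that came from the math test
better_total = sum(f[2] for f in counts.values())
math_share = counts["Math"][2] / better_total

for test, pct in percent_by_test.items():
    print(test, pct)
print(f"Math share of 'better on paper' responses: {math_share:.0%}")
```

Of the 28 students who said they would have done better on paper, 18 took the math test, which is the "over half" noted in the text.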

Table 24 presents the frequency of student responses by test. Clearly, the most frequently cited problem related to students' keyboarding skills.* Across all tests, about 25% of the students who performed the test on computer indicated they had difficulty "finding the keys" or "pressing the wrong key," or simply said they "couldn't type." Twenty percent of the students who performed the math test on computer also complained that it was difficult to show their work on the computer or that they had to solve problems on paper and then transfer them to the computer. Several students who performed the language arts tests on computer mentioned that they preferred the computer because it was neater and because they didn't have to erase mistakes but could simply delete them. Across all tests, a few students also stated that they preferred the computer because their hands did not get as tired or because it was faster to write on the computer.

Table 24
Frequency of Responses to Post-test Questions 1 and 2

 

| Response | LA 1 | LA 2 | Math | Science | Total |
|---|---|---|---|---|---|
| Difficulty typing | 12 | 12 | 17 | 18 | 59 |
| Neater on computer/can delete | 8 | 9 | 2 | 2 | 21 |
| Can't show work/drawings | | | 10 | | 10 |
| Ran out of time on computer | 1 | 2 | 3 | 1 | 7 |
| Hand doesn't get tired | 3 | 1 | 2 | 1 | 7 |
| Faster on computer | 3 | 1 | 2 | 1 | 7 |
| Can take notes/solve problems on paper | 1 | | 4 | | 5 |
| Think better on computer/concentrate better | 2 | 1 | 1 | 1 | 5 |
| Write easier on paper | | | 1 | 2 | 3 |
| Hard looking back and forth between paper and computer | | | 2 | | 2 |
| Hard to read screen | | | 2 | | 2 |
| Problems with mouse | | 1 | 1 | | 2 |
| Easier to concentrate on paper | 1 | 1 | | | 2 |
| Write poorly on paper | | | | 1 | 1 |
| More comfortable on paper | | 1 | | | 1 |
| More space on paper | | 1 | | | 1 |
| Became confused where to put answers | | | 1 | | 1 |

Examining these responses, it appears that many more students who took the language arts tests recognized the computer's ability to display text that is easy to read and edit as an advantage. Conversely, students who took the math test felt that the inability to present and manipulate numbers in text was a disadvantage. In part, these different reactions to performing the tests on computer may explain the negative group effect for the math test and the positive group effects for the language arts tests. However, students' responses provide little insight into the overall group effect for science.

The Effect of WPM on Student Performance

To further examine the relationship between level of computer use and students' performance on the language arts tests, separate regression analyses were performed for students who performed the tests on paper and those who performed the tests on computer. For each of these regression analyses, the effect of prior computer use on students' performance was estimated controlling for SAT 9 scores. To provide separate estimates for keyboarding speed and for students' survey scores, two sets of regressions were performed for each sub-group. First, the test score was regressed on SAT 9 score, WPM and Survey. Second, the test score was regressed on SAT 9 score, WPM and Comp-Writing. Since Survey is partially composed of Comp-Writing, effects for each variable are estimated through separate regressions to avoid redundancy in the data and hence decrease the effects of collinearity. Figures 2 through 5 display the effects each variable had on test performance for students who took the test on computer and for those who took the test on paper.
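The two regression specifications can be sketched as follows for a single sub-group. This is a simplified illustration, not the analysis code used in the study: `fit_ols` is a hypothetical helper that solves the normal equations directly, and the data are invented and noise-free so that the fitted coefficients can be checked against the values used to generate them.

```python
def fit_ols(y, predictors):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + [float(p[i]) for p in predictors] for i in range(len(y))]
    k = len(rows[0])
    # Build the augmented system [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

# Hypothetical values for one sub-group (e.g. students tested on computer);
# none of these numbers come from the study.
sat9 = [40, 45, 50, 55, 60, 65, 70, 52]
wpm = [10, 22, 14, 18, 25, 12, 20, 16]
comp_writing = [7, 9, 12, 8, 11, 10, 6, 13]
survey = [20, 26, 27, 21, 30, 24, 19, 29]
# Scores generated to depend only on SAT 9 and WPM
score = [2 + 0.2 * s + 0.1 * w for s, w in zip(sat9, wpm)]

# Regression 1: test score on SAT 9, WPM and Survey
b_survey = fit_ols(score, [sat9, wpm, survey])
# Regression 2: test score on SAT 9, WPM and Comp-Writing
b_comp = fit_ols(score, [sat9, wpm, comp_writing])
print([round(b, 3) for b in b_survey])
print([round(b, 3) for b in b_comp])
```

Repeating both fits for the paper sub-group gives the four sets of coefficients that are compared across modes. Survey and Comp-Writing never enter the same model, which is the redundancy the separate regressions are meant to avoid.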

Figures 2 and 3 show that across all tests, WPM is a weak predictor of students' scores when the test is performed on paper. However, for both language arts tests and the science test, WPM is a good predictor of students' scores when the test is performed on computer. This suggests that when these tests are performed on computer, the speed with which a student can type has a significant effect on their performance. However, for the math test, the effect of WPM on students' performance on computer is much less pronounced.

Figure 2: Effect of WPM on Student Performance Controlling for Survey

Figure 3: Effect of WPM on Student Performance Controlling for
Comp-Writing

Figures 4 and 5 indicate that neither the total Survey score nor the Comp-Writing score had a meaningful effect on the performance of students in either group. In fact, when the effect of WPM is considered, both the amount of prior computer use and use of computers during the writing process have slightly lower effects when the test is taken on computer for the language arts and science tests. Yet, for the math test, the effect is larger and positive. This pattern is difficult to