How the Internet Will Help Large-Scale Assessment Reinvent Itself

Large-scale assessment in the United States is undergoing enormous pressure to change. That pressure stems from many causes. Depending upon the type of test, the issues precipitating change include an outmoded cognitive-scientific basis for test design; a mismatch with curriculum; the differential performance of population groups; a lack of information to help individuals improve; and inefficiency. These issues provide a strong motivation to reconceptualize both the substance and the business of large-scale assessment. At the same time, advances in technology, measurement, and cognitive science are providing the means to make that reconceptualization a reality. The thesis of this paper is that the largest facilitating factor will be technological, in particular the Internet. In the same way that it is already helping to revolutionize commerce, education, and even social interaction, the Internet will help revolutionize the business and substance of large-scale assessment.

Whether for educational admissions, school and student accountability, or public policy, large-scale assessment in the United States is undergoing enormous pressure to change. This pressure is most evident with respect to high-stakes tests, like those used for grade promotion or college entrance. However, it is becoming apparent for lower-stakes survey instruments too, like the National Assessment of Educational Progress (NAEP) (e.g., Pellegrino, Jones, & Mitchell, 1999).
Several factors underlie the pressure to change. First, whereas our tests have incorporated many psychometric advances, they have remained separated from equally important advances in cognitive science, in essence measuring the same things in ever more technically sophisticated ways. Although decades of research have documented the importance of such cognitive constructs as knowledge organization, problem representation, mental models, and automaticity (Glaser, 1991), our tests typically do not account for them explicitly. As a result, our tests probably owe more to the behavioral psychology of the early 20th century than to the cognitive science of today (Shepard, 2000).
A second factor is the mismatch with the content and format of curriculum, a criticism more true of the developed ability tests commonly used in postsecondary admissions than of school achievement measures, but relevant to the latter too. The mismatch arises in part from the fact that the elemental, forced-choice problems dominating many tests are effective indicators of skills and abilities, and thus provide an efficient means for estimating student standing on those constructs. However, the mismatch becomes problematic because of the increasing attention being paid to test preparation. Although persistent direct training on these indicator tasks may increase test performance, it certainly is not the best way to improve construct standing. Further, it distracts attention from other, arguably more critical, learning activities (Frederiksen, 1984).
Differential performance of population groups is another factor. Because of the curricular mismatch, it is easy to blame group differences on purported bias in the test and more difficult to create a convincing defense than it would be if the tests were strongly linked to learning goals. In a high-stakes decision setting like admissions, tests become a lightning rod for the failure of schools and society to educate all groups effectively. With the potential elimination of affirmative action in university admissions, there is no politically acceptable choice but to reduce the role of such tests. California, Texas, Florida, and Pennsylvania are proposing to admit, or have begun admitting, all students with high-school rank above a certain point to their state higher education systems. At the same time, promotion tests tied to state curricular standards are being put into place to encourage schools to teach all students valued skills. Although in Texas one such test was challenged in court on the basis of differential performance, that challenge was rejected (Schmidt, 2000). This rejection suggests that when well-constructed tests closely reflect the curriculum, group differences should become more an issue of instructional inadequacy than test inaccuracy (Bennett, 1998).
As attention shifts to the adequacy of instruction, the ability to derive meaningful information from test performance becomes more critical. A weak connection between test and curriculum ensures that the value of feedback for the examinee will be limited. Even for tests where the connection is stronger, feedback is still too often of marginal value, in part because of the additional cost and processing time that would be incurred. For achievement surveys like NAEP, which offer no information to individuals, schools, or districts, motivation to participate is undoubtedly diminished.
Finally, there is efficiency. Testing programs are expensive to operate. That

The Promise of New Technology
Radical improvements in assessment will derive from advances in three areas: technology, measurement, and cognitive science (Bennett, 1999). Of the three, new technology will be the most influential in the short term and, for that reason, I focus on it in this paper. New technology will have the greatest influence because it, not measurement and not cognitive science, is pervading our society. Billions of dollars are being invested annually to create and make commonplace powerful, general technologies for commerce, communications, entertainment, and education. Due to their generality, these technologies can also be used to improve assessment.
These technological advancements revolve primarily around the Internet. The Internet is (or will be) interactive, broadband, switched, networked, and standards-based. What does that mean?
Interactive means that we can present a task to a student and quickly respond to that student's actions. Switched means that we can engage in different interactions with different students simultaneously. In combination, these two characteristics (interactive and switched) make for individualized assessments. Broadband means that those interactions can contain lots of information. For assessment tasks, that information could include audio, video, and animation. Those features might make tasks more authentic and more engaging, as well as allow us to assess skills that cannot be measured in paper and pencil (Bennett, Goodman, Hessinger, Ligget, Marshall, Kahn, & Zack, 1999). We might also use audio and video to capture answers, for example, giving examinees choice in their response modalities (typing, speaking, or, for a deaf student, American Sign Language). Networked indicates that everything is linked. This linkage means that testing agencies, schools, parents, government officials, item writers, test reviewers, human scorers, and students are tied together electronically. That electronic connection can allow for enormous efficiencies. Finally, standards-based means that the network runs according to a set of conventional rules that all participants follow. That fact permits both the easy interchange of data and access from a wide variety of computing platforms, as long as the software running on those platforms (e.g., Internet browsers) adheres to those rules too. (Note 1)
As an embodiment of these characteristics, what does the Internet afford? It affords the potential to deliver efficiently on a mass scale individualized, highly engaging content to almost any desktop; get data back immediately; process it; and make information available anywhere in the world, anytime day or night. Paper delivery cannot compete with this potential.
The Internet is, of course, not being built to service the needs of large-scale assessment. It is, instead, being built for e-commerce: to sell products and services over the web to consumers and to businesses directly. Coincidentally, the capabilities needed for e-commerce are essentially those needed for e-assessment: interactive (so that products can be offered and orders transacted), switched (so different business transactions can be conducted with different customers simultaneously), broadband (so that those offers can be as engaging and enticing as possible), networked (so that product offers, orders, shipping, inventory, and accounting can be integrated), and standards-based (so that everyone can get to it, regardless of computing platform).
Will we be able to count on continued investment in the Internet to support its use as a delivery medium? By any measure, the Internet, and use of it, has grown dramatically. As a communications medium, the Internet last year surpassed the telephone, with 3 billion email messages sent each day (Church, 1999). The number of unique URLs (web-page directory and subdirectory addresses) has grown from just under a billion in 1998 to a projected 3 billion in 2000 ("Big fish," 1999). In the United States, the percentage of homes with Internet access has increased from 26% in December 1998 to 42% in August 2000 (U.S. Department of Commerce, 2000). (Note 2) Worldwide, the number of users has grown from somewhere between 117 and 142 million in 1998 to about 400 million in 2000 ("Big fish," 1999; Global Reach, 2000; "How many online?", 2000). Finally, the number of host computers has gone from about 30 million to 75 million from January 1998 to January 2000 ("Internet domain survey host count," 2000). This phenomenal growth may slow as investment subsides from the speculative rates of the past few years. However, the vast size of the Internet and its user base constitute a critical mass that should continue to attract substantial capital.
For commerce, the promise of the Internet is all about being faster, cheaper, and better. Two "laws" of the digital era illustrate this promise. Moore's Law predicts the doubling of computational capability (specifically, at the level of the microchip) every 18 months. As Negroponte (1995) has explained, what filled a room yesterday is on your desk today and will be on your wrist tomorrow. Metcalfe's Law says that the value of a network increases by the square of the number of people on it. The true value of a network is, thus, less about information and more about community (Negroponte, 1995). One can see this effect clearly in eBay, the online auction broker (Cohen, 1999). Each new user potentially benefits every other existing user because every eBay member can be both buyer and seller. (Note 3)
Metcalfe's Law is playing out well beyond eBay. Online business-to-business auction brokers are appearing in a variety of industries, including natural gas, electricity, steel, and bandwidth (Friedman, 2000, pp. 386-387; Gibney, 2000).
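The two "laws" cited above can be made concrete with a short calculation. The following is a minimal sketch, assuming the simplest reading of each law as stated in the text: capability doubles every 18 months, and network value is proportional to the square of the number of users. The function names and the constant of proportionality (taken as 1) are illustrative assumptions, not part of either law's original formulation.

```python
def moore_growth_factor(years: float, doubling_period_years: float = 1.5) -> float:
    """Multiplicative growth in capability after `years`, assuming one
    doubling every `doubling_period_years` (18 months by default)."""
    return 2.0 ** (years / doubling_period_years)


def metcalfe_value(users: int) -> int:
    """Relative network value under Metcalfe's Law, taken here as users
    squared with an assumed proportionality constant of 1."""
    return users ** 2


def possible_pairs(users: int) -> int:
    """Number of distinct buyer-seller pairings among `users` members,
    one way to see why each new member benefits all existing ones."""
    return users * (users - 1) // 2


if __name__ == "__main__":
    # Three years is two doubling periods under the 18-month assumption.
    print(moore_growth_factor(3.0))
    # Doubling the user base quadruples the relative value.
    print(metcalfe_value(20) / metcalfe_value(10))
    # Pairings among four members.
    print(possible_pairs(4))
```

Under these assumptions, doubling the number of users quadruples the relative value of the network, which is the sense in which each new eBay member benefits every existing one.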
Another illustration of this cheaper-faster-better result is the effect of the Internet on the traditional relationship between richness and reach, where richness is the depth of the interaction that a business can have with a customer and reach is the number of customers that a business can contact through a given channel. Traditionally, one limited the other. That is, a business could attain maximal reach but only limited richness. For example, through direct mail, broadcast, or newspaper ads a company could communicate with many people but have a meaningful interaction with none of them. Similarly, a business could attain maximal richness but limited reach. Via personal contact (e.g., door-to-door sales), very deep interactions can occur, but with only a relatively small number of people. What has the Internet done? It has transformed the relationship between richness and reach by allowing businesses to touch many people in a personalized but inexpensive way (Evans & Wurster, 2000). What does richness with reach make for? It makes for mass customization.
We can already see the effects in Dell Computer Corporation's business model. Customers can log onto Dell's Internet site (www.dell.com), choose from a menu of basic machine designs, and then configure a particular design to meet their needs. A second example is Radio.SonicNet (http://radio.sonicnet.com/splash.asp). Radio.SonicNet allows one to pick from a variety of music styles, choose artists within that style, and indicate how frequently each artist should play. The end result is a radio station uniquely tuned to the individual and continually interesting; it always plays what you like but you never know exactly what it is going to play. As a final example, consider Customatix (www.customatix.com/customatix/common/homepage/HomepageGeneral.po), which allows you to design your own shoes using up to three billion trillion combinations of colors, graphics, logos and materials per shoe. You design them. They build them. And nobody else is likely to have exactly the same ones.

Reinventing the Business
There are two major dimensions to reinventing assessment. One is the business of assessment. This dimension centers on the core processes that define an enterprise. In many cases, those core processes can become many times more efficient because moving bits is faster and easier than moving atoms (Negroponte, 1995); that is, electronically processing information is far more cost effective than physically manipulating things.
For large-scale testing programs, some examples of the potential for electronic processing are in: developing tests, making the items easier to review, revise, and automatically morph into still more items (e.g., Singley & Bennett, in press) because the items themselves are digitally represented; delivering tests, eliminating the costs of printing, warehousing, and shipping tons of paper; presenting dynamic stimuli like audio, video, and animation, making the need for specialized testing equipment (e.g., audio cassette recorders, VCRs) obsolete (Bennett, Goodman, Hessinger, Ligget, Marshall, Kahn, & Zack, 1999); transmitting some types of complex constructed responses to human graders, removing the need to transport, house, and feed the graders (Odendahl, 1999; Whalen & Bejar, 1998); scoring other complex constructed responses automatically, reducing the need for human reading (Burstein et al., 1998; Clauser et al., 1997); and distributing test results, cutting the costs of printing and mailing reports.
To get a sense of how reinventing the business of assessment might affect testing organizations, take a look at reference book publishing, in particular the case of Encyclopaedia Britannica (Evans & Wurster, 2000; Landler, 1995; Melcher, 1997). Encyclopaedia Britannica was established in Scotland in 1768. It is the oldest and most famous encyclopedia in the English-speaking world. By 1990, its sales had reached $650 million per annum. But then suddenly, Britannica's fortunes drastically changed. In 1996, the company was sold for less than half its net worth (i.e., the value of its assets, including its encyclopedia inventory, minus its liabilities). That same year, it eliminated its entire door-to-door North American sales force. By 1998, sales had fallen 80%. What happened?
What happened was that the reference book business was reinvented because of the emergence of new technology. At its peak, Britannica was a 32-volume set of books costing well over $1,000. In 1993, Microsoft introduced Encarta on CD-ROM for under $100 and, even though Britannica was much more comprehensive, the difference for most people wasn't worth an extra $900+. Initially, Britannica did not respond because it didn't take the threat from Encarta seriously. When it did respond, it did so ineffectively because Britannica wouldn't fit on a single CD-ROM and because the company's large sales force wasn't suited to selling software. Ultimately, though, Britannica wasn't ready to cannibalize its existing paper business to enter this new electronic one.
Why is this story important? It's important because similar (though less extreme) scenarios are playing themselves out now in individual investing, book selling, travel planning, music distribution, long distance telephony, and even business-to-business transactions. (As to the last, Cisco Systems makes 90% of its revenue from business-to-business transactions done over the Internet [Cisco Systems, Inc., 2000].) These reinvention scenarios are forcing organizations, including some in educational assessment, to come quickly to grips with where new technology will and will not help core business processes.
As should be obvious, technology-driven changes in business processes can occur quickly and their consequences can be significant for the organizations that service a particular market. In fact, if radical and pervasive enough, process changes can force shifts in the substance of the business itself. So, although reinventing the business of assessment by incorporating technology into specific assessment processes is about trying to achieve the efficiencies needed to remain competitive today, reinventing the substance of assessment (most fundamentally, the reason we do it) is not about today. It's about tomorrow.

Reinventing the Substance
The populations seeking education are changing and so are their purposes for learning. At the college level, just 16% of students fit the traditional profile: 18-22 years old, full-time, on-campus resident (Levine, 2000a). This is not because fewer 18-22 year olds are going to college. It is because more adults are. The adult cohort is, in fact, the fastest growing segment in postsecondary education (Kerrey & Isakson, 2000). Working adults over age 24 constitute some 44% of college students ("Education prognosis 1999," 1999).
Why are so many adults returning to college? Over the past 25 years, employer demand in the U.S. has shifted toward higher educational qualifications, as indicated by an increasing premium paid for those with a college degree (Barton, 1999). But in addition to this rise in entry qualifications, the knowledge required to maintain a job in many occupations is changing so fast that 50% of all employees' skills are estimated to become outdated within 3-5 years (Moe & Blodget, 2000). Witness any job that requires interaction with information technology (IT), which is a growing proportion of jobs. In fact, by 2006 almost half of all workers will be employed by industries that are either major producers or intensive users of IT products and services (Henry et al., 1999).
So, more people want postsecondary education because they need to have it if they want to become, and stay, employed. And more of these individuals are nontraditional students who may work, travel in their jobs, or have families. For these people, physically attending classes is not always feasible, let alone convenient. (Note 4)
This population's unmet educational need is increasingly becoming the target of distance learning. According to the National Center for Education Statistics, between fall 1995 and 1997-98, the percentage of higher education institutions offering distance learning courses increased by one-third (from 33% to 44%), and the number of course offerings and enrollments approximately doubled (Lewis et al., 1999). But although many institutions have delivered distance learning via mail, radio, or television for years, this growth is not in those media. Rather, it is distance learning via the Internet that is booming. Among all higher-education institutions offering any distance learning, the percentage of institutions using asynchronous Internet-based technologies nearly tripled, from 22% in 1995 to 60% in 1997-1998. More recent data from Market Data Retrieval (MDR) confirm the trend ("Report: College Net use growing," 2000). MDR relates that, as of the 1999-2000 academic year, 34% of two- and four-year colleges offered accredited degree programs via computer, up from 15% the year before. As of 2000, U.S. institutions reportedly offered more than 6,000 accredited courses on the Web and, by 2002, over 2 million students will be enrolled, a tripling of the 1998 enrollment (Moe & Blodget, 2000).
At the same time, Internet-based distance learning is finding its way into high school. The need is generated by home-schooled students (of which there are over 1 million in the U.S.), districts without a full complement of qualified teachers, and the children of migrant workers. So-called "virtual high schools" have emerged in Alabama, Arizona, California, Florida, Illinois, Indiana, Kentucky, Maryland, Massachusetts, Michigan, Missouri, Nebraska, New Mexico, and Utah (Carr, 1999; Carr & Young, 1999; Kerrey & Isakson, 2000). These programs can cross state lines, with offerings open to students regardless of residence. Of particular note is that both the University of Missouri at Columbia High School and the Indiana University High School have been granted accreditation by the North Central Association of Colleges and Schools (Carr, 1999). Accreditation means that students can apply course grades earned through these online institutions toward their high-school graduation. Both programs offer more than 100 high school courses.
The growth of Internet-based distance learning will have a significant impact upon traditional education. For one, it may threaten the existence of established institutions (Dunn, 2000; Levine, 2000b). Many in the private sector see education as a huge industry that produces mediocre results for a high cost. If the private sector can leverage new technologies, like distance learning, to deliver greater value, the institutions that dominate education today will not be the leaders tomorrow. The rapid growth of for-profit education companies (e.g., the University of Phoenix), and the seemingly endless creation of well-capitalized new ones (e.g., UNext, Caliber, KaplanCollege.com, University Access, K12), suggest that a serious challenge to the existing order is well underway. The gravity of the threat is evident in how non-profits have responded. Cornell University, Columbia University, the University of Maryland, and New York University, among others, have each announced their own for-profit distance learning subsidiaries (Carr, 2000a)!
A second reason that the growth of Internet-based distance learning will influence traditional education is that, regardless of its impact on nonprofit institutions, the distance learning industry will produce sophisticated software that everyone can use, in school and out. Both Dunn (2000) and Tulloch (2000) suggest that this occurrence will blur the distinctions between distance learning and local education. APEX offers an example (http://apex.netu.com/). This company markets online Advanced Placement (AP) courses, targeting districts that want to offer AP but which do not have qualified teachers. Districts can, thus, use APEX offerings on site. (Note 5)
The considerable potential of online learning, local or distance, is reflected in a report to the President and Congress of the bipartisan Web-Based Education Commission (Kerrey & Isakson, 2000). The Commission reached the following conclusion:
The question is no longer if the Internet can be used to transform learning in new and powerful ways. The Commission has found that it can. Nor is the question should we invest the time, the energy, and the money necessary to fulfill its promise in defining and shaping new learning opportunity. The Commission believes that we should. (p. 134, italics in original)
If acted on, the consequences of this statement for assessment are profound. As online learning becomes more widespread, the substance and format of assessment will need to keep pace. Another quote from the Commission's report:
Perhaps the greatest barrier to innovative teaching is assessment that measures yesterday's learning goals… Too often today's tests measure yesterday's skills with yesterday's testing technologies, paper and pencil. (p. 59)
So, as students do more and more of their learning using technology tools, asking them to express that learning in a medium different from the one they typically work in will become increasingly untenable, especially where working with the medium is part of the skill being tested (or otherwise impacts it in important ways). Searching for information using the World Wide Web or writing on computer are examples. (Note 6)
These changes in learning methodology offer exciting possibilities for assessment innovation. On site or off, an obvious result of delivering courses via the Internet is the potential for embedding assessment, perhaps almost seamlessly, in instruction (Bennett, 1998). Since students respond to instructional exercises electronically, their responses can be recorded, leaving a continuous learning trace. Depending upon how the course and the assessment are designed, this information could conceivably support a sophisticated model of student proficiencies (Gitomer, Mislevy, & Steinberg, 1995). That model might be useful both for dynamically deciding what instruction to present next and for making more global judgments about what the student knows and can do at any given point.
In addition to assessment embedded in Internet-delivered courses, one can imagine Internet-delivered assessment embedded in traditional classroom activity. Such assessment might take the form of periodically delivered exercises that both teach and test. In this scenario, the exercises would be standardized and performance might serve, depending upon the level of aggregation, to indicate individual, classroom, school, district, state, or national achievement. Thus, these exercises could serve summative as well as formative purposes and be useful to individuals as well as institutions. If the exercises were of high enough quality, such a model might improve the motivation to participate in voluntary surveys like NAEP.
There are, to be sure, many difficult issues:
1. How can we generate comparable inferences across students and institutions when variation in school equipment may cause items to display differently from one student to the next, potentially affecting performance?
2. How can we deliver assessment dependably given the unreliable nature of computers and the Internet, and the limited technical support available in most schools?
3. How might we make sense of the huge corpus of data that the electronic recording of student actions might provide?
4. How would student learning be affected by knowing that one's actions are being recorded?
5. How can we prevent assessments that serve both instructional and accountability purposes from being corrupted by unscrupulous students or school staff?
6. How can we manage the costs of online assessment?
7. How can we assure that all parties can participate?
Let's, for the moment, turn to this last issue.

Are the Schools Ready?
A continuing concern with such reinvention visions is whether schools (and students) are ready technologically and, in particular, what to do about technology differences across social groups. The National Center for Education Statistics (NCES) reports that as of September 1999, 95% of schools were connected to the Internet, up from 35% in 1994 (NCES, 2000). Schools in all categories (i.e., by grade level, poverty concentration, and metropolitan status) were equally likely to have Internet access. Further, most schools had dedicated lines: only 14% were using a dial-up modem, a slower and less reliable access method. (Note 7)
Clearly, many of these schools could have only a single connected machine, and that machine could be the one sitting on the principal's desk. How many classrooms were actually wired? According to NCES (2000), as of September 1999, 63% of all instructional rooms had Internet access (up from 3% in 1994, a 20-fold increase in five years). The ratio of students to Internet-connected computers was 9:1, down from 12:1 only a year earlier. These are staggering numbers, for they imply that classrooms are connecting to the Internet at a very rapid rate.
This success is in no small part due to federal efforts. The government's e-rate program has been giving public schools and libraries discounts of up to 90% on phone service, Internet hook-ups, and wiring for several years ("FCC: E-rate subsidy funded," 2000). In total, the program has committed 3.65 billion dollars to over 50,000 institutions, helping connect more than one million public school classrooms (Kennard, 2000). In addition, 70% of the program's last round of funding went to schools in the lowest income areas.
However, even with these very significant efforts, there continue to be equity issues. As of September 1999, in high poverty schools, the ratio of students to Internet computers was 16 to 1. In low poverty schools, it was less than half that: 7 to 1 (NCES, 2000).
What should we conclude? Certainly, with few exceptions, it would be impossible to deliver large-scale assessment via the Internet today. But the trend is clear: the infrastructure is quickly falling into place for Internet delivery of assessment to schools, perhaps first in survey programs like NAEP that require only a small participant sample from each school, but eventually for inclusive assessments delivered directly to the desktop. As evidence, witness the requests-for-proposals recently released by the state education departments of Oregon, Virginia, and Georgia for building Internet-delivered, state-assessment systems (Department of Education, 2000; Virginia Department of Education, undated; State of Georgia, 2001).
Assuming that every classroom is wired, will all students then have the technology skills needed to take tests on-line? Clearly, more students are becoming computer-familiar every day, and developing such skills is a national educational technology goal (Riley, Holleman, & Roberts, 2000). But, as Negroponte (1995) suggests, computer familiarity is really the wrong issue. The secret to good interface design is to make it go away. Thus, advances in technology will eventually eliminate the need to be computer familiar. After nomadic computing, which we are now entering with the proliferation of wireless Internet devices and personal digital assistants, comes ubiquitous computing (Olsen, 2000), the embedding of new technology into everyday items. Inventions like "radio" paper (Gershenfeld, 1999, p. 18; Maney, 2000; "NCS secures rights," 2000) may allow students to interact with computers in the same way that they interact with paper today. Smart desks are another likelihood, in which case a test may be electronically delivered, quite literally, to every desktop.
In the U.S., then, we may see a future in which every classroom is wired and every student can easily take tests on line. What of the rest of the world? To be sure, the Internet is an American phenomenon. It derives from research sponsored by the Defense Department in the 1960s (Cerf, 1993). As a result of this history, the overwhelming majority of users were, until very recently, from our shores. At this writing, over 60% of Net users reside outside of the United States and the foreign growth rate now exceeds the domestic one ("How many online?", 2000; "U.S. dominance seen slipping," 2001).
The largest numbers of foreign Internet users are, of course, in developed nations. These nations have the telecommunications infrastructure and citizens with enough disposable income to afford the trappings of Internet use. But what about developing nations? Will they be left irretrievably behind? The challenges for these nations are undoubtedly great. Over time, however, we should see significant progress in building the infrastructure and the user base here too (Cairncross, 1997; Fernandez, 2000). This progress will occur for at least two reasons. First, the cost of technology has been dropping precipitously and, by Moore's Law, will continue to decline. Further, because the future of computing is undoubtedly in wireless devices (Grice, 2000), a telecommunications infrastructure will be much cheaper to acquire than the land-lines of old. Second, as Metcalfe's Law suggests, markets will become all the more valuable as they are interconnected. (Witness the global economy and the economic benefits resulting to nations from integration with it.) That developing nations join the e-commerce network means greater opportunity for all. It means more vendor choice for the people of developing nations; more opportunity for developed nations to serve these markets; and a new opportunity for third-world businesses themselves to compete globally. (Note 8)
The same holds true for assessment. The Internet will make it easier for developing nations to get access to assessment services from elsewhere and for those nations to distribute their own assessment services regionally or around the world. This ease of access and distribution should make it possible to form international consortia. Such consortia will be able to assemble technical resources that a single nation might not be able to acquire. In addition, those consortia may be able to purchase services from others more efficiently than nations could obtain individually. Finally, an electronic network should make it easier to participate in international studies, bringing the benefits of benchmarking to nations throughout the world.

But is Technology-Based Assessment Really Worth the Investment?
One of the largest instantiations of technology-based assessment to date is computer-based testing (CBT) in postsecondary admissions. As programs like the Graduate Record Examinations, the Graduate Management Admission Test, and the Test of English as a Foreign Language have found, CBT can be enormously costly. Being among the first large-scale programs to move to computer, they bore the brunt of creating the infrastructure for what was essentially a new business. The building of that infrastructure was initiated in the early 1990s before test developers knew how to create tests for computer, before computers were widely available for individuals to take tests on, and before the Internet was ready to bring those tests to students. In essence, these programs needed to build both a factory to stamp out a new product and a new distribution mechanism. A first-generation infrastructure now exists, but it is not yet optimized to produce and deliver tests as efficiently as possible. Right now, there's no question about it: for these programs, assessment by computer costs far more than assessment by paper.
If we have learned anything from the history of innovation, it is that new technologies are often initially far too expensive for mass use. That was true of the automobile, telephone service, commercial aviation, and the personal computer, among many other innovations. For example, in 1930 the cost of a three-minute telephone call from New York to London was $250 (in 1990 dollars). By 1995, the cost had dropped to under $1 (World Bank, 1995, cited in Cairncross, 1997, p. 28). As a second instance, when the IBM Personal Computer was introduced in 1981, it cost around $5,000. At the time, the median family income in the United States was on the order of $25,000, so that a computer cost about 20% of the average family's earnings: not very affordable. At this writing, the cost of a computer with many times greater capability is a little more than $500 and the median income is closer to $55,000. (Note 9) A computer now costs about 1% of average income. (Note 10) When a promising new technology appears, individuals and institutions invest, allowing the technology to evolve and a supporting infrastructure to develop. Over the course of that development, failures inevitably occur. Eventually, the technology either dies or becomes commercially viable, that is, efficient enough.
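The affordability comparison above is simple arithmetic, and a short sketch makes it easy to check. The function name and variable names below are illustrative assumptions; the dollar figures are the ones quoted in the paragraph, with the "little more than $500" price rounded to $500.

```python
# Figures quoted in the text above; no new data is introduced here.
def share_of_income(price, median_income):
    """Return a price as a percentage of median family income."""
    return 100.0 * price / median_income

# IBM PC at its 1981 introduction versus a computer "at this writing".
pc_1981 = share_of_income(5_000, 25_000)
pc_now = share_of_income(500, 55_000)   # price rounded down to $500 (assumption)

# How many times more affordable the computer has become by this measure.
improvement = pc_1981 / pc_now

print(f"1981: {pc_1981:.1f}% of median income")
print(f"now:  {pc_now:.1f}% of median income")
print(f"improvement: roughly {improvement:.0f}-fold")
```

By this rough measure the purchase has gone from a fifth of a family's annual income to under one percent of it, which is the sense in which the text calls the earlier price "not very affordable."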
So, who's investing in CBT? At this point, it's an impressive list including non-profit testing agencies, for-profit testing companies, school districts, state education departments, government agencies, and companies with no history in testing at all. The list includes ACT, the Bloomington (MN) Public Schools, CITO (the Netherlands), the College Board, CTB/McGraw-Hill, Edison Schools, ETS, Excelsior College (formerly Regents College), Harcourt Educational Measurement, Heriot-Watt University (Scotland), Houghton-Mifflin, Microsoft, the National Board of Medical Examiners, the National Institute for Testing and Evaluation (Israel), NCS Pearson, the Northwest Evaluation Association, the Oregon Department of Education, the Qualifications and Curriculum Authority (Great Britain), Thomson Corporation, the University of Cambridge Local Examinations Syndicate (UCLES), the U.S. Armed Forces, Vantage Technologies, and the Victoria (Australia) Board of Studies. These organizations are producing tests for postsecondary admissions, college course placement, course credit, school accountability, instructional assessment, and professional certification and licensure (see the Appendix for details). In concert, they already administer something on the order of 10 million computerized tests each year. (Note 11) Why are these organizations investing? I think it's because they believe that technology-based assessment will eventually achieve important economies over paper and that, fundamentally, assessment will benefit. But I also think it's because they don't want to become Britannica. That is, they see improvements in the business and substance of assessment which, if they fail to embrace, will lead them to the same fate as that encyclopedia publisher.

CBT as a Disruptive Technology
But as the case of admissions testing suggests, the road to improvement may be a difficult one since CBT might not be a typical innovation. Christensen (1997) distinguishes between two types of innovation, called sustaining and disruptive technologies. Sustaining technologies enhance the performance of established products in ways that mainstream customers have traditionally valued. Historically, most technological advances in any given industry have been sustaining ones (e.g., in the personal computer industry, faster chips and bigger, higher-resolution monitors). Occasionally, disruptive technologies emerge. Companies introduce these technologies hoping their features will provide a competitive edge. However, these features characteristically overshoot the market, giving customers more than they need or are willing to pay for. Thus, disruptive technologies result in worse product performance, at least in the near term, on key dimensions in a company's established markets.
Interestingly, a few fringe customers typically find a disruptive technology's new features attractive. In these niche markets, such technology may thrive. If and when it advances to the level and nature of performance demanded in the mainstream market, the new technology can invade it, rapidly knocking out the traditional technology and its dependent practitioners. Remember Britannica.
CBT has many of the characteristics of a disruptive technology. Established testing organizations are applying it in their mainstream markets, most notably postsecondary admissions. This innovation was introduced, in good part, to provide competitive edge through features like the ability to take a test at one's convenience and to get score reports immediately. As it turned out, these features overshot the market. At least initially, registrations for continuously offered computer-based admissions tests mirrored those for fixed-date administrations, suggesting that scheduling convenience was not a highly valued feature in the market of the time. Moreover, examinees were dissatisfied with losing some of the features of paper exams, including the ability to proceed through the test nonlinearly, the option to review the scoring of items actually taken, and the low cost (Perry, 2000).
Although it encountered difficulty in the mainstream admissions testing market, CBT found more rapid acceptance in the niches. One example is information technology (IT) certification, which individuals pursue to document their competence in some computer-related proficiency. In 1999, over three million examinations in 25 languages were administered in this market (Adelman, 2000). Most of these tests were delivered on computer and most were offered on a continuous basis. Three delivery vendors provided the bulk of examinations: CAT, Inc. (a subsidiary of Houghton-Mifflin), Prometric (a subsidiary of Thomson Corporation), and Vue (a subsidiary of NCS Pearson). Together, these vendors operated some 5,000 testing centers in 140 countries. As of June 2000, over 1.9 million credentials had been awarded, most for Microsoft or Novell technologies.
Why is the CBT of today so well suited to this market niche? Let's start by asking what features a testing product must have to succeed in this niche. First, it must be continuously offered because these test candidates build technology skill on their own schedules, at home or on the job, very often through books or online learning. These individuals want to test when they are ready, not when the testing companies are. Second, such a test must generally be offered on computer since technology use is the essence of the certification.
What are the financial considerations associated with serving this market? One consideration is whether the test fee can cover the cost of assessment. As it turns out, this market is less price-sensitive than postsecondary admissions. Why? With IT testing, employers pay the fee for over half the candidates (Adelman, 2000). In addition, certified employees command a substantial salary premium (4-14%), which makes examinees more willing to absorb the higher fees that CBT currently requires. A second consideration is that security is not as critical as in admissions testing, so large item pools are not needed, reducing production cost. Lower security is tolerable because if an individual appears on the job with a dishonestly obtained credential but without the required skill, he or she will not last. Finally, test volume is self-replicating: there are many repeat test takers because information technology changes rapidly, so skills must be updated constantly. From an innovation perspective, then, IT certification may be one context in which the CBT of today can flourish and develop to better meet the needs of other assessment markets.
So why do industry leaders tend to fail with disruptive technology while fringe players succeed? Industry leaders often fail precisely because they attempt to introduce disruptive technologies into major markets before their time (Christensen, 1997). Because niche markets are often too small to be of interest, leaders do not pursue those opportunities to refine the technology. Instead, they give up, having run out of resources or credibility. Making a disruptive technology work requires iteration, and iteration means failure. Because they risk neither large resources nor reputations in the mainstream market, it is the fringe players who can fail early, often, and inexpensively enough to eventually challenge and overtake the industry leaders.

Toward the Technology-Based Assessment of Tomorrow
Are there other niche markets in which CBT might evolve? One such niche may be online learning. If we believe the Web-Based Education Commission (Kerrey & Isakson, 2000), online learning will become a major enterprise, especially for the lifelong updating of skills. In this market, institutions will be less concerned with questions of who gets in and more with who gets out, and what it is they have to do to get out (Messick, 1999). Why? Because businesses are becoming more concerned with what employees, once hired, know and can do, and less with where they went to school. Similarly, individuals are becoming more concerned with finding course offerings that meet their skill development goals and less with whether those offerings come from one institution or a half-dozen.
What's the assessment need? First, it is for knowledge facilitation and, second, for knowledge certification; that is, to help people develop their skills and then document that they've developed them. What's the assessment challenge? The challenge is to figure out how to design and deliver embedded assessment that provides instructional support and that globally summarizes learning accomplishment. In other words, the challenge is to combine richness with reach to achieve mass customization: to use the Internet's ability to deliver the richness of customized assessment to reach a mass audience.
Can assessment be customized? In very rudimentary ways, it already is. Certainly, we can dynamically adapt along a global dimension, as is done in many of today's computerized tests. But as we move assessment closer to instruction, we should eventually be able to adapt to the interests of the learner and to the particular strengths and weaknesses evident at any particular juncture, as intelligent tutors now do (e.g., Schulze, Shelby, Treacy, & Wintersgill, 2000). Likewise, we should be able to customize feedback to describe the specific proficiencies the learner evidenced in an instructional sequence.
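To make "adapting along a global dimension" concrete, the following is a minimal sketch of one way an adaptive test can choose items. It is an illustration under stated assumptions, not the algorithm of any testing program named in this article: the Rasch response model, the shrinking-step update rule, the item pool, and the simulated examinee are all hypothetical choices made for brevity.

```python
import math

def p_correct(ability, difficulty):
    """Rasch (one-parameter logistic) probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def next_item(ability, pool, used):
    """Pick the unused item whose difficulty is closest to the current estimate."""
    candidates = [i for i in range(len(pool)) if i not in used]
    return min(candidates, key=lambda i: abs(pool[i] - ability))

def run_adaptive_test(pool, answer_fn, n_items=5, step=1.0):
    """Administer n_items adaptively; return the final estimate and item order.

    answer_fn(difficulty) -> bool is a hypothetical stand-in for the examinee.
    The update is a simple shrinking-step rule, not a maximum-likelihood estimate.
    """
    ability, used = 0.0, []
    for k in range(n_items):
        i = next_item(ability, pool, used)
        used.append(i)
        correct = answer_fn(pool[i])
        # Move the estimate up after a correct answer and down after an error,
        # with smaller moves as more items are administered.
        ability += (step / (k + 1)) * (1.0 if correct else -1.0)
    return ability, used

# Hypothetical item difficulties on a logit scale.
pool = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]

# Deterministic simulated examinee with true ability 1.0: answers correctly
# whenever the model gives at least an even chance of success.
def examinee(difficulty):
    return p_correct(1.0, difficulty) >= 0.5

estimate, order = run_adaptive_test(pool, examinee)
administered = [pool[i] for i in order]
print(f"estimate: {estimate:.2f}")
print("difficulties administered:", administered)
```

Each examinee sees a different sequence of items, but every sequence is targeted at a single overall proficiency. Adapting to interests, or to specific strengths and weaknesses, would require a richer model of the learner than the single number tracked here.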
But perhaps the most far-reaching customization of assessment will come through modular online courses, whereby an instructor, or even a sophisticated learner, assembles a series of components into a unique offering. The Department of Defense (DOD) has taken a significant step through the Sharable Courseware Object Reference Model (SCORM) (www.adlnet.org). SCORM is to embody specifications and guidelines providing the foundation for how DOD will use technology to build and operate the learning environment of the future. SCORM will allow mixing and matching of learning segments to create lower cost, reusable training resources. (Note 12) If embedded assessment can be built into course modules following a similar set of conventional specifications, the assessment too will be customized by default.
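The idea that assessment becomes "customized by default" can be sketched as a data structure: if each module carries its own assessment items, then choosing the modules also chooses the assessment. The classes, field names, and catalog below are hypothetical and do not reflect the SCORM or IMS data formats; they only illustrate the principle.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Item:
    prompt: str
    skill: str  # the proficiency this item is meant to evidence

@dataclass
class Module:
    title: str
    items: List[Item] = field(default_factory=list)  # embedded assessment

def assemble_assessment(modules: List[Module]) -> List[Item]:
    """Collect the embedded items of the chosen modules, in course order."""
    return [item for module in modules for item in module.items]

def skills_covered(modules: List[Module]) -> List[str]:
    """Distinct skills the assembled course can report on, sorted by name."""
    return sorted({item.skill for item in assemble_assessment(modules)})

# Hypothetical catalog of reusable modules.
networking = Module("Networking basics", [
    Item("Explain what a router does.", "networking"),
    Item("Assign addresses to a small office network.", "networking"),
])
security = Module("Security basics", [
    Item("Identify the weakness in this password policy.", "security"),
])
databases = Module("Database basics", [
    Item("Write a query that lists overdue accounts.", "databases"),
])

# Two learners assemble different courses from the same catalog,
# and so receive different assessments without any extra design work.
course_a = [networking, security]
course_b = [security, databases]

items_a = assemble_assessment(course_a)
items_b = assemble_assessment(course_b)
print(len(items_a), skills_covered(course_a))
print(len(items_b), skills_covered(course_b))
```

The design choice worth noting is that the assessment is derived from the course rather than authored separately, which is what a shared specification for embedded assessment would make possible across vendors.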

Conclusion
Whether for postsecondary admissions, school and student accountability, or national policy, large-scale assessment must be reinvented. Reinvention is not optional. If we do not reinvent it, much of today's paper-based testing will become an anachronism ("yesterday's testing technology," in the words of the Web-Based Education Commission; Kerrey & Isakson, 2000) because it will be inconsistent with what and how students learn.
This reinvention must occur along both business and substantive lines. As educators, we often behave as if business considerations are unimportant, even distasteful. However, the business and substance of assessment are intertwined. Even for non-profit educational institutions (state education departments, federal agencies, schools, research organizations), providing quality assessment for a low cost matters. Using new technology to do assessment faster and cheaper can free up the resources to do assessment better.
We will be able to do assessment better because advances in technology, cognitive science, and measurement are laying the groundwork to make reinvention a reality. Whereas the contributions of cognitive and measurement science are in many ways more fundamental than those of new technology, it is new technology that is pervading our society. My thesis, therefore, is that new technology will be the primary facilitating factor precisely because of its widespread societal acceptance. (Note 13) In the same way that the Internet is already helping to revolutionize commerce, education, and even social interaction, this technological advance will help revolutionize the business and substance of large-scale assessment. It will do so by allowing richness with reach (that is, mass customization on a global scale) as never before. However, as the history of innovation suggests, this reinvention won't come immediately, without significant investment, or without setback. With few exceptions, we are not yet ready for large-scale assessment via the Internet (at least in our schools). However, as suggested above, this story is not so much about today. It really is about tomorrow.

Notes
This article is based on a paper presented at the annual conference of the International Association for Educational Assessment (IAEA), Jerusalem, May 2000.
I appreciate the helpful comments of Isaac Bejar, Henry Braun and Drew Gitomer on an earlier draft of this manuscript.
allows for the representation of unlimited classes of documents. Leadership in developing and implementing the many standards used by the Internet is provided by the World Wide Web Consortium (www.w3.org). For more on Internet standards, see their website or see Green (1996), who gives a more basic introduction.
According to Nielsen//NetRatings, 56% of U.S. households had Internet access as of November 2000 ("Internet access tops 56 percent," 2000).

2.
And it works. eBay is reported to be the most successful company in cyberspace, with 22.5 million registered users and 2000 revenues of $430 million (Cohen, 2001). Why? It has none of the costs of retailing: no buying, no warehousing, no shipping, no returns, no overstock.

3.
A recent, but potentially significant, addition to this population is the U.S. Army. In July 2000, Secretary of the Army Louis Caldera announced a $600 million program to allow any interested soldier to take college courses over the Internet at little or no cost (Carr, 2000b).

4.
A second, perhaps more interesting, example is Florida's Daniel Jenkins Academy, where students physically attend but take all academic courses on-line from off-site teachers (Thomas, 2000).

5.
Russell has conducted several studies on the mismatch between learning and testing methods in writing (e.g., Russell & Plati, 2001). The repeated result is that the writing proficiencies of students who routinely use word processors are underestimated by paper-and-pencil tests.

6.
The Teaching, Learning, and Computing-1998 survey provides similar data (Anderson & Ronnkvist, 1999). This survey, conducted using a national probability sample in Spring 1999, reports Internet access in 90% of schools and at least medium-speed, dedicated connections in 57%.

7.
Developing a technology infrastructure and integrating into the e-commerce network may, in fact, help jump-start the growth required to deal with the serious problems of public health, education, and welfare that these countries typically face (Friedman, 2000).

9.
Price and quality-adjusted data tell a similar story. In 1983, the quality-adjusted cost of a personal computer in constant 1996 dollars was $1098 (D. Wasshausen, personal communication, April 13, 2000). By 1996, the cost of a PC, holding quality constant, was $100, less than a tenth of the 1983 cost. By 1999, that quality-adjusted PC had further deflated to $29.

11.
estimate on unduplicated volumes claimed by Thomson Prometric (www.prometric.com), Vantage Technologies (www.intellimetric.com/index.html), and the U.S. Armed Forces (A. Nicewander, personal communication, November 2, 2000). These three organizations alone claim some 8.5 million tests annually. These tests include both high-stakes and low-stakes assessments.

12.
SCORM is being built upon the work of the IMS Global Learning Consortium (IMS) (www.imsproject.org/aboutims.html). IMS is developing open specifications for facilitating distributed learning activities such as locating and using educational content, tracking learner progress, reporting learner performance, and exchanging student records between administrative systems. Both IMS and SCORM incorporate XML (see note 1 above).

13.
That the largest facilitating factor will be technological is not to say that we