


 

Education Policy Analysis Archives

Volume 9 Number 42

October 16, 2001

ISSN 1068-2341


A peer-reviewed scholarly journal
Editor: Gene V Glass, College of Education
Arizona State University

Copyright 2001, the EDUCATION POLICY ANALYSIS ARCHIVES .
Permission is hereby granted to copy any article
if EPAA is credited and copies are not sold.

Articles appearing in EPAA are abstracted in the Current Index to Journals in Education by the ERIC Clearinghouse on Assessment and Evaluation and are permanently archived in Resources in Education.

Significance of Test-based Ratings
for Metropolitan Boston Schools

Craig Bolon
Planwright Systems Corporation (USA)

Citation: Bolon, C. (2001, October 16). Significance of Test-based Ratings for Metropolitan Boston Schools. Education Policy Analysis Archives, 9(42). Retrieved [date] from http://epaa.asu.edu/epaa/v9n42/.

Abstract

In 1998 Massachusetts began state-sponsored, annual achievement testing of all students in three public school grades. It has created a school and district rating system for which scores on these tests are the sole factor. It proposes to use tenth-grade test scores as a sole criterion for high school graduation, beginning with the class of 2003. The state is treating scores and ratings as though they were precise educational measures of high significance. A review of tenth-grade mathematics test scores from academic high schools in metropolitan Boston showed that statistically they are not. Community income is strongly correlated with test scores and accounted for more than 80 percent of the variance in average scores for a sample of Boston-area communities.

Once community income was included in models, other factors--including percentages of students in disadvantaged populations, (Note 1) percentages receiving special education, percentages eligible for free or reduced price lunch, percentages with limited English proficiency, school sizes, school spending levels, and property values--all failed to associate substantial additional variance. Large uncertainties in residuals of school-averaged scores, after subtracting predictions based on community income, tend to make the scores ineffective for rating performance of schools. Large uncertainties in year-to-year score changes tend to make the score changes ineffective for measuring performance trends.

 


Section 1: Background

A. State Testing in Massachusetts Public Schools

The most recent form of state testing in Massachusetts public schools is the Massachusetts Comprehensive Assessment System (MCAS), a set of achievement tests sponsored and produced by the Massachusetts Department of Education and administered in the spring of years beginning in 1998 (see Appendix 1 and Bolon, 2000). These tests have four sections: English language arts, mathematics, science and technology, and history and social science. For the years 1998-2000, tests were administered in grades 4, 8 and 10. Beginning in April, 2001, the former grade 4 test sections were divided between grades 4 and 5; new tests have been added in grades 3, 6 and 7.

MCAS tests are loosely timed and include questions in multiple choice, short answer and extended answer formats; they are provided in English and Spanish. All public school students are required to take MCAS tests; there are no "opt-out" provisions. Students taught in parochial schools, other private schools, home schooling and out-of-state schools are not required to take or pass MCAS. For 1998 through 2000, the Department of Education has published test questions used for scoring approximately six months after test administration (see Mass. DoE, 2000g, for example). It has produced new test forms each year. According to current plans, starting with the class of 2003 minimum scores on the English language arts and mathematics sections will be required to graduate from high school (Mass. DoE, 1999f) and to enroll at state colleges, except for MCAS test preparation courses at two-year colleges.

The Massachusetts Department of Education publishes MCAS results as scaled scores in a range of 81 scale points, using the integers 200 through 280 (Mass. DoE, 2000h). The Department assigns labels it calls "performance levels" to four scaled score intervals (Mass. DoE, 1998b) and currently considers 220 the minimum passing scaled score on all test sections (Mass. DoE, 1999f). The Department has not fully disclosed details of assigning scale factors, assuring consistent scores across test forms or assuring that scores quantitatively reflect published academic standards. It has not published distributions of either raw scores or scaled scores. It has released limited information about test design and properties in "technical reports" for the years 1998 (Mass. DoE, 1999c) and 1999 (Mass. DoE, 2000i). After an independent analysis of score averages by "racial" and "ethnic" categories for 1998 (Uriarte and Chavez, 1999), the Department published its own analysis of this type for 1999 (Mass. DoE, 2000c).

Massachusetts tests appear to rank near the high end of state achievement tests in difficulty, although failure rates are lower than those for some tests used in Arizona and Virginia. As in several other states, substantially higher failure rates are found on mathematics than on language tests in high schools. Since the tenth-grade version of the mathematics test section sets the graduation threshold for most students, its scores have been used as subjects for these studies.

B. Schools in the Boston Metropolitan Area

Metropolitan Boston is diverse. Besides the City of Boston it includes many smaller municipalities, all operating their own school systems. These studies consider communities inside Route 128, a highway designed in the late 1940s (now an Interstate), enclosing areas within about 9-12 miles from Boston's government center--that is, Boston and its inner and middle suburbs. They share a public transit system, several public and private utilities, and an economy dominated by service industries. They include poverty areas, concentrations of wealth, middle-income communities, prosperous suburban towns, a few medium-sized cities and one large city. The areas are bounded by the Massachusetts Bay and Atlantic Ocean to the east, Salem and Peabody to the north, Waltham and Newton to the west, and Braintree and Quincy to the south (see Metropolitan Area, 1997).

Schools in the Boston metropolitan area are also diverse. These studies, focusing on testing for graduation, consider only high schools. While a majority of the area's population of high-school age attends public schools, (Note 2) a substantial proportion attends parochial schools that began to be established by the Roman Catholic Church more than 150 years ago. A smaller fraction is taught in other private schools or through home schooling. The Metropolitan Council for Educational Opportunities (METCO), founded in 1963, uses state funding to help send over 3,000 Boston minority students to suburban schools (Orfield, et al., 1997).

Within public school systems there is also substantial diversity. All communities must support regional "vocational," "technical" and "agricultural" high schools. Some such schools began as "manual training" schools in the 1800s. Some communities have closed their local vocational schools; some have merged them with their academic schools. These studies look in detail only at academic schools, because the curriculum of vocational schools is substantially different and is not designed to prepare students for MCAS tests, an issue of controversy (Nicodemus, 2000). For purposes of these studies there are difficulties with a few communities, including Cambridge, Quincy, Revere and Waltham, which provide vocational education in the same facilities as academic programs (Mass. DoE, 2000f). I chose to include such schools in these studies while noting their special characteristics.

Several communities also operate experimental schools, including "pilot schools" in Boston and "charter schools" in several communities (Partee, 1997, and Wood, 1999), as regulated under the Massachusetts Education Reform Act of 1993. All that offer ninth-grade curriculum and above are smaller than the regular academic schools. These schools provide motivational environments and may exercise indirect forms of student selection that differentiate them from other public schools. Primarily because of concerns about small sample sizes, schools with fewer than 100 students per grade are excluded from these studies. So far no experimental school is that large.

The City of Boston presents a unique situation. Of its large academic high schools, three are exam schools: the Boston Latin School (founded in 1635) and the more recent Latin Academy (formerly Girls Latin) and O'Bryant School of Mathematics and Science (formerly Boston Technical). These draw away many Boston students who tend to score well on achievement tests, promoting a longstanding social stratification in Boston schools. Over half the students at Boston Latin come to it from parochial and other private schools (Daley, 1997); some say those students would not otherwise attend Boston schools. However, other public school students who are not admitted leave the district for high school. Starting in 1975, because of federal court orders to desegregate, exam school admission policies included a 35 percent set-aside for African American and Latino students, maintained voluntarily after 1987. As a result of another federal court decision (McLaughlin, 1996), this approach was weakened in 1997. As with academic schools that provide vocational education, the Boston exam schools are included in these studies, but their special characteristics are noted.

C. Statewide MCAS Test Results

Table 1-1 shows that statewide, tenth-grade MCAS test scores have remained nearly constant in English language arts and in science and technology for the years 1998-2000, while scores in mathematics have risen substantially (Mass. DoE, 2000h). (Tenth-grade tests were not given in history and social science.)

Table 1-1

MCAS Statewide Results, 1998-2000

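| Section | Year | Average | % Level 4 | % Level 3 | % Level 2 | % Level 1 |
|---|---|---|---|---|---|---|
| English | 2000 | 229 | 7 | 29 | 30 | 34 |
| English | 1999 | 229 | 4 | 30 | 34 | 32 |
| English | 1998 | 230 | 5 | 33 | 34 | 28 |
| Math. | 2000 | 228 | 15 | 18 | 22 | 45 |
| Math. | 1999 | 222 | 9 | 15 | 23 | 53 |
| Math. | 1998 | 222 | 7 | 17 | 24 | 52 |
| Science | 2000 | 226 | 3 | 23 | 37 | 37 |
| Science | 1999 | 226 | 3 | 21 | 39 | 38 |
| Science | 1998 | 225 | 1 | 21 | 42 | 36 |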

Source of data: Mass. DoE, 2000h

Table 1-1 reflects Massachusetts Department of Education practice of recording students absent for a test section as scoring 200 and in Level 1, the lowest level (Mass. DoE, 2000h, Table 11 footnote). An undisclosed fraction of students were excluded from testing because of special conditions and are not counted in this report; others may have been provided with an alternative assessment. As currently planned, students in Level 1 will be ineligible to graduate from high school as of 2003. Based on this record of scores, about half of all Massachusetts public school students are at risk of being denied graduation.

The labels of the four "performance levels" designated by the 1998 Board of Education (Note 3) for reporting MCAS results are:

  • Level 4, "Advanced"
  • Level 3, "Proficient"
  • Level 2, "Needs Improvement"
  • Level 1, "Failing"

Although these levels have qualitative descriptions (Mass. DoE, 1998b), there are no quantitative links to levels of achievement specified in academic standards; content of standards has not been prioritized; nor have standards been promulgated through state regulations, as anticipated by law. (Note 4) Although Massachusetts law requires "competency determination" in mathematics, science and technology, history and social science, foreign languages and English, (Note 5) Massachusetts laws and regulations continue to require only US history and physical education as subjects of instruction. Massachusetts tries to set legal standards for learning indirectly (Note 6) through MCAS tests, procedures to set scale factors, and regulations for minimum scaled scores. It lacks corresponding legal commitments for instruction. It has made major changes to "curriculum frameworks" every few years (Mass. DoE, 2000k) and has not provided reasonable spans of time for instruction to catch up before using revised "curriculum frameworks" as a basis for revised MCAS tests. Its teachers, parents and students cannot find out exactly what must be learned in order to meet minimum standards for high school graduation. The 1993 Education Reform Act left several such problems; few have been addressed yet by the Massachusetts legislature or Board of Education.

Students with disabilities (also called special education students) and students with limited English proficiency (LEP students) tend to receive drastically lower MCAS scores than other students, although some students with disabilities are soon to be provided alternate assessments (Mass. DoE, 2000l), and some LEP students have been able to take tests in Spanish (Mass. DoE, 2000d). The Department of Education has not disclosed the fractions of students who are eligible for or have utilized its special accommodations, although it has published statewide summary data using these student categories (Mass. DoE, 2000i, Table 14.5). Most minority students also receive lower scores than other students. The Department of Education has published 1999 statewide and district summary data for students categorized as "African American / Black," "Asian or Pacific Islander," "Hispanic / Latino," "Native American," "White" and "Mixed" (Mass. DoE, 2000c, Tables 5-10). As previously noted, most students in vocational programs receive lower MCAS scores than students in academic programs; this can readily be shown for the state's more than 30 vocational, technical and agricultural high schools (Appendix 2).

Based on the sources of information cited, Table 1-2 shows statewide impacts of these known risk factors on average 1999 tenth-grade mathematics scores and rates of failure.

Table 1-2

MCAS 1999 Grade 10 Math Scores by Risk Factors

| Category | Average Score | Percent Failing |
|---|---|---|
| All students | 222 | 53 |
| Students with disabilities | 203 | 92 |
| Limited English proficiency | 206 | 84 |
| African American | 209 | 80 |
| Hispanic / Latino | 208 | 85 |
| Native American | 211 | 77 |
| Vocational, technical, agricultural | 210 | 78 |

Source of data: Mass. DoE, 2000i

The Department of Education has not reported scores classified by other potential risk factors on which it collects information. These include:

  • Gender of students
  • Tests taken in Spanish or as alternate assessments
  • Free or reduced price lunches, as indicators of poverty
  • Schools with large class sizes, especially in early grades
  • Students retained below grade or placed below grade level
  • Teachers who lack certification in their subjects of instruction

There is also little published information about combinations of risk factors. However, since the Department of Education lists regional vocational, technical and agricultural schools as separate districts in its reports of MCAS results, it is possible to use their categories of minority students (Appendix 2). For those schools for which categories are reported, results are shown in Table 1-3.

Table 1-3

MCAS 1999 Grade 10 Math Scores by Combined Risk Factors

| Combined Category | Average Score | Percent Failing |
|---|---|---|
| Vocational + African American | 203 | 97 |
| Vocational + Hispanic / Latino | 205 | 95 |

While the results in Table 1-3 are not strictly comparable with Table 1-2, because not all the schools and categories can be found in published data, they indicate that factors can combine to worsen the scores of students with more than one risk factor.

D. Test Score Studies

Recent studies question assumptions that "high-stakes" tests like MCAS can provide valid measures of either student achievement or school performance, showing gains on them that are not matched by gains on other tests for closely related educational content (Haney, 2000, and Klein, et al., 2000). Political environments of "high-stakes" tests create heavy pressure to improve scores, regardless of underlying educational progress. For "low-stakes" tests aimed at measuring long-term trends, like those of the federal NAEP, it has been shown that "family variables explain most of the variance across scores in states" (Grissmer, et al., 2000, Chapter 9). Individual and longitudinal studies demonstrate strong influences of parenting practices, family structure, parent education and degrees of poverty on cognitive development (for example, Smith, et al., 1997). Other longitudinal and cross-sectional studies show cumulative responses of test scores to educational environments (for example, Phillips, et al., 1998, and Ferguson, 1998). However, the data generally available for test score research fail to capture much of the critical information needed to understand development of cognitive abilities and educational achievement in the settings of public schools.

MCAS test scores have already been the subject of several attempts to explain, predict or interpret them (Mass. DoE, 2001, Gaudet, 2001, Tuerck, 2001a, and Tuerck, 2001b). These prior MCAS test score studies fall into three main categories: 1) Trends studies of year-to-year and multi-year changes; 2) Effects studies involving social factors for the population; 3) Effects studies involving operating factors for the schools.

Research on scores from school-based standard tests suggests that many such studies are likely to yield results of low significance. Grissmer, et al., 2000, among others, show that:

  • Real year-to-year changes in average student performance, as assessed by conventional tests, are relatively small; they can easily be masked by statistical uncertainties.
  • Social factors are strongly associated with test scores.
  • Self-reported social information tends to have high error and omission rates.
  • Census and other community-based social information often includes confounding factors that require adjustment to reflect the households for a school population.
  • Uncategorized school spending is only weakly associated with test scores.

The MCAS test score studies cited use scores and statistical data to estimate the performance of schools or districts according to simple formulas, unsupported by other evidence. They frequently present results in a table that is ranked or can be ranked like the teams in a sports league. The "league table" approach to presenting such results raises the question of whether the ordering of schools or districts and the differences in performance estimates have educational significance, that is, whether such rankings may instead be largely matters of chance or be associations with factors other than school performance. This article presents a trends study and an effects study I conducted to explore the significance that can be associated with such results.

E. Sources of Data

The school characteristics used in these studies are taken from information reported by public schools to the Massachusetts Department of Education for 1999 and published by the Department (Mass. DoE, 2000f). MCAS test scores summarized by schools are from 1998-2000 Department reports (Mass. DoE, 2000h). Other information is published by the Department for school districts, including program budgets and percentages of special education students. Information for census tracts and communities is available from the US Bureau of the Census and other sources. Data analysis for these studies focuses on information associated with individual schools because aggregate information for school districts or general populations can mask school characteristics. Data used in these studies are reproduced in Appendix 3 and Appendix 4; interested readers can confirm them at the sources and can repeat these studies or perform other analysis with them.

The Department of Education and the school districts collect other potentially useful information that is not currently published. Of particular interest are data on class size and teacher preparation. Recent research has shown significant association of educational achievement as measured by "low-stakes" tests with small class size in elementary schools (Nye, et al., 1999, and Krueger, 1999) and with teacher certification and education (Darling-Hammond, 2000), after adjustments for student backgrounds. Studies of the development of cognitive abilities cast doubt on whether other information currently published by government sources about population and economic characteristics in large geographical areas would substantially improve the understanding of test scores.

A. Trends Study of Variability

This study considers 47 academic high schools in 32 metropolitan Boston communities through the average tenth-grade MCAS mathematics test scores recorded for years 1998-2000. Achievement tests in mathematics typically require substantial skill at language interpretation (see, for example, Gipps and Murphy, 1994, Chapter 6, p. 183). Haney, 2000, in a study of another state, found stronger correlations of state mathematics test scores with grades in English than with grades in math. As previously noted, the tenth-grade mathematics test is used in this study of significance because it sets a graduation threshold for most students.

Test boycotts have been organized by students in several schools each year (Steinberg, 2000), involving 10 to 31 percent of students in 19 cases out of the 141 test samples. To be able to compare average scores of schools more accurately, the average scores reported by the Department of Education have been adjusted by removing the scores of 200 that were assigned to students who did not take the test, averaging only scores of students who participated.
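The adjustment described above can be sketched in a few lines. This is only an illustration of the arithmetic, assuming the reported average counts each absent student as a score of 200; the function name and the example numbers are hypothetical and are not taken from Appendix 3.

```python
def adjusted_average(reported_average, n_counted, n_absent, absent_score=200):
    """Average over test participants only.

    reported_average: school average as published, with absentees counted
    n_counted:        all students counted in the published average
    n_absent:         students who did not take the test (scored as 200)
    """
    n_participants = n_counted - n_absent
    if n_participants <= 0:
        raise ValueError("no participants left after removing absentees")
    # Remove the points contributed by the assigned scores of 200.
    total_points = reported_average * n_counted - absent_score * n_absent
    return total_points / n_participants


# Hypothetical school: 300 students counted, 60 of them absent,
# published average of 220 scale points.
print(adjusted_average(220, 300, 60))  # prints 225.0
```

In this made-up case a 20 percent boycott lowers the published average by five scale points, which is comparable to the year-to-year changes examined below.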

Table 2-1 shows changes in schools' average scores (Appendix 3) between 1998 and 1999 and between 1999 and 2000, expressed in units of scale points and of standard deviations.

Table 2-1

MCAS Grade 10 Math Test Score Changes by School, 1998-2000

| City or Town | High School | Points 1998-1999 | Delta 1998-1999 | Points 1999-2000 | Delta 1999-2000 |
|---|---|---|---|---|---|
| Arlington | Arlington | 4 | 3 | 5 | -1 |
| Belmont | Belmont | -2 | -4 | 4 | -2 |
| Boston | Boston High | 1 | 0 | 6 | 0 |
| Boston | Brighton | 2 | 1 | 3 | -4 |
| Boston | Charlestown | -1 | -2 | 4 | -2 |
| Boston | Dorchester | -2 | -3 | 1 | -5 |
| Boston | East Boston | 1 | 0 | 5 | -1 |
| Boston | Hyde Park | 0 | -1 | 1 | -5 |
| Boston | Jeremiah Burke | 4 | 3 | 3 | -3 |
| Boston | South Boston | 0 | -1 | 5 | -1 |
| Boston | The English High | 1 | 0 | 4 | -2 |
| Boston | West Roxbury | 3 | 2 | 5 | -1 |
| Boston Exam | Boston Latin | 8 | 10 | 8 | 3 |
| Boston Exam | Latin Academy | 3 | 2 | 19 | 17 |
| Boston Exam | O'Bryant Science | 4 | 4 | 8 | 3 |
| Braintree | Braintree | -1 | -3 | 13 | 10 |
| Brookline | Brookline | 2 | 1 | 5 | -1 |
| Cambridge | Rindge & Latin* | -2 | -5 | -1 | -10 |
| Chelsea | Chelsea | 1 | 0 | 3 | -4 |
| Dedham | Dedham | 1 | 0 | 11 | 6 |
| Everett | Everett* | 5 | 5 | -3 | -12 |
| Lexington | Lexington | -1 | -3 | 5 | -1 |
| Lynn | Classical | 0 | -2 | 7 | 1 |
| Lynn | English | 1 | 0 | 11 | 7 |
| Malden | Malden | 2 | 1 | 4 | -3 |
| Marblehead | Marblehead | -5 | -7 | 9 | 3 |
| Medford | Medford* | 2 | 1 | 7 | 1 |
| Melrose | Melrose | 1 | 0 | 2 | -5 |
| Milton | Milton | -1 | -3 | 4 | -2 |
| Newton | North* | -3 | -6 | 9 | 5 |
| Newton | South | 2 | 1 | 10 | 5 |
| Peabody | Veterans* | 1 | 0 | 6 | 0 |
| Quincy | North Quincy | 3 | 2 | 7 | 1 |
| Quincy | Quincy* | -2 | -4 | 8 | 3 |
| Revere | Revere* | 2 | 1 | 6 | 0 |
| Salem | Salem* | 1 | 0 | 4 | -2 |
| Saugus | Saugus | 4 | 3 | 7 | 1 |
| Somerville | Somerville* | -1 | -3 | 2 | -5 |
| Stoneham | Stoneham | -6 | -8 | 8 | 2 |
| Swampscott | Swampscott | 16 | 17 | -1 | -8 |
| Wakefield | Memorial | -3 | -5 | 4 | -2 |
| Waltham | Waltham* | -1 | -3 | 8 | 3 |
| Watertown | Watertown | 10 | 10 | 2 | -4 |
| Weymouth | Weymouth* | 0 | -2 | 5 | -1 |
| Winchester | Winchester | -1 | -3 | 6 | 0 |
| Winthrop | Winthrop* | 5 | 4 | 4 | -2 |
| Woburn | Woburn | 0 | -2 | 11 | 7 |

* school providing vocational education

Source of data: Appendix 3

A "delta" is a change in scale points between two years, minus the average change for the year span, divided by a standard error of the change that is estimated from test reliability and number of participants. The uncertainty for one test score is based on an average standard error of 6.7 scale points, from reliability estimated by the Department of Education for the tenth-grade mathematics test of 1998, using randomized split-half comparisons (Mass. DoE, 1999c). The variance of an average score for a school is estimated by the square of this quantity, plus variance contributed by roundings, divided by the number of test participants.
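The definition of a delta can be illustrated with a short sketch. It is not the exact computation behind Table 2-1: the 6.7 scale point standard error comes from the text, but the rounding variance of 1/12 (uniform rounding to whole scale points) and the treatment of the two years as independent are assumptions made here, and the function names are hypothetical.

```python
import math

SE_SINGLE_SCORE = 6.7            # scale points, from Mass. DoE, 1999c, as cited above
ROUNDING_VARIANCE = 1.0 / 12.0   # ASSUMPTION: uniform rounding to whole scale points


def variance_of_school_average(n_participants):
    """Variance of a school-averaged score from test reliability and rounding."""
    return (SE_SINGLE_SCORE ** 2 + ROUNDING_VARIANCE) / n_participants


def delta(change, average_change, n_first_year, n_second_year):
    """Change beyond the average change, in units of estimated standard errors.

    ASSUMPTION: the two years are independent, so their variances add.
    """
    se_change = math.sqrt(
        variance_of_school_average(n_first_year)
        + variance_of_school_average(n_second_year)
    )
    return (change - average_change) / se_change


# Hypothetical school with 250 participants each year, gaining 4 points
# in a span where the average gain for all schools was 1.3 points.
print(round(delta(4.0, 1.3, 250, 250), 1))
```

With a few hundred participants the estimated standard error of a change is well under one scale point, which is why even modest point changes translate into large deltas.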

A delta expresses the change for a particular school beyond the average change, in units of estimated standard errors. When standard errors are accurately estimated, deltas outside +/- 2 are statistically significant at the p<.05 level, but here about half of the schools in both year spans have deltas well outside this range. Especially large deltas were those shown in Table 2-2.

Table 2-2

Large Changes in MCAS Grade 10 Math Test Scores, 1998-2000

| High School | Year Span | Delta |
|---|---|---|
| Boston Latin | 1998-1999 | +10 |
| Latin Academy | 1999-2000 | +17 |
| Braintree | 1999-2000 | +10 |
| Rindge & Latin* | 1999-2000 | -10 |
| Everett* | 1999-2000 | -12 |
| Swampscott | 1998-1999 | +17 |
| Watertown | 1998-1999 | +10 |

* school providing vocational education

Source of data: Table 2-1

Statistically large deltas occur about ten times as frequently as they would by chance, were test reliability the only factor in standard errors. Since deltas adjust for changes in test difficulty and in average efforts at teaching, learning and test preparation, there appear to be other factors affecting test score changes that differ from school to school and from year to year. However, the Department of Education has not published any studies on other variability factors.

Table 2-3 reflects data for all 47 schools included in this study, showing the average score changes in scale points for the 1998-2000 years, weighted by numbers of test participants.

Table 2-3

Average Changes in MCAS Grade 10 Math Test Scores, 1998-2000

| Year span | Average point change | Standard error from test reliability |
|---|---|---|
| 1998-1999 | +1.3 | 0.1 |
| 1999-2000 | +5.9 | 0.1 |

Source of data: Table 2-1

The average score change for all schools would be highly significant for both year spans at the p<.0001 level or better if test reliability were the only factor in variability. Scatterplots of score changes in Figure 2-1 and Figure 2-2, showing deltas for a span versus average scores in 1998, do not indicate strong relationships but show possible outliers, which are the schools noted in Table 2-2. These plots of year-to-year changes provide a picture of the variability in school-averaged test scores, which is much greater than estimates based on test reliability and number of test participants. Evaluating year-to-year point score changes, excluding three cases that have deltas beyond +/- 10 as outliers, yields a practical measurement of variability for school-averaged MCAS test scores.

Figure 2-1: Changes in MCAS Grade 10 Math Test Scores, 1998-1999


Figure 2-2: Changes in MCAS Grade 10 Math Test Scores, 1999-2000

Standard deviations of school-averaged score change, less average score change for a year span for all schools, are 2.9 scale points for 1998-1999, 3.1 scale points for 1999-2000 and 3.0 scale points for both spans combined. These are about five times larger than the uncertainty estimated on the basis of test reliability. As one might expect, scatterplots of score changes in scale points versus school sizes show greater variability for smaller schools. An analysis found that an equivalent standard error in scale points for school averages of grade 10 MCAS mathematics test scores can be estimated as the constant 33 divided by the square root of the number of test participants. This estimate of standard error combines contributions from test reliability with random variations in student mental alertness, student backgrounds and school performance. For typical metropolitan Boston schools from these studies, school-averaged test score gains, losses or differences of less than about five scale points are not statistically significant at the p<.05 level.
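The empirical rule of 33 divided by the square root of the number of participants is easy to tabulate. The sketch below does so for a few hypothetical school sizes. Combining two years as independent quantities is an assumption made here, and the function names are illustrative only.

```python
import math

EMPIRICAL_CONSTANT = 33.0  # scale points, from the analysis described above


def equivalent_standard_error(n_participants):
    """Equivalent standard error of one school-averaged score."""
    return EMPIRICAL_CONSTANT / math.sqrt(n_participants)


def standard_error_of_change(n_first_year, n_second_year):
    """Standard error of a year-to-year change.

    ASSUMPTION: the two school averages vary independently.
    """
    return math.hypot(
        equivalent_standard_error(n_first_year),
        equivalent_standard_error(n_second_year),
    )


# Hypothetical numbers of test participants per school.
for n in (150, 250, 400):
    print(n,
          round(equivalent_standard_error(n), 2),
          round(standard_error_of_change(n, n), 2))
```

Under these assumptions a school with about 242 participants in each year has a standard error of change of 3.0 scale points, in line with the standard deviations of score changes reported above.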

Average mathematics score increases from 1998 to 2000 for all schools in these studies combined substantially exceed the rates of change that other studies have found to reflect genuine and sustainable improvements in learning. Based on test reliability as a measure of variance, the change is more than 20 standard deviations; based on calculated variance, it is more than four standard deviations. As noted, other tenth-grade MCAS tests showed little score change. Statistical magnitudes of mathematics test score changes strongly suggest causes other than or in addition to improvements in learning. Anecdotal accounts report heavy efforts at test preparation in some schools, but the general upsweep in scores indicates that the 2000 mathematics test may also have been significantly easier than corresponding 1998 and 1999 tests. Sparse published information from the Department of Education about test calibration makes this issue difficult to trace.

B. Effects Study Involving Social Factors

This study, like the trends study in Section 2A, considers 47 academic high schools in 32 metropolitan Boston communities, through average tenth-grade MCAS mathematics test scores as recorded for 1999. School-specific social factors considered in this study (Appendix 4), as of 1999, are listed in Table 2-4.

Table 2-4

School-specific Factors for MCAS Grade 10 Math Test Scores

| Factor | Range of values |
|---|---|
| A. School population, average per grade | 140 to 495 |
| B. Percent African American | 0.5 to 86.0 |
| C. Percent Asian or Pacific Islander | 0.3 to 31.8 |
| D. Percent Hispanic / Latino | 0.2 to 63.7 |
| E. Percent limited English proficiency | 0.0 to 45.5 |
| F. Percent free or reduced price lunch | 1.6 to 64.3 |
| G. Percent reduction, grades 9+10 to 11+12 | -12.9 to 47.0 |

Source of data: Appendix 4

The factors in Table 2-4 were used as independent variables in linear models for a dependent variable of 1999 school-averaged tenth-grade MCAS mathematics test scores. (Note 7) Residuals from the models are considered as estimators of school performance. Variances and error estimates are calculated by conventional multivariate methods (Bevington and Robinson, 1992).
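The idea of treating residuals as estimators of school performance can be shown with a deliberately simplified sketch: a weighted straight-line fit on a single factor, rather than the multivariate models used in this study. The data values below are invented for illustration and are not from Appendix 4. Using numbers of test participants as weights is one plausible choice, assumed here.

```python
def weighted_linear_fit(x, y, weights):
    """Weighted least-squares fit of y = intercept + slope * x."""
    total_weight = sum(weights)
    mean_x = sum(w * xi for w, xi in zip(weights, x)) / total_weight
    mean_y = sum(w * yi for w, yi in zip(weights, y)) / total_weight
    sxx = sum(w * (xi - mean_x) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - mean_x) * (yi - mean_y)
              for w, xi, yi in zip(weights, x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residuals(x, y, intercept, slope):
    """Observed score minus the score predicted from the factor."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# HYPOTHETICAL schools: percent limited English proficiency,
# school-averaged score, and number of test participants.
pct_lep = [0.0, 10.0, 20.0, 40.0]
avg_score = [235.0, 228.0, 222.0, 210.0]
participants = [300, 250, 200, 150]

intercept, slope = weighted_linear_fit(pct_lep, avg_score, participants)
for school_residual in residuals(pct_lep, avg_score, intercept, slope):
    print(round(school_residual, 2))
```

A positive residual means a school scored above what the factor alone would predict. Whether such a difference is meaningful depends on the uncertainties discussed in Section 2A.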

For 1999, MCAS tests were, in the parlance of US public schools, a "medium-stakes" enterprise, associated with some indirect social pressures but no hazardous consequences for students. No current high school students stood to be denied graduation because of test scores, although summaries of test scores by districts and schools were being published. This was the second year of regular testing. It had been preceded by a year of trial testing, following three years of curriculum specification and test development.

Note that grade size reduction, calculated as a percent decrease in grades 11 and 12 school population as compared with grades 9 and 10, is not identical with dropout rate. While dropout statistics are available, as in other states they are compromised by lack of consistent longitudinal data for educational outcomes. Grade size reduction simply indicates that, for whatever reasons, later year high school grades are smaller than earlier year grades. When substantial, it suggests that many students do not graduate in a normal time sequence. (Note 8) Values in about +/- 5 to +/- 8 percent ranges will be typical of fluctuations for schools of these sizes with stable area boundaries and populations and very low transiency, retention and dropout rates, according to Poisson statistics.
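Both the reduction measure and the size of its chance fluctuations can be sketched briefly. The first function follows the definition in the text. The second is one possible reading of the Poisson argument, assuming that the two grade pairs are independent Poisson counts with the same expected size. That assumption is made here for illustration and is not spelled out in the text.

```python
import math


def grade_size_reduction_pct(n_grades_9_10, n_grades_11_12):
    """Percent decrease of grades 11+12 enrollment relative to grades 9+10."""
    return 100.0 * (n_grades_9_10 - n_grades_11_12) / n_grades_9_10


def poisson_fluctuation_pct(students_per_grade):
    """Approximate one-standard-deviation chance fluctuation of the reduction.

    ASSUMPTION: each two-grade group is an independent Poisson count with
    mean 2 * students_per_grade, so the difference has variance
    4 * students_per_grade and the relative fluctuation is 1 / sqrt(n).
    """
    return 100.0 / math.sqrt(students_per_grade)


# Smallest and largest schools in Table 2-4 (average students per grade).
for n in (140, 495):
    print(n, round(poisson_fluctuation_pct(n), 1))
```

Under this reading, the smallest and largest schools in Table 2-4 give fluctuations of roughly 8 and 5 percent, consistent with the range quoted above.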

The Department of Education publishes only district information about budgets and special education. Without an accurate means to apportion such measurements to high schools (and in four communities to specific high schools), they have been excluded from consideration. There is considerable variation. District percentages of special education students ranged from 11.1 to 25.5 percent of all students and grades in 1999. Annual district spending reported for all regular education programs ranged from $3,986 to $9,251 per student in 1999. Even within this small group of communities, the Education Reform Act of 1993 failed to equalize school spending.

Factor distributions and correlations for this model are shown in Figure 2-3, a matrix of histograms and scatterplots with unweighted, unconstrained lines of best fit.

Figure 2-3: Factor Distributions for MCAS Grade 10 Math Test Scores

Although there are strong associations in Figure 2-3, such as "Percent Hispanic / Latino" with "Percent limited English proficiency," there are no factors so highly correlated as to be entirely redundant. The numerical correlation matrix is presented in Table 2-5, corresponding to the matrix of plots in Figure 2-3, with all values beyond about +/- 0.5 significant at the p<.05 level.

Table 2-5

Factor Correlations for MCAS Grade 10 Math Test Scores

Factor                                           A     B     C     D     E     F     G

A. School population, average per grade       1.00  -.11   .24   .02  -.03  -.01   .16
B. Percent African American                   -.11  1.00   .07   .56   .80   .80   .28
C. Percent Asian or Pacific Islander           .24   .07  1.00   .08   .13   .23   .27
D. Percent Hispanic / Latino                   .02   .56   .08  1.00   .77   .84   .65
E. Percent limited English proficiency        -.03   .80   .13   .77  1.00   .88   .50
F. Percent free or reduced price lunch        -.01   .80   .23   .84   .88  1.00   .60
G. Percent reduction, grades 9+10 to 11+12     .16   .28   .27   .65   .50   .60  1.00

Sources of data: Appendix 4, Statistica model

Some of the correlations in Table 2-5 are strong enough that multiple regression coefficients are likely to be unstable. Therefore a model was developed in stages, examining factors for significance.

The full model from the factors in Table 2-4 was first evaluated with weights proportional to numbers of test participants. It yielded two strong factors with low correlation (C and E): "Percent Asian or Pacific Islander" at p<.02, with a positive coefficient, and "Percent limited English proficiency" at p<.002, with a negative coefficient. Factors for school population and percent grade reduction had particularly small coefficients and low significance. They were removed, and a model with the remaining five factors then associated 67 percent of the variance and produced the factor weights shown in Table 2-6.

Table 2-6

5-Factor Model for 1999 MCAS Grade 10 Math Test Scores

Factor                                     Coefficient  Standard Error

Intercept, for all factors zero                  229.4           1.936
B. Percent African American                      0.047           0.104
C. Percent Asian or Pacific Islander             0.347           0.154
D. Percent Hispanic / Latino                    -0.002           0.183
E. Percent limited English proficiency          -0.637           0.217
F. Percent free or reduced price lunch          -0.174           0.157

Sources of data: Appendix 3, Appendix 4, Statistica model
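As an illustration of fitting a linear model with weights proportional to numbers of test participants, here is a minimal sketch in plain Python. It is not the Statistica procedure used for the tables: the function name, the two-factor setup, and all data values are hypothetical, and the scores are generated from known coefficients so that the fit can be checked.

```python
def weighted_least_squares(rows, y, weights):
    """Fit y ~ b0 + b1*x1 + ... by minimizing sum(w * (y - fit)**2).

    rows: one list of factor values per school (an intercept column is added).
    Returns the coefficient list [b0, b1, ...].
    """
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Weighted normal equations (X'WX) b = X'Wy, stored as an augmented matrix
    A = [[sum(w * x[i] * x[j] for x, w in zip(X, weights)) for j in range(k)]
         + [sum(w * x[i] * yi for x, yi, w in zip(X, y, weights))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Hypothetical schools: [percent Asian, percent limited English proficiency]
rows = [[2, 1], [10, 3], [5, 20], [25, 8], [1, 12]]
# Synthetic scores built from known coefficients (illustration only)
y = [230 + 0.3 * a - 0.6 * b for a, b in rows]
# Hypothetical numbers of test participants, used as weights
weights = [210, 340, 180, 400, 150]

coef = weighted_least_squares(rows, y, weights)
print([round(c, 3) for c in coef])
```

Because the synthetic scores contain no noise, the fit recovers the generating coefficients regardless of the weights; with real data, weighting gives larger schools more influence on the coefficients.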

With model factors in Table 2-6, high factor weight and significance found in other studies for percentages of African American or Latino students disappear. Both factors have small coefficients and low significance. Statistical weight that might have been attached to these factors instead follows cultural and economic factors: "Percent limited English proficiency" and "Percent free or reduced price lunch." As an experiment, the model was rerun with the latter factors removed; only 57 percent of the variance was associated, and factor weights became those shown in Table 2-7.

Table 2-7

Racial and Ethnic Model for 1999 MCAS Grade 10 Math Test Scores

Factor                                     Coefficient  Standard Error

Intercept, for all factors zero                  230.0           2.107
B. Percent African American*                    -0.221           0.068
C. Percent Asian or Pacific Islander             0.219           0.156
D. Percent Hispanic / Latino*                   -0.435           0.114

Sources of data: Appendix 3, Appendix 4, Statistica model

In Table 2-7, two "racial" or "ethnic" factors (marked *) have become significant at a p<.05 level. The coefficient for "Percent African American" has turned from positive to negative, and the coefficient for "Percent Hispanic / Latino" has become strongly negative. It seems likely that these two factors are acting as proxies for cultural and economic factors with more predictive power.

Residuals from the five-factor model of Table 2-6 are shown in Table 2-8. These include standard error estimates based on results from the trends study of Section 2A.

Table 2-8

Residuals for MCAS Grade 10 Math Test Scores, 5-Factor Model

City or Town  High School         Residual  Std. Error   Ratio

Arlington     Arlington                4.6         2.6     1.8
Belmont       Belmont                 12.2         2.7     4.5
Boston        Boston High             -8.9         5.1    -1.7
Boston        Brighton                 1.7         3.3     0.5
Boston        Charlestown              1.0         4.8     0.2
Boston        Dorchester              -1.6         4.3    -0.4
Boston        East Boston              2.1         4.2     0.5
Boston        Hyde Park               -6.1         4.8    -1.3
Boston        Jeremiah Burke           4.1         5.0     0.8
Boston        South Boston            -8.6         3.7    -2.3
Boston        The English High        11.4         5.0     2.3
Boston        West Roxbury             1.2         3.9     0.3
Boston Exam   Boston Latin            21.3         3.5     6.0
Boston Exam   Latin Academy            4.0         4.1     1.0
Boston Exam   O'Bryant Science        4.65         4.3     1.1
Braintree     Braintree              -1.78         2.5    -0.7
Brookline     Brookline                9.5         2.3     4.1
Cambridge     Rindge & Latin*         -6.3         4.0    -1.6
Chelsea       Chelsea                  2.2         7.0     0.3
Dedham        Dedham                  -1.6         3.1    -0.5
Everett       Everett*                -2.8         2.6    -1.1
Lexington     Lexington                4.4         2.8     1.6
Lynn          Classical               -6.5         3.3    -2.0
Lynn          English                 -5.0         3.9    -1.3
Malden        Malden                  -6.6         3.2    -2.1
Marblehead    Marblehead               3.4         3.2     1.1
Medford       Medford*                -7.4         2.7    -2.7
Melrose       Melrose                 -2.7         2.7    -1.0
Milton        Milton                  -1.7         3.1    -0.6
Newton        North*                   8.6         2.3     3.7
Newton        South                   10.0         2.8     3.6
Peabody       Veterans*               -7.2         2.5    -2.9
Quincy        North Quincy            -9.2         3.6    -2.5
Quincy        Quincy*                -14.1         3.1    -4.5
Revere        Revere*                -10.3         2.7    -3.9
Salem         Salem*                  -1.5         3.1    -0.5
Saugus        Saugus                  -2.8         2.9    -1.0
Somerville    Somerville*              2.7         4.7     0.6
Stoneham      Stoneham                -1.7         3.1    -0.5
Swampscott    Swampscott              11.8         3.2     3.7
Wakefield     Memorial                -2.0         2.8    -0.7
Waltham       Waltham*                -7.4         2.6    -2.8
Watertown     Watertown                4.7         3.1     1.5
Weymouth      Weymouth*               -7.0         2.3    -3.0
Winchester    Winchester              12.5         2.9     4.4
Winthrop      Winthrop*               -5.5         3.5    -1.6
Woburn        Woburn                  -1.9         2.6    -0.7

* school providing vocational education

Sources of data: Appendix 3, Appendix 4, Statistica model

At first glance, some residuals in Table 2-8 look substantial, several scale points of difference from the average scores predicted by the model. However, residual ratios for most schools are within +/- 2 standard errors, not significant at a p<.05 level. Someone familiar with metropolitan Boston will recognize that schools with high and low residual ratios tend to be in high-income and low-income communities, respectively. It therefore seems likely that adding a factor for incomes can increase the predictive power of the model.
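The ratio column of Table 2-8 is each residual divided by its standard error. The sketch below recomputes it for four rows copied from the table and applies the +/- 2 standard error rule of thumb described above; the function name and the flag wording are illustrative choices, not part of the original analysis.

```python
# (school, residual, standard error) copied from four rows of Table 2-8
rows = [
    ("Arlington", 4.6, 2.6),
    ("Belmont", 12.2, 2.7),
    ("Brighton", 1.7, 3.3),
    ("South Boston", -8.6, 3.7),
]

def residual_ratio(residual, std_error):
    """Residual expressed in units of its standard error."""
    return residual / std_error

for school, residual, std_error in rows:
    ratio = residual_ratio(residual, std_error)
    # Roughly, |ratio| > 2 corresponds to significance at about p < .05
    flag = "beyond 2 s.e." if abs(ratio) > 2 else "within 2 s.e."
    print(f"{school:14s} {ratio:5.1f}  {flag}")
```

Rounded to one decimal place, the four ratios agree with the table. Of these four schools, Belmont and South Boston fall beyond two standard errors; Arlington and Brighton do not.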

The most recent community income data were from the US Census of 1990, for 1989 per-capita income. Comparable 1999 income statistics were not yet available. The Massachusetts Department of Revenue could produce current community income statistics but has not done so; the state continues to use 1989 federal census data on incomes to apportion aid to public schools. After adding 1989 per-capita community income in $1,000s as a factor (Mass. DoR, 1999), without any attempt to adjust incomes so as to reflect school districts or student households, the model associated 80 percent of the statistical variance, and factor weights became those shown in Table 2-9.

Table 2-9

6-Factor Model for 1999 MCAS Grade 10 Math Test Scores

Factor                                     Coefficient  Standard Error

Intercept, for all factors zero                  202.7           5.400
B. Percent African American                     -0.020           0.083
C. Percent Asian or Pacific Islander*            0.371           0.121
D. Percent Hispanic / Latino                     0.044           0.144
E. Percent limited English proficiency*         -0.695           0.171
F. Percent free or reduced price lunch           0.050           0.131
H. Per-capita community income (1989)*           1.186           0.230

Sources of data: Appendix 3, Appendix 4, Statistica model

Three factors in Table 2-9 (marked *) have substantial significance, at a p<.005 level or better, and three have very low significance. Factor weight has shifted from "Percent free or reduced price lunch" to "Per-capita community income (1989)," while "Percent limited English proficiency" retains a large coefficient and high significance. Dropping low-significance factors, the resulting three-factor model is shown in Table 2-10.

Table 2-10

3-Factor Model for 1999 MCAS Grade 10 Math Test Scores

Factor                                     Coefficient  Standard Error

Intercept, for all factors zero                  204.9           4.446
C. Percent Asian or Pacific Islander             0.381           0.109
E. Percent limited English proficiency          -0.626           0.081
H. Per-capita community income (1989)            1.104           0.197

Sources of data: Appendix 3, Appendix 4, Statistica model

The three-factor model of Table 2-10 also associates 80 percent of the statistical variance. All of its factors are statistically significant at a p<.001 level.
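To show how the coefficients of Table 2-10 turn into a predicted score and a residual, the sketch below evaluates the three-factor model for one school. The school's factor values and observed score are hypothetical, and the residual is taken as observed minus predicted, which is assumed here as the usual convention.

```python
# Coefficients from Table 2-10
INTERCEPT = 204.9
COEF_ASIAN = 0.381    # factor C, per percentage point
COEF_LEP = -0.626     # factor E, per percentage point
COEF_INCOME = 1.104   # factor H, per $1,000 of 1989 per-capita income

def predicted_score(pct_asian, pct_lep, income_thousands):
    """School-averaged score predicted by the three-factor linear model."""
    return (INTERCEPT + COEF_ASIAN * pct_asian
            + COEF_LEP * pct_lep + COEF_INCOME * income_thousands)

def residual(observed, pct_asian, pct_lep, income_thousands):
    """Observed average minus model prediction (assumed sign convention)."""
    return observed - predicted_score(pct_asian, pct_lep, income_thousands)

# Hypothetical school: 10 percent Asian, 5 percent limited English proficiency,
# $20,000 per-capita community income, observed average score of 230
print(round(predicted_score(10, 5, 20), 1))
print(round(residual(230, 10, 5, 20), 1))
```

For this hypothetical school the model predicts about 227.7, leaving a residual of about 2.3 scale points. Most of the predicted difference from the intercept comes from the income term.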

For each school included in these studies, Table 2-11 presents adjusted average 1999 tenth-grade MCAS mathematics test scores and residuals from the three-factor statistical model of Table 2-10, with the uncertainties in average scores and residuals expressed as standard errors, based on the variance estimate calculated in the trends study of Section 2A.

Table 2-11

Residuals for MCAS Grade 10 Math Test Scores, 3-Factor Model

City or Town  High School          Average  Std. Error  Residual  Std. Error

Arlington     Arlington                234         2.1       4.2         2.4
Belmont       Belmont                  243         2.2       6.7         2.7
Boston        Boston High              204         2.9     -10.3         3.1
Boston        Brighton                 205         2.2      -0.1         2.9
Boston        Charlestown              206         2.7      -1.6         3.7
Boston        Dorchester               204         3.0      -0.5         3.5
Boston        East Boston              205         2.3      -1.0         2.9
Boston        Hyde Park                203         3.3      -5.0         3.7
Boston        Jeremiah Burke           208         2.7       5.1         3.4

Boston

South Boston

20