Reclassification of English Learners

Education Policy Analysis Archives Vol. 12 No. 36

Ron Unz, originator of Proposition 227, claimed before its passage that the five percent annual rate at which English learners were reclassified as fluent English proficient showed that bilingual education was a failure. Critics of Prop. 227 have countered that the annual reclassification rate has changed little since the passage of Prop. 227, indicating the new legislation had no effect on reclassification rates. Unfortunately, the annual reclassification rate does not provide a clear indicator of how long it takes students to be reclassified after entering the school system. To better estimate reclassification rates for English learners in California, cohorts were created to track the same groups of students over time. Ron Unz also claimed that test scores for immigrant students improved dramatically after the passage of Prop. 227. To evaluate his claim, average test scores were calculated by language fluency. Based on statewide data from three different cohorts tracked across four years, Prop. 227 has had no effect on reclassification rates or test scores.

Introduction
Ron Unz, originator of Proposition 227, stated before its passage that a five percent annual reclassification rate of English learners (EL) to fluent English proficient (R-FEP) in California implied to him a failure rate for bilingual education of 95 percent (Unz, 1997). Critics of Prop. 227 (Crawford, 2003; Hakuta, 2002; Mora, 2000) have challenged Unz's statements about EL reclassification in two ways. First, they present evidence that reclassification rates, available from the California Department of Education (CDE), were closer to seven than five percent and were rising prior to the passage of Prop. 227 (CDE, 2004). Second, the annual reclassification rate has stabilized around eight percent since the passage of Prop. 227, indicating that Prop. 227 has had little or no effect on reclassification rates. In addition, critics of Prop. 227 have emphasized that less than 30 percent of EL students were enrolled in bilingual programs prior to the passage of Prop. 227 (Gandara, 2000). As such, annual reclassification rates could not be interpreted as evidence that bilingual education programs were failing, since more than 70 percent of EL students were not in bilingual programs. Although Unz has declared Prop. 227 a success, he has been quiet about its effect on reclassification rates.
It is the contention of this study that the reclassification rates cited by Unz and his critics are misleading in two ways. First, the data upon which these reclassification rates are based do not account for students moving into and out of the California school system. The EL student population is not stable; it is increasing each year (CDE, 2004). When more EL students enter the school system than leave it, the denominator is inflated and the proportion of students who have been reclassified (i.e., the number of reclassified students divided by the number of EL students) is underestimated. Second, the reported reclassification rates are simply the proportion of EL students who have been reclassified in a particular year. The rates do not indicate how long it takes students to be reclassified after they have enrolled in the California school system.
According to Unz, most EL students can learn English in just a few months (Ron Unz Exposes, 2001), and so the EL designation should not last much longer than a year. The language of Prop. 227, now part of California's Education Code (EC), reflects this philosophy.
"Children who are English learners shall be educated through sheltered English immersion during a temporary transition period not normally intended to exceed one year... Once English learners have acquired a good working knowledge of English, they shall be transferred to English language mainstream classrooms" (EC, Section 305).
The notion that EL students can learn English in just a few months has been called into question by researchers in language development. Hakuta, Butler, & Witt (2000) reported that English oral proficiency takes 3 to 5 years to develop and academic proficiency takes 4 to 7 years. They considered academic proficiency to be academic success in an English-speaking classroom. This seems to be a tautology because the number of years to achieve academic proficiency was based on the length of time it took to reclassify students.
Reclassifying students from EL to R-FEP status is a process that uses multiple criteria (EC, Sec. 313), which include:
1) Assessment of English language proficiency
2) Teacher evaluation
3) Parent opinion and consultation
4) Comparison of performance in basic skills
The intent of using multiple criteria is to protect EL students from being reclassified before they are ready. It is thought that students who are reclassified before they have achieved academic language skills or content-area knowledge and abilities are at risk of academic failure.
The first reclassification criterion, language proficiency, is determined by an English language proficiency test. English language proficiency tests are designed to measure students' communication, reading, and writing skills in English. In May 2001 all Local Education Agencies (LEAs) were mandated by law (EC, Sec. 313) to use the California English Language Development Test (CELDT) to evaluate the English language proficiency of students whose home language is other than English. Prior to this date, LEAs were free to select from a list of CDE-approved English language development tests.
The other crucial reclassification criterion is the assessment of basic skills. Scores on a standardized achievement test are used to evaluate basic skills. In September 2002, all LEAs were advised to use the California Standards Test (CST) to evaluate the proficiency of EL students in basic skills. Prior to this date, LEAs had discretion in determining academic proficiency. It was common for districts to require EL students to score at or above the 36th percentile on one or more portions of the statewide norm-referenced test (NRT), the Stanford Achievement Test version 9 (SAT/9), form T, to be reclassified. However, proficiency could be defined as higher or lower than the 36th percentile.
Academic proficiency as defined by Hakuta, Butler, & Witt (2000) is a tautology because the length of time to achieve academic proficiency was based on the length of time it took students to be reclassified, and reclassification depends on academic performance. School districts report that the biggest barrier to reclassification was not English proficiency but academic proficiency (Parrish, Linquanti, Merickel, Quick, Laird, & Esra, 2002). That is, students might be English fluent, based on results from an English language proficiency test, but would not be reclassified R-FEP because they could not meet the threshold (e.g., the 36th percentile) on a standardized achievement test. As a result, it could not be known whether students would be able to demonstrate academic proficiency in the classroom if they only had to demonstrate proficiency in English to be reclassified.
Whatever length of time it takes EL students to be academically proficient, Hakuta, Butler, & Witt (2000) argued that linguistic competence is complex, and even the most privileged second language learners take a significant amount of time to attain mastery, especially for the level of language required for school success.
Given that reclassification rates have been used by proponents of Prop. 227 to support its passage and by opponents to criticize its effectiveness, there should be interest in how long it takes students to be reclassified. Toward that end, the purpose of this study is to track three different cohorts of EL students over a span of time in order to calculate the proportion of EL students reclassified R-FEP during this span.
Although Unz has been quiet about the effect of Prop. 227 on reclassification rates, he claims that test scores for EL students have improved dramatically since its passage. Unz's claims are based on an initial CDE achievement test report in which EL and R-FEP scores were mistakenly combined and reported as EL. When R-FEP scores were disaggregated from EL scores, the dramatic EL improvement disappeared. However, even after being informed of the error, Unz refused to modify his statements (Weintraub & Chey, 1999).
"Test scores of over one million immigrant students in California have risen by more than 50% since 1998, with those districts most rigorously embracing Prop. 227 having actually doubled their academic performance" (Unz, 2001).
A second purpose of this study is to evaluate Unz's claim that EL test scores have improved dramatically since the passage of Prop. 227.

Method
Each spring California public schools administer a series of standardized achievement tests: the Standardized Testing and Reporting (STAR) program. These tests are administered to all public school students enrolled in grades two through eleven. As part of the testing program, demographic information, including language fluency, is collected. Students are classified into one of four language fluency categories: (1) English Only (EO), (2) Fluent English Proficient (FEP), (3) Reclassified Fluent English Proficient (R-FEP), or (4) English Learner (EL). The STAR tests were first administered in the spring of 1998. Through 2002, the standardized NRT, SAT/9, form T, was administered as part of the STAR program. In 2003, the NRT was changed to the California Achievement Tests, Sixth Edition Survey (CAT/6). This study uses data from tests administered from the spring of 1998 through 2003. STAR data were used to create three matched cohort files. For the first cohort file second-grade students tested in 1998 were matched with ...


Matching students on home language generated a sub-sample of students, since the data field for home language could be missing or contain inconsistencies. If home language for a student was missing or inconsistent, a match could not be made and the student was dropped, which reduced the sample size. After the matching process, home language was constrained to two categories: Spanish (i.e., EL students whose home language was Spanish) and other language (i.e., EL students whose home language was something other than Spanish). The sub-sample for the 1998-2001 cohort had 57,348 students and was created to compare reclassification of Spanish EL students with other EL students.
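The home-language screening described above can be sketched in a few lines. This is only an illustration under stated assumptions: the record layout (one dictionary per test year, keyed by a hypothetical student identifier) and the function name are not taken from the study, which does not describe its matching code.

```python
def build_home_language_subsample(records_by_year):
    """Return {student_id: 'Spanish' | 'Other'} for matchable students.

    records_by_year is a list with one dict per test year, mapping a
    hypothetical student_id to the reported home language (None if missing).
    """
    if not records_by_year:
        return {}
    # Only students who appear in every year can be matched across years.
    matched_ids = set(records_by_year[0])
    for year in records_by_year[1:]:
        matched_ids &= set(year)

    subsample = {}
    for student_id in matched_ids:
        languages = {year[student_id] for year in records_by_year}
        # Missing or inconsistent home language: the student is dropped.
        if None in languages or len(languages) != 1:
            continue
        language = languages.pop()
        # Collapse home language to the two categories used in the study.
        subsample[student_id] = "Spanish" if language == "Spanish" else "Other"
    return subsample


# Made-up records for two test years.
years = [
    {"a": "Spanish", "b": "Vietnamese", "c": "Spanish", "d": None},
    {"a": "Spanish", "b": "Vietnamese", "c": "Hmong", "d": "Spanish"},
]
subsample = build_home_language_subsample(years)
```

In this made-up example, student "c" is dropped for inconsistent home language and student "d" for a missing value, which is the mechanism by which the sub-sample becomes smaller than the full matched cohort.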
Three different types of analyses were conducted. In the first set of analyses, the probability that EL students would be reclassified as R-FEP between second and fifth grade was estimated. Toward that end, the number and percent of students reclassified as R-FEP between second and fifth grade were calculated. These analyses also calculated the number and percent of students not reclassified (i.e., the students who remained EL) between second and fifth grade. These percents can be interpreted as probabilities. Analyses were also conducted for subgroups: gender (i.e., females compared to males), national school lunch program (NSLP) participation (i.e., students receiving free or reduced lunch compared to those who do not), and home language (i.e., students whose home language is Spanish compared to students whose home language is neither English nor Spanish).
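A minimal sketch of the first type of analysis follows, assuming each matched student is represented by a small record with a reclassification flag and subgroup fields. The field names and the ten made-up students are illustrative only and do not come from the STAR files.

```python
from collections import defaultdict


def reclassification_percent(students, group_key=None):
    """Percent of students reclassified, overall or within each subgroup."""
    counts = defaultdict(lambda: [0, 0])  # group -> [reclassified, total]
    for student in students:
        group = student[group_key] if group_key else "all"
        if student["reclassified"]:
            counts[group][0] += 1
        counts[group][1] += 1
    return {group: 100.0 * r / n for group, (r, n) in counts.items()}


# Ten made-up students: 4 females (2 reclassified), 6 males (1 reclassified).
students = (
    [{"gender": "F", "reclassified": True}] * 2
    + [{"gender": "F", "reclassified": False}] * 2
    + [{"gender": "M", "reclassified": True}] * 1
    + [{"gender": "M", "reclassified": False}] * 5
)
overall = reclassification_percent(students)
by_gender = reclassification_percent(students, "gender")
```

Read as probabilities, these percents say that a randomly chosen student in the made-up sample had a 30 percent chance of being reclassified, and that the chance differed by subgroup.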
A second series of analyses used logistic regression to test for subgroup differences in reclassification rates after accounting for differences in achievement. Reclassification, defined as whether or not a student had been reclassified by the end of fifth grade, was regressed on gender, NSLP, home language, and NRT scores.
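One way to write down a model of this kind is sketched below. The symbols, the 0/1 coding of the indicators, and the inclusion of one reading score per grade are assumptions made for illustration; the NSLP by home language interaction is included because the Results discuss such a term.

```latex
% p_i: probability that student i is reclassified by the end of fifth grade.
% Indicator coding and symbol names are illustrative assumptions.
\[
\log\frac{p_i}{1-p_i}
  = \beta_0
  + \beta_1\,\mathrm{Female}_i
  + \beta_2\,\mathrm{NSLP}_i
  + \beta_3\,\mathrm{Spanish}_i
  + \beta_4\,(\mathrm{NSLP}_i \times \mathrm{Spanish}_i)
  + \sum_{g=2}^{5} \gamma_g\,\mathrm{NCE}_{ig}
\]
```

Under this form, a positive coefficient raises the log-odds of reclassification and a negative coefficient lowers them, with the reading scores held constant.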
A third series of analyses evaluated academic performance by language fluency. Average Stanford 9 total reading normal curve equivalent (NCE) scores were calculated for EO, FEP, R-FEP, and EL students across four years. For the 2000-2003 cohort the CAT/6 was administered in fifth grade. The fifth-grade CAT/6 average reading NCE scores were converted to SAT/9 average reading NCE scores through equipercentile equating. These analyses were conducted to evaluate the effect of Prop. 227 on the test scores of EL and R-FEP students.
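Equipercentile equating maps a score on one test to the score on another test that has the same percentile rank. The sketch below is a simplified, unsmoothed version of that idea, not the procedure actually used for the CAT/6 to SAT/9 conversion, which the text does not describe in detail. The function names and score lists are hypothetical.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(sorted_scores, x):
    """Fraction of scores below x plus half of the scores equal to x."""
    below = bisect_left(sorted_scores, x)
    at_or_below = bisect_right(sorted_scores, x)
    return (below + at_or_below) / (2 * len(sorted_scores))


def score_at_rank(sorted_scores, p):
    """Inverse of percentile_rank, interpolating linearly between scores."""
    n = len(sorted_scores)
    position = min(max(p * n - 0.5, 0.0), n - 1.0)
    lower = int(position)
    upper = min(lower + 1, n - 1)
    weight = position - lower
    return sorted_scores[lower] * (1 - weight) + sorted_scores[upper] * weight


def equate(x, new_form_scores, old_form_scores):
    """Old-form score with the same percentile rank as x on the new form."""
    new_sorted = sorted(new_form_scores)
    old_sorted = sorted(old_form_scores)
    return score_at_rank(old_sorted, percentile_rank(new_sorted, x))


# Made-up score distributions: the old form runs 5 points higher throughout.
new_form = [10, 20, 30, 40, 50]
old_form = [15, 25, 35, 45, 55]
equated_30 = equate(30, new_form, old_form)
equated_25 = equate(25, new_form, old_form)
```

Because the two made-up distributions differ by a constant 5 points, the equated scores come out 5 points higher than the inputs. Operational equating typically adds smoothing and works from much larger score distributions.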

Results
Students in the matched cohorts have higher test scores on average than the state as a whole. It is assumed that scores are higher because these students remained in the same school for at least four years. Figure 1 compares the SAT/9 mean total reading scale scores for the whole state and for the 1998-2001 cohort sample; on average, the cohort sample scored higher than the state as a whole. Figure 2 shows the same comparison for EL students. EL students in the 1998-2001 cohort sample on average scored slightly higher in total reading than EL students for the state as a whole. Results were consistent across the other cohorts and indicate that subsequent analyses are based on groups of students that have higher test scores than the state as a whole. The results and conclusions that follow need to be interpreted with this in mind.

Reclassification Rates
Figure 3 shows the reclassification rate for the 1998-2001 matched cohort. It is a truer indicator of the reclassification process than annual reclassification rates because students did not move into or out of the group and a single group of students was tracked for four years.

Results indicate that the length of time to be reclassified is different from what might be imagined from the annual reclassification rates reported by Unz and his critics. That is, the percent of students reclassified each year is neither 5 nor 8 percent but varies from year to year. Within a year or two of being classified EL, few students are reclassified as R-FEP. It is not unreasonable to think that most of the second-grade EL students in this cohort were also first-grade EL students. In any case, less than 2 percent of the students who started second grade, and possibly first grade, as EL were reclassified R-FEP by the end of the second-grade school year. By the end of third grade, only an additional 4 percent of these same students had been reclassified. However, after two or three years of EL designation the reclassification rate began to increase. By the end of fourth grade, an additional 10 percent were reclassified, and by the end of fifth grade, 14 percent more were reclassified. The pattern indicates that few students were reclassified within one to three years of entering the school system. State law asserts that the EL designation should not normally exceed one year, but after four or five years of schooling, only 30 percent of EL students had been reclassified.
These results can be interpreted as probabilities. That is, after four or five years of schooling (i.e., by the end of fifth grade) EL students had a 30 percent probability of being reclassified as R-FEP and a 70 percent probability of remaining EL. Since reclassification is based in part on achievement data and the 1998-2001 cohort is higher achieving than the state as a whole, the true rate may be something less than 30 percent.
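The arithmetic behind the 30 and 70 percent figures can be checked with a short sketch. The yearly increments are the approximate values quoted in the text (the second-grade figure is reported only as "less than 2 percent", so 2 is an approximation). The conditional rate, the share of students still EL at the start of a grade who are reclassified during it, is a derived illustration and is not reported in the study.

```python
# Percent of the original cohort newly reclassified in each grade,
# approximated from the figures quoted in the text.
newly_reclassified = {2: 2.0, 3: 4.0, 4: 10.0, 5: 14.0}


def cumulative_and_conditional(increments):
    """Return cumulative percent reclassified and per-grade conditional rates."""
    cumulative, conditional = {}, {}
    running = 0.0
    for grade in sorted(increments):
        still_el = 100.0 - running  # percent of cohort still EL at grade start
        conditional[grade] = 100.0 * increments[grade] / still_el
        running += increments[grade]
        cumulative[grade] = running
    return cumulative, conditional


cumulative, conditional = cumulative_and_conditional(newly_reclassified)
remaining_el = 100.0 - cumulative[5]

for grade in sorted(cumulative):
    print(grade, cumulative[grade], round(conditional[grade], 1))
```

The cumulative column reaches 30 percent by the end of fifth grade, leaving 70 percent still EL, and the per-grade rates rise steadily rather than holding at a single annual value.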

The pattern is the same but the probability of being reclassified by the end of fifth grade improved slightly. EL students now had a 32 percent probability of being reclassified and a 68 percent probability of remaining EL.

Reclassification Rates by Subgroups
Next, reclassification rates were calculated for three subgroups: gender (i.e., females compared to males), NSLP participation (i.e., students who receive free or reduced lunch compared to those who do not), and home language (i.e., students whose home language is Spanish compared to students whose home language is neither English nor Spanish). Figure 6 shows reclassification rates by gender for the 1998-2001 cohort.

Figure 6 shows female EL students were more likely to be reclassified than males. By the end of fifth grade the probability of females being reclassified R-FEP was 32 percent and for males the probability was 28 percent.
Figure 7 shows reclassification rates for NSLP students and non-NSLP students. Figure 8 shows the reclassification rates for Spanish EL students compared to other language EL students.

In Figure 8 Spanish EL students had a 27 percent chance of being reclassified R-FEP by the end of fifth grade, and other language EL students had a 40 percent chance.
Results were consistent across cohorts. The data suggest that male, NSLP, and Spanish EL students have a lower probability of being reclassified R-FEP than female, non-NSLP, and other language EL students. However, the reclassification process relies on multiple criteria, and a crucial aspect of the process was the assessment of basic skills. Scores on a standardized achievement test were used to evaluate basic skills. To account for the relationship between academic achievement and reclassification, logistic regression was used to test for group differences after holding achievement constant. Reclassification, defined as whether or not a student had been reclassified by the end of fifth grade, was regressed on gender, NSLP, home language, and NRT total reading normal curve equivalent (NCE) scores for four years. Table 1 shows these results for the 1998-2001 cohort. The intercept (i.e., -5.3359) is the log-odds (logit) of being reclassified when all predictors equal zero. This logit value corresponds to a probability of approximately .005 (about 0.5 percent). Parameter estimates with positive values move this probability closer to 1 (i.e., increase the likelihood of being reclassified) and negative values move it closer to 0 (i.e., decrease the likelihood of being reclassified). For example, holding achievement and other variables constant, females were significantly, at the .008 level of significance, more likely than males to be reclassified.
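The relationship between a logit and a probability can be checked directly with the inverse logit function. In the sketch below the intercept is the value quoted from Table 1, while the added coefficient of 0.25 is a purely hypothetical number chosen to show the direction of the effect; it is not an estimate from the study.

```python
import math


def logit_to_probability(logit):
    """Inverse logit: convert log-odds to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-logit))


intercept = -5.3359  # intercept reported for the 1998-2001 cohort
baseline = logit_to_probability(intercept)

# Hypothetical positive coefficient (not from Table 1) added to the intercept.
hypothetical_coefficient = 0.25
shifted = logit_to_probability(intercept + hypothetical_coefficient)

print(round(baseline, 4), round(shifted, 4))
```

The baseline works out to a probability just under .005, that is, roughly half of one percent, and adding a positive coefficient moves the probability upward, as the paragraph above describes.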
Even though NSLP students were more likely to be reclassified after holding achievement and other variables constant, the difference between reclassification rates for NSLP and non-NSLP EL students was not significant at the .01 level of significance. Non-NSLP students scored higher than NSLP EL students on the SAT/9 reading test and were thus more likely to be reclassified. However, when achievement was held constant the difference in reclassification rates disappeared.
Home language was significant at the .0001 level of significance. After controlling for the effects of achievement and other variables, Spanish EL students were more likely to be reclassified R-FEP than other language EL students. If both home language groups were being treated in the same way, controlling for test score differences should have had the same effect as it did for NSLP. That is, the differences between the groups would no longer have been significant. However, the direction of the parameter estimate raises the suspicion that a large number of other language EL students who were eligible for reclassification, given their NRT test scores, were not reclassified.
To test this suspicion, test scores for EL students from the two different home language groups were compared. Table 2 shows these results. Other language EL students on average had higher SAT/9 reading scores than Spanish EL students and were thus more likely to be reclassified.
The next analysis attempted to determine whether other language EL students were under-represented in the R-FEP language category. If so, that would explain the regression results. For each year of the 1998-2001 cohort, the EL students who had not been reclassified and who had scored at or above the 36th percentile on the SAT/9 were identified. The 36th percentile was selected because it has been a traditional score used to determine student reclassification. Figure 9 shows these results.

Figure 9 shows two things. First, it shows the percentage of students who were possible candidates for reclassification based on scoring at or above the 36th percentile on the SAT/9 reading test. In 1998, 12.9 percent (i.e., 5.3% + 7.6%) met the reclassification threshold of the 36th percentile but were not reclassified. In 1999, 16.4 percent met the threshold value, and in 2000 and 2001 there were 21.3 and 24.8 percent, respectively, that met the threshold value. Each year a certain percentage of students were strong candidates for reclassification but were not reclassified, and each year this percentage increased. By grade five, 25 percent of EL students were strong candidates for reclassification but had not been reclassified.
Second, Figure 9 shows the percent of other language and Spanish EL students who met the reclassification threshold of the 36th percentile but were not reclassified. In 1998, for example, the percent of other language EL students was 5.3 percent. For Spanish EL students it was 7.6 percent. Continuing to use 1998 as an example, other language EL students represented 41.2 percent of the 12.9 percent total and Spanish EL students represented 58.8 percent. However, for the full 1998-2001 cohort, other language EL students represented 26 percent of the total and Spanish EL students represented 74 percent. The other language EL students represented a larger percentage of EL students that met the NRT threshold for reclassification but were not reclassified than they did of EL students overall. That is why the regression analysis indicated that Spanish EL students were more likely to be reclassified than other language EL students when achievement was controlled. Figure 9 shows this same value as 26,970 students. The number of students in Figure 9 represents those EL students who had reading test scores for grades 2 through 5 and non-missing home language information. The requirement to have non-missing data for the four different reading tests and home language reduced the sample size.
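The over-representation argument can be retraced from the rounded 1998 percentages quoted above. The sketch below is illustrative; the representation ratio is a derived quantity that the study does not report, and the variable names are assumptions.

```python
# 1998 percentages quoted in the text: EL students at or above the 36th
# percentile who were not reclassified, by home language group.
other_pct, spanish_pct = 5.3, 7.6
total_pct = other_pct + spanish_pct

# Share of the threshold-meeting, non-reclassified group by home language.
other_share = 100.0 * other_pct / total_pct
spanish_share = 100.0 * spanish_pct / total_pct

# Share of other language students in the full 1998-2001 cohort (from text).
cohort_other_share = 26.0

# Values above 1 mean other language students are over-represented among
# students who met the threshold but were not reclassified.
representation_ratio = other_share / cohort_other_share

print(round(other_share, 1), round(spanish_share, 1), round(representation_ratio, 2))
```

Computed from the rounded inputs, the shares come out within a tenth of a point of the 41.2 and 58.8 percent quoted above; the small difference presumably reflects rounding in the published figures. Either way, the other language share is well above that group's 26 percent share of the cohort.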
Returning to Table 1, there is also a significant interaction effect for NSLP and home language. The interpretation is that even though Spanish EL students were more likely to be reclassified than other language EL students, the Spanish / NSLP students were even more likely than the Spanish / non-NSLP students to be reclassified after holding achievement and other variables constant.
The regression analysis indicates that the strongest predictors of whether students would be reclassified were reading test scores. As test scores went up, the probability of being reclassified increased. In addition, when achievement was held constant Spanish students were more likely, rather than less likely, than other language EL students to be reclassified. Table 3 shows the regression analysis for the 1999-2002 cohort.
Results for the 1999-2002 cohort were consistent with the 1998-2001 cohort. Reading test scores and language (i.e., Spanish) were again the variables most strongly related to reclassification. And again, female EL students were more likely to be reclassified than males after controlling for the effects of achievement and other variables. However, for the 1999-2002 cohort the difference in reclassification rates for NSLP and non-NSLP students was statistically significant at the .01 level. Students receiving free or reduced lunch were more likely to be reclassified R-FEP after controlling for achievement and other variables.
Table 4 shows the regression analysis for the 2000-2003 cohort. Results are consistent with the other cohorts except that female EL students were neither more nor less likely to be reclassified than male EL students, holding achievement and other variables constant. EL students receiving NSLP were neither more nor less likely to be reclassified than non-NSLP EL students, and Spanish-speaking EL students were neither more nor less likely to be reclassified than non-Spanish EL students. Again, there was a significant interaction effect for NSLP and home language, but the direction was reversed from the other cohorts. The interpretation is that, even though neither NSLP nor home language was by itself related to reclassification, the NSLP / non-Spanish students were more likely than the NSLP / Spanish students to be reclassified after holding achievement and other variables constant. As with the other cohorts, the strongest predictors of reclassification were reading test scores.
The average reading NCE scores for EO, FEP, and R-FEP students were comparable, but for EL students the average reading scores were much lower. For EO and FEP students there was a slight upward trend in the average reading score, but for R-FEP students there was a slight downward trend. For EL students the average reading scores remained fairly constant over time. For EO and FEP students, the test scores were computed for the same students each year. For R-FEP and EL students, test scores were computed for different students each year. Each year the number of R-FEP students increased and the number of EL students decreased because each year more students were reclassified.
Students reclassified as R-FEP in 1998 were the most academically precocious EL students because they were the first to meet both the language and academic reclassification requirements. Students reclassified as R-FEP in 2001 were the least academically proficient of those reclassified because they took the longest to meet the reclassification requirements.
The downward trend in test scores for R-FEP students should not automatically be interpreted to mean R-FEP performance was declining. The lower scores indicate that each year less able students joined the R-FEP group. Even so, the R-FEP average in 2001 was above the 50th percentile of the norming sample.
The low EL test scores represent the opposite trend of R-FEP. Each year the most academically proficient students left this group and were reclassified R-FEP. The continuously low academic performance of EL students should not be interpreted to mean that EL students never improve or were failing to close the gap between themselves and the other language categories. Each year the EL group represented those students who were left behind after the most academically able were reclassified as R-FEP.
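The composition effect described above can be illustrated with a small sketch. The starting scores, the one-point yearly gain, and the rule of reclassifying the two highest-scoring EL students each year are hypothetical values chosen only for illustration; they are not drawn from the STAR data, and the function name `track_group_means` is likewise an assumption.

```python
from statistics import mean


def track_group_means(start_scores, yearly_gain, reclassified_per_year, years):
    """Return one (R-FEP mean, EL mean) pair per year.

    Hypothetical model: every student gains `yearly_gain` points per year,
    and each year the highest-scoring remaining EL students are moved
    into the R-FEP group.
    """
    el = sorted(start_scores)
    rfep = []
    history = []
    for year in range(years):
        if year > 0:
            # Every individual student improves, whatever the group.
            el = [score + yearly_gain for score in el]
            rfep = [score + yearly_gain for score in rfep]
        # Reclassify the top scorers still designated EL.
        k = min(reclassified_per_year, len(el))
        if k:
            rfep.extend(el[-k:])
            el = el[:-k]
        history.append((mean(rfep), mean(el) if el else None))
    return history


# Eight hypothetical EL students, two reclassified per year, three years.
for rfep_mean, el_mean in track_group_means(
    [20, 30, 40, 50, 60, 70, 80, 90], yearly_gain=1,
    reclassified_per_year=2, years=3,
):
    print(rfep_mean, el_mean)
```

In this toy example every student gains a point each year, yet the R-FEP mean falls from 85 to 76 to 67 and the EL mean falls from 45 to 36 to 27, purely because group membership changes from year to year.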
Test scores in Figure 10 are average scores. There was variance around these scores. In 2001, for each language designation, the individual NCE scores ranged from 1 to 99. The standard deviations for EO, FEP, R-FEP, and EL students' scores were 19, 18, 15, and 14, respectively. The overall standard deviation for the 2001 grade 5 reading NCE scores was 21. Therefore, even though average EL scores were noticeably lower than EO, FEP, and R-FEP average scores, there were EO, FEP, and R-FEP students scoring lower than the average for EL students.
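For readers less familiar with the NCE scale, the sketch below converts a national percentile rank to an NCE using the standard definition of the scale: a normal distribution with mean 50 and standard deviation 21.06, so that NCEs of 1, 50, and 99 coincide with the same percentile ranks. The function name is hypothetical, and the conversion assumes the norming distribution is normal. It is offered only to illustrate why an overall standard deviation near 21 is what the scale would lead one to expect.

```python
from statistics import NormalDist

NCE_MEAN = 50.0
NCE_SD = 21.06  # scaling constant in the usual NCE definition


def percentile_to_nce(percentile_rank):
    """Convert a percentile rank (strictly between 0 and 100) to an NCE.

    Assumes normally distributed scores in the norming sample.
    """
    z = NormalDist().inv_cdf(percentile_rank / 100.0)
    return NCE_MEAN + NCE_SD * z


for pr in (1, 36, 50, 99):
    print(pr, round(percentile_to_nce(pr), 1))
```

Under these assumptions, the 36th percentile cutoff that many districts used for reclassification corresponds to an NCE of roughly 42.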
Figure 11 shows the pattern of test scores across years by the grade in which students were reclassified. The number of students represents the total number of students who were reclassified in grades 2 through 5. Students reclassified in 1998 had the pattern of highest test scores. Students reclassified in 2001 had the pattern of lowest test scores. These data support the contention that students reclassified in second-grade were more academically precocious than students reclassified in grade five.
However, students reclassified in fifth-grade showed the most improvement over time. Average performance of students reclassified in second-grade had stabilized, while students reclassified in the fourth and fifth-grades were closing the achievement gap. Results for the 1999-2002 cohort show the same pattern as the 1998-2001 cohort. However, the average score across language groups improved. This was not surprising, since it has been widely reported that when the same test series is used year after year, test scores tend to improve as teachers become more aware of test content (Linn, Graue, & Sanders, 1990).

Academic Performance by Language Fluency in a Single District
Unz often references a particular school district in California as a model of the positive effects of Prop. 227 (Nishioka, 1999). Unz claims that the 50 percent rise in test scores was evidence that the English immersion practiced in this model district and Prop. 227 were working (Sailer, 2002). The model district's reclassification rate at the end of fifth-grade is higher than the state average. Even so, students in this model district take much longer than a year to be reclassified, and test scores for their R-FEP and EL students were lower than the state average.

Discussion
To better estimate reclassification rates, cohorts were created so the same group of students could be tracked over time. Based on data from three cohorts, the probability that EL students would be reclassified R-FEP by the end of fifth-grade was 30 to 32 percent. Conversely, the probability that EL students would not be reclassified R-FEP by the end of fifth-grade was 68 to 70 percent. The goal of reclassifying EL students as R-FEP within a year or two of entering the school system has not been achieved with the passage of Prop. 227.
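The cohort estimate for 1998-2001 can be checked directly from the counts reported with Figure 3: 58,775 EL students in the matched cohort, of whom 41,143 were still EL after grade 5. The sketch below is a minimal illustration of that arithmetic. The function name is hypothetical, and the calculation assumes that every matched-cohort student who was no longer EL had been reclassified R-FEP.

```python
def cumulative_reclassification_rate(initial_el, still_el):
    """Share of an EL cohort reclassified by the end of the tracking period.

    Assumes students who are no longer EL were reclassified R-FEP.
    """
    if initial_el <= 0:
        raise ValueError("initial_el must be positive")
    if not 0 <= still_el <= initial_el:
        raise ValueError("still_el must be between 0 and initial_el")
    return (initial_el - still_el) / initial_el


# Counts for the 1998-2001 cohort as reported with Figure 3.
rate = cumulative_reclassification_rate(58_775, 41_143)
print(f"reclassified: {rate:.1%}, not reclassified: {1 - rate:.1%}")
```

With these counts the calculation gives about 30 percent reclassified and 70 percent not reclassified, which matches the lower end of the 30 to 32 percent range reported across the three cohorts.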
It is unlikely Prop. 227, as written, had or will have any effect on reclassification rates.
Reclassification is dependent on the multiple criteria used in the reclassification process. These criteria existed before and after the passage of Prop. 227. One of these criteria, performance in basic skills, was reported by districts to be the biggest barrier to reclassification. Unz was critical of the basic skills requirement.
Children from immigrant or Latino backgrounds are categorized as not knowing English if they merely score below average on English tests, meaning that unknown numbers of children whose first and only language is English spend their elementary school years trapped in Spanish-only bilingual programs (Unz, 1997).
However, when Unz drafted Prop. 227, the reclassification criteria were not addressed in the new legislation.
Reclassification rates for the 1999-2002 and 2000-2003 cohorts were slightly higher than for the 1998-2001 cohort. This slight improvement, if it is improvement rather than random year-to-year fluctuation, was more likely the result of better tracking at the local level. Rather than Prop. 227, CELDT testing and the requirement to include EL students in California's statewide accountability index have pushed districts to improve the tracking of EL students. Since students are now less likely to fall through the cracks, reclassification rates improved.
There were differences in reclassification rates for subgroups. Females were more likely to be reclassified than males. Non-NSLP students were more likely to be reclassified than NSLP students, and other-language EL students were more likely to be reclassified than Spanish EL students. However, regression analyses revealed that when achievement was held constant these differences generally disappeared. Females were still more likely to be reclassified than males when achievement was held constant for the 1998-2001 and 1999-2002 cohorts, but not for the 2000-2003 cohort. When achievement was held constant, the difference in reclassification rates between non-NSLP students and NSLP students either disappeared or reversed (i.e., NSLP students were more likely to be reclassified), and Spanish EL students were either more likely to be reclassified than other-language EL students or there was no difference.
The regression analyses further indicate that a major factor for reclassification was performance on standardized tests.
Multiple classification criteria exist to protect students from being reclassified too quickly.
However, there may be an overprotected group of EL students. Shepard, Flexer, Hiebert, Marion, Mayfield, & Weston (1996) found no achievement differences between experimental and control subjects after a year-long project focused on modifying teacher pedagogy to improve student achievement.
After looking at achievement data, it appears Prop. 227 had no effect on student test scores. For EO and FEP students, there was a slight upward trend in the average reading score. Much of this change was likely due to using the same test year-to-year. For R-FEP students, there was a slight downward trend, except for the 2000-2003 cohort. The downward trend was likely due to less academically able EL students (i.e., less able than the already reclassified R-FEP students) being reclassified R-FEP. This should not be interpreted to mean that students who take longer to be reclassified are not academically capable. It simply means they tend to be less capable than students who have already been reclassified. For EL students, the average reading score remained consistently low over time. Scores remained low because the more academically proficient students were reclassified R-FEP.
There was no dramatic improvement in test scores across years within a cohort or from cohort to cohort for any of the language fluency categories. For example, for the 1998-2001 cohort there was no dramatic improvement in reading scores from grade two to grade three, and there was also no dramatic improvement in reading scores from the 1998-2001 to the 2000-2003 cohort for any of the language fluency categories. Test scores changed in a manner that might be expected when the same test battery was administered year after year.
Data from Unz's model district do not support his claims that English immersion programs dramatically improved EL and/or R-FEP test scores. Test scores from Unz's model district did not show any dramatic upward trend. Scores were even lower than the statewide average. In addition, EL students in Unz's model district took considerably longer than a year to be reclassified. Hakuta (2002) reported comparable results.
Prop. 227 has had no effect on EL reclassification rates or test scores. Yet a review of magazine and newspaper articles indicated that reporters generally accepted and reported Unz's data and anecdotal evidence without question. It is difficult to find a clear, coherent criticism of Unz's statements in the press. For example, Unz's critics were correct when they said the annual reclassification rate was closer to 7 than 5 percent. Yet Unz's 5 percent rate was reported over and over again. Aryal (1998) reported there were specific reasons why Unz's message was more widely reported than that of his critics. During the Prop. 227 campaign, Unz repeated the same message, promptly returned phone calls, provided sound bites, and was the clear point person for the initiative. In contrast, opponents of Prop. 227 were a diverse group with a profusion of messages and were difficult to reach. Even so, reporters could have verified or at least called into question Unz's statistics by visiting CDE's web site, but failed to do so. The unfortunate aspect of not verifying data is that Unz has been given free rein to report misinformation that has influenced educational policy. The false claim that there was a 50 percent improvement in EL achievement has been reported so often in so many different sources that it has assumed a reality that this study is unlikely to undermine.

Figure 1. Average SAT/9 total reading scale score for all students compared to the 1998-2001 matched cohort sample

Figure 2. Average SAT/9 total reading NCE score for EL students statewide compared to the 1998-2001 matched cohort sample

Figure 3. Proportion of EL students reclassified R-FEP for the 1998-2001 cohort, n = 58,775

Figure 4 shows the same results for the 1999-2002 cohort. The 1999-2002 cohort is the class that is one year behind the 1998-2001 cohort.

Figure 4. Proportion of EL students reclassified R-FEP for the 1999-2002 cohort, n = 72,806

Figure 6. Proportion of EL students reclassified R-FEP for the 1998-2001 cohort by gender, n = 58,775

Figure 7 indicates that EL NSLP students had a 27 percent chance of being reclassified R-FEP by the end of fifth-grade and EL no NSLP students had a 46 percent chance.

Figure 8. Proportion of EL students reclassified R-FEP for the 1998-2001 cohort by home language, n = 57,348

Figure 9. Percent of EL students scoring at or above the 36th percentile by home language for the 1998-2001 cohort, n = 26,970

Figure 3 shows the number of EL students after grade 5 for the 1998-2001 cohort as 41,143.

Figure 10. Average SAT/9 total reading NCE score by language fluency for the 1998-2001 cohort, n = 145,873

Figure 11. Average SAT/9 total reading NCE score for 1998-2001 cohort R-FEP students by the grade in which students were reclassified, n = 17,436

Figure 12. Average SAT/9 total reading NCE score by language fluency for the 1999-2002 cohort, n = 195,082

For the 2000-2003 cohort, the average reading score across groups improved even more, and the EO, FEP, and EL trends are comparable to the 1998-2001 and 1999-2002 cohorts. However, the R-FEP students in the 2000-2003 cohort did not demonstrate the downward trend in test scores seen in the other cohorts.

Figure 13. Average SAT/9 total reading NCE score by language fluency for the 2000-2003 cohort, n = 214,830

Figure 14 shows average total reading NCE scores and reclassification rates for the 1998-2001 EL and R-FEP students in the model district.

Figure 14. Average SAT/9 reading NCE score for 1998-2001 R-FEP and EL students in the model district, n = 239

15 indicate that test scores were still a bit lower than the state average for EL and R-FEP students. Reclassification rates have improved over the district's 1998-2001 cohort and the

communication, reading, and writing skills in English. In May 2001 all Local Education Agencies (LEAs) were mandated by law (EC, Sec. 313) to use the California English Language Development Test (CELDT) to evaluate the English language proficiency of students whose home language is other than English. Prior to this date, LEAs were free to select from a list of CDE-approved English language development tests. The other crucial reclassification criterion is the assessment of basic skills. Scores on a standardized achievement test are used to evaluate basic skills. In September 2002, all LEAs were advised to use the California Standards Test (CST) to evaluate the proficiency of EL students in basic skills. Prior to this date, LEAs had discretion in determining academic proficiency. It was common for districts to require EL students to score at or above the 36th percentile on one or more portions of the statewide norm-referenced test (NRT), the Stanford Achievement Test version 9 (SAT/9).
The matched-cohort file contained information about the same group of students in the same school for four years at four different grade levels.Students who left the school, entered the school after grade two, or were held back were not part of the matched cohort.
Standardized Testing and Reporting (STAR) program. These tests are administered to all public school students enrolled in grades two through eleven. As part of the testing program, demographic information, including language fluency, is collected. Students are classified into one of four language fluency categories: (1) English Only (EO), (2) Fluent English Proficient (FEP),

Table 4 Reclassification regressed on gender, NSLP, language, and reading scores for the 2000-2003 cohort
That is, by the end of fifth-grade 25 percent of EL students in the 1998-2001 cohort who were strong candidates for reclassification, based on standardized test scores, had not been reclassified. This finding warrants further study to better understand how LEAs use the reclassification criteria.
The 30 to 32 percent reclassification rate of EL to R-FEP after four or five years of schooling raises questions about the reclassification process itself. Educators of English learners need to evaluate whether students are being reclassified at an appropriate rate or too slowly. Are the safeguards to protect students from being reclassified too quickly helping or hindering the academic achievement of EL students? What are the advantages and disadvantages of long-term EL designation?
Although Unz claimed a dramatic improvement in EL test scores after the passage of Prop. 227, his claims seemed questionable even before looking at the data. First, the dramatic improvement was based on the change in scores from 1998 to 1999. The initial CDE STAR report for 1999 had an error that was not caught until after the data were released. The error consisted of combining EL and R-FEP scores and reporting the combined data as EL. At first, EL scores seemed to have improved dramatically. When the error was discovered and corrected by disaggregating the R-FEP from the EL scores, the dramatic EL improvement disappeared. Even though Unz was well aware of the error in the initial 1999 report, he has failed to modify his statements about dramatically improved EL test scores. Second, when EL students demonstrate higher academic performance they are reclassified R-FEP. So it is difficult to track improvement in EL scores, because higher-performing EL students would no longer be classified EL. Third, other large-scale assessments such as NAEP do not support dramatic year-to-year change in student performance. It is very difficult to dramatically improve student achievement, even when that is the specific focus.