How Consistent are Course Grades? An Examination of Differential Grading

Education Policy Analysis Archives Vol. 22 No. 92

Differential grading occurs when students in courses with the same content and curriculum receive inconsistent grades across teachers, schools, or districts. It may be due to many factors, including differences in teacher grading standards, district grading policies, student behavior, teacher stereotypes, teacher quality, and curriculum adherence. If it occurs systematically, certain types of students may receive higher or lower grades relative to other students, despite having similar content mastery or ability. Using three years of statewide data on Algebra I and English I courses in North Carolina public high schools, I find that student characteristics are stronger predictors of differential grading than teacher, school, or district characteristics. Female, Limited English Proficient, and 12th grade students earn statistically significantly higher grades than other students, holding test scores and student, teacher, school, and district characteristics constant. Low-income students, conversely, earn lower grades than other students, all else constant. With the exception of Algebra I low-income students, these differences are large enough to move a student one grade category on a plus/minus 7-point A-F grading scale. Black students earn higher Algebra I grades but lower English I grades than white or Asian students with the same test score, but these effect sizes are smaller than those for other student characteristics. Interactions between student and teacher race and gender yielded small estimates that were not consistent between subjects.


Introduction
High school grades play an important role in college admissions. Many mid- and low-tier colleges make admissions decisions based almost exclusively on a student's GPA and SAT or ACT score. However, a student who receives an A in a course in one classroom may not have the same subject mastery as a student who receives an A in the same course in another classroom. This phenomenon, defined as differential grading in this paper, occurs when students in courses with the same content and curriculum receive inconsistent grades across teachers, schools, or districts (Godfrey, 2011). Many factors can lead to differential grading, including differences in teacher grading standards, district grading policies, student behavior, teacher stereotypes, teacher quality, and curriculum adherence.
High school teachers often have significant latitude in determining their grade distributions (Camara, 1998). Whether intentionally or unintentionally, teachers may assign a student's grade to reflect effort, persistence, a personal relationship, or a desire to increase the student's chances for college admission or a scholarship.
Differential grading may also occur if certain types of students exert different amounts of effort depending on teacher characteristics. For example, Dee (2007) finds that students with same-gender teachers perform better than students with opposite-gender teachers. Same-gender teachers can also have better perceptions of same-gender students. Evans (1992) finds evidence that black students perform better on standardized tests of economic literacy when they have black teachers relative to teachers of another race. Similarly, at the community college level, Fairlie, Hoffman, and Oreopoulos (2011) find that underrepresented minority students have lower course withdrawal rates and higher pass rates when taught by a minority instructor.
Racial, gender, and other stereotypes of student performance also may influence how a teacher issues grades. Gender, ethnic, and socioeconomic stereotypes have some impact on how teachers view students (Madon, Jussin, Keiper, Eccles, Smith, & Palumbo, 1998). For example, teachers may believe that boys are better at math than girls or that minority students perform worse in school than white students (Hyde & Jaffree, 1998; Reyna, 2000). These stereotypes may cause grade discrimination, whereby a teacher assigns a grade at least partially based upon stereotypes of a student's innate characteristics rather than solely based upon student performance. Ehrenberg, Goldhaber, and Brewer (1995) compare student learning gains over two years with teacher perceptions of their students' learning. They find that a teacher's race, gender, and ethnicity are more likely to influence their subjective evaluations of students than how much students actually learn as measured by a standardized test. Teachers routinely rate female and white students higher than other students, holding test scores constant. Further, teachers of a certain race have more favorable views of ability for students of the same race relative to students of other races, holding test scores constant. Thus, the interaction of teacher and student characteristics may result in differential grading.
Lavy (2008) compares scores given by a student's teacher with scores given by an identity-blind grader on a similar test in nine high school courses in Israel. In each class, students took a teacher-graded school exam and, within a few weeks, an almost identical state exam that was graded blindly. Lavy finds that teacher-graded exam scores are higher than blind test scores for most tests. More importantly, he finds a grading bias against boys of 0.05 to 0.25 standard deviations in the state exam score distribution across all subjects. These differences are sensitive to teacher characteristics, such as gender, age, experience, and family size, but are not sensitive to student characteristics, such as race, parent's education, or previous achievement. Thus, Lavy concludes that the grading difference can be attributed to anti-male grade discrimination.
Hinnerich, Höglin, and Johannesson (2011) follow a similar methodology for a random sample of Swedish high school students' performance on a national Swedish language exam. A blind grader and the student's teacher scored each test, so the study uses blind and non-blind grading of the same test rather than two tests. Consequently, their results are not susceptible to differences in test structure or student behavior between test administrations, as is the case with Lavy's study. While they find that boys scored 15% lower than girls, they do not find evidence of discriminatory grading between genders. However, they find that blind grading yields scores that are 13% lower than non-blind grading for both genders, meaning that classroom teachers tend to grade their students more favorably than blind graders.
In an experiment with Indian students who were offered a monetary incentive for taking an exam, Hanna and Linden (2012) compare the scores of blind graders with graders who received a face sheet with each exam that included randomized student characteristics (age, gender, and caste). This strategy allows them to isolate discrimination in grading on the same exam. They find that teachers issue scores to "lower-caste" students 0.03 to 0.08 standard deviations below "high caste" students. They do not find patterns by gender or age.
These studies provide insight into the existence of differential grading on standardized tests. However, they do not examine differential grading in the assignment of course grades, which play a significant role in college admissions in the United States. Standardized tests provide a one-time snapshot of a student's subject knowledge and test-taking skills, while course grades reflect a more holistic picture of student performance as determined by multiple assessments, class participation, homework, attendance, and other factors. Grade comparisons between students with similar test scores can provide a picture of differential grading patterns, assuming the tests are unbiased.
Cornwell, Mustard, and Van Parys (2013) compare the performance of elementary students in the 1998-99 ECLS-K cohort on objective reading, math, and science assessments with their teacher's assessment of student mastery of each subject. They find that girls receive grades that, depending on the subject, race, and grade level, are generally 0.10 to 0.25 standard deviations higher than boys with the same test score. The differential grading is larger in math and science and is unaffected by teacher experience and education level. In addition, they find that controlling for noncognitive skills reduces or eliminates the gender difference in all subjects.
In Sweden, Lindahl (2007) compares student performance on national tests in Swedish, English, and Mathematics with teacher-assigned school leaving certificates, which are used for college admissions and job applications, to measure whether systematic differences exist by gender or native status. Teachers are supposed to use the test scores as a guide for school leaving certificates, but they have flexibility on a student-by-student basis. Lindahl finds that school leaving certificate averages are 0.02 to 0.06 standard deviations higher than test scores in all subjects. Using a difference-in-difference estimator with school and time effects, she finds that teachers assign higher grades to girls relative to boys in all subjects. Teachers also assign higher grades to non-native Swedish students relative to native students in mathematics and Swedish but not in English. The differences are largest in mathematics, making up 0.11 standard deviations in the grade difference distribution for girls and 0.23 standard deviations for non-natives. Lindahl notes that these findings may only be generalizable to above average students because the data only included students with scores in all three subjects, eliminating lower performing students who may not have finished school. While grade discrimination may drive some of this difference, other factors, such as student effort and performance over the course of the school year, may have an impact on teacher grading.
Godfrey (2011) compares Advanced Placement (AP) exam scores and actual course grades in five courses across five schools in the United States. AP courses provide high-achieving high school students with college-level curriculum that culminates with a comprehensive exam for college credit. The exams are aligned with the curriculum in each subject. She regresses AP exam scores on AP course grades with school fixed effects and finds that the relationship between course grades and AP Exam scores varies widely within each subject and across schools. However, as the author notes, the generalizability of results is limited because the data only include high school graduates or those who took AP exams. Thus, the evidence for differential grading cannot extend beyond high-achieving students, as is the case with Lindahl's study in Sweden.
Due to this limitation, an ideal standardized measure of student learning would be curriculum-based and required for all students. Statewide end-of-course tests (EOC) fit this description because all students who are enrolled in a class with a corresponding EOC test must take the exam. However, research on the relationship between EOC test scores and course grades has been limited. In 1999, the Texas Education Agency examined the correlation between Algebra I course grades and EOC test scores from a representative sample in one year. It finds that a larger proportion of minority and low-income students are promoted to the next level of math without adequate preparation relative to other students. Additionally, course grades explain 35% of the variation in EOC test scores, while ethnicity and income explain only 6% of the variation.
In a descriptive analysis, Clark (2009) matches course grades with Georgia EOC tests and finds sizable differential grading across districts in eight subjects. Some school districts had a significant proportion of students receiving an A for a course even though they failed to meet standards on the EOC test. Other school districts had a large portion of students who received a C but exceeded standards on the exam. Humanities courses had larger differential grading than science and mathematics courses, perhaps because humanities courses have a larger emphasis on writing, which is more subjective to grade than math. Finally, the EOC failure rate exceeded the course failure rate for each course. The differences ranged from 8.18 percentage points in 11th grade English to 29.98 percentage points in Economics. Thus, each course had a significant number of students who passed the course but failed the EOC test.
In sum, previous research points to the existence of differential grading in high schools. However, it does not examine whether differential grading varies by school, teacher, or student characteristics. The current paper provides a more complete picture of differential grading by comparing course grades and EOC tests in multiple years across the population of North Carolina students in two courses, Algebra I and English I. Thus, the results have stronger external validity than previous studies. While the paper cannot distinguish between grade discrimination and other forms of differential grading, it does examine student, teacher, school, and district-level patterns across the state.
Overall, I find that student characteristics are stronger predictors of differential grading than teacher, school, or district characteristics. Female, Limited English Proficient, and 12th grade students earn statistically significantly higher grades than other students in both subjects, holding test scores and student, teacher, and school characteristics constant. Low-income students, in contrast, earn lower grades than other students in both subjects. With the exception of Algebra I low-income students, these differences are large enough to move a student one grade category on a plus/minus 7-point A-F grading scale. Black students earn higher Algebra I grades but lower English I grades than white or Asian students with the same test score, but these effect sizes are smaller than those for other student characteristics. Interactions between student and teacher race and gender yielded small estimates that were not consistent between subjects.
The next section includes background information on EOC testing in North Carolina and describes the data used in this paper. The third section provides statewide descriptive statistics on the relationship between EOC test scores and course grades. The next section describes the empirical model, which is followed by a discussion of regression results. The paper ends with a brief conclusion.
Students receive three measures of their score: a scale score, a percentile rank, and an achievement level of I, II, III, or IV. The scale score ranges from about 120 to 180, depending on the test. The percentile rank is calculated based upon how a student's score compares to students who took the test in the norming year rather than students in the same test administration. For Algebra I and English I, the norming year occurred prior to the first year of data, so test scores are comparable across years. In terms of achievement levels, Level I and II are considered failing, while III and IV are considered passing. State law requires that a student's EOC test score constitute 25% of the overall course grade, giving students an immediate incentive to perform well on the test. Each district has a formula to convert the scale score to a 100-point converted score that teachers are expected to enter as 25% of each student's final grade.1 The Department of Public Instruction (DPI) does not provide statewide recommendations or track the policies districts use to meet this requirement.
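As a minimal sketch of the 25% requirement described above, the function below combines a classwork grade with a converted EOC score. The 25% weight and the pass/fail split by achievement level come from the text; the function names, and the treatment of the district's converted score as a ready-made input on a 100-point scale, are assumptions for illustration, since each district sets its own conversion formula.

```python
# Weight set by state law, per the text. The classwork grade and the
# converted EOC score are hypothetical inputs on a 100-point scale; the
# district-specific conversion from scale score is NOT modeled here.
EOC_WEIGHT = 0.25


def final_course_grade(classwork_grade: float, converted_eoc_score: float) -> float:
    """Combine classwork (75%) and the converted EOC score (25%)."""
    return (1 - EOC_WEIGHT) * classwork_grade + EOC_WEIGHT * converted_eoc_score


def eoc_passed(achievement_level: int) -> bool:
    """Levels I and II (1, 2) are failing; Levels III and IV (3, 4) are passing."""
    if achievement_level not in (1, 2, 3, 4):
        raise ValueError("achievement level must be 1, 2, 3, or 4")
    return achievement_level >= 3
```

Under these assumptions, a student with a classwork grade of 80 and a converted EOC score of 100 would finish with a course grade of 85.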

Data Description
I use statewide, student-level data for the 2007-08 to 2009-10 academic years from the North Carolina Education Research Data Center (NCERDC) at Duke University's Sanford School of Public Policy. The data include student scores on the Algebra I and English I EOC tests and each student's grade in the course corresponding with the EOC test. In addition, the data include student-level information on race, free or reduced price lunch eligibility (FRL), exceptionality status, and Limited English Proficient (LEP) status. The NCERDC's unique student identifier allows student EOC test scores, course grades, and demographics to be linked between datasets and across the three years. Most districts reported course grades on a numeric 1-100 scale or letter grades on a 7-point scale. However, several districts reported grades on a numeric 1-10 scale or a 7-point letter grade scale with plus and minus grades. I converted all grades to a numeric 1-100 scale to allow for comparison and inclusion as the dependent variable in my regression analysis.2 Appendix A describes the methodology for converting grades to the same scale.
Using NCERDC's course membership data, I linked each student with his or her teacher in each course, enabling regressions to include teacher race, gender, certification type, and class size. However, due to linking limitations in the source data, only 72% of students in English I and 78% of students in Algebra I had a matching teacher. Thus, regression specifications that include teacher-level characteristics use only this subset of data. Regressions without teacher characteristics include the entire high school student population with matching EOC test score and course grade. Appendix 3 includes summary statistics comparing student characteristics between the full student population and the subset in both courses. Finally, I supplemented the data with school- and district-level variables from NCERDC.
I restricted the data to include only North Carolina high school students. Since some districts encourage students to take Algebra I in middle school, high schools in these districts have fewer Algebra I students than English I students. Many middle school students who enroll in Algebra I are high achieving relative to their peers. Since these students are not included in the data, the statewide average Algebra I EOC test score is lower than in English I. Appendix A provides a more complete description of the methodology used to clean and merge the data.

Statewide Patterns
Differential grading occurs when students in courses with the same content and curriculum receive inconsistent grades across teachers, schools, or districts. Before examining differential grading patterns, it is helpful first to understand the relationship between course grades and EOC test scores across the state. As a whole, grades and EOC test scores should be highly correlated, especially because the EOC test score makes up 25% of the course grade. Table 1 provides state-level summary statistics of this relationship over three years in Algebra I and English I. As noted above, the average Algebra I EOC test score and course grade are lower than in English I because I exclude middle school students who take Algebra I, many of whom are higher achieving than their peers. For each course, a 1 standard deviation change in course grade is about 10 points on a 100-point scale.
Disaggregating the EOC scores by letter grade provides further information about the relationship between test scores and course grades. Table 2 provides the statewide average EOC percentile score by letter grade. The letter grades are on a 7-point scale, where A is 93-100, B is 85-92, C is 77-84, D is 70-76, and F is below 70. Achievement Levels range from I to IV. Level I and II are considered failing, while III and IV are considered passing. Overall, the average test scores rise monotonically with letter grade in both Algebra I and English I. However, the percentile scores are much lower than the numeric grade required for a student to receive a certain letter grade. For example, the cutoff grade for a C is 77, but the average percentile score for C students is 51.9 in Algebra I and 47.4 in English I.
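The 7-point letter scale quoted above can be written as a small lookup. This is only an illustrative sketch: the cutoffs are the ones stated in the text, while the function name and the handling of fractional grades (any grade at or above a cutoff earns that letter) are assumptions.

```python
# Lower bounds of each letter on the 7-point scale described in the text:
# A 93-100, B 85-92, C 77-84, D 70-76, F below 70.
LETTER_CUTOFFS = ((93, "A"), (85, "B"), (77, "C"), (70, "D"))


def letter_grade(numeric_grade: float) -> str:
    """Map a numeric grade on the 100-point scale to a letter grade."""
    for cutoff, letter in LETTER_CUTOFFS:
        # Assumption: fractional grades round down into the lower letter
        # unless they reach the next cutoff (e.g. 92.5 is still a B).
        if numeric_grade >= cutoff:
            return letter
    return "F"
```

For example, `letter_grade(77)` returns `"C"`, the cutoff discussed in the paragraph above, while `letter_grade(76)` returns `"D"`.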
2 Each regression specification includes an indicator variable to control for the grading scale.
Table 2 also reports the percentage of students in each letter grade category who failed the EOC. For example, in Algebra I, 1.0% of students who received an A in the course failed the EOC test. Similarly, 90.5% of students who received an F in the course also failed the EOC test. Thus, nearly 10% of students who received an F in the course passed the EOC test. In English I, roughly 10 percentage points fewer "D" and "F" students failed the EOC test than in Algebra I. The English I EOC test includes composition and textual analysis sections, both of which test reading and writing ability rather than subject-specific content. Algebra I requires more content knowledge and skills, so students who do not learn the coursework may be penalized more in Algebra I than in English I.
The final part of Table 2 shows the gap between the EOC and course failure rates in each course. Algebra I has higher course and EOC failure rates because many higher-achieving students take the course in middle school and are excluded from the data. In addition, the gap between failure rates is larger than in English I. While more than a third of Algebra I high school test takers failed the EOC, only 15.6% of them failed the course. This larger gap indicates that teachers may anchor grades around a certain distribution to some extent, regardless of ability or performance.3

Empirical Model
The descriptive statistics in the previous section show that differential grading likely exists between North Carolina students. However, regression analysis with individual students as the unit of observation is necessary to identify grading patterns by student, teacher, school, and district characteristics, controlling for the student's test score. To accomplish this goal, I use regression models with the following form:
G_{i,t} = α + β′T_{i,t} + γ′X_i + δ′Z_i + λ_t + ε_{i,t}
where G_{i,t} is student i's numeric course grade from 60 to 100 in year t, T_{i,t} is a vector of four linear splines of student i's EOC scale score in year t, X_i is a vector of student and teacher characteristics, Z_i is a vector of school and district characteristics, λ_t represents time effects, ε_{i,t} is the error term, and α, β, γ, and δ are the intercept and coefficient vectors.
The linear splines in T_{i,t} are split by the four achievement levels.4 Each spline equals the amount of the scale score that falls into its achievement level. For example, in Algebra I, the range for Level I is 139 or less, Level II is 140 to 147, Level III is 148 to 157, and Level IV is 158 or greater. A student with a score of 160 is in Level IV by 3 points. Therefore, his Level I spline is 139, his Level II spline is 8, his Level III spline is 10, and his Level IV spline is 3. A student with a score of 145 is in Level II by 6 points. As a result, his Level I spline is 139 and his Level II spline is 6. His Level III and Level IV splines are zero because his score falls below these levels. This flexible functional form allows the slope of the regression line to vary between achievement levels. EOC percentile scores range from 0-100, while course grades range from 60-100. Level I and Level II scores are considered "failing" and thus map to a narrow band of the course grade range (60-69), even though the corresponding scale scores span 120 to 147. Level III and IV scores, on the other hand, range from 148 to 180 and map to grades from 70 to 100. Furthermore, while each district sets the conversion from EOC score to numeric grade for the 25% requirement, many use the Achievement Levels as letter grade breaks for the converted grades. In sum, this functional form captures the effect of test scores on course grade more completely than a linear functional form because it does not assume that the relationship between EOC scores and course grades remains constant across achievement levels. In each specification, the coefficient on each spline was statistically different from the other spline coefficients.5
X_i represents a vector of student and teacher characteristics. The student characteristics include indicator variables for grade level, gender, race, free or reduced price lunch status (FRL), exceptionality status, and Limited English Proficiency (LEP) status. The regressions also include interaction terms between student gender and race indicator variables. The teacher characteristics include indicator variables for gender, race, license type, and degree level as well as a continuous variable for class size. One specification also includes interaction terms between student and teacher race as well as student and teacher gender to measure whether teachers of a certain race or gender differentially grade students of a certain race or gender. Table B1 in Appendix B includes a description of covariates.
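The test score splines described above can be sketched as follows, using the Algebra I achievement-level ranges given in the text. The function and constant names are hypothetical, and the sketch counts Level I from zero because that is what the worked examples in the text imply (a Level I spline of 139 for any score above Level I).

```python
# Highest scale score in Levels I, II, and III for Algebra I, per the text:
# Level I <= 139, Level II 140-147, Level III 148-157, Level IV >= 158.
ALGEBRA_I_LEVEL_TOPS = (139, 147, 157)


def level_splines(scale_score: int, level_tops=ALGEBRA_I_LEVEL_TOPS) -> list:
    """Split a scale score into the points that fall within each level.

    Returns [Level I, Level II, Level III, Level IV] spline values,
    which always sum to the original scale score.
    """
    splines = []
    lower = 0  # Level I is counted from zero, matching the text's examples
    for top in level_tops:
        # Points of the score that fall between `lower` and `top`
        splines.append(max(0, min(scale_score, top) - lower))
        lower = top
    # Level IV: any points above the top of Level III
    splines.append(max(0, scale_score - lower))
    return splines
```

With these cutoffs, `level_splines(160)` gives `[139, 8, 10, 3]` and `level_splines(145)` gives `[139, 6, 0, 0]`, matching the two worked examples in the text. Entering the four values as separate regressors is what lets the slope differ by achievement level.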
Z_i is a vector of school and district characteristics. School characteristics include percent poverty, percent by race, school size, and an indicator variable for whether the school has missed Adequate Yearly Progress for federal accountability in at least one of the three years. The district characteristics include a vector of geographic indicator variables using the same method as Clotfelter, Ladd, and Vigdor (2008). In addition, it includes a vector of indicator variables signifying a district's grading scale.6 One specification replaces school and district characteristics with school fixed effects to hold constant unobserved differences between schools, including differences in grading policies or emphases on test-taking strategies. In this case, the remaining coefficients measure differential grading within schools. Table B1 in Appendix B includes a description of covariates.
The time effects, λ_t, hold constant all statewide differences across the three years of data. I use indicator variables for the 2008-09 and 2009-10 academic years, and the 2007-08 academic year is the omitted reference year.
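The year indicators described above amount to standard dummy coding with one omitted category. The sketch below illustrates this with the three academic years named in the text; the function name and the `year_...` column labels are hypothetical.

```python
# Academic years in the data, per the text; 2007-08 is the omitted
# reference year, so it receives no indicator of its own.
YEARS = ("2007-08", "2008-09", "2009-10")
REFERENCE_YEAR = "2007-08"


def year_indicators(year: str) -> dict:
    """Return 0/1 indicators for every year except the reference year."""
    if year not in YEARS:
        raise ValueError(f"unknown academic year: {year}")
    return {
        f"year_{y}": int(year == y)
        for y in YEARS
        if y != REFERENCE_YEAR
    }
```

A 2007-08 observation has both indicators equal to zero, so each year coefficient is read as a difference relative to 2007-08.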
Each specification includes robust standard errors that are clustered at the school level. The next section discusses results from each model. Appendix C provides descriptive statistics for the independent variables in each course.

Results
This section discusses the regression results for Algebra I and English I. The following subsections are split to match the groups of variables in the regressions. Full regression results are reported in Appendix E. Table 1 in Section 3 provides the mean and standard deviation for course grades. In both courses, a 1-point change in course grade is equivalent to roughly 0.1 standard deviations. In the Algebra I base model, Model (1), which includes the test score linear splines and the district grading scale variables, the observables explain 57.2% of the variation in course grades. Model (5) includes student and teacher characteristics along with school fixed effects for the subset of data that have teacher-student matches, which is 72% of students in English I and 78% in Algebra I. In this model, the observables explain 63.8% of the variation in course grades. In English I, Model (1) yields an R² of 0.513, and it increases to 0.576 in Model (5). Thus, test scores explain more of the variation in course grades in Algebra I than in English I. While the R² values from Models (1) and (5) cannot be directly compared because the latter uses the subset of data, generally the other observables explain 6.3% to 6.6% of the grade variation in both courses.
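The two conversions used in this paragraph, grade points to standard deviations and the gain in R² between models, are simple arithmetic. The sketch below assumes a standard deviation of exactly 10 grade points, which the text gives only as an approximation; the function names are hypothetical.

```python
# Approximate standard deviation of course grades in both courses, per
# the text ("about 10 points on a 100-point scale"). Treated as exact
# here purely for illustration.
GRADE_SD_POINTS = 10.0


def points_to_sd(points: float) -> float:
    """Express a grade difference in points as standard deviations."""
    return points / GRADE_SD_POINTS


def r_squared_gain(base_r2: float, full_r2: float) -> float:
    """Additional share of grade variation explained by the fuller model."""
    return full_r2 - base_r2
```

Under this assumption a 1-point difference is 0.1 standard deviations, and the reported R² values give gains of about 0.066 in Algebra I (0.638 minus 0.572) and 0.063 in English I (0.576 minus 0.513), subject to the caveat in the text that the two models use different samples.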
Since this paper focuses on differential grading across a set of covariates, the results discussion emphasizes the change in course grades for a unit or indicator value change in each covariate rather than the model's explanatory power as a whole.

Student Characteristics
Differential grading exists across student characteristics. Table 3 provides regression results for student-level variables in Algebra I for Models (1), (2), (3), (4), and (5). Table 4 provides the same results for English I.
By grade level, Algebra I and English I have different grading patterns in 9th, 10th, and 11th grade. In Algebra I, 10th graders receive grades 0.723 to 0.988 points lower than 9th graders, holding all else constant. This difference is equivalent to 0.07 to 0.10 standard deviations. This finding may be due to course-taking patterns whereby most 10th grade Algebra I students are either weaker students or are retaking the course. Students retaking the course may improve their test score from previous attempts, but if they do not change their habits in the classroom, their grade may not improve. Thus, they may receive a lower grade relative to 9th graders, many of whom are taking it for the first time. Eleventh graders receive higher and statistically different grades than 10th graders in every specification except Model (4). These values are still 0.340 to 0.809 points below 9th graders. English I students, in contrast, receive higher grades in each subsequent grade level relative to 9th graders. Students in 11th grade earn a course grade that is 1.964 to 2.731 points higher than 9th graders in English I, equivalent to between 0.19 and 0.27 standard deviations.
Twelfth graders earn statistically significantly higher grades than other grade levels in both English I and Algebra I. These grades are 0.07 to 0.12 standard deviations higher in Algebra I and 0.35 to 0.46 standard deviations higher in English I. Some of this increase is likely due to increased effort by students to graduate because they are closer to the end of school. Additionally, teachers may raise the grades of students who are in danger of failing so that they can graduate, and principals may pressure them to do so.
Differential grading also exists by gender and race. Female students earn grades that are 1.8 to 2.4 points (0.18 to 0.24 standard deviations) higher than male students in both courses regardless of race, holding test scores constant. This difference is large enough to move a female student up one category on a plus/minus 7-point grading scale, such as a B+ to an A-. These results are not sensitive to other covariates or school fixed effects. These findings are comparable to effect sizes in Lavy (2008) and Cornwell et al. (2013) but are larger than those in Lindahl (2007). Girls may be more conscientious students by turning in more work, having fewer discipline problems, and studying more than boys. In addition, girls may have higher non-cognitive skills than boys that lead to increased grades without a change in test scores (Cornwell et al., 2013). Further, girls may respond more negatively to pressure on testing day, causing them to receive scores below their ability. Finally, teachers may have gender stereotypes that systematically cause them to give higher grades to girls (Ehrenberg et al., 1995; Lavy, 2008).
In terms of race, opposite findings emerge for Algebra I and English I, and the effect sizes are smaller than with gender. The regressions include indicator variables for race and gender as well as interactions between race and gender. All minority male students in Algebra I have positive coefficients relative to white and Asian male students when teacher and school characteristics are included in the model, but only the coefficient for black male students is statistically significant. Black male students earn a grade about 0.774 points (0.08 standard deviations) higher than white or Asian male students, holding the test score, teacher characteristics, and school constant. In the same model, black female students earn a grade that is 2.687 points higher than white or Asian male students, a difference of 0.27 standard deviations, but a large portion of this difference is driven by gender rather than race. The black coefficient only becomes statistically significant when other student characteristics, including poverty, and school characteristics are added. Hispanic male students receive statistically significantly higher grades only when the control for Limited English Proficient (LEP) students is not in the regression (Model 2). Since many Hispanic students may also be classified as LEP, it seems that their higher grades have more to do with their language classification than with their ethnicity.
In English I, the coefficients on all minority variables are negative and statistically significant, but the grade differences only represent 0.03 to 0.07 standard deviations when teacher and school characteristics are included in the regression. As with Algebra I, when teacher and school characteristics are added, the coefficients on the Black and Hispanic variables move in a positive direction.
Differential grading also occurs based upon other student characteristics. When school fixed effects are included in Model (5), Algebra I students who are eligible for free or reduced price lunch (FRL) receive grades that are 1.110 points (0.11 standard deviations) lower than other students. In English I, FRL students earn grades that are 1.947 points (0.20 standard deviations) lower than non-FRL students. The difference in Algebra I is only large enough to move borderline non-FRL students up one category on a plus/minus 7-point grading scale, but the English I difference is large enough to move non-FRL students up one category. One possible explanation for this finding is that non-FRL students may have more access to test preparation resources than FRL students. It is also possible that fewer FRL students have college aspirations than non-FRL students. High school grades have a limited impact on the blue-collar labor market, so FRL students may not try to earn higher grades on the margin as long as they pass the course (Arcidiacono, Bayer, & Hizmo, 2010). The result could also be due to grading discrimination against low-income students, as Hanna and Linden (2012) find in India, where teachers assigned scores 0.03 to 0.08 standard deviations lower to "lower-caste" students than to "high-caste" students. Each of these factors could have some role in the differential grading, but this model does not allow for distinguishing between causes.
Students with an exceptionality receive higher grades in both courses, but the coefficients in English I are not statistically significant. In Model (5), exceptional students in Algebra I earn grades that are 0.793 points (0.08 standard deviations) higher than other students. Exceptional students receive additional services relative to other students, which may benefit them more on course performance than on test performance. Additionally, teachers may try to compensate for academic exceptionalities by artificially improving grades. Or, for students with exceptionalities related to discipline, teachers may increase their grades so that they do not have to teach them again.
LEP students earn grades that are 2.217 points (0.22 standard deviations) higher than non-LEP peers in Algebra I and 2.035 points (0.20 standard deviations) higher than non-LEP peers in English I, all else constant. As with exceptional students, LEP students may benefit in several ways from the extra supports afforded to them. They receive more one-on-one attention, which may increase the amount of classwork they complete. They also may have help with their classwork, allowing them to earn higher grades than students with the same test score who do not receive such support. Lastly, teachers may try to compensate for the language barrier by artificially raising a student's grade.
In sum, female students, LEP students, and 12th graders in both subjects earn grades that are at least 0.19 standard deviations higher than other students with the same test score. In addition, low-income students in English I receive grades that are 0.20 standard deviations lower than other students with the same test score. Low-income Algebra I students earn grades that are 0.11 standard deviations lower than non-FRL peers. With the exception of Algebra I FRL students, these differences are large enough to move a student one grade category on a plus/minus 7-point grading scale. These findings persist when the model controls for teacher, school, and other student characteristics. Finally, the sign and significance of the race indicators are not consistent across the two subjects, and the effect sizes are smaller than those of other student characteristics.
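The point differences and standard-deviation figures reported above are related by a simple division. The sketch below illustrates that conversion. It assumes a course-grade standard deviation of about 10 points, a value that is only implied by the reported pairs and is not stated in the text; the function name and its default are illustrative.

```python
def points_to_sd(points: float, grade_sd: float = 10.0) -> float:
    """Express a grade difference in standard-deviation units.

    grade_sd is an assumed value (roughly 10 points) inferred from the
    reported point/SD pairs; it is not a figure taken from the data.
    """
    if grade_sd <= 0:
        raise ValueError("grade_sd must be positive")
    return points / grade_sd


# Grade differences reported in the text, in points.
reported = {
    "LEP, Algebra I": 2.217,
    "FRL, Algebra I": -1.110,
}

for label, points in reported.items():
    print(f"{label}: {points:+.3f} points = {points_to_sd(points):+.2f} SD")
```

Because the actual standard deviation may differ slightly by course, converted values can deviate from the published ones in the second decimal place.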

Teacher Characteristics
Teacher characteristics as a whole do not play as large a role in differential grading as student characteristics. In Algebra I, female teachers give grades that are 0.379 points (0.04 standard deviations) lower than male teachers, holding test scores, student characteristics, and school characteristics constant. Other teacher characteristics are not statistically significant. None of the teacher characteristics are statistically significant in English I.
In both courses, the class size variable has a small but statistically significant negative coefficient. For each additional student, a student's course grade decreases by 0.045 points in Algebra I and 0.037 points in English I, which is less than 0.01 standard deviations in both cases. Appendix E includes the full results for teacher characteristics.

Teacher and Student Interactions
To fully capture differential grading at the student and teacher level, Models (6) and (7) include interaction terms between teacher and student race and gender. While some differential grading patterns exist by race and gender within subjects, the findings do not hold for both subjects. In addition, many of the coefficients are not statistically significant, and the differences are much smaller than those for the individual student variables (0.03 to 0.07 standard deviations).
Appendix D presents a table with calculated coefficients from the regression results and includes a discussion of the findings.

School Characteristics
Overall, school characteristics appear to play a minor role in differential grading. In English I, for a 10-percentage point increase in a school's share of low-income students, a student's course grade increases by 0.21 points (0.02 standard deviations). Percent poverty is not statistically significant in Algebra I.
For a 10-percentage point increase in American Indian students, student grades increase by 0.59 points (0.06 standard deviations) in English I and 0.034 points (0.03 standard deviations) in Algebra I, all else constant. Other school-level race variables are not statistically significant.
In Algebra I, as school size increases by 100 students, a student's grade decreases by 0.1 points (0.01 standard deviations). While this effect is small, it is in the expected direction because larger schools may be more impersonal than smaller schools. The threat of missing Adequate Yearly Progress (AYP) is not statistically significant in either course.
Appendix E includes the full results for school characteristics.

District Characteristics
Differential grading also exists between the five large school districts and six regions in the state, as seen in Table 5. Wake County and Charlotte-Mecklenburg Schools have lower course grades than all other groups in both courses. In Algebra I, the grade difference in the other districts and regions ranged from 1.145 points (0.11 standard deviations) in the Urban Coast to 4.220 points (0.42 standard deviations) in Guilford. In other words, Guilford students earn grades that are 4.220 points higher than Wake or Charlotte-Mecklenburg students with the same test score, which is enough to move them up two grade categories on a plus/minus 7-point scale. Students in the Rural Mountains, Rural Piedmont, and Urban Mountains earn higher and statistically different grades from the other three geographic groups, but the differences are less than one point. In English I, Charlotte-Mecklenburg students earn grades that are 1.184 points lower than Wake County students, 2.872 points lower than Cumberland students, and 2.958 points lower than Guilford students, all else constant.

Table 5. District Characteristics Regression Results

Of the six geographic groupings, Rural Mountains is the only group with a positive coefficient that is statistically different from the other five geographic groupings in English I, but the grade difference is less than one point. While this analysis cannot determine the reasons for this differential grading, it is important to note that differential grading can also occur systematically at the district or region level.

Summary
Overall, student characteristics are stronger predictors of differential grading than teacher, school, or district characteristics. Female, Limited English Proficient, and 12th grade students earn statistically significantly higher grades than other students in Algebra I and English I, holding test scores and student, teacher, school, and district characteristics constant. Low-income students, in contrast, earn lower grades than other students in both subjects, all else constant. With the exception of Algebra I low-income students, these differences are large enough to move a student one grade category on a plus/minus 7-point A-F grading scale. Black students earn higher Algebra I grades but lower English I grades than white or Asian students with the same test score, but these effect sizes are smaller than those of other student characteristics. Interactions between student and teacher race and gender yield small estimates that are not consistent between subjects.

Conclusion
Descriptive statistics show that significant variation exists between course grades and EOC scores in both Algebra I and English I. In addition, certain groups of students are systematically graded differently from other student groups, even when controlling for EOC test performance and other factors. This differential grading matters because course grades play an important role in college admissions. Students who receive artificially higher grades than other students with similar ability, content knowledge, and environment may have an advantage in college admissions. Furthermore, these students may need additional remedial work to relearn material in college, which has costly implications for the students and the state.
This research adds to the literature on grading patterns in several ways. It is the first study to examine the relationship between course grades and scores on mandatory, curriculum-based tests for all high school students in a state over multiple years. As a result, the findings have stronger external validity than previous studies that use tests given primarily to high-achieving students or that have data for only one year. The paper also adds to the debate in the literature on whether teachers of a certain race or gender grade students of a certain race or gender differently.
One limitation of the research is that it cannot distinguish between causes of differential grading, such as varying grading standards, grade discrimination, or systematic differences in student behavior. Nonetheless, college admissions decisions often rest heavily on a student's GPA, and the existence of differential grading benefits some students at the expense of others with similar ability and content knowledge as measured by test scores.
students with failing grades far below 70 would bias estimates downward. In addition, differences between failing grades are not important for differential grading because students do not receive credit for the course in either case.
To convert 10-point numeric grades to the 100-point numeric scale, I initially appended a zero to each number. However, the regression results in each course indicated that districts with this scale were issuing grades that were 4.5 to 5 points lower than other districts. Since this pattern occurred in all subjects, I added 4.5 to each numeric grade for students on this scale, which also coincides with the midpoint between 80 and 89. For example, I converted a numeric grade of 8 to 84.5. After this adjustment, the indicator variable was no longer statistically significant in either subject.
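The conversion just described can be sketched in a few lines. The function name and argument names are illustrative, and the text does not say how a grade of 10 was treated, so this sketch applies the same rule to every value without capping.

```python
def ten_point_to_hundred(grade: int, offset: float = 4.5) -> float:
    """Map a 1-10 numeric grade onto the 100-point scale.

    Multiplying by 10 is the "append a zero" step; the 4.5-point offset
    is the adjustment described in the text. No cap at 100 is applied
    because the text does not describe one (an open assumption).
    """
    if not 1 <= grade <= 10:
        raise ValueError("grade must be between 1 and 10")
    return grade * 10 + offset


print(ten_point_to_hundred(8))  # the worked example from the text: 8 becomes 84.5
```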
To convert letter grades to numeric grades, I imputed the average numeric grade in each letter grade category from all districts on the numeric scale. The average racial and socioeconomic composition of districts on a letter grade system and those on a numeric grade system were not statistically different. Assigning grades based upon the midpoint of the grade range for each letter would have assumed that actual numeric grades within each letter are distributed evenly across the range. While this pattern holds true for B, C, and D, it does not hold true for A and F. The imputed value is about 1 point below the midpoint of the A range, likely because of a ceiling effect at 100. For F, the distribution is weighted heavily toward 60 because all grades below 60 are converted to 60. As a result, the imputed value is 62.49 instead of 65. For schools on the plus/minus letter grade system, I imputed the midpoint for the range, with the exception of A+, because the range was only 2 to 3 points in each category. Due to the ceiling effect, I imputed 99 for A+ since the range was 99-100. The following table shows the ranges and imputed values.

Another obstacle was accounting for students who take a course in two parts. For example, some students take Algebra IA first semester and IB second semester. In these cases, I used only the grade from Algebra IB because students take the EOC test after this course, which means that the EOC test score will be included only in the grade for Algebra IB.
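The letter-grade imputation described above (the average numeric grade within each letter category) can be sketched as follows. The cutoffs used here for the 7-point scale (A = 93-100, B = 85-92, C = 77-84, D = 70-76, F below 70) are an assumption, since the table of ranges is not reproduced in this text, and the sample grades are made up for illustration.

```python
from statistics import mean

# Assumed cutoffs for a 7-point A-F scale; hypothetical, not from the text.
CUTOFFS = [(93, "A"), (85, "B"), (77, "C"), (70, "D")]


def to_letter(grade: float) -> str:
    """Return the letter category for a numeric grade."""
    for floor, letter in CUTOFFS:
        if grade >= floor:
            return letter
    return "F"


def imputed_values(numeric_grades):
    """Average numeric grade within each letter category."""
    by_letter = {}
    for g in numeric_grades:
        g = max(g, 60)  # grades below 60 are recorded as 60, per the text
        by_letter.setdefault(to_letter(g), []).append(g)
    return {letter: mean(vals) for letter, vals in by_letter.items()}


# Made-up grades standing in for districts on the numeric scale.
sample = [98, 95, 94, 88, 86, 80, 78, 72, 60, 45, 66]
table = imputed_values(sample)
print(table)
```

With the floor at 60, the F average in this toy sample falls below the midpoint of the 60-69 range, which mirrors the skew the text reports for the real data.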
Some students took a course or an EOC test multiple times, sometimes within the same year. To address this issue, I included only students who had an EOC score and a course grade in the same semester. I excluded students with either a missing test score or a missing course grade. If a student had multiple test scores in the same semester, I used the highest test score. Students who receive a Level I or II score are given the opportunity to retake the test several days after initial testing. Teachers are most likely to use the highest score as part of the student's overall course grade, so it most accurately represents the relationship between course grades and test scores.
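The matching rules in the preceding paragraph can be illustrated with a short sketch. The record layout, identifiers, and semester labels are hypothetical; only the rules themselves (same-semester match, exclusion of missing values, highest score retained) come from the text.

```python
def match_records(scores, grades):
    """Keep students with both an EOC score and a course grade in a semester.

    scores: iterable of (student_id, semester, eoc_score)
    grades: iterable of (student_id, semester, course_grade)
    Returns {(student_id, semester): (highest_eoc_score, course_grade)}.
    """
    best = {}
    for student, semester, score in scores:
        if score is None:
            continue  # missing test score: excluded
        key = (student, semester)
        best[key] = max(score, best.get(key, score))  # keep the highest score

    matched = {}
    for student, semester, grade in grades:
        key = (student, semester)
        # A repeated grade for the same key would overwrite the earlier one;
        # the text does not describe that case, so this is an assumption.
        if grade is not None and key in best:
            matched[key] = (best[key], grade)
    return matched


# Hypothetical records: student 1 retook the test in the same semester.
scores = [(1, "2008F", 140), (1, "2008F", 151), (2, "2008F", 148), (3, "2009S", None)]
grades = [(1, "2008F", 82), (3, "2009S", 90), (4, "2008F", 75)]
result = match_records(scores, grades)
print(result)
```

Only student 1 survives in this toy example: student 2 has no course grade, student 3 has a missing test score, and student 4 has no test score at all.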
In the teacher-level characteristics, less than 1% of the class size variable had a value larger than 50 students. To prevent biased findings due to these outliers, I changed all class size values above 50 to equal 50.

Teacher/Class Characteristics Description
Female = 1 if teacher j is female.
Black = 1 if teacher j is black. White/Asian teachers are the omitted category.
Hispanic = 1 if teacher j is Hispanic. White/Asian teachers are the omitted category.
Other Race = 1 if teacher j is either multiracial or American Indian. White/Asian teachers are the omitted category.
Temporary License = 1 if teacher j has a temporary license. Type 2 license teachers (3+ years of experience) are the omitted category.
New Teacher = 1 if teacher j has a Type 1 license (0-2 years of experience). Type 2 license teachers (3+ years of experience) are the omitted category.
Masters = 1 if teacher j has a master's degree or higher.
Class size = number of students in student i's class.

School Characteristics Description
Percent Poverty = percent of students in student i's school who are eligible for free or reduced price lunch.
Percent Black = percent of students in student i's school who are black.
Percent Hispanic = percent of students in student i's school who are Hispanic.
Percent Indian = percent of students in student i's school who are American Indian. School-level data do not have an "other" category.
Missed AYP* = 1 if student i's school missed Adequate Yearly Progress (AYP) targets at least once over the three years of data.
School size (students) = the number of students in student i's school.

Student Characteristics Description

District Characteristics Description
Charlotte-Mecklenburg = 1 if student i's school is located in Charlotte-Mecklenburg.
Winston Salem/Forsyth = 1 if student i's school is located in Winston Salem/Forsyth.
Guilford = 1 if student i's school is located in Guilford.
Cumberland = 1 if student i's school is located in Cumberland.
Rural Coast = 1 if student i's school is located in the rural coast.
Rural Mountains = 1 if student i's school is located in the rural mountains.
Rural Piedmont = 1 if student i's school is located in the rural piedmont.
Urban Coast = 1 if student i's school is located in the urban coast.
Urban Mountains = 1 if student i's school is located in the urban mountains.
Urban Piedmont = 1 if student i's school is located in the urban piedmont.
Numeric 1-10 Grading Scale = 1 if student i's district grades are reported on a numeric 1-10 grading scale. Districts with numeric 1-100 scales are the omitted category.
7-point Letter Grade (A-F) = 1 if student i's district grades are reported on an A-F letter grade system. Districts with numeric 1-100 scales are the omitted category.
7-point Letter Grade (+/- A-F) = 1 if student i's district grades are reported on an A-F letter grade system that includes +/-. Districts with numeric 1-100 scales are the omitted category.
Note: Under district characteristics, North Carolina regions are as defined in Clotfelter, Ladd, & Vigdor (2008). Wake County is the omitted category.

Appendix C Independent Variable Summary Statistics by School and Class
Tables C1 and C2 provide summary statistics for school and district independent variables in Algebra I and English I. Tables C3 and C4 provide summary statistics for the student and teacher covariates in Algebra I and English I. In addition, each table provides a comparison of student characteristics between the full data and the subset of data that includes all students with a matched teacher. For most covariates, the average values between groups were statistically different due to the large number of observations, even if the values were almost equivalent.

Appendix D Teacher and Student Interactions
For ease of interpretation, Table D.1 includes the calculated coefficients from the regression results with white and Asian male students and white and Asian male teachers as the omitted categories. The following discussion uses calculations from Model (7), which includes school fixed effects. Full regression results are included in Appendix E.
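One common way to obtain such calculated coefficients from a model with two indicator variables and their interaction is to sum the terms that are switched on for a given student and teacher pairing. The sketch below illustrates that arithmetic under this assumption; the text does not spell out the exact calculation, and the coefficient values are hypothetical.

```python
def calculated_coefficient(student_effect, teacher_effect, interaction,
                           student_in_group, teacher_in_group):
    """Grade difference for a student/teacher pairing relative to the
    omitted pairing (both indicators equal to 0)."""
    s = 1 if student_in_group else 0
    t = 1 if teacher_in_group else 0
    return student_effect * s + teacher_effect * t + interaction * s * t


# Hypothetical coefficients, chosen only for illustration.
b_student, b_teacher, b_interaction = 0.50, -0.10, 0.20

both = calculated_coefficient(b_student, b_teacher, b_interaction, True, True)
student_only = calculated_coefficient(b_student, b_teacher, b_interaction, True, False)

print(f"student and teacher in group: {both:.2f}")
print(f"student in group, teacher in omitted group: {student_only:.2f}")
```

Under this setup, the gap between the two pairings equals the teacher coefficient plus the interaction term, which is the kind of within-student-group comparison discussed in this appendix.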
In Algebra I, black students with black teachers earn grades that are 0.581 points (0.06 standard deviations) higher than white students with white teachers, all else constant. This finding is statistically significant. However, black students with black teachers earn grades that are slightly higher than, but not statistically different from, those of black students with white teachers.
This pattern does not hold in English I. Black students with black teachers earn grades that are not statistically different from white students with white teachers. However, black students with black teachers earn grades that are 0.344 points (0.04 standard deviations) lower than black students with white teachers, which is a statistically significant difference. Furthermore, white teachers also issue statistically significantly lower grades to both Hispanic and "other" students. However, these differences represent a grading difference of less than 0.07 standard deviations.
The teacher and student gender interactions have more statistically significant results than the race interactions. In Algebra I, female students with female teachers earn grades that are 0.265 points (0.03 standard deviations) lower than female students with male teachers. In English I, however, female students with female teachers earn grades that are 0.264 points higher than female students with male teachers. Only the Algebra I result is statistically significant. Male Algebra I students with female teachers earn grades that are 0.480 points lower than male students with male teachers. In English I, male students with female teachers earn grades that are 0.147 points lower than male students with male teachers. Only the Algebra I coefficient is statistically significant.
As with student and teacher race, the patterns in grading by teacher and student gender are not consistent between subjects. Thus, differential grading does not appear to occur systematically between teachers and students of a certain gender.

Note: Dependent variable is a student's numeric course grade. Omitted category in race interactions is white or Asian teacher/white or Asian student. Thus, coefficients represent the difference in grade in a specific group relative to this group. Omitted category in gender interactions is male teacher/male student. Thus, coefficients represent the difference in grade in a specific group relative to this group. Standard errors are robust and clustered at the school level. Models (4), (5), (6), and (7) use only the subset of data with teacher-student matches.

North Carolina End-of-Course Tests
First instituted in 1987, North Carolina EOC tests are aligned with the state's Standard Course of Study. Students on a full-year schedule take EOC tests in the last ten days of class, and those on a semester or block schedule take EOC tests in the last five days of class. Schools are required to offer makeup testing to absent students for two weeks after the initial test administration date. The Algebra I exam has 64 operational items and 14 field test items. The English I exam has 56 operational test items and 24 field test items. All questions are multiple choice, and scoring is automated.

Table 1. 2007-08 to 2009-10 Statewide EOC and Course Grade Summary Statistics
Note: Calculations include all non-alternative high school students with a matching course grade and EOC score. Percentile scores are calculated relative to the norming year, rather than other students in the same year. Thus, percentile rank over the 3-year period does not equal 50. Source: Calculations from NCERDC transcript and EOC data.

Table 2. 2007-08 to 2009-10 EOC and Grade Summary Statistics
Note: Calculations include all non-alternative high school students with a matching course grade and EOC score. Percentile scores are calculated relative to the norming year, rather than other students in the same year. Thus, percentile rank over the 3-year period does not equal 50. Students who failed the EOC earned a Level I or II score. Source: Calculations from NCERDC transcript and EOC data.

Table 3. Algebra
Note: Dependent variable is a student's numeric course grade. White and Asian students are the omitted race category. Thus, coefficients represent the difference in grade in a specific group relative to this group. Standard errors are robust and clustered at the school level. Full results in Appendix E. Model 1 includes test score linear splines, grading scale indicators, and year effects. Model 2 adds student grade level, race, and gender. Model 3 adds other student characteristics. Model 4 adds teacher, school, and district characteristics and uses only the subset of data with teacher-student matches. Model 5 replaces school and district characteristics with school fixed effects and uses only the subset of data with teacher-student matches.

Table 4. English
* p<0.05, ** p<0.01, *** p<0.001
Note: Dependent variable is a student's numeric course grade. White and Asian students are the omitted race category. Thus, coefficients represent the difference in grade in a specific group relative to this group. Standard errors are robust and clustered at the school level. Full results in Appendix E. Model 1 includes test score linear splines, grading scale indicators, and year effects. Model 2 adds student grade level, race, and gender. Model 3 adds other student characteristics. Model 4 adds teacher, school, and district characteristics and uses only the subset of data with teacher-student matches. Model 5 replaces school and district characteristics with school fixed effects and uses only the subset of data with teacher-student matches.
Dependent variable is a student's numeric course grade. Model 6 includes student, school, and district characteristics and year effects. Model 7 replaces school and district characteristics with school fixed effects. Standard errors are robust and clustered at the school level.
* Each coefficient used in calculation is significant with p<0.05.