Grading Standards in Education Departments at Universities

Students who take classes in education departments at universities receive significantly higher grades than students who take classes in other academic departments. The higher grades awarded by education departments cannot be explained by differences in student quality or by structural differences across departments (i.e., differences in class sizes). The remaining explanation is that the higher grades are the result of lower grading standards. This paper formally documents the grading-standards problem in education departments using administrative grade data from the 2007-2008 academic year. Because a large fraction of the teachers in K-12 schools receive training in education departments, I briefly discuss several possible consequences of the low grading standards for teacher quality in K-12 schools.

There is a large and growing research literature showing that teacher quality is an important determinant of student success (recent studies include Aaronson et al., 2007; Koedel, 2008; Nye et al., 2004; Rivkin et al., 2005; Rockoff, 2004). But while there is persistent research into a variety of interventions aimed at improving teacher quality, surprisingly little attention has been paid to the primary training ground for K-12 teachers: education departments at universities. This paper provides an evaluation of the grading standards in these education departments. I show that education students receive higher grades than do students in every other academic discipline. The grading discrepancies that I document cannot be explained by differences between education and non-education departments in student quality, or by structural differences across departments. The likely explanation is grade inflation.
The earliest evidence on the grading-standards problem in education departments comes from Weiss and Rasmussen in 1960. They showed that undergraduate students taking classes in education departments were twice as likely to receive an "A" when compared to students taking classes in business or liberal arts departments. The low grading standards in education departments, illustrated by these authors over 50 years ago, are still prevalent today.
I document the disparity in current grading standards between education departments and twelve other academic departments that are common to most universities using administrative data from the 2007-2008 academic year. The comparison departments include (1) math, science and economics departments: biology, chemistry, computer science, economics, mathematics and physics; (2) social-science departments: political science, psychology and sociology; and (3) humanities departments: English, history and philosophy. 1 With great consistency, the data show that the grades awarded by education departments are substantially higher than the grades awarded by all other academic departments.
The primary purpose of this paper is to highlight the magnitude of the current grading-standards discrepancy between education and non-education departments. Anecdotally, although most people seem to be aware that grading standards are lower in education departments, the magnitude of the difference does not seem to be well understood. It would also be of interest to evaluate the policy implications of the low grading standards in education departments; for example, they are likely to affect the quality of the K-12 teaching workforce. But a formal analysis along these lines seems virtually impossible given current data conditions. More concretely, there appears to be little in the way of meaningful variation in grading standards across education departments at different universities (see below), or within education departments over time (Weiss and Rasmussen, 1960). Without this variation, researchers cannot evaluate the counterfactual impact of more-stringent grading standards in education departments. The lack of variation in the data, and the corresponding lack of rigorous research on this topic, however, should not be viewed as a verdict on the importance of the grading-standards problem. The low grading standards in education departments may have a large impact on the delivery of K-12 education in the United States despite our difficulty in formally evaluating this issue.
Although data limitations prevent a formal analysis, I draw on evidence from the broader research literature to consider several ways that the low grading standards in education departments may affect teacher quality in K-12 schools. First, Babcock (2010) shows that grade inflation in college reduces student effort, which in turn reduces human-capital accumulation (Stinebrickner and Stinebrickner, 2008). 2 Under fairly modest conditions, lower human-capital accumulation among prospective teachers in college will negatively affect teacher quality in K-12 schools. Second, there is a striking similarity between the low grading standards in education departments and the low evaluation standards for teachers in the workforce (see, for example, Harris and Sass, 2010; Jacob and Lefgren, 2008; TNTP, 2009). Murphy and Cleveland (1991) indicate that employee evaluations can be affected by contextual norms; if the low grading standards to which prospective K-12 teachers are exposed in college affect their expectations in the workforce, this may affect their evaluations in schools. The low evaluation standards for teachers in K-12 schools have been identified as a likely impediment to student achievement by The New Teacher Project (TNTP, 2009). Finally, the grading standards in education departments are so low that grades cannot be used to meaningfully distinguish students. In other departments, grades signal information about performance to students, allowing them to sort into disciplines for which they are well matched (Arcidiacono, 2004; Michaels, 1976). There appears to be no such role for grades in education departments: students who are poorly matched for teaching careers are unlikely to receive any indication that this is the case in their grade reports. 3

I. Data
In my primary analysis I evaluate data from three large, public universities that have sizeable undergraduate programs in education: Indiana University, Bloomington; Miami University, Oxford (Ohio); and the University of Missouri, Columbia. Total enrollments at these universities in the fall of 2007 were approximately 39,000, 16,000 and 28,000, respectively. The dataset includes the universe of undergraduate classroom-level grade reports from administrative data for each university for the 2007-2008 academic year. I use these grade reports to characterize the distributions of classroom-level grades awarded by academic department. 4
3 Arcidiacono (2004) shows that students sort across college majors using information about grades, which is consistent with grades providing information to individuals about their relative strengths and weaknesses. In a different context, Michaels (1976) provides a general discussion of the role of grades at universities, specifically noting their role in sorting students.
4 I use the term "classroom" to mean "lecture"; that is, I do not treat teaching-assistant "sections" of the same lecture as separate classes.
I compare the grade distributions found in the education departments to those in the 12 academic departments listed above. These departments were chosen to ensure that the major academic departments found at most universities are represented in the data. In results omitted for brevity, I show that adding smaller departments to the comparisons, such as anthropology or human studies, does not affect the findings.
All three universities designate a level ranging from one to four for each undergraduate class. The levels roughly correspond to first-year, second-year, third-year, and fourth-year classes. I focus on all classes that are designated as level-2 or higher because education departments offer very few level-1 classes. Because the level assigned to most classes beyond the introductory level will be endogenous to some extent (e.g., most classes that are designated as level-4 could also be designated as level-3, and vice versa), I do not distinguish by course level in the analysis. However, class-level designations are of no consequence to the results: my findings can be replicated within class level.
Table 1 reports the average class size, the number of classrooms observed, and the number of student-by-classroom observations for each department-by-university in the primary dataset. Overall, the data include grade reports from 2,902 classrooms across the three universities, 665 of which are education classes. Over 100,000 student-by-classroom observations underlie the classroom-level grade reports.
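As a rough illustration of how classroom-level grade reports map to the GPA measures used throughout, the sketch below computes a classroom-level GPA from letter-grade counts and then forms both a simple and an enrollment-weighted department average. The 4-point mapping without plus/minus grades, the function names, and the grade counts are assumptions made for illustration; they are not taken from the universities' data.

```python
from statistics import mean

# Assumed 4-point scale without plus/minus grades (not specified in the text).
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}


def classroom_gpa(grade_counts):
    """Return the average grade points for one classroom's letter-grade counts."""
    n_students = sum(grade_counts.values())
    if n_students == 0:
        raise ValueError("classroom has no graded students")
    total_points = sum(GRADE_POINTS[g] * n for g, n in grade_counts.items())
    return total_points / n_students


def department_averages(classrooms):
    """Return (simple, enrollment-weighted) averages of classroom-level GPAs."""
    gpas = [classroom_gpa(c) for c in classrooms]
    sizes = [sum(c.values()) for c in classrooms]
    simple = mean(gpas)
    weighted = sum(g * s for g, s in zip(gpas, sizes)) / sum(sizes)
    return simple, weighted


# Hypothetical grade reports for two classrooms in one department.
example_department = [
    {"A": 18, "B": 2},  # 20 students
    {"A": 8, "B": 2},   # 10 students
]
simple_avg, weighted_avg = department_averages(example_department)
print(f"simple: {simple_avg:.2f}, enrollment-weighted: {weighted_avg:.2f}")
```

The simple average treats every classroom equally, while the weighted average gives larger classrooms more influence; the two can differ noticeably when class sizes vary within a department.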
The three universities that are the focus of the primary analysis are a convenience sample: these universities post classroom-level grades and enrollment information online. Most universities do not make such data available; but at considerable expense, Myedu.com has constructed a database with course-level grade information for all of the major public universities in the United States. 5 Below, I use the Myedu.com data to confirm that the grade distributions from the education departments in the convenience sample are not unique.
Table 2 shows that there are substantial differences in the grade distributions between education and non-education departments. The classroom-level average GPAs in the education departments are 0.5 to 0.8 grade points higher than in the other department groups. The GPA gaps are even larger at the bottom of the distribution.

II. Grade Distributions
Figure 1 graphically illustrates the grade distributions at each university in the convenience sample. The graphs in the figure are cluttered, but that is largely the point: while all other university departments work in one space, education departments work in another. Notice that it is generally difficult to distinguish which department is which, with the exception that the distributions from the education departments are quite obvious.
5 Myedu.com collects administrative grade data directly from universities by invoking the Freedom of Information Act (in most cases). Although the information they collect is "free", they are required to pay the costs associated with processing their data requests at each university, which are often in the thousands of dollars. Myedu.com also expends considerable effort to ensure that they are getting the proper information from each university.
Several explanations for the observed GPA gaps between education and non-education departments, beyond pure grade inflation, merit discussion. First, the GPA gaps could be justified if education departments draw students who, on average, are of higher quality than students in other academic departments. However, the available evidence suggests that this explanation is unlikely.
For example, Arcidiacono (2004) uses a nationally representative dataset to show that education majors enter college with considerably lower SAT scores, on average, than students in other disciplines. Similar evidence is available from The College Board (2010), which compares college-bound high-school seniors by intended major. Although SAT scores surely do not measure every dimension of quality, the documented discrepancies in SAT scores between education and non-education majors are not consistent with education students being of higher quality than students in other disciplines. 7
6 Because of the way that the data are stored by Myedu.com, it was easiest for them to extract grades at the course-number level, not the classroom level. Therefore, the grade data from Myedu.com are aggregated to a higher level than are the data in Figures 1 and 2 (a single course will often correspond to more than one class). Descriptively, this is of little practical importance. Also, because of heterogeneity in the numbering sequences across universities, Myedu.com did not attempt to filter out freshman-level courses. Again, this is of little practical importance because there are very few freshman-level education courses.
A second explanation for the GPA gaps is that they are the result of a structural difference between education and non-education departments. An obvious difference that can be seen from Table 1 is that education departments generally offer smaller classes than other departments. If smaller classes correspond to better grades, some of the GPA discrepancies may be attributable to this structural difference. To evaluate this possibility I begin by estimating the correlation between class size and classroom-level GPA using the following regression model, estimated separately for each of the three universities in the primary data sample:
\[
\mathrm{GPA}_{ijpt} = \beta_0 + \beta_1 E_{ijpt} + \delta_t + \theta_{jp} + \varepsilon_{ijpt} \tag{3}
\]
In (3), the GPA in classroom $i$, taught in department $j$ at level $p$ during semester $t$, is regressed on enrollment (that is, class size), $E_{ijpt}$, and fixed effects for semester (fall or spring, $\delta_t$) and department-by-level ($\theta_{jp}$). This regression identifies systematic differences in grades between courses that differ in size but are taught by the same department and at the same level.
Because there is so little variation in the grades assigned in education departments relative to other departments, I omit education classrooms from the regression. Therefore, the correlations captured by $\beta_1$ can be interpreted as indicating the university-wide relationships between class size and GPA, measured within department and level, and outside of education departments. I use the estimates of $\beta_1$ from each university to adjust the classroom-level grade reports for differences in class size, meaning that any remaining differences will be attributable to something else.
7 Arcidiacono (2004) reports average math and verbal SAT scores for science, business, social-science/humanities and education majors upon college entry. For math, the average scores by major are 566, 498, 500, and 458, respectively; the average verbal scores are 499, 444, 481 and 431. The College Board (2010) provides an even more detailed comparison using more recent data; the SAT-score gaps reported by The College Board (2010) are very similar to those reported by Arcidiacono (2004).
Perhaps not surprisingly, $\beta_1$ is negative and statistically significant in all three regressions, which means that smaller classes are indeed associated with higher grades. The association is likely attributable to both causal and non-causal factors. For example, smaller classes may cause better grades by improving instructors' abilities to monitor students, facilitating different instructional philosophies, or by increasing the social stigma attached to poor performance. Conversely, a non-causal explanation is that departments teach selective classes on advanced topics, and purposefully limit enrollment to their most able students. If the relationship between class size and GPA is entirely causal, adjusting the GPA gaps for the differences in class sizes between education and non-education departments will provide a more accurate comparison. However, to the extent that the correlation is not causal, and simply represents the effect of student sorting by class size, adjusting the GPA gaps based on the output from equation (3) will likely overcompensate for the structural component, and understate the grade-inflation problem. 8
Table 3 reports GPA gaps that are adjusted for the class-size discrepancies between education and non-education departments using the estimates of $\beta_1$ from equation (3). Although the adjusted GPA gaps are slightly smaller, they are still large. This suggests that the GPA gaps are not explained by differences in class sizes between education and non-education departments.
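The class-size adjustment can be sketched in a few lines. The example below fits the class-size coefficient from equation (3) by ordinary least squares on non-education classrooms, using dummy variables for semester and department-by-level, and then shifts an education classroom's GPA to a common reference class size. The data, the `Classroom` record, the function names, the reference size of 40, and the linear form of the adjustment are all assumptions for illustration; the text does not spell out the exact adjustment formula. A small pure-Python solver is used so the sketch has no dependencies, whereas an applied analysis would normally rely on a regression package.

```python
from collections import namedtuple

Classroom = namedtuple("Classroom", "gpa enrollment semester dept_level is_education")


def solve_ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def class_size_slope(classrooms):
    """Estimate the enrollment coefficient, omitting education classrooms."""
    sample = [c for c in classrooms if not c.is_education]
    # The first category of each fixed effect serves as the reference group.
    semesters = sorted({c.semester for c in sample})[1:]
    groups = sorted({c.dept_level for c in sample})[1:]
    X = [[1.0, float(c.enrollment)]
         + [1.0 if c.semester == s else 0.0 for s in semesters]
         + [1.0 if c.dept_level == g else 0.0 for g in groups]
         for c in sample]
    y = [c.gpa for c in sample]
    return solve_ols(X, y)[1]


def adjust_gpa(gpa, enrollment, beta1, reference_size):
    """Shift a classroom GPA to the value the fitted slope implies at reference_size."""
    return gpa - beta1 * (enrollment - reference_size)


# Hypothetical classrooms, constructed so that GPA falls by 0.01 per enrolled student.
classrooms = [
    Classroom(2.8, 20, "fall", "math-2", False),
    Classroom(2.7, 40, "spring", "math-2", False),
    Classroom(2.4, 60, "fall", "math-2", False),
    Classroom(3.0, 30, "spring", "hist-3", False),
    Classroom(2.7, 50, "fall", "hist-3", False),
    Classroom(2.4, 90, "spring", "hist-3", False),
    Classroom(3.7, 20, "fall", "educ-3", True),
]
beta1 = class_size_slope(classrooms)
adjusted_education_gpa = adjust_gpa(3.7, 20, beta1, reference_size=40)
```

Because the estimated slope is negative, moving a small education classroom to a larger reference size lowers its adjusted GPA, which is why the adjusted gaps in Table 3 are smaller than the raw gaps.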
It may also be that fundamental differences in instructional practice between education and non-education departments influence the grade distributions. One possible difference is in instructional philosophy. As a specific example, the mastery-learning framework is likely to be more common in education departments. Within the mastery-learning framework, class topics are handled discretely and students who have difficulty with a given topic receive additional instruction until they succeed. The underlying philosophy that all students can master the topics of the course suggests that grades in courses that are taught under the mastery-learning framework will be higher. While in principle any professor in any discipline can adopt the mastery-learning approach, or something similar, mastery-learning courses are likely to be more common in education departments because (1) the sizeable research literature on mastery learning is in education and (2) the mastery-learning approach may be a more reasonable fit for education classes based on course content. 9
A second issue related to instructional practice involves the prevalence of practice-based and/or internship courses in degree programs in education and non-education departments. Relative to other academic disciplines, the nature of the training in education departments is likely to require more practice-based courses. Grades in such courses can either be assigned as pass/fail, in which case they are not factored into student GPAs, or as letter grades. In the latter case, it seems likely that "A's" will be commonly awarded to indicate satisfactory completion, meaning that such classes will cause some grade inflation.
Unfortunately, I cannot investigate the effects on the grade distributions of differences in instructional philosophies across academic departments because data are not available (for education or other departments). However, this seems like a logical starting point for future work that attempts to identify why the grading discrepancies that I document above exist. Alternatively, using data from the University of Missouri and Miami University, where course descriptions are available, I can evaluate the role of practice-based courses in determining the education-department grade distributions. 10 At Miami University, the courses that are clearly labeled as practice-based courses in the administrative data are graded on a pass-fail basis, which means that they are not included in the analysis above. At the University of Missouri, the practice-based courses do appear to be graded; if I omit them from the GPA calculations, the average course-level GPA for the Missouri education department declines by just 0.02 grade points. Clearly, the grading discrepancies are not driven by differences across departments in the prevalence of practice-based courses.
Finally, I briefly raise a conceptual issue regarding mechanisms. While it is important to understand the mechanisms that underlie the grade distributions, only in cases where the mechanism is outside of the control of the faculty in academic departments is it reasonable to make an adjustment to the grade distributions. The first two mechanisms considered above (departmental differences in student quality and class size) are plausibly outside of faculty control. 11 However, the latter two mechanisms (differences in instructional philosophy and the prevalence and grading of practice-based courses) are determined by faculty. For example, any professor in any discipline can adopt a teaching philosophy that leads to higher student grades. Similarly, departments have some discretion over the role and prevalence of practice-based courses in their degree programs, and equally importantly, over how these courses are graded (i.e., by letter grade or pass-fail). The choices that are made by faculty across departments contribute to the across-department grading discrepancies that are highlighted by this study.

III. Potential Implications of the Low Grading Standards in Education Departments
The previous section documents sizeable differences in the grading standards between education and non-education departments. Unfortunately, an empirical evaluation of the policy implications seems virtually impossible given that the counterfactual of more-stringent grading standards does not appear to exist in a meaningful way. For example, the low grading standards in education departments may adversely affect the quality of the K-12 teaching workforce; however, the data lack sufficient variation to investigate this possibility.
Although I cannot provide direct evidence on the link between the grading standards in [...]
First consider the case where each education department from the convenience sample raised its grading standards to be in line with the next-highest grading department at the same university. The corresponding reductions in classroom-level average GPAs in the education departments at Indiana, Miami and Missouri would be 0.42, 0.26 and 0.54, respectively. Applying Babcock's estimate, these differences would correspond to effort increases of between 5 and 11 percent by undergraduate education students. At the other extreme, if the education departments raised their grading standards to be in line with the lowest-grading departments at their respective universities, effort would increase by 17 to 23 percent. Perhaps a reasonable expectation is that education departments bring their grading standards in line with humanities departments, which appear to most closely resemble education departments in terms of class sizes and grades. In this case, student effort in education departments would increase by 11 to 13 percent.
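The arithmetic behind these effort figures can be sketched as follows. The response of roughly 20 percent more effort per one-point reduction in GPA is an assumption: it is not stated in this text and is chosen only because it reproduces the reported 5 to 11 percent range from the three GPA reductions above. The linear scaling and the function name are likewise illustrative assumptions.

```python
# Assumed response: about a 20 percent rise in effort per one-point fall in GPA.
# This value is inferred from the figures reported in the text; it is not quoted
# from Babcock (2010).
ASSUMED_EFFORT_RESPONSE = 0.20


def predicted_effort_increase(gpa_reduction, response=ASSUMED_EFFORT_RESPONSE):
    """Proportional effort increase implied by a GPA reduction (linear scaling assumed)."""
    return response * gpa_reduction


# GPA reductions reported in the text for the next-highest-grading scenario.
gpa_reductions = {"Indiana": 0.42, "Miami": 0.26, "Missouri": 0.54}

for university, reduction in gpa_reductions.items():
    pct = 100 * predicted_effort_increase(reduction)
    print(f"{university}: {pct:.1f} percent")
```

Under this assumed response, the smallest reduction (Miami) implies an effort increase of about 5 percent and the largest (Missouri) about 11 percent, in line with the range given above.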
Under fairly modest assumptions, increases in effort during prospective teachers' undergraduate training will increase teacher quality in K-12 schools. First, trivially, increased effort must correspond to increased learning, which appears to be the case (general evidence is available from Stinebrickner and Stinebrickner, 2008). Second, it must be that either (1) a better understanding of the content of classes taught in education departments improves teacher quality, or (2) that indirectly, teachers gain other skills as a result of a more-demanding college experience (e.g., skills in time-management or improved work ethics). I am not aware of any direct evidence that confirms either of these latter points, although recent data collection efforts will facilitate studies that can provide insight in the near future. 13
There may also be other consequences of the low grading standards in education departments. For one, grades do not provide meaningful information to students about their relative performance in education classes, meaning that students cannot use their grades to evaluate their fit in the discipline. Among the general college population, Arcidiacono (2004) empirically establishes that grades play a role in helping students sort into college majors. In education, however, students who are not a good fit for the discipline will receive no indication that this is the case from their grade reports.
Another possibility that merits attention from policymakers and higher-education administrators is that the low grading standards in education departments may contribute to the culture of low evaluation standards in education more generally. Although the existence of such a link is merely speculative at this point, there is a striking similarity between the favorable grades awarded to prospective teachers during university training and the favorable evaluations that teachers receive in K-12 schools. To illustrate the low evaluation standards for K-12 teachers I draw on several studies in the literature. First, Jacob and Lefgren (2008) show that school principals consistently award favorable ratings to teachers. In their study, principals from a Midwestern school district were asked to rate each teacher's overall effectiveness on a 10-point scale. Principals were given the following descriptions to guide them in assigning their ratings:
1-2: Inadequate - The teacher performs substantially below minimal standards.
3-5: Adequate - The teacher meets minimal standards (but could make substantial improvements).
6-8: Very good - The teacher is highly effective.
9-10: Exceptional - The teacher is among the best I have ever seen (e.g., in the top 1% of teachers).
Figure 3 shows the distribution of principals' actual ratings of teachers. As can be seen from the figure, the evaluations are overwhelmingly positive. Even the 30th-percentile teacher, who is well below the median, received an eight; and roughly 40 percent of the teachers received an "exceptional" rating, described as indicating that the teacher's performance is in the top 1 percent.
Findings consistent with those from Jacob and Lefgren (2008) are available elsewhere. For example, Harris and Sass (2010) survey principals in a different school district and obtain similar results (Harris and Sass (2010) report that the average teacher rating on a nine-point scale exceeds 7.0). 14 And in 2009, The New Teacher Project documented teacher evaluations at 12 school districts across four states (TNTP, 2009). The TNTP report notes that "in districts that use binary evaluation ratings (generally "satisfactory" or "unsatisfactory"), more than 99 percent of teachers receive the satisfactory rating. Districts that use a broader range of rating options do little better; in these districts, 94 percent of teachers receive one of the top two ratings and less than 1 percent are rated unsatisfactory" (TNTP, p. 6).
Murphy and Cleveland (1991) suggest a mechanism by which the low grading standards in education departments may affect teacher evaluations in K-12 schools. Namely, the low grading standards may contribute to a cultural norm within the education sector: Murphy and Cleveland (1991) write "in an organization where the norm is to give high ratings, the rater who defies the norm might experience disapproval from his or her peers…pressures for non-conformity may be a significant factor in rating inflation" (Murphy and Cleveland, p. 197). Murphy and Cleveland (1991) also suggest that strongly held beliefs about how appraisals should be done within an organization "may make it difficult to change the appraisal system …" (Murphy and Cleveland, p. 181). Although no causal link can be established given current data, it is worth considering the possibility that prospective teachers' lack of exposure to critical evaluations during their university training translates into expectations that they continue to receive non-critical evaluations in the workforce. 15

IV. Conclusion
This paper uses recent administrative grade data to document large GPA gaps between education departments and other academic departments at universities. Classroom-level average GPAs in education departments are on the order of 0.5 to 0.8 grade points higher than in other departments. The GPA gaps do not appear to be explained by differences in student quality across departments, nor are they driven by the fact that classes in education departments are typically smaller. The remaining explanation is that the higher GPAs in education classes are the result of low grading standards in education departments.
15 There is a large literature in economics, management and psychology showing that performance evaluations for all workers tend to be inflated and compressed, so this problem is not unique to education (see, for example, Murphy and Cleveland, 1991). Nonetheless, the evaluations for teachers seem particularly lenient and compressed (TNTP, 2009). It is noteworthy that some of the checks and balances that reduce ratings leniency and compression in the private sector are absent in the public sector, including education (for example, see Murphy and Cleveland, 1991; Prendergast, 2002).
The contribution of this study is to empirically document the low grading standards in education departments. Much work remains in the areas of identifying mechanisms and understanding policy consequences. In terms of the former, the issue of differences across departments in instructional philosophy seems like an obvious starting point. Additionally, a better understanding of how faculty in different departments perceive the role of grades may provide useful insights.
Understanding the policy consequences of the favorable grades awarded by education departments is also important. Because the vast majority of education majors go on to work as classroom teachers, a first-order issue is to determine if and how the low grading standards in education departments affect teacher quality in K-12 schools. Based on the larger research literature I suggest some of the most likely possibilities. These include that the low grading standards (1) reduce human-capital accumulation during college for prospective teachers, (2) result in inaccurate performance signals being sent to students in education classes, and (3) affect evaluation standards for teachers in the workforce. There is a considerable research basis for making the connections in (1) and (2), although again, there is no direct evidence. Linking the low grading standards in education departments to the low evaluation standards for teachers in the workforce is more speculative, although there is some support in the literature for this possibility as well.
In conclusion, the rationale for the low grading standards in education departments is unclear. Rather than asking why these grading standards should be changed, perhaps the more reasonable policy question is this: why shouldn't the grading standards in education departments be changed? Beyond noting that the current system has considerable inertia, what benefits does it confer? Or, put differently, if we were to start over with university education, and could choose the grading distributions in each discipline, would we choose the currently-observed discrepancy between education departments and all other academic departments at universities?

Note: The average course-level GPA across education departments is approximately 3.66. If the course-level grades are weighted by enrollment within departments, then averaged across departments, the average is 3.60.

Figure 2
Figure 2 confirms that the grade distributions for the education departments in Figure 1 are [...]

Figure 1. Probability Density Functions of Classroom-Level GPAs by University and Department, Shown from GPA = 1 to GPA = 4 (Education Departments - solid lines; Math and Science Departments - dashed lines; Social Science Departments - x's; Humanities Departments - circles). Panel: Indiana University.

Figure 3. Distribution of Principals' Ratings of Teachers on a 10-Point Scale, taken from Jacob and Lefgren (2008).

Table 1. Average Class Sizes, Numbers of Classrooms Observed, and Numbers of Student-by-Classroom Observations that Underlie the Grade Report Data by University and Department.
* Miami University (OH) does not have a biology department and combines its chemistry and biochemistry departments. In place of the biology department, I use the microbiology department, and I report data from the "chemistry and biochemistry" department under the "chemistry" label.

Table 2. GPA Comparisons Between Education and Other Department Groups.
Notes: Majors included in the "Math, Science, Economics" group are biology, chemistry, computer science, economics, math and physics. Majors included in the "Social Sciences" group are political science, psychology and sociology. Majors included in the "Humanities" group are English, history and philosophy. Appendix Table A.1 reports disaggregated departmental data.

Table 3. Class-Size Adjusted GPA Gaps between Education Departments and Each Department Group at Each University, Based on Simple Averages.

Table 4. Predicted Effort Responses to Adjustments to Grading Standards in Education Departments, Based on Simple Averages.