How Teacher Rotation in Japanese High Schools Affects the Clustering of Teacher Quality: Comparing the Distribution of Teachers across Public and Private Education Sectors

I examine a unique facet of Japan’s public education system: jinji idou, a mandatory teacher rotation system governed by the prefectural board of education where teachers are systematically transferred to other schools throughout their careers to appropriately staff schools, facilitate varied career paths, and identify future leaders for administrative roles. Although not a formal goal, this centralized system may also produce a more equal distribution of teacher quality across schools compared to the decentralized teacher labor market found in private schools. Because this system is present in public schools and absent in private schools, comparing sector differences offers a look at its impact on teacher quality distribution. Using a sample of 1,456 teachers nested in 49 schools, private vs. public group comparison tests indicate that, for most of the teacher quality traits examined, the public sector distributes teachers more equitably. Furthermore, the public sector has higher mean levels of teacher quality, intimating that education labor markets can be structured in ways that simultaneously minimize variation between schools without hindering quality, findings germane to scholars interested in educational equality.

This research was funded in part by the following agencies or fellowships: Fulbright Fellowship, National Science Foundation, Japan Society for the Promotion of Science, National Council on Teacher Quality, Boren Fellowship, and the University of Arizona (Social and Behavioral Sciences).

Education Policy Analysis Archives Vol. 29 No. 91


I investigate the organization of education labor markets in Japan: how the careers of educators are managed and how this affects teacher distribution. This is predicated on four claims. First, teacher quality is a predominant causal variable affecting student performance, outweighing others such as class size, student background, or spending (Darling-Hammond, 2000; Rice, 2003). Second, there is an unequal distribution of quality teachers in decentralized education labor markets, such as in the US, that disfavors underprivileged students (Borman & Kimball, 2005). Third, this unequal distribution is caused by policies that encourage quality teachers to gravitate to affluent schools while less qualified teachers remain at poor schools (Peske & Haycock, 2006). Fourth, overall Japan's education system is more egalitarian in that socioeconomically disadvantaged students have equal access to quality teachers along with their advantaged peers, unlike in the US (PISA, 2010, p. 156), a country with one of the largest opportunity gaps in student access to qualified teachers (Akiba et al., 2007).
Teacher quality matters greatly for student performance, and the localized structure of the U.S. system has produced a disparity in opportunities. Together, these points justify examining Japan's centrally controlled system as an alternative organizational approach to providing equal educational opportunities. One explanation for the maldistribution of teacher quality in the US is its laissez-faire teacher labor market, in which educators largely have autonomy over where they work (Prince, 2002). This locally controlled system has produced policies inhibiting educational egalitarianism, such as funding laws that enable schools in wealthy cities to offer higher salaries, teacher transfer restrictions that make it difficult to move to low-achieving districts, or seniority clauses enabling experienced teachers to choose their placements (Prince, 2002). Akiba and LeTendre (2009) argue that the decentralization of education systems in the U.S. has created a disparity in school quality and educational opportunities around socioeconomic status when contrasted with Japan's centralized public school system, which enables a broader approach to teacher hiring and placement.
A possible solution to the maldistribution in teacher quality found in the US is to adopt systematic teacher transfers similar to jinji idou in Japan (Letendre, 2000; White, 1987; Wray, 1999), a system "specifically designed to reduce school-based differences in overall teacher quality" by assigning and periodically rotating public school teachers to other schools throughout their careers (Akiba et al., 2007, p. 381). This system has been lauded by qualitative scholars for creating a more equal distribution of teachers by preventing the clustering of quality teachers at certain schools (Letendre, 2000; White, 1987; Wray, 1999). For instance, Wray (1999, p. 31) asserts that the systematic rotation of teachers prevents "the highest ranked schools from always having the best teachers" and ensures that rural areas have a fair and consistent supply of teachers. These qualitative studies suggest that teacher rotation creates a more equal distribution of teacher quality, which should create more equal educational opportunities for students.
However, while extant studies on Japan's teacher rotation system laud its benefits, there is a paucity of comparative, quantitative analyses of its effect on teacher distribution, particularly in English. Japanese studies on the topic tend to focus qualitatively on how the system works (Choshi, 2015) or how it affects teachers' career paths (Fukuya, 2015) and professional development (Kawakami & Senoh, 2011) rather than its impact on teacher distribution. I therefore complement these studies by quantitatively comparing the distribution of teachers across public and private schools in Japan. Because jinji idou is absent in private schools, a comparison of the distribution of teacher quality across schools in the private sector with its public sector counterpart produces a comparative design that increases the ability to credit it as a causal condition affecting teacher distribution, as there is ultimately only one factor governing where teachers work: either teachers have the autonomy to choose individual schools (as in the private sector) or they do not (as in the public sector). Finally, comparing systems within a prefecture in Japan mitigates concern over externalities as lurking variables. Following that, this paper fills a significant gap in the literature by conducting the first large-scale, quantitative analysis of the distribution of teacher quality in Japan that is written in English.

Background
Teacher Quality and its Import
There is consensus among recent education research that, controlling for socioeconomic status and other student factors, school-level variables impact student performance, with particularly strong effects for teacher characteristics (Darling-Hammond, 2000; Prince, 2002; Rice, 2003; Rivkin et al., 2005; Sanders & Rivers, 1996; Wright et al., 1997). For example, Wright et al. (1997, p. 66) conducted a multivariate, longitudinal, mixed-model analysis of school resources and student achievement across multiple subjects and grades, finding that "teacher effects are dominant factors affecting student academic gain." Furthermore, in their longitudinal study of a cohort of elementary students, Sanders and Rivers (1996) found the effects of teacher quality on student achievement are additive and cumulative: while individual students did not recover after a year under an ineffective teacher, students who spent a year under an effective teacher experienced benefits up to two years later. The authors conclude that teacher quality is more highly correlated with student achievement than other variables such as student traits like 'race' or socioeconomic status. Similarly, in a large-scale study on teacher quality and educational equality, Borman and Kimball (2005) used multilevel models to show that classes taught by higher quality teachers produced higher mean achievement than those taught by lower quality teachers.
Most education systems emphasize quality teaching as a means of enhancing student achievement, but there is no accord on what constitutes teaching quality (Croninger et al., 2007; Rice, 2003; Rivkin et al., 2005). Consequently, in this paper I analyze a variety of common teacher quality measures: full-time teacher status, certification status, years of experience, teaching in-field, holding an advanced degree, and the prestige of one's alma mater. Hereafter I describe each.
Full-Time Status: The rationale behind using full-time status as a teacher quality measure is severalfold. First, full-time employees are likely to be more committed to their jobs and employers, just as those employers are also more likely to be committed to them, points especially relevant to Japan (Ouchi, 1981). This is evident in how full-time teachers are better remunerated (Ishikida, 2005; Okano & Tsuchiya, 1999). Second, one way to capture unobserved traits such as teacher quality is to proxy them with another concept, in this case full-time teacher status. This is particularly true in the Japanese prefecture studied here, where becoming a full-time teacher requires passing a test and an interview for a limited number of slots (Seebruck, 2019). The thought is that the administrators know a great deal about the teachers they hire and make hiring decisions based on their judgments of those teachers. This has support in the literature, with Jacob and Lefgren (2008) finding that school principal rankings of teachers are superior predictors of teacher quality than are observed teacher traits (although it is not clear if this proxy is equally robust in the US and Japan).
Certification: Teacher certification is the bestowal of a formal document by an accredited institution recognizing one's ability to teach in a particular jurisdiction. While many teacher quality measures are thought to contribute to the distinction between a high and low quality teacher, certification status is often considered the most reliable predictor of student achievement as becoming certified as a teacher in most U.S. states requires formal education in a state-approved education program, the completion of either a major or minor in the subject field, plus minimum satisfaction of education credits and student teaching credits (Darling-Hammond, 2000). Certification requirements are similarly stringent in Japan (MEXT, 2016) and certification status factors into teacher rotation decisions (Fukuya, 2015). Because of the strictness and thoroughness of these requirements, certification status is one of the strongest indicators of teacher performance (Seebruck, 2015).
Experience: Years of experience is viewed as a proxy for on-the-job training (Harris & Sass, 2011), and studies have consistently found a positive relationship between teacher efficacy and years of experience (Klitgaard & Hall, 1975; Murnane & Phillips, 1981). Darling-Hammond (1995) notes that experienced teachers are more effective at resolving classroom problems, maintaining discipline, motivating students, and adapting to students' diverse learning needs. Clotfelter et al. (2007), using value-added models to analyze the effects of teacher characteristics on student achievement, found that teacher experience has positive effects. Likewise, Fetler (1999) found that student achievement in mathematics significantly correlates with teacher experience. Pupils of first-year teachers learn less, on average, than pupils of more experienced teachers (Boyd et al., 2008). Simply put, experienced teachers are more knowledgeable about curriculum, instruction, and assessment (Prince, 2002). In Japan, years of experience factors into teacher development (MEXT, 2016) and rotation decisions (Seebruck, 2019) and correlates with teacher salaries (Okano & Tsuchiya, 1999).
Novice: There is ambiguity regarding the causal relationship between teacher experience and student achievement: not about whether experience matters but about how much experience matters. Buddin and Zamarro (2009) contend that the impact of teacher experience is more of a dichotomy between novices and veterans than a linear progression. Correspondingly, Rice (2003) notes that efficacy gains are particularly salient for those in their first few years of teaching. Although years of experience correlates with student achievement (Fetler, 1999), evidence suggests a flattening slope in their relationship. For instance, while beginning teachers (those with zero to three years of experience) at the high school level are less effective than those with more than three years of experience, there is no difference between teachers with three to five years of experience and those with 25 years of experience (Clotfelter et al., 2010).
In-Field: Another common teacher quality measure is whether one teaches in their field of expertise, as teaching in-field positively contributes to education outcomes for students (Rice, 2003). Rowan, Chiang, and Miller (1997) tested the impact of the subject matter of teachers' degrees, finding that mathematics teachers with a degree in mathematics positively predicted student achievement among high school students. Several other studies have also found that subject-specific training significantly influences high school students' math and science performance (Monk, 1994; Wenglinsky, 2002).
Advanced Degree: Having an advanced degree such as a master's presumably indicates a higher quality teacher since obtaining one requires years of additional coursework and training; but numerous studies argue that teacher quality appears unrelated to advanced degree status (Croninger et al., 2007; Hanushek et al., 2005; Ladd & Sorensen, 2015). One explanation for the mixed findings is the conflation of effects at the elementary and high school levels. For example, Rice (2003) claims that, although the evidence between holding an advanced degree and student achievement is mixed at the elementary school level, at the high school level it is clearer, with evidence showing positive effects on science and math achievement. As this study surveys high schools, I include advanced degree status as a teacher quality measurement.
Alma Mater: Many education scholars consider teachers' alma mater as a stand-in for unobservable traits tied to teacher quality, such as intelligence or ambition. As Rice (2003) notes, studies suggest the selectivity or prestige of one's alma mater has a positive impact on student performance, particularly for high school students. Summers and Wolfe (1975) examined a random sample of nearly 1,900 urban middle school and high school students and found, after controlling for other factors, that the selectivity of the teacher's alma mater was significantly related to teacher efficacy: pupils taught by instructors who graduated from higher-ranked universities fared better, on average, than their peers whose instructors attended lower-ranked universities.

Teacher Quality and its Distribution
As important as teacher quality is to student success, quality teachers are not distributed equally across U.S. school districts (Prince, 2002). This discrepancy in access to quality teachers contributes to achievement gaps between racial and socioeconomic groups in the US (Clotfelter et al., 2010), disadvantaging lower-class and minority youth (Borman & Kimball, 2005; Peske & Haycock, 2006; Seebruck, 2016). Lankford, Loeb, and Wyckoff (2002) stress there is an uneven sorting of teachers across schools, with the least qualified ones clustering at schools with higher shares of disadvantaged and under-achieving students. Not only have these discrepancies in access to equal educational resources remained consistent over time but, in many cases, the gaps are widening despite an elevated priority among politicians in redressing them (Barton & Coley, 2009).
Several qualitative researchers have suggested that jinji idou, the mandatory teacher rotation system of Japan's centrally controlled education labor market, produces a more equal distribution of quality teachers by preventing the clustering of quality teachers at certain schools (Letendre, 2000; White, 1987; Wray, 1999). In this system the prefectural board of education rotates public school teachers to other schools in the prefecture approximately every five years, throughout their entire careers. Kariya (2011, p. 247) describes teacher rotation at the middle school level as a measure to ensure equity: "to equalize quality of teaching between schools." Teacher rotation occurs at the high school level too but, unlike at the elementary and middle school levels, high schools are more stratified due to the requirement for matriculates to pass entrance examinations (Ono, 2001; Rohlen, 1983). Consequently, I analyze the impact of jinji idou on the distribution of teachers at the high school level, where, because of its tiered nature, it is less clear if the top-down allocation of personnel will remain as equitable as it is at the elementary and middle school levels.
This study is therefore motivated by the qualitative studies described above arguing that teacher quality distribution in Japan's public education system does not suffer from the same clustering seen in U.S. districts but, rather, provides more equal access to qualified teachers on a variety of measures, according to national-level statistics (Akiba et al., 2007). That is, this study aims to examine quantitatively whether the teacher rotation system's expected equitability holds true at the high school level, which is particularly important in Japan given the key role high schools play in determining social placement in adulthood (Rohlen, 1983). Following that, I examine the distribution of teachers across high schools in a specific prefecture to determine if the different organizational structures of private and public sector teacher labor markets in Japan explain differences in teacher quality distribution.

Data
Expectations
The purpose of this paper is to investigate the internal distribution of high school teachers across sectors in a Japanese prefecture, to determine if the mandatory teacher rotation system present in the public sector but absent in the private sector results in significant differences. Within Japan's 47 prefectures, the prefecture studied here ranks in the top fifteen in land area, population, and population density, and the high school education system comprises over 100,000 students and nearly 7,000 full-time teachers (Statistics Japan, 2013). There are a variety of formal and informal policies governing systematic rotation of public school teachers in this prefecture (for details on teacher rotation in this prefecture, see Seebruck, 2019), but what is important to this paper is that private school teachers have autonomy over where they work whereas public school teachers' career paths are decided by the prefectural board of education.
I have two expectations based on qualitative data. First, the public education sector in the Japanese prefecture surveyed is seen as being of higher quality than the private sector. Given the strong association between teacher quality and student performance, I expect the public sector to have a higher average level of teacher quality. This proposition is stated formally:
P1: The public education sector will have higher average teacher quality, compared to the private sector.
Second, the mandatory, systematic teacher transfer system that is present in the public sector but absent in the private sector should lead to a less clustered teacher quality distribution in the public sector. The rationale stems from qualitative studies arguing Japan's mandatory teacher rotation system produces a less clustered distribution of teachers (Letendre, 2000; White, 1987; Wray, 1999). This proposition is stated formally:
P2: The public education sector should have a smaller variance in the between-school distribution in teacher quality, compared to the private sector.
To collect the quantitative data needed to test these propositions, I designed teacher- and school-level questionnaires and surveyed high schools from 2011 to 2012.

Survey
I surveyed high schools in a Japanese prefecture using a disproportionate stratified random sample without replacement of 'normal' high schools, stratified on sector (public, private) and region (east, central, west). By 'normal' I mean standard academic high schools, as opposed to vocational, correspondence, part-time, night, or branch schools. There are 83 such schools, 51 public and 32 private.
Based on my experience working in the Japanese public education system, along with my pilot research on the jinji idou system in the years prior and my understanding of the literature on teacher quality distribution, I created a multi-faceted questionnaire designed to gather individual-level data on teacher quality traits (described below). The paper-based questionnaire was written in Japanese and edited by native speakers to improve its readability. In addition to a teacher-level questionnaire, I also constructed a school-level questionnaire aimed at providing population data for each school, to aid in statistical weighting procedures.
The survey resulted in a final sample of forty-nine schools for a final response rate of 72.1% at the organizational level. Each school averaged nearly 30 respondents, with an average teacher-level response rate of 60.2%, a minimum of 26%, and a maximum of 100%. Thus, the final analytic sample comprises 1,456 teachers: 794 teachers in 23 private schools and 662 teachers in 26 public schools. Prior to surveying, I conducted power analyses to determine the minimum sample size needed to produce a representative sample with a normal distribution (Cohen, 1992). The power analyses revealed that, with a Level 2 sample size of 49 schools, an average Level 1 sample size of 30 teachers per school, and a maximum intraclass correlation coefficient (ICC) of 0.146, the final sample has an estimated power greater than 0.98, which is well above the standard cutoff of 0.80.
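To illustrate why the ICC enters the power calculation, the following minimal sketch computes the standard Kish design effect for clustered samples, 1 + (m - 1) x ICC, using the figures reported above. It is only an illustration under the assumption of equal-sized clusters; the text does not state which power procedure or software was used, and the function names here are hypothetical.

```python
def design_effect(avg_cluster_size: float, icc: float) -> float:
    """Kish design effect for (roughly) equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1.0 + (avg_cluster_size - 1.0) * icc


def effective_sample_size(n_total: int, avg_cluster_size: float, icc: float) -> float:
    """Number of independent observations the clustered sample is 'worth'."""
    return n_total / design_effect(avg_cluster_size, icc)


# Figures reported in the text: 1,456 teachers, about 30 per school, maximum ICC of 0.146.
deff = design_effect(30, 0.146)
n_eff = effective_sample_size(1456, 30, 0.146)
print(f"design effect = {deff:.3f}, effective n = {n_eff:.1f}")
```

Under these assumptions, clustering within schools reduces the information in the sample considerably, which is why the number of schools (Level 2 units) matters as much as the number of teachers.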

Operationalization
The questionnaire gathered information on the following teacher traits commonly used in the literature as an indicator of teacher quality: full-time status, certification status, years of experience or status as a novice, teaching in-field, holding an advanced degree, and the prestige of one's alma mater. I test these indicators both separately and as an index (described below).
Full-time status is a dichotomous variable coded as 1 for those who are a full-time teacher and coded 0 for those who are not. It stems from the question, "What is your current employment status?" Of the 1,453 teachers who responded to this question, 1,158 were full-time teachers (79.7%), 137 were full-time lecturers (9.4%), and 158 were part-time lecturers (10.9%). To dichotomize these responses, I collapsed the second and third options (lecturers), coding them as 0 and leaving full-time teachers coded as 1.
Certification status is a dichotomous variable coded as 1 for those who hold the highest level of teacher certification and coded 0 for those who do not. It stems from the question, "What type of teaching certificate do you have?" Of the 1,383 teachers who responded, 216 held a specialized certificate (15.6%), 1,036 held a primary license (74.9%), and 131 held a secondary license (9.5%). To dichotomize these responses, I coded as 1 those who had a specialized certificate, indicating the highest level of certification, and coded as 0 everyone else.
Years of Experience is a continuous measure consisting of total years of teaching experience. It stems from the prompt, "Write your total years of teaching experience below (include both public and private school experience, part-time and full-time, as well as experience at different school levels)." Of the 1,449 teachers who responded to this question, the mean years of teaching experience was 18.2 (with a standard deviation of 11.6 years), with a minimum of 0 years of experience (i.e. a first-year teacher) and a maximum of 52 years of experience. The 25th, 50th, and 75th percentiles were 8, 19, and 28 years.
Status as a Novice is an alternative, non-linear measure of teacher experience: a dichotomous variable distinguishing beginning teachers and non-beginning teachers, demarcated at having at least three full years of experience. This threshold is based on evidence of when teachers begin to become more effective (Clotfelter et al., 2010; Hanushek et al., 2005; Klitgaard & Hall, 1975; Murnane & Phillips, 1981). This three-year demarcation also fits with qualitative examinations of teacher rotation systems in Japan as an important cutoff (Fukuya, 2015; Seebruck, 2019). Teachers amidst their first, second, or third year of teaching were coded as 1; those amidst their fourth year of teaching or beyond were coded as 0. Of the 1,449 teachers who responded, 159 were defined as novices (11%) whereas 1,290 reported having three or more full years of teaching experience (89%).
Teaching In-Field is a dichotomous variable based on teachers' responses to two questions: "What is the primary subject you currently teach?" and "Is this the same subject as your college major?" Teachers who answered 'yes' to the latter were coded as 1 for teaching in their field of expertise and coded as 0 if they answered 'no.' Of the 1,420 teachers who responded, 1,262 reported that they were primarily teaching in their field of specialization (88.9%) and 158 reported that the primary subject they were currently teaching was not the same as their field of specialization (11.1%).
Advanced Degree is a dichotomous variable collapsed from the question "What is the highest degree you hold?" To dichotomize educational attainment, I coded as 1 those having either a master's or a doctoral degree (246 respondents, or 17%) and coded as 0 those teachers who had an associate or bachelor's degree (1,200 respondents, or 83%).
Alma Mater Prestige is a continuous measure based on Shimano's (2009) annual rankings of Japanese universities, running from 1 to 10 (low to high prestige). Shimano's rankings are based on hensachi ("deviation values" stemming from a university's acceptance rate), which are considered the most renowned college ranking system in Japan (Masuda, 2003). Shimano relies on the hensachi rankings compiled by Yoyogi Seminar, which is one of the largest yobikou (for-profit, private college preparatory schools; Rohlen, 1983) in Japan that are officially recognized by the Ministry of Education, Culture, Sports, Science and Technology (Blumenthal, 1992). Shimano improves the Yoyogi Seminar scores via university-wide aggregation and temporal and sectoral adjustments. Shimano adjusts for changes in scores over time (since the range in hensachi scores has contracted over the past few decades) by assigning schools an ordinal ranking based on their temporally contextual hensachi scores. Shimano's rankings also adjust for the fact that hensachi scores are higher in private universities due to the different types of entrance examinations issued by these schools. Finally, instead of reporting only disparate scores for colleges within universities like other reporting agencies, Shimano unifies his rankings, providing one score for each university. Of the 1,129 teachers who responded, there was a mean score of 8.3, a standard deviation of 1.7, a minimum of 2.5 and a maximum of 10. The 25th, 50th, and 75th percentiles saw prestige scores of 7.0, 9.0, and 9.5.
Teacher Quality Index (TQI) is an integer-scale index of teacher quality traits, as a means of capturing the combined effects of teacher quality, comprising six dichotomous variables delineated above: full-time, certified, veteran (i.e. not a novice), in-field, advanced degree, and prestigious alma mater (i.e. greater than 8.0 on Shimano's scale). These dichotomous variables are summed to create a composite score ranging from 0, indicating low teacher quality, to 6, indicating high teacher quality. For example, a full-time (1), certified (1), novice (0) with an advanced degree (1) from a nonprestigious institution (0) who is teaching in-field (1) would garner a 4 out of 6 on the index. Of the 1,067 teachers who had scores for all six constituents, 6 had a score of 0 (0.6%), 49 had a score of 1 (4.6%), 117 had a score of 2 (11.0%), 333 had a score of 3 (31.2%), 391 had a score of 4 (36.6%), 83 had a score of 5 (7.8%), and 88 had a score of 6 (8.3%). The average score was a 3.6, and the standard deviation was 1.2.
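The index construction described above can be sketched in a few lines. This is a minimal illustration, not the author's code: the `Teacher` record, its field names, and the degree labels are hypothetical, while the cutoffs (at least three full years for veteran status, prestige strictly greater than 8.0, specialized certificate as the top certification) follow the definitions given in this section.

```python
from dataclasses import dataclass

VETERAN_MIN_YEARS = 3      # at least three full years of experience = not a novice
PRESTIGE_CUTOFF = 8.0      # "prestigious" = greater than 8.0 on Shimano's scale
ADVANCED_DEGREES = {"master", "doctorate"}


@dataclass
class Teacher:
    # Hypothetical record layout; field names are illustrative only.
    full_time: bool
    specialized_certificate: bool
    years_experience: int
    in_field: bool
    highest_degree: str
    alma_mater_prestige: float


def teacher_quality_index(t: Teacher) -> int:
    """Sum six dichotomous indicators into a 0-6 composite score."""
    indicators = [
        t.full_time,
        t.specialized_certificate,
        t.years_experience >= VETERAN_MIN_YEARS,
        t.in_field,
        t.highest_degree in ADVANCED_DEGREES,
        t.alma_mater_prestige > PRESTIGE_CUTOFF,
    ]
    return sum(int(flag) for flag in indicators)


# The worked example from the text: full-time, certified, novice, advanced degree,
# nonprestigious alma mater, teaching in-field.
example = Teacher(True, True, 1, True, "master", 7.0)
print(teacher_quality_index(example))  # 4
```

Because the index is a simple unweighted sum, each trait contributes equally; a teacher missing any one of the six constituents would have no index score, consistent with the smaller analytic sample reported for the TQI.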

Overview
As the analyses are based on school-level traits, teacher-level responses are aggregated by school. The first analysis examines the sector-level grand mean in teacher quality (i.e. the overall average score for each teacher quality measure in the public sector versus the same in the private sector). The second analysis examines the sector-level, between-school grand coefficient of variation (i.e. the overall relative standard deviation between schools in the public sector versus the same in the private sector). In the case of dichotomous variables, the aggregated mean indicates the proportion of respondents satisfying the outcome. Consequently, this results in a variance that is the product of the probability of the outcome being true and one minus that probability, i.e. p(1 - p) (Janda, 2015). Calculating those sector-level grand means and variances enables inter-sector comparisons of these intra-sector, between-school differences (see Biggs, 1991). This makes it possible to compare both the average teacher quality in the public and private school sectors as well as inter-sector differences in how teacher quality is distributed across schools within these sectors.

Selection Bias
Because jinji idou is absent in private schools, a comparison with public schools increases the ability to credit it as a causal condition affecting teacher distribution as, ultimately, private school teachers choose where they work whereas public school teachers do not. Comparing systems within a single prefecture mitigates concern of external parameters clouding the results. That said, given the differences between the public and private education labor markets here (such as the compulsory teacher rotation system, the socioeconomic statuses of schools, etc.), there is the potential of self-selection bias obfuscating the results as different types of teachers and students may prefer one sector to the other, which could impact the main variable of interest: teacher quality distribution. However, my research design addresses the issue of self-selection bias in the following ways.
As the primary analysis compares the overall variance in the distribution of teacher quality in the public sector to the overall variance in the distribution of teacher quality in the private sector, it is therefore a comparison of the relative variation in each sector. This inter-sector comparison of intra-sector distributions mitigates concerns of self-selection bias across sectors by first obtaining the variance in teacher quality between schools in the same sector, and then comparing those variances across sectors to determine if there is an organizational effect on the variance of the distribution of teacher quality. In other words, my primary analyses are an inter-sector comparison of intra-sector variance, not a sector-level comparison of teacher quality variance. Put differently, regardless of differences between public and private school teachers, this study examines how giving one of those groups autonomy to choose where they work (i.e. an organic distribution) while the other group's career paths are centrally controlled (i.e. an artificial distribution) impacts how those teachers are dispersed across schools within their respective sectors.
Nevertheless, despite the analyses being inter-sector comparisons of intra-sector differences, selection bias could still confound those results if there is a drastic difference in the mean values of teacher quality across sectors, which could make comparisons of the variance difficult since standard deviations are based on the mean. A solution to this is to employ the relative variability, that is, the coefficient of variation, which is the standard deviation divided by the mean and is useful when comparing variation in samples with different means (Marwick, 2018). This standardizes the comparisons of variance across sectors and is the method used here.
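A small sketch may help show why the coefficient of variation is preferred when sector means differ. The school-level proportions below are invented for illustration and are not data from this study; the use of the sample (n - 1) standard deviation is also an assumption, since the text does not specify which estimator was used.

```python
from statistics import mean, stdev


def coefficient_of_variation(school_means):
    """Relative variability: sample standard deviation divided by the mean."""
    center = mean(school_means)
    if center == 0:
        raise ValueError("coefficient of variation is undefined when the mean is zero")
    return stdev(school_means) / center


# Hypothetical school-level proportions (e.g. share of full-time teachers per school).
sector_a = [0.80, 0.85, 0.90]
sector_b = [0.40, 0.45, 0.50]

sd_a, sd_b = stdev(sector_a), stdev(sector_b)
cv_a = coefficient_of_variation(sector_a)
cv_b = coefficient_of_variation(sector_b)

print(f"SD: a={sd_a:.3f}, b={sd_b:.3f}")
print(f"CV: a={cv_a:.3f}, b={cv_b:.3f}")
```

In this toy case both sectors have the same between-school standard deviation, yet the sector with the lower mean shows larger relative variation, which is the distinction the coefficient of variation is meant to capture.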
Additionally, I employ supplementary sensitivity analyses via randomized reassignment and difference-in-difference models. These results are available in the appendix and confirm that the different distributions in the public and private sectors are not entirely due to selection bias.

Analyses
Approach
I use Wald tests for the equality of sector-level grand means to test Proposition 1: the public education sector will have higher average teacher quality. I then analyze sector-level differences in within-sector, between-school variation in teacher quality, conducting likelihood-ratio and Pitman-Morgan tests for the equality of sector-level grand coefficients of variation to test Proposition 2: the public education sector will have a smaller variance in the between-school distribution of teacher quality.
As the data come from a multi-stage, stratified random sample with disproportionate sampling without replacement, multi-level sampling weights were applied to account for survey design, non-response, and non-coverage, raking on full-time teacher status and gender at the teacher level. For missing data (e.g. 29 of the 1,480 respondents had missing data on gender), I used multiple imputation via chained regression, thereby enabling me to complete the raking process. However, in contrast to typical multilevel weights, in which the Stage 1 final weight is multiplied with the Stage 2 final weight, because the analyses involve sector-level comparisons of aggregated, school-level data, the multi-stage weights were calculated sequentially. That is, first, the Level 1 teacher weights were employed while aggregating the school-level data, and then the Level 2 school weights were subsequently employed to calculate the sector-level values.
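The sequential use of the two weight levels can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names, the tuple layout of the teacher records, and all numbers are hypothetical, and the sketch omits raking and imputation. Teacher weights are used only when aggregating to school-level means, and school weights are used only when aggregating those means to the sector level.

```python
from collections import defaultdict


def weighted_mean(values, weights):
    """Weighted average of values; weights must sum to a positive number."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total


def sector_grand_means(teachers, school_weights, school_sector):
    """Apply Level 1 and Level 2 weights sequentially, not multiplicatively.

    teachers: iterable of (school_id, value, teacher_weight) tuples.
    school_weights: dict mapping school_id to its Level 2 weight.
    school_sector: dict mapping school_id to a sector label.
    """
    # Stage 1: teacher weights produce one weighted mean per school.
    by_school = defaultdict(lambda: ([], []))
    for school, value, weight in teachers:
        by_school[school][0].append(value)
        by_school[school][1].append(weight)
    school_means = {s: weighted_mean(v, w) for s, (v, w) in by_school.items()}

    # Stage 2: school weights produce one grand mean per sector.
    by_sector = defaultdict(lambda: ([], []))
    for school, school_mean in school_means.items():
        sector = school_sector[school]
        by_sector[sector][0].append(school_mean)
        by_sector[sector][1].append(school_weights[school])
    grand_means = {sec: weighted_mean(v, w) for sec, (v, w) in by_sector.items()}
    return school_means, grand_means


# Hypothetical records: years of experience with teacher-level weights.
teachers = [("A", 10.0, 1.0), ("A", 20.0, 3.0), ("B", 30.0, 1.0), ("B", 30.0, 1.0)]
school_weights = {"A": 1.0, "B": 3.0}
school_sector = {"A": "public", "B": "public"}

school_means, grand_means = sector_grand_means(teachers, school_weights, school_sector)
print(school_means)  # school A: 17.5, school B: 30.0
print(grand_means)   # public: 26.875
```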

Difference in Means
There are eight analyses, one for each measure of teacher quality: full-time status, certification status, years of experience, status as a beginning teacher, teaching in-field, holding an advanced degree, the prestige of one's alma mater, and an integer-based Teacher Quality Index (TQI) composed of dichotomous versions of six teacher quality traits: full-time, certification, experience, in-field, advanced degree, and alma mater prestige.
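As a rough illustration of how an integer-based index of this kind can be built, the Python sketch below sums six dichotomous indicators into a score from 0 to 6. The cutoffs used to dichotomize experience and alma mater prestige (`EXPERIENCE_CUTOFF`, `PRESTIGE_CUTOFF`) and the field names are hypothetical assumptions; the thresholds actually used for the TQI are not stated in this section.

```python
from dataclasses import dataclass

# Hypothetical cutoffs for dichotomizing the two continuous traits.
EXPERIENCE_CUTOFF = 5    # years; assumption for illustration only
PRESTIGE_CUTOFF = 8.0    # prestige rating; assumption for illustration only


@dataclass
class Teacher:
    full_time: bool
    certified: bool            # holds the higher credential
    years_experience: float
    in_field: bool
    advanced_degree: bool
    alma_mater_prestige: float


def teacher_quality_index(t: Teacher) -> int:
    """Sum six 0/1 indicators into an integer index ranging from 0 to 6."""
    indicators = (
        t.full_time,
        t.certified,
        t.years_experience >= EXPERIENCE_CUTOFF,
        t.in_field,
        t.advanced_degree,
        t.alma_mater_prestige >= PRESTIGE_CUTOFF,
    )
    return sum(int(flag) for flag in indicators)


veteran = Teacher(True, True, 12, True, True, 9.0)
novice = Teacher(False, False, 1, True, False, 6.5)
print(teacher_quality_index(veteran), teacher_quality_index(novice))
```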
Sector-level summary statistics for the eight measures of teacher quality (not shown) reveal potential support for Proposition 1 as the public sector has preferable mean scores on every measure. For employment status, 81.8% of public school teachers, on average, are full-time teachers compared to only 66.7% in the private sector. For credentialization, 16.0% of public school teachers are highly credentialed compared to 11.0% in private schools. The average years of experience in the public sector is 19.1 versus 15.1 in the private sector; and for the percentage of novices per school the public sector has an average of 11.9% compared to 16.5% in the private sector (n.b. a lower percentage of novices is preferable). The public sector also has a higher percentage of teachers teaching in their field of expertise at 92.0%, compared to 87.3% in the private sector. In contrast to expectations that private schools would entice educated teachers by financially rewarding those with advanced degrees, the public sector has a higher percentage of teachers on this measure (15.9%) than the private sector (12.9%). The public sector also has higher average prestige scores for teachers' alma mater, at 8.7 compared to 7.7. Finally, the public sector also has a higher mean TQI at 3.7 versus 3.1.
To test whether these differences in means are statistically significant, I employ Wald tests for the equality of means (Gupta & Ma, 1996; Shoukri et al., 2008). The results of these analyses are in Table 1. The means calculated are sector-level grand averages; that is, they are the mean scores per sector of the mean scores per school, for schools in that sector. Using two-tailed tests, the F-values for the following teacher quality measures all have statistically significant p-values: the percentage of full-time teachers at a school, mean years of experience at a school, the percentage of teachers teaching in-field at a school, the mean prestige rating at a school of teachers' alma mater, and the mean TQI score at a school. For all five of those measures, the public sector has a significantly higher grand mean, thereby supporting Proposition 1. Conversely, Proposition 1 was not supported for the following teacher quality traits: the percentage of highly credentialed teachers at a school, the percentage of novices teaching at a school, and the percentage of teachers at a school with advanced degrees. That most teacher quality measures were higher in the public sector suggests that the qualitative evidence proclaiming the public sector in the sampled Japanese prefecture to be of higher quality has merit in terms of teacher quality. This intimates that there is some selection bias in this prefecture's teacher labor markets, with higher quality teachers, on average, selecting into the public sector. There may be many reasons for this, such as differences in salaries or working conditions, or the desire to teach more motivated or proficient students (Seebruck, 2016), but, for now, this is an interesting finding because it means the jinji idou system is not so off-putting to quality teachers that they eschew the public sector.
(In fact, many public school teachers I interviewed stated that the teacher rotation system was a positive factor that increased job satisfaction). It is also important to reiterate that this finding of the public sector generally seeing higher levels of teacher quality is a generalization based on sector-level data; there is variation in teacher quality between schools in both sectors.
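The logic of a Wald test for the equality of two grand means can be sketched in a few lines of Python. This is a simplified, unweighted, large-sample version that refers the statistic to a chi-square distribution with one degree of freedom; it is an assumption-laden illustration and not the weighted, F-based procedure reported in Table 1. The function name and the school-level values are hypothetical.

```python
import math
from statistics import mean, variance


def wald_test_equal_means(x, y):
    """Wald test of H0: mean(x) == mean(y) for two independent samples.

    Returns (W, p), where W = diff^2 / Var(diff) and p is the two-tailed
    p-value from the chi-square distribution with 1 degree of freedom.
    """
    diff = mean(x) - mean(y)
    var_diff = variance(x) / len(x) + variance(y) / len(y)
    w = diff ** 2 / var_diff
    # For 1 df, P(chi2 > w) = erfc(sqrt(w / 2)).
    p = math.erfc(math.sqrt(w / 2.0))
    return w, p


# Hypothetical school-level means for two sectors (not the study's data).
private_schools = [1.0, 2.0, 3.0, 4.0, 5.0]
public_schools = [3.0, 4.0, 5.0, 6.0, 7.0]

w, p = wald_test_equal_means(public_schools, private_schools)
print(w, p)
```

In this toy example the difference in means is 2 and its estimated variance is 1, so the statistic is 4 and the two-tailed p-value is just under .05.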

Difference in Variation
That most of the teacher quality traits analyzed here have higher averages in the public sector is interesting, but the core analysis is the inter-sector comparison of intra-sector, between-school differences in average teacher quality, as the primary research question is whether the different organizational structures of the public- and private-sector teacher labor markets affect the distribution of teacher quality. In the US, there is a pervasive maldistribution of teacher quality (Borman & Kimball, 2005), with high quality teachers clustering at certain schools, at the expense of others. This maldistribution has been attributed to laissez-faire policies that largely permit school teachers in the US to work where they want (Peske & Haycock, 2006). This decentralized organizational setup is also found among the private-sector education labor market in Japan. In contrast, public sector education labor markets in Japan are centrally governed by prefectural boards of education, with all public high school teachers subjected to regular, compulsory transfers within each prefecture. This teacher transfer system likely affects the distribution of teacher quality.
To test this, I employ likelihood-ratio tests (Bhoj & Ahsanullah, 1993; Gupta & Ma, 1996; Liu et al., 2011; Shoukri et al., 2008; Verrill, 2009) and Pitman-Morgan tests (Morgan, 1939; Pitman, 1939; Shoukri et al., 2008; Snedecor & Cochran, 1989) on the equality of grand coefficients of variation across sectors. The Pitman-Morgan test for the equality of coefficients of variation has been employed in a variety of disciplines (Y. K. Kim, Kim, Park, & Lee, 2017) as it is favored for being a powerful, unbiased test compared to a typical F-test (Haynes, 1981; Mudholkar et al., 2003), particularly when there is a possibility of non-independence among subjects (Cochran, 1965), such as may be the case when analyzing overlapping labor market pools (many teachers in the sample have worked in both sectors). The Pitman-Morgan test excels across many types of distributions, particularly normal or lightly tailed ones (Wilcox, 2015). However, it encounters difficulty with heavier tailed ones (Mudholkar et al., 2003). Consequently, in line with Shoukri et al. (2008), in addition to the Pitman-Morgan tests I also employ likelihood-ratio tests as a measure of robustness.
Each sector's grand coefficient of variation (that is, relative standard deviation) is compared because the previous subsection demonstrated significant differences between mean teacher quality scores across sectors (Sokal & Braumann, 1980). Comparing the coefficient of variation instead of the standard deviation mitigates concerns of self-selection bias in each sector's teacher labor market impacting the results, instead standardizing the variation measure to allow for inter-sector comparisons of intra-sector, inter-school differences in teacher quality. The results of the likelihood-ratio (LR) and Pitman-Morgan (PM) tests for the equality of sector-level grand coefficients of variation are listed in Table 2. Larger coefficients of variation indicate higher variation. The tests are two-tailed.
Overall, the LR and PM tests largely support each other, with the p-values for both tests agreeing on six of the eight measures. On five of those six measures, both tests demonstrate significant differences between the grand coefficients of variation across sectors: the percentages of full-time teachers at a school, highly credentialed teachers at a school, and teachers at a school who hold an advanced degree, as well as the average prestige rating of teachers' alma mater and the average Teacher Quality Index (TQI) score at a school. These five unambiguous results support Proposition 2.
Note. Weighted data, 2-tailed tests, N = 49 (23 Private, 26 Public), df = 43, *p < .05, **p < .01, ***p < .001.
Both the LR and PM tests are also in accord that there is no significant difference across sectors in the variation of the percentage of novices teaching at a school. Thus, Proposition 2 is not supported for this measure. One possible reason for this null finding is that the public sector has more rural schools, which are more difficult to staff, particularly with veteran teachers. This is the only one of the eight measures to be unambiguously non-significant.
There are two measures where the LR and PM tests are not in accord: years of experience and the percentage of in-field teachers at a school. For these two variables, the PM test finds a statistically significant difference between sectors, with the between-school variation in the private sector significantly larger at the .05 level. However, the p-values for the LR tests of these two measures are not statistically significant, creating an ambiguous situation. Proposition 2 is therefore only tentatively supported for these two variables; the results are less robust than those for the other teacher quality measures.

Data Visualization
Although statistical significance tests are useful in hypothesis testing, particularly when testing for differences in means and variation, they should not be relied on alone, as it is possible for data sets dissimilar in reality to appear similar when comparing summary statistics (Anscombe, 1973). I graph the statistics formally tested in the previous sections to determine if the distribution of the data reaffirms the results of the statistical significance tests, contradicts them, or unveils differences not previously identified. Regarding the inter-sector comparisons of intra-sector, between-school variation in mean teacher quality, results of the Pitman-Morgan (PM) tests found that the private sector had significantly larger between-school, relative variation for seven out of the eight measures of teacher quality: full-time percentage, highly credentialed percentage, years of experience, in-field percentage, graduate degree percentage, alma mater prestige, and TQI. The only variable not found to significantly differ by sector, according to the PM tests, was the percentage of novices. Likelihood-ratio (LR) tests largely reaffirmed these findings, except for years of experience and in-field percentage, which were non-significant in the LR tests.
To examine these relationships, I graph between-school differences using mean plots, depicted in Figure 1. Private and public schools in the sample are demarcated, with private schools on the left and public schools on the right in each figure. Every ring represents a school, and a ring's location on the y-axis depicts the school-level mean score for the teacher quality trait listed. The solid circle on either side of the graph represents the sector-level, grand mean of those school-level means. The solid line traversing the graph offers a visual comparison of the difference in grand means for each sector. The dotted envelope slopes demarcate one grand standard deviation above and below the grand mean. Therefore, narrowing slopes across the graph intimate a shrinking variation in between-school, mean teacher quality.
First, I will address the five teacher quality measures whose between-school variation was found to be unambiguously significantly higher in the private sector. Regarding the percentage of full-time teachers, it is clear from the mean plot that the public sector has a higher grand mean and slightly narrower envelope slopes, mostly due to the wider range on the y-axis between schools in the private sector. For the percentage of certified teachers, the mean plot reveals more dispersion among private schools as well, with noticeably wider envelope slopes compared to the public sector. Graduate degree percentage likewise has narrower envelope slopes in the public sector, as do alma mater prestige and the TQI. Second, the visual depiction of the sole variable that was unanimously nonsignificant (the percentage of novices) seems to support that null finding, as it is difficult to discern any difference in the width of the envelope slopes for that variable. These are all in line with the LR and PM tests in Table 2.
The two variables where the LR and PM tests differed are more interesting cases. Examining the mean plot for years of teaching experience reveals a higher grand mean for the public sector, but differences in the grand standard deviation are unclear, as the general dispersion of both sectors is similar, as are the corresponding widths of their envelope slopes. Thus, visualizing this variable did not clearly support either the LR or PM test. In contrast, the mean plot for the percentage of in-field teachers is more definitive, with a noticeably larger range in mean teacher quality among private schools, with one having a 100% score and two all the way down near the 75% mark. The range of public schools is much smaller. As such, the envelope slopes are wider in the private sector. Together, the visual evidence reaffirms the PM test of a significant difference in sector-level variation for one of the two previously unclear variables: in-field teachers.

Mean Plots of School-Level Teacher Quality Measures, By Sector
Note: Y-axis location of each ring is a school's mean score for a teacher quality trait. Dotted lines demarcate one grand standard deviation from sector-level grand means (i.e., the solid line's endpoints).


Conclusion
This paper tested (1) the difference in average high school teacher quality between the public and private education sectors in a Japanese prefecture, and (2) the differences in the distribution of high school teacher quality across these two sectors. The former analysis answers the question of which sector, on average, has higher levels of teacher quality. Remembering that variation exists within sectors as well (meaning many private schools have higher mean levels of teacher quality than many public schools), Wald tests for the equality of sector-level grand means reveal that the public sector has higher levels of average teacher quality for the following five measures: the percentage of full-time teachers, average years of teaching experience, the percentage of teachers primarily teaching their subject of expertise, the average prestige rating of teachers' alma mater, and the average score on the composite measure (the Teacher Quality Index). These findings support Proposition 1 for those traits. Proposition 1 was not supported for three teacher quality measures: the percentage of highly credentialed teachers, the percentage of novices, and the percentage of teachers with an advanced degree, as no significant difference was found between sectors for those measures.
The second set of analyses investigated sector-level differences in within-sector, between-school variation in teacher quality, conducting likelihood-ratio (LR) and Pitman-Morgan (PM) tests for the equality of sector-level grand coefficients of variation. These inter-sector analyses of intra-sector, between-school differences in teacher quality support the argument that the public sector in the sampled Japanese prefecture has a more equal distribution of quality teachers than does the private sector. The PM tests found that, for seven out of the eight measures, the private sector had significantly more clustering of teacher quality, thereby supporting Proposition 2 with one exception: the percentage of novices, which was not significantly different across sectors. The LR tests corroborated five of those seven differences, with visualizations of the data adjudicating and finding some support for the PM tests.
Collectively, these examinations reveal that the distribution of quality teachers in the public sector is more equal compared to the distribution of quality teachers in the private sector. As the coefficient of variation was used to essentially control for differences in the populations of teachers in each sector (Sokal & Braumann, 1980), this suggests that the most conspicuous organizational difference between these two education labor markets-that is, the centrally controlled, large-scale, systematic rotation of teachers across schools that is present in the public sector but absent in the private sector-likely accounts for their different teacher distributions.
Akin to the maldistribution of teacher quality in the US, the private sector in this Japanese prefecture has notable variation in teacher quality across schools. This is interesting because the education labor market in the Japanese private sector, being locally controlled, is organized similarly to both private and public teacher labor markets in the US. In contrast, the Japanese public sector, with its centrally controlled education labor market marked by a mandatory, systematic, and career-long geographic relocation of teachers across schools, has a more even distribution of teacher quality. These findings are important since the negative effects of being taught by ineffective teachers are substantial. Consider Rivers and Sanders's (2002, p. 18) finding that the "cumulative and residual effects of teachers on the academic progress of students are huge" and that the "extreme variability of teachers' effectiveness" in the US has a "dramatic effect" on student progress.
Centralizing control over teachers' career paths like Japan's public school system may be one solution to the issue of educational inequality in the US (Akiba & Letendre, 2009), where school districts are locally governed and financed and where there is a trend of well-off areas trying to break away from larger districts, at the expense of poorer areas (Newkirk, 2014). Increased decentralization such as this could exacerbate the maldistribution of educational resources whereas centralization of resources, particularly governance over teacher labor markets, could alleviate inequalities in educational opportunities. Such changes surely would need to be modified to fit sociopolitical nuances and geographic challenges in the US-a difficult endeavor given the federalist nature of its education systems-but, even then, narrowing the opportunity gap in students' access to quality teachers does not guarantee a narrowing of the achievement gap (Akiba et al., 2007).
That said, there are some limitations to this study that could be addressed in future research. First, this study examines only high schools. This reduces generalizability, as there are qualitative differences between education levels when it comes to how jinji idou operates, such as the geographic range of teacher transfers, which spans the entire prefecture for high schools but is limited to smaller geographical municipalities in elementary and middle schools (Seebruck, 2019). Moreover, unlike high schools in Japan, which are very much tiered academically by student capability, this is less of an issue at elementary and middle schools, where students do not have to pass entrance examinations to gain enrollment (Kariya, 2011; Ono, 2001; Rohlen, 1983). As such, there should be less variation in student achievement across elementary and middle schools in Japan, making for an interesting scenario for a replication of the analyses in this paper. Second, this study examines a single prefecture and other scholars have shown jinji idou policies vary somewhat across prefectures in Japan (Kawakami, 2013). Replication studies in other prefectures are needed to determine the generalizability of the findings here.
Finally, while this paper examined the distribution of various teacher quality traits, it does not examine the extent to which those teacher quality traits impact student achievement. Future research could build off these findings by ascertaining the extent to which the traits studied here matter to student performance and by including student and family voices as well.
standard deviation is used, and with that, a shift from the likelihood-ratio (LR) and Pitman-Morgan (PM) significance tests to Bartlett's test for the homogeneity of sector-level grand standard deviations. The Bartlett tests find that, for every teacher quality measure, the actual distributions of teachers in both the public and private sectors are significantly (p < .001) more variable than random (results not shown). These results are not surprising, as humans do not randomly decide where to work (private sector) or where others should work (public sector). Instead, these results serve to justify the DiD analyses comparing inter-sector differences in the magnitude of these intra-sector variances in teacher quality distribution.
The DiD analyses shown in Table 3 are similar to past research (see Card & Krueger, 1994, p. 780). In short, they reveal the inter-sector (i.e. public vs. private) difference between the intra-sector (i.e. randomized vs. actual) differences in the between-school variation of each teacher quality measure. The results show, with one exception, the differences between the actual and the randomized public school sample are more favorable, from an educational egalitarianism perspective, compared to the differences between the actual and the randomized private school sample. The one exception is the percent of novices, which has a larger DiD in the public sector, something unsurprising given the need for the public sector to staff rural schools (Wray, 1999). Overall, the DiD analyses support the proposition that the centrally controlled public sector, with its mandatory teacher rotation system, distributes teachers more equitably across schools than the decentralized private sector.
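The structure of this comparison can be sketched in Python as follows. The sketch rests on several assumptions that are not taken from the source: randomization is implemented as a single within-sector shuffle of teachers that preserves each school's size, between-school variation is measured as the sample standard deviation of school means, and the sign convention (private excess minus public excess) is chosen for illustration. All names and data are hypothetical.

```python
import random
from statistics import mean, stdev


def between_school_sd(schools):
    """Sample standard deviation of school-level means within one sector."""
    return stdev([mean(values) for values in schools.values()])


def randomize_within_sector(schools, seed=0):
    """Shuffle teachers across a sector's schools, keeping school sizes fixed."""
    rng = random.Random(seed)
    pool = [v for values in schools.values() for v in values]
    rng.shuffle(pool)
    randomized, start = {}, 0
    for name, values in schools.items():
        randomized[name] = pool[start:start + len(values)]
        start += len(values)
    return randomized


def difference_in_differences(actual_pub, random_pub, actual_priv, random_priv):
    """(private actual - randomized) minus (public actual - randomized).

    Under this assumed sign convention, a positive value means the private
    sector departs further from a random allocation than the public sector.
    """
    return (actual_priv - random_priv) - (actual_pub - random_pub)


# Hypothetical years-of-experience data: school -> teacher values.
public = {"P1": [10, 12, 14], "P2": [11, 13, 15], "P3": [9, 12, 15]}
private = {"Q1": [2, 3, 4], "Q2": [20, 22, 24], "Q3": [10, 11, 12]}

did = difference_in_differences(
    between_school_sd(public),
    between_school_sd(randomize_within_sector(public, seed=1)),
    between_school_sd(private),
    between_school_sd(randomize_within_sector(private, seed=1)),
)
print(did)
```

A fuller analysis would repeat the shuffle many times and average the randomized variation rather than rely on a single draw.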

Randomized Reassignment
In addition to the DiD analyses, randomized reassignment analyses can provide further illumination. These analyses mimic those in Table 2, but instead of comparing the actual public and private school data, they compare the randomized versions of them. If self-selection bias is a concern, the outcomes of the randomized data set comparisons (Table 4) should be identical to those of the actual data sets (Table 2). As a comparison of different labor pools, LR and PM tests are once again used. Recall that seven out of the eight measures in the actual analyses revealed that the private sector was significantly more variable than the public sector. If these differences were due to self-selection bias rather than the treatment (i.e. jinji idou), then the randomized analyses should corroborate these results. However, as seen in Table 4, some of the results are counter to this expectation, meaning that self-selection bias is not a concern for these measures.
Note. Private sector randomized sample vs. public sector randomized sample; unweighted data, 2-tailed tests, N = 49 (23 Private, 26 Public), df = 43, *p < .05, **p < .01, ***p < .001.
Note that three of the eight measures are significantly more variable in the private sector for the randomized analyses: full-time teacher status, alma mater prestige, and the Teacher Quality Index (TQI). This suggests that the differences found in the actual analyses may be, but are not necessarily, due to self-selection bias for these measures.
In contrast, the remaining five teacher quality measures reveal that self-selection bias is unlikely, as the randomized analyses saw either no significant difference between sectors-as is the case with credentialization, years of experience, novices, and in-field teaching status-or, even more interestingly, showed the randomized version of the public sector had a significantly larger between-school variance in the distribution of teachers holding an advanced degree, which is the opposite of the actual analyses. These results suggest that, compared to the more laissez-faire education labor market of the private sector, the teacher rotation system in the public sector distributes teachers more equitably across schools for most of the teacher quality variables identified in the main analyses as having different distributions (credentialization, experience, in-field teaching, and advanced degree).
Consequently, given the results of the DiD and randomized reassignment analyses, it is unlikely that the results of the inter-sector comparative analyses-which largely found that teacher quality distribution was more clustered in the private sector compared to the public sector-were the consequence of self-selection bias rather than organizational differences such as jinji idou.