Is Shadow Education the Driver of East Asia’s High Performance on Comparative Learning Assessments?

East Asian students consistently top comparative assessments of academic achievement. Yet, rather than attempting to develop more sophisticated understandings of this difference, the most common reaction is to attribute East Asian performance to longer study hours and/or attendance at schools focused on academic skill enhancement and test preparation (i.e., juku). Herein we seek to contribute to a richer debate both by presenting new data and findings in relation to Japan, and by highlighting new analytical strategies to understand the relationship between East Asian performance and shadow education. Specifically, we highlight that comparatively high levels of achievement among Japanese students were apparent even at the level of fourth graders, even though juku attendance was low prior to this stage. This suggests that juku attendance is not the primary factor for the high academic achievement of Japanese students. The wider significance of these findings lies in countering both common portrayals of East Asian success and factually inaccurate information disseminated by organizations such as the OECD. In so doing, researchers are in a better position to elaborate new, more sophisticated theories that explain East Asia's consistently world-leading academic achievement.

Both authors contributed equally to this piece.

Education Policy Analysis Archives Vol. 28 No. 67

Keywords: alternative theories; international large-scale assessments; scale mismatch; stereotypes; TIMSS

However, researchers, many from East Asia, who have looked deeply into the issue have been unable to verify the purportedly strong links between shadow education and higher achievement, either within or between countries. An extended 2015 study from the University of Tokyo that used PISA 2006 data to compare the effects of supplementary tutoring on mathematics achievement in Japan and the United States suggested "no overall effect of out-of-school tutoring in Japan" (Mori, 2015, p. 70). In Korea, drawing on new large-scale longitudinal educational surveys, a comprehensive 2014 study found that for middle school achievement "cram schools made a small difference in achievement gains in math, whereas other forms of shadow education (e.g., individual tutoring, correspondence courses, on-line tutoring services, and EBS) made little difference" (Byun, 2014, p. 15; see also Park, 2013). Similarly, a 2011 analysis of Taiwanese middle school students' mathematics achievement using propensity score matching and nationally representative panel data detected a "positive but fairly small" effect (Kuan, 2011, p. 363).
At present, as one Korean scholar notes, the links between shadow education and achievement in East Asia remain unclear for methodological reasons: "although the effects of shadow education on academic achievement have been widely investigated in recent years, emerging empirical evidence has been inconsistent, contradictory, and even confusing," and this uncertainty "precludes definitive conclusions about the effects of shadow education" (Byun, 2014, p. 40). Our intended contribution in this piece is situated within this wider constellation of uncertainty and is best conceptualized on two distinct levels.
First, within the ongoing scholarly investigations of shadow education and achievement in East Asia, we seek to put forth several new analytical strategies that may help nuance and advance existing research. In doing so, we primarily focus on Japan, where there is both an abundance of unexamined domestic survey data and a relative dearth of English-language research (as compared to Korea and Taiwan). Our analytical innovation centers on finding ways to take up the problem that do not rely on individual-scale analyses to describe country-scale phenomena. As we detail below, this problem of scale mismatch is found in Entrich (2014), among many others. Certainly, such individual-scale analyses have their own advantages and purposes, and will remain important. But we seek to underscore that it is problematic if educational research lacks analytical strategies for understanding whether or not shadow education plays a decisive role in determining country-scale achievement and between-country differences, particularly at this juncture, when OECD analyses and media outlets casually observing ILSA (international large-scale assessment) results confidently suggest that it does.
Against this backdrop, we propose a novel strategy founded on the fact that shadow education attendance varies across grades (i.e., students' school years) and subjects. If shadow education is really the primary factor explaining achievement differentials among countries, we should observe more pronounced between-country differences in achievement in those grades and subjects where shadow education is more pervasive. Concretely, this means analyzing the link between shadow education and achievement (i) for particular subjects and (ii) not at age 15 (PISA) but at the elementary school level (i.e., fourth grade). To underline the importance of the scale mismatch problem, we also show that juku attendance explained neither international differences nor inter-regional differences in achievement (when using Japanese domestic surveys).
The second dimension of our intended contribution is to challenge the authority of the OECD-style portrayal of East Asia's world-leading achievement. While many researchers seem content simply to labor away on the analytical question, we find ourselves increasingly dissatisfied with analyses that do not engage with the wider policy and epistemic context in which we work. That is, the findings (and continued uncertainty) we present herein are not simply of academic interest; they also raise the question of what authority the OECD and Western media outlets stand upon when drawing conclusions about East Asia's academic achievement. As we show, there is little evidence to support the idea that East Asia's achievement is primarily driven by shadow education, yet the OECD and Western media repeatedly present that view in an authoritative voice. Importantly, this problem is not limited to issues surrounding shadow education but extends much more broadly to portrayals of entire systems across East Asia. Take, for example, the OECD's 2012 report entitled Lessons from PISA for Japan, which states:

At the primary school level, juku participation increased from 16% in 1985 to 26% in 2007, and at the lower secondary level, from 44% to 53%. At the upper secondary level, participation in private tutoring is even greater (Figure 2.2.4; OECD, 2012, p. 72, italics added)

Here "private tutoring" means juku. But as shown in Fig. 1 (below), this latter assertion (the final sentence of the quotation, italicized in the original) is factually inaccurate, even blatantly so. Moreover, the OECD concludes that this rising juku attendance is "driven by the severe competition to enter the country's top universities" (OECD, 2012; see also OECD, 2011, p. 14). But in a series of recent papers, we have shown that long learning time, pressure surrounding study, and the stresses surrounding college entrance exams (the so-called East Asian 'Exam Hell') no longer exist in Japan.
More broadly, in other recent studies, we have shown that the OECD portrayals of the 'East Asian Miracle' are also unsupported by data (e.g., Komatsu & Rappleye, 2019), and that the purported links between PISA scores and economic growth rates worldwide suggested by the OECD rest on flawed statistics (e.g., Komatsu & Rappleye, 2017a). All of this points to how stereotypes, later presented as rigorous, evidence-based research on East Asian education, continue to be disseminated by the OECD and Western media outlets. Rather than improving our understanding, these stereotypes actually frustrate and mislead. The consequence, we argue, is that deeper, alternative perspectives on, say, the nature of knowledge, the meaning assigned to education, and concepts of selfhood in East Asia (e.g., Markus & Kitayama, 1991) are replaced not simply by structural analyses (i.e., analyses premised a priori on the absence of any deeper diversity of worldviews), but by disappointingly shallow ones. Thus, while we continue to investigate the analytical question herein, we also seek to clear up some of the blatant misinformation the OECD has promoted about East Asia. In so doing, we seek to draw readers' attention to the wider political and epistemic questions in play, not simply the analytical details. We return to these wider themes in the final section of our conclusion.

Situation of Shadow Education Attendance in Japan
To understand the patterns of shadow education attendance across Japan, we used data collected by the Ministry of Education, Culture, Sports, Science and Technology (MEXT, 2008). As a supplement, we also used data collected by the NHK Broadcasting Culture Research Institute (2013). NHK is Japan's national public broadcaster, and its research institute is charged with conducting non-partisan public surveys (analogous to Pew Research Center surveys in the USA).
MEXT recorded shadow education attendance rates for first to sixth graders (i.e., primary students) and seventh to ninth graders (lower secondary students). The survey was conducted in 1976, 1985, 1993, 2002, and 2007. In each survey cycle, MEXT randomly selected municipal governments nationwide in which shadow education attendance would be examined. The final sample was approximately 1% of the total student population in Japan (a total of 53,458 sampled students in the 2007 academic year). MEXT then examined the nature and quality of shadow education attendance for the sampled students using a questionnaire. This questionnaire asked about (1) the specific form of shadow education (schools focused on academic skill enhancement and test preparation, home tutoring, remote tutoring, or other types such as sports, music, and calligraphy) and (2) which subjects the student studied in shadow education. Hereafter, the term juku refers exclusively to schools focused on academic skill enhancement and test preparation.
NHK Broadcasting Culture Research Institute data included the mean juku attendance rate for 10th to 12th graders (i.e., upper secondary students) as well as for seventh to ninth graders. The survey was conducted in 1982, 1987, 1992, 2002, and 2012. In each survey cycle, NHK researchers first randomly selected municipalities and then visited the same municipalities repeatedly, selecting students at random from the municipal population registry. NHK researchers then visited the students' homes to conduct interviews, in which they asked whether or not the student attended a juku. The sample size for 10th to 12th graders ranged from 969 to 1,350 students across survey cycles.
Given the random sampling strategies used by MEXT and the NHK Broadcasting Culture Research Institute, we anticipated that the data would contain only very small systematic biases. Indeed, the mean juku attendance rate for seventh to ninth graders derived from the former dataset was almost the same as that derived from the latter, suggesting little systematic bias. It would thus be reasonable to combine data for first to ninth graders from the former dataset with those for 10th to 12th graders from the latter in our analysis. Using these data, we analyzed in which grades and in which subjects juku attendance was common. Based on this analysis, we generated a hypothesis about when the achievement gap between Japan and other countries should become apparent if shadow education attendance were indeed the primary factor explaining Japan's achievement.
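The consistency check and merge described above might look like the following minimal sketch. Everything in it is an illustrative assumption rather than the authors' procedure: the grade-band keys, the 2-percentage-point tolerance, the function name `merge_attendance_series`, and all attendance values are hypothetical placeholders, not MEXT or NHK figures.

```python
def merge_attendance_series(mext, nhk, overlap="7-9", tolerance=2.0):
    """Combine grade-band juku attendance rates (%) from two surveys.

    mext: dict mapping grade band -> rate, covering grades 1-9 (hypothetical).
    nhk:  dict mapping grade band -> rate, covering grades 7-12 (hypothetical).
    The band both surveys cover (overlap) serves as a consistency check.
    """
    gap = abs(mext[overlap] - nhk[overlap])
    if gap > tolerance:
        raise ValueError(f"surveys disagree by {gap:.1f} points on grades {overlap}")
    merged = dict(mext)  # keep the MEXT values for grades 1-9
    for band, rate in nhk.items():
        if band not in merged:  # add only the bands MEXT does not cover
            merged[band] = rate
    return merged


# Hypothetical values for illustration only.
mext_rates = {"1-3": 10.0, "4-6": 25.0, "7-9": 50.0}
nhk_rates = {"7-9": 51.0, "10-12": 30.0}
print(merge_attendance_series(mext_rates, nhk_rates))
```

Keeping the MEXT values for the overlapping band is one arbitrary choice; averaging the two sources would be an equally defensible alternative once they agree within the tolerance.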

Achievement Gap Between Japan and Other Countries
We used data from the Trends in International Mathematics and Science Study (TIMSS; TIMSS & PIRLS International Study Center, 2018a, 2018b), which was called the Third International Mathematics and Science Study at its first administration in 1995. We selected TIMSS because it provides achievement data for two different grades, i.e., fourth and eighth grades, whereas PISA provides achievement data only for one fixed age (15-year-olds). TIMSS thus allows us to examine in which grade the achievement gap between East Asia (in this case, only Japan) and other countries is most pronounced.
To examine the achievement gap between Japan and other countries, we used math and science achievement data for fourth and eighth graders from TIMSS 1995, 2003, 2007, 2011, and 2015. We did not use data from TIMSS 1999, because TIMSS 1999 did not assess fourth graders. To estimate the achievement gap between Japan and other countries (or regions), we first selected the countries and regions that regularly participated in TIMSS (i.e., England, Hong Kong, Hungary, Iran, Norway, Singapore, Slovenia, and the United States). We then calculated the difference between Japan's achievement score and the mean score of these regular participants.
We also calculated the difference divided by the standard deviation (SD) of the scores for the regular participant countries on the assumption that the regular participant countries were the control group (see Ellis, 2010). The focus of our analysis was whether or not the achievement gap observed here meshed with the variation in the juku attendance rates between different grades and between math and science.
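The two gap measures described above (the raw difference from the reference-group mean, and that difference divided by the reference group's SD, which resembles Glass's Δ in treating the regular participants as the control group) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name is hypothetical, the use of the sample SD (n - 1 denominator) is an assumption, and the country scores are invented.

```python
from statistics import mean, stdev


def achievement_gap(japan_score, reference_scores):
    """Return (raw gap, gap in SD units) between Japan and a reference group.

    reference_scores are country-level mean scores of the regular TIMSS
    participants, treated as the control group. The SD is the sample SD
    (statistics.stdev, n - 1 denominator), which is an assumption here.
    """
    raw_gap = japan_score - mean(reference_scores)
    return raw_gap, raw_gap / stdev(reference_scores)


# Hypothetical country-level scores for eight regular participants.
reference = [500, 510, 520, 530, 540, 550, 560, 570]
raw, standardized = achievement_gap(585, reference)
print(f"raw gap = {raw:.1f} points, standardized gap = {standardized:.2f} SD")
```

Because the SD here is computed across country means rather than across students, a gap of "one SD" on this scale is not comparable to a student-level effect size.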

Regional Variations in Juku Attendance and Achievement
To examine the relationship between regional variation in juku attendance and regional variation in academic achievement in Japan, we used data from MEXT's National Survey of Academic Achievement and Learning Conditions conducted in 2007 (MEXT, 2008). This survey aimed to assess academic achievement and learning conditions for all sixth and ninth graders in public schools in Japan. Virtually all public primary schools (19,251 of 19,361 schools) participated. The survey reported, by prefecture, sixth and ninth graders' achievement in two math assessments and two language assessments, as well as juku attendance rates. We used the mean correct-answer rate across the two math assessments and the juku attendance rate of sixth graders. These data were obtained from the National Institute for Education Policy Research (2018) and Prefecture Rankings (2018).
Using these data, we examined the relationship between juku attendance rates and math achievement across prefectures. If juku attendance were the primary factor determining academic achievement, we would expect a positive correlation. To examine the correlation, we used Pearson's correlation coefficient (r) and its 95% confidence interval (CI). The CI was calculated using the bootstrap method (Efron, 1979; Fox, 2008). Specifically, we drew random samples from the original data with replacement and calculated r, repeated this 10,000 times, and then identified the range within which 95% of the r values fell (see Komatsu & Rappleye, 2017b).
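The bootstrap procedure described above can be sketched in plain Python as a percentile bootstrap that resamples (juku rate, score) pairs with replacement. This is a minimal sketch under stated assumptions, not the authors' implementation: the percentile method, the skipping of zero-variance resamples, the fixed seed, the function names, and the example data are all assumptions or hypothetical.

```python
import random


def pearson_r(x, y):
    """Pearson's correlation coefficient; None if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / (sxx * syy) ** 0.5


def bootstrap_ci(x, y, n_boot=10_000, level=0.95, seed=0):
    """Percentile bootstrap CI for r, resampling (x, y) pairs with replacement."""
    rng = random.Random(seed)
    pairs = list(zip(x, y))
    rs = []
    for _ in range(n_boot):
        sample = rng.choices(pairs, k=len(pairs))
        r = pearson_r([p[0] for p in sample], [p[1] for p in sample])
        if r is not None:  # skip degenerate resamples with zero variance
            rs.append(r)
    if not rs:
        raise ValueError("every resample was degenerate")
    rs.sort()
    k = int((1 - level) / 2 * len(rs))  # number of values cut from each tail
    return rs[k], rs[len(rs) - 1 - k]


# Hypothetical prefecture-level data: juku attendance (%) and math score.
juku = [22, 28, 35, 40, 42, 45, 50, 58]
score = [67, 69, 63, 62, 63, 61, 64, 65]
print(pearson_r(juku, score), bootstrap_ci(juku, score))
```

With very few prefectures, many resamples contain repeated points, which is one reason the intervals become extremely wide for small subgroups.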
We were aware that the achievement data for sixth graders might be biased because they exclude students attending national (kokuritsu) and private schools (which are usually considered academically superior). This bias is likely small, because Japan has far fewer national and private primary schools (70 and 217 schools, respectively) than public (koritsu) primary schools (19,648 schools). Nevertheless, to assess the effect of this potential bias on our results, we also classified prefectures according to the rate of students attending national and private schools, and then examined the relationship between juku attendance and math achievement among prefectures with similar rates.
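The classification step described above amounts to binning prefectures by their share of national and private school students (the results use bins of less than 1.0%, 1.0% to 2.0%, and 2.0% to 3.0%) and analyzing each bin separately. A minimal sketch: the record layout, the function name `stratify`, and the half-open bin edges are assumptions (the text does not say how a prefecture at exactly 1.0% or 2.0% was assigned), and the records are hypothetical. Each resulting group could then be passed to any correlation routine.

```python
def stratify(prefectures, edges=(0.0, 1.0, 2.0, 3.0)):
    """Group prefectures by % of students in national or private schools.

    prefectures: list of (name, pct_national_private, juku_rate, score).
    Returns {(low, high): [records]} for half-open bins [low, high);
    prefectures at or above the last edge are left out.
    """
    bins = {(lo, hi): [] for lo, hi in zip(edges, edges[1:])}
    for record in prefectures:
        pct = record[1]
        for lo, hi in bins:
            if lo <= pct < hi:
                bins[(lo, hi)].append(record)
                break
    return bins


# Hypothetical records: (name, % national/private, juku %, score).
data = [("A", 0.4, 30.0, 64), ("B", 1.5, 45.0, 62),
        ("C", 2.7, 55.0, 63), ("D", 4.2, 58.0, 65)]
for (lo, hi), group in stratify(data).items():
    print(f"{lo}-{hi}%:", [name for name, *_ in group])
```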
One final note: throughout this study we did not conduct significance testing. The primary reason is that statistical testing is quite often misleading. With a sufficiently large sample size, one can almost always find a statistically significant difference between any two groups, or a statistically significant correlation between any two variables, because true differences and correlations are rarely exactly zero. What matters most is not whether a difference or correlation exists, but the magnitude of the difference or the strength of the correlation. The confusion created by statistical testing has been noted by numerous statisticians for many years (Bakan, 1966; Berkson, 1938; Komatsu, Shinohara, & Otsuki, 2015; Nuzzo, 2014; Thompson, 1996, 2002). What is important from the perspective of education research is that these statisticians include Thompson (1996), who penned the guidelines for statistical reporting for the American Educational Research Association. Thompson (1996, 2002), in line with other statisticians, recommended reporting effect sizes and confidence intervals instead of statistical significance (i.e., p values).
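The point that a large enough sample makes even a trivial correlation "significant" can be made concrete with the standard t statistic for testing a Pearson correlation against zero, t = r√(n − 2) / √(1 − r²). In the sketch below, r = 0.05 is an arbitrary illustrative value and 1.96 is the large-sample two-sided 5% cutoff (a normal approximation); the effect size stays fixed while only n changes.

```python
import math


def t_statistic(r, n):
    """t statistic for testing H0: rho = 0, given sample correlation r and size n."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# The same weak correlation (r = 0.05) at increasing sample sizes.
for n in (100, 1_000, 10_000, 100_000):
    t = t_statistic(0.05, n)
    verdict = "significant" if abs(t) > 1.96 else "not significant"
    print(f"n = {n:>7}: t = {t:5.2f} ({verdict} at the 5% level, normal approximation)")
```

The correlation explains only 0.25% of the variance in every row; what changes is the verdict of the test, which is why the effect size and its CI carry the substantive information.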

Situation of Shadow Education Attendance in Japan
In Japan, attendance at shadow education for academic subjects (i.e., juku, home tutoring, and remote tutoring) was common at the lower secondary level (grades 7-9) but not very common at the primary level (grades 1-6; Fig. 1a). At the primary level, non-academic activities were the focus, particularly in the early grades. Juku attendance did increase with grade even at the primary level, but the increase became considerable only after the fourth grade (Fig. 1b). Home tutoring and remote tutoring changed little with grade and played only a supplementary role in Japan.
We observed a clear contrast in juku attendance between math and science. A majority of students attending juku took math lessons, while very few took science lessons (Fig. 2). For example, the mean percentage of first to fourth graders taking math lessons was 15.0%, whereas the corresponding percentage for science lessons was merely 0.9%. If, as the claims reviewed above suggest, juku attendance were the primary cause of Japan's academic achievement, we would hypothesize that the achievement gap between Japan and other countries would be pronounced only in math at higher grades (i.e., seventh to ninth grades). We test this hypothesis using TIMSS data below.
It is important to note that the period for which TIMSS data are available coincides exactly with the period when juku attendance was at its most prevalent in Japan's history (Fig. 3). Juku attendance increased between 1976 and 1993 and has remained fairly stable since then. Juku attendance before 1976 was likely less prevalent; indeed, the 1976 survey of juku attendance was itself triggered by the perception that juku attendance had become unprecedentedly widespread (Yamada, 2014). The MEXT dataset lacks data after 2007, but juku attendance is unlikely to have changed greatly since then. A different dataset covering 2009-2017, provided by a major private education institute in Japan (Benesse, 2017), showed no pronounced changes in juku attendance over that period, although those data are not fully comparable with the MEXT data because of differences in how juku attendance was defined.

Achievement Gap Between Japan and Other Countries
Table 1 summarizes math and science achievement for Japan and the other countries that regularly participated in TIMSS. The achievement gap between Japan and other countries was apparent not only at the eighth grade level but, importantly, was already present at the fourth grade level. For example, in TIMSS 2015, Japan's math scores for fourth and eighth graders were 593 and 586 points, respectively, much higher than the corresponding means for the other countries (536 and 526 points, respectively). These results held even when comparing Japan with each regular participant individually (Table A1 of Appendix A); in other words, the gap between Japan and each participant was already apparent at the fourth grade level. We obtained qualitatively the same results for science: Japan's science scores for fourth and eighth graders were 569 and 571 points, respectively, again much higher than the means for the other countries (529 points in both cases). In TIMSS 2015, Japan's scores were generally higher than those of the other regular participants by one full standard deviation, regardless of grade and subject (Table 1). Given that juku attendance was uncommon at the primary level (particularly before Grade 4) and that juku attendance for science at the primary level was virtually non-existent, the TIMSS data did not support the hypothesis that juku attendance was the primary factor behind Japan's high academic achievement.

Regional Variations in Juku Attendance and Achievement
Table 2 shows regional variations in juku attendance and math achievement for sixth graders in Japan. Juku attendance rates varied considerably among prefectures, ranging from 22.1% (Akita) to 57.9% (Tokyo). In general, juku attendance was relatively high in urban prefectures (e.g., Tokyo, Kanagawa, Hyogo, and Nara). However, academic achievement in these prefectures was not always high. The achievement scores for Hyogo and Nara (62 points each) were below the mean score for all prefectures (63 points), although the scores for Tokyo and Kanagawa (65 and 64 points, respectively) were slightly above the mean. Rather, we found what would be a paradox from the standpoint of the juku-achievement hypothesis: rural prefectures where juku attendance rates were low often had the highest levels of academic achievement. Indeed, Akita's achievement score was 67 points, the second highest among all prefectures. Akita's score was 2.3 SD above the mean, whereas its juku attendance rate was 2.4 SD below the mean. Furthermore, the highest achievement score was recorded by Ishikawa, whose juku attendance rate (28.2%) was considerably lower than the mean for all prefectures (41.8%). Ishikawa's score was 3.4 SD above the mean, whereas its juku attendance rate was 0.6 SD below the mean. Overall, the correlation between juku attendance rates and academic achievement was not positive (r = -.16, 95% CI: -.44 to .16; Fig. 4a). These results did not change qualitatively when using data for prefectures with similar rates of students attending national and private schools. In most prefectures, fewer than 3.0% of students attended national and private schools, but several prefectures had higher rates (Table 2).
We thus examined the relationship between juku attendance rates and achievement for the three categories with enough prefectures to examine correlation, i.e., those in which the rate of students attending national and private schools was less than 1.0%, from 1.0% to 2.0%, and from 2.0% to 3.0% (Figs. 4b, 4c, and 4d, respectively). In no case did we observe a positive correlation: r values were -.27 (95% CI: -.93 to .32), -.38 (95% CI: -.58 to .08), and -.94 (95% CI: -1.00 to .98), respectively. Note that the very wide CIs for the first and third categories suggest that calculating CIs from such small samples was not very meaningful. Juku attendance thus explained neither intra-national regional differences in academic achievement nor between-country differences, underlining the importance of the scale mismatch problem.

Figure 4: Relationships between juku attendance and achievement for (a) all prefectures, and prefectures in which the percentage of students attending private or national schools was (b) less than 1.0%, (c) from 1.0% to 2.0%, and (d) from 2.0% to 3.0%.
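Statements such as "2.3 SD above the mean" for Akita and Ishikawa express a prefecture's value as a standardized deviation from the mean across all prefectures. A minimal sketch, assuming the population SD across prefectures (the text does not say whether a population or sample SD was used); the function name and the scores are hypothetical.

```python
from statistics import mean, pstdev


def sd_units(values, target):
    """How many SDs `target` lies above (+) or below (-) the mean of `values`.

    Uses the population SD (pstdev); this choice is an assumption.
    """
    return (target - mean(values)) / pstdev(values)


prefecture_scores = [60, 62, 63, 64, 66]  # hypothetical
print(sd_units(prefecture_scores, 66))  # 1.5
```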

Discussion
We are, of course, aware that the correlational analyses above do not show that juku attendance had no effect on academic achievement. However, these data do strongly suggest that juku attendance is not the primary determinant of Japan's superior performance when viewed at the country scale. To clarify and extend the implications of our findings, we first link them to other research and then return to the question of how future research might build on the new analytical strategies we have introduced herein.

Links to Other English-Language Research
The results of our analyses mesh with previous data and studies. First, Japan's achievement was already high in the First and Second International Mathematics Studies, FIMS (1964) and SIMS (1980-82), which examined the math achievement of 13-year-olds in various countries. For example, Japan's mean FIMS score was 31.16 points (SD 16.90 points; Postlethwaite, 1967, p. 94), much higher than that of the United States (mean 17.85 points, SD 13.21 points). However, juku attendance rates among primary students in the 1960s and 1970s were lower than in the 1990s and later (see Results). It is thus difficult to attribute Japan's achievement in FIMS and SIMS to primary students' juku attendance alone. Second, our results complement findings in The Learning Gap by Stevenson and Stigler (1992). Stevenson and Stigler studied the math achievement of first and fifth graders in Sendai (Japan) and Minneapolis (the United States) in 1980, and conducted a similar study in Sendai and Chicago in 1987. Their main point was that the achievement gap between Japan and the United States widened between the first and fifth grades. More importantly for our purposes, the achievement gap was already present at the first grade level in both studies. This gap cannot be explained by juku attendance, given the low rate of juku attendance among Japanese first graders. We further underscore that Miyagi prefecture, which includes Sendai, the focus of Stevenson and Stigler's empirical study, is not a region where juku attendance is prevalent (Table 2), and, moreover, that juku attendance in the 1980s was less common than in the 1990s and later (Fig. 3).
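To put the FIMS comparison on a standardized scale, the reported means and SDs can be converted into a standardized mean difference (Cohen's d). This is an illustrative calculation, not a figure reported in the text: the group sizes are not given here, so the pooled SD below assumes equal group sizes, and the function name is hypothetical.

```python
import math


def cohens_d_equal_n(mean_a, sd_a, mean_b, sd_b):
    """Standardized mean difference using the pooled SD, assuming equal group sizes."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# FIMS means and SDs for Japan and the United States (Postlethwaite, 1967).
d = cohens_d_equal_n(31.16, 16.90, 17.85, 13.21)
print(f"d = {d:.2f}")  # d = 0.88
```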

Towards Analytical and Methodological Innovation
Our results showed that the common hypothesis that shadow education attendance is the leading cause of East Asian countries' high academic achievement cannot be supported by country-scale analyses. This stands in direct opposition to the conclusions of several individual-scale studies, such as Entrich (2014), which we highlighted at the outset. One major problem we find with Entrich (2014) is a scale mismatch between analysis and conclusions. Entrich (2014) found that shadow education is a major factor corresponding to between-individual variations in achievement in Japan. However, he concludes that shadow education plays a decisive role in explaining between-country variations in achievement. Is it actually possible that a factor explaining micro-scale variations does not explain macro-scale variations? To explore this question, consider a simple example of two hypothetical countries with different achievement scores in international assessments. Assume that all students in Country A study 50 hours a week, while all students in Country B study 2 hours a week. The difference in study time would be one plausible factor explaining the between-country variation in achievement. However, study time cannot explain between-individual variation in achievement within Country A, because study time does not vary between individuals there. This illustrates that the factors operating at one scale need not be the factors operating at another, so individual-scale analysis cannot provide direct evidence for a country-scale phenomenon. The key conceptual-turned-methodological point is to nuance and advance the discussion with attention to the issue of scale, i.e., selecting an appropriate scale for understanding the target phenomenon. We suggest that education researchers conduct country-scale analyses of East Asian countries, as we have done, to examine the relationship between East Asian performance in international assessments and shadow education.
Because few studies have examined this relationship at the country scale, results of individual-scale analyses are often misused to interpret academic achievement at the country scale.
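The reverse situation, which is the one at issue in Entrich (2014), can also be illustrated with invented numbers: a factor can account for all of the variation among individuals within each country and still fail to explain, or even run against, the difference between countries. In this minimal sketch, the tutoring hours, scores, and baselines are entirely hypothetical.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson's correlation coefficient (inputs must not be constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical students: weekly tutoring hours and test scores.
# Within each country, every extra hour adds 5 points, but Country A starts
# from a much higher baseline for reasons unrelated to tutoring.
hours_a = [0, 1, 2, 3, 4]
hours_b = [4, 5, 6, 7, 8]
scores_a = [560 + 5 * h for h in hours_a]
scores_b = [480 + 5 * h for h in hours_b]

print("within A:", pearson_r(hours_a, scores_a))  # individual scale
print("within B:", pearson_r(hours_b, scores_b))
print("country means (hours, score):",
      (mean(hours_a), mean(scores_a)), (mean(hours_b), mean(scores_b)))
```

Within each country, tutoring perfectly predicts scores (r = 1.0), yet the country with more tutoring on average (B) has the lower mean score, so the individual-scale result says nothing about the country-scale difference.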
Here it is worth pausing to consider potential reasons for the shortage of country-scale studies. One apparent reason, of course, is that individual data have historically been more easily available, and education scholars persist in using methodologies and strategies developed in the period when international data were unavailable. We surmise, however, another, more philosophical reason. Education scholars often seem to assume implicitly that realities are independent of perspective. This assumption may lead scholars toward smaller-scale analyses. Indeed, we have frequently received negative comments from reviewers of our other manuscripts holding that country-scale (or other aggregated) data cannot capture the fine-grained realities of education. Our position differs. We believe that "realities" are defined only in relation to perspective and frame. There is little doubt that today national perspectives and frames are being replaced by international comparative ones, due in large part to the fervor surrounding international large-scale assessments such as TIMSS and PISA. Meanwhile, most education researchers remain wedded to the idea that classrooms are the primary reality of education. But it could be argued that we are in the midst of a slow shift wherein the "primary reality" is the global scale. Without recognizing this, politically expedient but untested hypotheses can easily take hold. Put somewhat less philosophically, reality is defined in relation to the problem, and the dominant problem has shifted considerably over the past two decades due to instruments like PISA and the research they make possible (see Komatsu & Rappleye, 2017b). Our perspectivist position is quite common in other research fields, not only in philosophy but across the natural sciences (e.g., Jarvis & McNaughton, 1986; Komatsu et al., 2012; Waring & Running, 1998).
We hope that this philosophical issue and its practical implications (namely, how educational research should change in light of new global instruments for policy and research) will be taken up more by a new generation of education scholars.
In this context, it should be unmistakable that we disagree with the way the OECD uses its data. The OECD quite often ignores rich domestic data that could be used to check the validity of its arguments (e.g., the data used in Figures 1-3). In this piece, we brought ILSA data into conversation with rich domestic data on shadow education, suggesting that another analytical innovation should be the synthesis of international and domestic data. This mutual exchange is essential for identifying relevant hypotheses to explain the target phenomenon. When we rely exclusively on international data, we are often left with many "equally plausible" hypotheses. This diversity, however, makes it possible for someone simply to pick the politically expedient hypothesis (e.g., Morris, 2012) while rejecting the others without any deeper inquiry. Multiple hypotheses are not harmful so long as there is substantive dialogue among them, but without such dialogue the situation is ripe for selective reporting for one's own benefit. We feel this is frequently the case with OECD analyses: much more attention is given to identifying problems so as to pave the way for selling OECD-style solutions or 'policy authority' than to furthering understanding of different approaches to education worldwide (Komatsu & Rappleye, in review). We hope that our study, modest as it is, will lead more education scholars to engage in this exchange between international and domestic data, eventually reducing the potential for "selling" merely ideological positions.

Conclusion: From Japan to East Asia? From Analytical Questions to Onto-Epistemic Alternatives?
It remains unclear to what extent our findings from Japan can help us understand the situation elsewhere in East Asia, i.e., Korea and Taiwan, systems with both similarities and highly significant divergences (see Aizawa, Kagawa, & Rappleye, 2018). Yet, given our preliminary analyses of those contexts woven in at various points above, we can readily imagine similar findings. Future research along those lines is necessary, led by researchers more familiar with those contexts than we are. As stated at the outset, our intended contribution herein was both an analytic advance and a reopening of potential explanations cut short by uninformed, OECD-style explanations.
We might well end the analysis there, but that would still leave open the most crucial question for us: if not shadow education, what drives East Asia's world-leading achievement? If we were to listen to OECD experts and the reports they write, the question itself would never arise: it would already be answered by reference to structural features. But to the degree that those explanations are refuted, it becomes possible to engage with alternative explanations.
A leading Hong Kong-based scholar of differences in mathematics achievement worldwide wrote two decades ago:

…the essential difference between the features of East Asia and the West rest on the different views about who or what the centre in the teaching and learning process should be. This is in turn based on fundamental differences in cultural values such as the nature of human beings, the nature of mathematics, etc. (Leung, 2001, p. 47, bold in original).
Although many scholars today would blush at the apparent reification inherent in the notion of "essential" differences and would quickly problematize what "culture" might mean, could we accept the thrust of the argument if we simply called this complex a "worldview"? That is, what if we understood "essential differences" as divergences in deeper perspectives on the nature of knowledge, the meaning assigned to education, and concepts of selfhood? Could we accept it if "culture" were understood less as a fixed and reified entity and more as a "mind set"?
Closely examining differences in PISA 2006 Science data, our recent study put forth the idea that structural and institutional factors do not explain observed differences (Komatsu & Rappleye, 2017b). The theories espoused by OECD analysts do not square with the observed data, and these anomalies suggest the need for new theories. It has also been well established why and how the OECD needs to avoid "culture" if it is to maintain its legitimacy (Auld & Morris, 2016). In that vein, our study put forth the view that in Japan, and perhaps elsewhere in East Asia, an alternative theory of learning, which we called Type II Learning, was partially responsible for high achievement. By working to remove the usual tropes about East Asia (i.e., that students are not creative, that they are under tremendous pressure, that education is geared towards exams and economic growth, and, as examined here, that performance is secretly driven by shadow education), that piece tried to keep open a space to think about the alternatives East Asia presents, joining others who have come before (e.g., Li, 2012; Stigler & Hiebert, 1999; Stigler & Stevenson, 1992; Tobin, Hsueh, & Karasawa, 2009; Tobin, Wu, & Davidson, 1989; Takayama, 2015, 2018).
But why aren't those alternatives more visible already, after decades of research and global comparisons? The reasons are, of course, multiple and complex, but the same Hong Kong-based scholar of mathematics cited earlier captures some of the main dynamics very well:

…mathematics education, as a discipline, unlike the disciplines of mathematics and education is a relatively young field of study. If a narrow definition of mathematics education is taken, its root can be traced to the emergence of learning psychology at the beginning of the 20th century. This was exactly the time when East Asian countries were either colonies of or subjected to heavy influence from Western countries. In the area of education, instead of developing a theory of mathematics education of its own, educators in East Asian countries either adopted a Western model of mathematics education or failed to develop any theory of mathematics education at all. Yet even without a theory of its own, teachers in these countries at the classroom level seem to have developed rather distinctive ways of teaching mathematics… (Leung, 2001, p. 37)

The basic pattern described here, in which theories are imported and overlaid upon existing practice, has many parallels around the region. Recent studies, building on sociological theory developed in Japan, have also shown how these attempted "revolutions from above", driven by the repeated import of Western theories, have failed to change practice deeply enough to make it converge with the West (Rappleye, 2018). Moreover, this pattern leads to considerable confusion about achievement across East Asia: the pedagogy looks 'backward' when viewed through Western lenses, yet it may be precisely that pedagogy, or more specifically the underlying worldviews and "mind sets" from which it emerges, that is the main reason for world-leading achievement (Komatsu & Rappleye, 2017b).
It is crucial to note here that when "culture" is understood as something embedded in context (which it surely is), there is little hope that we can "learn from" rather than simply "learn about" it (Takayama, 2015). But when it is understood as a "mind set", a worldview, and/or a set of onto-epistemic possibilities available to anyone, these East Asian approaches to achievement become a resource for learning for anyone, one which, if sustained over long periods of time, would likely lead to higher levels of achievement. Indeed, we already see affirmation of this general approach in, for example, the worldwide adoption of Japanese Lesson Study.
In light of this, we hope that other scholars will join these lines of research, as a necessary prerequisite for the more arduous task of developing original theory (i.e., working up what Tobin et al. (2009, p. 242) usefully describe as 'unmarked beliefs' and 'an implicit cultural logic' into explicit symbolic systems). We feel that keeping open the possibilities to think differently is now an even more urgent and vital task, not least because the statistics the OECD uses and the descriptions of the region it presents are often factually flawed yet increasingly influential (recall Fig. 1). In the face of renewed attempts by the OECD and Western media to inscribe their own cultural views about education onto the minds of not just the East Asia region but the wider world, we renew the call to "push back" not just at the analytical level but also at a deeper level: from different onto-epistemic depths. Highlighting these differences should not be mistaken as divisive or reactionary; it is instead a desideratum for continued learning and achievement worldwide.