The Untapped Promise of Secondary Data Sets in International and Comparative Education Policy Research.

The objective of this commentary is to call attention to the feasibility and importance of large-scale, systematic, quantitative analysis in international and comparative education research. We contend that although many existing databases are under- or unutilized in quantitative international-comparative research, these resources present the opportunity for important, policy-relevant descriptive studies. We conclude the commentary with overarching observations about the strengths and limitations of such secondary data-based analysis.


Introduction
Who teaches marginalized children in developing countries? What sort of school infrastructure is available to children across diverse settings? What is the profile of school leaders in low-income settings internationally? These questions have several things in common. They have important implications for education policies related to access, equity, and quality. They are descriptive in nature and reasonably answerable with analysis of existing secondary datasets. And perhaps most importantly, these questions are largely unanswered. Yet, with the growing prevalence of international data collection efforts, accompanied by increasing participation of developing countries in these efforts, the potential for rich, policy-relevant educational research across diverse education systems has expanded substantially.
The objective of this commentary is to call attention to the feasibility and importance of large-scale, systematic, quantitative analysis in international and comparative education policy research. We contend that although many existing databases are under- or unutilized in quantitative international-comparative research, these resources present the opportunity for important, policy-relevant descriptive studies.
In the sections that follow, we describe the growing use of large-scale, secondary data generally and in cross-national educational research, pointing out opportunities, challenges, and key considerations for conducting this type of work. This discussion includes the identification of more than 20 relevant datasets that researchers can draw upon for international comparative education research. We conclude with a discussion of implications for the use of large-scale data in cross-national education policy research.

Data That Sing? Potential and Pitfalls of Large-Scale Secondary Data
There is an undeniable excitement around the use of large-scale data across various academic and commercial disciplines. Phrases like "data revolution," "big data," and "data-driven decision-making" are common in social and commercial spheres. Education is no exception. As the availability and technical capacity to handle large, complex datasets have grown, so have the use and awareness of the potential of this information for a multitude of purposes.
A few recent examples of data use stand out. Observers of international and comparative educational research may recall Dr. Hans Rosling's 2006 TED talk, "The Best Stats You've Ever Seen," as an example of excellent large-scale data use. The TED website rightly noted, "In Hans Rosling's hands, data sings" (TED, n.d.). In this talk, viewed over 10.5 million times as of June 2016, Dr. Rosling, a medical doctor and statistician, provides a compelling and highly informative whirlwind tour through the changing wealth and health of countries and regions across the world.
Drawing from data "reported and processed" by the UNESCO Institute for Statistics (UIS), UNESCO's Global Monitoring Reports (GMR), now the Global Education Monitoring Report (GEM), have also employed excellent graphical displays of information to illustrate global educational trends (GMR, n.d.). The accompanying GEM website likewise places evident emphasis on "Data Visuals." UIS itself serves as an excellent online source for "cross-nationally comparable statistics on education, science and technology, culture, and communication for more than 200 countries and territories" (UIS, n.d.).
The World Inequality Database on Education (WIDE), first created as the Deprivation and Marginalization in Education (DME) dataset for the 2010 EFA GMR, brings together various large-scale cross-national databases and provides another excellent example of the descriptive power of large-scale secondary data (http://www.education-inequalities.org/). As these examples demonstrate, in the right hands, data can tell a very compelling story, if not sing. Increasingly, we also find prominent conversations about the "data revolution" (http://www.undatarevolution.org/) and the participation of education scholars in these conversations (e.g., Rose, 2014). The data revolution website and the associated report provide further context for these discussions. Most recently, the need for a data revolution was expressed by a High Level Panel appointed to guide the post-MDG discussion by the UN Secretary-General Ban Ki-moon. Various prominent research and educational organizations also regularly arrange workshops (both online and at academic conferences) on large-scale secondary datasets and methods to analyze such data. These efforts are no doubt putting a spotlight on the use of big data for international and comparative educational scholarship.
Notwithstanding the examples cited above, the overall use of existing, large-scale, secondary databases for descriptive work in international comparative education is limited. Broadly, this may be due to either the lack of data or the lack of capacity, within and outside of academia, to work with large datasets. Several additional nuances further complicate the situation: data that are available may not always be of sufficiently high quality, easily accessible, easily retrievable, or readily combined with other data sources. Similarly, real conceptual, technical, and epistemological challenges associated with large-scale quantitative work may generate additional considerations that limit the relevance and viability of such research. All of these nuances deserve careful attention. In this commentary, however, we argue that at least the first of these two broad challenges, i.e., the unavailability of interesting and suitable data, should not constrain the field of international and comparative scholarship. As we describe below, the growing diversity among cross-national datasets offers substantial potential for important cross-national descriptive policy-relevant research in education.

Data Availability for International and Comparative Research: A Range of Options and Possibilities
Since the turn of the 21st century, many more developing countries have begun participating in cross-national data collection exercises. The more commonly known studies like TIMSS and PISA have gradually grown from 35-40 countries to 65-70 countries. Regional efforts such as LLECE in Latin America (since 1997), as well as SACMEQ (since 1999) and PASEC (since 1993-94) in sub-Saharan Africa, have continued to generate large amounts of systematic, cross-national educational information. Within the last decade or so, volunteer-driven efforts to test learning, such as ASER in India and Pakistan and UWEZO in East Africa, also offer prominent additions to this list. Recent USAID-funded efforts across the world like the Early Grade Reading Assessment (EGRA) and Early Grade Math Assessment (EGMA) provide other valuable resources. And this is just a brief list of educational databases that directly measure student learning. In Table 1, we provide a comprehensive (but by no means exhaustive) list of multi-country datasets available, along with a few important attributes of these data that researchers should consider.
Table 1 illustrates the substantial range of data that have been gathered across the globe, often from multiple countries, often multiple times. These datasets vary considerably in terms of their purpose and the focus of their data collection. While many datasets are gathered repeatedly, the presence of longitudinal datasets is limited. Table 1 provides a few important ways in which education scholars or practitioners may think of large-scale databases for their own work.
Since one simple logic driving data selection is often an interest in a specific education system, country, or region, one standard way to think about these datasets is according to the countries or regions that they represent. Obtaining country-participation information is typically not difficult. For instance, on their websites, large IEA databases provide a list of all the countries covered in a particular data collection exercise. Some important differences in country coverage across these datasets are noteworthy. Long-existing cross-national student performance data collection exercises like TIMSS have much broader coverage than more recent cross-national datasets that investigate newer topics, like TEDS-M. It is also often the case that some large-scale data collection exercises are skewed in favor of higher- to middle-income countries (Chudgar & Luschei, 2009). On the other hand, when data are funded by bilateral aid agencies such as USAID or DFID, or gathered through South-South cooperation or volunteer-driven efforts, we find a heavier emphasis on developing countries, as in the DHS, Young Lives, or ASER data.
An alternative approach to assessing or selecting datasets is to focus on prominent agencies associated with data collection exercises. For example, a user familiar with IEA will know that the organization gathers data covering issues as varied as civic learning and computer literacy. Another benefit of using data associated with larger, well-organized efforts is that data documentation and related support for data use may be readily available. For instance, several of these datasets are gathered using a complex sampling framework. While this approach makes data nationally representative, researchers working with these data must take into account associated features, like the complex sampling structure and sample weights, to generate representative estimates from these samples. To facilitate researchers' efforts, IEA provides an excellent online tool called IDB Analyzer which allows even novice researchers to readily and accurately use the IEA data. In some instances, large data collection operations may also include robust online user groups, as in the case of the DHS data. Several data collection agencies also regularly engage in training efforts both online and at relevant conference venues, providing users a chance to learn more about these resources. However, this level of support may not be available for some of the other smaller (in scale or funding) data collection efforts.
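To make the role of sample weights concrete, the following is a minimal sketch, not the procedure of any particular survey or of IDB Analyzer. The function name, the scores, and the weights are all hypothetical; the point is only that a weighted mean can differ noticeably from a naive unweighted mean when some sampled students represent more of the population than others.

```python
def weighted_mean(values, weights):
    """Return the weighted mean of `values` using sampling `weights`.

    Each weight is assumed to indicate how many population members
    a sampled observation represents (a hypothetical simplification).
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical test scores and sampling weights for three students.
scores = [400.0, 500.0, 600.0]
weights = [1.0, 1.0, 2.0]  # the third student represents twice as many peers

unweighted = sum(scores) / len(scores)
weighted = weighted_mean(scores, weights)

print(f"Unweighted mean: {unweighted:.1f}")
print(f"Weighted mean:   {weighted:.1f}")
```

This sketch covers point estimates only. Standard errors under a complex sampling design require additional design information, which is documented in each study's user guide and is not shown here.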
For educational scholarship, the unit of analysis of data collection may be another important criterion in assessing or selecting datasets. Broadly, data used in educational research come from one of three sources: households, classrooms, or schools/educational institutions. Household data allow us to observe a child along with his or her family, which helps to generate a rich picture of the child's home background, parental education, and sibling structure. However, in such datasets, with a few notable exceptions like ASER, it is not possible to learn a great deal about children's performance on standardized tests or their classroom, teacher, or school experiences. In this regard, data that are gathered from the household will be limited compared to data collected directly from the classroom teacher or school principal.
When data are gathered at the classroom level, we may get a clear sense of a child's peer group and, in most cases, some measure of learning levels, as well as extensive information on the teacher and school (see Heyneman & Lee, 2015, for recent reviews of such resources). However, such datasets may be missing detailed information about the child's home circumstances, as children are often not the best informants when it comes to reporting on parental education, wealth, or income levels (see Chudgar, Luschei, & Fagioli, 2014, for a related discussion). Classroom-level data may also lack sufficient material to allow a researcher to paint a nuanced picture of the school beyond the basics.
One standard issue is that most such data collection efforts usually survey one classroom per school, or they survey two classrooms in two different grades (for example, TIMSS, SACMEQ, and PASEC). These datasets are not ideal for a researcher interested in studying, for example, teaching communities within a school, as we observe no more than the teachers associated with the surveyed classrooms. The third category of data, gathered at the institution level (for example, TALIS), may, by design, focus on the school as the unit of analysis, surveying teachers within the school. Such data may sketch a general profile of students in a school but may not provide information on specific children.
The purpose of data collection may also be important, although several large-scale resources are collected with a broad mandate and can be useful for a wide range of uses that may not have been conceived by the initial planners. Not all databases that are useful for educational research may have been collected for that purpose, but they still may contain important information (variables) that is relevant for extensive educational scholarship that goes beyond understanding variation in student test performance. For instance, the DHS data permit detailed investigation of various adulthood outcomes including attitudes, access to information, sexual behavior, fertility practices, and how they associate with individual education levels. Datasets like AfroBarometer or Pew Global Attitudes and Trends have similarly been used by scholars to assess the attitudes of adults with varying levels of education toward a range of social and political issues (for example, Shafiq, 2010).
Although it provides many key resources, Table 1 does not cover all of the multi-country datasets that may be relevant for educational researchers. Readers may also be interested in exploring resources and data archives such as the World Bank database, data available through the LIS Cross-national Data Center in Luxembourg, and the Inter-University Consortium for Political and Social Research (ICPSR) at the University of Michigan, which provides systematic listings of a range of databases.
The table also does not provide information about several excellent country-specific resources. Many developing countries have data collection efforts that generate nationally representative datasets. In the case of India, for instance, the National Sample Survey Organization (NSSO) gathers large-scale, nationally representative data on education, employment, and household expenditure, which may all be relevant for education scholars. Data from Brazil (SAEB), Colombia (ICFES), and Chile (SIMCE) all provide important examples of educationally-relevant data in Latin America. Another fruitful source for education may be national administrative databases. As countries digitize their educational systems, opportunities to obtain large amounts of information on students, teachers, and schools through administrative records also increase.
In spite of the vast availability of educationally-relevant data, with the exception of a few commonly known datasets like TIMSS or PISA, lesser-known regional resources receive far less attention in international and comparative educational research. As an example, we used ProQuest to search the abstracts of six international and comparative education journals that are widely recognized across the field. In the abstracts, we searched for the occurrence of the names/acronyms by which the data are most commonly known (such as "TIMSS," "PISA," etc.). Admittedly, this is a crude approach and will undoubtedly miss papers in which the authors use these data but have chosen to refer to them by their complete name in the abstract, or in some instances do not mention the data in the abstract at all. Nonetheless, it provides one quick way to assess how the 20 or so datasets listed in Table 1 are used across the field of international and comparative education. The results showed 54 papers that mentioned PISA in their abstracts, 31 that mentioned TIMSS, 10 that mentioned SACMEQ, and four that mentioned Young Lives. For all the other datasets we have listed in Table 1, our search yielded zero to one paper.
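For readers who wish to attempt a similar tally on their own collection of abstracts, the following is a rough sketch of an acronym count. It is not the ProQuest search described above; the function name and the sample abstracts are hypothetical, and the matching is deliberately as crude as the approach it imitates (case-sensitive, whole-word acronym matches, one count per abstract).

```python
import re


def count_abstract_mentions(abstracts, acronyms):
    """Count how many abstracts mention each acronym at least once.

    Matching is case-sensitive and on whole words only, so an abstract
    that spells out a dataset's full name is missed.
    """
    counts = {acronym: 0 for acronym in acronyms}
    for text in abstracts:
        for acronym in acronyms:
            pattern = r"\b" + re.escape(acronym) + r"\b"
            if re.search(pattern, text):
                counts[acronym] += 1
    return counts


# Hypothetical abstracts, invented purely for illustration.
sample_abstracts = [
    "We use PISA and TIMSS data to compare achievement gaps.",
    "This study draws on PISA 2012 to examine school resources.",
    "Using the Trends in International Mathematics and Science Study, we ...",
]

mentions = count_abstract_mentions(sample_abstracts, ["PISA", "TIMSS", "SACMEQ"])
for name, n in mentions.items():
    print(f"{name}: {n}")
```

The third sample abstract shows the limitation noted above: a paper that uses a dataset but refers to it only by its full name is not counted.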

A Nod to Challenges of Causal Research
As we discuss the various strengths of existing data resources, we would be remiss if we did not discuss an important limitation of several of these datasets. As we noted in Table 1, in most cases, the data available are cross-sectional and in few instances were these data gathered specifically for policy evaluation. These features of the data limit their potential for generating causal estimates. Establishing cause-and-effect relationships is important for educational scholarship when we hope to change educational outcomes (the effect) by identifying what can help create that change (the cause) (for example, see Murnane and Willet, 2010). Studies that establish causality are therefore evaluated as more robust for policy purposes in comparison to studies that establish that two things are related (for example, see recent literature reviews by Ashley et al., 2014, or Glewwe et al., 2011). Causal studies may draw primary data from randomized field trials (see Duflo and Kremer, 2005), or they may make innovative use of existing databases, including the types of data we discuss in this commentary (for example, West & Woessmann, 2010).
Indeed, employing certain techniques, like regression discontinuity or difference-in-difference analysis, with secondary data, researchers can closely approximate a randomized experiment and arrive at findings with a strong causal warrant (for example, Angrist and Pischke, 2008). Yet this sort of research is demanding in terms of data required; most existing databases, especially cross-sectional data, although perfectly suited for descriptive analysis, are not always able to meet the standards for causal research (see Rutkowski and Delandshere, 2016, for a related discussion).
While noting the limits of such data for causal work, we may also note that the focus on causality, especially the use of RCTs, is not without its critics, including prominent economists like Angus Deaton (2009). It is not the purpose of this commentary to argue for or against causal research, but we do argue that such a focus ought not to prematurely draw scholarly attention away from the many descriptive affordances of large-scale secondary data.
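To illustrate why such techniques are data-demanding, the following is a minimal sketch of the simplest two-group, two-period difference-in-difference calculation. The function name and the scores are hypothetical and come from no actual study. The estimate requires outcomes for both a treated and a comparison group, observed both before and after a policy change, which a single cross-sectional survey cannot supply.

```python
def difference_in_difference(treated_pre, treated_post,
                             comparison_pre, comparison_post):
    """Return the basic 2x2 difference-in-difference estimate.

    The estimate is the change in the treated group's mean outcome
    minus the change in the comparison group's mean outcome.
    """
    def mean(xs):
        return sum(xs) / len(xs)

    treated_change = mean(treated_post) - mean(treated_pre)
    comparison_change = mean(comparison_post) - mean(comparison_pre)
    return treated_change - comparison_change


# Hypothetical mean test scores, invented for illustration only.
estimate = difference_in_difference(
    treated_pre=[50.0, 52.0],      # treated schools before the policy
    treated_post=[60.0, 62.0],     # treated schools after the policy
    comparison_pre=[48.0, 50.0],   # comparison schools before
    comparison_post=[52.0, 54.0],  # comparison schools after
)
print(f"Difference-in-difference estimate: {estimate:.1f}")
```

The calculation carries a causal interpretation only under the assumption that the two groups would have followed parallel trends in the absence of the policy, an assumption the sketch does not test.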

The Potential of Good Descriptive Work
As Table 1 and the above discussion make clear, scholars have access to extensive secondary data from a range of countries around the world. Although a vast majority of these data are not readily amenable to causal work, they are perfectly suitable for extensive descriptive analysis. Here, we use the term "descriptive analysis" to include all research that is not explicitly causal (either experimental or quasi-experimental). In other words, well-executed multivariate regressions are also descriptive in this sense if they are unable to identify a specific causal mechanism. A 2002 article in The Lancet noted that descriptive studies inform "trend analysis, health-care planning, and hypothesis generation" (Grimes & Schulz, 2002, p. 145). This observation is accurate for education as well. Indeed, well-designed and innovative descriptive studies have been instrumental in the field of international comparative education to shed light on new areas of study and to focus our attention on questions that have been under-studied.
To illustrate the potential of excellent descriptive analysis, we note two important studies that span the last four decades. First, Heyneman and Loxley (1983) brought together disparate data from 29 high- and low-income countries and investigated the relative importance of home versus school background factors in explaining variations in student performance. Although not causal in nature, their findings revealed an interesting insight about the relatively greater importance of school resources in poor countries. This study questioned the universality of the findings of the 1966 United States Coleman Report, which stated that the influence of the home was greater than that of the school.
More recently, Carnoy and Rothstein (2015) revisited the relative underperformance of the United States in various cross-national studies of educational achievement. Once again, through a careful descriptive investigation, they highlight the importance of social class in explaining U.S. educational performance. They argue that the U.S. contends with a much larger low-SES population than the countries with which it is often compared. If these differences are accounted for, then U.S. performance is not as dismal as portrayed in standard narratives. This descriptive analysis essentially serves to reframe conversations around U.S. underperformance on cross-national tests.
These two studies are just a sample of the range of such research that educational scholars have generated in recent years. A range of other such work both in the U.S. and internationally has defined education policy scholarship in important ways (for example, Farrell and Oliveira's 1987 study of teacher effectiveness and related costs in developing countries as well as Lankford, Loeb, and Wyckoff's 2002 analysis of the distribution of teachers in New York State). Most recently, work at the Stanford Education Data Archive provides another outstanding example of harnessing large, diverse yet related databases to understand and improve educational opportunities in the United States (https://cepa.stanford.edu/seda/overview). Yet given the vast resources available to us, the space for thoughtful, cross-national, descriptive work that relies on existing large-scale resources is underexplored.

Limitations of Large-Scale Data for Cross-National Research and Final Reflections
Having illustrated and discussed the strength of such resources above, in the final section of this commentary, we provide some concluding observations on the limitations of such databases, while also offering thoughts on the way forward.
Although large-scale data offer many promises and possibilities, these resources are not without their limitations. Most descriptive studies using secondary data cannot adequately address the questions of why or how educational phenomena occur. To shed light on these critical questions, researchers must often turn to a more qualitatively-oriented approach, including in-depth case studies along with ethnographies, interviews, and focus groups.
We also identify several other challenges and limitations of working with these types of data. To begin, while most of the resources we have discussed are easy to access, some require additional paperwork (and in some cases payment, such as NSSO data from India). Also, depending on the data collection agency, the quality of data documentation may or may not be adequate. Data documentation (documents that provide user guides, background on the data, and original questionnaires) is crucial to making meaningful use of these resources.
Data may also suffer from technical limitations, such as an absence of important concepts or constructs that are challenging to measure (for example, family wealth or even income are important but not easy to accurately measure and report); measures that don't follow psychometric conventions in student test-score measurement (for example, many of the volunteer-driven test-score collection efforts); and vast amounts of missing data.
There are also challenges posed by the absence of a crucial variable that may render an otherwise interesting dataset useless for a specific question. For example, a study attempting to understand the performance of contract teachers must identify a dataset that provides a range of teacher variables, but crucially identifies whether the teacher is a contract teacher or not. Just as the lack of key variables can be an impediment, we must also note that various levels of education are also unevenly covered in the present data sources. For instance, a large-scale study of the early childhood or higher education sector encompassing diverse developing countries and using secondary data is currently not feasible with the data available, per our knowledge. A call for more systematic and appropriate data collection remains quite relevant for education research in spite of the availability of the resources we have discussed here.
Another technical issue that is commonly faced by researchers working with multiple datasets is the difficulty of merging different datasets across levels of analysis, like villages or districts. Further, as we noted above, most of these datasets are not longitudinal and do not align with policy shifts. These factors can make it difficult to answer some of the more exciting policy questions, especially those related to causation.
Finally, even as we highlight the potential of cross-national comparisons, we must note that comparing datasets across countries and over time can pose many challenges and require careful thought and resolution. One key consideration is the comparability of constructs related to student background and socioeconomic status, which serve as an important control variable in quantitative educational studies (Fuller & Clarke, 1994). As Buchmann (2002) noted, comparative researchers must straddle the "fine line between sensitivity to local context and the concern for comparability across multiple contexts" (p. 168). For example, the number of books at home is generally considered a useful indicator of family socioeconomic status (Wößmann, 2003). Yet, as readers familiar with developing countries will attest, such a variable may not "perform as well" as an indicator of home circumstances in many less-developed countries (see also Chudgar et al., 2014).
These limitations notwithstanding, we hope that we have made a strong case for more systematic attention to the use of large-scale secondary databases to inform pressing education policy questions in cross-national and international scholarship. As access to computers and hand-held technology becomes ubiquitous, data collection driven both by public and private actors will increase. According to one estimate, 90% of the data available today have been created just in the last two years (IBM, n.d.). An important outcome of larger and more systematic data collection by public actors will be greater availability of administrative databases. Such local databases will also open up more opportunities for not just scholars, but also for bureaucrats and policymakers in countries across the world to engage in data-driven decision-making (for example, see Vignoles, 2016, for a further discussion of how scholarship in the United Kingdom has benefited from large administrative databases).
To return to the questions with which we began this commentary, it must be evident to the reader that the range and types of data we discuss here are capable of answering these and many other such important questions. For instance, datasets like TALIS permit an in-depth study of school leaders and leadership styles, and datasets like TIMSS, SACMEQ, PASEC, and TERCE provide information on school background that can be used to study variations in school infrastructure. Our own work has addressed the issue of teacher distribution cross-nationally (Chudgar & Luschei, in press). These questions allow us to understand learning opportunities in low-income countries by focusing on school leaders, infrastructure, and teachers.
To conclude, we must note that in this commentary, we have not engaged with larger epistemological debates about the appropriateness of knowledge represented by large-scale secondary datasets. It is certainly not our intention to suggest that this form of scholarship can or should replace other forms of research, either qualitative or quantitative. We have also not discussed important ethical and human subject issues that will become relevant as more data become available from developing countries. We acknowledge that these are important issues and a critical area of scholarly attention and discussion that should move in parallel with a call to make more and better use of existing secondary datasets in international and comparative education policy research.
