What International Educational Evaluations Tell Us About Education Quality in Developing Nations

Few developing countries participate in external educational evaluations. Information gaps on education quality make it imperative to expand such evaluations. Furthermore, international comparability across different evaluations should be improved. In addition, data from evaluations must be combined with data on access or coverage. Finally, educational evaluations reveal social inequalities; socioeconomic status has a systematic influence on educational outcomes, but social gradients vary across countries. Resources alone cannot explain massive performance gaps between developing and developed countries. Large efficiency improvements must occur in classrooms and schools. The need is not for “league tables,” but for data that allow countries to judge the appropriateness of their policies and strategies in an international context. Efficient and targeted application of resources and policies to improve education in developing countries requires information on system performance, inequalities, progress and stagnation. International evaluations should be expanded to more countries, should be better anchored and comparable, and should be demystified. Too little international educational evaluation is the enemy of progress.

Education Policy Analysis Archives Vol. 26 No. 50 SPECIAL ISSUE


International educational evaluations are much maligned. One of many instances of public criticism of such evaluations can be found in an open letter to the director of PISA, Dr. Andreas Schleicher, from an international group of academics (Guardian, 2014). Therein, they and other critics raise arguments against such evaluations, including reservations about the validity and reliability of standardized testing and reliance on quantitative measures; encouragement of short-term fixes to help a country quickly climb international rankings; emphasis on measurable aspects of education only; encouragement of scripted plans for students to perform better on multiple-choice testing, which reduces teacher autonomy; and increased stress levels in schools.
But as Sahlberg and Hargreaves (2015) point out, "Just think for a moment what would global education look like if PISA had never been launched? There would be, as there was in the 1990s, a number of countries that mistakenly believed their education systems are the best in the world and should set the direction for other nations."
They mention particularly that PISA had shown that the admiration that had previously existed globally for the education systems and policies of the USA and UK was not justified. According to them, the PISA results corrected that view and probably contributed to the U.S. and British models not being copied as much as might otherwise have occurred.
Many of the arguments about the value of international educational evaluations are brought from the context of economically developed countries, where internal evaluations are often already strong. In the context of developing countries, where there is often a dearth of external evaluations, many of these arguments do not hold. This chapter makes a case that (i) information gaps regarding educational quality in the developing world make it imperative that such evaluations should be expanded rather than reduced, (ii) international comparability across different evaluations should be improved, (iii) evaluations contribute more to our understanding of educational deficits in developing countries when they are combined with data on access and/or coverage, and (iv) educational evaluations can tell us more about social gradients and other inequalities (e.g., gender inequalities) in developing countries. The information offered in this chapter is an attempt to speak to some of these concerns, utilizing data from a variety of sources and applied to different contexts, but placing a special focus on South Africa and Mexico, two middle-income developing countries.

The Expansion of Access to Education
Developing countries have made considerable progress in the last few decades in improving access to education and keeping children at school longer. The Education for All campaign contributed by focusing international attention on problems of access to primary education, in particular amongst poor children and specifically girls. Though there is evidence of improving trends even before the Dakar Declaration of 1999 and the start of the Millennium Development Goals (MDGs), there was a subsequent acceleration in access to primary schools. As a consequence, fewer children in developing countries never attend school (see Figure 1). Even in low-income countries, the proportion of children who have never been to school fell from 32% in 1992 to 23% by 1999 and then to 14% in 2008. Access to primary school has also become more common in the developing world, as Figure 2 shows. The primary attainment rate, i.e., the percentage of children starting primary school, rose for low-income countries from 43% in 1992 to 57% in 2008, and for low- and middle-income countries combined it rose from 70% to 81% over the same timeframe.
Another way to show the progress in educational attainment in terms of education quantity is displayed in Figure 3. For the five countries of the Southern African Customs Union, the data in Figure 3 are the proportions of different birth cohorts that have completed Grade 7, the end of primary school in that part of the world. Because the figure is based on the population who survived until the census or survey from which the data were derived, it would tend to paint an overly optimistic picture for older cohorts, as differential mortality favors better-educated people. Despite this, it is evident that there has been remarkable progress in these five countries over the six decades.

"Schooling Ain't Learning:" The Quality Imperative
However, despite notable advances across the globe in schooling (as measured by years of education), developing countries still face a large deficit in learning (as measured by cognitive skills), a distinction strongly made by Lant Pritchett in his book The rebirth of education: Schooling ain't learning. Pritchett (2013) states that in many "…countries around the world, the promise of schooling – getting children into seats in a building called a school – has not translated into the reality of educating children. Getting children into schools was the easy part. Schooling has seen a massive expansion such that today, nearly every child in the world starts school, and nearly all complete primary school (as their country defines it). This expansion of schooling is a necessary first step to education, but only a step" (p. 2). He goes on to argue that what is required is learning rather than schooling. From this it follows that measuring access to school does not provide much information about how much learning takes place.
The weak performance of many schooling systems in the developing world is demonstrated by international evaluations, as will be illustrated below. If education quantity is improving but the quality remains weak, there is a danger that the gains in education quantity will not translate into commensurate gains in cognitive outcomes. Poor quality education is also likely to constrain economic development, though low levels of economic development in turn may retard educational progress. It is no wonder that the Education for All Global Monitoring Report of 2005 was subtitled "The quality imperative". The realisation had dawned that improving quantity was just one part of the challenge. Consequently, the focus has now shifted to the quality of education, as reflected in the changes from the Millennium Development Goals (MDGs) to the Sustainable Development Goals (SDGs).
To illustrate the cognitive backlog of developing countries, it is instructive to consider Figure 4. This figure shows the cumulative density curve of scores on the 2006 PIRLS (Progress in International Reading Literacy Study) reading and literacy test for children from South Africa and from England. Most children in South Africa were tested in their home language. Even though South African children were tested in Grade 5 and English children in Grade 4, the South African children are far behind their counterparts: only 8% of English children did not reach the low international benchmark of 400, but a massive 78% of South African children failed to reach it. Amongst English children, 70% reached at least the international set point (the average across participating countries) of 500, but this was achieved by only 9% of South African children.

International Evaluations: Coverage and Gaps in Coverage
The dilemma many developing countries face is that they do not have the means to assess the quality of cognitive outcomes in ways that allow for both a geographic and a temporal comparison. Most countries now have at least one large national assessment that offers some measure of change over time. According to UNESCO (2015, p. 18), the proportion of countries that have at least one national assessment rose from 34% in 2000 to 69% in 2013. But some of these assessments do not ensure that difficulty levels are comparable over time, something that the international evaluations spend much time ensuring. The bigger problem, though, lies with measuring outcomes against an external standard. In all of Africa's more than 50 countries, only five (Botswana, Egypt, Ghana, Morocco and South Africa) have participated in large international evaluations such as TIMSS or PIRLS, and none in PISA. Fortunately, there are several important regional evaluations. In Southern and Eastern Africa there is SACMEQ (Southern and Eastern Africa Consortium for Monitoring Educational Quality), in Francophone West Africa there is PASEC (Programme d'analyse des systèmes éducatifs de la Confemen), and Latin America has SERCE (Second Regional Comparative and Explanatory Study).
These regional evaluations, important as they are, still leave two large knowledge gaps. The first is that many countries are not covered by any of these regional or international evaluations. In Africa alone there are more than 20 such countries; in a large number of Asian countries, educational planners and the public have no inkling of the quality of the education that the school system offers. One of the main routes to potentially plug this knowledge gap is the introduction of PISA for Development, also known as PISA-D, a new initiative aimed at designing and then expanding a PISA-type evaluation for developing countries. This would potentially offer a testing system that could be applied in many developing countries and could also be scaled relative to performance in PISA, thereby showing how far developing countries still have to go and what progress they are making towards performing at developed country levels. The second knowledge gap is that even those countries that do participate in regional evaluation efforts such as SACMEQ, PASEC or SERCE still do not know how their education quality compares to that of the rest of the world. This issue is discussed in the next section.

Plugging a Knowledge Gap: Calibrating Scores across International Evaluations
Currently, the only way to compare across international evaluations is by utilizing some overlap between countries that participated in different international evaluations to convert performance to a common metric. Attempts to roughly calibrate across different international evaluations in this manner include Gustafsson (2012, 2014), Hanushek & Woessmann (2008, 2009), and Hanushek & Zhang (2009). Despite the limitations of such exercises, they do present proximate indications of the differences in education quality amongst countries. Figure 5 shows a number of countries whose scores have been converted to a common PISA metric by Gustafsson (2012, 2014), plotted against their GDP per capita. The trend line shows that, generally speaking, higher GDP per capita is associated with better educational performance. Mexico lies somewhat below the line, i.e., its PISA score is somewhat worse than one would expect based on the country's economic status. South African educational quality falls even further below expectations, while Kenya performs well above expectations. These results show that performance in international tests, influenced as it is by the resources available to a country, is not solely determined by a country's level of economic development. In other words, how well the education system is functioning matters and, presumably, that in turn is influenced by policies and strategies applied in the education sector.
The data on which this analysis was based do not show the full difference between developed and developing countries. The reason is that they do not take into account that most children in developed countries are in school, while this is often not the case in developing countries. For example, PISA tests 15-year-old children who are in school and at least in grade 7. That means that in 2012, 91% of Japanese children were tested, but according to the PISA Technical Report (OECD, 2014a) only 63% of Mexican children were.² Thus, according to the PISA data, more than a third of Mexican children aged 15 had not reached grade 7 and were therefore not included in the sample.³ That could be because they have never started school, have dropped out, or have repeated so often that they have not reached grade 7 by age 15 (late entry into school may also affect this last reason). In Turkey only 68% of 15-year-olds were tested, and in Vietnam only 56%, illustrating that the PISA tests only covered part of the target age group and excluded those who have dropped out of mainstream education.
Figure 6 perhaps best illustrates the failure of school systems to provide acceptable cognitive outcomes at both a quantitative and a qualitative level.⁴ The data in this figure show, for all the countries that participated in PISA, the proportion of all 15-year-olds that reached basic numeracy. Such basic numeracy, Level 2 in PISA, is not very onerous: "At Level 2 students can interpret and recognise situations in contexts that require no more than direct inference. They can extract relevant information from a single source and make use of a single representational mode. Students at this level can employ basic algorithms, formulae, procedures, or conventions to solve problems involving whole numbers. They are capable of making literal interpretations of the results" (OECD, 2014a, p. 297).
Combining the PISA results for the proportion of the tested population that are "low achievers" (i.e., those who have not reached Level 2, basic numeracy) and the Coverage Index 3 (the proportion of 15-year-olds in grade 7 or above) gives us the data that underlie Figure 6. This better reflects the large differences between the more developed and the few developing countries participating in PISA. Of the 65 participating countries (and territories), only seven had more than 80% of all 15-year-olds performing above Level 1, and another 16 countries more than 70%. That means that even in many developed OECD countries large proportions of 15-year-olds are performing below a basic level in mathematics. Of course, the implicit working assumption in these calculations is that those 15-year-olds who have not reached at least grade 7 perform below Level 2 (i.e., that they have not achieved basic numeracy), an assumption that is likely to be a relatively true reflection of reality. This does not necessarily imply that children out of school learn nothing, but the informal learning that takes place is unlikely to improve their numeracy scores much.
² These figures are presented as Coverage Index 3 in the PISA Technical Report (OECD, 2014a).
³ Mexican education officials say that PISA only tested 15-year-olds who had completed at least Grade 8. PISA documentation (PISA, 2014c), however, states that 1.1% of the 15-year-olds tested were in Grade 7 and 5.2% in Grade 8. If what the officials say is accurate, it is an exaggeration that 37% of this age group in Mexico had not passed Grade 7. However, most of the grade 7 and 8 students who were tested would most likely have performed below the basic numeracy level. In terms of the data in Figure 6, there would be a small shift from the category "Have not reached high school" (the green bar) to "Below basic numeracy" (the red bar). The bar of most interest, "Basic Numeracy" (the blue bar), would probably remain almost unaffected, as most 15-year-olds in Grade 7 or 8 would probably perform below the basic numeracy level.
⁴ A more sophisticated analysis of this sort has been undertaken for SACMEQ by two of my colleagues (see Spaull & Taylor, 2015; Taylor & Spaull, 2015).
Figure 6. PISA performance across countries in mathematics, 2012: Proportions achieving basic numeracy (level 2 or above), below basic numeracy, and not having reached grade 7
Source: Derived from the proportion of low achievers (below level 2) in PISA 2012 and Coverage Index 3, whose complement is the proportion of the 15-year-old population not in grade 7 or higher.
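The derivation behind Figure 6 can be sketched in a few lines of Python. This is a minimal illustration, assuming (as the text does) that 15-year-olds who have not reached grade 7 perform below Level 2. The function name `population_shares` and the low-achiever share used in the example are hypothetical; only the 63% coverage figure for Mexico comes from the text.

```python
def population_shares(coverage, low_achiever_share):
    """Split ALL 15-year-olds into the three categories shown in Figure 6.

    coverage: Coverage Index 3, the proportion of 15-year-olds in grade 7
        or above (the group PISA samples from).
    low_achiever_share: proportion of TESTED students below Level 2.
    """
    if not (0.0 <= coverage <= 1.0 and 0.0 <= low_achiever_share <= 1.0):
        raise ValueError("both inputs must be proportions between 0 and 1")
    # Children outside the tested population are assumed to lack basic numeracy.
    not_reached_grade_7 = 1.0 - coverage
    # Rescale test results from the tested population to the whole age group.
    below_basic_numeracy = coverage * low_achiever_share
    basic_numeracy = coverage * (1.0 - low_achiever_share)
    return {
        "basic_numeracy": basic_numeracy,
        "below_basic_numeracy": below_basic_numeracy,
        "not_reached_grade_7": not_reached_grade_7,
    }


# Coverage of 0.63 is the Mexican figure quoted in the text; the low-achiever
# share of 0.50 is a made-up placeholder, not a PISA result.
shares = population_shares(coverage=0.63, low_achiever_share=0.50)
for category, share in shares.items():
    print(f"{category}: {share:.1%}")
```

The three shares always sum to one, so a country with low coverage can show a small "basic numeracy" share for the whole age group even when its tested students perform reasonably well.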
The differential performance of countries across the international spectrum shown in Figure 6 supports an important conclusion. Educational performance at any given time and with any given input of fiscal and educational resources is not fixed and immutable, as there are large differences in performance levels even amongst rich countries. Policy and effort matter, which is an exceedingly important finding. Educational performance is not destiny, but is amenable to policy interventions. This makes the availability of information crucial for policy makers and participants in policy dialogues within countries.

A Further Problem: Inequality Within Countries
In the field of economics, the association between socio-economic status and a specific educational outcome is referred to as a social gradient. There are steep social gradients in cognitive outcomes within most developing countries (i.e., children from higher socioeconomic strata far outperform poorer children). This reflects the fact that only a relatively small segment of the population obtains a quality education. Typically, in many international educational evaluations, a measure of socioeconomic status (SES) is derived to rank participating children by SES. The most common SES measure used in such studies is an asset or wealth index, constructed using Principal Component Analysis (PCA) or Multiple Correspondence Analysis (MCA; see Filmer & Pritchett, 2001; also Booysen et al., 2008, for comparisons between PCA and MCA).⁵
South Africa's steep social gradient, reflecting its legacy of inequality, is shown in Figure 7. Schools with higher SES students generally performed much better than those containing mainly poor children. The existence of such social gradients is universal, but gradients are seldom as steep as in South Africa. These steep social gradients are the source of much debate both within and across countries. Some contend that such high levels of inequality are detrimental to educational progress even for the rich, implying that the general performance of all children in highly unequal societies would be affected. The debates on educational inequality include a relatively large literature around the impact of different education systems and interventions in schools on reducing educational inequality, what Willms (2004) refers to as "levelling the bar" rather than simply "raising the bar". By that he means that interventions should be sought that not only raise aggregate learning across the whole spectrum, but that also reduce inequalities by especially benefiting poorer students.
Kotzé (2016), in unpublished work, has tried to increase international comparability across countries and surveys for SACMEQ and SERCE⁶ by utilizing wealth rankings from the international evaluations and matching these with per capita consumption rankings from household surveys. She corrects for the effect of some children not being in school in Grade 6 by allocating a low score to them. This allowed her to draw social gradients that have the log of per capita consumption rather than an asset index on the horizontal axis. Using Gustafsson's (2012, 2014) PISA metrics allowed her to convert scores across SACMEQ and SERCE to a common metric, a PISA equivalent score. She derived two interesting graphs from these, showing respectively six weaker performing and six stronger performing countries in these two evaluations (Figures 8a and 8b).
The top figure, presenting the weaker performers, shows that Mozambique, one of the world's poorest countries, performs best among these six countries on the PISA-calibrated scale at every level of per capita consumption.⁷ At income levels below the two poverty lines (representing $2 per person per day and $3.10 per person per day consumption), South Africa performs worse than Uganda and the Dominican Republic, though wealthier South African children perform better than children from richer households in the other countries shown here, reflecting South Africa's very steep social gradient.
⁵ Such methods do not arbitrarily give weights to different possessions or attributes but rather use the available data to identify and extract a common latent variable, household wealth, and to allocate weights accordingly.
⁶ Both SACMEQ and SERCE are Grade 6 evaluations.
In the case of Figure 8b, Mexico and Argentina are the weakest performers amongst poor children, but this deficit is largely reduced amongst children from wealthier households. The real surprise is the excellent performance of Kenya, which performs only slightly worse than Costa Rica amongst the poor, but is clearly the best performer amongst children from somewhat wealthier households. This presumably has to do with greater efficiency of the Kenyan school system than of its counterparts.
Figures 8a and 8b. Socio-economic gradients for 6 weaker and 6 better performing countries in SACMEQ and SERCE
Source: Kotzé (2016).
⁷ Because there are few rich people in Mozambique, the graph does not extend much to the right.

Summarizing the Argument
International education evaluations tell us several things about education quality in developing countries. Firstly, they tell us that for most developing countries there is a massive gap in performance compared to their developed country counterparts. For Mexico to perform at levels similar to those of the USA, its neighbour, would require massive improvements in the functioning of the education system. For such improved performance, fiscal and human resources could help, but a major part of the improvement would have to come from improved performance within classrooms and schools. The magnitude of this gap compared to developed countries is even greater for South Africa, whose educational performance is much weaker than that of many other countries of Southern and Eastern Africa, even though South Africa has many more resources than most of these countries.
This finding leads to a second lesson that can be learned from the international evaluations. There are large performance differences between countries that can often not be explained by the availability of resources. Though the gaps are generally large between developed and developing countries, there are large differentials in performance within each of these two groupings. If resources cannot explain this result (and they usually cannot), there must be considerable scope for learning from comparative research on education. This does not imply that models found to work in one country would necessarily translate well to another country, but it does provide evidence that deep understanding of the similarities and differences between education systems must be of some value in education policy debates. Again, these debates would be so much the poorer in the absence of international evaluations to compare aspects of cognitive development and learning in different contexts.
A third lesson that international educational evaluations teach us is that socio-economic status always has a systematic influence on educational outcomes. Socio-economic inequality and educational inequality are thus linked in an important way. However, the fact that social gradients differ between countries, or change over time within countries, again raises important questions about the lessons that we may learn from comparative perspectives. The slopes of these social gradients are not immutable, and we can once again learn from comparative work.
In the introduction, I stated that information gaps regarding educational quality in the developing world make it imperative that such evaluations should be expanded rather than reduced. The need is not so much to know how countries perform on an international "league table," but rather to allow countries to judge the appropriateness of their policies and strategies and to enable them to compare themselves to other countries. Are there differences in resource availability, in teacher training, in homework, in parental involvement? What can one expect in one education system, given what we observe in others?
In the introduction, I also stated that international comparability across different evaluations should be improved. I presented some findings where countries participating in two different international educational evaluations, SACMEQ and SERCE, are compared. However, this comparison is by its nature imperfect, and more collaboration between different international evaluations would allow greater possibilities for making international comparisons, with all the benefits that that would bring. If there were common test items contained within SERCE, SACMEQ, PASEC, and PIRLS, for instance, using Item Response Theory (IRT) would make it possible to equate the difficulty level of the tests, which would contribute much to improving international comparability of results. This is not simple, though, as the selection of test items, taking into consideration the different aims of different evaluations, cultural factors and translation problems, all create significant barriers that need to be overcome.
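To make the equating idea concrete, the sketch below shows mean-sigma linking, one simple IRT linking method. It is chosen here purely for illustration and is not a method described in this article. The sketch assumes that difficulty estimates for the same common items are available from two separately calibrated evaluations; the function names and all item values are hypothetical.

```python
from statistics import mean, pstdev


def mean_sigma_link(b_source, b_target):
    """Return (slope, intercept) mapping the source IRT scale onto the target.

    b_source, b_target: difficulty estimates of the SAME common items,
    calibrated separately in the two evaluations.
    """
    if len(b_source) != len(b_target) or len(b_source) < 2:
        raise ValueError("need matched difficulties for at least two common items")
    sd_source = pstdev(b_source)
    if sd_source == 0:
        raise ValueError("source difficulties must not all be identical")
    # Match the spread, then the centre, of the common-item difficulties.
    slope = pstdev(b_target) / sd_source
    intercept = mean(b_target) - slope * mean(b_source)
    return slope, intercept


def to_target_scale(value, slope, intercept):
    """Re-express a source-scale difficulty or ability on the target scale."""
    return slope * value + intercept


# Hypothetical common-item difficulties (not real SACMEQ/SERCE/PIRLS values).
# The second list was constructed as 2 * b + 0.5 so the link is easy to check.
items_eval_a = [-1.2, -0.4, 0.1, 0.8, 1.5]
items_eval_b = [-1.9, -0.3, 0.7, 2.1, 3.5]

slope, intercept = mean_sigma_link(items_eval_a, items_eval_b)
linked_ability = to_target_scale(0.3, slope, intercept)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, linked ability={linked_ability:.3f}")
```

In practice the common items would behave somewhat differently across countries and languages, which is exactly the kind of barrier the paragraph above describes; more robust linking methods exist for that reason.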
Educational evaluations contribute more to our understanding of educational deficits in developing countries when they are combined with data on access and/or coverage. As access improves in developing countries, the measures that can be derived from census or survey data, such as access to and coverage of the school system as well as differential attainment of people of different age, gender or socio-economic groupings, would tell us progressively less, whilst measurement of cognitive outcomes through educational evaluations would become increasingly important. Utilizing educational evaluations along with census or survey data is not always easy to do, as it requires relatively skilled interrogation and interpretation of data. It might also sometimes not take place to the extent required if policy makers prefer a more positive message than such data triangulation sometimes brings.

In Conclusion: The Challenge
How to improve the quality of the education received by poor children in developing countries remains a major unresolved issue. Though resource constraints may play a role in some cases, policies are also needed to ensure greater efficiency of resource use in schools serving the poor. The efficient and targeted application of resources and of policies cannot, however, take place in an information vacuum: they require information on system performance, inequalities, progress and stagnation that can only be gleaned from wide-ranging data gathering and interpretation processes. International evaluations already play a major role in this regard, but should be expanded to more countries, be better anchored and comparable, and be demystified. In most developing countries, too little international educational evaluation is the enemy of educational progress.
University of Montreal (Canada) and a doctorate from the Center for Research and Advanced Studies in Mexico. At the Center she leads a research program in the politics, institutions and actors that shape the relations between education and work; and with the agreement of her Center and the National Union of Educational Workers, for the years 1989-1998 she served as General Director of the Union's Foundation for the improvement of teachers' culture and training. Maria has served as President of the Mexican Council of Educational Research, and as an adviser to UNESCO and various regional and national bodies. She has published more than 50 research papers, 35 book chapters, and 20 books; and she is a Past-President of the International Academy of Education.

Figure 1. Percentage of children who have never been to school by country grouping
Source: UNESCO (2015), p. 8, Fig. 0.6

Figure 2. Percentage of children aged 9-11 in low- and middle-income countries who have attained some primary education
Note: The term "attain" as it relates to primary education is used by UNESCO to indicate starting primary education. Note that Figures 1 and 2 relate to different age groups.
Source: UNESCO (2015), p. 9, Fig. 0.7

Figure 3. Percentage of birth cohort that completed Grade 9
Source: Calculated from survey and census data.

Figure 4. Cumulative percentage of children from South Africa and England in PIRLS 2006 scoring below each score level shown
Note: First Plausible Value (PV1) in PIRLS dataset used. South African children were tested in Grade 5 (blue line), English children in Grade 4 (red line).

Figure 5. Country performance in international educational evaluations in PISA metrics by per capita GDP (PPP$), around 2011
Source: PISA metric from Gustafsson, 2014; GDP per capita from World Tables

Figure 7. Social gradient in South Africa in SACMEQ III: Mathematics score of schools by average SES of pupils
Source: Calculated from SACMEQ data.
.phillips@gmail.com
D. C. Phillips was born, educated, and began his professional life in Australia; he holds a B.Sc., B.Ed., M.Ed., and Ph.D. from the University of Melbourne. After teaching in high schools and at Monash University, he moved to Stanford University in the USA in 1974, where for a period he served as Associate Dean and later as Interim Dean of the School of Education, and where he is currently Professor Emeritus of Education and Philosophy. He is a philosopher of education and of social science, and has taught courses and published widely on the philosophers of science Popper, Kuhn and Lakatos; on philosophical issues in educational research and in program evaluation; on John Dewey and William James; and on social and psychological constructivism. For several years at Stanford he directed the Evaluation Training Program, and he also chaired a national Task Force representing eleven prominent Schools of Education that had received Spencer Foundation grants to make innovations to their doctoral-level research training programs. He is a Fellow of the IAE, and a member of the U.S. National Academy of Education, and has been a Fellow at the Center for Advanced Study in the Behavioral Sciences. Among his most recent publications are the Encyclopedia of Educational Theory and Philosophy (Sage; editor) and A Companion to John Dewey's "Democracy and Education" (University of Chicago Press).