On Ideology, Causal Inference and the Reification of Statistical Methods: Reflections on “Examining Instruction, Achievement and Equity with NAEP Mathematics Data”

The purpose of this article is to comment on the prior article entitled “Examining Instruction, Achievement and Equity with NAEP Mathematics Data,” by Sarah Theule Lubienski. That article claims that a prior article by the author suffered from three weaknesses: (1) an attempt to justify No Child Left Behind (NCLB); (2) the drawing of causal inferences from cross-sectional data; and (3) various statistical quibbles. The author responds to the first claim by indicating that any mention of NCLB was intended purely to make the article relevant to a policy journal; to the second claim by noting his own reservations about using cross-sectional data to draw causal inferences; and to the third claim by noting potential issues of quantitative methodology in the Lubienski article. He concludes that studies that use advanced statistical methods are often so opaque as to be difficult to compare, and suggests some advantages to the quantitative transparency that comes from the findings of randomized controlled field trials.

Editor's Note: This article is a response to Sarah Lubienski's (2006) article that appears at http://epaa.asu.edu/epaa/v14n14/, which discussed Wenglinsky's (2004) article available at http://epaa.asu.edu/v12n64/. It is the practice of Education Policy Analysis Archives to publish one round of responses to articles where it is merited. Additional discussion of this and other articles is welcome online at http://epaa.info/wordpress.

Over the last decade, the author has published half a dozen studies of relationships among school and teacher characteristics and student achievement using data from the National Assessment of Educational Progress (NAEP). Otherwise known as "the Nation's Report Card," NAEP provides test data on nationally representative samples of fourth, eighth and twelfth graders in a variety of subjects over multiple years. There are many methodological challenges faced in the analysis of NAEP data, but none as nettlesome as its cross-sectional nature. Because the data are cross-sectional, the finding of relationships between school characteristics, such as class size, and student achievement cannot be used to draw causal inferences. This point has been made by the author in nearly all of his publications on the topic, and is reiterated by opponents of the author's conclusions. If the policy conclusions are of a constructivist nature, the author finds himself attacked by conservative-leaning researchers on the grounds that he is making causal inferences. If the policy conclusions are of a didactic nature, the author finds himself attacked by liberal-leaning researchers, generally on the same grounds. The critics usually also sprinkle in some methodological quibbles, such as wondering what the results would have been if variable X had been measured slightly differently, but the core criticism is that the cross-sectionality of the data makes causal inferences impossible.
A recent instance of this occurred with Sarah Theule Lubienski's (2006) response to the current author's article on the achievement gap, "Closing the Racial Achievement Gap: The Role of Reforming Instructional Practices" (Wenglinsky, 2004). The current author's article distinguished between two types of racial achievement gap, that "between schools," meaning between predominantly minority and predominantly White schools, and "within schools," meaning between White and minority students in the same school. The author found that a variety of instructional practices, mostly of the constructivist variety, but some not, were negatively related to the within-school achievement gap, but unrelated to the between-school achievement gap. The author concluded that using the identified practices might be a viable strategy for reducing the achievement gap within schools, but not the one between schools, which would require some more macro-institutional change. Lubienski's (2006) response, "Examining Instruction, Achievement and Equity with NAEP Mathematics Data," also found some instructional practices to be associated with the racial achievement gap, but with the caveats that only constructivist techniques evinced a relationship and that these techniques did not go so far as eliminating achievement gaps (the gaps she examined were analogous to the within-school gaps the author examined).
Lubienski's article raised the three questions that the current author finds are commonly raised about NAEP secondary analyses. First, Lubienski suggested an ideological underpinning to her critique: The current author's study was supposed to suggest how the Bush Administration's No Child Left Behind (NCLB) could succeed, whereas her study focused on empirical support for the National Council of Teachers of Mathematics' "reform-minded" (read constructivist) instructional practices. Second, Lubienski raised the issue of causal inferences, claiming that the current author used causal language in his study. And, third, she proposed some statistical quibbles which amounted to the notion that she approached her analysis slightly differently than the current author did.
With regard to the first argument, the current author had no ideological agenda. The mention of NCLB did not constitute an endorsement of it, but simply an attempt to find some relevance to policymakers in an article submitted to a policy journal. It has typically been the author's experience that the findings of NAEP secondary analyses rarely fit neatly into one ideological framework or another. Thus the author's study of school finance found that the effectiveness of school dollars depended upon how they were spent (Wenglinsky, 1997). As another example, the author's foray into the debate about whether educational technology made a difference found that technology effects depended upon how the technology was used (Wenglinsky, 2005). Rarely are the findings from statistical analyses of large-scale data unequivocal, and "Closing the Racial Achievement Gap" was no different, finding that the effective practices were an ideological potpourri, leaning somewhat towards the constructivist side.
The second argument, about causal inferences, is to some extent a red herring. As Lubienski admits, the current author acknowledged repeatedly that causal inferences cannot be drawn from cross-sectional data. He specifically noted that while he would use the phrase "school effect," he meant it in the statistical sense (as in "effect size") and did not intend it to connote causality. Lubienski herself sometimes falls into causal language, such as when she refers to instructional practices as "predictors" of achievement, suggesting that they are temporally, and thus causally, prior to test scores (Lubienski, 2006, p. 7). And in another analysis of NAEP data, in which Lubienski seeks to measure the relationship between attending a private school and student achievement, she talks of private-school effects (C. Lubienski & S. T. Lubienski, 2006). Given this challenge in both Wenglinsky's and Lubienski's work, one may rightly ask whether it is worthwhile to engage in secondary analyses of NAEP at all. The answer is that, while correlational analyses are not good at establishing causal relationships, they are good at identifying variables that should subsequently be subject to more rigorous analyses. This view is the rationale behind the Institute of Education Sciences framework for its discretionary grants program; it has a continuum of research goals under which researchers can apply, beginning with secondary analyses to identify variables of importance, proceeding to developmental studies that make the transition from variable identification to the creation of an intervention, and finally to experimental studies that can establish causation for interventions. Correlational analyses are an important first step because they suggest where to concentrate and where not to concentrate. Given that, in the Lubienski study, the use of calculators proved unrelated to the racial achievement gap, it is unlikely that providing calculators to students, alone, is a worthwhile intervention. Thus Lubienski is correct in suggesting that cross-sectional data do not support causal inferences, but such secondary analyses are a crucial piece of work prior to the development and testing of educational interventions.
There is perhaps a more important reason that secondary analyses must be viewed as preliminary to more rigorous research designs, and that is that secondary analyses nearly always fall victim to "statistical reification." This term refers to the fact that researchers typically treat the statistical method of the day as an absolute basis for truth, even though the rationale behind using the technique, to say nothing of the mathematics involved, is typically opaque. Why use hierarchical linear modeling rather than structural equation modeling? When is multicollinearity severe enough to discredit results, given that most of the research questions of interest involve creating a significant degree of multicollinearity? Statistical quibbles can be raised about any secondary analysis. Lubienski's is no exception. Although she does not present the correlations among her factors, she refers to their being highly correlated. Highly correlated factors suggest a poorly fitting factor model and therefore potentially invalidate it. In addition, the proper way to verify a factor structure is through confirmatory factor analysis using a separate replication sample, not through creating scales and computing Cronbach's alphas on the same data. And, disturbingly, the Cronbach's alphas here are mostly below what many consider to be the cutoff of .7. One other issue is that her models disaggregate teacher effects to the student level, which means that the instructional variables are only partitioning student-level variance in achievement, not school-level variance in achievement, and thus may understate the size of instructional effects. Does all of the foregoing invalidate her conclusions? The reader will probably decide by comparing the findings to his or her own experiences with education reform.
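For readers unfamiliar with the reliability statistic mentioned above, the following is a minimal sketch of how Cronbach's alpha is conventionally computed for a scale of k items: alpha = (k / (k - 1)) * (1 - (sum of item variances) / (variance of the summed scale)). The function name and the response data are hypothetical illustrations, not values from either the Wenglinsky or the Lubienski analysis, and the .7 threshold is only the conventional cutoff referred to in the text.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a scale.

    responses: list of rows, one per respondent; each row holds that
    respondent's scores on the k items of the scale.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Transpose rows into per-item columns and sum the item variances.
    item_columns = list(zip(*responses))
    item_variance_sum = sum(variance(col) for col in item_columns)
    # Variance of each respondent's total (summed) scale score.
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical responses of five teachers to a three-item practice scale.
hypothetical_scale = [
    [3, 4, 2],
    [2, 2, 3],
    [4, 3, 4],
    [1, 2, 1],
    [3, 1, 2],
]
alpha = cronbach_alpha(hypothetical_scale)
print(f"alpha = {alpha:.2f}; meets conventional .7 cutoff: {alpha >= 0.7}")
```

The calculation uses only the same data from which the scale was built, which is why it cannot substitute for a confirmatory factor analysis on a separate replication sample.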
Secondary analyses need to be conducted with limited goals, and some kind of more robust research design (quasi-experimental or experimental) used for the more ambitious goal of demonstrating the efficacy of an intervention. One reason for this is that experiments are designed to support causal inferences, by holding constant all variables besides exposure to the treatment. They therefore address selection bias in a way that statistical analyses cannot. But a more important reason is the transparent nature of the results of an experiment. The results are transparent because they generally involve performing a Student's t-test on two raw scores, that of the treatment group and that of the control group. These kinds of comparisons are more likely to be persuasive to a policymaker or educator than the elaborate debates over the appropriate multivariate method, the appropriate fit statistic, or any number of other running debates among quantitative methodologists. This is not to say that experiments are the "gold standard," but simply that they are less subject to reification, and therefore more trustworthy from the standpoint of making policy decisions.
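To illustrate how little machinery the treatment-versus-control comparison described above requires, here is a minimal sketch of the pooled-variance Student's t statistic for two independent groups. The function name and the scores are hypothetical; the sketch assumes equal variances in the two groups and stops at the t statistic and its degrees of freedom, leaving the p-value to a t table or a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def student_t(treatment, control):
    """Pooled-variance (Student's) t statistic for two independent groups.

    Returns (t, degrees_of_freedom). Assumes equal group variances.
    """
    n1, n2 = len(treatment), len(control)
    df = n1 + n2 - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_variance = (
        (n1 - 1) * variance(treatment) + (n2 - 1) * variance(control)
    ) / df
    standard_error = sqrt(pooled_variance * (1 / n1 + 1 / n2))
    t = (mean(treatment) - mean(control)) / standard_error
    return t, df


# Hypothetical test scores from a small randomized field trial.
treatment_scores = [282, 290, 275, 301, 288]
control_scores = [270, 285, 268, 279, 274]
t, df = student_t(treatment_scores, control_scores)
print(f"t = {t:.3f} on {df} degrees of freedom")
```

The whole comparison reduces to two group means, two variances and two sample sizes, all of which a reader can check by hand.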