Education Policy Research in the Big Data Era: Methodological Frontiers, Misconceptions, and Challenges

Despite the abundance and growing availability of data brought by technological advances, very few education policy studies have capitalized on big data, which is characterized by large volume, wide variety, and high velocity. Drawing on recent progress in using big data in public policy and computational social science research, this commentary discusses how to approach big data and how big data can be used in education policy research. First, I introduce big data that is potentially relevant to education policy research. I then present methodological frontiers by examining the assumptions, key concepts, merits, and caveats of three commonly used analytical approaches to mining massive amounts of text data: topic models, network text analysis, and sentiment analysis. Next, to ensure the veracity of using big data in education policy research, I debunk three methodological misconceptions. This commentary concludes with a discussion of developing interdisciplinary research capacity and addressing the privacy concerns and ethical conundrums that arise as we explore a research agenda of using big data in education policy.


Introduction
The rise of big data, characterized by large volume, wide variety, and high velocity (boyd & Crawford, 2012), has breathed new life into the social sciences (King, 2011). Progress in acquiring, processing, and analyzing big data has advanced many fields in the social sciences, such as political science (Bode, Hanna, Yang, & Shah, 2015; Schwartz & Ungar, 2015), public health (Achrekar, Gandhe, Lazarus, Yu, & Liu, 2011; Bates, Saria, Ohno-Machado, Shah, & Escobar, 2014), economics (Einav & Levin, 2014), and criminology (Chen, Cho, & Jang, 2015), to name a few. In the field of education, despite the fast-growing body of literature on learning analytics, which collects and analyzes big data to optimize student learning, particularly in online learning environments (Baker & Yacef, 2009), there has been limited scholarship that capitalizes on big data in education policy.
To discuss the potential of big data and how it can be used in education policy research, in this commentary I first introduce big data that is potentially relevant to education policy research. Given the conspicuous absence of education policy studies using big data, I primarily draw upon the literature on big data in public policy and computational social science, the emerging field at the convergence of computer science and the social sciences that uses computational modeling to analyze massive amounts of digital data, harvested mostly from digital media sources, to study social phenomena (Lazer et al., 2009; Shah, Cappella, & Neuman, 2015; Watts, 2013). Grounded in the broad literature relevant to education policy research, I then introduce three methodological frontiers of mining massive amounts of text data (i.e., a corpus of texts; Fleuren & Alkema, 2015; Hearst, 1999): topic models, network text analysis, and sentiment analysis. In particular, I examine the assumptions, key concepts, merits, and caveats of each of the three analytical approaches. Next, to ensure the veracity of using big data in education policy research, I debunk three methodological misconceptions. This paper concludes with a discussion of developing interdisciplinary research capacity and addressing the privacy concerns and ethical conundrums that arise as we explore a research agenda of using big data in education policy.

Big Data in Education Policy Research
What is big data? In fact, big data is very loosely defined. There is no fixed cutoff point regarding how big the data must be in order to be considered big data. Generally speaking, big data has three distinct features: large volume, wide variety, and high velocity (boyd & Crawford, 2012). In addition, some posit a fourth feature, veracity: the trustworthiness and accuracy of big data (Bello-Orgaz, Jung, & Camacho, 2015). In this commentary, I draw upon the literature on big data in public policy research and computational social science to delineate each of the first three features, followed by the potential value of big data in education policy research. I then address the fourth feature, veracity, as I present the methodological frontiers and debunk potential misconceptions. I begin with the first three features: large volume, wide variety, and high velocity.

Large Volume
The first defining feature of big data is its massive volume. While the name "big data" suggests large volume, there is no fixed cutoff point distinguishing "big" from "small" data. Oftentimes, the volume of big data is staggeringly large, such as over 118,000 speeches in the U.S. Senate (Quinn, Monroe, Colaresi, Crespin, & Radev, 2010), over 16,000 articles in the journal Science from 1990 to 1999 (Blei & Lafferty, 2007), 13,246 political blog posts on the 2008 presidential election (Roberts, Stewart, & Tingley, 2016), and over 16,000 documents on climate change (Boussalis & Coan, 2016).
In education policy research, large volumes of data can come from a host of sources. First, data-driven education accountability systems provide an unprecedentedly large volume of data on students, teachers, administrators, schools, and communities. For example, a recent study tapped into a record of about 200 million test scores in math and reading to examine educational inequalities (Reardon, Kalogrides, & Shores, 2016). Second, digital footprints, the digital trace data we create as we use digital services and devices (Howison, Wiggins, & Crowston, 2011), are another source of big data that is valuable to education policy research. The data generated from our online activities, whether posting tweets on Twitter (e.g., Barberá, 2015) or reading the Facebook newsfeed (e.g., Kramer, Guillory, & Hancock, 2014), could serve as a proxy for online public opinion on education policy. For instance, #CommonCore and #CCSS are among the hashtags most frequently used by the digital public when discussing the Common Core State Standards on social media (Supovitz & Reinkordt, 2017; Wang & Fikis, 2016). One of the hashtags that frequently co-occur with #CommonCore and #CCSS is #OptOut, which refers to the movement of opting out of high-stakes standardized testing. At the epicenter of the opt-out movement in the state of New York, the Long Island Opt-out group, a major advocate for opting out of state standardized testing, has tens of thousands of Facebook group members discussing testing and sharing resources on how to opt out (Wang, 2017). Third, technological advances in data acquisition increase data availability for education policy research. For instance, optical character recognition (OCR) was used to convert undigitized text into machine-readable data in a study examining the latent topics in 1,539 articles published in Educational Administration Quarterly from 1965 to 2014. Another common data acquisition approach is to use application programming interfaces (APIs) to retrieve data generated on the Internet. Many technology companies (e.g., Google, Facebook, and Twitter) use APIs to grant others limited access to their data so that more applications can be created using their data. The Twitter API is one of the most popular APIs among researchers. For instance, the Twitter API was used to collect 660,051 tweets containing the hashtags #CommonCore and #CCSS to study online public opinion on the Common Core State Standards. Fourth, collaborations with technology companies allow academics to access big data. An example is a collaborative research undertaking that carried out a randomized controlled experiment involving 61 million Facebook users to study social influence within online social networks (Bond et al., 2012). All these sources (data-driven education accountability systems, the digital footprints generated by digital services and devices, data acquisition technologies, and collaborations with technology companies) provide unprecedentedly voluminous amounts of data for education policy research.

Wide Variety
Big data does not simply mean large volume. It also refers to a wide variety of data types, including, but not limited to, videos, images, audio recordings, media coverage, blogs, social media posts, online comments, records of government agencies, mobile phone logs, data generated by wearable digital devices, and satellite images. These diverse data sources, along with the conventional data sources in education policy research (e.g., interviews, observations, artifacts, surveys, and archived documents), provide researchers with rich, multi-dimensional insights into the education policy issues of interest.
One potential use of big data in education policy research is to investigate public opinion on education policy. Since public opinion is one of the factors shaping policymaking (Burstein, 2003; Gormley Jr., 2016; Page & Shapiro, 1983), researchers have studied public opinion on a variety of education issues, including early childhood education policy (Gormley Jr., 2016), school reform (Henderson, Peterson, & West, 2016), school quality (Jacobsen, Snyder, & Saultz, 2014), and race-based and wealth-based student achievement gaps (Valant & Newark, 2016). Prior studies of public opinion on education and policy used data collected from surveys, including the EdNext annual survey of American public opinion on education (Peterson, Henderson, West, & Barrows, 2017), a survey administered to the members of YouGov, a company conducting academic survey research and online political polling (Valant & Newark, 2016), and a survey fielded by Knowledge Networks to a sample representative of the U.S. population to study the effect of school report card formats on public opinion on school quality (Jacobsen, Snyder, & Saultz, 2014). In the big data era, in addition to survey data, a growing number of studies have examined public opinion on policy issues using social media data, that is, the data generated by the digital public as they discuss policy issues on social media. A few examples suffice. Over 120,000 tweets were collected to examine public opinion on health reform (King et al., 2013). Chung and Zeng (2015) developed a system called 'iMood' to identify opinion leaders, influential users, and community activists on U.S. immigration policy by analyzing about one million tweets. Whitman (2015) detailed how data from Twitter and Google Trends were used to measure public opinion on U.S. space policy. Reddick, Chatfield, and Jaramillo (2015) examined public opinion on the National Security Agency's mass surveillance programs of Americans through a critical discourse analysis of tweets, along with survey data collected from a randomly sampled public opinion poll of Americans.
In addition to social media data, researchers have been exploring how to use other sources of big data, such as mobile phone metadata and satellite images. Toole and his team demonstrated the potential of using mobile phone metadata to improve the forecasts of critical economic indicators for governments' policymaking (Toole et al., 2015). Toole et al. used mobile phone metadata (e.g., who called whom, the location of the cell towers through which the calls were made, the time of the calls, the total number of calls, and the number of incoming and outgoing calls) as a proxy to detect the changes in mobility and social interactions caused by layoffs, and then predicted aggregate unemployment rates months before the official reports were released. Moreover, to leverage the value of image data, a novel approach is to use satellite-recorded nighttime lights to estimate poverty and economic growth (Henderson, Storeygard, & Weil, 2012; Pinkovskiy & Sala-i-Martin, 2015; Xie, Jean, Burke, Lobell, & Ermon, 2016). Suffice it to say, these examples present the tantalizing potential of using big data in education policy research.

High Velocity
The third feature of big data is high velocity: the high speed of data generation. Unlike survey data, which are generated periodically whenever a survey is administered, much of big data (such as web browsing records, Facebook status updates, YouTube videos, and phone call logs) is generated in a constant stream. High-velocity streaming data pose both methodological challenges and opportunities, particularly in processing the constantly incoming data in a timely fashion so that they can be analyzed in real time or near real time. For instance, a group of researchers capitalized on the simplicity (i.e., no more than 140 characters) and immediacy of tweets to detect traffic events in urban areas (Gutierrez, Figuerias, Oliveira, Costa, & Jardim-Goncalves, 2015). In education policy research, a common limitation is the time lag of months, if not years, between data collection and the reporting of results. In the big data era, one solution to this limitation is to capitalize on diverse data sources and to automate or semi-automate data processing, analysis, and visualization, so that raw data can be processed, analyzed, reported, and visualized in a timely manner to inform education policymaking, implementation, and evaluation. In fact, methodological advances in real-time analytics have been growing fast. The three methodological frontiers introduced in the following section can all be automated or semi-automated. Real-time analytics is an area that those interested in using big data in education policy research should watch closely.

Methodological Frontiers
In the big data era, the wealth of data available to social scientists has been likened to what the microscope is to microbiologists (King, 2011). It is certainly appealing that big data could enrich our understanding of social phenomena and relevant education policy issues. Yet big data, as King (2016) noted, "is not about the data" (p. iii). Rather, big data is about the analytics: the methodological approaches that extract insights from large-volume, wide-variety, and high-velocity data. To surmount the methodological challenges brought by big data, analytical tools have been rapidly developed at the intersection of the social sciences and computer science to analyze massive amounts of digital data (Alvarez, 2016; Lazer et al., 2009; National Research Council, 2013; Shah, Cappella, & Neuman, 2015; Watts, 2013). Given the conspicuous absence of education policy studies using big data, I primarily draw upon the relevant literature on big data in public policy and computational social science to introduce three methodological frontiers (topic models, network text analysis, and sentiment analysis), along with their assumptions, key concepts, merits, and caveats. It is important to stress at the outset that the methodological approaches introduced here are only the commonly used ones (Alvarez, 2016; Feldman, 2013; Verd & Lozares, 2014). They do not represent a comprehensive survey of all the methodological tools for analyzing big data. Nor do they supplant conventional methodological approaches in education policy research. By introducing these tools, this article aims to invite education policy researchers to venture into big data by applying them to education policy research.

Topic Models
Topic modeling (Blei, 2012; Blei, Ng, & Jordan, 2003) is one of the prominent, rapidly developing methods for inferring latent topics in massive amounts of text data. Topic models assume that each document has multiple topics, and that each topic can be inferred from a probability distribution over a set of words. For example, the topic of social justice is described by words such as "inequity," "justice," "race," "disability," and "bilingual". In addition to identifying latent topics in text data, topic models have been developed to uncover how the topics are related to one another, as in correlated topic models (Blei & Lafferty, 2007), and how the topics evolve over time, as in dynamic topic models (Blei & Lafferty, 2006).
The popularity of topic models is partly explained by the fact that they are an effective and scalable way to explore the latent topics in massive amounts of text data. Topic models can be either fully automated (i.e., unsupervised) or semi-automated (i.e., supervised) (Roberts et al., 2016). When unsupervised topic models are applied, no prior annotation or labeling of the documents is needed, as the topics are identified by the high-probability words that describe a particular topic. Therefore, topic models are highly scalable, free of the constraints of human cognition entailed in manual coding. Thanks to this scalability, topic models have been applied to analyze large text datasets from a variety of data sources: 24,236 press releases from the U.S. Senate (Grimmer, 2010), the opinion texts from the U.S. Supreme Court's 4,321 non-unanimous decisions from 1949 to 2006 (Lauderdale & Clark, 2014), over 21,000 scholarly articles on literary studies over the last 120 years (Goldstone & Underwood, 2014), and 233,452 online healthcare chat logs (Wang, Huang, & Gan, 2016). Recently, topic models have been used for automated annotation of images (Feng & Lapata, 2010) and videos (Katsurai, Ogawa, & Haseyama, 2012).
In education policy research, topic models can be applied to infer latent topics in large corpora compiled from multiple data sources. These sources include records of government agencies, news coverage, and social media discourse throughout the policy process, from problem identification and framing to ongoing evaluations of existing policies. Since I found no study that has applied topic models in education policy research, I present here a topic modeling study in education research. Wang and her team used topic modeling and identified 120 latent topics in 16,524 documents published from 1900 to 2015 in the 116-year history of Teachers College Record (TCR), the longest running journal in education (Wang, Bowers, Chae, Fikis, & Natriello, 2017). The 120 topics were identified by generating two matrices: (1) a matrix of word probabilities by latent topics, and (2) a matrix of topic proportions by TCR articles (see Blei, 2012, for a thorough explication of probabilistic topic modeling). Among the 120 topics in the education literature over the last 116 years, the topic of education in wartime disappeared after the end of the Cold War in the 1980s; the topic of social justice has been on the rise since the 1950s; and the topic of measurement of student achievement has garnered persistent attention in the education research community since the 1900s.
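To make the two matrices described above more concrete, the following is a minimal sketch in Python. It does not estimate a topic model; it only shows how a word-by-topic matrix and a topic-by-document matrix combine once they have been estimated. The vocabulary, topic labels, document names, and all probabilities are invented for illustration and do not come from the TCR study.

```python
# Toy illustration of the two matrices a probabilistic topic model estimates.
# All words, topic labels, document names, and numbers are hypothetical.

vocabulary = ["justice", "equity", "testing", "scores"]

# Matrix 1: word probabilities by latent topic (each row sums to 1).
topic_word = {
    "social_justice": [0.5, 0.4, 0.05, 0.05],
    "measurement": [0.05, 0.05, 0.5, 0.4],
}

# Matrix 2: topic proportions by document (each row sums to 1).
doc_topic = {
    "article_A": {"social_justice": 0.9, "measurement": 0.1},
    "article_B": {"social_justice": 0.2, "measurement": 0.8},
}


def word_distribution(doc_id):
    """Mix the topic-word rows by the document's topic proportions."""
    mix = [0.0] * len(vocabulary)
    for topic, proportion in doc_topic[doc_id].items():
        for i, prob in enumerate(topic_word[topic]):
            mix[i] += proportion * prob
    return dict(zip(vocabulary, mix))


def top_words(topic, n=2):
    """Return the n highest-probability words that describe a topic."""
    ranked = sorted(zip(vocabulary, topic_word[topic]), key=lambda x: -x[1])
    return [word for word, _ in ranked[:n]]


print(top_words("social_justice"))
print(word_distribution("article_A"))
```

In practice, both matrices are inferred from the corpus by a fitted model rather than written by hand; the sketch only illustrates how researchers read them, namely high-probability words per topic and topic proportions per article.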
Researchers who apply topic models to education policy research need to be aware of a couple of caveats. First, topic models, albeit valuable, are only a blunt tool for exploring large corpora. Unlike the fine-grained results generated by manual coding, the results of topic models are a coarse-grained description of the text datasets. Second, topic models, even unsupervised ones, require researchers to make subjective decisions about modeling and about choosing the best fitting model. Most topic models do not run on every single word in the datasets. Rather, before running topic models, the text data need to be processed and cleaned. In this data cleaning process, a common practice is to remove the words that do not convey topical meaning or the words that are used most and least frequently. To do so, scholars may take different approaches, as "there is no globally best method" (Grimmer & Stewart, 2013, p. 3). A few examples suffice. Blei and Lafferty (2007) removed words by two criteria: words that occurred fewer than 70 times, and 296 stopwords such as "a," "an," "the," and "around". Roberts et al. (2016) removed the words that occurred in fewer than 1% of the 13,246 blog posts. Grimmer (2010) removed the words that occurred in fewer than 0.5% or more than 90% of the documents, as well as the stopwords. Grün and Hornik (2011) calculated the mean term frequency-inverse document frequency (tf-idf), and then only included the words that have a tf-idf value slightly above the median to remove the very frequently used words. All these examples demonstrate how running topic models entails researchers' modeling decisions. Moreover, researchers interpret the topic model output and oftentimes label the topics by drawing upon their knowledge of the context of the text data. For instance, to label each topic, the researchers examined the topics that have been identified periodically in the existing literature in the field, and took into account the results generated from the topic models: the high-probability terms and the high-probability articles. This process of data interpretation relies on the researchers' contextual knowledge of the data. For this reason, topic models, while they do not replace humans, "amplify human abilities" (Grimmer & Stewart, 2013, p. 4). To that end, the most promising way of automated text mining is not to negate the researchers' need to read the texts, but rather "to identify the best way to use both humans and automated methods for analyzing texts" (Grimmer & Stewart, 2013, p. 4).
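The data cleaning decisions described above can be sketched in a few lines of Python. The sketch below removes stopwords and words that occur in too few or too many documents; the default thresholds loosely mirror the 0.5% and 90% cutoffs attributed to Grimmer (2010), but the function name, the tiny stopword list, the whitespace tokenization, and the inclusive treatment of the boundaries are all my assumptions rather than any author's actual procedure.

```python
from collections import Counter

# Tiny illustrative stopword list; published studies use much longer lists.
STOPWORDS = {"a", "an", "the", "of", "and", "is"}


def prune_vocabulary(docs, min_share=0.005, max_share=0.90):
    """Drop stopwords and words whose document share is outside the bounds."""
    # Naive whitespace tokenization, assumed here only for brevity.
    tokenized = [doc.lower().split() for doc in docs]

    # Document frequency: count each word at most once per document.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    n_docs = len(docs)
    keep = {
        word
        for word, freq in doc_freq.items()
        if word not in STOPWORDS and min_share <= freq / n_docs <= max_share
    }
    return [[token for token in tokens if token in keep] for tokens in tokenized]


# Hypothetical mini-corpus; thresholds are raised so the effect is visible.
docs = [
    "the test scores rose",
    "the test policy changed",
    "the budget policy passed",
    "the test is new",
]
print(prune_vocabulary(docs, min_share=0.5, max_share=0.9))
```

Changing either threshold changes which words survive and, in turn, which topics a model can recover, which is one concrete way in which the researcher's modeling decisions shape the output.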

Network Text Analysis
Network text analysis is another emerging methodological frontier for analyzing text data. A network is formed by nodes and ties (Borgatti, Everett, & Johnson, 2013). To conceptualize texts as networks, the units of texts (i.e., words or concepts) are connected by ties (i.e., the co-occurrence of words or the relationships between concepts; Verd & Lozares, 2014). Network text analysis assumes that the semantic meaning of texts is revealed by the patterns of network structure, that is, by how the units of texts are connected in the network (Diesner & Carley, 2005). For instance, in the network text analysis of stem-cell research literature, words such as "bone," "marrow," "transplantation," "tissue," and "peripheral" are tightly connected by their co-occurrence ties (Leydesdorff & Hellsten, 2005), denoting that the co-occurrence relationships among these tightly connected words are stronger than their relationships with other words. In the network analysis of texts compiled from 1.9 billion anonymous queries on epilepsy, the nodes are the words that are connected by co-occurrence ties; the tightly connected clusters of words suggest the topics, including seizures and their effects on the body, anti-epileptic drugs and their side effects, and the life changes (e.g., driving and employment) related to epilepsy (Miller, Groves, Knopf, Otte, & Silverman, 2017). In addition to conceptualizing words as the units of texts in the networks, researchers also consider hashtags as the units of texts when analyzing text data acquired from social media. For instance, in the network analysis of 660,051 tweets containing the hashtags #CommonCore and #CCSS (the Common Core State Standards), the nodes are the hashtags that are connected by co-occurrence ties; the tightly connected clusters detected by the faction algorithm suggest that the online discourse on the Common Core State Standards on Twitter centered around several topics, including the Republican Party's 2016 presidential candidates (e.g., #Trump2016 and #TedCruz2016), anti-Common Core (e.g., #StopCommonCore and #StopCC), education policy and reform (e.g., #NCLB and #ESSA), as well as teaching and testing (e.g., #teaching and #testing). Another novel approach to conceptualizing texts as networks is to treat nouns as the nodes and the verbs connecting them as the ties. For instance, in the network analysis of 130,213 news articles on the 2012 U.S. presidential elections, the nodes are nouns such as Obama, economy, and efforts, and the ties are verbs such as celebrate, acclaim, and blame (Sudhahar, Veltri, & Cristianini, 2015).
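As a rough illustration of how a hashtag co-occurrence network of the kind described above can be built, the sketch below counts how often pairs of hashtags appear in the same tweet and sums tie strengths per hashtag. The tweets are invented, the function names are hypothetical, and the sketch stops short of cluster detection: it does not implement the faction algorithm or any other community detection method.

```python
import re
from collections import Counter
from itertools import combinations

# A hashtag is assumed to be "#" followed by letters, digits, or underscores.
HASHTAG = re.compile(r"#\w+")


def cooccurrence_ties(tweets):
    """Count how often each pair of hashtags appears in the same tweet."""
    ties = Counter()
    for tweet in tweets:
        # Lowercase and deduplicate so "#CommonCore" and "#commoncore" match.
        tags = sorted({tag.lower() for tag in HASHTAG.findall(tweet)})
        for pair in combinations(tags, 2):
            ties[pair] += 1
    return ties


def weighted_degree(ties):
    """Total tie strength attached to each hashtag node."""
    strength = Counter()
    for (tag_a, tag_b), weight in ties.items():
        strength[tag_a] += weight
        strength[tag_b] += weight
    return strength


# Hypothetical tweets written for this example only.
tweets = [
    "#CommonCore and #CCSS again",
    "#commoncore #OptOut now",
    "#CCSS #CommonCore #OptOut",
    "no tags here",
]
ties = cooccurrence_ties(tweets)
print(ties.most_common())
print(weighted_degree(ties).most_common())
```

The resulting weighted edge list is the usual input to network software, where clustering algorithms can then be applied to detect tightly connected groups of hashtags.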
Both network text analysis and the aforementioned topic models are emerging methodological approaches for analyzing text data. One might wonder: If we use the two methods to analyze the same text dataset, do their results differ? The answer, according to the literature (Leydesdorff & Nerghes, 2015; Wang & Fikis, 2016), is affirmative. In fact, the two methods can yield results that are significantly uncorrelated. This by no means suggests that topic models and network text analysis are invalid. Rather, it suggests that the methods work well if applied for different purposes: topic modeling is more appropriate for revealing similarities, whereas network text analysis is more appropriate for revealing semantic meanings. The different results yielded by topic models and network text analysis are analogous to the different results generated by using different conceptual frameworks and coding schemes when manually coding the same dataset in qualitative research.
It is worth noting that when analyzing hashtag co-occurrence networks, it is of critical importance to select the appropriate hashtags. The hashtags might keep changing and evolving over the process of education policymaking and implementation. Correspondingly, the selected hashtags should be broad enough to include all permutations of the words and phrases relevant to the topic of interest. However, if they are too broad, there is a risk of including irrelevant content and adding noise to the data. Therefore, an iterative process is recommended to examine the data retrieved using the selected hashtags.

Sentiment Analysis
The third methodological frontier that holds great promise in education policy research is sentiment analysis, also known as opinion mining. Sentiment analysis identifies the sentiment and emotions in text data by detecting emotion-bearing words (Liu, 2010). For instance, negative sentiment is considered to be expressed in the tweet "Fear, retaliation ruled @UserID HR department, employees say", because negative emotion-bearing words (i.e., fear and retaliation) were used; positive sentiment is considered to be expressed in the tweet "Thank you for surprising one of our amazing teachers today!", because positive emotion-bearing words (i.e., thank and amazing) were used. Sentiment analysis can be applied at various levels (e.g., document, sentence, and aspect) using sentiment lexicons such as SentiWordNet (Feldman, 2013). With the wealth of digital data, sentiment analysis has emerged as a supplement to the existing methods (e.g., surveys and interviews) for gauging public opinion, an aggregate of individual views, attitudes, and beliefs about a particular topic expressed by the digital public.
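The word-matching logic described above can be sketched as follows. This is a deliberately simplified, document-level scorer that assumes a five-word hand-made lexicon with polarities of +1 and -1; it is not SentiWordNet or any published tool, and the function names are hypothetical.

```python
import re

# Tiny hand-made lexicon, assumed for illustration only. Real sentiment
# lexicons contain many thousands of entries with graded scores.
LEXICON = {
    "fear": -1,
    "retaliation": -1,
    "thank": 1,
    "amazing": 1,
    "brilliant": 1,
}


def sentiment_score(text):
    """Sum the polarity of every lexicon word found in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)


def sentiment_label(text):
    """Map the summed score to a positive, negative, or neutral label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment_label("Fear, retaliation ruled @UserID HR department, employees say"))
print(sentiment_label("Thank you for surprising one of our amazing teachers today!"))
```

Because the scorer only matches individual words, it has no notion of tone or context: any text containing a positively scored word and no negatively scored word is labeled positive, regardless of what the author meant.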
Automated sentiment analysis has been applied in many fields to examine public opinion. In business, sentiment analysis has been used to evaluate the sentiment in financial news articles to predict stock prices (Schumaker, Zhang, Huang, & Chen, 2012). In political science, sentiment analysis was conducted to measure public opinion on the 2008 Obama-McCain debate (Fernández-Gavilanes, Álvarez-López, Juncal-Martínez, Costa-Montenegro, & González-Castaño, 2016). In the field of public policy, sentiment analysis was conducted to gauge public opinion on U.S. immigration policy (Chung & Zeng, 2015) and crime problems (Jeremy, Paul, Krone, Spiranovic, & Cockburn, 2015). In the field of education policy, sentiment analysis was conducted to investigate public opinion on the Common Core, and it was found that negative sentiment surpassed positive sentiment in all 50 states and the District of Columbia.
If sentiment analysis is used to analyze social media data, researchers can often pinpoint the geographical locations of social media users through geographic identifiers, including geotagged locations and self-reported locations. Prior literature has consistently suggested that approximately 1% of tweets are geotagged, thus providing data on the latitude and longitude of the locations where the tweets are posted (e.g., Jahanbakhsh & Moon, 2014; Mislove, Lehmann, Ahn, Onnela, & Rosenquist, 2012; Ram et al., 2015; Young, Rivers, & Lewis, 2014). In addition, around 70% of Twitter users self-report their geographic locations on their Twitter profiles (e.g., Mislove et al., 2012). These location data, along with the time stamps of social media posts, offer researchers opportunities to examine when and how public opinion emerges, fluctuates, and evolves at different geographical scales such as states, congressional and legislative districts, and large cities and towns. Moreover, when social media users post selfies, as more than half of millennials in the United States have already done (Taylor, 2014; Wilson, 2014), public opinion can be examined at an even more granular level by demographic groups such as African Americans, Hispanics, and Asian Americans. More intriguingly, public opinion is not a static entity. Rather, public opinion is contagious (Kramer et al., 2014), and it evolves throughout the policymaking and implementation process. As a corollary, sentiment analysis can be an instrumental tool in education policy research for gleaning insights into how education policy affects the public, and into the interplay among education policy, public opinion, and policy outcomes in diverse political, economic, and cultural contexts.
Like the other methodological approaches introduced earlier, sentiment analysis is a crude tool for detecting sentiment in large-volume text datasets. Beyond that, automated sentiment analysis has a few unique limitations. First, fully automated sentiment analysis does not detect sarcasm well. Sarcasm is a sophisticated form of expression in which apparent praise carries negative sentiment (Altrabsheh, Cocea, & Fallahkhair, 2015). Take the tweet "Brilliant! The real agenda behind #CommonCore!" as an example. The algorithms in sentiment analysis might deem "brilliant" a positive emotion-bearing word and erroneously categorize the tweet as expressing positive sentiment. The second limitation of sentiment analysis is the lack of contextual understanding. Without it, sentiment analysis algorithms might categorize the tweets "Testing is so green." and "Opt out is so White." as neutral, which might disagree with manual coding results ("Testing is so green" might be manually coded as "the testing industry wrings profits from public education"; "Opt out is so White" might be manually coded as "the students and their parents who advocate for opting out of standardized testing are predominantly White"). The third limitation is the exclusion of the content on the webpages to which the hyperlinks in tweets point. Consider the tweet "Must read. http://t.co/eUJqNEGZSm Is this where we are headed? Lots to think about." The text of this tweet itself suggests neutral sentiment, but the content on the linked webpage carries negative sentiment towards the proposed changes in the Elementary and Secondary Education Act (ESEA) Reauthorization. Given these limitations, sentiment analysis often needs some portion of manual coding to ensure the veracity of data analysis.

Debunking the Misconceptions
With the richness of big data and the emerging methodological tools at our disposal, big data, if used properly, could be a gold mine for education policy research. However, big data is not a panacea. The label "big" renders big data prone to misunderstanding. Establishing the veracity of big data, its fourth feature, depends not only on aptly applying the aforementioned methodological tools and many others, but also on exercising caution against misconceptions about big data. The three misconceptions discussed below have been noted in the literature in public policy and computational social science. These misconceptions have the potential to misguide education policy researchers as they use big data. Here, to ensure the veracity of big data in education policy research, I debunk the misconceptions as a cautionary note for future inquiry.

Bigger is Not Necessarily Better
The first misconception is that bigger is better. Not true! Big data, indeed, has much value to offer; yet it may delude us into thinking bigger is better. In fact, bigger is not necessarily better. Big data (large-volume, wide-variety, and high-velocity data) inherently implies more noise, so more effort is needed to find the true patterns in the data because "the signal-to-noise ratio may be waning" (Silver, 2015, p. 447). More importantly, big data and "small data" are not mutually exclusive. The techniques of traditional statistics are indispensable for examining the validity and reliability of big data analytics. Take sampling as an example. Some argue that sampling may have outlived its usefulness because we can now access samples so large that N ≈ all (i.e., samples are close to the population; Mayer-Schönberger & Cukier, 2013). In the literature of the big data era, it is not uncommon to see studies that analyzed many millions of pieces of data. For instance, 46 billion words posted by 63 million unique Twitter users were considered as the social sensors of human happiness (Dodds, Harris, Kloumann, Bliss, & Danforth, 2011); a corpus that contained 509 million tweets posted by 2.4 million Twitter users from 84 countries was examined to detect people's seasonal mood changes (Golder & Macy, 2011); 9 million tweets posted by the Twitter followers of candidates for the U.S. House and Senate and governorship in the 2010 midterm elections were used to study the digital public's political expression (Bode et al., 2015). However, the "N ≈ all" view has been challenged by many skeptics who have cautioned against biased sampling, particularly in studies relying on specific social networking sites (boyd & Crawford, 2012; Hargittai, 2015; Sandvig, 2015). In other words, the statement "we examined how 10.1 million U.S. Facebook users interact" does not indicate a representative sample of the U.S. population, but rather the 4% of Facebook users who self-reported their political affiliation on their Facebook profile page (Sandvig, 2015). In fact, people do not randomly opt into the use of Facebook and social networking sites in general (Hargittai, 2015). Whether people use a specific social networking site is associated with a host of factors, including age, gender, ethnicity, education, and income. Specifically, younger people are more likely than older people to use three popular social media sites (Facebook, Twitter, and LinkedIn); those with more education and higher income are more likely to use these three sites than those who have lower socioeconomic status; women are more likely to use Facebook but less likely to use LinkedIn; African Americans are more likely to be on LinkedIn and Twitter, but less likely to use Facebook than Whites (Hargittai, 2015). As a corollary, the demographics of the online population might shift constantly and do not represent those of the offline world (Diaz, Gamon, Hofman, Kıcıman, & Rothschild, 2016).
More importantly, in an article titled "How Big Data is Unfair", Hardt (2014) noted that the algorithms are apt to "favor those who belong to the statistically dominant groups" (para. 9), and "a variable that is positively correlated with the target in the general population might be negatively correlated with the target in a minority group" (para. 12). Therefore, a large N alone does not substantiate the claim that N ≈ all. Researchers must be cautious when making generalized claims based on a biased sample, as it is problematic to generalize the patterns in one demographic group at one place at one time to all demographic groups at all places at all times.
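The point that a large N does not guarantee representativeness can be made concrete with a small simulation. The sketch below is illustrative only: the population size, the age split, the support rates, and the platform-usage rates are all invented assumptions, not estimates from any of the studies cited above. It compares a very large self-selected "platform" sample with a small simple random sample drawn from the same synthetic population.

```python
import random

random.seed(42)  # fixed seed so the illustration is repeatable

# Synthetic population. Every rate below is a made-up assumption.
POP_SIZE = 200_000
population = []
for _ in range(POP_SIZE):
    young = random.random() < 0.5                           # half the population is young
    supports = random.random() < (0.7 if young else 0.3)    # opinion differs by age
    on_platform = random.random() < (0.6 if young else 0.1) # platform use differs by age
    population.append((young, supports, on_platform))


def support_rate(people):
    """Share of people in the list who support the hypothetical policy."""
    return sum(person[1] for person in people) / len(people)


true_rate = support_rate(population)

# "Big data" sample: everyone who opted into the platform (self-selected).
platform_users = [person for person in population if person[2]]

# "Small data" sample: 1,000 people drawn at random from the whole population.
random_sample = random.sample(population, 1000)

print(f"Population support rate:            {true_rate:.3f}")
print(f"Platform sample (n={len(platform_users)}): {support_rate(platform_users):.3f}")
print(f"Random sample   (n={len(random_sample)}):  {support_rate(random_sample):.3f}")
```

Under these assumptions, the platform sample is many times larger than the random sample, yet its estimate sits well above the population rate because young people are overrepresented among users, whereas the small random sample lands close to the population rate. Sample size reduces random error; it does nothing about selection bias.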

Theoretical Framing Is Important As Always
The second potential misconception of using big data in education policy research is that theoretical framing may become obsolete. Some predicted that the scientific approach of hypothesizing, modeling, and testing would become obsolete, because patterns can be revealed by crunching data through algorithms without being guided by theories (Anderson, 2008). This view is unfounded, because the revealed "patterns" might not exist and the "enormous quantities of data can offer connections that radiate in all directions" (boyd & Crawford, 2012, p. 668). The so-called "patterns" may be mere "chance associations": discernible associations that would inevitably show up if we look at enough data (Leinweber, 2007). Without appropriate theories to frame and guide data mining, it is easy to fall into the trap of "data mining sins" (Leinweber, 2007, p. 15). For example, Google Flu Trends (GFT) was touted as being able to predict flu by matching 50 million Google search terms to over 1,000 terms suggesting the propensity of flu, and to generate flu predictions two weeks before the U.S. Centers for Disease Control and Prevention did. However, the GFT failed to predict the non-seasonal flu in 2009, partly because the GFT "was part flu detector and part winter detector" (Lazer, Kennedy, King, & Vespignani, 2014, p. 1203). This is why boyd and Crawford (2012) argued that data, big or small, lose value when they are taken out of context. In education policy research, the socially-constructed data should never override theory. Theory matters. Context matters. These principles apply to big data analytics as well. Technology truly helps satiate our voracious appetite for data; computer science provides the capacity to collect and analyze voluminous amounts of data. But still, in the big data era, theories and data are inextricably linked. Data alone tell us nothing. Data are merely a vehicle we can lean on to enrich our understanding of social phenomena.
When we simplify human behaviors to numbers, we run the risk of losing the richness of human behaviors. It is the researchers' job to interpret data: unpacking the meaning behind the data in a given context (boyd & Crawford, 2012). Therefore, the data interpretation, data contextualization, and sense-making process must be guided by theories, rather than algorithms. In education policy research, on the one hand, theories are analogous to beacons guiding the data collection, analysis, and interpretation, so that researchers do not cherry-pick variables to blindly hunt for patterns; on the other hand, the results from a theory-guided analytical approach can help facilitate theory development and enrich our understanding of intricate education policy issues.
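The notion of "chance associations" discussed above can be demonstrated with a short simulation. The sketch below is an illustration under invented assumptions (20 observations, 1,000 candidate variables, standard normal noise); it does not reproduce any analysis from the cited literature. Every variable is pure random noise, so any correlation with the target is spurious by construction.

```python
import math
import random

random.seed(7)  # fixed seed so the illustration is repeatable

N_OBS = 20          # observations per variable (assumed for illustration)
N_VARIABLES = 1000  # number of unrelated candidate variables (assumed)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# The target and every candidate variable are independent random noise.
target = [random.gauss(0, 1) for _ in range(N_OBS)]
candidates = [[random.gauss(0, 1) for _ in range(N_OBS)] for _ in range(N_VARIABLES)]

# Data mining without theory: keep whichever variable correlates best.
strongest = max(abs(pearson(target, variable)) for variable in candidates)

print(f"Strongest |correlation| among {N_VARIABLES} noise variables: {strongest:.2f}")
```

Although no variable has any real relationship with the target, searching across a thousand of them reliably turns up at least one with a seemingly strong correlation. A theory specified in advance limits which associations are examined and thus guards against mistaking such artifacts for findings.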

Even Automation Needs Human Involvement
Given the large-volume, wide-variety, and high-velocity features of big data, much of big data analytics is automated by algorithms. The third potential misconception of using big data in education policy research is that automation does not need human involvement. This misconception runs the risk of overemphasizing algorithm-enabled automation at the expense of the veracity of big data. Sophisticated algorithms, along with the brute force of high-performance computing (also called supercomputing), do not mean that valid results are automatically produced once data are fed into the magic wand of algorithms. Human judgment is needed in every step of big data analytics in education policy research to establish the veracity of research. First, data quality needs to be evaluated by humans, as the data fed into the algorithms can be error-prone and unrepresentative of the target population (Tufekci, 2014). Second, the model parameters used in automated data analysis need to be set by humans, such as those noted earlier in the section on topic models. Third, data interpretation requires researchers to draw upon their prior knowledge of the field to make sense of the data. As a corollary, to tap the potential of big data in the field of education policy, the key is not to blindly pursue clever algorithms or troves of data, but to combine automation and human judgment so that they complement each other.

Challenges
Amid the abundant potential of big data, pressing challenges abound in using big data in education policy research. Among them are developing interdisciplinary research capacity and addressing privacy concerns and ethical conundrums. Here I discuss these two challenges, which may deter the use of big data in education policy research. More importantly, I encourage future researchers to offer viable strategies to unleash the potential of big data in education policy research.

Developing Interdisciplinary Research Capacity
The first challenge we face is the shortage of education policy researchers who are well versed in both education policy and computer science. In essence, using big data in education policy research entails interdisciplinary efforts, in which knowledge of education policy provides the vital theoretical underpinnings that frame the studies, and expertise in computer science enables algorithmic approaches to data acquisition, mining, and analysis that scale up analytical capacity. Unfortunately, education policy researchers are often inadequately trained in computer science.
To surmount this challenge, I propose two viable solutions to build and maximize the research capacity for using big data in education policy research. The first solution is to develop interdisciplinary research teams. Stepping out of the silos of their individual disciplinary boundaries, researchers in different fields (such as education policy, computer science, and data science) collaborate and marshal their intellectual capital on data acquisition and analysis, as well as on interpreting data in rich social contexts. Another solution to the shortage of interdisciplinary research capacity is to build a pipeline of ambidextrous researchers who are deft in both education policy research and computer science. In fact, government agencies have already taken the initiative to train aspiring socially-minded computer scientists and computationally-minded social scientists. For instance, the National Science Foundation has been funding initiatives to train interdisciplinary big data researchers (NSF, 2012). Moreover, many universities have begun building interdisciplinary programs that bring together social science, computer science, statistics, and data visualization (Wallach, 2015). It is thus recommended that future education policy researchers participate in such interdisciplinary training programs so that they will be well equipped to apply computational social science to education policy research.

Addressing Privacy Concerns and Ethical Conundrums
In a data-rich environment, researchers have been wrestling with privacy concerns that derive from the inherent tension between data and privacy. The word "data" is the plural of the Latin word "datum", which means "something given", whereas privacy refers to "the ability to control the dissemination and use of one's information" (Wright et al., 2003, p. 148). As a result, big data and privacy have inherently contradictory goals (O'Leary, 2015). With Internet-connected digital devices at their disposal, people might trade privacy for convenience. We disclose our whereabouts when we hail a ride with Uber or use Google Maps to get to our destination. We share a part of our life when we use Facebook to stay in touch with families and friends. Further complicating the privacy issue is that once we share something online, either a picture or a comment, that "something" then has a life of its own on the Internet. That is, we no longer have much control over how the data about us are disseminated. In this digitally hyper-connected world, everything about our online activities, as boyd (2010) aptly put it, is "public by default, private when necessary" (para. 1). Still, for researchers, as computerized algorithms allow us to scale up data collection and analysis through an automated process, we are now presented with the mounting challenge of how to strike a balance between strengthening the oversight of data access and advancing scientific discoveries.
Further, ethical conundrums loom large as we capitalize on big data in education policy research. First, ready access to data does not necessarily mean that the data are meant to be consumed by anyone (boyd, 2012). Rather, the easy access to big data highlights the need for greater oversight to deter prying eyes on personal data and prevent the potential abuse of massive datasets. A recent controversial example is the massive-scale experiment (n = 689,003) on emotional contagion on Facebook (Kramer et al., 2014). To test whether emotions were contagious through Facebook's social networks without face-to-face communication or non-verbal cues, Kramer et al. manipulated the amount of emotional content in Facebook users' News Feeds. Kramer et al. stated that the data collection and analysis process was "consistent with Facebook's Data Use Policy, to which all users agree prior to creating an account on Facebook, constituting informed consent for this [Kramer et al.'s] research" (Kramer et al., 2014, p. 8789). This leads to a critical question: Are social media sites' Terms of Service the de facto informed consent for social experiments in the digital age? To date, there is no agreed-upon answer to this question. Most Institutional Review Board protocols have not provided adequate guidelines on conducting large-scale social experiments on social media sites. This is particularly problematic when human participants are not fully informed and are not explicitly asked for informed consent to participate in online social experiments.
Another ethical challenge centers on the use of social bots in experiments. Social bots are social media accounts that are operated by algorithms, instead of humans, to automate content generation and interaction with human users on social media (Ferrara, Varol, Davis, Menczer, & Flammini, 2016; Kollanyi, Howard, & Wooley, 2016). Recently, social bots have been increasingly used to contribute to and even steer the direction of online discourse on political elections and public policy issues (Bessi & Ferrara, 2016; Ferrara et al., 2016). A prime example is that automated pro-Trump bots were used more aggressively than pro-Clinton bots during the U.S. presidential debates in 2016; in the final presidential debate in particular, the pro-Trump bots generated seven times more traffic on Twitter than the pro-Clinton bots (Kollanyi, Howard, & Wooley, 2016). Thus, social bots can potentially shape and even intentionally manipulate the contours of online discourse, wielding subtle and even significant influence on public opinion and policy issues. In the research realm, social bots have been used as "virtual confederates" in social experiments conducted online to study human social behaviors. They play the role of confederates, the people trained by researchers to follow pre-assigned scripts in social experiments. Examples of such confederates include those in Stanley Milgram's experiment, who acted as pedestrians walking on the street and looking at a balcony so that the researchers could examine how many other pedestrians followed their behavior (Milgram, Bickman, & Berkowitz, 1969), and those in Solomon Asch's (1956) conformity experiment. As a result, the ethical challenges of informed consent, deception, and direct harm must be addressed when bots are used as "virtual confederates" in experiments to control the variables and create an artificial context (Krafft, Macy, & Pentland, 2016).

Concluding Remarks
Big data research has been growing by leaps and bounds. In education policy, researchers can collect big data from an array of sources, including, but not limited to, data-driven education accountability systems, the digital footprints left by the digital services and devices used by education stakeholders, and data acquisition technologies such as OCR, which converts undigitized text into digital data, and APIs for accessing data shared by technology companies. With such rich data, new models are rapidly being developed to analyze text, video, images, audio, mobile phone call logs, and satellite imagery. The wealth of data and analytical tools allows researchers to examine education policy research questions that could not be easily explored in the past. The three methodological approaches (topic models, network text analysis, and sentiment analysis) introduced in this commentary hold great potential for the real-time or near-real-time analysis of big data in education policy. In doing so, timely results can be offered to inform education policymaking, implementation, and evaluation. For education policy researchers to venture into big data, it is important to develop interdisciplinary research capacity, as well as to address privacy concerns and ethical conundrums. This paper did not take a systematic approach to comprehensively surveying all methodological tools used in analyzing big data. Rather, to ensure that the empirical examples presented in this paper are applicable to education policy research, this commentary introduces three analytical approaches used in the closely related fields of public policy and computational social science in an effort to invite education policy researchers to venture into big data. As we embrace big data in education policy research, many hurdles remain and new obstacles will emerge.
It is my hope that this commentary invites forward-looking discussions and the exploration of a research agenda for using big data in education policy.
Readers are free to copy, display, and distribute this article, as long as the work is attributed to the author(s) and Education Policy Analysis Archives, it is distributed for noncommercial purposes only, and no alteration or transformation is made in the work. More details of this Creative Commons license are available at http://creativecommons.org/licenses/by-nc-sa/3.0/. All other uses must be approved by the author (