Reddit data as a new tool and source for social research

The use of non-traditional data (i.e., data collected from non-probability sample surveys, passive data, or Big Data) to supplement or replace survey data is growing. However, these data are not without weaknesses: they suffer from their own sources of error, access challenges, and confidentiality concerns. This project uses survey data collected on Reddit.com and posts scraped from the site to answer three research questions: 1) Can social media data be used to accurately assess social attitudes? 2) What are the sources of error in social media data? 3) What variability in the conclusions drawn from these data is introduced by the researcher’s choice of analytic methods? In addition to addressing these questions, the project describes the data and how to access them, so that future Reddit data users can refine their budgets, timelines, and expectations.

Project Team: Ruben Bach, Ashley Amaya (RTI), Frauke Kreuter, Florian Keusch