Meta-research in times of data science: Using automation, machine learning, and modern econometrics to analyze open science in economics and related fields
Cumulative thesis
Date of Examination: 2024-10-17
Date of issue: 2024-12-18
Advisor: Prof. Dr. Helmut Herwartz
Referee: Prof. Dr. Stephan B. Bruns
Referee: Prof. Dr. Thomas Kneib
Referee: Prof. Dr. Robert Malina
Referee: Dr. Nele Witters
Referee: Dr. Michèle B. Nuijten
Files in this item
Name: Dissertation___Thesis___Publication.pdf
Size: 30.1Mb
Format: PDF
Abstract
English
This cumulative doctoral thesis applies a range of data science techniques to the field of meta-research. On the one hand, the thesis produces large and robust research data sets. On the other hand, it provides new insights into research in economics and related fields such as psychology. The generated data sets make it possible to answer new research questions, challenge findings that are based on small samples, and resolve mixed evidence in the previous literature. In particular, we show how automation, machine learning, and modern econometrics can be leveraged to study research practices and advances in open science. We discuss how the increased accessibility of articles, data, and methodologies has benefited meta-research efforts and their efficiency. In addition, we examine how various stages of the research process can be partially or fully automated. Lastly, we provide an overview of various collaborative tools and forms of collaboration, highlighting their advantages. The main accomplishment of this thesis is the development of DORIS, a software tool that automatically gathers statistical tests from academic papers. By automating the manual coding process, this tool can benefit future meta-research and the peer review process. During the course of this dissertation, DORIS produced a data set comprising approximately 600,000 tests that have undergone manual verification. These tests, along with the metadata of the respective articles, form a comprehensive research data set for use in upcoming meta-research studies. The individual articles included in this cumulative dissertation make important contributions to the research community. On a large scale, we demonstrate that there is a notable prevalence of statistical reporting errors in economics, with a tendency to highlight results that are supposedly statistically significant.
We provide evidence that implementing mandatory data and code policies in economics journals significantly reduces the occurrence of overstated reporting errors. However, we find no evidence to suggest that the top 5 economics journals are less susceptible to these errors. In fact, there is slight evidence indicating that the top 5 journals may be more prone to overstated reporting errors (Bruns et al. 2023). Furthermore, using the same large-scale data set, we find an indication that the enforcement of mandatory data and code policies could help decrease questionable research practices and encourage authors to perform more rigorous work (Islam et al. 2024). Examining recent time trends in economics using data provided by DORIS, merged with additional data, we find a continued emphasis on causal methods and on more robust econometric techniques, such as clustered standard errors. Moreover, we observe a decline in the exaggeration of effect sizes and a slightly growing awareness of open science, but also a still-increasing reliance on proprietary data (Brodeur et al. 2024). Expanding the analysis and data set to related disciplines, we identify a correlation between the strength of empirical findings and writing style in both the social sciences and the natural sciences, despite the expectation that there should be none. Ambiguous results are not oversold using sensational language. Rather, clear results tend to be presented confidently using sensational language. Articles with ambiguous results focus more on statistical significance than on effect sizes, and there is no practically significant association with readability (Bruns et al. 2024b). Lastly, switching fields to psychology and employing a many-analysts approach, we find that religious individuals tend to report higher levels of well-being. In addition, this correlation depends on the perceived cultural norms of religion in the respective country (Hoogeveen et al. 2022b).
The final article suggests enhancing many-analysts approaches by diversifying the data sets and by replicating the previous study in the future to obtain long-term insights (Islam and Lorenz 2022). This thesis advocates incentivizing researchers to adopt more robust research methods and open science practices, such as sharing data and code, creating pre-analysis plans, and publishing null results. Ideally, these practices should be introduced during academic training.
Keywords: Data science; Meta-research; Open science; Data and code availability policies; Questionable research practices