Causality, Prediction, and Replicability in Applied Statistics: Advanced Models and Practices
by Peter Pütz
Date of oral examination: 2019-05-03
Published: 2019-07-11
Supervisor: Prof. Dr. Thomas Kneib
Referee: Prof. Dr. Thomas Kneib
Referee: Prof. Dr. Sebastian Vollmer
Referee: Prof. Dr. Bernhard Brümmer
Files
Name: Dissertation_Pütz.pdf
Size: 1.23Mb
Format: PDF
Abstract
English
Statistical tools for analyzing research data are widely applied across scientific disciplines, and the need for adequate statistical models and sound statistical analyses is apparent. This thesis addresses limitations of statistical models commonly used to identify causal effects and to make predictions. Moreover, difficulties in the replicability of statistical results are revealed and remedies are suggested.
With regard to causality, the incorporation of penalized splines into fixed effects panel data models is proposed. Fixed effects panel data models are often used to establish causal effects since they control for unobserved time-invariant heterogeneity of the study entities. The inclusion of penalized splines relieves the researcher of having to specify the functional forms of the covariate effects in advance. Instead, the functional forms are allowed to be flexible and are estimated from the data at hand, such that a data-driven degree of nonlinearity is identified. Simultaneous confidence bands are presented as a computationally fast and reliable uncertainty measure for the estimated functions. Furthermore, this thesis studies causal effects not only on the expectation but on all aspects of the distribution of the dependent variable. In particular, generalized additive models for location, scale and shape are introduced to (quasi-)experimental methods. A step-by-step guide demonstrates how the proposed methodology may be applied and provides insights that may go unnoticed in common regression frameworks.
In the domain of prediction, a small area prediction problem is considered. It is shown how to obtain reliable up-to-date welfare estimates when an outdated census without information on income and a more recent survey with information on income are available. Instead of using survey variables to explain income in the survey, the proposed approach uses variables constructed from the census. The underlying assumptions are less restrictive than those of the methods commonly applied in this field, which are tailored to situations with simultaneous census and survey collection.
As an overarching topic relating to all statistical analyses, the replicability of statistical results is considered from two viewpoints. On the one hand, the prevalence of reporting errors in statistical results is investigated. On the other hand, studies are replicated, where possible, using the same data and software code as in the reference study. It is shown that replicability is frequently made impossible by reporting errors as well as by missing data and software code. At the same time, simple solutions to enhance replicability in future research are presented. Open data and software code policies, together with a vibrant replication culture, seem to be most promising.
Keywords: Statistics; Causality; Prediction; Replicability