Semi-Parametric Distributional Regression in Forestry and Ecology
Software, Models and Applications
Doctoral thesis
Date of Examination: 2023-06-13
Date of issue: 2023-08-10
Advisor: Prof. Dr. Thomas Kneib
Referee: Prof. Dr. Elisabeth Bergherr
Referee: Prof. Dr. Niko Balkenhol
Sponsor: Deutsche Forschungsgemeinschaft (DFG)
File: riebl-regression-2023.pdf (Dissertation, PDF, 12.7 MB)
Abstract
Recent advances in machine learning software, such as automatic differentiation and just-in-time (JIT) compilation, have significantly changed machine learning research. They have accelerated model development and contributed to the emergence of AI tools such as the chatbot ChatGPT and the image generator DALL-E. In probabilistic programming, similar methods are used to implement efficient gradient-based inference algorithms, e.g. Hamiltonian Monte Carlo (HMC) and the No-U-Turn Sampler (NUTS), that are applicable to a broad range of Bayesian models. This cumulative dissertation comprises three research papers that combine methods from machine learning and probabilistic programming with semi-parametric regression models from applied statistics. This combination enables the development of novel models with semi-parametric regression predictors, together with the corresponding inference algorithms. Moreover, various applications in forestry and ecology are presented. In the first paper, we present the probabilistic programming framework Liesel, which aims to provide a software basis for efficient and reliable research in applied statistics and is suitable for implementing complex models and inference algorithms. The software focuses on semi-parametric regression predictors with linear, non-linear, random and spatial covariate effects. A typical workflow with Liesel consists of (1) configuring a model graph as a baseline, e.g. using Liesel's R interface, (2) adapting the model graph to implement new research ideas, and (3) performing fully Bayesian inference with the included Markov chain Monte Carlo (MCMC) library, using either a standard algorithm or a user-defined variant. Samplers such as HMC and NUTS are supported and can be combined with conventional methods, e.g. iteratively weighted least squares (IWLS) proposals and Gibbs updates. Liesel is written in Python and uses the machine learning library JAX as a backend.
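Gradient-based samplers such as HMC need the gradient of the log-posterior at every leapfrog step. Automatic differentiation supplies this gradient, and JIT compilation makes each transition fast. The following is a minimal sketch in plain JAX, not Liesel's API: a hand-written HMC sampler for a hypothetical Bayesian linear regression with simulated data. The known noise standard deviation, the priors, the fixed step size `STEP_SIZE` and the fixed number of leapfrog steps `N_LEAPFROG` are all assumptions chosen for illustration. NUTS would choose the trajectory length adaptively instead.

```python
import jax
import jax.numpy as jnp
from jax.scipy.stats import norm

STEP_SIZE = 0.01  # leapfrog step size (fixed here; usually tuned during warm-up)
N_LEAPFROG = 20   # leapfrog steps per proposal (NUTS would choose this adaptively)

# Simulated data for a hypothetical linear model y = 1 + 2x + noise
kx, ke = jax.random.split(jax.random.PRNGKey(0))
x = jax.random.uniform(kx, (100,))
y = 1.0 + 2.0 * x + 0.3 * jax.random.normal(ke, (100,))


def log_posterior(beta):
    # Gaussian likelihood with known noise sd 0.3 and independent N(0, 10^2) priors
    mu = beta[0] + beta[1] * x
    return jnp.sum(norm.logpdf(y, mu, 0.3)) + jnp.sum(norm.logpdf(beta, 0.0, 10.0))


# Automatic differentiation: no hand-derived gradient is needed
grad_log_posterior = jax.grad(log_posterior)


def leapfrog(q, p):
    # Half momentum step, alternating full position/momentum steps, final half step
    p = p + 0.5 * STEP_SIZE * grad_log_posterior(q)

    def body(_, state):
        q, p = state
        q = q + STEP_SIZE * p
        p = p + STEP_SIZE * grad_log_posterior(q)
        return q, p

    q, p = jax.lax.fori_loop(0, N_LEAPFROG - 1, body, (q, p))
    q = q + STEP_SIZE * p
    p = p + 0.5 * STEP_SIZE * grad_log_posterior(q)
    return q, p


@jax.jit  # JIT compilation of one complete HMC transition
def hmc_step(key, q):
    k_mom, k_acc = jax.random.split(key)
    p0 = jax.random.normal(k_mom, q.shape)  # standard normal momentum
    q_new, p_new = leapfrog(q, p0)
    # Hamiltonian = potential energy (negative log-posterior) + kinetic energy
    h_old = -log_posterior(q) + 0.5 * jnp.sum(p0**2)
    h_new = -log_posterior(q_new) + 0.5 * jnp.sum(p_new**2)
    # Metropolis acceptance step corrects the discretization error
    accept = jnp.log(jax.random.uniform(k_acc)) < h_old - h_new
    return jnp.where(accept, q_new, q)


def sample(key, q0, n_samples):
    # Run the chain with lax.scan, collecting every state
    def step(q, k):
        q = hmc_step(k, q)
        return q, q

    _, draws = jax.lax.scan(step, q0, jax.random.split(key, n_samples))
    return draws


draws = sample(jax.random.PRNGKey(1), jnp.zeros(2), 1000)
posterior_mean = draws[200:].mean(axis=0)  # discard 200 warm-up draws
```

Only `log_posterior` is model-specific here. `jax.grad` derives its gradient and `jax.jit` compiles the transition, and this separation is what allows generic samplers to be reused across very different models. A production sampler would additionally tune the step size and a mass matrix during warm-up.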
The second and third papers discuss extensions and applications of semi-parametric distributional regression in forestry and ecology. The new models arise from introducing certain response structures into a regression context, e.g. in the form of Gaussian processes (GPs) with parametric mean and covariance functions. We apply the GP model to measurements from high-resolution circumference dendrometers. These instruments record both the irreversible growth of tree stems and their reversible shrinking and swelling caused by changes in water content. With our model, the data can be decomposed into a permanent and a temporary component, and differences between trees and years can be explained by covariates. In the third paper, we propose the multi-species count model (MSCM) to estimate relationships between environmental conditions and different indices of species diversity. We use the model with semi-parametric regression predictors to assess the effects of European beech, Norway spruce and Douglas fir on the species diversity of various taxa, based on data collected within the Research Training Group (RTG) 2300 and taking spatial correlation into account.
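The decomposition of dendrometer data into a permanent and a temporary component can be illustrated with a GP whose parametric mean function describes the irreversible growth, while a zero-mean GP describes the reversible shrinking and swelling. The sketch below is not the model from the paper: the logistic growth curve `growth_curve`, the squared-exponential covariance `sq_exp_kernel`, the simulated daily cycle and all parameter values are hypothetical choices for illustration. It computes the GP log marginal likelihood, whose gradient with respect to the mean and covariance parameters JAX provides automatically, and the posterior mean of the temporary component.

```python
import jax
import jax.numpy as jnp
from jax.scipy.linalg import cho_solve


def growth_curve(t, asym, rate, mid):
    # Hypothetical permanent component: a logistic curve, monotone increasing in t
    return asym / (1.0 + jnp.exp(-rate * (t - mid)))


def sq_exp_kernel(t1, t2, amp, length):
    # Hypothetical covariance function for the temporary (reversible) component
    d = t1[:, None] - t2[None, :]
    return amp**2 * jnp.exp(-0.5 * (d / length) ** 2)


def log_marginal_likelihood(params, t, y):
    # y = growth_curve(t) + f(t) + noise, with f ~ GP(0, sq_exp_kernel)
    asym, rate, mid, amp, length, noise = params
    resid = y - growth_curve(t, asym, rate, mid)
    n = t.shape[0]
    cov = sq_exp_kernel(t, t, amp, length) + noise**2 * jnp.eye(n)
    chol = jnp.linalg.cholesky(cov)
    alpha = cho_solve((chol, True), resid)
    return (
        -0.5 * resid @ alpha
        - jnp.sum(jnp.log(jnp.diag(chol)))
        - 0.5 * n * jnp.log(2.0 * jnp.pi)
    )


def decompose(params, t, y):
    # Fitted permanent curve and posterior mean of the temporary GP component
    asym, rate, mid, amp, length, noise = params
    permanent = growth_curve(t, asym, rate, mid)
    k_f = sq_exp_kernel(t, t, amp, length)
    chol = jnp.linalg.cholesky(k_f + noise**2 * jnp.eye(t.shape[0]))
    temporary = k_f @ cho_solve((chol, True), y - permanent)
    return permanent, temporary


# Simulated series: 10 days with 20 measurements per day (illustrative only)
t = jnp.linspace(0.0, 10.0, 200)
true_temporary = 0.3 * jnp.sin(2.0 * jnp.pi * t)  # daily shrinking and swelling
noise_draw = 0.05 * jax.random.normal(jax.random.PRNGKey(0), t.shape)
y = growth_curve(t, 2.0, 1.0, 5.0) + true_temporary + noise_draw

params = jnp.array([2.0, 1.0, 5.0, 0.3, 0.15, 0.05])
lml = log_marginal_likelihood(params, t, y)
grads = jax.grad(log_marginal_likelihood)(params, t, y)
permanent, temporary = decompose(params, t, y)
```

In this sketch, the parameters are fixed at their simulation values. In a regression setting, they would instead be linked to covariates (e.g. tree or year effects) through predictors and estimated, for instance by gradient-based optimization with `grads` or by MCMC.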
Keywords: statistical modeling; regression analysis; semi-parametric statistics; generalized additive model for location, scale and shape; Bayesian statistics; statistical software