Mathematics of Biomolecular Structure Experiments
Dissertation
Date of oral examination: 2023-09-04
Published: 2023-09-29
Supervisor: Prof. Dr. Stephan F. Huckemann
Reviewer: Prof. Dr. Stephan F. Huckemann
Reviewer: PD Dr. Benjamin Eltzner
Files
Name: PHD_Thesis.pdf
Size: 44.7Mb
Format: PDF
Abstract
English
One of the main objectives of structural biology is to understand the complicated three-dimensional structure of biomolecules, and thus to provide meaningful links between structure and function. A wide range of methods exists for determining the structure of biomolecules, each applicable in different cases. RNA structures are usually determined by X-ray crystallography or cryogenic electron microscopy. This naturally leads to data at different resolutions and to the question of how to model data at different scales and develop learning algorithms for them. To tackle this issue, we introduce a novel approach that models RNA strands at two scales: the microscopic scale (atomic level) and the mesoscopic scale, which lies between the microscopic and the macroscopic scale (e.g., the whole RNA strand). At the microscopic scale, we work with suites, which can be represented on the seven-dimensional torus. At the mesoscopic scale, we work with mesoscopic shapes, which are modeled in the size-and-shape space. In order to learn clash-free corrections at both scales, we developed a new clustering method that can be applied to data in general metric spaces. Another approach to obtaining structural information about molecules is the use of spectroscopic methods. ENDOR spectroscopy can be used to determine intramolecular distances. In this context, we address two challenges. The first challenge is to denoise the data: we present an asymptotic analysis for the homoscedastic drift model, a pioneering parametric model that achieves striking model fits in practice and allows both hypothesis testing and confidence intervals for spectra. The ENDOR spectrum and an orthogonal component are modeled as an element of complex projective space and formulated in the framework of generalized Fréchet means.
To this end, two general formulations of strong consistency for set-valued Fréchet means are extended and subsequently applied to the homoscedastic drift model to prove strong consistency. Building on this, central limit theorems for the ENDOR spectrum are shown. Furthermore, we extend the applicability of the approach by taking into account a phase noise contribution, which leads to the heteroscedastic drift model. Both drift models offer an improved signal-to-noise ratio over pre-existing models. The second challenge is to develop an analysis that determines, from the spectra, parameters describing the conformation of the biomolecules. For this purpose, we drastically accelerated a spectrum simulation code to enable optimizations. Building on this, a Bayesian optimization-based pipeline was implemented and successfully applied to ENDOR data.
Keywords: Generalized Fréchet Means; Strong Consistency; Torus; Size-and-Shape Space; Central Limit Theorem; Clustering; ENDOR Spectroscopy; RNA Molecules
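Illustration (not taken from the thesis): the abstract represents suites as points on the seven-dimensional torus and relies on Fréchet means in metric spaces. The sketch below shows, under simplifying assumptions, how a distance on a flat torus and a sample-restricted Fréchet mean could be computed. The product (flat) metric, the restriction of the minimizer to the sample points, and all function names are assumptions made for this example; they are not the thesis's definitions or code.

```python
import math

TWO_PI = 2.0 * math.pi


def angle_diff(a, b):
    """Shortest angular difference between two angles (radians), in [0, pi]."""
    d = abs(a - b) % TWO_PI
    return min(d, TWO_PI - d)


def torus_distance(x, y):
    """Distance between two points on a flat torus, given as angle tuples.

    Assumption: the product metric, i.e. the Euclidean norm of the
    coordinate-wise shortest angular differences.
    """
    if len(x) != len(y):
        raise ValueError("points must have the same number of angles")
    return math.sqrt(sum(angle_diff(a, b) ** 2 for a, b in zip(x, y)))


def frechet_function(p, sample):
    """Mean squared distance from p to the sample points."""
    return sum(torus_distance(p, x) ** 2 for x in sample) / len(sample)


def restricted_frechet_mean(sample):
    """Sample point that minimizes the Fréchet function.

    Assumption: the minimizer is searched only among the sample points,
    which avoids numerical optimization on the torus. A true Fréchet mean
    minimizes over the whole space and need not be unique.
    """
    return min(sample, key=lambda p: frechet_function(p, sample))


if __name__ == "__main__":
    # Three hypothetical "suites", each described by seven dihedral angles.
    sample = [[0.0] * 7, [0.1] * 7, [6.2] * 7]
    print(torus_distance(sample[1], sample[2]))
    print(restricted_frechet_mean(sample))
```

The wrap-around in `angle_diff` is the essential point: angles 0.1 and 6.2 are close on the circle even though they are far apart as real numbers, so a Euclidean mean of raw angles would be misleading.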