Bayesian Structural Ensemble Determination from Single-Molecule X-ray Scattering
Cumulative dissertation
Date of oral examination: 2023-11-24
Published: 2024-02-09
Supervisor: Prof. Dr. Helmut Grubmüller
Referee: Prof. Dr. Helmut Grubmüller
Referee: Prof. Dr. Simone Techert
Files
Name: thesis.pdf
Size: 17.0 MB
Format: PDF
Abstract
English
Single-molecule X-ray scattering experiments using ultrashort X-ray free electron laser (XFEL) pulses have opened a new route for the structure determination of biomolecules. They also hold the potential to extract the structural ensemble of a molecule without the need for synchronization. In these experiments, a stream of single copies of the molecule to be studied enters the pulsed XFEL beam, and for each pulse, the scattered photons are recorded as a scattering image. However, structure refinement from single-molecule X-ray scattering images is quite challenging due to unknown molecular orientations, typically very low numbers of recorded photons per scattering image, and low signal-to-noise ratios in this extreme Poisson regime. In the first and main part of this thesis, I therefore develop and assess a novel Bayesian approach and demonstrate that it should be possible to determine not only a single structure, but an entire structural ensemble from these experiments. This approach allows for the systematic treatment of noise and other complicating experimental effects and, simultaneously, eliminates the need for classification, hit selection, and orientation determination. In fact, I explicitly include many complicating experimental effects, such as Ewald curvature, intensity fluctuations, hits vs. misses, beam polarization, irregular detector shapes, incoherent scattering, and background scattering. On the single-structure level, I demonstrate that my approach can achieve near-atomistic resolutions for the protein crambin from noise-free synthetic scattering images, and that it achieves the same resolution of 9 nm from experimental data for the coliphage PR772 virus as previous approaches, while using only a very small fraction of the available data. On the structural-ensemble level, I demonstrate that my approach can determine the conformational ensemble of alanine dipeptide and even the unfolded ensemble of the mini-protein chignolin.
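The idea of summing over unknown orientations, rather than determining one per image, can be illustrated with a deliberately simplified toy model. The sketch below is not the method of the thesis. It assumes a one-dimensional ring detector, orientations restricted to cyclic pixel shifts with a uniform prior, and independent Poisson photon counts per pixel. All names (`log_marginal_likelihood`, `lam`, `counts`) are hypothetical.

```python
import numpy as np
from math import lgamma


def log_marginal_likelihood(counts, lam):
    """Log-probability of one image with the orientation summed out.

    counts: photon counts per pixel (length P).
    lam: expected counts per pixel at the reference orientation (length P, all > 0).
    Assumption: the orientation is a cyclic shift s in 0..P-1 with uniform prior.
    """
    counts = np.asarray(counts, dtype=float)
    lam = np.asarray(lam, dtype=float)
    n_pix = lam.size
    # Row s holds the expected image for orientation (shift) s.
    rotated = np.stack([np.roll(lam, s) for s in range(n_pix)])
    # Poisson log-likelihood of the image for every orientation.
    log_fact = sum(lgamma(c + 1.0) for c in counts)
    log_p = np.log(rotated) @ counts - lam.sum() - log_fact
    # Numerically stable log of the mean over orientations (log-sum-exp).
    top = log_p.max()
    return top + np.log(np.mean(np.exp(log_p - top)))


# Toy data: a two-fold modulated intensity, about 8 photons per image.
rng = np.random.default_rng(0)
n_pix = 32
theta = 2.0 * np.pi * np.arange(n_pix) / n_pix
lam_true = 0.25 * (1.0 + 0.9 * np.cos(2.0 * theta))
lam_flat = np.full(n_pix, lam_true.mean())  # competing model, same total intensity

# Each image is recorded at a random, unrecorded orientation.
images = [rng.poisson(np.roll(lam_true, rng.integers(n_pix))) for _ in range(200)]

total_true = sum(log_marginal_likelihood(img, lam_true) for img in images)
total_flat = sum(log_marginal_likelihood(img, lam_flat) for img in images)
print("log-likelihood, modulated model:", total_true)
print("log-likelihood, flat model:     ", total_flat)
```

Because the orientation is summed out, the likelihood of an image does not change when the image is rotated, so a scheme of this kind needs no per-image orientation estimate. In a Bayesian treatment, the summed log-likelihoods would be combined with a prior over candidate models.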
I further demonstrate using synthetic images that my approach can reliably determine electron densities even in the extreme low-hit-rate and high-noise regime. Further, I systematically analyze the scaling behavior of my approach, finding, for instance, that the number of images required to determine a structural ensemble is proportional to the square of the number of conformers, that the amount of structural information per image is proportional to the square of the number of photons, and that even a small amount of noise strongly decreases the achievable resolution.
In a second part of this thesis, I present an analysis of time-lagged independent component analysis (tICA), a widely used dimension reduction method for the analysis of molecular dynamics (MD) trajectories. I seek to understand how much information on the actual protein dynamics is contained in the tICA projections of MD trajectories, as opposed to noise due to the inherently stochastic nature of each trajectory. To that end, I analyze the tICA projections of high-dimensional random walks using a combination of analytical and numerical methods, finding that they resemble cosine functions and strongly depend on the lag time, exhibiting strikingly complex behavior. Further, I demonstrate that the tICA projections of protein trajectories can indeed be strikingly similar to those of random walks, suggesting that not only the ensemble properties of the non-converged protein trajectories resemble those of random walks, as has been shown earlier via principal component analysis (PCA), but also the time correlations of the underlying protein dynamics.
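The numerical side of the random-walk analysis described above can be sketched as follows. This is a minimal illustration using only NumPy, not the code used in the thesis. It assumes a Gaussian random walk and a symmetrized estimator of the time-lagged covariance matrix. The function names `random_walk` and `tica` and the chosen dimensions and lag time are hypothetical.

```python
import numpy as np


def random_walk(n_steps, n_dims, seed=0):
    """Unbiased random walk: cumulative sum of i.i.d. standard normal steps."""
    rng = np.random.default_rng(seed)
    return np.cumsum(rng.standard_normal((n_steps, n_dims)), axis=0)


def tica(traj, lag):
    """Symmetrized tICA of a trajectory of shape (n_steps, n_dims), lag >= 1.

    Solves C(lag) v = lambda C(0) v and returns the eigenvalues in
    descending order together with the projections of the trajectory
    onto the corresponding eigenvectors.
    """
    x0, xt = traj[:-lag], traj[lag:]
    mean = 0.5 * (x0.mean(axis=0) + xt.mean(axis=0))
    x0, xt = x0 - mean, xt - mean
    n = x0.shape[0]
    # Symmetrized instantaneous and time-lagged covariance estimates.
    c0 = (x0.T @ x0 + xt.T @ xt) / (2.0 * n)
    ct = (x0.T @ xt + xt.T @ x0) / (2.0 * n)
    # Whiten with C(0), which turns the generalized eigenproblem
    # into an ordinary symmetric one.
    s, u = np.linalg.eigh(c0)
    w = u / np.sqrt(s)  # equals u @ diag(s ** -0.5)
    m = w.T @ ct @ w
    m = 0.5 * (m + m.T)  # remove round-off asymmetry
    evals, q = np.linalg.eigh(m)
    order = np.argsort(evals)[::-1]
    vecs = w @ q[:, order]
    return evals[order], (traj - mean) @ vecs


walk = random_walk(n_steps=5000, n_dims=30, seed=0)
eigenvalues, projections = tica(walk, lag=100)
print("leading tICA eigenvalues:", eigenvalues[:3])
```

Plotting the first few columns of `projections` against the time index, and repeating this for several values of `lag`, is one way to inspect the shape of the projections and their lag-time dependence. Established implementations exist in MD analysis libraries and would be preferable for production use.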
Keywords: XFEL; single-molecule X-ray scattering; Bayesian; structure determination