New Approaches to Cryo-EM Image Processing
by Florian Alexander Jochheim
Date of Examination:2022-10-26
Date of issue:2022-12-16
Advisor:Prof. Dr. Patrick Cramer
Referee:Prof. Dr. Patrick Cramer
Referee:Prof. Dr. Sarah Köster
Files in this item
Name:Jochheim_Thesis_Revision.pdf
Size:11.9Mb
Format:PDF
Abstract
Using cryo-electron microscopy (cryo-EM), it is possible to resolve structures of biological macromolecules by averaging numerous projections of macromolecules (particle images) that are frozen in vitrified ice and imaged using the electron microscope. Developments over recent years have sparked a so-called “resolution revolution”. Nowadays, reaching (near) atomic resolution for a large range of protein complexes has become standard practice in structural biology, and cryo-EM has become a technique that complements X-ray crystallography. These developments include advances in microscope hardware, such as direct electron detectors, which allow for single electron detection (counting) and therefore a much reduced radiation dose. Algorithmic developments and improvements in computer hardware have likewise greatly improved the in silico processing of samples. Especially relevant is the ability to quickly process datasets of hundreds of thousands of particles by using efficient processing algorithms as well as computation on graphics processing units (GPUs). Due to the low electron dose used in cryo-EM experiments today, the signal-to-noise ratio is low, and accurate reconstruction of high-resolution features requires, among other things, averaging numerous particle images to increase this ratio. In single particle cryo-EM, each particle image originates from a biological copy of the macromolecule under investigation. Ideally, these copies would be exact, i.e. each macromolecule is exactly the same as all other copies. Otherwise, averaging of projections during reconstruction results in blurry reconstructions. In reality, however, this assumption cannot hold.
Especially when active proteins or protein complexes are under investigation, it can be expected that the set of projection images originates from a heterogeneous set of macromolecules that are in different conformational states, have different occupancy, or have certain regions that could move freely prior to being frozen in the ice. Effort has gone into the development of classification techniques which aim to divide the dataset in such a way that, within each class, the assumption of exact copies holds again. Alternatively, subsections of the structure that are in themselves rigid are refined individually. Both approaches handle free movement only suboptimally, though: classification cannot divide a dataset such that no residual movement remains within each class, and refining only sections of a structure does not give a global reconstruction. In this work, I explore a pseudo-atom-based approach to reconstruction. The volume to be reconstructed is represented by a pseudo atom cloud. When two projection images originate from macromolecules in different states, movement of the pseudo atoms can model the difference between the two while the projection images are used to update the intensities of the pseudo atoms. With this, it is possible to reconstruct a single volume from a dataset with projections originating from different macromolecule states. This approach was fully incorporated into a deep learning framework, which enables further development into an end-to-end machine learning approach. Deep learning algorithms are a promising class of algorithms for cryo-EM processing. In a separate chapter, I explore the possibility of using a generative adversarial network instead of conventional ab initio reconstruction algorithms. In this approach, a generator learns a volume that represents the real protein in the experimental images, without seeing the experimental images directly. This approach already shows promising results.
The learned volume is accurate enough to be used as a reference for subsequent high-resolution refinement. Several components can still be added or improved to further increase the usability of this approach. In the last part of this thesis, an approach to identify dimeric particles in a heterogeneous dataset of monomeric and dimeric SARS-CoV-2 RdRp particles is presented. The approach was successfully applied to identify enough dimeric particles to reconstruct the dimeric state at 5.5 Å resolution, and this reconstruction was subsequently published. Fitting previously published RdRp structures allowed for a closer examination of this dimeric state. Furthermore, we hypothesize that this dimeric form might be functional and play a role in subgenomic RNA production.
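
The abstract's statement that averaging numerous particle images increases the signal-to-noise ratio can be illustrated with a small numerical sketch. It is not taken from the thesis: the Gaussian blob used as a "particle", the noise level, and the function names `make_signal`, `snr` and `average_snr` are all assumptions chosen for illustration. The sketch also assumes perfectly aligned, exact copies with independent Gaussian noise, which is the idealized case the abstract describes as unrealistic for real data.

```python
import numpy as np


def make_signal(size=32):
    """Noise-free stand-in for a particle image: a centered 2D Gaussian blob.

    This is a hypothetical test image, not a real projection.
    """
    y, x = np.mgrid[:size, :size]
    center = (size - 1) / 2.0
    sigma = size / 8.0
    return np.exp(-((x - center) ** 2 + (y - center) ** 2) / (2.0 * sigma ** 2))


def snr(estimate, signal):
    """Power SNR: signal power divided by the power of the residual noise."""
    residual = estimate - signal
    return float(np.sum(signal ** 2) / np.sum(residual ** 2))


def average_snr(n_images, noise_sigma=5.0, size=32, seed=0):
    """Average n_images noisy copies of the same signal and return the SNR.

    Assumes identical, perfectly aligned copies and independent Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    signal = make_signal(size)
    noisy = signal + rng.normal(0.0, noise_sigma, size=(n_images, size, size))
    return snr(noisy.mean(axis=0), signal)


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(f"{n:5d} images -> power SNR {average_snr(n):.4f}")
```

Under these assumptions the power signal-to-noise ratio of the average grows roughly in proportion to the number of images. If the copies differ from each other, as in the heterogeneous datasets discussed above, the average additionally blurs the regions that differ, which plain averaging cannot correct.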
Keywords: Cryo-EM; Image Processing