Functional Phage Genomics of selected Taxa
by Cynthia Maria Chibani
Date of Examination: 2019-05-21
Date of issue: 2019-06-26
Advisor: Prof. Dr. Rolf Daniel
Referee: Dr. Heiko Liesegang
Referee: Prof. Dr. Burkhard Morgenstern
Files in this item
Name: Dissertation_CynthiaMariaChibani-noCV.pdf
Size: 16.7Mb
Format: PDF
Description: PhD Thesis
Abstract
English
This work reports the potential of a fully automated, genome-based method for phage prediction and classification in the face of ever-increasing amounts of sequencing data. Firstly, we present an approach for the taxonomic classification of phages, which we call ClassiPhage and ClassiPhage 2.0. ClassiPhage was generated as a proof of principle on a defined set of phage families infecting Vibrio species, while ClassiPhage 2.0 was applied more broadly to include all available phage families. The method is based on generating and refining protein profile Hidden Markov Models (HMMs) for every group, covering 12 phage families in total. To test sensitivity and specificity, 5,920 HMMs were used to scan the initial phage protein-coding sequences from 8,721 phages, which yielded a cross-scan scoring matrix. We drew on machine learning techniques, which are proving valuable for extracting critical information from big data and for predicting outcomes. The cross-scan matrix was therefore used as input for an artificial neural network (ANN) for phage classification. The ANN reached an accuracy of 84.18%, indicating the efficiency of the method. The method was tested on a set of vibriophages classified by their multiple HMM hit results. Our results emphasize the need for more comprehensive and representative phage sequencing data in public databases. Secondly, we established a method, which we call IdentiPhage, for predicting integrated prophages in bacterial host genomes. The method uses a set of 12 sequence-derived features generated from a dataset of 11,373 bacterial genomes using a sliding window approach. To assign a positive phage label to the matrix, we employed 8,721 phage genomes as a reference database for a BLASTn approach. The resulting matrix was used as input for a Deep Neural Network (DNN) that predicts potential prophage regions and achieved a specificity of 80.14%.
We show that IdentiPhage can locate prophages without any sequence similarity to known phages by testing the method on a set of experimentally identified Inoviridae phages infecting various Vibrio alginolyticus strains. Our results indicate that IdentiPhage plays a complementary role to existing tools. However, future developments would benefit from a feature selection process that identifies the most informative sequence features.
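The sliding window step described for IdentiPhage can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not name the 12 sequence-derived features, so GC content and GC skew are used here purely as assumed stand-ins, and the function names, window size and step are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple


def sliding_windows(seq: str, size: int, step: int) -> Iterator[Tuple[int, str]]:
    """Yield (start, window) pairs; a trailing window shorter than `size` is skipped."""
    for start in range(0, len(seq) - size + 1, step):
        yield start, seq[start:start + size]


def window_features(window: str) -> Dict[str, float]:
    """Compute two illustrative features (assumed, not taken from the thesis)."""
    w = window.upper()
    g, c = w.count("G"), w.count("C")
    gc_content = (g + c) / len(w)
    # GC skew is undefined when the window has no G or C; report 0.0 in that case.
    gc_skew = (g - c) / (g + c) if (g + c) else 0.0
    return {"gc_content": gc_content, "gc_skew": gc_skew}


def feature_matrix(seq: str, size: int, step: int) -> List[Tuple[int, Dict[str, float]]]:
    """One row of features per window, keyed by the window start position."""
    return [(start, window_features(win)) for start, win in sliding_windows(seq, size, step)]


if __name__ == "__main__":
    # Toy sequence; real genomes would use far larger windows.
    for start, feats in feature_matrix("GGCCAATT", size=4, step=2):
        print(start, feats)
```

Each row of such a matrix would then receive a label (in the abstract, derived from BLASTn hits against the phage reference genomes) before being passed to a classifier.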
Keywords: phages; classification; identification