
Aspects of Temporal Patient Similarity in Complex Diseases

dc.contributor.advisor: Sax, Ulrich Prof. Dr.
dc.contributor.author: Hügel, Jonas
dc.date.accessioned: 2024-12-09T18:02:04Z
dc.date.available: 2024-12-16T00:50:10Z
dc.date.issued: 2024-12-09
dc.identifier.uri: http://resolver.sub.uni-goettingen.de/purl?ediss-11858/15665
dc.identifier.uri: http://dx.doi.org/10.53846/goediss-10928
dc.format.extent: 212 [de]
dc.language.iso: eng [de]
dc.subject.ddc: 510 [de]
dc.title: Aspects of Temporal Patient Similarity in Complex Diseases [de]
dc.type: doctoralThesis [de]
dc.contributor.referee: Sax, Ulrich Prof. Dr.
dc.date.examination: 2024-11-29 [de]
dc.description.abstracteng: When comparing patients with complex chronic diseases, such as cancer or Post COVID-19, in Real-World Data (RWD), it is crucial to consider not only their condition at the time of analysis but also their disease trajectories over time. Typical data mining approaches and patient similarity measures tend to overlook the inherent temporal dimension of RWD, such as Electronic Health Record (EHR) data. To measure patient similarity, the performance of available metrics must be analyzed to select the one that yields the most realistic similarity scores. Therefore, I applied 88 combinations of graph and set theory algorithms to the International Statistical Classification of Diseases and Related Health Problems (ICD) code sets of 29 pancreatic cancer patients in a comprehensive benchmark. Introducing a scaling term resulted in a better representation of comorbidities. While this approach showed a significant correlation (0.75) with clinician-derived similarity scores, it did not consider the inherent temporal dimension of RWD. One way to integrate the temporal aspects is to use Transitive Sequential Pattern Mining (tSPM). Based on the original tSPM algorithm, I developed the Transitive Sequential Pattern Mining Plus (tSPM+) algorithm to mine temporal representations from clinical data. The tSPM+ algorithm massively outperforms the original tSPM algorithm, reducing memory consumption and runtime by factors of up to 40 and 900, respectively. Furthermore, it provides the duration of the patterns and additional utility functions. I explored encoding sequential patterns mined from EHRs instead of the raw EHR data to render nontemporal Machine Learning (ML) models time-sensitive and to derive temporal patient characteristics.
In the context of precision oncology, I investigated available knowledge bases and data types and contributed to the development of several Extract, Transform, Load (ETL) pipelines in multiple cancer-related research projects to identify available on-premise data. This effort resulted in a pancreatic and a lung cancer cohort, both suitable for applying the tSPM+ workflow. In two proof-of-concept studies, I integrated sequential patterns into downstream ML approaches, such as Random Forest classification, by extending the tSPM+ workflow to extract the temporal characteristics of these cohorts. Subsequent data reviews using a new network visualization approach confirmed that the identified temporal characteristics were clinically sound for both cohorts. Several complex diseases, such as Post COVID-19, are characterized by complex definitions of exclusion, which are challenging to implement on RWD. In a second, highly relevant use case, I demonstrated how sequential patterns, in concert with the utility functions of tSPM+, can be used to curate a Post COVID-19 precision cohort with patient-specific symptoms, achieving a positive predictive value of 0.79. This use case provides significant opportunities for Post COVID-19 research by allowing researchers to build symptom-specific cohorts in large databases. In conclusion, this thesis presents a fundamental approach for integrating the temporal dimension of EHR data into ML tasks for complex chronic diseases, addressing a critical gap in the field of clinical research informatics and precision medicine. This work lays the foundation for further endeavors in modeling temporal disease trajectories, contributing towards a better understanding and treatment of complex chronic diseases. [de]
dc.contributor.coReferee: Estiri, Hossein Prof. Dr.
dc.contributor.thirdReferee: Bellazzi, Riccardo Prof. Dr.
dc.subject.eng: real-world data [de]
dc.subject.eng: transitive sequential pattern mining [de]
dc.subject.eng: cancer [de]
dc.subject.eng: post COVID-19 [de]
dc.subject.eng: temporal characterization [de]
dc.subject.eng: machine learning [de]
dc.subject.eng: ehr data [de]
dc.identifier.urn: urn:nbn:de:gbv:7-ediss-15665-5
dc.affiliation.institute: Fakultät für Mathematik und Informatik [de]
dc.subject.gokfull: Informatik (PPN619939052) [de]
dc.description.embargoed: 2024-12-16 [de]
dc.identifier.ppn: 1911619233
dc.identifier.orcid: 0000-0002-4183-1287 [de]
dc.notes.confirmationsent: Confirmation sent 2024-12-09T19:45:01 [de]

