Achieving Scalable AI Inference in a Unified Cloud and HPC Environment by Employing System of Systems Architecture Design
Doctoral thesis
Date of Examination: 2025-11-10
Date of issue: 2025-12-04
Advisor: Prof. Dr. Julian Kunkel
Referee: Prof. Dr. Julian Kunkel
Referee: Prof. Dr. Arnulf Quadt
Referee: Dr. Christian Boehme
Referee: Prof. Dr. Jens Grabowski
Referee: Prof. Dr. Dagmar Krefting
Referee: Prof. Dr. Roland Leißa
Referee: Prof. Dr. Ramin Yahyapour
Files in this item
Name: Dissertation_Jonathan_Decker_Druck.pdf
Size: 7.50 MB
Format: PDF
Abstract
Providing access to AI-powered services commonly requires powerful compute infrastructure equipped with hardware accelerators such as GPUs. The recent rise of large language models (LLMs) has further emphasized this need and has already prompted many companies to invest in the construction of compute and data centers as well as the development of specialized AI hardware. Many HPC centers are equipped with GPUs but support only batch job orchestration and cannot deploy and expose AI models to provide AI-as-a-Service. Moreover, as AI models may be required to process sensitive and personal data, the platforms exposing them must be designed to handle such data in accordance with the applicable privacy regulations, e.g., the GDPR. In this work, we present two architecture designs for AI inference platforms that enable access to AI models deployed in an HPC environment. The designs preserve the batch job orchestration capabilities of the respective HPC environments while also adhering to privacy regulations. The two designs represent a cloud-native and an HPC-native approach, respectively. The first design, called Scalable Kubernetes Inference Platform (SKIP), follows the cloud-native approach by building on Kubernetes and employing and configuring mature software products from the Kubernetes ecosystem. The second design, called Scalable AI Accelerator (SAIA), represents the HPC-native approach and builds on top of Slurm in the HPC environment. To contrast the cloud-native and HPC-native approaches, we systematically compared SKIP and SAIA to identify significant trade-offs and limitations. Furthermore, as both designs consist of many constituent systems, for instance, to handle the respective cloud and HPC environments, we developed a novel scientific workflow for designing and validating System of Systems (SoS) architectures. This method enables the systematic design and evaluation of SoS architectures that internally consist of intermediate SoSs.
Keywords: Kubernetes; Cloud; HPC
