High-Performance Persistent Identification for Research Data Management
by Fatih Berber
Date of Examination: 2018-09-07
Date of issue: 2018-09-17
Advisor: Prof. Dr. Ramin Yahyapour
Referee: Prof. Dr. Ramin Yahyapour
Referee: Prof. Dr. Dieter Hogrefe
Files in this item
Name: fatih_thesis.pdf
Size: 8.26 MB
Format: PDF
Description: Fatih Berber Dissertation
Abstract
English
Durable identification of and access to datasets, especially research datasets, are becoming increasingly important. This is mainly driven by the explosive growth of datasets in the current age. Although the Internet was originally conceived as a large-scale end-to-end communication platform, it has since developed into a very widely used medium for information consumption. However, using the Internet contrary to its original purpose hampers efficient data consumption. This is mainly due to the address-based data access mechanism, in which data can only be consumed through a specific locator. Since data mobility leads to changing locators, the concept of persistent identification has been developed to track these changes. Instead of addressing data directly through its currently valid locator, Persistent Identifiers (PIDs) enable data retrieval through globally unique and durable identifiers. This in turn has led to research datasets increasingly being assigned PIDs. With the advent of massive research dataset generation, the load on PID systems has also increased dramatically, making PID record management a considerable performance problem. Therefore, this thesis focuses on the performance aspects of PIDs. The goal is to provide solutions for high-performance PID management and resolution. Based on the established Handle System, we provide approaches that accelerate the usage of PIDs for research datasets stored in sophisticated research data repositories. Moreover, this thesis also contributes to the area of performance analysis based on queuing networks. The basic approach is to model a PID system as a multi-tier transactional Internet system and to mathematically investigate improvements in response time.
Keywords: Persistent Identifier; PID; Handle System; DOI; Multi-Tier System; Performance; Queuing Theory; Mean-Value-Analysis Algorithm; MVA Algorithm; DNS; Response Time; High-Performance; Data; Data-Management; Research Data; Research Data Management; Research Data Repository; Speedup; Resolution Time
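For readers unfamiliar with the Mean-Value-Analysis (MVA) algorithm named in the keywords, the following is a minimal sketch of the textbook exact MVA recursion for a single-class closed queuing network. It is not taken from the thesis: the function name `mva`, the tier service demands, and the client count are hypothetical values chosen only to illustrate how the mean response time of a multi-tier system can be estimated.

```python
def mva(service_demands, n_clients, think_time=0.0):
    """Textbook exact MVA for a single-class closed queuing network.

    service_demands: mean service demand per tier in seconds (hypothetical).
    n_clients: number of concurrent clients circulating in the network.
    think_time: mean client think time between requests in seconds.
    Returns (mean response time, throughput, mean queue length per tier).
    """
    queue = [0.0] * len(service_demands)  # an empty network has empty queues
    response = 0.0
    throughput = 0.0
    for n in range(1, n_clients + 1):
        # Arrival theorem: an arriving request sees the queue lengths
        # of the same network with one client fewer.
        residence = [d * (1.0 + q) for d, q in zip(service_demands, queue)]
        response = sum(residence)
        # Little's law applied to the whole network, including think time.
        throughput = n / (think_time + response)
        # Little's law applied to each tier.
        queue = [throughput * r for r in residence]
    return response, throughput, queue


# Hypothetical three-tier setup: front end, resolver, and database.
demands = [0.002, 0.005, 0.010]
response, throughput, queue = mva(demands, n_clients=20, think_time=0.1)
print(f"mean response time: {response:.4f} s, throughput: {throughput:.1f} req/s")
```

In such a model, reducing the service demand of the slowest tier has the largest effect on the predicted response time, which is the kind of question a queuing-network analysis is meant to answer.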