High-Performance Persistent Identification for Research Data Management

Berber, Fatih

by Fatih Berber

Dissertation

Date of oral examination: 2018-09-07

Published: 2018-09-17

Supervisor: Prof. Dr. Ramin Yahyapour

Referee: Prof. Dr. Ramin Yahyapour

Referee: Prof. Dr. Dieter Hogrefe

To link to or cite this work: http://dx.doi.org/10.53846/goediss-7051

Files

Name: fatih_thesis.pdf

Size: 8.26 MB

Format: PDF

Description: Fatih Berber Dissertation



Abstract

English

Durable identification of and access to datasets, especially research datasets, are becoming increasingly important. This is mainly driven by the explosive growth of datasets in the current age. Although the Internet was originally conceived as a large-scale end-to-end communication platform, it has since developed into a medium for information consumption with an overwhelmingly wide reach. However, using the Internet in a way that departs from its original purpose hampers efficient data consumption. This is mainly due to its address-based data access mechanism, in which data can only be retrieved through a specific locator. Because moving data therefore changes its locator, the concept of persistent identification was developed to track these changes. Instead of addressing data directly through its currently valid locator, Persistent Identifiers (PIDs) enable data retrieval through globally unique and durable identifiers. As a result, research datasets are increasingly being assigned PIDs. With the advent of massive research dataset generation, the load on PID systems has also increased dramatically, making PID record management a considerable performance problem. Therefore, this thesis focuses on the performance aspects of PIDs. The goal is to provide solutions for high-performance PID management and resolution. Building on the established Handle System, we present approaches that accelerate the use of PIDs for research datasets stored in sophisticated research data repositories. Moreover, this thesis contributes to the area of performance analysis based on queuing networks. The basic approach is to model a PID system as a multi-tier transactional Internet system and to investigate response-time improvements mathematically.

Keywords: Persistent Identifier; PID; Handle System; DOI; Multi-Tier System; Performance; Queuing Theory; Mean-Value-Analysis Algorithm; MVA Algorithm; DNS; Response Time; High-Performance; Data; Data-Management; Research Data; Research Data Management; Research Data Repository; Speedup; Resolution Time
