Long-Term Location-Independent Research Data Dissemination Using Persistent Identifiers
by Oliver Wannenwetsch
Date of Examination: 2017-01-11
Date of issue: 2017-03-24
Advisor: Prof. Dr. Ramin Yahyapour
Referee: Prof. Dr. Ramin Yahyapour
Referee: Prof. Dr. Jens Grabowski
Files in this item
Name: dissertation.pdf
Size: 5.22Mb
Format: PDF
Abstract
English
Research data arises in scientific experiments, computer simulations, and observations, or is derived from other datasets, literature, or publications. As a subset of the general concept of digital data, it is classified by its distinct state and its origin. Enriched with descriptive metadata, research data serves as a foundation for discoveries and for publishing results in various formats. Citing and linking specific research datasets and publications requires unique and persistent identification. Today, this is realized by Persistent Identifier (PID) systems, which provide stable identification for digital entities and optional annotation with descriptive metadata. Moreover, PID systems abstract from the current network location of data in order to accommodate changes in that location, which are caused by changing Uniform Resource Locators (URLs) on the World Wide Web (WWW). Applying these concepts, PID systems have tagged billions of research datasets and publications over the past 20 years. On these foundations, the Handle PID system, known from the Digital Object Identifier (DOI) system, provides the whole scientific community with reliable access to digital publications and research data. While the architecture of the Handle system itself, which depends on fixed network locations, was designed with foresight, additional end-user services for PID resolution and management have introduced critical weak spots, which a comprehensive review of the current state of the art reveals. This thesis focuses on adapting location-independent network paradigms to PID systems; these paradigms have shown encouraging results when applied to several problems in the domain of decentralized network infrastructures. Our first goal is to evolve the Handle system design into a self-adjusting system for all major infrastructure services that does not depend on fixed network locations.
We tackle this by incorporating strategies and techniques from location-independent network paradigms originating from the current research branch of Named Data Networking (NDN). This eliminates major weak spots in the Handle PID system and makes it robust against core infrastructure outages, sudden network topology changes, packet loss, and heavy load. The second goal of the thesis is to integrate next-generation data dissemination technologies based on location-independent network paradigms into the domain of persistent identifier systems. To this end, we propose to employ the Handle system for citing research datasets that are disseminated by location-independent technologies based on BitTorrent and NDN. To tackle the trust challenges of dynamic data locations, we create a novel approach for trusted data dissemination in location-independent networks that ensures the authenticity of data as well as its attribution to data issuers. This is done by building on the foundations of the Handle PID system and an additional format for exchanging complex access information in PIDs.
Keywords: Persistent Identifiers; Overlay Network; Information Centric Network; Named Data Networking; BitTorrent; NDN; ICN; PID