
Big Data Infrastructure for Analysing Digitalized Library Collections

by Triet Doan
Doctoral thesis
Date of Examination: 2025-07-09
Date of issue: 2025-07-31
Advisor: Prof. Dr. Ramin Yahyapour
Referee: Prof. Dr. Ramin Yahyapour
Referee: Prof. Dr. Bela Gipp
Persistent Address: http://dx.doi.org/10.53846/goediss-11408

Files in this item

Name: triet-doan-phd-thesis.pdf
Size: 13.7 MB
Format: PDF

Abstract

English

Digital Humanities (DH) is an interdisciplinary field at the intersection of digital technologies and the humanities. It encompasses a variety of projects, including digital archives, cultural analytics, and online publishing. This thesis focuses on text analysis in the context of DH and addresses two key challenges: data acquisition and large-scale text analysis. The first challenge arises because historical texts are difficult to locate. The second arises because analyzing a large amount of text is not straightforward for DH scientists. Following interviews with numerous DH scientists and discussions with relevant stakeholders, a list of functional and non-functional requirements was compiled. The services available on the market were then evaluated against this list, and none of them meets our requirements. Consequently, a new service, named MINE, has been developed to address both challenges. MINE offers a search engine that lets users locate historical texts from a range of data sources. Users can build corpora from the search results or from uploaded files, and then instruct the system to analyze these corpora with selected text analysis models and parameters. The analyses are executed on a high-performance computing cluster, which allows scientists to perform much larger analyses than they could on their personal desktops or laptops. Although MINE is still a prototype, the majority of the defined requirements have already been met. For features that are still under development or discussion, a comprehensive plan for their future implementation is also provided.
Keywords: digital humanities; search engine; HPC; knowledge graph
 
