Digitization of High-Stakes Exams
Empirical Insights and Design Recommendations for the Digital Execution and Scoring of Exams
by Philipp Hartmann
Date of oral examination: 2023-08-29
Published: 2023-08-31
Advisor: Prof. Dr. Matthias Schumann
Referee: Prof. Dr. Matthias Schumann
Referee: Prof. Dr. Susan Seeber
Referee: Prof. Dr. Manuel Trenz
Files
Name: Dissertation_Print.pdf
Size: 2.95 MB
Format: PDF
Abstract
English
The opportunities of digitization in education have long been addressed in research and practice. They extend across all components of the Curriculum-Instruction-Assessment (CIA) triad according to PELLEGRINO (2010). Digitization is thus changing not only the competencies required of future workers but also the way of teaching, learning, and testing. This development is further accelerated by technological progress in the field of artificial intelligence (AI). In this context, both the STÄNDIGE KONFERENZ DER BILDUNGS- UND KULTUSMINISTER (2022) and the STÄNDIGE WISSENSCHAFTLICHE KOMMISSION (2022) point to the need to address digital assessment more intensively. A literature analysis conducted as part of this dissertation shows that current research on digital exam execution often focuses on the usage perspective: individual factors influencing examinees (e.g., stress or familiarity) are primarily considered in isolation, while potential interrelationships between these factors are largely neglected. For digital exam scoring, AI is attributed high potential in essay scoring; however, current work addresses only scoring accuracy, not the design of essay scoring systems. This purely technical focus also means that the user perspective (e.g., trust) is not taken into account. Building on these findings, five studies on digital exam execution and scoring are conducted in this cumulative dissertation. In conjunction with the findings from the literature analysis, a total of 13 recommendations for practice are derived from the results of these five studies. They show that examiners can address usage-oriented factors even before digital exams are conducted, which can reduce the influence of construct-irrelevant factors on test results and thus increase test quality.
In the area of digital exam scoring, it is shown that, despite technological advances, human involvement in the scoring process can increase trust in AI-based scores. Based on these findings, specific design recommendations for semi-automatic AI-based scoring systems are derived. These simplify the transfer of technical research results on AI-based exam scoring into productive systems. Finally, starting points for future research are identified; in particular, the development of large language models (LLMs) is expected to offer potential.
Keywords: education; digital assessment; high-stakes exams