Ontology-based transformation of natural language queries into SPARQL queries by evolutionary algorithms
by Sebastian Schrage
Date of Examination:2022-01-10
Date of issue:2022-01-21
Advisor:Prof. Dr. Wolfgang May
Referee:Prof. Dr. Wolfgang May
Referee:Prof. Dr. Florentin Wörgötter
Files in this item
Name:eDiss_NLQtoSPARQL_Schrage_2022.pdf
Size:11.5Mb
Format:PDF
Abstract
English
In this thesis an ontology-driven evolutionary learning system for natural language querying of RDF graphs is presented. The learning system itself does not answer the query, but generates a SPARQL query against the database. For this purpose, the Evolutionary Dataflow Agents framework, a general learning framework is introduced that, based on evolutionary algorithms, creates agents that learn to solve a problem. The main idea of the framework is to support problems that combine a medium-sized search space (use case: analysis of natural language queries) of strictly, formally structured solutions (use case: synthesis of database queries), with rather local classical structural and algorithmic aspects. For this, the agents combine local algorithmic functionality of nodes with a flexible dataflow between the nodes to a global problem solving process. Roughly, there are nodes that generate informational fragments by combining input data and/or earlier fragments, often using heuristics-based guessing. Other nodes combine, collect, and reduce such fragments towards possible solutions, and narrowing these towards the unique final solution. For this, informational items are floating through the agents. The configuration of these agents, what nodes they combine, and where exactly the data items are flowing, is subject to learning. The training starts with simple agents, which –as usual in learning frameworks– solve a set of tasks, and are evaluated for it. Since the produced answers usually have complex structures answers, the framework employs a novel fine-grained energy-based evaluation and selection step. The selected agents then are the basis for the population of the next round. Evolution is provided as usual by mutations and agent fusion. As a use case, EvolNLQ has been implemented, a system for answering natural language queries against RDF databases. For this, the underlying ontology medatata is (externally) algorithmically preprocessed. For the agents, appropriate data item types and node types are defined that break down the processes of language analysis and query synthesis into more or less elementary operations. The "size" of operations is determined by the border between computations, i.e., purely algorithmic steps (implemented in individual powerful nodes) and simple heuristic steps (also realized by simple nodes), and free dataflow allowing for arbitrary chaining and branching configurations of the agents. EvolNLQ is compared with some other approaches, showing competitive results.
Keywords: Ontology; Natural Query Processing; Evolutionary Algorithms