Deep Learning Metadata Fusion for Traffic Light to Lane Assignment
by Tristan Matthias Langenberg
Date of Examination: 2019-07-26
Date of issue: 2019-08-01
Advisor: Prof. Dr. Florentin Wörgötter
Referee: Prof. Dr. Florentin Wörgötter
Referee: Prof. Dr. Carsten Damm
Referee: Prof. Dr. Wolfgang May
Referee: Prof. Dr. Jens Grabowski
Referee: Prof. Dr. Stephan Waack
Referee: Dr. Minija Tamosiunaite
Files in this item
Name: Dissertation_TristanLangenberg.pdf
Size: 12.7 MB
Format: PDF
Description: Dissertation
Abstract
This dissertation presents a novel deep fusion method that combines heterogeneous metadata with image data to resolve the one-to-many traffic light to lane assignment problem. Traffic light to lane assignment belongs to the research field of autonomous robotics and driving and is addressed here with artificial intelligence. The work uses a dataset of over 45 thousand frames from 848 complex intersection scenarios in Germany. Each intersection scenario has the traffic light to lane connections as ground truth and is annotated with the following metadata: traffic lights, lane line markings, lane arrow markings, and lane signs. An optimised inverse perspective mapping method is introduced that is independent of extrinsic camera parameters and creates a stitched inverse perspective mapping full panorama image. This method is employed for image data preparation and enables efficient annotation of inverse perspective mapping lane line markings. First, it is shown that a convolutional neural network can transform the assignment problem into a regression problem in order to assign all relevant traffic lights to their associated lanes. Here, an indication vector defines the output of the network; the vector encodes all relevant traffic light column positions as binary information. This strategy resolves the traffic light to lane assignment problem exclusively by vision. Furthermore, the vision solution is enhanced by a deep metadata fusion approach, which is able to fuse heterogeneous metadata into a convolutional neural network. It transforms the metadata into several metadata feature maps, which are fused into the network by means of an element-wise multiplication and an adaptive weighting technique based on the global average of the selected fusion layer. The approach is examined for all working steps, compared against rule-based, only-metadata, and only-vision approaches, and extended by a sequence approach.
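The two core ingredients of the abstract above — a binary indication vector over image columns and a multiplicative fusion of metadata feature maps weighted by the global average of the fusion layer — can be sketched as follows. This is a minimal NumPy illustration, not the dissertation's implementation: the function names, the `(1 + w · m)` gating form, and the per-channel averaging are assumptions made for the sake of the example.

```python
import numpy as np

def indication_vector(column_positions, width):
    """Binary indication vector over image columns: a 1 marks a column
    that holds a relevant traffic light (illustrative encoding)."""
    v = np.zeros(width, dtype=np.float32)
    v[np.asarray(column_positions, dtype=int)] = 1.0
    return v

def fuse_metadata(fusion_layer, metadata_maps):
    """Element-wise multiplicative fusion, adaptively weighted by the
    global average of the selected fusion layer (assumed form).

    fusion_layer, metadata_maps: arrays of shape (channels, height, width).
    """
    # Global average of the fusion layer: one scalar weight per channel.
    w = fusion_layer.mean(axis=(1, 2), keepdims=True)
    # Gate the layer element-wise with the weighted metadata maps; the +1
    # leaves activations unchanged where no metadata is present
    # (an assumption of this sketch, not necessarily the thesis' formula).
    return fusion_layer * (1.0 + w * metadata_maps)
```

With this gating form, a zero metadata map is a no-op on the fusion layer, while non-zero metadata amplifies activations in proportion to the channel's global average — one simple way to realise "element-wise multiplication with adaptive weighting".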
To appraise the deep metadata fusion approach in an expert manner, a subjective test is conducted that measures real human performance for the traffic light to lane assignment and defines an independent baseline. As a result, the deep metadata fusion approach reaches a mean accuracy of 93.7 % and significantly outperforms the rule-based, only-metadata, and only-vision approaches. It also outperforms human performance in the accuracy (+2.7 %) and F1 score (+4.1 %) metrics on the full dataset. However, human performance and the deep metadata fusion approach achieve an almost identical mean precision of 92.9 ± 1.3 %. Additionally, the results show that an early fusion is most effective and that all fused metadata feature maps have a positive effect on the results. The ideal fusion operator is the element-wise multiplication, and the results improve the closer the vehicle approaches the stop line, similar to human perception.
Keywords: Convolutional Neural Networks; Deep Fusion; Intelligent Transportation Systems; Robotics and Automation; Traffic Light Assistance