Action-oriented Scene Understanding

Lüddecke, Timo

dc.contributor.advisor	Wörgötter, Florentin Prof. Dr.
dc.contributor.author	Lüddecke, Timo
dc.date.accessioned	2019-09-20T09:47:18Z
dc.date.available	2019-09-20T09:47:18Z
dc.date.issued	2019-09-20
dc.identifier.uri	http://hdl.handle.net/21.11130/00-1735-0000-0003-C1C0-9
dc.identifier.uri	http://dx.doi.org/10.53846/goediss-7632
dc.language.iso	eng	de
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject.ddc	510	de
dc.title	Action-oriented Scene Understanding	de
dc.type	doctoralThesis	de
dc.contributor.referee	Sporleder, Caroline Prof. Dr.
dc.date.examination	2019-08-21
dc.description.abstracteng	In order to allow robots to act autonomously it is crucial that they not only describe their environment accurately but also identify how to interact with their surroundings. While we have witnessed tremendous progress in descriptive computer vision, approaches that explicitly target action are scarcer. This cumulative dissertation approaches the goal of interpreting visual scenes “in the wild” with respect to actions implied by the scene. We call this approach action-oriented scene understanding. It involves identifying and judging opportunities for interaction with constituents of the scene (e.g. objects and their parts) as well as understanding object functions and how interactions will impact the future. All of these aspects are addressed on three levels of abstraction: elements, perception and reasoning. On the elementary level, we investigate semantic and functional grouping of objects by analyzing annotated natural image scenes. We compare object label-based and visual context definitions with respect to their suitability for generating meaningful object class representations. Our findings suggest that representations generated from visual context are on par in terms of semantic quality with those generated from large quantities of text. The perceptive level concerns action identification. We propose a system to identify possible interactions for robots and humans with the environment (affordances) on a pixel level using state-of-the-art machine learning methods. Pixel-wise part annotations of images are transformed into 12 affordance maps. Using these maps, a convolutional neural network is trained to densely predict affordance maps from unknown RGB images. In contrast to previous work, this approach operates exclusively on RGB images during both training and testing, and yet achieves state-of-the-art performance. At the reasoning level, we extend the question from asking what actions are possible to what actions are plausible.
For this, we gathered a dataset of household images associated with human ratings of the likelihoods of eight different actions. Based on the judgement provided by the human raters, we train convolutional neural networks to generate plausibility scores from unseen images. Furthermore, having considered only static scenes previously in this thesis, we propose a system that takes video input and predicts plausible future actions. Since this requires careful identification of relevant features in the video sequence, we analyze this particular aspect in detail using a synthetic dataset for several state-of-the-art video models. We identify feature learning as a major obstacle for anticipation in natural video data. The presented projects analyze the role of action in scene understanding from various angles and in multiple settings while highlighting the advantages of assuming an action-oriented perspective. We conclude that action-oriented scene understanding can augment classic computer vision in many real-life applications, in particular robotics.	de
dc.contributor.coReferee	Piater, Justus Prof. Dr.
dc.contributor.thirdReferee	Yahyapour, Ramin Prof. Dr.
dc.contributor.thirdReferee	Damm, Carsten Prof. Dr.
dc.contributor.thirdReferee	Modrow, Eckart Prof. Dr.
dc.subject.eng	Computer Vision	de
dc.subject.eng	Robotics	de
dc.subject.eng	Affordances	de
dc.subject.eng	Segmentation	de
dc.subject.eng	Object Semantics	de
dc.subject.eng	Neural Networks	de
dc.subject.eng	Action Anticipation	de
dc.subject.eng	Action Plausibility	de
dc.identifier.urn	urn:nbn:de:gbv:7-21.11130/00-1735-0000-0003-C1C0-9-0
dc.affiliation.institute	Fakultät für Mathematik und Informatik	de
dc.subject.gokfull	Informatik (PPN619939052)	de
dc.identifier.ppn	1677423129

Files

Name: phd-thesis-final-linearized.pdf
Size: 11.48 MB
Format: PDF
Description: Dissertation

This document appears in:

Fakultät für Mathematik und Informatik (incl. GAUSS) [519]
