Perceptual Segmentation of Visual Streams by Tracking of Objects and Parts

Papon, Jeremie

dc.contributor.advisor	Wörgötter, Florentin Prof. Dr.
dc.contributor.author	Papon, Jeremie
dc.date.accessioned	2014-11-27T09:54:02Z
dc.date.available	2014-11-27T09:54:02Z
dc.date.issued	2014-11-27
dc.identifier.uri	http://hdl.handle.net/11858/00-1735-0000-0023-9946-F
dc.identifier.uri	http://dx.doi.org/10.53846/goediss-4794
dc.language.iso	eng	de
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/
dc.subject.ddc	510	de
dc.title	Perceptual Segmentation of Visual Streams by Tracking of Objects and Parts	de
dc.type	doctoralThesis	de
dc.contributor.referee	Wörgötter, Florentin Prof. Dr.
dc.date.examination	2014-10-17
dc.description.abstracteng	The ability to parse visual streams into semantically meaningful entities is an essential element of intelligent systems. This process, known as segmentation, is a necessary precursor to high-level behavior that relies on vision, such as object identification, scene understanding, and task planning. Tracking these segmented entities over time further enriches this knowledge by extending it to the action domain. This work proposes to establish a closed loop between video object segmentation and multi-target tracking to parse streaming visual data. We demonstrate the strengths of this approach and show how such a framework can be used to distill basic semantic understanding of complex actions in real time, without the need for a priori object knowledge. Importantly, this framework is highly robust to occlusions, fast movements, and deforming objects. This thesis makes four key contributions, each of which leads towards fast and robust video segmentation through tracking. First, we present Video Segmentation by Relaxation of Tracked Masks (VSRTM), a proof of concept that demonstrates the feasibility of Dynamic Segment Tracking in 2D video and the viability of a feedback loop between Video Object Segmentation and Multi-Target Tracking. The method uses a sequential Bayesian technique to generate predictions, which seed a segmentation kernel; the segmentation results are in turn used to update the tracked models. The second contribution is a 3D voxel clustering technique, Voxel Cloud Connectivity Segmentation, which uses a novel adjacency octree structure to cluster 3D point cloud data efficiently and to provide a graph lattice for the otherwise unstructured points. These clusters of voxels, or supervoxels, and their adjacency graph are used to maintain a world model, which serves as an internal buffer of observations for the trackers. 
Importantly, this world model uses ray-tracing to ensure that it does not delete occluded voxels as new frames of data arrive. The third contribution is a novel spatially stratified sampling technique for evaluating the likelihood function in particle filters. In particular, we show that when the measurement function uses spatial correspondence, computational cost can be greatly reduced by exploiting spatial structure to avoid redundant computations. We present quantitative results showing that the technique achieves accuracy equivalent to, and in some cases greater than, that of a reference point cloud particle filter, at significantly faster run-times. We also compare against a GPU implementation and show that we can exceed its performance on the CPU. In addition, we present results on a multi-target tracking application, demonstrating that the gains in efficiency permit online 6DoF multi-target tracking on standard hardware. Our final contribution is Predictive Association of Supervoxels, which implements a closed loop between segmentation and tracking by minimizing a global energy function that scores supervoxel associations. The energy function is computed efficiently using the adjacency octree, with candidate associations provided by the 3D correspondence-based particle filters. The resulting association determines a fully segmented point cloud and is used to update the tracker models (as in VSRTM). This allows for the segmentation of temporally consistent supervoxels, avoiding the need to pre-define object models for segmentation. Each of these contributions has been implemented in live systems and run in an online streaming manner. We have performed quantitative evaluation on existing benchmarks to demonstrate state-of-the-art tracking and segmentation performance. 
In the 2D case, we compare against an existing tracking benchmark and show that we can match the benchmark's tracking performance, while in the 3D case we use a benchmark to show that we can outperform a GPU implementation. Finally, we give qualitative results in a robotic teaching application and show that the system is able to parse real data and to distill semantic understanding from video.	de
dc.contributor.coReferee	Piater, Justus Prof. Dr.
dc.subject.eng	Video Segmentation	de
dc.subject.eng	Point Clouds	de
dc.subject.eng	Segmentation	de
dc.subject.eng	Visual Tracking	de
dc.subject.eng	Computer Vision	de
dc.identifier.urn	urn:nbn:de:gbv:7-11858/00-1735-0000-0023-9946-F-9
dc.affiliation.institute	Fakultät für Mathematik und Informatik	de
dc.subject.gokfull	Informatik (PPN619939052)	de
dc.identifier.ppn	806914211

Files

Name: thesis_out.pdf

Size: 31.75 MB

Format: PDF

This item appears in:

Fakultät für Mathematik und Informatik (incl. GAUSS)