dc.contributor.advisor | Wörgötter, Florentin Prof. Dr. | de |
dc.contributor.author | Abramov, Alexey | de |
dc.date.accessioned | 2012-10-16T15:51:50Z | de |
dc.date.accessioned | 2013-01-18T13:23:11Z | de |
dc.date.available | 2013-01-30T23:50:55Z | de |
dc.date.issued | 2012-10-16 | de |
dc.identifier.uri | http://hdl.handle.net/11858/00-1735-0000-000D-F073-9 | de |
dc.identifier.uri | http://dx.doi.org/10.53846/goediss-2539 | |
dc.description.abstract | Sehen, Hören, Fühlen, Geruch und Geschmack
gehören zu den wichtigsten menschlichen Sinnen. Die meisten von
ihnen verbinden mehrere Aspekte: Sehen bezieht sich zum Beispiel auf
mindestens drei wahrzunehmende Modalitäten: Bewegung, Farbe und
Intensität. Das Extrahieren dieser Modalitäten beginnt im
menschlichen Auge im retinalen Netzwerk, und die vorverarbeiteten
Signale erreichen das Gehirn als Ströme raumzeitlicher Muster.
Sehen ist für uns der wichtigste Sinn für die Wahrnehmung von
dreidimensionalen Strukturen in der Welt um uns herum. Bis heute
wurden erhebliche Anstrengungen unternommen, das visuelle System
auf Grundlage der bisher gesammelten Kenntnisse zu verstehen und in
einem künstlichen Visionssystem zu simulieren. Forschungsergebnisse
der letzten Jahrzehnte in den Bereichen der digitalen
Bildverarbeitung und des maschinellen Sehens, verbunden mit den
Fortschritten bei Hardware für die Parallelverarbeitung,
ermöglichen den Aufbau sogenannter kognitiver Visionssysteme und
deren Einsatz in Robotern. Das Ziel eines kognitiven Visionssystems
besteht in der Umwandlung der visuellen Eingabe in eine
deskriptivere Darstellung. Darüber hinaus benötigen die meisten
Roboter „live“-Interaktionen mit ihrer Umgebung. | de |
dc.format.mimetype | application/pdf | de |
dc.language.iso | eng | de |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/ | de |
dc.title | Compression of visual data into symbol-like descriptors in terms of a cognitive real-time vision system | de |
dc.type | doctoralThesis | de |
dc.title.translated | Die Verdichtung der Videoeingabe in symbolische Deskriptoren im Rahmen des kognitiven Echtzeitvisionsystems | de |
dc.contributor.referee | Wörgötter, Florentin Prof. Dr. | de |
dc.date.examination | 2012-07-18 | de |
dc.subject.dnb | 004 Informatik | de |
dc.subject.gok | Computer science | de |
dc.description.abstracteng | Humans have five main senses: sight,
hearing, touch, smell, and taste. Most of them combine several aspects. For
example vision addresses at least three perceptual modalities:
motion, color, and luminance. Extraction of these modalities begins
in the human eye in the retinal network and the preprocessed
signals enter the brain as streams of spatio-temporal patterns. As
vision is our main sense, particularly for the perception of the
three-dimensional structure of the world around us, major efforts
have been made to understand and simulate the visual system based
on the knowledge collected to date. The research done over the last
decades in the fields of image processing and computer vision,
coupled with a tremendous step forward in hardware for parallel
computing, opened the door to building so-called cognitive vision
systems and to their incorporation into robots. The goal of any cognitive
vision system is to transform visual input information into more
descriptive representations than just color, motion, or luminance.
Furthermore, in most robotic systems "live" interactions of robots
with the environment are required, greatly increasing demands on
the system. In such systems all pre-computations of the visual data
need to be performed in real-time in order to be able to use the
output data in the perception-action loop. Thus, a central goal of
this thesis is to provide techniques which are strictly compatible
with real-time computation. In the first part of this thesis we
investigate possibilities for the powerful compression of the
initial visual input data into symbol-like descriptors, upon which
abstract logic or learning schemes can be applied. We introduce a
new real-time video segmentation framework performing automatic
decomposition of monocular and stereo video streams without the use
of prior knowledge about the data, considering only preceding information.
All entities in the scene, representing objects or their parts, are
uniquely identified. In the second part of the thesis we make
additional use of stereoscopic visual information and address the
problem of establishing correspondences between two views of the
scene (images acquired with the left and right eye), a problem
solved with apparent ease by the human visual system. We exploit these
correspondences in the stereo image pairs for the estimation of
depth (distance) by proposing a novel disparity measurement
technique based on extracted stereo-segments. This technique
approximates shape and computes depth information for all entities
found in the scene. The most important and novel achievement of
this approach is that it produces reliable depth information for
objects with weak texture where performance of traditional stereo
techniques is very poor. In the third part of this thesis we employ
an active sensor which, indoors, produces much more precise depth
information (encoded as range data) than any passive stereo
technique. We fuse image and range data for video segmentation,
which considerably improves segmentation quality; we can now even
handle fast-moving objects, which was not possible before. To
address the real-time constraint, the proposed segmentation
framework was accelerated on a Graphics Processing Unit (GPU)
architecture using the parallel programming model of the Compute
Unified Device Architecture (CUDA). All introduced methods
(segmentation of single images, segmentation of monocular and
stereo video streams, depth-supported video segmentation, and
disparity computation from stereo-segment correspondences) run in
real-time for medium-sized images and close to real-time for higher
resolutions. In summary:
The main result of this thesis is a framework which can produce a
compact representation of any visual scene in which all meaningful
entities are uniquely identified and tracked, and important
descriptors, such as shape and depth information, are extracted. The ability of
the framework was successfully demonstrated in the context of
several European projects (PACO-PLUS, Garnics, IntellAct, and
Xperience). The developed real-time system is now employed as a
robust visual front-end in various real-time robotic systems. | de |
dc.contributor.coReferee | Kurth, Winfried Prof. Dr. | de |
dc.subject.topic | Mathematics and Computer Science | de |
dc.subject.ger | Bildverarbeitung | de |
dc.subject.ger | Maschinelles Sehen | de |
dc.subject.ger | Videosegmentierung | de |
dc.subject.ger | Stereo vision | de |
dc.subject.eng | Image processing | de |
dc.subject.eng | Computer vision | de |
dc.subject.eng | Video segmentation | de |
dc.subject.eng | Stereo vision | de |
dc.subject.bk | Informatik | de |
dc.identifier.urn | urn:nbn:de:gbv:7-webdoc-3740-9 | de |
dc.identifier.purl | webdoc-3740 | de |
dc.affiliation.institute | Mathematisch-Naturwissenschaftliche Fakultäten | de |
dc.identifier.ppn | 737898917 | de |