Nowadays, robots are becoming more and more integrated into everyday life. Smartphones, desktop computers, and even cars can be thought of as robots, though probably not autonomous ones. The term "autonomy" has sparked many discussions in recent years, and one expects an autonomous robot to be able to learn correlations between its actions and the resulting changes in its environment. The robot acts inside the so-called action-perception loop, in which it acts on a scene, similar to a human being, and is also able to perceive the resulting changes. In this work, two robot systems are built and analyzed in terms of their action-perception loop.
The first part focuses on the perception side. Here, we consider three robots: a flying one and two wheeled ones. These machines are equipped with omnidirectional cameras. The data acquired from the sensor usually require preprocessing in real time. For this purpose, a filtering algorithm called Edge-Preserving Filter (EPF) is introduced. It achieves higher-quality results than traditional local methods, and it runs about three orders of magnitude faster than current global state-of-the-art methods. EPF works on data of any dimension and scales well with data size. This enables it to run on 2d images as well as 1d sensor data, e.g. from an accelerometer or gyroscope. Afterwards, the processed data are used for pose tracking. Here, a novel Visual Odometry algorithm named Embedded Visual Odometry (EVO) is developed. All computations run in real time on embedded hardware without external tracking or a data link to an external computing station. It is shown that the setup performs approximately twice as well as current state-of-the-art systems. As the proposed framework is entirely bottom-up and runs on embedded hardware, it enables truly autonomous robots.
In the second part, the focus lies on the action side of the action-perception loop. A general framework for bootstrapping, learning, and executing actions, called Semantic Event Chains (SECs), is analyzed. In this work, a novel extension is introduced that allows for high-level planning of robot actions. First, pose information, which is generated by a novel 3d geometric reasoning algorithm, is included in SECs. This bottom-up abstract layer enables defining preconditions for actions in a natural way, which in turn allows a scene's affordance to be computed. Second, adding the postconditions of an action lets the robot estimate the outcome of that action. This enables high-level action planning using only low-level methods. SECs are applied to both two-dimensional and three-dimensional image data. Due to their clear structure, SECs can be utilized to solve a wide range of different problems in everyday life.
In total, this work makes the following novel contributions: an efficient denoising algorithm, a Visual Odometry algorithm for robot pose estimation, and a planning framework that allows complex action planning problems to be solved using bottom-up, low-level data. Each of these contributions has been implemented in live systems and run in an online manner. For each algorithm, a quantitative evaluation on existing benchmarks is performed to demonstrate state-of-the-art perception and action.
This work enables robots to navigate in previously unknown and possibly unstructured environments and perform complex action planning.