Computer Vision for Automated Robotic Device Disassembly: Object Detection, Pose Estimation, and Action Prediction
Doctoral thesis
Date of Examination: 2024-10-30
Date of issue: 2024-11-12
Advisor: Prof. Dr. Florentin Wörgötter
Referee: Prof. Dr. Aleš Ude
Referee: Prof. Dr. Alexander Ecker
Referee: Prof. Dr. Marcus Baum
Referee: Prof. Dr. Fabian Sinz
Referee: Prof. Dr. Stefan Klumpp
Files in this item
Name: thesis_final_07-11-2024_twoside_with_links_c...pdf
Size: 39.5 MB
Format: PDF
Abstract
English
Growth in population and technology has increased demand for electronic products, and with it the volume of end-of-life devices. Although recycling these products is critical for reducing environmental impact, manufacturers rarely prioritize efficient recycling in product design. Currently, recycling automation is largely limited to specific device models, with processes manually programmed for devices that recycling companies receive in large quantities. Adapting these processes to other models or devices is often difficult and typically requires a complete overhaul of both hardware and software. Robotic automation could make disassembly economically viable, but efficient disassembly of electronic devices presents significant challenges. Disassembly requirements vary widely, even within the same product family, so any automated system must handle this high degree of variability. A robust robotic disassembly system could advance sustainability efforts by increasing the number of devices that are recycled. Moreover, electronics recycling involves disassembling many variants of the same device type, such as smartphones or hard drives. These devices often arrive in different physical conditions, which calls for a flexible robotic system that can adapt to them. Although processing large batches of similar devices can reduce variability, the broad range of models and conditions remains a major challenge. This dissertation focuses on computer vision and action prediction methods for the disassembly of electronic devices. More specifically, the objective is to create a software system that generalizes to new in-distribution devices and is reconfigurable to new device families.
The research breaks the problem down into key components: pose estimation, device classification, rotation estimation, gap detection, and action prediction, and analyses the performance and generalization capabilities of each method. Pose estimation, which employs a segmentation model, demonstrates high accuracy on seen devices and only slightly reduced performance on unseen devices. Device classification is addressed with a supervised convolutional neural network, which achieves 99% accuracy, and with a zero-shot classification model, which achieves an average accuracy of 89%. While the supervised method proves more effective, the zero-shot model shows promise for further development. Rotation estimation achieves consistent accuracy across both seen and unseen devices, highlighting its potential as a robust method. For gap detection, a segmentation model outperforms a clustering method on seen devices; on unseen devices, the accuracy of the segmentation model varies with the complexity of the gaps. Action prediction is approached using two methods: a decision graph and a large language model (LLM) with retrieval-augmented generation. The LLM demonstrates the flexibility of a data-driven approach, whereas the decision graph requires handcrafted decision rules. To predict an action, the LLM is provided with the disassembly objective, context on the possible actions it can take, and knowledge from known disassembly actions. On a test set, it achieves 91% accuracy and outperforms a majority-vote baseline. The findings indicate that while generalizing to new devices poses significant challenges, it is achievable with varying degrees of success across different methods. Future work is proposed to enhance the adaptability and performance of these methods, with the aim of making them suitable for real-world applications in reconfigurable industrial settings.
This dissertation establishes a foundational framework for further exploration of generalizable vision and action prediction systems.
Keywords: Robotic disassembly; Automation; Computer vision; Action prediction; Pose estimation; Object detection; Industrial device disassembly; LLMs for action prediction