Advancing Machine Learning Algorithms for Object Localization in Data-Limited Scenarios: Techniques for 6DoF Pose Estimation and 2D Localization with Limited Data

Date: 2025-01-20
Abstract
Recent successes of Machine Learning (ML) algorithms have profoundly influenced many fields, particularly Computer Vision (CV). One longstanding problem in CV is determining the position and orientation of an object depicted in an image in 3D space, relative to the recording camera sensor. Accurate pose estimation is essential for domains such as robotics, augmented reality, autonomous driving, quality inspection in manufacturing, and many more. Current state-of-the-art pose estimation algorithms are dominated by Deep Learning-based approaches. However, applying these best-in-class algorithms to real-world tasks is often constrained by data limitations: too little training data, data of insufficient quality, missing or noisy annotations, or no directly suitable training data at all.

This thesis contributes both to 6D object pose estimation itself and to alleviating data limitations, for pose estimation and for related CV problems such as classification, segmentation, and 2D object detection. It offers a range of solutions to enhance the quality and efficiency of these tasks under different kinds of data limitations. The first contribution extends a state-of-the-art pose estimation algorithm to predict a probability distribution over poses instead of a single pose estimate. This allows sampling multiple plausible poses for further refinement and outperforms the baseline algorithm even when only the most likely pose is sampled. In our second contribution, we drastically improve runtime and reduce resource requirements to bring state-of-the-art pose estimation to low-power edge devices, such as modern augmented and extended reality devices. Finally, we extend a pose estimator based on dense-feature prediction to incorporate additional views and demonstrate its performance benefits in the stereo use case.

The second set of two contributions focuses on data generation for ML-based CV tasks. High-quality training data is a crucial component for best performance. We introduce a novel yet simple setup to record physical objects and generate all necessary annotations in a fully automated way. Evaluated on the 2D object detection use case, training on our data compares favourably with more complex data generation processes, such as real-world recordings and physically-based rendering. In a follow-up paper, we further improve upon these results by introducing a novel postprocessing step based on denoising diffusion probabilistic models (DDPMs).

At the intersection of 6D pose estimation and data generation methods, a final group of three contributions focuses on solving or circumventing the data problem with a range of approaches. First, we demonstrate the use of physically-based, photorealistic, and non-photorealistic rendering to localize objects on the Microsoft HoloLens 2 without needing any real-world images for training. Second, we extend a zero-shot pose estimation method by predicting geometric features, improving estimation quality with almost no additional runtime. Third, we demonstrate pose estimation of objects with unseen appearances based on a 3D scene representation, allowing robust mesh-free pose estimation.
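To make the first contribution's sample-then-refine idea concrete, the following minimal Python sketch draws multiple plausible 6D pose hypotheses around a predicted mean pose. It is an illustration only, not the thesis's actual parameterization: the function names, the Gaussian translation model, and the axis-angle rotation perturbation (with spread controlled by a concentration parameter) are all assumptions for this sketch; a real system would use a proper rotation distribution predicted by the network.

    import numpy as np

    def axis_angle_to_matrix(v):
        # Rodrigues' formula: rotation matrix from an axis-angle vector v.
        theta = np.linalg.norm(v)
        if theta < 1e-8:
            return np.eye(3)
        k = v / theta
        K = np.array([[0.0, -k[2], k[1]],
                      [k[2], 0.0, -k[0]],
                      [-k[1], k[0], 0.0]])
        return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

    def sample_pose_hypotheses(mean_rot, rot_concentration, mean_trans, trans_cov,
                               n_samples=64, rng=None):
        # Draw n_samples pose hypotheses around a predicted mean pose.
        # Higher rot_concentration -> tighter rotation samples (hypothetical model).
        rng = np.random.default_rng() if rng is None else rng
        # Translation hypotheses: Gaussian around the predicted mean translation.
        trans = rng.multivariate_normal(mean_trans, trans_cov, size=n_samples)
        # Rotation hypotheses: random-axis, small-angle perturbations of the mean.
        axes = rng.normal(size=(n_samples, 3))
        axes /= np.linalg.norm(axes, axis=1, keepdims=True)
        angles = rng.normal(scale=1.0 / np.sqrt(rot_concentration), size=n_samples)
        rots = np.stack([axis_angle_to_matrix(angles[i] * axes[i]) @ mean_rot
                         for i in range(n_samples)])
        return rots, trans  # (n_samples, 3, 3) rotations, (n_samples, 3) translations

Setting the perturbation to zero recovers the single most likely pose; scoring each hypothesis (for example by render-and-compare) and refining the best candidates mirrors the sampling-for-refinement strategy described above.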
In summary, this thesis advances the field of 6D object pose estimation and alleviates common data limitations for pose estimation and related Machine Learning tasks in Computer Vision, such as 2D detection and segmentation. The proposed solutions include several extensions to state-of-the-art 6D pose estimators and address the challenges of limited or poor-quality training data, paving the way for more accurate, efficient, and accessible pose estimation technologies across various industries and fields.