Abstract
Exploring and utilizing the world's oceans in a sustainable manner is crucial for solving challenges such as global warming. Autonomous underwater robots aim to replace human operators in dangerous and harsh underwater environments while efficiently and sustainably surveying, exploring and acting on underwater structures. Essential for autonomy is the ability to map the surroundings, and this thesis explores the mapping of underwater environments with a monocular camera using deep learning.
A conditional variational autoencoder (CVAE) conditioned on the deep features of images is implemented as a baseline for estimating the depth of scenes, trained in a supervised fashion on synthetic underwater images. The aleatoric uncertainty of its predictions is additionally estimated, allowing the predictions to be continuously evaluated and fused with other probabilistic models. Several modifications to the baseline architecture are proposed to improve depth estimation in underwater environments. An edge detector based on the estimated aleatoric uncertainty is derived, allowing smoothness priors to be applied to data otherwise unsuited for geometric smoothing. A novel method for incorporating auxiliary sparse depth is proposed, fusing sparse data at multiple scales in a late fusion scheme for convolutional neural networks. Spatiotemporal networks are also investigated, using multiple temporally adjacent images as input to the model. A pre-training scheme for learning motion filters in 2D spatiotemporal networks is presented, and the generation of geometrically consistent depth estimates between multiple views is also explored. Results on a synthetic, photorealistic underwater dataset show that many of the proposed modifications improve on the baseline, in terms of both qualitative and quantitative results. However, the model is only evaluated on synthetic data, and it remains to be seen how the proposed modifications affect performance in actual underwater environments.