Contributions to neural network models and training datasets for facial depth
Date
2023-03-27
Author
Khan, Faisal
Abstract
Depth estimation has seen significant progress due to recent improvements in
Convolutional Neural Networks (CNNs) and the incorporation of traditional methodologies
into these deep learning systems. Depth estimation is one of the fundamental computer
vision tasks, as it involves the inverse problem of reconstructing the three-dimensional
scene structure from two-dimensional projections. Due to the compactness and low cost of
monocular cameras, there has been a significant and increasing interest in depth estimation
from a single RGB image. Current single-view depth estimation techniques, however, are
too slow for real-time inference on embedded platforms and rely on fairly large
deep neural networks that require large and varied training sets. Because dense
ground-truth depth is difficult to obtain at scale across different environments, a range of datasets
with distinctive features and biases has emerged. This thesis first provides a summary of
depth estimation datasets, techniques, studies, patterns, difficulties, loss
functions, and opportunities for open research. For effective depth estimation
from a single image frame, a method is proposed to generate high-accuracy synthetic human
facial depth data from synthetic 3D face models, which makes it possible to train CNN models to
address facial depth estimation challenges. To validate the synthetic facial depth data, a brief
comparative analysis of state-of-the-art depth estimation algorithms on individual image frames
from the generated synthetic dataset is presented. Following that, two lightweight
encoder-decoder neural networks are proposed for training on the generated dataset.
When evaluated across four public datasets, the proposed networks are shown
to be computationally efficient and to outperform the current state of the art. Their
low complexity makes the proposed models suitable
for implementation on edge devices. Synthetic human facial depth data can help overcome
the lack of real data and can improve the performance of deep learning methods for depth
map estimation.