Contributions to data augmentation techniques and synthetic data for training deep neural networks
Date
2022-06-10
Author
Varkarakis, Viktor
Abstract
In recent years deep learning has become increasingly popular and has been applied in
a variety of fields, yielding outstanding results in different machine learning applications.
Deep learning based solutions thrive when a large amount of data is available for a specific
problem, but data availability and preparation are the biggest bottlenecks in deep learning
pipelines. With the fast-changing technology environment, new and unique problems arise daily.
In order to realise solutions in many of these specific problem domains, there is a growing
need to build custom datasets that are tailored for a particular use case with matching ground
truth data. Acquiring such datasets at the scale required for training with today’s AI systems
and subsequently annotating them with an accurate ground truth is challenging. Furthermore,
with the recent introduction of GDPR and its associated complications, industry
now faces additional challenges in collecting training data that is linked to individual
persons.
This dissertation focuses on ways to overcome the unavailability of real data and avoid
the challenges that come with a data acquisition process. More specifically, data augmentation
techniques are proposed to overcome the unavailability of real data, improve performance,
and allow the use of low-complexity models suitable for implementation on edge devices.
Furthermore, the idea of using AI tools to build large synthetic datasets is considered as an
alternative to real data samples. The first steps towards building and incorporating synthetic
datasets effectively into deep learning training pipelines are twofold: building AI tools that
generate a large amount of new data and/or augment these data samples, and creating
methodologies and techniques to validate that the generated data behave like real data and
to measure whether their use is effective when incorporated in the training pipelines.
This dissertation contributes to both of these steps.