Contributions to data augmentation techniques and synthetic data for training deep neural networks

Varkarakis, Viktor

Date

2022-06-10

Abstract

In recent years deep learning has become increasingly popular and is applied in a variety of fields, yielding outstanding results in different machine learning applications. Deep learning based solutions thrive when a large amount of data is available for a specific problem, but data availability and preparation are the biggest bottlenecks in deep learning pipelines. With the fast-changing technology environment, new and unique problems arise daily. To realise solutions in many of these specific problem domains, there is a growing need to build custom datasets that are tailored to a particular use case and come with matching ground truth data. Acquiring such datasets at the scale required for training today’s AI systems, and subsequently annotating them with accurate ground truth, is challenging. Furthermore, with the recent introduction of GDPR and its associated complications, industry now faces additional challenges in collecting training data that is linked to individual persons. This dissertation focuses on ways to overcome the unavailability of real data and avoid the challenges that come with a data acquisition process. More specifically, data augmentation techniques are proposed to overcome the unavailability of real data, improve performance and allow the use of low-complexity models suitable for implementation in edge devices. Furthermore, the idea of using AI tools to build large synthetic datasets is considered as an alternative to real data samples. The first steps towards building synthetic datasets and incorporating them effectively into deep learning training pipelines are: building AI tools that generate a large amount of new data and/or augment these data samples; and creating methodologies and techniques to validate that the generated data behave like real data and to measure whether their use is effective when incorporated into the training pipelines. This dissertation contributes to both of these steps.

URI

http://hdl.handle.net/10379/17194

Collections

University of Galway Theses (PhD Theses)

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Ireland