Neural transfer learning for natural language processing
Date: 2019-06-07
Author: Ruder, Sebastian
Downloads: 14682
Abstract
The current generation of neural network-based natural language processing models excels at learning from large amounts of labelled data. Given these capabilities, natural language processing is increasingly applied to new tasks, new domains, and new languages. Current models, however, are sensitive to noise and adversarial examples and prone to overfitting. This brittleness, together with the cost of annotation, challenges the supervised learning paradigm.
Transfer learning allows us to leverage knowledge acquired from related data in order to improve performance on a target task. Implicit transfer learning in the form of pretrained word representations has been a common component of natural language processing systems. In this dissertation, we argue that more explicit transfer learning is key to dealing with the dearth of training data and to improving the downstream performance of natural language processing models. We present experimental results on transferring knowledge from related domains, tasks, and languages that support this hypothesis.
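The "implicit" transfer the abstract mentions can be sketched as follows: word vectors pretrained on unlabelled text are reused to build input features for a downstream model, rather than training representations from scratch. This is a toy illustration, not code from the thesis; the vocabulary and vector values are hypothetical stand-ins for the output of a method such as word2vec or GloVe.

```python
# Hypothetical pretrained word vectors (stand-in for word2vec/GloVe output).
PRETRAINED = {
    "movie": [0.2, 0.7],
    "great": [0.9, 0.1],
    "bad":   [-0.8, 0.3],
}

def embed_sentence(tokens, table, dim=2):
    """Average the pretrained vectors of a sentence's tokens.
    Unknown words fall back to a zero vector."""
    vecs = [table.get(t, [0.0] * dim) for t in tokens]
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

# The resulting features initialise or feed a task-specific model,
# transferring knowledge from the unlabelled pretraining corpus.
features = embed_sentence(["great", "movie"], PRETRAINED)
```

Because the vectors were learned from unlabelled data, even a target task with very few labelled examples starts from informative features.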
We make several contributions to transfer learning for natural language processing: Firstly, we propose new methods to automatically select relevant data for supervised and unsupervised domain adaptation. Secondly, we propose two novel architectures that improve sharing in multi-task learning and outperform single-task learning as well as the state of the art. Thirdly, we analyze the limitations of current models for unsupervised cross-lingual transfer, propose a method to mitigate them, and introduce a novel latent-variable cross-lingual word embedding model. Finally, we propose a framework based on fine-tuning language models for sequential transfer learning and analyze the adaptation phase.
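The adaptation phase of fine-tuning a pretrained language model can be sketched schematically. One heuristic in this line of work is to give lower layers, which tend to encode more general features, smaller learning rates than higher layers during fine-tuning. The code below is an illustrative sketch under that assumption, not the thesis's implementation; the decay factor and layer structure are hypothetical.

```python
def discriminative_lrs(n_layers, top_lr=0.01, decay=2.6):
    """Per-layer learning rates: the top layer gets top_lr, and each
    lower layer gets the rate of the layer above divided by `decay`,
    so general lower-layer features are adapted more gently."""
    return [top_lr / decay ** (n_layers - 1 - i) for i in range(n_layers)]

def sgd_step(params, grads, lrs):
    """One fine-tuning update: each layer uses its own learning rate."""
    return [[w - lr * g for w, g in zip(layer_w, layer_g)]
            for layer_w, layer_g, lr in zip(params, grads, lrs)]

# Toy 3-layer model: one weight per layer, all gradients equal to 1.0,
# so the update size directly shows the per-layer learning rate.
lrs = discriminative_lrs(3)
params = sgd_step([[1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0]], lrs)
```

With equal gradients, the top layer moves the most and the bottom layer the least, which is the intended effect: task-specific layers adapt quickly while general-purpose pretrained features are largely preserved.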