An Overview of Multi-Task Learning in Deep Neural Networks

Author: Sebastian Ruder

Idea

  • Multi-task learning shares representations between related tasks so that the model learns and generalizes better on the original task.

  • Hard Parameter Sharing:

    • involves sharing the hidden layers between tasks while keeping the output layers task-specific
    • helps reduce overfitting (see the sketch after this list)
  • Soft Parameter Sharing:

    • each task has its own model with its own parameters; the distance between corresponding parameters is regularized to keep them similar
  • Inductive bias provided by auxiliary tasks causes the model to prefer hypotheses that explain more than one task.
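
A minimal PyTorch sketch of the two sharing schemes (the layer sizes, task heads, and penalty weight are illustrative assumptions, not taken from the paper): a hard-sharing model with a common trunk and task-specific heads, plus a helper that approximates soft sharing by penalizing the distance between two task-specific models' parameters.

```python
import torch
import torch.nn as nn

class HardSharingMTL(nn.Module):
    """Hard parameter sharing: all tasks use the same hidden layers,
    only the output layers are task-specific."""

    def __init__(self, in_dim=64, hidden_dim=128, task_out_dims=(10, 1)):
        super().__init__()
        self.shared = nn.Sequential(            # shared hidden layers (trunk)
            nn.Linear(in_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.heads = nn.ModuleList(             # one output layer per task
            [nn.Linear(hidden_dim, d) for d in task_out_dims]
        )

    def forward(self, x):
        h = self.shared(x)
        return [head(h) for head in self.heads]


def soft_sharing_penalty(model_a, model_b, weight=1e-3):
    """Soft parameter sharing: each task keeps its own model, and the
    distance between corresponding parameters is regularized so the
    learned representations stay similar."""
    penalty = sum((pa - pb).pow(2).sum()
                  for pa, pb in zip(model_a.parameters(), model_b.parameters()))
    return weight * penalty
```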

Recent Work

  • Deep Relationship Networks, learn the relationships between tasks using priors, but still depend on a pre-defined structure for sharing, which can be error-prone.
  • Fully-Adaptive Feature Sharing, a bottom-up approach that groups similar tasks, but may not produce globally optimal models.
  • Cross-stitch Networks, two separate models are stitched together by learning a linear combination of the outputs of the previous layers, so each task makes use of the other task's knowledge (see the cross-stitch sketch after this list).
  • A Joint Many-Task Model, utilizes several NLP tasks in a hierarchical structure for jointly performing multi-task learning.
  • Weighting losses with uncertainty, adjusts the relative weights in the multi-task loss function using the uncertainty of each task (see the loss-weighting sketch after this list).
  • Sluice Networks, learn which layers and subspaces should be shared and at which layers the model has learned the best representations of the input sequences; they generalize deep-learning-based MTL models such as cross-stitch networks.
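
A sketch of a single cross-stitch unit under the same assumptions (PyTorch; the 2x2 mixing matrix and its near-identity initialization are illustrative): it learns a linear combination of the activations produced by the two task networks at a given layer.

```python
import torch
import torch.nn as nn

class CrossStitchUnit(nn.Module):
    """Learns a linear combination of the activations of two
    task-specific networks at one layer of a cross-stitch network."""

    def __init__(self):
        super().__init__()
        # 2x2 mixing matrix, initialized near the identity so each task
        # starts out relying mostly on its own representation.
        self.alpha = nn.Parameter(torch.tensor([[0.9, 0.1],
                                                [0.1, 0.9]]))

    def forward(self, x_a, x_b):
        # The same combination is applied element-wise to both activations.
        out_a = self.alpha[0, 0] * x_a + self.alpha[0, 1] * x_b
        out_b = self.alpha[1, 0] * x_a + self.alpha[1, 1] * x_b
        return out_a, out_b
```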
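
And a simplified sketch of weighting losses with uncertainty, loosely following the homoscedastic-uncertainty formulation (an assumption, not the paper's exact recipe): each task loss is scaled by exp(-s_i) and a regularizing s_i term is added, where s_i is a learnable log-variance.

```python
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    """Combines per-task losses using one learnable log-variance per task:
    total = sum_i exp(-s_i) * loss_i + s_i."""

    def __init__(self, num_tasks):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))  # s_i, init 0

    def forward(self, task_losses):
        total = torch.zeros(())
        for s, loss in zip(self.log_vars, task_losses):
            # Uncertain tasks (large s) get down-weighted; the +s term
            # keeps the model from driving every weight to zero.
            total = total + torch.exp(-s) * loss + s
        return total
```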