Pre-trained Models for Natural Language Processing: A Survey

doi:10.1007/S11431-020-1647-3

Open Access journal article

Xipeng Qiu et al.

18 Mar 2020

Science China Technological Sciences

Vol. 63, Iss. 10, pp. 1872-1897

TL;DR: The emergence of pre-trained models (PTMs) has brought natural language processing (NLP) into a new era, and this survey provides a comprehensive review of PTMs for NLP.

Abstract:

Recently, the emergence of pre-trained models (PTMs) has brought natural language processing (NLP) into a new era. In this survey, we provide a comprehensive review of PTMs for NLP. We first briefly introduce language representation learning and its research progress. Then we systematically categorize existing PTMs based on a taxonomy from four different perspectives. Next, we describe how to adapt the knowledge of PTMs to downstream tasks. Finally, we outline some potential directions of PTMs for future research. This survey is intended as a hands-on guide for understanding, using, and developing PTMs for various NLP tasks.

Citations

Journal Article

Self-supervised Learning: Generative or Contrastive

Xiao Liu et al.

15 Jun 2020

arXiv: Learning

TL;DR: This survey examines new self-supervised methods for representation learning in computer vision, natural language processing, and graph learning, and comprehensively reviews existing empirical methods, grouping them into three main categories according to their objectives.

Journal Article

Deep Learning-based Text Classification: A Comprehensive Review

Shervin Minaee et al.

17 Apr 2021

ACM Computing Surveys

TL;DR: This paper provides a comprehensive review of more than 150 deep learning-based models for text classification developed in recent years, discusses their technical contributions, similarities, and strengths, and provides a quantitative analysis of the performance of different deep learning models on popular benchmarks.

Journal Article

Pre-trained Models for Natural Language Processing: A Survey

Xipeng Qiu et al.

18 Mar 2020

arXiv: Computation and Language

TL;DR: This survey is intended as a hands-on guide for understanding, using, and developing PTMs for various NLP tasks.

Posted Content

Hands-on Bayesian Neural Networks -- a Tutorial for Deep Learning Users

Laurent Valentin Jospin et al.

14 Jul 2020

arXiv: Learning

TL;DR: This tutorial provides deep learning practitioners with an overview of the relevant literature and a complete toolset to design, implement, train, use and evaluate Bayesian neural networks, i.e., stochastic artificial neural networks trained using Bayesian methods.

References

Journal Article

Long short-term memory

Sepp Hochreiter and Jürgen Schmidhuber

01 Nov 1997

Neural Computation

TL;DR: A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
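
The "constant error carousel" can be illustrated with a minimal sketch of one step of a scalar LSTM memory cell in the spirit of the 1997 design, which has input and output gates but no forget gate (forget gates came in later work). The weights are hypothetical, and both squashing functions are simplified to tanh as an assumption. Because the cell state is updated additively, c_t = c_{t-1} + i_t · g_t, its derivative with respect to c_{t-1} is exactly 1, so error flowing back along the cell-state path is neither amplified nor shrunk.

```python
import math
from dataclasses import dataclass


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


@dataclass
class ScalarLSTMCell:
    # Hypothetical scalar weights: w_* multiply the input x, u_* the previous output h.
    w_in: float
    u_in: float
    b_in: float   # input gate
    w_out: float
    u_out: float
    b_out: float  # output gate
    w_c: float
    u_c: float
    b_c: float    # cell input

    def step(self, x: float, h_prev: float, c_prev: float) -> tuple[float, float]:
        i = sigmoid(self.w_in * x + self.u_in * h_prev + self.b_in)     # input gate
        o = sigmoid(self.w_out * x + self.u_out * h_prev + self.b_out)  # output gate
        g = math.tanh(self.w_c * x + self.u_c * h_prev + self.b_c)      # candidate input
        # Constant error carousel: the old state is carried over with a fixed
        # self-connection weight of 1.0, so d(c) / d(c_prev) == 1.
        c = c_prev + i * g
        h = o * math.tanh(c)
        return h, c


# Usage: with the input gate held shut, the stored value survives 1000 steps.
cell = ScalarLSTMCell(w_in=1.0, u_in=0.5, b_in=-50.0,
                      w_out=1.0, u_out=0.5, b_out=0.0,
                      w_c=1.0, u_c=0.5, b_c=0.0)
h, c = 0.0, 0.7
for t in range(1000):
    h, c = cell.step(math.sin(t), h, c)
print(round(c, 6))  # 0.7
```

Opening the input gate (a large positive b_in) lets new input accumulate in the cell instead; the gates, not the state update, decide what is written and read.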

Proceedings Article

GloVe: Global Vectors for Word Representation

Jeffrey Pennington et al.

TL;DR: A new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
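
The log-bilinear regression can be written as a weighted least-squares objective over nonzero co-occurrence counts X_ij: J = Σ f(X_ij) (w_i · w̃_j + b_i + b̃_j - log X_ij)². A minimal plain-Python sketch of that objective follows. The weighting function f uses x_max = 100 and α = 3/4, the values reported in the GloVe paper; the function names, toy vectors, biases, and counts are hypothetical.

```python
import math

Vector = list[float]


def glove_weight(x: float, x_max: float = 100.0, alpha: float = 0.75) -> float:
    # f(X_ij): down-weights rare pairs and caps frequent ones at 1.
    return (x / x_max) ** alpha if x < x_max else 1.0


def glove_loss(cooc: dict[tuple[int, int], float],
               w: list[Vector], w_ctx: list[Vector],
               b: list[float], b_ctx: list[float]) -> float:
    # Sum over nonzero counts only; log X_ij is undefined for X_ij = 0.
    total = 0.0
    for (i, j), x in cooc.items():
        dot = sum(p * q for p, q in zip(w[i], w_ctx[j]))
        diff = dot + b[i] + b_ctx[j] - math.log(x)
        total += glove_weight(x) * diff * diff
    return total


# Toy 2-word vocabulary with hypothetical parameters.
cooc = {(0, 1): 10.0}
zeros = [[0.0, 0.0], [0.0, 0.0]]
print(glove_loss(cooc, zeros, zeros, [0.0, 0.0], [0.0, 0.0]))  # positive: nothing fits yet
print(glove_loss(cooc, zeros, zeros, [math.log(10.0), 0.0], [0.0, 0.0]))  # 0.0: biases fit log X_01
```

Fitting log counts is what makes vector differences meaningful: ratios of co-occurrence probabilities become differences of dot products.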

Posted Content

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin et al.

11 Oct 2018

arXiv: Computation and Language

TL;DR: A new language representation model, BERT, is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; the pre-trained model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
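
Bidirectional conditioning is made possible by BERT's masked language modeling objective: 15% of input positions are chosen for prediction, and of those, 80% are replaced with [MASK], 10% with a random token, and 10% are left unchanged. The sketch below shows only this corruption step on whitespace tokens. It selects each position independently with probability 0.15, a simplification (the reference implementation instead picks a fixed number of positions per sequence), and it ignores special tokens such as [CLS] and [SEP]. The function name mask_tokens is hypothetical.

```python
import random
from typing import Optional

MASK = "[MASK]"


def mask_tokens(tokens: list[str], vocab: list[str], rng: random.Random,
                mask_prob: float = 0.15) -> tuple[list[str], list[Optional[str]]]:
    """Corrupt tokens for masked language modeling.

    Returns (inputs, labels); labels[i] holds the original token at positions
    the model must predict and None everywhere else.
    """
    inputs = list(tokens)
    labels: list[Optional[str]] = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue                       # not selected: no loss at this position
        labels[i] = tok
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK               # 80%: replace with [MASK]
        elif r < 0.9:
            inputs[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token
    return inputs, labels


rng = random.Random(0)
sentence = "the cat sat on the mat".split()
inputs, labels = mask_tokens(sentence, vocab=sentence, rng=rng)
print(inputs)
print(labels)
```

Keeping some selected tokens unchanged or randomized means the model cannot assume that only [MASK] positions matter, which narrows the gap with fine-tuning inputs that never contain [MASK].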

Proceedings Article

Distributed Representations of Words and Phrases and their Compositionality

Tomas Mikolov et al.

TL;DR: This paper presents a simple method for finding phrases in text, shows that learning good vector representations for millions of phrases is possible, and describes a simple alternative to the hierarchical softmax called negative sampling.
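
Negative sampling replaces the full softmax with k binary logistic-regression terms: for a center word vector v_c and observed context vector u_o, the loss is -log σ(u_o · v_c) - Σ_k log σ(-u_k · v_c), where the k negative words are drawn from a noise distribution. The paper reports that the unigram distribution raised to the 3/4 power works well as that noise distribution. A minimal plain-Python sketch of both pieces follows; the function names and toy vectors are hypothetical.

```python
import math
import random

Vector = list[float]


def log_sigmoid(z: float) -> float:
    # Numerically stable log(sigmoid(z)).
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def noise_distribution(counts: dict[str, int], power: float = 0.75) -> dict[str, float]:
    # Unigram counts raised to the 3/4 power, then normalized.
    weights = {w: c ** power for w, c in counts.items()}
    total = sum(weights.values())
    return {w: v / total for w, v in weights.items()}


def sample_negatives(noise: dict[str, float], k: int, rng: random.Random) -> list[str]:
    words = list(noise)
    return rng.choices(words, weights=[noise[w] for w in words], k=k)


def neg_sampling_loss(v_c: Vector, u_o: Vector, u_negs: list[Vector]) -> float:
    # Pull the true (center, context) pair together ...
    loss = -log_sigmoid(dot(u_o, v_c))
    # ... and push each sampled noise word away.
    for u_k in u_negs:
        loss -= log_sigmoid(-dot(u_k, v_c))
    return loss


rng = random.Random(0)
noise = noise_distribution({"the": 16, "cat": 1})
print(noise)  # about 0.889 vs 0.111: flatter than the raw 16/17 vs 1/17
print(sample_negatives(noise, k=5, rng=rng))
print(neg_sampling_loss([0.1, -0.2], [0.3, 0.1], [[-0.1, 0.4], [0.2, 0.2]]))
```

The cost per training pair grows with k rather than with vocabulary size, which is the point of the substitution.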

Proceedings Article

Neural Machine Translation by Jointly Learning to Align and Translate

Dzmitry Bahdanau et al.

TL;DR: The authors conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of the basic encoder-decoder architecture, and propose extending it by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
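
The "(soft-)search" is an additive attention mechanism. For the previous decoder state s_{i-1} and encoder annotations h_j, the paper scores each source position with e_ij = v_a^T tanh(W_a s_{i-1} + U_a h_j), normalizes the scores with a softmax into alignment weights α_ij, and feeds the decoder the weighted sum c_i = Σ_j α_ij h_j. A minimal plain-Python sketch of one such step follows, with hypothetical toy matrices standing in for learned parameters.

```python
import math

Vector = list[float]
Matrix = list[Vector]


def matvec(m: Matrix, x: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def softmax(scores: Vector) -> Vector:
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(s_prev: Vector, annotations: list[Vector],
                       W_a: Matrix, U_a: Matrix, v_a: Vector) -> tuple[Vector, Vector]:
    """One decoder step: returns (alignment weights alpha_ij, context vector c_i)."""
    ws = matvec(W_a, s_prev)  # W_a s_{i-1} is shared by every source position
    scores = []
    for h_j in annotations:
        uh = matvec(U_a, h_j)
        hidden = [math.tanh(p + q) for p, q in zip(ws, uh)]
        scores.append(sum(v * t for v, t in zip(v_a, hidden)))  # e_ij
    weights = softmax(scores)  # soft alignment over all positions, no hard segment
    dim = len(annotations[0])
    context = [sum(a * h_j[k] for a, h_j in zip(weights, annotations)) for k in range(dim)]
    return weights, context


identity = [[1.0, 0.0], [0.0, 1.0]]
annotations = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = additive_attention([0.5, -0.5], annotations, identity, identity, [1.0, 1.0])
print([round(a, 3) for a in weights])
print([round(c, 3) for c in context])
```

Because every position receives a nonzero weight, the whole step stays differentiable and the alignment is learned jointly with translation rather than supplied as a hard segmentation.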

Related Papers (5)

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin et al.

RoBERTa: A Robustly Optimized BERT Pretraining Approach

Yinhan Liu et al.

26 Jul 2019

arXiv: Computation and Language

Attention is All you Need

Deep contextualized word representations

GloVe: Global Vectors for Word Representation

Additional citing paper: A Survey of Large Language Models