For a technical description of the algorithm, please see our paper:
ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation
Dongling Xiao*, Han Zhang*, Yukun Li, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang (*: equal contribution)
Preprint, January 2020
Accepted by IJCAI-2020
ERNIE-GEN is a multi-flow language generation framework for both pre-training and fine-tuning. We propose a novel span-by-span generation pre-training task that enables the model to generate a semantically complete span at each step rather than a single word, in light of the fact that entities and phrases in human writing are organized in a coherent manner. An infilling generation mechanism and a noise-aware generation method are incorporated into both pre-training and fine-tuning to alleviate exposure bias. In the pre-training phase, ERNIE-GEN also adopts a multi-granularity target fragment sampling strategy that forces the decoder to rely more on the encoder representations than on the previously generated words, enhancing the correlation between encoder and decoder.
We construct three novel methods to enhance the language generation ability:
- **Span-by-span generation task**: the model generates a semantically complete span at each step rather than a single word.
- **Infilling generation and noise-aware generation**: both mechanisms are applied in pre-training and fine-tuning to alleviate exposure bias (a toy sketch of the noise-aware idea follows this list).
- **Multi-granularity target fragment sampling**: sampling target fragments of varied lengths forces the decoder to rely more on the encoder representations, strengthening the correlation between encoder and decoder.
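To make the noise-aware idea concrete, here is a minimal, framework-agnostic sketch in Python. The function name, the default noise rate, and the uniform random replacement strategy are illustrative assumptions, not the exact recipe used by ERNIE-GEN:

```python
import random

def add_generation_noise(target_ids, vocab_size, noise_rate=0.05, seed=None):
    """Randomly corrupt a fraction of the ground-truth target tokens.

    Sketch of the idea behind noise-aware generation: expose the decoder
    to imperfect previous tokens during training so it is less sensitive
    to its own mistakes at inference time (exposure bias). The 5% rate
    and uniform replacement are assumptions for illustration only.
    """
    rng = random.Random(seed)
    noised = list(target_ids)
    for i in range(len(noised)):
        if rng.random() < noise_rate:
            noised[i] = rng.randrange(vocab_size)  # swap in a random token id
    return noised

# Example: corrupt ~5% of the decoder-input tokens of one training target.
print(add_generation_noise([101, 2023, 2003, 1037, 7099, 102], vocab_size=30000, seed=0))
```

Training the decoder against such corrupted prefixes makes it less dependent on perfect ground-truth history, which is the motivation for noise-aware generation.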
Specifically, the span-by-span generation task and the word-by-word generation task, both based on the infilling generation mechanism, are implemented through a carefully designed Multi-Flow Attention architecture, as shown below.
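As a rough illustration of the attention pattern behind infilling generation, the following sketch builds the mask for a single flow: source tokens attend bidirectionally within the source, while the inserted placeholder for target step t attends to the whole source and to the target words before step t. The real Multi-Flow Attention runs the word-by-word and span-by-span flows in parallel over shared contextual states; this single-flow toy is only an approximation, not the actual implementation:

```python
import numpy as np

def seq2seq_infilling_mask(src_len, tgt_len):
    """Build a toy attention mask for one generation flow.

    Rows/columns 0..src_len-1 are source positions; the remaining rows
    are the inserted placeholders that predict each target word. Entry
    (i, j) = 1 means position i may attend to position j.
    """
    n = src_len + tgt_len
    mask = np.zeros((n, n), dtype=np.float32)
    mask[:src_len, :src_len] = 1.0                 # source <-> source (bidirectional)
    for t in range(tgt_len):
        row = src_len + t
        mask[row, :src_len] = 1.0                  # placeholder -> whole source
        mask[row, src_len:src_len + t + 1] = 1.0   # placeholder -> preceding words (+ itself)
    return mask

print(seq2seq_infilling_mask(src_len=3, tgt_len=2))
```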
We release checkpoints for the ERNIE-GEN base model and the ERNIE-GEN large model, both pre-trained on English Wikipedia and BookCorpus (16GB in total). In addition, an ERNIE-GEN large model pre-trained on a 430GB corpus (see Appendix A.1 of the ERNIE-GEN paper for a description of the corpus) is available as well.
We compare the performance of ERNIE-GEN with existing SOTA pre-training models for natural language generation (UniLM, MASS, PEGASUS, BART and T5) on five generation tasks, including abstractive summarization (Gigaword and CNN/DailyMail), question generation (SQuAD), dialogue generation (Persona-Chat) and generative question answering (CoQA).
The results on Gigaword-10k (10K examples of Gigaword) are presented as follows:
Model | Data / Params | Rouge-1 | Rouge-2 | Rouge-L |
---|---|---|---|---|
UniLM | 16G / 340M | 34.21 | 15.28 | 31.54 |
ERNIE-GEN base | 16G / 110M | 33.75 | 15.23 | 31.35 |
ERNIE-GEN large | 16G / 340M | 35.05 | 16.10 | 32.50 |
ERNIE-GEN large (430G) | 430G / 340M | 35.51 | 16.79 | 33.23 |
The results on Gigaword are presented as follows:
Model | Data / Params | Rouge-1 | Rouge-2 | Rouge-L |
---|---|---|---|---|
MASS | 18G / 160M | 38.73 | 19.71 | 35.96 |
BERTSHARE | 16G / 110M | 38.13 | 19.81 | 35.62 |
UniLM | 16G / 340M | 38.45 | 19.45 | 35.75 |
PEGASUS (C4) | 750G / 568M | 38.75 | 19.96 | 36.14 |
PEGASUS (HugeNews) | 3.8T / 568M | 39.12 | 19.86 | 36.24 |
ERNIE-GEN base | 16G / 110M | 38.83 | 20.04 | 36.20 |
ERNIE-GEN large | 16G / 340M | 39.25 | 20.25 | 36.53 |
ERNIE-GEN large (430G) | 430G / 340M | 39.46 | 20.34 | 36.74 |
We preprocess the raw Gigaword dataset following UniLM; the preprocessed data is available here: Gigaword.
The results on CNN/Daily Mail are presented as follows:
Model | Data / Params | Rouge-1 | Rouge-2 | Rouge-L |
---|---|---|---|---|
MASS | 18G / 160M | 42.12 | 19.50 | 39.01 |
UniLM | 16G / 340M | 43.33 | 20.21 | 40.51 |
T5 large | 750G / 340M | 42.50 | 20.68 | 39.75 |
T5 xlarge | 750G / 11B | 43.52 | 21.55 | 40.69 |
BART | 160G / 400M | 44.16 | 21.28 | 40.90 |
PEGASUS (C4) | 750G / 568M | 43.90 | 21.20 | 40.76 |
PEGASUS (HugeNews) | 3.8T / 568M | 44.17 | 21.47 | 41.11 |
ERNIE-GEN base | 16G / 110M | 42.30 | 19.92 | 39.68 |
ERNIE-GEN large | 16G / 340M | 44.02 | 21.17 | 41.26 |
ERNIE-GEN large (430G) | 430G / 340M | 44.31 | 21.35 | 41.60 |
We preprocess the raw CNN/Daily Mail dataset following UniLM; the preprocessed data is available here: CNN/Daily Mail.
The results on the SQuAD 1.1 dataset following the data split in [Du et al., 2017] are presented as follows:
Model | BLEU-4 | METEOR | Rouge-L |
---|---|---|---|
SemQG | 18.37 | 22.65 | 46.68 |
UniLM large (beam size=1) | 22.12 | 25.06 | 51.07 |
ERNIE-GEN base (beam size=1) | 22.28 | 25.13 | 50.38 |
ERNIE-GEN large (beam size=1) | 24.03 | 26.31 | 52.36 |
ERNIE-GEN large (beam size=5) | 25.40 | 26.92 | 52.84 |
ERNIE-GEN large (beam size=5) + (430G) | 25.41 | 26.77 | 52.91 |
The results following the reversed dev-test data split in [Zhao et al., 2018] are presented as follows:
Model | BLEU-4 | METEOR | Rouge-L |
---|---|---|---|
SemQG | 20.76 | 24.20 | 48.91 |
UniLM large (beam size=1) | 23.75 | 25.61 | 52.04 |
ERNIE-GEN base (beam size=1) | 23.52 | 25.61 | 51.45 |
ERNIE-GEN large (beam size=1) | 25.57 | 26.89 | 53.31 |
ERNIE-GEN large (beam size=5) | 26.95 | 27.57 | 53.77 |
ERNIE-GEN large (beam size=5) + (430G) | 27.05 | 27.43 | 53.83 |
*Note that we also report results with a larger beam size of 5.*
The preprocessed data for the question generation task can be downloaded from SQuAD.
A comparison with current state-of-the-art results on the multi-turn conversation task (Persona-Chat) is presented as follows:
Model | BLEU-1 | BLEU-2 | Distinct-1 | Distinct-2 |
---|---|---|---|---|
LIC | 40.5 | 32.0 | 0.019 | 0.113 |
PLATO | 45.8 | 35.7 | 0.012 | 0.064 |
PLATO w/o latent | 40.6 | 31.5 | 0.021 | 0.121 |
ERNIE-GEN large | 46.8 | 36.4 | 0.023 | 0.168 |
The training data can be downloaded from Persona-Chat.
Results on the CoQA development set are presented as follows:
Model | F1-score |
---|---|
Seq2Seq | 27.5 |
PGNet | 45.4 |
UniLM large | 82.5 |
ERNIE-GEN large | 84.5 |
We preprocess the raw CoQA dataset; the preprocessed data is available here: CoQA-preprocessed.
This code base has been tested with Paddle Fluid 1.7 and Python 2.7. Other dependencies of ERNIE-GEN are listed in requirements.txt; you can install them by running:
```sh
pip install -r requirements.txt
```
Please update LD_LIBRARY_PATH with the paths of CUDA, cuDNN and NCCL2 before running ERNIE-GEN. We have put the parameter configurations of the above downstream tasks in configs/; you can easily run fine-tuning through these configuration files. For example, you can fine-tune the ERNIE-GEN base model on Gigaword by running:
```sh
MODEL="base" # base or large or large_430g
TASK="gigaword" # cnndm, coqa, gigaword, squad_qg or persona-chat
sh run_seq2seq.sh ./configs/${MODEL}/${TASK}_conf
```
The training log and the evaluation results are written to log/job.log.0. To fine-tune on your own task data, you can refer to the data format we provide for processing your data, as sketched below.
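For illustration, assuming a UniLM-style layout in which each line holds a tokenized source and target separated by a tab (please inspect the released task data, e.g. the preprocessed Gigaword files, to confirm the exact format before fine-tuning), a minimal converter might look like:

```python
def write_seq2seq_tsv(pairs, path):
    """Write (source, target) pairs as one tab-separated example per line.

    The "source<TAB>target" layout is an assumption modeled on common
    seq2seq fine-tuning data; confirm it against one of the downloaded
    datasets before using this for your own task.
    """
    with open(path, "w") as f:
        for src, tgt in pairs:
            # Strip stray tabs so the two-column format stays parseable.
            f.write(src.replace("\t", " ") + "\t" + tgt.replace("\t", " ") + "\n")

# Hypothetical example: one summarization pair written to train.tsv.
write_seq2seq_tsv(
    [("the quick brown fox jumps over the lazy dog .", "fox jumps over dog .")],
    "train.tsv",
)
```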
Our fine-tuning experiments are carried out on 8 NVIDIA V100 (32GB) GPUs. If your GPU memory is not enough, you can reduce the batch size in the corresponding configuration file.
**NOTICE:** The actual total batch size equals the configured batch size multiplied by the number of GPUs in use; for example, a configured batch size of 8 on 8 GPUs gives a total batch size of 64.
The ERNIE-GEN implementation using dynamic graphs is more concise and flexible; please refer to ERNIE-GEN Dygraph for details.
The ERNIE-GEN code is compatible with the ERNIE 1.0 model. After specifying the model- and data-related parameters in the configuration file, you can use ERNIE 1.0 to fine-tune on Chinese generation tasks.
You can cite the paper as below:
```
@article{xiao2020ernie-gen,
  title={ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation},
  author={Xiao, Dongling and Zhang, Han and Li, Yukun and Sun, Yu and Tian, Hao and Wu, Hua and Wang, Haifeng},
  journal={arXiv preprint arXiv:2001.11314},
  year={2020}
}
```