For a technical description of the algorithm, please see our paper:
ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation
Dongling Xiao*, Han Zhang*, Yukun Li, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang (*: equal contribution)
Preprint, January 2020
Accepted by IJCAI-2020
ERNIE-GEN is a multi-flow language generation framework for both pre-training and fine-tuning. We propose a novel span-by-span generation pre-training task that enables the model to generate a semantically complete span at each step rather than a single word, in light of the fact that entities and phrases in human writing are organized in a coherent manner. An infilling generation mechanism and a noise-aware generation method are incorporated into both pre-training and fine-tuning to alleviate exposure bias. In the pre-training phase, ERNIE-GEN also adopts a multi-granularity target fragment sampling strategy that forces the decoder to rely more on the encoder representations than on the previously generated words, enhancing the correlation between encoder and decoder.
We construct three novel methods to enhance the language generation ability:
- **Span-by-span generation task**: the model generates a semantically complete span at each step rather than a single word.
- **Infilling generation and noise-aware generation**: both mechanisms are applied in pre-training and fine-tuning to alleviate exposure bias (a toy sketch of the noise-aware idea follows this list).
- **Multi-granularity target fragment sampling**: sampling target fragments of varied lengths forces the decoder to rely more on the encoder representations, strengthening the correlation between encoder and decoder.
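To make the noise-aware idea concrete, here is a minimal, framework-agnostic sketch in Python. The function name, the default noise rate, and the uniform random replacement strategy are illustrative assumptions, not the exact recipe used by ERNIE-GEN:

```python
import random

def add_generation_noise(target_ids, vocab_size, noise_rate=0.05, seed=None):
    """Randomly corrupt a fraction of the ground-truth target tokens.

    Sketch of the idea behind noise-aware generation: expose the decoder
    to imperfect previous tokens during training so it is less sensitive
    to its own mistakes at inference time (exposure bias). The 5% rate
    and uniform replacement are assumptions for illustration only.
    """
    rng = random.Random(seed)
    noised = list(target_ids)
    for i in range(len(noised)):
        if rng.random() < noise_rate:
            noised[i] = rng.randrange(vocab_size)  # swap in a random token id
    return noised

# Example: corrupt ~5% of the decoder-input tokens of one training target.
print(add_generation_noise([101, 2023, 2003, 1037, 7099, 102], vocab_size=30000, seed=0))
```

Training the decoder against such corrupted prefixes makes it less dependent on perfect ground-truth history, which is the motivation for noise-aware generation.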
Specifically, the span-by-span generation task and the word-by-word generation task, both based on the infilling generation mechanism, are implemented through a carefully designed Multi-Flow Attention architecture, as shown below.
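As a rough illustration of the attention pattern behind infilling generation, the following sketch builds the mask for a single flow: source tokens attend bidirectionally within the source, while the inserted placeholder for target step t attends to the whole source and to the target words before step t. The real Multi-Flow Attention runs the word-by-word and span-by-span flows in parallel over shared contextual states; this single-flow toy is only an approximation, not the actual implementation:

```python
import numpy as np

def seq2seq_infilling_mask(src_len, tgt_len):
    """Build a toy attention mask for one generation flow.

    Rows/columns 0..src_len-1 are source positions; the remaining rows
    are the inserted placeholders that predict each target word. Entry
    (i, j) = 1 means position i may attend to position j.
    """
    n = src_len + tgt_len
    mask = np.zeros((n, n), dtype=np.float32)
    mask[:src_len, :src_len] = 1.0                 # source <-> source (bidirectional)
    for t in range(tgt_len):
        row = src_len + t
        mask[row, :src_len] = 1.0                  # placeholder -> whole source
        mask[row, src_len:src_len + t + 1] = 1.0   # placeholder -> preceding words (+ itself)
    return mask

print(seq2seq_infilling_mask(src_len=3, tgt_len=2))
```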
We release checkpoints for the ERNIE-GEN base model and the ERNIE-GEN large model, both pre-trained on English Wikipedia and BookCorpus (16GB in total). In addition, an ERNIE-GEN large model pre-trained on a 430GB corpus (see Appendix A.1 of the ERNIE-GEN paper for a description of the corpus) is available as well.
We compare the performance of ERNIE-GEN with existing SOTA pre-training models for natural language generation (UniLM, MASS, PEGASUS, BART and T5) on five generation tasks, including abstractive summarization (Gigaword and CNN/DailyMail), question generation (SQuAD), dialogue generation (Persona-Chat) and generative question answering (CoQA).
The results on Gigaword-10k (10K examples of Gigaword) are presented as follows:
Model | Data / Params | Rouge-1 | Rouge-2 | Rouge-L |
---|---|---|---|---|
UniLM | 16G / 340M | 34.21 | 15.28 | 31.54 |
ERNIE-GEN base | 16G / 110M | 33.75 | 15.23 | 31.35 |
ERNIE-GEN large | 16G / 340M | 35.05 | 16.10 | 32.50 |
ERNIE-GEN large (430G) | 430G / 340M | 35.51 | 16.79 | 33.23 |
The results on Gigaword are presented as follows:
Model | Data / Params | Rouge-1 | Rouge-2 | Rouge-L |
---|---|---|---|---|
MASS | 18G / 160M | 38.73 | 19.71 | 35.96 |
BERTSHARE | 16G / 110M | 38.13 | 19.81 | 35.62 |
UniLM | 16G / 340M | 38.45 | 19.45 | 35.75 |
PEGASUS (C4) | 750G / 568M | 38.75 | 19.96 | 36.14 |
PEGASUS (HugeNews) | 3.8T / 568M | 39.12 | 19.86 | 36.24 |
ERNIE-GEN base | 16G / 110M | 38.83 | 20.04 | 36.20 |
ERNIE-GEN large | 16G / 340M | 39.25 | 20.25 | 36.53 |
ERNIE-GEN large (430G) | 430G / 340M | 39.46 | 20.34 | 36.74 |
We preprocess the raw Gigaword dataset following UniLM; the preprocessed data is available here: Gigaword.
The results on CNN/Daily Mail are presented as follows:
Model | Data / Params | Rouge-1 | Rouge-2 | Rouge-L |
---|---|---|---|---|
MASS | 18G / 160M | 42.12 | 19.50 | 39.01 |
UniLM | 16G / 340M | 43.33 | 20.21 | 40.51 |
T5 large | 750G / 340M | 42.50 | 20.68 | 39.75 |
T5 xlarge | 750G / 11B | 43.52 | 21.55 | 40.69 |
BART | 160G / 400M | 44.16 | 21.28 | 40.90 |
PEGASUS (C4) | 750G / 568M | 43.90 | 21.20 | 40.76 |
PEGASUS (HugeNews) | 3.8T / 568M | 44.17 | 21.47 | 41.11 |
ERNIE-GEN base | 16G / 110M | 42.30 | 19.92 | 39.68 |
ERNIE-GEN large | 16G / 340M | 44.02 | 21.17 | 41.26 |
ERNIE-GEN large (430G) | 430G / 340M | 44.31 | 21.35 | 41.60 |
We preprocess the raw CNN/Daily Mail dataset following UniLM; the preprocessed data is available here: CNN/Daily Mail.
The results on the SQuAD 1.1 dataset following the data split in [Du et al., 2017] are presented as follows:
Model | BLEU-4 | METEOR | Rouge-L |
---|---|---|---|
SemQG | 18.37 | 22.65 | 46.68 |
UniLM large (beam size=1) | 22.12 | 25.06 | 51.07 |
ERNIE-GEN base (beam size=1) | 22.28 | 25.13 | 50.38 |
ERNIE-GEN large (beam size=1) | 24.03 | 26.31 | 52.36 |
ERNIE-GEN large (beam size=5) | 25.40 | 26.92 | 52.84 |
ERNIE-GEN large (beam size=5) + (430G) | 25.41 | 26.77 | 52.91 |
The results following the reversed dev-test data split in [Zhao et al., 2018] are presented as follows:
Model | BLEU-4 | METEOR | Rouge-L |
---|---|---|---|
SemQG | 20.76 | 24.20 | 48.91 |
UniLM large (beam size=1) | 23.75 | 25.61 | 52.04 |
ERNIE-GEN base (beam size=1) | 23.52 | 25.61 | 51.45 |
ERNIE-GEN large (beam size=1) | 25.57 | 26.89 | 53.31 |
ERNIE-GEN large (beam size=5) | 26.95 | 27.57 | 53.77 |
ERNIE-GEN large (beam size=5) + (430G) | 27.05 | 27.43 | 53.83 |
*Note that we also report results with a larger beam size of 5.*
The preprocessed data for the question generation task can be downloaded from SQuAD.
A comparison with current state-of-the-art results on the multi-turn conversation task (Persona-Chat) is presented as follows:
Model | BLEU-1 | BLEU-2 | Distinct-1 | Distinct-2 |
---|---|---|---|---|
LIC | 40.5 | 32.0 | 0.019 | 0.113 |
PLATO | 45.8 | 35.7 | 0.012 | 0.064 |
PLATO w/o latent | 40.6 | 31.5 | 0.021 | 0.121 |
ERNIE-GEN large | 46.8 | 36.4 | 0.023 | 0.168 |
The training data can be downloaded from Persona-Chat.
Results on the CoQA development set are presented as follows:
Model | F1-score |
---|---|
Seq2Seq | 27.5 |
PGNet | 45.4 |
UniLM large | 82.5 |
ERNIE-GEN large | 84.5 |
We preprocess the raw CoQA dataset; the preprocessed data is available here: CoQA-preprocessed.
This code base has been tested with Paddle Fluid 1.7 and Python 2.7. Other dependencies of ERNIE-GEN are listed in requirements.txt; you can install them by running:
```sh
pip install -r requirements.txt
```
Please update LD_LIBRARY_PATH with the paths of CUDA, cuDNN and NCCL2 before running ERNIE-GEN. We have put the parameter configurations of the above downstream tasks in configs/; you can easily run fine-tuning through these configuration files. For example, you can fine-tune the ERNIE-GEN base model on Gigaword by running:
```sh
MODEL="base" # base or large or large_430g
TASK="gigaword" # cnndm, coqa, gigaword, squad_qg or persona-chat
sh run_seq2seq.sh ./configs/${MODEL}/${TASK}_conf
```
The training log and the evaluation results are written to log/job.log.0. To fine-tune on your own task data, you can refer to the data format we provide for processing your data, as sketched below.
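For illustration, assuming a UniLM-style layout in which each line holds a tokenized source and target separated by a tab (please inspect the released task data, e.g. the preprocessed Gigaword files, to confirm the exact format before fine-tuning), a minimal converter might look like:

```python
def write_seq2seq_tsv(pairs, path):
    """Write (source, target) pairs as one tab-separated example per line.

    The "source<TAB>target" layout is an assumption modeled on common
    seq2seq fine-tuning data; confirm it against one of the downloaded
    datasets before using this for your own task.
    """
    with open(path, "w") as f:
        for src, tgt in pairs:
            # Strip stray tabs so the two-column format stays parseable.
            f.write(src.replace("\t", " ") + "\t" + tgt.replace("\t", " ") + "\n")

# Hypothetical example: one summarization pair written to train.tsv.
write_seq2seq_tsv(
    [("the quick brown fox jumps over the lazy dog .", "fox jumps over dog .")],
    "train.tsv",
)
```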
Our fine-tuning experiments are carried out on 8 NVIDIA V100 (32GB) GPUs. If your GPU memory is not enough, you can reduce the batch size in the corresponding configuration file.
**NOTICE:** The actual total batch size equals the configured batch size multiplied by the number of GPUs in use; for example, a configured batch size of 8 on 8 GPUs gives a total batch size of 64.
The ERNIE-GEN implementation using dynamic graphs is more concise and flexible; please refer to ERNIE-GEN Dygraph for details.
The ERNIE-GEN code is compatible with the ERNIE 1.0 model. After specifying the model- and data-related parameters in the configuration file, you can use ERNIE 1.0 to fine-tune on Chinese generation tasks.
You can cite the paper as below:
```
@article{xiao2020ernie-gen,
  title={ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation},
  author={Xiao, Dongling and Zhang, Han and Li, Yukun and Sun, Yu and Tian, Hao and Wu, Hua and Wang, Haifeng},
  journal={arXiv preprint arXiv:2001.11314},
  year={2020}
}
```