
English | 简体中文

ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation

For technical description of the algorithm, please see our paper:

ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation

Dongling Xiao*, Han Zhang*, Yukun Li, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang (* : equal contribution)

Preprint January 2020

Accepted by IJCAI-2020


ERNIE-GEN is a multi-flow language generation framework for both pre-training and fine-tuning. We propose a novel span-by-span generation pre-training task that enables the model to generate a semantically complete span at each step rather than a single word, in light of the fact that entities and phrases in human writing are organized in a coherent manner. An infilling generation mechanism and a noise-aware generation method are incorporated into both pre-training and fine-tuning to alleviate the problem of exposure bias. In the pre-training phase, ERNIE-GEN adopts a multi-granularity target fragment sampling strategy that forces the decoder to rely more on the encoder representations than on the previously generated words, enhancing the correlation between encoder and decoder.

Proposed Generation Framework

We propose three novel methods to enhance the language generation ability:

  • Span-by-span Generation Pre-training Task: to enable the model to generate a semantically complete span at each step rather than a single word.
  • Infilling Generation and Noise-aware Generation: to alleviate the problem of exposure bias.
  • Multi-Granularity Target Fragments: to enhance the correlation between encoder and decoder during pre-training.

Specifically, the span-by-span generation task and the word-by-word generation task, both based on the infilling generation mechanism, are implemented by a carefully designed Multi-Flow Attention architecture, as shown below.

[Figure: Multi-Flow Attention architecture]

Pre-trained Models

We release checkpoints for the ERNIE-GEN base and ERNIE-GEN large models, both pre-trained on English Wikipedia and BookCorpus (16GB in total). In addition, an ERNIE-GEN large model pre-trained on a 430GB corpus (see Appendix A.1 of the ERNIE-GEN paper for a description of the corpus) is also available.

Fine-tuning on Downstream Tasks

We compare the performance of ERNIE-GEN with the existing SOTA pre-training models for natural language generation (UniLM, MASS, PEGASUS, BART and T5) on 5 generation tasks, covering abstractive summarization (Gigaword and CNN/DailyMail), question generation (SQuAD), dialogue generation (Persona-Chat) and generative question answering (CoQA).

Abstractive Summarization

  • Gigaword

The results on Gigaword-10k (10K examples of Gigaword) are presented as follows:

| Model | Data / Params | Rouge-1 | Rouge-2 | Rouge-L |
| --- | --- | --- | --- | --- |
| UniLM | 16G / 340M | 34.21 | 15.28 | 31.54 |
| ERNIE-GEN base | 16G / 110M | 33.75 | 15.23 | 31.35 |
| ERNIE-GEN large | 16G / 340M | 35.05 | 16.10 | 32.50 |
| ERNIE-GEN large (430G) | 430G / 340M | 35.51 | 16.79 | 33.23 |

The results on Gigaword are presented as follows:

| Model | Data / Params | Rouge-1 | Rouge-2 | Rouge-L |
| --- | --- | --- | --- | --- |
| MASS | 18G / 160M | 38.73 | 19.71 | 35.96 |
| BERTSHARE | 16G / 110M | 38.13 | 19.81 | 35.62 |
| UniLM | 16G / 340M | 38.45 | 19.45 | 35.75 |
| PEGASUS (C4) | 750G / 568M | 38.75 | 19.96 | 36.14 |
| PEGASUS (HugeNews) | 3.8T / 568M | 39.12 | 19.86 | 36.24 |
| ERNIE-GEN base | 16G / 110M | 38.83 | 20.04 | 36.20 |
| ERNIE-GEN large | 16G / 340M | 39.25 | 20.25 | 36.53 |
| ERNIE-GEN large (430G) | 430G / 340M | 39.46 | 20.34 | 36.74 |

We preprocess the raw Gigaword dataset following UniLM; the preprocessed data is available at Gigaword.

  • CNN/Daily Mail

The results on CNN/Daily Mail are presented as follows:

| Model | Data / Params | Rouge-1 | Rouge-2 | Rouge-L |
| --- | --- | --- | --- | --- |
| MASS | 18G / 160M | 42.12 | 19.50 | 39.01 |
| UniLM | 16G / 340M | 43.33 | 20.21 | 40.51 |
| T5 large | 750G / 340M | 42.50 | 20.68 | 39.75 |
| T5 xlarge | 750G / 11B | 43.52 | 21.55 | 40.69 |
| BART | 160G / 400M | 44.16 | 21.28 | 40.90 |
| PEGASUS (C4) | 750G / 568M | 43.90 | 21.20 | 40.76 |
| PEGASUS (HugeNews) | 3.8T / 568M | 44.17 | 21.47 | 41.11 |
| ERNIE-GEN base | 16G / 110M | 42.30 | 19.92 | 39.68 |
| ERNIE-GEN large | 16G / 340M | 44.02 | 21.17 | 41.26 |
| ERNIE-GEN large (430G) | 430G / 340M | 44.31 | 21.35 | 41.60 |

We preprocess the raw CNN/Daily Mail dataset following UniLM; the preprocessed data is available at CNN/Daily Mail.

Question Generation

  • SQuAD

The results on the SQuAD 1.1 dataset following the data split in [Du et al., 2017] are presented as follows:

| Model | BLEU-4 | METEOR | Rouge-L |
| --- | --- | --- | --- |
| SemQG | 18.37 | 22.65 | 46.68 |
| UniLM large (beam size=1) | 22.12 | 25.06 | 51.07 |
| ERNIE-GEN base (beam size=1) | 22.28 | 25.13 | 50.38 |
| ERNIE-GEN large (beam size=1) | 24.03 | 26.31 | 52.36 |
| ERNIE-GEN large (beam size=5) | 25.40 | 26.92 | 52.84 |
| ERNIE-GEN large (beam size=5) + (430G) | 25.41 | 26.77 | 52.91 |

The results following the reversed dev-test data split in [Zhao et al., 2018] are presented as follows:

| Model | BLEU-4 | METEOR | Rouge-L |
| --- | --- | --- | --- |
| SemQG | 20.76 | 24.20 | 48.91 |
| UniLM large (beam size=1) | 23.75 | 25.61 | 52.04 |
| ERNIE-GEN base (beam size=1) | 23.52 | 25.61 | 51.45 |
| ERNIE-GEN large (beam size=1) | 25.57 | 26.89 | 53.31 |
| ERNIE-GEN large (beam size=5) | 26.95 | 27.57 | 53.77 |
| ERNIE-GEN large (beam size=5) + (430G) | 27.05 | 27.43 | 53.83 |

Note that we also report results with a larger beam size of 5.

The preprocessed data for the question generation task can be downloaded from SQuAD.

Generative Dialogue Response

  • Persona-Chat

A comparison with the current state-of-the-art results on the multi-turn conversation task (Persona-Chat) is presented as follows:

| Model | BLEU-1 | BLEU-2 | Distinct-1 | Distinct-2 |
| --- | --- | --- | --- | --- |
| LIC | 40.5 | 32.0 | 0.019 | 0.113 |
| PLATO | 45.8 | 35.7 | 0.012 | 0.064 |
| PLATO w/o latent | 40.6 | 31.5 | 0.021 | 0.121 |
| ERNIE-GEN large | 46.8 | 36.4 | 0.023 | 0.168 |

The training data can be downloaded from Persona-Chat.

Generative Question Answering

  • CoQA

Results on the CoQA development set are presented as follows:

| Model | F1-score |
| --- | --- |
| Seq2Seq | 27.5 |
| PGNet | 45.4 |
| UniLM large | 82.5 |
| ERNIE-GEN large | 84.5 |

We preprocess the raw CoQA dataset; the preprocessed data is available at CoQA-preprocessed.

Usage

Install PaddlePaddle

This code base has been tested with Paddle Fluid 1.7 and Python 2.7. The other dependencies of ERNIE-GEN are listed in requirements.txt; you can install them with

pip install -r requirements.txt
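
PaddlePaddle itself is typically installed separately from the other requirements. A minimal sketch for a GPU installation matching the tested version, assuming a CUDA-enabled machine (the exact version tag and build are assumptions and should be checked against the PaddlePaddle installation guide for your CUDA/cuDNN setup):

# GPU build matching the tested Paddle Fluid 1.7 series (version tag is an assumption)
pip install paddlepaddle-gpu==1.7.2
# CPU-only alternative
# pip install paddlepaddle==1.7.2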

Fine-tuning

Please update LD_LIBRARY_PATH with the paths to CUDA, cuDNN and NCCL2 before running ERNIE-GEN (a sketch of a typical setup follows the example below). We provide the parameter configurations of the above downstream tasks in config/; you can easily run fine-tuning through these configuration files. For example, you can fine-tune the ERNIE-GEN base model on Gigaword with

MODEL="base"      # base or large or large_430g
TASK="gigaword"   # cnndm, coqa, gigaword, squad_qg or persona-chat
sh run_seq2seq.sh ./configs/${MODEL}/${TASK}_conf
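
Before launching run_seq2seq.sh, make sure the CUDA, cuDNN and NCCL2 libraries mentioned above can be found through LD_LIBRARY_PATH. A minimal sketch, assuming typical installation paths (replace them with the locations on your machine):

# Illustrative paths only; point these at your actual CUDA, cuDNN and NCCL2 installations
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/local/cudnn/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/local/nccl/lib:$LD_LIBRARY_PATH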

The training log and evaluation results are written to log/job.log.0. To fine-tune on your own task data, refer to the data format we provide when processing your data; a rough sketch is given below.
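
The sample data released for each task is the authoritative reference for the input format. As a rough, hypothetical sketch, seq2seq datasets of this kind are commonly stored one example per line with the source and target texts separated by a tab; the file name and layout below are assumptions, so follow the released task data for the exact format:

# Hypothetical summarization example: source document <TAB> target summary
# (file name and layout are assumptions, not taken from the repository)
printf 'the us space shuttle atlantis docked with the mir space station on tuesday ...\tatlantis docks with mir\n' >> train.tsv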

Our fine-tuning experiments are carried out on 8 NVIDIA V100 (32GB) GPUs. If your GPU memory is not enough, you can reduce the batch size in the corresponding configuration file.

NOTICE: The actual total batch size is the configured batch size multiplied by the number of GPUs used. For example, a configured batch size of 8 on 8 GPUs gives an effective batch size of 64.

Employ Dynamic Computation Graph

The dynamic-graph implementation of ERNIE-GEN is more concise and flexible; please refer to ERNIE-GEN Dygraph for details.

ERNIE 1.0 is Available for Chinese Generation Tasks

The ERNIE-GEN code is compatible with the ERNIE 1.0 model. After specifying the model- and data-related parameters in the configuration file, you can use ERNIE 1.0 to fine-tune on Chinese generation tasks.

Citation

You can cite the paper as below:

@article{xiao2020ernie-gen,
  title={ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation},
  author={Xiao, Dongling and Zhang, Han and Li, Yukun and Sun, Yu and Tian, Hao and Wu, Hua and Wang, Haifeng},
  journal={arXiv preprint arXiv:2001.11314},
  year={2020}
}