
ERNIE-Gram: Pre-Training with Explicitly N-Gram Masked Language Modeling for Natural Language Understanding

For technical description of the algorithm, please see our paper:

ERNIE-Gram: Pre-Training with Explicitly N-Gram Masked Language Modeling for Natural Language Understanding

Dongling Xiao, Yu-Kun Li, Han Zhang, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang

Accepted by NAACL-HLT 2021


ERNIE-Gram is an explicit n-gram masking and predicting method that removes the limitations of previous contiguous-masking strategies and fully incorporates coarse-grained linguistic information into pre-training. To model the intra-dependencies and inter-relations of coarse-grained linguistic information, n-grams are masked and predicted directly using explicit n-gram identities rather than contiguous sequences of n tokens. Furthermore, ERNIE-Gram employs a generator model to sample plausible n-gram identities as optional n-gram masks and predicts them in both coarse-grained and fine-grained manners, enabling comprehensive n-gram prediction and relation modeling.

Proposed Methods

We construct three novel methods to model the intra-dependencies and inter-relations of coarse-grained linguistic information (a toy sketch follows the figure below):

  • Explicitly N-gram Masked Language Modeling: each n-gram is masked with a single [MASK] symbol and predicted directly using its explicit n-gram identity rather than a sequence of tokens.
  • Comprehensive N-gram Prediction: masked n-grams are simultaneously predicted in a coarse-grained (explicit n-gram identities) and a fine-grained (contained token identities) manner.
  • Enhanced N-gram Relation Modeling: n-grams are masked with plausible n-gram identities sampled from a generator model and then recovered to the original n-grams.

[Figure: ernie-gram]
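
To make the masking scheme concrete, here is a minimal, self-contained sketch of explicit n-gram masking with comprehensive (coarse-grained plus fine-grained) targets. It is illustrative only, not the implementation in this repository; the toy n-gram lexicon, token vocabulary and ids are invented for the example.

import random

# Toy n-gram lexicon mapping multi-token n-grams to explicit n-gram ids.
# In ERNIE-Gram these come from the pre-training vocabulary; here they are made up.
NGRAM_VOCAB = {("new", "york"): 30001, ("machine", "learning"): 30002}
TOKEN_VOCAB = {"new": 5, "york": 6, "machine": 7, "learning": 8, "is": 9, "fun": 10}
MASK_ID = 103

def mask_ngrams(tokens, mask_prob=0.15):
    # Explicitly n-gram masked LM: a selected n-gram is replaced by a single
    # [MASK] symbol; its targets are the explicit n-gram id (coarse-grained)
    # and the contained token ids (fine-grained), i.e. comprehensive prediction.
    input_ids, targets = [], []
    i = 0
    while i < len(tokens):
        bigram = tuple(tokens[i:i + 2])
        if bigram in NGRAM_VOCAB and random.random() < mask_prob:
            input_ids.append(MASK_ID)  # one [MASK] for the whole n-gram
            targets.append({
                "position": len(input_ids) - 1,
                "ngram_id": NGRAM_VOCAB[bigram],                # coarse-grained target
                "token_ids": [TOKEN_VOCAB[t] for t in bigram],  # fine-grained targets
            })
            i += 2
        else:
            input_ids.append(TOKEN_VOCAB[tokens[i]])
            i += 1
    return input_ids, targets

print(mask_ngrams(["machine", "learning", "is", "fun"], mask_prob=1.0))
# -> ([103, 9, 10], [{'position': 0, 'ngram_id': 30002, 'token_ids': [7, 8]}])

In the full objective, the fine-grained token targets are predicted from the same single [MASK] position, and the enhanced relation-modeling variant replaces the [MASK] symbol with a plausible n-gram sampled from a small generator model, which the network then learns to recover.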

Pre-trained Models

We release checkpoints for the ERNIE-Gram 16G and ERNIE-Gram 160G models, which are pre-trained on the base-scale corpus (16GB of text, as used for BERT) and the large-scale corpus (160GB of text, as used for RoBERTa), respectively.

  • ERNIE-Gram 16G (lowercased | 12-layer, 768-hidden, 12-heads, 110M parameters)
  • ERNIE-Gram 160G (lowercased | 12-layer, 768-hidden, 12-heads, 110M parameters)

Fine-tuning on Downstream Tasks

We compare the performance of ERNIE-Gram with existing SOTA pre-training models for natural language understanding (MPNet, UniLMv2, ELECTRA, RoBERTa and XLNet) on several language understanding tasks, including the GLUE benchmark (General Language Understanding Evaluation) and SQuAD (Stanford Question Answering Dataset).

GLUE benchmark

The General Language Understanding Evaluation (GLUE) is a multi-task benchmark consisting of various NLU tasks, which contains 1) pairwise classification tasks such as language inference (MNLI, RTE), question answering (QNLI) and paraphrase detection (QQP, MRPC); 2) single-sentence classification tasks such as linguistic acceptability (CoLA) and sentiment analysis (SST-2); and 3) a text similarity task (STS-B).

The results on GLUE are presented as follows:

| Tasks      | MNLI | QNLI | QQP  | SST-2 | CoLA | MRPC | RTE  | STS-B | AVG  |
|------------|------|------|------|-------|------|------|------|-------|------|
| Metrics    | ACC  | ACC  | ACC  | ACC   | MCC  | ACC  | ACC  | PCC   | AVG  |
| XLNet      | 86.8 | 91.7 | 91.4 | 94.7  | 60.2 | 88.2 | 74.0 | 89.5  | 84.5 |
| RoBERTa    | 87.6 | 92.8 | 91.9 | 94.8  | 63.6 | 90.2 | 78.7 | 91.2  | 86.4 |
| ELECTRA    | 88.8 | 93.2 | 91.5 | 95.2  | 67.7 | 89.5 | 82.7 | 91.2  | 87.5 |
| UniLMv2    | 88.5 | 93.5 | 91.7 | 95.1  | 65.2 | 91.8 | 81.3 | 91.0  | 87.3 |
| MPNet      | 88.5 | 93.3 | 91.9 | 95.4  | 65.0 | 91.5 | 85.2 | 90.9  | 87.7 |
| ERNIE-Gram | 89.1 | 93.2 | 92.2 | 95.6  | 68.6 | 90.7 | 83.8 | 91.3  | 88.1 |

Download the GLUE data by running this script and unpack it to a directory ${TASK_DATA_PATH}.

After the dataset is downloaded, run sh ./utils/glue_data_process.sh $TASK_DATA_PATH to convert the data format for training. If everything goes well, a folder named data will be created containing all of the converted datasets.

SQuAD benchmark

The Stanford Question Answering (SQuAD) tasks are designed to extract the answer span within the given passage conditioned on the question. We conduct experiments on SQuAD1.1 and SQuAD2.0 by adding a classification layer on the sequence outputs of ERNIE-Gram and predicting whether each token is the start or end position of the answer span.
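
A minimal sketch of such a span-prediction head is shown below, assuming PaddlePaddle 2.x and a 768-dimensional hidden size; it is illustrative only and not the exact layer used by the fine-tuning scripts in this repository.

import paddle
import paddle.nn as nn
import paddle.nn.functional as F

class SpanHead(nn.Layer):
    # Maps each token's output vector to start/end logits for extractive QA.
    def __init__(self, hidden_size=768):
        super().__init__()
        self.classifier = nn.Linear(hidden_size, 2)  # 2 logits per token: start, end

    def forward(self, sequence_output, start_positions=None, end_positions=None):
        # sequence_output: [batch, seq_len, hidden_size] from the ERNIE-Gram encoder
        logits = self.classifier(sequence_output)              # [batch, seq_len, 2]
        start_logits, end_logits = paddle.split(logits, 2, axis=-1)
        start_logits = paddle.squeeze(start_logits, axis=-1)   # [batch, seq_len]
        end_logits = paddle.squeeze(end_logits, axis=-1)
        if start_positions is None:
            return start_logits, end_logits
        # Training loss: cross entropy over token positions for the start and end indices.
        loss = (F.cross_entropy(start_logits, start_positions) +
                F.cross_entropy(end_logits, end_positions)) / 2
        return loss

At prediction time, the answer span is read off from the best valid (start, end) pair of the two logit vectors.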

The results on SQuAD are presented as follows:

| Tasks      | SQuADv1     | SQuADv2     |
|------------|-------------|-------------|
| Metrics    | EM / F1     | EM / F1     |
| RoBERTa    | 84.6 / 91.5 | 80.5 / 83.7 |
| XLNet      | - / -       | 80.2 / -    |
| ELECTRA    | 86.8 / -    | 80.5 / -    |
| MPNet      | 86.8 / 92.5 | 82.8 / 85.6 |
| UniLMv2    | 87.1 / 93.1 | 83.3 / 86.1 |
| ERNIE-Gram | 87.2 / 93.2 | 84.1 / 87.1 |

The preprocessed data for SQuAD can be downloaded from SQuADv1 and SQuADv2. Please unpack them to ./data.

The preprocessed data for tasks involving long text can be downloaded from RACE, IMDB and AG's News. Please unpack them to ./data.

Usage

Install PaddlePaddle

This codebase has been tested with PaddlePaddle 2.0.0 and later. You can install PaddlePaddle by following the instructions on this site.
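
Once PaddlePaddle is installed, you can quickly verify the setup from Python; paddle.utils.run_check() is part of PaddlePaddle 2.x:

import paddle

print(paddle.__version__)   # should print 2.0.0 or later
paddle.utils.run_check()    # runs a small program to verify the installation (and GPU, if available)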

Fine-tuning

Please add the CUDA, cuDNN and NCCL2 library paths to LD_LIBRARY_PATH before running ERNIE-Gram. The parameter configurations for the fine-tuning tasks are in ./task_conf, so you can easily run fine-tuning through these configuration files. For example, you can fine-tune the ERNIE-Gram model on RTE with:

TASK="RTE"   # MNLI, SST-2, CoLA, SQuADv1..., please see ./task_conf
MODEL_PATH="./ernie-gram-160g" #path for pre-trained models
sh run.sh ${TASK} ${MODEL_PATH}

The training log and evaluation results are written to log/*job.log.0. To fine-tune on your own task data, refer to the data format we provide and process your data accordingly.

Employ Dynamic Computation Graph

The ERNIE-Gram-zh code based on the dynamic computation graph is more concise and flexible; please refer to ERNIE-Gram Dygraph for details.

Citation

You can cite the paper as follows:

@article{xiao2021ernie-gram,
  title={ERNIE-Gram: Pre-Training with Explicitly N-Gram Masked Language Modeling for Natural Language Understanding},
  author={Xiao, Dongling and Li, Yukun and Zhang, Han and Sun, Yu and Tian, Hao and Wu, Hua and Wang, Haifeng},
  journal={arXiv preprint arXiv:2010.12148},
  year={2021}
}

Communication

  • ERNIE homepage
  • GitHub Issues: bug reports, feature requests, installation issues, usage issues, etc.
  • QQ discussion group: 760439550 (ERNIE discussion group).
  • QQ discussion group: 958422639 (ERNIE discussion group-v2).
  • Forums: discuss implementations, research, etc.
