堕落的牧羊人 / MiniRBT

License: Apache-2.0

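In natural language processing, pre-trained language models have become a key foundational technology. To further advance research in Chinese information processing, the Joint Laboratory of HIT and iFLYTEK Research (HFL) has released MiniRBT, a small Chinese pre-trained model. MiniRBT was built with HFL's in-house knowledge distillation toolkit TextBrewer and combines Whole Word Masking with Knowledge Distillation.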


Chinese LERT | Chinese-English PERT | Chinese MacBERT | Chinese ELECTRA | Chinese XLNet | Chinese BERT | Knowledge distillation toolkit TextBrewer | Model pruning toolkit TextPruner

More resources released by HFL: https://github.com/iflytek/HFL-Anthology

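Contents

Section | Description
Introduction | Techniques used in the small pre-trained models
Model Download | Download links for the small pre-trained models
Quick Load | How to load the models quickly with 🤗Transformers
Model Comparison | Parameter comparison of the models in this repository
Distillation Settings | Hyperparameters for pre-training distillation
Chinese Baseline Results | Results on several Chinese baseline tasks
Two-stage Distillation Comparison | Comparison of two-stage and one-stage distillation
Pre-training | How to use the pre-training code
Usage Tips | Suggestions for using the small Chinese pre-trained models
FAQ | Frequently asked questions
Citation | Technical report of this project
References | References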

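Introduction

Current pre-trained models have large parameter counts, long inference times, and are hard to deploy. To reduce the parameter count and storage footprint and to speed up inference, we release MiniRBT, a practical and broadly applicable small Chinese pre-trained model. It uses the following techniques:

  • Whole Word Masking (WWM): a strategy for generating training samples during pre-training. WordPiece-based tokenization splits a complete word into several sub-words, and when training samples are generated, these sub-words are masked independently at random (replaced with [MASK], kept unchanged, or replaced with another random token). With WWM, if some WordPiece sub-words of a word are masked, the other sub-words of the same word are masked as well. For more details and examples, see Chinese-BERT-wwm. In this work we use HIT's LTP as the word segmentation tool.

  • Two-stage distillation: instead of distilling the teacher directly into the student, we use an intermediate model. The teacher is first distilled into a teacher assistant (TA), and the student is then obtained by distilling from the TA, which improves the student's downstream performance. The comparison of two-stage and one-stage distillation on downstream tasks below shows that two-stage distillation achieves better results.

  • Narrow and deep student models: compared with wide and shallow architectures such as TinyBERT (4 layers, hidden size 312), we use narrow and deep architectures for the MiniRBT students (6 layers, hidden size 256 or 288). Experiments show that the narrow and deep architecture performs better on downstream tasks.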
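The word-level masking described in the first bullet can be sketched as follows. This is a minimal pure-Python illustration, not the project's pre-processing code: it assumes word boundaries are marked with the ## convention explained in the FAQ, a 15% masking budget by default, and the usual 80/10/10 split between [MASK], a random token, and the original token, drawn per token. All function names are hypothetical.

```python
import random

MASK_TOKEN = "[MASK]"


def strip_marker(token):
    """Drop the '##' continuation marker; it only encodes word boundaries."""
    return token[2:] if token.startswith("##") else token


def group_whole_words(tokens):
    """Group token positions into words: a '##' token continues the previous word."""
    words = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and words:
            words[-1].append(i)
        else:
            words.append([i])
    return words


def whole_word_mask(tokens, vocab, mask_prob=0.15, rng=None):
    """Return (inputs, labels); labels[i] holds the original token at masked positions, else None."""
    rng = rng or random.Random()
    inputs = [strip_marker(t) for t in tokens]
    labels = [None] * len(tokens)
    budget = max(1, round(len(tokens) * mask_prob))  # number of tokens to predict
    words = group_whole_words(tokens)
    rng.shuffle(words)  # pick whole words in random order
    covered = 0
    for word in words:
        if covered >= budget:
            break
        if covered + len(word) > budget:
            continue  # skip words that would overshoot the budget
        for i in word:  # every sub-token of a chosen word is masked together
            labels[i] = inputs[i]
            r = rng.random()
            if r < 0.8:
                inputs[i] = MASK_TOKEN  # 80%: replace with [MASK]
            elif r < 0.9:
                inputs[i] = rng.choice(vocab)  # 10%: replace with a random token
            # remaining 10%: keep the original token
        covered += len(word)
    return inputs, labels


tokens = ["天", "##气", "很", "好"]  # 天气 is one word, so 天 and 气 are masked together
inputs, labels = whole_word_mask(tokens, vocab=["天", "气", "很", "好"], mask_prob=0.5, rng=random.Random(0))
print(inputs, labels)
```

The word, not the token, is the unit of selection, so 天 and 气 either both receive labels or neither does.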

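MiniRBT currently comes in two variants, MiniRBT-H256 and MiniRBT-H288, with hidden sizes of 256 and 288 respectively. Both are 6-layer Transformers obtained through two-stage distillation. For easier comparison, we also provide RBT4-H312, which has the TinyBERT architecture.

We will release a full technical report soon; stay tuned.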

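Model Download

Model | Layers | Hidden size | Attention heads | Params | Download (Google) | Download (Baidu Netdisk)
MiniRBT-H288 | 6 | 288 | 8 | 12.3M | [PyTorch] | [PyTorch] (password: 7313)
MiniRBT-H256 | 6 | 256 | 8 | 10.4M | [PyTorch] | [PyTorch] (password: iy53)
RBT4-H312 (same size as TinyBERT) | 4 | 312 | 12 | 11.4M | [PyTorch] | [PyTorch] (password: ssdw)

The models can also be downloaded directly from Hugging Face (PyTorch & TF2): https://huggingface.co/hfl

To download: click the model you need → open the "Files and versions" tab → download the corresponding model files.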

Quick Load

Using Huggingface-Transformers

With the 🤗transformers library, the models above can be loaded easily.

from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("MODEL_NAME")
model = BertModel.from_pretrained("MODEL_NAME")

Note: all models in this repository must be loaded with BertTokenizer and BertModel. Do not use RobertaTokenizer/RobertaModel!

The corresponding MODEL_NAME values are:

Model | MODEL_NAME
MiniRBT-H256 | "hfl/minirbt-h256"
MiniRBT-H288 | "hfl/minirbt-h288"
RBT4-H312 | "hfl/rbt4-h312"

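Model Comparison

The architecture details and parameter counts of the models are summarized below.

Model | Layers | Hidden size | FFN size | Attention heads | Params | Params (excl. embeddings) | Speedup
RoBERTa | 12 | 768 | 3072 | 12 | 102.3M (100%) | 85.7M (100%) | 1x
RBT6 (KD) | 6 | 768 | 3072 | 12 | 59.76M (58.4%) | 43.14M (50.3%) | 1.7x
RBT3 | 3 | 768 | 3072 | 12 | 38.5M (37.6%) | 21.9M (25.6%) | 2.8x
RBT4-H312 | 4 | 312 | 1200 | 12 | 11.4M (11.1%) | 4.7M (5.5%) | 6.8x
MiniRBT-H256 | 6 | 256 | 1024 | 8 | 10.4M (10.2%) | 4.8M (5.6%) | 6.8x
MiniRBT-H288 | 6 | 288 | 1152 | 8 | 12.3M (12.0%) | 6.1M (7.1%) | 5.7x

Percentages in parentheses are relative to the original base model (RoBERTa-wwm-ext).

  • The name RBT is formed from the initials of the three syllables of RoBERTa.
  • RBT3: initialized from 3 layers of RoBERTa-wwm-ext and further pre-trained. For details, see the small-model section of Chinese-BERT-wwm.
  • RBT6 (KD): the teacher assistant, initialized from 6 layers of RoBERTa-wwm-ext and obtained by distilling RoBERTa-wwm-ext.
  • MiniRBT-*: obtained by distilling the teacher assistant RBT6 (KD).
  • RBT4-H312: obtained by distilling RoBERTa directly.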
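The parameter counts in the table can be roughly reproduced from the layer, hidden, and FFN sizes alone. The sketch below assumes a standard BERT encoder with a 21128-token vocabulary (the usual Chinese BERT vocabulary), 512 position embeddings, 2 token types, and a pooler, and it excludes any pre-training head. The number of attention heads does not change the count. Under these assumptions its results land within about 0.1M of the table; small differences may come from counting conventions not stated here. The function name is hypothetical.

```python
VOCAB_SIZE = 21128    # assumption: standard Chinese BERT/RoBERTa-wwm vocabulary
MAX_POSITIONS = 512   # assumption: learned position embeddings
TYPE_VOCAB_SIZE = 2   # assumption: segment A/B embeddings

MODELS = {  # (layers, hidden size, FFN size), taken from the comparison table
    "RoBERTa": (12, 768, 3072),
    "RBT6 (KD)": (6, 768, 3072),
    "RBT3": (3, 768, 3072),
    "RBT4-H312": (4, 312, 1200),
    "MiniRBT-H256": (6, 256, 1024),
    "MiniRBT-H288": (6, 288, 1152),
}


def bert_param_counts(layers, hidden, ffn):
    """Return (total, non_embedding) parameter counts of a BERT encoder plus pooler."""
    embeddings = (VOCAB_SIZE + MAX_POSITIONS + TYPE_VOCAB_SIZE) * hidden + 2 * hidden  # + LayerNorm
    attention = 4 * (hidden * hidden + hidden) + 2 * hidden  # Q, K, V, output projection + LayerNorm
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden) + 2 * hidden  # two dense layers + LayerNorm
    pooler = hidden * hidden + hidden
    non_embedding = layers * (attention + feed_forward) + pooler
    return embeddings + non_embedding, non_embedding


for name, (layers, hidden, ffn) in MODELS.items():
    total, non_embedding = bert_param_counts(layers, hidden, ffn)
    print(f"{name}: {total / 1e6:.1f}M total, {non_embedding / 1e6:.1f}M excluding embeddings")
```

The word embedding matrix (21128 × hidden) dominates the small models, which is why the non-embedding column differs so much from the total.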

Distillation Settings

Model | Batch Size | Training Steps | Learning Rate | Temperature | Teacher
RBT6 (KD) | 4096 | 100K (MAX512) | 4e-4 | 8 | RoBERTa-wwm-ext
RBT4-H312 | 4096 | 100K (MAX512) | 4e-4 | 8 | RoBERTa-wwm-ext
MiniRBT-H256 | 4096 | 100K (MAX512) | 4e-4 | 8 | RBT6 (KD)
MiniRBT-H288 | 4096 | 100K (MAX512) | 4e-4 | 8 | RBT6 (KD)
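All four models were distilled with temperature 8. To show what the temperature does, here is a minimal sketch of the standard soft-label distillation loss: the cross-entropy between temperature-softened teacher and student distributions. It illustrates the general technique only; TextBrewer's actual loss options and any loss weighting used in this project are not described here, and the function names are hypothetical.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Log-probabilities of logits / temperature, computed stably via log-sum-exp."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_norm for z in scaled]


def softmax(logits, temperature=1.0):
    return [math.exp(lp) for lp in log_softmax(logits, temperature)]


def soft_cross_entropy(student_logits, teacher_logits, temperature=8.0):
    """Cross-entropy of the softened student distribution against the softened teacher one."""
    p_teacher = softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    return -sum(pt * lps for pt, lps in zip(p_teacher, log_p_student))


teacher = [4.0, 1.0, -2.0]
print(softmax(teacher, 1.0))  # sharp: almost all mass on the first class
print(softmax(teacher, 8.0))  # softer: the ranking of the other classes becomes visible
print(soft_cross_entropy([2.0, 1.5, -1.0], teacher, temperature=8.0))
```

A higher temperature spreads the teacher's probability mass across classes, so the student also learns from the relative scores of the non-target classes.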

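Chinese Baseline Results

To compare baseline performance, we evaluated the models on the following Chinese datasets: CMRC 2018, DRCD, OCNLI, LCQMC, BQ Corpus, TNEWS, and ChnSentiCorp.

A learning-rate search confirmed that models with fewer parameters need higher learning rates and more training iterations. The learning rate used for each dataset is listed below.

Best learning rates:

Model | CMRC 2018 | DRCD | OCNLI | LCQMC | BQ Corpus | TNEWS | ChnSentiCorp
RoBERTa | 3e-5 | 3e-5 | 2e-5 | 2e-5 | 3e-5 | 2e-5 | 2e-5
* | 1e-4 | 1e-4 | 5e-5 | 1e-4 | 1e-4 | 1e-4 | 1e-4

* denotes all small pre-trained models (RBT3, RBT4-H312, MiniRBT-H256, MiniRBT-H288).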

Results (dev set):

Model | CMRC 2018 | DRCD | OCNLI | LCQMC | BQ Corpus | TNEWS | ChnSentiCorp
RoBERTa | 87.3/68 | 94.4/89.4 | 76.58 | 89.07 | 85.76 | 57.66 | 94.89
RBT6 (KD) | 84.4/64.3 | 91.27/84.93 | 72.83 | 88.52 | 84.54 | 55.52 | 93.42
RBT3 | 80.3/57.73 | 85.87/77.63 | 69.80 | 87.3 | 84.47 | 55.39 | 93.86
RBT4-H312 | 77.9/54.93 | 84.13/75.07 | 68.50 | 85.49 | 83.42 | 54.15 | 93.31
MiniRBT-H256 | 78.47/56.27 | 86.83/78.57 | 68.73 | 86.81 | 83.68 | 54.45 | 92.97
MiniRBT-H288 | 80.53/58.83 | 87.1/78.73 | 68.32 | 86.38 | 83.77 | 54.62 | 92.83

Relative performance:

Model | CMRC 2018 | DRCD | OCNLI | LCQMC | BQ Corpus | TNEWS | ChnSentiCorp
RoBERTa | 100%/100% | 100%/100% | 100% | 100% | 100% | 100% | 100%
RBT6 (KD) | 96.7%/94.6% | 96.7%/95% | 95.1% | 99.4% | 98.6% | 96.3% | 98.5%
RBT3 | 92%/84.9% | 91%/86.8% | 91.1% | 98% | 98.5% | 96.1% | 98.9%
RBT4-H312 | 89.2%/80.8% | 89.1%/84% | 89.4% | 96% | 97.3% | 93.9% | 98.3%
MiniRBT-H256 | 89.9%/82.8% | 92%/87.9% | 89.7% | 97.5% | 97.6% | 94.4% | 98%
MiniRBT-H288 | 92.2%/86.5% | 92.3%/88.1% | 89.2% | 97% | 97.7% | 94.7% | 97.8%

Note: to ensure reliable results, each model was trained for 2, 3, 5, and 10 epochs, with each setting run at least 3 times with different random seeds, and the maximum of the average scores is reported.
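The relative-performance table divides each score by the RoBERTa score on the same metric, and the note describes how each reported number is selected. Both steps are sketched below. Reading "maximum of the average scores" as "average over seeds for each epoch setting, then take the best setting" is our interpretation, and the function names are hypothetical.

```python
from statistics import mean


def relative_score(score, baseline_score):
    """Score as a percentage of the RoBERTa baseline on the same metric."""
    return 100.0 * score / baseline_score


def best_mean_score(runs_by_epochs):
    """runs_by_epochs maps an epoch setting (e.g. 2, 3, 5, 10) to dev scores from runs with
    different random seeds; returns the setting with the highest mean and that mean."""
    best_epochs = max(runs_by_epochs, key=lambda epochs: mean(runs_by_epochs[epochs]))
    return best_epochs, mean(runs_by_epochs[best_epochs])


# MiniRBT-H256 vs RoBERTa, first CMRC 2018 metric from the dev-set table
print(f"{relative_score(78.47, 87.3):.1f}%")  # → 89.9%
```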

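Two-stage Distillation Comparison

We compared two-stage distillation (RoBERTa → RBT6 (KD) → MiniRBT-H256) with one-stage distillation (RoBERTa → MiniRBT-H256). The results show that two-stage distillation performs better.

Model | CMRC 2018 | OCNLI | LCQMC | BQ Corpus | TNEWS
MiniRBT-H256 (two-stage) | 77.97/54.6 | 69.11 | 86.58 | 83.74 | 54.12
MiniRBT-H256 (one-stage) | 77.57/54.27 | 68.32 | 86.39 | 83.55 | 53.94

Note: the pre-trained models in this table were distilled for 30K steps, so they differ from the models reported in Chinese Baseline Results.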

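Pre-training

We use the TextBrewer toolkit to implement knowledge-distillation pre-training. The complete training code is in the pretraining directory.

Code Structure

  • dataset:
    • train: training set
    • dev: validation set
  • distill_configs: student model architecture config files
  • jsons: dataset config files
  • pretrained_model_path:
    • ltp: LTP word segmentation model weights, 3 files: pytorch_model.bin, vocab.txt, and config.json
    • RoBERTa: teacher model weights, 3 files: pytorch_model.bin, vocab.txt, and config.json
  • scripts: scripts that generate model initialization weights
  • saves: output directory
  • config.py: training argument configuration
  • matches.py: layer matching configuration between the teacher and student models
  • my_datasets.py: training data processing
  • run_chinese_ref.py: generates the reference file containing word segmentation information
  • train.py: main pre-training entry point
  • utils.py: helper functions for pre-training distillation
  • distill.sh: pre-training distillation script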

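Environment

The pre-training code has only been tested with Python 3.8 and PyTorch v1.8.1. Additional dependencies can be installed with pip install -r requirements.txt.

Preparing Pre-trained Models

Download the LTP segmentation model weights and the RoBERTa-wwm-ext pre-trained weights from Hugging Face, and place them in the corresponding folders under ${project-dir}/pretrained_model_path/.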

Data Preparation

For Chinese models, first generate the reference file containing word segmentation information by running:

python run_chinese_ref.py

Because the pre-training dataset is large, we recommend preprocessing it after generating the reference file. Simply run:

python my_datasets.py

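Running the Training Script

Once the data has been preprocessed, running pre-training distillation is straightforward. distill.sh provides an example pre-training script. It supports single-node multi-GPU training, and its main parameters are:

  • teacher_name_or_path: teacher model weights
  • student_config: student model architecture config file
  • num_train_steps: number of training steps
  • ckpt_steps: save a checkpoint every ckpt_steps steps
  • learning_rate: peak pre-training learning rate
  • train_batch_size: pre-training batch size
  • data_files_json: dataset JSON file
  • data_cache_dir: training data cache directory
  • output_dir: output directory
  • output_encoded_layers: set to True to output hidden states
  • gradient_accumulation_steps: number of gradient accumulation steps
  • temperature: distillation temperature
  • fp16: enable half-precision (FP16) training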
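The distillation settings use a batch size of 4096, which on a single node is typically reached through gradient_accumulation_steps. The helper below computes the accumulation steps needed for a target global batch. It assumes train_batch_size is the per-GPU batch and that each GPU processes one micro-batch per forward/backward pass; distill.sh is not confirmed to follow this convention, so check the script before relying on it. The function name is hypothetical.

```python
def grad_accum_steps_for(target_batch, per_gpu_batch, num_gpus):
    """Accumulation steps so that per_gpu_batch * num_gpus * steps == target_batch."""
    per_step = per_gpu_batch * num_gpus  # samples processed in one forward/backward pass
    if target_batch % per_step:
        raise ValueError(f"{target_batch} is not a multiple of {per_step}")
    return target_batch // per_step


# e.g. 8 GPUs with 128 samples each need 4 accumulation steps for a batch of 4096
print(grad_accum_steps_for(4096, per_gpu_batch=128, num_gpus=8))  # → 4
```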

Run the following command to perform pre-training distillation for MiniRBT-H256:

sh distill.sh

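Tip: initializing from good model weights helps distillation pre-training. In our experiments, the teacher assistant RBT6 (KD) was initialized with the first 6 layers of the teacher model. See scripts/init_checkpoint_TA.py to create valid initialization weights, and pass them to distillation training with the --student_pretrained_weights argument!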
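The idea behind initializing RBT6 (KD) from the teacher's first 6 layers can be sketched on a plain state dict. This is not scripts/init_checkpoint_TA.py: it assumes Hugging Face BERT parameter names (keys such as bert.encoder.layer.0.attention.self.query.weight), keeps embeddings, the pooler, and any other non-layer weights unchanged, and the function name is hypothetical. With PyTorch, the input would typically come from torch.load on the teacher's pytorch_model.bin, and the result could be written with torch.save for use as the student's initial weights.

```python
import re

# Finds the layer index in keys like "bert.encoder.layer.11.output.dense.weight"
LAYER_PATTERN = re.compile(r"(?:^|\.)encoder\.layer\.(\d+)\.")


def init_from_first_layers(teacher_state, num_layers=6):
    """Copy every non-layer weight plus encoder layers 0..num_layers-1 from the teacher."""
    student_state = {}
    for key, value in teacher_state.items():
        match = LAYER_PATTERN.search(key)
        if match is None or int(match.group(1)) < num_layers:
            # embeddings and pooler are shared; layer i of the teacher becomes layer i of the student
            student_state[key] = value
    return student_state


teacher_state = {f"bert.encoder.layer.{i}.output.dense.weight": i for i in range(12)}
teacher_state["bert.embeddings.word_embeddings.weight"] = "embeddings"
print(sorted(init_from_first_layers(teacher_state)))
```

Because the student keeps the teacher's hidden size, the copied tensors fit without reshaping; the student config just needs num_hidden_layers set to 6.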

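Usage Tips

  • The initial learning rate is a critical parameter and should be tuned for the target task.
  • The best learning rates for small models differ substantially from those for RoBERTa-wwm, so be sure to adjust the learning rate when using small models (based on the results above, small models need a higher initial learning rate and more training iterations).
  • With roughly the same number of non-embedding parameters, MiniRBT-H256 outperforms RBT4-H312 on every task except ChnSentiCorp, which supports the conclusion that a narrow and deep architecture is better than a wide and shallow one.
  • On reading comprehension tasks, MiniRBT-H288 performs better. On the other tasks, MiniRBT-H288 and MiniRBT-H256 perform on par, so choose according to your needs.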

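FAQ

Q: How do I use this model?
A: See Quick Load. Usage is the same as for HFL's other Chinese pre-trained models, such as RoBERTa-wwm.

Q: Why generate a separate reference file with word segmentation information?
A: Suppose we have the Chinese sentence 天气很好 ("the weather is nice"). BERT tokenizes it at the character level as ['天','气','很','好'], but in Chinese 天气 is a single word. To apply whole word masking, we need a reference file that tells the model where to add ##, producing something like ['天','##气','很','好'].
Note: this is only an auxiliary reference file; it does not change the model's original input, which remains independent of the segmentation result.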
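How the ## marks in this example can be derived from a word segmentation is sketched below. It assumes the segmentation (which the project obtains with LTP) covers exactly the same text as the tokens. The real run_chinese_ref.py may store this information in a different form (for example as token positions), and the function name is hypothetical.

```python
def mark_word_pieces(tokens, words):
    """Prefix '##' to every token that does not start a segmented word.

    tokens: tokenizer output, e.g. ['天', '气', '很', '好']
    words:  segmentation of the same text, e.g. ['天气', '很', '好']
    """
    if "".join(tokens) != "".join(words):
        raise ValueError("tokens and words must cover the same text")
    word_starts = set()
    offset = 0
    for word in words:
        word_starts.add(offset)  # character offset where each word begins
        offset += len(word)
    marked = []
    offset = 0
    for token in tokens:
        marked.append(token if offset in word_starts else "##" + token)
        offset += len(token)
    return marked


print(mark_word_pieces(["天", "气", "很", "好"], ["天气", "很", "好"]))  # → ['天', '##气', '很', '好']
```

Only the boundary information is used during masking; the model still receives the original characters.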

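Q: Why does RBT6 (KD) perform so much worse than RoBERTa on downstream tasks? Why are MiniRBT-H256, MiniRBT-H288, and RBT4-H312 so much weaker? How can performance be improved?
A: As described above, RBT6 (KD) is obtained by distilling RoBERTa-wwm-ext directly on the pre-training task and is then fine-tuned on downstream tasks; it is not distilled on the downstream tasks. The same applies to the other models: we only performed distillation on the pre-training task. To further improve downstream performance, knowledge distillation can be applied again during fine-tuning.

Q: Where can I download dataset X?
A: Download links are provided for some datasets. For datasets without a link, please search for them yourself or contact the original authors.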

Citation

If the models or conclusions of this project help your research, please cite the following paper: https://arxiv.org/abs/2304.00717

@misc{yao2023minirbt,
      title={MiniRBT: A Two-stage Distilled Small Chinese Pre-trained Model}, 
      author={Xin Yao and Ziqing Yang and Yiming Cui and Shijin Wang},
      year={2023},
      eprint={2304.00717},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

References

[1] Pre-training with Whole Word Masking for Chinese BERT (Cui et al., IEEE/ACM TASLP 2021)
[2] TextBrewer: An Open-Source Knowledge Distillation Toolkit for Natural Language Processing (Yang et al., ACL 2020)
[3] CLUE: A Chinese Language Understanding Evaluation Benchmark (Xu et al., COLING 2020)
[4] TinyBERT: Distilling BERT for Natural Language Understanding (Jiao et al., Findings of EMNLP 2020)

Follow Us

Follow the official WeChat account of the Joint Laboratory of HIT and iFLYTEK Research to keep up with our latest technical developments.

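Feedback

If you have questions, please submit a GitHub Issue.

  • Before submitting, check whether the FAQ answers your question, and search previous issues as well.
  • Duplicate issues and issues unrelated to this project will be handled by the stale bot (stale · GitHub Marketplace); thank you for understanding.
  • We will do our best to answer your questions, but we cannot guarantee that every question will be answered.
  • Please ask politely and help build a friendly discussion community.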