1 Star 0 Fork 0

shzgamelife / fairseq

加入 Gitee
与超过 1200万 开发者一起发现、参与优秀开源项目,私有仓库也完全免费 :)
免费加入
克隆/下载
README.md 2.90 KB
一键复制 编辑 原始数据 按行查看 历史

Deep Transformers with Latent Depth (Li et al., 2020)

https://arxiv.org/abs/2009.13102.

Introduction

We present a probabilistic framework to automatically learn which layer(s) to use by learning the posterior distributions of layer selection. As an extension of this framework, we propose a novel method to train one shared Transformer network for multilingual machine translation with different layer selection posteriors for each language pair.

Training a multilingual model with latent depth

Below is an example of training with latent depth in decoder for one-to-many (O2M) related languages. We use the same preprocessed (numberized and binarized) TED8 dataset as in Balancing Training for Multilingual Neural Machine Translation (Wang et al., 2020), which could be generated by the script the author provided.

lang_pairs_str="eng-aze,eng-bel,eng-ces,eng-glg,eng-por,eng-rus,eng-slk,eng-tur"
databin_dir=<path to binarized data>

fairseq-train ${databin_dir} \
  --user-dir examples/latent_depth/latent_depth_src \
  --lang-pairs "${lang_pairs_str}" \
  --arch multilingual_transformer_iwslt_de_en \
  --task multilingual_translation_latent_depth \
  --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
  --share-encoders \
  --share-decoders \
  --decoder-langtok \
  --share-decoder-input-output-embed \
  --dropout 0.3 --attention-dropout 0.3 \
  --optimizer adam --adam-eps 1e-06 --adam-betas '(0.9, 0.98)' \
  --lr-scheduler inverse_sqrt --stop-min-lr 1e-9 --warmup-init-lr 1e-7 --warmup-updates 8000 \
  --max-tokens 4096 --update-freq 1  \
  --lr 0.0015 \
  --clip-norm 1.0 \
  --seed 2 \
  --ddp-backend=legacy_ddp \
  --encoder-layers 12 \
  --decoder-layers 24 \
  --decoder-latent-layer \
  --sparsity-weight 0.1 \
  --anneal-updates 5000 \
  --soft-update 500  \
  --target-layers 12 \
  --share-weight 0.1

Inference command

lang_pairs_str="eng-aze,eng-bel,eng-ces,eng-glg,eng-por,eng-rus,eng-slk,eng-tur"
databin_dir=<path to binarized data>
model_path=<path to checkpoint>
src_lang=<source language to translate from>
tgt_lang=<target language to translate to>
gen_data=<name of data split, e.g. valid, test, etc>

fairseq-generate ${databin_dir} \
  --path ${model_path} \
  --task multilingual_translation_latent_depth \
  --decoder-latent-layer \
  --lang-pairs "${lang_pairs_str}" \
  -s ${src_lang} -t ${tgt_lang} \
  --gen-subset $gen_data \
  --scoring sacrebleu \
  --remove-bpe 'sentencepiece' \
  --lenpen 1.0 \
  --beam 5  \
  --decoder-langtok \
  --max-tokens 4096

Citation

@article{li2020deep,
  title={Deep Transformers with Latent Depth},
  author={Li, Xian and Stickland, Asa Cooper and Tang, Yuqing and Kong, Xiang},
  journal={arXiv preprint arXiv:2009.13102},
  year={2020}
}
1
https://gitee.com/shzgamelife/fairseq.git
git@gitee.com:shzgamelife/fairseq.git
shzgamelife
fairseq
fairseq
master

搜索帮助