Chau Tran committed on 2020-10-22 11:29: Fix fairseq/criss README

Cross-lingual Retrieval for Iterative Self-Supervised Training

https://arxiv.org/pdf/2006.09526.pdf

Introduction

CRISS is a multilingual sequence-to-sequence pretraining method in which mining and training are applied iteratively, improving cross-lingual alignment and translation ability at the same time.

Requirements:

Unsupervised Machine Translation

1. Download and decompress CRISS checkpoints
cd examples/criss
wget https://dl.fbaipublicfiles.com/criss/criss_3rd_checkpoints.tar.gz
tar -xf criss_3rd_checkpoints.tar.gz
2. Download and preprocess Flores test dataset

Make sure to run all scripts from the examples/criss directory.

bash download_and_preprocess_flores_test.sh
3. Run Evaluation on Sinhala-English
bash unsupervised_mt/eval.sh
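
The steps above depend on two details that are easy to get wrong: the archive passed to tar must be the file that wget actually saved, and every script must be started from examples/criss. The following is a minimal sketch of two guard helpers, not part of fairseq; the function names `archive_name_from_url` and `in_criss_dir` are hypothetical and chosen only for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: these helper names are hypothetical, not part of fairseq.

# Print the file name component of a download URL, so the name given
# to tar always matches the file that wget saved.
archive_name_from_url() {
    basename "$1"
}

# Succeed only when the current directory path ends in examples/criss.
in_criss_dir() {
    case "$PWD" in
        */examples/criss) return 0 ;;
        *) return 1 ;;
    esac
}
```

One possible use is to call `in_criss_dir || exit 1` at the top of a wrapper script, store the checkpoint URL in a variable, and run `tar -xf "$(archive_name_from_url "$url")"` after the download.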

Sentence Retrieval

1. Download and preprocess Tatoeba dataset
bash download_and_preprocess_tatoeba.sh
2. Run Sentence Retrieval on Tatoeba Kazakh-English
bash sentence_retrieval/sentence_retrieval_tatoeba.sh

Mining

1. Install faiss

Follow the instructions at https://github.com/facebookresearch/faiss/blob/master/INSTALL.md

2. Mine pseudo-parallel data between Kazakh and English
bash mining/mine_example.sh
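
Before starting a mining run, it may help to confirm that the faiss installation is visible to Python. The sketch below assumes the mining script relies on the faiss Python bindings (module name `faiss`) and that `python3` is the interpreter in use; `have_python_module` is a hypothetical helper, not part of fairseq or faiss.

```shell
#!/usr/bin/env bash
# Sketch only: have_python_module is a hypothetical helper.

# Return 0 if the given Python module can be imported, non-zero otherwise.
# The interpreter defaults to python3 and can be overridden via $PYTHON.
have_python_module() {
    "${PYTHON:-python3}" -c "import $1" >/dev/null 2>&1
}
```

For example, `have_python_module faiss || echo "faiss is not importable" >&2` reports a missing installation before the mining script is launched.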

Citation

@article{tran2020cross,
  title={Cross-lingual retrieval for iterative self-supervised training},
  author={Tran, Chau and Tang, Yuqing and Li, Xian and Gu, Jiatao},
  journal={arXiv preprint arXiv:2006.09526},
  year={2020}
}