Code for "Efficient k-Nearest-Neighbor Machine Translation with Dynamic Retrieval" (Findings of ACL 2024).
- python >= 3.7
- pytorch >= 1.10.0
- faiss-gpu >= 1.7.3
- sacremoses == 0.0.41
- sacrebleu == 1.5.1
- fastBPE == 0.1.0
- streamlit >= 1.13.0
- scikit-learn >= 1.0.2
- seaborn >= 0.12.1
You can install this toolkit by
git clone https://github.com/DeepLearnXMU/knn-mt-dr.git
cd knn-mt-dr
pip install --editable ./
Note: Installing faiss with pip is not suggested. For stability, we recommand you to install faiss with conda
CPU version only:
conda install faiss-cpu -c pytorch
GPU version:
conda install faiss-gpu -c pytorch # For CUDA
You can prepare pretrained models and dataset by executing the following command:
cd knnbox-scripts
bash prepare_dataset_and_model.sh
cp ../pretrain-models/wmt19.de-en/dict.en.txt ../pretrain-models/wmt19.de-en/fairseq-vocab.txt
use bash instead of sh. If you still have problem running the script, you can manually download the wmt19 de-en single model and multi-domain de-en dataset, and put them into correct directory (you can refer to the path in the script).
You can build datastore by executing the following command:
cd skip-knn-mt
python get_output_projection.py
bash build_datastore.sh
bash build_valid_datastore.sh
bash prepare_dataset.sh
You can train model and inference by executing the following command:
bash train.sh
bash skip_inference.sh
kNN-box: the codebase we built upon. This repository is an open-source toolkit to build kNN-MT models. We greatly appreciate the excellent foundation provided by the authors.
You can refer to kNN-box for more detailed information.
If you found this repository helpful in your research, please consider citing:
@inproceedings{gao-etal-2024-efficient,
title = "Efficient $k$-Nearest-Neighbor Machine Translation with Dynamic Retrieval",
author = "Gao, Yan and
Cao, Zhiwei and
Miao, Zhongjian and
Yang, Baosong and
Liu, Shiyu and
Zhang, Min and
Su, Jinsong",
booktitle = "Findings of the Association for Computational Linguistics ACL 2024",
year = "2024",
}