[!138][RELEASE] Direct models for Simultaneous Speech Translation and Automatic Subtitling (IWSLT 2023)

# Which work do we release?
Models and inference code for the FBK participation in the IWSLT 2023 SimulST and Subtitling tasks.

# What changes does this release refer to?
Commit 3d1408f0affffd9e898689623120228fe020d9fd
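
With this release the SimulST agents are split into two folders, each tied to a different SimulEval commit: `agents/v1_0/` for SimulEval v1.0.2 (commit d1a8b2f) and `agents/v1_1/` for SimulEval v1.1.0 (commit 3c19e1c). The sketch below shows one possible way to install the matching SimulEval version. It is not part of the repository: the helpers `simuleval_commit_for` and `install_simuleval` and the `SIMULEVAL_DIR` variable are hypothetical, and the sketch assumes that `git` and `pip` are available and that an editable install of the SimulEval repository works at both pinned commits.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map an FBK agent folder (v1_0 or v1_1) to the
# SimulEval commit pinned for it in the READMEs of this release.
simuleval_commit_for() {
    case "$1" in
        v1_0) echo "d1a8b2f0b13fe5204f3dcb4935cae9c73dbfc285" ;;  # SimulEval v1.0.2
        v1_1) echo "3c19e1c5e5deee043ab938d9b51996d5578b626c" ;;  # SimulEval v1.1.0
        *) echo "unknown agent version: $1" >&2; return 1 ;;
    esac
}

# Hypothetical installer: clone SimulEval if needed, check out the pinned
# commit and install it in editable mode (assumes git and pip are on PATH).
install_simuleval() {
    local version="$1"
    local dir="${SIMULEVAL_DIR:-./SimulEval}"
    local commit
    commit="$(simuleval_commit_for "$version")" || return 1
    if [ ! -d "$dir" ]; then
        git clone https://github.com/facebookresearch/SimulEval.git "$dir" || return 1
    fi
    git -C "$dir" checkout "$commit" && pip install -e "$dir"
}

# Usage (not run here): install_simuleval v1_1
```

Since the two versions cannot be installed side by side in one environment, one option is to keep a separate virtual environment per SimulEval version.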
hlt-mt · Sep 27, 2023 · 1300af2
1 parent 8cee29f

Showing 5 changed files with 78 additions and 6 deletions.
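
The IWSLT 2023 commands also switch from `--agent <file>.py` (used with SimulEval v1.0.2) to `--agent-class <module>.<Class>` (SimulEval v1.1.0). The dotted module path mirrors the agent's file path under `${FBK_FAIRSEQ_ROOT}`. Because it is resolved as a Python import path, we assume it must be importable, for instance by running `simuleval` from `${FBK_FAIRSEQ_ROOT}` or by adding that directory to `PYTHONPATH`. A minimal sketch with a hypothetical helper, `agent_class_path`, that derives the dotted form from the file path:

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert an agent file path (relative to ${FBK_FAIRSEQ_ROOT})
# and a class name into the "module.Class" string passed to --agent-class.
agent_class_path() {
    local file="$1" cls="$2"
    local module="${file%.py}"   # strip the .py extension
    module="${module//\//.}"     # replace every "/" with "."
    echo "${module}.${cls}"
}

# Usage sketch (not run here). Assumption: putting ${FBK_FAIRSEQ_ROOT} on
# PYTHONPATH makes the examples.* module importable for SimulEval v1.1.0.
# export PYTHONPATH="${FBK_FAIRSEQ_ROOT}:${PYTHONPATH}"
# simuleval \
#     --agent-class "$(agent_class_path \
#         examples/speech_to_text/simultaneous_translation/agents/v1_1/simul_offline_edatt.py \
#         EDAttSTAgent)" \
#     --source ${SRC_LIST_OF_AUDIO}   # remaining options as in the EDAtt command
```

For `simul_offline_alignatt.py` with class `AlignAttSTAgent`, the helper yields exactly the `--agent-class` value used in the AlignAtt command of `fbk_works/IWSLT_2023.md`.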
diff --git a/README.md b/README.md
@@ -8,6 +8,7 @@ Dedicated README for each work can be found in the `fbk_works` directory.
  - [[INTERSPEECH 2023] **AlignAtt: Using Attention-based Audio-Translation Alignments as a Guide for Simultaneous Speech Translation**](fbk_works/ALIGNATT_SIMULST_AGENT_INTERSPEECH2023.md)
  - [[INTERSPEECH 2023] **Joint Speech Translation and Named Entity Recognition**](fbk_works/JOINT_ST_NER2023.md)
  - [[ACL 2023] **Attention as a Guide for Simultaneous Speech Translation**](fbk_works/EDATT_SIMULST_AGENT_ACL2023.md)
+ - [[IWSLT 2023] **Direct Models for Simultaneous Translation and Automatic Subtitling: FBK@IWSLT2023**](fbk_works/IWSLT_2023.md)
  - [**Reproducibility is Nothing Without Correctness: The Importance of Testing Code in NLP**](fbk_works/BUGFREE_CONFORMER.md)
 
  ### 2022

diff --git a/fbk_works/ALIGNATT_SIMULST_AGENT_INTERSPEECH2023.md b/fbk_works/ALIGNATT_SIMULST_AGENT_INTERSPEECH2023.md
@@ -28,7 +28,7 @@ The output will be saved in `--output`.
 
 ```bash
 simuleval \
-    --agent ${FBK_FAIRSEQ_ROOT}/examples/speech_to_text/simultaneous_translation/agents/simul_offline_alignatt.py \
+    --agent ${FBK_FAIRSEQ_ROOT}/examples/speech_to_text/simultaneous_translation/agents/v1_0/simul_offline_alignatt.py \
     --source ${SRC_LIST_OF_AUDIO} \
     --target ${TGT_FILE} \
     --data-bin ${DATA_ROOT} \

diff --git a/fbk_works/EDATT_SIMULST_AGENT_ACL2023.md b/fbk_works/EDATT_SIMULST_AGENT_ACL2023.md
@@ -2,7 +2,7 @@
 Code for the paper: ["Attention as a Guide for Simultaneous Speech Translation"](https://arxiv.org/pdf/2212.07850.pdf) published at ACL 2023.
 
 ## 📎 Requirements
-To run the agent, please make sure that [SimulEval v1.0.2](https://github.com/facebookresearch/SimulEval) is installed 
+To run the agent, please make sure that [SimulEval v1.0.2](https://github.com/facebookresearch/SimulEval) (commit [d1a8b2f](https://github.com/facebookresearch/SimulEval/commit/d1a8b2f0b13fe5204f3dcb4935cae9c73dbfc285)) is installed 
 and set `--port` accordingly.
 
 ## 📌 Pre-trained offline models
@@ -20,7 +20,7 @@ The output will be saved in `--output`.
 
 ```bash
 simuleval \
-    --agent ${FBK_FAIRSEQ_ROOT}/examples/speech_to_text/simultaneous_translation/agents/simul_offline_edatt.py \
+    --agent ${FBK_FAIRSEQ_ROOT}/examples/speech_to_text/simultaneous_translation/agents/v1_0/simul_offline_edatt.py \
     --source ${SRC_LIST_OF_AUDIO} \
     --target ${TGT_FILE} \
     --data-bin ${DATA_ROOT} \

diff --git a/fbk_works/IWSLT_2023.md b/fbk_works/IWSLT_2023.md
@@ -0,0 +1,71 @@
+# Direct Models for Simultaneous Translation and Automatic Subtitling (IWSLT2023)
+Models and inference scripts for the paper: [Direct Models for Simultaneous Translation and Automatic Subtitling: FBK@IWSLT2023](https://aclanthology.org/2023.iwslt-1.11/).
+
+## 💬 Simultaneous Speech Translation
+
+We release the offline ST model used for the FBK participation in the Simultaneous Speech Translation task: [**model folder**](https://fbk-my.sharepoint.com/:f:/g/personal/spapi_fbk_eu/EnnwDZFnXJdNjlhrKPqtNm8BHPz2d0E316Pp-yBy-dBpTg?e=Vhdvaw).
+
+### 🤖 Inference with AlignAtt and EDAtt
+Please install [SimulEval v1.1.0](https://github.com/facebookresearch/SimulEval/) (commit [3c19e1c](https://github.com/facebookresearch/SimulEval/commit/3c19e1c5e5deee043ab938d9b51996d5578b626c)) to run the evaluation.
+
+#### 📌 AlignAtt
+Set the parameters as described in the [AlignAtt README](ALIGNATT_SIMULST_AGENT_INTERSPEECH2023.md) and 
+run the following code:
+```bash
+simuleval \
+    --agent-class examples.speech_to_text.simultaneous_translation.agents.v1_1.simul_offline_alignatt.AlignAttSTAgent \
+    --source ${SRC_LIST_OF_AUDIO} \
+    --target ${TGT_FILE} \
+    --data-bin ${DATA_ROOT} \
+    --config config_simul.yaml \
+    --model-path ${ST_SAVE_DIR}/avg7.pt --prefix-size 1 --prefix-token "nomt" \
+    --extract-attn-from-layer 3 --frame-num ${FRAMES} \
+    --source-segment-size 1000 \
+    --device cuda:0 \
+    --quality-metrics BLEU --latency-metrics LAAL AL ATD --computation-aware \
+    --output ${OUT_DIR}
+```
+
+#### 📌 EDAtt
+Set the parameters as described in the [EDAtt README](EDATT_SIMULST_AGENT_ACL2023.md) and 
+run the following code:
+```bash
+simuleval \
+    --agent-class examples.speech_to_text.simultaneous_translation.agents.v1_1.simul_offline_edatt.EDAttSTAgent \
+    --source ${SRC_LIST_OF_AUDIO} \
+    --target ${TGT_FILE} \
+    --data-bin ${DATA_ROOT} \
+    --config config_simul.yaml \
+    --model-path ${ST_SAVE_DIR}/avg7.pt --prefix-size 1 --prefix-token "nomt" \
+    --extract-attn-from-layer 3 --frame-num 2 --attn-threshold ${ALPHA} \
+    --source-segment-size 1000 \
+    --device cuda:0 \
+    --quality-metrics BLEU --latency-metrics LAAL AL ATD --computation-aware \
+    --output ${OUT_DIR}
+```
+
+## 📺 Automatic Subtitling
+
+We release the models used for the FBK participation in the Automatic Subtitling task:
+- [**en-de model folder**](https://fbk-my.sharepoint.com/:f:/g/personal/spapi_fbk_eu/Es7feuTJ0phEqt450DN7clYBa_GdFfoZxpL5rBf-ix4ubQ?e=fxb01K) 
+- [**en-es model folder**](https://fbk-my.sharepoint.com/:f:/g/personal/spapi_fbk_eu/Emn1YEgB2iBIq2LhMY4lNUcBnriFPTaUmHgWEXtJmM89xQ?e=UePzIQ)
+
+For usage instructions, please refer to the [Direct Speech Translation for Automatic Subtitling README](DIRECT_SUBTITLING.md).
+
+## 📍 Citation
+```bibtex
+@inproceedings{papi-etal-2023-direct,
+    title = "Direct Models for Simultaneous Translation and Automatic Subtitling: {FBK}@{IWSLT}2023",
+    author = "Papi, Sara  and
+      Gaido, Marco  and
+      Negri, Matteo",
+    booktitle = "Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)",
+    month = jul,
+    year = "2023",
+    address = "Toronto, Canada (in-person and online)",
+    publisher = "Association for Computational Linguistics",
+    url = "https://aclanthology.org/2023.iwslt-1.11",
+    doi = "10.18653/v1/2023.iwslt-1.11",
+    pages = "159--168",
+    }
+```
diff --git a/fbk_works/SIMULTANEOUS_OFFLINE_ST.md b/fbk_works/SIMULTANEOUS_OFFLINE_ST.md
@@ -2,7 +2,7 @@
 
 Agent for the paper: [Does Simultaneous Speech Translation need Simultaneous Models?](https://arxiv.org/abs/2204.03783)
 
-To run the agent, please make sure that [SimulEval](https://github.com/facebookresearch/SimulEval) is installed and set `--port` accordingly. 
+To run the agent, please make sure that [SimulEval v1.0.2](https://github.com/facebookresearch/SimulEval) (commit [d1a8b2f](https://github.com/facebookresearch/SimulEval/commit/d1a8b2f0b13fe5204f3dcb4935cae9c73dbfc285)) is installed and set `--port` accordingly.
 
 Set `--source`, `--target`, and `--config` as described in the [Fairseq Simultaneous Translation repository](https://github.com/facebookresearch/fairseq/blob/main/examples/speech_to_text/docs/simulst_mustc_example.md#inference--evaluation).
 `--model-path` is the offline ST model checkpoint, 
@@ -12,7 +12,7 @@ The simultaneous output will be saved in `--output`.
 ## Fixed Word Detection
 ```bash
 simuleval \
-    --agent ${FBK_FAIRSEQ_ROOT}/examples/speech_to_text/simultaneous_translation/agents/simul_offline_waitk.py \
+    --agent ${FBK_FAIRSEQ_ROOT}/examples/speech_to_text/simultaneous_translation/agents/v1_0/simul_offline_waitk.py \
     --source ${SRC_LIST_OF_AUDIO} \
     --target ${TGT_FILE} \
     --data-bin ${DATA_ROOT} \
@@ -28,7 +28,7 @@ simuleval \
 ## Adaptive Word Detection
 ```bash
 simuleval \
-    --agent ${FBK_FAIRSEQ_ROOT}/examples/speech_to_text/simultaneous_translation/agents/simul_offline_waitk.py \
+    --agent ${FBK_FAIRSEQ_ROOT}/examples/speech_to_text/simultaneous_translation/agents/v1_0/simul_offline_waitk.py \
     --source ${SRC_LIST_OF_AUDIO} \
     --target ${TGT_FILE} \
     --data-bin ${DATA_ROOT} \