Unable to reproduce the PhoMT results with the HuggingFace Model #2
Comments
Are you using sacreBLEU?

I'm using the sacreBLEU metric from HuggingFace. Is this different from the sacreBLEU you used in the paper? If so, can you share the command line that you used with sacreBLEU?
Our training and inference stages (an example below) were originally performed using fairseq. We then computed the detokenized and case-sensitive BLEU score using SacreBLEU (with the signature "BLEU+case.mixed+numrefs.1+smooth.exp+tok.13a+version.1.5.1").
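For reference, a minimal sketch of reproducing that signature with the sacrebleu Python package (file names here are hypothetical; both files are assumed to be detokenized, one sentence per line):

import sacrebleu

# Hypothetical paths; hypotheses and references are detokenized plain text.
hyps = [line.strip() for line in open("test.vi_generated", "r")]
refs = [line.strip() for line in open("test.vi", "r")]

# sacreBLEU's defaults (case-sensitive, exp smoothing, 13a tokenizer)
# match the signature quoted above.
bleu = sacrebleu.corpus_bleu(hyps, [refs])
print(bleu.score)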
@justinphan3110 I just had a bit of time to redo the evaluation. Using the simple script below, you'd obtain a sacreBLEU score of 44.2:

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer_en2vi = AutoTokenizer.from_pretrained(
    "vinai/vinai-translate-en2vi", src_lang="en_XX"
)
model_en2vi = AutoModelForSeq2SeqLM.from_pretrained("vinai/vinai-translate-en2vi")
device_en2vi = torch.device("cuda")
model_en2vi.to(device_en2vi)


def translate_en2vi(en_texts: list) -> list:
    # Tokenize a batch of English sentences and move it to the GPU.
    input_ids = tokenizer_en2vi(en_texts, padding=True, return_tensors="pt").to(
        device_en2vi
    )
    output_ids = model_en2vi.generate(
        **input_ids,
        decoder_start_token_id=tokenizer_en2vi.lang_code_to_id["vi_VN"],
        num_return_sequences=1,
        num_beams=5,
        early_stopping=True
    )
    vi_texts = tokenizer_en2vi.batch_decode(output_ids, skip_special_tokens=True)
    return vi_texts


with open("PhoMT-detokenization-test/test.en", "r") as input_file:
    lines = [line.strip() for line in input_file.readlines()]

# Translate the test set in batches of 8 sentences.
index = 0
writer = open("PhoMT-detokenization-test/test.vi_generated.v1", "w")
while index < len(lines):
    texts = lines[index : index + 8]
    outputs = translate_en2vi(texts)
    print(outputs)
    for output in outputs:
        writer.write(output.strip() + "\n")
    index = index + 8
writer.close()

import evaluate

references = [
    [line.strip()]
    for line in open("PhoMT-detokenization-test/test.vi", "r").readlines()
]
predictions = [
    line.strip()
    for line in open(
        "PhoMT-detokenization-test/test.vi_generated.v1", "r"
    ).readlines()
]
sacrebleu = evaluate.load("sacrebleu")
results = sacrebleu.compute(predictions=predictions, references=references)
print(results)
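As a side note, the HuggingFace evaluate wrapper forwards keyword arguments to sacreBLEU, so the tokenizer and smoothing can be pinned explicitly to match the signature quoted above. A hedged sketch, reusing the predictions/references lists from the script above:

# tokenize="13a" and smooth_method="exp" mirror the paper's signature;
# both are also the sacreBLEU defaults for non-CJK target languages.
results = sacrebleu.compute(
    predictions=predictions,
    references=references,
    tokenize="13a",
    smooth_method="exp",
)
print(results["score"])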
Evaluation for VietAI/envit5-translation:

import torch
import evaluate
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("VietAI/envit5-translation")
model = AutoModelForSeq2SeqLM.from_pretrained("VietAI/envit5-translation")
device = torch.device("cuda")
model.to(device)


def translate(texts: list) -> list:
    input_ids = tokenizer(texts, padding=True, return_tensors="pt").to(device)
    output_ids = model.generate(
        **input_ids,
        num_return_sequences=1,
        num_beams=5,
        early_stopping=True,
        max_length=512
    )
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)


# Vi -> En: envit5 expects a "vi: " prefix on each source sentence.
with open("PhoMT-detokenization-test/test.vi", "r") as input_file:
    lines = ["vi: " + line.strip() for line in input_file.readlines()]

index = 0
writer = open("PhoMT-detokenization-test/test.en_generated.vietai", "w")
while index < len(lines):
    texts = lines[index : index + 8]
    outputs = translate(texts)
    print(outputs)
    for output in outputs:
        # Drop the "en: " prefix that the model prepends to its output.
        writer.write(output[4:].strip() + "\n")
    index = index + 8
writer.close()

# En -> Vi: same loop with an "en: " prefix on the source side.
with open("PhoMT-detokenization-test/test.en", "r") as input_file:
    lines = ["en: " + line.strip() for line in input_file.readlines()]

index = 0
writer = open("PhoMT-detokenization-test/test.vi_generated.vietai", "w")
while index < len(lines):
    texts = lines[index : index + 8]
    outputs = translate(texts)
    print(outputs)
    for output in outputs:
        # Drop the "vi: " prefix that the model prepends to its output.
        writer.write(output[4:].strip() + "\n")
    index = index + 8
writer.close()

sacrebleu = evaluate.load("sacrebleu")

# Vi -> En BLEU.
references = [
    [line.strip()]
    for line in open("PhoMT-detokenization-test/test.en", "r").readlines()
]
predictions = [
    line.strip()
    for line in open(
        "PhoMT-detokenization-test/test.en_generated.vietai", "r"
    ).readlines()
]
results = sacrebleu.compute(predictions=predictions, references=references)
print(results)

# En -> Vi BLEU.
references = [
    [line.strip()]
    for line in open("PhoMT-detokenization-test/test.vi", "r").readlines()
]
predictions = [
    line.strip()
    for line in open(
        "PhoMT-detokenization-test/test.vi_generated.vietai", "r"
    ).readlines()
]
results = sacrebleu.compute(predictions=predictions, references=references)
print(results)
Hi, I'm trying to reproduce the En2Vi result described in the paper on the PhoMT test set.
I used the generation setup as shown in the example.
Yet, the test result I got from the HuggingFace model was around 42.2 (the result reported in the paper is 44.29).
Do you plan to release the eval code/pipeline to reproduce the result discussed in the paper?