Feat/add t5 support #131
Conversation
@pommedeterresautee I'm still testing Triton, but I need your review for the other parts (ONNX and TensorRT conversion), and I'd like to see if those conversions run on your machine without errors.
It's only the first part of the review because things are still in progress (Triton, etc.).
```diff
@@ -26,17 +26,24 @@
 def infer_classification_pytorch(
-    model: PreTrainedModel, run_on_cuda: bool
+    model: PreTrainedModel, run_on_cuda: bool, generate_text: bool = False
```
Maybe it's misleading that we have `generate_text` in a function called `infer_classification_pytorch`.
Maybe it could be removed, depending on my comment below.
I removed it and added a dedicated inference function for text generation with appropriate parameters:

```python
def infer_text_generation(
    model: PreTrainedModel, run_on_cuda: bool, min_length: int, max_length: int, num_beams: int
) -> Callable[[Dict[str, torch.Tensor]], torch.Tensor]:
```
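For context, a minimal sketch of what the body of such a function could look like, assuming the model exposes the standard Hugging Face `generate` API (the exact arguments in the PR may differ):

```python
from typing import Callable, Dict

import torch
from transformers import PreTrainedModel


def infer_text_generation(
    model: PreTrainedModel, run_on_cuda: bool, min_length: int, max_length: int, num_beams: int
) -> Callable[[Dict[str, torch.Tensor]], torch.Tensor]:
    def infer(inputs: Dict[str, torch.Tensor]) -> torch.Tensor:
        if run_on_cuda:
            inputs = {k: v.cuda() for k, v in inputs.items()}
        with torch.inference_mode():
            # beam-search decoding, not a single forward pass
            return model.generate(
                inputs["input_ids"],
                min_length=min_length,
                max_length=max_length,
                num_beams=num_beams,
            )

    return infer
```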
src/transformer_deploy/convert.py (outdated)

```python
    if run_on_cuda:
```
Maybe move this line after `model_pytorch.eval()`.
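That is, something like this (a sketch, assuming the line in question moves the model to GPU):

```python
model_pytorch.eval()      # disable dropout etc. before anything else
if run_on_cuda:
    model_pytorch.cuda()  # then move the weights to GPU
```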
done
src/transformer_deploy/convert.py (outdated)

```python
timings = {}


def get_pytorch_infer(model: PreTrainedModel, cuda: bool, task: str):
    if commands.generative_model == "t5":
        return infer_classification_pytorch(model=model, run_on_cuda=cuda, generate_text=True)
```
This seems to be used for the benchmark and accuracy check. Do we need to do it with `.generate()`, rather than simply calling the model once like it was done before, e.g. for GPT-2?
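To illustrate the difference being discussed (a sketch; names are illustrative, not the PR code):

```python
import torch

# single forward pass, as previously done for GPT-2:
# cheap, deterministic, and enough to compare output logits for an accuracy check
with torch.inference_mode():
    logits = model(**inputs).logits

# full auto-regressive decoding: calls the model once per generated token,
# so it is much slower and measures the decoding loop as well as the model
generated_ids = model.generate(inputs["input_ids"], num_beams=2, max_length=128)
```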
src/transformer_deploy/convert.py (outdated)

```python
    if task in ["classification", "text-generation", "token-classification", "question-answering"]:
        return infer_classification_pytorch(model=model, run_on_cuda=cuda)
    if task == "embedding":
        return infer_feature_extraction_pytorch(model=model, run_on_cuda=cuda)
    raise Exception(f"unknown task: {task}")


if run_on_cuda:
```
Line duplicated?
removed
src/transformer_deploy/convert.py (outdated)

```python
)
if commands.generative_model == "t5":
    encoder_onnx_path = os.path.join(commands.output, "t5-encoder") + "/model.onnx"
    tensorrt_encoder_path = os.path.join(commands.output, "t5-encoder") + "/model.plan"
```
IMO it's better to declare variables close to where they are used.
done
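As a side note, the file name can be passed directly to `os.path.join`, which avoids the hard-coded `/` separator (a sketch of the same paths):

```python
import os

encoder_onnx_path = os.path.join(commands.output, "t5-encoder", "model.onnx")
tensorrt_encoder_path = os.path.join(commands.output, "t5-encoder", "model.plan")
```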
src/transformer_deploy/convert.py (outdated)

```python
results = (
    inference_onnx_binding(model_onnx=ort_model, inputs=inputs, device=commands.device)
    if commands.generative_model != "t5"
    else ort_model.generate(
```
Same here, do we need `generate`? It's for the accuracy check.
```python
    inference: Callable[[torch.Tensor], torch.Tensor],
    torch_type: torch.dtype = torch.float32,
):
    super().__init__(config, device, encoder_path, decoder_path, torch_type)
```
I don't understand the purpose of those arguments
removed
```python
    return model_decoder(input_ids=decoder_input_ids, encoder_hidden_states=encoder_hidden_states)


def are_equal(a: torch.Tensor, b: torch.Tensor, atol: float = fp16_default_tolerance) -> None:
```
IMO we need only one such function in the whole project.
I kept this one because the other function is in the demo folder.
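For reference, a minimal sketch of what such a comparison helper can look like (the tolerance value is an assumption, not the project's constant):

```python
import torch

fp16_default_tolerance = 0.1  # assumed value, for illustration only


def are_equal(a: torch.Tensor, b: torch.Tensor, atol: float = fp16_default_tolerance) -> None:
    assert a.shape == b.shape, f"shape mismatch: {a.shape} vs {b.shape}"
    max_diff = (a.float() - b.float()).abs().max().item()
    assert max_diff <= atol, f"tensors differ: max abs diff {max_diff} > atol {atol}"
```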
```python
for (o_dec_k, o_dev_v, o_enc_k, o_enc_v), (p_dec_k, p_dev_v, p_enc_k, p_enc_v) in zip(
    out_decoder_onnx_no_cache.past_key_values, out_decoder_pytorch.past_key_values
):
    are_equal(a=o_dec_k, b=p_dec_k)
```
Do we need to do all these quality checks? Isn't the test in convert enough? In the other convert_to_onnx we don't do this, so if we keep something like that it should be extracted, IMO.
I removed the quality check (I think the test in conversion is enough)
```python
    return inputs


def decoder_onnx_inference(
```
Code duplicated?
removed
Everything looks good to me, tested locally.
Add T5 support to the convert script (transformer-deploy).
WIP