
Feat/add t5 support #131

Merged: 79 commits merged into main from feat/add-t5-support on Jun 1, 2023
Conversation

ayoub-louati (Contributor) commented Aug 23, 2022:

Add T5 support in the convert script (transformer-deploy).

WIP

@ayoub-louati ayoub-louati marked this pull request as ready for review August 29, 2022 13:43
ayoub-louati (Contributor, Author) commented:

@pommedeterresautee I'm still testing Triton, but I need your review on the other parts (ONNX and TensorRT conversion), and I'd like to check whether those conversions run on your machine without errors.

@ayoub-louati ayoub-louati self-assigned this Sep 1, 2022
@ayoub-louati ayoub-louati added the enhancement label Sep 1, 2022
gaetansnl left a comment:

This is only the first part of the review because things are still in progress (Triton, etc.).

@@ -26,17 +26,24 @@
 def infer_classification_pytorch(
-    model: PreTrainedModel, run_on_cuda: bool
+    model: PreTrainedModel, run_on_cuda: bool, generate_text: bool = False


Maybe it's misleading to have a generate_text argument in a function called infer_classification_pytorch.


Maybe it could be removed, depending on my comment below.

ayoub-louati (author):

I removed it and added a dedicated inference function for text generation with appropriate parameters:

def infer_text_generation(
    model: PreTrainedModel, run_on_cuda: bool, min_length: int, max_length: int, num_beams: int
) -> Callable[[Dict[str, torch.Tensor]], torch.Tensor]:
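For context, a minimal, self-contained sketch of what the body of such a closure could look like, assuming the standard Hugging Face generate API; this is illustrative, not the exact code from the PR:

from typing import Callable, Dict

import torch
from transformers import PreTrainedModel


def infer_text_generation(
    model: PreTrainedModel, run_on_cuda: bool, min_length: int, max_length: int, num_beams: int
) -> Callable[[Dict[str, torch.Tensor]], torch.Tensor]:
    """Build a closure that runs beam-search text generation on the given model."""

    def infer(inputs: Dict[str, torch.Tensor]) -> torch.Tensor:
        # inference_mode avoids building an autograd graph during benchmarking
        with torch.inference_mode():
            output_ids = model.generate(
                inputs["input_ids"],
                min_length=min_length,
                max_length=max_length,
                num_beams=num_beams,
            )
            if run_on_cuda:
                # wait for the GPU to finish so timings stay meaningful
                torch.cuda.synchronize()
        return output_ids

    return infer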


if run_on_cuda:


Maybe move this line after model_pytorch.eval()

ayoub-louati (author):

done


timings = {}

def get_pytorch_infer(model: PreTrainedModel, cuda: bool, task: str):
    if commands.generative_model == "t5":
        return infer_classification_pytorch(model=model, run_on_cuda=cuda, generate_text=True)


This seems to be used for the benchmark and accuracy check. Do we need to use .generate here, rather than simply calling the model once as was done before, like for GPT-2?
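As an aside, to make the trade-off the reviewer is pointing at concrete: a single forward call measures one decoder step, while .generate runs the full autoregressive loop. A hypothetical sketch (t5-small and the prompt are placeholders, not code from the PR):

import torch
from transformers import T5ForConditionalGeneration, T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small").eval()
inputs = tokenizer("translate English to French: Hello", return_tensors="pt")

with torch.inference_mode():
    # single forward pass: one decoder step, enough to compare logits
    # between the PyTorch model and the exported one, like for GPT-2
    start_token = torch.tensor([[model.config.decoder_start_token_id]])
    logits = model(input_ids=inputs["input_ids"], decoder_input_ids=start_token).logits

    # full generation: runs the decoder autoregressively (beam search,
    # length constraints), closer to production usage but much slower
    output_ids = model.generate(inputs["input_ids"], max_length=20, num_beams=2)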

if task in ["classification", "text-generation", "token-classification", "question-answering"]:
    return infer_classification_pytorch(model=model, run_on_cuda=cuda)
if task == "embedding":
    return infer_feature_extraction_pytorch(model=model, run_on_cuda=cuda)
raise Exception(f"unknown task: {task}")

if run_on_cuda:


Line duplicated?

ayoub-louati (author):

removed

)
if commands.generative_model == "t5":
    encoder_onnx_path = os.path.join(commands.output, "t5-encoder") + "/model.onnx"
    tensorrt_encoder_path = os.path.join(commands.output, "t5-encoder") + "/model.plan"


IMO it's better to declare variables close to where they are used.

ayoub-louati (author):

done

results = (
    inference_onnx_binding(model_onnx=ort_model, inputs=inputs, device=commands.device)
    if commands.generative_model != "t5"
    else ort_model.generate(


Same here: do we need generate? It's for the accuracy check.

    inference: Callable[[torch.Tensor], torch.Tensor],
    torch_type: torch.dtype = torch.float32,
):
    super().__init__(config, device, encoder_path, decoder_path, torch_type)


I don't understand the purpose of those arguments

ayoub-louati (author):

removed

    return model_decoder(input_ids=decoder_input_ids, encoder_hidden_states=encoder_hidden_states)


def are_equal(a: torch.Tensor, b: torch.Tensor, atol: float = fp16_default_tolerance) -> None:


IMO we need only one such function in the whole project.

ayoub-louati (author):

I kept this one, because the other function is in the demo folder
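For reference, a minimal sketch of what such a shared tolerance check could look like (the tolerance value here is an assumption, not the one defined in the PR):

import torch

fp16_default_tolerance: float = 1e-1  # assumed value; FP16 comparisons need a loose tolerance


def are_equal(a: torch.Tensor, b: torch.Tensor, atol: float = fp16_default_tolerance) -> None:
    # report the worst offender so tolerance issues are easy to diagnose
    max_diff = (a - b).abs().max().item()
    assert torch.allclose(a, b, atol=atol), f"tensors differ, max abs diff: {max_diff}"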

for (o_dec_k, o_dev_v, o_enc_k, o_enc_v), (p_dec_k, p_dev_v, p_enc_k, p_enc_v) in zip(
    out_decoder_onnx_no_cache.past_key_values, out_decoder_pytorch.past_key_values
):
    are_equal(a=o_dec_k, b=p_dec_k)


Do we need to do all these quality checks? Isn't the test in convert enough? We don't do this in the other convert_to_onnx, so if we keep something like that it should be extracted IMO.

ayoub-louati (author):

I removed the quality check (I think the test in conversion is enough)
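If such a check were ever kept and extracted as the reviewer suggests, it could look roughly like this (a sketch; it assumes the usual Hugging Face layout of four tensors per layer for encoder-decoder caches, as in the quoted diff above):

from typing import Tuple

import torch


def assert_past_key_values_close(
    onnx_past: Tuple[Tuple[torch.Tensor, ...], ...],
    pytorch_past: Tuple[Tuple[torch.Tensor, ...], ...],
    atol: float = 1e-1,  # assumed FP16-friendly tolerance
) -> None:
    # each layer holds (decoder_key, decoder_value, encoder_key, encoder_value)
    for onnx_layer, pytorch_layer in zip(onnx_past, pytorch_past):
        for onnx_tensor, pytorch_tensor in zip(onnx_layer, pytorch_layer):
            assert torch.allclose(onnx_tensor, pytorch_tensor, atol=atol)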

    return inputs


def decoder_onnx_inference(


Code duplicated?

ayoub-louati (author):

removed

@pommedeterresautee pommedeterresautee removed their request for review September 19, 2022 14:09
jonathlela (Contributor) left a comment:

Everything looks good to me, tested locally.

@ayoub-louati ayoub-louati merged commit 6b88e24 into main Jun 1, 2023
@ayoub-louati ayoub-louati deleted the feat/add-t5-support branch June 1, 2023 14:52