Remove superfluous IMGT information in TCR beta model parms file #72

Open
zacmon opened this issue May 10, 2023 · 0 comments
zacmon commented May 10, 2023

The default TCR beta model_parms.txt contains extraneous IMGT information where, ideally, only the allele name should appear. Compare this to the model_parms.txt files for IGL, IGK, IGH, and TCR alpha. To my knowledge this extra information doesn't cause a problem for IGoR itself, but it has consequences downstream. In particular, OLGA (and therefore SONIA and soNNia) requires that only the allele name precede the allele sequence in model_params.txt, which for a custom-trained IGoR model is taken from the final_parms.txt file. Notably, the default TCR beta model shipped with OLGA doesn't have this extra IMGT information.
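To make the problem easier to spot, here is a rough sketch of a check for entries that still carry a full IMGT header. It assumes entry lines have the form `%name;sequence;index` and that an IMGT header is pipe-delimited; the function name `find_imgt_entries` and the example entries are made up for illustration and are not part of IGoR or OLGA.

```python
def find_imgt_entries(lines):
    """Return (line_number, name) pairs for entries whose name field
    still looks like a full IMGT header.

    Assumption: entry lines start with '%' and the name is everything
    up to the first ';'. A '|' in the name is taken as the sign of an
    IMGT-style header rather than a bare allele name.
    """
    flagged = []
    for number, line in enumerate(lines, start=1):
        if not line.startswith("%"):
            continue  # section markers (@...) and event headers (#...)
        name = line[1:].split(";", 1)[0]
        if "|" in name:
            flagged.append((number, name))
    return flagged


# Made-up entries for illustration only, not copied from the real file.
example = [
    "#GeneChoice;V_gene;Undefined_side;7;v_choice",
    "%K02545|TRBV1*01|Homo sapiens|F|V-REGION|;GATACTGGA;0",
    "%TRBV2*01;GAACCTGAA;1",
]

for number, name in find_imgt_entries(example):
    print(f"line {number}: name field is not a bare allele name: {name}")
```

Running something like this over a final_parms.txt before handing it to OLGA would at least show whether the file is affected.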

Training a TCR beta model without supplying a model_parms.txt results in a final_parms.txt for the custom model that is roughly identical to the default model_parms.txt, extra IMGT information included (but with a different error rate). Currently, OLGA does not raise an exception if the allele name is not the only piece of information preceding the allele sequence, so a user with a custom TCR beta model from IGoR would not know what the problem is. While there are fixes to be made in OLGA so that users are told when parsing or input-file errors occur, it would set everyone up for success if the superfluous IMGT information were removed from the default TCR beta model_parms.txt.
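Until the default file is changed, a user could clean an existing file with something along these lines. This is only a sketch: it assumes the same `%name;sequence;index` layout and that the allele name is the second pipe-delimited field of the IMGT header (accession first, allele name second). The helper `strip_imgt_header` is hypothetical, not an IGoR or OLGA function.

```python
def strip_imgt_header(line):
    """Reduce an IMGT-style name field to the bare allele name.

    Assumption: the allele name is the second '|'-separated field of
    the header. Lines that are not entries, or whose name contains no
    '|', are returned unchanged.
    """
    if not line.startswith("%"):
        return line
    name, sep, rest = line[1:].partition(";")
    fields = name.split("|")
    if len(fields) < 2:
        return line  # already a bare name, or a non-gene entry such as "%-4;0"
    return "%" + fields[1] + sep + rest


# Made-up entry for illustration only.
print(strip_imgt_header("%K02545|TRBV1*01|Homo sapiens|F|V-REGION|;GATACTGGA;0"))
```

Applying this to every line of a final_parms.txt and writing the result to a new file should give OLGA the layout it expects, assuming the header layout above matches the actual file.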

I've attached what I believe should be the default TCR beta model_parms.txt, with the IMGT information removed for the alleles:
model_parms.txt.

Thanks and take good care,
Zach

@zacmon zacmon added the bug label May 10, 2023