Using pre-trained model vs de-novo transcript discovery #404
Hi Asher, Mostly you are correct that both ways will use the pretrained model, and that NDR cannot be calculated in de novo mode. However, there are a few nuanced differences. When you provide the annotations, bambu can use them to assist in junction correction, meaning the generated read classes are more likely to match annotations. They also help in assigning gene ids, without which you may end up with amalgamated genes, which affects how the model interprets the read classes. If you do have annotations, we would always recommend running bambu with them even if you do not train the model on them. Kind Regards,
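As a minimal sketch of that recommendation (placeholder file names; the opt.discovery = list(fitReadClassModel = FALSE) option is the same one shown later in this thread for keeping the pretrained model rather than refitting it on your data):
library(bambu)
# Reference annotations assist junction correction and gene id assignment,
# even when the read-class model itself is not retrained on your data.
annotations <- prepareAnnotations(gtf.file)   # gtf.file: path to your reference GTF
se <- bambu(reads = test.bam, annotations = annotations, genome = fa.file,
            opt.discovery = list(fitReadClassModel = FALSE))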
Hi Andre, Thank you, this is very helpful to know and answers my question. My one follow-up question is: does junction correction still happen in de novo mode, and if so, how is the probability of a true splicing junction predicted? Cheers,
Hi Asher, Yes, junction correction still happens in de novo mode. It categorizes junctions as high or low confidence based on the number of reads that support them and the distance to and support of neighboring junctions. This categorization is done using a pretrained model similar to the one used by de novo mode. Low confidence junctions are corrected to high confidence junctions if they fall within 10 bp. This is an aspect of Bambu we do want to delve into more when we find the opportunity. Hope this helps.
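As a rough illustration of the correction rule described above (this is not Bambu's code; the names are made up, and a simple read-count cutoff stands in for the pretrained junction model):
# Classify junctions as high or low confidence, then shift each low-confidence
# junction onto a high-confidence junction that lies within 10 bp of it.
correct_junctions <- function(junc_pos, junc_reads, min_reads = 5, window = 10) {
  high <- junc_reads >= min_reads               # stand-in for the model's confidence call
  corrected <- junc_pos
  for (i in which(!high)) {
    d <- abs(junc_pos[high] - junc_pos[i])
    if (length(d) > 0 && min(d) <= window) {
      corrected[i] <- junc_pos[high][which.min(d)]
    }
  }
  corrected
}
# Example: the weakly supported junction at 1007 is corrected to 1000.
correct_junctions(junc_pos = c(1000, 1007, 2500), junc_reads = c(40, 2, 30))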
Hi Andre, Thanks, this is super helpful! I'm realizing I have a couple more follow-up questions related to this (I hope it's okay to ask here, happy to correspond in another way):
Thanks again for all your help. Cheers,
Hi Asher, No problem replying here; these are good questions, so others might have the same ones and can then find the answers here too.
That was a bit longer of an answer than I was expecting to write! Let me know if that came through clearly. Kind Regards,
Hi Andre, Great, glad to hear it! This is super helpful as usual. I do have a few follow-up questions:
Thanks again for all your help! Cheers,
Hi Asher,
Kind Regards,
Hi Andre,
Cheers,
Hi Asher,
Best of luck with your analysis,
Hi Andre, Great, thanks for explaining and thanks again for all of your advice! Cheers,
Hi there,
Thanks again for creating this great tool. I was wondering if you could clarify if there is a difference in the way transcripts are assembled when running Bambu using a pretrained model with this set of commands:
se <- bambu(reads = test.bam, annotations = annotations, genome = fa.file, opt.discovery = list(fitReadClassModel = FALSE))
As compared to running Bambu in "de-novo discovery mode" with this set of commands:
novelAnnotations <- bambu(reads = test.bam, annotations = NULL, genome = fa.file, NDR = 0.5, quant = FALSE)
From my reading of your documentation, it seems that both use the pre-trained model, but in the example commands for running with a pre-trained model, the NDR is computed by comparing how many novel transcripts are discovered relative to the reference annotation, whereas in "de-novo discovery mode" we are not able to compute NDR, so TPS is used as a threshold instead. So the only difference between these two modes is how transcripts are filtered after assembly. Is this correct?
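As a rough numerical illustration of the NDR idea described here (hypothetical names, not Bambu's API): among the candidate transcripts retained at a given TPS cutoff, the NDR is the fraction that do not match the reference annotation, which is why it cannot be computed when no annotation is supplied.
candidates <- data.frame(
  tps    = c(0.95, 0.90, 0.80, 0.70, 0.60),   # transcript probability scores
  in_ref = c(TRUE, TRUE, FALSE, TRUE, FALSE)  # does the candidate match the reference?
)
# Fraction of retained candidates that are novel at a given TPS cutoff.
ndr_at_cutoff <- function(df, tps_cutoff) {
  kept <- df[df$tps >= tps_cutoff, ]
  mean(!kept$in_ref)
}
ndr_at_cutoff(candidates, 0.65)   # 1 novel out of 4 kept -> NDR = 0.25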
Thanks again for all your help!
Cheers,
Asher
PS Sorry I still have yet to make much progress on the other open issue I have.