A few transcripts without strand information in the extended_annotations.gtf #403
Comments
Hi there, Novel spliced transcripts should be assigned a strand. I ran some tests on my side and was not able to replicate what you observed. Could you please share your sessionInfo() and a minimal testable example that reproduces this issue (for example, just the part of the bam file that includes this novel gene) so that I can try to resolve it? Does BambuGene1753 have any isoforms other than BambuTx385, or is it a single isoform gene? The other odd thing about this output is that the final line does not have an exon_number, but all exons in extended_annotations.gtf should. Was it just cut off when you pasted it here, or is it missing in the gtf file? Kind Regards,
Hi Andre, Thanks for your response. BambuGene1753 is a single isoform gene and has no other isoforms other than BambuTx385. And the last line of
And my sessionInfo() is,
Many thanks!
Hi, Thanks for providing these files. I had a more in-depth look at BambuTx2. This transcript has two splice junctions: the first has the "CT-AC" motif and the second "GC-AG". The first motif is associated with canonical negative strand transcripts and the second with positive strand transcripts. Reads from unstranded protocols carry no strand information themselves, so we try to infer the strand from the splice junctions, choosing the direction supported by the most junctions. Earlier I wrongly said that novel spliced transcripts should always be assigned a strand. In this case there is equal support for the positive and the negative strand, so the transcript is assigned '*' because we cannot be certain of the strand. Because of this ambiguity you may want to remove these transcripts from the reference annotations, depending on what you plan to do with the annotations downstream. However, there does appear to be convincing read support for these junctions, so if you are interested in this gene locus, experimental validation would be recommended to confirm what was seen with Nanopore sequencing. I hope this explains why there are spliced transcripts labeled with '*' and gives you some guidance on how to proceed. Kind Regards,
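The majority vote described above can be sketched roughly as follows. This is only an illustration in Python, not Bambu's actual code: the function name `infer_strand` and the motif table `MOTIF_STRAND` are assumptions made up for this example, and the table only lists a few common motifs together with their reverse complements.

```python
from collections import Counter

# Assumed motif-to-strand table, for illustration only.
# "+" motifs are written as seen on the forward strand; "-" motifs are
# their reverse complements (e.g. GT-AG on the minus strand reads as CT-AC).
MOTIF_STRAND = {
    "GT-AG": "+", "GC-AG": "+", "AT-AC": "+",
    "CT-AC": "-", "CT-GC": "-", "GT-AT": "-",
}


def infer_strand(junction_motifs):
    """Return '+', '-' or '*' from a list of junction motifs like 'GT-AG'."""
    # Each junction with a recognised motif casts one vote for a strand;
    # unrecognised motifs are ignored.
    votes = Counter(
        MOTIF_STRAND[m] for m in junction_motifs if m in MOTIF_STRAND
    )
    plus, minus = votes["+"], votes["-"]
    if plus > minus:
        return "+"
    if minus > plus:
        return "-"
    # A tie (including no informative junctions) leaves the strand ambiguous.
    return "*"


# The two-junction case discussed above: one vote each way, so no call.
print(infer_strand(["CT-AC", "GC-AG"]))
```

With one junction voting for each strand, the sketch returns '*', which matches the situation described for this transcript.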
Hi Andre, Thanks very much. Your explanation is very helpful to me.
Hi team,
I ran multiple samples simultaneously through Bambu, and a few multi-exon transcripts in novel and known genes lack strand information: they are labelled "." in the output file extended_annotations.gtf, like this:
Using
cut -f7 extended_annotations.gtf | sort | uniq -c
I got:
There is no "." in the reference genome annotation file. Can anyone explain why they are marked as "."? And how should I handle these transcripts?
Any help is much appreciated!
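For anyone who wants to go one step beyond counting column 7, here is a rough Python sketch that tallies the strand column, lists the transcripts without a strand, and optionally drops them. It assumes a standard tab-separated, 9-column GTF with a transcript_id attribute. The helper names and the transcript IDs in the example are hypothetical and are not part of Bambu.

```python
import re
from collections import Counter

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def _fields(line):
    """Split a GTF record into columns; return None for comments or short lines."""
    if line.startswith("#") or not line.strip():
        return None
    cols = line.rstrip("\n").split("\t")
    return cols if len(cols) >= 9 else None


def strand_counts(gtf_lines):
    """Count the values in column 7 (strand) over all records."""
    return Counter(c[6] for c in map(_fields, gtf_lines) if c)


def unstranded_transcript_ids(gtf_lines):
    """Collect transcript IDs whose strand column is '.' or '*'."""
    ids = set()
    for cols in map(_fields, gtf_lines):
        if cols and cols[6] in (".", "*"):
            match = TRANSCRIPT_ID.search(cols[8])
            if match:
                ids.add(match.group(1))
    return ids


def drop_transcripts(gtf_lines, ids):
    """Keep every line except records belonging to the given transcript IDs."""
    kept = []
    for line in gtf_lines:
        cols = _fields(line)
        match = TRANSCRIPT_ID.search(cols[8]) if cols else None
        if match and match.group(1) in ids:
            continue
        kept.append(line)
    return kept


# Hypothetical records, for illustration only.
example = [
    "\t".join(["chr1", "Bambu", "exon", "100", "200", ".", "+", ".",
               'gene_id "G1"; transcript_id "TxA"; exon_number "1";']),
    "\t".join(["chr1", "Bambu", "exon", "500", "600", ".", ".", ".",
               'gene_id "G2"; transcript_id "TxB"; exon_number "1";']),
]
print(strand_counts(example))
print(unstranded_transcript_ids(example))
```

To run it on a real file, pass the open file handle (or a list of its lines) instead of `example`. Whether dropping these transcripts is appropriate depends on the downstream analysis.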