Bad end parameter. #327

Closed
fangbohao opened this issue Jan 30, 2023 · 8 comments

@fangbohao

Hi there, I got the following error message when I ran 'agat_sp_filter_incomplete_gene_coding_models' on a GFF3 file produced by 'agat_sp_merge_annotations.pl'. Do you have any solution to fix the GFF3 file produced by 'agat_sp_merge_annotations.pl'?

------------- EXCEPTION -------------
MSG: Bad end parameter (3). End must be less than the total length of sequence (total=2)
STACK Bio::PrimarySeq::subseq /usr/local/lib/perl5/site_perl/Bio/PrimarySeq.pm:452
STACK Bio::Seq::subseq /usr/local/lib/perl5/site_perl/Bio/Seq.pm:633
STACK toplevel /usr/local/bin/agat_sp_filter_incomplete_gene_coding_models.pl:135

I attached this GFF3 file for your reference from this link: https://drive.google.com/file/d/1iTqn0mSbVyBwT5lsf_D7zdb8ok9LFc2k/view?usp=sharing
The reference genome could be found here: https://s3.amazonaws.com/genomeark/species/Haemorhous_mexicanus/bHaeMex1/assembly_curated/bHaeMex1.pri.cur.20220203.fasta.gz

Thanks!
Bohao

General (please complete the following information):

  • AGAT-v1.0.0
  • AGAT installation/use: singularity
  • OS: computation cluster

Screenshot 2023-01-30 at 3 54 56 PM

@Juke34
Collaborator

Juke34 commented Jan 31, 2023

Hi, I don't think the problem comes from agat_sp_merge_annotations.pl. It is more likely a problem in the underlying annotations: one of them contains a feature whose start/end location falls outside the sequence in the FASTA file. Perhaps the annotations were not all made using the same FASTA reference; in that case you should not merge them.
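
One way to test this hypothesis is to compare every feature's coordinates with the length of its sequence in the FASTA file. The snippet below is a minimal sketch, not part of AGAT: the function names `fasta_lengths` and `out_of_bounds_features` are hypothetical, and it assumes plain-text FASTA and tab-separated GFF3 input passed as iterables of lines.

```python
def fasta_lengths(fasta_lines):
    """Return {sequence_id: length} from an iterable of FASTA lines."""
    lengths = {}
    current = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The sequence ID is the first whitespace-delimited token of the header.
            fields = line[1:].split()
            current = fields[0] if fields else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def out_of_bounds_features(gff_lines, lengths):
    """Return (line_number, seqid, start, end, reason) for suspicious features."""
    problems = []
    for number, line in enumerate(gff_lines, start=1):
        if line.startswith("##FASTA"):
            break  # an embedded FASTA section ends the feature table
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a regular 9-column feature line
        seqid, start, end = cols[0], int(cols[3]), int(cols[4])
        if seqid not in lengths:
            problems.append((number, seqid, start, end, "sequence not in FASTA"))
        elif start < 1:
            problems.append((number, seqid, start, end, "start before position 1"))
        elif end > lengths[seqid]:
            problems.append((number, seqid, start, end, "end beyond sequence length"))
    return problems
```

With real files, something like `out_of_bounds_features(open("annotation.gff3"), fasta_lengths(open("genome.fasta")))` would list the offending lines (both file names are placeholders). A gzipped FASTA could be read with `gzip.open(path, "rt")` instead of `open`.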

@fangbohao
Author

Hi Juke, thank you for pointing out this issue!

I realize that for some annotation files I started with a soft-masked reference genome, whereas for the rest I started with an unmasked genome of the same species. This might be the issue; I will try rerunning everything from the soft-masked genome.

Thanks!
Bohao

@Juke34
Collaborator

Juke34 commented Jan 31, 2023

Masked or not, it should not change the boundaries of the annotated features, I mean it should not push them outside the sequences. Could you try agat_sp_filter_incomplete_gene_coding_models on each file to see if it works? If it does, then agat_sp_merge_annotations.pl might be introducing some errors.

@fangbohao
Author

Hi Jacques, great suggestion! By running each file separately, I found which file has the issue. It was not an issue with the masked genome, but an issue with a specific annotation program.
Thanks!
Bohao

@fangbohao
Author

Hi Jacques, could you help me check a GFF3 file that does not pass 'agat_sp_filter_incomplete_gene_coding_models.pl'? This GFF3 is the only one that does not work with 'agat_sp_filter_incomplete_gene_coding_models.pl'. The other files work very well. The error message is still "Bad end parameter".

I have checked the features in this GFF3: none exceed the coordinates of the reference genome chromosomes. The GFF3 was converted from a BED file produced by the TOGA program using 'agat_convert_bed2gff.pl'. The pipeline seems OK.

I attached the GFF3, BED file, and genome files below. Thank you very much for taking a look and pointing out the issue in this GFF3!

GFF3: https://drive.google.com/file/d/1xAlfLtkG-m3hjiB20QIT_moDNw770AyR/view?usp=sharing
BED (the original output from TOGA): https://drive.google.com/file/d/1xfr1-acc1IsuLmhzwziQoJIEFx9JQRv7/view?usp=sharing
Reference genome (1Gb): https://drive.google.com/file/d/16-QMCiT2IELr1nZDqBuNH8_XG3EyNb5e/view?usp=sharing

Thanks!
Bohao

@fangbohao fangbohao reopened this Feb 1, 2023
@Juke34
Collaborator

Juke34 commented Feb 1, 2023

Could it be a similar case: https://www.biostars.org/p/9552902/

@Juke34
Collaborator

Juke34 commented Feb 16, 2023

Right, the problem is that the CDS is shorter than 3 nt, and we need at least 6 nt to check it (3 nt for a start codon, 3 nt for a stop codon),
e.g.:

SUPER_11	TOGA	exon	2357069	2357070	.	-	.	ID=exon39600;Parent=4030
SUPER_11	TOGA	CDS	2357069	2357070	.	-	0	ID=CDS39600;Parent=4030
SUPER_11	TOGA	five_prime_UTR	2357071	2357070	.	-	.	ID=five_prime_UTR4030;Parent=4030

I will update the agat_sp_filter_incomplete_gene_coding_models.pl script to fix the problem
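
Until the script is updated, such records can be located before running the filter. The snippet below is a rough sketch, not the fix applied in AGAT: the helper names and the `MIN_CDS_LENGTH` constant are hypothetical, and it assumes tab-separated GFF3 lines in which each CDS carries a `Parent` attribute. It sums the CDS length per parent and reports parents whose total is below 6 nt.

```python
from collections import defaultdict

# 3 nt for a start codon + 3 nt for a stop codon, as described above.
MIN_CDS_LENGTH = 6


def cds_length_by_parent(gff_lines):
    """Return {parent_id: total CDS length in nt} from an iterable of GFF3 lines."""
    totals = defaultdict(int)
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # an embedded FASTA section ends the feature table
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        start, end = int(cols[3]), int(cols[4])
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(
            field.strip().split("=", 1)
            for field in cols[8].split(";")
            if "=" in field
        )
        # A feature may list several parents separated by commas.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                totals[parent] += end - start + 1  # GFF3 coordinates are inclusive
    return dict(totals)


def short_cds_parents(gff_lines, min_length=MIN_CDS_LENGTH):
    """Return {parent_id: length} for parents whose total CDS is below min_length."""
    totals = cds_length_by_parent(gff_lines)
    return {parent: n for parent, n in totals.items() if n < min_length}
```

Applied to the three lines quoted above, this reports parent `4030` with a total CDS length of 2 nt, which matches the `total=2` in the error message.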

@fangbohao
Author

Hi Juke, thank you very much for finding and solving this!
Bohao
