No results about the agat_sp_statistics.pl scripts #242

Closed
Huangyizhong opened this issue Apr 21, 2022 · 5 comments · Fixed by #243

Comments

@Huangyizhong

Hi,
I want to use agat_sp_statistics.pl to summarize the mouse GFF3 file, but it takes a long time and the results only contain information about c_gene_segment. (Mus_musculus.GRCm39.105.gff3 and Mus_musculus.GRCm39.dna.toplevel.fa are both from Ensembl, http://ftp.ensembl.org/pub/release-105/gff3/mus_musculus/.) Should I do something to the files before passing them to agat_sp_statistics.pl?
Thanks in advance!
Yizhong Huang

@Juke34
Collaborator

Juke34 commented Apr 21, 2022

What version did you use?
I tried Mus_musculus.GRCm39.105.gff3 with AGAT v0.9.0; it took some time (935 seconds) but it worked fine.

@Juke34
Collaborator

Juke34 commented Apr 21, 2022

Sorry, I was talking about agat_convert_sp_gxf2gxf.pl. I will give agat_sp_statistics.pl a shot.

@Huangyizhong
Author

Huangyizhong commented Apr 21, 2022 via email

@Juke34
Collaborator

Juke34 commented Apr 22, 2022

Right. There is an issue with a poorly designed loop that checks for overlap between features. The problem arises when there are a lot of topfeatures or standalone features (e.g. biological_region in your case).
I will push a fix.
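
For readers wondering why an overlap check can become so slow on a file with many standalone features, here is a minimal Python sketch. It is not AGAT's code (AGAT is written in Perl) and not the fix that was pushed; the function names and the `(start, end)` tuple representation are assumptions made purely for illustration. It contrasts comparing every pair of features with sorting once and stopping early.

```python
from typing import List, Tuple

# Hypothetical representation: (start, end), inclusive, GFF-style coordinates
Interval = Tuple[int, int]


def count_overlaps_naive(features: List[Interval]) -> int:
    """Compare every pair of features: always n*(n-1)/2 comparisons."""
    count = 0
    for i in range(len(features)):
        a_start, a_end = features[i]
        for j in range(i + 1, len(features)):
            b_start, b_end = features[j]
            # Two inclusive intervals overlap when each starts before the other ends
            if a_start <= b_end and b_start <= a_end:
                count += 1
    return count


def count_overlaps_sorted(features: List[Interval]) -> int:
    """Sort by start once, then stop scanning as soon as a later
    feature starts after the current feature ends."""
    ordered = sorted(features)
    n = len(ordered)
    count = 0
    for i in range(n):
        end = ordered[i][1]
        j = i + 1
        # Later features start at or after ordered[i]; once one starts
        # past `end`, none of the remaining ones can overlap it.
        while j < n and ordered[j][0] <= end:
            count += 1
            j += 1
    return count


features = [(1, 10), (5, 15), (20, 30), (25, 26), (100, 200)]
naive_result = count_overlaps_naive(features)
sorted_result = count_overlaps_sorted(features)
print(naive_result, sorted_result)
```

With hundreds of thousands of mostly non-overlapping features on a chromosome, the pairwise version does on the order of n² comparisons, while the sorted version does roughly one sort plus one comparison per feature. If nearly everything overlaps, both versions still have to count every overlapping pair.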

@Juke34 Juke34 mentioned this issue Apr 23, 2022
Juke34 pushed a commit that referenced this issue Apr 23, 2022
* add possibility to send verbosity to statistic lib

* stop topfeature/standalone feature analysis after first round

* compute genome size only once at the beginning. Reactivate stat related to genome size. Remove plural. Use chimeric name for overlapping result (because one level2 type can use several level1 feature types). Analyse L2 without isoform outside the get_omniscient_statistics function.

* increment to 0.9.1

* Fix #242 - error due to inefficient loop over topfeatures and standalone feature.

* Re-activate genome coverage statistics

* remove plural

* increase time efficiency - e.g. compute only once the genome size, only once the topfeatures and standalone feature statistics ...
@Juke34
Collaborator

Juke34 commented Apr 25, 2022
