
Automatic annotation of BGI v3 assembly of E. coli TY-2482 genome

marina-manrique edited this page Jun 11, 2011 · 1 revision

The Era7 automatic annotation of the third BGI assembly of the E. coli TY-2482 strain genome is now available (assembly file: ftp://ftp.genomics.org.cn/pub/Ecoli_TY-2482/Escherichia_coli_TY-2482.scaffold.20110610.fa.gz).
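A minimal Python sketch for fetching the gzipped assembly and listing its scaffold lengths. It assumes the 2011 FTP location is still reachable (it may not be) and that the file is plain FASTA; `download` and `scaffold_lengths` are illustrative names, not part of any Era7 or BGI tool.

```python
import gzip
import shutil
import urllib.request
from pathlib import Path

ASSEMBLY_URL = ("ftp://ftp.genomics.org.cn/pub/Ecoli_TY-2482/"
                "Escherichia_coli_TY-2482.scaffold.20110610.fa.gz")


def download(url: str, dest: Path) -> Path:
    """Stream `url` to `dest` (the original FTP server may no longer respond)."""
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


def scaffold_lengths(path: Path) -> dict[str, int]:
    """Map each FASTA record ID (first word of its header) to its sequence length."""
    lengths: dict[str, int] = {}
    current = None
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                current = line[1:].split()[0]
                lengths[current] = 0
            elif current is not None:
                # Sequence lines may be wrapped, so accumulate their lengths
                lengths[current] += len(line)
    return lengths
```

For example, `scaffold_lengths(download(ASSEMBLY_URL, Path("TY-2482_v3.fa.gz")))` returns a dict keyed by scaffold ID; its `len()` and the sum of its values give the scaffold count and total assembly size.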

This annotation was done with BG7, using a reference set of 137,063 proteins:

  • The representative UniProt proteins of all UniRef90 clusters containing Escherichia coli proteins
  • All UniProt proteins from organisms whose names include the term “EHEC” or “EAEC”
  • All UniProt proteins from bacteria that contain the term “toxin” in any UniProt field
  • All UniProt proteins from bacteria that contain the term “hemolysin” in any UniProt field
  • All proteins from Salmonella typhi, Yersinia pestis and Shigella dysenteriae
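The five criteria above combine as a union. The sketch below illustrates that selection over in-memory records. The `Protein` fields (in particular `text` as a flattened copy of all UniProt fields, and the `uniref90_rep_ecoli` flag) are hypothetical, the case-insensitive term match is an assumption, and this is not how BG7 or UniProt actually stores or queries data. Keying the result by accession reflects the assumption that a protein matching several criteria is counted only once in the 137,063 total.

```python
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class Protein:
    accession: str
    organism: str
    is_bacterial: bool
    text: str  # hypothetical: all UniProt fields joined into one searchable string
    uniref90_rep_ecoli: bool = False  # hypothetical: representative of an E. coli UniRef90 cluster


# Organisms whose whole proteomes are included
WHOLE_PROTEOMES = ("Salmonella typhi", "Yersinia pestis", "Shigella dysenteriae")


def from_whole_proteome(organism: str) -> bool:
    # Match the species name exactly or followed by a strain suffix,
    # so "Salmonella typhimurium" is NOT mistaken for "Salmonella typhi".
    return any(organism == name or organism.startswith(name + " ")
               for name in WHOLE_PROTEOMES)


def in_reference_set(p: Protein) -> bool:
    text = p.text.lower()
    return (
        p.uniref90_rep_ecoli
        or "EHEC" in p.organism
        or "EAEC" in p.organism
        or (p.is_bacterial and ("toxin" in text or "hemolysin" in text))
        or from_whole_proteome(p.organism)
    )


def build_reference_set(proteins: Iterable[Protein]) -> dict[str, Protein]:
    # Keyed by accession, so a protein matching several criteria appears once
    return {p.accession: p for p in proteins if in_reference_set(p)}
```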

Results

5,936 genes were detected:

  • 5,806 protein-coding genes
  • 130 RNA genes

4,881 of the 5,806 protein-coding genes (84.07%) have canonical start and stop codons and contain neither frameshifts nor intragenic stop codons.
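A simple sequence-only check for this criterion might look like the following sketch. It assumes the accepted start codons are ATG, GTG and TTG (the most common bacterial starts; the page does not define "canonical"), treats a length that is not a multiple of three as a crude sign of a frameshift, and is not BG7's actual method, which this page does not describe.

```python
# Assumption: accepted start codons; the page does not define "canonical".
START_CODONS = frozenset({"ATG", "GTG", "TTG"})
STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})  # standard bacterial stop codons


def codons(seq: str) -> list[str]:
    """Split a sequence into consecutive triplets in frame 0."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def is_intact_cds(seq: str,
                  starts: frozenset[str] = START_CODONS,
                  stops: frozenset[str] = STOP_CODONS) -> bool:
    """True if `seq` begins with a start codon, ends with a stop codon,
    has a length divisible by 3 and contains no intragenic stop codon."""
    seq = seq.upper()
    # Need room for a start and a stop codon; a length that is not a
    # multiple of 3 is used here as a crude proxy for a frameshift.
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    triplets = codons(seq)
    if triplets[0] not in starts or triplets[-1] not in stops:
        return False
    # Any stop codon before the final triplet is an intragenic stop.
    return not any(c in stops for c in triplets[1:-1])
```

For example, `is_intact_cds("ATGAAATAA")` is `True`, while `is_intact_cds("ATGTAAAAATAA")` is `False` because of the in-frame TAA at the second codon.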

533 of the 5,806 protein-coding genes (9.18%) contain frameshifts or intragenic stop codons, probably caused by errors inherent to the sequencing technology.
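The reported counts and percentages can be rechecked with a few lines of arithmetic. Rounded to two decimals, 4,881/5,806 is 84.07% and 533/5,806 is 9.18%; together the two categories cover 5,414 genes, leaving 392 protein-coding genes (6.75%) that fall in neither category as described here.

```python
TOTAL_GENES = 5_936
PROTEIN_CODING = 5_806
RNA_GENES = 130
INTACT = 4_881     # canonical start/stop, no frameshift or intragenic stop
DEFECTIVE = 533    # frameshift or intragenic stop codon


def percent(part: int, whole: int) -> float:
    """Share of `whole` represented by `part`, as a percentage rounded to 2 decimals."""
    return round(100 * part / whole, 2)


# Protein-coding plus RNA genes should add up to the total gene count
assert PROTEIN_CODING + RNA_GENES == TOTAL_GENES

# Protein-coding genes not covered by either of the two stated categories
unclassified = PROTEIN_CODING - INTACT - DEFECTIVE

print(percent(INTACT, PROTEIN_CODING))                      # 84.07
print(percent(DEFECTIVE, PROTEIN_CODING))                   # 9.18
print(unclassified, percent(unclassified, PROTEIN_CODING))  # 392 6.75
```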

The annotation results are available in the repository at https://github.com/ehec-outbreak-crowdsourced/BGI-data-analysis/tree/master/strains/TY2482/seqProject/BGI/annotations/era7bioinformatics/BGI_V3