Automatic annotation of E. coli H112180280 strain (HPA sequence and assembly)

marina-manrique edited this page Jun 11, 2011 · 1 revision

The Oh no sequences! (Era7) automatic annotation of the E. coli H112180280 strain is now available. It is based on the assembly provided by the Health Protection Agency (HPA), which is an unconfirmed de novo assembly of 454 reads. The assembly is available in the repository: https://github.com/ehec-outbreak-crowdsourced/BGI-data-analysis/tree/master/strains/H112180280/seqProject/HealthProtectionAgencyUK/assemblies/HealthProtectionAgencyUK

The annotation was performed with BG7 using the following set of reference proteins (137,063 proteins in total):

  • The representative UniProt proteins of all UniRef90 clusters that contain Escherichia coli proteins
  • All UniProt proteins from organisms whose name includes the term “EHEC” or “EAEC”
  • All UniProt proteins from bacteria that have the term “toxin” in any UniProt field
  • All UniProt proteins from bacteria that have the term “hemolysin” in any UniProt field
  • All proteins from Salmonella typhi, Yersinia pestis and Shigella dysenteriae
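The five selection criteria above can be read as a single membership test over protein records. The following is only a minimal sketch of that logic, not the procedure actually used to build the set: the `ProteinRecord` layout, its field names, the accessions and the case-insensitive substring matching are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ProteinRecord:
    """Hypothetical record layout (not the BG7 or UniProt schema)."""
    accession: str
    organism: str
    is_bacterial: bool
    # True if this protein is the representative of a UniRef90 cluster
    # that contains at least one Escherichia coli protein.
    represents_ecoli_uniref90_cluster: bool = False
    # Free-text content of the record's fields (names, keywords, comments).
    text_fields: Tuple[str, ...] = ()


EXTRA_ORGANISMS = ("salmonella typhi", "yersinia pestis", "shigella dysenteriae")


def in_reference_set(rec: ProteinRecord) -> bool:
    """Return True if the record meets any of the five listed criteria."""
    organism = rec.organism.lower()
    all_text = " ".join(rec.text_fields).lower()

    # Criterion 1: UniRef90 representatives for E. coli proteins.
    if rec.represents_ecoli_uniref90_cluster:
        return True
    # Criterion 2: organism name mentions EHEC or EAEC.
    if "ehec" in organism or "eaec" in organism:
        return True
    # Criteria 3 and 4: bacterial proteins mentioning toxin or hemolysin.
    if rec.is_bacterial and ("toxin" in all_text or "hemolysin" in all_text):
        return True
    # Criterion 5: every protein from the three additional species.
    return any(name in organism for name in EXTRA_ORGANISMS)
```

Note that plain substring matching on “toxin” also accepts words such as “enterotoxin” or “antitoxin”; whether the original selection behaved this way is not stated on this page.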

Results

5,916 genes were detected

  • 5,792 protein encoding genes
  • 124 RNA genes

4,912 out of the 5,792 (84.81%) protein encoding genes have canonical start and stop codons and contain neither frameshifts nor intragenic stop codons.

615 out of the 5,792 (10.62%) protein encoding genes contain frameshifts or intragenic stop codons in their sequences, probably caused by errors inherent to the sequencing technology.
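The figures in this section can be cross-checked with a few lines of arithmetic. This is a small sketch that uses only the counts reported above; the variable names are illustrative, and percentages are rounded to two decimal places.

```python
# Counts as reported in the Results section.
total_genes = 5916
protein_genes = 5792
rna_genes = 124
intact = 4912      # canonical start/stop, no frameshifts or intragenic stops
disrupted = 615    # at least one frameshift or intragenic stop codon


def pct(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


# Protein-encoding and RNA genes should add up to the total.
assert protein_genes + rna_genes == total_genes

intact_pct = pct(intact, protein_genes)
disrupted_pct = pct(disrupted, protein_genes)

# Protein-encoding genes not covered by either reported category.
unclassified = protein_genes - intact - disrupted

print(f"intact: {intact_pct}%  disrupted: {disrupted_pct}%  other: {unclassified}")
```

The two reported categories add up to 5,527 genes, so 265 protein encoding genes fall into neither of them; this page does not say how those genes are classified.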

The annotation results are available in the repository: https://github.com/ehec-outbreak-crowdsourced/BGI-data-analysis/tree/master/strains/H112180280/seqProject/HealthProtectionAgencyUK/annotations/era7bioinformatics/era7_HPA_H112180280_annotations