Automation of functional annotation of genomes and transcriptomes

  • Luis Fernando Cadavid Gutiérrez National University
  • José Nelson Pérez Castillo Universidad Distrital Francisco José de Caldas
  • Cristian Alejandro Rojas Quintero Universidad Distrital Francisco José de Caldas
  • Nelson Enrique Vera Parra Universidad Distrital Francisco José de Caldas
Keywords: Annotator, functional annotation, gene ontology, high-throughput sequencing.

Abstract

Functional annotation represents a means to investigate and classify genes and transcripts according to their function within a given organism.

This paper presents Massive Automatic Functional Annotation (MAFA-Web), a free online bioinformatics tool that automates, unifies and optimizes functional annotation when dealing with large volumes of sequences. MAFA includes tools for categorization and statistical analysis of associations between sequences. We evaluated the performance of MAFA on a data set taken from the Diploria strigosa transcriptome, using an 8-core computer (E7450 @ 2.40 GHz with 256 GB RAM). Processing rates were 2.7 seconds per sequence with the UniProt database and 50.0 seconds per sequence with the NCBI Non-redundant database. RAM usage depended on the database being processed: 1 GB for UniProt and 9 GB for Non-redundant. Availability: https://github.com/BioinfUD/MAFA.
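The per-sequence rates reported above give a rough way to budget a run. The following is a minimal sketch, not part of MAFA: it assumes run time scales linearly with the number of sequences on comparable hardware, and the function name, dictionary keys and the 10,000-sequence input are hypothetical.

```python
# Rates reported in the abstract, in seconds per sequence.
# The keys "uniprot" and "nr" are labels chosen for this sketch.
SECONDS_PER_SEQUENCE = {"uniprot": 2.7, "nr": 50.0}


def estimated_hours(num_sequences: int, database: str) -> float:
    """Estimate run time in hours, assuming linear scaling with sequence count."""
    return num_sequences * SECONDS_PER_SEQUENCE[database] / 3600.0


if __name__ == "__main__":
    num_sequences = 10_000  # hypothetical input size
    for database in SECONDS_PER_SEQUENCE:
        hours = estimated_hours(num_sequences, database)
        print(f"{database}: about {hours:.1f} hours")
```

Under these assumptions, annotation against the Non-redundant database takes roughly 18 times longer than against UniProt, so the choice of database dominates the time budget.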

 



Author biographies

Luis Fernando Cadavid Gutiérrez, National University

Doctor of Medicine, PhD in Ecology and Evolutionary Biology. Teacher and researcher, IEI Research Group, Institute of Genetics and Department of Biology, National University, Bogotá.

José Nelson Pérez Castillo, Universidad Distrital Francisco José de Caldas

Systems Engineer, PhD in Informatics. GICOGE Research Group. Director of the Center for Scientific Research and Development, Universidad Distrital Francisco José de Caldas, Bogotá.

Cristian Alejandro Rojas Quintero, Universidad Distrital Francisco José de Caldas

Systems Engineering student, GICOGE Research Group, Universidad Distrital Francisco José de Caldas, Bogotá.

Nelson Enrique Vera Parra, Universidad Distrital Francisco José de Caldas

Electronic Engineer, MSc in Information Sciences and Communication. Teacher and researcher, GICOGE Research Group, Universidad Distrital Francisco José de Caldas, Bogotá.


How to cite
Cadavid Gutiérrez, L. F., Pérez Castillo, J. N., Rojas Quintero, C. A., & Vera Parra, N. E. (2014). Automation of functional annotation of genomes and transcriptomes. Tecnura, 18, 90-96. https://doi.org/10.14483/22487638.9246
Published: 2014-12-01
Section
Research
