Brona Brejova, Daniel G. Brown, Tomas Vinar. Optimal Spaced Seeds for Homologous Coding Regions. Journal of Bioinformatics and Computational Biology, 1(4):595-610. January 2004. Early version appeared in CPM 2003.
Download preprint: 04seedscoding.pdf, 2657Kb
Download from publisher: http://dx.doi.org/10.1142/S0219720004000326
Related web page: http://www.bioinformatics.uwaterloo.ca/supplements/03seeds/
Bibliography entry: BibTeX
See also: early version
Abstract:
Optimal spaced seeds were developed as a method to increase sensitivity of local alignment programs similar to BLASTN. Such seeds have been used before in the program PatternHunter, and have given improved sensitivity and running time relative to BLASTN in genome-genome comparison. We study the problem of computing optimal spaced seeds for detecting homologous coding regions in unannotated genomic sequences. By using well-chosen seeds, we are able to improve the sensitivity of coding sequence alignment over that of TBLASTX, while keeping runtime comparable to BLASTN. We identify good seeds by first giving effective hidden Markov models of conservation in alignments of homologous coding regions. We give an efficient algorithm to compute the optimal spaced seed when conservation patterns are generated by these models. Our results offer the hope of improved gene finding due to fewer missed exons in DNA/DNA comparison, and more effective homology search in general, and may have applications outside of bioinformatics.
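Illustration: the idea of a spaced seed hit can be sketched in a few lines of Python. This is a minimal sketch, not the algorithm from the paper. It assumes an alignment is summarized as a string of 1s (matching positions) and 0s (mismatches), and it replaces the paper's hidden Markov models with a much simpler model in which every position matches independently with probability p. The function names seed_hits and sensitivity_bernoulli and the example seeds are hypothetical, chosen only for illustration.

```python
from itertools import product


def seed_hits(seed, conservation):
    """Return the start positions where the seed hits the alignment.

    seed: string over {'1', '0'}; '1' marks a position that must match,
          '0' marks a "don't care" position.
    conservation: string over {'1', '0'}; '1' = match, '0' = mismatch.
    """
    required = [i for i, c in enumerate(seed) if c == "1"]
    span = len(seed)
    return [
        start
        for start in range(len(conservation) - span + 1)
        if all(conservation[start + i] == "1" for i in required)
    ]


def sensitivity_bernoulli(seed, length, p):
    """Probability that the seed hits a random alignment at least once.

    Assumption: each of the `length` positions matches independently with
    probability p. This exhaustive enumeration is exponential in `length`
    and is only practical for short alignments.
    """
    total = 0.0
    for bits in product("01", repeat=length):
        if seed_hits(seed, "".join(bits)):
            matches = bits.count("1")
            total += p ** matches * (1 - p) ** (length - matches)
    return total


if __name__ == "__main__":
    alignment = "1111011"
    print(seed_hits("111", alignment))   # contiguous seed of weight 3
    print(seed_hits("1101", alignment))  # spaced seed of weight 3
    # Compare two weight-3 seeds on short alignments under the simple model.
    print(sensitivity_bernoulli("111", 12, 0.7))
    print(sensitivity_bernoulli("1101", 12, 0.7))
```

Both seeds in the example require three matching positions, so they can be compared at equal weight. Which seed is more sensitive depends on the conservation model, which is why the choice of model matters when searching for an optimal seed.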