Brona Brejova, Daniel G. Brown, Tomas Vinar. Optimal Spaced Seeds for Homologous Coding Regions. Journal of Bioinformatics and Computational Biology, 1(4):595-610. January 2004. Early version appeared in CPM 2003.

Download preprint: 04seedscoding.pdf, 2657Kb

Download from publisher: http://dx.doi.org/10.1142/S0219720004000326

Related web page: http://www.bioinformatics.uwaterloo.ca/supplements/03seeds/

Bibliography entry: BibTeX

See also: early version

Abstract:

Optimal spaced seeds were developed as a method to increase sensitivity of 
local alignment programs similar to BLASTN. Such seeds have been used before 
in the program PatternHunter, and have given improved sensitivity and running 
time relative to BLASTN in genome-genome comparison. We study the problem of 
computing optimal spaced seeds for detecting homologous coding regions in 
unannotated genomic sequences. By using well-chosen seeds, we are able to 
improve the sensitivity of coding sequence alignment over that of TBLASTX, 
while keeping runtime comparable to BLASTN. We identify good seeds by first 
giving effective hidden Markov models of conservation in alignments of 
homologous coding regions. We give an efficient algorithm to compute the 
optimal spaced seed when conservation patterns are generated by these models. 
Our results offer the hope of improved gene finding due to fewer missed exons 
in DNA/DNA comparison, and more effective homology search in general, and may 
have applications outside of bioinformatics.
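
Illustration (not from the paper): as a reading aid for the term "spaced seed" used above, here is a minimal Python sketch of how a seed is matched against the conservation pattern of an alignment. A seed is written as a string of 1s (positions that must match) and 0s (positions that are ignored), and an alignment is reduced to a string in which 1 marks a matching column and 0 a mismatch. The function name `seed_hits`, the two seeds, and the toy conservation string are all hypothetical choices made for this example; the toy pattern places a mismatch at every third position, loosely mimicking third-codon-position variability, and is not data from the paper. The sketch only detects hits; it does not implement the paper's hidden Markov models or its algorithm for finding the optimal seed.

```python
def seed_hits(seed, conservation):
    """Return the start positions at which the seed hits the alignment.

    seed:         string over {'1', '0'}; '1' = column must match,
                  '0' = column is ignored ("don't care").
    conservation: string over {'1', '0'}; '1' = aligned bases match,
                  '0' = mismatch.
    """
    # Offsets within the seed that are required to be matches.
    required = [i for i, c in enumerate(seed) if c == "1"]
    hits = []
    # Slide the seed over every position where it fits entirely.
    for start in range(len(conservation) - len(seed) + 1):
        if all(conservation[start + i] == "1" for i in required):
            hits.append(start)
    return hits


# Toy conservation pattern (assumed for illustration): every third column
# is a mismatch.
alignment = "110110110110"

# Contiguous seed of weight 4 (BLASTN-style, but much shorter): no hit,
# because there is never a run of four matching columns.
print(seed_hits("1111", alignment))    # prints []

# Spaced seed of the same weight 4 that skips every third column.
print(seed_hits("110110", alignment))  # prints [0, 3, 6]
```

Both seeds require the same number of matching columns, so under this toy pattern the difference in hits comes only from where the required positions are placed.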