Yanni Sun | Colloquium Series | MSU Computer Science and Engineering

CSE Colloquium Series

Designing Filtering Strategies for Faster Protein and RNA Annotation

Yanni Sun

Thursday, April 17, 2008
9:45 a.m.-10:45 a.m.
3105 Engineering

Abstract

With the availability of sequenced genomes for multiple species, an urgent task today is to decipher the biological functions of these sequences. Annotating genomic sequence function helps us understand the genetic background of complex diseases and thus aids drug design. The state-of-the-art method for function annotation is to compare a query sequence against a database of sequences with known functions. However, the high computational cost of comparison algorithms and the sheer amount of genomic data pose a great challenge for genome function analysis. For example, it takes several CPU months to compare a bacterial genome with a database of noncoding RNA sequence families.

In this talk, I will present systematic filter design methods for accelerating protein and noncoding RNA function annotation. A filter excludes the large portion of the database that is unlikely to be related to the query, so the expensive comparisons are conducted only on regions likely to share functional similarity. The computational challenge lies in choosing, from a large design space, filters that achieve an optimal tradeoff between sensitivity and specificity. I will first present our filters based on regular expression patterns and weight matrices for protein annotation. Then, I will focus on designing secondary structure profiles to accelerate noncoding RNA annotation. Our experiments demonstrate that, with our designed filters, a protein sequence annotation program based on profile hidden Markov models achieves a 20- to 35-fold speedup, and a noncoding RNA annotation program based on stochastic context-free grammars achieves an average speedup of more than 100-fold. I will conclude with an overview of my research interests and plans for future work.
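The filtering idea can be illustrated with a minimal Python sketch. It is an illustrative assumption, not the speaker's design method, which selects filters for a good sensitivity/specificity tradeoff. The sketch uses a toy in-memory database and two cheap tests of the kinds the abstract names: a regular-expression pattern and a weight matrix. The pattern `C..C.{2,4}H`, the matrix values, the threshold, the default penalty of -1.0 for unlisted residues, and the names `regex_filter`, `pwm_filter`, and `annotate` are all hypothetical. The built-in `len` stands in for an expensive profile HMM or SCFG comparison.

```python
import re
from typing import Callable, Dict, List

# Hypothetical protein motif written as a Python regular expression
# (C, any two residues, C, two to four residues, H). Illustrative only.
PATTERN = re.compile(r"C..C.{2,4}H")


def regex_filter(seq: str) -> bool:
    """Keep a sequence only if it contains at least one pattern match."""
    return PATTERN.search(seq) is not None


def pwm_filter(seq: str, pwm: List[Dict[str, float]], threshold: float) -> bool:
    """Keep a sequence if any window scores at least `threshold`.

    `pwm[i]` maps residues to scores at column i. Residues missing from a
    column get a hypothetical default penalty of -1.0.
    """
    width = len(pwm)
    for start in range(len(seq) - width + 1):
        score = sum(pwm[i].get(seq[start + i], -1.0) for i in range(width))
        if score >= threshold:
            return True
    return False


def annotate(database: Dict[str, str],
             keep: Callable[[str], bool],
             expensive_compare: Callable[[str], float]) -> Dict[str, float]:
    """Run the expensive comparison only on sequences that pass the filter."""
    return {name: expensive_compare(seq)
            for name, seq in database.items() if keep(seq)}


if __name__ == "__main__":
    db = {"s1": "MKCAACGGHLL", "s2": "MKLLLLLLLLL", "s3": "ACDEFGHIK"}
    # Only s1 matches the pattern, so only s1 reaches the costly step.
    print(annotate(db, regex_filter, len))  # prints {'s1': 11}
```

The filter is deliberately much cheaper than the comparison it guards. A stricter pattern or a higher matrix threshold discards more of the database (higher specificity) but risks dropping true homologs (lower sensitivity), which is the tradeoff the talk's design methods optimize.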