Unusual Motifs in Biological Data

Laxmi Parida

Computational Biology Center

IBM Watson Research Center, Yorktown

Abstract

The research community is inundated with data of biological origin, such as the genome sequences of various organisms, microarray data, and so on.

The volume of this data is increasing rapidly, and the process of understanding the data lags behind the process of acquiring it. The sheer enormity calls for a systematic approach to understanding it using computational methods. As a first step towards making sense of the data, we study patterns in their various guises and hypothesize that they reveal information vital to a greater understanding of biological systems.

The talk will focus on the various kinds of patterns that we identify in data and on the methods we devise for their unsupervised (automatic) discovery. We will discuss two kinds of patterns in particular: (1) permutation motifs and (2) cluster motifs. We will define the problems and present a non-statistical (or model-less) method of pruning the data. We will show the relationship between permutation motifs and a well-known data structure in computer science (PQ trees) and demonstrate its potential on human-rat data. We will also briefly discuss the work on cluster motifs in protein simulation data to extract state-to-state transitions.

Bio

Laxmi Parida received her PhD in computational genomics from the Courant Institute of Mathematical Sciences, New York University, in 1998. She has been with the Computational Biology Center, IBM T. J. Watson Research Center, since 1998, working mainly in the area of bioinformatics and computational biology. Her primary focus has been on formulating and designing efficient algorithms to answer biological questions.