Mining and Searching Graphs in Biological Databases
Dr. Jiawei Han
Database and Information Systems Research Lab
Department of Computer Science
University of Illinois at Urbana-Champaign
Date: Friday, March 17, 2006
Time: 11:00am-12:00pm
Place: 1225 Engineering
Host: P.N. Tan
Abstract:
Recent research on pattern discovery has progressed from mining frequent itemsets and sequences to mining structured patterns, including trees, lattices, and graphs. As a general data structure, a graph can model complicated relations among data, and graphs have wide applications in bioinformatics. However, mining and searching large graphs in graph databases is challenging because the number of frequent subgraphs can be exponential.
In this talk, we present our recent progress on developing efficient and scalable methods for mining and searching graphs in large biological databases. We first introduce gSpan, an efficient method for mining all the frequent graph patterns in a graph database, which extends a depth-first frequent pattern growth method developed in our previous research. Then we introduce CloseGraph, an efficient method for mining closed frequent graph patterns. A graph g is closed in a database if there exists no proper supergraph of g that has the same support as g. After that, we introduce a graph indexing method, gIndex, and an approximate graph searching method, Grafil. Both take advantage of frequent graph mining to construct a compact but highly effective graph index and to perform similarity search with such indexing structures. These methods not only facilitate mining and querying graph patterns in massive biological datasets but also have broad applications in other fields, including database and operating systems and software engineering.
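To make the definition of a closed graph pattern concrete, the following minimal sketch filters a handful of candidate patterns down to the closed ones. It is an illustration only, not gSpan or CloseGraph: the helper names (make_graph, is_subgraph, support, closed_patterns) and the toy atom-labeled graphs are assumptions made here, containment is tested by brute force, and closedness is checked only against the supplied candidates rather than against every possible supergraph.

```python
from itertools import permutations

def make_graph(labels, edges):
    # Hypothetical representation: (vertex -> label dict, set of undirected edges).
    return dict(labels), {frozenset(e) for e in edges}

def is_subgraph(pattern, graph):
    # Brute-force test: is there an injective, label-preserving vertex mapping
    # that sends every pattern edge to an edge of the graph?
    p_labels, p_edges = pattern
    g_labels, g_edges = graph
    p_nodes = list(p_labels)
    for image in permutations(g_labels, len(p_nodes)):
        mapping = dict(zip(p_nodes, image))
        if any(p_labels[v] != g_labels[mapping[v]] for v in p_nodes):
            continue
        if all(frozenset(mapping[v] for v in e) in g_edges for e in p_edges):
            return True
    return False

def support(pattern, database):
    # Support = number of database graphs that contain the pattern.
    return sum(is_subgraph(pattern, g) for g in database)

def is_larger(h, g):
    # h can only be a proper supergraph of g if it has more vertices or edges.
    return len(h[0]) > len(g[0]) or len(h[1]) > len(g[1])

def closed_patterns(candidates, database):
    # Keep a pattern unless some larger candidate contains it with equal support.
    sup = {name: support(p, database) for name, p in candidates.items()}
    closed = []
    for name, g in candidates.items():
        absorbed = any(
            other != name
            and sup[other] == sup[name]
            and is_larger(h, g)
            and is_subgraph(g, h)
            for other, h in candidates.items()
        )
        if not absorbed:
            closed.append(name)
    return closed

# Toy database of three small labeled graphs (assumed data).
database = [
    make_graph({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)]),
    make_graph({0: "C", 1: "C", 2: "O", 3: "N"}, [(0, 1), (1, 2), (2, 3)]),
    make_graph({0: "C", 1: "O"}, [(0, 1)]),
]

candidates = {
    "C-O": make_graph({0: "C", 1: "O"}, [(0, 1)]),
    "C-C": make_graph({0: "C", 1: "C"}, [(0, 1)]),
    "C-C-O": make_graph({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)]),
}

closed = closed_patterns(candidates, database)
print(closed)
```

In this toy database the pattern C-C occurs in two graphs, and so does its supergraph C-C-O, so C-C is not closed. C-O occurs in all three graphs, more often than any candidate supergraph, so it remains closed alongside C-C-O.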
Biography:
Jiawei Han is a Professor in the Department of Computer Science at the University of Illinois at Urbana-Champaign. His research covers data mining, data warehousing, database systems, data streams, spatial databases, and biological databases, with over 300 journal and conference publications. He has chaired or served on the program committees of major conferences and workshops in data mining and database systems. Besides serving on the editorial boards of several journals, he is the founding Editor-in-Chief of ACM Transactions on Knowledge Discovery from Data. He is an ACM Fellow and has received the ACM SIGKDD Innovations Award (2004) and the IEEE CS Technical Achievement Award (2005).