Abstract:
Ab initio protein folding is one of the most challenging problems in
computational biology. Recently, protein contact prediction and contact-assisted
folding that exploit residue co-variation in sequences have made some progress,
but these methods are not effective on proteins without a large number (>1000) of
sequence homologs. This talk will present a deep learning method that predicts
contacts by integrating both residue co-variation and conservation information
through an ultra-deep neural network formed by two deep residual networks. This
deep network can learn very complex sequence-contact relationships as well as
long-range contact correlations from the very large protein sequence database
and the relatively small structure database, and thus yield much more accurate
contact prediction and, accordingly, contact-assisted folding for proteins
without many sequence homologs.
Tested on three datasets totaling 579 proteins, our method achieves a top L
long-range prediction accuracy (L is the sequence length of a protein) of 0.47,
much better than two representative methods, CCMpred and MetaPSICOV, which
achieve only 0.21 and 0.30, respectively. In terms of the top L/10 long-range
accuracy, our method achieves 0.77, while CCMpred and MetaPSICOV achieve 0.47
and 0.59, respectively. Ab initio folding using our predicted contacts as
restraints can yield correct folds for 203 test proteins, while folding using
MetaPSICOV- and CCMpred-predicted contacts can do so for only 79 and 62
proteins, respectively.
Our contact-assisted folding also outperforms homology modeling. In particular,
we can (ab initio) fold 208 of the 398 membrane proteins, while homology
modeling can only do so for 10 of them. One interesting finding is that even
though we do not train our deep learning models on any membrane proteins, they
work equally well on membrane proteins. Finally, in three weeks of blind testing
with the live benchmark CAMEO, our fully-automated contact prediction web
server predicted correct folds for three hard targets with a new fold: a
mainly-beta protein of 182 residues with only 250 sequence homologs, an
alpha+beta protein of 125 residues with only 180 sequence homologs, and an
alpha protein of 140 residues with 330 sequence homologs.
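
As an illustration of the top L and top L/10 long-range accuracy reported above, the following is a minimal Python sketch, not the speaker's evaluation code. It assumes the common convention that a residue pair is "long-range" when its sequence separation is at least 24; that threshold, the function name, and the toy matrices are assumptions introduced here for illustration.

```python
def top_k_long_range_accuracy(pred, native, divisor=1, min_sep=24):
    """Fraction of true contacts among the top L/divisor long-range predictions.

    pred:    L x L nested list of predicted contact probabilities.
    native:  L x L nested list of 0/1 true contacts.
    divisor: 1 for "top L", 10 for "top L/10".
    min_sep: minimum sequence separation for a long-range pair
             (24 is an assumed convention, not stated in the abstract).
    """
    L = len(pred)
    # Collect (probability, is_contact) for every pair i < j that is long-range.
    pairs = [(pred[i][j], native[i][j])
             for i in range(L) for j in range(i + min_sep, L)]
    if not pairs:
        return 0.0
    k = max(1, L // divisor)
    # Keep the k most confident predictions and count how many are true contacts.
    top = sorted(pairs, key=lambda p: p[0], reverse=True)[:k]
    return sum(is_contact for _, is_contact in top) / len(top)


# Toy example: a 6-residue "protein" with min_sep lowered to 3 so that
# some pairs qualify as long-range.
L = 6
pred = [[0.1] * L for _ in range(L)]
native = [[0] * L for _ in range(L)]
pred[0][5], pred[1][4], pred[0][3] = 0.9, 0.8, 0.7
native[0][5] = native[1][4] = 1

print(top_k_long_range_accuracy(pred, native, divisor=2, min_sep=3))
```

In the toy example the three most confident long-range pairs contain two true contacts. When a protein has fewer than L/divisor long-range pairs, this sketch divides by the number of pairs actually selected; other evaluations may divide by L/divisor instead.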
The contact prediction server implementing our method is available at
http://raptorx.uchicago.edu/ContactMap/.
See
http://biorxiv.org/content/early/2016/09/16/073239 for technical details and
results.
Biography:
Dr. Jinbo Xu is an associate professor at the Toyota Technological Institute at
Chicago, a computer science research and educational institute located at the
University of Chicago, and a Senior Fellow at the Computation Institute of the
University of Chicago. Dr. Xu's research lies in machine learning,
optimization and computational biology (especially protein bioinformatics and
biological network analysis). He has developed several popular bioinformatics
programs such as the CASP-winning RaptorX (http://raptorx.uchicago.edu) for
protein structure prediction and IsoRank for comparative analysis of protein
interaction networks. Dr. Xu is the recipient of an Alfred P. Sloan Research
Fellowship and an NSF CAREER Award.
Host:
Dr. Hu Ding