MSU CSE Colloquium Series 2016-2017: Dr. Jinbo Xu, "Folding Proteins by Big Data and Deep Learning"

Jinbo Xu
Associate Professor
Toyota Technological Institute at Chicago

Time: Friday, November 18, 2016, 11:00am
Location: EB 3105


Abstract:
Ab initio protein folding is one of the most challenging problems in computational biology. Recently, protein contact prediction and contact-assisted folding that exploit residue co-variation in sequences have made some progress, but this approach is not effective on proteins without a large number (>1000) of sequence homologs. This talk will present a deep learning method that predicts contacts by integrating both residue co-variation and conservation information through an ultra-deep neural network formed by two deep residual networks. This deep network can learn very complex sequence-contact relationships as well as long-range contact correlations from the very large protein sequence database and the relatively small structure database, and thus yields much more accurate contact prediction and, accordingly, better contact-assisted folding for proteins without many sequence homologs.

Tested on three datasets of 579 proteins, the top L long-range prediction accuracy of our method (where L is the sequence length of a protein) is 0.47, much better than that of two representative methods, CCMpred and MetaPSICOV, which have accuracies of only 0.21 and 0.30, respectively. In terms of the top L/10 long-range accuracy, our method achieves 0.77, while CCMpred and MetaPSICOV achieve 0.47 and 0.59, respectively. Ab initio folding using our predicted contacts as restraints can yield correct folds for 203 test proteins, while folding using MetaPSICOV- and CCMpred-predicted contacts can do so for only 79 and 62 proteins, respectively. Our contact-assisted folding also outperforms homology modeling. In particular, we can (ab initio) fold 208 of the 398 membrane proteins, while homology modeling can do so for only 10 of them. One interesting finding is that even though we do not train our deep learning models on any membrane proteins, they work equally well on membrane proteins. Finally, in three weeks of blind testing with the live benchmark CAMEO, our fully automated contact prediction web server predicted correct folds for three hard targets with a new fold: a mainly-beta protein of 182 residues with only 250 sequence homologs, an alpha+beta protein of 125 residues with only 180 sequence homologs, and an alpha protein of 140 residues with 330 sequence homologs.
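For readers unfamiliar with the "top L" and "top L/10" long-range accuracy figures quoted above, the following is a minimal sketch of how such a metric is commonly computed. It is an illustration, not the speaker's evaluation code. The function name `top_long_range_accuracy`, the `divisor` parameter, and the default sequence-separation threshold of 24 residues for "long-range" are assumptions made here; the abstract does not state the threshold or the contact definition used.

```python
def top_long_range_accuracy(scores, native, divisor=1, min_sep=24):
    """Fraction of the top L/divisor predicted long-range pairs that are native contacts.

    scores: L x L nested list of predicted contact scores (assumed symmetric).
    native: L x L nested list of booleans, True where a pair is a native contact.
    min_sep: minimum sequence separation for "long-range" (an assumed convention).
    """
    L = len(scores)
    # Candidate residue pairs (i, j) with i < j and separation of at least min_sep.
    pairs = [(scores[i][j], i, j)
             for i in range(L) for j in range(i + min_sep, L)]
    if not pairs:
        raise ValueError("protein too short for any long-range pair")
    k = max(1, L // divisor)
    # Keep the k highest-scoring pairs (fewer if not enough candidates exist).
    top = sorted(pairs, key=lambda t: t[0], reverse=True)[:k]
    hits = sum(1 for _, i, j in top if native[i][j])
    return hits / len(top)


# Toy example with made-up data: L = 30, so top L/10 means the top 3 pairs.
L = 30
scores = [[0.0] * L for _ in range(L)]
native = [[False] * L for _ in range(L)]
for rank, (i, j) in enumerate([(0, 29), (1, 28), (2, 27)]):
    scores[i][j] = scores[j][i] = 0.9 - 0.1 * rank
for i, j in [(0, 29), (1, 28)]:
    native[i][j] = native[j][i] = True

acc = top_long_range_accuracy(scores, native, divisor=10)
print(round(acc, 3))
```

In this toy case two of the three highest-scoring long-range pairs are native contacts, so the top L/10 accuracy is 2/3. A reported value such as 0.77 would be this quantity averaged over the test proteins.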

The contact prediction server implementing our method is available at http://raptorx.uchicago.edu/ContactMap/.
See http://biorxiv.org/content/early/2016/09/16/073239 for technical details and results.

Biography:
Dr. Jinbo Xu is an associate professor at the Toyota Technological Institute at Chicago, a computer science research and educational institute located at the University of Chicago, and a Senior Fellow at the Computational Institute of the University of Chicago. Dr. Xu's research lies in machine learning, optimization, and computational biology (especially protein bioinformatics and biological network analysis). He has developed several popular bioinformatics programs, such as the CASP-winning RaptorX (http://raptorx.uchicago.edu) for protein structure prediction and IsoRank for comparative analysis of protein interaction networks. Dr. Xu is the recipient of an Alfred P. Sloan Research Fellowship and an NSF CAREER Award.

Host:
Dr. Hu Ding