Graphs everywhere: Novel methods for summarization and natural language processing

University of Michigan

Date: Tuesday, November 1, 2005

Abstract: Graph-based representations have proven to be very helpful tools for natural language processing and machine learning. Some recent successes include Brin and Page's PageRank method for ranking Web pages and Zhu, Ghahramani, and Lafferty's semi-supervised learning methods using harmonic functions. In this talk, I will present a framework (and two demos) for natural language processing using random walks on graphs. The first part introduces the concept of lexical centrality, based on random walks on lexical similarity graphs. Lexical centrality is used to find the most important passages in a collection of textual documents. The second part will discuss some work in progress on semi-supervised learning with binary features, with applications to natural language problems such as parsing. In both cases I will show state-of-the-art results on competitive challenges. I will also show two publicly available demos that illustrate the concepts of the talk.
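For readers unfamiliar with the idea, the following is a minimal sketch of how centrality from a random walk on a lexical similarity graph might be computed. It is an illustration only and not necessarily the speaker's method: the bag-of-words cosine similarity, the threshold of 0.3, the damping factor of 0.85, the function names, and the example sentences are all assumptions chosen for brevity.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lexical_centrality(sentences, threshold=0.3, damping=0.85, iters=100):
    """Score sentences by a damped random walk on a similarity graph.

    threshold and damping are illustrative assumptions, not values
    taken from the talk.
    """
    bags = [Counter(s.lower().split()) for s in sentences]
    n = len(bags)
    # Connect two distinct sentences when their similarity exceeds the threshold.
    adj = [[1.0 if i != j and cosine(bags[i], bags[j]) > threshold else 0.0
            for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in adj]
    scores = [1.0 / n] * n
    for _ in range(iters):
        updated = []
        for i in range(n):
            inflow = 0.0
            for j in range(n):
                if degree[j]:
                    # The walker at j moves to a random neighbor.
                    inflow += adj[j][i] / degree[j] * scores[j]
                else:
                    # An isolated sentence sends the walker anywhere, uniformly.
                    inflow += scores[j] / n
            updated.append((1.0 - damping) / n + damping * inflow)
        scores = updated
    return scores


sentences = [
    "graphs help rank sentences",
    "random walks rank sentences on graphs",
    "graphs and random walks",
    "the weather is nice today",
]
scores = lexical_centrality(sentences)
best = max(range(len(sentences)), key=lambda i: scores[i])
print(sentences[best])
```

In this toy collection the second sentence shares vocabulary with two others, so the walk visits it most often, while the unrelated sentence about the weather receives the lowest score.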

Biography: Dragomir R. Radev is an Associate Professor of Information, Electrical Engineering and Computer Science, and Linguistics at the University of Michigan, Ann Arbor. He holds a Ph.D. in Computer Science from Columbia University. Before joining Michigan, he was a Research Staff Member at IBM's TJ Watson Research Center in Hawthorne, NY. He is the author of more than 45 papers on text summarization, question answering, machine translation, text generation, information extraction, and information retrieval. His research group at Michigan, CLAIR (Computational Linguistics And Information Retrieval), includes 6 PhD students, 4 MS students, and a varying number of undergraduate students.

Dr. Radev's current research on probabilistic and link-based methods for exploiting very large textual repositories, graph-based methods for natural language processing, representing and acquiring knowledge of genome regulation, and semantic entity and relation extraction from Web-scale text document collections is supported by NSF and NIH. He serves on the HLT-NAACL advisory committee, was recently reelected as treasurer of NAACL, is a member of the editorial boards of JAIR and Information Retrieval, and is a four-time finalist at the ACM programming finals (as contestant in 1993 and as coach in 1995-1997).

Dragomir received a graduate teaching award at Columbia and, more recently, the U. of Michigan award for Outstanding Research Mentorship (UROP). He has worked in different capacities for AT&T, IBM, Bellcore, MITRE, Microsoft, and Yahoo! He likes to spend time with his wife Axinia and kids Laura (8) and Victoria (2). He also enjoys reading books and newspapers, watching movies, and walking around big cities. Additional information, including Dragomir's top movie list, is available at http://tangra.si.umich.edu/clair.