Dr. Nigel Ward
Professor
Department of Computer Science and Engineering
University of Texas at El Paso
Tuesday, May 2, 2017
11 AM - 12 PM
EB 3405 (Dean's Conference Room)
Abstract:
Speech conveys many things beyond content, including attitude, sentiment, and related stances. These may be useful for filtering and information retrieval, for example by a mission planner organizing a relief effort after an earthquake who needs to extract useful information from high-volume streams of spoken-language input. We identified 14 aspects of stance that are common in radio news stories, including indications of subjectivity, surprisingness, local relevance, and urgency. Finding that speakers convey stances prosodically, by means of specific configurations of pitch, energy, timing, and other features, we model stance first at the level of 6-second patches. For each patch, we estimate the stances present using a k-nearest-neighbors algorithm over 90 prosodic features. The patch-level estimates are then aggregated to estimate the overall stance of each news story. Testing on 3+ hours each of English, Mandarin, and Turkish news, we obtained good performance on most aspects, extracting on average 46% to 60% of the stance information present. We are currently seeking ways to further improve performance, including cross-language performance.
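For a concrete picture of the pipeline described above, the following is a minimal sketch, not the speaker's actual implementation: it assumes 90 prosodic features per 6-second patch and 14 stance dimensions, uses scikit-learn's KNeighborsRegressor for the k-nearest-neighbors step, and stands in for real feature extraction and annotation with random data.

```python
# Minimal sketch of patch-level stance estimation followed by
# story-level aggregation (illustrative only; feature extraction
# and annotated training data are stubbed with random values).
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

N_FEATURES = 90   # prosodic features per 6-second patch (pitch, energy, timing, ...)
N_STANCES = 14    # stance aspects (subjectivity, surprisingness, urgency, ...)

rng = np.random.default_rng(0)

# Hypothetical training set: one row per patch, with human-annotated
# stance strengths as multi-output regression targets.
train_patches = rng.normal(size=(500, N_FEATURES))
train_stances = rng.uniform(size=(500, N_STANCES))

# k-nearest-neighbors regressor over the prosodic feature space.
knn = KNeighborsRegressor(n_neighbors=7)
knn.fit(train_patches, train_stances)

def estimate_story_stance(story_patches: np.ndarray) -> np.ndarray:
    """Estimate stances for each patch, then aggregate (here, by simple
    averaging) into one 14-dimensional estimate for the whole story."""
    patch_estimates = knn.predict(story_patches)   # shape: (n_patches, 14)
    return patch_estimates.mean(axis=0)

# Example: a news story segmented into ten 6-second patches.
story = rng.normal(size=(10, N_FEATURES))
print(estimate_story_stance(story))
```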
This work was done in collaboration with Jason Carlson and Olac Fuentes, and was supported by the DARPA LORELEI program.
Biography:
Nigel G. Ward received his Ph.D. in Computer Science from the University of California at Berkeley in 1991. He was on the engineering faculty of the University of Tokyo for ten years before joining the University of Texas at El Paso in 2002. He recently organized a panel on Prosodic Constructions in Dialog at IPrA 2015, gave a tutorial on Dialog Models and Dialog Phenomena at Interspeech 2015, and was a Fulbright scholar at Kyoto University in 2015-2016. Ward's research lies at the intersection of spoken language and human-computer interaction. Current topics include modeling the subtle non-lexical and prosodic signals that enable inference of a speaker's dialog needs, intentions, and feelings at the sub-second level. Using a variety of methods, he has applied these models to problems in speech recognition, information retrieval, dialog systems, language teaching, videoconferencing, and speech synthesis. His open-source Mid-level Prosodic Feature Toolkit is in use in academia, government, and industry.
Host:
Dr. Joyce Chai