Declarative Analytics: Applications and Tools
Dr. Shivakumar Vaithyanathan
IBM Chief Scientist for Big Data Analytics
Time: Friday, April 12, 2013, 11:00 a.m. to noon.
Location: 3105 Engineering Building
Abstract:
Modern enterprises are performing complex analyses on
increasingly large data sets to drive business decisions. Tasks such as root
cause analysis from system logs and social media analytics for lead generation,
customer retention and digital marketing are rapidly gaining importance. These
applications consist of three major analytic phases: text analytics,
semi-structured data processing (joins, group-by, aggregation), and
statistical/predictive modeling. The size of the data sets in conjunction with
the complexity of the analysis necessitates large-scale distributed processing
of the analytical algorithms. At IBM we are building tools and technologies to
support each of these analytic phases and in particular we are building
declarative languages for these phases. While the declarative nature of these
languages removes the need for manual optimization by the programmer, their
syntax is designed to appeal to the corresponding communities. As an
example, for statistical modeling we expose a high-level language with syntax
similar to R, a very popular statistical processing language.
In this talk I will give an overview of some real-world big data applications
we are currently working on and use them to motivate the need for the three
major phases
discussed above. I will then describe, in some detail, declarative systems for
text analytics and statistical modeling along with a discussion on speeds,
feeds and comparisons.
Speaker Bio:
Shivakumar Vaithyanathan is the IBM Chief Scientist for Big Data Analytics and the
Department Manager of the Large Scale Analytics and Discovery Group at the IBM
Almaden Research Center. Since joining IBM in 1998, he has been involved in
multiple research areas. His department is currently involved in building
systems for scalable text analytics, enterprise search and large-scale machine
learning. Multiple technologies developed in his department currently ship with
several IBM products, including IBM’s Big Data products. Prior to IBM,
Shivakumar was part of the newly formed AltaVista Group at Digital. Shivakumar
has co-authored more than 40 publications and was an invited keynote speaker at
the 2011 German Database Conference and the 2011 ACM SIGIR Industrial Track.
Host: Dr. Anil Jain