Bayesian Logistic Regression for Text Classification and Mining
David D. Lewis Consulting
Date:
Time:
Place: 315 Ernst Bessey Hall
Host: R. Jin
Abstract: An advantage of logistic regression over other discriminative learners
is its explicit probabilistic foundation, which allows task knowledge to be
incorporated through priors on parameters and model structure. I will discuss
our use, in content-based text categorization and author identification, of
1) priors that lead to dense vs. sparse models or to positive vs. mixed-sign
models (see the first sketch below), 2) priors that incorporate domain knowledge
from reference books and other texts, and 3) polytomous (1-of-k) dependent
variables (see the second sketch below). Time permitting, I will also discuss
software engineering and numerical optimization issues in our open-source
Bayesian logistic regression programs, BBR and BMR, compare them with other
logistic regression codes, and suggest some directions for future improvements.
(This is joint work with David Madigan, Alex Genkin, Aynur Dayanik, and
Dmitriy Fradkin.)
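For point 1), a Gaussian prior on the regression weights corresponds, at the MAP
estimate, to L2 (ridge) penalization and yields dense models, while a Laplace
prior corresponds to L1 (lasso) penalization and drives many weights to exactly
zero. The following is a minimal sketch of that contrast, assuming
scikit-learn's LogisticRegression as a stand-in for BBR and arbitrary toy data:

    # Sketch: Laplace vs. Gaussian priors via their MAP-equivalent
    # L1 / L2 penalties; scikit-learn stands in for BBR here.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 50))                 # toy document-term features
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # labels from 2 of 50 features

    # Laplace prior <-> L1 penalty: most weights become exactly zero (sparse).
    laplace = LogisticRegression(penalty="l1", solver="liblinear").fit(X, y)
    # Gaussian prior <-> L2 penalty: weights shrink but stay nonzero (dense).
    gaussian = LogisticRegression(penalty="l2", solver="liblinear").fit(X, y)

    print("nonzero weights under Laplace prior: ", np.count_nonzero(laplace.coef_))
    print("nonzero weights under Gaussian prior:", np.count_nonzero(gaussian.coef_))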
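For point 3), the polytomous (1-of-k) model replaces the binary (Bernoulli)
likelihood with a multinomial one, fitting one weight vector per class under a
softmax link, as BMR does for k classes. A minimal sketch, again assuming
scikit-learn as a stand-in:

    # Sketch: polytomous (1-of-k) logistic regression, i.e. a single
    # multinomial (softmax) model rather than k one-vs-rest classifiers.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    X = rng.normal(size=(300, 20))
    y = np.argmax(X[:, :3], axis=1)   # toy 1-of-3 labels from first 3 features

    clf = LogisticRegression(solver="lbfgs", max_iter=1000).fit(X, y)
    print(clf.coef_.shape)            # (3, 20): one weight row per class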
Biography: Dave Lewis is a consulting computer scientist (www.DavidDLewis.com).