Ravi Kannan: Topic Modeling and NMF using Soft Clustering

Tuesday, September 22, 2015 - 4:15pm to 5:15pm
Refreshments: 
Light Refreshments at 4pm
Location: 
Patil/Kiva G449
Speaker: 
Ravi Kannan
Abstract:  The model fitting problem in Topic Modeling is a special case
of Non-Negative Matrix Factorization and both are computationally
hard. Soft clustering, where, each datapoint can belong fractionally
to several clusters is a useful tool for both. We make two main assumptions
on the data - that each datapoint belongs dominantly to one cluster and
each cluster has some dominant features and prove that an algorithm
with 3 natural steps - Thresholding, SVD and k-means - does the soft-clustering
in polynomial time. We use that to solve Topic Modeling and NMF with provable
error guarantees which are better than known algorithms.
We demonstrate good empirical performance of the algorithm
as well as reasonableness of the assumptions.
 
Joint work with: T. Bansal, C. Bhattacharyya, N. Goyal, J. Pani