Tuesday, September 22, 2015 - 4:15pm to 5:15pm

Refreshments:

Light Refreshments at 4pm

Location:

Patil/Kiva G449

Speaker:

Ravi Kannan

Seminar group:

Abstract: The model fitting problem in Topic Modeling is a special case

of Non-Negative Matrix Factorization and both are computationally

hard. Soft clustering, where, each datapoint can belong fractionally

to several clusters is a useful tool for both. We make two main assumptions

on the data - that each datapoint belongs dominantly to one cluster and

each cluster has some dominant features and prove that an algorithm

with 3 natural steps - Thresholding, SVD and k-means - does the soft-clustering

in polynomial time. We use that to solve Topic Modeling and NMF with provable

error guarantees which are better than known algorithms.

We demonstrate good empirical performance of the algorithm

as well as reasonableness of the assumptions.

of Non-Negative Matrix Factorization and both are computationally

hard. Soft clustering, where, each datapoint can belong fractionally

to several clusters is a useful tool for both. We make two main assumptions

on the data - that each datapoint belongs dominantly to one cluster and

each cluster has some dominant features and prove that an algorithm

with 3 natural steps - Thresholding, SVD and k-means - does the soft-clustering

in polynomial time. We use that to solve Topic Modeling and NMF with provable

error guarantees which are better than known algorithms.

We demonstrate good empirical performance of the algorithm

as well as reasonableness of the assumptions.

Joint work with: T. Bansal, C. Bhattacharyya, N. Goyal, J. Pani