Mrinal Kanti Das, Chiranjib Bhattacharyya,
Trapit Bansal
CSA, Indian Institute of Science, Bangalore, India
This paper introduces a novel stick-breaking process namely ordered stick-breaking process (OSBP), where the atoms appear in order. The choice of weights on atoms of OSBP ensure two important things; (1)that probability of adding new atoms exponentially decrease with time and (2)OSBP, though non-exchangeable, admits pre- dictive probability functions (PPFs). We apply OSBP to Bayesian nonparametric (BNP) models and find that in a sequential setting where data is arriving in mini-batches OSBP forms a natu- ral prior over mini-batches, facilitating exchange of relevant statistical information across mini- batches by sharing the atoms of OSBP. One of the major contributions of this paper is SUMO, an MCMC algorithm, for solving the inference problem arising from applying OSBP to BNP models. SUMO uses the PPFs of OSBP to ob- tain a Gibbs-sampling based truncation-free al- gorithm which applies generally to BNP mod- els. For large scale inference problems exist- ing algorithms such as Particle Filtering(PF) are not practical and variational procedures such as TSVI(Wang & Blei, 2012) are the only alterna- tive. SUMO is thus an important addition to MCMC family which works well empirically. For Dirichlet process mixture model (DPMM), SUMO outperforms TSVI (Wang & Blei, 2012) on perplexity by 33% on 3 datasets with million data points, which are beyond the scope of PF, using only 3GB RAM.
The supplementary material contains necessary mathematical background, proofs of theorems, examples and some illustration of concepts.
Download The Supplementary Material
All the datasets are publicly available. Please see the paper for details.
Will be added soon.