Ordered Stick-Breaking Prior for Sequential MCMC Inference of Bayesian Nonparametric Models

Mrinal Kanti Das, Chiranjib Bhattacharyya, Trapit Bansal
CSA, Indian Institute of Science, Bangalore, India

Download Paper

Abstract

This paper introduces a novel stick-breaking process namely ordered stick-breaking process (OSBP), where the atoms appear in order. The choice of weights on atoms of OSBP ensure two important things; (1)that probability of adding new atoms exponentially decrease with time and (2)OSBP, though non-exchangeable, admits pre- dictive probability functions (PPFs). We apply OSBP to Bayesian nonparametric (BNP) models and find that in a sequential setting where data is arriving in mini-batches OSBP forms a natu- ral prior over mini-batches, facilitating exchange of relevant statistical information across mini- batches by sharing the atoms of OSBP. One of the major contributions of this paper is SUMO, an MCMC algorithm, for solving the inference problem arising from applying OSBP to BNP models. SUMO uses the PPFs of OSBP to ob- tain a Gibbs-sampling based truncation-free al- gorithm which applies generally to BNP mod- els. For large scale inference problems exist- ing algorithms such as Particle Filtering(PF) are not practical and variational procedures such as TSVI(Wang & Blei, 2012) are the only alterna- tive. SUMO is thus an important addition to MCMC family which works well empirically. For Dirichlet process mixture model (DPMM), SUMO outperforms TSVI (Wang & Blei, 2012) on perplexity by 33% on 3 datasets with million data points, which are beyond the scope of PF, using only 3GB RAM.

Supplementary material

The supplementary material contains necessary mathematical background, proofs of theorems, examples and some illustration of concepts.

Download The Supplementary Material

Datasets

All the datasets are publicly available. Please see the paper for details.

Code

Will be added soon.