Projects (as the primary researcher)
Efficient differential privacy for drug sensitivity prediction
Widespread digitization and automation put the privacy of ordinary people at risk. Blending differential privacy with Bayesian models is an elegant and promising answer to this threat. Although differential privacy is about a decade old, hardly any algorithm is efficient in practice. We make a major breakthrough here and demonstrate its efficacy for the case of Bayesian linear regression.
Ordered Stick-Breaking Prior for Sequential MCMC Inference of Bayesian Nonparametric Models
Big data is commonplace nowadays and has drawn a great deal of research interest because of its inherent challenges. We take a different look at the problem by splitting big data into small chunks (minibatches) and processing them sequentially. This allows small-scale systems to process large amounts of information. The technical contribution is a Bayesian nonparametric process named OSBP, which leads to a scalable MCMC inference scheme named SUMO.
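For readers unfamiliar with stick-breaking priors, the standard (unordered) construction can be sketched as below. This is background only, not OSBP or SUMO; the function name, the truncation level and the concentration value are assumptions chosen for illustration.

```python
import random

def stick_breaking_weights(alpha, num_sticks, rng):
    """Draw truncated stick-breaking weights with concentration alpha."""
    remaining = 1.0  # length of the stick not yet assigned to any component
    weights = []
    for _ in range(num_sticks):
        # Break off a Beta(1, alpha) share of whatever is left of the stick.
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return weights, remaining

rng = random.Random(0)  # fixed seed so the draw is reproducible
weights, leftover = stick_breaking_weights(alpha=2.0, num_sticks=10, rng=rng)
print(weights, leftover)
```

The weights and the leftover mass always sum to one, so the truncation level controls how much probability is left for components not yet instantiated.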
Specific correspondence topic models (SCTM)
The correspondence between a news article and a comment can be specific in nature, i.e. the comment may relate only to a very small part of the article, which may not be contiguous. Similar relationships can be found in paper-bibliography pairs, image-tag pairs, etc. We call such relationships specific correspondence and propose SCTM to model them.
Classification of text documents without any labelled data
Generating labelled training data is expensive and sometimes nearly impossible, given the current explosion in text information, whether blogs, comments on news, software code or websites. However, classification is a basic step in many situations where the user is expected to have an idea of the categories she wants the documents classified into. We propose that the user provide a few descriptive words for each category; this can lead to excellent classification accuracy, very close to that of supervised methods such as SVM that use labelled training data.
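To make the input format concrete, here is a deliberately naive baseline that labels a document by counting the descriptive words of each category. It is only a stand-in to show the idea of classifying from a few words per category, not the proposed method; the function name, categories and word lists are hypothetical.

```python
import re
from collections import Counter

def classify_by_descriptive_words(document, category_words):
    """Return the category whose descriptive words occur most often, or None."""
    tokens = Counter(re.findall(r"[a-z]+", document.lower()))
    # Score each category by the total count of its descriptive words.
    scores = {
        category: sum(tokens[word] for word in words)
        for category, words in category_words.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

# Hypothetical categories, each described by a few words and no labelled documents.
category_words = {
    "sports": ["football", "match", "team", "score"],
    "technology": ["software", "code", "website", "computer"],
}
label = classify_by_descriptive_words("The team won the football match.", category_words)
print(label)  # sports
```

A document containing none of the descriptive words gets no label, which is the main weakness of plain word counting and one reason a proper model is needed.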
Multilingual hierarchical topic models
A hierarchy of topics is a useful representation of any corpus, where topics near the root are general and topics farther from the root are more specific. For example, sports will sit at a higher level of the tree than football. The nested Chinese restaurant process (nCRP) is a well-known model for such hierarchies in the monolingual scenario. I am working on extending nCRP to learn the hierarchy in the multilingual scenario, where each node in language 1 has a corresponding node in language 2.
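As background, the way a standard monolingual nCRP assigns each document a root-to-leaf path can be sketched as follows. This shows only the prior over paths, with no topics, no inference and no multilingual extension; the class name, function name, depth and gamma value are assumptions for illustration.

```python
import random

class Node:
    def __init__(self):
        self.count = 0       # number of documents whose path passes through this node
        self.children = []   # child nodes, in order of creation

def sample_path(root, depth, gamma, rng):
    """Sample a path of `depth` nodes from the root under a nested CRP."""
    root.count += 1
    path = [root]
    node = root
    for _ in range(depth - 1):
        # An existing child is chosen with probability count / (n + gamma) and
        # a new child with probability gamma / (n + gamma), where n is the
        # number of earlier documents that reached the current node.
        total = sum(child.count for child in node.children) + gamma
        draw = rng.random() * total
        chosen = None
        for child in node.children:
            draw -= child.count
            if draw < 0:
                chosen = child
                break
        if chosen is None:   # the draw fell in the gamma mass: open a new branch
            chosen = Node()
            node.children.append(chosen)
        chosen.count += 1
        path.append(chosen)
        node = chosen
    return path

rng = random.Random(0)  # fixed seed so the tree is reproducible
root = Node()
paths = [sample_path(root, depth=3, gamma=1.0, rng=rng) for _ in range(20)]
print(root.count, len(root.children))
```

Larger values of gamma open new branches more often, giving a wider tree; popular branches attract more documents, which is what lets general topics sit near the root.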
