Mrinal Kanti Das

Projects (as the primary researcher)

  • Efficient differential privacy for drug sensitivity prediction
    Due to widespread digitization and automation, the privacy of ordinary people is at risk. Blending differential privacy with Bayesian models is an elegant and promising answer to this threat. Although differential privacy is about a decade old, hardly any algorithm is practically efficient. We make a major breakthrough here: taking Bayesian linear regression as the case study, we demonstrate the efficacy of our approach.

  • Ordered Stick-Breaking Prior for Sequential MCMC Inference of Bayesian Nonparametric Models
    Big data is now commonplace and has drawn a great deal of research interest due to its inherent challenges. We take a different approach to the problem by splitting big data into small chunks (minibatches) and processing them sequentially. This allows small-scale systems to process large amounts of information. The technical contribution is a Bayesian nonparametric process named OSBP, which leads to a scalable MCMC inference scheme named SUMO.

  • Subtle topic models (STM)
    Subtle topics are prominent neither in the corpus nor in any single document. Such topics are hard to detect because of their subtle presence, which motivates the name. However, a subtle topic, despite being rare, may carry significant information. We propose STM to discover such topics.

  • Specific correspondence topic models (SCTM)
    The correspondence between a news article and a comment can be specific in nature, i.e., the comment may relate only to a very small part of the article, which may not be contiguous. Similar relationships can be found in paper-bibliography pairs, image-tag pairs, etc. We call such relationships specific correspondence and propose SCTM to model them.

  • Context sensitive topic models (CSTM)
    Software concerns are the latent intents of the programmer in developing the code. It has been observed that, given the textual content of a piece of software, the concerns can be inferred automatically using topic models, provided the code is written with meaningful identifiers. We define the context of a statement as the statements surrounding it, and propose to use this context to find the concern of a statement, leading to CSTM.

  • Classification of text documents without any labelled data
    It is expensive and sometimes nearly impossible to generate labelled training data given the present explosion of text information, whether blogs, comments on news, software code, or websites. However, classification is a basic step in many situations where the user is expected to have an idea of the categories into which she wants the documents classified. We propose that the user provide a few descriptive words for each category; this can lead to excellent classification accuracy, very close to that of supervised methods such as SVM, which use labelled training data.

  • Multi-lingual hierarchical topic models
    A hierarchy of topics is a useful representation of any corpus, where topics near the root are general and topics farther from the root are more specific. For example, sports will sit at a higher level of the tree than football. The nested Chinese restaurant process (nCRP) is well known for modelling such hierarchies in the mono-lingual scenario. I am working on extending nCRP to learn the hierarchy in the multi-lingual scenario, where each node in language 1 will have a corresponding node in language 2.
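
As a rough illustration of the nCRP prior mentioned in the last project, the sketch below draws root-to-leaf paths for documents in the standard mono-lingual setting only; it is not the multi-lingual extension. At each level a document follows an existing child with probability proportional to the number of earlier documents that chose it, or opens a new child with probability proportional to a concentration parameter. The class name `NCRPTree`, the parameter names `gamma` and `depth`, and the tuple encoding of nodes are assumptions made for this example.

```python
import random
from collections import defaultdict


class NCRPTree:
    """Minimal sketch of the standard nested Chinese restaurant process prior."""

    def __init__(self, gamma=1.0, depth=3, seed=0):
        self.gamma = gamma      # concentration: larger values open new branches more often
        self.depth = depth      # fixed number of levels, root included
        self.rng = random.Random(seed)
        # A node is a tuple of child indices from the root; the root is ().
        self.counts = defaultdict(int)     # documents whose path passes through each node
        self.children = defaultdict(list)  # existing children of each node

    def sample_path(self):
        """Draw one root-to-leaf path and update the counts along it."""
        node = ()
        path = [node]
        self.counts[node] += 1
        for _ in range(self.depth - 1):
            kids = self.children[node]
            # Existing children are weighted by popularity, a new child by gamma.
            weights = [self.counts[k] for k in kids] + [self.gamma]
            idx = self.rng.choices(range(len(weights)), weights=weights)[0]
            if idx == len(kids):
                child = node + (len(kids),)  # open a new branch
                kids.append(child)
            else:
                child = kids[idx]
            self.counts[child] += 1
            node = child
            path.append(node)
        return path
```

Calling `sample_path()` repeatedly on one `NCRPTree` instance grows a shared tree; general topics would sit at nodes visited by many documents, and specific topics at rarely visited nodes deeper in the tree.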