Mrinal Kanti Das

  • Assistant Professor, CSE, IIT Palakkad.

  • Founding member of Palakkad Machine Learning Group.

  • Founding member of CREDS (Centre for Research and Education in Data Science).

  • Founding member and participating investigator in the Technology Innovation Hub (TIH) on Intelligent Collaborative Systems at IIT Palakkad.

Email: mrinal at iitpkd dot ac dot in



  • PhD. Yumna Fatma Farooqi

  • MS. Rimmon Bhosala

  • MS. Gaurav Jain


  • SERB grant for Bayesian Deep Models for Efficient Privacy-Aware Learning in the Era of Big Data and Personalization, 2020

Past Positions

  1. Postdoc. Department of Computer Science, UMass, Amherst, MA, USA.
    Supervisor: Prof. Andrew McCallum
    August 2016 – June 2017.
    Project: Building a next-generation reviewing system.

  2. Postdoc. Department of Computer Science, Aalto University in Helsinki, Finland.
    Supervisor: Prof. Samuel Kaski.
    October 2014 – July 2016.
    Project: Building differentially private Bayesian models for personalized medicine. Academy of Finland funded.

  3. PhD. Department of Computer Science and Automation, Indian Institute of Science, India.
    Supervisor: Prof. Chiranjib Bhattacharyya.
    PhD Thesis: Extensions and Applications of Stick-Breaking Process on Topic Models.


I am fascinated by developing simple, novel mathematical models that address interesting and challenging practical problems.
Broadly, I am interested in:


  • Topic models

  • Bayesian nonparametrics

  • Differentially private Bayesian models

  • Bayesian Deep Learning


  • Information retrieval and extraction from large unstructured text datasets.
    I like to explore different areas of application, and in the recent past I have worked with various types of text datasets, such as software projects, speech transcripts, multilingual corpora, news/blogs, and comments. I have observed some challenging problems associated with these applications and developed novel mathematical models to solve them.

  • Preserving privacy with widespread digitization and automation in society.
    Privacy has become important now that nearly all information, whether personal, private, or public, is digitized. My interest is to explore whether the provable privacy guarantees of differential privacy can be used within a Bayesian framework while maintaining accuracy. Especially in the field of personalized medicine, privacy protection has become mandatory because genomic information is sensitive. Similar privacy issues are prevalent in social media, advertising, and recommendation systems.


Organizing ScienceIE – SemEval 2017 Task 10: “Extracting Keyphrases and Relations from Scientific Publications” with Isabelle Augenstein, Sebastian Riedel, and Andrew McCallum. See the ScienceIE website for details, and a WIRED article on ScienceIE.

Topic models are popular mathematical tools for analysing text datasets, where a corpus is a collection of documents. The state-of-the-art approach in topic models was to use a single topic vector per document. I conceived the novel yet simple idea of using multiple topic vectors (MTV). We have observed that MTV is remarkably effective at (i) discovering subtle topics, (ii) modeling specific correspondence, (iii) modeling multi-glyphic topical correspondence, (iv) content-driven user profiling for comment-worthy recommendations, and (v) discovering the taste of users in e-commerce portals. Each of these led to a novel model: (i) subtle topic models (STM, ICML 2013), (ii) specific correspondence topic models (SCTM, WSDM 2014), (iii) the multi-glyphic correspondence topic model (AAAI 2015), (iv) collaborative correspondence topic models (CCTM, RecSys 2015), and (v) SOPER (CIKM 2017).

I have also worked on Bayesian nonparametric models for learning from very large-scale datasets (more than 8 million documents and 700 million tokens). No other known MCMC-based method handles this scale without expensive parallel hardware. The technique is called SUMO, and the work was published at ICML 2015.

Recently, at Aalto University, we were able to predict drug sensitivity from gene expression data while preserving privacy. I conceived the novel concept of projecting outliers to tighter bounds without affecting non-outliers, which has been the key to our method.
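
To give a rough feel for why tighter bounds help, here is a minimal, generic sketch. It is not the method from the work above: it only illustrates the standard pattern of clipping values into a range and then releasing a mean with the Laplace mechanism. The function names, the symmetric bound, and the choice of a mean as the released statistic are all assumptions made for illustration.

```python
import random


def project_to_bounds(values, bound):
    """Clip each value into [-bound, bound]; values already inside are unchanged."""
    return [max(-bound, min(bound, v)) for v in values]


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(values, bound, epsilon, rng):
    """Illustrative epsilon-DP mean (hypothetical helper, not the published model).

    After clipping, replacing one record changes the mean by at most
    2 * bound / n, so a tighter bound means less noise is needed.
    """
    clipped = project_to_bounds(values, bound)
    sensitivity = 2.0 * bound / len(clipped)
    return sum(clipped) / len(clipped) + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)
data = [0.1, -0.4, 0.3, 8.0]  # made-up values; 8.0 plays the role of an outlier
projected = project_to_bounds(data, bound=1.0)
estimate = private_mean(data, bound=1.0, epsilon=1.0, rng=rng)
```

In this toy example only the outlier is moved to the bound and the other three values are left untouched, while the noise scale depends on the bound rather than on the largest raw value.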