Once Upon A Time

Organizing ScienceIE (SemEval 2017 Task 10: “Extracting Keyphrases and Relations from Scientific Publications”) with Isabelle Augenstein, Sebastian Riedel, and Andrew McCallum. See the ScienceIE website for details, and a WIRED article on ScienceIE.

Topic models are popular mathematical tools for analysing text datasets, where a corpus is a collection of documents. The state-of-the-art approach in topic models was to use a single topic vector per document. I conceived the novel yet simple idea of using multiple topic vectors (MTV). We observed that MTV is remarkably effective at (i) discovering subtle topics, (ii) modeling specific correspondence, (iii) modeling multi-glyphic topical correspondence, (iv) content-driven user profiling for comment-worthy recommendations, and (v) discovering the tastes of users in e-commerce portals. These findings led to novel models: (i) subtle topic models (STM, ICML 2013), (ii) specific correspondence topic models (SCTM, WSDM 2014), (iii) the multi-glyphic correspondence topic model (AAAI 2015), (iv) collaborative correspondence topic models (CCTM, RecSys 2015), and (v) SOPER (CIKM 2017).

I also worked on Bayesian nonparametric models for learning from very large-scale datasets (more than 8 million documents and 700 million tokens). No MCMC method is known to work at this scale without expensive parallel hardware. The technique is called SUMO, and the work was published at ICML 2015.

At Aalto University, we solved, to some extent, the problem of predicting drug sensitivity from gene expression data while preserving privacy. I conceived the novel concept of projecting outliers to tighter bounds without affecting non-outliers, which is the key to our method.

Past Positions