Sounak Mondal

I am a fourth-year Computer Science PhD candidate at Stony Brook University interested in multimodal AI research, specifically the intersection of Computer Vision and Natural Language Processing. My thesis explores how interactions between vision and language influence human attention. I am fortunate to be advised by Minh Hoai Nguyen (dissertation advisor), Dimitris Samaras and Gregory Zelinsky. I also collaborate with Niranjan Balasubramanian.

Previously, I was an NLP Engineer at Samsung Research Institute, Bangalore, where I worked as part of the Natural Language Understanding team. Before that, I was an undergraduate student at the Department of Computer Science & Engineering, Jadavpur University, Kolkata, working on action detection and recognition in videos.

Résumé  /  Email  /  Google Scholar  /  LinkedIn

News
  • [July 2024] One paper on gaze prediction for object referral and one paper on gaze following accepted to ECCV 2024!
  • [June 2024] I have joined Meta Reality Labs - Research (RLR), Redmond as a Research Scientist Intern!
  • [February 2024] One paper accepted to CVPR 2024!
  • [November 2023] I will serve as a reviewer for CVPR 2024.
  • [March 2023] One paper accepted to CVPR 2023!
  • [March 2023] One preprint is available on arXiv.
  • [July 2022] One paper accepted to ECCV 2022!
Research

I am broadly interested in Computer Vision, Natural Language Processing and Multimodal AI (Vision-Language Modeling). My PhD research focuses on modeling multimodal aspects of human attention as manifested through human eye gaze. For more details, refer to my résumé.

Look Hear: Gaze Prediction for Speech-directed Human Attention
Sounak Mondal, Seoyoung Ahn, Zhibo Yang, Niranjan Balasubramanian, Dimitris Samaras, Gregory Zelinsky, Minh Hoai
ECCV, 2024
arXiv

Diffusion-Refined VQA Annotations for Semi-Supervised Gaze Following
Qiaomu Miao, Alexandros Graikos, Jingwei Zhang, Sounak Mondal, Minh Hoai, Dimitris Samaras
ECCV, 2024
arXiv

Unifying Top-down and Bottom-up Scanpath Prediction using Transformers
Zhibo Yang, Sounak Mondal, Seoyoung Ahn, Ruoyu Xue, Gregory Zelinsky, Minh Hoai, Dimitris Samaras
CVPR, 2024
arXiv

Gazeformer: Scalable, Effective and Fast Prediction of Goal-Directed Human Attention
Sounak Mondal, Zhibo Yang, Seoyoung Ahn, Dimitris Samaras, Gregory Zelinsky, Minh Hoai
CVPR, 2023
arXiv / Supplement / Code / Talk

Target-absent Human Attention
Zhibo Yang, Sounak Mondal, Seoyoung Ahn, Gregory Zelinsky, Minh Hoai, Dimitris Samaras
ECCV, 2022
arXiv / Supplement / Code

Characterizing Target-absent Human Attention
Yupei Chen, Zhibo Yang, Souradeep Chakraborty, Sounak Mondal, Seoyoung Ahn, Dimitris Samaras, Minh Hoai, Gregory Zelinsky
CVPR Workshop, 2022
Paper / Supplement

ICAN: Introspective Convolutional Attention Network for Semantic Text Classification
Sounak Mondal, Suraj Modi*, Sakshi Garg*, Dhruva Das, Siddhartha Mukherjee
ICSC, 2020 (* indicates equal contribution)
Paper

Violent/Non-Violent Video Classification based on Deep Neural Network
Sounak Mondal, Soumyajit Pal, Sanjoy Kumar Saha, Bhabatosh Chanda
ICAPR, 2017
Paper

A Beta Distribution Based Novel Scheme for Detection of Changes in Crowd Motion
Soumyajit Pal, Sounak Mondal, Sanjoy Kumar Saha, Bhabatosh Chanda
ICVGIP Workshop, 2016
Paper


Webpage template from Jon Barron