Sounak Mondal

I am a final-year Computer Science PhD candidate at Stony Brook University interested in research on multimodal learning, particularly vision-language modeling. My PhD thesis focuses on using vision-language representation learning and multimodal foundation models (e.g., multimodal LLMs) for modeling human visual attention (eye gaze). I am advised by Minh Hoai Nguyen, Dimitris Samaras and Gregory Zelinsky. I also collaborate with Niranjan Balasubramanian, Lester Loschky, Sidney D'Mello, and Sanjay Rebello.

Previously, I was an NLP Engineer on the Natural Language Understanding team at Samsung Research Institute, Bangalore. Before that, I was an undergraduate student in the Department of Computer Science & Engineering at Jadavpur University, Kolkata, where I worked on action detection and recognition in videos.

Résumé  /  Email  /  Google Scholar  /  LinkedIn

News
  • [August 2025] I am actively looking for full-time industry research scientist / applied scientist opportunities. Please contact me via email or LinkedIn if you have any leads.
  • [June 2025] One paper accepted to ICCV 2025! This work was done during my internship at Meta Reality Labs Research (RL-R) in 2024.
  • [June 2025] I have joined Meta Reality Labs Research (RL-R), Burlingame as a Research Scientist Intern!
  • [June 2025] I successfully defended my thesis proposal!
  • [May 2025] I will serve as a reviewer for NeurIPS 2025.
  • [March 2025] I will serve as a reviewer for ICCV 2025.
  • [February 2025] One paper accepted to CVPR 2025!
  • [November 2024] I will serve as a reviewer for CVPR 2025 and TPAMI.
  • [October 2024] I will continue working at Meta Reality Labs Research (RL-R) remotely as a Part-Time Student Researcher.
  • [July 2024] One paper on gaze prediction for object referral, and one paper on gaze following accepted to ECCV 2024!
  • [June 2024] I have joined Meta Reality Labs Research (RL-R), Redmond as a Research Scientist Intern!
  • [February 2024] One paper accepted to CVPR 2024!
  • [November 2023] I will serve as a reviewer for CVPR 2024.
  • [March 2023] One paper accepted to CVPR 2023!
  • [March 2023] One preprint is available on arXiv.
  • [July 2022] One paper accepted to ECCV 2022!
Research

I am broadly interested in Computer Vision, Natural Language Processing and Multimodal AI (Vision-Language Modeling). My PhD research focuses on using vision-language representation learning and multimodal foundation models (e.g., multimodal LLMs) for modeling human visual attention (eye gaze). For more details, refer to my résumé.

Gaze-Language Alignment for Zero-Shot Prediction of Visual Search Targets from Human Gaze Scanpaths
Sounak Mondal, Naveen Sendhilnathan, Ting Zhang, Yue Liu, Michael Proulx, Michael Iuzzolino, Chuan Qin, Tanya Jonker
ICCV, 2025
Poster

Few-shot Personalized Scanpath Prediction
Ruoyu Xue, Jingyi Xu, Sounak Mondal, Hieu Le, Gregory Zelinsky, Minh Hoai, Dimitris Samaras
CVPR, 2025
Paper

Look Hear: Gaze Prediction for Speech-directed Human Attention
Sounak Mondal, Seoyoung Ahn, Zhibo Yang, Niranjan Balasubramanian, Dimitris Samaras, Gregory Zelinsky, Minh Hoai
ECCV, 2024
arXiv / Project Page / Code / Dataset / Talk

Diffusion-Refined VQA Annotations for Semi-Supervised Gaze Following
Qiaomu Miao, Alexandros Graikos, Jingwei Zhang, Sounak Mondal, Minh Hoai, Dimitris Samaras
ECCV, 2024
arXiv

Unifying Top-down and Bottom-up Scanpath Prediction using Transformers
Zhibo Yang, Sounak Mondal, Seoyoung Ahn, Ruoyu Xue, Gregory Zelinsky, Minh Hoai, Dimitris Samaras
CVPR, 2024
arXiv

Gazeformer: Scalable, Effective and Fast Prediction of Goal-Directed Human Attention
Sounak Mondal, Zhibo Yang, Seoyoung Ahn, Dimitris Samaras, Gregory Zelinsky, Minh Hoai
CVPR, 2023
arXiv / Supplement / Code / Talk

Target-absent Human Attention
Zhibo Yang, Sounak Mondal, Seoyoung Ahn, Gregory Zelinsky, Minh Hoai, Dimitris Samaras
ECCV, 2022
arXiv / Supplement / Code

Characterizing Target-absent Human Attention
Yupei Chen, Zhibo Yang, Souradeep Chakraborty, Sounak Mondal, Seoyoung Ahn, Dimitris Samaras, Minh Hoai, Gregory Zelinsky
CVPR Workshop, 2022
Paper / Supplement

ICAN: Introspective Convolutional Attention Network for Semantic Text Classification
Sounak Mondal, Suraj Modi*, Sakshi Garg*, Dhruva Das, Siddhartha Mukherjee
ICSC, 2020 (* indicates equal contribution)
Paper

Violent/Non-Violent Video Classification based on Deep Neural Network
Sounak Mondal, Soumyajit Pal, Sanjoy Kumar Saha, Bhabatosh Chanda
ICAPR, 2017
Paper

A Beta Distribution Based Novel Scheme for Detection of Changes in Crowd Motion
Soumyajit Pal, Sounak Mondal, Sanjoy Kumar Saha, Bhabatosh Chanda
ICVGIP Workshop, 2016
Paper


Webpage template from Jon Barron