Sounak Mondal
I am a final-year Computer Science PhD candidate at Stony Brook University, interested in research on multimodal learning, particularly vision-language modeling. My PhD thesis focuses on using vision-language representation learning and multimodal foundation models (e.g., multimodal LLMs) for modeling human visual attention (eye gaze). I am advised by Minh Hoai Nguyen, Dimitris Samaras and Gregory Zelinsky. I also collaborate with Niranjan Balasubramanian, Lester Loschky, Sidney D'Mello, and Sanjay Rebello.
Previously, I was an NLP Engineer at Samsung Research Institute, Bangalore, where I worked as part of the Natural Language Understanding team. Before that, I was an undergraduate student at the Department of Computer Science & Engineering, Jadavpur University, Kolkata, working on action detection and recognition in videos.
Résumé / Email / Google Scholar / LinkedIn
News
- [August 2025] I am actively looking for full-time industry research scientist / applied scientist opportunities. Please contact me via email or LinkedIn if you have any leads.
- [June 2025] One paper accepted to ICCV 2025! This work was done during my internship at Meta Reality Labs Research (RL-R) in 2024.
- [June 2025] I have joined Meta Reality Labs Research (RL-R), Burlingame as a Research Scientist Intern!
- [June 2025] I successfully defended my thesis proposal!
- [May 2025] I will serve as a reviewer for NeurIPS 2025.
- [March 2025] I will serve as a reviewer for ICCV 2025.
- [February 2025] One paper accepted to CVPR 2025!
- [November 2024] I will serve as a reviewer for CVPR 2025 and TPAMI.
- [October 2024] I will continue working at Meta Reality Labs Research (RL-R) remotely as a Part-Time Student Researcher.
- [July 2024] Two papers accepted to ECCV 2024: one on gaze prediction for object referral and one on gaze following!
- [June 2024] I have joined Meta Reality Labs Research (RL-R), Redmond as a Research Scientist Intern!
- [February 2024] One paper accepted to CVPR 2024!
- [November 2023] I will serve as a reviewer for CVPR 2024.
- [March 2023] One paper accepted to CVPR 2023!
- [March 2023] One preprint is available on arXiv.
- [July 2022] One paper accepted to ECCV 2022!
Research
I am broadly interested in Computer Vision, Natural Language Processing and Multimodal AI (Vision-Language Modeling). My PhD research focuses on using vision-language representation learning and multimodal foundation models (e.g., multimodal LLMs) for modeling human visual attention (eye gaze). For more details, refer to my résumé.

Gaze-Language Alignment for Zero-Shot Prediction of Visual Search Targets from Human Gaze Scanpaths
Sounak Mondal, Naveen Sendhilnathan, Ting Zhang, Yue Liu, Michael Proulx, Michael Iuzzolino, Chuan Qin, Tanya Jonker
ICCV, 2025
Poster

Few-shot Personalized Scanpath Prediction
Ruoyu Xue, Jingyi Xu, Sounak Mondal, Hieu Le, Gregory Zelinsky, Minh Hoai, Dimitris Samaras
CVPR, 2025
Paper

Look Hear: Gaze Prediction for Speech-directed Human Attention
Sounak Mondal, Seoyoung Ahn, Zhibo Yang, Niranjan Balasubramanian, Dimitris Samaras, Gregory Zelinsky, Minh Hoai
ECCV, 2024
arXiv / Project Page / Code / Dataset / Talk

Diffusion-Refined VQA Annotations for Semi-Supervised Gaze Following
Qiaomu Miao, Alexandros Graikos, Jingwei Zhang, Sounak Mondal, Minh Hoai, Dimitris Samaras
ECCV, 2024
arXiv

Unifying Top-down and Bottom-up Scanpath Prediction using Transformers
Zhibo Yang, Sounak Mondal, Seoyoung Ahn, Ruoyu Xue, Gregory Zelinsky, Minh Hoai, Dimitris Samaras
CVPR, 2024
arXiv

Gazeformer: Scalable, Effective and Fast Prediction of Goal-Directed Human Attention
Sounak Mondal, Zhibo Yang, Seoyoung Ahn, Dimitris Samaras, Gregory Zelinsky, Minh Hoai
CVPR, 2023
arXiv / Supplement / Code / Talk

Target-absent Human Attention
Zhibo Yang, Sounak Mondal, Seoyoung Ahn, Gregory Zelinsky, Minh Hoai, Dimitris Samaras
ECCV, 2022
arXiv / Supplement / Code

Characterizing Target-absent Human Attention
Yupei Chen, Zhibo Yang, Souradeep Chakraborty, Sounak Mondal, Seoyoung Ahn, Dimitris Samaras, Minh Hoai, Gregory Zelinsky
CVPR Workshop, 2022
Paper / Supplement

ICAN: Introspective Convolutional Attention Network for Semantic Text Classification
Sounak Mondal, Suraj Modi*, Sakshi Garg*, Dhruva Das, Siddhartha Mukherjee
ICSC, 2020 (* indicates equal contribution)
Paper

Violent/Non-Violent Video Classification based on Deep Neural Network
Sounak Mondal, Soumyajit Pal, Sanjoy Kumar Saha, Bhabatosh Chanda
ICAPR, 2017
Paper

A Beta Distribution Based Novel Scheme for Detection of Changes in Crowd Motion
Soumyajit Pal, Sounak Mondal, Sanjoy Kumar Saha, Bhabatosh Chanda
ICVGIP Workshop, 2016
Paper