Syed Talal Wasim | University of Bonn

About Me

I am a PhD student in the Computer Vision Group at the University of Bonn, Germany, supervised by Professor Dr. Jürgen Gall. I work in the domain of Long-Term Multimodal Video Understanding.

Previously, I was an Associate Researcher in computer vision at the Intelligent Visual Analytics Lab (IVAL) at the Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI), supervised by Dr. Salman Khan.

I completed my master’s degree in Image Processing and Computer Vision (IPCV), funded by the Erasmus Mundus Joint Master’s Degree (EMJMD) scholarship program. During the program, I was fortunate to intern at the Empathic Computing Lab under the supervision of Dr. Mark Billinghurst. I completed my master’s thesis in the CVLAB at EPFL, supervised by Dr. Mathieu Salzmann.

I hold an undergraduate degree in Electrical Engineering, with a minor in computer science, from Habib University in Karachi, Pakistan.

My previous website, listing high-school, undergraduate, and graduate courses and projects, can be found at talalwasim.weebly.com.

Research Interests

Computer Vision: image and video understanding, action anticipation, multimodal learning
Machine Learning: self-supervised learning, out-of-distribution generalization

News

[Feb. 2025] Three of our papers (Video-Panda, GroupMamba, and STING-BEE) have been accepted at CVPR 2025.
[Dec. 2024] Our paper titled "Efficient Video Object Segmentation via Modulated Cross-Attention Memory" has been accepted at WACV 2025.
[Oct. 2024] New preprint released titled "Distillation-free Scaling of Large SSMs for Images and Videos".
[Mar. 2024] Our paper titled "Video-GroundingDINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding" has been accepted at CVPR 2024.
[Feb. 2024] My student Muhammad Zain Yousuf's bachelor's thesis, titled "AR-VPT: Simple Auto-Regressive Prompts for Adapting Frozen ViTs to Videos", has been accepted at VISAPP 2024.
[Jan. 2024] I started a PhD at the University of Bonn, Germany, working on Long-Term Multimodal Video Understanding under the supervision of Professor Dr. Jürgen Gall.
[Oct. 2023] Our paper titled "Hardware Resilience Properties of Text-Guided Image Classifiers" has been accepted at NeurIPS 2023.
[Aug. 2023] Our paper titled "Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition" has been accepted at ICCV 2023.
[Aug. 2023] Our paper titled "Self-regulating Prompts: Foundational Model Adaptation without Forgetting" has been accepted at ICCV 2023.
[Jun. 2023] Our paper titled "Toward Automatic Typography Analysis: Serif Classification and Font Similarities" has been accepted for publication in the Journal of Data Mining in Digital Humanities (JDMDH).
[Mar. 2023] Our paper titled "Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting" has been accepted at CVPR 2023.
[Jun. 2022] Our paper titled "Using Facial Micro-Expressions in Combination With EEG and Physiological Signals for Emotion Recognition" has been accepted for publication in Frontiers in Psychology.
[Apr. 2022] I started working as a researcher at MBZUAI under the supervision of Dr. Salman Khan, focusing on multimodal video understanding.
[Jul. 2021] I was accepted to the ETH Robotics Summer School and Symposium.
[Jun. 2021] I defended my master's thesis and graduated from the IPCV master's program.
[May 2021] Our paper on synthetic data for object detection has been accepted at the CV4Animals Workshop at CVPR 2021.
[Feb. 2021] I started my master's thesis in the CVLAB at EPFL, supervised by Dr. Mathieu Salzmann, working on automated typography analysis of figurative content.
[Jul. 2020] I started a remote research internship at the Empathic Computing Lab supervised by Dr. Mark Billinghurst.
[Sep. 2019] I started my master's degree in Image Processing and Computer Vision (IPCV) funded by the Erasmus Mundus Joint Master's Degree (EMJMD) scholarship program.
[Jun. 2019] I completed my undergraduate degree in Electrical Engineering with a minor in computer science, graduating first in my class with the Dean's Medal.

Publications

CVPR

Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models

Jinhui Yi*, Syed Talal Wasim*, Yanan Luo*, Muzammal Naseer and Juergen Gall

CVPR, 2025

PAPER CODE

CVPR

GroupMamba: Parameter-Efficient and Accurate Group Visual State Space Model

Abdelrahman Shaker, Syed Talal Wasim, Salman Khan and Fahad Shahbaz Khan

CVPR, 2025

PAPER CODE

CVPR

STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection

D. Velayudhan, A. Ahmed, M. Alansari, N. Gour, A. Behouch, T. Hassan, Syed Talal Wasim, N. Maalej, M. Naseer, J. Gall, M. Bennamoun, E. Damiani and N. Werghi

CVPR, 2025

PAPER CODE

WACV

Efficient Video Object Segmentation via Modulated Cross-Attention Memory

Abdelrahman Shaker, Syed Talal Wasim, Martin Danelljan, Salman Khan, Ming-Hsuan Yang and Fahad Shahbaz Khan

WACV, 2025

PAPER CODE

Under Review

Distillation-free Scaling of Large SSMs for Images and Videos

Hamid Suleman*, Syed Talal Wasim*, Muzammal Naseer and Juergen Gall

Under Review

PAPER CODE

CVPR

Video-GroundingDINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding

Syed Talal Wasim, Muzammal Naseer, Salman Khan, Ming-Hsuan Yang and Fahad Shahbaz Khan

CVPR, 2024

PAPER CODE

VISAPP

AR-VPT: Simple Auto-Regressive Prompts for Adapting Frozen ViTs to Videos

Muhammad Zain Yousuf, Syed Talal Wasim, Syed Nouman Hasany and Muhammad Farhan

VISAPP, 2024

PAPER CODE

NeurIPS

Hardware Resilience Properties of Text-Guided Image Classifiers

Syed Talal Wasim, Kabila Haile Soboka, Abdulrahman Mahmoud, Salman Khan, David Brooks and Gu-Yeon Wei

NeurIPS, 2023

PAPER CODE

ICCV

Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition

Syed Talal Wasim*, Muhammad Uzair Khattak*, Muzammal Naseer, Salman Khan, Mubarak Shah and Fahad Shahbaz Khan

ICCV, 2023

PAPER CODE

ICCV

Self-regulating Prompts: Foundational Model Adaptation without Forgetting

Muhammad Uzair Khattak*, Syed Talal Wasim*, Muzammal Naseer, Salman Khan, Ming-Hsuan Yang and Fahad Shahbaz Khan

ICCV, 2023

PAPER CODE

CVPR

Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting

Syed Talal Wasim, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan and Mubarak Shah

CVPR, 2023

PAPER CODE

JDMDH

Toward Automatic Typography Analysis: Serif Classification and Font Similarities

Syed Talal Wasim, Romain Collaud, Lara Défayes, Nicolas Henchoz, Mathieu Salzmann and Delphine Ribes

Journal of Data Mining in Digital Humanities, 2023

PAPER CODE

Frontiers

Using Facial Micro-Expressions in Combination With EEG and Physiological Signals for Emotion Recognition

Nastaran Saffaryazdi, Syed Talal Wasim, Kuldeep Dileep, Alireza Farrokhi Nia, Suranga Nanayakkara, Elizabeth Broadbent and Mark Billinghurst

Frontiers in Psychology, 2022

PAPER

CVPRW

Sim-to-Real Transfer for Object Detection and Localization on Animals

Syed Talal Wasim, Syed N. Hasany, Kainat Abbasi, Huda Feroz, Anisa A. Ahmed, Mudasir H. Shaikh and Muhammad Farhan

CV4Animals Workshop, CVPR 2021

POSTER

Services

Journal Reviewers

Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
Transactions on Neural Networks and Learning Systems (TNNLS)
Transactions on Image Processing (TIP)
Transactions on Machine Learning Research (TMLR)
International Journal of Computer Vision (IJCV)
Pattern Recognition

Conference Reviewers

Computer Vision (CVPR, ICCV, ECCV, WACV, ACCV)
Artificial Intelligence and Machine Learning (NeurIPS, ICLR, ICML, AAAI)

Project Supervision

Co-supervise undergraduate projects in computer vision at Habib University
Co-supervise high-school students in Pakistan for the Intel International Science and Engineering Fair (ISEF)