Bio:
Dr. Jean-Marc Odobez (MSc 1990, PhD 1994) graduated from the École Nationale Supérieure des Télécommunications de Bretagne (ENSTBr, France) in 1990 and obtained his Ph.D. from the University of Rennes for his dissertation at INRIA on dynamic scene analysis for scene understanding. After five years as an Assistant Professor at the University of Le Mans, France, he joined Idiap, where he now heads the Perception & Activity Understanding Group. He is also an adjunct faculty member in the School of Engineering at EPFL and a member of its Electrical Engineering Doctoral Committee (EDEE). His research interests lie in the design of multimodal perception systems that integrate computer vision, statistical machine learning, deep learning, and the social sciences to advance activity and behavior recognition, as well as the modeling of human-human and human-robot interaction. His work has applications in human health assessment, social robotics, and media content analysis. Dr. Odobez has published over 50 journal articles and 160 peer-reviewed conference papers in his field. He has been the principal investigator of more than 18 European and Swiss projects and has led 10 technology transfer projects with SMEs. He holds several patents in computer vision and is a co-founder of Klewel SA (www.klewel.ch) and of Eyeware SA (eyeware.tech), a company specializing in eye tracking and attention modeling. He is an IEEE member, serves as an Associate Editor of the Machine Vision and Applications journal, and regularly serves as an Area Chair for conferences such as ICMI, ICCV, CVPR, and ECCV.

Title: Looking Through Their Eyes: Decoding Gaze and Attention in Everyday Life
Abstract: Beyond words, non-verbal behaviors (NVB) are known to play an important role in face-to-face interactions. Decoding NVB, however, is a challenging problem: it involves both extracting subtle physical NVB cues and mapping them to higher-level communicative behaviors or social constructs. Gaze, in particular, is a fundamental indicator of attention and interest, influencing communication and social signaling across domains such as human-computer interaction, robotics, and medical diagnosis, notably the assessment of Autism Spectrum Disorder (ASD).
However, estimating another person's visual attention, encompassing both their gaze and their Visual Focus of Attention (VFOA), remains highly challenging, even for humans. It requires not only inferring accurate 3D gaze directions but also understanding the scene context to discern which object in the field of view is actually being looked at. Such context can include people's activities, which provide priors on which objects are likely to be looked at, or the scene structure, which reveals obstructions in the line of sight. Recent research has pursued two avenues to address this: the first focuses on improving appearance-based 3D gaze estimation from images and videos, while the second investigates gaze following, the task of inferring where a person is looking in an image.
This presentation will explore ideas and methodologies addressing both challenges. It first delves into advances in 3D gaze estimation, including personalized model construction via few-shot learning and gaze-redirection eye synthesis, differential gaze estimation, and the use of social interaction priors for model adaptation. It then introduces recent models for estimating gaze target locations ('where') in real-world settings, including the joint inference of gaze targets for all people in a scene, the inference of social labels such as eye contact and shared attention, and the inference of the semantic category of what is being looked at ('what').