Keynotes


Joy Hirsch

Elizabeth Mears and House Jameson Professor of Psychiatry, Comparative Medicine, and Neuroscience at Yale University

Bio: Joy Hirsch is the Elizabeth Mears and House Jameson Professor of Psychiatry, Comparative Medicine, and Neuroscience at Yale University and also Professor of Neuroscience in the Department of Medical Physics and Biomedical Engineering at UCL. The overarching goal of her research is to understand the fundamental neural mechanisms that underlie live interactive social behaviors between individuals. Her laboratory has developed multi-modal two-person neuroimaging technology based on near-infrared spectroscopy (fNIRS), configured for real-time, live face-to-face and dialogue interactions between humans. Converging evidence from simultaneous measures of neural responses, facial classifications, eye-tracking, pupillometry, EEG, and behavioral reports of subjective effects builds a new “neuroscience of two,” suggesting that live social interaction is a fundamental neural function. Emerging models based on the integration of multiple modalities highlight a new appreciation for the role of dynamically exchanged sensory experiences, such as eye-to-eye contact and live face processing, as fundamental components of social behavior.

Title: Eye Movements During Live Face-to-Face Interactions: A Social Information Processing Stage

Abstract: Humans are profoundly social. However, most of what is known about the underlying neural circuitry for human social behavior has been acquired under non-interactive conditions during MRI scanning, leaving a conspicuous gap in our understanding of the neural foundation of ecologically valid behaviors. Addressing this problem, emerging optical imaging technology (functional near-infrared spectroscopy, fNIRS) presents a paradigm shift that enables simultaneous neuroimaging of interacting dyads during spontaneous social behaviors in naturalistic environments. Further, the absence of a high magnetic field in the experimental suite allows multiple, simultaneous, and synchronized data streams, including eye-tracking, dynamic facial classifications, EEG, physiological variables, and subjective reports, to be acquired during interactive tasks such as live conversation and face-to-face interactions. This talk will present novel approaches and recent eye-tracking findings in support of the hypothesis that eye movements during live and interactive face processing are integrated within the social systems of the brain.



James M. Rehg

Founder Professor of Computer Science and Industrial and Enterprise Systems Engineering at the University of Illinois Urbana-Champaign

Bio: James M. Rehg is a Founder Professor in the Departments of Computer Science and Industrial and Enterprise Systems Engineering at UIUC, where he is the Director of the Health Care Engineering Systems Center. He received his Ph.D. from CMU in 1995 and worked at the Cambridge Research Lab of DEC (and then Compaq) from 1995 to 2001, where he managed the computer vision research group. He was a professor in the College of Computing at Georgia Tech from 2001 to 2022. He received an NSF CAREER award in 2001 and a Raytheon Faculty Fellowship from Georgia Tech in 2005. He and his students have received best student paper awards at ICML 2005, BMVC 2010 and 2022, Mobihealth 2014, and Face and Gesture 2015, and a Method of the Year Award from the journal Nature Methods. Dr. Rehg served as Program co-Chair for ACCV 2012 and CVPR 2017 and as General co-Chair for CVPR 2009. He has authored more than 200 peer-reviewed scientific papers and holds 30 issued US patents. His research interests include computer vision, machine learning, and mobile and computational health (https://rehg.org). Dr. Rehg was the lead PI on an NSF Expedition to develop the science and technology of Behavioral Imaging, the measurement and analysis of social and communicative behavior using multi-modal sensing, with applications to developmental conditions such as autism. He is currently the Deputy Director and TR&D1 Lead for the mHealth Center for Discovery, Optimization, and Translation of Temporally-Precise Interventions (mDOT), which is developing novel on-body sensing and predictive analytics for improving health outcomes (https://mdot.md2k.org/).

Title: An Egocentric Approach to Social AI

Abstract: While computer vision and NLP have made tremendous progress in extracting semantics from image, video, and textual data, our computational understanding of human social behavior is still in its infancy. Face-to-face communication, using a rich repertoire of visual, acoustic, and linguistic channels, is the foundation for all social interactions and relationships, as well as all other means of communication. Moreover, the acquisition of social skill in infancy is a critical developmental milestone, and its disruption in conditions such as autism has life-long consequences. The current state of the art in AI consists largely of surface-level analyses, e.g., action detection, recognition, and prediction from video, or inferring the sentiment of an utterance via NLP. A major challenge is to leverage this recent progress and mount an attack on the core constructs of social interaction, such as joint attention, theory of mind, and social appraisals. A key hypothesis is that computational social understanding is enabled by an egocentric perspective, i.e., the capture of social signals from the perspective of each social partner via head- and body-worn sensors. This perspective is well-aligned with existing commercial efforts in Augmented and Virtual Reality, and with the literature on child development. In this talk, I will provide background on egocentric perception and summarize our current progress towards egocentric social understanding. A key technical challenge is the inference of social attention from multimodal sensor data: attention is inferred from video recordings of naturalistic interactions using machine learning models, without the use of eye tracking. I will review recent progress on estimating visual and auditory attention from egocentric data. I will also describe our efforts to develop a benchmark dataset for multimodal social understanding, based on multi-person social deduction games such as One Night Werewolf. A key motivation for our work is the modeling of social attention as a means to improve the diagnosis and treatment of autism, and I will review our progress towards this goal. This is joint work with collaborators at UIUC, Georgia Tech, Weill-Cornell, and Meta Reality Labs Research.