Best Full Paper

A Framework for Leveraging LLMs for Scene Analysis and Cognitive Processing

In everyday visual search tasks, humans rely on prior knowledge of object placements in scenes to locate target objects efficiently. This ability is evidenced by eye movement patterns, where individuals focus on areas that are more likely to contain the target, such as searching for a cup on a table or shoes on the floor. Building on this, we propose a new annotation pipeline that leverages these priors: it extracts a knowledge graph from an image based on automatically annotated objects, and this knowledge graph is then used with large language models (LLMs) to predict the most likely locations of a specific target object in the image. Our approach is the first to use LLMs to identify relevant prior knowledge in images and to bridge the gap between human scene understanding and computational models.
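The abstract suggests a two-stage pipeline: extract a scene knowledge graph from annotated objects, then query an LLM with it. A minimal sketch, assuming detections are given as label/bounding-box pairs; the spatial heuristic, prompt wording, and query_llm client are illustrative placeholders, not the paper's implementation:

```python
# Detections are assumed to be dicts: {"label": str, "box": (x1, y1, x2, y2)},
# with y increasing downward. All names here are hypothetical.

def overlaps_horizontally(box_a, box_b):
    return box_a[0] < box_b[2] and box_b[0] < box_a[2]

def extract_knowledge_graph(detections, tolerance=10):
    """Turn annotated objects into (subject, relation, object) triples
    using a crude "a rests on b" bounding-box heuristic."""
    triples = []
    for a in detections:
        for b in detections:
            # a's bottom edge close to b's top edge, with horizontal overlap
            if (a is not b
                    and abs(a["box"][3] - b["box"][1]) < tolerance
                    and overlaps_horizontally(a["box"], b["box"])):
                triples.append((a["label"], "on", b["label"]))
    return triples

def predict_target_locations(triples, target, query_llm):
    """Ask an LLM to rank likely locations of the target given the graph."""
    graph_text = "; ".join(f"{s} {r} {o}" for s, r, o in triples)
    prompt = (f"Scene knowledge graph: {graph_text}. Where is a '{target}' "
              "most likely located in this scene? Rank the candidate "
              "objects or surfaces.")
    return query_llm(prompt)  # query_llm: any LLM client, hypothetical here
```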



Honorable Mention for Full Paper

Imperceptible Gaze Guidance Through Ocularity in Virtual Reality

We introduce to VR a novel imperceptible gaze guidance technique based on the recent discovery that human gaze can be attracted to a cue that contrasts with the background in its perceptually non-distinctive ocularity, defined as the relative difference between the inputs to the two eyes. This cue pops out in the saliency map in the primary visual cortex without being overtly visible. We tested this method in an odd-one-out visual search task using eye tracking with 31 participants in VR. When the target was rendered as an ocularity singleton, participants' gaze was drawn to the target faster. Conversely, when a background object served as the ocularity singleton, it distracted gaze from the target. Since ocularity is nearly imperceptible, our method maintains user immersion while guiding attention without noticeable scene alterations, and it can also render an object's depth in 3D scenes, creating new possibilities for immersive user experiences across diverse VR applications.
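As a rough illustration of the cue described above, a sketch assuming ocularity is computed as (L - R) / (L + R) over the two eyes' input intensities; the 0.3 value and the intensity model are assumptions based on the abstract's definition, not the paper's parameters:

```python
# Rendering an "ocularity singleton": the cue's per-eye intensities differ
# while their binocular average matches the background, so the cue is hard
# to notice yet distinct in ocularity.

def per_eye_intensities(mean_intensity, ocularity):
    """Split a mean intensity into left/right eye values with a given
    ocularity o = (L - R) / (L + R), keeping the binocular mean fixed."""
    left = mean_intensity * (1.0 + ocularity)
    right = mean_intensity * (1.0 - ocularity)
    return left, right

# Background objects: identical input to both eyes (ocularity = 0).
bg_left, bg_right = per_eye_intensities(0.5, 0.0)

# Target: same mean luminance, but biased toward the left eye (assumed o = 0.3).
target_left, target_right = per_eye_intensities(0.5, 0.3)
```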



Best Short Paper

PupilSense: A Novel Application for Webcam-Based Pupil Diameter Estimation

Measuring pupil diameter is vital for gaining insights into physiological and psychological states, and it has traditionally been captured by expensive, specialized equipment such as Tobii eye trackers and Pupil Labs glasses. This paper presents a novel application that enables pupil diameter estimation using standard webcams, making the process accessible in everyday environments without specialized equipment. Our app estimates pupil diameters from videos and offers detailed analysis, including class activation maps, graphs of predicted left and right pupil diameters, and eye aspect ratios during blinks. This tool expands the accessibility of pupil diameter measurement, particularly in everyday settings, benefiting fields like human behavior research and healthcare. Additionally, we present a new open-source dataset for webcam-based pupil diameter estimation, containing cropped eye images and corresponding pupil diameter measurements.
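The eye aspect ratio mentioned above is commonly computed from six eye-contour landmarks; a minimal sketch of that standard formulation (the landmark convention and blink threshold are assumptions about this app's internals, not its published code):

```python
import numpy as np

def eye_aspect_ratio(landmarks):
    """landmarks: (6, 2) array of eye contour points p1..p6, following the
    usual convention (p1/p4 horizontal corners, p2/p6 and p3/p5 vertical
    pairs). EAR drops toward 0 as the eye closes, signaling a blink."""
    p1, p2, p3, p4, p5, p6 = landmarks
    vertical = np.linalg.norm(p2 - p6) + np.linalg.norm(p3 - p5)
    horizontal = 2.0 * np.linalg.norm(p1 - p4)
    return vertical / horizontal

# Frames with EAR below a threshold (e.g. 0.2) can be excluded from pupil
# diameter estimates, since the pupil is occluded during blinks.
```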



Honorable Mention for Short Paper

Proxy-Based Pre-Training for Eye-Tracking Applications

Although eye-tracking offers valuable insights into reading processes, its use for training machine learning models is still limited due to scalability issues, as data collection remains resource-intensive. To tackle this challenge, we build on insights from psycholinguistic and psychological reading research that demonstrated the efficacy of proxy measures, specifically mouse-tracking and self-paced reading, as viable substitutes for high-quality eye movement data. We leverage the state-of-the-art model BEyeLSTM to infer general reading and text comprehension and investigate the effectiveness of pre-training it on more accessible proxy data before fine-tuning it on a smaller eye-tracking dataset. Our findings indicate that pre-training on proxy data (i) reduces the amount of eye-tracking data needed to maintain performance levels and (ii) enhances performance when the full eye-tracking dataset is used for fine-tuning. These results not only suggest a promising path toward making eye-tracking-based reading assessments more scalable, cost-effective, and accessible, but also underscore the broader potential of using proxy data for pre-training as a solution to scalability challenges in developing machine learning methods for eye-tracking data. Our code is publicly available: https://github.com/aeye-lab/transfer-learning-from-proxy.
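A minimal PyTorch-style sketch of the pre-train-then-fine-tune recipe described above, with a stand-in sequence model and synthetic data; the actual BEyeLSTM implementation and hyperparameters live in the linked repository:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

class SequenceClassifier(nn.Module):
    """Stand-in for a BEyeLSTM-like model over per-word reading measures."""
    def __init__(self, n_features=4, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        _, (h, _) = self.lstm(x)
        return self.head(h[-1]).squeeze(-1)

def train(model, loader, epochs, lr):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

def fake_loader(n=64, seq_len=50, n_features=4):
    x = torch.randn(n, seq_len, n_features)  # synthetic reading measures
    y = (torch.rand(n) > 0.5).float()        # synthetic comprehension label
    return DataLoader(TensorDataset(x, y), batch_size=16)

model = SequenceClassifier()
train(model, fake_loader(), epochs=20, lr=1e-3)  # 1) pre-train on proxy data
train(model, fake_loader(), epochs=5, lr=1e-4)   # 2) fine-tune on eye-tracking data
```

In practice the two loaders would hold the proxy signal (mouse-tracking or self-paced reading) and the smaller eye-tracking dataset, respectively, with the fine-tuning stage typically run at a lower learning rate.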



Best Late Breaking Work

Combining Intuitive Gaze-Based Control with EEG-Based Detection of Motor Imagery and Quasi-Movements

In this study, we developed a hybrid eye-brain-computer interface (EBCI) that combines gaze-based interaction with control via motor imagery (imagined movements, IM) and quasi-movements (QM), and tested it in a custom game. Both types of covert movements were detected through electroencephalography (EEG). Previous studies suggest that QM are more detectable than IM; here, QM were used for online control for the first time. We aimed to reduce demands on gaze by minimizing its intentional use: the interaction enabled visual exploration without requiring unnatural fixations for command input. We hypothesized that QM would combine with gaze better than IM, but this was not supported. Covert movements were detected with >80% accuracy. User experience questionnaires indicated that the interface was internally consistent and easy to use, at least with IM, while QM-based control was perceived as effortful. These results support our approach to gaze-based interaction, while the integration of QM into EBCIs requires further study.
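The abstract does not name the EEG classifier, so the following is only a generic covert-movement detection baseline, assuming a standard CSP + LDA motor imagery pipeline run on synthetic epochs:

```python
import numpy as np
from mne.decoding import CSP
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
X = rng.standard_normal((120, 16, 250))  # epochs x channels x samples (synthetic)
y = rng.integers(0, 2, 120)              # rest vs. covert movement labels

# Common spatial patterns extract discriminative band-power features,
# then linear discriminant analysis classifies each epoch.
clf = Pipeline([("csp", CSP(n_components=4)),
                ("lda", LinearDiscriminantAnalysis())])
print(cross_val_score(clf, X, y, cv=5).mean())  # ~chance on random data
```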



Honorable Mention for Late Breaking Work

Estimating the Level of Ability of English Learners Using Eye Movement Features During Text Reading

We predict the text reading ability of English-as-a-second-language speakers from features of their eye movements, compared against those of native speakers. To estimate the level of language skill development, we measured normalised distance metrics between the word skipping patterns of learners and native speakers during text reading. When the learners were separated into three ability levels by their English proficiency test scores, the distance between the two groups of readers narrowed gradually with increasing proficiency. Individual distances also correlated with the English test scores.
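The abstract leaves the exact metric unspecified; one plausible reading of a normalised distance between word skipping patterns, sketched with per-word group skip rates and a length-normalised Euclidean distance (illustrative only, not the paper's definition):

```python
import numpy as np

def skip_rates(skip_matrix):
    """skip_matrix: (n_readers, n_words) of 0/1 word-skip indicators.
    Returns the per-word skipping probability for the group."""
    return skip_matrix.mean(axis=0)

def normalised_distance(rates_a, rates_b):
    """Euclidean distance between skip-rate profiles, normalised by length
    so texts of different word counts are comparable."""
    return np.linalg.norm(rates_a - rates_b) / np.sqrt(len(rates_a))

# Synthetic skip indicators for 30 readers over a 200-word text.
learners = np.random.default_rng(1).integers(0, 2, (30, 200))
natives = np.random.default_rng(2).integers(0, 2, (30, 200))
print(normalised_distance(skip_rates(learners), skip_rates(natives)))
```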



Best Student Late Breaking Work

Emotions Shape Effort-Reward Choices: Positive Valence Decreases Effort, Except Under High Arousal

This study examined how the emotional dimensions of arousal and valence influence effort-based decision-making. Twenty-eight participants were exposed to either low arousal & high valence (-A/+V) or high arousal & low valence (+A/-V) stimuli in a task that required choosing between a more rewarding but more effortful option and a less effortful but less rewarding one. We recorded eye movements, fixation times, and self-reported arousal and valence ratings. Results showed that participants in the +A/-V condition were more likely to select the effortful option (62% vs. 56%), with a significant arousal-by-valence interaction (β = 0.46, 95% CI [0.26, 0.67]; p < .001). Specifically, positive valence reduced the preference for effortful choices, but this effect disappeared under high arousal. Additionally, participants in the -A/+V condition took longer to make decisions and attended more to reward information than to effort information. These findings suggest complex emotional influences on effort-based decisions.
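A sketch of the kind of model behind the reported interaction: a logistic regression of effortful choice on arousal, valence, and their product term. The simulated data and variable names are illustrative, and the study's actual analysis (plausibly a mixed-effects variant handling repeated choices) is not reproduced here:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "arousal": rng.integers(0, 2, n),  # 0 = low (-A), 1 = high (+A)
    "valence": rng.integers(0, 2, n),  # 0 = low (-V), 1 = high (+V)
})

# Simulate choices whose log-odds mirror the reported pattern: positive
# valence lowers effortful choice, offset by the arousal x valence term.
logit_p = 0.25 - 0.5 * df["valence"] + 0.46 * df["arousal"] * df["valence"]
prob = 1.0 / (1.0 + np.exp(-logit_p))
df["effortful_choice"] = (rng.random(n) < prob).astype(int)

# "arousal * valence" expands to both main effects plus the interaction.
model = smf.logit("effortful_choice ~ arousal * valence", data=df).fit()
print(model.summary())
```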