Accepted Papers


Long Papers

40 Years of Eye Typing: Challenges, Gaps, and Emergent Strategies

Aleesha Hamid
Per Ola Kristensson

Gaze interaction enables users to communicate through eye tracking, and is often the only channel of effective and efficient communication for individuals with severe motor disabilities. While there has been significant research and development of eye typing systems, in the context of augmentative and alternative communication (AAC), there is no comprehensive review that integrates the key findings from the variety of aspects that constitute the complex landscape of gaze communication. This paper presents a detailed review and characterization of the literature and aims to consolidate the disparate efforts to provide eye typing solutions for AAC users. We provide a systematic understanding of the components and functionalities that underpin eye typing solutions, and analyze the interplay of the different facets and their role in shaping the user-experience, accessibility, performance, and overall effectiveness of eye typing technology. We also identify the major challenges and highlight several areas that require further research attention.

Assessing the Size of the Functional Field of View in a Gaze-Contingent Search Paradigm

Sofia Krasovskaya
Árni Kristjánsson
Joseph Macinnes

The functional field of view (FFV) is the part of the visual field centred around the current gaze position that the visual system can process in detail. Its size depends partly on physiological limitations, but its adaptability is largely determined by cognitive factors. For example, changes in the FFV often reflect the demands of a given task: it shrinks with high task demands and expands with easier tasks. Here, we placed an upright or inverted target among distractors. We manipulated the visibility of the search array during gaze-contingent search with small (6°), medium (12°) and large (18°) apertures. Aperture size affected performance, improving it when it increased from 6° to 12°, but not from 12° to 18°, suggesting an upper bound of the 'natural' aperture for this specific task. Furthermore, our results suggest that the FFV does not change in size in response to stimulus inversion.

Automated Assessment of Eye-hand Coordination Skill using a Vertical Tracing Task on a Gaze-sensitive Human Computer Interaction Platform for children with Autism

Dharma Rane
Madhu Singh
Uttama Lahiri

Children with Autism often demonstrate atypical gaze patterns and eye-hand coordination skill deficits marked by difficulties in reaching out for an object, tracing on a vertically mounted canvas, etc. Existing conventional methods can assess one's coordination skill during hand movement in 3D-space, but such methods can be subjective and devoid of gaze tracking. Coordination skill and gaze tracking of this target group in tasks set in 3D-space remain largely unexplored. To quantitatively assess one's eye-hand coordination skill, we have designed a Virtual Reality-based automated gaze-sensitive tool that can help understand the linkage between their gaze and 3D performance. Results of a study with 10 pairs of age-matched children with Autism (GroupASD) and typically developing children (GroupTD) showed that GroupASD demonstrated reduced eye-hand coordination skill (increased tracing error in the vertical tracing task), accompanied by reduced fixation duration on task-relevant regions and an atypical gaze path, compared with GroupTD.

NeighboAR: Efficient Object Retrieval using Proximity- and Gaze-based Object Grouping with an AR System

Aleksandar Slavuljica
Kenan Bektas
Jannis Strecker
Simon Mayer

Humans only recognize a few items in a scene at once and memorize three to seven items in the short term. Such limitations can be mitigated using cognitive offloading (e.g., sticky notes, digital reminders). We studied whether a gaze-enabled Augmented Reality (AR) system could facilitate cognitive offloading and improve object retrieval performance. To this end, we developed NeighboAR, which detects objects in a user's surroundings and generates a graph that stores object proximity relationships and the user's gaze dwell times for each object. In a controlled experiment, we asked N=17 participants to inspect randomly distributed objects and later recall the position of a given target object. Our results show that displaying the target together with the proximity object with the longest user gaze dwell time helps users recall the position of the target. Specifically, NeighboAR significantly reduces the retrieval time by 33%, the number of errors by 71%, and perceived workload by 10%.

GazeIntent: Adapting Dwell-time Selection in VR Interaction with Real-time Intent Modeling

Anish S. Narkar
Jan J. Michalak
Candace E. Peacock
Brendan David-John

The use of ML models to predict a user's cognitive state from behavioral data has been studied for various applications, including predicting the intent to perform selections in VR. We developed a novel technique that uses gaze-based intent models to adapt dwell-time thresholds to aid gaze-only selection. A dataset of users performing selection in arithmetic tasks was used to develop intent prediction models (F1 = 0.94). We developed GazeIntent to adapt selection dwell times based on intent model outputs and conducted an end-user study with returning and new users performing additional tasks with varied selection frequencies. Personalized models for returning users effectively accounted for prior experience and were preferred by 63% of users. Our work provides the field with methods to adapt dwell-based selection to users, account for experience over time, and consider tasks that vary by selection frequency.

GazeSwitch: Automatic Eye-Head Mode Switching for Optimised Hands-Free Pointing

Baosheng James Hou
Joshua Newn
Ludwig Sidenmark
Anam Ahmad Khan
Hans Gellersen

This paper contributes GazeSwitch, an ML-based technique that optimises the real-time switching between eye and head modes for fast and precise hands-free pointing. GazeSwitch reduces false positives from natural head movements and efficiently detects head gestures for input, resulting in an effective hands-free and adaptive technique for interaction. We conducted two user studies to evaluate its performance and user experience. Comparative analyses with baseline switching techniques, Eye+Head Pinpointing (manual) and BimodalGaze (threshold-based), revealed several trade-offs. We found that GazeSwitch provides a natural and effortless experience but trades off control and stability compared to manual mode switching, and requires less head movement compared to BimodalGaze. This work demonstrates the effectiveness of a machine learning approach to learn and adapt to patterns in head movement, allowing us to better leverage the synergistic relation between eye and head input modalities for interaction in mixed and extended reality.

Impact of Design Decisions in Scanpath Modeling

Parvin Emami
Yue Jiang
Zixin Guo
Luis A. Leiva

Modeling visual saliency in graphical user interfaces (GUIs) helps us understand how people perceive GUI designs and what elements attract their attention. One aspect that is often overlooked is the fact that computational models depend on a series of design parameters that are not straightforward to decide. We systematically analyze how different design parameters affect scanpath evaluation metrics using a state-of-the-art computational model (DeepGaze++). We particularly focus on three design parameters: input image size, inhibition-of-return decay, and masking radius. We show that even small variations of these design parameters have a noticeable impact on standard evaluation metrics such as DTW or Eyenalysis. These effects also occur in other scanpath models, such as UMSS and ScanGAN, and in other datasets such as MASSVIS. Taken together, our results put forward the impact of design decisions for predicting users' viewing behavior on GUIs.
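
As an illustration of one of the metrics named above, the following is a minimal sketch of dynamic time warping (DTW) between two scanpaths, assuming each scanpath is a list of (x, y) fixation coordinates and that Euclidean distance is the local cost. The function name `dtw_distance` is hypothetical, and this is the textbook formulation, not the evaluation code used in the paper.

```python
import math


def dtw_distance(scanpath_a, scanpath_b):
    """Textbook DTW between two scanpaths given as lists of (x, y) fixations.

    Assumption: Euclidean distance between fixations is the local cost.
    """
    n, m = len(scanpath_a), len(scanpath_b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning the first i fixations
    # of scanpath_a with the first j fixations of scanpath_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(scanpath_a[i - 1], scanpath_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # repeat a fixation of scanpath_b
                cost[i][j - 1],      # repeat a fixation of scanpath_a
                cost[i - 1][j - 1],  # advance both scanpaths
            )
    return cost[n][m]
```

Because the alignment may repeat fixations, two scanpaths that visit the same locations in the same order score 0 even if they differ in length.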

Learning User Embeddings from Human Gaze for Personalised Saliency Prediction

Florian Strohm
Mihai Bâce
Andreas Bulling

Reusable embeddings of user behaviour have shown significant performance improvements for the personalised saliency prediction task. However, prior works require explicit user characteristics and preferences as input, which are often difficult to obtain. We present a novel method to extract user embeddings from pairs of natural images and corresponding saliency maps generated from a small amount of user-specific eye tracking data. At the core of our method is a Siamese convolutional neural encoder that learns the user embeddings by contrasting the image and personal saliency map pairs of different users. Evaluations on two public saliency datasets show that the generated embeddings have high discriminative power, are effective at refining universal saliency maps to the individual users, and generalise well across users and images. Finally, based on our model's ability to encode individual user characteristics, our work points towards other applications that can benefit from reusable embeddings of gaze behaviour.

On Task and in Sync: Examining the Relationship between Gaze Synchrony and Self-reported Attention During Video Lecture Learning

Babette Bühler
Efe Bozkir
Hannah Deininger
Peter Gerjets
Ulrich Trautwein
Enkelejda Kasneci

Successful learning depends on learners' ability to sustain attention, which is particularly challenging in online education due to limited teacher interaction. A potential indicator for attention is gaze synchrony, demonstrating predictive power for learning achievements in video-based learning in controlled experiments focusing on manipulating attention. This study (N=84) examines the relationship between gaze synchronization and self-reported attention of learners, using experience sampling, during realistic online video learning. Gaze synchrony was assessed through Kullback-Leibler Divergence of gaze density maps and MultiMatch algorithm scanpath comparisons. Results indicated significantly higher gaze synchronization in attentive participants for both measures and self-reported attention significantly predicted post-test scores. In contrast, synchrony measures did not correlate with learning outcomes. While supporting the hypothesis that attentive learners exhibit similar eye movements, the direct use of synchrony as an attention indicator poses challenges, requiring further research on the interplay of attention, gaze synchrony, and video content type.
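
To make the density-map measure more concrete, here is a minimal sketch of a Kullback-Leibler divergence between two gaze density maps. It assumes gaze samples are (x, y) screen coordinates, that a coarse grid histogram stands in for the density map, and that a small constant `eps` keeps every cell non-zero. The function names, grid size, and smoothing are illustrative assumptions, not the procedure used in the study.

```python
import math


def density_map(points, width, height, bins=4, eps=1e-9):
    """Bin (x, y) gaze samples into a bins x bins grid, normalized to sum to 1.

    Assumption: coordinates lie within [0, width] x [0, height].
    """
    grid = [eps] * (bins * bins)  # eps avoids log(0) and division by zero
    for x, y in points:
        col = min(int(x / width * bins), bins - 1)
        row = min(int(y / height * bins), bins - 1)
        grid[row * bins + col] += 1.0
    total = sum(grid)
    return [value / total for value in grid]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

A lower divergence means the two maps are more similar, so under this measure more synchronous gaze corresponds to smaller values.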

Privacy-preserving Scanpath Comparison for Pervasive Eye Tracking

Süleyman Özdel
Efe Bozkir
Enkelejda Kasneci

As eye tracking becomes pervasive with screen-based devices and head-mounted displays, privacy concerns regarding eye-tracking data have escalated. While state-of-the-art approaches for privacy-preserving eye tracking mostly involve differential privacy and empirical data manipulations, previous research has not focused on methods for scanpaths. We introduce a novel privacy-preserving scanpath comparison protocol designed for the widely used Needleman-Wunsch algorithm, a generalized version of the edit distance algorithm. Particularly, by incorporating the Paillier homomorphic encryption scheme, our protocol ensures that no private information is revealed. Furthermore, we introduce a random processing strategy and a multi-layered masking method to obfuscate the values while preserving the original order of encrypted editing operation costs. This minimizes communication overhead, requiring a single communication round for each iteration of the Needleman-Wunsch process. We demonstrate the efficiency and applicability of our protocol on three publicly available datasets with comprehensive computational performance analyses and make our source code publicly accessible.
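
For readers unfamiliar with the underlying algorithm, the following is a minimal plaintext sketch of Needleman-Wunsch scoring, assuming scanpaths are encoded as strings of area-of-interest labels. The scoring values (match 1, mismatch -1, gap -1) and the function name are illustrative assumptions. This sketch contains none of the encryption, masking, or communication steps of the proposed protocol.

```python
def needleman_wunsch_score(seq_a, seq_b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of two sequences (plaintext, no privacy layer)."""
    n, m = len(seq_a), len(seq_b)
    # score[i][j] = best score for aligning seq_a[:i] with seq_b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # seq_a prefix aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap  # seq_b prefix aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitution = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + substitution,  # align the two symbols
                score[i - 1][j] + gap,               # gap in seq_b
                score[i][j - 1] + gap,               # gap in seq_a
            )
    return score[n][m]
```

Each cell depends only on the maximum of three candidate costs, which is why a scheme that preserves the order of those costs is enough to carry out the comparison step.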

PrivatEyes: Appearance-based Gaze Estimation Using Federated Secure Multi-Party Computation

Mayar Elfares
Pascal Reisert
Zhiming Hu
Wenwu Tang
Ralf Küsters
Andreas Bulling

The latest gaze estimation methods require large-scale training data, but their collection and exchange pose significant privacy risks. We propose PrivatEyes - the first privacy-enhancing training approach for appearance-based gaze estimation based on federated learning (FL) and secure multi-party computation (MPC). PrivatEyes enables training gaze estimators on multiple local datasets across different users and server-based secure aggregation of the individual estimators' updates. PrivatEyes guarantees that individual gaze data remains private even if a majority of the aggregating servers is malicious. We also introduce a new data leakage attack, DualView, which shows that PrivatEyes limits the leakage of private training data more effectively than previous approaches. Evaluations on the MPIIGaze, MPIIFaceGaze, GazeCapture, and NVGaze datasets further show that the improved privacy does not lead to a lower gaze estimation accuracy or substantially higher computational costs - both of which are on par with its non-secure counterparts.

Real-time Prediction of Students' Math Difficulties using Raw Data from Eye Tracking and Neural Networks

Kathrin Kennel
Stefan Ruzika

Eye tracking technology in adaptive learning systems enhances diagnostic capabilities by providing valuable insights into cognitive processes. This information can be leveraged to identify and address difficulties. So far, there have been only a few attempts to realize this. Studies usually only address recognizing the correctness of answers, and the evaluation is complex and difficult to transfer due to features depending on Areas of Interest (AOIs). We close this gap and present a time-dynamic approach to identify specific difficulties based on raw gaze data. The eye tracking data of 139 students while solving a math problem serve as a sample. Difficulties that arose during the solution process are known. A temporal convolutional network (TCN) is trained to perform a multiclass classification on sequential data. On this basis we present an algorithm which simulates a dynamic classification in an adaptive real-time system. We evaluate this procedure, achieving an accuracy of almost 80%.

Remapping the Document Object Model using Geometric and Hierarchical Data Structures for Efficient Eye Control

Daniel Vella
Chris Porter

The Web Content Accessibility Guidelines (WCAG) are there to ensure that websites are perceivable, operable, understandable and robust across different user agents and assistive technologies. However, people who rely on eye trackers (ETs) may find that even WCAG-compliant websites are hard to access, and this is further accentuated by designs that offer little to no affordances for ET interaction. Areas with a high density of interactive elements, along with hierarchical navigation menus, such as megamenus or fly-out menus, are just two examples where ET interaction can be problematic. This paper introduces two novel interaction patterns as part of a purpose-built gaze-native web browser (Cactus), namely (a) Quadtree-based Target Selection with Secondary Confirmation and (b) Hierarchical Re-rendering of Navigation Menus. We present results from a between-subject single-blind study with 30 participants and report on metrics such as performance, perceived workload and usability, with demonstrable improvements over the state of the art.

RPG: Rotation Technique in VR Locomotion using Peripheral Gaze

Jaeyoon Lee
Hanseob Kim
Yechan Yang
Gerard Jounghyun Kim

Locomotion is an important task in many virtual reality (VR) applications. Locomotion requires two different directional settings for: (1) translation to a target, and (2) body rotation. Conventional locomotion methods mostly make use of the hand-held controller for such directional control. The controller-based directional setting may not be natural, as in real life, humans set directions with the gaze, often together with the head/body rotation. Note that the view direction is independently and naturally controlled by the head (or gaze) direction. In this paper, we propose to use the peripheral gaze for body rotation, seamlessly together with the foveal gaze for the view control, so that the two tasks practically do not interfere with one another. In the comparative study, the peripheral gaze showed similar task performance to the controller-based rotation but with reduced VR sickness and improved usability.

Shifting Focus with HCEye: Exploring the Dynamics of Visual Highlighting and Cognitive Load on User Attention and Saliency Prediction

Anwesha Das
Zekun Wu
Iza Skrjanec
Anna Maria Feit

Visual highlighting can guide user attention in complex interfaces. However, its effectiveness under limited attentional capacities is underexplored. This paper examines the joint impact of visual highlighting (permanent and dynamic) and dual-task-induced cognitive load on gaze behaviour. Our analysis, using eye-movement data from 27 participants viewing 150 unique webpages, reveals that while participants' ability to attend to UI elements decreases with increasing cognitive load, dynamic adaptations (i.e., highlighting) remain attention-grabbing. The presence of these factors significantly alters what people attend to and thus what is salient. Accordingly, we show that state-of-the-art saliency models increase their performance when accounting for different cognitive loads. Our empirical insights, along with our openly available dataset, enhance our understanding of attentional processes in UIs under varying cognitive (and perceptual) loads and open the door for new models that can predict user attention while multitasking.

Teaching Eye Tracking: Challenges and Perspectives

Michael Burch
Kuno Kurzhals

Eye tracking studies are more complicated to design, conduct, and evaluate than traditional studies solely based on performance measures like error rates and response times. This is typically due to the more complex hardware setup, the calibration procedures, and the spatio-temporal nature of the recorded data that must be analyzed, visualized, or statistically evaluated. As a benefit, eye movement data contains patterns of visual attention over space and time that are not observable in standard error rates, completion times, and qualitative feedback. Students in the fields of visualization, human-computer interaction, and user experience represent an interest group that would benefit from the application of eye tracking during their studies and in their future careers. Consequently, instructing them how to design, set up, conduct, and evaluate an eye tracking study is of special interest to current researchers involved in teaching. We describe education in eye tracking in five courses with 79 students from bachelor, master, and PhD levels. We outline our concept and discuss the challenges of raising people with no experience in eye tracking to a level of knowledge that allows them to apply this emerging technology to different scenarios including visual stimuli and related research questions. We discuss our teaching strategy in two course setups (summer school and traditional university lecture), the results of the students' eye tracking studies, and which challenges they and the teachers faced during the course.

Unveiling Deceptive Intentions: Insights from Eye Movements and Pupil Size

Valentin Foucher
Anke Huckauf

Recognition of deceptive intentions from the eyes has attracted considerable interest in recent decades but remains unresolved. Here, we report the development of a paradigm based on the Concealed Information Test enabling the study of various kinds of deception, that is, faking and concealing. Based on a card game, we compared fixation as well as pupil behaviour while participants were instructed to fake, conceal, or tell the truth. We implemented two different layouts of stimulus presentation. Fixations differed between concealing versus faking and telling the truth; pupil size additionally unveiled the object of deception. We infer that various kinds of deceptive behaviour must be carefully distinguished, and we contribute to the understanding of how to use gaze measures as indicators of deceptive intentions.

VisRecall++: Analysing and Predicting Visualisation Recallability from Gaze Behaviour

Yao Wang
Yue Jiang
Zhiming Hu
Constantin Ruhdorfer
Mihai Bâce
Andreas Bulling

Question answering has recently been proposed as a promising means to assess the recallability of information visualisations. However, prior works are yet to study the link between visually encoding a visualisation in memory and recall performance. To fill this gap, we propose VisRecall++ -- a novel 40-participant recallability dataset that contains gaze data on 200 visualisations and 1,000 questions, including identifying the title and retrieving values. We measured recallability by asking participants questions after they observed the visualisation for 10 seconds. Our analyses reveal several insights, such as that saccade amplitude, number of fixations, and fixation duration differ significantly between high and low recallability groups. Finally, we propose GazeRecallNet -- a novel computational method to predict recallability from gaze behaviour that outperforms the state-of-the-art model RecallNet and three other baselines on this task. Taken together, our results shed light on assessing recallability from gaze behaviour and inform future work on recallability-based visualisation optimisation.

What Eyeblinks Reveal: Interpersonal Synchronization in Dyadic Interaction

Mehtap Çakır
Anke Huckauf

This study examines eyeblink synchronization in interactions characterized by mutual gaze without task-related or conversational elements that can trigger similarities in visual, auditory, or cognitive processing. We developed a study design capable of isolating the role of gaze in human-human interaction and observed the blinking behavior of dyads with mobile eye tracking glasses under three conditions: face-to-face mutual gaze, mediated mutual gaze through a mirror, and self-directed gaze in a mirror. The results revealed that when the interaction was through direct mutual gaze, eyeblink synchronization increased concurrently with a more structured temporal pattern. Also, the sense of connection between partners mimicked the synchronization. These findings suggest that even minor deviations caused by mediated interaction lead to reduced synchronization and a weakened sense of connection among partners. The paper also discusses the need for methodologies to enhance the efficacy and authenticity of online environments and human-robot interaction.

Per-Subject Oculomotor Plant Mathematical Models and the Reliability of Their Parameters

Kateryna Melnyk
Lee Friedman
Dmytro Katrychuk
Oleg Komogortsev

The practical value of oculomotor plant mathematical models (OPMMs) has been demonstrated across various domains, including biometrics and eye movement prediction. To further enhance their utilization, new optimization approaches are commonly developed and introduced within the research community. In this study, we demonstrate a new integration of a previously developed per-subject optimization procedure for an Enderle OPMM and introduce methods to evaluate the reliability of OPMM parameters using the intraclass correlation coefficient (ICC) and Kendall's Coefficient of Concordance (KCC). We evaluated two per-subject OPMM models, Bahill and Enderle, using the 'GazeBase' eye movement dataset. The models differed in accuracy, expressed as the per-sample error in degrees of visual angle, based on the differences between the actual and predicted saccade trajectories. We found that some of the parameter estimates for both models were quite unreliable. We discuss the importance of addressing low-reliability issues and suggest methods to modify the models to enhance reliability and performance.

Deep Domain Adaptation: A Sim2Real Neural Approach for Improving Eye-Tracking Systems

Viet Dung Nguyen
Reynold Bailey
Gabriel J. Diaz
Chengyi Ma
Alexander Fix
Alexander Ororbia

Eye image segmentation is a critical step in eye tracking that has great influence over the final gaze estimate. Although segmentation models trained using supervised machine learning can excel at this task, their effectiveness is determined by the degree of overlap between the narrow distributions of image properties defined by the target dataset and highly specific training datasets, of which there are few. Attempts to broaden the distribution of existing eye image datasets through the inclusion of synthetic eye images have found that a model trained on synthetic images will often fail to generalize back to real-world eye images. As a remedy, we use dimensionality-reduction techniques to measure the overlap between the target eye images and synthetic training data, and to prune the training dataset in a manner that maximizes distribution overlap. We demonstrate that our methods result in robust, improved performance when tackling the discrepancy between simulation and real-world data samples.

Zero-Shot Segmentation of Eye Features Using the Segment Anything Model (SAM)

Virmarie Maquiling
Sean Anthony Byrne
Diederick C. Niehorster
Marcus Nyström
Enkelejda Kasneci

The advent of foundation models signals a new era in artificial intelligence. The Segment Anything Model (SAM) is the first foundation model for image segmentation. In this study, we evaluate SAM's ability to segment features from eye images recorded in virtual reality setups. The increasing requirement for annotated eye-image datasets presents a significant opportunity for SAM to redefine the landscape of data annotation in gaze estimation. Our investigation centers on SAM's zero-shot learning abilities and the effectiveness of prompts like bounding boxes or point clicks. Our results are consistent with studies in other domains, demonstrating that SAM's segmentation effectiveness can be on par with specialized models depending on the feature, with prompts improving its performance, evidenced by an IoU of 93.34% for pupil segmentation in one dataset. Foundation models like SAM could revolutionize gaze estimation by enabling quick and easy image segmentation, reducing reliance on specialized models and extensive manual annotation.
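
The IoU figure above refers to intersection over union, the standard overlap measure between a predicted mask and a ground-truth mask. A minimal sketch follows, assuming both masks are equally sized 2D lists of 0/1 values. The function name and the convention of returning 1.0 for two empty masks are illustrative assumptions, not details taken from the paper.

```python
def iou(pred_mask, true_mask):
    """Intersection over union of two binary masks (2D lists of 0/1)."""
    intersection = 0
    union = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            if p and t:
                intersection += 1  # pixel labelled in both masks
            if p or t:
                union += 1         # pixel labelled in at least one mask
    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0
```

An IoU of 1.0 means the masks coincide exactly, and 0.0 means they share no labelled pixel.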

Using Deep Learning to Increase Eye-Tracking Robustness, Accuracy, and Precision in Virtual Reality

Kevin Barkevich
Reynold Bailey
Gabriel J. Diaz

Algorithms for the estimation of gaze direction from mobile and video-based eye trackers typically involve tracking a feature of the eye that moves through the eye camera image in a way that covaries with the shifting gaze direction, such as the center or boundaries of the pupil. Tracking these features using traditional computer vision techniques can be difficult due to partial occlusion and environmental reflections. Although recent efforts to use machine learning (ML) for pupil tracking have demonstrated superior results when evaluated using standard measures of segmentation performance, little is known of how these networks may affect the quality of the final gaze estimate. This work provides an objective assessment of the impact of several contemporary ML-based methods for eye feature tracking when the subsequent gaze estimate is produced using either feature-based or model-based methods. Metrics include the accuracy and precision of the gaze estimate, as well as drop-out rate.

Learning Gaze-aware Compositional GAN from Limited Annotations

Nerea Aranjuelo
Siyu Huang
Ignacio Arganda-Carreras
Luis Unzueta
Oihana Otaegui
Hanspeter Pfister
Donglai Wei

Gaze-annotated facial data is crucial for training deep neural networks (DNNs) for gaze estimation. However, obtaining these data is labor-intensive and requires specialized equipment due to the challenge of accurately annotating the gaze direction of a subject. In this work, we present a generative framework to create annotated gaze data by leveraging the benefits of labeled and unlabeled data sources. We propose a Gaze-aware Compositional GAN that learns to generate annotated facial images from a limited labeled dataset. Then we transfer this model to an unlabeled data domain to take advantage of the diversity it provides. Experiments demonstrate our approach's effectiveness in generating within-domain image augmentations in the ETH-XGaze dataset and cross-domain augmentations in the CelebAMask-HQ dataset domain for gaze estimation DNN training. We also show additional applications of our work, which include facial image editing and gaze redirection.

Eye Strokes: An Eye-gaze Drawing System for Mandarin Characters

Derek C.W. Tong
Xuet Ying Tan
Alex Q. Chen

Studies showed that 61% of the elderly in Singapore are English illiterate, and it is essential to find alternatives for non-English literate patients who have literacy in the next most common language, Mandarin. Many eye typing solutions use Mandarin hanyu pinyin, a phonetic system similar to eye typing in the English language. However, most Mandarin-speaking elderly in Singapore are not familiar with Mandarin hanyu pinyin despite being able to read and write Mandarin characters, which makes eye typing redundant. We propose Eye Strokes, a technique to capture eye gaze, as an input modality to identify Mandarin characters for motor neurone disease patients to communicate. This method uses the eye gaze as strokes in a Mandarin character to predict and identify the Mandarin word the user intends to communicate. The proof-of-ideation evaluation discussed in the paper shows that our technique is feasible with promising character prediction accuracy for further investigations, although some limitations exist.

Short Papers

A Transformer-Based Model for the Prediction of Human Gaze Behavior on Videos

Süleyman Özdel
Yao Rong
Berat Mert Albaba
Yen-Ling Kuo
Xi Wang
Enkelejda Kasneci

Eye-tracking applications that utilize the human gaze in video understanding tasks have become increasingly important. To effectively automate the process of video analysis based on eye-tracking data, it is important to accurately replicate human gaze behavior. However, this task presents significant challenges due to the inherent complexity and ambiguity of human gaze patterns. In this work, we introduce a novel method for simulating human gaze behavior. Our approach uses a transformer-based reinforcement learning algorithm to train an agent that acts as a human observer, with the primary role of watching videos and simulating human gaze behavior. We employed an eye-tracking dataset gathered from videos generated by the VirtualHome simulator, with a primary focus on activity recognition. Our experimental results demonstrate the effectiveness of our gaze prediction method by highlighting its capability to replicate human gaze behavior and its applicability for downstream tasks where real human-gaze is used as input.

Are eye movements and covert shifts of attention functional for memory retrieval?

Ruhi Bhanap
Klaus Oberauer
Agnes Rosner

During memory retrieval, people tend to look back at locations where the information was previously presented, known as the Looking at Nothing effect (LAN). Scholz et al. [2018] reported that LAN can be functional for memory not only through eye movements (EM) but also by covert shifts of attention (CA). In our study, we aimed to replicate their findings in an associative recognition task. During encoding, participants were twice presented with three word-pairs at three locations. At retrieval, we manipulated EM or CA to a congruent, incongruent, or central location with a digit tracking task and simultaneously tested their memory with aural cues. We observed a trend with higher accuracy for the congruent over incongruent trials for both EM and CA; however, this trend was not statistically credible. We discuss the differences in studies and implications for testing the link between attention, eye movements and memory. Data and Analysis Scripts: https://osf.io/xf9sb/

Assessment of body-related attention processes via mobile eye tracking: A pilot study to validate an automated analysis pipeline

Nina A. Gehrer
Jennifer Svaldi
Andrew Duchowski

Body dissatisfaction and eating disorder symptoms have previously been linked to biased attentional processes when looking at one’s own body. The tendency is to focus on body parts self-defined to be most unattractive, compared to those self-defined to be most attractive. This attention bias is assumed to contribute to the maintenance of body dissatisfaction. Therefore, there is a need to measure and analyze attentional bias efficiently in studies on etiology and treatment of body dissatisfaction and eating disorders. We validate an automated analysis pipeline for mobile eye-tracking data while looking at one’s own body in a large mirror. The pipeline uses and extends a machine-learning based body pose estimation algorithm by segmenting identified body parts into subregions defined by provided texture coordinates. Results of a pilot study include a guided-viewing validation task, in which participants were asked to look at specific body regions, and a 3-minute free-viewing task.

Beyond Basic Tuning: Exploring Discrepancies in User and Setup Calibration for Gaze Estimation

Gonzalo Garde
José María Armendariz
Ruben Beruete Cerezo
Rafael Cabeza
Arantxa Villanueva

Calibrating gaze estimation models is crucial to maximize the effectiveness of these systems, although its implementation also poses challenges related to usability. Therefore, the simplification of this process is key. In this work, we dissect the impact of calibration due to both the environment and the user in gaze estimation models that employ general-purpose devices. We aim to replicate a workflow close to the final application by starting with pre-trained models and subsequently calibrating them using different strategies, testing under various camera arrangements and user-specific variability. The results indicate that the impact due to the user can be differentiated from the impact due to the setup, with the user-related components having a slightly more pronounced impact than the setup-related ones. This opens the door to understanding calibration as a composite process. In any case, the development of calibration-free remote gaze estimation solutions remains a great challenge, given the crucial role of calibration.

Communication breakdown: Gaze-based prediction of system error for AI-assisted robotic arm simulated in VR

Björn Rene Severitt
Patrizia Lenhart
Benedikt Werner Hosp
Nora Jane Castner
Siegfried Wahl

Neurological degenerative conditions can affect motor functions, making mobility daunting. Recent configurations of mobility devices that leverage artificial intelligence (AI) show its ability to handle complex information like user input. We create a virtual reality environment to measure participants’ reactions to correct and incorrect feedback from an AI-assistance system. Using gaze to evaluate these reactions, we investigate whether we can automatically predict an upcoming system error. Our results show that gaze reactions occur within 300 ms when the system highlights user input, but the delay extends to 1 second without highlighting. Subject-dependent gaze behavior made it difficult to develop a generalizable model based on previous work using TCNs for online recognition of upcoming errors. Therefore, models that adapt to individuals may be a better alternative for gaze-based accessibility systems.

CSA-CNN: A Contrastive Self-Attention Neural Network for Pupil Segmentation in Eye Gaze Tracking

Soumil Chugh
Juntao Ye
Yuqi Fu
Moshe Eizenman

This paper presents a novel Contrastive Self-Attention Convolutional Neural Network (CSA-CNN) model with an enhanced Difficulty-Aware (DA) loss function to improve the segmentation of pupils in eye images. The incorporation of transformer-style self-attention and Difficulty-Aware loss in a UNET-style architecture allows for robust feature representation and promotes shape alignment. The novel model was trained on two public databases (LPW and RIT-Eyes) and evaluated on two other public datasets (ExCuSe and ElSe). When compared with seven state-of-the-art pupil center detection methods, the CSA-CNN showed an improvement over the best performer among them of more than 6% in pupil center detection accuracy (detection within 5 pixels of the labeled center) and more than 9% in Intersection Over Union (IOU) accuracy. Furthermore, when the CSA-CNN model was integrated into a glint-based eye tracking system that uses learning-based methods to detect the pupil center, we saw a 25% improvement in gaze accuracy.
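The two evaluation measures named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names, the representation of masks as sets of pixel coordinates, and the choice to count a distance of exactly 5 pixels as a hit are all assumptions made for illustration.

```python
import math


def detection_rate(predicted, labeled, tolerance_px=5.0):
    """Fraction of predicted pupil centers within tolerance_px of the label.

    predicted and labeled are equal-length lists of (x, y) pixel coordinates.
    Treating a distance equal to the tolerance as a hit is an assumption.
    """
    if len(predicted) != len(labeled) or not labeled:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(
        1
        for (px, py), (lx, ly) in zip(predicted, labeled)
        if math.hypot(px - lx, py - ly) <= tolerance_px
    )
    return hits / len(labeled)


def intersection_over_union(mask_a, mask_b):
    """IoU of two segmentation masks given as sets of (row, col) pixels."""
    union = mask_a | mask_b
    if not union:
        return 1.0  # two empty masks are treated as identical (assumption)
    return len(mask_a & mask_b) / len(union)
```

For example, `detection_rate([(0, 0), (10, 10)], [(3, 4), (20, 20)])` counts the first prediction (5 pixels from its label) but not the second.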

Evaluation of Eye Tracking Signal Quality for Virtual Reality Applications: A Case Study in the Meta Quest Pro

Samantha Aziz
Dillon J Lohr
Lee Friedman
Oleg Komogortsev

We present an analysis of the eye tracking capabilities of the Meta Quest Pro virtual reality headset using a dataset of eye movement recordings collected from 78 participants. We highlight the potential differences in user experience as a function of device performance using a novel, user-centered evaluation framework for eye tracking signal quality analysis. In addition to presenting classical signal quality metrics such as spatial accuracy and spatial precision, we also explore how spatial accuracy varies across the field of view for different users across the performance range of the device. This work contributes to a growing understanding of eye tracking signal quality in virtual reality platforms, where the usability of gaze-based applications is directly dependent on the quality of the device’s eye tracking signal.

Event-based eye tracking for smart eyewear

Simone Mentasti
Francesco Lattari
Riccardo Santambrogio
Gianmario Careddu
Matteo Matteucci

This paper presents an innovative approach to gaze tracking in the context of smart eyewear, utilizing a fully event-based algorithm. Traditional gaze-tracking methods often rely on grayscale or infrared imaging, which can be computationally intensive and raise privacy concerns. Our research addresses these issues by developing an algorithm that exclusively uses data from event-based sensors, optimizing for the limited computational capabilities of smart eyewear. The system uses simple geometrical operations, enabling efficient real-time processing. Experimental results demonstrate the feasibility of this approach, offering a promising solution for gaze tracking in compact, computationally constrained devices. Despite certain limitations in accuracy due to optimization for efficiency, the research underscores the suitability of this approach for practical, privacy-conscious applications in smart eyewear technology.

Examining the Utility of Blink Rate as a Proxy for Cognitive Load in Flight Simulation

Naila Ayala
Claudia Alexandra Martin Calderon
Nikhil Sharma
Elizabeth Irving
Shi Cao
Suzanne Kearns
Ewa Niechwiej-Szwedo

Blink rate has been suggested to be a proxy for cognitive load. However, there are mixed results regarding this association, which is modulated by the visual demand of a task or when a dual-task paradigm is performed. This study investigated blink rate as a measure of cognitive load using a dual-task paradigm (i.e., primary task = flying an aircraft; secondary task = auditory oddball task) to specifically assess changes in cognitive load across two flight phases (i.e., cruise, landing). Primary task performance suggested participants focused on maintaining aircraft control across task conditions, but a significant dual-task difference score in auditory oddball task performance (p<0.001) was still evident across flight phases. Blink rate did not show similar differences across flight phases. Our results demonstrate that the utility of blink rate as a measure of cognitive load may be limited in more complex naturalistic scenarios when using an auditory dual-task.

Eye Movement in a Controlled Dialogue Setting

David Dembinsky
Ko Watanabe
Andreas Dengel
Shoya Ishimaru

Designing realistic eye movements for animated avatars poses a challenge, as gaze behavior is predominantly unconscious. Accurately modulating those movements is crucial to avoid the Uncanny Valley. The human gaze exhibits different characteristics in conversations, depending on whether a person is speaking or listening. Although these distinctions are known, data for synthesizing eye movement models suitable for avatars is scarce. This research introduces a novel dataset involving human gaze behavior during remote screen conversations. The data were collected from 19 participants, offering 4 hours of gaze data labeled as Speaking and Listening. Our data analysis substantiates prior knowledge of gaze behavior while providing new insights through higher precision. Furthermore, we demonstrate the dataset’s suitability for machine learning algorithms by training a classifier, achieving 88.1% binary classification accuracy.

Eyes on the Narrative: Exploring the Impact of Visual Realism and Audio Presentation on Gaze Behavior in AR Storytelling

Florian Weidner
Jakob Hartbrich
Stephanie Arevalo Arboleda
Christian Kunert
Christian Schneiderwind
Chenyao Diao
Christoph Gerhardt
Tatiana Surdu
Wolfgang Broll
Stephan Werner
Alexander Raake

Augmented Reality (AR) and Virtual Reality (VR) are essential tools for researchers and practitioners, serving purposes from training to entertainment: many of these applications rely on agents. This study explores the impact of agent characteristics on user reactions, focusing on gaze as a primary visual attention indicator in AR and VR. While existing research has investigated the agent’s gaze and its influence on the user, it is unclear how the agent’s auralization and visualization influence gaze behaviour. We investigate this by studying the impact of rendering style and type of audio on gaze behaviour during a narrative AR experience. Participants listened to a story with the agent visualized as a cartoon-style or realistic virtual human and auralized with spatial or non-spatial audio. The results revealed that the agent’s rendering style significantly influenced gaze behaviour, with cartoon-style agents capturing more visual attention. Audio variations did not yield significant differences. Together, our findings inform the design of AR user interfaces with agents, suggesting that low-realism visualizations are more captivating and, thus, more suitable for experiences where the user is supposed to look at the storyteller.

Gaze-Guided Graph Neural Network for Action Anticipation Conditioned on Intention

Süleyman Özdel
Yao Rong
Berat Mert Albaba
Yen-Ling Kuo
Xi Wang
Enkelejda Kasneci

Humans utilize their gaze to concentrate on essential information while perceiving and interpreting intentions in videos. Incorporating human gaze into computational algorithms can significantly enhance model performance in video understanding tasks. In this work, we address a challenging and innovative task in video understanding: predicting the actions of an agent in a video based on a partial video. We introduce the Gaze-guided Action Anticipation algorithm, which establishes a visual-semantic graph from the video input. Our method utilizes a Graph Neural Network to recognize the agent’s intention and predict the action sequence to fulfill this intention. To assess the efficiency of our approach, we collect a dataset containing household activities generated in the VirtualHome environment, accompanied by human gaze data recorded while viewing the videos. Our method outperforms state-of-the-art techniques, achieving a 7% improvement in accuracy for 18-class intention recognition. This highlights the efficiency of our method in learning important features from human gaze data.

How Deep Is Your Gaze? Leveraging Distance in Image-Based Gaze Analysis

Maurice Koch
Nelusa Pathmanathan
Daniel Weiskopf
Kuno Kurzhals

Image thumbnails are a valuable data source for fixation filtering, scanpath classification, and visualization of eye tracking data. They are typically extracted with a constant size, approximating the foveated area. As a consequence, the focused area of interest in the scene becomes less prominent in the thumbnail with increasing distance, affecting image-based analysis techniques. In this work, we propose depth-adaptive thumbnails, a method for varying image size according to the eye-to-object distance. Adjusting the visual angle relative to the distance leads to a zoom effect on the focused area. We evaluate our approach on recordings in augmented reality, investigating the similarity of thumbnails and scanpaths. Our quantitative findings suggest that considering the eye-to-object distance improves the quality of data analysis and visualization. We demonstrate the utility of depth-adaptive thumbnails for applications in scanpath comparison and visualization.
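The idea of shrinking the thumbnail as the eye-to-object distance grows can be sketched as follows. This is only one plausible reading of the abstract, not the authors' method: it assumes the thumbnail should cover a fixed physical extent around the gaze point, and the names `extent_m` and `pixels_per_degree` are hypothetical parameters introduced for illustration.

```python
import math


def thumbnail_size_px(distance_m, extent_m, pixels_per_degree):
    """Side length in pixels of a thumbnail covering extent_m at distance_m.

    Assumption: the thumbnail should show a fixed physical extent of the
    scene, so the visual angle it subtends shrinks as distance increases.
    """
    if distance_m <= 0 or extent_m <= 0 or pixels_per_degree <= 0:
        raise ValueError("all arguments must be positive")
    # Visual angle subtended by the extent, centred on the gaze point.
    visual_angle_deg = math.degrees(2.0 * math.atan(extent_m / (2.0 * distance_m)))
    return max(1, round(visual_angle_deg * pixels_per_degree))
```

Under this assumption, a crop for a distant object is smaller than one for a near object, and scaling both crops to the same display size produces the zoom effect described in the abstract.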

Keep Your Eyes on the Target: Enhancing Immersion and Usability by Designing Natural Object Throwing with Gaze-based Targeting

Jaeyoon Lee
Hanseob Kim
Gerard Jounghyun Kim

While controllers can support many generic 3D interactions in virtual reality (VR), they alone may fall short of eliciting the core experience for certain actions. One such task is “object throwing”, ubiquitous in many sports applications, which involves intricately timed actions of aiming, arm swinging, and object releasing. With the increasing availability of eye tracking, we propose to combine gaze-based targeting with the controller swing gesture to model the object throw. The target is aimed at and locked by gaze, and the throw is enacted by the controller swing with a button press, with the object let go on button release. We compare and evaluate the proposed interface against the conventional controller-only interaction through two typical baseball tasks – Pitcher and Outfielder. The findings indicated that the task accuracy was similar, but the gaze-based targeting allowed for faster completion. More importantly, the gaze-based method showed significantly higher usability and a richer VR experience.

Measuring Motor Task Difficulty using Low/High Index of Pupillary Activity

Ziang Wu
Xianta Jiang
Jingjing Zheng
Bin Zheng
Stella Atkins

Traditional baseline-subtraction methods for calculating pupillary response have shortcomings in assessing workloads. Enhanced algorithms based on pupil oscillation and utilizing wavelet analysis, such as the Index of Cognitive Activity (ICA), the Index of Pupillary Activity (IPA), and its enhanced version, the Low/High Index of Pupillary Activity (LHIPA), have been developed to measure cognitive loads in scenarios where the gaze is relatively static. The applicability of these algorithms in motor tasks, where eye-hand coordination plays a significant role, has not been sufficiently studied. This research explores the effectiveness of LHIPA in measuring task workloads of users in a simulated surgical task. The task’s complexity was determined using Fitts’ index of difficulty, which varied by target size and distance. We found that LHIPA effectively differentiates task workloads across various levels of task difficulty in the surgical task. Results are crucial for creating objective methods to assess task workloads using pupil-based metrics during goal-oriented movements.
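As a reminder of how Fitts' index of difficulty depends on target size and distance, here is a minimal Python sketch. The abstract does not say which formulation the authors used, so both the original form, log2(2D/W), and the Shannon form, log2(D/W + 1), are shown; the function name and the `shannon` flag are illustrative choices.

```python
import math


def fitts_index_of_difficulty(distance, width, shannon=False):
    """Index of difficulty in bits for a target of `width` at `distance`.

    distance and width must share the same unit. The original formulation
    is log2(2D / W); the Shannon formulation is log2(D / W + 1).
    """
    if distance <= 0 or width <= 0:
        raise ValueError("distance and width must be positive")
    if shannon:
        return math.log2(distance / width + 1.0)
    return math.log2(2.0 * distance / width)
```

Halving the target width or doubling the distance raises the original index by exactly one bit, which is why varying size and distance yields graded difficulty levels.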

My Eyes Don't Consent! Exploring Visual Attention in Cookie Consent Interfaces

Yavuz Inal
Frode S. Volden
Camilla Carlsen
Sarah Hjelmtveit

Each cookie consent interface has a different design variant, featuring various levels of information detail, which makes the interaction highly challenging. This study aimed to explore how users' visual attention differs across different variants of cookie consent interfaces. To conduct the study, we designed an experiment testing three cookie consent variants, representing good, suboptimal, and bad design practices. Study results showed that none of the participants read the text on the interfaces. Most participants admitted skipping over the text about the cookie notice and did not carefully consider the options related to cookie consent. Notably, the bad design variant proved to be the most challenging for participants to make decisions, a statistically significant effect evidenced by the highest total fixation duration and number of fixations. However, participants showed a longer average fixation duration when the interface included both poor and decent design practices. Study results highlight the impact of interface design, banner location, content, and options presented on visual attention, indicating the necessity of establishing design guidelines to help users navigate cookie consent interfaces easily.

Open Iris - An Open Source Framework for Video-Based Eye-Tracking Research and Development

Roksana Sadeghi
Ryan Ressmeyer
Jacob Yates
Jorge Otero-Millan

Eye-tracking is an essential tool in many fields, yet existing solutions are often limited for customized applications due to cost or lack of flexibility. We present OpenIris, an adaptable and user-friendly open-source framework for video-based eye-tracking. OpenIris is developed in C# with a modular design that allows further extension and customization through plugins for different hardware systems, tracking, and calibration pipelines. It can be remotely controlled via a network interface from other devices or programs. Eye movements can be recorded online from a camera stream or offline by post-processing recorded videos. Example plugins have been developed to track eye motion in 3-D, including torsion. Currently implemented binocular pupil tracking pipelines can achieve frame rates of more than 500 Hz. With the OpenIris framework, we aim to fill a gap in the research tools available for high-precision and high-speed eye-tracking, especially in environments that require custom solutions that are not currently well-served by commercial eye-trackers.

Saccade Characteristics during Fusional Vergence Tests as a Function of Vergence Demand

Cristina Rovira-Gay
Clara Mestre
Marc Argilés
Jaume Pujol

This study investigated saccadic behaviour during the fusional vergence test as a function of disparity vergence demand. Participants’ divergence and convergence fusional vergence ranges were measured in a haploscopic setup. Disparity of a column of letters increased to 45 prism dioptres (PD) at 1 PD/s for both divergence and convergence. Eye movements were recorded using an Eyelink 1000 Plus, and a velocity-threshold-based algorithm was used to detect saccades. Saccades followed the main sequence, and their characteristics varied significantly with the direction of the underlying vergence movement. Saccade amplitude was smaller while fusion was maintained than during diplopia. Participants experienced diplopia for a longer time during divergence evaluation, which resulted in a higher prevalence of horizontal saccades than during convergence. This study highlights the impact of binocular conditions on saccade characteristics during fusional vergence tests, thus contributing to the understanding of how the visual system adapts to different vergence demands.
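A velocity-threshold saccade detector of the general kind mentioned in this abstract can be sketched in a few lines of Python. This is a generic illustration rather than the authors' algorithm: it works on a one-dimensional (horizontal) position trace, and the 30 deg/s default threshold is an assumed value, not one reported in the abstract.

```python
def detect_saccades(positions_deg, sampling_rate_hz, velocity_threshold=30.0):
    """Return (start, end) sample indices of runs faster than the threshold.

    positions_deg is a 1-D gaze position trace in degrees. The default
    threshold of 30 deg/s is an assumption for illustration only.
    """
    if sampling_rate_hz <= 0:
        raise ValueError("sampling rate must be positive")
    dt = 1.0 / sampling_rate_hz
    # Speed between each pair of consecutive samples, in degrees per second.
    speeds = [abs(b - a) / dt for a, b in zip(positions_deg, positions_deg[1:])]
    saccades = []
    start = None
    for i, speed in enumerate(speeds):
        if speed > velocity_threshold and start is None:
            start = i  # first sample of a fast run
        elif speed <= velocity_threshold and start is not None:
            saccades.append((start, i))  # sample i is where the run ends
            start = None
    if start is not None:
        saccades.append((start, len(speeds)))  # run lasts to the final sample
    return saccades
```

Practical detectors usually also smooth the velocity signal and discard runs that are too short, which this sketch leaves out.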

Saliency3D: A 3D Saliency Dataset Collected on Screen

Yao Wang
Qi Dai
Mihai Bâce
Karsten Klein
Andreas Bulling

While visual saliency has recently been studied in 3D, the experimental setup for collecting 3D saliency data can be expensive and cumbersome. To address this challenge, we propose a novel experimental design that utilises an eye tracker on a screen to collect 3D saliency data, which could reduce the cost and complexity of data collection. We first collected gaze data on a computer screen and then mapped the 2D points to 3D saliency data through perspective transformation. Using this method, we propose Saliency3D, a 3D saliency dataset (49,276 fixations) comprising 10 participants looking at sixteen objects. We examined the viewing preferences for objects and our results indicate potential preferred viewing directions and a correlation between salient features and the variation in viewing directions.

Signal vs Noise in Eye-tracking Data: Biometric Implications and Identity Information Across Frequencies

Mehedi Hasan Raju
Lee Friedman
Dillon J Lohr
Oleg Komogortsev

Prior research states that frequencies below 75 Hz in eye-tracking data represent the primary eye movement termed “signal” while those above 75 Hz are deemed “noise”. This study examines the biometric significance of this signal-noise distinction and its privacy implications. There are important individual differences in a person’s eye movement, which lead to reliable biometric performance in the “signal” part. Despite minimal eye-movement information in the “noise” recordings, there might be significant individual differences. Our results confirm the “signal” predominantly contains identity-specific information, yet the “noise” also possesses unexpected identity-specific data. This consistency holds for both short-term (≈ 20 min) and long-term (≈ 1 year) biometric evaluations. Understanding the location of identity data within the eye movement spectrum is essential for privacy preservation.

Swap It Like Its Hot: Segmentation-based spoof attacks on eye-tracking images

Anish S Narkar
Brendan David-John

Video-based eye trackers capture the iris biometric and enable authentication to secure user identity. However, biometric authentication is susceptible to spoofing another user’s identity through physical or digital manipulation. The current standard to identify physical spoofing attacks on eye-tracking sensors uses liveness detection. Liveness detection classifies gaze data as real or fake, which is sufficient to detect physical presentation attacks. However, such defenses cannot detect a spoofing attack when real eye image inputs are digitally manipulated to swap in the iris pattern of another person. We propose IrisSwap as a novel attack on gaze-based liveness detection. IrisSwap allows attackers to segment and digitally swap in a victim’s iris pattern to fool iris authentication. Both offline and online attacks produce gaze data that deceives the current state-of-the-art defense models at rates up to 58%, motivating the need to develop more advanced authentication methods for eye trackers.

The Eyes Have It: Exploring the Connection Between Domain-Knowledge and Perception

Sonali Aatrai
Sandhya Gayatri Prabhala
Saurabh Sharma
Rajlakshmi Guha

Leveraging prior domain-knowledge significantly impacts perception, and this study explores whether prior domain-knowledge has any relation to eye movements. We explore oculometrics of 95 graduate students from Architecture, Social Science, and Biology backgrounds, categorized by their domain-knowledge levels. Students were given unbiased (neutral) and domain-biased (architecture and biology) tasks. Oculometrics of the students were captured during the tasks and were analyzed to determine if prior domain-knowledge affected eye movements. Results showed no significant difference between the three groups in the unbiased tasks. However, significant differences emerged in biased tasks between groups with and without domain-knowledge. Signature markers, such as peak saccadic velocity and mean pupil diameter, along with time-correlated parameters of total scanning duration, total fixation duration, total saccadic duration, and fixation count, were found to differentiate novices from those with prior domain-knowledge. The study highlights the importance of oculometrics for distinguishing between novices and those with prior domain-knowledge.

Using Gaze Transition Entropy to Detect Classroom Discourse in a Virtual Reality Classroom

Philipp Stark
Alexander J. Jung
Jens-Uwe Hahn
Enkelejda Kasneci
Richard Göllner

This paper explores gaze entropy as a metric for detecting classroom discourse events in a virtual reality (VR) classroom. Using data from a laboratory experiment with N = 240 secondary school students, we distinguished between events of teacher-centered classroom discourse (question, hand raising, answer) and teacher explanation by analyzing their transition and stationary gaze entropy. Employing multi-level regression models, both entropy measures effectively discriminated between the two events and distinguished different levels of classroom participation as indicated by the degree of hand-raising by virtual students. Furthermore, using both measures in a logistic regression model, the potential of gaze entropy could be demonstrated by predicting the two events with 67% accuracy. By analyzing transition and stationary entropy, the study attempts to uncover different gaze patterns associated with learning events in a virtual classroom. The results contribute to the research and development of VR scenarios that help to simulate effective learning environments.
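The two entropy measures named in this abstract can be sketched for a sequence of area-of-interest (AOI) labels as follows. This is a generic illustration, not the authors' pipeline: it assumes the common first-order Markov formulation, and it estimates the stationary distribution from the empirical share of transitions leaving each AOI, which is a simplifying assumption.

```python
import math
from collections import Counter


def gaze_entropies(aoi_sequence):
    """Return (stationary_entropy, transition_entropy) in bits.

    aoi_sequence is a list of AOI labels, one per fixation. The stationary
    distribution is approximated by the share of transitions that start in
    each AOI (an assumption made for this sketch).
    """
    transitions = list(zip(aoi_sequence, aoi_sequence[1:]))
    if not transitions:
        raise ValueError("need at least two fixations")
    source_counts = Counter(src for src, _ in transitions)
    pair_counts = Counter(transitions)
    total = len(transitions)

    stationary = 0.0
    transition = 0.0
    for src, n_src in source_counts.items():
        pi = n_src / total  # estimated probability of being in AOI src
        stationary -= pi * math.log2(pi)
        for (s, _dst), n_pair in pair_counts.items():
            if s == src:
                p = n_pair / n_src  # probability of moving src -> dst
                transition -= pi * p * math.log2(p)
    return stationary, transition
```

A strictly alternating sequence such as `["A", "B", "A", "B", "A"]` spreads attention evenly over two AOIs (stationary entropy of 1 bit) while being fully predictable (transition entropy of 0 bits).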

Late-Breaking Works

A Brief Introduction to ISCET - The International Society for Clinical Eye Tracking

Rasha Sameer Moustafa
Siyuan Chen
Minoru Nakayama
Frederick Shic
Matt J Dunn

The International Society for Clinical Eye Tracking (ISCET) serves as a global platform for promoting international consensus on open standards on clinical eye tracking. Originally formed in March 2023, ISCET was created to facilitate collaboration, knowledge exchange, and advancements in clinical eye tracking applications, with the ultimate goal of fostering interdisciplinary research and improving clinical outcomes. Through collaborative and interdisciplinary efforts, ISCET’s current mission is to provide guidance on conducting eye-tracking tasks in clinical settings, unify clinicians’ voices, and maintain reference datasets for normative comparisons. ISCET now has 80 members, spread across the globe. In the Europe/Africa region meeting on January 24, 2024, ISCET established subcommittees to address specific needs. This paper outlines the rationale behind ISCET’s formation, its mission, objectives, ongoing initiatives, and organizational structure. Ongoing work focuses on surveying current international clinical eye tracker usage to inform standards development.

A pilot study on gaze and mouse data for user identification

Nicole Dalia Cilia
Andrea Pietro Arena
Antonino Barbera
Emanuele Borrello
Giuseppe Capizzi
Salvatore Cappello
Luigi Pio Faletra
Stefano Incardone
Salvatore Sorce

The human gaze is increasingly used for different applications in interactive systems. It is used for explicit and implicit user input detection and is suitable for various purposes, such as activation commands, emotion detection, or user identification.

However, its reliability may not be sufficient in some contexts or for some particular applications. In these cases, it is often integrated with other input sources to improve detection quality.

This paper presents a protocol for gaze and mouse data acquisition, along with a possible usage for user identification.

In particular, we show the experimental setup, the tasks the participants were asked to accomplish, and the data we acquired. We also propose a possible use case for these data and evaluate the results. We conclude by discussing how the data acquired with our protocol can be helpful in different applications and contexts, along with a plan for further development.

An Evaluation of Gaze-Based Person Identification with Different Stimuli

Wolfgang Fuhl
Dennis Grüneberg
Abdullah Yalvac

In eye tracking, the most promising approaches for person identification are based on the iris. To this end, the iris of a subject is extracted and compared against a database of stored iris templates or directly classified by a deep neural network. While this approach is robust and has found its way into many practical applications like security access devices, it has some limitations, since it is possible to fake the iris. This can be done either by presenting a rendered image or by using a contact lens. Our research focuses on the gaze behavior of a subject for person identification. For this purpose, we present different stimuli as well as moving dots. In past research, only a static dot has been used so far. We show that the combination of different stimuli in particular is a promising approach and could be used as an additional security layer in combination with iris recognition.

An Eyelid Simulator for Zero Shot Eyelid Segmentation

Wolfgang Fuhl

Eyelid detection is an important feature for fatigue estimation, for example in autonomous driving. It allows for the estimation of driver readiness if a takeover event occurs, and is therefore crucial for the safety of car passengers and traffic participants. We propose to use two splines and two randomly filled regions as a simple eyelid simulator for deep neural network training. This allows fast generation of perfectly labeled training images and needs only a fraction of the computational resources of rendering-based approaches. We compared our approach against training on real-world images as well as other rendering-based approaches in terms of accuracy and resource consumption. In addition, we evaluated different combinations of the rendering-based approaches, real-world images, and our approach.

AVAtt: Art Visual Attention dataset for diverse painting styles

Mohamed Amine Kerkouri
Marouane Tliba
Aladine Chetouani
Alessandro Bruno

Preserving cultural heritage is paramount for societal and historical identity. Paintings, spanning ancient to modern eras, are pivotal subjects under constant scrutiny. As art reflects human creativity, studying human visual behavior toward paintings becomes increasingly vital. Thus, we introduce the AVAtt dataset, providing eye movement data for a diverse collection of painting styles across various ages and geographical origins. This dataset aims to facilitate the development and evaluation of computational saliency and scanpath prediction methods in the unique domain of painting (dataset available on GitHub).

Car driving temporal cognitive workload estimation using features of eye tracking

Minoru Nakayama
Qian (Chayn) Sun
Jianhong (Cecilia) Xia

The mental workload of older motorists driving various routes on public roads in their own cars was evaluated using estimations of drivers’ attention levels based on saccade rates and pupillary changes. Levels of attention while driving were calculated using a Bayesian modelling technique, which was employed because of the large amount of data measured over the lengthy road course. Mean attention levels for five categories of driving routes were summarised, and routes with turns required additional attention resources. Individual attention levels and confidence intervals correlate with some ratings of NASA-TLX factors, and with the factor for Frustration in particular.

Enhancing Cognitive Assessment: Integrating Hand and Eye Tracking in the Digital Trail-Making Test for Mild Cognitive Impairment

Gustavo Juantorena
Waleska Berrios
María Cecilia Fernández
Agustin Ibanez
Agustín Petroni
Juan E. Kamienkowski

We introduce a novel application of a computerized version of the Trail Making Test (c-TMT), designed to track hand and eye movements during multiple trials to tackle the main limitations of the classic paper-and-pencil version. Adults with mild cognitive impairment (MCI) and a matched control group were studied to obtain fine-grained metrics during the task. Statistically significant differences between groups were found in the scanpath length, while more traditional measures, such as time to complete trials, were not discriminative. We advocate for using new complementary measurements and digital tools in neuropsychology to improve the precision of neuropsychological tests.

Enhancing User Gaze Prediction in Monitoring Tasks: The Role of Visual Highlights

Zekun Wu
Anna Maria Feit

This study examines the role of visual highlights in guiding user attention in drone monitoring tasks, employing a simulated interface for observation. The experimental results show that such highlights can significantly expedite the shift of visual attention to the corresponding area. Based on this observation, we leverage both the temporal and spatial information in the highlight to develop a new saliency model, the highlight-informed saliency model (HISM), to infer the change in visual attention in the highlight condition. Our findings show the effectiveness of visual highlights in enhancing user attention and demonstrate the potential of incorporating these cues into saliency prediction models.

Exploring Eye Tracking as a Measure for Cognitive Load Detection in VR Locomotion

Hong Gao
Enkelejda Kasneci

Eye tracking data has long been recognized as a reliable indicator of user cognitive load levels during human-computer interaction (HCI) tasks. However, its potential in the context of virtual reality (VR) remains relatively unexplored. Here, we present an ongoing study aimed at investigating the feasibility of detecting cognitive load in VR, particularly during VR locomotion, using an eye-tracking-based machine-learning approach. Data were collected using a within-subjects design, with participants performing VR locomotion tasks using five locomotion techniques. Our preliminary analyses validate the effectiveness of leveraging eye-tracking data as informative features in uncovering cognitive load in VR locomotion contexts, which motivates our further explorations.

Eye movements during the performance of the neuropsychological Trail Making Test

Joan Goset
Flors Vinuela-Navarro
Clara Mestre
Mikel Aldaba
Meritxell Vilaseca

The Trail Making Test (TMT) is a neuropsychological test to assess cognitive impairment. Although it is widely used in clinical practice, it presents some limitations such as the lack of timeliness. The oculomotor system is altered with cognitive impairment; thus, the objective of this work was to study whether the analysis of eye movements during the test can provide further information for an earlier diagnosis. Five healthy participants were recruited for the study and performed the TMT while their eye movements were recorded. The results showed relations between saccadic and fixational parameters and time to completion, suggesting the potential of eye tracking technology to complement the test and potentially overcome its limitations.

Eye tracking data set of academics making an omelette: An egg-breaking work

Yannick Sauer
Rajat Agarwala
Patrizia Lenhart
Regine Lendway
Björn Severitt
Alexander Neugebauer
Benedikt Werner Hosp
Nora Jane Castner
Siegfried Wahl

Just as there are numerous ways to cook an egg, there are numerous ways to recreate a YouTube video of cooking an omelette. We created a dataset of 10 academics replicating a viral video of making an omelette. We evaluated the saccade behavior during the varying subtasks and found differences related to the actions (whisking, sprinkling, etc.) and the objects (eggs, butter, plate, etc.). This dataset can further offer insight into eye movements in complex tasks and is potentially even applicable for task planning and intention prediction. The available data can be found at https://zenodo.org/doi/10.5281/zenodo.10875267.

Gaze Decoding with Sensory and Motor Cortical Activity

Kendra K Noneman
J. Patrick Mayo

Eye movements require neuronal processing of visual stimuli and transmission of motor commands to the eye muscles. The brain circuitry and cortical regions responsible for sensory-motor transformations are generally understood, but deciphering the specific neuronal patterns underlying particular eye movements remains challenging. We used high-resolution eye tracking, invasive neuronal recordings, and machine learning to reveal, for the first time, the information encoded within neuronal populations in sensory and motor cortical areas that governs gaze behavior. We achieved high prediction accuracy when decoding continuous eye position using spiking activity. Decoders trained with only sensory or motor activity showed roughly equal performance, and we explored other methods of combining activity into a single model to enhance performance. Our results estimate the upper bound on decoding accuracy using brain recordings, shed light on sensory and motor cortical signals for gaze decoding, and offer insights into harnessing spatially broader brain signals for brain-computer interfaces.

Gaze-based Assessment of Expertise in Chess

Wolfgang Fuhl
Gazmend Hyseni

Chess is known worldwide and has acquired a deep cultural significance. It is recognized as a sport by the International Olympic Committee and has around 600 million players worldwide. Since chess is a strategic game, it requires estimating the future actions of the opposing player and integrating these estimates into one's own planning. This makes it a perfect sport to study expertise classification based on eye tracking data, since the players sit still and only have visual information about the actions of the opposing player. We conducted a small study and investigated different features to estimate the outcome of a chess game. The level of expertise of the players was determined by the number of years they had played chess already. We found that transfer learning seems to be a promising approach since it can be trained on larger amounts of data without annotations.

Impact of reward expectation on pupillary change during an adaptive two player card game

Ryo Yasuda
Minoru Nakayama

Change of reward expectation during a competitive card game was evaluated using pupil responses. In order to simulate interactive play of participants in a poker game, the game process was fully controlled by the computer player. As a result, pupil response based on reward expectation was examined. In addition, participants’ pupils dilate in response to the change of reward expectation when the computer player provides some expectation.

Attempts on detecting Alzheimer's disease by fine-tuning pre-trained model with Gaze Data

Junichi Nagasawa
Yuichi Nakata
Mamoru Hiroe
Yujia Zheng
Yutaka Kawaguchi
Yuji Maegawa
Naoki Hojo
Tetsuya Takiguchi
Minoru Nakayama
Maki Uchimura
Yuma Sonoda
Hisatomo Kowa
Takashi Nagamatsu

This poster presents a study on detecting Alzheimer’s disease (AD) using deep learning from gaze data. In this study, we modify an existing pre-trained deep neural network model, gazeNet, for transfer learning. The results suggest the possibility of applying this method to mild cognitive impairment screening tests.

Perceptual Evaluation of Masked AutoEncoder Emergent Properties Through Eye-Tracking-Based Policy

Marouane Tliba
Mohamed Amine Kerkouri
Aladine Chetouani
Alessandro Bruno
Mohammed El Hassouni
Arzu Çöltekin

The advancement of image restoration, especially in reconstructing missing or damaged image areas, has benefited significantly from self-supervised learning techniques, notably through the recent Masked Auto Encoder (MAE) strategy. In this project, we leverage eye-tracking data to enhance image reconstruction quality, and more specifically, with fixation-based saliency combined with the MAE strategy. By examining the emergent properties of representation learning and drawing parallels to human perceptual observation, we focus on how eye-tracking data informs the selection of image patches for reconstruction, aligning computational methods with human visual perception. Our findings reveal the potential of integrating eye-tracking insights to improve the accuracy and perceptual relevance of self-supervised learning models in computer vision. This study thus underscores the synergy between computational image restoration methods and human perception, facilitated by eye-tracking technology, opening new directions and insights for both fields. Our experiments are available for reproducibility in this GitHub Repository.

Predictive Modelling of Cognitive Workload in VR: An Eye-Tracking Approach

Dominik Szczepaniak
Monika Harvey
Fani Deligianni

Cognitive training can boost and sharpen the brain’s abilities to remember, focus, and switch between different tasks. One of the key elements of cognitive training is cognitive load. It allows a manipulation of the intensity of the intervention to suit the participant’s ability level and keep the session enjoyable (i.e. neither too frustrating/hard nor too boring/easy). However, measuring cognitive workload in an objective way is still under-researched and difficult. Here, we have developed a novel sustained attention Virtual Reality (VR) task, using Unity, that aims to predict load in a controlled manner. We demonstrate promising results in that machine learning algorithms can identify perceived as well as objective difficulty of the game accurately, using a combination of eye-tracking and physiological data obtained directly within the VR environment.

Reporting Eye-Tracking Data Quality: Towards a New Standard

Deborah N. Jakobi
Daniel G. Krakowczyk
Lena A. Jäger

Eye-tracking datasets are often shared in the format used by their creators for their original analyses, usually resulting in the exclusion of data considered irrelevant to the primary purpose. In order to increase re-usability of existing eye-tracking datasets for more diverse and initially not considered use cases, this work advocates a new approach of sharing eye-tracking data. Instead of publishing filtered and pre-processed datasets, the eye-tracking data at all pre-processing stages should be published together with data quality reports. In order to transparently report data quality and enable cross-dataset comparisons, we develop data quality reporting standards and metrics that can be automatically applied to a dataset, and integrate them into the open-source Python package pymovements (https://github.com/aeye-lab/pymovements).
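
To give a sense of what automatically computable data quality metrics can look like, the sketch below computes two commonly reported measures: the proportion of lost samples and the RMS of sample-to-sample distances (a precision measure). This is an illustrative assumption only. The function names, the use of `None` for lost samples, and the input format are hypothetical and do not reflect the pymovements API or the specific metrics proposed in the paper.

```python
import math

def data_loss_ratio(samples):
    """Fraction of samples with no gaze reported (None). Assumed encoding."""
    if not samples:
        return 0.0
    return sum(s is None for s in samples) / len(samples)

def rms_s2s(samples):
    """RMS of distances between consecutive valid samples.

    Pairs in which either sample is missing are skipped.
    Returns NaN if there is no valid consecutive pair.
    """
    squared = [
        math.dist(a, b) ** 2
        for a, b in zip(samples, samples[1:])
        if a is not None and b is not None
    ]
    return math.sqrt(sum(squared) / len(squared)) if squared else float("nan")

# Hypothetical gaze samples in degrees of visual angle; one sample is lost.
gaze = [(0.0, 0.0), (3.0, 4.0), None, (1.0, 1.0)]
print(data_loss_ratio(gaze))  # 0.25
print(rms_s2s(gaze))          # 5.0
```

Reporting such values alongside the data at every pre-processing stage is one way a reader could compare datasets without re-deriving the numbers.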

SARA: Smart AI Reading Assistant for Reading Comprehension

Enkeleda Thaqi
Mohamed Omar Mantawy
Enkelejda Kasneci

SARA integrates Eye Tracking and state-of-the-art large language models in a mixed reality framework to enhance the reading experience by providing personalized assistance in real-time. By tracking eye movements, SARA identifies the text segments that attract the user’s attention the most and potentially indicate uncertain areas and comprehension issues. The process involves these key steps: text detection and extraction, gaze tracking and alignment, and assessment of detected reading difficulty. The results are customized solutions presented directly within the user’s field of view as virtual overlays on identified difficult text areas. This support enables users to overcome challenges like unfamiliar vocabulary and complex sentences by offering additional context, rephrased solutions, and multilingual help. SARA’s innovative approach demonstrates it has the potential to transform the reading experience and improve reading proficiency.

Investigation of Unconscious Gaze and Head Direction in Videoconference - Turning off Microphone/Camera by Unconscious Gaze and Head Direction -

Yui Atarashi
Kaisei Yokoyama
Buntarou Shizuki

The goal of this study is to quantitatively clarify the experience of forgetting to turn off the microphone and camera in videoconferences. In our follow-up research, we plan to develop a system that uses users’ unconscious behavior during videoconferences to prevent them from forgetting to turn off the camera and microphone. First, we conducted a questionnaire survey of 60 participants to clarify their experiences of forgetting to turn off the microphone and camera. Next, we conducted a simulated videoconference study with eight participants to analyze gaze and head direction as unconscious behaviors.

Usability of Responsible Gambling Information on Gambling Operator’ Websites: A Webcam-Based Eye Tracking Study

Ruijie Wang
Reece Bush-Evans
Emily Arden-Close
John Mcalaney
Sarah Hodge
Elvira Bolat
Sarah Thomas
Keith Phalp

Recent research has underscored the resurgence of webcam-based eye tracking as a cost-effective, ecologically valid, and scalable choice, attributed to advancements in machine learning-assisted calibration methods. Online gambling poses a public health concern, necessitating innovative research methods and evidence-based strategies for harm reduction. This study investigates users’ fixation patterns on responsible gambling (RG) information in different formats on gambling websites using webcam-based eye tracking. Participants browsed six gambling websites to search for RG information. Data from 19 participants were analyzed on Areas of Interest (AOIs) categorized based on RG formats. The study found that usability varied across RG formats, and it highlights both the feasibility and the challenges of webcam-based eye tracking for online behavior research, with practical implications for gambling stakeholders.

Using mobile eye tracking for gaze- and head-contingent vision simulations

Yannick Sauer
Björn Severitt
Rajat Agarwala
Siegfried Wahl

Eye tracking enables the implementation of gaze-contingent visual impairment simulations, holding promise for research, clinical uses, and demonstrations. While head-mounted displays (HMDs) are common for such simulations, screen-based implementations can be an alternative for individuals who are reluctant or unable to use HMDs. Screen-based simulations are less likely to induce virtual reality sickness and allow eye contact and social interaction with the demonstrator or examiner. To improve the immersion, we use mobile eye tracking with fiducial markers for a head- and gaze-contingent simulation of visual impairments on a screen.

WEyeDS: A desktop webcam dataset for gaze estimation

Anatolii Evdokimov
Catherine Finegan-Dollak
Arryn Robbins

We introduce the Webcam Eye Dataset (WEyeDS), a dataset of images collected from desktops and laptops. WEyeDS will be openly available with the goal of providing researchers a dataset for training gaze estimation models. Our preliminary dataset was collected from 38 participants who completed prosaccade tasks on their personal desktop or laptop computer. We benchmarked the dataset using a convolutional neural network. The results of our benchmark model demonstrate some challenges in accuracy due to variability across devices and small sample size. Future work will significantly expand the number of participants and frames.

Zero Shot Learning in Pupil Detection

Wolfgang Fuhl

In eye tracking, pupil detection is a crucial step for gaze estimation. While there are a plethora of datasets with accurate annotations, new devices, such as differently placed cameras, usually require more data to be annotated since the new perspective is not part of the datasets so far. The research community has already published multiple simple simulators for data generation, as well as rendering-based approaches for the human eye. We created a dataset with different camera perspectives along with different challenges and evaluated the pupil simulators as well as the rendering-based approaches for zero-shot pupil detection. In our evaluation, we highlight the limitations of the simulators and the rendering-based approaches in terms of the different challenges.

Doctoral Consortium

Beyond Evaluation: Utilizing Gaze-Based Interactions for Digital Cartography

Michaela Vojtechovska

Despite the established use of eye-tracking in cartographic evaluation, its potential for interactive purposes is underutilized. The described dissertation aims to bridge the existing gap by developing a framework that incorporates gaze-based interactions within digital cartography. Next, we aim to empirically test their effectiveness in improving task completion speed, accuracy, information recall, and user satisfaction compared to traditional controls. Preliminary results using the GazePoint HD 3 eye-tracker within a Leaflet.js map application demonstrate this approach’s technical feasibility and potential benefits. The research methodology includes a PRISMA systematic review of current gaze-based interactions in cartography, the development of a web-based technological framework, and comprehensive empirical testing of hypotheses. Gaze-based interactions in cartography could benefit sectors with intensive map reliance, such as emergency response, by enabling faster task execution or education by simplifying the teaching of complex natural or social processes through gaze-adaptive map applications. By moving eye-tracking use from evaluation to interaction, the dissertation could offer a path for enhancing decision-making, learning, and operational efficiency in a world increasingly dependent on digital mapping solutions.

Evaluating Subject Behavior During Ingestion: A Portable Eye-Tracking Approach

Izaskun Cia

Eye tracking is a promising technique to measure gaze direction on food while eating that can be used as a tool to assess eating habits and disorders. However, the studies to date use high-quality (infrared) eye trackers under controlled lab environments that heavily constrain user movements and behaviour during food intake. This thesis aims to obtain a robust low-cost eye tracking system that can be used outside the lab without a computer and, furthermore, to delve into techniques for adapting an eye tracking model to a different application with less effort and minimal calibration.

Joint Attention on the Future: Pro-Ecological Attitudes Change In Collaboration.

Iga Szwoch

Climate change is a pressing global concern, with increasing global warming rates demanding immediate attention. People’s perspectives on this issue are multifaceted and can be polarized but are not immutable. My doctoral project seeks to determine whether subtle gaze manipulations can foster pro-ecological attitude change in collaborative settings. Specifically, we want to explore the impact of eliciting joint attention to improve shared understanding and urgency with regard to climate change.

Methods based on eye tracking to assess cognitive impairment

Joan Goset

Eye movements have been shown to be altered in neurological disorders such as Alzheimer's disease (AD), Parkinson's disease (PD), and multiple sclerosis (MS). This has prompted researchers to explore potential links between the altered ocular motility observed and the potential impact on eye movements of individuals. Besides common neurological disorders, another emerging concern for cognitive health is the COVID-19 pandemic. Now that the main urgency of the pandemic has passed, the long-term impact of the disease has come to the fore and, in fact, patients with the post-COVID-19 condition (PCC) report a broad spectrum of symptoms, including cognitive impairment. Hence, eye movements might be used as a new biomarker for the assessment of cognitive impairment, in particular in PCC patients. The most accurate and precise way to monitor eye movements is eye tracking technology, which allows researchers to monitor and record the position of the eyes. More recently, the development of mobile and remote eye tracking devices has made possible the collection of data from larger and more diverse samples. In addition, eye tracking is a non-invasive technology and is therefore expected to be useful in a clinical environment, although some limitations of the current state-of-the-art systems must still be overcome. Several commercial eye trackers are currently available, and they offer different features in functioning, software and hardware configurations. Most eye trackers are now used in laboratory conditions, and it is a challenging task to establish measurement protocols and configurations for clinical purposes, especially for neurological disorders. In view of all this, the aim of this thesis is to develop new methods based on eye tracking technology to assess ocular motility with the ultimate goal of analysing the patient's cognitive status.
The technology developed will be integrated into forthcoming clinical studies and specific visual tasks for the analysis of eye movements in patients with cognitive impairment will be designed, with the goal of identifying useful biomarkers for its early diagnosis. In particular, a special emphasis will be on post-COVID-19 patients although other neurological disorders will also be analysed.

Task Classification using Eye Movements and Graph Neural Networks

Jarod P. Hartley

Eye movements are a key part of increasing our understanding of human vision and the oculomotor system as they are an excellent proxy for attention. It is for this reason that studies exploring task-based eye movements and the relationships between said tasks have recently become popular. Recorded eye movements allow us to reverse engineer the steps taken by the oculomotor system and therefore offer us insights into the resource allocation process, such as attention, that our systems undertake. While studies have shown the successful classification of tasks, notably [MACINNES, 2018b; HENDERSON, 2013c; BORJI and ITTI, 2014a], we wish to further explore the relationship between the Search and Memorisation tasks in the context of task switching. In this paper we hope to use graph neural networks to successfully classify these tasks by utilising eye movement data. Magnetoencephalography (MEG) data was also recorded and will be integrated in future work.

Using eye tracking to enhance the efficiency and safety of tram drivers - designing visual attention training

Anna Warchol-Jakubowska

The research is part of a collaborative doctoral project between the Eye-Tracking Research Center at SWPS University in Warsaw and Warsaw Trams Ltd. Two comparative eye-tracking studies were conducted involving both novice and expert tram drivers to describe visual behavior during tram operation accurately. Analysis of the eye-tracking data provided valuable insights into key moments for driving safety, guiding the development of an optimal training program for novice tram operators. This paper presents preliminary ideas for the planned training, incorporating modern technologies such as eye tracking and augmented reality.